Prez

Welcome!

A Functional Machine Learning Classifier

Paul Brabban

sheffieldml.org.uk & (def shef)

The Problem

Meet the Irises

A blue flower, close up. We see stamens and petals curling down. — Iris Setosa

A purple flower, from a similar angle as the first figure. I can't say what is different, apart from the colour. — Iris Versicolor

A pale blue flower, from a similar angle as the first figure. Again, I can't say what is different, apart from the colour. — Iris Virginica

The Problem

Which is this?

A greyscale image of an iris. It is very difficult to judge which species it is. — Iris...?

Machine Learning

Techniques that let machines learn from experience, without being explicitly programmed.

Machine Learning

Machine learning models predict things.

how much a house will sell for
which numeric digit a digitised photo shows
whether an applicant will pay back a loan
whether an image of cells is normal or cancerous

and countless more...

Machine Learning

Even what species a particular iris belongs to!

VERSICOLOR

Classification

We're solving a classification problem

is this iris versicolor, setosa or virginica?

There are other kinds...

regression, like predicting a numeric house price
unsupervised, when we don't know the answers

Examples

To learn a supervised problem, we need examples

5.1,3.5,1.4,0.2,Iris-setosa

Four Features (in row order):
- Sepal length & width
- Petal length & width
One Label (Setosa | Versicolor | Virginica)
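
Reading one example row might look like the sketch below. The names `Example` and `parse_example` are made up for illustration, and it assumes every row is comma-separated with the label in the last position, as in the sample row above.

```python
from typing import NamedTuple, Tuple


class Example(NamedTuple):
    features: Tuple[float, ...]  # the four measurements, in the order given in the row
    label: str                   # the species name, e.g. "Iris-setosa"


def parse_example(line: str) -> Example:
    """Split one comma-separated row into numeric features and a label."""
    *values, label = line.strip().split(",")
    return Example(tuple(float(v) for v in values), label)


example = parse_example("5.1,3.5,1.4,0.2,Iris-setosa")
```

Keeping the features as a plain tuple of numbers means the same code works for data sets with more than four features.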

Training and Testing

Training

training examples are fed to the ML algorithm
a model is built that can predict for new examples

Testing

the model predicts for examples it hasn't seen before
"goodness" of the model is assessed

The kNN classifier

We're going to use a classic algorithm:

a k-Nearest Neighbours classifier.

The kNN classifier

An x-y scatter plot. Blue and green training data points are visible, with lines connecting a new colourless data point with its five nearest neighbours. Four of the five are blue. — A new data point's nearest neighbours with k=5. Classify as Blue!
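
A minimal sketch of the idea in the figure, not a reference implementation. It assumes Euclidean distance (the slides don't name a distance measure) and a simple majority vote; the function names and the 2D points are invented for illustration.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training, features, k=5):
    """Predict a label by majority vote among the k nearest training examples.

    `training` is a list of (features, label) pairs.
    """
    nearest = sorted(training, key=lambda ex: distance(ex[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up points mirroring the figure: of the five nearest neighbours
# of the new point, four are blue and one is green.
training = [
    ((1.0, 1.0), "blue"), ((1.2, 0.8), "blue"), ((0.9, 1.3), "blue"),
    ((1.4, 1.1), "blue"), ((1.6, 1.5), "green"), ((5.0, 5.0), "green"),
    ((5.2, 4.8), "green"),
]
prediction = knn_classify(training, (1.2, 1.1), k=5)  # "blue"
```

With three classes a vote can tie; this sketch just takes whichever tied label `Counter` lists first, which is one choice among several.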

The kNN classifier

An x-y scatter plot. Clearly delineated horizontal stripes in three colours are striking, showing the differentiating power of these two features. — Scatter plot of training data and predictions. Features plotted are petal length vs. sepal length.

Training

Most algorithms have a 'training' phase where they learn and optimise a target function.

The kNN classifier doesn't really have a training phase, as it just 'memorises' the training data.

Validation

...but is it good at predicting the species?

How do we measure the effectiveness of our algorithm?

Train/Test Split

assign some examples to a "training" set (say 70%)
and the rest to a "test" set (say 30%)
have the algorithm memorise the training set
predict the classes of the test set
how many did it get right? (%)
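
The steps above could be sketched like this. The helper names, the fixed random seed, and the toy data are assumptions for illustration; `predict` stands for any classifier that takes the training set and a feature vector.

```python
import random


def train_test_split(examples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the examples and cut it into training and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, training, test):
    """Percentage of test examples whose predicted label matches the true one."""
    correct = sum(1 for features, label in test
                  if predict(training, features) == label)
    return 100.0 * correct / len(test)


# Stand-in data and a stand-in classifier (1-nearest neighbour on a single
# feature), only here so the sketch runs by itself.
examples = [((float(i),), "low" if i < 10 else "high") for i in range(20)]


def nearest_label(training, features):
    return min(training, key=lambda ex: abs(ex[0][0] - features[0]))[1]


training, test = train_test_split(examples)
score = accuracy(nearest_label, training, test)
```

Shuffling before cutting matters: data files are often sorted by label, so an unshuffled split could leave a whole class out of the training set.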

Like so...

Describes the test-train split process, randomly assigning 70% of the examples to a training set and 30% to a test set.

k-Fold Cross-Validation

Make better use of your data!

choose, say, k = 5, then
randomly assign examples to 5 equal-sized "folds"
train with 4 of the folds, test with the other
repeat 5 times, so each fold is the test set once
average the % correct
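
One way to sketch the procedure, under the same caveats as before: the function names, seed, and toy data are illustrative assumptions, and `score` stands for any function that trains on one list and returns a % correct for another.

```python
import random


def k_fold_splits(examples, k=5, seed=0):
    """Yield (training, test) pairs, using each of the k folds as the test set once."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i, test in enumerate(folds):
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield training, test


def cross_validate(score, examples, k=5):
    """Average the score over all k train/test splits."""
    scores = [score(training, test)
              for training, test in k_fold_splits(examples, k)]
    return sum(scores) / len(scores)


# Stand-in data and scoring function (1-nearest neighbour on a single
# feature), only here so the sketch runs by itself.
examples = [((float(i),), "low" if i < 10 else "high") for i in range(20)]


def percent_correct(training, test):
    def predict(features):
        return min(training, key=lambda ex: abs(ex[0][0] - features[0]))[1]
    return 100.0 * sum(predict(f) == label for f, label in test) / len(test)


mean_score = cross_validate(percent_correct, examples, k=5)
```

Every example is used for testing exactly once and for training k - 1 times, which is where the "better use of your data" comes from.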

Like so...

Representation of a 5-fold split and averaging the results

That's it!

Instructions and the data set are at https://github.com/defshef/dojo-knn

Any language you like
Work alone or in groups
If you're stuck shout up for a hint
Last 20 mins will be a show-and-tell by... YOU LOT!
Thank you and enjoy!

Next Steps

See the repo for ideas on next steps.

If you want to optimise your code...

Try the phishing dataset - 30 features, 11k examples

If you want to do more ML stuff...

Kaggle has datasets and ML competitions - why not have a go?