Digit recognition is one of the classic problems in machine learning, so much so that it has become the "hello world" of the field. Since we want computers to recognise digits, it is natural to ask how humans see digits. This of course assumes that the human mind is a good solution to recognising handwritten digits, and it is. So how exactly do we do it?
Well, here's a thought: we take in a set of pixels, and based on the intensity of those pixels we recognise recurring patterns in the samples. Then, based on the labels we have already agreed on for those patterns, we say this number is this or that. Our ability to categorise these numbers rests on memory, on previous exposure to patterns of this nature that have already been classified for us. This is the idea of learning in machine learning.
So if we are going to build a solution capable of learning how to recognise digits, how do we make it learn?
This is the crux of our path today. There are many ways to solve this problem, and we are going to focus on just one of them; in a later post we will take a deep look at its internal workings.
It's the support vector machine.
When we think of support vector machines, imagine this: given a set of data points scattered across the Cartesian plane, we aim to find a line that cleanly divides these points into two or more classes. With two classes and two features, a single straight line can do the job; with more features, each data point lives in a higher-dimensional space and the dividing line generalises to a plane or hyperplane. To learn more about support vector machines you can take a look at these resources.
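To make the dividing-line idea concrete, here is a minimal sketch with made-up 2-D points: two small clusters, a linear SVM fitted between them, and new points classified by which side of the line they fall on. The cluster coordinates are purely illustrative.

```python
# A minimal sketch: fit a linear SVM on two toy clusters of 2-D points,
# then classify new points by which side of the dividing line they fall on.
from sklearn import svm

# Two made-up clusters with labels 0 and 1
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

clf = svm.SVC(kernel="linear")
clf.fit(X, y)

# Points near each cluster land on that cluster's side of the line
print(clf.predict([[2, 2], [9, 9]]))  # [0 1]
```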
In this post we are looking at how we can use a support vector machine to recognise digits. In this instance, the support vector machine is the part that does the learning.
So let's get started.
First we have to consider our data. We will use the input data provided by Kaggle in their digit recogniser problem set, and start by loading the libraries needed to read it.
https://gist.github.com/108227c886c0de4287588c1fb867e7f8
Here we use pandas, matplotlib and sklearn to load our data.
https://gist.github.com/75444ee41d9d72ca2000601adc55067b
Here we get the train.csv dataset. Since it is a CSV file we read it with read_csv from pandas, then collect the first 5000 rows, separating them into images and labels, and finally split the data randomly into a test dataset and a train dataset.
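The loading step above can be sketched as follows. Kaggle's train.csv has a "label" column followed by 784 pixel columns; here a tiny synthetic frame stands in for it so the sketch is runnable on its own, and the 0.2 test fraction and random seed are my own assumptions. With the real file you would start with `labeled = pd.read_csv("train.csv")`.

```python
# A sketch of the load-and-split step, using a synthetic stand-in
# for Kaggle's train.csv ("label" column + 784 pixel columns).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
data = {"label": rng.integers(0, 10, 100)}          # fake digit labels
for p in range(784):
    data[f"pixel{p}"] = rng.integers(0, 256, 100)   # fake pixel shades
labeled = pd.DataFrame(data)
# Real post: labeled = pd.read_csv("train.csv")

images = labeled.iloc[0:5000, 1:]   # pixel columns (only 100 rows here)
labels = labeled.iloc[0:5000, :1]   # the matching "label" column

train_images, test_images, train_labels, test_labels = train_test_split(
    images, labels, test_size=0.2, random_state=2)

print(train_images.shape, test_images.shape)  # (80, 784) (20, 784)
```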
Since each image is stored as a one-dimensional row of pixels, we have to recover its two-dimensional form.
https://gist.github.com/19c026685450f4a1158a37832b64af1f
We select an index i, use it to pick a single image img from the dataset, and represent it as a matrix reshaped into a 28x28 grid. Using plt from matplotlib, we can then visualise the image.
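The reshape-and-plot step can be sketched like this. The pixel values below are a synthetic stand-in; with the real frame you would take something like `img = train_images.iloc[i].to_numpy()`, which is my assumption about the variable names, not the post's exact code.

```python
# A sketch of reshaping one flattened 784-pixel row into a 28x28 grid
# and rendering it with matplotlib.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

i = 1
flat = np.arange(784)        # stand-in for one flattened image row
img = flat.reshape(28, 28)   # recover the 2-D 28x28 form

plt.imshow(img, cmap="gray")
plt.title(f"image at index {i}")
plt.savefig("digit.png")
```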
So we can recognise this as a 9, because we can see the pattern of pixels that makes it a 9. But how does a computer recognise it? Well, since this is a grayscale image, we can look at a histogram to view the range of the pixel shades.
In the histogram, 0 to 255 represents the range of shades, where 0 means there is no shade at all and 255 means maximum shade. So in the image, the areas that are black are 0 and the areas of maximum white are 255, and every other shade falls somewhere between 0 and 255.
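The histogram check can be sketched as below, counting how often each shade occurs in one image. The pixel values are synthetic here; with the real data you would pass a real image row instead.

```python
# A sketch of the shade histogram: count how often each grayscale
# value (0..255) occurs across one flattened 28x28 image.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
img = rng.integers(0, 256, 784)   # stand-in for one image's pixels

plt.hist(img, bins=256, range=(0, 255))
plt.xlabel("shade (0 = black, 255 = white)")
plt.ylabel("pixel count")
plt.savefig("histogram.png")
```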
In this case we use sklearn's svm module to create a support vector classifier. Next we pass our training data into the classifier's fit method, which trains it into a ninja. Just kidding.
Next, the test images are passed to score to assess the model.
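The fit-then-score step can be sketched as follows. The digit data here is random and synthetic, so the printed accuracy means nothing by itself; with the real frames you would pass train_images, train_labels, test_images and test_labels from the earlier split.

```python
# A sketch of training a support vector classifier and scoring it
# on held-out data, using synthetic stand-ins for the digit frames.
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, (80, 784))
train_labels = rng.integers(0, 10, 80)
test_images = rng.integers(0, 256, (20, 784))
test_labels = rng.integers(0, 10, 20)

clf = svm.SVC()
clf.fit(train_images, train_labels)            # train the classifier
accuracy = clf.score(test_images, test_labels)  # fraction correct
print(accuracy)
```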
https://gist.github.com/7d4dc0684e8dc4d40f438d5c0f0b4436
This shows that our model has an accuracy of 0.1, which is really bad; it's basically random guessing. We can understand this if we look at the intricacies of support vector machines: they find it hard to classify data with this many input levels per pattern. The grayscale image gives us about 255 different shade levels to consider, and it can be really difficult to find a straight-line divider between the classes at that granularity. But what if we make every grayscale shade above 0 become 1?
Thus making our image more like black and white.
https://gist.github.com/5a813804d46cc7ec81d1427c0069794c
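The black-and-white trick above can be sketched in one line of NumPy: every shade above 0 becomes 1, and 0 stays 0. The image array here is a synthetic stand-in; with the real frames you would apply the same comparison to train_images and test_images.

```python
# A sketch of binarising the images: any shade above 0 becomes 1,
# turning grayscale into a black-and-white pattern.
import numpy as np

rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, (80, 784))   # stand-in grayscale data

bw_images = (train_images > 0).astype(int)       # 0 stays 0, rest -> 1

print(sorted(np.unique(bw_images)))  # [0, 1]
```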
Based on the histogram, we should be right on point.
https://gist.github.com/e5830c3ef6043f17d3d0fa62007070b1
Our algorithm's accuracy just increased to 88.7%, which is certainly better.
Based on this we can see the limits of the algorithm: in this case we had to convert the images to a black-and-white, or should I say more distinct, pattern before the support vector machine could recognise the patterns effectively. And even then, 88.7 percent is not exactly the best. But it's something.
Recognising digits, in other words recognising patterns based on training, is a tough feat for computers and programs. SVMs aim to solve it by drawing distinct lines to classify the various parameters of these patterns, but they sometimes fail, pretty miserably, on more complicated forms of data. We could try to improve this by tuning the various parameters of the vector classifier, but there are better solutions: in next week's post we are going to have a look at the neural network approach.