@airtoxin
Last active November 23, 2015 10:24
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MNIST For ML Beginners"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial is intended for readers who are new to both machine learning and TensorFlow. If you already know what MNIST is, and what softmax (multinomial logistic) regression is, you might prefer this [faster paced tutorial](http://tensorflow.org/tutorials/mnist/pros/index.md).\n",
"\n",
"When one learns how to program, there's a tradition that the first thing you do is print \"Hello World.\" Just like programming has Hello World, machine learning has MNIST.\n",
"\n",
"MNIST is a simple computer vision dataset. It consists of images of handwritten digits like these:\n",
"\n",
"![](http://api.tensorflow.org/system/image/body/1700/MNIST.png)\n",
"It also includes labels for each image, telling us which digit it is. For example, the labels for the above images are 5, 0, 4, and 1.\n",
"\n",
"In this tutorial, we're going to train a model to look at images and predict what digits they are. Our goal isn't to train a really elaborate model that achieves state-of-the-art performance -- although we'll give you code to do that later! -- but rather to dip a toe into using TensorFlow. As such, we're going to start with a very simple model, called a Softmax Regression.\n",
"\n",
"The actual code for this tutorial is very short, and all the interesting stuff happens in just three lines. However, it is very important to understand the ideas behind it: both how TensorFlow works and the core machine learning concepts. Because of this, we are going to very carefully work through the code.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The MNIST Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The MNIST data is hosted on [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/). For your convenience, we've included some python code to download and install the data automatically. You can either download [the code](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/input_data.py) and import it as below, or simply copy and paste it in."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Extracting MNIST_data/train-images-idx3-ubyte.gz\n",
"Extracting MNIST_data/train-labels-idx1-ubyte.gz\n",
"Extracting MNIST_data/t10k-images-idx3-ubyte.gz\n",
"Extracting MNIST_data/t10k-labels-idx1-ubyte.gz\n"
]
}
],
"source": [
"import input_data\n",
"mnist = input_data.read_data_sets(\"MNIST_data/\", one_hot=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The downloaded data is split into two parts, 60,000 data points of training data (__mnist.train__) and 10,000 points of test data (__mnist.test__). This split is very important: it's essential in machine learning that we have separate data which we don't learn from so that we can make sure that what we've learned actually generalizes!"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<input_data.DataSet at 0x10cc40c90>"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mnist.train"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned earlier, every MNIST data point has two parts: an image of a handwritten digit and a corresponding label. We will call the images \"xs\" and the labels \"ys\". Both the training set and test set contain xs and ys, for example the training images are __mnist.train.images__ and the train labels are __mnist.train.labels__."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(55000, 784)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(len(mnist.train.images), len(mnist.train.images[0]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each image is 28 pixels by 28 pixels. We can interpret this as a big array of numbers:\n",
"\n",
"![](http://api.tensorflow.org/system/image/body/1701/MNIST-Matrix.png)\n",
"\n",
"We can flatten this array into a vector of 28x28 = 784 numbers. It doesn't matter how we flatten the array, as long as we're consistent between images. From this perspective, the MNIST images are just a bunch of points in a 784-dimensional vector space, with a [very rich structure](http://colah.github.io/posts/2014-10-Visualizing-MNIST/) (warning: computationally intensive visualizations)."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0.38039219, 0.37647063, 0.3019608 , 0.46274513,\n",
" 0.2392157 , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0.35294119, 0.5411765 ,\n",
" 0.92156869, 0.92156869, 0.92156869, 0.92156869, 0.92156869,\n",
" 0.92156869, 0.98431379, 0.98431379, 0.97254908, 0.99607849,\n",
" 0.96078438, 0.92156869, 0.74509805, 0.08235294, 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0.54901963, 0.98431379, 0.99607849,\n",
" 0.99607849, 0.99607849, 0.99607849, 0.99607849, 0.99607849,\n",
" 0.99607849, 0.99607849, 0.99607849, 0.99607849, 0.99607849,\n",
" 0.99607849, 0.99607849, 0.99607849, 0.74117649, 0.09019608,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0.88627458, 0.99607849, 0.81568635,\n",
" 0.78039223, 0.78039223, 0.78039223, 0.78039223, 0.54509807,\n",
" 0.2392157 , 0.2392157 , 0.2392157 , 0.2392157 , 0.2392157 ,\n",
" 0.50196081, 0.8705883 , 0.99607849, 0.99607849, 0.74117649,\n",
" 0.08235294, 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0.14901961, 0.32156864, 0.0509804 ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0.13333334, 0.83529419, 0.99607849, 0.99607849,\n",
" 0.45098042, 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0.32941177, 0.99607849, 0.99607849,\n",
" 0.91764712, 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0.32941177, 0.99607849, 0.99607849,\n",
" 0.91764712, 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0.41568631, 0.6156863 , 0.99607849, 0.99607849,\n",
" 0.95294124, 0.20000002, 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0.09803922, 0.45882356, 0.89411771, 0.89411771,\n",
" 0.89411771, 0.99215692, 0.99607849, 0.99607849, 0.99607849,\n",
" 0.99607849, 0.94117653, 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0.26666668,\n",
" 0.4666667 , 0.86274517, 0.99607849, 0.99607849, 0.99607849,\n",
" 0.99607849, 0.99607849, 0.99607849, 0.99607849, 0.99607849,\n",
" 0.99607849, 0.55686277, 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0.14509805, 0.73333335, 0.99215692,\n",
" 0.99607849, 0.99607849, 0.99607849, 0.87450987, 0.80784321,\n",
" 0.80784321, 0.29411766, 0.26666668, 0.84313732, 0.99607849,\n",
" 0.99607849, 0.45882356, 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0.44313729, 0.8588236 , 0.99607849, 0.94901967,\n",
" 0.89019614, 0.45098042, 0.34901962, 0.12156864, 0. ,\n",
" 0. , 0. , 0. , 0.7843138 , 0.99607849,\n",
" 0.9450981 , 0.16078432, 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0.66274512, 0.99607849, 0.6901961 , 0.24313727,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0.18823531, 0.90588242, 0.99607849,\n",
" 0.91764712, 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0.07058824, 0.48627454, 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0.32941177, 0.99607849, 0.99607849,\n",
" 0.65098041, 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0.54509807, 0.99607849, 0.9333334 ,\n",
" 0.22352943, 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0.82352948, 0.98039222, 0.99607849, 0.65882355,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0.94901967, 0.99607849, 0.93725497, 0.22352943,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0.34901962, 0.98431379, 0.9450981 , 0.33725491, 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0.01960784,\n",
" 0.80784321, 0.96470594, 0.6156863 , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0.01568628,\n",
" 0.45882356, 0.27058825, 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ],\n",
" [ 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. ]], dtype=float32)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mnist.train.images[0].reshape((28, 28))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Flattening the data throws away information about the 2D structure of the image. Isn't that bad? Well, the best computer vision methods do exploit this structure, and we will in later tutorials. But the simple method we will be using here, a softmax regression, won't.\n",
"\n",
"The result is that __mnist.train.images__ is a tensor (an n-dimensional array) with a shape of __[60000, 784]__. The first dimension indexes the images and the second dimension indexes the pixels in each image. Each entry in the tensor is the pixel intensity between 0 and 1, for a particular pixel in a particular image.\n",
"\n",
"![](http://api.tensorflow.org/system/image/body/1703/mnist-train-xs.png)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The corresponding labels in MNIST are numbers between 0 and 9, describing which digit a given image is of. For the purposes of this tutorial, we're going to want our labels as as \"one-hot vectors\". A one-hot vector is a vector which is 0 in most dimensions, and 1 in a single dimension. In this case, the nth digit will be represented as a vector which is 1 in the nth dimensions. For example, 0 would be _[1,0,0,0,0,0,0,0,0,0,0]_. Consequently, __mnist.train.labels__ is a __[60000, 10]__ array of floats.\n",
"\n",
"![](http://api.tensorflow.org/system/image/body/1702/mnist-train-ys.png)\n",
"\n",
"We're now ready to actually make our model!"
]
},
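{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, the one-hot encoding described above is easy to sketch in plain NumPy (the digit values here are made up for illustration; this is independent of __input_data__):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"digits = np.array([5, 0, 4, 1])  # hypothetical example labels\n",
"one_hot = np.zeros((len(digits), 10), dtype=np.float32)\n",
"one_hot[np.arange(len(digits)), digits] = 1.0  # set the nth entry of each row\n",
"# each row now contains a single 1.0 at the index of its digit\n",
"```"
]
},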
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(55000, 10)"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(len(mnist.train.labels), len(mnist.train.labels[0]))"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.])"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mnist.train.labels[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Softmax Regressions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We know that every image in MNIST is a digit, whether it's a zero or a nine. We want to be able to look at an image and give probabilities for it being each digit. For example, our model might look at a picture of a nine and be 80% sure it's a nine, but give a 5% chance to it being an eight (because of the top loop) and a bit of probability to all the others because it isn't sure.\n",
"\n",
"This is a classic case where a softmax regression is a natural, simple model. If you want to assign probabilities to an object being one of several different things, softmax is the thing to do. Even later on, when we train more sophisticated models, the final step will be a layer of softmax.\n",
"\n",
"A softmax regression has two steps: first we add up the evidence of our input being in certain classes, and then we convert that evidence into probabilities.\n",
"\n",
"To tally up the evidence that a given image is in a particular class, we do a weighted sum of the pixel intensities. The weight is negative if that pixel having a high intensity is evidence against the image being in that class, and positive if it is evidence in favor.\n",
"\n",
"The following diagram shows the weights one model learned for each of these classes. Red represents negative weights, while blue represents positive weights.\n",
"\n",
"![](http://api.tensorflow.org/system/image/body/1706/softmax-weights.png)\n",
"\n",
"We also add some extra evidence called a bias. Basically, we want to be able to say that some things are more likely independent of the input. The result is that the evidence for a class _i_ given an input _x_ is:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$\\text{evidence}_i = \\sum_j W_{i,~ j} x_j + b_i$"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"where _Wi_ is the weights and _bi_ is the bias for class _i_, and _j_ is an index for summing over the pixels in our input image _x_. We then convert the evidence tallies into our predicted probabilities _y_ using the \"softmax\" function:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$y = \\text{softmax}(\\text{evidence})$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here softmax is serving as an \"activation\" or \"link\" function, shaping the output of our linear function into the form we want -- in this case, a probability distribution over 10 cases. You can think of it as converting tallies of evidence into probabilities of our input being in each class. It's defined as:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$\\text{softmax}(x) = \\text{normalize}(\\exp(x))$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you expand that equation out, you get:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$\\text{softmax}(x)_i = \\frac{\\exp(x_i)}{\\sum_j \\exp(x_j)}$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But it's often more helpful to think of softmax the first way: exponentiating its inputs and then normalizing them. The exponentiation means that one unit more evidence increases the weight given to any hypothesis multiplicatively. And conversely, having one less unit of evidence means that a hypothesis gets a fraction of its earlier weight. No hypothesis ever has zero or negative weight. Softmax then normalizes these weights, so that they add up to one, forming a valid probability distribution. (To get more intuition about the softmax function, check out the [section](http://neuralnetworksanddeeplearning.com/chap3.html#softmax) on it in Michael Nieslen's book, complete with an interactive visualization.)\n",
"\n",
"You can picture our softmax regression as looking something like the following, although with a lot more _x_s. For each output, we compute a weighted sum of the _x_s, add a bias, and then apply softmax.\n",
"\n",
"![](http://api.tensorflow.org/system/image/body/1704/softmax-regression-scalargraph.png)\n",
"\n",
"If we write that out as equations, we get:\n",
"\n",
"![](http://api.tensorflow.org/system/image/body/1707/softmax-regression-scalarequation.png)\n",
"\n",
"We can \"vectorize\" this procedure, turning it into a matrix multiplication and vector addition. This is helpful for computational efficiency. (It's also a useful way to think.)\n",
"\n",
"![](http://api.tensorflow.org/system/image/body/1705/softmax-regression-vectorequation.png)\n",
"\n",
"More compactly, we can just write:\n",
"\n",
"$y = \\text{softmax}(Wx + b)$"
]
},
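{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the exponentiate-then-normalize picture concrete, here is a plain-NumPy sketch of $y = \\text{softmax}(Wx + b)$. The shapes match the MNIST setup, but the values are placeholders, not the TensorFlow model we build below:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def softmax(x):\n",
"    # exponentiate, then normalize so the entries sum to one\n",
"    e = np.exp(x - np.max(x))  # subtracting the max keeps exp() stable\n",
"    return e / e.sum()\n",
"\n",
"W = np.zeros((10, 784))  # one row of weights per class (all zeros here)\n",
"b = np.zeros(10)\n",
"x = np.random.rand(784)  # a fake flattened image\n",
"y = softmax(W.dot(x) + b)\n",
"# with zero evidence for every class, y is the uniform distribution:\n",
"# each of the 10 entries is 0.1, and they sum to 1\n",
"```"
]
},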
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Implementing the Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To do efficient numerical computing in Python, we typically use libraries like NumPy that do expensive operations such as matrix multiplication outside Python, using highly efficient code implemented in another language. Unfortunately, there can still be a lot of overhead from switching back to Python every operation. This overhead is especially bad if you want to run computations on GPUs or in a distributed manner, where there can be a high cost to transferring data.\n",
"\n",
"TensorFlow also does its heavy lifting outside python, but it takes things a step further to avoid this overhead. Instead of running a single expensive operation independently from Python, TensorFlow lets us describe a graph of interacting operations that run entirely outside Python. (Approaches like this can be seen in a few machine learning libraries.)\n",
"\n",
"To use TensorFlow, we need to import it."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We describe these interacting operations by manipulating symbolic variables. Let's create one:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"x = tf.placeholder(\"float\", [None, 784])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__x__ isn't a specific value. It's a __placeholder__, a value that we'll input when we ask TensorFlow to run a computation. We want to be able to input any number of MNIST images, each flattened into a 784-dimensional vector. We represent this as a 2d tensor of floating point numbers, with a shape __[None, 784]__. (Here __None__ means that a dimension can be of any length.)\n",
"\n",
"We also need the weights and biases for our model. We could imagine treating these like additional inputs, but TensorFlow has an even better way to handle it: __Variable__. A __Variable__ is a modifiable tensor that lives in TensorFlow's graph of interacting operations. It can be used and even modified by the computation. For machine learning applications, one generally has the model parameters be __Variable__s."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"W = tf.Variable(tf.zeros([784,10]))\n",
"b = tf.Variable(tf.zeros([10]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We create these __Variable__s by giving __tf.Variable__ the initial value of the __Variable__: in this case, we initialize both __W__ and __b__ as tensors full of zeros. Since we are going to learn __W__ and __b__, it doesn't matter very much what they initially are.\n",
"\n",
"Notice that __W__ has a shape of __[784, 10]__ because we want to multiply the 784-dimensional image vectors by it to produce 10-dimensional vectors of evidence for the difference classes. __b__ has a shape of __[10]__ so we can add it to the output.\n",
"\n",
"We can now implement our model. It only takes one line!"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"y = tf.nn.softmax(tf.matmul(x,W) + b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we multiply __x__ by __W__ with the expression __tf.matmul(x,W)__. This is flipped from when we multiplied them in our equation, where we had __Wx__, as a small trick to deal with __x__ being a 2D tensor with multiple inputs. We then add __b__, and finally apply __tf.nn.softmax__.\n",
"\n",
"That's it. It only took us one line to define our model, after a couple short lines of setup. That isn't because TensorFlow is designed to make a softmax regression particularly easy: it's just a very flexible way to describe many kinds of numerical computations, from machine learning models to physics simulations. And once defined, our model can be run on different devices: your computer's CPU, GPUs, and even phones!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to train our model, we need to define what it means for the model to be good. Well, actually, in machine learning we typically define what it means for a model to be bad, called the cost or loss, and then try to minimize how bad it is. But the two are equivalent.\n",
"\n",
"One very common, very nice cost function is \"cross-entropy.\" Surprisingly, cross-entropy arises from thinking about information compressing codes in information theory but it winds up being an important idea in lots of areas, from gambling to machine learning. It's defined:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$H_{y'}(y) = -\\sum_i y'_i \\log(y_i)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Where __y__ is our predicted probability distribution, and __y'__ is the true distribution (the one-hot vector we'll input). In some rough sense, the cross-entropy is measuring how inefficient our predictions are for describing the truth. Going into more detail about cross-entropy is beyond the scope of this tutorial, but it's well worth [understanding](http://colah.github.io/posts/2015-09-Visual-Information/). [see also](http://nnadl-ja.github.io/nnadl_site_ja/chap3.html).\n",
"\n",
"To implement cross-entropy we need to first add a new placeholder to input the correct answers:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"y_ = tf.placeholder(\"float\", [None,10])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we can implement the cross-entropy, $-\\sum y'\\log(y)$"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"cross_entropy = -tf.reduce_sum(y_*tf.log(y))"
]
},
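{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, the same formula can be evaluated in plain NumPy on a single made-up prediction (illustration only, not part of the TensorFlow graph):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"y_true = np.zeros(10)\n",
"y_true[7] = 1.0  # one-hot truth: the digit is a 7\n",
"y_pred = np.full(10, 0.1)  # a maximally unsure prediction\n",
"cross_entropy = -np.sum(y_true * np.log(y_pred))\n",
"# only the log-probability assigned to the true class survives the sum,\n",
"# so cross_entropy equals -log(0.1), about 2.3; a confident correct\n",
"# prediction would score near 0\n",
"```"
]
},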
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, __tf.log__ computes the logarithm of each element of __y__. Next, we multiply each element of __y_ __ with the corresponding element of __tf.log(y_)__. Finally, __tf.reduce_sum__ adds all the elements of the tensor. (Note that this isn't just the cross-entropy of the truth with a single prediction, but the sum of the cross-entropies for all 100 images we looked at. How well we are doing on 100 data points is a much better description of how good our model is than a single data point.)\n",
"\n",
"Now that we know what we want our model to do, it's very easy to have TensorFlow train it to do so. Because TensorFlow know the entire graph of your computations, it can automatically use the [backpropagation](http://colah.github.io/posts/2015-08-Backprop/) algorithm to efficiently determine how your variables affect the cost you ask it minimize. Then it can apply your choice of optimization algorithm to modify the variables and reduce the cost."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case, we ask TensorFlow to minimize __cross_entropy__ using the gradient descent algorithm(最急降下法) with a learning rate of 0.01. Gradient descent is a simple procedure, where TensorFlow simply shifts each variable a little bit in the direction that reduces the cost. But TensorFlow also provides [many other optimization algorithms](http://tensorflow.org/api_docs/python/train.md#optimizers): using one is as simple as tweaking one line.\n",
"\n",
"What TensorFlow actually does here, behind the scenes, is it adds new operations to your graph which implement backpropagation and gradient descent. Then it gives you back a single operation which, when run, will do a step of gradient descent training, slightly tweaking your variables to reduce the cost.\n",
"\n",
"Now we have our model set up to train. One last thing before we launch it, we have to add an operation to initialize the variables we created:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"init = tf.initialize_all_variables()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now launch the model in a __Session__, and run the operation that initializes the variables:"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"sess = tf.Session()\n",
"sess.run(init)"
]
},
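{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we run the real thing, here is a toy plain-NumPy-free sketch of what a single gradient descent step does: nudge a parameter a little against the gradient of the cost. The one-dimensional cost $w^2$ here is hypothetical and unrelated to our MNIST graph:\n",
"\n",
"```python\n",
"w = 5.0  # a single toy parameter\n",
"learning_rate = 0.01\n",
"for _ in range(1000):\n",
"    grad = 2 * w  # gradient of the cost w**2 with respect to w\n",
"    w -= learning_rate * grad  # shift w a little in the downhill direction\n",
"# after many small steps, w is very close to 0, the minimum of w**2\n",
"```"
]
},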
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's train -- we'll run the training step 1000 times!"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"for i in range(1000):\n",
" batch_xs, batch_ys = mnist.train.next_batch(100)\n",
" summary = sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each step of the loop, we get a \"batch\" of one hundred random data points from our training set. We run __train_step__ feeding in the batches data to replace the __placeholders__.\n",
"\n",
"Using small batches of random data is called stochastic training -- in this case, stochastic gradient descent. Ideally, we'd like to use all our data for every step of training because that would give us a better sense of what we should be doing, but that's expensive. So, instead, we use a different subset every time. Doing this is cheap and has much of the same benefit.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evaluating Our Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How well does our model do?\n",
"\n",
"Well, first let's figure out where we predicted the correct label. __tf.argmax__ is an extremely useful function which gives you the index of the highest entry in a tensor along some axis. For example, __tf.argmax(y,1)__ is the label our model thinks is most likely for each input, while __tf.argmax(y_,1)__ is the correct label. We can use __tf.equal__ to check if our prediction matches the truth."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That gives us a list of booleans. To determine what fraction are correct, we cast to floating point numbers and then take the mean. For example, __[True, False, True, True]__ would become __[1,0,1,1]__ which would become __0.75__."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"accuracy = tf.reduce_mean(tf.cast(correct_prediction, \"float\"))"
]
},
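{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cast-and-mean trick is easy to check in plain NumPy (illustrative values only):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"matches = np.array([True, False, True, True])\n",
"accuracy = np.mean(matches.astype(np.float32))\n",
"# accuracy is 0.75, matching the example above\n",
"```"
]
},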
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we ask for our accuracy on our test data."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.9161\n"
]
}
],
"source": [
"print sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This should be about 91%.\n",
"\n",
"Is that good? Well, not really. In fact, it's pretty bad. This is because we're using a very simple model. With some small changes, we can get to 97%. The best models can get to over 99.7% accuracy! (For more information, have a look at this [list of results](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html).)\n",
"\n",
"What matters is that we learned from this model. Still, if you're feeling a bit down about these results, check out the [next tutorial](http://tensorflow.org/tutorials/index.md) where we do a lot better, and learn how to build more sophisticated models using TensorFlow!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}