"cell_type": "markdown",
"source": [
"# Neural Network and Deep Learning Book\n",
"## Implementing neural network to classify handwritten digits – trained using MNIST dataset using Numpy library in Python 2.7\n",
"We will learn concepts –\n",
"* Neural network (Feed forward)\n",
"* Stochastic gradient descent\n",
"* "
"cell_type": "markdown",
"source": [
"## Clone dataset"
"cell_type": "code",
"source": [
"!git clone"
"cell_type": "markdown",
"source": [
"## Let's start coding our network\n"
"cell_type": "markdown",
"source": [
"### Let's try to understand the shape of a sample network \n",
"with 4 input units(nodes/neurons), 10 hidden units and 2 output units"
"cell_type": "code",
"source": [
"import numpy as np\n",
"sizes = [4, 10, 2]\n",
"# print sizes[1:], zip(sizes[:-1], sizes[1:])\n",
"biases = [np.random.rand(y, 1) for y in sizes[1:]]\n",
"weights = [np.random.rand(y, x) for x, y in zip(sizes[:-1], sizes[1:])]\n",
"print \"Network of shape {0}\".format(sizes)\n",
"print \"-\"*50\n",
"print \"Weights shape –\"\n",
"print [x.shape for x in weights]\n",
"print \"-\"*50\n",
"print \"Biases shape –\"\n",
"print [x.shape for x in biases]\n",
"print \"-\"*50\n",
"def sigmoid(z):\n",
" return 1.0/(1.0+np.exp(-z))\n",
"activation = np.random.rand(4,1)\n",
"print \"Input\\t\\t–\\t{0}\".format(activation.shape)\n",
"print np.array2string(activation, separator=\", \")\n",
"print \"-\"*50\n",
"for idx, (b, w) in enumerate(zip(biases, weights)):\n",
" layer_nbr = idx+2\n",
" print \"Layer({0})[w]\\t–\\t{1}\".format(layer_nbr, w.shape)\n",
" print np.array2string(w, separator=\", \")\n",
" print \"Layer({0})[X]\\t–\\t{1}\".format(layer_nbr, activation.shape)\n",
" print np.array2string(activation, separator=\", \")\n",
" print \"Layer({0})[b]\\t–\\t{1}\".format(layer_nbr, b.shape)\n",
" print np.array2string(b, separator=\", \")\n",
" activation = sigmoid(, activation) + b)\n",
" print \"Layer({0})[a]\\t–\\t{1}\".format(layer_nbr, activation.shape)\n",
" print np.array2string(activation, separator=\", \")\n",
" print \"-\"*50\n",
"print \"Output\\t\\t–\\t{0}\".format(activation.shape)\n",
"print np.array2string(activation, separator=\", \")\n",
"print \"-\"*50"
"Network of shape [4, 10, 2]\n",
"Weights shape –\n",
"[(10, 4), (2, 10)]\n",
"Biases shape –\n",
"[(10, 1), (2, 1)]\n",
"Input\t\t–\t(4, 1)\n",
"[[0.7644721 ],\n",
" [0.88940606],\n",
" [0.07966966],\n",
" [0.07369021]]\n",
"Layer(2)[w]\t–\t(10, 4)\n",
"[[0.60191972, 0.27922429, 0.21050643, 0.34242027],\n",
" [0.5533741 , 0.96932907, 0.77709883, 0.26078927],\n",
" [0.53959722, 0.14262895, 0.18309747, 0.63997159],\n",
" [0.82728674, 0.01507453, 0.48617 , 0.22665386],\n",
" [0.91505228, 0.07087353, 0.78901264, 0.62890682],\n",
" [0.7592766 , 0.44019416, 0.20535865, 0.21752703],\n",
" [0.79955705, 0.00604248, 0.30669417, 0.16734993],\n",
" [0.25008503, 0.01996907, 0.59869366, 0.48222001],\n",
" [0.80851518, 0.33432035, 0.65404222, 0.11310718],\n",
" [0.57657454, 0.42013254, 0.49783748, 0.59643365]]\n",
"Layer(2)[X]\t–\t(4, 1)\n",
"[[0.7644721 ],\n",
" [0.88940606],\n",
" [0.07966966],\n",
" [0.07369021]]\n",
"Layer(2)[b]\t–\t(10, 1)\n",
" [0.02801263],\n",
" [0.12240674],\n",
" [0.85005226],\n",
" [0.2043658 ],\n",
" [0.88185538],\n",
" [0.05333432],\n",
" [0.58287661],\n",
" [0.05824644],\n",
" [0.73892886]]\n",
"Layer(2)[a]\t–\t(10, 1)\n",
" [0.80127905],\n",
" [0.67338074],\n",
" [0.82510609],\n",
" [0.74576239],\n",
" [0.86832199],\n",
" [0.66967464],\n",
" [0.705796 ],\n",
" [0.73771435],\n",
" [0.83712445]]\n",
"Layer(3)[w]\t–\t(2, 10)\n",
"[[0.18792588, 0.26570465, 0.19413644, 0.8948133 , 0.62987714, 0.55032463,\n",
" 0.18313532, 0.40327536, 0.6064623 , 0.47812655],\n",
" [0.14310153, 0.09781943, 0.73149781, 0.30451734, 0.78915406, 0.4532956 ,\n",
" 0.13800774, 0.26227233, 0.75927833, 0.77630933]]\n",
"Layer(3)[X]\t–\t(10, 1)\n",
" [0.80127905],\n",
" [0.67338074],\n",
" [0.82510609],\n",
" [0.74576239],\n",
" [0.86832199],\n",
" [0.66967464],\n",
" [0.705796 ],\n",
" [0.73771435],\n",
" [0.83712445]]\n",
"Layer(3)[b]\t–\t(2, 1)\n",
" [0.11697923]]\n",
"Layer(3)[a]\t–\t(2, 1)\n",
" [0.97151077]]\n",
"Output\t\t–\t(2, 1)\n",
" [0.97151077]]\n",
"name": "stdout"
"cell_type": "code",
"source": [
"import random # from python standard library\n",
"import numpy as np # a third party library\n",
"class Network(object):\n",
" \n",
" def __init__(self, sizes):\n",
" \"\"\"The list ``sizes`` contains the number of neurons in the\n",
" respective layers of the network. For example, if the list\n",
" was [2, 3, 1] then it would be a three-layer network, with the\n",
" first layer containing 2 neurons, the second layer 3 neurons,\n",
" and the third layer 1 neuron. The biases and weights for the\n",
" network are initialized randomly, using a Gaussian\n",
" distribution with mean 0, and variance 1. Note that the first\n",
" layer is assumed to be an input layer, and by convention we\n",
" won't set any biases for those neurons, since biases are only\n",
" ever used in computing the outputs from later layers.\"\"\"\n",
" self.num_layers = len(sizes)\n",
" self.sizes = sizes\n",
" self.biases = [np.random.rand(y, 1) for y in sizes[1:]]\n",
" self.weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]\n",
" \n",
" def feedforward(self, a):\n",
" \"\"\"Return the output of the network if `a` is input.\"\"\"\n",
" for b, w in zip(self.biases, self.weights):\n",
" # a = (w • a) + b \n",
" a = sigmoid(, a) + b)\n",
" return a\n",
" \n",
" def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):\n",
" \"\"\"Train the neural network using mini-batch stochastic\n",
" gradient descent. The \"training_data\" is a list of tuples\n",
" \"(x, y)\" representing the training inputs and the desired\n",
" outputs. The other non-optional parameters are\n",
" self-explanatory. If \"test_data\" is provided then the\n",
" network will be evaluated against the test data after each\n",
" epoch, and partial progress printed out. This is useful for\n",
" tracking progress, but slows things down substantially.\"\"\"\n",
" if test_data: n_test = len(test_data)\n",
" n = len(training_data)\n",
" for j in xrange(epochs):\n",
" random.shuffle(training_data)\n",
" mini_batches = [\n",
" training_data[k: k+mini_batch_size]\n",
" for k in xrange(0, n, mini_batch_size)\n",
" ]\n",
" for mini_batch in mini_batches:\n",
" self.update_mini_batch(mini_batch, eta)\n",
" if test_data:\n",
" print \"Epoch {0}: {1} / {2}\".format(\n",
" j, self.evaluate(test_data), n_test\n",
" )\n",
" else:\n",
" print \"Epochs {0} complete\".format(j)\n",
" \n",
" def update_mini_batch(self, mini_batch, eta):\n",
" \"\"\"Update the network's weights and biases by applying\n",
" gradient descent using backpropagation to a single mini batch.\n",
" The \"mini_batch\" is a list of tuples \"(x, y)\", and \"eta\"\n",
" is the learning rate.\"\"\"\n",
" nabla_b = [np.zeros(b.shape) for b in self.biases]\n",
" nabla_w = [np.zeros(w.shape) for w in self.weights]\n",
" for x, y in mini_batch:\n",
" delta_nabla_b, delta_nabla_w = self.backprop(x, y)\n",
" nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]\n",
" nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]\n",
" self.weights = [w-(eta/len(mini_batch))*nw\n",
" for w, nw in zip(self.weights, nabla_w)\n",
" ]\n",
" self.biases = [b-(eta/len(mini_batch))*nb\n",
" for b, nb in zip(self.biases, nabla_b)\n",
" ]\n",
" def backprop(self, x, y):\n",
" \"\"\"Return a tuple ``(nabla_b, nabla_w)`` representing the\n",
" gradient for the cost function C_x. ``nabla_b`` and\n",
" ``nabla_w`` are layer-by-layer lists of numpy arrays, similar\n",
" to ``self.biases`` and ``self.weights``.\"\"\"\n",
" nabla_b = [np.zeros(b.shape) for b in self.biases]\n",
" nabla_w = [np.zeros(w.shape) for w in self.weights]\n",
" # feedforward\n",
" activation = x\n",
" activations = [x] # list to store all the activations, layer by layer\n",
" zs = [] # list to store all the z vectors, layer by layer\n",
" for b, w in zip(self.biases, self.weights):\n",
" z =, activation)+b\n",
" zs.append(z)\n",
" activation = sigmoid(z)\n",
" activations.append(activation)\n",
" # backward pass\n",
" delta = self.cost_derivative(activations[-1], y) * \\\n",
" sigmoid_prime(zs[-1])\n",
" nabla_b[-1] = delta\n",
" nabla_w[-1] =, activations[-2].transpose())\n",
" # Note that the variable l in the loop below is used a little\n",
" # differently to the notation in Chapter 2 of the book. Here,\n",
" # l = 1 means the last layer of neurons, l = 2 is the\n",
" # second-last layer, and so on. It's a renumbering of the\n",
" # scheme in the book, used here to take advantage of the fact\n",
" # that Python can use negative indices in lists.\n",
" for l in xrange(2, self.num_layers):\n",
" z = zs[-l]\n",
" sp = sigmoid_prime(z)\n",
" delta =[-l+1].transpose(), delta) * sp\n",
" nabla_b[-l] = delta\n",
" nabla_w[-l] =, activations[-l-1].transpose())\n",
" return (nabla_b, nabla_w)\n",
" def evaluate(self, test_data):\n",
" \"\"\"Return the number of test inputs for which the neural\n",
" network outputs the correct result. Note that the neural\n",
" network's output is assumed to be the index of whichever\n",
" neuron in the final layer has the highest activation.\"\"\"\n",
" test_results = [(np.argmax(self.feedforward(x)), y)\n",
" for (x, y) in test_data]\n",
" return sum(int(x == y) for (x, y) in test_results)\n",
" def cost_derivative(self, output_activations, y):\n",
" \"\"\"Return the vector of partial derivatives \\partial C_x /\n",
" \\partial a for the output activations.\"\"\"\n",
" return (output_activations-y)\n"
"cell_type": "code",
"source": [
"def sigmoid(z):\n",
" return 1.0/(1.0+np.exp(-z))\n",
"def sigmoid_prime(z):\n",
" \"\"\"Derivative of the sigmoid function.\"\"\"\n",
" return sigmoid(z)*(1-sigmoid(z))\n"
"cell_type": "code",
"source": [
"# MNIST Loader\n",
"A library to load the MNIST image data. For details of the data\n",
"structures that are returned, see the doc strings for ``load_data``\n",
"and ``load_data_wrapper``. In practice, ``load_data_wrapper`` is the\n",
"function usually called by our neural network code.\n",
"#### Libraries\n",
"# Standard library\n",
"import cPickle\n",
"import gzip\n",
"# Third-party libraries\n",
"import numpy as np\n",
"def load_data():\n",
" \"\"\"Return the MNIST data as a tuple containing the training data,\n",
" the validation data, and the test data.\n",
" The ``training_data`` is returned as a tuple with two entries.\n",
" The first entry contains the actual training images. This is a\n",
" numpy ndarray with 50,000 entries. Each entry is, in turn, a\n",
" numpy ndarray with 784 values, representing the 28 * 28 = 784\n",
" pixels in a single MNIST image.\n",
" The second entry in the ``training_data`` tuple is a numpy ndarray\n",
" containing 50,000 entries. Those entries are just the digit\n",
" values (0...9) for the corresponding images contained in the first\n",
" entry of the tuple.\n",
" The ``validation_data`` and ``test_data`` are similar, except\n",
" each contains only 10,000 images.\n",
" This is a nice data format, but for use in neural networks it's\n",
" helpful to modify the format of the ``training_data`` a little.\n",
" That's done in the wrapper function ``load_data_wrapper()``, see\n",
" below.\n",
" \"\"\"\n",
" f ='/content/neural-networks-and-deep-learning/data/mnist.pkl.gz', 'rb')\n",
" training_data, validation_data, test_data = cPickle.load(f)\n",
" f.close()\n",
" return (training_data, validation_data, test_data)\n",
"def load_data_wrapper():\n",
" \"\"\"Return a tuple containing ``(training_data, validation_data,\n",
" test_data)``. Based on ``load_data``, but the format is more\n",
" convenient for use in our implementation of neural networks.\n",
" In particular, ``training_data`` is a list containing 50,000\n",
" 2-tuples ``(x, y)``. ``x`` is a 784-dimensional numpy.ndarray\n",
" containing the input image. ``y`` is a 10-dimensional\n",
" numpy.ndarray representing the unit vector corresponding to the\n",
" correct digit for ``x``.\n",
" ``validation_data`` and ``test_data`` are lists containing 10,000\n",
" 2-tuples ``(x, y)``. In each case, ``x`` is a 784-dimensional\n",
" numpy.ndarry containing the input image, and ``y`` is the\n",
" corresponding classification, i.e., the digit values (integers)\n",
" corresponding to ``x``.\n",
" Obviously, this means we're using slightly different formats for\n",
" the training data and the validation / test data. These formats\n",
" turn out to be the most convenient for use in our neural network\n",
" code.\"\"\"\n",
" tr_d, va_d, te_d = load_data()\n",
" training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]\n",
" training_results = [vectorized_result(y) for y in tr_d[1]]\n",
" training_data = zip(training_inputs, training_results)\n",
" validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]\n",
" validation_data = zip(validation_inputs, va_d[1])\n",
" test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]\n",
" test_data = zip(test_inputs, te_d[1])\n",
" return (training_data, validation_data, test_data)\n",
"def vectorized_result(j):\n",
" \"\"\"Return a 10-dimensional unit vector with a 1.0 in the jth\n",
" position and zeroes elsewhere. This is used to convert a digit\n",
" (0...9) into a corresponding desired output from the neural\n",
" network.\"\"\"\n",
" e = np.zeros((10, 1))\n",
" e[j] = 1.0\n",
" return e"
"cell_type": "code",
"source": [
"training_data, validation_data, test_data = load_data_wrapper()"
"cell_type": "code",
"source": [
"# initialize the network\n",
"net = Network([784, 30, 10])"
"cell_type": "code",
"source": [
"net.SGD(training_data, 30, 10, 3.0, test_data=test_data)"
"Epoch 0: 9026 / 10000\n",
"Epoch 1: 9207 / 10000\n",
"Epoch 2: 9326 / 10000\n",
"Epoch 3: 9335 / 10000\n",
"Epoch 4: 9355 / 10000\n",
"Epoch 5: 9413 / 10000\n",
"Epoch 6: 9404 / 10000\n",
"Epoch 7: 9441 / 10000\n",
"Epoch 8: 9452 / 10000\n",
"Epoch 9: 9447 / 10000\n",
"Epoch 10: 9473 / 10000\n",
"Epoch 11: 9463 / 10000\n",
"Epoch 12: 9447 / 10000\n",
"Epoch 13: 9478 / 10000\n",
"Epoch 14: 9466 / 10000\n",
"Epoch 15: 9456 / 10000\n",
"Epoch 16: 9477 / 10000\n",
"Epoch 17: 9472 / 10000\n",
"Epoch 18: 9461 / 10000\n",
"Epoch 19: 9497 / 10000\n",
"Epoch 20: 9479 / 10000\n",
"Epoch 21: 9490 / 10000\n",
"Epoch 22: 9489 / 10000\n",
"Epoch 23: 9467 / 10000\n",
"Epoch 24: 9456 / 10000\n",
"Epoch 25: 9473 / 10000\n",
"Epoch 26: 9474 / 10000\n",
"Epoch 27: 9483 / 10000\n",
"Epoch 28: 9502 / 10000\n",
"Epoch 29: 9481 / 10000\n"
"name": "stdout"
"cell_type": "markdown",
"source": [
"### Function to plot the sample as an image"
"cell_type": "code",
"source": [
"from matplotlib import pyplot as plt\n",
"def gen_image(arr):\n",
" two_d = (np.reshape(arr, (28, 28)) * 255).astype(np.uint8)\n",
" plt.imshow(two_d, interpolation='nearest')\n",
" return plt"
"cell_type": "markdown",
"source": [
"### Let's take a random sample from the test set "
"cell_type": "code",
"source": [
"import random\n",
"rand_idx = random.randint(0, len(test_data)-1) # a random index\n",
"X, y = test_data[rand_idx]\n"
"cell_type": "markdown",
"source": [
"### and display the image before we run the sample through network"
"cell_type": "code",
"source": [
"text/plain": [
"<Figure size 576x396 with 1 Axes>"
"cell_type": "markdown",
"source": [
"### What about the prediction?"
"cell_type": "code",
"source": [
"pred = net.feedforward(X)\n",
"print \"Rounded network output –\"\n",
"print [round(x, 4) for x in pred]\n",
"print \"Prediction – {0}\\nActual(label) – {1}\".format(np.argmax(pred), y)"
"Rounded network output –\n",
"[0.0, 0.0, 0.0, 0.0, 0.0, 0.9999, 0.0, 0.0, 0.0, 0.0]\n",
"Prediction – 5\n",
"Actual(label) – 5\n"
"name": "stdout"
