{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Assignment6\n",
"## CS-5891-01 Special Topics Deep Learning\n",
"## Ronald Picard\n",
"\n",
"In this notebook we will walk through the design, training, and testing of neural networks with multiple hidden layers using minibatch stochastic gradient descent with momentum. These neural networks will be used for binary classification with a sigmoid output unit, the same output model used in logistic regression.\n",
"\n",
"The binary classification will be performed on images of handwritten numerical digits. More specifically, the target is the last numerical digit of my student ID, which happens to be 9. Therefore, the goal of our neural networks will be to output a value of 1 when the handwritten digit in the input image is a 9, and 0 in all other cases.\n",
"\n",
"The data set we will be using is the MNIST data set, which is very popular among the machine learning community. The training set contains 60,000 images, and each image contains a handwritten numerical digit. Each image has been provided with a truth label from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} that corresponds to the handwritten digit within the image.\n",
"\n",
"For our case, we only care whether the image is a 9. Therefore, we will need to re-label the data so that all truth labels with the value 9 are given the value 1, and all other truth labels are given the value 0.\n",
"\n",
"To start, we need to import the modules we will use."
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import numpy as np\n",
"import struct\n",
"from mpl_toolkits.mplot3d import Axes3D\n",
"import matplotlib.pyplot as pyplot\n",
"import csv\n",
"import time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we must set our path string to the directory containing the MNIST data files. (Please note that you must change this string to point to the directory with the data files on your machine.)\n",
"\n",
"Second, we must set the file names to the names of the MNIST data files. (Please note that you may NOT need to change these. Only change them if your MNIST data files are named differently.)"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [],
"source": [
"## path\n",
"path = 'C:/Users/computer/OneDrive - Vanderbilt/Vanderbilt_Spring_2019/CS_5891_01_SpecialTopicsDeepLearning/Assignment2/'\n",
"\n",
"#Train data\n",
"fname_train_images = os.path.join(path, 'train-images.idx3-ubyte') # the training set image file path\n",
"fname_train_labels = os.path.join(path, 'train-labels.idx1-ubyte') # the training set label file path"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we retrieve the data from the data files as follows. This imports the images into a feature tensor (a 3-D array) in which each index along the first axis is a feature matrix corresponding to one image. The label data comes in the form of a vector in which each index corresponds to the index of the matching feature matrix (image) in the feature tensor."
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The training set contains 60000 images\n",
"The shape of the image is (28, 28)\n"
]
}
],
"source": [
"# open the label file, read its 8-byte header, and load the labels into \"labels\"\n",
"with open(fname_train_labels, 'rb') as flbl:\n",
"    magic, num = struct.unpack(\">II\", flbl.read(8))\n",
"    labels = np.fromfile(flbl, dtype=np.uint8)\n",
"\n",
"# open the image file, read its 16-byte header, and load the pixels into \"images\"\n",
"with open(fname_train_images, 'rb') as fimg:\n",
"    magic, num, rows, cols = struct.unpack(\">IIII\", fimg.read(16))\n",
"    images = np.fromfile(fimg, dtype=np.uint8).reshape(len(labels), rows, cols)\n",
"\n",
"print('The training set contains', len(images), 'images') # print how many images the training set contains\n",
"print('The shape of the image is', images[0].shape) # print the shape of one image"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we need to perform two steps: feature scaling and feature normalization. In this notebook, feature scaling means converting the 28 X 28 image matrices into 784 X 1 feature vectors. In essence, we flatten each image into a vector so that it can be fed to the input layer of the network. Feature normalization rescales the pixel data to the range 0 <= x <= 1. Each pixel comes on a scale of 0 <= x <= 255. Since 255 is the maximum for every pixel, we divide each pixel by that number (elementwise) in order to normalize each pixel to between 0 and 1 (inclusive).\n",
"\n",
"One additional item we need to take care of is relabeling our label (truth) data so that we have a binary classification in which all 9s are converted to 1s and all other labels are converted to 0s."
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(784, 60000)\n",
"(784, 60000)\n",
"60000\n"
]
}
],
"source": [
"# feature scaling (flatten each 28 X 28 image into a 784-element vector)\n",
"matrix_side_length = len(images[0])\n",
"vector_size = matrix_side_length*matrix_side_length\n",
"\n",
"scaled_images_feature_matrix = []\n",
"for image in images:\n",
"    reshaped_image = np.array(image).reshape((vector_size))\n",
"    scaled_images_feature_matrix.append(reshaped_image)\n",
"\n",
"# convert to numpy array (transposed so that each column is one image)\n",
"scaled_images_feature_matrix = np.transpose(np.array(scaled_images_feature_matrix))\n",
"print(scaled_images_feature_matrix.shape) # scaled_images_feature_matrix is a matrix of 784 X 60000\n",
"#print(scaled_images_feature_matrix[0].shape)\n",
"\n",
"# feature normalization\n",
"normilization_factor = 1/255\n",
"normalized_scaled_images_feature_matrix = np.multiply(normilization_factor, scaled_images_feature_matrix)\n",
"print(normalized_scaled_images_feature_matrix.shape)\n",
"#print(normalized_scaled_images_feature_matrix[0])\n",
"\n",
"# re-label for binary classification\n",
"value_for_1 = 9\n",
"binary_labels = []\n",
"for label in labels:\n",
"    if(label == value_for_1):\n",
"        binary_labels.append(1)\n",
"    else:\n",
"        binary_labels.append(0)\n",
"\n",
"# convert to numpy array\n",
"binary_labels = np.array(binary_labels)\n",
"print(len(binary_labels)) # binary_labels is a vector of length 60000\n",
"#print(binary_labels[0])\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to test the efficacy of our neural networks, we need to split our data into two data sets: a larger one and a smaller one. The larger set will be the training data that we will use to train our neural networks. The smaller set will be the testing data that we will use to test the accuracy of our neural nets. The MNIST training set contains 60,000 images. Therefore, we will use 50,000 images for our training data set, and 10,000 images for our testing data set.\n",
"\n",
"It is common practice to use a smaller subset of the total data set to debug (ensure the code works) and tune hyper-parameters before using the entire, time-consuming data set. This notebook refers to that smaller subset as a validation set. Therefore, we will first use a validation data set of 600 images. 500 of these images will be used as our training data set, and the other 100 of these images will be used as our test data set.\n",
"\n",
"Thus, we will begin by sifting out a validation set from our total data set."
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(784, 500)\n",
"(500,)\n",
"(784, 100)\n",
"(100,)\n"
]
}
],
"source": [
"# create a data set\n",
"size = vector_size\n",
"\n",
"number_of_testing_images = 100\n",
"number_of_training_images = 500\n",
"number_of_validation_images = number_of_testing_images + number_of_training_images\n",
"\n",
"training_images = []\n",
"training_labels = []\n",
"testing_images = []\n",
"testing_labels = []\n",
"\n",
"# the first 500 images go to the training set, the next 100 to the testing set\n",
"factor = 0\n",
"for index in range(0, number_of_validation_images):\n",
"    if(index <= number_of_training_images - 1):\n",
"        training_images.append(normalized_scaled_images_feature_matrix[:, index + factor])\n",
"        training_labels.append(binary_labels[index + factor])\n",
"    else:\n",
"        testing_images.append(normalized_scaled_images_feature_matrix[:, index + factor])\n",
"        testing_labels.append(binary_labels[index + factor])\n",
"\n",
"# convert to numpy array\n",
"training_images = np.transpose(np.array(training_images))\n",
"training_labels = np.array(training_labels)\n",
"testing_images = np.transpose(np.array(testing_images))\n",
"testing_labels = np.array(testing_labels)\n",
"\n",
"# logger\n",
"print(training_images.shape) # training_images is a matrix of 784 X 500\n",
"print(training_labels.shape) # training_labels is a vector of length 500\n",
"print(testing_images.shape) # testing_images is a matrix of 784 X 100\n",
"print(testing_labels.shape) # testing_labels is a vector of length 100"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we move on to the training of our neural network with multiple hidden layers. \n",
"\n",
"Part 1 - Feed Forward:\n",
"\n",
"For these neural networks we will use multiple hidden layers with between 5 and 20 units (neurons) per layer. The first layer will take as input a matrix (784 X # of images) of vectorized images, each 784 X 1, and will output a matrix (# of units X # of images). This matrix will be input into the next hidden layer, which outputs another matrix (# of units X # of images). The output layer will take as input the output matrix of the last hidden layer and will output a row vector of probabilities, which we will convert into binary classifications of 0 or 1. (If P(x) >= 0.5 then we will convert it to a 1, otherwise we will convert it to 0.) \n",
"\n",
"The model for the units of the hidden layers will be a vectorized linear model Z^[l] = W^[l] * A^[l-1] + B^[l], where W^[l] is a matrix of parameter weights of size (# of units in layer l) X (# of units in layer l-1, or 784 for the first hidden layer), A^[l-1] is either the input matrix of vectorized images (784 X # of images) or the output of the previous layer (# of units X # of images), and B^[l] is a column vector of biases (# of units X 1) that is broadcast across the images. (Note: for the single-unit output layer, b will be a scalar that is applied in a broadcasting manner to save on memory.) The output of this model Z^[l] will be a matrix (# of units X # of images). Z^[l] will be subject to an activation function, which for this assignment will be relu (note: we will test tanh once for comparison). \n",
"\n",
"Hidden Layer Activation Function: the relu activation function is A^[l] = relu(Z^[l]) = max(Z^[l], 0).\n",
"\n",
"The model of the output layer (layer n) will be a vectorized linear model Z^[n] = W^[n] * A^[n-1] + b^[n] with a single unit. This linear model will be subjected to a sigmoid activation function, A^[n] = sigma(Z^[n]).\n",
"\n",
"The resultant row vector will then be used to calculate the cost function values in an elementwise manner. The cost function for this binary classification will be L(Y_Predicted, Y_Label) = -Y_Label * Log(A^[n]) - (1-Y_Label) * Log(1-A^[n]), where Y_Label is the true label and Y_Predicted = A^[n] is the probability value predicted by the neural network (the output activation value). The resultant cost row vector will be added up and divided by the number of elements in order to calculate the average cost.\n",
"\n",
"Part 2 - Back Propagation:\n",
"\n",
"The back propagation technique that we will use for training the neural network will be gradient descent. This involves utilizing the gradient of the cost function to update the model parameters in our layers. In order to calculate the gradient we will utilize the chain rule. The goal of back propagation is to adjust the parameter weights and biases of our model to accurately perform binary classification. In general, the chain rule can be used to find the gradient of the cost function (vectorized rates of change) with respect to the model parameters. The following is the chain that we will utilize. \n",
"\n",
"\n",
"Generalized Chain Rule for N layers: \n",
"\n",
"dL(A^[n], Y)/dW^[l] = dL(A^[n], Y)/dA^[n] * dA^[n]/dZ^[n] * dZ^[n]/dA^[n-1] * dA^[n-1]/dZ^[n-1] * ... * dA^[l]/dZ^[l] * dZ^[l]/dW^[l];\n",
"\n",
"dL(A^[n], Y)/dB^[l] = dL(A^[n], Y)/dA^[n] * dA^[n]/dZ^[n] * dZ^[n]/dA^[n-1] * dA^[n-1]/dZ^[n-1] * ... * dA^[l]/dZ^[l] * dZ^[l]/dB^[l];\n",
"\n",
"\n",
"\n",
"Output Layer - Back Propagation:\n",
"\n",
"The partial derivative of the cost function with respect to the output layer sigmoid activation function is found by the following:\n",
"\n",
"dL(A^[n], Y)/dA^[n] = -Y/A^[n] + (1-Y)/(1-A^[n]).\n",
"\n",
"\n",
"Due to the chain rule, the derivative of the cost function with respect to the linear model Z^[n] is found by the following:\n",
"\n",
"dL(A^[n], Y)/dZ^[n] = dL(A^[n], Y)/dA^[n] * dA^[n]/dZ^[n].\n",
"\n",
"The derivative of the sigmoid activation function, dA^[n]/dZ^[n], is found by the following:\n",
"\n",
"dA^[n]/dZ^[n] = sigma(Z^[n]) * (1-sigma(Z^[n])) = A^[n] * (1-A^[n])\n",
"\n",
"Therefore, the derivative of the cost function with respect to the output of the linear model is found by the following:\n",
"\n",
"dL(A^[n], Y)/dA^[n] * dA^[n]/dZ^[n] = (-Y/A^[n] + (1-Y)/(1-A^[n])) * (sigma(Z^[n]) * (1-sigma(Z^[n]))) = A^[n]-Y. (For convenience we will say dZ^[n] = A^[n]-Y.)\n",
"\n",
"Now we can extend the chain rule to all the parameters of the linear model of our output layer.\n",
"\n",
"dL(A^[n], Y)/dW^[n] = dZ^[n] * dZ^[n]/dW^[n] = dZ^[n] * A^[n-1]^T (we will change our notation to dW^[n] = dZ^[n] * A^[n-1]^T for convenience)\n",
"\n",
"dL(A^[n], Y)/dB^[n] = dZ^[n] * dZ^[n]/dB^[n] = dZ^[n] (we will change our notation to dB^[n] = dZ^[n] for convenience)\n",
"\n",
"\n",
"\n",
"Hidden Layers - Back Propagation:\n",
"\n",
"dL(A^[n], Y)/dZ^[l] = dL(A^[n], Y)/dA^[n] * dA^[n]/dZ^[n] * dZ^[n]/dA^[n-1] * ... * dZ^[l+1]/dA^[l] * dA^[l]/dZ^[l]\n",
"\n",
"dL(A^[n], Y)/dZ^[l] = (W^[l+1]^T * dZ^[l+1]) * (element-wise) dA^[l]/dZ^[l]. The product is element-wise because the activation function is applied to each element of Z^[l] independently (we shall rename this dZ^[l] = (W^[l+1]^T * dZ^[l+1]) * (element-wise) dA^[l]/dZ^[l] for convenience) \n",
"\n",
"dA^[l]/dZ^[l] depends on the activation function we are using in the hidden layer (in this case relu): the derivative of the relu activation function is dA^[l]/dZ^[l] = if Z^[l] > 0 then 1 else 0.\n",
"\n",
"\n",
"dL(A^[n], Y)/dW^[l] = dZ^[l] * dZ^[l]/dW^[l] = dZ^[l] * A^[l-1]^T (we will change our notation to dW^[l] = dZ^[l] * A^[l-1]^T for convenience)\n",
"\n",
"dL(A^[n], Y)/dB^[l] = dZ^[l] * dZ^[l]/dB^[l] = dZ^[l] (we will change our notation to dB^[l] = dZ^[l] for convenience)\n",
"\n",
"\n",
"Find averages over the images:\n",
"\n",
"m = number of images\n",
"\n",
"dW^[l] = 1/m * (dZ^[l] * A^[l-1]^T)\n",
"\n",
"dB^[l] = 1/m * (sum of dZ^[l] over the images)\n",
"\n",
"\n",
"Finally, we will update the weights and biases of the layers.\n",
"\n",
"\n",
"W^[l]:= W^[l] - alpha * dW^[l]\n",
"\n",
"B^[l]:= B^[l] - alpha * dB^[l]\n"
]
},
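{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked check of the output-layer result above, here is the simplification written out step by step. This is only a sketch in LaTeX notation; it assumes $A^{[n]} = \\sigma(Z^{[n]})$, so that $dA^{[n]}/dZ^{[n]} = A^{[n]}(1 - A^{[n]})$.\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\frac{\\partial L}{\\partial Z^{[n]}} &= \\left(-\\frac{Y}{A^{[n]}} + \\frac{1-Y}{1-A^{[n]}}\\right) A^{[n]}\\left(1-A^{[n]}\\right) \\\\\n",
"&= -Y\\left(1-A^{[n]}\\right) + (1-Y)\\,A^{[n]} \\\\\n",
"&= A^{[n]} - Y\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"For example, a prediction of $A^{[n]} = 0.8$ for an image whose label is $Y = 1$ gives $dZ^{[n]} = -0.2$, so a gradient descent step moves $Z^{[n]}$ upward, toward a prediction closer to 1."
]
},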
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first thing we have to do is initialize our weights and biases. There are multiple ways to do this. Typically we draw the values from either a uniform distribution over a small interval or a normal distribution with a reasonable mean and standard deviation. There is some flexibility in the initialization of the weights, but in general they need to be small (but not too small) and varied. The weights need to differ from one another so that the units in a layer receive different gradients; otherwise the units would all learn the same thing. Additionally, we do not want to saturate our output activation function, where the gradients are close to 0 (vanishing gradients). For this assignment we will stick with a uniform random distribution between -0.1 and 0.1. We will also set a random seed each time to force our random values to be the same on every run (or similar if there are more layers)."
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature Size: 784\n",
"Weights Shape: (20, 784)\n",
"Bias Shape: (20, 1)\n",
"Velocity Weights Shape: (20, 784)\n",
"Velocity Bias Shape: (20, 1)\n"
]
}
],
"source": [
"# initialize weights & bias\n",
"np.random.seed(10)\n",
"print('Feature Size: ' + str(size))\n",
"\n",
"lower_bound = -.1\n",
"upper_bound = .1\n",
"\n",
"#mean = 0.015\n",
"#std = 0.005\n",
"\n",
"# hyper-parameters: hidden layers\n",
"hidden_layers = 2\n",
"units_array = [20, 10]\n",
"Weights = []\n",
"Bias = []\n",
"V_dW = []\n",
"V_dB = []\n",
"for i in range(0, hidden_layers):\n",
"    if(i == 0):\n",
"        # first hidden layer: its inputs are the 784 pixel features\n",
"        _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], size]))\n",
"        _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
"        _V_dW = np.float64(np.zeros([units_array[i], size]))\n",
"        _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
"        Weights.append(_W)\n",
"        Bias.append(_B)\n",
"        V_dW.append(_V_dW)\n",
"        V_dB.append(_V_dB)\n",
"    else:\n",
"        # later hidden layers: their inputs are the outputs of the previous layer\n",
"        _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], units_array[i-1]]))\n",
"        _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
"        _V_dW = np.float64(np.zeros([units_array[i], units_array[i-1]]))\n",
"        _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
"        Weights.append(_W)\n",
"        Bias.append(_B)\n",
"        V_dW.append(_V_dW)\n",
"        V_dB.append(_V_dB)\n",
"\n",
"# output layer (a single unit fed by the last hidden layer)\n",
"_W = np.float64(np.random.uniform(lower_bound, upper_bound, [1, units_array[i]]))\n",
"_b = np.float64(np.random.uniform(lower_bound, upper_bound)) # b will be added in a broadcasting manner\n",
"_V_dW = np.float64(np.zeros([1, units_array[i]]))\n",
"_V_dB = np.float64(np.zeros(1))\n",
"Weights.append(_W)\n",
"Bias.append(_b)\n",
"V_dW.append(_V_dW)\n",
"V_dB.append(_V_dB)\n",
"\n",
"# the per-layer arrays have different shapes, so store them in object arrays\n",
"# (recent NumPy versions raise an error for ragged input without dtype=object)\n",
"Weights = np.array(Weights, dtype=object)\n",
"Bias = np.array(Bias, dtype=object)\n",
"V_dW = np.array(V_dW, dtype=object)\n",
"V_dB = np.array(V_dB, dtype=object)\n",
"\n",
"# replace any hidden-layer weight that is exactly 0 with a random value\n",
"for index in range(0, len(Weights) - 1):\n",
"    Weights[index] = np.where(Weights[index] != 0, Weights[index], np.random.uniform(lower_bound, upper_bound))\n",
"\n",
"#print(train_X.shape)\n",
"#print(np.ravel(train_Y).shape)\n",
"\n",
"print('Weights Shape: ' + str(Weights[0].shape)) # matrix with a size of # of units X 784\n",
"print('Bias Shape: ' + str(Bias[0].shape)) # column vector with a size of # of units X 1\n",
"print('Velocity Weights Shape: ' + str(V_dW[0].shape)) # matrix with a size of # of units X 784\n",
"print('Velocity Bias Shape: ' + str(V_dB[0].shape)) # column vector with a size of # of units X 1"
]
},
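{
"cell_type": "markdown",
"metadata": {},
"source": [
"The velocity arrays V_dW and V_dB initialized to zero above hold the state for gradient descent with momentum. As a hedged sketch, one common form of the momentum update is shown below. The symbol $\\beta$ (the momentum coefficient) is an assumption introduced here for illustration, and the training code in this notebook may use a different variant, for example one without the $(1-\\beta)$ factor.\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"V_{dW}^{[l]} &:= \\beta\\, V_{dW}^{[l]} + (1-\\beta)\\, dW^{[l]}, & W^{[l]} &:= W^{[l]} - \\alpha\\, V_{dW}^{[l]} \\\\\n",
"V_{dB}^{[l]} &:= \\beta\\, V_{dB}^{[l]} + (1-\\beta)\\, dB^{[l]}, & B^{[l]} &:= B^{[l]} - \\alpha\\, V_{dB}^{[l]}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"With $\\beta = 0$ this form reduces to the plain update W^[l] := W^[l] - alpha * dW^[l]. Because the velocities start at zero, the first few steps in this form are smaller than a plain gradient step would be."
]
},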
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we implement our minibatch stochastic gradient descent algorithm. The only difference between minibatch stochastic gradient descent and full-batch gradient descent is that during every epoch (pass over the training data) we split our training data into minibatches based on the specified minibatch size. If the data does not split evenly, then the last minibatch will be smaller, so that we utilize all of the training data during every epoch. Then during every epoch we train once on each minibatch. Since we are training on minibatches, our path to the minimum of the cost function will not be as direct. Therefore, our cost will not decrease, and our test accuracy will not improve, on every epoch; however, the general trend of the cost will be decreasing. Minibatch gradient descent can help prevent us from getting stuck in local extrema, as well as increase the speed at which the code runs.\n",
"\n",
"We will also collect data on the accuracy of our networks as a function of training iterations. To do this we will need to find the number of inaccurate binary classifications (false positives and false negatives). This will be accomplished using our test data set. We will send our test data set through the network and compare the results with the true labels of the test data set."
]
},
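{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the minibatch split concrete, the number of minibatches per epoch can be written as follows. This is a small sketch; the symbol $b$ for the minibatch size is introduced here for illustration, and $m$ is the number of training images.\n",
"\n",
"$$\n",
"\\text{number of minibatches} = \\left\\lceil \\frac{m}{b} \\right\\rceil\n",
"$$\n",
"\n",
"For example, with the $m = 500$ training images used here, a minibatch size of $b = 50$ would give 10 minibatches of equal size, while $b = 64$ would give 8 minibatches: seven with 64 images and a smaller last one with 52."
]
},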
{
"cell_type": "code",
"execution_count": 92,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Main Loop Epoch: 100\n",
"Number Of Minibatches: 10\n",
"Cost: 0.0024360064743517867\n",
"Main Loop Epoch: 200\n",
"Number Of Minibatches: 10\n",
"Cost: 0.0011374617457707798\n",
"Main Loop Epoch: 300\n",
"Number Of Minibatches: 10\n",
"Cost: 0.0002504582126068609\n",
"Main Loop Epoch: 400\n",
"Number Of Minibatches: 10\n",
"Cost: 0.0002903632841629266\n",
"\n",
"Results:\n",
"\n",
"\n",
"Run Time: 4.991669654846191 seconds\n",
"Cost: 0.00019616014290038203\n",
"Accuracy: 95.0 %\n",
"\n",
"\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZkAAAEWCAYAAAC0Q+rDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzt3Xm8HFWZ//HPNxtJWASyyJaQUaKCyBKuCCKI24iICwgqiiwiEUUlv1FHUUdxFBFGRdERBwXHBXcQUUeURZhhFJhEtkDAoIJsIUESMCSELM/vj3M6t9Lpvl333qp0cu/3/Xr1q6trPdVdXU8959SiiMDMzKwOI7pdADMzG7ocZMzMrDYOMmZmVhsHGTMzq42DjJmZ1cZBxszMauMgY2YbjJL/k7RbRfNbIOlFVcyrxbw3k7RU0g5thp8s6co+pr9e0jF1lK3b8u94k6TpncYdVJCRdI+k5fmHeFjSNyVtMZh5Vi2X8eUlxvsHSWskfXVDlKtbJG0v6QJJD0n6u6Q7JX1S0ubdLlsnkt4o6XeSlkm6psXwvSTNycPnSNqrMEySzpL0t/w6W5JazOOteXtemrftNYXPSwdR9udIWtVhnM9KWpl/l8Zv8yVJk/uxnEHt2CRNk3SppEckPSbpVklvqXDZRwIPRMQdeZrGOi+VtETSdZJ6Blr+siSNlRSSdmrq/1lJ3wCIiBURsUVEPFh3efqjWMZuiXSB5TnA6Z3GrSKTeU1EbAHMAJ4PfKy/M5A0qoJyDNaxwGLgzZI225AL3lDrL2lb4PfAOGD/iNgSeAWwNfDMAcxvQ/9ujwJfBD7boixjgJ8B3wW2Ab4F/Cz3B5gJvB7YE9gDOAx4Z/N8IuKivGPZAngV8GDjc+5Xt2/l32UCcBQwDZgtadIGWDbA94G7gCnAROAE4JEK538y8J2mft/K3+0k4HrghxUuzypW+N9fArxa0oQ+J4iIAb+Ae4CXFz7/G/CL3P004ALgIeAB4NPAyDzseOB/SZHwUeDTuf9JwDzg78AdwIzcfwfgYmAR8BfgfYVlng78CPh2nu52oCcP+w6wBlgOLAX+uY91+RPwLuBh4MimYc8FrshlfRj4SO4/EvhInvbvwBzSn3MaEMCowjyuAd7Rbv1JO/mrgb+R/tQXAVsXpp+Sf9RFeZyvAJvl6Z9XGG9yXt9JLdbx08BtwIg230F/y30msATYvTD+pLz8yfnzYcDNebzfAXsMZpvL83wHcE1Tv3/M25kK/f4KHJK7fwfMLAw7Ebi+w3IOBu5v0X8KKaA9AvwZOLkw7ADgJuBxYAFwZu6/MH+3S/Nr7xbz/SzwjaZ+o0n/icZ/ZBLwq7wdPJrLsX0e9nlgNfBkXsbnc//zgPtzmW4E9muzvgJWAs/p4zs5ELgh/55/AA7oa9lN047P85/Ybp1JB6sBbFnodzhwa17m/wC7FYYtAF6Uu38AfKww7BDg7jbrMTYvZ6d2v0HzOKT/1n/l7/H3pO3/ysK0rwbm53J+gRQwjykMfycpgD8K/BLYsWk5J5H2JYuBc/r4DdbbTgrDPk7aR/4dmAu8uvDdPw5ML4y7E7CMvJ8p8T1/gLR/XVbo/z/Am/r6H1XWJiNpCnAo6Q8G6UhyFbALsDdpJ/COwiQvIP1BJwNnSDqKFDCOBbYCXgv8TdII4OfALcCOwMuAWZJeWZjXa0kb2NbAZaQdMBHxNtKO5jWRjkTPblP2A0lf+A9IAevYwrAtgSuBy0nBbhfgqjz4n4Cj83pvBbyd9KOVsc76k/7gZ+Zl7ErakZ2eyzAS+AVwLykQ7Aj8ICJW5DIXqyiOJm34i1os8+XAJRGxpmQZO5X7X0mB7+jC8DcC10bEQkkzgAtJf64JwH8Al9WUKT4XuDXylp/dmvs3ht9SGHZLYVhp+bf4L1LQ2oG0I/uIpBfnUb4CfCYitgKmA5fm/gcBq6M3K7qJEiJiJWn7PzD3GgF8DZ
gK/EPud04e9/3A/5EOCrbInyHtEJ9H+g1+BvxY0ugWywpSAPmPXDXZXJU0La/PR4FtSbUWl0rapo9lF+0KPB4RLTOjvF28jbRDW5r77Qd8lZRRTSAdOF7apdqP80kB4umkA9K3NwZI2o6073g/6UBgEdBTGP5mYBbwmjz9TaSsu+hVpH3lDOAESQcPoIx3AS8kHeSfBfxA0sSIWAb8hHX3FW8FfhkRS0p+z28i1XwUM5d5pNqB9vqKQJ1epExmKSny3ZsLOY70Ja4AxhXGPRr4be4+Hvhr07x+DZzaYhkvaDHuacA3c/fprHs0sRuwvKmML++wHt8ALs3d+5OOtiYXyn1Tm+nuAl7Xov80OmcEf+1Qptc3lpvLtKg4v6bv5z5ydgLMBt7YZp7zKRx1V1FuUuD6c+Hz/wLH5u7zgE+1+M5ePMjtrlUm8y+kwFvsdxFweu5eTeEInRQAgkLm02I5B9OUyQAvBuY39fskcF7uvpG0E57QNM5zgFUd1qvlESpp53Rbm2n2Ax4qfF7n6LnF+CIdCD27zfCJpBqJeaRagNnkrAv4BPD1pvGvJR/Jllj2y4B7WqzzCtI+ZDUp4zugMPybwEebprkXeEHuHmwm81heduP1JC0ymdy9BphWmMcXyPseUnXsNYVhI/O6HJM//xZ4a2H4aNJ+5umF5fQUhl8GzOrPdtJm3DuBVxa23bsLw24DXtuP7/ktLeb/eeCrfZWhikzm9RGxdUTsHBHvjojlwM75S3woN+YtIR3FFhsw72uazxRSqthsZ2CHxnzyvD5C+nEaFhS6lwFjyx7pSBpHqvu+CCAifk/KfhqNne3K1WlYJ+usv6TJkn4g6QFJj5OOciYWlnNvRKzXcBwRNwBPAC+W9BxSpnVZm2X+Ddh+gOVtWW5SFd84SS+QtDOwF/DTPGxn4P1Nv90UUgawDkkfKTSwf20A5VpKyiaLtiJVG7QavhWwNPI/pR92BqY1rdM/Advl4ceR2nz+KOmGpox7oHYkHUEjaUtJF0r6a95OfkPvdtKSpNMk3SXpMVJVzNh200TEIxHxwYjYNa/TH0nZKqR1P6Zp3Xto8Xu2sRjYskX/70TE1qRt80+kbahhZ1KmWFzmJNJ3UoXn5v3X1rkMX2wz3nakAF3c/u8tdO9QHBYRq0nVtw07A18rrMMiUk1PMVts3o/1uw1Q0on5ZI3Gcnah97f+b2CkpP2VTorZnlT12ihfp++5+b8P6fdc0leZ6jqF+T7S0cnEwg+4VUQUqyea/9z30brx+T7gL8UNISK2jIhDS5al007kcNIO56tKp0MuIH2xjSqzduXqa9gT+X18od92TeM0l+vM3G+PSFUtx5A26sZypvYROL+Vx38b8JOIeLLNeFcCh+cqyFb6Xe5IVW8/ImV8byG1yTV27PcBZzT9duMj4vvNC46Iz0RvVdLJbcrXl9uBPaR1zhjbI/dvDC+m9XsWhvXHfcCdLbbHw/N6zIuIN5EOqM4FLsknH/Q3mAFrG1kPI9V9A3yYtGN6ft5O/pHe7YTm5Uh6BfBe0na+Namaa3nTNC1FxELS0fo0pbMP7yMdQRfXffOIOKfVsluYB2wpqV2AW0iqWj2zMM59wMdbbEOXtJjFE/S97Q7GAtL6TSn0m1rofqg4LP/HmnfQxzetx7iImFNVASU9C/gyKavaNgfNu8m/dT6g+ja9+4ofRKqObZSv0/fc6vfdlXWroddTS5CJiIdIR1ifl7SVpBGSnlmot27lG8AHJO2jZJd8ZHwj8LikD0kaJ2mkpN0lPb9kcR4GntHH8ONI7QbPIx1B7UVqvN1L0vNIbSHbSZqldN78lpJeUCjzpyRNz2XeQ9KESO0hD5CO+kZKejudz97aklz1KGlH4IOFYTeSNuLPStpc6fTLAwrDv0PaiRxD2oja+QIpoH4rf7dI2lHSFyTtMcByA3yPVF/71tzd8HXg5JzlKJf91UrtXP2WyzQWGAWMyN9Do23hGlJ1y/vy7/Se3P
/q/P5t4J/y+u5Aqjv/zwEU47pclll5+aPy7z4j9z82bwOrSVUxQapmWUg6ipzads7rrutoSc8lBfAtSQGL3L2MtJ1MZP2zOZu39y1J1TKLgDGkdrSxfSz3c5J2y9/100hng82NiCdIBzNHSXpZHj4udzd25n3+13ItxzWk9ql249xKOuJutOmcD7xXUk/ehraQ9FpJ41tMfjNwmKSt83/ove2W01/5wO3nwCfzeu9B2t4bLgOeL+mwvE1+kBTQG74GfEzSswEkbSPpDYMo0si8/TVeY0iZzxrSbz1C0smkTKbo26R206NZd1/Rn++ZvA6bk/abV7UbB6j27LKmYU+j96yWx0gNXW/Ow44HrmsxzcmkOvulpDMjGnXBO5BOrVxASrmvbyyX1Cbz3cI8plFoVwBeR6r+WgJ8oGl5O5JS1ue1KMt/AZ/L3bvnL3JxLsOHo7fe9WP0ns3xf/SeifKq3H8Jqd7yWtZt27iuaXnPJZ2dtpT0Z3k/hfYA0lHTpfSefXZu0/RX5t+jbRtD4bu8MK/H30l1tp8Axg+k3IX53k2q0hnT1P+Q/L0sIQXKH1M4c6if29vx+bctvv6zMHzv/B0uJ535tHdhmICzcxkfzd2dvquDaX922Y9IO9XFpHaog/KwH+Xf5++kOu9DC9OdRdoBLAH2ajHfz5ICwt9JR+V/JB2Zbt+0HVyXt5M7gXdTaOsh17vncp1Nqrb+DunMogdI7Ttr2zFalOH8PP3SXNafse4ZSQfk5S8mBc7LgB1aLbvN/N8A/LRpnZvPqHtxLu+2+fNr8+/6GPAgqe1lXB5WbJPZnFS19zhpf/MBqj27bDvSCUDtzi57TV7/dmeXnUjKnh8nVbV9rV1ZaGpfalHG5v/B3XnY5/L3vyhvb+u1k+Xf764W8y31PRfGfxvwvU7/W+WRbRMn6ULSNR39vk7JbEPJ1Zk3kKqO7uh2eYYjSd8D7oiITw9iHiIFpDdHxB/7HNdBZtOndGrpzaQj9790tzRmtrGStAspy981Ih7oNH4VfO+yTZykT5GqFv/NAcbM2pF0Nqka8V83VIABZzJmZlYjZzJmZlabjeHGlB1NnDgxpk2b1u1imJltUubMmfNIRGyom6u2tEkEmWnTpjF79uxuF8PMbJMi6d7OY9XL1WVmZlYbBxkzM6uNg4yZmdXGQcbMzGrjIGNmZrWpNchIOlXSXEm3S5qV+/1Q0s35dY+km+ssg5mZdU9tpzBL2p30zOp9gaeAyyX9MtKzNhrjfJ50x08zMxuC6sxkdgWuj4hlkZ7oeC3pmSfA2rt4vpF0C/+N0623wu9+B6tXw4UXpvcI+OY3YcWKbpfOzGyjV+fFmHOBMyRNID3f41DS88IbDgQejoj5rSaWNJP0hDemTi31nKfq7ZkfpPiVr8B73gNPPAE77ghvfzvMnw+f+Ux3ymVmtomoLZOJiHmkh+ZcQXrQzy2kB4Q1HE0fWUxEnB8RPRHRM2lSV++KAI880vu+eHHqfvjh7pXHzGwTUWvDf0RcEBEzIuIg0tMI58Pa55YfAfywzuWbmVl31XrvMkmTI2Jhfq75EcD+edDLgTsj4v46l29mZt1V9w0yL85tMiuBUyIi1zXxZjbmBn8zM6tErUEmIg5s0//4OpdbuVYPdvPD3szMOvIV/2ZmVhsHGTMzq42DjJmZ1cZBpr9Wr+52CczMNhkOMv21cmW3S2BmtslwkOmPCAcZM7N+cJApY1W+G87q1fDUU90ti5nZJsRBpoxGYFm50pmMmVk/OMiU0QgsTz3VG3B8MaaZWUcOMmW0ymSc0ZiZdeQgU0arTMZBxsysIweZMlplMj4BwMysIweZMhoBxZmMmVm/OMiU8eST6d2ZjJlZvzjIlLF8eXp3JmNm1i8OMmU0gowzGTOzfnGQKaNRXVbMZBxkzMw6cpApo1WbjKvLzMw6cpApw5mMmdmA1BpkJJ0qaa6k2yXNKvR/r6S7cv+z6yxDJVq1yTiTMTPraFRdM5
a0O3ASsC/wFHC5pF8COwGvA/aIiBWSJtdVhso4kzEzG5DaggywK3B9RCwDkHQtcDjQA3w2IlYARMTCGstQjYW5iEuW9GYwS5fClVfC9OkwciSMHw+33QYvfCGMHt29spqZbUTqrC6bCxwkaYKk8cChwBTgWcCBkm6QdK2k57eaWNJMSbMlzV60aFGNxWyj1V2W77sP7rwzdT/6KLziFTBtGkyZAieeCAcfDN/97oYspZnZRq22IBMR84CzgCuAy4FbgFWk7GkbYD/gg8CPJKnF9OdHRE9E9EyaNKmuYra3Zs26nx98EP7nf9JrwQK47jp4fiE+LliQ3pcs2XBlNDPbyNVZXUZEXABcACDpM8D9pGq0SyIigBslrQEmAl1IV/qwevW6n7ffPr0anv50KAY/X6RpZraeWoOMpMkRsVDSVOAIYH9gDfBS4BpJzwLGAI/UWY4Bac5kWhkzprfbQcbMbD21BhngYkkTgJXAKRGxWNKFwIWS5pLOOjsuZzUbl+ZMppViA3/xgk0zMwPqry47sEW/p4Bj6lxuJcoEmWIms2xZencmY2a2lq/4b6e/QeaJJ9K7Mxkzs7UcZNop0yZTrC5rBBlnMmZmaznItNPfTMbPmTEzW4+DTDv9bfhvcCZjZraWg0w7/c1kGpzJmJmt5SDTTn/bZBqcyZiZreUg044zGTOzQXOQaWegQcaZjJnZWg4y7RSDzIg2X5Ory8zM+uQg006xTWbkyNbjuLrMzKxPDjLtFDOZdkHGmYyZWZ8cZNopU13mTMbMrE8OMu04kzEzGzQHmXbcJmNmNmgOMu04kzEzGzQHmXYG0iYzapQzGTOzAgeZdgaSyWy+uTMZM7MCB5l2ygQZad3Pm2/uTMbMrMBBpp0yDf/NnMmYma3DQaadMplMxLqfncmYma2j1iAj6VRJcyXdLmlW7ne6pAck3Zxfh9ZZhgEr0/DfzJmMmdk6RtU1Y0m7AycB+wJPAZdL+mUefE5EfK6uZQ/KypWpqmzFit5+/akuW706BZpRo8oHp0ZAa17OypXrnlywalUqW6vrc9asSa9Ro1KGtXp16i6KSPMYPXrd7lWr2t91evToNKxYfdiqX9GIEWnZDrhmgzd6dPl9ycYoImp5AUcB3yh8/hfgn4HTgQ/0Z1777LNPbBD//d8Ro0ZFpF1w7+uww1qPf8MN64535JHrTwsRs2al9wcfXH8eq1ZFTJ4cscUWEU88kfodcEDvtJ/4RHo/7bSIsWMjpIjvfa93+gULescdMSLivvsiDj44fb7zzojddot45SvTuMceu37ZzjknzbdVuRuvMWPW7zdiRPvx+xrml19+9e/1q18NeJcGzI6oZx9f9tUxk5F0MXAh8KuIKPG4yLXmAmdImgAsBw4FZgN/A94j6dj8+f0RsbjFcmcCMwGmTp3aj8UOwl/+ko7q3/9+mDABtt4att8eXvKS1uPvuy/8+MfpKOPhh+Hxx+EnP1l/vC9+Mb3ffHOaX9GTT8LChal78WIYPx7+9397h195ZXo/88zefn/8Y2/3rbf2dq9ZA/ffD9dckz7fey/ccUd6QW//oiuvTGWYOROmTVt/2NVXp4zkve9NZb/4YpgzJy3rYx9L5S1asQI++cnUvffecNRR6y/TzMqbPr3bJRiUMtVl5wEnAOdK+jHwnxFxZ6eJImKepLOAK4ClwC3Aqjy/TwGR3z8PvL3F9OcD5wP09PREqbUZrMiLefe74RnPKDfNkUf2dp97bv+XWaxSalW9VLZf2fk1e+KJ9H7CCbDffusOW706BRmAk0+G3XZLgWvOnNTvgx+ErbZad5rly3uDzIwZcNppnctgZkNWx4q+iLgyIt4KzADuAa6Q9DtJJ0hqcV+Vdaa9ICJmRMRBwKPA/Ih4OCJW56zo66Q2m41DI8g0X/9SVqvbzHRSPBut1ZlpZfuVmV+r9WoEmVZlL7b9NIYX+7VqGyrOZyDfh5kNKaVak3KV1/HAO4CbgC+Rgs4VHaabnN
+nAkcA35dUrC86nFSttnEYbJBptdPtpFPmUTwBoa/xys6vWSPIdAoYjeGdgsjIkb3f30C+DzMbUsq0yVwCPAf4DvCaiHgoD/qhpNkdJr84B6iVwCkRsVjSdyTtRaouuwd454BLX7WNMZNZtqzvafo7v2Z9BZlWWUvjfcSI1mfdSWmcFSscZMysVJvMVyLi6lYDIqKnrwkj4sAW/d5WsmwbXiPIDPR0wToymUYQaDdNf+YXLZq2+qoua5W1NL+3Mnp0CjKuLjMb9srsTXeVtHXjg6RtJL27xjJ1T92ZTKvrSjplHs1BZuTIdcdrvr6lGFia59fqWpjBZDLttGq/MbNhqUyQOSkiljQ+5NONT6qvSF3UCAJ1tcm0CiKdMpnly9f93HxXgeZpikGpeVhf8+9vJtOXxvfnTMZs2CsTZEZIvXtdSSOBoXmIWncm0+lMsTJtKM33R2uephhkmof1Nf/+ZjJlOJMxG/bKtMn8GviRpK+RGutPBi6vtVTdUvfZZZ2ueSlzNlh/MpnmLKiv+XfKZBqN/P3JTpzJmA17ZYLMh0hngL0LEPAb4Bt1Fqprhlom09yeM5hMpq9+/ZmnmQ0rHYNMvmjyvPwa2jaGTKbVGWBF48eXz2SK3RF9B5lOF2P21a8/8zSzYaXMdTLTgTOB3YCxjf4RUfK+K5uQjSGT6SsQjBoFm202sExm1aq+59tqnTtVoXXiTMZs2CvT8P9NUhazCngJ8G3ShZlDz8aQyXSq0ho9emCZTH/bYxrLK9Ovv/M1s2GjTJAZFxFXAYqIeyPidOCl9RarS+q+GLNMJtMpGIwZM7BMptWdAxralduZjJkNUpmG/ycljQDmS3oP8AAwud5idUnd1WVlMpm+gkyZTGbp0s7dzarOZBrfozMZs2GvzCH7LGA88D5gH+AY4Lg6C9U1G3t1WSOTGUh1Wavb0zQ4kzGzmvSZyeQLL98YER8kPRPmhA1Sqm4Z7BX/VTT8l8lkBlJd1leQcZuMmdWkz0wmIlYD+xSv+B/SNvZMZsyYDZvJ+DoZMxukMm0yNwE/y0/FXLuniohLaitVt2wMpzB3avjfkJnMYKvLnMmYDXtlgsy2wN9Y94yyABxkmg0mk5EGn8lIvcGk2A3OZMysK8pc8T+022GKBhtkRnX4OvvKZMaPH3wms/nmvWeRFbvBmYyZdUWZK/6/Scpc1hERb6+lRN002CDTabp2mczIkTB2bLlTmPvKZPobZEaMSCc7OJMxs5qUqS77RaF7LHA48GA9xemywQaZTtplMo3g0em2MsWLMSNSOYvjb7EFPPzw+t3QOsiMGQNPPln9KcyN789BxmzYK1NddnHxs6TvA1fWVqJuGuwV/520y2Qa1WBlL8aMSE+5HDVq/UymVTe0DjKjR6cg0y5wtPoefAqzmfXDQPam04GpZUaUdKqkuZJulzSradgHJIWkiQMoQz029kymMV5xXs1tMq26oX0mU3wvozFume/ImYzZsFemTebvrNsms4D0jJlO0+1OekzzvsBTwOWSfhkR8yVNAV4B/HVApa7LYC/G7GSwmcyoUb3ZwVNPwbhxg89kGvMtqz/ZSeNBZ2Y2bJWpLttygPPeFbg+IpYBSLqW1J5zNnAO8M/AzwY473rUncn8/vew777r9vvzn1NAGDMGrrwS/vCH9tNLvdnBS16SgsOdd/YOHzu2t3v8+HWnvaTFGeeN8fuzvo2A1FeW4gzGzLIymczhwNUR8Vj+vDVwcERc2mHSucAZkiYAy4FDgdmSXgs8EBG39HUjAUkzgZkAU6eWqp0bvCqCzHnnwTOeAdddlx5/vHJlCiQ77AB/bZG4TZwIL31pCjQ//3nqt99+8LSnwYknwplnwrbbwoIFcPzx8Kxnwete15vBvOhF8MxnwnbbwbOfndpqnvlMOOSQtOxttkkZ2mOPpWW9613wk5+ktphjj4UzzoAT+jhL/ZxzYO+9ez+PHQsf/zgceWT7aa66Ci66CCZM6PfXZ2ZDi6
LDkxgl3RwRezX1uyki9m43TWG8E4FTSPc9u4MUbF4I/GNEPCbpHqAnIh7paz49PT0xe/bsTosbvH/9V/jEJ9IDvlzVY2abOElzIqKnm2Uo0/DfapxSlfgRcUFEzIiIg4BHgXuAfwBuyQFmJ+APkrYrV9ya1V1dZmY2zJQJMrMlfUHSMyU9Q9I5wJwyM5c0Ob9PBY4Avh0RkyNiWkRMA+4HZkTEggGWv1oOMmZmlSoTZN5LOjvsh8CPSFVep5Sc/8WS7gB+DpwSEYsHVMoNxUHGzKxSZc4uewL48EBmHhEHdhg+bSDzrU3jKnozM6tEx0xG0hX5jLLG520k/breYnWJg4yZWaXKVJdNjIgljQ+5ymtyfUXqIgcZM7NKlQkya3LDPQCSdqbFXZmHhDVrHGTMzCpU5lTkjwLX5Sv2AQ4C3llfkbrImYyZWaXKNPxfLmkGsB8g4P91unhyk+UgY2ZWqVJ3YY6IRyLiF6Sr9k+WNLfeYnWJg4yZWaXKnF22vaRZkm4EbgdGAkfXXrJucJAxM6tU2yAj6SRJVwPXAhOBdwAPRcQnI+K2DVXADcpBxsysUn21yfw78HvgLRExG0DS0DyrrMFBxsysUn0FmR2Ao4AvSHo66ZYyQ/t5uhH1PXrZzGwYartHzY395+U7KL8MeAxYKGmepM9ssBJuSM5kzMwqVfbssvsj4nMRsQ/wemBFvcXqEgcZM7NK9ePh7klE3AV8soaydJ+v+Dczq5QbIIqcyZiZVcpBpshBxsysUmUuxryqTL8hwUHGzKxSbdtkJI0FxgMTJW1Dum8ZwFak05uHHgcZM7NK9dXw/05gFimgzKE3yDxOulBz6HGQMTOrVNsgExFfAr4k6b0R8eUNWKbucZAxM6tUmVv9f1nSC4FpxfEj4tudppV0KnASKQv6ekR8UdKngNcBa4CFwPER8eDAil8xX/FvZlapMg3/3wE+B7wIeH5+9ZSYbndSgNkX2BM4TNJ04N8iYo+I2Av4BfDxgRe/Ys5kzMwqVeZizB5gt4jo780xdwWuj4hlAPnJmodHxNmFcTZnY3qUsy/GNDOrVJm6obnAdgOY91zgIEkTJI0HDgWmAEg6Q9J9wFtpk8lImilptqTZixYtGsDiB8D/6eC+AAANm0lEQVSZjJlZpcoEmYnAHZJ+LemyxqvTRBExDzgLuAK4HLgFWJWHfTQipgAXAe9pM/35EdETET2TJk0quTqD5CBjZlapMtVlpw905hFxAXABQL5z8/1No3wP+CXwiYEuo1IOMmZmlSpzdtm1knYGpkfElbnqa2SZmUuaHBELJU0FjgD2lzQ9IubnUV4L3DnQwlfOQcbMrFIdg4ykk4CZwLbAM4Edga+RnjHTycWSJgArgVMiYrGkb0h6NukU5nuBkwda+Mo5yJiZVapMddkppNOQbwCIiPmSJpeZeUQc2KLfG/pVwg3JQcbMrFJlGv5XRMRTjQ+SRrExnXZcJV+MaWZWqTJ71GslfQQYJ+kVwI+Bn9dbrC5xJmNmVqkyQebDwCLgNtJNM/8L+FidheoaBxkzs0qVaZMZB1wYEV8HkDQy91tWZ8G6wlf8m5lVqkwmcxUpqDSMA66spzhd5kzGzKxSZYLM2IhY2viQu8fXV6QucpAxM6tUmSDzhKQZjQ+S9gGW11ekLnKQMTOrVJk2mVOBH0tqPPNle+BN9RWpixxkzMwq1WeQkTQCGAM8B3g26eFjd0bEyg1Qtg3PQcbMrFJ9BpmIWCPp8xGxP+nW/UObg4yZWaXKtMn8RtIbpGGw9/UV/2ZmlSrTJvNPpCdYrpa0nFRlFhGxVa0l6wZnMmZmlSpzq/8tN0RBNgoOMmZmlepYN6TkGEn/kj9PkbRv/UXrAl/xb2ZWqTINEF8F9gfekj8vBf69thJ1kzMZM7NKlWmTeUFEzJB0E0B+8NiYmsvVHQ4yZmaVKpPJrMw3xQwASZNIT7UcehxkzMwqVSbInAv8FJgs6QzgOuAztZaqWxxkzM
wqVebssoskzQFeRjp9+fURMa/2knWDg4yZWaXaBhlJY4GTgV1IDyz7j4hY1Z+ZSzoVOIkUnL4eEV+U9G/Aa4CngD8BJ0TEkgGWv1oOMmZmleqruuxbQA8pwLwK+Fx/Zixpd1KA2RfYEzhM0nTgCmD3iNgD+CNw2gDKXQ9f8W9mVqm+qst2i4jnAUi6ALixn/PeFbg+IpbleVwLHB4RZxfGuR44sp/zrY8zGTOzSvV12L72Tsv9rSbL5gIHSZogaTxwKDClaZy3A79qNbGkmZJmS5q9aNGiASx+AHwxpplZpfrKZPaU9HjuFjAufy5177KImCfpLFL12FLgFmBtsJL00fz5ojbTnw+cD9DT0xPlVmeQnMmYmVWqbZCJiJGDnXlEXABcACDpM8D9ufs44DDgZRGxYQJIGQ4yZmaVKnPF/4BJmhwRCyVNBY4A9pd0CPAh4MWN9pqNhoOMmVmlag0ywMWSJpDad07Jt6T5CrAZcEV+RM31EXFyzeUox0HGzKxStQaZiDiwRb9d6lzmoDjImJlVyheFFDnImJlVykGmyBdjmplVynvUImcyZmaVcpApcpAxM6uUg0yRr/g3M6uUg0yRMxkzs0o5yBQ5yJiZVcpBpshBxsysUg4yRQ4yZmaVcpApcpAxM6uUg0yRg4yZWaUcZIp8xb+ZWaW8Ry1yJmNmVikHmSIHGTOzSjnIFPmKfzOzSjnIFDmTMTOrlINMkYOMmVmlHGSKHGTMzCrlIFPkIGNmVikHmSIHGTOzStUaZCSdKmmupNslzcr9jsqf10jqqXP5/eYgY2ZWqdqCjKTdgZOAfYE9gcMkTQfmAkcA/13XsgfMV/ybmVWqzj3qrsD1EbEsIlYB1wKHR8S8iLirxuUOnDMZM7NK1Rlk5gIHSZogaTxwKDCl7MSSZkqaLWn2okWLaivkOnwxpplZpWoLMhExDzgLuAK4HLgFWNWP6c+PiJ6I6Jk0aVJNpVxvoQ4yZmYVqrUBIiIuiIgZEXEQ8Cgwv87lDZqDjJlZpUbVOXNJkyNioaSppMb+/etc3qA5yJiZVarWIANcLGkCsBI4JSIWSzoc+DIwCfilpJsj4pU1l6McBxkzs0rVGmQi4sAW/X4K/LTO5Q6Yg4yZWaV8UUiRg4yZWaUcZIp8MaaZWaW8Ry1yJmNmVikHmSIHGTOzSjnIFPmKfzOzSjnIFDmTMTOrlINMkYOMmVmlHGSKHGTMzCrlIFPkIGNmVikHmSIHGTOzSjnIFDnImJlVykGmyFf8m5lVynvUImcyZmaVcpApcpAxM6uUg0yRr/g3M6uUg0yRMxkzs0o5yBQ5yJiZVcpBpshBxsysUg4yRQ4yZmaVqjXISDpV0lxJt0ualfttK+kKSfPz+zZ1lqFfHGTMzCpVW5CRtDtwErAvsCdwmKTpwIeBqyJiOnBV/rxxcJAxM6vUqBrnvStwfUQsA5B0LXA48Drg4DzOt4BrgA/VUoJPfxq+//3y4z/5pIOMmVmF6gwyc4EzJE0AlgOHArOBp0fEQwAR8ZCkya0mljQTmAkwderUgZVgu+1gt93Kj7/77nDUUQNblpmZrUcRUd/MpROBU4ClwB2kYHNCRGxdGGdxRPTZLtPT0xOzZ8+urZxmZkORpDkR0dPNMtTa8B8RF0TEjIg4CHgUmA88LGl7gPy+sM4ymJlZ99R9dtnk/D4VOAL4PnAZcFwe5TjgZ3WWwczMuqfONhmAi3ObzErglIhYLOmzwI9yVdpfATeCmJkNUbUGmYg4sEW/vwEvq3O5Zma2cfAV/2ZmVhsHGTMzq42DjJmZ1cZBxszMalPrxZhVkbQIuHeAk08EHqmwOJsCr/Pw4HUeHgazzjtHxKQqC9Nfm0SQGQxJs7t9xeuG5nUeHrzOw8Omvs6uLjMzs9o4yJiZWW2GQ5A5v9sF6AKv8/DgdR4eNul1HvJtMmZm1j3DIZMxM7MucZAxM7PaDNkgI+kQSXdJulvSh7tdnqpIulDSQklzC/22lX
SFpPn5fZvcX5LOzd/BrZJmdK/kAydpiqTfSpon6XZJp+b+Q3a9JY2VdKOkW/I6fzL3/wdJN+R1/qGkMbn/Zvnz3Xn4tG6WfzAkjZR0k6Rf5M/DYZ3vkXSbpJslzc79hsT2PSSDjKSRwL8DrwJ2A46W1I/nMG/U/hM4pKnfh4GrImI6cFX+DGn9p+fXTOC8DVTGqq0C3h8RuwL7Aafk33Mor/cK4KURsSewF3CIpP2As4Bz8jovBk7M458ILI6IXYBz8nibqlOBeYXPw2GdAV4SEXsVrokZGtt3RAy5F7A/8OvC59OA07pdrgrXbxowt/D5LmD73L09cFfu/g/g6Fbjbcov0oPuXjFc1hsYD/wBeAHpyu9Ruf/a7Rz4NbB/7h6Vx1O3yz6Add2JtEN9KfALQEN9nXP57wEmNvUbEtv3kMxkgB2B+wqf78/9hqqnR8RDAPl9cu4/5L6HXCWyN3ADQ3y9c7XRzaRHlF8B/AlYEhGr8ijF9Vq7znn4Y8CEDVviSnwR+GdgTf48gaG/zgAB/EbSHEkzc78hsX3X/WTMblGLfsPxXO0h9T1I2gK4GJgVEY9LrVYvjdqi3ya33hGxGthL0tbAT4FdW42W3zf5dZZ0GLAwIuZIOrjRu8WoQ2adCw6IiAfzI+uvkHRnH+NuUus9VDOZ+4Ephc87AQ92qSwbwsOStgfI7wtz/yHzPUgaTQowF0XEJbn3kF9vgIhYAlxDao/aWlLj4LC4XmvXOQ9/GvDohi3poB0AvFbSPcAPSFVmX2RorzMAEfFgfl9IOqDYlyGyfQ/VIPN/wPR8VsoY4M3AZV0uU50uA47L3ceR2iwa/Y/NZ6PsBzzWSL83JUopywXAvIj4QmHQkF1vSZNyBoOkccDLSY3hvwWOzKM1r3PjuzgSuDpyhf2mIiJOi4idImIa6T97dUS8lSG8zgCSNpe0ZaMb+EdgLkNl++52o1BdL+BQ4I+keuyPdrs8Fa7X94GHgJWkI5oTSfXQVwHz8/u2eVyRzrL7E3Ab0NPt8g9wnV9Eqg64Fbg5vw4dyusN7AHclNd5LvDx3P8ZwI3A3cCPgc1y/7H58915+DO6vQ6DXP+DgV8Mh3XO63dLft3e2F8Nle3bt5UxM7PaDNXqMjMz2wg4yJiZWW0cZMzMrDYOMmZmVhsHGTMzq42DjA1rklbnO982XpXdsVvSNBXulm02HA3V28qYlbU8IvbqdiHMhipnMmYt5Od7nJWf6XKjpF1y/50lXZWf43GVpKm5/9Ml/TQ//+UWSS/Msxop6ev5mTC/yVfvI+l9ku7I8/lBl1bTrHYOMjbcjWuqLntTYdjjEbEv8BXSPbTI3d+OiD2Ai4Bzc/9zgWsjPf9lBunKbUjP/Pj3iHgusAR4Q+7/YWDvPJ+T61o5s27zFf82rElaGhFbtOh/D+mhYX/ON+dcEBETJD1CenbHytz/oYiYKGkRsFNErCjMYxpwRaSHTiHpQ8DoiPi0pMuBpcClwKURsbTmVTXrCmcyZu1Fm+5247SyotC9mt520FeT7j+1DzCncJdhsyHFQcasvTcV3n+fu39HukMwwFuB63L3VcC7YO3DxrZqN1NJI4ApEfFb0gO6tgbWy6bMhgIfPdlwNy4/fbLh8ohonMa8maQbSAdjR+d+7wMulPRBYBFwQu5/KnC+pBNJGcu7SHfLbmUk8F1JTyPdUfecSM+MMRty3CZj1kJuk+mJiEe6XRazTZmry8zMrDbOZMzMrDbOZMzMrDYOMmZmVhsHGTMzq42DjJmZ1cZBxszMavP/AWl2vuUvlU5hAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# gradient descent\n",
"detailed_logger = False\n",
"main_logger = True\n",
"main_logger_output_epochs = 100\n",
"L2 = False\n",
"Dropout = False\n",
"momentum = False\n",
"hidden_layer_relu = True\n",
"hidden_layer_tanh = False\n",
"hidden_layer_sigmoid = False\n",
"\n",
"# hyper-parameters\n",
"alpha = .1            # learning rate\n",
"epsilon = .85         # L2 regularization strength (used only when L2 is True)\n",
"keep_prob = .9        # dropout keep probability (used only when Dropout is True)\n",
"number_of_epochs = 500\n",
"batch_size = 50\n",
"momentum_coef = .9\n",
"\n",
"# copy initialization\n",
"W = Weights.copy()\n",
"B = Bias.copy()\n",
"\n",
"# data arrays\n",
"cost_array = []\n",
"accuracy_array = []\n",
"interation_array = []\n",
"\n",
"# rename\n",
"X_train = np.float64(training_images).copy()\n",
"Y_train = np.float64(training_labels).copy()\n",
"\n",
"X_test = np.float64(testing_images).copy()\n",
"Y_test = np.float64(testing_labels).copy()\n",
"\n",
"#m = size\n",
"m = number_of_training_images\n",
"\n",
"def model(W, B, A):\n",
"    return np.dot(W, A) + B\n",
"\n",
"def activation_relu(Z):\n",
"    Z = np.where(~np.isnan(Z), Z, 0)\n",
"    Z = np.where(~np.isinf(Z), Z, 0)\n",
"    return np.where(Z > 0, Z, 0)\n",
"\n",
"def activation_tanh(Z):\n",
"    return np.tanh(Z)\n",
"\n",
"def activation_sigmoid(Z):\n",
"    return 1/(1 + np.exp(-Z))\n",
"\n",
"def loss(A, Y):\n",
"    # small constant keeps np.log away from log(0)\n",
"    epsilon = 1e-20\n",
"    return np.where((Y == 1), np.multiply(-Y, np.log(A + epsilon)), -np.multiply((1 - Y), np.log(1 - A + epsilon)))\n",
"    #return np.multiply(-Y, np.log(A)) - np.multiply((1 - Y), np.log(1 - A))\n",
"\n",
"def cost(L):\n",
"    return np.multiply(1/L.shape[1], np.sum(L))\n",
"\n",
"def cost_L2(L, W, epsilon):\n",
"    # normalize by the minibatch size; W holds one matrix per layer and has no second axis\n",
"    L2 = np.multiply(epsilon/(2*L.shape[1]), np.multiply(W[len(W)-3], W[len(W)-3]).sum() + np.multiply(W[len(W)-2], W[len(W)-2]).sum() + np.multiply(W[len(W)-1], W[len(W)-1]).sum())\n",
"    J = cost(L)\n",
"    return L2 + J\n",
"\n",
"def prediction(A):\n",
"    return np.where(A >= 0.5, 1, 0)\n",
"\n",
"def accuracy(prediction, Y):\n",
"    return 100 - np.multiply(100/Y.shape[0], np.sum(np.absolute(Y - prediction)))\n",
"\n",
"def forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob):\n",
"    if(layer < len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        Z_layers.append(Z)\n",
"        if(hidden_layer_relu == True):\n",
"            A = activation_relu(Z)\n",
"        elif(hidden_layer_tanh == True):\n",
"            A = activation_tanh(Z)\n",
"        elif(hidden_layer_sigmoid == True):\n",
"            A = activation_sigmoid(Z)\n",
"        if(Dropout == True):\n",
"            _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
"            D.append(_D)\n",
"            A = np.multiply(A, _D)\n",
"        A_layers.append(A)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Training Data: ' + str(layer))\n",
"        A_layers, Z_layers, D = forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob)\n",
"    elif(layer == len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        Z_layers.append(Z)\n",
"        A = activation_sigmoid(Z)\n",
"        if(Dropout == True):\n",
"            _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
"            D.append(_D)\n",
"            A = np.multiply(A, _D)\n",
"        A_layers.append(A)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Training Data: ' + str(layer))\n",
"            print('Forward Propagation Training Data Complete')\n",
"    return A_layers, Z_layers, D\n",
"\n",
"def forward_propagation(W, B, A, layer):\n",
"    if(layer < len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        if(hidden_layer_relu == True):\n",
"            A = activation_relu(Z)\n",
"        elif(hidden_layer_tanh == True):\n",
"            A = activation_tanh(Z)\n",
"        elif(hidden_layer_sigmoid == True):\n",
"            A = activation_sigmoid(Z)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Testing Data: ' + str(layer))\n",
"        A = forward_propagation(W, B, A, layer)\n",
"    elif(layer == len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        A = activation_sigmoid(Z)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Testing Data: ' + str(layer))\n",
"            print('Forward Propagation Testing Data Complete')\n",
"    return A\n",
"\n",
"def dZ(dZ, W, Z):\n",
"    Z = np.where(~np.isnan(Z), Z, 0)\n",
"    W = np.where(~np.isnan(W), W, 0)\n",
"    dZ = np.where(~np.isnan(dZ), dZ, 0)\n",
"    Z = np.where(~np.isinf(Z), Z, 0)\n",
"    W = np.where(~np.isinf(W), W, 0)\n",
"    dZ = np.where(~np.isinf(dZ), dZ, 0)\n",
"    if(hidden_layer_relu == True):\n",
"        return np.multiply(np.dot(np.transpose(W), dZ), np.where(Z > 0, 1, 0))\n",
"    elif(hidden_layer_tanh == True):\n",
"        A = activation_tanh(Z)\n",
"        return np.multiply(np.dot(np.transpose(W), dZ), 1 - np.multiply(A, A))\n",
"    elif(hidden_layer_sigmoid == True):\n",
"        A = activation_sigmoid(Z)\n",
"        return np.multiply(np.dot(np.transpose(W), dZ), np.multiply(A, (1-A)))\n",
"\n",
"def dW(dZ, A):\n",
"    return np.multiply(1/dZ.shape[1], np.dot(dZ, np.transpose(A)))\n",
"\n",
"def dW_L2(dZ, A, W, epsilon):\n",
"    # use dZ for the minibatch size; Z is not defined inside this function\n",
"    return np.multiply(epsilon/dZ.shape[1], W) + dW(dZ, A)\n",
"\n",
"def dB(dZ):\n",
"    # np.sum without an axis adds over every unit, so all biases in a layer share one scalar update;\n",
"    # a per-unit gradient would be np.sum(dZ, axis=1, keepdims=True)\n",
"    return np.multiply(1/dZ.shape[1], np.sum(dZ))\n",
"\n",
"def backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB):\n",
"    if(layer >= 0):\n",
"        if(layer == len(W) - 1):\n",
"            _dZ = A_layers[layer+1] - Y\n",
"        elif(layer >= 0):\n",
"            _dZ = dZ(_dZ, W[layer+1], Z_layers[layer])\n",
"        if(Dropout == True):\n",
"            _dZ = np.multiply(_dZ, D[layer])\n",
"        if(L2 == True):\n",
"            _dW = dW_L2(_dZ, A_layers[layer], W[layer], epsilon)\n",
"        else:\n",
"            _dW = dW(_dZ, A_layers[layer])\n",
"        _dB = dB(_dZ)\n",
"        if(momentum == True):\n",
"            V_dW[layer] = np.multiply(momentum_coef, V_dW[layer]) + np.multiply(alpha, _dW)\n",
"            V_dB[layer] = np.multiply(momentum_coef, V_dB[layer]) + np.multiply(alpha, _dB)\n",
"            W[layer] = W[layer] - V_dW[layer]\n",
"            B[layer] = B[layer] - V_dB[layer]\n",
"        else:\n",
"            W[layer] = W[layer] - np.multiply(alpha, _dW)\n",
"            B[layer] = B[layer] - np.multiply(alpha, _dB)\n",
"        if(detailed_logger == True):\n",
"            print('Backward Layer: ' + str(layer))\n",
"        layer = layer - 1\n",
"        W, B = backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB)\n",
"    if(detailed_logger == True):\n",
"        print('Backward Propagation Complete')\n",
"    return W, B\n",
"\n",
"\n",
"def shuffle(X, Y, number_of_training_images):\n",
"    random_array = np.random.permutation(np.arange(number_of_training_images))\n",
"    return X[:, random_array], Y[random_array]\n",
"\n",
"start_time = time.time()\n",
"# main loop (range starts at 1, so it runs number_of_epochs - 1 epochs)\n",
"for epoch in range(1, number_of_epochs):\n",
"\n",
"    # logger\n",
"    if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
"        print('Main Loop Epoch: ' + str(epoch))\n",
"\n",
"    # shuffle data\n",
"    X, Y = shuffle(X_train.copy(), Y_train.copy(), number_of_training_images)\n",
"    number_of_batches = int(np.floor(number_of_training_images/batch_size))\n",
"    split_index = number_of_batches*batch_size\n",
"\n",
"    # parse into minibatches\n",
"    X_minibatches = np.split(X[:, 0:split_index], number_of_batches, axis=1)\n",
"    if not(split_index == number_of_training_images):\n",
"        X_left_over_portion = X[:, split_index:number_of_training_images]\n",
"        X_minibatches.append(X_left_over_portion)\n",
"\n",
"    Y_minibatches = np.split(Y[0:split_index], number_of_batches, axis=0)\n",
"    if not(split_index == number_of_training_images):\n",
"        Y_left_over_portion = Y[split_index:number_of_training_images]\n",
"        Y_minibatches.append(Y_left_over_portion)\n",
"\n",
"    number_of_minibatches = len(Y_minibatches)\n",
"\n",
"    # logger\n",
"    if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
"        print('Number Of Minibatches: ' + str(number_of_minibatches))\n",
"\n",
"    # note: this range stops one short, so the final minibatch of each epoch is never used\n",
"    for index in range(0, number_of_minibatches-1):\n",
"        X_minibatch = X_minibatches[index]\n",
"        Y_minibatch = Y_minibatches[index]\n",
"\n",
"        if(hidden_layer_relu + hidden_layer_tanh + hidden_layer_sigmoid != 1):\n",
"            print(\"ERROR! Please Select Only 1 Hidden Layer Activation Function\")\n",
"            break\n",
"\n",
"        # forward propagation training data set\n",
"        A_layers, Z_layers, D = forward_propagation_return_layers(W, B, X_minibatch, [X_minibatch], [], 0, [], keep_prob)\n",
"        L = loss(A_layers[len(A_layers) - 1], Y_minibatch)\n",
"        if(L2 == True):\n",
"            C = cost_L2(L, W, epsilon)\n",
"        else:\n",
"            C = cost(L)\n",
"\n",
"        # backpropagation\n",
"        W, B = backward_propagation(W, B, Y_minibatch, A_layers, Z_layers, 0, alpha, epsilon, len(W) - 1, D, V_dW, V_dB)\n",
"\n",
"    # C is the cost of the last minibatch processed in this epoch\n",
"    if(epoch % main_logger_output_epochs == 0):\n",
"        print('Cost: ' + str(C))\n",
"\n",
"    # forward propagation test data set\n",
"    A_test = forward_propagation(W, B, X_test, 0)\n",
"\n",
"    # accuracy\n",
"    _prediction = prediction(A_test)\n",
"    _accuracy = accuracy(_prediction, Y_test)\n",
"\n",
"    # storage for plotting\n",
"    cost_array.append(C)\n",
"    accuracy_array.append(_accuracy)\n",
"    interation_array.append(epoch)\n",
"\n",
"\n",
"end_time = time.time()\n",
"run_time = end_time - start_time\n",
" \n",
"print('')\n",
"print('Results:')\n",
"print('')\n",
" \n",
"print('')\n",
"print('Run Time: ' + str(run_time) + ' seconds')\n",
"print('Cost: ' + str(C)) \n",
"print('Accuracy: ' + str(_accuracy) + ' %') \n",
"print('')\n",
"print('')\n",
"\n",
"\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, cost_array, 'red')\n",
"pyplot.title('Learning Curve - ' + str(len(X[0])) + ' Training Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Cost')\n",
"pyplot.show()\n",
"\n",
"# plot percent accuracy curve\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, accuracy_array, 'red')\n",
"pyplot.title('Percent Accuracy Curve - ' + str(len(X_test[0])) + ' Test Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Percent Accuracy')\n",
"pyplot.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As shown, our validation set worked, so now we can move on to the full data set, and begin our evaluation and exploration.\n",
"\n",
"First, we need to split up our full data set into testing and training data. We will use 50,000 images as the training data set and 10,000 images as the testing data set. "
]
},
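{
"cell_type": "markdown",
"metadata": {},
"source": [
"The split described above can be sketched with plain slicing. This is a minimal illustration rather than the notebook's own code: `split_columns`, `features`, and `labels` are hypothetical names, and small random arrays stand in for the MNIST matrix, which is assumed to store one image per column.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def split_columns(features, labels, n_train, n_test):\n",
"    # First n_train columns go to training, the next n_test to testing.\n",
"    train_x = features[:, :n_train]\n",
"    train_y = labels[:n_train]\n",
"    test_x = features[:, n_train:n_train + n_test]\n",
"    test_y = labels[n_train:n_train + n_test]\n",
"    return train_x, train_y, test_x, test_y\n",
"\n",
"rng = np.random.default_rng(0)\n",
"features = rng.random((784, 60))        # placeholder for the image matrix\n",
"labels = rng.integers(0, 2, size=60)    # placeholder for the binary labels\n",
"train_x, train_y, test_x, test_y = split_columns(features, labels, 50, 10)\n",
"print(train_x.shape, train_y.shape, test_x.shape, test_y.shape)\n",
"```\n",
"\n",
"Basic slicing returns views rather than copies, so this form avoids building Python lists one column at a time."
]
},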
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(784, 50000)\n",
"(50000,)\n",
"(784, 10000)\n",
"(10000,)\n"
]
}
],
"source": [
"# create a data set\n",
"size = vector_size\n",
"\n",
"number_of_testing_images = 10000\n",
"number_of_training_images = 50000\n",
"number_of_validation_images = number_of_testing_images + number_of_training_images\n",
"\n",
"training_images = []\n",
"training_labels = []\n",
"testing_images = []\n",
"testing_labels = []\n",
"\n",
"factor = 0\n",
"for index in range(0, number_of_validation_images):\n",
"    if(index <= number_of_training_images - 1):\n",
"        training_images.append(normalized_scaled_images_feature_matrix[:, index + factor])\n",
"        training_labels.append(binary_labels[index + factor])\n",
"    else:\n",
"        testing_images.append(normalized_scaled_images_feature_matrix[:, index + factor])\n",
"        testing_labels.append(binary_labels[index + factor])\n",
"\n",
"# convert to numpy array\n",
"training_images = np.transpose(np.array(training_images))\n",
"training_labels = np.array(training_labels)\n",
"testing_images = np.transpose(np.array(testing_images))\n",
"testing_labels = np.array(testing_labels)\n",
"\n",
"# logger\n",
"print(training_images.shape) # training_images is a 784 x 50000 matrix\n",
"print(training_labels.shape) # training_labels is a vector of length 50000\n",
"print(testing_images.shape) # testing_images is a 784 x 10000 matrix\n",
"print(testing_labels.shape) # testing_labels is a vector of length 10000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we must reset our weights and biases."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature Size: 784\n",
"Weights Shape: (20, 784)\n",
"Bias Shape: (20, 1)\n",
"Velocity Weights Shape: (20, 784)\n",
"Velocity Bias Shape: (20, 1)\n"
]
}
],
"source": [
"# initialize weights & bias\n",
"np.random.seed(10)\n",
"print('Feature Size: ' + str(size))\n",
"\n",
"lower_bound = -.1\n",
"upper_bound = .1\n",
"\n",
"#mean = 0.015\n",
"#std = 0.005\n",
"\n",
"# hyper-parameters: hidden layers\n",
"hidden_layers = 2\n",
"units_array = [20, 10]\n",
"Weights = []\n",
"Bias = []\n",
"V_dW = []\n",
"V_dB = []\n",
"for i in range(0, hidden_layers):\n",
"    if(i == 0):\n",
"        _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], size]))\n",
"        _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
"        _V_dW = np.float64(np.zeros([units_array[i], size]))\n",
"        _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
"        Weights.append(_W)\n",
"        Bias.append(_B)\n",
"        V_dW.append(_V_dW)\n",
"        V_dB.append(_V_dB)\n",
"    else:\n",
"        _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], units_array[i-1]]))\n",
"        _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
"        _V_dW = np.float64(np.zeros([units_array[i], units_array[i-1]]))\n",
"        _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
"        Weights.append(_W)\n",
"        Bias.append(_B)\n",
"        V_dW.append(_V_dW)\n",
"        V_dB.append(_V_dB)\n",
"\n",
"# output layer\n",
"_W = np.float64(np.random.uniform(lower_bound, upper_bound, [1, units_array[i]]))\n",
"_b = np.float64(np.random.uniform(lower_bound, upper_bound)) # b will be added in a broadcasting manner\n",
"_V_dW = np.float64(np.zeros([1, units_array[i]]))\n",
"_V_dB = np.float64(np.zeros(1))\n",
"Weights.append(_W)\n",
"Bias.append(_b)\n",
"V_dW.append(_V_dW)\n",
"V_dB.append(_V_dB)\n",
"\n",
"# the per-layer arrays differ in shape, so store them as object arrays;\n",
"# recent NumPy versions raise an error for ragged input without dtype=object\n",
"Weights = np.array(Weights, dtype=object)\n",
"Bias = np.array(Bias, dtype=object)\n",
"V_dW = np.array(V_dW, dtype=object)\n",
"V_dB = np.array(V_dB, dtype=object)\n",
"\n",
"# replace any weight that is exactly zero in the hidden layers\n",
"for index in range(0, len(Weights) - 1):\n",
"    Weights[index] = np.where(Weights[index] != 0, Weights[index], np.random.uniform(lower_bound, upper_bound))\n",
"\n",
"#print(train_X.shape)\n",
"#print(np.ravel(train_Y).shape)\n",
"\n",
"print('Weights Shape: ' + str(Weights[0].shape)) # first-layer matrix: number of units x 784\n",
"print('Bias Shape: ' + str(Bias[0].shape)) # first-layer column vector: number of units x 1\n",
"print('Velocity Weights Shape: ' + str(V_dW[0].shape)) # same shape as the first-layer weights\n",
"print('Velocity Bias Shape: ' + str(V_dB[0].shape)) # same shape as the first-layer bias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we re-run minibatch stochastic gradient descent on the full data set. We will first utilize minibatches of 500 each."
]
},
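{
"cell_type": "markdown",
"metadata": {},
"source": [
"The runs in this section leave the `momentum` flag off, but when it is enabled `backward_propagation` keeps a velocity for each parameter and steps along it. The sketch below isolates that update on a toy objective. It is an illustration under stated assumptions: `momentum_step` and `beta` are hypothetical names (the notebook calls the coefficient `momentum_coef`), and the objective `sum(w**2)` is chosen only because its gradient is easy to write down.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def momentum_step(w, v, grad, alpha=0.01, beta=0.9):\n",
"    # v accumulates an exponentially decaying sum of scaled gradients.\n",
"    v = beta * v + alpha * grad\n",
"    # The parameter moves along the velocity rather than the raw gradient.\n",
"    w = w - v\n",
"    return w, v\n",
"\n",
"w = np.array([1.0, -2.0])\n",
"v = np.zeros_like(w)\n",
"for _ in range(3):\n",
"    grad = 2.0 * w              # gradient of the toy objective sum(w**2)\n",
"    w, v = momentum_step(w, v, grad)\n",
"print(w, v)\n",
"```\n",
"\n",
"Because the learning rate is folded into the velocity, setting `beta` to 0 recovers the plain gradient descent step used when the flag is off."
]
},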
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Main Loop Epoch: 100\n",
"Number Of Minibatches: 100\n",
"Cost: 0.040280227907951584\n",
"Main Loop Epoch: 200\n",
"Number Of Minibatches: 100\n",
"Cost: 0.02568970533038529\n",
"Main Loop Epoch: 300\n",
"Number Of Minibatches: 100\n",
"Cost: 0.020037735231309108\n",
"Main Loop Epoch: 400\n",
"Number Of Minibatches: 100\n",
"Cost: 0.026541878026984538\n",
"Main Loop Epoch: 500\n",
"Number Of Minibatches: 100\n",
"Cost: 0.013543108816859831\n",
"Main Loop Epoch: 600\n",
"Number Of Minibatches: 100\n",
"Cost: 0.013648362081607569\n",
"Main Loop Epoch: 700\n",
"Number Of Minibatches: 100\n",
"Cost: 0.007825852764969\n",
"Main Loop Epoch: 800\n",
"Number Of Minibatches: 100\n",
"Cost: 0.009420266622494554\n",
"Main Loop Epoch: 900\n",
"Number Of Minibatches: 100\n",
"Cost: 0.004402508996987994\n",
"Main Loop Epoch: 1000\n",
"Number Of Minibatches: 100\n",
"Cost: 0.0020197110841280795\n",
"Main Loop Epoch: 1100\n",
"Number Of Minibatches: 100\n",
"Cost: 0.0033471696372207216\n",
"Main Loop Epoch: 1200\n",
"Number Of Minibatches: 100\n",
"Cost: 0.0036489103960996574\n",
"Main Loop Epoch: 1300\n",
"Number Of Minibatches: 100\n",
"Cost: 0.001718017371077936\n",
"Main Loop Epoch: 1400\n",
"Number Of Minibatches: 100\n",
"Cost: 0.00119063126728554\n",
"\n",
"Results:\n",
"\n",
"\n",
"Run Time: 2118.1275000572205 seconds\n",
"Cost: 0.0018593347070249052\n",
"Accuracy: 99.15 %\n",
"\n",
"\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# gradient descent\n",
"detailed_logger = False\n",
"main_logger = True\n",
"main_logger_output_epochs = 100\n",
"L2 = False\n",
"Dropout = False\n",
"momentum = False\n",
"hidden_layer_relu = True\n",
"hidden_layer_tanh = False\n",
"hidden_layer_sigmoid = False\n",
"\n",
"# hyper-parameters\n",
"alpha = .01           # learning rate\n",
"epsilon = .85         # L2 regularization strength (used only when L2 is True)\n",
"keep_prob = .9        # dropout keep probability (used only when Dropout is True)\n",
"number_of_epochs = 1500\n",
"batch_size = 500\n",
"momentum_coef = .9\n",
"\n",
"# copy initialization\n",
"W = Weights.copy()\n",
"B = Bias.copy()\n",
"\n",
"# data arrays\n",
"cost_array = []\n",
"accuracy_array = []\n",
"interation_array = []\n",
"\n",
"# rename\n",
"X_train = np.float64(training_images).copy()\n",
"Y_train = np.float64(training_labels).copy()\n",
"\n",
"X_test = np.float64(testing_images).copy()\n",
"Y_test = np.float64(testing_labels).copy()\n",
"\n",
"#m = size\n",
"m = number_of_training_images\n",
"\n",
"def model(W, B, A):\n",
"    return np.dot(W, A) + B\n",
"\n",
"def activation_relu(Z):\n",
"    Z = np.where(~np.isnan(Z), Z, 0)\n",
"    Z = np.where(~np.isinf(Z), Z, 0)\n",
"    return np.where(Z > 0, Z, 0)\n",
"\n",
"def activation_tanh(Z):\n",
"    return np.tanh(Z)\n",
"\n",
"def activation_sigmoid(Z):\n",
"    return 1/(1 + np.exp(-Z))\n",
"\n",
"def loss(A, Y):\n",
"    # small constant keeps np.log away from log(0)\n",
"    epsilon = 1e-20\n",
"    return np.where((Y == 1), np.multiply(-Y, np.log(A + epsilon)), -np.multiply((1 - Y), np.log(1 - A + epsilon)))\n",
"    #return np.multiply(-Y, np.log(A)) - np.multiply((1 - Y), np.log(1 - A))\n",
"\n",
"def cost(L):\n",
"    return np.multiply(1/L.shape[1], np.sum(L))\n",
"\n",
"def cost_L2(L, W, epsilon):\n",
"    # normalize by the minibatch size; W holds one matrix per layer and has no second axis\n",
"    L2 = np.multiply(epsilon/(2*L.shape[1]), np.multiply(W[len(W)-3], W[len(W)-3]).sum() + np.multiply(W[len(W)-2], W[len(W)-2]).sum() + np.multiply(W[len(W)-1], W[len(W)-1]).sum())\n",
"    J = cost(L)\n",
"    return L2 + J\n",
"\n",
"def prediction(A):\n",
"    return np.where(A >= 0.5, 1, 0)\n",
"\n",
"def accuracy(prediction, Y):\n",
"    return 100 - np.multiply(100/Y.shape[0], np.sum(np.absolute(Y - prediction)))\n",
"\n",
"def forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob):\n",
"    if(layer < len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        Z_layers.append(Z)\n",
"        if(hidden_layer_relu == True):\n",
"            A = activation_relu(Z)\n",
"        elif(hidden_layer_tanh == True):\n",
"            A = activation_tanh(Z)\n",
"        elif(hidden_layer_sigmoid == True):\n",
"            A = activation_sigmoid(Z)\n",
"        if(Dropout == True):\n",
"            _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
"            D.append(_D)\n",
"            A = np.multiply(A, _D)\n",
"        A_layers.append(A)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Training Data: ' + str(layer))\n",
"        A_layers, Z_layers, D = forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob)\n",
"    elif(layer == len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        Z_layers.append(Z)\n",
"        A = activation_sigmoid(Z)\n",
"        if(Dropout == True):\n",
"            _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
"            D.append(_D)\n",
"            A = np.multiply(A, _D)\n",
"        A_layers.append(A)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Training Data: ' + str(layer))\n",
"            print('Forward Propagation Training Data Complete')\n",
"    return A_layers, Z_layers, D\n",
"\n",
"def forward_propagation(W, B, A, layer):\n",
"    if(layer < len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        if(hidden_layer_relu == True):\n",
"            A = activation_relu(Z)\n",
"        elif(hidden_layer_tanh == True):\n",
"            A = activation_tanh(Z)\n",
"        elif(hidden_layer_sigmoid == True):\n",
"            A = activation_sigmoid(Z)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Testing Data: ' + str(layer))\n",
"        A = forward_propagation(W, B, A, layer)\n",
"    elif(layer == len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        A = activation_sigmoid(Z)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Testing Data: ' + str(layer))\n",
"            print('Forward Propagation Testing Data Complete')\n",
"    return A\n",
"\n",
"def dZ(dZ, W, Z):\n",
"    Z = np.where(~np.isnan(Z), Z, 0)\n",
"    W = np.where(~np.isnan(W), W, 0)\n",
"    dZ = np.where(~np.isnan(dZ), dZ, 0)\n",
"    Z = np.where(~np.isinf(Z), Z, 0)\n",
"    W = np.where(~np.isinf(W), W, 0)\n",
"    dZ = np.where(~np.isinf(dZ), dZ, 0)\n",
"    if(hidden_layer_relu == True):\n",
"        return np.multiply(np.dot(np.transpose(W), dZ), np.where(Z > 0, 1, 0))\n",
"    elif(hidden_layer_tanh == True):\n",
"        A = activation_tanh(Z)\n",
"        return np.multiply(np.dot(np.transpose(W), dZ), 1 - np.multiply(A, A))\n",
"    elif(hidden_layer_sigmoid == True):\n",
"        A = activation_sigmoid(Z)\n",
"        return np.multiply(np.dot(np.transpose(W), dZ), np.multiply(A, (1-A)))\n",
"\n",
"def dW(dZ, A):\n",
"    return np.multiply(1/dZ.shape[1], np.dot(dZ, np.transpose(A)))\n",
"\n",
"def dW_L2(dZ, A, W, epsilon):\n",
"    # use dZ for the minibatch size; Z is not defined inside this function\n",
"    return np.multiply(epsilon/dZ.shape[1], W) + dW(dZ, A)\n",
"\n",
"def dB(dZ):\n",
"    # np.sum without an axis adds over every unit, so all biases in a layer share one scalar update;\n",
"    # a per-unit gradient would be np.sum(dZ, axis=1, keepdims=True)\n",
"    return np.multiply(1/dZ.shape[1], np.sum(dZ))\n",
"\n",
"def backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB):\n",
"    if(layer >= 0):\n",
"        if(layer == len(W) - 1):\n",
"            _dZ = A_layers[layer+1] - Y\n",
"        elif(layer >= 0):\n",
"            _dZ = dZ(_dZ, W[layer+1], Z_layers[layer])\n",
"        if(Dropout == True):\n",
"            _dZ = np.multiply(_dZ, D[layer])\n",
"        if(L2 == True):\n",
"            _dW = dW_L2(_dZ, A_layers[layer], W[layer], epsilon)\n",
"        else:\n",
"            _dW = dW(_dZ, A_layers[layer])\n",
"        _dB = dB(_dZ)\n",
"        if(momentum == True):\n",
"            V_dW[layer] = np.multiply(momentum_coef, V_dW[layer]) + np.multiply(alpha, _dW)\n",
"            V_dB[layer] = np.multiply(momentum_coef, V_dB[layer]) + np.multiply(alpha, _dB)\n",
"            W[layer] = W[layer] - V_dW[layer]\n",
"            B[layer] = B[layer] - V_dB[layer]\n",
"        else:\n",
"            W[layer] = W[layer] - np.multiply(alpha, _dW)\n",
"            B[layer] = B[layer] - np.multiply(alpha, _dB)\n",
"        if(detailed_logger == True):\n",
"            print('Backward Layer: ' + str(layer))\n",
"        layer = layer - 1\n",
"        W, B = backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB)\n",
"    if(detailed_logger == True):\n",
"        print('Backward Propagation Complete')\n",
"    return W, B\n",
"\n",
"\n",
"def shuffle(X, Y, number_of_training_images):\n",
"    random_array = np.random.permutation(np.arange(number_of_training_images))\n",
"    return X[:, random_array], Y[random_array]\n",
"\n",
"start_time = time.time()\n",
"# main loop (range starts at 1, so it runs number_of_epochs - 1 epochs)\n",
"for epoch in range(1, number_of_epochs):\n",
"\n",
"    # logger\n",
"    if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
"        print('Main Loop Epoch: ' + str(epoch))\n",
"\n",
"    # shuffle data\n",
"    X, Y = shuffle(X_train.copy(), Y_train.copy(), number_of_training_images)\n",
"    number_of_batches = int(np.floor(number_of_training_images/batch_size))\n",
"    split_index = number_of_batches*batch_size\n",
"\n",
"    # parse into minibatches\n",
"    X_minibatches = np.split(X[:, 0:split_index], number_of_batches, axis=1)\n",
"    if not(split_index == number_of_training_images):\n",
"        X_left_over_portion = X[:, split_index:number_of_training_images]\n",
"        X_minibatches.append(X_left_over_portion)\n",
"\n",
"    Y_minibatches = np.split(Y[0:split_index], number_of_batches, axis=0)\n",
"    if not(split_index == number_of_training_images):\n",
"        Y_left_over_portion = Y[split_index:number_of_training_images]\n",
"        Y_minibatches.append(Y_left_over_portion)\n",
"\n",
"    number_of_minibatches = len(Y_minibatches)\n",
"\n",
"    # logger\n",
"    if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
"        print('Number Of Minibatches: ' + str(number_of_minibatches))\n",
"\n",
"    # note: this range stops one short, so the final minibatch of each epoch is never used\n",
"    for index in range(0, number_of_minibatches-1):\n",
"        X_minibatch = X_minibatches[index]\n",
"        Y_minibatch = Y_minibatches[index]\n",
"\n",
"        if(hidden_layer_relu + hidden_layer_tanh + hidden_layer_sigmoid != 1):\n",
"            print(\"ERROR! Please Select Only 1 Hidden Layer Activation Function\")\n",
"            break\n",
"\n",
"        # forward propagation training data set\n",
"        A_layers, Z_layers, D = forward_propagation_return_layers(W, B, X_minibatch, [X_minibatch], [], 0, [], keep_prob)\n",
"        L = loss(A_layers[len(A_layers) - 1], Y_minibatch)\n",
"        if(L2 == True):\n",
"            C = cost_L2(L, W, epsilon)\n",
"        else:\n",
"            C = cost(L)\n",
"\n",
"        # backpropagation\n",
"        W, B = backward_propagation(W, B, Y_minibatch, A_layers, Z_layers, 0, alpha, epsilon, len(W) - 1, D, V_dW, V_dB)\n",
"\n",
"    # C is the cost of the last minibatch processed in this epoch\n",
"    if(epoch % main_logger_output_epochs == 0):\n",
"        print('Cost: ' + str(C))\n",
"\n",
"    # forward propagation test data set\n",
"    A_test = forward_propagation(W, B, X_test, 0)\n",
"\n",
"    # accuracy\n",
"    _prediction = prediction(A_test)\n",
"    _accuracy = accuracy(_prediction, Y_test)\n",
"\n",
"    # storage for plotting\n",
"    cost_array.append(C)\n",
"    accuracy_array.append(_accuracy)\n",
"    interation_array.append(epoch)\n",
"\n",
"\n",
"end_time = time.time()\n",
"run_time = end_time - start_time\n",
" \n",
"print('')\n",
"print('Results:')\n",
"print('')\n",
" \n",
"print('')\n",
"print('Run Time: ' + str(run_time) + ' seconds')\n",
"print('Cost: ' + str(C)) \n",
"print('Accuracy: ' + str(_accuracy) + ' %') \n",
"print('')\n",
"print('')\n",
"\n",
"\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, cost_array, 'red')\n",
"pyplot.title('Learning Curve - ' + str(len(X[0])) + ' Training Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Cost')\n",
"pyplot.show()\n",
"\n",
"# plot percent accuracy curve\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, accuracy_array, 'red')\n",
"pyplot.title('Percent Accuracy Curve - ' + str(len(X_test[0])) + ' Test Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Percent Accuracy')\n",
"pyplot.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As illustrated, after just under 1,500 epochs with minibatches of 500, the cost on the last minibatch was approximately 0.001859 and the test data accuracy reached 99.15%. These results are very good. One likely contributor to the high test accuracy is that the gradient noise in minibatch stochastic gradient descent acts as a mild, implicit form of regularization.\n",
"\n",
"We now wish to explore the impact of adjusting the minibatch size. We will re-run the algorithm with a smaller minibatch size of 27 and see what results we achieve.\n",
"\n",
"First we reinitialize our weights and biases."
]
},
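{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before reinitializing, it helps to see how the batch size changes the number of minibatches per epoch. The sketch below mirrors the partitioning in the main loop: full batches, plus one smaller left-over batch when the batch size does not divide the data set evenly. `count_minibatches` is a hypothetical helper, not part of the notebook.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def count_minibatches(n_samples, batch_size):\n",
"    # Full batches, plus one smaller left-over batch if there is a remainder.\n",
"    return math.ceil(n_samples / batch_size)\n",
"\n",
"for batch_size in (500, 27):\n",
"    print(batch_size, count_minibatches(50000, batch_size))\n",
"```\n",
"\n",
"With 50,000 training images this gives 100 minibatches for a batch size of 500 and 1,852 for a batch size of 27 (1,851 full batches and one left-over batch of 23), so each epoch performs far more parameter updates at the smaller size."
]
},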
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature Size: 784\n",
"Weights Shape: (20, 784)\n",
"Bias Shape: (20, 1)\n",
"Velocity Weights Shape: (20, 784)\n",
"Velocity Bias Shape: (20, 1)\n"
]
}
],
"source": [
"# initialize weights & bias\n",
"np.random.seed(10)\n",
"print('Feature Size: ' + str(size))\n",
"\n",
"lower_bound = -.1\n",
"upper_bound = .1\n",
"\n",
"#mean = 0.015\n",
"#std = 0.005\n",
"\n",
"# hyper-parameters: hidden layers\n",
"hidden_layers = 2\n",
"units_array = [20, 10]\n",
"Weights = []\n",
"Bias = []\n",
"V_dW = []\n",
"V_dB = []\n",
"for i in range(0, hidden_layers):\n",
"    if(i == 0):\n",
"        _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], size]))\n",
"        _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
"        _V_dW = np.float64(np.zeros([units_array[i], size]))\n",
"        _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
"        Weights.append(_W)\n",
"        Bias.append(_B)\n",
"        V_dW.append(_V_dW)\n",
"        V_dB.append(_V_dB)\n",
"    else:\n",
"        _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], units_array[i-1]]))\n",
"        _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
"        _V_dW = np.float64(np.zeros([units_array[i], units_array[i-1]]))\n",
"        _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
"        Weights.append(_W)\n",
"        Bias.append(_B)\n",
"        V_dW.append(_V_dW)\n",
"        V_dB.append(_V_dB)\n",
"\n",
"# output layer\n",
"_W = np.float64(np.random.uniform(lower_bound, upper_bound, [1, units_array[i]]))\n",
"_b = np.float64(np.random.uniform(lower_bound, upper_bound)) # b will be added in a broadcasting manner\n",
"_V_dW = np.float64(np.zeros([1, units_array[i]]))\n",
"_V_dB = np.float64(np.zeros(1))\n",
"Weights.append(_W)\n",
"Bias.append(_b)\n",
"V_dW.append(_V_dW)\n",
"V_dB.append(_V_dB)\n",
"\n",
"# the per-layer arrays differ in shape, so store them as object arrays;\n",
"# recent NumPy versions raise an error for ragged input without dtype=object\n",
"Weights = np.array(Weights, dtype=object)\n",
"Bias = np.array(Bias, dtype=object)\n",
"V_dW = np.array(V_dW, dtype=object)\n",
"V_dB = np.array(V_dB, dtype=object)\n",
"\n",
"# replace any weight that is exactly zero in the hidden layers\n",
"for index in range(0, len(Weights) - 1):\n",
"    Weights[index] = np.where(Weights[index] != 0, Weights[index], np.random.uniform(lower_bound, upper_bound))\n",
"\n",
"#print(train_X.shape)\n",
"#print(np.ravel(train_Y).shape)\n",
"\n",
"print('Weights Shape: ' + str(Weights[0].shape)) # first-layer matrix: number of units x 784\n",
"print('Bias Shape: ' + str(Bias[0].shape)) # first-layer column vector: number of units x 1\n",
"print('Velocity Weights Shape: ' + str(V_dW[0].shape)) # same shape as the first-layer weights\n",
"print('Velocity Bias Shape: ' + str(V_dB[0].shape)) # same shape as the first-layer bias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we re-run our minibatch stochastic gradient descent algorithm."
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Main Loop Epoch: 100\n",
"Number Of Minibatches: 1852\n",
"Cost: 0.00040557499288796804\n",
"Main Loop Epoch: 200\n",
"Number Of Minibatches: 1852\n",
"Cost: 0.00022533569347979733\n",
"Main Loop Epoch: 300\n",
"Number Of Minibatches: 1852\n",
"Cost: 8.647075623703261e-09\n",
"Main Loop Epoch: 400\n",
"Number Of Minibatches: 1852\n",
"Cost: 4.046850915373965e-05\n",
"\n",
"Results:\n",
"\n",
"\n",
"Run Time: 744.6950306892395 seconds\n",
"Cost: 1.6063560949634133e-05\n",
"Accuracy: 99.18 %\n",
"\n",
"\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# gradient descent\n",
"detailed_logger = False\n",
"main_logger = True\n",
"main_logger_output_epochs = 100\n",
"L2 = False\n",
"Dropout = False\n",
"momentum = False\n",
"hidden_layer_relu = True\n",
"hidden_layer_tanh = False\n",
"hidden_layer_sigmoid = False\n",
"\n",
"# hyber-parameters\n",
"alpha = .01;\n",
"epsilon = .85\n",
"keep_prob = .9\n",
"number_of_epochs = 500\n",
"batch_size = 27\n",
"momentum_coef = .9\n",
"\n",
"# copy initalization\n",
"W = Weights.copy()\n",
"B = Bias.copy()\n",
"\n",
"# data arrays\n",
"cost_array = []\n",
"accuracy_array = []\n",
"interation_array = []\n",
"\n",
"# rename\n",
"X_train = np.float64(training_images).copy()\n",
"Y_train = np.float64(training_labels).copy()\n",
"\n",
"X_test = np.float64(testing_images).copy()\n",
"Y_test = np.float64(testing_labels).copy()\n",
"\n",
"#m = size\n",
"m = number_of_training_images\n",
"\n",
"def model(W, B, A):\n",
" return np.dot(W, A) + B\n",
"\n",
"def activation_relu(Z):\n",
" Z = np.where(~np.isnan(Z), Z, 0)\n",
" Z = np.where(~np.isinf(Z), Z, 0)\n",
" return np.where(Z > 0, Z, 0)\n",
"\n",
"def activation_tanh(Z):\n",
" return np.tanh(Z)\n",
"\n",
"def activation_sigmoid(Z):\n",
" return 1/(1 + np.exp(-Z))\n",
"\n",
"def loss(A, Y):\n",
" epsilon = 1e-20\n",
" return np.where((Y == 1), np.multiply(-Y, np.log(A + epsilon)), -np.multiply((1 - Y), np.log(1 - A + epsilon)))\n",
" #return np.multiply(-Y, np.log(A)) - np.multiply((1 - Y), np.log(1 - A)) \n",
" \n",
"def cost(L):\n",
" return np.multiply(1/L.shape[1], np.sum(L))\n",
"\n",
"def cost_L2(L, W, epsilon):\n",
" L2 = np.multiply(epsilon/(2*W.shape[1]), np.multiply(W[len(W)-3], W[len(W)-3]).sum() + np.multiply(W[len(W)-2], W[len(W)-2]).sum() + np.multiply(W[len(W)-1], W[len(W)-1]).sum())\n",
" J = cost(L)\n",
" return L2 + J\n",
"\n",
"def prediction(A):\n",
" return np.where(A >= 0.5, 1, 0)\n",
" \n",
"def accuracy(prediction, Y):\n",
" return 100 - np.multiply(100/Y.shape[0], np.sum(np.absolute(Y - prediction))) \n",
" \n",
"def forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob):\n",
" if(layer < len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" Z_layers.append(Z)\n",
" if(hidden_layer_relu == True):\n",
" A = activation_relu(Z)\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" if(Dropout == True):\n",
" _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
" D.append(_D)\n",
" A = np.multiply(A, _D)\n",
" A_layers.append(A)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Training Data: ' + str(layer))\n",
" A_layers, Z_layers, D = forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob)\n",
" elif(layer == len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" Z_layers.append(Z)\n",
" A = activation_sigmoid(Z)\n",
" if(Dropout == True):\n",
" _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
" D.append(_D)\n",
" A = np.multiply(A, _D)\n",
" A_layers.append(A)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Training Data: ' + str(layer))\n",
" print('Forward Propagation Training Data Complete')\n",
" return A_layers, Z_layers, D\n",
"\n",
"def forward_propagation(W, B, A, layer):\n",
" if(layer < len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" if(hidden_layer_relu == True):\n",
" A = activation_relu(Z)\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Testing Data: ' + str(layer))\n",
" A = forward_propagation(W, B, A, layer)\n",
" elif(layer == len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" A = activation_sigmoid(Z) \n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Testing Data: ' + str(layer))\n",
" print('Forward Propagation Testing Data Complete')\n",
" return A\n",
"\n",
"def dZ(dZ, W, Z):\n",
" Z = np.where(~np.isnan(Z), Z, 0)\n",
" W = np.where(~np.isnan(W), W, 0)\n",
" dZ = np.where(~np.isnan(dZ), dZ, 0)\n",
" Z = np.where(~np.isinf(Z), Z, 0)\n",
" W = np.where(~np.isinf(W), W, 0)\n",
" dZ = np.where(~np.isinf(dZ), dZ, 0)\n",
" if(hidden_layer_relu == True):\n",
" return np.multiply(np.dot(np.transpose(W), dZ), np.where(Z > 0, 1, 0))\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" return np.multiply(np.dot(np.transpose(W), dZ), 1- np.multiply(A, A))\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" return np.multiply(np.dot(np.transpose(W), dZ), np.multiply(A, (1-A)))\n",
"\n",
"def dW(dZ, A):\n",
" return np.multiply(1/dZ.shape[1], np.dot(dZ, np.transpose(A)))\n",
"\n",
"def dW_L2(dZ, A, W, epsilon):\n",
" return np.multiply(epsilon/Z.shape[1], W) + dW(dZ, A)\n",
"\n",
"def dB(dZ):\n",
" return np.multiply(1/dZ.shape[1], np.sum(dZ))\n",
"\n",
"def backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB):\n",
" if(layer >= 0):\n",
" if(layer == len(W) - 1):\n",
" _dZ = A_layers[layer+1] - Y\n",
" elif(layer >= 0):\n",
" _dZ = dZ(_dZ, W[layer+1], Z_layers[layer])\n",
" if(Dropout == True):\n",
" _dZ = np.multiply(_dZ, D[layer])\n",
" if(L2 == True):\n",
" _dW = dW_L2(_dZ, A_layers[layer], W[layer], epsilon)\n",
" else:\n",
" _dW = dW(_dZ, A_layers[layer])\n",
" _dB = dB(_dZ)\n",
" if(momentum == True):\n",
" V_dW[layer] = np.multiply(momentum_coef, V_dW[layer]) + np.multiply(alpha, _dW)\n",
" V_dB[layer] = np.multiply(momentum_coef, V_dB[layer]) + np.multiply(alpha, _dB)\n",
" W[layer] = W[layer] - V_dW[layer]\n",
" B[layer] = B[layer] - V_dB[layer] \n",
" else:\n",
" W[layer] = W[layer] - np.multiply(alpha, _dW)\n",
" B[layer] = B[layer] - np.multiply(alpha, _dB)\n",
" if(detailed_logger == True):\n",
" print('Backward Layer: ' + str(layer))\n",
" layer = layer - 1\n",
" W, B = backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB)\n",
" if(detailed_logger == True):\n",
" print('Backward Propagation Complete')\n",
" return W, B\n",
" \n",
"\n",
"def shuffle(X, Y, number_of_training_images):\n",
" random_array = np.random.permutation(np.arange(number_of_training_images))\n",
" return X[:, random_array], Y[random_array]\n",
" \n",
"start_time = time.time() \n",
"# main loop\n",
"for epoch in range(1, number_of_epochs):\n",
" \n",
" # logger\n",
" if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
" print('Main Loop Epoch: ' + str(epoch))\n",
" \n",
" # shuffle data\n",
" X, Y = shuffle(X_train.copy(), Y_train.copy(), number_of_training_images)\n",
" number_of_batches = int(np.floor(number_of_training_images/batch_size))\n",
" split_index = number_of_batches*batch_size\n",
"\n",
" # parse into minibatches\n",
" X_minibatches = np.split(X[:, 0:split_index], number_of_batches, axis=1)\n",
" if not(split_index == number_of_training_images):\n",
" X_left_over_portion = X[:, split_index:number_of_training_images]\n",
" X_minibatches.append(X_left_over_portion)\n",
" \n",
" Y_minibatches = np.split(Y[0:split_index], number_of_batches, axis=0)\n",
" if not(split_index == number_of_training_images):\n",
" Y_left_over_portion = Y[split_index:number_of_training_images]\n",
" Y_minibatches.append(Y_left_over_portion)\n",
" \n",
" number_of_minibatches = len(Y_minibatches)\n",
" \n",
" # logger\n",
" if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
" print('Number Of Minibatches: ' + str(number_of_minibatches))\n",
"\n",
" for index in range(0, number_of_minibatches-1):\n",
" X_minibatch = X_minibatches[index]\n",
" Y_minibatch = Y_minibatches[index]\n",
"\n",
" if(hidden_layer_relu + hidden_layer_tanh + hidden_layer_sigmoid != 1):\n",
" print(\"ERROR! Please Select Only 1 Hidden Layer Activation Function\")\n",
" break\n",
"\n",
" # forward propogation training data set\n",
" A_layers, Z_layers, D = forward_propagation_return_layers(W, B, X_minibatch, [X_minibatch], [], 0, [], keep_prob)\n",
" L = loss(A_layers[len(A_layers) - 1], Y_minibatch)\n",
" if(L2 == True):\n",
" C = cost_L2(L, W, epsilon) \n",
" else:\n",
" C = cost(L) \n",
"\n",
" # backpropogation\n",
" W, B = backward_propagation(W, B, Y_minibatch, A_layers, Z_layers, 0, alpha, epsilon, len(W) - 1, D, V_dW, V_dB)\n",
" \n",
" if(epoch % main_logger_output_epochs == 0):\n",
" print('Cost: ' + str(C))\n",
"\n",
" # forward propogation test data set\n",
" A_test = forward_propagation(W, B, X_test, 0)\n",
"\n",
" # accuracy\n",
" _prediction = prediction(A_test) \n",
" _accuracy = accuracy(_prediction, Y_test) \n",
"\n",
" # storage for plotting\n",
" cost_array.append(C)\n",
" accuracy_array.append(_accuracy)\n",
" interation_array.append(epoch)\n",
"\n",
"\n",
"end_time = time.time()\n",
"run_time = end_time - start_time\n",
" \n",
"print('')\n",
"print('Results:')\n",
"print('')\n",
" \n",
"print('')\n",
"print('Run Time: ' + str(run_time) + ' seconds')\n",
"print('Cost: ' + str(C)) \n",
"print('Accuracy: ' + str(_accuracy) + ' %') \n",
"print('')\n",
"print('')\n",
"\n",
"\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, cost_array, 'red')\n",
"pyplot.title('Learning Curve - ' + str(len(X[0])) + ' Training Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Cost')\n",
"pyplot.show()\n",
"\n",
"# plot percent accuracy curve\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, accuracy_array, 'red')\n",
"pyplot.title('Percent Accuracy Curve - ' + str(len(X_test[0])) + ' Test Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Percent Accuracy')\n",
"pyplot.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As illustrated, after 500 epochs with minibatches of 27 the cost became approximately 1.6e-05 and the test data accuracy reached 99.18%. These results are very good. The test accuracy is high because minibatch stochastic gradient descent inately provides a form of regularization.\n",
"\n",
"Additionally, it should be noted that this algorithm reached convergence in approximatley 120 epochs, which was much less than with larger minibatches. \n",
"\n",
"Now we will run minibatch stochastic gradient descent with momentum. Momentum is a tehcnique that takes into account the historical rate of change (velocity) of the gradients when adjusting the weights and bias's rather than strictly adjusting them based on the current gradient. This momentum intuitively reflects a ball heading a down a hill to the bottom. When the hill (gradient) is steep for several steps, momentum picks up and the ball heads to its destination faster. Implementing momentum should helps use arrive at convergence in a quicker manner. With this in mind, we implement a new hyper-parameter for momentum which controls the amoung of focus we put on the gradient momentum versus the current gradient. We will start by setting this momentum coefficient to 0.9 in order to heavely focus on the momentum of the gradient. We will leave our minibatch size at 27 since we have achieved good results with it. \n",
"\n",
"First we reinitialize our weights and bias's."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature Size: 784\n",
"Weights Shape: (20, 784)\n",
"Bias Shape: (20, 1)\n",
"Velocity Weights Shape: (20, 784)\n",
"Velocity Bias Shape: (20, 1)\n"
]
}
],
"source": [
"# initialize weights & bias\n",
"np.random.seed(10)\n",
"print('Feature Size: ' + str(size))\n",
"\n",
"lower_bound = -.1\n",
"upper_bound = .1\n",
"\n",
"#mean = 0.015\n",
"#std = 0.005\n",
"\n",
"# hyper-parameters: hidden layers\n",
"hidden_layers = 2\n",
"units_array = [20, 10]\n",
"Weights = []\n",
"Bias = []\n",
"V_dW = []\n",
"V_dB = []\n",
"for i in range(0, hidden_layers):\n",
" if(i == 0):\n",
" _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], size]))\n",
" _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
" _V_dW = np.float64(np.zeros([units_array[i], size]))\n",
" _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
" Weights.append(_W)\n",
" Bias.append(_B)\n",
" V_dW.append(_V_dW)\n",
" V_dB.append(_V_dB)\n",
" else:\n",
" _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], units_array[i-1]]))\n",
" _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
" _V_dW = np.float64(np.zeros([units_array[i], units_array[i-1]]))\n",
" _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
" Weights.append(_W)\n",
" Bias.append(_B)\n",
" V_dW.append(_V_dW)\n",
" V_dB.append(_V_dB)\n",
" \n",
"# output layer\n",
"_W = np.float64(np.random.uniform(lower_bound, upper_bound, [1, units_array[i]]))\n",
"_b = np.float64(np.random.uniform(lower_bound, upper_bound)) # b will be added in a broadcasting manner\n",
"_V_dW = np.float64(np.zeros([1, units_array[i]]))\n",
"_V_dB = np.float64(np.zeros(1))\n",
"Weights.append(_W)\n",
"Bias.append(_b)\n",
"V_dW.append(_V_dW)\n",
"V_dB.append(_V_dB)\n",
"\n",
"Weights = np.array(Weights)\n",
"Bias = np.array(Bias)\n",
"V_dW = np.array(V_dW)\n",
"V_dB = np.array(V_dB)\n",
"\n",
"for index in range(0, len(Weights) - 1):\n",
" Weights[index] = np.where(Weights[index] != 0, Weights[index], np.random.uniform(lower_bound, upper_bound))\n",
"\n",
"#print(train_X.shape)\n",
"#print(np.ravel(train_Y).shape)\n",
"\n",
"print('Weights Shape: ' + str(Weights[0].shape)) # matrix with a size of # of units X 784\n",
"print('Bias Shape: ' + str(Bias[0].shape)) # vector with a size of the # of unit\n",
"print('Velocity Weights Shape: ' + str(V_dW[0].shape)) # matrix with a size of # of units X 784\n",
"print('Velocity Bias Shape: ' + str(V_dB[0].shape)) # vector with a size of the # of unit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we run our minibatch stochastic gradient descent algorithm with momentum."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Main Loop Epoch: 10\n",
"Number Of Minibatches: 1852\n",
"Cost: 0.003075740130681305\n",
"Main Loop Epoch: 20\n",
"Number Of Minibatches: 1852\n",
"Cost: 0.0002421045249690861\n",
"Main Loop Epoch: 30\n",
"Number Of Minibatches: 1852\n",
"Cost: 0.0001461015181742864\n",
"Main Loop Epoch: 40\n",
"Number Of Minibatches: 1852\n",
"Cost: 0.001100090690208689\n",
"Main Loop Epoch: 50\n",
"Number Of Minibatches: 1852\n",
"Cost: 4.661676444755369e-07\n",
"Main Loop Epoch: 60\n",
"Number Of Minibatches: 1852\n",
"Cost: 2.430669273114621e-07\n",
"Main Loop Epoch: 70\n",
"Number Of Minibatches: 1852\n",
"Cost: 1.0348610490080223e-06\n",
"Main Loop Epoch: 80\n",
"Number Of Minibatches: 1852\n",
"Cost: 6.290041172420984e-08\n",
"Main Loop Epoch: 90\n",
"Number Of Minibatches: 1852\n",
"Cost: 4.956932166353874e-09\n",
"\n",
"Results:\n",
"\n",
"\n",
"Run Time: 229.67204904556274 seconds\n",
"Cost: 6.6395106245545e-08\n",
"Accuracy: 99.22 %\n",
"\n",
"\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# gradient descent\n",
"detailed_logger = False\n",
"main_logger = True\n",
"main_logger_output_epochs = 10\n",
"L2 = False\n",
"Dropout = False\n",
"momentum = True\n",
"hidden_layer_relu = True\n",
"hidden_layer_tanh = False\n",
"hidden_layer_sigmoid = False\n",
"\n",
"# hyber-parameters\n",
"alpha = .01;\n",
"epsilon = .85\n",
"keep_prob = .9\n",
"number_of_epochs = 100\n",
"batch_size = 27\n",
"momentum_coef = .9\n",
"\n",
"# copy initalization\n",
"W = Weights.copy()\n",
"B = Bias.copy()\n",
"\n",
"# data arrays\n",
"cost_array = []\n",
"accuracy_array = []\n",
"interation_array = []\n",
"\n",
"# rename\n",
"X_train = np.float64(training_images).copy()\n",
"Y_train = np.float64(training_labels).copy()\n",
"\n",
"X_test = np.float64(testing_images).copy()\n",
"Y_test = np.float64(testing_labels).copy()\n",
"\n",
"#m = size\n",
"m = number_of_training_images\n",
"\n",
"def model(W, B, A):\n",
" return np.dot(W, A) + B\n",
"\n",
"def activation_relu(Z):\n",
" Z = np.where(~np.isnan(Z), Z, 0)\n",
" Z = np.where(~np.isinf(Z), Z, 0)\n",
" return np.where(Z > 0, Z, 0)\n",
"\n",
"def activation_tanh(Z):\n",
" return np.tanh(Z)\n",
"\n",
"def activation_sigmoid(Z):\n",
" return 1/(1 + np.exp(-Z))\n",
"\n",
"def loss(A, Y):\n",
" epsilon = 1e-20\n",
" return np.where((Y == 1), np.multiply(-Y, np.log(A + epsilon)), -np.multiply((1 - Y), np.log(1 - A + epsilon)))\n",
" #return np.multiply(-Y, np.log(A)) - np.multiply((1 - Y), np.log(1 - A)) \n",
" \n",
"def cost(L):\n",
" return np.multiply(1/L.shape[1], np.sum(L))\n",
"\n",
"def cost_L2(L, W, epsilon):\n",
" L2 = np.multiply(epsilon/(2*W.shape[1]), np.multiply(W[len(W)-3], W[len(W)-3]).sum() + np.multiply(W[len(W)-2], W[len(W)-2]).sum() + np.multiply(W[len(W)-1], W[len(W)-1]).sum())\n",
" J = cost(L)\n",
" return L2 + J\n",
"\n",
"def prediction(A):\n",
" return np.where(A >= 0.5, 1, 0)\n",
" \n",
"def accuracy(prediction, Y):\n",
" return 100 - np.multiply(100/Y.shape[0], np.sum(np.absolute(Y - prediction))) \n",
" \n",
"def forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob):\n",
" if(layer < len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" Z_layers.append(Z)\n",
" if(hidden_layer_relu == True):\n",
" A = activation_relu(Z)\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" if(Dropout == True):\n",
" _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
" D.append(_D)\n",
" A = np.multiply(A, _D)\n",
" A_layers.append(A)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Training Data: ' + str(layer))\n",
" A_layers, Z_layers, D = forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob)\n",
" elif(layer == len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" Z_layers.append(Z)\n",
" A = activation_sigmoid(Z)\n",
" if(Dropout == True):\n",
" _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
" D.append(_D)\n",
" A = np.multiply(A, _D)\n",
" A_layers.append(A)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Training Data: ' + str(layer))\n",
" print('Forward Propagation Training Data Complete')\n",
" return A_layers, Z_layers, D\n",
"\n",
"def forward_propagation(W, B, A, layer):\n",
" if(layer < len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" if(hidden_layer_relu == True):\n",
" A = activation_relu(Z)\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Testing Data: ' + str(layer))\n",
" A = forward_propagation(W, B, A, layer)\n",
" elif(layer == len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" A = activation_sigmoid(Z) \n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Testing Data: ' + str(layer))\n",
" print('Forward Propagation Testing Data Complete')\n",
" return A\n",
"\n",
"def dZ(dZ, W, Z):\n",
" Z = np.where(~np.isnan(Z), Z, 0)\n",
" W = np.where(~np.isnan(W), W, 0)\n",
" dZ = np.where(~np.isnan(dZ), dZ, 0)\n",
" Z = np.where(~np.isinf(Z), Z, 0)\n",
" W = np.where(~np.isinf(W), W, 0)\n",
" dZ = np.where(~np.isinf(dZ), dZ, 0)\n",
" if(hidden_layer_relu == True):\n",
" return np.multiply(np.dot(np.transpose(W), dZ), np.where(Z > 0, 1, 0))\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" return np.multiply(np.dot(np.transpose(W), dZ), 1- np.multiply(A, A))\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" return np.multiply(np.dot(np.transpose(W), dZ), np.multiply(A, (1-A)))\n",
"\n",
"def dW(dZ, A):\n",
" return np.multiply(1/dZ.shape[1], np.dot(dZ, np.transpose(A)))\n",
"\n",
"def dW_L2(dZ, A, W, epsilon):\n",
" return np.multiply(epsilon/Z.shape[1], W) + dW(dZ, A)\n",
"\n",
"def dB(dZ):\n",
" return np.multiply(1/dZ.shape[1], np.sum(dZ))\n",
"\n",
"def backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB):\n",
" if(layer >= 0):\n",
" if(layer == len(W) - 1):\n",
" _dZ = A_layers[layer+1] - Y\n",
" elif(layer >= 0):\n",
" _dZ = dZ(_dZ, W[layer+1], Z_layers[layer])\n",
" if(Dropout == True):\n",
" _dZ = np.multiply(_dZ, D[layer])\n",
" if(L2 == True):\n",
" _dW = dW_L2(_dZ, A_layers[layer], W[layer], epsilon)\n",
" else:\n",
" _dW = dW(_dZ, A_layers[layer])\n",
" _dB = dB(_dZ)\n",
" if(momentum == True):\n",
" V_dW[layer] = np.multiply(momentum_coef, V_dW[layer]) + np.multiply(alpha, _dW)\n",
" V_dB[layer] = np.multiply(momentum_coef, V_dB[layer]) + np.multiply(alpha, _dB)\n",
" W[layer] = W[layer] - V_dW[layer]\n",
" B[layer] = B[layer] - V_dB[layer] \n",
" else:\n",
" W[layer] = W[layer] - np.multiply(alpha, _dW)\n",
" B[layer] = B[layer] - np.multiply(alpha, _dB)\n",
" if(detailed_logger == True):\n",
" print('Backward Layer: ' + str(layer))\n",
" layer = layer - 1\n",
" W, B = backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB)\n",
" if(detailed_logger == True):\n",
" print('Backward Propagation Complete')\n",
" return W, B\n",
" \n",
"\n",
"def shuffle(X, Y, number_of_training_images):\n",
" random_array = np.random.permutation(np.arange(number_of_training_images))\n",
" return X[:, random_array], Y[random_array]\n",
" \n",
"start_time = time.time() \n",
"# main loop\n",
"for epoch in range(1, number_of_epochs):\n",
" \n",
" # logger\n",
" if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
" print('Main Loop Epoch: ' + str(epoch))\n",
" \n",
" # shuffle data\n",
" X, Y = shuffle(X_train.copy(), Y_train.copy(), number_of_training_images)\n",
" number_of_batches = int(np.floor(number_of_training_images/batch_size))\n",
" split_index = number_of_batches*batch_size\n",
"\n",
" # parse into minibatches\n",
" X_minibatches = np.split(X[:, 0:split_index], number_of_batches, axis=1)\n",
" if not(split_index == number_of_training_images):\n",
" X_left_over_portion = X[:, split_index:number_of_training_images]\n",
" X_minibatches.append(X_left_over_portion)\n",
" \n",
" Y_minibatches = np.split(Y[0:split_index], number_of_batches, axis=0)\n",
" if not(split_index == number_of_training_images):\n",
" Y_left_over_portion = Y[split_index:number_of_training_images]\n",
" Y_minibatches.append(Y_left_over_portion)\n",
" \n",
" number_of_minibatches = len(Y_minibatches)\n",
" \n",
" # logger\n",
" if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
" print('Number Of Minibatches: ' + str(number_of_minibatches))\n",
"\n",
" for index in range(0, number_of_minibatches-1):\n",
" X_minibatch = X_minibatches[index]\n",
" Y_minibatch = Y_minibatches[index]\n",
"\n",
" if(hidden_layer_relu + hidden_layer_tanh + hidden_layer_sigmoid != 1):\n",
" print(\"ERROR! Please Select Only 1 Hidden Layer Activation Function\")\n",
" break\n",
"\n",
" # forward propogation training data set\n",
" A_layers, Z_layers, D = forward_propagation_return_layers(W, B, X_minibatch, [X_minibatch], [], 0, [], keep_prob)\n",
" L = loss(A_layers[len(A_layers) - 1], Y_minibatch)\n",
" if(L2 == True):\n",
" C = cost_L2(L, W, epsilon) \n",
" else:\n",
" C = cost(L) \n",
"\n",
" # backpropogation\n",
" W, B = backward_propagation(W, B, Y_minibatch, A_layers, Z_layers, 0, alpha, epsilon, len(W) - 1, D, V_dW, V_dB)\n",
" \n",
" if(epoch % main_logger_output_epochs == 0):\n",
" print('Cost: ' + str(C))\n",
"\n",
" # forward propogation test data set\n",
" A_test = forward_propagation(W, B, X_test, 0)\n",
"\n",
" # accuracy\n",
" _prediction = prediction(A_test) \n",
" _accuracy = accuracy(_prediction, Y_test) \n",
"\n",
" # storage for plotting\n",
" cost_array.append(C)\n",
" accuracy_array.append(_accuracy)\n",
" interation_array.append(epoch)\n",
"\n",
"\n",
"end_time = time.time()\n",
"run_time = end_time - start_time\n",
" \n",
"print('')\n",
"print('Results:')\n",
"print('')\n",
" \n",
"print('')\n",
"print('Run Time: ' + str(run_time) + ' seconds')\n",
"print('Cost: ' + str(C)) \n",
"print('Accuracy: ' + str(_accuracy) + ' %') \n",
"print('')\n",
"print('')\n",
"\n",
"\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, cost_array, 'red')\n",
"pyplot.title('Learning Curve - ' + str(len(X[0])) + ' Training Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Cost')\n",
"pyplot.show()\n",
"\n",
"# plot percent accuracy curve\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, accuracy_array, 'red')\n",
"pyplot.title('Percent Accuracy Curve - ' + str(len(X_test[0])) + ' Test Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Percent Accuracy')\n",
"pyplot.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As illustrated, after 100 epochs with minibatches of 27 the cost became approximately 6.6395e-08 and the test data accuracy reached 99.22%. These results are excellent. The test accuracy is high because minibatch stochastic gradient descent inately provides a form of regularization. In fact, we converged in approximatley 45 iterations to a very high accuracy and low cost. This lines up with what we would intuitively expect by adding in momentum. Our algorithm focused highly on the historical trend of the gradient rather than each step, and as a result converged faster.\n",
"\n",
"Now we wish to explore the impact of adjusting the momentum hyper-paramter. Therefore, we will adjust its value to .1 and re-run our algorithm. \n",
"\n",
"First we reinitialize our weights and bias's."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature Size: 784\n",
"Weights Shape: (20, 784)\n",
"Bias Shape: (20, 1)\n",
"Velocity Weights Shape: (20, 784)\n",
"Velocity Bias Shape: (20, 1)\n"
]
}
],
"source": [
"# initialize weights & bias\n",
"np.random.seed(10)\n",
"print('Feature Size: ' + str(size))\n",
"\n",
"lower_bound = -.1\n",
"upper_bound = .1\n",
"\n",
"#mean = 0.015\n",
"#std = 0.005\n",
"\n",
"# hyper-parameters: hidden layers\n",
"hidden_layers = 2\n",
"units_array = [20, 10]\n",
"Weights = []\n",
"Bias = []\n",
"V_dW = []\n",
"V_dB = []\n",
"for i in range(0, hidden_layers):\n",
" if(i == 0):\n",
" _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], size]))\n",
" _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
" _V_dW = np.float64(np.zeros([units_array[i], size]))\n",
" _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
" Weights.append(_W)\n",
" Bias.append(_B)\n",
" V_dW.append(_V_dW)\n",
" V_dB.append(_V_dB)\n",
" else:\n",
" _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], units_array[i-1]]))\n",
" _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
" _V_dW = np.float64(np.zeros([units_array[i], units_array[i-1]]))\n",
" _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
" Weights.append(_W)\n",
" Bias.append(_B)\n",
" V_dW.append(_V_dW)\n",
" V_dB.append(_V_dB)\n",
" \n",
"# output layer\n",
"_W = np.float64(np.random.uniform(lower_bound, upper_bound, [1, units_array[i]]))\n",
"_b = np.float64(np.random.uniform(lower_bound, upper_bound)) # b will be added in a broadcasting manner\n",
"_V_dW = np.float64(np.zeros([1, units_array[i]]))\n",
"_V_dB = np.float64(np.zeros(1))\n",
"Weights.append(_W)\n",
"Bias.append(_b)\n",
"V_dW.append(_V_dW)\n",
"V_dB.append(_V_dB)\n",
"\n",
"Weights = np.array(Weights)\n",
"Bias = np.array(Bias)\n",
"V_dW = np.array(V_dW)\n",
"V_dB = np.array(V_dB)\n",
"\n",
"for index in range(0, len(Weights) - 1):\n",
" Weights[index] = np.where(Weights[index] != 0, Weights[index], np.random.uniform(lower_bound, upper_bound))\n",
"\n",
"#print(train_X.shape)\n",
"#print(np.ravel(train_Y).shape)\n",
"\n",
"print('Weights Shape: ' + str(Weights[0].shape)) # matrix with a size of # of units X 784\n",
"print('Bias Shape: ' + str(Bias[0].shape)) # vector with a size of the # of unit\n",
"print('Velocity Weights Shape: ' + str(V_dW[0].shape)) # matrix with a size of # of units X 784\n",
"print('Velocity Bias Shape: ' + str(V_dB[0].shape)) # vector with a size of the # of unit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we re-run our minibatch stochastic gradient descent algorithm with momentum."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Main Loop Epoch: 10\n",
"Number Of Minibatches: 1852\n",
"Cost: 0.01529208269507389\n",
"Main Loop Epoch: 20\n",
"Number Of Minibatches: 1852\n",
"Cost: 0.004077579948707226\n",
"Main Loop Epoch: 30\n",
"Number Of Minibatches: 1852\n",
"Cost: 0.028170922925060595\n",
"Main Loop Epoch: 40\n",
"Number Of Minibatches: 1852\n",
"Cost: 0.0042440696729762\n",
"Main Loop Epoch: 50\n",
"Number Of Minibatches: 1852\n",
"Cost: 0.0001764298629206007\n",
"Main Loop Epoch: 60\n",
"Number Of Minibatches: 1852\n",
"Cost: 0.00039509593526748134\n",
"Main Loop Epoch: 70\n",
"Number Of Minibatches: 1852\n",
"Cost: 2.1314596748085606e-06\n",
"Main Loop Epoch: 80\n",
"Number Of Minibatches: 1852\n",
"Cost: 8.821269877158751e-06\n",
"Main Loop Epoch: 90\n",
"Number Of Minibatches: 1852\n",
"Cost: 2.0576062738186753e-06\n",
"\n",
"Results:\n",
"\n",
"\n",
"Run Time: 261.67622208595276 seconds\n",
"Cost: 0.0002186226582897106\n",
"Accuracy: 99.16 %\n",
"\n",
"\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# gradient descent\n",
"detailed_logger = False\n",
"main_logger = True\n",
"main_logger_output_epochs = 10\n",
"L2 = False\n",
"Dropout = False\n",
"momentum = True\n",
"hidden_layer_relu = True\n",
"hidden_layer_tanh = False\n",
"hidden_layer_sigmoid = False\n",
"\n",
"# hyber-parameters\n",
"alpha = .01;\n",
"epsilon = .85\n",
"keep_prob = .9\n",
"number_of_epochs = 100\n",
"batch_size = 27\n",
"momentum_coef = .1\n",
"\n",
"# copy initalization\n",
"W = Weights.copy()\n",
"B = Bias.copy()\n",
"\n",
"# data arrays\n",
"cost_array = []\n",
"accuracy_array = []\n",
"interation_array = []\n",
"\n",
"# rename\n",
"X_train = np.float64(training_images).copy()\n",
"Y_train = np.float64(training_labels).copy()\n",
"\n",
"X_test = np.float64(testing_images).copy()\n",
"Y_test = np.float64(testing_labels).copy()\n",
"\n",
"#m = size\n",
"m = number_of_training_images\n",
"\n",
"def model(W, B, A):\n",
" return np.dot(W, A) + B\n",
"\n",
"def activation_relu(Z):\n",
" Z = np.where(~np.isnan(Z), Z, 0)\n",
" Z = np.where(~np.isinf(Z), Z, 0)\n",
" return np.where(Z > 0, Z, 0)\n",
"\n",
"def activation_tanh(Z):\n",
" return np.tanh(Z)\n",
"\n",
"def activation_sigmoid(Z):\n",
" return 1/(1 + np.exp(-Z))\n",
"\n",
"def loss(A, Y):\n",
" epsilon = 1e-20\n",
" return np.where((Y == 1), np.multiply(-Y, np.log(A + epsilon)), -np.multiply((1 - Y), np.log(1 - A + epsilon)))\n",
" #return np.multiply(-Y, np.log(A)) - np.multiply((1 - Y), np.log(1 - A)) \n",
" \n",
"def cost(L):\n",
" return np.multiply(1/L.shape[1], np.sum(L))\n",
"\n",
"def cost_L2(L, W, epsilon):\n",
" L2 = np.multiply(epsilon/(2*W.shape[1]), np.multiply(W[len(W)-3], W[len(W)-3]).sum() + np.multiply(W[len(W)-2], W[len(W)-2]).sum() + np.multiply(W[len(W)-1], W[len(W)-1]).sum())\n",
" J = cost(L)\n",
" return L2 + J\n",
"\n",
"def prediction(A):\n",
" return np.where(A >= 0.5, 1, 0)\n",
" \n",
"def accuracy(prediction, Y):\n",
" return 100 - np.multiply(100/Y.shape[0], np.sum(np.absolute(Y - prediction))) \n",
" \n",
"def forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob):\n",
" if(layer < len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" Z_layers.append(Z)\n",
" if(hidden_layer_relu == True):\n",
" A = activation_relu(Z)\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" if(Dropout == True):\n",
" _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
" D.append(_D)\n",
" A = np.multiply(A, _D)\n",
" A_layers.append(A)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Training Data: ' + str(layer))\n",
" A_layers, Z_layers, D = forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob)\n",
" elif(layer == len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" Z_layers.append(Z)\n",
" A = activation_sigmoid(Z)\n",
" if(Dropout == True):\n",
" _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
" D.append(_D)\n",
" A = np.multiply(A, _D)\n",
" A_layers.append(A)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Training Data: ' + str(layer))\n",
" print('Forward Propagation Training Data Complete')\n",
" return A_layers, Z_layers, D\n",
"\n",
"def forward_propagation(W, B, A, layer):\n",
" if(layer < len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" if(hidden_layer_relu == True):\n",
" A = activation_relu(Z)\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Testing Data: ' + str(layer))\n",
" A = forward_propagation(W, B, A, layer)\n",
" elif(layer == len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" A = activation_sigmoid(Z) \n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Testing Data: ' + str(layer))\n",
" print('Forward Propagation Testing Data Complete')\n",
" return A\n",
"\n",
"def dZ(dZ, W, Z):\n",
" Z = np.where(~np.isnan(Z), Z, 0)\n",
" W = np.where(~np.isnan(W), W, 0)\n",
" dZ = np.where(~np.isnan(dZ), dZ, 0)\n",
" Z = np.where(~np.isinf(Z), Z, 0)\n",
" W = np.where(~np.isinf(W), W, 0)\n",
" dZ = np.where(~np.isinf(dZ), dZ, 0)\n",
" if(hidden_layer_relu == True):\n",
" return np.multiply(np.dot(np.transpose(W), dZ), np.where(Z > 0, 1, 0))\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" return np.multiply(np.dot(np.transpose(W), dZ), 1- np.multiply(A, A))\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" return np.multiply(np.dot(np.transpose(W), dZ), np.multiply(A, (1-A)))\n",
"\n",
"def dW(dZ, A):\n",
" return np.multiply(1/dZ.shape[1], np.dot(dZ, np.transpose(A)))\n",
"\n",
"def dW_L2(dZ, A, W, epsilon):\n",
" return np.multiply(epsilon/Z.shape[1], W) + dW(dZ, A)\n",
"\n",
"def dB(dZ):\n",
" return np.multiply(1/dZ.shape[1], np.sum(dZ))\n",
"\n",
"def backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB):\n",
" if(layer >= 0):\n",
" if(layer == len(W) - 1):\n",
" _dZ = A_layers[layer+1] - Y\n",
" elif(layer >= 0):\n",
" _dZ = dZ(_dZ, W[layer+1], Z_layers[layer])\n",
" if(Dropout == True):\n",
" _dZ = np.multiply(_dZ, D[layer])\n",
" if(L2 == True):\n",
" _dW = dW_L2(_dZ, A_layers[layer], W[layer], epsilon)\n",
" else:\n",
" _dW = dW(_dZ, A_layers[layer])\n",
" _dB = dB(_dZ)\n",
" if(momentum == True):\n",
" V_dW[layer] = np.multiply(momentum_coef, V_dW[layer]) + np.multiply(alpha, _dW)\n",
" V_dB[layer] = np.multiply(momentum_coef, V_dB[layer]) + np.multiply(alpha, _dB)\n",
" W[layer] = W[layer] - V_dW[layer]\n",
" B[layer] = B[layer] - V_dB[layer] \n",
" else:\n",
" W[layer] = W[layer] - np.multiply(alpha, _dW)\n",
" B[layer] = B[layer] - np.multiply(alpha, _dB)\n",
" if(detailed_logger == True):\n",
" print('Backward Layer: ' + str(layer))\n",
" layer = layer - 1\n",
" W, B = backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB)\n",
" if(detailed_logger == True):\n",
" print('Backward Propagation Complete')\n",
" return W, B\n",
" \n",
"\n",
"def shuffle(X, Y, number_of_training_images):\n",
" random_array = np.random.permutation(np.arange(number_of_training_images))\n",
" return X[:, random_array], Y[random_array]\n",
" \n",
"start_time = time.time() \n",
"# main loop\n",
"for epoch in range(1, number_of_epochs):\n",
" \n",
" # logger\n",
" if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
" print('Main Loop Epoch: ' + str(epoch))\n",
" \n",
" # shuffle data\n",
" X, Y = shuffle(X_train.copy(), Y_train.copy(), number_of_training_images)\n",
" number_of_batches = int(np.floor(number_of_training_images/batch_size))\n",
" split_index = number_of_batches*batch_size\n",
"\n",
" # parse into minibatches\n",
" X_minibatches = np.split(X[:, 0:split_index], number_of_batches, axis=1)\n",
" if not(split_index == number_of_training_images):\n",
" X_left_over_portion = X[:, split_index:number_of_training_images]\n",
" X_minibatches.append(X_left_over_portion)\n",
" \n",
" Y_minibatches = np.split(Y[0:split_index], number_of_batches, axis=0)\n",
" if not(split_index == number_of_training_images):\n",
" Y_left_over_portion = Y[split_index:number_of_training_images]\n",
" Y_minibatches.append(Y_left_over_portion)\n",
" \n",
" number_of_minibatches = len(Y_minibatches)\n",
" \n",
" # logger\n",
" if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
" print('Number Of Minibatches: ' + str(number_of_minibatches))\n",
"\n",
" for index in range(0, number_of_minibatches-1):\n",
" X_minibatch = X_minibatches[index]\n",
" Y_minibatch = Y_minibatches[index]\n",
"\n",
" if(hidden_layer_relu + hidden_layer_tanh + hidden_layer_sigmoid != 1):\n",
" print(\"ERROR! Please Select Only 1 Hidden Layer Activation Function\")\n",
" break\n",
"\n",
" # forward propogation training data set\n",
" A_layers, Z_layers, D = forward_propagation_return_layers(W, B, X_minibatch, [X_minibatch], [], 0, [], keep_prob)\n",
" L = loss(A_layers[len(A_layers) - 1], Y_minibatch)\n",
" if(L2 == True):\n",
" C = cost_L2(L, W, epsilon) \n",
" else:\n",
" C = cost(L) \n",
"\n",
" # backpropogation\n",
" W, B = backward_propagation(W, B, Y_minibatch, A_layers, Z_layers, 0, alpha, epsilon, len(W) - 1, D, V_dW, V_dB)\n",
" \n",
" if(epoch % main_logger_output_epochs == 0):\n",
" print('Cost: ' + str(C))\n",
"\n",
" # forward propogation test data set\n",
" A_test = forward_propagation(W, B, X_test, 0)\n",
"\n",
" # accuracy\n",
" _prediction = prediction(A_test) \n",
" _accuracy = accuracy(_prediction, Y_test) \n",
"\n",
" # storage for plotting\n",
" cost_array.append(C)\n",
" accuracy_array.append(_accuracy)\n",
" interation_array.append(epoch)\n",
"\n",
"\n",
"end_time = time.time()\n",
"run_time = end_time - start_time\n",
" \n",
"print('')\n",
"print('Results:')\n",
"print('')\n",
" \n",
"print('')\n",
"print('Run Time: ' + str(run_time) + ' seconds')\n",
"print('Cost: ' + str(C)) \n",
"print('Accuracy: ' + str(_accuracy) + ' %') \n",
"print('')\n",
"print('')\n",
"\n",
"\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, cost_array, 'red')\n",
"pyplot.title('Learning Curve - ' + str(len(X[0])) + ' Training Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Cost')\n",
"pyplot.show()\n",
"\n",
"# plot percent accuracy curve\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, accuracy_array, 'red')\n",
"pyplot.title('Percent Accuracy Curve - ' + str(len(X_test[0])) + ' Test Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Percent Accuracy')\n",
"pyplot.show()"
]
},
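{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reading the output above: as far as the code shows, each printed `Cost` is the mean binary cross-entropy over the last minibatch processed in that epoch, not over the full training set. Writing $a^{(i)}$ for the sigmoid output, $y^{(i)}$ for the label, $m_b$ for the minibatch size, and $\\epsilon = 10^{-20}$ for the small constant inside `loss` (notation assumed here for illustration):\n",
"\n",
"$$J = \\frac{1}{m_b} \\sum_{i=1}^{m_b} \\ell^{(i)}, \\qquad \\ell^{(i)} = \\begin{cases} -\\log\\left(a^{(i)} + \\epsilon\\right) & \\text{if } y^{(i)} = 1 \\\\ -\\log\\left(1 - a^{(i)} + \\epsilon\\right) & \\text{if } y^{(i)} = 0 \\end{cases}$$\n",
"\n",
"Because $J$ is evaluated on a single minibatch of 27 images, it is a noisy estimate, which is one likely reason the logged cost does not decrease monotonically between epochs."
]
},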
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As illustrated, after 100 epochs with minibatches of 27 the cost became approximately 0.0002186 and the test data accuracy reached 99.16%. These results are excellent. The test accuracy is high because minibatch stochastic gradient descent inately provides a form of regularization. This time, we converged in approximatley 55 iterations to a very high accuracy and low cost. This lines up with what we would intuitively expect by making the momentum coefficient low. Our algorithm focused less on historical trend of the gradient and more on each step. As a results we converged slower than with a higher momentum coefficient."
]
},
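{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the momentum intuition above concrete, consider the update $V \\leftarrow \\beta V + \\alpha g$ used in the code, starting from $V_0 = 0$. Unrolling it gives a geometrically weighted sum of past gradients, and for a constant gradient the velocity approaches a fixed effective step. This is a standard derivation, shown here under the simplifying assumption of a constant gradient $g$:\n",
"\n",
"$$V_t = \\alpha \\sum_{k=0}^{t-1} \\beta^{k}\\, g_{t-k}, \\qquad \\lim_{t \\to \\infty} V_t = \\frac{\\alpha}{1 - \\beta}\\, g \\quad (g_t = g,\\ 0 \\le \\beta < 1)$$\n",
"\n",
"With $\\alpha = 0.01$ and $\\beta = 0.1$ as in this run, the effective step is about $0.0111\\, g$, only slightly larger than plain gradient descent. For a hypothetical $\\beta = 0.9$ (chosen purely for illustration) it would be $0.1\\, g$, roughly nine times larger."
]
},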
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have illustrated how the minibatch stochastic gradient provies many benefits for training neural networks. This technique provides a form of regularization by preventing the network from getting stuck local minima. The networks also converge faster. We have also explored how taking into account the momentum of the gradient can speed up the convergence of the network."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}