{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Assignment7\n",
"## CS-5891-01 Special Topics Deep Learning\n",
"## Ronald Picard\n",
"\n",
"In this notebook we will walk through the design, training, and testing of neural networks with multiple hidden layers using minibatch stochastic gradient descent with adaptive moment estimation (Adam). These networks will be used for binary classification; with a single sigmoid output unit this is the same task that logistic regression solves.\n",
"\n",
"The binary classification will be performed on images of handwritten numerical digits; more specifically, on the last digit of my student ID, which happens to be 9. Therefore, the goal of our neural networks will be to output a value of 1 when the input image shows a handwritten 9, and 0 in all other cases.\n",
"\n",
"The data set we will be using is the MNIST data set, which is very popular among the machine learning community. Its training set contains 60,000 images, and each image contains a handwritten numerical digit. Each image has been provided with a truth label from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} that identifies the digit in the image.\n",
"\n",
"For our case, we only care about whether the image is a 9. Therefore we will need to re-label the data so that every truth label equal to 9 becomes 1, and every other truth label becomes 0.\n",
"\n",
"To start, we need to import the modules we will use."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import numpy as np\n",
"import struct\n",
"from mpl_toolkits.mplot3d import Axes3D\n",
"import matplotlib.pyplot as pyplot\n",
"import csv\n",
"import time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we must change our path string to the directory that contains the data files. (Please note that you must change this string to point to the directory with the data files on your machine.)\n",
"\n",
"Second, we must set the file names to the names of the MNIST data files. (Please note that you may NOT need to change these. Only change them if your MNIST data files are named differently.)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"## path\n",
"path = 'C:/Users/computer/OneDrive - Vanderbilt/Vanderbilt_Spring_2019/CS_5891_01_SpecialTopicsDeepLearning/Assignment7/'\n",
"\n",
"#Train data\n",
"fname_train_images = os.path.join(path, 'train-images.idx3-ubyte') # the training set image file path\n",
"fname_train_labels = os.path.join(path, 'train-labels.idx1-ubyte') # the training set label file path"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we retrieve the data from the data files as follows. This imports the images into a feature tensor (a 3-D array of shape # of images X 28 X 28) in which each entry along the first axis is the pixel matrix of one image. The label data comes in the form of a vector in which entry i is the digit shown in image i of the feature tensor."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The training set contains 60000 images\n",
"The shape of the image is (28, 28)\n"
]
}
],
"source": [
"# open the label file and load it into labels\n",
"with open(fname_train_labels, 'rb') as flbl:\n",
"    # header: magic number and item count, both big-endian unsigned 32-bit integers\n",
"    magic, num = struct.unpack(\">II\", flbl.read(8))\n",
"    labels = np.fromfile(flbl, dtype=np.uint8)\n",
"\n",
"# open the image file and load it into images\n",
"with open(fname_train_images, 'rb') as fimg:\n",
"    # header: magic number, item count, rows per image and columns per image\n",
"    magic, num, rows, cols = struct.unpack(\">IIII\", fimg.read(16))\n",
"    images = np.fromfile(fimg, dtype=np.uint8).reshape(len(labels), rows, cols)\n",
"\n",
"print('The training set contains', len(images), 'images') # print how many images the training set contains\n",
"print('The shape of the image is', images[0].shape) # print the shape of one image"
]
},
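{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 8-byte and 16-byte reads above follow the IDX layout of the MNIST files: a header of big-endian unsigned 32-bit integers (magic number, item count and, for images, rows and columns) followed by one unsigned byte per value. As a minimal sketch, the same parsing can be tried on a small in-memory buffer. The helper name `parse_idx_images` and the fake data are illustrative assumptions, not part of the assignment code; 2051 is the magic number MNIST uses for image files.\n",
"\n",
"```python\n",
"import struct\n",
"import numpy as np\n",
"\n",
"def parse_idx_images(raw):\n",
"    # header: four big-endian unsigned 32-bit integers\n",
"    magic, num, rows, cols = struct.unpack('>IIII', raw[:16])\n",
"    # the remaining bytes are pixel values, one unsigned byte each\n",
"    pixels = np.frombuffer(raw, dtype=np.uint8, offset=16)\n",
"    return magic, pixels.reshape(num, rows, cols)\n",
"\n",
"# build a fake file that holds two 2 X 3 images\n",
"header = struct.pack('>IIII', 2051, 2, 2, 3)\n",
"body = bytes(range(12))\n",
"magic_demo, images_demo = parse_idx_images(header + body)\n",
"print(magic_demo, images_demo.shape)\n",
"```\n",
"\n",
"Reading from a buffer instead of a file makes it possible to check the header handling without the data files being present."
]
},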
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we need to perform two steps: feature scaling and feature normalization. What this notebook calls feature scaling is the conversion of the 28 X 28 image matrices into 784 X 1 feature vectors (often called flattening). In essence, we flatten each image into a vector so that it can be used as the input to the first layer of the network. Feature normalization maps the pixel data into the range 0 <= x <= 1. Each pixel comes on a scale of 0 <= x <= 255. Since 255 is the maximum for every pixel, we divide each pixel by that number (elementwise) in order to normalize it to between 0 and 1 (inclusive).\n",
"\n",
"One additional item we need to take care of is relabeling our label (truth) data so that we have a binary classification in which all 9s are converted to 1s and all other labels are converted to 0s."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(784, 60000)\n",
"(784, 60000)\n",
"60000\n"
]
}
],
"source": [
"# feature scaling\n",
"matrix_side_length = len(images[0])\n",
"vector_size = matrix_side_length*matrix_side_length\n",
"\n",
"scaled_images_feature_matrix = []\n",
"for image in images:\n",
"    reshaped_image = np.array(image).reshape((vector_size))\n",
"    scaled_images_feature_matrix.append(reshaped_image)\n",
"\n",
"# convert to numpy array with one image per column\n",
"scaled_images_feature_matrix = np.transpose(np.array(scaled_images_feature_matrix))\n",
"print(scaled_images_feature_matrix.shape) # scaled_images_feature_matrix is a matrix of 784 X 60000\n",
"#print(scaled_images_feature_matrix[0].shape)\n",
"\n",
"# feature normalization\n",
"normilization_factor = 1/255\n",
"normalized_scaled_images_feature_matrix = np.multiply(normilization_factor, scaled_images_feature_matrix)\n",
"print(normalized_scaled_images_feature_matrix.shape)\n",
"#print(normalized_scaled_images_feature_matrix[0])\n",
"\n",
"# re-label for binary classification\n",
"value_for_1 = 9\n",
"binary_labels = []\n",
"for label in labels:\n",
"    if(label == value_for_1):\n",
"        binary_labels.append(1)\n",
"    else:\n",
"        binary_labels.append(0)\n",
"\n",
"# convert to numpy array\n",
"binary_labels = np.array(binary_labels)\n",
"print(len(binary_labels)) # binary_labels is a vector with 60000 entries\n",
"#print(binary_labels[0])\n"
]
},
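{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loops above can also be expressed with vectorized NumPy operations. The following is a minimal sketch under the assumption that the images arrive as an array of shape (# of images, rows, cols); the function name `preprocess`, the parameter `positive_digit` and the demo arrays are hypothetical and only serve the illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def preprocess(images, labels, positive_digit=9):\n",
"    # flatten each (rows, cols) image into one column: the result is (rows*cols, m)\n",
"    features = images.reshape(len(images), -1).T\n",
"    # map the pixel values from [0, 255] to [0, 1]\n",
"    features = features / 255.0\n",
"    # 1 where the label equals the chosen digit, 0 elsewhere\n",
"    binary = (labels == positive_digit).astype(int)\n",
"    return features, binary\n",
"\n",
"demo_images = np.arange(8, dtype=np.uint8).reshape(2, 2, 2) * 30\n",
"demo_labels = np.array([9, 4], dtype=np.uint8)\n",
"X_demo, y_demo = preprocess(demo_images, demo_labels)\n",
"print(X_demo.shape, y_demo)\n",
"```\n",
"\n",
"On the full data set this avoids 60,000 Python-level loop iterations; the resulting shapes match the ones printed above (784 X # of images for the features, one entry per image for the labels)."
]
},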
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to test the efficacy of our neural networks, we need to split our data into two data sets: a larger one and a smaller one. The larger set will be the training data that we will train our neural networks on. The smaller set will be the testing data that we will use to test the accuracy of our neural nets. The MNIST training set contains 60,000 images. Therefore, we will use 50,000 images for our training data set, and 10,000 images for our testing data set.\n",
"\n",
"It is common practice to use a smaller subset of the total data set to debug (ensure the code works) and tune hyper-parameters before using the entire, time-consuming data set. In this notebook we call this smaller subset the validation set (elsewhere the term usually refers to held-out data that is used only for tuning). Therefore, we will first use a validation data set of 600 images: 500 of these images will be used as our training data set, and the other 100 will be used as our test data set.\n",
"\n",
"Thus, we will begin by sifting out a validation set from our total data set."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(784, 500)\n",
"(500,)\n",
"(784, 100)\n",
"(100,)\n"
]
}
],
"source": [
"# create a data set\n",
"size = vector_size\n",
"\n",
"number_of_testing_images = 100\n",
"number_of_training_images = 500\n",
"number_of_validation_images = number_of_testing_images + number_of_training_images\n",
"\n",
"training_images = []\n",
"training_labels = []\n",
"testing_images = []\n",
"testing_labels = []\n",
"\n",
"factor = 0 # offset of the first image that is taken from the full data set\n",
"for index in range(0, number_of_validation_images):\n",
"    if(index <= number_of_training_images - 1):\n",
"        training_images.append(normalized_scaled_images_feature_matrix[:, index + factor])\n",
"        training_labels.append(binary_labels[index + factor])\n",
"    else:\n",
"        testing_images.append(normalized_scaled_images_feature_matrix[:, index + factor])\n",
"        testing_labels.append(binary_labels[index + factor])\n",
"\n",
"# convert to numpy array\n",
"training_images = np.transpose(np.array(training_images))\n",
"training_labels = np.array(training_labels)\n",
"testing_images = np.transpose(np.array(testing_images))\n",
"testing_labels = np.array(testing_labels)\n",
"\n",
"# logger\n",
"print(training_images.shape) # training_images is a matrix of 784 X 500\n",
"print(training_labels.shape) # training_labels is a vector with 500 entries\n",
"print(testing_images.shape) # testing_images is a matrix of 784 X 100\n",
"print(testing_labels.shape) # testing_labels is a vector with 100 entries"
]
},
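{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the images are stored one per column, the same split can be written with array slicing. This is a sketch only; `split_columns`, its `offset` parameter (playing the role of `factor` above) and the random demo data are assumptions made for the illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def split_columns(features, labels, n_train, n_test, offset=0):\n",
"    # columns are images, so slicing along axis 1 selects whole examples\n",
"    stop_train = offset + n_train\n",
"    stop_test = stop_train + n_test\n",
"    train_X, train_y = features[:, offset:stop_train], labels[offset:stop_train]\n",
"    test_X, test_y = features[:, stop_train:stop_test], labels[stop_train:stop_test]\n",
"    return train_X, train_y, test_X, test_y\n",
"\n",
"demo_X = np.random.rand(784, 600)\n",
"demo_y = np.random.randint(0, 2, 600)\n",
"tr_X, tr_y, te_X, te_y = split_columns(demo_X, demo_y, 500, 100)\n",
"print(tr_X.shape, tr_y.shape, te_X.shape, te_y.shape)\n",
"```\n",
"\n",
"Note that basic slices are views of the original array rather than copies, so the full feature matrix is not duplicated in memory."
]
},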
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we move on to the training of our neural network with multiple hidden layers.\n",
"\n",
"Part 1 - Feed Forward:\n",
"\n",
"For these neural networks we will use multiple hidden layers with between 5 and 20 units (neurons) per layer. The first hidden layer takes as input a matrix (784 X # of images) whose columns are the vectorized 784 X 1 images, and outputs a matrix (# of units X # of images). This matrix is the input to the next hidden layer, which outputs another matrix (# of units X # of images). The output layer takes the output matrix of the last hidden layer as its input and outputs a row vector of probabilities, which we will convert into binary classifications of 0 or 1. (If P(x) >= 0.5 then we will convert it to a 1, otherwise we will convert it to 0.)\n",
"\n",
"The model for the units of hidden layer l will be a vectorized linear model Z^[l] = W^[l] * A^[l-1] + B^[l], where W^[l] is the matrix of parameter weights (# of units in layer l X # of units in layer l-1, or # of units X 784 for the first hidden layer), A^[l-1] is either the input matrix of vectorized images (784 X # of images, written A^[0] = X) or the output of the previous layer (# of units in layer l-1 X # of images), and B^[l] is a column vector of biases (# of units X 1) that is broadcast across the images. (Note: in the output layer, b will be a scalar that is applied in a broadcasting manner to save on memory.) The output of this model, Z^[l], will be a matrix (# of units X # of images). Z^[l] will be subject to an activation function, which for this assignment will be relu (note: we will test tanh once for comparison).\n",
"\n",
"Hidden Layer Activation Function: the relu activation function is A^[l] = relu(Z^[l]) = max(Z^[l], 0).\n",
"\n",
"The model of the output layer n will be a vectorized linear model Z^[n] = W^[n] * A^[n-1] + b^[n] with a single unit. This linear model will be subjected to a sigmoid activation function, A^[n] = sigma(Z^[n]) = 1 / (1 + exp(-Z^[n])).\n",
"\n",
"The resultant row vector will then be used to calculate the cost function values in an elementwise manner. The cost function for this binary classification will be L(A^[n], Y) = -Y * Log(A^[n]) - (1-Y) * Log(1-A^[n]), where Y is the true label and A^[n] is the output of the sigmoid activation function, i.e. the probability predicted by the neural network. The resultant cost row vector will be added up and divided by the number of elements in order to calculate the average cost.\n",
"\n",
"Part 2 - Back Propagation:\n",
"\n",
"The technique that we will use for training the neural network will be gradient descent, with the gradient computed by back propagation. This involves utilizing the gradient of the cost function to update the model parameters in our layers. In order to calculate the gradient we will utilize the chain rule. The goal of back propagation is to adjust the parameter weights and biases of our model so that it accurately performs binary classification. In general the chain rule can be used to find the gradient of the cost function (vectorized rates of change) with respect to the model parameters. The following is the chain that we will utilize.\n",
"\n",
"\n",
"Generalized Chain Rule for n layers:\n",
"\n",
"dL(A^[n], Y)/dW^[l] = dL(A^[n], Y)/dA^[n] * dA^[n]/dZ^[n] * dZ^[n]/dA^[n-1] * dA^[n-1]/dZ^[n-1] * ... * dA^[l]/dZ^[l] * dZ^[l]/dW^[l];\n",
"\n",
"dL(A^[n], Y)/dB^[l] = dL(A^[n], Y)/dA^[n] * dA^[n]/dZ^[n] * dZ^[n]/dA^[n-1] * dA^[n-1]/dZ^[n-1] * ... * dA^[l]/dZ^[l] * dZ^[l]/dB^[l];\n",
"\n",
"\n",
"\n",
"Output Layer - Back Propagation:\n",
"\n",
"The partial derivative of the cost function with respect to the output layer sigmoid activation function is found by the following:\n",
"\n",
"dL(A^[n], Y)/dA^[n] = -Y/A^[n] + (1-Y)/(1-A^[n]).\n",
"\n",
"\n",
"Due to the chain rule, the derivative of the cost function with respect to the linear model Z^[n] is found by the following:\n",
"\n",
"dL(A^[n], Y)/dZ^[n] = dL(A^[n], Y)/dA^[n] * dA^[n]/dZ^[n].\n",
"\n",
"The derivative of the sigmoid activation function is found by the following:\n",
"\n",
"dA^[n]/dZ^[n] = sigma(Z^[n]) * (1-sigma(Z^[n])) = A^[n] * (1-A^[n])\n",
"\n",
"Therefore, the derivative of the cost function with respect to the output of the linear model is found by the following:\n",
"\n",
"dL(A^[n], Y)/dA^[n] * dA^[n]/dZ^[n] = (-Y/A^[n] + (1-Y)/(1-A^[n])) * (A^[n] * (1-A^[n])) = -Y * (1-A^[n]) + (1-Y) * A^[n] = A^[n]-Y. (For convenience we will say dZ^[n] = A^[n]-Y.)\n",
"\n",
"Now we can extend the chain rule to all the parameters of the linear model of our output layer.\n",
"\n",
"dL(A^[n], Y)/dW^[n] = dZ^[n] * dZ^[n]/dW^[n] = dZ^[n] * A^[n-1]^T (we will change our notation to dW^[n] = dZ^[n] * A^[n-1]^T for convenience)\n",
"\n",
"dL(A^[n], Y)/dB^[n] = dZ^[n] * dZ^[n]/dB^[n] = dZ^[n] (we will change our notation to dB^[n] = dZ^[n] for convenience)\n",
"\n",
"\n",
"\n",
"Hidden Layers - Back Propagation:\n",
"\n",
"dL(A^[n], Y)/dZ^[l] = dL(A^[n], Y)/dA^[n] * dA^[n]/dZ^[n] * dZ^[n]/dA^[n-1] * ... * dZ^[l+1]/dA^[l] * dA^[l]/dZ^[l]\n",
"\n",
"dL(A^[n], Y)/dZ^[l] = dZ^[l+1] * dZ^[l+1]/dA^[l] * dA^[l]/dZ^[l] = (W^[l+1]^T * dZ^[l+1]) * (element-wise) dA^[l]/dZ^[l]. The product with dA^[l]/dZ^[l] is element-wise because the activation function is applied to each entry of Z^[l] independently (we shall rename this dZ^[l] = (W^[l+1]^T * dZ^[l+1]) * (element-wise) dA^[l]/dZ^[l] for convenience)\n",
"\n",
"dA^[l]/dZ^[l] depends on the activation function we are using in the hidden layer (in this case relu): The derivative of the relu activation function is dA^[l]/dZ^[l] = if Z^[l] > 0 then 1 else 0.\n",
"\n",
"\n",
"dL(A^[n], Y)/dW^[l] = dZ^[l] * dZ^[l]/dW^[l] = dZ^[l] * A^[l-1]^T, with A^[0] = X for the first hidden layer (we will change our notation to dW^[l] = dZ^[l] * A^[l-1]^T for convenience)\n",
"\n",
"dL(A^[n], Y)/dB^[l] = dZ^[l] * dZ^[l]/dB^[l] = dZ^[l] (we will change our notation to dB^[l] = dZ^[l] for convenience)\n",
"\n",
"\n",
"Find vector averages:\n",
"\n",
"m = # of images\n",
"\n",
"dW^[l] = 1/m * (dZ^[l] * A^[l-1]^T)\n",
"\n",
"dB^[l] = 1/m * (sum of dZ^[l] over the m images)\n",
"\n",
"\n",
"Finally, we will update the weights and biases of the layers.\n",
"\n",
"\n",
"W^[l]:= W^[l] - alpha * dW^[l]\n",
"\n",
"B^[l]:= B^[l] - alpha * dB^[l]\n"
]
},
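{
"cell_type": "markdown",
"metadata": {},
"source": [
"The feed-forward and back-propagation formulas can be checked numerically. The following is a minimal sketch, not the training code of this notebook: it assumes that the weights are stored as (# of units X # of inputs), as in the initialization code further below, so that Z = W * A_prev + B and the transposes appear in the backward pass. The function names `forward`, `cost` and `backward` and the tiny layer sizes are hypothetical. One analytic gradient entry is compared with a central finite difference of the average cost.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def forward(X, Ws, bs):\n",
"    # keep every Z and A because the backward pass needs them\n",
"    Zs, As = [], [X]\n",
"    for k, (W, b) in enumerate(zip(Ws, bs)):\n",
"        Z = W @ As[-1] + b\n",
"        Zs.append(Z)\n",
"        # sigmoid on the output layer, relu on the hidden layers\n",
"        As.append(sigmoid(Z) if k == len(Ws) - 1 else np.maximum(Z, 0))\n",
"    return Zs, As\n",
"\n",
"def cost(X, Y, Ws, bs):\n",
"    A = forward(X, Ws, bs)[1][-1]\n",
"    return float(np.mean(-Y * np.log(A) - (1 - Y) * np.log(1 - A)))\n",
"\n",
"def backward(X, Y, Ws, bs):\n",
"    m = X.shape[1]\n",
"    Zs, As = forward(X, Ws, bs)\n",
"    dWs, dbs = [None] * len(Ws), [None] * len(Ws)\n",
"    dZ = As[-1] - Y                                   # output layer: dZ = A - Y\n",
"    for k in reversed(range(len(Ws))):\n",
"        dWs[k] = dZ @ As[k].T / m                     # dW = 1/m * dZ * A_prev^T\n",
"        dbs[k] = dZ.sum(axis=1, keepdims=True) / m    # dB = 1/m * sum of dZ over the images\n",
"        if k > 0:\n",
"            dZ = (Ws[k].T @ dZ) * (Zs[k - 1] > 0)     # element-wise relu derivative\n",
"    return dWs, dbs\n",
"\n",
"rng = np.random.default_rng(1)\n",
"sizes = [6, 4, 3, 1]\n",
"Ws = [rng.uniform(-0.5, 0.5, (sizes[k + 1], sizes[k])) for k in range(3)]\n",
"bs = [rng.uniform(-0.5, 0.5, (sizes[k + 1], 1)) for k in range(3)]\n",
"X = rng.random((6, 8))\n",
"Y = rng.integers(0, 2, (1, 8))\n",
"dWs, dbs = backward(X, Y, Ws, bs)\n",
"\n",
"# compare one analytic entry with a central finite difference\n",
"h = 1e-6\n",
"Ws[0][2, 3] += h\n",
"cost_plus = cost(X, Y, Ws, bs)\n",
"Ws[0][2, 3] -= 2 * h\n",
"cost_minus = cost(X, Y, Ws, bs)\n",
"Ws[0][2, 3] += h\n",
"numeric = (cost_plus - cost_minus) / (2 * h)\n",
"print(abs(numeric - dWs[0][2, 3]))\n",
"```\n",
"\n",
"A difference close to zero indicates that the analytic gradient agrees with the cost function. A check of this kind is only meant for small networks, since every finite-difference entry needs two full forward passes."
]
},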
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first thing we have to do is initialize our weights and biases. There are multiple ways to do this. Typically we draw the values from either a uniform distribution over a small interval or a normal distribution with some reasonable mean and standard deviation. There is some flexibility in the initialization of the weights, but in general they need to be small (though not too small) and varied. The weights need to differ so that the units in a layer receive different gradients; if every unit started with the same weights, every unit would receive the same update and the units would never learn different features. Additionally, we do not want to saturate our output activation function, where the gradients are close to 0 (vanishing gradients). For this assignment we will stick with a uniform random distribution between -.1 and .1. We will also set a random seed each time so that the random values are the same on every run (or similar if there are more layers)."
]
},
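{
"cell_type": "markdown",
"metadata": {},
"source": [
"The need for varied weights can be illustrated with a small experiment. This is a sketch with a hypothetical helper `hidden_layer_gradient` and made-up weights, using one relu hidden layer and one sigmoid output unit without biases: when every hidden unit starts with identical weights, every row of the weight gradient is identical, so the units stay copies of each other.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def hidden_layer_gradient(W1, w2, X, Y):\n",
"    # one relu hidden layer followed by one sigmoid output unit\n",
"    Z1 = W1 @ X\n",
"    A1 = np.maximum(Z1, 0)\n",
"    A2 = 1.0 / (1.0 + np.exp(-(w2 @ A1)))\n",
"    dZ2 = A2 - Y\n",
"    dZ1 = (w2.T @ dZ2) * (Z1 > 0)\n",
"    return dZ1 @ X.T / X.shape[1]\n",
"\n",
"rng = np.random.default_rng(0)\n",
"X = rng.random((4, 6))\n",
"Y = rng.integers(0, 2, (1, 6))\n",
"\n",
"# identical starting weights: all gradient rows coincide\n",
"same = hidden_layer_gradient(np.full((3, 4), 0.05), np.full((1, 3), 0.05), X, Y)\n",
"\n",
"# varied starting weights: the rows differ, so the units can specialize\n",
"W1_varied = np.array([[0.10, 0.10, 0.10, 0.10],\n",
"                      [0.05, -0.02, 0.08, 0.03],\n",
"                      [0.02, 0.09, -0.01, 0.04]])\n",
"w2_varied = np.array([[0.10, -0.05, 0.07]])\n",
"varied = hidden_layer_gradient(W1_varied, w2_varied, X, Y)\n",
"\n",
"print(np.allclose(same, same[0]), np.allclose(varied, varied[0]))\n",
"```\n",
"\n",
"The first value reports whether all rows of the gradient are equal for the identical initialization, the second whether they are equal for the varied one."
]
},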
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature Size: 784\n",
"Weights Shape: (20, 784)\n",
"Bias Shape: (20, 1)\n",
"Velocity Weights Shape: (20, 784)\n",
"Velocity Bias Shape: (20, 1)\n",
"RMSProp Weights Shape: (20, 784)\n",
"RMSProp Bias Shape: (20, 1)\n"
]
}
],
"source": [
"# initialize weights & bias\n",
"np.random.seed(10)\n",
"print('Feature Size: ' + str(size))\n",
"\n",
"lower_bound = -.1\n",
"upper_bound = .1\n",
"\n",
"#mean = 0.015\n",
"#std = 0.005\n",
"\n",
"# hyper-parameters: hidden layers\n",
"hidden_layers = 2\n",
"units_array = [20, 10]\n",
"Weights = []\n",
"Bias = []\n",
"V_dW = []\n",
"V_dB = []\n",
"R_dW = []\n",
"R_dB = []\n",
"for i in range(0, hidden_layers):\n",
"    # the first hidden layer reads the 784 pixel features; every later layer reads the previous layer\n",
"    inputs = size if i == 0 else units_array[i-1]\n",
"    _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], inputs]))\n",
"    _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
"    # Adam accumulators (momentum V and RMSProp R) start at zero\n",
"    _V_dW = np.float64(np.zeros([units_array[i], inputs]))\n",
"    _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
"    _R_dW = np.float64(np.zeros([units_array[i], inputs]))\n",
"    _R_dB = np.float64(np.zeros([units_array[i], 1]))\n",
"    Weights.append(_W)\n",
"    Bias.append(_B)\n",
"    V_dW.append(_V_dW)\n",
"    V_dB.append(_V_dB)\n",
"    R_dW.append(_R_dW)\n",
"    R_dB.append(_R_dB)\n",
"\n",
"# output layer (i still holds the index of the last hidden layer)\n",
"_W = np.float64(np.random.uniform(lower_bound, upper_bound, [1, units_array[i]]))\n",
"_b = np.float64(np.random.uniform(lower_bound, upper_bound)) # b will be added in a broadcasting manner\n",
"_V_dW = np.float64(np.zeros([1, units_array[i]]))\n",
"_V_dB = np.float64(np.zeros(1))\n",
"_R_dW = np.float64(np.zeros([1, units_array[i]]))\n",
"_R_dB = np.float64(np.zeros(1))\n",
"Weights.append(_W)\n",
"Bias.append(_b)\n",
"V_dW.append(_V_dW)\n",
"V_dB.append(_V_dB)\n",
"R_dW.append(_R_dW)\n",
"R_dB.append(_R_dB)\n",
"\n",
"# the per-layer arrays have different shapes, so they are stored in object arrays\n",
"# (recent NumPy versions no longer build such ragged arrays without dtype=object)\n",
"Weights = np.array(Weights, dtype=object)\n",
"Bias = np.array(Bias, dtype=object)\n",
"V_dW = np.array(V_dW, dtype=object)\n",
"V_dB = np.array(V_dB, dtype=object)\n",
"R_dW = np.array(R_dW, dtype=object)\n",
"R_dB = np.array(R_dB, dtype=object)\n",
"\n",
"\n",
"# replace any hidden-layer weight that happens to be exactly 0 with a random value\n",
"for index in range(0, len(Weights) - 1):\n",
"    Weights[index] = np.where(Weights[index] != 0, Weights[index], np.random.uniform(lower_bound, upper_bound))\n",
"\n",
"#print(train_X.shape)\n",
"#print(np.ravel(train_Y).shape)\n",
"\n",
"print('Weights Shape: ' + str(Weights[0].shape)) # matrix with a size of # of units X 784\n",
"print('Bias Shape: ' + str(Bias[0].shape)) # column vector with a size of # of units X 1\n",
"print('Velocity Weights Shape: ' + str(V_dW[0].shape)) # matrix with a size of # of units X 784\n",
"print('Velocity Bias Shape: ' + str(V_dB[0].shape)) # column vector with a size of # of units X 1\n",
"print('RMSProp Weights Shape: ' + str(R_dW[0].shape)) # matrix with a size of # of units X 784\n",
"print('RMSProp Bias Shape: ' + str(R_dB[0].shape)) # column vector with a size of # of units X 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we implement our minibatch stochastic gradient descent algorithm. The only difference between minibatch stochastic gradient descent and full-batch gradient descent is that during every epoch (pass over the training data) we split our training data into minibatches based on the specified minibatch size. If the data does not split evenly, then the last batch will be smaller, so that we utilize all of the training data during every epoch. During every epoch we then train once on each minibatch. Since each update is computed from a minibatch rather than from the full training set, our path to the minimum of the cost function will not be as direct. Therefore, our cost will not decrease (and our test accuracy will not increase) on every epoch; however, the general trend of the cost will be decreasing. Minibatch gradient descent can help prevent us from getting stuck in local extrema, and it increases the speed at which the network trains because the parameters are updated many times per epoch.\n",
"\n",
"We will also include Adam in our back propagation. This is a combination of RMSProp and momentum. Momentum updates the network parameters based on the exponentially weighted moving average of the gradient, which helps speed up the convergence rate and helps prevent the network from getting stuck in local minima. RMSProp divides the update by the square root of the exponentially weighted moving average of the squared gradient, which gives every parameter its own effective step size and damps the parameters whose gradients are consistently large. We will start by setting both our momentum and RMSProp hyperparameters to .9 with a learning rate of 0.1, since this provides good results.\n",
"\n",
"We will also collect data on the accuracy of our networks as a function of training iterations. To do this we will need to find the number of inaccurate binary classifications (false positives and false negatives). This will be accomplished using our test data set: we will send our test data set through the network and compare the results with the true labels of the test data set."
]
},
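{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three ingredients described above (minibatch splitting, the Adam update, and counting false positives and false negatives) can be sketched on a toy problem. This is an illustration under stated assumptions, not the implementation used in this notebook: the helper names `make_minibatches`, `adam_step` and `misclassified` are hypothetical, the columns are shuffled before splitting (the text only requires splitting), the update includes the bias correction of the standard Adam formulation, and the RMSProp decay defaults to 0.999 rather than the .9 chosen above. The model is a single sigmoid unit that learns whether the first of two features exceeds 0.5.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def make_minibatches(X, Y, batch_size, rng):\n",
"    # shuffle the columns, then cut them into consecutive slices\n",
"    order = rng.permutation(X.shape[1])\n",
"    return [(X[:, order[s:s + batch_size]], Y[:, order[s:s + batch_size]])\n",
"            for s in range(0, X.shape[1], batch_size)]   # the last slice may be shorter\n",
"\n",
"def adam_step(param, grad, v, r, t, alpha, beta1=0.9, beta2=0.999, eps=1e-8):\n",
"    v = beta1 * v + (1 - beta1) * grad           # momentum: moving average of the gradient\n",
"    r = beta2 * r + (1 - beta2) * grad ** 2      # RMSProp: moving average of the squared gradient\n",
"    v_hat = v / (1 - beta1 ** t)                 # bias correction for the zero start\n",
"    r_hat = r / (1 - beta2 ** t)\n",
"    return param - alpha * v_hat / (np.sqrt(r_hat) + eps), v, r\n",
"\n",
"def misclassified(W, b, X, Y):\n",
"    predictions = 1.0 / (1.0 + np.exp(-(W @ X + b))) >= 0.5\n",
"    false_pos = int(np.sum(predictions & (Y == 0)))\n",
"    false_neg = int(np.sum(~predictions & (Y == 1)))\n",
"    return false_pos, false_neg\n",
"\n",
"rng = np.random.default_rng(0)\n",
"X = rng.random((2, 200))\n",
"Y = (X[:1] > 0.5).astype(int)\n",
"W, b = np.zeros((1, 2)), np.zeros((1, 1))\n",
"state = [np.zeros_like(W), np.zeros_like(W), np.zeros_like(b), np.zeros_like(b)]\n",
"t = 0\n",
"for epoch in range(50):\n",
"    for Xb, Yb in make_minibatches(X, Y, 32, rng):\n",
"        t += 1\n",
"        dZ = 1.0 / (1.0 + np.exp(-(W @ Xb + b))) - Yb\n",
"        dW = dZ @ Xb.T / Xb.shape[1]\n",
"        db = dZ.mean(axis=1, keepdims=True)\n",
"        W, state[0], state[1] = adam_step(W, dW, state[0], state[1], t, alpha=0.1)\n",
"        b, state[2], state[3] = adam_step(b, db, state[2], state[3], t, alpha=0.1)\n",
"\n",
"false_pos, false_neg = misclassified(W, b, X, Y)\n",
"print(len(make_minibatches(X, Y, 32, rng)), false_pos, false_neg)\n",
"```\n",
"\n",
"When reading accuracy figures for the 9-versus-rest task, keep in mind that roughly one in ten MNIST digits is a 9, so a classifier that always outputs 0 would already be about 90% accurate; reporting false positives and false negatives separately makes this visible."
]
},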
{
"cell_type": "code",
"execution_count": 123,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Main Loop Epoch: 100\n",
"Number Of Minibatches: 10\n",
"Cost: 0.1101922786244274\n",
"Main Loop Epoch: 200\n",
"Number Of Minibatches: 10\n",
"Cost: 0.017543898313142785\n",
"Main Loop Epoch: 300\n",
"Number Of Minibatches: 10\n",
"Cost: 0.05816910301022392\n",
"Main Loop Epoch: 400\n",
"Number Of Minibatches: 10\n",
"Cost: 0.05842772204814277\n",
"\n",
"Results:\n",
"\n",
"\n",
"Run Time: 6.019030570983887 seconds\n",
"Cost: 0.05483394259959518\n",
"Accuracy: 94.0 %\n",
"\n",
"\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEWCAYAAACJ0YulAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzt3XmYXFWd//H3J+mEkJCEQMKWAAFBUNkJ+yJgBpFVHBAYdmFQfuOAu6IzI24zroDMOCiLG6CIqMAAghiJGlEg7EtAwpqQQDpAkg4hgSTf3x/nVPXtSnenqSWVrnxez1NP3f2cc7fvPefeuqWIwMzMDGBAszNgZmarDwcFMzMrc1AwM7MyBwUzMytzUDAzszIHBTMzK1sjgoKk30o6tdn5sPqQdKqk39Z7WluRpO0l3VWnZU2U9Gw9ltXD8nvd1pKmSDqth3FbSWrZ5/Ml7SLpz32ZtqFBQdKzkiY2Mo2+iIj3RcRPGrFsSSMkXSTpeUkLJU3P/aMbkV49STpf0ps536XPloXxO0m6V9Ki/L1TYZwkfUPSy/nzTUnqJo0TC8t+XdLyYnrV5DsifhIR76v3tG+VpJm5TAslzZP0F0lndbceepi/5hORpKMlPShpgaS5kn4vabM6pv1V4FuF+YplflHSFZKG1VKGvpB0pqTJ3QyfKekAaOy2rkUxj80SEfcBr0ta6frp9zUFSW1NTHswMAl4F3AIMALYG3gZ2L2K5TWjLL+IiHUKn6dzXgYDNwBXAaOAnwA35OEAZwHvB3YEdgAOBz5cufCIuLq0bOB9wKxiepXTN3N7Vul9uRzjSSfPzwOXroqEJW0D/Ag4FxgJbAF8H1hep+WPA/YF/q9iVKnMuwB7AJ+pR3rWGIVj6mq6OUYrNS0oSDpc0gP5CutOSTsUxn1O0lOSOiQ9JunowrjT8hXZhZJeAc7Pw6ZI+rakVyU9U4yIkiZLOrMwf2/TbiHpTznt30v6nqSreijGKcBmwNER8VhELI+IORHxlYi4JS8vJG1VWP6PJX01dx+QryI+K+lF4EeSpkk6vDB9W74C3CX375nX17x8hXhALduhFwcAbcBFEbEkIi4GBByUx58KfCciZkbEC8B3gNOqSSivg09LehhYlIf9m6Sn83Z4VNKRhenLV415/YSkDyvV0l6VdHGV0w5UquW9nNP+175eyUfEvIi4HjgBOEPStnmZR+b9vEOpNvnvhdn+lKcp1Zx2k7S1pDtyHuZKulLSyB6S3RmYHhGTI+mIiOsiYmZe7gBJn8/H0lxJ10ga1VPa3Sz/YOCeiFjSQ5lnAb8DijXIIZIukDRD0kuS/lfSkMp5C9tifGHYVZLO76GsK6WK2oSkQyQ9IWm+pO+S9t/SuIH5HPKypKdIF3XFZa0r6UeSZuf988uSBhTS+WOef17eVw6uIr/rS7pFUnveF/9P0tg87gRVNNvl88R1ubvH9azcTJe3/YvAZXkRk4F/kDSot3w1JSjkE9wPSVFrfeAHwI2S1sqTPAXsR7r6+RJwlaSNC4vYA3ga2AD4WmHYE8Bo4JvAFVKP1fjepv0ZcHfO1/nAyb0UZSJwa0RU1QySbQSsB2xOuvr+OenEUvJeYG5E3Jd3mJtJVfr1gE8Bv5I0pob0j5D0Sj7xnl0Y/i7goej6HpSH8vDS+AcL4x4sjKvG8aSaROkE+Hdgn9z/NeBnkjbsZf5DgV1JJ8qT1HuzZU/Tnk3apjsAE4APvNVCRMRfgRdJ+y/AQuCkXI4jgHPVGfT3z/OUak73kE5cXwU2Bt4JbAkUA0nRvcD2kr4j6UCt2IzzCeCwnM444DWgFAS7S7vS9qTjpFuSNiWdTKcXBn+bVGPZAdiaVIP6Qk/LaBRJGwDXAZ8jHeczScd9ydmkoLcjqVb/wYpFXAW8DryNtC8cBpxeGL838DDpPHEhcEUV2RxAOmFvRjr+3wS+m8ddD2wjaevC9CcBV+bula
3nccA6edn/DyAiniPtX8VlrigiGvYBngUmdjP8EuArFcOeAN7dw3IeAI7K3acBz1eMP410xVTqHwoEsFHunwycubJp8wpcCgwtjL8KuKqHfN0OfH0l6yCArQr9Pwa+mrsPAN4AhhTGbwV0lPJAqvL9R+7+LHBlxfJvA06tcvu8E9gEGEjayWcDJ+Rx/w5cUzH91cD5uXsZsG1h3Na5rOolvQOAmd0MnwmcspK8PgIclrvPBCbn7rac7p6FaX8NfKqKaf8EnFEYd0g6RHrM00zggG6GTwU+28M8/wN8q7Cte1x+nuYY0tV6T+P3Bn4JzAUWky62SvvOkxSOKWBTYAnpZNSXtH9U2lcryrww76NBqimMzOMG5DxsXph+P+DJ3D0ReLZiW4yvONbO7yEvZ5KOzXkVn+WlbVCxrT8ETCnMPyDv36cVtvWZhfGHltYHMJYUENYqjD8ZuL2QzuOFcSNyWUa/lf2km+kmAO2F/suAL+XunfI2HtTH9bwYGNxNGi8Be/eWj2Y1H20OfDJXveZJmkfaYTcBkHSKOpuW5gHbkaJ9yYxulvliqSMiFuXOFdqsVzLtJsArhWE9pVXyMumKrhbtEbG4kJ/pwDTSFfxQ4EhS7QXSeju2Yr3t210e1PUGb7dPZERq8poVEcsi4k7SVcoxefRC0s5eNIJ0Muhu/AhgYemoqkKX9azUzPdgoZzb0nUfqPRioXsRPW/73qbdpCIfvW373owFXgGQtJdS82W7pPmkE0qP5ZC0kaRrJb0gaQHpIqLH6SPizog4NiJGk67+DwLOy6M3A/6vsA4fJp28NuhjOV4Fhncz/PCIGA68h1Q7XC8P3whYCyhut5veQnorMyUi1i1+gFk9TNtlW0bEctLJudvxwHOF7s1J5XipUI7vAcWaauU+BL3vcyuQNEzS5UrNiguAP9B1W/8EODF3n0S6//cmfVvPL0XEG90kO5wUTHvUrKAwA/haxQYeGhE/l7Q5KUJ+FFg/b/hHKLQHknbsRpgNrJdPxiWb9jL974H3dlNtL1pEqo2UbFQxvruylJqQjgIey4EC0nq7smK9DYuIr1cuIAo3eKPvT2QEnev5UWCHiia4HfLw0vgdC+N2LIyrRnk9KD0BdQmpil/aBx6n6z7QCLNJ1e6S3rZ9tyTtSTp5TMmDrgF+BWwaESOBy+ksR3fb/hukq/ntI2IEqWbbp3JHxN2kZoft8qCZwD9U7C9DIuLFHtKu9BDw9l7S+wPp6r70dNJLpJrvNoX0RuZyV867NJezt2OjFrMpbL98P2BcT+NJAbRkBum4Xa9QjhERsQP19RlSE9DueVsfVBwZEVNy3vchnQ9KTUd9Wc8rbN98boVUg+zRqggKg/JNkdKnjXTS/4ikPZQMk3SYpOHAMFKB2gEknU7nTt5QkdrcppJuXg+WtBepHbgnV5J2oF9J2lbpxt76+QbPoXmaB4B/UrqxdQjw7j5k5RpSe+fZdNYSIB2AR0h6b17eEKWb1eO6XcpKSDpK0qi8DXYHziE9cQSpyW0ZcI6ktSR9NA//Q/7+KfAJSWMlbQJ8knRVWw/r0LkPSOkhgW3rtOzeXAt8TNImSjdkP93XGSWNVLoZ/jPgxxExLY8aTqp9Ls4B4/jCbHOAUOEx4Dz9a8D83Gb/qV7SfLfSTc8Ncv87SPvr3/Ik3wf+U/kRVUkbqPOGfXdpV/odsJs6nzjrzoXAoZK2i4hlpKB3kaQxeb8ap55vwj4InJj35cNItd56uQnYKe/jbcDHgeK9t9K2HitpfVLTLAARMQP4I/BtpUfOByg9wrt/DfkZ3M15cDgp+Lya8/Af3cx3JekC6bWI+FvO31tdzyXvBn6faxs9WhVB4RZS+1zpc35ETAX+mdS++irpRtVpkJo0SE+y/JUUEbcH/rIK8llyIrAXqWnoq8AvSFc0K4j0VMZE0lXs7cAC0k3q0UDpyYFzSQfqvLzs61eWgYiYTSr/3jn90vAZpNrD50knzBmkE1e12/
F40rrvIJ3kvxH59xy56vl+0hNW80httO8vVEl/QHpU8WFSTe7mPKxmEfEQ6Ybo3aQrum3pXJ+NdAkpGD5Muol7M+mKrDe/Vfq9xfOkm5rfIjURlZwN/JekDtJ2u7Y0IiI6gP8C7srNABOAL5JufM4HbiTVMnryKnA08EjOwy15+d/J4y8AbgUm5fTvBHbrJe0uIj1d9Gd6uTDKtY6r6bwZ/klSU8zduQy/o+cbm+fk/M8Djs3lrYuIeAk4jrQ9XibVBIr70CWkx8kfBu4h3ZQuOol0gfoYaT3/ktpqMrfR9Tz4b6TtMzLn706gu2ben5Iuiq+sGP5W1nPJiaQLhV6p+ibgNYOkX5BuKn2x2XmxVUvSEaRHct/W7Lw0i6TtgcsiYs9m52VNlJum5wDbRcQzNSxnZ+C/I2KltTEHhQpKz2u/AjxDasK5HtgrIu5vasas4fIBuB+p1rcx8BvgjxHRYxOOWSNJ+gzpCc63/DuIavW3X4+uChuRHlNcn3Sj7mwHhDWGSL+JuI7Urn8T6XcyZqucpJmk3y4ctUrTdU3BzMxK+v27j8zMrH5Wq+aj0aNHx/jx45udDTOzfuPee++dGxG1vOqmi9UqKIwfP56pU6c2OxtmZv2GpOdWPlXfufnIzMzKHBTMzKzMQcHMzMocFMzMrMxBwczMyhwUzMyszEHBzMzKWiMoLFoEV14JfmWHmVlNWiMofPKTcMopMHlys3NiZtavtUZQmJn/erWjo/fpzMysV60RFErU6L/wNTNrba0RFHwvwcysLlorKLimYGZWEwcFMzMrc1AwM7MyBwUzMytzUDAzszIHBTMzK3NQMDOzMgcFMzMrc1AwM7Oy1ggKJQ4KZmY1aY2g4NdcmJnVRWsFBdcUzMxq4qBgZmZlDgpmZlbmoGBmZmUOCmZmVtZaQcHMzGrSWkHBwcHMrCYOCmZmVtbQoCDp45IelfSIpJ9LGtKQhBwUzMzqomFBQdJY4BxgQkRsBwwEjm9UeoCDgplZjRrdfNQGrC2pDRgKzGpoasuXN3TxZmatrmFBISJeAL4NPA/MBuZHxO8qp5N0lqSpkqa2t7dXm1jXbzMzq0ojm49GAUcBWwCbAMMknVQ5XURcGhETImLCmDFjqkvMQcHMrC4a2Xw0EXgmItoj4k3g18DeDUnJQcHMrC4aGRSeB/aUNFSSgPcA0xqSkoOCmVldNPKewl3AdcB9wMM5rUsblFjXbzMzq0pbIxceEV8EvtjINHJCXb/NzKwq/kWzmZmVOSiYmVmZg4KZmZW1RlAo8S+azcxq0hpBwTUFM7O6cFAwM7MyBwUzMytzUDAzszIHBTMzK3NQMDOzMgcFMzMrc1AwM7MyBwUzMytrjaBQ4qBgZlaT1ggKpWDg11yYmdWktYKCawpmZjVxUDAzszIHBTMzK3NQMDOzMgcFMzMrc1AwM7MyBwUzMytzUDAzszIHBTMzK2uNoFDioGBmVpPWCAp+zYWZWV20VlBwTcHMrCYOCmZmVuagYGZmZQ4KZmZW5qBgZmZlDgpmZlbmoGBmZmUOCmZmVtbQoCBpXUnXSXpc0jRJezUkIQcFM7O6aGvw8r8L3BoRx0gaDAxtaGoOCmZmNWlYUJA0AtgfOA0gIt4A3mhIYn7NhZlZXTSy+WhLoB34kaT7JV0uaVjlRJLOkjRV0tT29vbqUnLzkZlZXTQyKLQBuwCXRMTOwGvA5yoniohLI2JCREwYM2ZMdSk5KJiZ1UUjg8JMYGZE3JX7ryMFifpzUDAzq4uGBYWIeBGYIWmbPOg9wGMNSqzrt5mZVaXRTx/9K3B1fvLoaeD0hqbmoGBmVpOGBoWIeACY0Mg0ckJdv83MrCr+RbOZmZU5KJiZWVlrBIUSBwUzs5q0RlDwL5rNzOqitYKCawpmZjVpjaAwc2b6dlAwM6tJawSF9ddP3w4KZmY1aY2gUOKgYGZWk9YJCpKDgplZjRwUzMyszEHBzMzKHBTMzKzMQcHMzMpaJygMGO
CgYGZWo9YJCpJfc2FmVqPWCgquKZiZ1aRPQUHSlX0Z1lQOCmZmNetrTeFdxR5JA4Fd65+dGjgomJnVrNegIOk8SR3ADpIW5E8HMAe4YZXksK8cFMzMatZrUIiI/4qI4cC3ImJE/gyPiPUj4rxVlMe+cVAwM6tZX5uPbpI0DEDSSZIukLR5A/P11jkomJnVrK9B4RJgkaQdgc8AzwE/bViuquGgYGZWs74GhaUREcBRwHcj4rvA8MZlqwoOCmZmNWvr43Qdks4DTgb2y08fDWpctqrgoGBmVrO+1hSOA5YAH4qIF4GxwLcalqtq+DUXZmY161NQyIHgamCkpMOBxRGx+t1T8GsuzMxq0tdfNH8QuBs4FvggcJekYxqZsbfMzUdmZjXr6z2FLwC7RcQcAEljgN8D1zUqY2+Zg4KZWc36ek9hQCkgZC+/hXlXDQcFM7Oa9bWmcKuk24Cf5/7jgFsak6UqOSiYmdWs16AgaStgw4j4tKQPAPsCAv5KuvG8+nBQMDOr2cqagC4COgAi4tcR8YmI+DiplnBRozP3ljgomJnVbGVBYXxEPFQ5MCKmAuMbkqNqOSiYmdVsZUFhSC/j1q5nRmrmoGBmVrOVBYV7JP1z5UBJZwD39iUBSQMl3S/ppmoy2Gf+RbOZWc1W9vTRx4DfSDqRziAwARgMHN3HNM4FpgEjqsphX/kXzWZmNes1KETES8Dekg4EtsuDb46IP/Rl4ZLGAYcBXwM+UUtG+5CYawpmZjXq0+8UIuIO4I4qln8R6f8XenzNtqSzgLMANttssyqSKC/IQcHMrEYN+1VyfnHenIjo9d5DRFwaERMiYsKYMWNqSdBBwcysRo18VcU+wJGSngWuAQ6SdFXDUnNQMDOrWcOCQkScFxHjImI8cDzwh4g4qVHpOSiYmdVu9XqpXS0cFMzMatbXF+LVJCImA5MbmoiDgplZzVxTMDOzMgcFMzMra52g4NdcmJnVrHWCgl9zYWZWs9YKCq4pmJnVxEHBzMzKHBTMzKzMQcHMzMocFMzMrMxBwczMyhwUzMyszEHBzMzKHBTMzKysdYKCX3NhZlaz1gkKfs2FmVnNWisouKZgZlYTBwUzMytzUDAzszIHBTMzK3NQMDOzMgcFMzMrc1AwM7MyBwUzMytzUDAzs7LWCQoDBvgXzWZmNWqdoDBwICxb1uxcmJn1a60TFAYNgqVLm50LM7N+rXWCQlsbvPlms3NhZtavtU5QGDTIQcHMrEatExTa2tx8ZGZWo9YJCq4pmJnVzEHBzMzKWicouPnIzKxmrRMUXFMwM6tZw4KCpE0l3SFpmqRHJZ3bqLQABwUzszpoa+CylwKfjIj7JA0H7pV0e0Q81pDU3HxkZlazhtUUImJ2RNyXuzuAacDYRqXnmoKZWe1WyT0FSeOBnYG7uhl3lqSpkqa2t7dXn0gpKPhNqWZmVWt4UJC0DvAr4GMRsaByfERcGhETImLCmDFjqk+oLbeE+U2pZmZVa2hQkDSIFBCujohfNzItBg1K325CMjOrWiOfPhJwBTAtIi5oVDplpaDgm81mZlVrZE1hH+Bk4CBJD+TPoQ1LrdR85JqCmVnVGvZIakRMAdSo5a/AzUdmZjVrrV80g5uPzMxq0DpBwc1HZmY1a52g4OYjM7OatV5QcPORmVnVWicouPnIzKxmrRMU3HxkZlaz1gsKbj4yM6ta6wQFNx+ZmdWsdYKCm4/MzGrWekHBzUdmZlVrnaDg5iMzs5q1TlBw85GZWc1aLyi4+cjMrGqtExTcfGRmVrPWCQpuPjIzq1nrBIVRo2DAALjzzmbnxMys32qdoDB6NJx2Glx+OSxf3uzcmJn1S60TFAC22SY1H73+erNzYmbWL7VWUBg+PH13dDQ3H2Zm/ZSDgpmZlbVmUFi4sLn5MDPrp1orKKyzTvp2TcHMrCqtFRTcfGRmVpPWDAqHHw5nnw3PPNPc/JiZ9TOtFRRKzUcA3/8+bLstRDQvP2Zm/UxrBY
\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# gradient descent\n",
"detailed_logger = False\n",
"main_logger = True\n",
"main_logger_output_epochs = 100\n",
"L2 = False\n",
"Dropout = False\n",
"momentum = False\n",
"adam = True\n",
"hidden_layer_relu = True\n",
"hidden_layer_tanh = False\n",
"hidden_layer_sigmoid = False\n",
"\n",
"# hyper-parameters\n",
"alpha = .1\n",
"epsilon = .85 # overwritten by the epsilon assignment below\n",
"keep_prob = .9\n",
"number_of_epochs = 500\n",
"batch_size = 50\n",
"momentum_coef = .9\n",
"RMSProp_coef = .9\n",
"epsilon = 1e-20\n",
"t = 0\n",
"\n",
"# copy initialization\n",
"W = Weights.copy()\n",
"B = Bias.copy()\n",
"\n",
"# data arrays\n",
"cost_array = []\n",
"accuracy_array = []\n",
"interation_array = []\n",
"\n",
"# rename\n",
"X_train = np.float64(training_images).copy()\n",
"Y_train = np.float64(training_labels).copy()\n",
"\n",
"X_test = np.float64(testing_images).copy()\n",
"Y_test = np.float64(testing_labels).copy()\n",
"\n",
"#m = size\n",
"m = number_of_training_images\n",
"\n",
"def model(W, B, A):\n",
"    return np.dot(W, A) + B\n",
"\n",
"def activation_relu(Z):\n",
"    # replace NaN and infinite pre-activations with 0 before applying ReLU\n",
"    Z = np.where(~np.isnan(Z), Z, 0)\n",
"    Z = np.where(~np.isinf(Z), Z, 0)\n",
"    return np.where(Z > 0, Z, 0)\n",
"\n",
"def activation_tanh(Z):\n",
"    return np.tanh(Z)\n",
"\n",
"def activation_sigmoid(Z):\n",
"    return 1/(1 + np.exp(-Z))\n",
"\n",
"def loss(A, Y):\n",
"    # binary cross-entropy per example; epsilon keeps log() away from 0\n",
"    epsilon = 1e-20\n",
"    return np.where((Y == 1), np.multiply(-Y, np.log(A + epsilon)), -np.multiply((1 - Y), np.log(1 - A + epsilon)))\n",
"    #return np.multiply(-Y, np.log(A)) - np.multiply((1 - Y), np.log(1 - A))\n",
"\n",
"def cost(L):\n",
"    return np.multiply(1/L.shape[1], np.sum(L))\n",
"\n",
"def cost_L2(L, W, epsilon):\n",
"    # L2 penalty summed over every layer's weights, scaled by the number of examples in L\n",
"    penalty = np.multiply(epsilon/(2*L.shape[1]), sum(np.sum(np.multiply(_W, _W)) for _W in W))\n",
"    J = cost(L)\n",
"    return penalty + J\n",
"\n",
"def prediction(A):\n",
"    return np.where(A >= 0.5, 1, 0)\n",
"\n",
"def accuracy(prediction, Y):\n",
"    return 100 - np.multiply(100/Y.shape[0], np.sum(np.absolute(Y - prediction)))\n",
" \n",
"def forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob):\n",
"    if(layer < len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        Z_layers.append(Z)\n",
"        if(hidden_layer_relu == True):\n",
"            A = activation_relu(Z)\n",
"        elif(hidden_layer_tanh == True):\n",
"            A = activation_tanh(Z)\n",
"        elif(hidden_layer_sigmoid == True):\n",
"            A = activation_sigmoid(Z)\n",
"        if(Dropout == True):\n",
"            _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
"            D.append(_D)\n",
"            A = np.multiply(A, _D)\n",
"        A_layers.append(A)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Training Data: ' + str(layer))\n",
"        A_layers, Z_layers, D = forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob)\n",
"    elif(layer == len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        Z_layers.append(Z)\n",
"        A = activation_sigmoid(Z)\n",
"        if(Dropout == True):\n",
"            _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
"            D.append(_D)\n",
"            A = np.multiply(A, _D)\n",
"        A_layers.append(A)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Training Data: ' + str(layer))\n",
"            print('Forward Propagation Training Data Complete')\n",
"    return A_layers, Z_layers, D\n",
"\n",
"def forward_propagation(W, B, A, layer):\n",
"    if(layer < len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        if(hidden_layer_relu == True):\n",
"            A = activation_relu(Z)\n",
"        elif(hidden_layer_tanh == True):\n",
"            A = activation_tanh(Z)\n",
"        elif(hidden_layer_sigmoid == True):\n",
"            A = activation_sigmoid(Z)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Testing Data: ' + str(layer))\n",
"        A = forward_propagation(W, B, A, layer)\n",
"    elif(layer == len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        A = activation_sigmoid(Z)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Testing Data: ' + str(layer))\n",
"            print('Forward Propagation Testing Data Complete')\n",
"    return A\n",
"\n",
"def dZ(dZ, W, Z):\n",
"    # replace NaN and infinite entries with 0 before backpropagating\n",
"    Z = np.where(~np.isnan(Z), Z, 0)\n",
"    W = np.where(~np.isnan(W), W, 0)\n",
"    dZ = np.where(~np.isnan(dZ), dZ, 0)\n",
"    Z = np.where(~np.isinf(Z), Z, 0)\n",
"    W = np.where(~np.isinf(W), W, 0)\n",
"    dZ = np.where(~np.isinf(dZ), dZ, 0)\n",
"    if(hidden_layer_relu == True):\n",
"        return np.multiply(np.dot(np.transpose(W), dZ), np.where(Z > 0, 1, 0))\n",
"    elif(hidden_layer_tanh == True):\n",
"        A = activation_tanh(Z)\n",
"        return np.multiply(np.dot(np.transpose(W), dZ), 1 - np.multiply(A, A))\n",
"    elif(hidden_layer_sigmoid == True):\n",
"        A = activation_sigmoid(Z)\n",
"        return np.multiply(np.dot(np.transpose(W), dZ), np.multiply(A, (1-A)))\n",
"\n",
"def dW(dZ, A):\n",
"    return np.multiply(1/dZ.shape[1], np.dot(dZ, np.transpose(A)))\n",
"\n",
"def dW_L2(dZ, A, W, epsilon):\n",
"    # scale the penalty by the number of examples (columns of dZ)\n",
"    return np.multiply(epsilon/dZ.shape[1], W) + dW(dZ, A)\n",
"\n",
"def dB(dZ):\n",
"    return np.multiply(1/dZ.shape[1], np.sum(dZ))\n",
"\n",
"def backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB, R_dW, R_dB, t):\n",
"    if(layer >= 0):\n",
"        if(layer == len(W) - 1):\n",
"            _dZ = A_layers[layer+1] - Y\n",
"        elif(layer >= 0):\n",
"            _dZ = dZ(_dZ, W[layer+1], Z_layers[layer])\n",
"        if(Dropout == True):\n",
"            _dZ = np.multiply(_dZ, D[layer])\n",
"        if(L2 == True):\n",
"            _dW = dW_L2(_dZ, A_layers[layer], W[layer], epsilon)\n",
"        else:\n",
"            _dW = dW(_dZ, A_layers[layer])\n",
"        _dB = dB(_dZ)\n",
"        if(adam == True):\n",
"            epsilon = 1e-6\n",
"\n",
"            # ADAM - RMSProp + Momentum\n",
"            V_dW[layer] = np.multiply(momentum_coef, V_dW[layer]) + np.multiply(1-momentum_coef, _dW)\n",
"            V_dB[layer] = np.multiply(momentum_coef, V_dB[layer]) + np.multiply(1-momentum_coef, _dB)\n",
"            R_dW[layer] = np.multiply(RMSProp_coef, R_dW[layer]) + np.multiply(1-RMSProp_coef, np.multiply(_dW, _dW))\n",
"            R_dB[layer] = np.multiply(RMSProp_coef, R_dB[layer]) + np.multiply(1-RMSProp_coef, np.multiply(_dB, _dB))\n",
"\n",
"            # index decay in bias correction\n",
"            t = t + 1\n",
"\n",
"            # correct bias for initial rounds\n",
"            V_dW[layer] = np.multiply(V_dW[layer], 1/(1-np.power(momentum_coef, t)))\n",
"            V_dB[layer] = np.multiply(V_dB[layer], 1/(1-np.power(momentum_coef, t)))\n",
"            R_dW[layer] = np.multiply(R_dW[layer], 1/(1-np.power(RMSProp_coef, t)))\n",
"            R_dB[layer] = np.multiply(R_dB[layer], 1/(1-np.power(RMSProp_coef, t)))\n",
"\n",
"            val1 = 1/(np.sqrt(R_dW[layer]) + epsilon)\n",
"            val2 = 1/(np.sqrt(R_dB[layer]) + epsilon)\n",
"\n",
"            W[layer] = W[layer] - np.multiply(alpha, np.multiply(V_dW[layer], val1))\n",
"            B[layer] = B[layer] - np.multiply(alpha, np.multiply(V_dB[layer], val2))\n",
"        elif(momentum == True):\n",
"            V_dW[layer] = np.multiply(momentum_coef, V_dW[layer]) + np.multiply(alpha, _dW)\n",
"            V_dB[layer] = np.multiply(momentum_coef, V_dB[layer]) + np.multiply(alpha, _dB)\n",
"            W[layer] = W[layer] - V_dW[layer]\n",
"            B[layer] = B[layer] - V_dB[layer]\n",
"        else:\n",
"            W[layer] = W[layer] - np.multiply(alpha, _dW)\n",
"            B[layer] = B[layer] - np.multiply(alpha, _dB)\n",
"        if(detailed_logger == True):\n",
"            print('Backward Layer: ' + str(layer))\n",
"        layer = layer - 1\n",
"        W, B, t = backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB, R_dW, R_dB, t)\n",
"    if(detailed_logger == True):\n",
"        print('Backward Propagation Complete')\n",
"    return W, B, t\n",
" \n",
"\n",
"def shuffle(X, Y, number_of_training_images):\n",
"    random_array = np.random.permutation(np.arange(number_of_training_images))\n",
"    return X[:, random_array], Y[random_array]\n",
" \n",
"start_time = time.time() \n",
"# main loop\n",
"for epoch in range(1, number_of_epochs):\n",
"\n",
"    # logger\n",
"    if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
"        print('Main Loop Epoch: ' + str(epoch))\n",
"\n",
"    # safety check\n",
"    if(adam == True and momentum == True):\n",
"        print(\"ERROR! Please Select Either Adam OR Momentum OR Neither, Not Both.\")\n",
"        break\n",
"\n",
"    # safety check\n",
"    if(hidden_layer_relu + hidden_layer_tanh + hidden_layer_sigmoid != 1):\n",
"        print(\"ERROR! Please Select Only 1 Hidden Layer Activation Function\")\n",
"        break\n",
"\n",
"    # shuffle data\n",
"    X, Y = shuffle(X_train.copy(), Y_train.copy(), number_of_training_images)\n",
"    number_of_batches = int(np.floor(number_of_training_images/batch_size))\n",
"    split_index = number_of_batches*batch_size\n",
"\n",
"    # parse into minibatches\n",
"    X_minibatches = np.split(X[:, 0:split_index], number_of_batches, axis=1)\n",
"    if not(split_index == number_of_training_images):\n",
"        X_left_over_portion = X[:, split_index:number_of_training_images]\n",
"        X_minibatches.append(X_left_over_portion)\n",
"\n",
"    Y_minibatches = np.split(Y[0:split_index], number_of_batches, axis=0)\n",
"    if not(split_index == number_of_training_images):\n",
"        Y_left_over_portion = Y[split_index:number_of_training_images]\n",
"        Y_minibatches.append(Y_left_over_portion)\n",
"\n",
"    number_of_minibatches = len(Y_minibatches)\n",
"\n",
"    # logger\n",
"    if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
"        print('Number Of Minibatches: ' + str(number_of_minibatches))\n",
"\n",
"    for index in range(0, number_of_minibatches-1):\n",
"        X_minibatch = X_minibatches[index]\n",
"        Y_minibatch = Y_minibatches[index]\n",
"\n",
"        # forward propagation training data set\n",
"        A_layers, Z_layers, D = forward_propagation_return_layers(W, B, X_minibatch, [X_minibatch], [], 0, [], keep_prob)\n",
"        L = loss(A_layers[len(A_layers) - 1], Y_minibatch)\n",
"        if(L2 == True):\n",
"            C = cost_L2(L, W, epsilon)\n",
"        else:\n",
"            C = cost(L)\n",
"\n",
"        # backpropagation\n",
"        W, B, t = backward_propagation(W, B, Y_minibatch, A_layers, Z_layers, 0, alpha, epsilon, len(W) - 1, D, V_dW, V_dB, R_dW, R_dB, t)\n",
"\n",
"    if(epoch % main_logger_output_epochs == 0):\n",
"        print('Cost: ' + str(C))\n",
"\n",
"    # forward propagation test data set\n",
"    A_test = forward_propagation(W, B, X_test, 0)\n",
"\n",
"    # accuracy\n",
"    _prediction = prediction(A_test)\n",
"    _accuracy = accuracy(_prediction, Y_test)\n",
"\n",
"    # storage for plotting\n",
"    cost_array.append(C)\n",
"    accuracy_array.append(_accuracy)\n",
"    interation_array.append(epoch)\n",
"\n",
"\n",
"end_time = time.time()\n",
"run_time = end_time - start_time\n",
" \n",
"print('')\n",
"print('Results:')\n",
"print('')\n",
" \n",
"print('')\n",
"print('Run Time: ' + str(run_time) + ' seconds')\n",
"print('Cost: ' + str(C)) \n",
"print('Accuracy: ' + str(_accuracy) + ' %') \n",
"print('')\n",
"print('')\n",
"\n",
"\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, cost_array, 'red')\n",
"pyplot.title('Learning Curve - ' + str(len(X[0])) + ' Training Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Cost')\n",
"pyplot.show()\n",
"\n",
"# plot percent accuracy curve\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, accuracy_array, 'red')\n",
"pyplot.title('Percent Accuracy Curve - ' + str(len(X_test[0])) + ' Test Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Percent Accuracy')\n",
"pyplot.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As shown, training on the small validation set worked, so we can now move on to the full data set and begin our evaluation and exploration.\n",
"\n",
"First, we need to split up our full data set into testing and training data. We will use 50,000 images as the training data set and 10,000 images as the testing data set. "
]
},
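{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same 50,000 / 10,000 split can also be expressed with array slicing instead of an index loop. The following is a minimal sketch, assuming a feature matrix with one image per column and a label vector of matching length; `split_columns`, `features`, and `labels` are hypothetical names that do not appear elsewhere in this notebook, and the random data only stands in for the normalized images.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def split_columns(features, labels, n_train, n_test):\n",
"    # first n_train columns go to training, the next n_test columns to testing\n",
"    train_X = features[:, :n_train]\n",
"    train_Y = labels[:n_train]\n",
"    test_X = features[:, n_train:n_train + n_test]\n",
"    test_Y = labels[n_train:n_train + n_test]\n",
"    return train_X, train_Y, test_X, test_Y\n",
"\n",
"# illustrative stand-in data: 60 fake images of 784 pixels each\n",
"features = np.random.rand(784, 60)\n",
"labels = np.random.randint(0, 2, size=60)\n",
"train_X, train_Y, test_X, test_Y = split_columns(features, labels, 50, 10)\n",
"print(train_X.shape, train_Y.shape, test_X.shape, test_Y.shape)\n",
"```\n",
"\n",
"Basic slicing returns views rather than copies, so no image data is duplicated until the arrays are modified or explicitly copied."
]
},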
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(784, 50000)\n",
"(50000,)\n",
"(784, 10000)\n",
"(10000,)\n"
]
}
],
"source": [
"# create a data set\n",
"size = vector_size\n",
"\n",
"number_of_testing_images = 10000\n",
"number_of_training_images = 50000\n",
"number_of_validation_images = number_of_testing_images + number_of_training_images\n",
"\n",
"training_images = []\n",
"training_labels = []\n",
"testing_images = []\n",
"testing_labels = []\n",
"\n",
"factor = 0\n",
"for index in range(0, number_of_validation_images):\n",
"    if(index <= number_of_training_images - 1):\n",
"        training_images.append(normalized_scaled_images_feature_matrix[:, index + factor])\n",
"        training_labels.append(binary_labels[index + factor])\n",
"    else:\n",
"        testing_images.append(normalized_scaled_images_feature_matrix[:, index + factor])\n",
"        testing_labels.append(binary_labels[index + factor])\n",
"\n",
"# convert to numpy array\n",
"training_images = np.transpose(np.array(training_images))\n",
"training_labels = np.array(training_labels)\n",
"testing_images = np.transpose(np.array(testing_images))\n",
"testing_labels = np.array(testing_labels)\n",
"\n",
"# logger\n",
"print(training_images.shape) # training_images is a 784 x 50000 matrix\n",
"print(training_labels.shape) # training_labels is a vector of length 50000\n",
"print(testing_images.shape) # testing_images is a 784 x 10000 matrix\n",
"print(testing_labels.shape) # testing_labels is a vector of length 10000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we must reset our weights and biases, along with the ADAM velocity and RMSProp accumulators. "
]
},
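{
"cell_type": "markdown",
"metadata": {},
"source": [
"The reset amounts to rebuilding one weight matrix, one bias vector, and zeroed moment buffers for every layer. The following is a minimal sketch, assuming the same uniform range of [-0.1, 0.1] and the layer sizes 784, 20, 10, 1 used in this notebook; `init_layers` and the dictionary keys are hypothetical names, and the sketch uses NumPy's `default_rng` rather than the global seed, so it will not reproduce the exact values of the cell below.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def init_layers(sizes, low=-0.1, high=0.1, seed=10):\n",
"    # one dict per layer: weights, bias, and zeroed first/second moment buffers\n",
"    rng = np.random.default_rng(seed)\n",
"    layers = []\n",
"    for n_in, n_out in zip(sizes[:-1], sizes[1:]):\n",
"        W = rng.uniform(low, high, (n_out, n_in))\n",
"        b = rng.uniform(low, high, (n_out, 1))\n",
"        layers.append({\n",
"            'W': W, 'b': b,\n",
"            'm_W': np.zeros_like(W), 'v_W': np.zeros_like(W),\n",
"            'm_b': np.zeros_like(b), 'v_b': np.zeros_like(b),\n",
"        })\n",
"    return layers\n",
"\n",
"layers = init_layers([784, 20, 10, 1])\n",
"for layer in layers:\n",
"    print(layer['W'].shape, layer['b'].shape)\n",
"```\n",
"\n",
"Keeping the layers in a plain Python list avoids building a single NumPy array out of differently shaped matrices."
]
},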
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature Size: 784\n",
"Weights Shape: (20, 784)\n",
"Bias Shape: (20, 1)\n",
"Velocity Weights Shape: (20, 784)\n",
"Velocity Bias Shape: (20, 1)\n",
"RMSProp Weights Shape: (20, 784)\n",
"RMSProp Bias Shape: (20, 1)\n"
]
}
],
"source": [
"# initialize weights & bias\n",
"np.random.seed(10)\n",
"print('Feature Size: ' + str(size))\n",
"\n",
"lower_bound = -.1\n",
"upper_bound = .1\n",
"\n",
"#mean = 0.015\n",
"#std = 0.005\n",
"\n",
"# hyper-parameters: hidden layers\n",
"hidden_layers = 2\n",
"units_array = [20, 10]\n",
"Weights = []\n",
"Bias = []\n",
"V_dW = []\n",
"V_dB = []\n",
"R_dW = []\n",
"R_dB = []\n",
"for i in range(0, hidden_layers):\n",
"    if(i == 0):\n",
"        _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], size]))\n",
"        _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
"        _V_dW = np.float64(np.zeros([units_array[i], size]))\n",
"        _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
"        _R_dW = np.float64(np.zeros([units_array[i], size]))\n",
"        _R_dB = np.float64(np.zeros([units_array[i], 1]))\n",
"        Weights.append(_W)\n",
"        Bias.append(_B)\n",
"        V_dW.append(_V_dW)\n",
"        V_dB.append(_V_dB)\n",
"        R_dW.append(_R_dW)\n",
"        R_dB.append(_R_dB)\n",
"    else:\n",
"        _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], units_array[i-1]]))\n",
"        _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
"        _V_dW = np.float64(np.zeros([units_array[i], units_array[i-1]]))\n",
"        _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
"        _R_dW = np.float64(np.zeros([units_array[i], units_array[i-1]]))\n",
"        _R_dB = np.float64(np.zeros([units_array[i], 1]))\n",
"        Weights.append(_W)\n",
"        Bias.append(_B)\n",
"        V_dW.append(_V_dW)\n",
"        V_dB.append(_V_dB)\n",
"        R_dW.append(_R_dW)\n",
"        R_dB.append(_R_dB)\n",
" \n",
"# output layer\n",
"_W = np.float64(np.random.uniform(lower_bound, upper_bound, [1, units_array[i]]))\n",
"_b = np.float64(np.random.uniform(lower_bound, upper_bound)) # b will be added in a broadcasting manner\n",
"_V_dW = np.float64(np.zeros([1, units_array[i]]))\n",
"_V_dB = np.float64(np.zeros(1))\n",
"_R_dW = np.float64(np.zeros([1, units_array[i]]))\n",
"_R_dB = np.float64(np.zeros(1))\n",
"Weights.append(_W)\n",
"Bias.append(_b)\n",
"V_dW.append(_V_dW)\n",
"V_dB.append(_V_dB)\n",
"R_dW.append(_R_dW)\n",
"R_dB.append(_R_dB)\n",
"\n",
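"# the per-layer shapes differ, so store each list as an object array\n",
"Weights = np.array(Weights, dtype=object)\n",
"Bias = np.array(Bias, dtype=object)\n",
"V_dW = np.array(V_dW, dtype=object)\n",
"V_dB = np.array(V_dB, dtype=object)\n",
"R_dW = np.array(R_dW, dtype=object)\n",
"R_dB = np.array(R_dB, dtype=object)\n",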
"\n",
"\n",
"for index in range(0, len(Weights) - 1):\n",
"    Weights[index] = np.where(Weights[index] != 0, Weights[index], np.random.uniform(lower_bound, upper_bound))\n",
"\n",
"#print(train_X.shape)\n",
"#print(np.ravel(train_Y).shape)\n",
"\n",
"print('Weights Shape: ' + str(Weights[0].shape)) # matrix with a size of # of units X 784\n",
"print('Bias Shape: ' + str(Bias[0].shape)) # vector with a size of the # of unit\n",
"print('Velocity Weights Shape: ' + str(V_dW[0].shape)) # matrix with a size of # of units X 784\n",
"print('Velocity Bias Shape: ' + str(V_dB[0].shape)) # vector with a size of the # of unit\n",
"print('RMSProp Weights Shape: ' + str(R_dW[0].shape)) # matrix with a size of # of units X 784\n",
"print('RMSProp Bias Shape: ' + str(R_dB[0].shape)) # vector with a size of the # of unit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we re-run minibatch stochastic gradient descent with ADAM on the full data. We will first utilize minibatches of 50 each."
]
},
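{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a single ADAM update in its usual textbook form is sketched below. It is a minimal illustration under stated assumptions, not the code used in the next cell: `adam_step` is a hypothetical helper, `beta2 = 0.999` and `eps = 1e-8` are commonly used defaults (the cell below uses 0.9 and 1e-6), the bias-corrected estimates are kept separate from the running averages, and `step` counts parameter updates rather than layers.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def adam_step(w, grad, m, v, step, alpha=0.01, beta1=0.9, beta2=0.999, eps=1e-8):\n",
"    # update the running first moment (momentum) and second moment (RMSProp)\n",
"    m = beta1 * m + (1 - beta1) * grad\n",
"    v = beta2 * v + (1 - beta2) * grad * grad\n",
"    # bias-correct into separate copies so the running averages stay untouched\n",
"    m_hat = m / (1 - beta1 ** step)\n",
"    v_hat = v / (1 - beta2 ** step)\n",
"    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)\n",
"    return w, m, v\n",
"\n",
"w = np.array([0.5, -0.3])\n",
"m = np.zeros_like(w)\n",
"v = np.zeros_like(w)\n",
"grad = np.array([0.2, -0.4])\n",
"w_new, m, v = adam_step(w, grad, m, v, step=1)\n",
"print(w_new)\n",
"```\n",
"\n",
"On the first step the corrected moments reduce to the gradient and its square, so each weight moves by roughly `alpha` in the direction opposite to the sign of its gradient, regardless of the gradient's magnitude."
]
},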
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Main Loop Epoch: 100\n",
"Number Of Minibatches: 1000\n",
"Cost: 6.328639783644706e-08\n",
"Main Loop Epoch: 200\n",
"Number Of Minibatches: 1000\n",
"Cost: 0.011801970848913768\n",
"Main Loop Epoch: 300\n",
"Number Of Minibatches: 1000\n",
"Cost: 0.021871124168560296\n",
"Main Loop Epoch: 400\n",
"Number Of Minibatches: 1000\n",
"Cost: 0.0\n",
"\n",
"Results:\n",
"\n",
"\n",
"Run Time: 1264.1851885318756 seconds\n",
"Cost: 0.0\n",
"Accuracy: 98.36 %\n",
"\n",
"\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# gradient descent\n",
"detailed_logger = False\n",
"main_logger = True\n",
"main_logger_output_epochs = 100\n",
"L2 = False\n",
"Dropout = False\n",
"momentum = False\n",
"adam = True\n",
"hidden_layer_relu = True\n",
"hidden_layer_tanh = False\n",
"hidden_layer_sigmoid = False\n",
"\n",
"# hyper-parameters\n",
"alpha = .01\n",
"epsilon = .85 # overwritten by the epsilon assignment below\n",
"keep_prob = .9\n",
"number_of_epochs = 500\n",
"batch_size = 50\n",
"momentum_coef = .9\n",
"RMSProp_coef = .9\n",
"epsilon = 1e-20\n",
"t = 0\n",
"\n",
"# copy initialization\n",
"W = Weights.copy()\n",
"B = Bias.copy()\n",
"\n",
"# data arrays\n",
"cost_array = []\n",
"accuracy_array = []\n",
"interation_array = []\n",
"\n",
"# rename\n",
"X_train = np.float64(training_images).copy()\n",
"Y_train = np.float64(training_labels).copy()\n",
"\n",
"X_test = np.float64(testing_images).copy()\n",
"Y_test = np.float64(testing_labels).copy()\n",
"\n",
"#m = size\n",
"m = number_of_training_images\n",
"\n",
"def model(W, B, A):\n",
"    return np.dot(W, A) + B\n",
"\n",
"def activation_relu(Z):\n",
"    # replace NaN and infinite pre-activations with 0 before applying ReLU\n",
"    Z = np.where(~np.isnan(Z), Z, 0)\n",
"    Z = np.where(~np.isinf(Z), Z, 0)\n",
"    return np.where(Z > 0, Z, 0)\n",
"\n",
"def activation_tanh(Z):\n",
"    return np.tanh(Z)\n",
"\n",
"def activation_sigmoid(Z):\n",
"    return 1/(1 + np.exp(-Z))\n",
"\n",
"def loss(A, Y):\n",
"    # binary cross-entropy per example; epsilon keeps log() away from 0\n",
"    epsilon = 1e-20\n",
"    return np.where((Y == 1), np.multiply(-Y, np.log(A + epsilon)), -np.multiply((1 - Y), np.log(1 - A + epsilon)))\n",
"    #return np.multiply(-Y, np.log(A)) - np.multiply((1 - Y), np.log(1 - A))\n",
"\n",
"def cost(L):\n",
"    return np.multiply(1/L.shape[1], np.sum(L))\n",
"\n",
"def cost_L2(L, W, epsilon):\n",
"    # L2 penalty summed over every layer's weights, scaled by the number of examples in L\n",
"    penalty = np.multiply(epsilon/(2*L.shape[1]), sum(np.sum(np.multiply(_W, _W)) for _W in W))\n",
"    J = cost(L)\n",
"    return penalty + J\n",
"\n",
"def prediction(A):\n",
"    return np.where(A >= 0.5, 1, 0)\n",
"\n",
"def accuracy(prediction, Y):\n",
"    return 100 - np.multiply(100/Y.shape[0], np.sum(np.absolute(Y - prediction)))\n",
" \n",
"def forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob):\n",
"    if(layer < len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        Z_layers.append(Z)\n",
"        if(hidden_layer_relu == True):\n",
"            A = activation_relu(Z)\n",
"        elif(hidden_layer_tanh == True):\n",
"            A = activation_tanh(Z)\n",
"        elif(hidden_layer_sigmoid == True):\n",
"            A = activation_sigmoid(Z)\n",
"        if(Dropout == True):\n",
"            _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
"            D.append(_D)\n",
"            A = np.multiply(A, _D)\n",
"        A_layers.append(A)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Training Data: ' + str(layer))\n",
"        A_layers, Z_layers, D = forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob)\n",
"    elif(layer == len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        Z_layers.append(Z)\n",
"        A = activation_sigmoid(Z)\n",
"        if(Dropout == True):\n",
"            _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
"            D.append(_D)\n",
"            A = np.multiply(A, _D)\n",
"        A_layers.append(A)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Training Data: ' + str(layer))\n",
"            print('Forward Propagation Training Data Complete')\n",
"    return A_layers, Z_layers, D\n",
"\n",
"def forward_propagation(W, B, A, layer):\n",
"    if(layer < len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        if(hidden_layer_relu == True):\n",
"            A = activation_relu(Z)\n",
"        elif(hidden_layer_tanh == True):\n",
"            A = activation_tanh(Z)\n",
"        elif(hidden_layer_sigmoid == True):\n",
"            A = activation_sigmoid(Z)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Testing Data: ' + str(layer))\n",
"        A = forward_propagation(W, B, A, layer)\n",
"    elif(layer == len(W) - 1):\n",
"        Z = model(W[layer], B[layer], A)\n",
"        A = activation_sigmoid(Z)\n",
"        layer = layer + 1\n",
"        if(detailed_logger == True):\n",
"            print('Forward Layer Testing Data: ' + str(layer))\n",
"            print('Forward Propagation Testing Data Complete')\n",
"    return A\n",
"\n",
"def dZ(dZ, W, Z):\n",
"    # replace NaN and infinite entries with 0 before backpropagating\n",
"    Z = np.where(~np.isnan(Z), Z, 0)\n",
"    W = np.where(~np.isnan(W), W, 0)\n",
"    dZ = np.where(~np.isnan(dZ), dZ, 0)\n",
"    Z = np.where(~np.isinf(Z), Z, 0)\n",
"    W = np.where(~np.isinf(W), W, 0)\n",
"    dZ = np.where(~np.isinf(dZ), dZ, 0)\n",
"    if(hidden_layer_relu == True):\n",
"        return np.multiply(np.dot(np.transpose(W), dZ), np.where(Z > 0, 1, 0))\n",
"    elif(hidden_layer_tanh == True):\n",
"        A = activation_tanh(Z)\n",
"        return np.multiply(np.dot(np.transpose(W), dZ), 1 - np.multiply(A, A))\n",
"    elif(hidden_layer_sigmoid == True):\n",
"        A = activation_sigmoid(Z)\n",
"        return np.multiply(np.dot(np.transpose(W), dZ), np.multiply(A, (1-A)))\n",
"\n",
"def dW(dZ, A):\n",
"    return np.multiply(1/dZ.shape[1], np.dot(dZ, np.transpose(A)))\n",
"\n",
"def dW_L2(dZ, A, W, epsilon):\n",
"    # scale the penalty by the number of examples (columns of dZ)\n",
"    return np.multiply(epsilon/dZ.shape[1], W) + dW(dZ, A)\n",
"\n",
"def dB(dZ):\n",
"    return np.multiply(1/dZ.shape[1], np.sum(dZ))\n",
"\n",
"def backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB, R_dW, R_dB, t):\n",
"    if(layer >= 0):\n",
"        if(layer == len(W) - 1):\n",
"            _dZ = A_layers[layer+1] - Y\n",
"        elif(layer >= 0):\n",
"            _dZ = dZ(_dZ, W[layer+1], Z_layers[layer])\n",
"        if(Dropout == True):\n",
"            _dZ = np.multiply(_dZ, D[layer])\n",
"        if(L2 == True):\n",
"            _dW = dW_L2(_dZ, A_layers[layer], W[layer], epsilon)\n",
"        else:\n",
"            _dW = dW(_dZ, A_layers[layer])\n",
"        _dB = dB(_dZ)\n",
"        if(adam == True):\n",
"            epsilon = 1e-6\n",
"\n",
"            # ADAM - RMSProp + Momentum\n",
"            V_dW[layer] = np.multiply(momentum_coef, V_dW[layer]) + np.multiply(1-momentum_coef, _dW)\n",
"            V_dB[layer] = np.multiply(momentum_coef, V_dB[layer]) + np.multiply(1-momentum_coef, _dB)\n",
"            R_dW[layer] = np.multiply(RMSProp_coef, R_dW[layer]) + np.multiply(1-RMSProp_coef, np.multiply(_dW, _dW))\n",
"            R_dB[layer] = np.multiply(RMSProp_coef, R_dB[layer]) + np.multiply(1-RMSProp_coef, np.multiply(_dB, _dB))\n",
"\n",
"            # index decay in bias correction\n",
"            t = t + 1\n",
"\n",
"            # correct bias for initial rounds\n",
"            V_dW[layer] = np.multiply(V_dW[layer], 1/(1-np.power(momentum_coef, t)))\n",
"            V_dB[layer] = np.multiply(V_dB[layer], 1/(1-np.power(momentum_coef, t)))\n",
"            R_dW[layer] = np.multiply(R_dW[layer], 1/(1-np.power(RMSProp_coef, t)))\n",
"            R_dB[layer] = np.multiply(R_dB[layer], 1/(1-np.power(RMSProp_coef, t)))\n",
"\n",
"            val1 = 1/(np.sqrt(R_dW[layer]) + epsilon)\n",
"            val2 = 1/(np.sqrt(R_dB[layer]) + epsilon)\n",
"\n",
"            W[layer] = W[layer] - np.multiply(alpha, np.multiply(V_dW[layer], val1))\n",
"            B[layer] = B[layer] - np.multiply(alpha, np.multiply(V_dB[layer], val2))\n",
"        elif(momentum == True):\n",
"            V_dW[layer] = np.multiply(momentum_coef, V_dW[layer]) + np.multiply(alpha, _dW)\n",
"            V_dB[layer] = np.multiply(momentum_coef, V_dB[layer]) + np.multiply(alpha, _dB)\n",
"            W[layer] = W[layer] - V_dW[layer]\n",
"            B[layer] = B[layer] - V_dB[layer]\n",
"        else:\n",
"            W[layer] = W[layer] - np.multiply(alpha, _dW)\n",
"            B[layer] = B[layer] - np.multiply(alpha, _dB)\n",
"        if(detailed_logger == True):\n",
"            print('Backward Layer: ' + str(layer))\n",
"        layer = layer - 1\n",
"        W, B, t = backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB, R_dW, R_dB, t)\n",
"    if(detailed_logger == True):\n",
"        print('Backward Propagation Complete')\n",
"    return W, B, t\n",
" \n",
"\n",
"def shuffle(X, Y, number_of_training_images):\n",
"    random_array = np.random.permutation(np.arange(number_of_training_images))\n",
"    return X[:, random_array], Y[random_array]\n",
" \n",
"start_time = time.time() \n",
"# main loop\n",
"for epoch in range(1, number_of_epochs):\n",
"\n",
"    # logger\n",
"    if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
"        print('Main Loop Epoch: ' + str(epoch))\n",
"\n",
"    # safety check\n",
"    if(adam == True and momentum == True):\n",
"        print(\"ERROR! Please Select Either Adam OR Momentum OR Neither, Not Both.\")\n",
"        break\n",
"\n",
"    # safety check\n",
"    if(hidden_layer_relu + hidden_layer_tanh + hidden_layer_sigmoid != 1):\n",
"        print(\"ERROR! Please Select Only 1 Hidden Layer Activation Function\")\n",
"        break\n",
"\n",
"    # shuffle data\n",
"    X, Y = shuffle(X_train.copy(), Y_train.copy(), number_of_training_images)\n",
"    number_of_batches = int(np.floor(number_of_training_images/batch_size))\n",
"    split_index = number_of_batches*batch_size\n",
"\n",
"    # parse into minibatches\n",
"    X_minibatches = np.split(X[:, 0:split_index], number_of_batches, axis=1)\n",
"    if not(split_index == number_of_training_images):\n",
"        X_left_over_portion = X[:, split_index:number_of_training_images]\n",
"        X_minibatches.append(X_left_over_portion)\n",
"\n",
"    Y_minibatches = np.split(Y[0:split_index], number_of_batches, axis=0)\n",
"    if not(split_index == number_of_training_images):\n",
"        Y_left_over_portion = Y[split_index:number_of_training_images]\n",
"        Y_minibatches.append(Y_left_over_portion)\n",
"\n",
"    number_of_minibatches = len(Y_minibatches)\n",
"\n",
"    # logger\n",
"    if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
"        print('Number Of Minibatches: ' + str(number_of_minibatches))\n",
"\n",
"    for index in range(0, number_of_minibatches-1):\n",
"        X_minibatch = X_minibatches[index]\n",
"        Y_minibatch = Y_minibatches[index]\n",
"\n",
"        # forward propagation training data set\n",
"        A_layers, Z_layers, D = forward_propagation_return_layers(W, B, X_minibatch, [X_minibatch], [], 0, [], keep_prob)\n",
"        L = loss(A_layers[len(A_layers) - 1], Y_minibatch)\n",
"        if(L2 == True):\n",
"            C = cost_L2(L, W, epsilon)\n",
"        else:\n",
"            C = cost(L)\n",
"\n",
"        # backpropagation\n",
"        W, B, t = backward_propagation(W, B, Y_minibatch, A_layers, Z_layers, 0, alpha, epsilon, len(W) - 1, D, V_dW, V_dB, R_dW, R_dB, t)\n",
"\n",
"    if(epoch % main_logger_output_epochs == 0):\n",
"        print('Cost: ' + str(C))\n",
"\n",
"    # forward propagation test data set\n",
"    A_test = forward_propagation(W, B, X_test, 0)\n",
"\n",
"    # accuracy\n",
"    _prediction = prediction(A_test)\n",
"    _accuracy = accuracy(_prediction, Y_test)\n",
"\n",
"    # storage for plotting\n",
"    cost_array.append(C)\n",
"    accuracy_array.append(_accuracy)\n",
"    interation_array.append(epoch)\n",
"\n",
"\n",
"end_time = time.time()\n",
"run_time = end_time - start_time\n",
" \n",
"print('')\n",
"print('Results:')\n",
"print('')\n",
" \n",
"print('')\n",
"print('Run Time: ' + str(run_time) + ' seconds')\n",
"print('Cost: ' + str(C)) \n",
"print('Accuracy: ' + str(_accuracy) + ' %') \n",
"print('')\n",
"print('')\n",
"\n",
"\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, cost_array, 'red')\n",
"pyplot.title('Learning Curve - ' + str(len(X[0])) + ' Training Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Cost')\n",
"pyplot.show()\n",
"\n",
"# plot percent accuracy curve\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, accuracy_array, 'red')\n",
"pyplot.title('Percent Accuracy Curve - ' + str(len(X_test[0])) + ' Test Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Percent Accuracy')\n",
"pyplot.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As illustrated, after training with the epoch count set to 500 and minibatches of 50, the cost of the last minibatch became approximately 0.0 (too small for the printed floating-point value to distinguish from zero) and the test data accuracy reached 98.36%. These results are very good. The test accuracy is likely high because minibatch stochastic gradient descent innately provides a form of regularization, and ADAM (momentum combined with RMSProp) helps us avoid getting stuck in local minima and keeps large gradients on a few features from dominating the updates. It is important to note that a training cost of zero usually means we have overfit the training data; however, in this scenario that doesn't appear to be the case, since we still have a very high test accuracy.\n",
"\n",
"We now wish to explore the impact of adjusting the momentum hyper-parameter for Adam. We will re-run the algorithm with a smaller momentum hyper-parameter of 0.5 and see what results we achieve.\n",
"\n",
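"For reference, the textbook form of the Adam update is sketched below. The notation is an assumption introduced here rather than taken from the code: $g_t$ is the minibatch gradient at step $t$, $m_t$ and $v_t$ are the running first and second moments, $\\theta$ is any weight or bias, $\\beta_1$ plays the role of `momentum_coef`, $\\beta_2$ the role of `RMSProp_coef`, $\\alpha$ is the learning rate `alpha`, and $\\epsilon$ is a small stabilizing constant.\n",
"\n",
"$$m_t = \\beta_1 m_{t-1} + (1 - \\beta_1)\\, g_t, \\qquad v_t = \\beta_2 v_{t-1} + (1 - \\beta_2)\\, g_t^2$$\n",
"\n",
"$$\\hat{m}_t = \\frac{m_t}{1 - \\beta_1^t}, \\qquad \\hat{v}_t = \\frac{v_t}{1 - \\beta_2^t}$$\n",
"\n",
"$$\\theta_t = \\theta_{t-1} - \\alpha\\, \\frac{\\hat{m}_t}{\\sqrt{\\hat{v}_t} + \\epsilon}$$\n",
"\n",
"As a rough rule of thumb, an exponential moving average with coefficient $\\beta$ averages over about $1/(1 - \\beta)$ recent steps, so lowering $\\beta_1$ from 0.9 to 0.5 shortens the momentum window from roughly $1/(1 - 0.9) = 10$ minibatches to roughly $1/(1 - 0.5) = 2$.\n",
"\n",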
"First we reinitialize our weights and biases."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature Size: 784\n",
"Weights Shape: (20, 784)\n",
"Bias Shape: (20, 1)\n",
"Velocity Weights Shape: (20, 784)\n",
"Velocity Bias Shape: (20, 1)\n",
"RMSProp Weights Shape: (20, 784)\n",
"RMSProp Bias Shape: (20, 1)\n"
]
}
],
"source": [
"# initialize weights & bias\n",
"np.random.seed(10)\n",
"print('Feature Size: ' + str(size))\n",
"\n",
"lower_bound = -.1\n",
"upper_bound = .1\n",
"\n",
"#mean = 0.015\n",
"#std = 0.005\n",
"\n",
"# hyper-parameters: hidden layers\n",
"hidden_layers = 2\n",
"units_array = [20, 10]\n",
"Weights = []\n",
"Bias = []\n",
"V_dW = []\n",
"V_dB = []\n",
"R_dW = []\n",
"R_dB = []\n",
"for i in range(0, hidden_layers):\n",
" if(i == 0):\n",
" _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], size]))\n",
" _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
" _V_dW = np.float64(np.zeros([units_array[i], size]))\n",
" _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
" _R_dW = np.float64(np.zeros([units_array[i], size]))\n",
" _R_dB = np.float64(np.zeros([units_array[i], 1]))\n",
" Weights.append(_W)\n",
" Bias.append(_B)\n",
" V_dW.append(_V_dW)\n",
" V_dB.append(_V_dB)\n",
" R_dW.append(_R_dW)\n",
" R_dB.append(_R_dB)\n",
" else:\n",
" _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], units_array[i-1]]))\n",
" _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
" _V_dW = np.float64(np.zeros([units_array[i], units_array[i-1]]))\n",
" _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
" _R_dW = np.float64(np.zeros([units_array[i], units_array[i-1]]))\n",
" _R_dB = np.float64(np.zeros([units_array[i], 1]))\n",
" Weights.append(_W)\n",
" Bias.append(_B)\n",
" V_dW.append(_V_dW)\n",
" V_dB.append(_V_dB)\n",
" R_dW.append(_R_dW)\n",
" R_dB.append(_R_dB)\n",
" \n",
"# output layer\n",
"_W = np.float64(np.random.uniform(lower_bound, upper_bound, [1, units_array[i]]))\n",
"_b = np.float64(np.random.uniform(lower_bound, upper_bound)) # b will be added in a broadcasting manner\n",
"_V_dW = np.float64(np.zeros([1, units_array[i]]))\n",
"_V_dB = np.float64(np.zeros(1))\n",
"_R_dW = np.float64(np.zeros([1, units_array[i]]))\n",
"_R_dB = np.float64(np.zeros(1))\n",
"Weights.append(_W)\n",
"Bias.append(_b)\n",
"V_dW.append(_V_dW)\n",
"V_dB.append(_V_dB)\n",
"R_dW.append(_R_dW)\n",
"R_dB.append(_R_dB)\n",
"\n",
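"# the per-layer arrays have different shapes, so store them in object arrays\n",
"# (recent NumPy versions reject ragged input unless dtype=object is given)\n",
"Weights = np.array(Weights, dtype=object)\n",
"Bias = np.array(Bias, dtype=object)\n",
"V_dW = np.array(V_dW, dtype=object)\n",
"V_dB = np.array(V_dB, dtype=object)\n",
"R_dW = np.array(R_dW, dtype=object)\n",
"R_dB = np.array(R_dB, dtype=object)\n",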
"\n",
"\n",
"for index in range(0, len(Weights) - 1):\n",
" Weights[index] = np.where(Weights[index] != 0, Weights[index], np.random.uniform(lower_bound, upper_bound))\n",
"\n",
"#print(train_X.shape)\n",
"#print(np.ravel(train_Y).shape)\n",
"\n",
"print('Weights Shape: ' + str(Weights[0].shape)) # matrix with a size of # of units X 784\n",
"print('Bias Shape: ' + str(Bias[0].shape)) # vector with a size of the # of unit\n",
"print('Velocity Weights Shape: ' + str(V_dW[0].shape)) # matrix with a size of # of units X 784\n",
"print('Velocity Bias Shape: ' + str(V_dB[0].shape)) # vector with a size of the # of unit\n",
"print('RMSProp Weights Shape: ' + str(R_dW[0].shape)) # matrix with a size of # of units X 784\n",
"print('RMSProp Bias Shape: ' + str(R_dB[0].shape)) # vector with a size of the # of unit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we re-run our minibatch stochastic gradient descent algorithm with ADAM."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Main Loop Epoch: 100\n",
"Number Of Minibatches: 1000\n",
"Cost: 0.0\n",
"Main Loop Epoch: 200\n",
"Number Of Minibatches: 1000\n",
"Cost: 0.0\n",
"Main Loop Epoch: 300\n",
"Number Of Minibatches: 1000\n",
"Cost: 0.0\n",
"Main Loop Epoch: 400\n",
"Number Of Minibatches: 1000\n",
"Cost: 0.0\n",
"\n",
"Results:\n",
"\n",
"\n",
"Run Time: 1049.1028203964233 seconds\n",
"Cost: 0.0\n",
"Accuracy: 98.94 %\n",
"\n",
"\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# gradient descent\n",
"detailed_logger = False\n",
"main_logger = True\n",
"main_logger_output_epochs = 100\n",
"L2 = False\n",
"Dropout = False\n",
"momentum = False\n",
"adam = True\n",
"hidden_layer_relu = True\n",
"hidden_layer_tanh = False\n",
"hidden_layer_sigmoid = False\n",
"\n",
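"# hyper-parameters\n",
"alpha = .01  # learning rate\n",
"epsilon = .85  # L2 regularization strength (overwritten below; unused here because L2 = False)\n",
"keep_prob = .9  # dropout keep probability (unused here because Dropout = False)\n",
"number_of_epochs = 500\n",
"batch_size = 50\n",
"momentum_coef = .5  # Adam first-moment (momentum) decay rate\n",
"RMSProp_coef = .9  # Adam second-moment (RMSProp) decay rate\n",
"epsilon = 1e-20  # replaces the .85 assigned above; the Adam branch of backward_propagation sets its own 1e-6\n",
"t = 0  # step counter for the Adam bias correction\n",
"\n",
"# copy initialization\n",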
"W = Weights.copy()\n",
"B = Bias.copy()\n",
"\n",
"# data arrays\n",
"cost_array = []\n",
"accuracy_array = []\n",
"interation_array = []\n",
"\n",
"# rename\n",
"X_train = np.float64(training_images).copy()\n",
"Y_train = np.float64(training_labels).copy()\n",
"\n",
"X_test = np.float64(testing_images).copy()\n",
"Y_test = np.float64(testing_labels).copy()\n",
"\n",
"#m = size\n",
"m = number_of_training_images\n",
"\n",
"def model(W, B, A):\n",
" return np.dot(W, A) + B\n",
"\n",
"def activation_relu(Z):\n",
" Z = np.where(~np.isnan(Z), Z, 0)\n",
" Z = np.where(~np.isinf(Z), Z, 0)\n",
" return np.where(Z > 0, Z, 0)\n",
"\n",
"def activation_tanh(Z):\n",
" return np.tanh(Z)\n",
"\n",
"def activation_sigmoid(Z):\n",
" return 1/(1 + np.exp(-Z))\n",
"\n",
"def loss(A, Y):\n",
" epsilon = 1e-20\n",
" return np.where((Y == 1), np.multiply(-Y, np.log(A + epsilon)), -np.multiply((1 - Y), np.log(1 - A + epsilon)))\n",
" #return np.multiply(-Y, np.log(A)) - np.multiply((1 - Y), np.log(1 - A)) \n",
" \n",
"def cost(L):\n",
" return np.multiply(1/L.shape[1], np.sum(L))\n",
"\n",
"def cost_L2(L, W, epsilon):\n",
"    # L2 penalty: epsilon/(2m) times the sum of squared weights over all layers (m = minibatch size)\n",
"    L2 = np.multiply(epsilon/(2*L.shape[1]), sum(np.sum(np.multiply(w, w)) for w in W))\n",
"    J = cost(L)\n",
"    return L2 + J\n",
"\n",
"def prediction(A):\n",
" return np.where(A >= 0.5, 1, 0)\n",
" \n",
"def accuracy(prediction, Y):\n",
" return 100 - np.multiply(100/Y.shape[0], np.sum(np.absolute(Y - prediction))) \n",
" \n",
"def forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob):\n",
" if(layer < len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" Z_layers.append(Z)\n",
" if(hidden_layer_relu == True):\n",
" A = activation_relu(Z)\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" if(Dropout == True):\n",
" _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
" D.append(_D)\n",
" A = np.multiply(A, _D)\n",
" A_layers.append(A)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Training Data: ' + str(layer))\n",
" A_layers, Z_layers, D = forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob)\n",
" elif(layer == len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" Z_layers.append(Z)\n",
" A = activation_sigmoid(Z)\n",
" if(Dropout == True):\n",
" _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
" D.append(_D)\n",
" A = np.multiply(A, _D)\n",
" A_layers.append(A)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Training Data: ' + str(layer))\n",
" print('Forward Propagation Training Data Complete')\n",
" return A_layers, Z_layers, D\n",
"\n",
"def forward_propagation(W, B, A, layer):\n",
" if(layer < len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" if(hidden_layer_relu == True):\n",
" A = activation_relu(Z)\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Testing Data: ' + str(layer))\n",
" A = forward_propagation(W, B, A, layer)\n",
" elif(layer == len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" A = activation_sigmoid(Z) \n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Testing Data: ' + str(layer))\n",
" print('Forward Propagation Testing Data Complete')\n",
" return A\n",
"\n",
"def dZ(dZ, W, Z):\n",
" Z = np.where(~np.isnan(Z), Z, 0)\n",
" W = np.where(~np.isnan(W), W, 0)\n",
" dZ = np.where(~np.isnan(dZ), dZ, 0)\n",
" Z = np.where(~np.isinf(Z), Z, 0)\n",
" W = np.where(~np.isinf(W), W, 0)\n",
" dZ = np.where(~np.isinf(dZ), dZ, 0)\n",
" if(hidden_layer_relu == True):\n",
" return np.multiply(np.dot(np.transpose(W), dZ), np.where(Z > 0, 1, 0))\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" return np.multiply(np.dot(np.transpose(W), dZ), 1- np.multiply(A, A))\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" return np.multiply(np.dot(np.transpose(W), dZ), np.multiply(A, (1-A)))\n",
"\n",
"def dW(dZ, A):\n",
" return np.multiply(1/dZ.shape[1], np.dot(dZ, np.transpose(A)))\n",
"\n",
"def dW_L2(dZ, A, W, epsilon):\n",
"    # use the minibatch size from dZ (Z is not defined inside this function)\n",
"    return np.multiply(epsilon/dZ.shape[1], W) + dW(dZ, A)\n",
"\n",
"def dB(dZ):\n",
" return np.multiply(1/dZ.shape[1], np.sum(dZ))\n",
"\n",
"def backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB, R_dW, R_dB, t):\n",
" if(layer >= 0):\n",
" if(layer == len(W) - 1):\n",
" _dZ = A_layers[layer+1] - Y\n",
" elif(layer >= 0):\n",
" _dZ = dZ(_dZ, W[layer+1], Z_layers[layer])\n",
" if(Dropout == True):\n",
" _dZ = np.multiply(_dZ, D[layer])\n",
" if(L2 == True):\n",
" _dW = dW_L2(_dZ, A_layers[layer], W[layer], epsilon)\n",
" else:\n",
" _dW = dW(_dZ, A_layers[layer])\n",
" _dB = dB(_dZ)\n",
" if(adam == True):\n",
" epsilon = 1e-6\n",
"\n",
" # ADAM - RMSProp + Momentum\n",
" V_dW[layer] = np.multiply(momentum_coef, V_dW[layer]) + np.multiply(1-momentum_coef, _dW)\n",
" V_dB[layer] = np.multiply(momentum_coef, V_dB[layer]) + np.multiply(1-momentum_coef, _dB)\n",
" R_dW[layer] = np.multiply(RMSProp_coef, R_dW[layer]) + np.multiply(1-RMSProp_coef, np.multiply(_dW, _dW))\n",
" R_dB[layer] = np.multiply(RMSProp_coef, R_dB[layer]) + np.multiply(1-RMSProp_coef, np.multiply(_dB, _dB))\n",
" \n",
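"            # step counter for the bias correction\n",
"            # (incremented on every layer update, so it advances len(W) times per minibatch)\n",
"            t = t + 1\n",
"\n",
"            # correct bias for initial rounds\n",
"            # (the corrected values are stored back into V_* and R_*, whereas the textbook\n",
"            # Adam formulation keeps the running averages uncorrected and only corrects a copy)\n",
"            V_dW[layer] = np.multiply(V_dW[layer], 1/(1-np.power(momentum_coef, t)))\n",
"            V_dB[layer] = np.multiply(V_dB[layer], 1/(1-np.power(momentum_coef, t)))\n",
"            R_dW[layer] = np.multiply(R_dW[layer], 1/(1-np.power(RMSProp_coef, t)))\n",
"            R_dB[layer] = np.multiply(R_dB[layer], 1/(1-np.power(RMSProp_coef, t)))\n",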
" \n",
" val1 = 1/(np.sqrt(R_dW[layer])+ epsilon)\n",
" val2 = 1/(np.sqrt(R_dB[layer])+ epsilon)\n",
" \n",
" W[layer] = W[layer] - np.multiply(alpha, np.multiply(V_dW[layer], val1 ))\n",
" B[layer] = B[layer] - np.multiply(alpha, np.multiply(V_dB[layer], val2 ))\n",
" elif(momentum == True):\n",
" V_dW[layer] = np.multiply(momentum_coef, V_dW[layer]) + np.multiply(alpha, _dW)\n",
" V_dB[layer] = np.multiply(momentum_coef, V_dB[layer]) + np.multiply(alpha, _dB)\n",
" W[layer] = W[layer] - V_dW[layer]\n",
" B[layer] = B[layer] - V_dB[layer] \n",
" else:\n",
" W[layer] = W[layer] - np.multiply(alpha, _dW)\n",
" B[layer] = B[layer] - np.multiply(alpha, _dB)\n",
" if(detailed_logger == True):\n",
" print('Backward Layer: ' + str(layer))\n",
" layer = layer - 1\n",
" W, B, t = backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB, R_dW, R_dB, t)\n",
" if(detailed_logger == True):\n",
" print('Backward Propagation Complete')\n",
" return W, B, t\n",
" \n",
"\n",
"def shuffle(X, Y, number_of_training_images):\n",
" random_array = np.random.permutation(np.arange(number_of_training_images))\n",
" return X[:, random_array], Y[random_array]\n",
" \n",
"start_time = time.time() \n",
"# main loop\n",
"for epoch in range(1, number_of_epochs):\n",
" \n",
" # logger\n",
" if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
" print('Main Loop Epoch: ' + str(epoch))\n",
" \n",
"    # safety check\n",
"    if(adam == True and momentum == True):\n",
"        print(\"ERROR! Please Select Either Adam OR Momentum OR Neither, Not Both.\")\n",
"        break\n",
"\n",
"    # safety check\n",
"    if(hidden_layer_relu + hidden_layer_tanh + hidden_layer_sigmoid != 1):\n",
"        print(\"ERROR! Please Select Only 1 Hidden Layer Activation Function\")\n",
"        break\n",
" \n",
" # shuffle data\n",
" X, Y = shuffle(X_train.copy(), Y_train.copy(), number_of_training_images)\n",
" number_of_batches = int(np.floor(number_of_training_images/batch_size))\n",
" split_index = number_of_batches*batch_size\n",
"\n",
" # parse into minibatches\n",
" X_minibatches = np.split(X[:, 0:split_index], number_of_batches, axis=1)\n",
" if not(split_index == number_of_training_images):\n",
" X_left_over_portion = X[:, split_index:number_of_training_images]\n",
" X_minibatches.append(X_left_over_portion)\n",
" \n",
" Y_minibatches = np.split(Y[0:split_index], number_of_batches, axis=0)\n",
" if not(split_index == number_of_training_images):\n",
" Y_left_over_portion = Y[split_index:number_of_training_images]\n",
" Y_minibatches.append(Y_left_over_portion)\n",
" \n",
" number_of_minibatches = len(Y_minibatches)\n",
" \n",
" # logger\n",
" if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
" print('Number Of Minibatches: ' + str(number_of_minibatches))\n",
"\n",
"    for index in range(0, number_of_minibatches):\n",
"        X_minibatch = X_minibatches[index]\n",
"        Y_minibatch = Y_minibatches[index]\n",
"\n",
"        # forward propagation training data set\n",
"        A_layers, Z_layers, D = forward_propagation_return_layers(W, B, X_minibatch, [X_minibatch], [], 0, [], keep_prob)\n",
"        L = loss(A_layers[len(A_layers) - 1], Y_minibatch)\n",
"        if(L2 == True):\n",
"            C = cost_L2(L, W, epsilon)\n",
"        else:\n",
"            C = cost(L)\n",
"\n",
"        # backpropagation\n",
"        W, B, t = backward_propagation(W, B, Y_minibatch, A_layers, Z_layers, 0, alpha, epsilon, len(W) - 1, D, V_dW, V_dB, R_dW, R_dB, t)\n",
"\n",
"    # C holds the cost of the last minibatch processed in this epoch\n",
"    if(epoch % main_logger_output_epochs == 0):\n",
"        print('Cost: ' + str(C))\n",
"\n",
"    # forward propagation test data set\n",
"    A_test = forward_propagation(W, B, X_test, 0)\n",
"\n",
" # accuracy\n",
" _prediction = prediction(A_test) \n",
" _accuracy = accuracy(_prediction, Y_test) \n",
"\n",
" # storage for plotting\n",
" cost_array.append(C)\n",
" accuracy_array.append(_accuracy)\n",
" interation_array.append(epoch)\n",
"\n",
"\n",
"end_time = time.time()\n",
"run_time = end_time - start_time\n",
" \n",
"print('')\n",
"print('Results:')\n",
"print('')\n",
" \n",
"print('')\n",
"print('Run Time: ' + str(run_time) + ' seconds')\n",
"print('Cost: ' + str(C)) \n",
"print('Accuracy: ' + str(_accuracy) + ' %') \n",
"print('')\n",
"print('')\n",
"\n",
"\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, cost_array, 'red')\n",
"pyplot.title('Learning Curve - ' + str(len(X[0])) + ' Training Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Cost')\n",
"pyplot.show()\n",
"\n",
"# plot percent accuracy curve\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, accuracy_array, 'red')\n",
"pyplot.title('Percent Accuracy Curve - ' + str(len(X_test[0])) + ' Test Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Percent Accuracy')\n",
"pyplot.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As illustrated, with the momentum hyper-parameter lowered to 0.5, the epoch count set to 500, and minibatches of 50, the cost of the last minibatch again became approximately 0.0 and the test data accuracy reached 98.94%. These results are very good. The test accuracy is likely high because minibatch stochastic gradient descent innately provides a form of regularization, and ADAM (momentum combined with RMSProp) helps us avoid getting stuck in local minima and keeps large gradients on a few features from dominating the updates. It is important to note that a training cost of zero usually means we have overfit the training data; however, in this scenario that doesn't appear to be the case, since we still have a very high test accuracy.\n",
"\n",
"We now wish to explore the impact of adjusting the RMSProp hyper-parameter for Adam. We will reset the momentum hyper-parameter to 0.9 and re-run the algorithm with a smaller RMSProp hyper-parameter of 0.5 to see what results we achieve.\n",
"\n",
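"To make the change concrete, here is a small worked calculation under assumed notation (these symbols are not names from the code): $\\beta_2$ stands for `RMSProp_coef`, $g_t$ for the gradient at step $t$, and $v_t$ for the running average of squared gradients that Adam divides by.\n",
"\n",
"$$v_t = \\beta_2 v_{t-1} + (1 - \\beta_2)\\, g_t^2, \\qquad \\hat{v}_t = \\frac{v_t}{1 - \\beta_2^t}$$\n",
"\n",
"$$\\beta_2 = 0.9:\\ \\frac{1}{1 - \\beta_2} = 10, \\qquad \\beta_2 = 0.5:\\ \\frac{1}{1 - \\beta_2} = 2$$\n",
"\n",
"With $\\beta_2 = 0.5$ the newest squared gradient carries half of the weight in $v_t$, so the denominator $\\sqrt{\\hat{v}_t}$ follows the most recent minibatch much more closely than it does with $\\beta_2 = 0.9$, where each new squared gradient contributes only one tenth.\n",
"\n",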
"First we reinitialize our weights and biases."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature Size: 784\n",
"Weights Shape: (20, 784)\n",
"Bias Shape: (20, 1)\n",
"Velocity Weights Shape: (20, 784)\n",
"Velocity Bias Shape: (20, 1)\n",
"RMSProp Weights Shape: (20, 784)\n",
"RMSProp Bias Shape: (20, 1)\n"
]
}
],
"source": [
"# initialize weights & bias\n",
"np.random.seed(10)\n",
"print('Feature Size: ' + str(size))\n",
"\n",
"lower_bound = -.1\n",
"upper_bound = .1\n",
"\n",
"#mean = 0.015\n",
"#std = 0.005\n",
"\n",
"# hyper-parameters: hidden layers\n",
"hidden_layers = 2\n",
"units_array = [20, 10]\n",
"Weights = []\n",
"Bias = []\n",
"V_dW = []\n",
"V_dB = []\n",
"R_dW = []\n",
"R_dB = []\n",
"for i in range(0, hidden_layers):\n",
" if(i == 0):\n",
" _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], size]))\n",
" _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
" _V_dW = np.float64(np.zeros([units_array[i], size]))\n",
" _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
" _R_dW = np.float64(np.zeros([units_array[i], size]))\n",
" _R_dB = np.float64(np.zeros([units_array[i], 1]))\n",
" Weights.append(_W)\n",
" Bias.append(_B)\n",
" V_dW.append(_V_dW)\n",
" V_dB.append(_V_dB)\n",
" R_dW.append(_R_dW)\n",
" R_dB.append(_R_dB)\n",
" else:\n",
" _W = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], units_array[i-1]]))\n",
" _B = np.float64(np.random.uniform(lower_bound, upper_bound, [units_array[i], 1]))\n",
" _V_dW = np.float64(np.zeros([units_array[i], units_array[i-1]]))\n",
" _V_dB = np.float64(np.zeros([units_array[i], 1]))\n",
" _R_dW = np.float64(np.zeros([units_array[i], units_array[i-1]]))\n",
" _R_dB = np.float64(np.zeros([units_array[i], 1]))\n",
" Weights.append(_W)\n",
" Bias.append(_B)\n",
" V_dW.append(_V_dW)\n",
" V_dB.append(_V_dB)\n",
" R_dW.append(_R_dW)\n",
" R_dB.append(_R_dB)\n",
" \n",
"# output layer\n",
"_W = np.float64(np.random.uniform(lower_bound, upper_bound, [1, units_array[i]]))\n",
"_b = np.float64(np.random.uniform(lower_bound, upper_bound)) # b will be added in a broadcasting manner\n",
"_V_dW = np.float64(np.zeros([1, units_array[i]]))\n",
"_V_dB = np.float64(np.zeros(1))\n",
"_R_dW = np.float64(np.zeros([1, units_array[i]]))\n",
"_R_dB = np.float64(np.zeros(1))\n",
"Weights.append(_W)\n",
"Bias.append(_b)\n",
"V_dW.append(_V_dW)\n",
"V_dB.append(_V_dB)\n",
"R_dW.append(_R_dW)\n",
"R_dB.append(_R_dB)\n",
"\n",
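"# the per-layer arrays have different shapes, so store them in object arrays\n",
"# (recent NumPy versions reject ragged input unless dtype=object is given)\n",
"Weights = np.array(Weights, dtype=object)\n",
"Bias = np.array(Bias, dtype=object)\n",
"V_dW = np.array(V_dW, dtype=object)\n",
"V_dB = np.array(V_dB, dtype=object)\n",
"R_dW = np.array(R_dW, dtype=object)\n",
"R_dB = np.array(R_dB, dtype=object)\n",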
"\n",
"\n",
"for index in range(0, len(Weights) - 1):\n",
" Weights[index] = np.where(Weights[index] != 0, Weights[index], np.random.uniform(lower_bound, upper_bound))\n",
"\n",
"#print(train_X.shape)\n",
"#print(np.ravel(train_Y).shape)\n",
"\n",
"print('Weights Shape: ' + str(Weights[0].shape)) # matrix with a size of # of units X 784\n",
"print('Bias Shape: ' + str(Bias[0].shape)) # vector with a size of the # of unit\n",
"print('Velocity Weights Shape: ' + str(V_dW[0].shape)) # matrix with a size of # of units X 784\n",
"print('Velocity Bias Shape: ' + str(V_dB[0].shape)) # vector with a size of the # of unit\n",
"print('RMSProp Weights Shape: ' + str(R_dW[0].shape)) # matrix with a size of # of units X 784\n",
"print('RMSProp Bias Shape: ' + str(R_dB[0].shape)) # vector with a size of the # of unit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we rerun our algorithm."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Main Loop Epoch: 100\n",
"Number Of Minibatches: 1000\n",
"Cost: 0.23567681362319903\n",
"Main Loop Epoch: 200\n",
"Number Of Minibatches: 1000\n",
"Cost: 0.235604659004334\n",
"Main Loop Epoch: 300\n",
"Number Of Minibatches: 1000\n",
"Cost: 0.4115037462206437\n",
"Main Loop Epoch: 400\n",
"Number Of Minibatches: 1000\n",
"Cost: 0.23921075030401118\n",
"\n",
"Results:\n",
"\n",
"\n",
"Run Time: 1448.7536845207214 seconds\n",
"Cost: 0.23185894095927148\n",
"Accuracy: 90.39 %\n",
"\n",
"\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# gradient descent\n",
"detailed_logger = False\n",
"main_logger = True\n",
"main_logger_output_epochs = 100\n",
"L2 = False\n",
"Dropout = False\n",
"momentum = False\n",
"adam = True\n",
"hidden_layer_relu = True\n",
"hidden_layer_tanh = False\n",
"hidden_layer_sigmoid = False\n",
"\n",
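"# hyper-parameters\n",
"alpha = .01  # learning rate\n",
"epsilon = .85  # L2 regularization strength (overwritten below; unused here because L2 = False)\n",
"keep_prob = .9  # dropout keep probability (unused here because Dropout = False)\n",
"number_of_epochs = 500\n",
"batch_size = 50\n",
"momentum_coef = .9  # Adam first-moment (momentum) decay rate\n",
"RMSProp_coef = .5  # Adam second-moment (RMSProp) decay rate\n",
"epsilon = 1e-20  # replaces the .85 assigned above; the Adam branch of backward_propagation sets its own 1e-6\n",
"t = 0  # step counter for the Adam bias correction\n",
"\n",
"# copy initialization\n",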
"W = Weights.copy()\n",
"B = Bias.copy()\n",
"\n",
"# data arrays\n",
"cost_array = []\n",
"accuracy_array = []\n",
"interation_array = []\n",
"\n",
"# rename\n",
"X_train = np.float64(training_images).copy()\n",
"Y_train = np.float64(training_labels).copy()\n",
"\n",
"X_test = np.float64(testing_images).copy()\n",
"Y_test = np.float64(testing_labels).copy()\n",
"\n",
"#m = size\n",
"m = number_of_training_images\n",
"\n",
"def model(W, B, A):\n",
" return np.dot(W, A) + B\n",
"\n",
"def activation_relu(Z):\n",
" Z = np.where(~np.isnan(Z), Z, 0)\n",
" Z = np.where(~np.isinf(Z), Z, 0)\n",
" return np.where(Z > 0, Z, 0)\n",
"\n",
"def activation_tanh(Z):\n",
" return np.tanh(Z)\n",
"\n",
"def activation_sigmoid(Z):\n",
" return 1/(1 + np.exp(-Z))\n",
"\n",
"def loss(A, Y):\n",
" epsilon = 1e-20\n",
" return np.where((Y == 1), np.multiply(-Y, np.log(A + epsilon)), -np.multiply((1 - Y), np.log(1 - A + epsilon)))\n",
" #return np.multiply(-Y, np.log(A)) - np.multiply((1 - Y), np.log(1 - A)) \n",
" \n",
"def cost(L):\n",
" return np.multiply(1/L.shape[1], np.sum(L))\n",
"\n",
"def cost_L2(L, W, epsilon):\n",
"    # L2 penalty: epsilon/(2m) times the sum of squared weights over all layers (m = minibatch size)\n",
"    L2 = np.multiply(epsilon/(2*L.shape[1]), sum(np.sum(np.multiply(w, w)) for w in W))\n",
"    J = cost(L)\n",
"    return L2 + J\n",
"\n",
"def prediction(A):\n",
" return np.where(A >= 0.5, 1, 0)\n",
" \n",
"def accuracy(prediction, Y):\n",
" return 100 - np.multiply(100/Y.shape[0], np.sum(np.absolute(Y - prediction))) \n",
" \n",
"def forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob):\n",
" if(layer < len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" Z_layers.append(Z)\n",
" if(hidden_layer_relu == True):\n",
" A = activation_relu(Z)\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" if(Dropout == True):\n",
" _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
" D.append(_D)\n",
" A = np.multiply(A, _D)\n",
" A_layers.append(A)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Training Data: ' + str(layer))\n",
" A_layers, Z_layers, D = forward_propagation_return_layers(W, B, A, A_layers, Z_layers, layer, D, keep_prob)\n",
" elif(layer == len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" Z_layers.append(Z)\n",
" A = activation_sigmoid(Z)\n",
" if(Dropout == True):\n",
" _D = np.float64(np.where(np.random.uniform(0, 1, A.shape) < keep_prob, 1, 0))\n",
" D.append(_D)\n",
" A = np.multiply(A, _D)\n",
" A_layers.append(A)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Training Data: ' + str(layer))\n",
" print('Forward Propagation Training Data Complete')\n",
" return A_layers, Z_layers, D\n",
"\n",
"def forward_propagation(W, B, A, layer):\n",
" if(layer < len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" if(hidden_layer_relu == True):\n",
" A = activation_relu(Z)\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Testing Data: ' + str(layer))\n",
" A = forward_propagation(W, B, A, layer)\n",
" elif(layer == len(W) - 1):\n",
" Z = model(W[layer], B[layer], A)\n",
" A = activation_sigmoid(Z) \n",
" layer = layer + 1\n",
" if(detailed_logger == True):\n",
" print('Forward Layer Testing Data: ' + str(layer))\n",
" print('Forward Propagation Testing Data Complete')\n",
" return A\n",
"\n",
"def dZ(dZ, W, Z):\n",
" Z = np.where(~np.isnan(Z), Z, 0)\n",
" W = np.where(~np.isnan(W), W, 0)\n",
" dZ = np.where(~np.isnan(dZ), dZ, 0)\n",
" Z = np.where(~np.isinf(Z), Z, 0)\n",
" W = np.where(~np.isinf(W), W, 0)\n",
" dZ = np.where(~np.isinf(dZ), dZ, 0)\n",
" if(hidden_layer_relu == True):\n",
" return np.multiply(np.dot(np.transpose(W), dZ), np.where(Z > 0, 1, 0))\n",
" elif(hidden_layer_tanh == True):\n",
" A = activation_tanh(Z)\n",
" return np.multiply(np.dot(np.transpose(W), dZ), 1- np.multiply(A, A))\n",
" elif(hidden_layer_sigmoid == True): \n",
" A = activation_sigmoid(Z)\n",
" return np.multiply(np.dot(np.transpose(W), dZ), np.multiply(A, (1-A)))\n",
"\n",
"def dW(dZ, A):\n",
" return np.multiply(1/dZ.shape[1], np.dot(dZ, np.transpose(A)))\n",
"\n",
"def dW_L2(dZ, A, W, epsilon):\n",
"    # use the minibatch size from dZ (Z is not defined inside this function)\n",
"    return np.multiply(epsilon/dZ.shape[1], W) + dW(dZ, A)\n",
"\n",
"def dB(dZ):\n",
" return np.multiply(1/dZ.shape[1], np.sum(dZ))\n",
"\n",
"def backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB, R_dW, R_dB, t):\n",
" if(layer >= 0):\n",
" if(layer == len(W) - 1):\n",
" _dZ = A_layers[layer+1] - Y\n",
" elif(layer >= 0):\n",
" _dZ = dZ(_dZ, W[layer+1], Z_layers[layer])\n",
" if(Dropout == True):\n",
" _dZ = np.multiply(_dZ, D[layer])\n",
" if(L2 == True):\n",
" _dW = dW_L2(_dZ, A_layers[layer], W[layer], epsilon)\n",
" else:\n",
" _dW = dW(_dZ, A_layers[layer])\n",
" _dB = dB(_dZ)\n",
" if(adam == True):\n",
" epsilon = 1e-6\n",
"\n",
" # ADAM - RMSProp + Momentum\n",
" V_dW[layer] = np.multiply(momentum_coef, V_dW[layer]) + np.multiply(1-momentum_coef, _dW)\n",
" V_dB[layer] = np.multiply(momentum_coef, V_dB[layer]) + np.multiply(1-momentum_coef, _dB)\n",
" R_dW[layer] = np.multiply(RMSProp_coef, R_dW[layer]) + np.multiply(1-RMSProp_coef, np.multiply(_dW, _dW))\n",
" R_dB[layer] = np.multiply(RMSProp_coef, R_dB[layer]) + np.multiply(1-RMSProp_coef, np.multiply(_dB, _dB))\n",
" \n",
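"            # step counter for the bias correction\n",
"            # (incremented on every layer update, so it advances len(W) times per minibatch)\n",
"            t = t + 1\n",
"\n",
"            # correct bias for initial rounds\n",
"            # (the corrected values are stored back into V_* and R_*, whereas the textbook\n",
"            # Adam formulation keeps the running averages uncorrected and only corrects a copy)\n",
"            V_dW[layer] = np.multiply(V_dW[layer], 1/(1-np.power(momentum_coef, t)))\n",
"            V_dB[layer] = np.multiply(V_dB[layer], 1/(1-np.power(momentum_coef, t)))\n",
"            R_dW[layer] = np.multiply(R_dW[layer], 1/(1-np.power(RMSProp_coef, t)))\n",
"            R_dB[layer] = np.multiply(R_dB[layer], 1/(1-np.power(RMSProp_coef, t)))\n",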
" \n",
" val1 = 1/(np.sqrt(R_dW[layer])+ epsilon)\n",
" val2 = 1/(np.sqrt(R_dB[layer])+ epsilon)\n",
" \n",
" W[layer] = W[layer] - np.multiply(alpha, np.multiply(V_dW[layer], val1 ))\n",
" B[layer] = B[layer] - np.multiply(alpha, np.multiply(V_dB[layer], val2 ))\n",
" elif(momentum == True):\n",
" V_dW[layer] = np.multiply(momentum_coef, V_dW[layer]) + np.multiply(alpha, _dW)\n",
" V_dB[layer] = np.multiply(momentum_coef, V_dB[layer]) + np.multiply(alpha, _dB)\n",
" W[layer] = W[layer] - V_dW[layer]\n",
" B[layer] = B[layer] - V_dB[layer] \n",
" else:\n",
" W[layer] = W[layer] - np.multiply(alpha, _dW)\n",
" B[layer] = B[layer] - np.multiply(alpha, _dB)\n",
" if(detailed_logger == True):\n",
" print('Backward Layer: ' + str(layer))\n",
" layer = layer - 1\n",
" W, B, t = backward_propagation(W, B, Y, A_layers, Z_layers, _dZ, alpha, epsilon, layer, D, V_dW, V_dB, R_dW, R_dB, t)\n",
" if(detailed_logger == True):\n",
" print('Backward Propagation Complete')\n",
" return W, B, t\n",
" \n",
"\n",
"def shuffle(X, Y, number_of_training_images):\n",
" random_array = np.random.permutation(np.arange(number_of_training_images))\n",
" return X[:, random_array], Y[random_array]\n",
" \n",
"start_time = time.time() \n",
"# main loop\n",
"for epoch in range(1, number_of_epochs):\n",
" \n",
" # logger\n",
" if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
" print('Main Loop Epoch: ' + str(epoch))\n",
" \n",
"    # safety check\n",
"    if(adam == True and momentum == True):\n",
"        print(\"ERROR! Please Select Either Adam OR Momentum OR Neither, Not Both.\")\n",
"        break\n",
"\n",
"    # safety check\n",
"    if(hidden_layer_relu + hidden_layer_tanh + hidden_layer_sigmoid != 1):\n",
"        print(\"ERROR! Please Select Only 1 Hidden Layer Activation Function\")\n",
"        break\n",
" \n",
" # shuffle data\n",
" X, Y = shuffle(X_train.copy(), Y_train.copy(), number_of_training_images)\n",
" number_of_batches = int(np.floor(number_of_training_images/batch_size))\n",
" split_index = number_of_batches*batch_size\n",
"\n",
" # parse into minibatches\n",
" X_minibatches = np.split(X[:, 0:split_index], number_of_batches, axis=1)\n",
" if not(split_index == number_of_training_images):\n",
" X_left_over_portion = X[:, split_index:number_of_training_images]\n",
" X_minibatches.append(X_left_over_portion)\n",
" \n",
" Y_minibatches = np.split(Y[0:split_index], number_of_batches, axis=0)\n",
" if not(split_index == number_of_training_images):\n",
" Y_left_over_portion = Y[split_index:number_of_training_images]\n",
" Y_minibatches.append(Y_left_over_portion)\n",
" \n",
" number_of_minibatches = len(Y_minibatches)\n",
" \n",
" # logger\n",
" if(main_logger == True and epoch % main_logger_output_epochs == 0):\n",
" print('Number Of Minibatches: ' + str(number_of_minibatches))\n",
"\n",
" for index in range(0, number_of_minibatches-1):\n",
" X_minibatch = X_minibatches[index]\n",
" Y_minibatch = Y_minibatches[index]\n",
"\n",
" # forward propogation training data set\n",
" A_layers, Z_layers, D = forward_propagation_return_layers(W, B, X_minibatch, [X_minibatch], [], 0, [], keep_prob)\n",
" L = loss(A_layers[len(A_layers) - 1], Y_minibatch)\n",
" if(L2 == True):\n",
" C = cost_L2(L, W, epsilon) \n",
" else:\n",
" C = cost(L) \n",
"\n",
" # backpropogation\n",
" W, B, t = backward_propagation(W, B, Y_minibatch, A_layers, Z_layers, 0, alpha, epsilon, len(W) - 1, D, V_dW, V_dB, R_dW, R_dB, t)\n",
" \n",
" if(epoch % main_logger_output_epochs == 0):\n",
" print('Cost: ' + str(C))\n",
"\n",
" # forward propogation test data set\n",
" A_test = forward_propagation(W, B, X_test, 0)\n",
"\n",
" # accuracy\n",
" _prediction = prediction(A_test) \n",
" _accuracy = accuracy(_prediction, Y_test) \n",
"\n",
" # storage for plotting\n",
" cost_array.append(C)\n",
" accuracy_array.append(_accuracy)\n",
" interation_array.append(epoch)\n",
"\n",
"\n",
"end_time = time.time()\n",
"run_time = end_time - start_time\n",
" \n",
"print('')\n",
"print('Results:')\n",
"print('')\n",
" \n",
"print('')\n",
"print('Run Time: ' + str(run_time) + ' seconds')\n",
"print('Cost: ' + str(C)) \n",
"print('Accuracy: ' + str(_accuracy) + ' %') \n",
"print('')\n",
"print('')\n",
"\n",
"\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, cost_array, 'red')\n",
"pyplot.title('Learning Curve - ' + str(len(X[0])) + ' Training Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Cost')\n",
"pyplot.show()\n",
"\n",
"# plot percent accuracy curve\n",
"pyplot.figure()\n",
"pyplot.plot(interation_array, accuracy_array, 'red')\n",
"pyplot.title('Percent Accuracy Curve - ' + str(len(X_test[0])) + ' Test Data Set (Relu Hidden Layer)')\n",
"pyplot.xlabel('Epochs')\n",
"pyplot.ylabel('Percent Accuracy')\n",
"pyplot.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As illustrated, after 500 epochs with minibatches of 50 we reached a cost of approximately 0.23; however, our test data accuracy is stuck at approximately 90%. This indicates that we overfit the training data, since no 9s were detected in the test data. This lines up with what we would intuitively expect. Because the momentum hyper-paramter was high and the RMSProp was low, the weights with consistent large gradients were focused on and not penalized as much as when the RMSProp hyper-parameter was high. This has likely caused the network to over focus on specific features from the training data, and therefore, overfit the training data and perform poorly on the test data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have illustrated how the minibatch stochastic gradient descent with ADAM provides many regularization and convergence benefits for training neural networks. When the momentum and RMSProp hyper-paramters are high, this technique provides a good form of regularization by preventing the network from getting stuck local minima while also preventing the network from over focusing on specific features of the training data. The networks also converge faster. We have explored how specific variations in the ADAM (momentum and RMSProp) hyper-paramters can alter the regularization effect of network."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
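
The ADAM update in the training cell above can be illustrated with a minimal, standalone sketch. It assumes the standard ADAM formulation rather than reproducing the notebook's exact code: the function `adam_step` and its argument names are hypothetical, with `beta1` and `beta2` playing the roles of `momentum_coef` and `RMSProp_coef`. The bias-corrected values are kept in separate variables, so the stored moving averages are never overwritten.

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One ADAM update for a single parameter array (hypothetical helper)."""
    # exponentially weighted averages of the gradient and the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * np.square(grad)
    # bias-corrected copies; m and v themselves stay uncorrected
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # parameter update scaled by the corrected second moment
    w = w - alpha * m_hat / (np.sqrt(v_hat) + epsilon)
    return w, m, v

# toy usage: minimize f(w) = w^2, whose gradient is 2w
w = np.array([5.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):  # t starts at 1 so the bias correction is defined
    grad = 2 * w
    w, m, v = adam_step(w, grad, m, v, t, alpha=0.05)
print(w)
```

On the first step the bias correction makes the update size approximately equal to `alpha` regardless of the gradient's magnitude, which is why the step counter `t` should start at 1 and advance once per parameter update.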