Created
January 20, 2018 12:03
Coursera's machine learning course - multi class logistic classification
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Exercise 3 - Multiclass classification\n", | |
"## Hello all!\n", | |
"\n", | |
"This is my solution to Exercise 3 of the Coursera Machine Learning course by Andrew Ng.\n", | |
"The PDF is located [here](https://github.com/merwan/ml-class/blob/master/ex3.pdf).\n", | |
"\n", | |
"Let's first import the packages for this exercise." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 56, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"from scipy.io import loadmat\n", | |
"from scipy.optimize import minimize\n", | |
"# Image is used below to display the formula pictures inline\n", | |
"from IPython.display import Image\n", | |
"\n", | |
"# for debugging - seeing entire array\n", | |
"#np.set_printoptions(threshold=np.inf)\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We'll use SciPy's `loadmat` function to load the data, since all the exercises are written for MATLAB (why?!?!) and I had to convert them to Python. In some exercises I was helped by various pieces of code from the internet; credits follow these lines." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"data = loadmat(\"Exercise3/ex3data1.mat\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"X is the sample matrix: it holds 5000 samples of handwritten digits. Each row is one 20x20-pixel image, so each row of X contains 400 values, and X is 5000x400 overall. y is the label vector, so it is 5000x1. Let's check." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 58, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"((5000L, 400L), (5000L, 1L), numpy.ndarray, numpy.ndarray)" | |
] | |
}, | |
"execution_count": 58, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"X=data['X']\n", | |
"y=data['y']\n", | |
"X.shape, y.shape, type(X), type(y)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Great. Please note that in my setup the `loadmat` function did not work with Python 2.7, so try Python 3 if you run into the same problem; it took me a while to figure this out.\n", | |
"\n", | |
"Let's define the sigmoid function." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def sigmoid(z):\n", | |
"    \"\"\"sigmoid function\"\"\"\n", | |
"    return 1 / (1 + np.exp(-z))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"And now for the cost function as stated in the exercise." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def cost(theta, X, y, lamb):\n", | |
"    \"\"\"Compute the logistic regression cost, including the regularization term\"\"\"\n", | |
"    # avoiding loops, vectorized approach\n", | |
"    X = np.matrix(X)\n", | |
"    m = len(X)\n", | |
"    y = np.matrix(y)\n", | |
"    theta = np.matrix(theta)\n", | |
"    # hypothesis for every sample\n", | |
"    h = sigmoid(X * theta.T)\n", | |
"    # first term covers the y=1 samples\n", | |
"    first = np.multiply(-y, np.log(h))\n", | |
"    # second term covers the y=0 samples\n", | |
"    second = np.multiply((1 - y), np.log(1 - h))\n", | |
"    # reg term to avoid overfitting - excluding theta(0)\n", | |
"    reg = (lamb / (2.0 * m)) * np.sum(np.power(theta[:, 1:], 2))\n", | |
"    # concluding cost\n", | |
"    j = reg + np.sum(first - second) / m\n", | |
"    return j\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The cost function is not very different from the one in the other exercises. It is based on this formula:\n", | |
"<br>\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 57, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<img src=\"https://i.stack.imgur.com/XbU4S.png\"/>" | |
], | |
"text/plain": [ | |
"<IPython.core.display.Image object>" | |
] | |
}, | |
"execution_count": 57, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"Image(url= \"https://i.stack.imgur.com/XbU4S.png\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now, let's write the gradient function. Remember, this is the derivative of the cost function with respect to theta:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 59, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<img src=\"https://i.stack.imgur.com/pYVzl.png\"/>" | |
], | |
"text/plain": [ | |
"<IPython.core.display.Image object>" | |
] | |
}, | |
"execution_count": 59, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"Image(url= \"https://i.stack.imgur.com/pYVzl.png\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 52, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def gradients(theta, X, y, lamb):\n", | |
"    \"\"\"Calculate the gradients (derivatives) used for updating the parameters\"\"\"\n", | |
"    X = np.matrix(X)\n", | |
"    y = np.matrix(y)\n", | |
"    theta = np.matrix(theta)\n", | |
"    m = len(X)\n", | |
"    # error vector\n", | |
"    error = sigmoid(X * theta.T) - y\n", | |
"    # regularized gradient for every parameter\n", | |
"    grads = ((X.T * error) / m).T + ((lamb / float(m)) * theta)\n", | |
"    # the intercept (bias) term gets *no* regularization, so overwrite it with the plain gradient\n", | |
"    grads[0, 0] = np.sum(np.multiply(error, X[:, 0])) / m\n", | |
"    # scipy's minimize expects a flat 1-D array for the jacobian\n", | |
"    return np.array(grads).ravel()\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now we can define a training session for every class.\n", | |
"Eventually we will get a theta vector (weights) for every class." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 53, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def one_vs_all(X, y, num_labels, lamb):\n", | |
"    \"\"\"Train one regularized logistic regression classifier per class\"\"\"\n", | |
"    # number of columns (features)\n", | |
"    params = X.shape[1]\n", | |
"    # number of rows (examples)\n", | |
"    rows = X.shape[0]\n", | |
"    # creating a theta matrix (rows as number of classes, columns as number of parameters)\n", | |
"    theta_vector = np.zeros((num_labels, params + 1))\n", | |
"\n", | |
"    # according to the pdf exercise, insert the bias term (intercept)\n", | |
"    X = np.insert(X, 0, values=np.ones(rows), axis=1)\n", | |
"\n", | |
"    # For one vs all we go through each class and classify it as 1, and all others as 0, so y_i is a vector of zeros and ones\n", | |
"    for i in range(1, num_labels + 1):\n", | |
"        # initialize theta for the minimizing function\n", | |
"        theta = np.zeros(params + 1)\n", | |
"        # creating a y specific for our i category\n", | |
"        y_i = np.array([1 if label == i else 0 for label in y])\n", | |
"        y_i = np.reshape(y_i, (rows, 1))\n", | |
"        # minimize the objective function - taken from scipy documentation\n", | |
"        # need to debug this one\n", | |
"        fmin = minimize(fun=cost, x0=theta, args=(X, y_i, lamb), method='TNC', jac=gradients)\n", | |
"        theta_vector[i - 1, :] = fmin.x\n", | |
"\n", | |
"    return theta_vector" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 49, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"C:\\Users\\zeogo\\Miniconda2\\envs\\py35\\lib\\site-packages\\ipykernel_launcher.py:6: RuntimeWarning: divide by zero encountered in log\n", | |
" \n", | |
"C:\\Users\\zeogo\\Miniconda2\\envs\\py35\\lib\\site-packages\\ipykernel_launcher.py:6: RuntimeWarning: invalid value encountered in multiply\n", | |
" \n", | |
"C:\\Users\\zeogo\\Miniconda2\\envs\\py35\\lib\\site-packages\\ipykernel_launcher.py:7: RuntimeWarning: invalid value encountered in power\n", | |
" import sys\n", | |
"C:\\Users\\zeogo\\Miniconda2\\envs\\py35\\lib\\site-packages\\ipykernel_launcher.py:3: RuntimeWarning: overflow encountered in exp\n", | |
" This is separate from the ipykernel package so we can avoid doing imports until\n", | |
"C:\\Users\\zeogo\\Miniconda2\\envs\\py35\\lib\\site-packages\\ipykernel_launcher.py:5: RuntimeWarning: divide by zero encountered in log\n", | |
" \"\"\"\n", | |
"C:\\Users\\zeogo\\Miniconda2\\envs\\py35\\lib\\site-packages\\ipykernel_launcher.py:5: RuntimeWarning: invalid value encountered in multiply\n", | |
" \"\"\"\n" | |
] | |
} | |
], | |
"source": [ | |
"# note to self: checking sizes of matrices called by fmin\n", | |
"# training the algorithm, 10 classes, arbitrary regularization\n", | |
"all_theta = one_vs_all(X, y, 10, 1)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 63, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(array([[-8.05522912e+00, 0.00000000e+00, 0.00000000e+00, ...,\n", | |
" 2.18619279e-02, 2.85921938e-07, 0.00000000e+00],\n", | |
" [-5.90990431e+00, 0.00000000e+00, 0.00000000e+00, ...,\n", | |
" 6.72129871e-02, -6.85937921e-03, 0.00000000e+00],\n", | |
" [-8.71826341e+00, 0.00000000e+00, 0.00000000e+00, ...,\n", | |
" -2.56532495e-04, -1.14937641e-06, 0.00000000e+00],\n", | |
" ...,\n", | |
" [-1.33464325e+01, 0.00000000e+00, 0.00000000e+00, ...,\n", | |
" -6.15496460e+00, 7.10885457e-01, 0.00000000e+00],\n", | |
" [-8.55318810e+00, 0.00000000e+00, 0.00000000e+00, ...,\n", | |
" -1.89349871e-01, 8.57477934e-03, 0.00000000e+00],\n", | |
" [-1.29493922e+01, 0.00000000e+00, 0.00000000e+00, ...,\n", | |
" 2.58438619e-04, 4.11482696e-05, 0.00000000e+00]]), (10L, 401L))" | |
] | |
}, | |
"execution_count": 63, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"all_theta, all_theta.shape" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"So now we have a matrix with 10 rows and 401 columns (1 bias term and 400 weights for each label)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 60, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def predict_all(X, all_theta):\n", | |
"    rows = X.shape[0]\n", | |
"\n", | |
"    # same as before, insert ones to match the shape\n", | |
"    X = np.insert(X, 0, values=np.ones(rows), axis=1)\n", | |
"\n", | |
"    # convert to matrices\n", | |
"    X = np.matrix(X)\n", | |
"    all_theta = np.matrix(all_theta)\n", | |
"\n", | |
"    # calculating our hypotheses, one column per class\n", | |
"    h = sigmoid(X * all_theta.T)\n", | |
"\n", | |
"    # create array of the index with the maximum probability\n", | |
"    maximum = np.argmax(h, axis=1)\n", | |
"\n", | |
"    # because our array was zero-indexed we need to add one for the true label prediction\n", | |
"    maximum = maximum + 1\n", | |
"\n", | |
"    return maximum" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 72, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"('accuracy is:', 0.9748)\n" | |
] | |
} | |
], | |
"source": [ | |
"y_pred = predict_all(data['X'], all_theta)\n", | |
"correct = [1 if a == b else 0 for (a, b) in zip(y_pred, data['y'])]\n", | |
"accuracy = (sum(map(int, correct)) / float(len(correct)))\n", | |
"print(\"accuracy is:\", accuracy)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"97.48% is our final accuracy on the training set, nice.\n", | |
"For remarks please email me at tomer@nahshon.net" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 2 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython2", | |
"version": "2.7.14" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
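The training run in the notebook prints `divide by zero encountered in log` warnings, and the optimizer only works well if the gradient really is the derivative of the cost. The sketch below is one possible way to check both points with plain NumPy arrays instead of `np.matrix`. It is not part of the original solution: the names `cost_reg`, `grad_reg`, `numerical_grad` and the clipping constant `eps` are hypothetical, and the data is a small random stand-in rather than `ex3data1.mat`. It clips the hypothesis away from 0 and 1 before taking the log, leaves the bias term out of the regularization in both functions, and compares the analytic gradient with a central finite-difference estimate.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def cost_reg(theta, X, y, lamb, eps=1e-12):
    """Regularized logistic cost; theta[0] (the bias) is not penalized."""
    m = X.shape[0]
    # clip so that log() never receives exactly 0 (eps is an assumed constant)
    h = np.clip(sigmoid(X @ theta), eps, 1.0 - eps)
    data_term = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m
    reg_term = lamb / (2.0 * m) * np.sum(theta[1:] ** 2)
    return data_term + reg_term


def grad_reg(theta, X, y, lamb):
    """Gradient of cost_reg with respect to theta, returned as a 1-D array."""
    m = X.shape[0]
    error = sigmoid(X @ theta) - y
    grad = X.T @ error / m
    # regularize every parameter except the bias
    grad[1:] += (lamb / m) * theta[1:]
    return grad


def numerical_grad(f, theta, h=1e-6):
    """Central finite-difference estimate of the gradient of f at theta."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * h)
    return grad


# small synthetic problem: 50 samples, a bias column plus 4 features
rng = np.random.default_rng(0)
X_demo = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 4))])
y_demo = (rng.random(50) > 0.5).astype(float)
theta_demo = rng.normal(scale=0.1, size=5)

analytic = grad_reg(theta_demo, X_demo, y_demo, 1.0)
numeric = numerical_grad(lambda t: cost_reg(t, X_demo, y_demo, 1.0), theta_demo)
max_diff = np.max(np.abs(analytic - numeric))
print("max gradient difference:", max_diff)
```

If the two gradients disagree by more than a tiny amount, the cost and gradient functions are not consistent with each other, which is worth ruling out before handing them to `minimize`.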