@Z30G0D
Created January 20, 2018 12:03
Coursera's machine learning course - multi class logistic classification
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 3 - Multiclass classification\n",
"## Hello all!\n",
"\n",
"This is my solution for Exercise 3 of the Coursera machine learning course by Andrew Ng.\n",
"The PDF is located [Here](https://github.com/merwan/ml-class/blob/master/ex3.pdf).\n",
"\n",
"Let's first import the packages for this exercise."
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.io import loadmat\n",
"from scipy.optimize import minimize\n",
"# Image is IPython's display helper, used below to show the formula pictures\n",
"from IPython.display import Image\n",
"from IPython.core.display import HTML \n",
"import sys\n",
"\n",
"# for debugging - seeing entire array\n",
"#np.set_printoptions(threshold=np.inf)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll use the loadmat function to load the Matlab data, since all the exercises are written in Matlab (why?!?!) and I had to convert them to Python. In some exercises I was helped by various pieces of code from the internet; credits follow these lines."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"data = loadmat(\"Exercise3/ex3data1.mat\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"X is the sample matrix: it holds 5000 samples of handwritten digits. Each row is one 20x20-pixel picture unrolled into 400 values, so X is 5000x400 overall. y is the label vector, so it is 5000x1. Let's check."
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((5000L, 400L), (5000L, 1L), numpy.ndarray, numpy.ndarray)"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X=data['X']\n",
"y=data['y']\n",
"X.shape, y.shape, type(X), type(y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great. Note that loadmat did not work for me under Python 2.7, so try Python 3 if you hit the same problem; it took me a while to figure this out.\n",
"\n",
"Let's define the sigmoid function."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def sigmoid(z):\n",
" \"\"\"sigmoid function\"\"\"\n",
" return 1/(1+np.exp(-z))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now for the cost function as stated in the exercise."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def cost(theta, X, y, lamb):\n",
" \"\"\"Compute the logistic regression cost, including the regularization term\"\"\"\n",
" # Avoiding loops , vectorized approach\n",
" X = np.matrix(X)\n",
" m = len(X)\n",
" y = np.matrix(y)\n",
" theta = np.matrix(theta)\n",
" # first term including y=1 classes\n",
" first = np.multiply(-y, np.log(sigmoid(X * theta.T)))\n",
" # second term includes y=0 classes\n",
" second = np.multiply((1 - y), np.log(1 - sigmoid(X * theta.T)))\n",
" # reg term to avoid overfitting - excluding theta(0); the divisor is 2*m\n",
" reg = (lamb / (2.0 * m)) * np.sum(np.power(theta[:, 1:], 2))\n",
" # concluding cost\n",
" j = reg + np.sum(first - second) / m\n",
" return j\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cost function is not very different from the one in the earlier exercises. It is based on this formula:\n",
"<br>\n"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<img src=\"https://i.stack.imgur.com/XbU4S.png\"/>"
],
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Image(url= \"https://i.stack.imgur.com/XbU4S.png\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's write the gradients function. Remember, this is the derivative of the cost with respect to each parameter:"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<img src=\"https://i.stack.imgur.com/pYVzl.png\"/>"
],
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Image(url= \"https://i.stack.imgur.com/pYVzl.png\")"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"def gradients(theta, X, y, lamb):\n",
"    \"\"\"Calculating gradients (derivatives) of the regularized cost for updating the parameters\"\"\"\n",
"    X = np.matrix(X)\n",
"    y = np.matrix(y)\n",
"    theta = np.matrix(theta)\n",
"    m = len(X)\n",
"    z = X * theta.T\n",
"    # error vector\n",
"    error = sigmoid(z) - y\n",
"    # regularized gradient for every parameter, shape (1, n)\n",
"    grads = ((X.T * error) / m).T + ((lamb / float(m)) * theta)\n",
"    # the intercept (bias) term gets *no* regularization, so recompute it without the penalty\n",
"    grads[0, 0] = np.sum(np.multiply(error, X[:, 0])) / m\n",
"    # scipy's minimize expects a flat 1-D array for the jacobian\n",
"    return np.array(grads).ravel()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can define a training session for every class.\n",
"Eventually we will get a theta vector (weights) for every class."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"def one_vs_all(X, y, num_labels, lamb):\n",
"    \"\"\"Train one regularized logistic classifier per label; returns a (num_labels, n + 1) theta matrix\"\"\"\n",
"    # number of columns (features)\n",
"    params = X.shape[1]\n",
"    # number of rows (examples)\n",
"    rows = X.shape[0]\n",
"    # theta matrix: one row per class, one column per parameter (plus the bias)\n",
"    theta_vector = np.zeros((num_labels, params + 1))\n",
"\n",
"    # according to the pdf exercise, insert the bias term (intercept)\n",
"    X = np.insert(X, 0, values=np.ones(rows), axis=1)\n",
"\n",
"    # For one vs all we go through each class and label it 1 and all the others 0,\n",
"    # so y_i is a vector of 0s and 1s\n",
"    for i in range(1, num_labels + 1):\n",
"        # initialize theta for the minimizing function\n",
"        theta = np.zeros(params + 1)\n",
"        # creating a y specific for our i category\n",
"        y_i = np.array([1 if label == i else 0 for label in y])\n",
"        y_i = np.reshape(y_i, (rows, 1))\n",
"        # minimize the objective function - taken from the scipy documentation\n",
"        fmin = minimize(fun=cost, x0=theta, args=(X, y_i, lamb), method='TNC', jac=gradients)\n",
"        theta_vector[i - 1, :] = fmin.x\n",
"\n",
"    return theta_vector"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\zeogo\\Miniconda2\\envs\\py35\\lib\\site-packages\\ipykernel_launcher.py:6: RuntimeWarning: divide by zero encountered in log\n",
" \n",
"C:\\Users\\zeogo\\Miniconda2\\envs\\py35\\lib\\site-packages\\ipykernel_launcher.py:6: RuntimeWarning: invalid value encountered in multiply\n",
" \n",
"C:\\Users\\zeogo\\Miniconda2\\envs\\py35\\lib\\site-packages\\ipykernel_launcher.py:7: RuntimeWarning: invalid value encountered in power\n",
" import sys\n",
"C:\\Users\\zeogo\\Miniconda2\\envs\\py35\\lib\\site-packages\\ipykernel_launcher.py:3: RuntimeWarning: overflow encountered in exp\n",
" This is separate from the ipykernel package so we can avoid doing imports until\n",
"C:\\Users\\zeogo\\Miniconda2\\envs\\py35\\lib\\site-packages\\ipykernel_launcher.py:5: RuntimeWarning: divide by zero encountered in log\n",
" \"\"\"\n",
"C:\\Users\\zeogo\\Miniconda2\\envs\\py35\\lib\\site-packages\\ipykernel_launcher.py:5: RuntimeWarning: invalid value encountered in multiply\n",
" \"\"\"\n"
]
}
],
"source": [
"# note to self: checking sizes of matrices called by fmin\n",
"# training the algorithm, 10 classes, arbitrary regularization\n",
"all_theta = one_vs_all(X, y, 10, 1)"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(array([[-8.05522912e+00, 0.00000000e+00, 0.00000000e+00, ...,\n",
" 2.18619279e-02, 2.85921938e-07, 0.00000000e+00],\n",
" [-5.90990431e+00, 0.00000000e+00, 0.00000000e+00, ...,\n",
" 6.72129871e-02, -6.85937921e-03, 0.00000000e+00],\n",
" [-8.71826341e+00, 0.00000000e+00, 0.00000000e+00, ...,\n",
" -2.56532495e-04, -1.14937641e-06, 0.00000000e+00],\n",
" ...,\n",
" [-1.33464325e+01, 0.00000000e+00, 0.00000000e+00, ...,\n",
" -6.15496460e+00, 7.10885457e-01, 0.00000000e+00],\n",
" [-8.55318810e+00, 0.00000000e+00, 0.00000000e+00, ...,\n",
" -1.89349871e-01, 8.57477934e-03, 0.00000000e+00],\n",
" [-1.29493922e+01, 0.00000000e+00, 0.00000000e+00, ...,\n",
" 2.58438619e-04, 4.11482696e-05, 0.00000000e+00]]), (10L, 401L))"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_theta, all_theta.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So now we have a matrix with 10 rows (one per label) and 401 columns (1 bias term and 400 weights for each label)."
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"def predict_all(X, all_theta):\n",
" rows = X.shape[0]\n",
" params = X.shape[1]\n",
" num_labels = all_theta.shape[0]\n",
" \n",
" # same as before, insert ones to match the shape\n",
" X = np.insert(X, 0, values=np.ones(rows), axis=1)\n",
" \n",
" # convert to matrices\n",
" X = np.matrix(X)\n",
" all_theta = np.matrix(all_theta)\n",
" \n",
" # calculating our hypotheses,\n",
" h = sigmoid(X * all_theta.T)\n",
" \n",
" # create array of the index with the maximum probability\n",
" maximum = np.argmax(h, axis=1)\n",
" \n",
" # because our array was zero-indexed we need to add one for the true label prediction\n",
" maximum = maximum + 1\n",
" \n",
" return maximum"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('accuracy is:', 0.9748)\n"
]
}
],
"source": [
"y_pred = predict_all(data['X'], all_theta)\n",
"correct = [1 if a == b else 0 for (a, b) in zip(y_pred, data['y'])]\n",
"accuracy = (sum(map(int, correct)) / float(len(correct)))\n",
"print(\"accuracy is:\", accuracy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"97.48% is our final accuracy on the training set, nice.\n",
"For remarks please email me at tomer@nahshon.net"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
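
The RuntimeWarnings recorded in the training cell of the notebook above (divide by zero in log, overflow in exp) most likely come from evaluating `np.log(sigmoid(z))` once `z` gets very large in magnitude, so that the sigmoid saturates to exactly 0 or 1. Below is a minimal sketch of one way to avoid that, assuming plain 1-D NumPy arrays instead of `np.matrix`. The names `stable_cost`, `stable_gradient` and the small random demo data are my own illustration, not part of the exercise or of the notebook; the finite-difference check at the end is only there to show that the gradient matches the cost.

```python
import numpy as np


def stable_cost(theta, X, y, lamb):
    """Regularized logistic cost that never takes log(0).

    Assumes theta has shape (n,), X has shape (m, n) with a leading
    column of ones, and y is a 0/1 vector of shape (m,).
    """
    m = X.shape[0]
    z = X.dot(theta)
    # -log(sigmoid(z)) == logaddexp(0, -z) and -log(1 - sigmoid(z)) == logaddexp(0, z)
    loss = y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z)
    # the bias term theta[0] is left out of the penalty
    reg = lamb / (2.0 * m) * np.sum(theta[1:] ** 2)
    return loss.sum() / m + reg


def stable_gradient(theta, X, y, lamb):
    """Gradient of stable_cost, returned as a flat array of shape (n,)."""
    m = X.shape[0]
    z = X.dot(theta)
    # sigmoid written with tanh, which does not overflow for large |z|
    h = 0.5 * (1.0 + np.tanh(0.5 * z))
    grad = X.T.dot(h - y) / float(m)
    # regularize every parameter except the bias
    grad[1:] += lamb / float(m) * theta[1:]
    return grad


# Hypothetical demo data, only for checking the two functions against each other
rng = np.random.RandomState(0)
X_demo = np.hstack([np.ones((20, 1)), rng.randn(20, 3)])
y_demo = (rng.rand(20) > 0.5).astype(float)
theta_demo = rng.randn(4)

# central finite differences of the cost, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (stable_cost(theta_demo + eps * e, X_demo, y_demo, 1.0)
     - stable_cost(theta_demo - eps * e, X_demo, y_demo, 1.0)) / (2 * eps)
    for e in np.eye(4)
])
analytic = stable_gradient(theta_demo, X_demo, y_demo, 1.0)
print("largest gradient mismatch:", np.max(np.abs(numeric - analytic)))
```

To try these inside the notebook's training loop, `y_i` would have to be flattened first (for example with `y_i.ravel()`), since the sketch assumes 1-D label vectors rather than the (rows, 1) column the notebook builds.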