@Z30G0D
Created February 17, 2018 13:15
This is the second exercise from Andrew Ng's Machine Learning course
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 2 - Andrew Ng Machine Learning Course\n",
"## Hey all!\n",
"This is the second task from Andrew Ng's machine learning course, concerning simple **logistic** regression.\n",
"I implemented the exercise in Python; the original implementation is in MATLAB.\n",
"Feel free to send feedback on my work to tomer@nahshoh.net.\n",
"The task is located <a href=\"https://github.com/jdwittenauer/ipython-notebooks/blob/master/exercises/ML/ex1.pdf\">here</a>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import os\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"path = os.path.join(os.getcwd(), 'Exercise2', 'ex2', 'ex2data1.txt')\n",
"data = pd.read_csv(path, header=None, names=['Exam 1', 'Exam 2', 'Admitted'])"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<bound method NDFrame.head of Exam 1 Exam 2 Admitted\n",
"0 34.623660 78.024693 0\n",
"1 30.286711 43.894998 0\n",
"2 35.847409 72.902198 0\n",
"3 60.182599 86.308552 1\n",
"4 79.032736 75.344376 1\n",
"5 45.083277 56.316372 0\n",
"6 61.106665 96.511426 1\n",
"7 75.024746 46.554014 1\n",
"8 76.098787 87.420570 1\n",
"9 84.432820 43.533393 1\n",
"10 95.861555 38.225278 0\n",
"11 75.013658 30.603263 0\n",
"12 82.307053 76.481963 1\n",
"13 69.364589 97.718692 1\n",
"14 39.538339 76.036811 0\n",
"15 53.971052 89.207350 1\n",
"16 69.070144 52.740470 1\n",
"17 67.946855 46.678574 0\n",
"18 70.661510 92.927138 1\n",
"19 76.978784 47.575964 1\n",
"20 67.372028 42.838438 0\n",
"21 89.676776 65.799366 1\n",
"22 50.534788 48.855812 0\n",
"23 34.212061 44.209529 0\n",
"24 77.924091 68.972360 1\n",
"25 62.271014 69.954458 1\n",
"26 80.190181 44.821629 1\n",
"27 93.114389 38.800670 0\n",
"28 61.830206 50.256108 0\n",
"29 38.785804 64.995681 0\n",
".. ... ... ...\n",
"70 32.722833 43.307173 0\n",
"71 64.039320 78.031688 1\n",
"72 72.346494 96.227593 1\n",
"73 60.457886 73.094998 1\n",
"74 58.840956 75.858448 1\n",
"75 99.827858 72.369252 1\n",
"76 47.264269 88.475865 1\n",
"77 50.458160 75.809860 1\n",
"78 60.455556 42.508409 0\n",
"79 82.226662 42.719879 0\n",
"80 88.913896 69.803789 1\n",
"81 94.834507 45.694307 1\n",
"82 67.319257 66.589353 1\n",
"83 57.238706 59.514282 1\n",
"84 80.366756 90.960148 1\n",
"85 68.468522 85.594307 1\n",
"86 42.075455 78.844786 0\n",
"87 75.477702 90.424539 1\n",
"88 78.635424 96.647427 1\n",
"89 52.348004 60.769505 0\n",
"90 94.094331 77.159105 1\n",
"91 90.448551 87.508792 1\n",
"92 55.482161 35.570703 0\n",
"93 74.492692 84.845137 1\n",
"94 89.845807 45.358284 1\n",
"95 83.489163 48.380286 1\n",
"96 42.261701 87.103851 1\n",
"97 99.315009 68.775409 1\n",
"98 55.340018 64.931938 1\n",
"99 74.775893 89.529813 1\n",
"\n",
"[100 rows x 3 columns]>"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's visualize the data."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x27c61c01cf8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"admitted = pd.DataFrame(data.loc[data['Admitted'].isin([1])])\n",
"not_admitted = pd.DataFrame(data.loc[data['Admitted'].isin([0])])\n",
"\n",
"fig = plt.figure()\n",
"ax1 = fig.add_subplot(111)\n",
"ax1.scatter(admitted['Exam 1'],admitted['Exam 2'], s=10, c='r', label='Admitted', marker='o')\n",
"ax1.scatter(not_admitted['Exam 1'],not_admitted['Exam 2'], s=10, c='b', label = 'Not admitted', marker='^')\n",
"plt.legend(loc='lower left');\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nice, now we can see the data more clearly, and it looks possible to draw a decision boundary between the two classes. Next we are asked to implement the sigmoid (activation) function. Let's implement it."
]
},
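{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder (standard definition, with $g$ and $z$ as assumed notation), the sigmoid is\n",
"\n",
"$$g(z) = \\frac{1}{1 + e^{-z}}$$\n",
"\n",
"It maps any real $z$ into $(0, 1)$, with $g(0) = 0.5$, $g(z) \\to 1$ for large positive $z$ and $g(z) \\to 0$ for large negative $z$, which is why its output can be read as a probability."
]
},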
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def sigmoid(z):\n",
"    \"\"\"sigmoid function\"\"\"\n",
"    return 1 / (1 + np.exp(-z))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's a nice warm-up exercise. Let's continue with the cost function and gradients."
]
},
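{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, these are the standard logistic regression formulas that the next cells are meant to implement, written with assumed notation ($m$ training examples, $g$ the sigmoid, $h_\\theta(x) = g(\\theta^T x)$):\n",
"\n",
"$$J(\\theta) = \\frac{1}{m} \\sum_{i=1}^{m} \\left[ -y^{(i)} \\log h_\\theta(x^{(i)}) - (1 - y^{(i)}) \\log\\left(1 - h_\\theta(x^{(i)})\\right) \\right]$$\n",
"\n",
"$$\\frac{\\partial J(\\theta)}{\\partial \\theta_j} = \\frac{1}{m} \\sum_{i=1}^{m} \\left( h_\\theta(x^{(i)}) - y^{(i)} \\right) x_j^{(i)}$$\n",
"\n",
"With $\\theta = 0$ every $h_\\theta(x^{(i)})$ equals $0.5$, so the cost should come out as $\\log 2 \\approx 0.693$ regardless of the data, which makes a handy sanity check."
]
},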
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
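"def costfunction(theta, X, y):\n",
"    \"\"\"First draft of the cost; costfunction1 below is the version used in the rest of the notebook.\"\"\"\n",
"    hypo = sigmoid(np.dot(X, theta))\n",
"    return np.mean(-np.ravel(y) * np.log(hypo) - (1 - np.ravel(y)) * np.log(1 - hypo))"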
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def costfunction1(theta, X, y):\n",
"    theta = np.matrix(theta)\n",
"    X = np.matrix(X)\n",
"    y = np.matrix(y)\n",
"    # linear term X * theta^T; passing it through the sigmoid turns it into the logistic hypothesis\n",
"    hypo = X * theta.T\n",
"    first = np.multiply(-y, np.log(sigmoid(hypo)))\n",
"    second = np.multiply((1 - y), np.log(1 - sigmoid(hypo)))\n",
"    cost = np.sum(first - second) / X.shape[0]\n",
"    return cost"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's edit the data, separating the target variable from our features."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"data.insert(0, 'Bias', 1)\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"X = data.drop(['Admitted'], axis=1)\n",
"y = data.drop(['Bias', 'Exam 1', 'Exam 2'], axis=1)\n",
"theta = np.zeros(X.shape[1])"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6931471805599453"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"costfunction1(theta, X, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nice, let's edit this and get the gradients."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def gradient(theta, X, y):\n",
"    theta = np.matrix(theta)\n",
"    X = np.matrix(X)\n",
"    y = np.matrix(y)\n",
"    hypo = X * theta.T\n",
"    e = sigmoid(hypo) - y  # prediction error, computed once and reused for every parameter\n",
"    parameters = int(theta.ravel().shape[1])  # number of parameters in theta\n",
"    grads = np.zeros(theta.shape)\n",
"    # partial derivative of the cost with respect to each theta[j]\n",
"    for j in range(parameters):\n",
"        grads[0, j] = np.sum(np.multiply(e, X[:, j])) / X.shape[0]\n",
"\n",
"    return grads"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ -0.1 , -12.00921659, -11.26284221]])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grads = gradient(theta, X, y)\n",
"grads"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok, now we are going to use an optimization function from SciPy (`scipy.optimize.fmin_tnc`) to minimize the cost function."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(array([-25.16131872, 0.20623159, 0.20147149]), 36, 0)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import scipy.optimize as opt\n",
"# fmin_tnc returns (optimal theta, number of function evaluations, return code)\n",
"result = opt.fmin_tnc(func=costfunction1, x0=theta, fprime=gradient, args=(X, y))\n",
"result\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.20349770158947425"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"costfunction1(result[0], X, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is our lowest cost: the logistic regression cost function is convex, so the minimum the optimizer converged to is the global one. Let's write the predict function."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
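"def predict(theta, X):\n",
"    \"\"\"Return the predicted probability for a single example X (a 1 x n matrix).\"\"\"\n",
"    prob = np.sum(sigmoid(X * theta.T))  # sum collapses the 1x1 matrix to a scalar\n",
"    return prob"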
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"prediction is: 0.776290625526598\n"
]
}
],
"source": [
"testing = np.matrix(np.array([1, 45, 85]))\n",
"prediction = predict(np.matrix(result[0]), testing)\n",
"print('prediction is:',(prediction))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok, so according to the text instructions in section 1.2.4 we got the right answer. Now let's write a predict function that applies a threshold: predict 1 when the probability is >= 0.5 and 0 otherwise."
]
},
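{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short worked check of what the threshold means (notation assumed: $g$ is the sigmoid, $x$ includes the bias entry). Because $g$ is increasing and $g(0) = 0.5$, thresholding the probability at 0.5 is the same as checking the sign of the linear term:\n",
"\n",
"$$g(\\theta^T x) \\ge 0.5 \\iff \\theta^T x \\ge 0$$\n",
"\n",
"For the test student above, with the fitted parameters printed earlier:\n",
"\n",
"$$\\theta^T x \\approx -25.161 + 0.20623 \\cdot 45 + 0.20147 \\cdot 85 \\approx 1.244 > 0$$\n",
"\n",
"and $g(1.244) \\approx 0.776$, consistent with the probability printed above, so this student would be classified as admitted."
]
},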
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"def predict1(theta, X):\n",
"    X = np.matrix(X)\n",
"    theta = np.matrix(theta)\n",
"    predict = sigmoid(X * theta.T)\n",
"    predict = predict >= 0.5\n",
"    predict = predict.astype(int)\n",
"    return predict"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"prediction = predict1(result[0], X)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok, now that we have our predictions, let's compare them to the actual results in the \"Admitted\" column of the original data."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"accuracy is: 89.0%\n"
]
}
],
"source": [
"accuracy = (np.sum(prediction == np.matrix(data['Admitted']).T)) / prediction.shape[0]\n",
"#np.matrix(data['Admitted']).T.shape, prediction.shape\n",
"print('accuracy is: ','{:.1%}'.format(accuracy))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok, we got 89 percent, nice. Let's plot the decision boundary."
]
},
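{
"cell_type": "markdown",
"metadata": {},
"source": [
"A brief derivation of the line being plotted, using assumed notation ($x_1$ for Exam 1, $x_2$ for Exam 2). The boundary is the set of points where the predicted probability is exactly 0.5, i.e. where the linear term is zero:\n",
"\n",
"$$\\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 = 0 \\quad\\Longrightarrow\\quad x_2 = -\\frac{1}{\\theta_2}\\left(\\theta_1 x_1 + \\theta_0\\right)$$\n",
"\n",
"This is the expression evaluated for `plot_y` in the next cell. Since the boundary is a straight line, two values of $x_1$ are enough to draw it."
]
},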
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"final = result[0]\n",
"\n",
"# x-range of the line: Exam 1 is column 1 of X (column 0 is the bias term)\n",
"plot_X = np.array([min(X.values[:, 1]) - 2, max(X.values[:, 1]) + 2])\n",
"plot_y = (-1/final[2]) * (final[1] * plot_X + final[0])"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x27c63358e80>]"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x27c632c8550>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig = plt.figure()\n",
"ax1 = fig.add_subplot(111)\n",
"ax1.scatter(admitted['Exam 1'],admitted['Exam 2'], s=10, c='r', label='Admitted', marker='o')\n",
"ax1.scatter(not_admitted['Exam 1'],not_admitted['Exam 2'], s=10, c='b', label = 'Not admitted', marker='^')\n",
"plt.legend(loc='lower left');\n",
"\n",
"plt.plot(plot_X, plot_y, 'k-', lw=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that some of the samples fall on the wrong side of the black boundary line, which is why the accuracy is 0.89 rather than 1.0. Perhaps a different classifier would be better for this task (for example, an SVM with a more flexible boundary)."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Test 1</th>\n",
" <th>Test 2</th>\n",
" <th>Accepted</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.051267</td>\n",
" <td>0.699560</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-0.092742</td>\n",
" <td>0.684940</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>-0.213710</td>\n",
" <td>0.692250</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>-0.375000</td>\n",
" <td>0.502190</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>-0.513250</td>\n",
" <td>0.465640</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>-0.524770</td>\n",
" <td>0.209800</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>-0.398040</td>\n",
" <td>0.034357</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>-0.305880</td>\n",
" <td>-0.192250</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>0.016705</td>\n",
" <td>-0.404240</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>0.131910</td>\n",
" <td>-0.513890</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>0.385370</td>\n",
" <td>-0.565060</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>0.529380</td>\n",
" <td>-0.521200</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>0.638820</td>\n",
" <td>-0.243420</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>0.736750</td>\n",
" <td>-0.184940</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>0.546660</td>\n",
" <td>0.487570</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>0.322000</td>\n",
" <td>0.582600</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>0.166470</td>\n",
" <td>0.538740</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>-0.046659</td>\n",
" <td>0.816520</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>-0.173390</td>\n",
" <td>0.699560</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>-0.478690</td>\n",
" <td>0.633770</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>-0.605410</td>\n",
" <td>0.597220</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>-0.628460</td>\n",
" <td>0.334060</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>-0.593890</td>\n",
" <td>0.005117</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>-0.421080</td>\n",
" <td>-0.272660</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>-0.115780</td>\n",
" <td>-0.396930</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>0.201040</td>\n",
" <td>-0.601610</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>0.466010</td>\n",
" <td>-0.535820</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>0.673390</td>\n",
" <td>-0.535820</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>-0.138820</td>\n",
" <td>0.546050</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>-0.294350</td>\n",
" <td>0.779970</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>-0.403800</td>\n",
" <td>0.706870</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89</th>\n",
" <td>-0.380760</td>\n",
" <td>0.918860</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90</th>\n",
" <td>-0.507490</td>\n",
" <td>0.904240</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>91</th>\n",
" <td>-0.547810</td>\n",
" <td>0.706870</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>92</th>\n",
" <td>0.103110</td>\n",
" <td>0.779970</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>93</th>\n",
" <td>0.057028</td>\n",
" <td>0.918860</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>94</th>\n",
" <td>-0.104260</td>\n",
" <td>0.991960</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>-0.081221</td>\n",
" <td>1.108900</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>0.287440</td>\n",
" <td>1.087000</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>0.396890</td>\n",
" <td>0.823830</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>0.638820</td>\n",
" <td>0.889620</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99</th>\n",
" <td>0.823160</td>\n",
" <td>0.663010</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100</th>\n",
" <td>0.673390</td>\n",
" <td>0.641080</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>101</th>\n",
" <td>1.070900</td>\n",
" <td>0.100150</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>102</th>\n",
" <td>-0.046659</td>\n",
" <td>-0.579680</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>103</th>\n",
" <td>-0.236750</td>\n",
" <td>-0.638160</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>104</th>\n",
" <td>-0.150350</td>\n",
" <td>-0.367690</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>105</th>\n",
" <td>-0.490210</td>\n",
" <td>-0.301900</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>106</th>\n",
" <td>-0.467170</td>\n",
" <td>-0.133770</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>107</th>\n",
" <td>-0.288590</td>\n",
" <td>-0.060673</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>108</th>\n",
" <td>-0.611180</td>\n",
" <td>-0.067982</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>109</th>\n",
" <td>-0.663020</td>\n",
" <td>-0.214180</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>110</th>\n",
" <td>-0.599650</td>\n",
" <td>-0.418860</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>111</th>\n",
" <td>-0.726380</td>\n",
" <td>-0.082602</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112</th>\n",
" <td>-0.830070</td>\n",
" <td>0.312130</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>113</th>\n",
" <td>-0.720620</td>\n",
" <td>0.538740</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>114</th>\n",
" <td>-0.593890</td>\n",
" <td>0.494880</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>115</th>\n",
" <td>-0.484450</td>\n",
" <td>0.999270</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>116</th>\n",
" <td>-0.006336</td>\n",
" <td>0.999270</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>117</th>\n",
" <td>0.632650</td>\n",
" <td>-0.030612</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>118 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" Test 1 Test 2 Accepted\n",
"0 0.051267 0.699560 1\n",
"1 -0.092742 0.684940 1\n",
"2 -0.213710 0.692250 1\n",
"3 -0.375000 0.502190 1\n",
"4 -0.513250 0.465640 1\n",
"5 -0.524770 0.209800 1\n",
"6 -0.398040 0.034357 1\n",
"7 -0.305880 -0.192250 1\n",
"8 0.016705 -0.404240 1\n",
"9 0.131910 -0.513890 1\n",
"10 0.385370 -0.565060 1\n",
"11 0.529380 -0.521200 1\n",
"12 0.638820 -0.243420 1\n",
"13 0.736750 -0.184940 1\n",
"14 0.546660 0.487570 1\n",
"15 0.322000 0.582600 1\n",
"16 0.166470 0.538740 1\n",
"17 -0.046659 0.816520 1\n",
"18 -0.173390 0.699560 1\n",
"19 -0.478690 0.633770 1\n",
"20 -0.605410 0.597220 1\n",
"21 -0.628460 0.334060 1\n",
"22 -0.593890 0.005117 1\n",
"23 -0.421080 -0.272660 1\n",
"24 -0.115780 -0.396930 1\n",
"25 0.201040 -0.601610 1\n",
"26 0.466010 -0.535820 1\n",
"27 0.673390 -0.535820 1\n",
"28 -0.138820 0.546050 1\n",
"29 -0.294350 0.779970 1\n",
".. ... ... ...\n",
"88 -0.403800 0.706870 0\n",
"89 -0.380760 0.918860 0\n",
"90 -0.507490 0.904240 0\n",
"91 -0.547810 0.706870 0\n",
"92 0.103110 0.779970 0\n",
"93 0.057028 0.918860 0\n",
"94 -0.104260 0.991960 0\n",
"95 -0.081221 1.108900 0\n",
"96 0.287440 1.087000 0\n",
"97 0.396890 0.823830 0\n",
"98 0.638820 0.889620 0\n",
"99 0.823160 0.663010 0\n",
"100 0.673390 0.641080 0\n",
"101 1.070900 0.100150 0\n",
"102 -0.046659 -0.579680 0\n",
"103 -0.236750 -0.638160 0\n",
"104 -0.150350 -0.367690 0\n",
"105 -0.490210 -0.301900 0\n",
"106 -0.467170 -0.133770 0\n",
"107 -0.288590 -0.060673 0\n",
"108 -0.611180 -0.067982 0\n",
"109 -0.663020 -0.214180 0\n",
"110 -0.599650 -0.418860 0\n",
"111 -0.726380 -0.082602 0\n",
"112 -0.830070 0.312130 0\n",
"113 -0.720620 0.538740 0\n",
"114 -0.593890 0.494880 0\n",
"115 -0.484450 0.999270 0\n",
"116 -0.006336 0.999270 0\n",
"117 0.632650 -0.030612 0\n",
"\n",
"[118 rows x 3 columns]"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"path = os.path.join(os.getcwd(), 'Exercise2', 'ex2', 'ex2data2.txt')\n",
"data2 = pd.read_csv(path, header=None, names=['Test 1', 'Test 2', 'Accepted'])\n",
"data2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's visualize the data as in Figure 3 of the exercise PDF."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"passed = pd.DataFrame(data2.loc[data2['Accepted'].isin([1])])\n",
"failed = pd.DataFrame(data2.loc[data2['Accepted'].isin([0])])"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5,1,'Microchip acceptance and failure dataset')"
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x27c65076860>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig = plt.figure()\n",
"ax1 = fig.add_subplot(111)\n",
"ax1.scatter(passed['Test 1'],passed['Test 2'], s=10, c='r', label='Passed', marker='o')\n",
"ax1.scatter(failed['Test 1'],failed['Test 2'], s=10, c='b', label = 'Failed', marker='^')\n",
"plt.legend(loc='upper right');\n",
"plt.title('Microchip acceptance and failure dataset')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok, no linear boundary is possible in this case. Let's create polynomial features so the boundary can fit the data better than plain logistic regression, whose boundary is a straight line."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (myenv)",
"language": "python",
"name": "myenv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}