@Z30G0D
Created January 24, 2018 19:25
Coursera's machine learning course,
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 1 - Andrew Ng Machine Learning Course\n",
"## Hey all!\n",
"This is the first exercise from Andrew Ng's machine learning course, covering simple linear regression with one variable.\n",
"The original exercise is implemented in MATLAB; I implemented it in Python.\n",
"Feel free to send me feedback on my work at tomer@nahshoh.net.\n",
"The task description is located <a href=\"https://github.com/jdwittenauer/ipython-notebooks/blob/master/exercises/ML/ex1.pdf\">here</a>."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"# ex1data1.txt has no header: column 1 is city population, column 2 is profit\n",
"df = pd.read_csv('../Exercise1/ex1/ex1data1.txt', names=[\"X\", \"Y\"])"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>X</th>\n",
" <th>Y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>6.1101</td>\n",
" <td>17.59200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>5.5277</td>\n",
" <td>9.13020</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>8.5186</td>\n",
" <td>13.66200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>7.0032</td>\n",
" <td>11.85400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.8598</td>\n",
" <td>6.82330</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>8.3829</td>\n",
" <td>11.88600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7.4764</td>\n",
" <td>4.34830</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>8.5781</td>\n",
" <td>12.00000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>6.4862</td>\n",
" <td>6.59870</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>5.0546</td>\n",
" <td>3.81660</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>5.7107</td>\n",
" <td>3.25220</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>14.1640</td>\n",
" <td>15.50500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>5.7340</td>\n",
" <td>3.15510</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>8.4084</td>\n",
" <td>7.22580</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>5.6407</td>\n",
" <td>0.71618</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>5.3794</td>\n",
" <td>3.51290</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>6.3654</td>\n",
" <td>5.30480</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>5.1301</td>\n",
" <td>0.56077</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>6.4296</td>\n",
" <td>3.65180</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>7.0708</td>\n",
" <td>5.38930</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>6.1891</td>\n",
" <td>3.13860</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>20.2700</td>\n",
" <td>21.76700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>5.4901</td>\n",
" <td>4.26300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>6.3261</td>\n",
" <td>5.18750</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>5.5649</td>\n",
" <td>3.08250</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>18.9450</td>\n",
" <td>22.63800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>12.8280</td>\n",
" <td>13.50100</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>10.9570</td>\n",
" <td>7.04670</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>13.1760</td>\n",
" <td>14.69200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>22.2030</td>\n",
" <td>24.14700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>67</th>\n",
" <td>10.2360</td>\n",
" <td>7.77540</td>\n",
" </tr>\n",
" <tr>\n",
" <th>68</th>\n",
" <td>5.4994</td>\n",
" <td>1.01730</td>\n",
" </tr>\n",
" <tr>\n",
" <th>69</th>\n",
" <td>20.3410</td>\n",
" <td>20.99200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>70</th>\n",
" <td>10.1360</td>\n",
" <td>6.67990</td>\n",
" </tr>\n",
" <tr>\n",
" <th>71</th>\n",
" <td>7.3345</td>\n",
" <td>4.02590</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72</th>\n",
" <td>6.0062</td>\n",
" <td>1.27840</td>\n",
" </tr>\n",
" <tr>\n",
" <th>73</th>\n",
" <td>7.2259</td>\n",
" <td>3.34110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>74</th>\n",
" <td>5.0269</td>\n",
" <td>-2.68070</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75</th>\n",
" <td>6.5479</td>\n",
" <td>0.29678</td>\n",
" </tr>\n",
" <tr>\n",
" <th>76</th>\n",
" <td>7.5386</td>\n",
" <td>3.88450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>77</th>\n",
" <td>5.0365</td>\n",
" <td>5.70140</td>\n",
" </tr>\n",
" <tr>\n",
" <th>78</th>\n",
" <td>10.2740</td>\n",
" <td>6.75260</td>\n",
" </tr>\n",
" <tr>\n",
" <th>79</th>\n",
" <td>5.1077</td>\n",
" <td>2.05760</td>\n",
" </tr>\n",
" <tr>\n",
" <th>80</th>\n",
" <td>5.7292</td>\n",
" <td>0.47953</td>\n",
" </tr>\n",
" <tr>\n",
" <th>81</th>\n",
" <td>5.1884</td>\n",
" <td>0.20421</td>\n",
" </tr>\n",
" <tr>\n",
" <th>82</th>\n",
" <td>6.3557</td>\n",
" <td>0.67861</td>\n",
" </tr>\n",
" <tr>\n",
" <th>83</th>\n",
" <td>9.7687</td>\n",
" <td>7.54350</td>\n",
" </tr>\n",
" <tr>\n",
" <th>84</th>\n",
" <td>6.5159</td>\n",
" <td>5.34360</td>\n",
" </tr>\n",
" <tr>\n",
" <th>85</th>\n",
" <td>8.5172</td>\n",
" <td>4.24150</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86</th>\n",
" <td>9.1802</td>\n",
" <td>6.79810</td>\n",
" </tr>\n",
" <tr>\n",
" <th>87</th>\n",
" <td>6.0020</td>\n",
" <td>0.92695</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>5.5204</td>\n",
" <td>0.15200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89</th>\n",
" <td>5.0594</td>\n",
" <td>2.82140</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90</th>\n",
" <td>5.7077</td>\n",
" <td>1.84510</td>\n",
" </tr>\n",
" <tr>\n",
" <th>91</th>\n",
" <td>7.6366</td>\n",
" <td>4.29590</td>\n",
" </tr>\n",
" <tr>\n",
" <th>92</th>\n",
" <td>5.8707</td>\n",
" <td>7.20290</td>\n",
" </tr>\n",
" <tr>\n",
" <th>93</th>\n",
" <td>5.3054</td>\n",
" <td>1.98690</td>\n",
" </tr>\n",
" <tr>\n",
" <th>94</th>\n",
" <td>8.2934</td>\n",
" <td>0.14454</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>13.3940</td>\n",
" <td>9.05510</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>5.4369</td>\n",
" <td>0.61705</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>97 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" X Y\n",
"0 6.1101 17.59200\n",
"1 5.5277 9.13020\n",
"2 8.5186 13.66200\n",
"3 7.0032 11.85400\n",
"4 5.8598 6.82330\n",
"5 8.3829 11.88600\n",
"6 7.4764 4.34830\n",
"7 8.5781 12.00000\n",
"8 6.4862 6.59870\n",
"9 5.0546 3.81660\n",
"10 5.7107 3.25220\n",
"11 14.1640 15.50500\n",
"12 5.7340 3.15510\n",
"13 8.4084 7.22580\n",
"14 5.6407 0.71618\n",
"15 5.3794 3.51290\n",
"16 6.3654 5.30480\n",
"17 5.1301 0.56077\n",
"18 6.4296 3.65180\n",
"19 7.0708 5.38930\n",
"20 6.1891 3.13860\n",
"21 20.2700 21.76700\n",
"22 5.4901 4.26300\n",
"23 6.3261 5.18750\n",
"24 5.5649 3.08250\n",
"25 18.9450 22.63800\n",
"26 12.8280 13.50100\n",
"27 10.9570 7.04670\n",
"28 13.1760 14.69200\n",
"29 22.2030 24.14700\n",
".. ... ...\n",
"67 10.2360 7.77540\n",
"68 5.4994 1.01730\n",
"69 20.3410 20.99200\n",
"70 10.1360 6.67990\n",
"71 7.3345 4.02590\n",
"72 6.0062 1.27840\n",
"73 7.2259 3.34110\n",
"74 5.0269 -2.68070\n",
"75 6.5479 0.29678\n",
"76 7.5386 3.88450\n",
"77 5.0365 5.70140\n",
"78 10.2740 6.75260\n",
"79 5.1077 2.05760\n",
"80 5.7292 0.47953\n",
"81 5.1884 0.20421\n",
"82 6.3557 0.67861\n",
"83 9.7687 7.54350\n",
"84 6.5159 5.34360\n",
"85 8.5172 4.24150\n",
"86 9.1802 6.79810\n",
"87 6.0020 0.92695\n",
"88 5.5204 0.15200\n",
"89 5.0594 2.82140\n",
"90 5.7077 1.84510\n",
"91 7.6366 4.29590\n",
"92 5.8707 7.20290\n",
"93 5.3054 1.98690\n",
"94 8.2934 0.14454\n",
"95 13.3940 9.05510\n",
"96 5.4369 0.61705\n",
"\n",
"[97 rows x 2 columns]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great, we loaded the data: 97 examples. X is the population of a city (in units of 10,000 people) and Y is the profit in that city (in units of $10,000; negative values are losses). Let's plot the data to see what we are working with."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0,0.5,'Profit in 10,000$')"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEWCAYAAABrDZDcAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4xLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvAOZPmwAAIABJREFUeJzt3XmUXWWV9/Hvr4ZUMGEISUCSgKhB7UBD1Aj4BpXBgUFBREUEHBukW1bLa7sAB+a3VXDqbrW1EWhAUVAQiIC2NKAILUjAJJKAEmkwAwQIBJJIKqmq/f5xzk1ubu5YueeOv89aterWGXfdunX2Oc95nn0UEZiZWffqaXYAZmbWXE4EZmZdzonAzKzLORGYmXU5JwIzsy7nRGBm1uWcCGzUJH1O0iXNjmNrSDpQ0tKtWP+7ks6qZ0ztQon/lPScpN9JepOkPzY7LqudE4GVJemDkuZKWiPpCUk/l3QAQER8MSL+Ll1ud0khqa+5EWdH0kck3ZU/LSJOiYgLmhVTkx0AvA2YFhH7RsRvIuLVuZmSHpP01uaFZ9VyIrCSJH0a+Bfgi8DOwG7AvwNHNTMuy156tl/p+PAy4LGIWNuImCxDEeEvf23xBWwPrAHeV2aZc4EfpK//AkS6zhrgLcCzwN/mLb8T8CIwuci2PgLcDXwTeB54GDgkb/4UYE66zcXASQVxXAtcA6wGHgD2yZsfwPS8ny8H/l/6+kBgad68M4E/p9tZBBydTv8bYB0wnP5+qwq3lf58Uhrfs2m8UwriOAV4BHgO+DagIu/FlPR92jFv2muBZ4B+YDrw6/R9ega4psq/aaX3+FfAP6fLvJjup+j7Dny84P04L/+9BL4PjKTbWQOc3uzPtL9Kf/mKwEp5IzAWuL7K5d+cft8hIsZHxK+Bq4ET8pY5DvjviHi6xDb2Ax4FJgHnAD+VtGM670fAUpID03uBL0o6JG/do4CfADsCPwRukNRfZez5/gy8iSQRngf8QNIuEfEQyUH8t+nvt0PhipIOBr4EvB/YBXic5D3I907gDcA+6XLvKNxORCwHfgsckzf5g8C1EbEBuAD4JTABmEZyYK9WufcY4ETgZGDbNP6i73tEXMrm78c5Bb/DiSQnB+9K519UQ4zWYE4EVspE4JmIGNqKbVwBfDCvieFEkjPFUp4C/iUiNkTENcAfgSMk7UrSHn1GRKyLiHnAJen2cu6PiNyB8uskSWz/WgOOiJ9ExPKIGEljeATYt8rVjwcui4gHImIQ+CzwRkm75y3z5YhYFRF/Ae4AZpbY1g9JEieSBHwgnQawgaRZZkr6ftxVfBNFFX2P8+ZfHhEL07/7S6n8vlsHcCKwUlYCk7bm5m9E3AusBd4i6TUkTQ1zyqyyLCLyqyA+TnImOgV4NiJWF8ybmvfzkrz9jrDpLLYmkj4kaZ6kVZJWAXuRnD1XY0oaVy6ONSTvY36cT+a9/iswvsS2riVJIlNIrrYC+E0673RAwO8kLZT0sSrjg9Lvcc6SvNfVvO/WAZwIrJTfkrQBv7vK5UuVsb2CpHnoRJKmjXVltjE1PfvN2Q1Ynn7tKGnbgnnL8n7eNfcivQKZlq4HyQH3JXnLvrTYziW9DPgecCowMW3+eZDkoAulf8ec5SRn6rntjSO5slpWco0SImIVSfPP+0mahX6UO4BHxJMRcVJETAE+Afy7pOlVbrrUe7xx1wW/T6X3veyvUeVy1mROBFZURDwPnA18W9K7Jb1EUr+kwyQVa+99muTm4CsKpn8fOJokGVxZYbc7Af+Y7ud9JDdob4mIJcD/AF+SNFbS3iQ3K6/KW/f1kt6TXsGcBgwC96Tz5pE0UfVKOpTkRnYx40gOXk8DSPooyRVBzgpgmqQxJdb/IfBRSTMlDZD0tro3Ih6r8HuX8kPgQyT3CnLNQkh6n6Rp6Y/PpTEPV7nNou9xsQWrfN/LWcGWnwdrQU4EVlJEfB34NPAFko
PjEpKz5RuKLPtX0h4nabPK/un0pSS9ePKbNkq5F9iDpCfMPwPvjYiV6bzjgN1JzlKvB86JiFvz1r0ROJbkwHgi8J70fgHAp4B3AatI2vG3iD+NdRHwNZKroRXA35L0oMm5HVgIPCnpmSLr3wacBVwHPAG8kqRtf7TmkLwfKyJift70NwD3SlqTLvOpiPhfgLSp6Pgy2yz3HhdT6X0v50vAF9LPw2eqXMeaQJs3F5rVn6TLgOUR8YUyy3wE+LuIOGAU2z+XpHvoCZWW7WZb8x5bZ+vYUaDWGtIeM+8h6QdvZi3ITUOWGUkXkNxs/Uqu6cLMWo+bhszMupyvCMzMulxb3COYNGlS7L777s0Ow8ysrdx///3PRMTkSstllgjSsgBXkgzeGQEujoh/TXt4nETaVxv4XEQU7cecs/vuuzN37tysQjUz60iSHq+8VLZXBEPAP0XEA+nIxPsl5foffyMivprhvs3MrEqZJYKIeIJkUA0RsVrSQ7hGiZlZy2nIzeK0L/lrSUY1ApwqaYGkyyRNKLHOyemTseY+/XSpqsVmZra1Mk8EksaTDLk/LSJeAL5DMvR+JskVw9eKrRcRF0fErIiYNXlyxXsdZmY2SpkmgvTBINcBV0XETwEiYkVEDKelgr9H9bXezcwsA5klgrTU7aXAQ2nxstz0XfIWO5pk5KmZmeVZuWaQ+UtWsXLNYOb7yrLX0GySKpB/kDQvnfY54DhJM0mqUT5GUk/dzMxSN85bxhnXLaC/p4cNIyNcdMzeHDkzu742WfYauotND/TIV3bMgJlZN1u5ZpAzrlvAug0jrGMEgNOvW8Ds6ZOYOH4gk326xISZWQtZ+tyL9Pdsfmju7+lh6XMvZrZPJwIzsxYybcI2bBgZ2WzahpERpk3YJrN9OhGYmbWQieMHuOiYvRnb38O2A32M7e/homP2zqxZCNqk6JyZWTc5cuZUZk+fxNLnXmTahG0yTQLgRGBm1pImjh/IPAHkuGnIzKzLORGYmXU5JwIzsy7nRGBmbaeR5Re6gW8Wm1lbaXT5hW7gKwIzaxv55RdWDw6xbsMIp1+3wFcGW8mJwMzaRjPKL3QDJwIzaxvNKL/QDZwIzKxtNKP8QjfwzWIzayuNLr/QDZwIzKztZFl+YeWawa5LMk4EZmapbu2a6nsEZmZ0d9dUJwIzM7q7a6oTgZkZ3d011YnAzIzu7prqm8VmZqlc19SFy18Agj2nbN/skBrCicDMLM9di5/pup5DbhoyM0t1a88hJwIzs1S39hxyIjAzS3VrzyEnAjOzVLf2HPLNYjOzPN1Y1C6zKwJJu0q6Q9JDkhZK+lQ6fUdJt0p6JP0+IasYzMxGY+L4AfbZdYeuSAKQbdPQEPBPEfE3wP7AJyXNAM4EbouIPYDb0p/NzKxJMksEEfFERDyQvl4NPARMBY4CrkgXuwJ4d1YxmJlZZQ25WSxpd+C1wL3AzhHxBCTJAtipxDonS5orae7TTz/diDDNzLpS5olA0njgOuC0iHih2vUi4uKImBURsyZPnpxdgGZmXS7TRCCpnyQJXBURP00nr5C0Szp/F+CpLGMwM7Pysuw1JOBS4KGI+HrerDnAh9PXHwZuzCoGMzOrLMtxBLOBE4E/SJqXTvsc8GXgx5I+DvwFeF+GMZiZWQWZJYKIuAtQidmHZLVfMzOrjUtMmJl1OScCM7Mu50RgZtblnAjMzLqcE4GZWZdzIjAz63JOBGZmGVi5ZpD5S1a1xfOO/WAaM7M6u3HeMs64bgH9PT1sGBnhomP25siZU5sdVkm+IjAzq6OVawY547oFrNswwurBIdZtGOH06xa09JWBE0GV2ukyz8yaZ+lzL9Lfs/mhtb+nh6XPvdikiCpz01AV2u0yz8yaZ9qEbdgwMrLZtA0jI0ybsE2TIqrMVwQVtONlnpk1z8TxA1x0zN6M7e9h24E+xvb3cNExe7f08499RVBB7jJvHZsyfO4yr5X/sGbWPEfOnMrs6ZNY+tyLTJ
uwTcsfK5wIKmjHyzyzTrdyzWDLH2Qnjh9o2dgKORFUkLvMO73gHkG7/IHNOo3v2dWfE0EV2u0yz6xT5d+zyzXXnn7dAmZPn+T/y63gRFCldrrMM+tUvmeXDfcaMrO24Xt22XAiMLO20Y5dM9uBm4bMrK34nl39ORGYWdvxPbv6ctOQmVmXqzkRSBojaVwWwZiZWeNVTASSPiXp1enrtwBLgEck/UPWwZmZWfaquSL4MLA4ff0F4EhgD+DvswrKrJO4hLm1urI3iyWdA0wBPi9pDDATeAdwKLCtpLOBX0XEnZlHataGXA7B2kHZRBAR50naL11uR+C6iDhfUg9waESc34ggzdqRyyFYu6imaejjwACwEjg9nfYq4JKsgjLrBO34pCrrThXHEUTEE8AZBdMeBh4ut56ky4B3Ak9FxF7ptHOBk4Cn08U+FxG31B62WetzOQRrF2WvCCT1SfqEpJ9LWiBpfvr6FEn9FbZ9Ocm9hELfiIiZ6ZeTgHUsl0OwdlHpiuD7wCrgPGBpOm0aSU+iHwDHlloxIu6UtPvWh2jWPFv7ABSXQ7B2UCkRvC4iXl0wbSlwj6Q/jXKfp0r6EDAX+KeIeK7YQpJOBk4G2G233Ua5K7PRq1ePH5dDsFZX6Wbxc5Lel/YSAkBSj6RjgaIH8Aq+A7ySpBvqE8DXSi0YERdHxKyImDV58uRR7Mps9PJ7/KweHGLdhhFOv26BxwJYR6qUCD4AvBdYIelPkh4BVgDvSefVJCJWRMRwRIwA3wP2rXUbZo3gHj/WTSqNI3iM9D6ApImAIuKZ0e5M0i5pLySAo4EHR7stsyy5x491k4rdRyW9BjgKmAqEpOXAjWkX0nLr/Qg4EJgkaSlwDnCgpJlAAI8Bn9iq6M0ykuvxc3rBPQK39VsnUkSUnimdARwHXM3mvYY+AFwdEV/OPEJg1qxZMXfu3EbsymwzW9tryKyZJN0fEbMqLVfpiuDjwJ4RsaFg418HFgINSQRmzeIeP9YNKt0sHiEpOldol3SetSBXuzSzWlS6IjgNuC3tLbQknbYbMB04NcvAbHRc7bJ2bv6xblep19AvJL2KpJvnVEAk9wrui4jhBsRnNXC1y9o5cZpVV3008r6G0+9uFmpB7vteGw8aM0tUejDN24F/Bx4BlqWTpwHTJf1DRPwy4/isBu77Xptc4lyXd16TS5y+grJuUukewb8Cb00Hlm0k6eXALcDfZBSXjYL7vtfGidMsUSkR9LFp/EC+ZUClMtTWBK52WT0nTrNEpURwGXCfpKvZ1GtoV5IBZZdmGZiNnvu+V8+J06xyr6EvSbqBpMTEG9nUa+j4iFjUgPjMMufEad2umkdVPgQ81IBYzMysCarpProFSVdI+o6kveodkJmZNdaoEgHwLeC/gRPrGIs1kctStBb/PayRKjYNFRMR9wH3AdfVNxyrl1rKJnh0bWvx38MardKAsu2BzwLvBnLPi3wKuBH4ckSsyja8ztGIeja5fTy47HkuuHlRVQcSl6VoLf57WDNUuiL4MXA7cGBEPAkg6aXAR4CfAG/LNLoO0YgzvNw+eiXWrk/KQFVzIPHo2tbiv4c1Q6V7BLtHxIW5JAAQEU+mD6TZLdvQOkMj6tnk7yOXBPKVqzfk0bWtxX8Pa4ZKieBxSadL2jk3QdLO6ZPLlpRZz1KNKARXbB/5yh1IcqNrx/b3sO1AH2P7ezy6ton897BmqNQ0dCxwJvBrSTul01YAc4D3ZxlYp2jEGV6xfQCMG+hleCQqHkg8ura1+O9hjVb2mcWtot2fWTxn3rIt6tnU+x5B4T7OOmIGe03d3gcSsy5W7TOLR50IJH00Iv5zVCvXqN0TATS215AP/mYG9Xt4fTnnAQ1JBJ2gEfVsXDPHzEaj0jiCBaVmATuXmGdmZm2k0hXBzsA7gOcKpgv4n0wiMjOzhqqUCG4CxkfEvMIZkn6VSURmZtZQlZ5H8PEy8z5Y/3DMzKzRRlt91MzMOkRmiUDSZZKekv
Rg3rQdJd0q6ZH0+4Ss9m9WC5d9tm6W5RXB5cChBdPOBG6LiD2A29KfzZrqxnnLmH3h7Zxwyb3MvvB25sxb1uyQzBoqs0QQEXcCzxZMPgq4In19BUl5a7OmaURRQLNWV1UikPSetDnneUkvSFot6YVR7G/niHgCIP2+U6kFJZ0saa6kuU8//fQodmVWWSOKApq1umqvCC4CjoyI7SNiu4jYNiK2yzKwiLg4ImZFxKzJkydXXsFsFFz22az6RLAiIh6qw/5WSNoFIP3+VB22aTZqLvtsVn2tobmSrgFuADY2nkbET2vc3xzgw8CX0+831ri+Wd257LN1u2oTwXbAX4G3500LoGQikPQj4EBgkqSlwDkkCeDHkj4O/AV43yhiritX7DSormCfPyvWqapKBBHx0Vo3HBHHlZh1SK3bykojniVsncGfFetklaqPnh4RF0n6JskVwGYi4h8ziyxj+d0Gq3nIu3Uvf1as01W6IsjdIG7vp8IUkes2mPvHhk3dBv3Pbfn8WbFOV6no3M/S71eUW64dudugVcufFet0XVt0zt0GrVr+rFin6/qH17sniFXLnxVrN3V9ZrGk2RFxd6Vp7cjP+bVq+bNinarapqFvVjnNLHMuGW1WX5W6j74R+D/AZEmfzpu1HdCbZWBmxbg/v1n9VboiGAOMJ0kY2+Z9vQC8N9vQrB1lebbuktFm2ajUffTXwK8lXR4RjzcoJmtTWZ+tuz+/WTYqNQ39S0ScBnxLUrGRxUdmFpm1nHK9Zhox+tb9+c2yUanX0JXp969mHYi1tkpn+404W8/15z+9IA5fDZhtnUqJ4CskReIOj4gzGhBPy2jHPuNZxVzN2X6jztZdMtqs/iolgl0kvQU4UtLVgPJnRsQDmUXWRO3YMyXLmKs52x/t2fpokpf785vVV6VEcDZwJjAN+HrBvAAOziKoZmrHSpNZx1zt2X6tZ+vtmHDNOlHZ7qMRcW1EHAZcFBEHFXx1XBKA5j/MfDTdL7c25kr7zKLWjruCmrWOah9Mc4GkI4E3p5N+FRE3ZRdW80ybsA3rhoY3m7ZuaLghPVNKnSFXaj7Zmvb5as/KK53tr1wzyFX3/oVv37GYMb2Vz/DdFdSsdVRba+hLwL7AVemkT6W1hj6bWWRNVFiIr1JhvnrcpC3VvLN63RAX3Lyo5IE6t++z3jmDC25aVHP7fC1NSqXa5m+ct4zTr53P4FDyPg0OVd6Wu4KatY5qn1l8BDAzIkYAJF0B/B7ouESw9LkX2aa/j9WDQxunbdPfV/JMtV7t3MXOkHslzrtpEeuHih+oC/d91hEz2Gvq9lUnpHqcleeSSS4J5Cu3LXcFNWsd1SYCgB2AZ9PX22cQS0uo5Uy1njdpi+53eIQxfT2s35STNmv7L9z3BTcv4u4zDq563/U4Ky+WTKrdlruCmrWGaquPfgn4vaTL06uB+4EvZhdW89RyY7SeN5aL7fecd+3J0MjmZ9q5g2s99l2Pm8DFkgnAQF9125o4foB9dt3BScCsiSpeEUgScBewP/AGkrEEZ0TEkxnH1jTVnqkWOwgODg0zbszoCrMW2++2Y/tKNp/Uo419a8/KC5t41g8Pc+pBe/DB/Xbzwd2sTVT1hLL0KTevb0A8RWX5hLKtNWfeMk6/bgExEgwOB2P7k7P0evaJL3UzOrfvVuiH344jsc06XbVPKKs2EXwbuDwi7qtHcLUabSJo1MFp8YrVHP7Nu1g/tOkMfWx/T03t9aO1cs0gC5e/AAR7TtneB2Ez26iuj6oEDgJOkfQYsJakeSgiYu/Rh5itRo5aXbt+mIHens0SQaP6xN+1+BmPzjWzrVJtIjgs0yjqrFFlInJXHOPG9FbVXl/vK5Qsf89Wb+pp9fjM2kml5xGMBU4BpgN/AC6NiKFy67SCRoxazQ2i6lUPwzHCUTOncsO85fT3iuGR2KLHTBZXKFn9nq1eA6jV4zNrN5W6j14BzCJJAocBX8s8ojoo1ptn/XD9ykSsXD
PIp6+Zx+BQ8NcNwwwOBT+eu5T+HtgwlAzsKhz9m0VdnVrHAVRTx6jVawC1enxm7ahSIpgRESdExH+QPKP4TfXYqaTHJP1B0jxJde8OlOvS2Jf3240E3L34mbps/3u/eZThIvfY164fYf1wcMHNizY7MNXa57/awnO1jAO4cd4yZl94Oydcci+zL7ydOfOWFd1ms4vuVdLq8Zm1o0r3CDbkXkTEUDKkoG4Oioj6HJmLmD19Er09PQylZ8wbhqMu7ecr1wxy6V2Pll2msHmm1BXK8y+uZ+Wawa1qQqpmHEAt9xJavQZQq8dn1o4qXRHsI+mF9Gs1sHfutaQXGhHgaC197kXG9Nb/zDHZbvkBY4UHpsIz976e5Arlk1f9frOz89E2e1QanVvLWXQWJafrqdXjM2tHZa8IImJ0Q2QrC+CXkgL4j4i4uHABSScDJwPstttuNe9gNO3n1fRCmTZhG4aLjL3o7xVj+3pLFk/LnbkvXP4CJ105l8GhETYMJ/fdc2fnWd38rfW9aPUaQK0en1m7qaXoXD3NjojlknYCbpX0cETcmb9AmhwuhmRAWa07qKW6ZS3NMfnb7ZXYMDzCOe/ak0P3emnFA9PE8QNsv00/Y3p7NpZqhk0H+0oH7NF2mRxNpc9Wfxxkq8dn1k6akggiYnn6/SlJ15M86+DO8mvVrt7t55W2W82BqdLB/pMHTuebtz9CX0/SLTV3wN7aLpM+izazUhqeCCSNA3oiYnX6+u3A+Vntr9yZ48o1g9zx8FP09Wx+E7ya5pjRnpGWOjv/xYNPct5Ni1AE64cDCHI352tNVqWuHHwWbWbFNOOKYGfg+vQg1wf8MCJ+0eggcmfYvRJr12/+aMqse6Hkn52PG9PLtQ8s5bu/3rwn0vrhTQf8i098fdGH1tzx8FMc9JqdMh+4ZmadreGJICIeBfZp9H7z5Z9h5xs30Ft0VPDW7qvU2fldi5/Z7BGPxSS9fbRFc9La9cOc+7OFfOHGBzd7tvHWlJxw2Qaz7tSsm8UNU+zgVqx3zrgxvZz3rj23OMPeGuXOzss94jHfhpER9pyy3WY3qHNXMGsGk+/16HXkKwmz7tXRiaDUwa3YDdvhiFElgVJn0ZXOzss94jGnv1cbr05yzUl3PPwU5/5s4cYkANX3Oir3OzSiSJ+ZtaZqH1XZdsoNzpo4foCz3jmDMX09jBvoHfWgpHJlGyoN4ir1iMd8PUpGSOdMHD/AQa/ZqeTjK0c72MplG8y6W8deEZRrJrlr8TNccNMi+nvEhqFkHMDs6ZOYv2RV1e3jlc6iK52dF/YeWjc0hOjZeJMYYExv7xbNOpXGBIymm6jLNph1t45NBKUObuPG9G5xo/icOQ9y/k2L6OvZNEDs+P1fVnb7ldrjKx2wV64Z5GUTx3HTqQewdn3ynON3fusuyOvAVOpgXOlgX2s30dEMODOzztGxiaDUwW3t+uEtDuBDIzA0MkKuos/nb3iQteuHOPnNr9xiu7U8jKbUAbvUvYtaDsb1HhPgAWdm3atjEwEUP7itXDNYsW0e4Iu3PMy4gT6O32/TlUHhAfz9s6alzyEofeAuPGCXa1Jq9sHYA87MulNHJ4JiCmsFFQ4my3fezxZx6J4v3ZhACg/gP567dGPTTrUH7mqalHwwNrNG6uhEUKoJplxXzHz9vdp4gC51AF+7fph9dt2h6ph8Y9bMWk1Xdh+F0l0x8w2PxMYDdL0O4K6nb2atpmOvCKoZZVt4Q/mv64eQxJg+MTQcnHXEjKLL5spP58+vRbPvBZiZ5evYRFDtGXzhQTlXBXRMXw8X3LyIbcf2bSy1cOTMqaxeN1Ryfi18L8DMWkXHNg3V0gSTe9QjwAU3L2L90AhrBoe3aE5auWaw7Hwzs3bUsVcEsPnjISHYc8r2ZStsVmpOyupRkmZmzdTRiQDgrsXPbOw59OKG5B5A/rOF85t1ijUnDQ4lo35LzXePHzNrdx3bNARb9hwaGoENw1G0F1HOJw+czp
he0Z8+taynR7zzW3cxZ94y9/gxs47U0VcElUo9x0hsbNbJjTmIkdyjIhO5mkStMvrXzKzeOjoRVCr1PDgcjBvTW/KJZfk8+tfMOlVHNw3lN+W8pL93i/lj+5ORwcXq8RcqvBewcs0g85esco8hM2t7HX1FAJv3HDrpyrkMDm1+1p87uJe7chjTKz554PSNP/uxjmbWSTr6iiBn4vgB3vyqyXzlvcVv9OZfOQz0JjeJx/b3MNAnDt/rpUji4jsfZfaFt3PVPY+XLV1Ria8kzKzVdPwVQb5iN3oXr1jNvCWrmLnrDtx9xsEbnzWQ/7CYwaGRjVcS5/1sIWP6ij/W0Q+IN7N21FWJADYv7XD2DX/gynv+snHeh964G+cf9bcbf56/ZNUW60ts1qsI/IB4M2tvXdE0VMziFas3SwIAV/72L8z935Ubfx43pneLnkSDQ8Fn3vYqxvb3MG5ML2N6VVXxOT8g3sxaVVckgmLt8nctfrrossddci9z5i0DYO364Y33DHIGesV+r5jIWUfMYMNIbCw+l1unFI9KNrNW1fFNQ8Xa5QM4/2cPFV1+w3DwmZ/MZ8Yu2zFtwjaoR5DXFKQeMW5M78bic+uHkumVmnn8gHgza1WKKP1gllYxa9asmDt3bs3rrVwzyOwLb9+seWegT0Rs2c5faExfD199794AWxy8XzZxHCdcci+rB4c2Lj9uTC/nHbknB71mp7IH93JF78zM6knS/RExq+JynZwI5i9ZtcUBe2xfDwFbjCcoZmx/D3efcTDAZgfvYgkGYPxAL0Mj0fDeQE4uZlZMtYmgKfcIJB0q6Y+SFks6M6v9FGuXXzc0wnCZwWP58ruF7rPrDls8rSx3wzinGc8ouHHeMmZfeDsnXHIvsy+8veK9CjOzQg1PBJJ6gW8DhwEzgOMkzchiXxPHD3DWEcU3Xc0vXu5m7pEzp3L3GQdz3pF7Mn5g8/IVjeoNVOm5zGZm1WjGFcG+wOKIeDQi1gNXA0dltbO9pm6/xYF6m/4+zn7XDLYpqD800CvG9FVfYnri+AEOes1ODI3UPq6gHtwl1czqoRm9hqYCS/J+XgrsV7iQpJOBkwF22223Ue9s2oRtih6oD5g+iS/z8Ob77BE3n3oAa9fAug6AAAALI0lEQVQPV93e3szeQO6Samb10IxEoCLTtrhjHREXAxdDcrN4tDsrdaCevvO2JafXqlnPKHCXVDOrh2YkgqXArnk/TwOWZ7nDUgfqeh7Am/WMAj8ox8y2VjMSwX3AHpJeDiwDPgB8sAlxAM07gNdTJ/wOZtY8DU8EETEk6VTgv4Be4LKIWJjlPl3108ystKaUmIiIW4BbGrEvV/00Myuv44vOFeti2Su5i6WZWarjE0GxLpZr1w/z4PLnmxSRmVlr6fhEUGp08QU3LdrqEbh+7KSZdYKOL0MNm0YXrxkc3jit2sdLluIb0GbWKTr+igBKjy4e7Qhc1/gxs07SFYkgv1potXWEynGNHzPrJF3RNAT1HYHrGj9m1km64oogp/C5AluznXpeYZiZNVPXXBHUm2v8mFmncCLYCq7xY2adoKuahszMbEsdnQg84MvMrLKObRrygC8zs+p05BWBB3yZmVWvIxOBB3yZmVWvIxOBB3yZmVWvIxOBB3yZmVWvY28We8CXmVl1OjYRgAd8mZlVoyObhszMrHpOBGZmXc6JwMysyzkRmJl1OScCM7Mup4iovFSTSXoaeHyUq08CnqljOFlzvNlrt5gdb7baLV6oPuaXRcTkSgu1RSLYGpLmRsSsZsdRLcebvXaL2fFmq93ihfrH7KYhM7Mu50RgZtbluiERXNzsAGrkeLPXbjE73my1W7xQ55g7/h6BmZmV1w1XBGZmVoYTgZlZl+uYRCDpMUl/kDRP0twi8yXp3yQtlrRA0uuaEWcay6vTOHNfL0g6rWCZAyU9n7fM2Q2O8TJJT0l6MG/ajpJulfRI+n1CiXU/nC7ziKQPNznmr0h6OP2bXy9phx
Lrlv38NDDecyUty/u7H15i3UMl/TH9PJ/ZxHivyYv1MUnzSqzbjPd3V0l3SHpI0kJJn0qnt+TnuEy82X+GI6IjvoDHgEll5h8O/BwQsD9wb7NjTuPqBZ4kGfiRP/1A4KYmxvVm4HXAg3nTLgLOTF+fCVxYZL0dgUfT7xPS1xOaGPPbgb709YXFYq7m89PAeM8FPlPFZ+bPwCuAMcB8YEYz4i2Y/zXg7BZ6f3cBXpe+3hb4EzCjVT/HZeLN/DPcMVcEVTgKuDIS9wA7SNql2UEBhwB/jojRjpzORETcCTxbMPko4Ir09RXAu4us+g7g1oh4NiKeA24FDs0s0DzFYo6IX0bEUPrjPcC0RsRSjRLvcTX2BRZHxKMRsR64muRvk6ly8UoS8H7gR1nHUa2IeCIiHkhfrwYeAqbSop/jUvE24jPcSYkggF9Kul/SyUXmTwWW5P28NJ3WbB+g9D/PGyXNl/RzSXs2MqgSdo6IJyD50AI7FVmmVd9ngI+RXBUWU+nz00inps0Al5VotmjF9/hNwIqIeKTE/Ka+v5J2B14L3EsbfI4L4s2XyWe4k55QNjsilkvaCbhV0sPpGUyOiqzT1L6zksYARwKfLTL7AZLmojVpO/ENwB6NjG+UWu59BpD0eWAIuKrEIpU+P43yHeACkvfsApLmlo8VLNOK7/FxlL8aaNr7K2k8cB1wWkS8kFy8VF6tyLSGvMeF8eZNz+wz3DFXBBGxPP3+FHA9yeVzvqXArnk/TwOWNya6kg4DHoiIFYUzIuKFiFiTvr4F6Jc0qdEBFliRa05Lvz9VZJmWe5/TG33vBI6PtDG1UBWfn4aIiBURMRwRI8D3SsTRUu+xpD7gPcA1pZZp1vsrqZ/koHpVRPw0ndyyn+MS8Wb+Ge6IRCBpnKRtc69Jbq48WLDYHOBDSuwPPJ+7PGyikmdRkl6atrsiaV+Sv9XKBsZWzBwg13viw8CNRZb5L+DtkiakzRpvT6c1haRDgTOAIyPiryWWqebz0xAF962OLhHHfcAekl6eXlV+gORv0yxvBR6OiKXFZjbr/U3/fy4FHoqIr+fNasnPcal4G/IZzvIueKO+SHpPzE+/FgKfT6efApySvhbwbZLeFn8AZjU55peQHNi3z5uWH++p6e8yn+QG0f9pcHw/Ap4ANpCcHX0cmAjcBjySft8xXXYWcEneuh8DFqdfH21yzItJ2nrnpV/fTZedAtxS7vPTpHi/n34+F5AcsHYpjDf9+XCSXiV/bma86fTLc5/bvGVb4f09gKQ5Z0He3//wVv0cl4k388+wS0yYmXW5jmgaMjOz0XMiMDPrck4EZmZdzonAzKzLORGYmXU5JwKrm3Tsw9WS/ixpkaRbJL1K0hRJ16bLzFSJippltvsRSU+nVRUXSTopg9h/Jansw8AlnSbpJXk/31KqEmSN+z5F0odqXOcXklZJuqlg+ssl3ZtWzLwmHWdQuK5UohKvSlTclPT6tLLl4nTdqobnWntwIrC6SA8M1wO/iohXRsQM4HMkdV2WR8R700VnkvSNrtU1ETGTpCrrFyXtXI+4a3QayfgPACLi8IhYtbUbjYjvRsSVNa72FeDEItMvBL4REXsAz5GMTSh0GEm5kj2Ak0nKWiBpR+AcYD+SUann5NU6+k66bG69hhQStMZwIrB6OQjYEBHfzU2IiHkR8RtJu0t6MD07PR84Nj27PzY985wMIKknPeMsWUojkuHzfwZepqSu/A3pWe09kvZOt3OupO9Luj3d/knp9APzz6AlfUvSRwr3Iek7kuYqqQl/XjrtH0kG8Nwh6Y502mO5WCV9Ov0dH1T6bIn0935I0vfSbf1S0jZF9neupM+kr38l6UJJv5P0J0lvKvE+3AasLtiOgIOBa9NJpSprlqrEW7TiZjpvu4j4bSQDj64ssV1rU04EVi97AfeXWyCSkslnk57dR8Q1wA+A49NF3grMj4hnSm1D0itIRlEuBs4Dfh8Re5NcfeSfVe8NHAG8ET
hb0pQafpfPR8SsdBtvkbR3RPwbSa2ZgyLioIKYXg98lORMen/gJEmvTWfvAXw7IvYEVgHHVLH/vojYl+QK5Jwa4p4IrIpNJYtLVcwsVVmz3PSlRaZbh3AisGa7DMi1j38M+M8Syx2r5OlXPwI+ERHPkgzJ/z5ARNwOTJS0fbr8jRHxYppU7qC2Imfvl/QA8HtgT5KHg5RzAHB9RKyNpFDgT0nKMgP8b0Tkntp1P7B7FfvPFRurdvmcaitmllqu1unWIZwIrF4WAq+vdaWIWEJSDfJgkjPqUrXWc1cR+0XE9em0cgeowgNVkJTwzf/Mjy1cWdLLgc8Ah6RXGjcXW65wtTLzBvNeD1Nd6ffcOtUun/MMSTNPbp1SFTNLVdYsN31akenWIZwIrF5uBwbye/RIeoOktxQst5rkMXz5LiFpIvpxRAzXsM87SZuVJB0IPBOb6rcfJWmspIkkN5jvAx4HZkgaSK8cDimyze2AtcDz6Q3pwyrEnovj3ZJeoqTy49HAb2r4Peoibb+/A8jdmN9YWVPS0ZK+lE4vVYm3aMXNdN5qSfun9yE+RPGKndamnAisLtKD0NHA25R0H11I8vzdwjPHO0gOxvMkHZtOmwOMp3SzUCnnArMkLQC+zKbSwgC/Izmbvwe4IO25tAT4MUl1x6tImn4Kf4/56fSFJM1Wd+fNvhj4ee5mcd46D5BU4PwdyROlLomILbZdT5J+A/wEOETSUknvSGedAXxa0mKSewaXptNfCeSS5C0kz+BdTPLMg39If49nSR6Gc1/6dX46DeDvSRL2YpKb9aWu3KwNufqoNZ2S/vvfiIiiPWRGsb1zgTUR8dV6bK8TSPoB8H8j4ulmx2Ktp5MeVWltSNKZJGebx1da1kYvIk5odgzWunxFYGbW5XyPwMysyzkRmJl1OScCM7Mu50RgZtblnAjMzLrc/wfoZWQF1rDTIgAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x2dcbf8191d0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df = df.sort_values('X', ascending=True)\n",
"ax = df.plot.scatter(x = \"X\", y=\"Y\",title='City population vs. profit')\n",
"ax.set_xlabel(\"City Population in 10,000\")\n",
"ax.set_ylabel(\"Profit in 10,000$\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, we plotted the data after sorting it by population. The data is not evenly spread: there are many small cities and far fewer large ones, which may make the fitted line less reliable for large cities."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
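"def computecost(X, y, theta):\n",
"    # X: (m, n) design matrix, y: (m, 1) targets, theta: (1, n) parameters, all np.matrix,\n",
"    # so * is a matrix product and X * theta.T gives the (m, 1) predictions\n",
"    m = X.shape[0]\n",
"    J = np.sum((np.square((X * theta.T) - y)) / (2 * m))\n",
"    return J"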
]
},
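{
"cell_type": "markdown",
"metadata": {},
"source": [
"`computecost` implements the squared-error cost $J(\\theta) = \\frac{1}{2m}\\sum_{i=1}^{m}\\left(h_\\theta(x^{(i)}) - y^{(i)}\\right)^2$, where $h_\\theta(x) = \\theta^T x = \\theta_0 + \\theta_1 x_1$.\n",
"As a minimal sketch, assuming plain NumPy arrays instead of `np.matrix` (`compute_cost` is a hypothetical name, not part of the original notebook), the same formula can be written with the `@` operator:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def compute_cost(X, y, theta):\n",
"    # X: (m, n) array whose first column is all ones, y: (m,) targets, theta: (n,) parameters\n",
"    m = X.shape[0]\n",
"    residuals = X @ theta - y  # h_theta(x_i) - y_i for every sample\n",
"    return residuals @ residuals / (2 * m)  # sum of squared residuals divided by 2m\n",
"\n",
"X = np.array([[1.0, 2.0], [1.0, 1.0]])\n",
"y = np.array([1.0, 2.0])\n",
"print(compute_cost(X, y, np.array([1.0, 1.0])))  # 1.0\n",
"```\n",
"\n",
"The dot product `residuals @ residuals` is already the sum of squared residuals, so no separate `np.square` and `np.sum` calls are needed."
]
},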
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check the cost function we wrote:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.0"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
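"# two samples with a bias column: predictions are [3, 2] and errors [2, 0],\n",
"# so J = (2**2 + 0**2) / (2 * 2) = 1.0\n",
"theta = np.matrix([[1, 1]])\n",
"xx = np.matrix([[1, 2], [1, 1]])\n",
"yy = np.matrix([[1], [2]])\n",
"computecost(xx, yy, theta)"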
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Trying it on a small example gives the expected value. Now let's add a bias column of ones to the data, so that theta0 acts as the intercept term."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<bound method NDFrame.head of Bias X Y\n",
"74 1 5.0269 -2.68070\n",
"77 1 5.0365 5.70140\n",
"9 1 5.0546 3.81660\n",
"89 1 5.0594 2.82140\n",
"46 1 5.0702 5.13370\n",
"79 1 5.1077 2.05760\n",
"17 1 5.1301 0.56077\n",
"61 1 5.1793 -0.74279\n",
"81 1 5.1884 0.20421\n",
"30 1 5.2524 -1.22000\n",
"93 1 5.3054 1.98690\n",
"51 1 5.3077 1.83960\n",
"15 1 5.3794 3.51290\n",
"40 1 5.4069 0.55657\n",
"96 1 5.4369 0.61705\n",
"22 1 5.4901 4.26300\n",
"68 1 5.4994 1.01730\n",
"88 1 5.5204 0.15200\n",
"1 1 5.5277 9.13020\n",
"49 1 5.5416 1.01790\n",
"24 1 5.5649 3.08250\n",
"37 1 5.6063 3.39280\n",
"57 1 5.6397 4.60420\n",
"14 1 5.6407 0.71618\n",
"90 1 5.7077 1.84510\n",
"10 1 5.7107 3.25220\n",
"80 1 5.7292 0.47953\n",
"12 1 5.7340 3.15510\n",
"43 1 5.7737 2.44060\n",
"47 1 5.8014 1.84400\n",
".. ... ... ...\n",
"66 1 8.2951 5.74420\n",
"5 1 8.3829 11.88600\n",
"13 1 8.4084 7.22580\n",
"85 1 8.5172 4.24150\n",
"2 1 8.5186 13.66200\n",
"7 1 8.5781 12.00000\n",
"60 1 8.8254 5.16940\n",
"86 1 9.1802 6.79810\n",
"32 1 9.2482 12.13400\n",
"58 1 9.3102 3.96240\n",
"59 1 9.4536 5.41410\n",
"83 1 9.7687 7.54350\n",
"70 1 10.1360 6.67990\n",
"67 1 10.2360 7.77540\n",
"78 1 10.2740 6.75260\n",
"27 1 10.9570 7.04670\n",
"48 1 11.7000 8.00430\n",
"42 1 11.7080 5.38540\n",
"26 1 12.8280 13.50100\n",
"38 1 12.8360 10.11700\n",
"28 1 13.1760 14.69200\n",
"95 1 13.3940 9.05510\n",
"11 1 14.1640 15.50500\n",
"63 1 14.9080 12.05400\n",
"25 1 18.9450 22.63800\n",
"64 1 18.9590 17.05400\n",
"21 1 20.2700 21.76700\n",
"69 1 20.3410 20.99200\n",
"62 1 21.2790 17.92900\n",
"29 1 22.2030 24.14700\n",
"\n",
"[97 rows x 3 columns]>"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.insert(0, 'Bias', 1)\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"# the last column is the target y; all columns before it (Bias, X) are the features\n",
"separation = df.shape[1]\n",
"X = df.iloc[:, 0:separation-1]\n",
"y = df.iloc[:, separation-1:separation]"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((97, 2), (97, 1), (1, 2))"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# convert to np.matrix so that * in computecost is a matrix product\n",
"X = np.matrix(X.values)\n",
"y = np.matrix(y.values)\n",
"theta = np.matrix(np.array([0,0]))\n",
"X.shape, y.shape, theta.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"X has 97 rows and 2 columns: 97 training samples, each with x0 = 1 (the bias term) and x1 = city population.\n",
"y holds the 97 corresponding profits.\n",
"theta is the (1, 2) parameter vector to be learned: theta0 (the intercept, multiplied by the bias column) and theta1 (the slope)."
]
},
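{
"cell_type": "markdown",
"metadata": {},
"source": [
"NumPy no longer recommends `np.matrix` and suggests regular arrays instead. As a minimal sketch of the same setup with plain arrays (the three points are the first three rows of the data shown above, and the variable names are illustrative), the bias column can be stacked in front of the feature with `np.column_stack`:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# first three (population, profit) rows of ex1data1.txt, used only as a small example\n",
"x = np.array([6.1101, 5.5277, 8.5186])\n",
"y = np.array([17.592, 9.1302, 13.662])\n",
"\n",
"X = np.column_stack([np.ones_like(x), x])  # (m, 2): bias column x0 = 1, then x1 = population\n",
"theta = np.zeros(X.shape[1])               # (2,): theta0 and theta1, starting at zero\n",
"print(X.shape, y.shape, theta.shape)       # (3, 2) (3,) (2,)\n",
"```\n",
"\n",
"Here `y` and `theta` are 1-D, so the predictions are simply `X @ theta` and no transposes are needed."
]
},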
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"32.072733877455676"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"computecost(X,y,theta)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great, this matches the cost of about 32.07 that the exercise gives for theta = [0, 0]. Now let's build the gradient descent function."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"def gradientDescent(X, y, theta, alpha, num_iters):\n",
"    # Batch gradient descent. X: (m, n), y: (m, 1), theta: (1, n), all np.matrix\n",
"    m = len(X)\n",
"    parameters = int(theta.ravel().shape[1])\n",
"    cost_history = np.zeros(num_iters)\n",
"    for i in range(num_iters):\n",
"        e = (X * theta.T) - y  # errors computed once per iteration and shared by every theta_j\n",
"        temp_theta = np.matrix(np.zeros(theta.shape))  # fresh buffer, so theta is never updated in place\n",
"        for j in range(parameters):\n",
"            temp_theta[0, j] = theta[0, j] - ((alpha / m) * np.sum(np.multiply(e, X[:, j])))\n",
"        theta = temp_theta  # simultaneous update of all parameters\n",
"        cost_history[i] = computecost(X, y, theta)\n",
"\n",
"    return theta, cost_history"
]
},
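{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each iteration applies the simultaneous update $\\theta_j := \\theta_j - \\frac{\\alpha}{m}\\sum_{i=1}^{m}\\left(h_\\theta(x^{(i)}) - y^{(i)}\\right)x_j^{(i)}$ for every $j$, which in matrix form is $\\theta := \\theta - \\frac{\\alpha}{m}X^T(X\\theta - y)$.\n",
"A minimal vectorized sketch of that update, assuming plain NumPy arrays (`gradient_descent` is a hypothetical name, and the toy data lies exactly on $y = 2 + 3x$ so the expected answer is known):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def gradient_descent(X, y, theta, alpha, num_iters):\n",
"    # X: (m, n) with a bias column, y: (m,), theta: (n,); returns final theta and cost per iteration\n",
"    m = X.shape[0]\n",
"    theta = theta.astype(float)  # astype makes a copy, so the caller's array is not modified\n",
"    cost_history = np.zeros(num_iters)\n",
"    for i in range(num_iters):\n",
"        errors = X @ theta - y                        # h_theta(x) - y for all samples\n",
"        theta = theta - (alpha / m) * (X.T @ errors)  # updates every theta_j at once\n",
"        residuals = X @ theta - y\n",
"        cost_history[i] = residuals @ residuals / (2 * m)\n",
"    return theta, cost_history\n",
"\n",
"x = np.array([1.0, 2.0, 3.0, 4.0])\n",
"X = np.column_stack([np.ones_like(x), x])\n",
"y = 2 + 3 * x\n",
"theta, cost = gradient_descent(X, y, np.zeros(2), alpha=0.05, num_iters=5000)\n",
"print(np.round(theta, 3))  # [2. 3.]\n",
"```\n",
"\n",
"This performs the same update as the `for j in range(parameters)` loop above, computed for all parameters in a single matrix product."
]
},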
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(24.47289936878686, 19.08057847001107)"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"learning_rate = 0.001\n",
"iters = 14\n",
"theta = np.matrix(np.array([0,0]))\n",
"theta, cost = gradientDescent(X, y, theta, learning_rate, iters)\n",
"cost[1], cost[3]"
]
},
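{
"cell_type": "markdown",
"metadata": {},
"source": [
"Printing `cost[1]` and `cost[3]` samples only two points. A stricter convergence check is that J never increases between iterations: with a small enough learning rate, gradient descent on this quadratic cost guarantees that, while a learning rate that is too large typically breaks it. A minimal sketch, where `is_non_increasing` and `first_increase` are hypothetical helper names:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def is_non_increasing(cost_history, tol=1e-12):\n",
"    # True when J(theta) never rises from one iteration to the next (within tol)\n",
"    diffs = np.diff(np.asarray(cost_history, dtype=float))\n",
"    return bool(np.all(diffs <= tol))\n",
"\n",
"def first_increase(cost_history, tol=1e-12):\n",
"    # Index of the first iteration whose cost is higher than the previous one, or None\n",
"    diffs = np.diff(np.asarray(cost_history, dtype=float))\n",
"    rising = np.nonzero(diffs > tol)[0]\n",
"    return int(rising[0]) + 1 if rising.size else None\n",
"\n",
"print(is_non_increasing([5.0, 3.0, 2.5, 2.5]))  # True\n",
"print(first_increase([5.0, 3.0, 3.5, 1.0]))     # 2\n",
"```\n",
"\n",
"In this notebook it would be called as `is_non_increasing(cost)` on the array returned by `gradientDescent`."
]
},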
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, the cost is decreasing (about 24.47 after the 2nd iteration and 19.08 after the 4th). Now let's look at the learned parameters and draw the resulting hypothesis over the data."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"matrix([[0.04572468, 0.55590616]])"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"theta"
]
},
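{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a problem this small, the parameters that minimize J can also be computed directly by linear least squares, which gives a reference point for judging how close a few gradient descent iterations have come. A minimal sketch using `np.linalg.lstsq` (the helper name `least_squares_theta` and the toy data are illustrative, not part of the exercise):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def least_squares_theta(X, y):\n",
"    # Directly minimizes sum((X @ theta - y) ** 2), the same objective as J(theta) up to the 1/2m factor\n",
"    theta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)\n",
"    return theta\n",
"\n",
"x = np.array([1.0, 2.0, 3.0, 4.0])\n",
"X = np.column_stack([np.ones_like(x), x])\n",
"y = 2 + 3 * x\n",
"print(np.round(least_squares_theta(X, y), 6))  # [2. 3.]\n",
"```\n",
"\n",
"With this notebook's `np.matrix` objects, converting first, e.g. `least_squares_theta(np.asarray(X), np.asarray(y).ravel())`, should give the optimum to compare against `theta`."
]
},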
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"# Predicted profit h(x) = theta0 + theta1 * x for every city, flattened to 1-D for plotting\n",
"abline_values = np.asarray(X * theta.T).ravel()"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0,0.5,'Profit in 10,000$')"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x2dcbf77bfd0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"ax = df.plot.scatter(x = \"X\", y=\"Y\",title='City population vs. profit')\n",
"plt.plot(X[:,1], abline_values, 'g')\n",
"ax.set_xlabel(\"City Population in 10,000\")\n",
"ax.set_ylabel(\"Profit in 10,000$\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, the hypothesis is drawn. Let's run more iterations to get a better fit."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"# continue from the current theta, so the fit below reflects 14 + 800 = 814 iterations in total\n",
"iters = 800\n",
"theta, cost = gradientDescent(X, y, theta, learning_rate, iters)\n",
"# Predicted profit for every city along the fitted line\n",
"abline_values = np.asarray(X * theta.T).ravel()"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0,0.5,'Profit in 10,000$')"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x2dcbf8866a0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"ax = df.plot.scatter(x = \"X\", y=\"Y\",title='City population vs. profit')\n",
"plt.plot(X[:,1], abline_values, 'g')\n",
"ax.set_xlabel(\"City Population in 10,000\")\n",
"ax.set_ylabel(\"Profit in 10,000$\")"
]
},
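{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because both columns of the data are in units of 10,000, a prediction for a raw population has to be scaled on the way in and on the way out. The exercise PDF asks for predicted profits in cities of 35,000 and 70,000 people. A minimal sketch of that conversion, where `predict_profit` is a hypothetical helper and the theta values are made up for illustration rather than learned:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def predict_profit(theta, population):\n",
"    # theta = (theta0, theta1) fitted on data where population and profit are in units of 10,000\n",
"    x1 = population / 10000.0                    # raw head count -> units of 10,000 people\n",
"    return (theta[0] + theta[1] * x1) * 10000.0  # units of $10,000 -> dollars\n",
"\n",
"theta = np.array([-3.0, 1.0])        # illustrative values only, not the fitted parameters\n",
"print(predict_profit(theta, 35000))  # 5000.0\n",
"print(predict_profit(theta, 70000))  # 40000.0\n",
"```\n",
"\n",
"With the notebook's fitted `np.matrix`, `predict_profit(np.asarray(theta).ravel(), 35000)` would give the model's estimate."
]
},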
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This looks better. Now let's visualize the cost function J(theta) as a function of theta0 and theta1."
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {},
"outputs": [],
"source": [
"def visualize_cost():\n",
"    # Evaluate J(theta) on a grid of (theta0, theta1) values, using the global X and y\n",
"    theta0_vals = np.linspace(-10, 10, 100)\n",
"    theta1_vals = np.linspace(-1, 4, 100)\n",
"    J_vals = np.zeros((len(theta0_vals), len(theta1_vals)))\n",
"    # Fill out J_vals: rows index theta0, columns index theta1\n",
"    for i in range(len(theta0_vals)):\n",
"        for j in range(len(theta1_vals)):\n",
"            t = np.matrix([theta0_vals[i], theta1_vals[j]])\n",
"            J_vals[i, j] = computecost(X, y, t)\n",
"    # plt.contour expects Z indexed as [y, x], so transpose J_vals to put theta0 on the x-axis;\n",
"    # log-spaced levels (as in the original exercise) resolve the bowl around the minimum\n",
"    plt.figure()\n",
"    cp = plt.contour(theta0_vals, theta1_vals, J_vals.T, levels=np.logspace(-2, 3, 20))\n",
"    plt.clabel(cp, inline=True, fontsize=10)\n",
"    plt.xlabel('theta0')\n",
"    plt.ylabel('theta1')\n",
"    plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x2dcbfe05668>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"visualize_cost()"
]
},
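{
"cell_type": "markdown",
"metadata": {},
"source": [
"The double loop in `visualize_cost` calls `computecost` 10,000 times. A minimal broadcasting sketch that fills the same grid in one expression, assuming plain NumPy arrays with X of shape (m, 2) and y of shape (m,) (`cost_grid` is a hypothetical name):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def cost_grid(X, y, theta0_vals, theta1_vals):\n",
"    # J[i, j] = cost at (theta0_vals[i], theta1_vals[j]), the same layout as J_vals\n",
"    T0, T1 = np.meshgrid(theta0_vals, theta1_vals, indexing='ij')  # each (len0, len1)\n",
"    # predictions for every grid point and every sample: shape (len0, len1, m)\n",
"    preds = T0[..., None] * X[:, 0] + T1[..., None] * X[:, 1]\n",
"    m = X.shape[0]\n",
"    return np.sum((preds - y) ** 2, axis=-1) / (2 * m)\n",
"\n",
"x = np.array([1.0, 2.0, 3.0])\n",
"X = np.column_stack([np.ones_like(x), x])\n",
"y = 1 + 2 * x\n",
"J = cost_grid(X, y, np.linspace(-1, 3, 5), np.linspace(0, 4, 5))\n",
"print(J.shape)   # (5, 5)\n",
"print(J[2, 2])   # 0.0, since theta = (1, 2) fits y = 1 + 2x exactly\n",
"```\n",
"\n",
"As with `J_vals`, the result would need to be transposed (`J.T`) before being passed to `plt.contour`, which indexes its Z argument as [y, x]."
]
},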
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's it for now.\n",
"<br>\n",
"For comments, please email me at tomer@nahshoh.net\n",
"<br>\n",
"Thank you!\n",
"<br>\n",
"Tomer Nahshon"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (myenv)",
"language": "python",
"name": "myenv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}