Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save BankNatchapol/11b304c7ad454c8421ae9b22bed763b9 to your computer and use it in GitHub Desktop.
Save BankNatchapol/11b304c7ad454c8421ae9b22bed763b9 to your computer and use it in GitHub Desktop.
Normal Equation vs Gradient Descent.ipynb
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Normal Equation vs Gradient Descent.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyNYRTrSJ4STckTDkyIRgyxm",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/BankNatchapol/11b304c7ad454c8421ae9b22bed763b9/normal-equation-vs-gradient-descent.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tMsTK2Ig_658",
"colab_type": "text"
},
"source": [
"# **Linear Regression: Normal Equation vs Gradient Descent**\n",
"---\n",
"ในครั้งที่แล้วผมได้ทำ Multiple Variable Linear Regression ไปแล้วซึ่งในคราวที่แล้วการ Update parameters ในแต่ละครั้งนั้นใช้Gradient Descentเป็นหลัก <br>\n",
"ในครั้งนี้จะมาลองใช้อีก Algorithm หนึ่งที่เรียกว่า **Normal Equation**<br><br>\n",
"**ideas** : เมื่อกำหนดให้ $\\theta^T$ คือ vector <$\\theta _0 , \\theta _1,\\theta _2 , \\theta_3,...,\\theta_n$>, x คือ vector <$x_0 , x_1,x_2 , x_3,...,x_n$>\n",
"<br>จะได้<br>\n",
"\\begin{equation}\n",
"\\theta x = y\n",
"\\end{equation}\n",
"<br>\n",
"ซึ่งถ้าเราอยากหา $\\theta$ ก็สามารถทำได้โดยการย้ายข้างสมการง่ายๆ<br><br>\n",
"\\begin{equation}\n",
"\\theta = x^{-1}y\n",
"\\end{equation}\n",
"<br>\n",
"แต่ในความเป็นจริงแล้วเราไม่สามารถทำแบบนี้ได้เนื่องจาก x เป็น Matrix และ การ inverse ไม่ได้ทำได้กับทุก Matrix <br>ดังนั้นเราจึงคูณทั้ง 2 ข้างของสมการด้วย Matrix$\\;x^T $ <br>\n",
"\\begin{equation}\n",
"\\theta x^Tx = x^Ty \n",
"\\end{equation}<br>\n",
"และทำการ inverse $x^Tx $ โดยที่จะเป็นการทำ inverse แบบ Psudo-inverse เพื่อให้สามารถหา inverse ของ Matrix ได้ทุกแบบ <br><br>\n",
"\\begin{equation}\n",
"\\theta = (x^Tx)^{-1}x^Ty \n",
"\\end{equation}<br>\n",
"ดังนั้นเราก็จะสามารถ update parameters ได้แล้ว ด้วยสมการ Normal Equation"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ZXNJiLFndjte",
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.datasets import load_boston\n",
"pd.set_option('display.max_rows', df.shape[0]+1) "
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "2yFTYsPuEGeg",
"colab_type": "text"
},
"source": [
"โดยขั้นแรกจะทำการโหลด boston housing datasets ด้วย library sklearn "
]
},
{
"cell_type": "code",
"metadata": {
"id": "a3DfkMwvd5wO",
"colab_type": "code",
"colab": {}
},
"source": [
"dat = load_boston(return_X_y=False) # boston housing datasets"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "07SsL8bYeGSc",
"colab_type": "code",
"outputId": "5d5f74be-3e23-4b2c-a3fe-e559e8f1ec65",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 195
}
},
"source": [
"df = pd.DataFrame(dat.data,columns=dat.feature_names) # create dataframe\n",
"df.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>CRIM</th>\n",
" <th>ZN</th>\n",
" <th>INDUS</th>\n",
" <th>CHAS</th>\n",
" <th>NOX</th>\n",
" <th>RM</th>\n",
" <th>AGE</th>\n",
" <th>DIS</th>\n",
" <th>RAD</th>\n",
" <th>TAX</th>\n",
" <th>PTRATIO</th>\n",
" <th>B</th>\n",
" <th>LSTAT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.00632</td>\n",
" <td>18.0</td>\n",
" <td>2.31</td>\n",
" <td>0.0</td>\n",
" <td>0.538</td>\n",
" <td>6.575</td>\n",
" <td>65.2</td>\n",
" <td>4.0900</td>\n",
" <td>1.0</td>\n",
" <td>296.0</td>\n",
" <td>15.3</td>\n",
" <td>396.90</td>\n",
" <td>4.98</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.02731</td>\n",
" <td>0.0</td>\n",
" <td>7.07</td>\n",
" <td>0.0</td>\n",
" <td>0.469</td>\n",
" <td>6.421</td>\n",
" <td>78.9</td>\n",
" <td>4.9671</td>\n",
" <td>2.0</td>\n",
" <td>242.0</td>\n",
" <td>17.8</td>\n",
" <td>396.90</td>\n",
" <td>9.14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.02729</td>\n",
" <td>0.0</td>\n",
" <td>7.07</td>\n",
" <td>0.0</td>\n",
" <td>0.469</td>\n",
" <td>7.185</td>\n",
" <td>61.1</td>\n",
" <td>4.9671</td>\n",
" <td>2.0</td>\n",
" <td>242.0</td>\n",
" <td>17.8</td>\n",
" <td>392.83</td>\n",
" <td>4.03</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.03237</td>\n",
" <td>0.0</td>\n",
" <td>2.18</td>\n",
" <td>0.0</td>\n",
" <td>0.458</td>\n",
" <td>6.998</td>\n",
" <td>45.8</td>\n",
" <td>6.0622</td>\n",
" <td>3.0</td>\n",
" <td>222.0</td>\n",
" <td>18.7</td>\n",
" <td>394.63</td>\n",
" <td>2.94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.06905</td>\n",
" <td>0.0</td>\n",
" <td>2.18</td>\n",
" <td>0.0</td>\n",
" <td>0.458</td>\n",
" <td>7.147</td>\n",
" <td>54.2</td>\n",
" <td>6.0622</td>\n",
" <td>3.0</td>\n",
" <td>222.0</td>\n",
" <td>18.7</td>\n",
" <td>396.90</td>\n",
" <td>5.33</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" CRIM ZN INDUS CHAS NOX ... RAD TAX PTRATIO B LSTAT\n",
"0 0.00632 18.0 2.31 0.0 0.538 ... 1.0 296.0 15.3 396.90 4.98\n",
"1 0.02731 0.0 7.07 0.0 0.469 ... 2.0 242.0 17.8 396.90 9.14\n",
"2 0.02729 0.0 7.07 0.0 0.469 ... 2.0 242.0 17.8 392.83 4.03\n",
"3 0.03237 0.0 2.18 0.0 0.458 ... 3.0 222.0 18.7 394.63 2.94\n",
"4 0.06905 0.0 2.18 0.0 0.458 ... 3.0 222.0 18.7 396.90 5.33\n",
"\n",
"[5 rows x 13 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 52
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2_yJzlEREVJB",
"colab_type": "text"
},
"source": [
"เป้าหมายของเราคือการทำนายราคาของบ้าน จากข้อมูล"
]
},
{
"cell_type": "code",
"metadata": {
"id": "f5E8yL_2euZn",
"colab_type": "code",
"outputId": "7fffdb3f-66bd-46c5-9db0-be31ce8ac32d",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 195
}
},
"source": [
"target = pd.DataFrame(dat.target.T,columns=['PRICES']) # target prices\n",
"target.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PRICES</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>24.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>21.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>34.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>33.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>36.2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PRICES\n",
"0 24.0\n",
"1 21.6\n",
"2 34.7\n",
"3 33.4\n",
"4 36.2"
]
},
"metadata": {
"tags": []
},
"execution_count": 53
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "cYXf-wch1pim",
"colab_type": "code",
"outputId": "3e7d7937-8c31-400a-f47c-a145373c4ee1",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 195
}
},
"source": [
"def scaling(x): #scaling input \n",
" std = np.std(x)\n",
" std[0] = 1\n",
" mean = np.mean(x)\n",
" mean[0] = 0\n",
" return (x-mean)/std\n",
"\n",
"df_s = scaling(df)\n",
"ones = pd.DataFrame({'ones':np.ones(df_s.shape[0]).T}) \n",
"df_s = ones.join(df_s)\n",
"m = df_s.shape[0]\n",
"n = df_s.shape[1]\n",
"s = np.zeros(n)\n",
"sT = np.array([s]).T #all parameters\n",
"\n",
"def hypo(x,sT): #hypothesis function\n",
" return np.dot(x,sT)\n",
"\n",
"def costFunction(x,sT): #cost function\n",
" h = hypo(x,sT)\n",
" return (1/(2*m))*sum(np.array((h-target))**2)\n",
" \n",
"def gradientCost(x,sT): #gradient descent\n",
" h = hypo(x,sT)\n",
" return (1/(m))*np.dot(df_s.T,(h-target))\n",
"\n",
"alpha = 0.01 # learning rate\n",
"last = costFunction(df_s,sT)+1\n",
"while last - costFunction(df_s,sT)>0.001: #loop until change < 0.001\n",
" last = costFunction(df_s,sT)\n",
" sT = sT - alpha*(gradientCost(df_s,sT))\n",
"GDpredicted = np.dot(df_s,sT)\n",
"compare = pd.DataFrame({'Target Prices':np.array(target).T[0],'GD Predicted Prices':GDpredicted.T[0]}) # comparing result with target \n",
"compare.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Target Prices</th>\n",
" <th>GD Predicted Prices</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>24.0</td>\n",
" <td>30.595625</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>21.6</td>\n",
" <td>24.982998</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>34.7</td>\n",
" <td>30.979016</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>33.4</td>\n",
" <td>29.284144</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>36.2</td>\n",
" <td>28.673260</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Target Prices GD Predicted Prices\n",
"0 24.0 30.595625\n",
"1 21.6 24.982998\n",
"2 34.7 30.979016\n",
"3 33.4 29.284144\n",
"4 36.2 28.673260"
]
},
"metadata": {
"tags": []
},
"execution_count": 54
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vEbmTW2C8JHf",
"colab_type": "text"
},
"source": [
"ใช้ function pinv ของ numpy ในการทำ Matrix psudo-inverse "
]
},
{
"cell_type": "code",
"metadata": {
"id": "4crlW4NFr7tW",
"colab_type": "code",
"colab": {}
},
"source": [
"from numpy.linalg import inv,pinv #import การทำ psudo-inverse"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "xaGFgVyw8SF0",
"colab_type": "text"
},
"source": [
"ทำการคำนนวนหา parameters ตามสมการ <br>\n",
"\\begin{equation}\n",
"\\theta = (x^Tx)^{-1}x^Ty \n",
"\\end{equation}<br>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "v6exGE0LsD3V",
"colab_type": "code",
"outputId": "a99a138f-cc62-4fbf-b412-507880f75e50",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 235
}
},
"source": [
"df_n = df\n",
"a = pinv(np.dot(df_n.T,df_n)) # psudo inverse\n",
"b = np.dot(df_n.T,target) \n",
"theta = np.dot(a,b) #parameters\n",
"theta"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([[-9.28965170e-02],\n",
" [ 4.87149552e-02],\n",
" [-4.05997958e-03],\n",
" [ 2.85399882e+00],\n",
" [-2.86843637e+00],\n",
" [ 5.92814778e+00],\n",
" [-7.26933458e-03],\n",
" [-9.68514157e-01],\n",
" [ 1.71151128e-01],\n",
" [-9.39621540e-03],\n",
" [-3.92190926e-01],\n",
" [ 1.49056102e-02],\n",
" [-4.16304471e-01]])"
]
},
"metadata": {
"tags": []
},
"execution_count": 56
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "asa0a4Re8yl0",
"colab_type": "text"
},
"source": [
"ทำการเปรียบเทียบ Predicted Prices\tที่ได้จาก Gradient Descent และจาก Normal Equation"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Jcird5e0tHbt",
"colab_type": "code",
"outputId": "a255be84-2c58-49bb-b4f5-096346e975dd",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 195
}
},
"source": [
"NEpredicted = np.dot(df_n,theta) #predict using normal equation's parameters\n",
"sh = pd.DataFrame({'Target Prices':np.array(target).T[0],'GD Predicted Prices':np.dot(df_s,sT).T[0],'NE Predicted Prices':NEpredicted.T[0]})\n",
"sh.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Target Prices</th>\n",
" <th>GD Predicted Prices</th>\n",
" <th>NE Predicted Prices</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>24.0</td>\n",
" <td>30.595625</td>\n",
" <td>29.098264</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>21.6</td>\n",
" <td>24.982998</td>\n",
" <td>24.502275</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>34.7</td>\n",
" <td>30.979016</td>\n",
" <td>31.227426</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>33.4</td>\n",
" <td>29.284144</td>\n",
" <td>29.707104</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>36.2</td>\n",
" <td>28.673260</td>\n",
" <td>29.564796</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Target Prices GD Predicted Prices NE Predicted Prices\n",
"0 24.0 30.595625 29.098264\n",
"1 21.6 24.982998 24.502275\n",
"2 34.7 30.979016 31.227426\n",
"3 33.4 29.284144 29.707104\n",
"4 36.2 28.673260 29.564796"
]
},
"metadata": {
"tags": []
},
"execution_count": 57
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PUSNd1Zj9Bbk",
"colab_type": "text"
},
"source": [
"หลังจากเปรียบเทียบค่าต่อค่าไปแล้วก็ลองมาดูความแม่นยำจาก RMSE ของทั้ง 2 ผลลัพท์"
]
},
{
"cell_type": "code",
"metadata": {
"id": "iH-TaDmKtl0Y",
"colab_type": "code",
"colab": {}
},
"source": [
"def rmse(p):\n",
" return np.sqrt(sum((np.array(target) - p)**2)/len(p))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "6APFHgibtqFG",
"colab_type": "code",
"outputId": "3c59fb7f-bfba-4406-eb4d-6d8047eb8713",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 50
}
},
"source": [
"print(f'RMSE of GD is {rmse(GDpredicted)[0]}\\nRMSE of NE is {rmse(NEpredicted)[0]}')"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"RMSE of GD is 4.757750208938539\n",
"RMSE of NE is 4.915902697381884\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ocv3J6Ge9Sh8",
"colab_type": "text"
},
"source": [
"จะพบได้ว่า มีผลลัพท์ที่ใกล้เคียงกัน เนื่องจากมีจำนวน feature กับ example น้อยจึงอาจจะไม่เห็นความแตกต่างมากเท่าไร <br>\n",
"ซึ่งในการเปรียบเทียบข้อดีข้อเสียของทั้ง 2 Algorithm นั้น <br>\n",
"\n",
">**Gradient Descent**<br>\n",
"\n",
">ข้อดี|ข้อเสีย\n",
">--- | ---\n",
">-สามารถใช้งานกับ feature จำนวนมากได้ $O(kn^2)$| -จำเป็นที่จะต้องทำการเลือก $\\alpha$ ซึ่งอาจจะไม่ได้ $\\alpha$ ที่ดีที่สุด\n",
"> | -จำเป็นที่จะต้องใช้ iteration หลายรอบซึ่งทำให้ช้าลง\n",
"\n",
">**Normal Equation**<br>\n",
"\n",
">ข้อดี|ข้อเสีย\n",
">--- | ---\n",
">-ไม่จำเป็นที่จะต้องเลือก $\\alpha$| -ต้องทำการคำนวน $x^Tx$ ซึ่งเป็น matrix ขนาด nxn <br> ทำให้เมื่อมี feature มากจะต้องใช้เวลามาก $O(n^3)$\n",
">-ไม่มี iteration เนื่องจาก เป็นการ dot ครั้งเดียว | \n",
"\n",
"<br>"
]
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment