Skip to content

Instantly share code, notes, and snippets.

@BankNatchapol
Created March 18, 2020 10:20
Show Gist options
  • Save BankNatchapol/8538ef21feea773478fb4ada07b40d9b to your computer and use it in GitHub Desktop.
Save BankNatchapol/8538ef21feea773478fb4ada07b40d9b to your computer and use it in GitHub Desktop.
MultiVar-Linear-Regression.ipynb
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "MultiVar-Linear-Regression.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyMiqEVWYIRA+6ObXOk7q2cX"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "tMsTK2Ig_658",
"colab_type": "text"
},
"source": [
"# **Multiple Variable Linear Regression**\n",
"---\n",
"Multiple variable linear regression คือ โมเดลทางสถิติที่สามารถบอกถึงความสัมพันธ์ ของตัวแปลต้น \"หลายตัว\" และตัวแปรตามได้ \n",
"ซึ่งสามารถเขียน **hypothesis function** ได้ว่า <br>\n",
"\\begin{equation}\n",
"h_\\theta (x) = \\theta _0 + \\theta _1x_1 +\\theta _2x_2 + \\theta_3x_3+...+\\theta_nx_n\n",
"\\end{equation} <br>\n",
"ซึ่งถ้าเรามองว่า $\\theta$ คือ vector <$\\theta _0 , \\theta _1,\\theta _2 , \\theta_3,...,\\theta_n$><br>\n",
"$\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;$ x คือ vector <$x_0 , x_1,x_2 , x_3,...,x_n$><br><br>\n",
"จะได้ว่า\n",
"\\begin{equation}\n",
"h_\\theta (x) = \\theta^Tx \n",
"\\end{equation} <br>\n",
"<br>\n",
" สามารถเขียน **cost function** หรือ loss function ได้ว่า <br><br> \n",
"\\begin{equation}\n",
"J(\\theta) = \\frac{1}{2m}\\sum^m_{i=1}(h_\\theta(x^{(i)}) - y^{(i)})^2\n",
"\\end{equation}<br><br>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ZXNJiLFndjte",
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.datasets import load_boston\n",
"pd.set_option('display.max_rows', df.shape[0]+1) "
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "2yFTYsPuEGeg",
"colab_type": "text"
},
"source": [
"โดยขั้นแรกจะทำการโหลด boston housing datasets ด้วย library sklearn "
]
},
{
"cell_type": "code",
"metadata": {
"id": "a3DfkMwvd5wO",
"colab_type": "code",
"colab": {}
},
"source": [
"dat = load_boston(return_X_y=False) # boston housing datasets"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "07SsL8bYeGSc",
"colab_type": "code",
"outputId": "b5c8cdad-62d8-4085-c400-52ee3e133026",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 195
}
},
"source": [
"df = pd.DataFrame(dat.data,columns=dat.feature_names) # create dataframe\n",
"df.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>CRIM</th>\n",
" <th>ZN</th>\n",
" <th>INDUS</th>\n",
" <th>CHAS</th>\n",
" <th>NOX</th>\n",
" <th>RM</th>\n",
" <th>AGE</th>\n",
" <th>DIS</th>\n",
" <th>RAD</th>\n",
" <th>TAX</th>\n",
" <th>PTRATIO</th>\n",
" <th>B</th>\n",
" <th>LSTAT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.00632</td>\n",
" <td>18.0</td>\n",
" <td>2.31</td>\n",
" <td>0.0</td>\n",
" <td>0.538</td>\n",
" <td>6.575</td>\n",
" <td>65.2</td>\n",
" <td>4.0900</td>\n",
" <td>1.0</td>\n",
" <td>296.0</td>\n",
" <td>15.3</td>\n",
" <td>396.90</td>\n",
" <td>4.98</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.02731</td>\n",
" <td>0.0</td>\n",
" <td>7.07</td>\n",
" <td>0.0</td>\n",
" <td>0.469</td>\n",
" <td>6.421</td>\n",
" <td>78.9</td>\n",
" <td>4.9671</td>\n",
" <td>2.0</td>\n",
" <td>242.0</td>\n",
" <td>17.8</td>\n",
" <td>396.90</td>\n",
" <td>9.14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.02729</td>\n",
" <td>0.0</td>\n",
" <td>7.07</td>\n",
" <td>0.0</td>\n",
" <td>0.469</td>\n",
" <td>7.185</td>\n",
" <td>61.1</td>\n",
" <td>4.9671</td>\n",
" <td>2.0</td>\n",
" <td>242.0</td>\n",
" <td>17.8</td>\n",
" <td>392.83</td>\n",
" <td>4.03</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.03237</td>\n",
" <td>0.0</td>\n",
" <td>2.18</td>\n",
" <td>0.0</td>\n",
" <td>0.458</td>\n",
" <td>6.998</td>\n",
" <td>45.8</td>\n",
" <td>6.0622</td>\n",
" <td>3.0</td>\n",
" <td>222.0</td>\n",
" <td>18.7</td>\n",
" <td>394.63</td>\n",
" <td>2.94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.06905</td>\n",
" <td>0.0</td>\n",
" <td>2.18</td>\n",
" <td>0.0</td>\n",
" <td>0.458</td>\n",
" <td>7.147</td>\n",
" <td>54.2</td>\n",
" <td>6.0622</td>\n",
" <td>3.0</td>\n",
" <td>222.0</td>\n",
" <td>18.7</td>\n",
" <td>396.90</td>\n",
" <td>5.33</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" CRIM ZN INDUS CHAS NOX ... RAD TAX PTRATIO B LSTAT\n",
"0 0.00632 18.0 2.31 0.0 0.538 ... 1.0 296.0 15.3 396.90 4.98\n",
"1 0.02731 0.0 7.07 0.0 0.469 ... 2.0 242.0 17.8 396.90 9.14\n",
"2 0.02729 0.0 7.07 0.0 0.469 ... 2.0 242.0 17.8 392.83 4.03\n",
"3 0.03237 0.0 2.18 0.0 0.458 ... 3.0 222.0 18.7 394.63 2.94\n",
"4 0.06905 0.0 2.18 0.0 0.458 ... 3.0 222.0 18.7 396.90 5.33\n",
"\n",
"[5 rows x 13 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 848
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2_yJzlEREVJB",
"colab_type": "text"
},
"source": [
"เป้าหมายของเราคือการทำนายราคาของบ้าน จากข้อมูล"
]
},
{
"cell_type": "code",
"metadata": {
"id": "f5E8yL_2euZn",
"colab_type": "code",
"outputId": "2482572d-390f-49ab-e740-26a8cc453f70",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 195
}
},
"source": [
"target = pd.DataFrame(dat.target.T,columns=['PRICES']) # target prices\n",
"target.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PRICES</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>24.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>21.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>34.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>33.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>36.2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PRICES\n",
"0 24.0\n",
"1 21.6\n",
"2 34.7\n",
"3 33.4\n",
"4 36.2"
]
},
"metadata": {
"tags": []
},
"execution_count": 849
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "az8NLroNEfpP",
"colab_type": "text"
},
"source": [
"โดยจากที่เห็นตัวข้อมูลจะพบว่า ข้อมูลในแต่ละ feature มี scale ที่ต่างกันมาก เลยมากทำการ scaling ด้วย **Z-score Normalization**<br><br>\n",
"\\begin{equation}\n",
"x_j := \\frac{x_j - \\mu_j}{\\sigma_j}\n",
"\\end{equation} <br>\n",
"$\\mu_j$ คือ ค่าเฉลี่ยของ feature นั้นๆ<br>\n",
"$\\sigma_j$ คือ ส่วนเบี่ยงเบนมาตรฐานของ feature นั้นๆ"
]
},
{
"cell_type": "code",
"metadata": {
"id": "cYXf-wch1pim",
"colab_type": "code",
"outputId": "e1bfa058-9772-4e98-810b-141345e691d4",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 195
}
},
"source": [
"def scaling(x): #scaling input \n",
" std = np.std(x)\n",
" std[0] = 1\n",
" mean = np.mean(x)\n",
" mean[0] = 0\n",
" return (x-mean)/std\n",
"df = scaling(df)\n",
"df.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>CRIM</th>\n",
" <th>ZN</th>\n",
" <th>INDUS</th>\n",
" <th>CHAS</th>\n",
" <th>NOX</th>\n",
" <th>RM</th>\n",
" <th>AGE</th>\n",
" <th>DIS</th>\n",
" <th>RAD</th>\n",
" <th>TAX</th>\n",
" <th>PTRATIO</th>\n",
" <th>B</th>\n",
" <th>LSTAT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.00632</td>\n",
" <td>0.284830</td>\n",
" <td>-1.287909</td>\n",
" <td>-0.272599</td>\n",
" <td>-0.144217</td>\n",
" <td>0.413672</td>\n",
" <td>-0.120013</td>\n",
" <td>0.140214</td>\n",
" <td>-0.982843</td>\n",
" <td>-0.666608</td>\n",
" <td>-1.459000</td>\n",
" <td>0.441052</td>\n",
" <td>-1.075562</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.02731</td>\n",
" <td>-0.487722</td>\n",
" <td>-0.593381</td>\n",
" <td>-0.272599</td>\n",
" <td>-0.740262</td>\n",
" <td>0.194274</td>\n",
" <td>0.367166</td>\n",
" <td>0.557160</td>\n",
" <td>-0.867883</td>\n",
" <td>-0.987329</td>\n",
" <td>-0.303094</td>\n",
" <td>0.441052</td>\n",
" <td>-0.492439</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.02729</td>\n",
" <td>-0.487722</td>\n",
" <td>-0.593381</td>\n",
" <td>-0.272599</td>\n",
" <td>-0.740262</td>\n",
" <td>1.282714</td>\n",
" <td>-0.265812</td>\n",
" <td>0.557160</td>\n",
" <td>-0.867883</td>\n",
" <td>-0.987329</td>\n",
" <td>-0.303094</td>\n",
" <td>0.396427</td>\n",
" <td>-1.208727</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.03237</td>\n",
" <td>-0.487722</td>\n",
" <td>-1.306878</td>\n",
" <td>-0.272599</td>\n",
" <td>-0.835284</td>\n",
" <td>1.016303</td>\n",
" <td>-0.809889</td>\n",
" <td>1.077737</td>\n",
" <td>-0.752922</td>\n",
" <td>-1.106115</td>\n",
" <td>0.113032</td>\n",
" <td>0.416163</td>\n",
" <td>-1.361517</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.06905</td>\n",
" <td>-0.487722</td>\n",
" <td>-1.306878</td>\n",
" <td>-0.272599</td>\n",
" <td>-0.835284</td>\n",
" <td>1.228577</td>\n",
" <td>-0.511180</td>\n",
" <td>1.077737</td>\n",
" <td>-0.752922</td>\n",
" <td>-1.106115</td>\n",
" <td>0.113032</td>\n",
" <td>0.441052</td>\n",
" <td>-1.026501</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" CRIM ZN INDUS ... PTRATIO B LSTAT\n",
"0 0.00632 0.284830 -1.287909 ... -1.459000 0.441052 -1.075562\n",
"1 0.02731 -0.487722 -0.593381 ... -0.303094 0.441052 -0.492439\n",
"2 0.02729 -0.487722 -0.593381 ... -0.303094 0.396427 -1.208727\n",
"3 0.03237 -0.487722 -1.306878 ... 0.113032 0.416163 -1.361517\n",
"4 0.06905 -0.487722 -1.306878 ... 0.113032 0.441052 -1.026501\n",
"\n",
"[5 rows x 13 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 850
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5zi88A1sGbjZ",
"colab_type": "text"
},
"source": [
"ต่อจากนั้นก็มาทำสมการ **hypothesis function**, **cost function** และ **gradient descent** <br><br>\n",
"\\begin{equation}\n",
"Hypothesis\\;Function :\\;\\;\n",
"h_\\theta (x) = \\theta^Tx \n",
"\\end{equation} <br>\n",
"\n",
"\\begin{equation}\n",
"Cost\\;Function\\;:\\;\\;\n",
"J(\\theta) = \\frac{1}{2m}\\sum^m_{i=1}(h_\\theta(x^{(i)}) - y^{(i)})^2\n",
"\\end{equation}<br>\n",
"\\begin{equation}\n",
"Gradient\\;Descent\\;:\\;\\;\n",
"\\theta_j := \\theta_j-\\alpha\\frac{1}{m}\\sum^m_{i=1}(h_\\theta(x^{(i)}) - y^{(i)})x^{(i)}_j\n",
"\\end{equation}<br><br>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "SlfVM0ughBMH",
"colab_type": "code",
"colab": {}
},
"source": [
"ones = pd.DataFrame({'ones':np.ones(df.shape[0]).T}) \n",
"df = ones.join(df)\n",
"m = df.shape[0]\n",
"n = df.shape[1]\n",
"s = np.zeros(n)\n",
"sT = np.array([s]).T #all parameters\n",
"\n",
"def hypo(x,sT): #hypothesis function\n",
" return np.dot(x,sT)\n",
"\n",
"def costFunction(x,sT): #cost function\n",
" h = hypo(x,sT)\n",
" return (1/(2*m))*sum(np.array((h-target))**2)\n",
" \n",
"def gradientCost(x,sT): #gradient descent\n",
" h = hypo(x,sT)\n",
" return (1/(m))*np.dot(df.T,(h-target))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "JIS3w31RIPQA",
"colab_type": "text"
},
"source": [
"กำหนดให้ learning rate = 0.01 <br>\n",
"และ เริ่มทำการ train"
]
},
{
"cell_type": "code",
"metadata": {
"id": "SXZkpeYVnr8T",
"colab_type": "code",
"outputId": "f0645810-795f-42fc-abe7-40476821471b",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 67
}
},
"source": [
"alpha = 0.01 # learning rate\n",
"last = costFunction(df,sT)+1\n",
"while last - costFunction(df,sT)>0.001: #loop until change < 0.001\n",
" last = costFunction(df,sT)\n",
" sT = sT - alpha*(gradientCost(df,sT))\n",
"print('result :',sT.T[0])"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"result : [22.65365393 -0.06590317 0.64831977 -0.18460845 0.76100326 -1.12507124\n",
" 3.03972749 -0.11594876 -2.25233247 0.81150778 -0.56628214 -1.82116539\n",
" 0.88059889 -3.70432229]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "e9lA8SbCItDe",
"colab_type": "text"
},
"source": [
"สรุป เปรียบเทียบผลการทำนาย"
]
},
{
"cell_type": "code",
"metadata": {
"id": "6RZewRxQsIQv",
"colab_type": "code",
"outputId": "50df3762-66ba-4ea7-862c-54c8ed2ce52a",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 195
}
},
"source": [
"compare = pd.DataFrame({'Target Prices':np.array(target).T[0],'Predicted Prices':np.dot(df,sT).T[0]}) # comparing result with target \n",
"compare.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Target Prices</th>\n",
" <th>Predicted Prices</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>24.0</td>\n",
" <td>30.595625</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>21.6</td>\n",
" <td>24.982998</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>34.7</td>\n",
" <td>30.979016</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>33.4</td>\n",
" <td>29.284144</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>36.2</td>\n",
" <td>28.673260</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Target Prices Predicted Prices\n",
"0 24.0 30.595625\n",
"1 21.6 24.982998\n",
"2 34.7 30.979016\n",
"3 33.4 29.284144\n",
"4 36.2 28.673260"
]
},
"metadata": {
"tags": []
},
"execution_count": 853
}
]
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment