jouhou2_3_13_python.ipynb
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"name": "jouhou2_3_13_python.ipynb", | |
"provenance": [], | |
"collapsed_sections": [], | |
"toc_visible": true, | |
"authorship_tag": "ABX9TyMuo8hGdVdicLphI29h319U", | |
"include_colab_link": true | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
} | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "view-in-github", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"<a href=\"https://colab.research.google.com/gist/ereyester/5c6e5a9b8aa55ba826c7c96a4daf7814/jouhou2_3_13_python.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "h5sdaFZ62KMs", | |
"colab_type": "text" | |
},
"source": [
"# High School Informatics \"Information II\" Teacher Training Materials\n",
"## Chapter 3, First Half: 13. Multiple Regression Analysis and Model Selection\n",
"### Python version"
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "8F_OYPOD2V3r", | |
"colab_type": "text" | |
},
"source": [
"statsmodels provides a range of linear regression models,\n",
"from the basic (ordinary least squares, OLS) to the more complex (iteratively reweighted least squares, IRLS).\n",
"Its linear models have two main interfaces:\n",
"an array-based one and a formula-based one."
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "SLHIqDfcDKpT", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"https://blog.amedama.jp/entry/2016/12/23/193452\n", | |
"https://qiita.com/0NE_shoT_/items/08376b08783cd554b02e\n", | |
"\n", | |
"http://pepper.is.sci.toho-u.ac.jp/pepper/index.php?%A5%CE%A1%BC%A5%C8%2FPython%2F%C5%FD%B7%D7%2F%B2%F3%B5%A2%CA%AC%C0%CF" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "YFki6HUFTHEy", | |
"colab_type": "text" | |
},
"source": [
"The textbook's R code calls lm() with weights=NULL, which means ordinary least squares (OLS), so OLS is used here as well."
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "iGUX3Kdzc2dk", | |
"colab_type": "text" | |
},
"source": [
"- coef: estimated coefficients\n",
"- R-squared: coefficient of determination (proportion of variance explained)\n",
"- Adj. R-squared: R-squared adjusted for degrees of freedom\n"
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "syuulWeP975p", | |
"colab_type": "text" | |
},
"source": [
"Formula-based code (close to the textbook's R code)"
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "q6OJEX7BLwpC", | |
"colab_type": "code", | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 527 | |
}, | |
"outputId": "9a05dc34-a071-4c40-e486-eaa46e916bfc" | |
}, | |
"source": [ | |
"import pandas as pd\n", | |
"import statsmodels.formula.api as smf\n", | |
"high_male = pd.read_csv('/content/high_male_data.csv')\n", | |
"\n", | |
"model = smf.ols('X50m走 ~ 立ち幅跳び + ハンドボール投げ + 握力得点 + 上体起こし得点', data = high_male)\n", | |
"results = model.fit()\n", | |
"print(results.summary())\n" | |
], | |
"execution_count": null, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"text": [ | |
" OLS Regression Results \n", | |
"==============================================================================\n", | |
"Dep. Variable: X50m走 R-squared: 0.525\n", | |
"Model: OLS Adj. R-squared: 0.510\n", | |
"Method: Least Squares F-statistic: 36.18\n", | |
"Date: Mon, 20 Jul 2020 Prob (F-statistic): 2.39e-20\n", | |
"Time: 04:16:26 Log-Likelihood: -41.639\n", | |
"No. Observations: 136 AIC: 93.28\n", | |
"Df Residuals: 131 BIC: 107.8\n", | |
"Df Model: 4 \n", | |
"Covariance Type: nonrobust \n", | |
"==============================================================================\n", | |
" coef std err t P>|t| [0.025 0.975]\n", | |
"------------------------------------------------------------------------------\n", | |
"Intercept 10.8194 0.325 33.331 0.000 10.177 11.462\n", | |
"立ち幅跳び -0.0120 0.002 -7.648 0.000 -0.015 -0.009\n", | |
"ハンドボール投げ -0.0144 0.006 -2.367 0.019 -0.026 -0.002\n", | |
"握力得点 -0.0402 0.024 -1.677 0.096 -0.088 0.007\n", | |
"上体起こし得点 -0.0255 0.020 -1.264 0.208 -0.065 0.014\n", | |
"==============================================================================\n", | |
"Omnibus: 3.064 Durbin-Watson: 1.813\n", | |
"Prob(Omnibus): 0.216 Jarque-Bera (JB): 2.955\n", | |
"Skew: 0.359 Prob(JB): 0.228\n", | |
"Kurtosis: 2.927 Cond. No. 2.61e+03\n", | |
"==============================================================================\n", | |
"\n", | |
"Warnings:\n", | |
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", | |
"[2] The condition number is large, 2.61e+03. This might indicate that there are\n", | |
"strong multicollinearity or other numerical problems.\n" | |
], | |
"name": "stdout" | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "ZKjcnj-k-7s7", | |
"colab_type": "text" | |
},
"source": [
"Array-based code"
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "hl15gVVm2Tve", | |
"colab_type": "code", | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 527 | |
}, | |
"outputId": "39d4abd2-db27-4c2b-9a06-83b1a0edcddc" | |
},
"source": [
"import pandas as pd\n",
"import statsmodels.api as sm\n",
"\n",
"high_male = pd.read_csv('/content/high_male_data.csv')\n",
"# Explanatory variables (predictors) and response variable\n",
"x = high_male[['立ち幅跳び', 'ハンドボール投げ', '握力得点', '上体起こし得点']]\n",
"y = high_male['X50m走']\n",
"# sm.OLS does not add an intercept automatically, so add a constant column\n",
"x = sm.add_constant(x)\n",
"model = sm.OLS(y, x)\n",
"results = model.fit()\n", | |
"print(results.summary())\n" | |
], | |
"execution_count": null, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"text": [ | |
" OLS Regression Results \n", | |
"==============================================================================\n", | |
"Dep. Variable: X50m走 R-squared: 0.525\n", | |
"Model: OLS Adj. R-squared: 0.510\n", | |
"Method: Least Squares F-statistic: 36.18\n", | |
"Date: Mon, 20 Jul 2020 Prob (F-statistic): 2.39e-20\n", | |
"Time: 04:15:11 Log-Likelihood: -41.639\n", | |
"No. Observations: 136 AIC: 93.28\n", | |
"Df Residuals: 131 BIC: 107.8\n", | |
"Df Model: 4 \n", | |
"Covariance Type: nonrobust \n", | |
"==============================================================================\n", | |
" coef std err t P>|t| [0.025 0.975]\n", | |
"------------------------------------------------------------------------------\n", | |
"const 10.8194 0.325 33.331 0.000 10.177 11.462\n", | |
"立ち幅跳び -0.0120 0.002 -7.648 0.000 -0.015 -0.009\n", | |
"ハンドボール投げ -0.0144 0.006 -2.367 0.019 -0.026 -0.002\n", | |
"握力得点 -0.0402 0.024 -1.677 0.096 -0.088 0.007\n", | |
"上体起こし得点 -0.0255 0.020 -1.264 0.208 -0.065 0.014\n", | |
"==============================================================================\n", | |
"Omnibus: 3.064 Durbin-Watson: 1.813\n", | |
"Prob(Omnibus): 0.216 Jarque-Bera (JB): 2.955\n", | |
"Skew: 0.359 Prob(JB): 0.228\n", | |
"Kurtosis: 2.927 Cond. No. 2.61e+03\n", | |
"==============================================================================\n", | |
"\n", | |
"Warnings:\n", | |
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", | |
"[2] The condition number is large, 2.61e+03. This might indicate that there are\n", | |
"strong multicollinearity or other numerical problems.\n" | |
], | |
"name": "stdout" | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "ZtNTo17VZ-fi", | |
"colab_type": "text" | |
},
"source": [
"Alternatively, scikit-learn's LinearRegression can be used."
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "IV5IdJEpGCL9", | |
"colab_type": "code", | |
"colab": {} | |
},
"source": [
"import pandas as pd\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"high_male = pd.read_csv('/content/high_male_data.csv')\n",
"# Create the regression model\n",
"clf = LinearRegression()\n",
"\n",
"# Explanatory variables: the four fitness-test columns\n",
"X = high_male.loc[:, ['立ち幅跳び', 'ハンドボール投げ', '握力得点', '上体起こし得点']].values\n",
"\n",
"# Response variable: the 50 m run column\n",
"Y = high_male['X50m走'].values\n",
"\n",
"# Fit the multiple regression model\n",
"results = clf.fit(X, Y)\n",
"\n",
"# Show the fitted intercept and coefficients\n",
"print(clf.intercept_, clf.coef_)\n"
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "AyrQFX-l9Mod", | |
"colab_type": "text" | |
},
"source": [
"References:\n",
"https://tanuhack.com/statsmodels-multiple-lra/\n",
"\n",
"https://future-chem.com/esol-reg-aic/#stepwise_regression\n"
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "6ZptolsJ-to7", | |
"colab_type": "code", | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 1000 | |
}, | |
"outputId": "3a93db00-6ae0-4eef-abe4-b4860e840bcd" | |
},
"source": [
"import pandas as pd\n",
"import statsmodels.formula.api as smf\n",
"high_male = pd.read_csv('/content/high_male_data.csv')\n",
"descriptors = ['立ち幅跳び', 'ハンドボール投げ', '握力得点','上体起こし得点']\n",
"\n",
"model = smf.ols('X50m走 ~ ' + ' + '.join(descriptors), data = high_male)\n",
"results = model.fit()\n",
"\n",
"print(results.summary())\n",
"\n",
"# Backward elimination: drop one variable at a time while the AIC improves\n",
"best_aic = results.aic\n",
"best_model = model  # keep the unfitted model so that .fit() below always works\n",
"while descriptors:\n",
"    desc_selected = ''\n",
"    flag = 0\n",
"    for desk in descriptors:\n",
"        # Candidate model with one variable removed\n",
"        used_desks = descriptors.copy()\n",
"        used_desks.remove(desk)\n",
"        # Fall back to an intercept-only model when no variables remain\n",
"        formula = 'X50m走 ~ ' + (' + '.join(used_desks) or '1')\n",
"        model = smf.ols(formula=formula, data=high_male)\n",
"        results = model.fit()\n",
"        if results.aic < best_aic:\n",
"            best_aic = results.aic\n",
"            best_model = model\n",
"            desc_selected = desk\n",
"            flag = 1\n",
"    if flag:\n",
"        descriptors.remove(desc_selected)\n",
"    else:\n",
"        # No removal lowers the AIC, so stop\n",
"        break\n",
"\n",
"stepwise_model = best_model.fit()\n",
"print(stepwise_model.summary())"
], | |
"execution_count": null, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"text": [ | |
" OLS Regression Results \n", | |
"==============================================================================\n", | |
"Dep. Variable: X50m走 R-squared: 0.525\n", | |
"Model: OLS Adj. R-squared: 0.510\n", | |
"Method: Least Squares F-statistic: 36.18\n", | |
"Date: Fri, 24 Jul 2020 Prob (F-statistic): 2.39e-20\n", | |
"Time: 10:51:28 Log-Likelihood: -41.639\n", | |
"No. Observations: 136 AIC: 93.28\n", | |
"Df Residuals: 131 BIC: 107.8\n", | |
"Df Model: 4 \n", | |
"Covariance Type: nonrobust \n", | |
"==============================================================================\n", | |
" coef std err t P>|t| [0.025 0.975]\n", | |
"------------------------------------------------------------------------------\n", | |
"Intercept 10.8194 0.325 33.331 0.000 10.177 11.462\n", | |
"立ち幅跳び -0.0120 0.002 -7.648 0.000 -0.015 -0.009\n", | |
"ハンドボール投げ -0.0144 0.006 -2.367 0.019 -0.026 -0.002\n", | |
"握力得点 -0.0402 0.024 -1.677 0.096 -0.088 0.007\n", | |
"上体起こし得点 -0.0255 0.020 -1.264 0.208 -0.065 0.014\n", | |
"==============================================================================\n", | |
"Omnibus: 3.064 Durbin-Watson: 1.813\n", | |
"Prob(Omnibus): 0.216 Jarque-Bera (JB): 2.955\n", | |
"Skew: 0.359 Prob(JB): 0.228\n", | |
"Kurtosis: 2.927 Cond. No. 2.61e+03\n", | |
"==============================================================================\n", | |
"\n", | |
"Warnings:\n", | |
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", | |
"[2] The condition number is large, 2.61e+03. This might indicate that there are\n", | |
"strong multicollinearity or other numerical problems.\n", | |
" OLS Regression Results \n", | |
"==============================================================================\n", | |
"Dep. Variable: X50m走 R-squared: 0.519\n", | |
"Model: OLS Adj. R-squared: 0.508\n", | |
"Method: Least Squares F-statistic: 47.50\n", | |
"Date: Fri, 24 Jul 2020 Prob (F-statistic): 6.92e-21\n", | |
"Time: 10:51:28 Log-Likelihood: -42.463\n", | |
"No. Observations: 136 AIC: 92.93\n", | |
"Df Residuals: 132 BIC: 104.6\n", | |
"Df Model: 3 \n", | |
"Covariance Type: nonrobust \n", | |
"==============================================================================\n", | |
" coef std err t P>|t| [0.025 0.975]\n", | |
"------------------------------------------------------------------------------\n", | |
"Intercept 10.7121 0.314 34.114 0.000 10.091 11.333\n", | |
"立ち幅跳び -0.0121 0.002 -7.703 0.000 -0.015 -0.009\n", | |
"ハンドボール投げ -0.0169 0.006 -2.929 0.004 -0.028 -0.005\n", | |
"握力得点 -0.0439 0.024 -1.841 0.068 -0.091 0.003\n", | |
"==============================================================================\n", | |
"Omnibus: 2.668 Durbin-Watson: 1.820\n", | |
"Prob(Omnibus): 0.263 Jarque-Bera (JB): 2.632\n", | |
"Skew: 0.334 Prob(JB): 0.268\n", | |
"Kurtosis: 2.867 Cond. No. 2.51e+03\n", | |
"==============================================================================\n", | |
"\n", | |
"Warnings:\n", | |
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", | |
"[2] The condition number is large, 2.51e+03. This might indicate that there are\n", | |
"strong multicollinearity or other numerical problems.\n" | |
], | |
"name": "stdout" | |
} | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "CI6OXg4npAWh", | |
"colab_type": "code", | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 1000 | |
}, | |
"outputId": "0021272a-835e-418e-f7d4-d246e5bfe325" | |
},
"source": [
"import pandas as pd\n",
"import statsmodels.formula.api as smf\n",
"\n",
"high_male = pd.read_csv('/content/high_male_data.csv')\n",
"descriptors = ['立ち幅跳び', 'ハンドボール投げ', '握力得点','上体起こし得点']\n",
"\n",
"model = smf.ols('X50m走 ~ ' + ' + '.join(descriptors), data = high_male)\n",
"results = model.fit()\n",
"\n",
"print(results.summary())\n",
"\n",
"best_aic = results.aic\n",
"best_model = model\n",
"while descriptors:\n",
"    # dict_models maps each removal candidate to [model, AIC]; rebuilt every round\n",
"    dict_models = {}\n",
"    for rm_desk in descriptors:\n",
"        used_desks = descriptors.copy()\n",
"        used_desks.remove(rm_desk)\n",
"        # Fall back to an intercept-only model when no variables remain\n",
"        formula = 'X50m走 ~ ' + (' + '.join(used_desks) or '1')\n",
"        resultmodel = smf.ols(formula = formula, data = high_male)\n",
"        dict_models[rm_desk] = [resultmodel, resultmodel.fit().aic]\n",
"    # Pick the removal that gives the smallest AIC\n",
"    min_k, min_v = min(dict_models.items(), key=lambda x: x[1][1])\n",
"    if min_v[1] < best_aic:\n",
"        best_model = min_v[0]\n",
"        best_aic = min_v[1]\n",
"        descriptors.remove(min_k)\n",
"    else:\n",
"        # Stop when removing a variable no longer improves the AIC\n",
"        break\n",
"\n",
"stepwise_model_fit = best_model.fit()\n",
"print(stepwise_model_fit.summary())"
], | |
"execution_count": null, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"text": [ | |
" OLS Regression Results \n", | |
"==============================================================================\n", | |
"Dep. Variable: X50m走 R-squared: 0.525\n", | |
"Model: OLS Adj. R-squared: 0.510\n", | |
"Method: Least Squares F-statistic: 36.18\n", | |
"Date: Sat, 25 Jul 2020 Prob (F-statistic): 2.39e-20\n", | |
"Time: 02:25:36 Log-Likelihood: -41.639\n", | |
"No. Observations: 136 AIC: 93.28\n", | |
"Df Residuals: 131 BIC: 107.8\n", | |
"Df Model: 4 \n", | |
"Covariance Type: nonrobust \n", | |
"==============================================================================\n", | |
" coef std err t P>|t| [0.025 0.975]\n", | |
"------------------------------------------------------------------------------\n", | |
"Intercept 10.8194 0.325 33.331 0.000 10.177 11.462\n", | |
"立ち幅跳び -0.0120 0.002 -7.648 0.000 -0.015 -0.009\n", | |
"ハンドボール投げ -0.0144 0.006 -2.367 0.019 -0.026 -0.002\n", | |
"握力得点 -0.0402 0.024 -1.677 0.096 -0.088 0.007\n", | |
"上体起こし得点 -0.0255 0.020 -1.264 0.208 -0.065 0.014\n", | |
"==============================================================================\n", | |
"Omnibus: 3.064 Durbin-Watson: 1.813\n", | |
"Prob(Omnibus): 0.216 Jarque-Bera (JB): 2.955\n", | |
"Skew: 0.359 Prob(JB): 0.228\n", | |
"Kurtosis: 2.927 Cond. No. 2.61e+03\n", | |
"==============================================================================\n", | |
"\n", | |
"Warnings:\n", | |
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", | |
"[2] The condition number is large, 2.61e+03. This might indicate that there are\n", | |
"strong multicollinearity or other numerical problems.\n", | |
" OLS Regression Results \n", | |
"==============================================================================\n", | |
"Dep. Variable: X50m走 R-squared: 0.519\n", | |
"Model: OLS Adj. R-squared: 0.508\n", | |
"Method: Least Squares F-statistic: 47.50\n", | |
"Date: Sat, 25 Jul 2020 Prob (F-statistic): 6.92e-21\n", | |
"Time: 02:25:36 Log-Likelihood: -42.463\n", | |
"No. Observations: 136 AIC: 92.93\n", | |
"Df Residuals: 132 BIC: 104.6\n", | |
"Df Model: 3 \n", | |
"Covariance Type: nonrobust \n", | |
"==============================================================================\n", | |
" coef std err t P>|t| [0.025 0.975]\n", | |
"------------------------------------------------------------------------------\n", | |
"Intercept 10.7121 0.314 34.114 0.000 10.091 11.333\n", | |
"立ち幅跳び -0.0121 0.002 -7.703 0.000 -0.015 -0.009\n", | |
"ハンドボール投げ -0.0169 0.006 -2.929 0.004 -0.028 -0.005\n", | |
"握力得点 -0.0439 0.024 -1.841 0.068 -0.091 0.003\n", | |
"==============================================================================\n", | |
"Omnibus: 2.668 Durbin-Watson: 1.820\n", | |
"Prob(Omnibus): 0.263 Jarque-Bera (JB): 2.632\n", | |
"Skew: 0.334 Prob(JB): 0.268\n", | |
"Kurtosis: 2.867 Cond. No. 2.51e+03\n", | |
"==============================================================================\n", | |
"\n", | |
"Warnings:\n", | |
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", | |
"[2] The condition number is large, 2.51e+03. This might indicate that there are\n", | |
"strong multicollinearity or other numerical problems.\n" | |
], | |
"name": "stdout" | |
} | |
] | |
} | |
] | |
} |
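
As a quick sanity check on the summaries in the notebook, the adjusted R-squared, AIC, and BIC can be recomputed by hand from the reported R-squared, log-likelihood, number of observations, and number of predictors. The sketch below uses only the Python standard library. The helper names (`adjusted_r2`, `aic`, `bic`) are made up for illustration, and the parameter count is assumed to be the number of predictors plus the intercept, which is consistent with the values the summaries report.

```python
import math


def adjusted_r2(r2, n_obs, n_predictors):
    """R-squared adjusted for degrees of freedom."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Rounded values copied from the two OLS summaries in the notebook
models = {
    "4 predictors": {"r2": 0.525, "llf": -41.639, "n": 136, "p": 4},
    "3 predictors": {"r2": 0.519, "llf": -42.463, "n": 136, "p": 3},
}

for name, m in models.items():
    k = m["p"] + 1  # assumption: parameters = predictors + intercept
    print(
        name,
        "adj R2 = %.3f" % adjusted_r2(m["r2"], m["n"], m["p"]),
        "AIC = %.2f" % aic(m["llf"], k),
        "BIC = %.1f" % bic(m["llf"], k, m["n"]),
    )
```

Because the inputs are already rounded, the results should agree with the printed summaries only to the displayed precision. The three-predictor model has the lower AIC, which is why the backward elimination drops 上体起こし得点.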