{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Lab_13_intro_to_ML_regression_part_II.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyMHlW9igur20r8kmGAUdbdr",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/jamm1985/d423d4820e835ae640e0a09ec853d024/lab_13_intro_to_ml_regression_part_ii.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"Lab video: https://youtu.be/A3LE-ZmtVGs\n",
"\n",
"TG: https://t.me/data_science_news\n",
"\n",
"\n",
"\n",
"---"
],
"metadata": {
"id": "DWH294RVmYHR"
}
},
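{
"cell_type": "code",
"source": [
"# tensorflow-addons provides the RSquare metric used below\n",
"# (this run was recorded with TensorFlow 2.8.0 and tensorflow-addons 0.16.1;\n",
"# the package has since reached end of life, so newer TensorFlow releases may not be supported)\n",
"!pip install -U tensorflow-addons"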
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "zfPIoGDkN_O6",
"outputId": "e3b82070-faff-4f22-c30b-c182dda278e3"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Requirement already satisfied: tensorflow-addons in /usr/local/lib/python3.7/dist-packages (0.16.1)\n",
"Requirement already satisfied: typeguard>=2.7 in /usr/local/lib/python3.7/dist-packages (from tensorflow-addons) (2.7.1)\n"
]
}
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dO65TOXRqVrK",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "ec1e8b31-a762-47f2-a24f-761d4cfe79b5"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"2.8.0\n"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.model_selection import cross_val_score\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import mean_squared_error\n",
"\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"from sklearn.model_selection import KFold\n",
"\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"from tensorflow.keras import layers\n",
"import tensorflow_addons as tfa\n",
"\n",
"print(tf.__version__)"
]
},
{
"cell_type": "markdown",
"source": [
"## Part one (regression)\n",
"[Lab video](https://youtu.be/r-z1cjvpwBE)\n",
"- Cross-validation\n",
"- Regularization\n",
"- Objective function\n",
"- Feature engineering\n",
"\n",
"## Part two (regression)\n",
"- Feature engineering\n",
"- Model assessment\n",
"- Model selection"
],
"metadata": {
"id": "o1qNRLQ-FaTQ"
}
},
{
"cell_type": "markdown",
"source": [
"# Bias-variance trade-off\n",
"\n",
"Let the MSE of an estimator $\\hat{\\theta}$ of a parameter $\\theta$ be defined as $\\mathrm{MSE}(\\hat{\\theta})=E_\\theta[(\\hat{\\theta}-\\theta)^2]$\n",
"\n",
"The MSE of $\\hat{\\theta}$ decomposes into the sum of the variance and the squared bias:\n",
"\n",
"$\\mathrm{MSE}(\\hat{\\theta})=E_\\theta[(\\hat{\\theta}-\\theta)^2]=...=E_\\theta[(\\hat{\\theta}-E_\\theta[\\hat{\\theta}])^2] + (E_\\theta[\\hat{\\theta}]-\\theta)^2=\\mathrm{Var}_\\theta(\\hat{\\theta})+\\mathrm{Bias}(\\hat{\\theta},\\theta)^2$\n",
"\n",
"**In the regression setting**, in general form for the model $y=f(x)+\\epsilon$:\n",
"\n",
"$E_{D,\\epsilon}[(y-\\hat{f}(x;D))^2]=\\mathrm{Var}_D(\\hat{f}(x;D))+\\mathrm{Bias}_D[\\hat{f}(x;D)]^2+\\sigma^2$\n",
"\n",
"where\n",
"\n",
"$\\mathrm{Bias}_D[\\hat{f}(x;D)]=E_D[\\hat{f}(x;D)]-f(x)$\n",
"\n",
"$\\mathrm{Var}_D(\\hat{f}(x;D))=E_D[(E_D[\\hat{f}(x;D)]-\\hat{f}(x;D))^2]$\n",
"\n",
"$\\epsilon$ is the noise term, with $E[\\epsilon]=0$ and $\\mathrm{Var}(\\epsilon)=\\sigma^2$\n",
"\n",
"$D=\\{(x_1,y_1),(x_2,y_2),...,(x_n,y_n)\\}$ is a sample from the joint distribution $f_{X,Y}(x,y)$\n",
"\n",
"**On a dataset:** $\\mathrm{MSE} = \\frac{1}{n}\\sum_{i=1}^n(y_i-\\hat{y}_i)^2$"
],
"metadata": {
"id": "-rM1uNGG-Yb2"
}
},
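{
"cell_type": "markdown",
"source": [
"The intermediate step of the MSE decomposition for an estimator can be sketched as follows. This is the standard add-and-subtract argument, included here as a reference note rather than as part of the original lab; the shorthand $\\mu=E_\\theta[\\hat{\\theta}]$ is notation assumed only for this note.\n",
"\n",
"$$\\begin{aligned} E_\\theta[(\\hat{\\theta}-\\theta)^2] &= E_\\theta[((\\hat{\\theta}-\\mu)+(\\mu-\\theta))^2] \\\\ &= E_\\theta[(\\hat{\\theta}-\\mu)^2] + 2(\\mu-\\theta)\\,E_\\theta[\\hat{\\theta}-\\mu] + (\\mu-\\theta)^2 \\\\ &= \\mathrm{Var}_\\theta(\\hat{\\theta}) + \\mathrm{Bias}(\\hat{\\theta},\\theta)^2 \\end{aligned}$$\n",
"\n",
"The cross term vanishes because $E_\\theta[\\hat{\\theta}-\\mu]=0$, and $\\mu-\\theta$ is a constant, so it passes through the expectation unchanged."
],
"metadata": {}
},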
{
"cell_type": "markdown",
"source": [
"Figure (image not preserved): bias and variance contributing to total error.\n",
"\n",
"https://en.wikipedia.org/wiki/Bias–variance_tradeoff#/media/File:Bias_and_variance_contributing_to_total_error.svg"
],
"metadata": {
"id": "4i1q1jOOEybZ"
}
},
{
"cell_type": "markdown",
"source": [
"# Exploratory data analysis\n",
"\n",
"[Dataset](https://github.com/nguyen-toan/ISLR/blob/master/dataset/Advertising.csv)\n",
"\n",
"[Book (An Introduction to Statistical Learning)](http://www-bcf.usc.edu/~gareth/ISL/)\n",
"\n",
"[Simple to Multiple and Polynomial Regression in R](https://www.kaggle.com/code/pranjalpandey12/simple-to-multiple-and-polynomial-regression-in-r/data)\n",
"\n"
],
"metadata": {
"id": "tW_KgZOLHSbX"
}
},
{
"cell_type": "code",
"source": [
"!wget https://raw.githubusercontent.com/nguyen-toan/ISLR/master/dataset/Advertising.csv\n",
"!head Advertising.csv"
],
"metadata": {
"id": "1Nl1rZOFtjWm",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "6debe33e-5a41-4b89-9888-46be89c93c81"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"--2022-03-23 05:32:20-- https://raw.githubusercontent.com/nguyen-toan/ISLR/master/dataset/Advertising.csv\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.108.133, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 5166 (5.0K) [text/plain]\n",
"Saving to: ‘Advertising.csv.1’\n",
"\n",
"\rAdvertising.csv.1 0%[ ] 0 --.-KB/s \rAdvertising.csv.1 100%[===================>] 5.04K --.-KB/s in 0s \n",
"\n",
"2022-03-23 05:32:20 (51.3 MB/s) - ‘Advertising.csv.1’ saved [5166/5166]\n",
"\n",
"\"\",\"TV\",\"Radio\",\"Newspaper\",\"Sales\"\n",
"\"1\",230.1,37.8,69.2,22.1\n",
"\"2\",44.5,39.3,45.1,10.4\n",
"\"3\",17.2,45.9,69.3,9.3\n",
"\"4\",151.5,41.3,58.5,18.5\n",
"\"5\",180.8,10.8,58.4,12.9\n",
"\"6\",8.7,48.9,75,7.2\n",
"\"7\",57.5,32.8,23.5,11.8\n",
"\"8\",120.2,19.6,11.6,13.2\n",
"\"9\",8.6,2.1,1,4.8\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"DATA = pd.read_csv('Advertising.csv')\n",
"DATA"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 423
},
"id": "n2b4Zci3IshW",
"outputId": "17daf468-ce37-4e94-e6dc-f1c9b6a47d07"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Unnamed: 0 TV Radio Newspaper Sales\n",
"0 1 230.1 37.8 69.2 22.1\n",
"1 2 44.5 39.3 45.1 10.4\n",
"2 3 17.2 45.9 69.3 9.3\n",
"3 4 151.5 41.3 58.5 18.5\n",
"4 5 180.8 10.8 58.4 12.9\n",
".. ... ... ... ... ...\n",
"195 196 38.2 3.7 13.8 7.6\n",
"196 197 94.2 4.9 8.1 9.7\n",
"197 198 177.0 9.3 6.4 12.8\n",
"198 199 283.6 42.0 66.2 25.5\n",
"199 200 232.1 8.6 8.7 13.4\n",
"\n",
"[200 rows x 5 columns]"
],
"text/html": [
"\n",
" <div id=\"df-9ac026aa-816f-48c3-b216-94322ee2d6aa\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>TV</th>\n",
" <th>Radio</th>\n",
" <th>Newspaper</th>\n",
" <th>Sales</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>230.1</td>\n",
" <td>37.8</td>\n",
" <td>69.2</td>\n",
" <td>22.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>44.5</td>\n",
" <td>39.3</td>\n",
" <td>45.1</td>\n",
" <td>10.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>17.2</td>\n",
" <td>45.9</td>\n",
" <td>69.3</td>\n",
" <td>9.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>151.5</td>\n",
" <td>41.3</td>\n",
" <td>58.5</td>\n",
" <td>18.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>180.8</td>\n",
" <td>10.8</td>\n",
" <td>58.4</td>\n",
" <td>12.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>195</th>\n",
" <td>196</td>\n",
" <td>38.2</td>\n",
" <td>3.7</td>\n",
" <td>13.8</td>\n",
" <td>7.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>196</th>\n",
" <td>197</td>\n",
" <td>94.2</td>\n",
" <td>4.9</td>\n",
" <td>8.1</td>\n",
" <td>9.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>197</th>\n",
" <td>198</td>\n",
" <td>177.0</td>\n",
" <td>9.3</td>\n",
" <td>6.4</td>\n",
" <td>12.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>198</th>\n",
" <td>199</td>\n",
" <td>283.6</td>\n",
" <td>42.0</td>\n",
" <td>66.2</td>\n",
" <td>25.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>199</th>\n",
" <td>200</td>\n",
" <td>232.1</td>\n",
" <td>8.6</td>\n",
" <td>8.7</td>\n",
" <td>13.4</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>200 rows × 5 columns</p>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-9ac026aa-816f-48c3-b216-94322ee2d6aa')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-9ac026aa-816f-48c3-b216-94322ee2d6aa button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-9ac026aa-816f-48c3-b216-94322ee2d6aa');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
]
},
"metadata": {},
"execution_count": 54
}
]
},
{
"cell_type": "code",
"source": [
"DATA = DATA.drop(columns=['Unnamed: 0'])"
],
"metadata": {
"id": "Rnd4OuIEJTll"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"DATA.describe()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 300
},
"id": "EM8lElX_I6hc",
"outputId": "f8145350-dab5-4d20-8485-2d0d94f5f4b2"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" TV Radio Newspaper Sales\n",
"count 200.000000 200.000000 200.000000 200.000000\n",
"mean 147.042500 23.264000 30.554000 14.022500\n",
"std 85.854236 14.846809 21.778621 5.217457\n",
"min 0.700000 0.000000 0.300000 1.600000\n",
"25% 74.375000 9.975000 12.750000 10.375000\n",
"50% 149.750000 22.900000 25.750000 12.900000\n",
"75% 218.825000 36.525000 45.100000 17.400000\n",
"max 296.400000 49.600000 114.000000 27.000000"
],
"text/html": [
"\n",
" <div id=\"df-b0b1d2ee-60de-4453-a678-95e989e6f0d7\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>TV</th>\n",
" <th>Radio</th>\n",
" <th>Newspaper</th>\n",
" <th>Sales</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>200.000000</td>\n",
" <td>200.000000</td>\n",
" <td>200.000000</td>\n",
" <td>200.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>147.042500</td>\n",
" <td>23.264000</td>\n",
" <td>30.554000</td>\n",
" <td>14.022500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>85.854236</td>\n",
" <td>14.846809</td>\n",
" <td>21.778621</td>\n",
" <td>5.217457</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>0.700000</td>\n",
" <td>0.000000</td>\n",
" <td>0.300000</td>\n",
" <td>1.600000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>74.375000</td>\n",
" <td>9.975000</td>\n",
" <td>12.750000</td>\n",
" <td>10.375000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>149.750000</td>\n",
" <td>22.900000</td>\n",
" <td>25.750000</td>\n",
" <td>12.900000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>218.825000</td>\n",
" <td>36.525000</td>\n",
" <td>45.100000</td>\n",
" <td>17.400000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>296.400000</td>\n",
" <td>49.600000</td>\n",
" <td>114.000000</td>\n",
" <td>27.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-b0b1d2ee-60de-4453-a678-95e989e6f0d7')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-b0b1d2ee-60de-4453-a678-95e989e6f0d7 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-b0b1d2ee-60de-4453-a678-95e989e6f0d7');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
]
},
"metadata": {},
"execution_count": 56
}
]
},
{
"cell_type": "markdown",
"source": [
"# Simple linear regression"
],
"metadata": {
"id": "uC0zpZOTKawW"
}
},
{
"cell_type": "code",
"source": [
"# Features: every column except the target (TV, Radio, Newspaper)\n",
"X = DATA.loc[:, DATA.columns != 'Sales'].to_numpy()\n",
"# Target: Sales\n",
"y = DATA['Sales'].to_numpy()\n",
"print(X.shape)\n",
"print(y.shape)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "KRa_VszDKaYY",
"outputId": "950fad99-2c6f-4fe0-afea-1165d116c60b"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"(200, 3)\n",
"(200,)\n"
]
}
]
},
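{
"cell_type": "code",
"source": [
"lin_reg_1 = LinearRegression()\n",
"# With no scoring argument, cross_val_score uses the estimator's default score (R^2 for regressors)\n",
"scores = cross_val_score(lin_reg_1, X, y, cv=5)\n",
"print(\"%0.2f R^2 with a standard deviation of %0.2f\" % (scores.mean(), scores.std()))\n",
"\n",
"# 'neg_mean_squared_error' returns the MSE with its sign flipped (higher is better),\n",
"# so the mean printed below is negative; its absolute value is the MSE\n",
"scores = cross_val_score(lin_reg_1, X, y, cv=5, scoring='neg_mean_squared_error')\n",
"print(\"%0.2f MSE with a standard deviation of %0.2f\" % (scores.mean(), scores.std()))"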
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "dqCkpqGlI-cF",
"outputId": "969ec98d-0751-4f83-b585-b6fae2534cdf"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"0.89 R^2 with a standard deviation of 0.04\n",
"-3.07 MSE with a standard deviation of 1.28\n"
]
}
]
},
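{
"cell_type": "markdown",
"source": [
"The negative sign in the MSE line comes from scikit-learn's convention that a higher score is always better. A minimal sketch of recovering the ordinary MSE is shown below. It runs on synthetic stand-in data (`X_demo`, `y_demo` and the coefficients are assumptions made for illustration, not the Advertising dataset), so the printed numbers will not match the results above."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.model_selection import cross_val_score\n",
"\n",
"# Synthetic regression data (illustrative assumption, not the lab dataset)\n",
"rng = np.random.default_rng(0)\n",
"X_demo = rng.normal(size=(100, 3))\n",
"y_demo = X_demo @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=100)\n",
"\n",
"neg_mse = cross_val_score(LinearRegression(), X_demo, y_demo, cv=5,\n",
"                          scoring='neg_mean_squared_error')\n",
"\n",
"# Flip the sign to get the usual non-negative MSE per fold\n",
"mse = -neg_mse\n",
"print('%0.2f MSE with a standard deviation of %0.2f' % (mse.mean(), mse.std()))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},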
{
"cell_type": "markdown",
"source": [
"# Polynomial regression"
],
"metadata": {
"id": "COMxvBd_Lkxo"
}
},
{
"cell_type": "markdown",
"source": [
"Suppose we have a set of observations $X$:\n",
"\n",
"$$\\bf{X}=\\left[ \\begin{matrix} 1 & x_{11} & x_{12} \\\\ 1 & x_{21} & x_{22} \\\\ ... & ... & ... \\\\ 1 & x_{n1} & x_{n2} \\end{matrix} \\right]$$\n",
"\n",
"The dependent variable is ${Y}=\\left[ \\begin{matrix} y_1 \\\\ y_2 \\\\ ... \\\\ y_n \\end{matrix} \\right]$\n",
"\n",
"Example of a simple linear model with two features: $y_i=\\beta_0 + \\beta_1x_{i1} + \\beta_2x_{i2}$\n",
"\n",
"Example of a polynomial model (degree 2): $y_i=\\beta_0 + \\beta_1x_{i1} + \\beta_2x_{i2} + \\beta_3x_{i1}^2 + \\beta_4x_{i1}x_{i2} + \\beta_5x_{i2}^2$\n",
"\n",
"That is, $X$ takes the following form (_feature engineering_):\n",
"\n",
"$$\\bf{X}=\\left[ \\begin{matrix} 1 & x_{11} & x_{12} & x_{11}^2 & x_{11}x_{12} & x_{12}^2 \\\\ 1 & x_{21} & x_{22} & x_{21}^2 & x_{21}x_{22} & x_{22}^2 \\\\ ... & ... & ... & ... & ... & ... \\\\ 1 & x_{n1} & x_{n2} & x_{n1}^2 & x_{n1}x_{n2} & x_{n2}^2 \\end{matrix} \\right]$$"
],
"metadata": {
"id": "EqESRagwzhK2"
}
},
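{
"cell_type": "markdown",
"source": [
"The expanded design matrix above can be reproduced on a single toy row. The sketch below assumes one observation with $x_1=2$ and $x_2=3$ (illustrative values, not taken from the dataset) and keeps the bias column so that the output lines up with the matrix; the names `X_demo` and `poly_demo` are hypothetical."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"\n",
"# One toy observation with x1=2, x2=3 (illustrative values)\n",
"X_demo = np.array([[2.0, 3.0]])\n",
"\n",
"# include_bias=True keeps the leading column of ones shown in the matrix\n",
"poly_demo = PolynomialFeatures(2, include_bias=True)\n",
"X_demo_2 = poly_demo.fit_transform(X_demo)\n",
"\n",
"# Column order: 1, x1, x2, x1^2, x1 x2, x2^2\n",
"print(poly_demo.get_feature_names_out(['x1', 'x2']))\n",
"print(X_demo_2)  # [[1. 2. 3. 4. 6. 9.]]"
],
"metadata": {},
"execution_count": null,
"outputs": []
},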
{
"cell_type": "code",
"source": [
"# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html\n",
"\n",
"# Degree-2 expansion without the bias column:\n",
"# 3 original features + 3 squares + 3 pairwise products = 9 columns\n",
"poly = PolynomialFeatures(2, include_bias=False)\n",
"X_2 = poly.fit_transform(X)\n",
"\n",
"X.shape, X_2.shape"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JaW--o2_LHtm",
"outputId": "d1c6e321-708a-4c8a-9374-1282a736104a"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"((200, 3), (200, 9))"
]
},
"metadata": {},
"execution_count": 59
}
]
},
{
"cell_type": "code",
"source": [
"# Same linear model, now fitted on the degree-2 polynomial features\n",
"lin_reg_2 = LinearRegression()\n",
"scores = cross_val_score(lin_reg_2, X_2, y, cv=5)\n",
"print(\"%0.2f R^2 with a standard deviation of %0.2f\" % (scores.mean(), scores.std()))\n",
"\n",
"# The score is the negated MSE, so the printed mean is negative\n",
"scores = cross_val_score(lin_reg_2, X_2, y, cv=5, scoring='neg_mean_squared_error')\n",
"print(\"%0.2f MSE with a standard deviation of %0.2f\" % (scores.mean(), scores.std()))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "MU3knNuZMAOJ",
"outputId": "1641fa22-1dd9-418a-befc-1e59c64ed570"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"0.98 R^2 with a standard deviation of 0.01\n",
"-0.44 MSE with a standard deviation of 0.39\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# Degree-3 expansion of the 3 original features (19 columns without the bias column)\n",
"poly_3 = PolynomialFeatures(3, include_bias=False)\n",
"X_3 = poly_3.fit_transform(X)\n",
"\n",
"print(X.shape, X_3.shape)\n",
"\n",
"lin_reg_3 = LinearRegression()\n",
"scores = cross_val_score(lin_reg_3, X_3, y, cv=5)\n",
"print(\"%0.2f R^2 with a standard deviation of %0.2f\" % (scores.mean(), scores.std()))\n",
"\n",
"# The score is the negated MSE, so the printed mean is negative\n",
"scores = cross_val_score(lin_reg_3, X_3, y, cv=5, scoring='neg_mean_squared_error')\n",
"print(\"%0.2f MSE with a standard deviation of %0.2f\" % (scores.mean(), scores.std()))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "tkSUyoJMMOoo",
"outputId": "7ac71a71-030a-42b0-e337-e155199c2aa0"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"(200, 3) (200, 19)\n",
"0.99 R^2 with a standard deviation of 0.01\n",
"-0.31 MSE with a standard deviation of 0.24\n"
]
}
]
},
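{
"cell_type": "markdown",
"source": [
"The column counts reported for the polynomial expansions (9 for degree 2, 19 for degree 3) follow from counting monomials: with $d$ features and degree $p$ there are $\\binom{d+p}{p}$ terms including the constant. The helper below is a small sketch of that arithmetic; `n_poly_features` is a hypothetical name, not a scikit-learn function."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"from scipy.special import comb\n",
"\n",
"def n_poly_features(n_features, degree, include_bias=False):\n",
"    # Number of monomials of total degree <= degree in n_features variables\n",
"    total = comb(n_features + degree, degree, exact=True)\n",
"    # Drop the constant term when no bias column is requested\n",
"    return total if include_bias else total - 1\n",
"\n",
"print(n_poly_features(3, 2))  # 9\n",
"print(n_poly_features(3, 3))  # 19"
],
"metadata": {},
"execution_count": null,
"outputs": []
},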
{
"cell_type": "markdown",
"source": [
"# Simple neural network"
],
"metadata": {
"id": "gGE1QhiCNDKk"
}
},
{
"cell_type": "code",
"source": [
"def simple_model(input_features):\n",
"    # Small fully connected regressor: three hidden ReLU layers of 8 units\n",
"    # followed by a single linear output unit\n",
"    inputs = keras.Input(shape=(input_features,))\n",
"    x = layers.Dense(8, activation='relu')(inputs)\n",
"    x = layers.Dense(8, activation='relu')(x)\n",
"    x = layers.Dense(8, activation='relu')(x)\n",
"    outputs = layers.Dense(1)(x)\n",
"    model = keras.Model(inputs, outputs)\n",
"    return model"
],
"metadata": {
"id": "qEc_99FAbGR7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"nn_1 = simple_model(3)\n",
"nn_1.summary()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "M4he8n1vMx1w",
"outputId": "8b5c7caf-9712-4ca3-8a38-49fe9f4f6e9b"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Model: \"model_30\"\n",
"_________________________________________________________________\n",
" Layer (type) Output Shape Param # \n",
"=================================================================\n",
" input_31 (InputLayer) [(None, 3)] 0 \n",
" \n",
" dense_120 (Dense) (None, 8) 32 \n",
" \n",
" dense_121 (Dense) (None, 8) 72 \n",
" \n",
" dense_122 (Dense) (None, 8) 72 \n",
" \n",
" dense_123 (Dense) (None, 1) 9 \n",
" \n",
"=================================================================\n",
"Total params: 185\n",
"Trainable params: 185\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
]
},
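{
"cell_type": "markdown",
"source": [
"The parameter counts in the summary can be checked by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below uses a hypothetical helper `dense_params` and the layer sizes 3, 8, 8, 8, 1 of the model defined above."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"def dense_params(n_in, n_out):\n",
"    # Weight matrix (n_in x n_out) plus one bias per output unit\n",
"    return n_in * n_out + n_out\n",
"\n",
"# Input width followed by the width of each Dense layer\n",
"layer_sizes = [3, 8, 8, 8, 1]\n",
"per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]\n",
"\n",
"print(per_layer, sum(per_layer))  # [32, 72, 72, 9] 185"
],
"metadata": {},
"execution_count": null,
"outputs": []
},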
{
"cell_type": "code",
"source": [
"# original features\n",
"input_features = 3\n",
"MSE_metric = []\n",
"r2_metric = []\n",
"LR = 0.01\n",
"batch_size = 1\n",
"epochs = 10\n",
"\n",
"# Folds are shuffled without a fixed random_state, so results vary between runs\n",
"kfold = KFold(n_splits=5, shuffle=True)\n",
"\n",
"step = 1\n",
"for train, test in kfold.split(X, y):\n",
"    # Build and compile a fresh model for every fold so no weights carry over\n",
"    model = simple_model(input_features)\n",
"    model.compile(\n",
"        optimizer=keras.optimizers.Adam(learning_rate=LR),\n",
"        loss=[tf.keras.losses.MeanSquaredError()],\n",
"        metrics=[tfa.metrics.RSquare(dtype=tf.float32, y_shape=(1,))]\n",
"    )\n",
"\n",
"    print(\"Train on Fold # {}\".format(step))\n",
"    history = model.fit(X[train], y[train],\n",
"                        batch_size=batch_size,\n",
"                        epochs=epochs)\n",
"\n",
"    # evaluate() returns [loss, metric]: MSE first, then R^2\n",
"    scores = model.evaluate(X[test], y[test], verbose=0)\n",
"\n",
"    MSE_metric.append(scores[0])\n",
"    r2_metric.append(scores[1])\n",
"\n",
"    step += 1\n",
"\n",
"\n",
"print(\"%0.2f R^2 with a standard deviation of %0.2f\" % (np.mean(r2_metric), np.std(r2_metric)))\n",
"print(\"%0.2f MSE with a standard deviation of %0.2f\" % (np.mean(MSE_metric), np.std(MSE_metric)))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "O2jAiISyamFK",
"outputId": "82bd138a-0370-4a84-f895-b161a9587a0f"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Traint on Fold # 1\n",
"Epoch 1/10\n",
"160/160 [==============================] - 1s 2ms/step - loss: 16.2024 - r_square: 0.4167\n",
"Epoch 2/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 3.9761 - r_square: 0.8569\n",
"Epoch 3/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 3.7821 - r_square: 0.8638\n",
"Epoch 4/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.3843 - r_square: 0.9142\n",
"Epoch 5/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 3.0153 - r_square: 0.8914\n",
"Epoch 6/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.8528 - r_square: 0.8973\n",
"Epoch 7/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.9074 - r_square: 0.8953\n",
"Epoch 8/10\n",
"160/160 [==============================] - 0s 1ms/step - loss: 3.3223 - r_square: 0.8804\n",
"Epoch 9/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.0669 - r_square: 0.9256\n",
"Epoch 10/10\n",
"160/160 [==============================] - 0s 1ms/step - loss: 3.1468 - r_square: 0.8867\n",
"Traint on Fold # 2\n",
"Epoch 1/10\n",
"160/160 [==============================] - 1s 1ms/step - loss: 17.1019 - r_square: 0.3959\n",
"Epoch 2/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 7.1286 - r_square: 0.7482\n",
"Epoch 3/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.3780 - r_square: 0.9160\n",
"Epoch 4/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.2266 - r_square: 0.9213\n",
"Epoch 5/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.2002 - r_square: 0.9223\n",
"Epoch 6/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 1.7749 - r_square: 0.9373\n",
"Epoch 7/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 1.8101 - r_square: 0.9361\n",
"Epoch 8/10\n",
"160/160 [==============================] - 0s 1ms/step - loss: 1.9340 - r_square: 0.9317\n",
"Epoch 9/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.3880 - r_square: 0.9156\n",
"Epoch 10/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.9022 - r_square: 0.8975\n",
"Traint on Fold # 3\n",
"Epoch 1/10\n",
"160/160 [==============================] - 1s 2ms/step - loss: 15.5137 - r_square: 0.4153\n",
"Epoch 2/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 3.7498 - r_square: 0.8587\n",
"Epoch 3/10\n",
"160/160 [==============================] - 0s 1ms/step - loss: 3.0436 - r_square: 0.8853\n",
"Epoch 4/10\n",
"160/160 [==============================] - 0s 1ms/step - loss: 2.4885 - r_square: 0.9062\n",
"Epoch 5/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.6690 - r_square: 0.8994\n",
"Epoch 6/10\n",
"160/160 [==============================] - 0s 1ms/step - loss: 1.2477 - r_square: 0.9530\n",
"Epoch 7/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 1.2213 - r_square: 0.9540\n",
"Epoch 8/10\n",
"160/160 [==============================] - 0s 1ms/step - loss: 1.9037 - r_square: 0.9283\n",
"Epoch 9/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 1.0913 - r_square: 0.9589\n",
"Epoch 10/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 1.1647 - r_square: 0.9561\n",
"Traint on Fold # 4\n",
"Epoch 1/10\n",
"160/160 [==============================] - 1s 1ms/step - loss: 15.6965 - r_square: 0.3908\n",
"Epoch 2/10\n",
"160/160 [==============================] - 0s 1ms/step - loss: 4.8530 - r_square: 0.8117\n",
"Epoch 3/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 3.0453 - r_square: 0.8818\n",
"Epoch 4/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.5156 - r_square: 0.9024\n",
"Epoch 5/10\n",
"160/160 [==============================] - 0s 1ms/step - loss: 1.6183 - r_square: 0.9372\n",
"Epoch 6/10\n",
"160/160 [==============================] - 0s 1ms/step - loss: 1.6698 - r_square: 0.9352\n",
"Epoch 7/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 1.2195 - r_square: 0.9527\n",
"Epoch 8/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 1.4367 - r_square: 0.9442\n",
"Epoch 9/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 1.5627 - r_square: 0.9393\n",
"Epoch 10/10\n",
"160/160 [==============================] - 0s 1ms/step - loss: 1.0964 - r_square: 0.9574\n",
"Traint on Fold # 5\n",
"Epoch 1/10\n",
"160/160 [==============================] - 1s 1ms/step - loss: 38.1272 - r_square: -0.4204\n",
"Epoch 2/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 3.5857 - r_square: 0.8664\n",
"Epoch 3/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 4.1824 - r_square: 0.8442\n",
"Epoch 4/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.6618 - r_square: 0.9008\n",
"Epoch 5/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.9917 - r_square: 0.8885\n",
"Epoch 6/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 3.2090 - r_square: 0.8805\n",
"Epoch 7/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 2.1400 - r_square: 0.9203\n",
"Epoch 8/10\n",
"160/160 [==============================] - 0s 1ms/step - loss: 2.1150 - r_square: 0.9212\n",
"Epoch 9/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 1.4849 - r_square: 0.9447\n",
"Epoch 10/10\n",
"160/160 [==============================] - 0s 2ms/step - loss: 1.5077 - r_square: 0.9438\n",
"0.91 R^2 with a standard deviation of 0.06\n",
"2.40 MSE with a standard deviation of 1.61\n"
]
}
]
},
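{
"cell_type": "markdown",
"source": [
"The `160/160` step counter in the training log follows from the split sizes: 200 rows in 5 folds leave 160 rows for training and 40 for testing, and with `batch_size = 1` every training row is its own step. The sketch below checks the fold sizes on a placeholder array of the same shape; `X_demo` and the fixed `random_state` are assumptions added for illustration."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"from sklearn.model_selection import KFold\n",
"\n",
"# Placeholder with the same shape as the feature matrix (200 rows, 3 columns)\n",
"X_demo = np.zeros((200, 3))\n",
"\n",
"# random_state is fixed here only to make the split repeatable\n",
"kfold_demo = KFold(n_splits=5, shuffle=True, random_state=0)\n",
"\n",
"for fold, (train_idx, test_idx) in enumerate(kfold_demo.split(X_demo), start=1):\n",
"    # Each fold trains on 160 rows and tests on the remaining 40\n",
"    print(fold, len(train_idx), len(test_idx))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},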
{
"cell_type": "code",
"source": [
"X_2.shape"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "fYzGmId2j2vn",
"outputId": "dcc87048-fb27-4a5c-f76b-06249304aacd"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(200, 9)"
]
},
"metadata": {},
"execution_count": 65
}
]
},
{
"cell_type": "code",
"source": [
"nn_1 = simple_model(9)\n",
"nn_1.summary()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "nimpInU0j1TL",
"outputId": "763f6300-2e82-4341-90cd-32bee069f9be"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Model: \"model_36\"\n",
"_________________________________________________________________\n",
" Layer (type) Output Shape Param # \n",
"=================================================================\n",
" input_37 (InputLayer) [(None, 9)] 0 \n",
" \n",
" dense_144 (Dense) (None, 8) 80 \n",
" \n",
" dense_145 (Dense) (None, 8) 72 \n",
" \n",
" dense_146 (Dense) (None, 8) 72 \n",
" \n",
" dense_147 (Dense) (None, 1) 9 \n",
" \n",
"=================================================================\n",
"Total params: 233\n",
"Trainable params: 233\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# X^2 polynomial features\n",
"\n",
"input_features=9\n",
"MSE_metric = []\n",
"r2_metric = []\n",
"LR = 0.5\n",
"batch_size = 1\n",
"epochs = 15\n",
"\n",
"kfold = KFold(n_splits=5, shuffle=True)\n",
"\n",
"step = 1\n",
"for train, test in kfold.split(X_2, y):\n",
" model = simple_model(input_features)\n",
" model.compile(\n",
" optimizer=keras.optimizers.Adam(learning_rate=LR),\n",
" loss=[tf.keras.losses.MeanSquaredError()],\n",
" metrics=[tfa.metrics.RSquare(dtype=tf.float32, y_shape=(1,))]\n",
" )\n",
" \n",
" print(\"Traint on Fold # {}\".format(step))\n",
" history = model.fit(X_2[train], y[train],\n",
" batch_size=batch_size,\n",
" epochs=epochs)\n",
" \n",
" scores = model.evaluate(X_2[test], y[test], verbose=0)\n",
" \n",
" MSE_metric.append(scores[0])\n",
" r2_metric.append(scores[1])\n",
"\n",
" step += 1\n",
"\n",
"\n",
"print(\"%0.2f R^2 with a standard deviation of %0.2f\" % (np.mean(r2_metric), np.std(r2_metric)))\n",
"print(\"%0.2f MSE with a standard deviation of %0.2f\" % (np.mean(MSE_metric), np.std(MSE_metric)))\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ufWWjw0ZgHiv",
"outputId": "c6837406-6e13-4548-e373-b2e91e6b639b"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Traint on Fold # 1\n",
"Epoch 1/15\n",
"160/160 [==============================] - 1s 2ms/step - loss: 239035.0000 - r_square: -9199.8721\n",
"Epoch 2/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 37.3801 - r_square: -0.4388\n",
"Epoch 3/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 36.6234 - r_square: -0.4097\n",
"Epoch 4/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 32.8304 - r_square: -0.2637\n",
"Epoch 5/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 35.4500 - r_square: -0.3645\n",
"Epoch 6/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 34.0948 - r_square: -0.3124\n",
"Epoch 7/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 36.7745 - r_square: -0.4155\n",
"Epoch 8/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 41.5983 - r_square: -0.6012\n",
"Epoch 9/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 37.4649 - r_square: -0.4421\n",
"Epoch 10/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 36.5226 - r_square: -0.4058\n",
"Epoch 11/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 39.3880 - r_square: -0.5161\n",
"Epoch 12/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 42.9531 - r_square: -0.6533\n",
"Epoch 13/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 34.7907 - r_square: -0.3392\n",
"Epoch 14/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 52.1614 - r_square: -1.0078\n",
"Epoch 15/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 36.0439 - r_square: -0.3874\n",
"Traint on Fold # 2\n",
"Epoch 1/15\n",
"160/160 [==============================] - 1s 2ms/step - loss: 180739664.0000 - r_square: -6380801.5000\n",
"Epoch 2/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 162.5291 - r_square: -4.7379\n",
"Epoch 3/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 160.1068 - r_square: -4.6524\n",
"Epoch 4/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 157.0634 - r_square: -4.5449\n",
"Epoch 5/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 153.4884 - r_square: -4.4187\n",
"Epoch 6/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 149.4353 - r_square: -4.2756\n",
"Epoch 7/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 144.9490 - r_square: -4.1173\n",
"Epoch 8/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 140.0574 - r_square: -3.9446\n",
"Epoch 9/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 134.8342 - r_square: -3.7602\n",
"Epoch 10/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 129.2957 - r_square: -3.5646\n",
"Epoch 11/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 123.4846 - r_square: -3.3595\n",
"Epoch 12/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 117.4536 - r_square: -3.1466\n",
"Epoch 13/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 111.2620 - r_square: -2.9280\n",
"Epoch 14/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 104.9761 - r_square: -2.7061\n",
"Epoch 15/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 98.6283 - r_square: -2.4820\n",
"Traint on Fold # 3\n",
"Epoch 1/15\n",
"160/160 [==============================] - 1s 2ms/step - loss: 218001.1562 - r_square: -8377.3857\n",
"Epoch 2/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 27.8599 - r_square: -0.0707\n",
"Epoch 3/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 27.7370 - r_square: -0.0660\n",
"Epoch 4/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 25.1877 - r_square: 0.0320\n",
"Epoch 5/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 28.3859 - r_square: -0.0910\n",
"Epoch 6/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 29.5888 - r_square: -0.1372\n",
"Epoch 7/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 31.9030 - r_square: -0.2261\n",
"Epoch 8/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 29.6330 - r_square: -0.1389\n",
"Epoch 9/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 31.4090 - r_square: -0.2071\n",
"Epoch 10/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 28.9161 - r_square: -0.1113\n",
"Epoch 11/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 28.8925 - r_square: -0.1104\n",
"Epoch 12/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 31.6793 - r_square: -0.2175\n",
"Epoch 13/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 31.9550 - r_square: -0.2281\n",
"Epoch 14/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 33.1925 - r_square: -0.2757\n",
"Epoch 15/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 29.1770 - r_square: -0.1214\n",
"Traint on Fold # 4\n",
"Epoch 1/15\n",
"160/160 [==============================] - 1s 1ms/step - loss: 54.1740 - r_square: -1.0094\n",
"Epoch 2/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 31.0335 - r_square: -0.1511\n",
"Epoch 3/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 29.1616 - r_square: -0.0816\n",
"Epoch 4/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 29.8393 - r_square: -0.1068\n",
"Epoch 5/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 29.4575 - r_square: -0.0926\n",
"Epoch 6/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 30.0795 - r_square: -0.1157\n",
"Epoch 7/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 27.9198 - r_square: -0.0356\n",
"Epoch 8/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 27.3414 - r_square: -0.0141\n",
"Epoch 9/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 28.1615 - r_square: -0.0445\n",
"Epoch 10/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 27.6828 - r_square: -0.0268\n",
"Epoch 11/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 28.0983 - r_square: -0.0422\n",
"Epoch 12/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 28.5057 - r_square: -0.0573\n",
"Epoch 13/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 28.6568 - r_square: -0.0629\n",
"Epoch 14/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 28.7601 - r_square: -0.0667\n",
"Epoch 15/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 28.3267 - r_square: -0.0507\n",
"Traint on Fold # 5\n",
"Epoch 1/15\n",
"160/160 [==============================] - 1s 1ms/step - loss: 4140985856.0000 - r_square: -147298672.0000\n",
"Epoch 2/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 166.7630 - r_square: -4.9319\n",
"Epoch 3/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 166.2355 - r_square: -4.9131\n",
"Epoch 4/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 165.5636 - r_square: -4.8892\n",
"Epoch 5/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 164.7577 - r_square: -4.8606\n",
"Epoch 6/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 163.8211 - r_square: -4.8273\n",
"Epoch 7/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 162.7552 - r_square: -4.7893\n",
"Epoch 8/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 161.5582 - r_square: -4.7468\n",
"Epoch 9/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 160.2272 - r_square: -4.6994\n",
"Epoch 10/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 158.7604 - r_square: -4.6472\n",
"Epoch 11/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 157.1565 - r_square: -4.5902\n",
"Epoch 12/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 155.4084 - r_square: -4.5280\n",
"Epoch 13/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 153.5127 - r_square: -4.4606\n",
"Epoch 14/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 151.4646 - r_square: -4.3877\n",
"Epoch 15/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 149.2648 - r_square: -4.3095\n",
"-1.92 R^2 with a standard deviation of 2.02\n",
"71.91 MSE with a standard deviation of 41.74\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"X_3.shape"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "S44LAAull35c",
"outputId": "5c0971ee-122d-499f-a6cb-c7ac1bc2de32"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(200, 19)"
]
},
"metadata": {},
"execution_count": 68
}
]
},
{
"cell_type": "code",
"source": [
"nn_1 = simple_model(19)\n",
"nn_1.summary()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2vO67DlDl8V5",
"outputId": "6792981c-ce4d-465f-feed-a667a8390ea4"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Model: \"model_42\"\n",
"_________________________________________________________________\n",
" Layer (type) Output Shape Param # \n",
"=================================================================\n",
" input_43 (InputLayer) [(None, 19)] 0 \n",
" \n",
" dense_168 (Dense) (None, 8) 160 \n",
" \n",
" dense_169 (Dense) (None, 8) 72 \n",
" \n",
" dense_170 (Dense) (None, 8) 72 \n",
" \n",
" dense_171 (Dense) (None, 1) 9 \n",
" \n",
"=================================================================\n",
"Total params: 313\n",
"Trainable params: 313\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# X^3 polynomial features\n",
"\n",
"input_features=19\n",
"MSE_metric = []\n",
"r2_metric = []\n",
"LR = 0.1\n",
"batch_size = 1\n",
"epochs = 15\n",
"\n",
"kfold = KFold(n_splits=5, shuffle=True)\n",
"\n",
"step = 1\n",
"for train, test in kfold.split(X_3, y):\n",
" model = simple_model(input_features)\n",
" model.compile(\n",
" optimizer=keras.optimizers.Adam(learning_rate=LR),\n",
" loss=[tf.keras.losses.MeanSquaredError()],\n",
" metrics=[tfa.metrics.RSquare(dtype=tf.float32, y_shape=(1,))]\n",
" )\n",
" \n",
" print(\"Traint on Fold # {}\".format(step))\n",
" history = model.fit(X_3[train], y[train],\n",
" batch_size=batch_size,\n",
" epochs=epochs)\n",
" \n",
" scores = model.evaluate(X_3[test], y[test], verbose=0)\n",
" \n",
" MSE_metric.append(scores[0])\n",
" r2_metric.append(scores[1])\n",
"\n",
" step += 1\n",
"\n",
"\n",
"print(\"%0.2f R^2 with a standard deviation of %0.2f\" % (np.mean(r2_metric), np.std(r2_metric)))\n",
"print(\"%0.2f MSE with a standard deviation of %0.2f\" % (np.mean(MSE_metric), np.std(MSE_metric)))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ReXwB3j9OowI",
"outputId": "ef51d07f-51df-4f78-f331-61f9980562db"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Traint on Fold # 1\n",
"Epoch 1/15\n",
"160/160 [==============================] - 1s 2ms/step - loss: 208534814720.0000 - r_square: -7539420160.0000\n",
"Epoch 2/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 29.6321 - r_square: -0.0713\n",
"Epoch 3/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 28.3144 - r_square: -0.0237\n",
"Epoch 4/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 29.8664 - r_square: -0.0798\n",
"Epoch 5/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 29.5055 - r_square: -0.0667\n",
"Epoch 6/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 29.4847 - r_square: -0.0660\n",
"Epoch 7/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 30.8022 - r_square: -0.1136\n",
"Epoch 8/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 29.9225 - r_square: -0.0818\n",
"Epoch 9/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 28.9794 - r_square: -0.0477\n",
"Epoch 10/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 29.5053 - r_square: -0.0667\n",
"Epoch 11/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 30.8465 - r_square: -0.1152\n",
"Epoch 12/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 30.0251 - r_square: -0.0855\n",
"Epoch 13/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 30.3983 - r_square: -0.0990\n",
"Epoch 14/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 28.5822 - r_square: -0.0334\n",
"Epoch 15/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 29.2593 - r_square: -0.0578\n",
"Traint on Fold # 2\n",
"Epoch 1/15\n",
"160/160 [==============================] - 1s 1ms/step - loss: 39032479744.0000 - r_square: -1439124608.0000\n",
"Epoch 2/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 210.6791 - r_square: -6.7678\n",
"Epoch 3/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 209.8824 - r_square: -6.7384\n",
"Epoch 4/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 208.8626 - r_square: -6.7008\n",
"Epoch 5/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 207.6375 - r_square: -6.6556\n",
"Epoch 6/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 206.2179 - r_square: -6.6033\n",
"Epoch 7/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 204.6006 - r_square: -6.5436\n",
"Epoch 8/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 202.7906 - r_square: -6.4769\n",
"Epoch 9/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 200.7783 - r_square: -6.4027\n",
"Epoch 10/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 198.5644 - r_square: -6.3211\n",
"Epoch 11/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 196.1439 - r_square: -6.2318\n",
"Epoch 12/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 193.5140 - r_square: -6.1349\n",
"Epoch 13/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 190.6665 - r_square: -6.0299\n",
"Epoch 14/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 187.5995 - r_square: -5.9168\n",
"Epoch 15/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 184.3087 - r_square: -5.7955\n",
"Traint on Fold # 3\n",
"Epoch 1/15\n",
"160/160 [==============================] - 1s 2ms/step - loss: 28592672768.0000 - r_square: -1088961536.0000\n",
"Epoch 2/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 233.1044 - r_square: -7.8779\n",
"Epoch 3/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 233.0444 - r_square: -7.8756\n",
"Epoch 4/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 232.9677 - r_square: -7.8727\n",
"Epoch 5/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 232.8752 - r_square: -7.8692\n",
"Epoch 6/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 232.7672 - r_square: -7.8650\n",
"Epoch 7/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 232.6434 - r_square: -7.8603\n",
"Epoch 8/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 232.5038 - r_square: -7.8550\n",
"Epoch 9/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 232.3466 - r_square: -7.8490\n",
"Epoch 10/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 232.1722 - r_square: -7.8423\n",
"Epoch 11/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 231.9788 - r_square: -7.8350\n",
"Epoch 12/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 231.7654 - r_square: -7.8269\n",
"Epoch 13/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 231.5313 - r_square: -7.8179\n",
"Epoch 14/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 231.2749 - r_square: -7.8081\n",
"Epoch 15/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 230.9945 - r_square: -7.7975\n",
"Traint on Fold # 4\n",
"Epoch 1/15\n",
"160/160 [==============================] - 1s 2ms/step - loss: 10843359232.0000 - r_square: -412928064.0000\n",
"Epoch 2/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 229.7369 - r_square: -7.7487\n",
"Epoch 3/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 229.6414 - r_square: -7.7450\n",
"Epoch 4/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 229.5188 - r_square: -7.7404\n",
"Epoch 5/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 229.3708 - r_square: -7.7347\n",
"Epoch 6/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 229.1986 - r_square: -7.7282\n",
"Epoch 7/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 229.0008 - r_square: -7.7206\n",
"Epoch 8/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 228.7769 - r_square: -7.7122\n",
"Epoch 9/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 228.5263 - r_square: -7.7026\n",
"Epoch 10/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 228.2477 - r_square: -7.6920\n",
"Epoch 11/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 227.9395 - r_square: -7.6802\n",
"Epoch 12/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 227.5996 - r_square: -7.6673\n",
"Epoch 13/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 227.2270 - r_square: -7.6531\n",
"Epoch 14/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 226.8187 - r_square: -7.6376\n",
"Epoch 15/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 226.3732 - r_square: -7.6206\n",
"Traint on Fold # 5\n",
"Epoch 1/15\n",
"160/160 [==============================] - 1s 1ms/step - loss: 327125762048.0000 - r_square: -11685727232.0000\n",
"Epoch 2/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 192.7558 - r_square: -5.8857\n",
"Epoch 3/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 192.6152 - r_square: -5.8807\n",
"Epoch 4/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 192.4306 - r_square: -5.8741\n",
"Epoch 5/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 192.2160 - r_square: -5.8664\n",
"Epoch 6/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 191.9613 - r_square: -5.8574\n",
"Epoch 7/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 191.6872 - r_square: -5.8476\n",
"Epoch 8/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 191.3779 - r_square: -5.8365\n",
"Epoch 9/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 191.0527 - r_square: -5.8249\n",
"Epoch 10/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 190.6981 - r_square: -5.8122\n",
"Epoch 11/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 190.3278 - r_square: -5.7990\n",
"Epoch 12/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 189.9332 - r_square: -5.7849\n",
"Epoch 13/15\n",
"160/160 [==============================] - 0s 1ms/step - loss: 189.5108 - r_square: -5.7698\n",
"Epoch 14/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 189.0756 - r_square: -5.7543\n",
"Epoch 15/15\n",
"160/160 [==============================] - 0s 2ms/step - loss: 188.6184 - r_square: -5.7379\n",
"-5.54 R^2 with a standard deviation of 2.90\n",
"177.37 MSE with a standard deviation of 81.22\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# Выбор модели"
],
"metadata": {
"id": "iBlV6TPotLhH"
}
},
{
"cell_type": "markdown",
"source": [
"Model Name | parameters | $r^2$ | Mean Squared Error|\n",
"----------------|------------|--------------|-------------------|\n",
"LR | $\\bf4$ |$0.89\\pm0.04$ |$3.07\\pm1.28$ |\n",
"LR poly 2 | $10$ |$0.98\\pm0.01$ |$0.44\\pm0.39$ |\n",
"LR poly 3 | $20$ |$\\bf0.99\\pm0.01$ |$\\bf0.31\\pm0.24$ |\n",
"NN | $185$ |$0.91\\pm1.61$ |$1.86\\pm1.49$ |\n"
],
"metadata": {
"id": "ugPHpCGYtPIp"
}
},
{
"cell_type": "code",
"source": [
""
],
"metadata": {
"id": "B60pcxi-mEXO"
},
"execution_count": null,
"outputs": []
}
]
}