@qdpham
Created February 25, 2022 13:29
Fun-inria MOOC scikit-learn Exercise M4.04: Different outputs for linear regression coefficients
{
"cells": [
{
"cell_type": "markdown",
"id": "3f36b5d0",
"metadata": {},
"source": [
"# 📃 Solution for Exercise M4.04\n",
"\n",
"In the previous notebook, we saw the effect of applying some regularization\n",
"on the coefficients of a linear model.\n",
"\n",
"In this exercise, we will study the advantage of using some regularization\n",
"when dealing with correlated features.\n",
"\n",
"We will first create a regression dataset. This dataset will contain 2,000\n",
"samples and 5 features, of which only 2 will be informative."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2e45e8c6-a2ac-4d7a-871a-088094fe7fd5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"System:\n",
" python: 3.9.7 (default, Sep 16 2021, 08:50:36) [Clang 10.0.0 ]\n",
"executable: /Users/qdp/opt/anaconda3/bin/python\n",
" machine: macOS-10.16-x86_64-i386-64bit\n",
"\n",
"Python dependencies:\n",
" pip: 21.2.4\n",
" setuptools: 58.0.4\n",
" sklearn: 1.0.2\n",
" numpy: 1.20.3\n",
" scipy: 1.7.1\n",
" Cython: 0.29.24\n",
" pandas: 1.3.4\n",
" matplotlib: 3.4.3\n",
" joblib: 1.1.0\n",
"threadpoolctl: 2.2.0\n",
"\n",
"Built with OpenMP: True\n"
]
}
],
"source": [
"# Configuration: print the versions of scikit-learn and its dependencies\n",
"import sklearn\n",
"sklearn.show_versions()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8a51a22a",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import make_regression\n",
"\n",
"data, target, coef = make_regression(\n",
" n_samples=2_000,\n",
" n_features=5,\n",
" n_informative=2,\n",
" shuffle=False,\n",
" coef=True,\n",
" random_state=0,\n",
" noise=30,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "8bdbc1ce",
"metadata": {},
"source": [
"When creating the dataset, `make_regression` returns the true coefficients\n",
"used to generate the dataset. Let's plot this information."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "aab52b75",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Relevant feature #0 9.566665\n",
"Relevant feature #1 40.192077\n",
"Noisy feature #0 0.000000\n",
"Noisy feature #1 0.000000\n",
"Noisy feature #2 0.000000\n",
"dtype: float64"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"\n",
"feature_names = [\n",
" \"Relevant feature #0\",\n",
" \"Relevant feature #1\",\n",
" \"Noisy feature #0\",\n",
" \"Noisy feature #1\",\n",
" \"Noisy feature #2\",\n",
"]\n",
"coef = pd.Series(coef, index=feature_names)\n",
"coef.plot.barh()\n",
"coef"
]
},
{
"cell_type": "markdown",
"id": "b1c460d7",
"metadata": {},
"source": [
"Create a `LinearRegression` regressor, fit it on the entire dataset, and\n",
"check the values of the coefficients. Are the coefficients of the linear\n",
"regressor close to the coefficients used to generate the dataset?"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ca5de3ea",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([10.89587004, 40.41128042, -0.20542454, -0.18954462, 0.11129768])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# solution\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"linear_regression = LinearRegression()\n",
"linear_regression.fit(data, target)\n",
"linear_regression.coef_"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "0f8d6647",
"metadata": {
"tags": [
"solution"
]
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"feature_names = [\n",
" \"Relevant feature #0\",\n",
" \"Relevant feature #1\",\n",
" \"Noisy feature #0\",\n",
" \"Noisy feature #1\",\n",
" \"Noisy feature #2\",\n",
"]\n",
"coef = pd.Series(linear_regression.coef_, index=feature_names)\n",
"_ = coef.plot.barh()"
]
},
{
"cell_type": "markdown",
"id": "3b03ce00",
"metadata": {
"tags": [
"solution"
]
},
"source": [
"We see that the coefficients are close to the coefficients used to generate\n",
"the dataset. The dispersion is indeed caused by the noise injected during the\n",
"dataset generation."
]
},
{
"cell_type": "markdown",
"id": "dc96d427",
"metadata": {},
"source": [
"Now, create a new dataset that will be the same as `data` with 4 additional\n",
"columns that repeat features 0 and 1 twice. This procedure will create\n",
"perfectly correlated features."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "45df16ab",
"metadata": {},
"outputs": [],
"source": [
"# solution\n",
"import numpy as np\n",
"\n",
"data = np.concatenate([data, data[:, [0, 1]], data[:, [0, 1]]], axis=1)"
]
},
{
"cell_type": "markdown",
"id": "6aefcec3",
"metadata": {},
"source": [
"Fit the linear regressor again on this new dataset and check the\n",
"coefficients. What do you observe?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "84fee60f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 3.63195668, 13.47042681, -0.20542454, -0.18954462, 0.11129768,\n",
" 3.63195668, 13.47042681, 3.63195668, 13.47042681])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# solution\n",
"linear_regression = LinearRegression()\n",
"linear_regression.fit(data, target)\n",
"linear_regression.coef_"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b40ca761-ede7-4edb-918b-0bd96d3c6522",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear regression coefficients by hand: \n",
" [ 5.33306979e+03 7.63990305e+01 -2.06636605e-01 -1.90616627e-01\n",
" 1.11646376e-01 -1.79027400e+03 -1.74340449e+02 4.95808017e+03\n",
" 1.35353488e+02] \n",
"\n",
"Determinant of X.T @ X: -2.9853291155589836e-36\n"
]
}
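],
"source": [
"# Linear regression by hand via the normal equations (no intercept term)\n",
"import numpy as np\n",
"\n",
"# The duplicated columns make X.T @ X singular, so the explicit inverse below\n",
"# is numerically unstable and the resulting coefficients are not meaningful.\n",
"beta = np.linalg.inv(data.T @ data) @ data.T @ target\n",
"print(\"Linear regression coefficients by hand: \\n \", beta, \"\\n\")\n",
"\n",
"# The determinant of X.T @ X is numerically zero, confirming the singularity\n",
"det = np.linalg.det(data.T @ data)\n",
"print(\"Determinant of X.T @ X: \", det)"
]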
},
{
"cell_type": "code",
"execution_count": 9,
"id": "e36183f4",
"metadata": {
"tags": [
"solution"
]
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAggAAAD4CAYAAACJ8R5TAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAyQElEQVR4nO3de5yVZb3//9cbJBE03QaWecJtppGHARYanjG1trmVEn9kVFKaWTtJ+2pfSjPsqGlfy8xtbLehSUYeMlK3QgqKoshwmOGgpiYlW7bCtvKQJ/Dz++O+ltyse61ZawaGmYn38/GYx9zrvk6f+2b0/tzXfa21FBGYmZmZ5fXq6gDMzMys+3GCYGZmZgVOEMzMzKzACYKZmZkVOEEwMzOzgi26OgCzjhowYEAMGjSoq8MwM+tR5s+fvzoiBtar5wTBeqxBgwbR3Nzc1WGYmfUokv7USD0/YjAzM7MCJwhmZmZW4ATBzMzMCpwgmJmZWYEXKZolgybc3tUhmJk1ZPlFH+n0MTyDYGZmZgVOEMzMzKygboIg6TxJSyW1Slok6cBNEVhFDEdIum0Tj/n1itdz0u9Bkj6R21+SdHknx3KSpEckzaxTb3yqN6UDY6x3XJ1F0ockTZT0T5LuyO3fW9KDkl6TdE5nx2FmZm1rM0GQNAI4DhgaEfsBRwFPb4rA2kvSxl5PsV6CEBEHpc1BwCdy+5sjYvxGHrvSqcAXI2JknXpfBI6NiLEdGGMQueNqlKTe7WxyKDAbOAx4ILf/eWA8cGl7YzAzs42v3gzCjsDqiHgNICJWR8QzAJKGSbpX0nxJd0naMe1/j6TfS2qRtEDSHspcImmJpMWSxqS6R0iaJekmSY9KmiJJqezDad/9wMeqBSdpnKQbJf0OmC6pv6RrJM2TtFDSCbl6v5V0p6THJH0z18cnJT2cZkd+Jqm3pIuArdK+KaneS6nJRcChqezs/OyGpO0l3ZpmWx6StF/aPzHFNUvSHyVVTSgknZzOzxJJF6d9FwCHAFdJuqTWP5Skq4B/BqaluGqdi0GSZqd/mwWSyolP5XGNk3RFrv/bJB1RPheSviVpLjCi2jmsEt8YSYvIkoAfAf8BfEbSNICIeC4i5gFv1DpGMzPbdOolCNOBXST9QdKVkg4HkNQH+AkwOiKGAdcA301tpgA/jYj9gYOAlWQX+CZgf7JZiEvKCQUwBDgLGEx2gTtYUl+yC8i/kt1xvquNGEcAp0TEkcB5wD0RMRwYmcbpn+odAIxNcZyk7NHA+4AxwMER0QSsBcZGxATglYhoqnI3PgGYncouqyi7EFiYZlu+DlyXK9sb+FCK45vpHL5F0ruBi4EjU4zDJY2KiG8BzSmuc2udhIg4A3gGGJniqnUungOOjoih6djLj0faOq5K/YElEXEg8L9UOYdV4psKDE3t9gWWAEMi4vg6Y61H0umSmiU1r1q1qj1NzcysHdqclo+IlyQNI7tIjwSmSppAdsHaB5iRbvh7AyslbQPsFBG/Se1fBZB0CHBDRKwFnpV0LzAceAF4OCJWpHqLyKa6XwKeiojH0/7rgdNrhDkjIp5P28cAx2vdM+y+wK65ev+b+ruF7K58DTAMmJeOYyuyC2hHHQKcmI79HknvkLRtKrs9zcS8Juk54J3Ailzb4cCsiFiVYpxCNg1/awdjqXUungGukNREdjF/bwf6XgvcnLY/SOPncE/gybTdLyJebO/AETEJmARQKpWive3NzKwxdZ/bp4v6LGCWpMXAKcB8YGlEjMjXlfT2Gt2ojSFey22vzcXU6P/8X64Y58SIeKwirgOr9Bep/rUR8bUGx6qn2nGWx611nG213dBYqp2LicCzZLM5vYBXa7Rfw/ozTH1z26+mv4vyOHXPoaRmYACwhaRlwI4pITwzImY3dERmZrbJ1FukuJekPXO7moA/AY8BA5UtYkRSH0nvj4gXgBWSRqX9W0rqB9wHjEnP9weS3Rk/3MbQjwK7S9
ojvT65weO5CzhTemsdw5Bc2dFpjcBWwCiyBXJ3A6Ml7ZDqby9pt1T/jcrHAMmLwDY1xr+PNL2entevTuekEXOBwyUNSM/wTwbubbBtNbXOxbbAyoh4E/gU2ewPFI9rOdAkqZekXcgejVTT1jl8S0SUgNuBE4AfAOelxxlODszMuqF6axC2Bq6VtExSK9k6gYkR8TowGrhYUguwiGy9AWQXnfGp/hyy9QO/AVqBFuAe4KsR8T+1Bk2PJk4Hble2SLGhr6YEvg30AVolLUmvy+4HfpFivTm9+2AZcD7ZAsdWYAbZwkzIprFbVXzLYCuwRtkizLMryiYCpdTXRWSzLQ2JiJXA14CZZOdpQUT8ttH2VdQ6F1cCp0h6iOzxQnkGpvK4HgCeAhaTvbNgQY242zqHlYaSnf9DqUh+JL1L0grgK8D5kla0MSNlZmadTBH/+I9xJY0DShHxpa6OxTaeUqkUzc3NG60/f9SymfUUG/JRy5Lmp1ndNvm7GMySTfHZ5mZmPcVmkSBExGRgcheHYWZm1mP4uxjMzMyswAmCmZmZFThBMDMzswInCGZmZlbgBMHMzMwKnCCYmZlZgRMEMzMzK3CCYGZmZgVOEMzMzKzACYKZmZkVOEEwMzOzgs3iuxjMGuFvc9xw/sIrs38cnkEwMzOzgroJgqTzJC2V1CppkaQDN0VgFTEcIem2TTzm1ytez0m/B0n6RG5/SdLlnRzLSZIekTSzTr3xqd6UDoyx3nF1FkkfkjRR0j9JuiO3X5Iul/RE+lsb2tmxmJlZbW0mCJJGAMcBQyNiP+Ao4OlNEVh7SdrYj0vWSxAi4qC0OQj4RG5/c0SM38hjVzoV+GJEjKxT74vAsRExtgNjDCJ3XI2S1LudTQ4FZgOHAQ/k9v8LsGf6OR349/bGYmZmG0+9GYQdgdUR8RpARKyOiGcAJA2TdK+k+ZLukrRj2v8eSb+X1CJpgaQ90t3hJZKWSFosaUyqe4SkWZJukvSopCmSlMo+nPbdD3ysWnCSxkm6UdLvgOmS+ku6RtI8SQslnZCr91tJd0p6TNI3c318UtLDaXbkZ5J6S7oI2Crtm5LqvZSaXAQcmsrOzs9uSNpe0q3pDvghSful/RNTXLMk/VFS1YRC0snp/CyRdHHadwFwCHCVpEtq/UNJugr4Z2BaiqvWuRgkaXb6t1kgqZz4VB7XOElX5Pq/TdIR5XMh6VuS5gIjqp3DKvGNkbQIGA/8CPgP4DOSpqUqJwDXReYhYLvy35SZmW169RKE6cAukv4g6UpJhwNI6gP8BBgdEcOAa4DvpjZTgJ9GxP7AQcBKsgt8E7A/2SzEJbn/+Q8BzgIGk13gDpbUl+wC8q9kd5zvaiPGEcApEXEkcB5wT0QMB0amcfqnegcAY1McJyl7NPA+YAxwcEQ0AWuBsRExAXglIpqq3I1PAGansssqyi4EFqbZlq8D1+XK9gY+lOL4ZjqHb5H0buBi4MgU43BJoyLiW0BziuvcWichIs4AngFGprhqnYvngKMjYmg69vLjkbaOq1J/YElEHAj8L1XOYZX4pgJDU7t9gSXAkIg4PlXZifVnp1akfWZm1gXanJaPiJckDSO7SI8EpkqaQHbB2geYkW74ewMrJW0D7BQRv0ntXwWQdAhwQ0SsBZ6VdC8wHHgBeDgiVqR6i8imul8CnoqIx9P+68mmnauZERHPp+1jgOMlnZNe9wV2zdX739TfLWR35WuAYcC8dBxbkV1AO+oQ4MR07PdIeoekbVPZ7Wkm5jVJzwHvJLsIlg0HZkXEqhTjFLJp+Fs7GEutc/EMcIWkJrKL+Xs70Pda4Oa0/UEaP4d7Ak+m7X4R8WKuTFXqR+UOSaeT/hZ23XXXQgMzM9s46j63Txf1WcAsSYuBU4D5wNKIGJGvK+ntNbqp9j//stdy22tzMRUuDjW8XDHOiRHxWEVcB1bpL1L9ayPiaw2OVU9bF7lax9lW2w2Npdq5mAg8Szab0wt4tUb7Naw/w9
Q3t/1q+rsoj1P3HEpqBgYAW0haBuyYEsIzI2I2WbK0S67JzmTJzHoiYhIwCaBUKjX6N2JmZu1Ub5HiXpL2zO1qAv4EPAYMVLaIEUl9JL0/Il4AVkgalfZvKakfcB8wJj3fH0h2Z/xwG0M/CuwuaY/0+uQGj+cu4EzprXUMQ3JlR6c1AlsBo8gWyN0NjJa0Q6q/vaTdUv03Kh8DJC8C29QY/z7S9Hp6Xr86nZNGzAUOlzQgPcM/Gbi3wbbV1DoX2wIrI+JN4FNksz9QPK7lQJOkXpJ2IXs0Uk1b5/AtEVECbidba/AD4Lz0OGN2qjIN+LQyHwD+FhErO3LgZma24eqtQdgauFbSMkmtZOsEJkbE68Bo4GJJLcAisvUGkF10xqf6c8jWD/wGaAVagHuAr0bE/9QaND2aOB24XdkixT81eDzfBvoArZKWpNdl9wO/SLHenN59sAw4n2yBYyswg2xhJmR3qa0qvmWwFVijbBHm2RVlE4FS6usistmWhqSL4deAmWTnaUFE/LbR9lXUOhdXAqdIeojs8UJ5BqbyuB4AngIWA5cCC2rE3dY5rDSU7PwfSjH5uQP4I/AE2fqTL7bnYM3MbONSxD/+LK2kcUApIr7U1bHYxlMqlaK5uXmj9edPUtxw/iRFs+5P0vw0q9smf5KimZmZFWwW38UQEZOByV0chnVzvvs1M1vHMwhmZmZW4ATBzMzMCpwgmJmZWYETBDMzMytwgmBmZmYFThDMzMyswAmCmZmZFThBMDMzswInCGZmZlbgBMHMzMwKnCCYmZlZwWbxXQxmjfC3OZpZT7EpvjvGMwhmZmZW4ATBzMzMCjo9QZC0VtKi3M8gSXPa2cdZkvp1Vow1xjxC0kG512dI+nTaHifp3bmyqyUN7sRYtpT0+3T+xrRRb6CkuZIWSjq0A+Osd1ydRdIsSX0l/UjSB3L7vyvpaUkvdXYMZmbWtk2xBuGViGiq2HdQZSVJvSNibY0+zgKuB/7e1kB1+mivI4CXgDkAEXFVrmwcsAR4JpWdtpHGrGUI0KfKeaz0QeDRiDilg+OMI3dcjZC0RUSsaUf9rYC1EfGqpOHAubni3wFXAI832p+ZmXWOLnnEUL5DTHfpMyX9Elgsqb+k2yW1SFoiaYyk8cC7gZmSZlbpa7mkCyTdD5wk6RhJD0paIOlGSVvn6l0s6eH08560f6CkmyXNSz8HSxoEnAGcne7aD5U0UdI5kkYDJWBKKtsq3RGXUn8nS1qc4r84f8zpDrlF0kOS3lnlWLaXdKuk1lRnP0k7kCVHTWm8PWqc0ybgB8CxubhqnYsL0rEukTRJmWrHtVzSgNSmJGlW2p6Y2k0Hrqt2DmvEOBNYDOwjaTGwLzBP0rEAEfFQRKys/ldjZmab0qZIELbSuscLv6lSfgBwXkQMBj4MPBMR+0fEPsCdEXE52R3tyIgYWWOMVyPiEOD3wPnAURExFGgGvpKr90JEHEB2l/qjtO/HwGURMRw4Ebg6IpYDV6X9TRExu9xBRNyU+h2byl4pl6Xp+YuBI4EmYLikUam4P/BQROwP3Ad8rspxXAgsjIj9gK8D10XEc8BpwOw03pPVTkBELAIuAKammYb+bZyLKyJieDrHWwHHtXVcNQwDToiIT1Q7hzViHAlMAr4InAn8LI11R52x3iLpdEnNkppXrVrVaDMzM2unrnrEkPdwRDyVthcDl6Y779vyF+Y6pqbfHwAGAw9IAngb8GCu3g2535el7aOAwak+wNslbdPguJWGA7MiYhWApCnAYcCtwOvAbanefODoKu0PIbvAEhH3SHqHpG07GEtb52KkpK8C/YDtgaVk0/vtMS2XRFQ9hxHxYpV2Q4CbgWOBRe0ck4iYRJZkUCqVor3tzcysMd3hcxBeLm9ExB8kDSO7eHxf0vSI+FY7+hAwIyJOrlEvqmz3AkZU3jHnLnbt0VajNyKiPOZaqp/7au07ehGsei4k9QWuBEoR8bSkiUDfGn2sYd0sU2Wdl3
PbVc9hxbinAV8C3gO8D9gVeFbSsRExtrFDMjOzTaVbvc0xTdH/PSKuBy4FhqaiF4FG7uofAg7OrS/oJ+m9ufIxud/lu+npZBeucgxNDYxZq2wucLikAZJ6AycD9zYQd9l9wNgUxxHA6oh4oR3t82qdi/KFfnVakzA616byuJaTPUqANLNRQ61z+JaIuBo4BrgnzSg9ERHvc3JgZtY9dasEgWzR2sOSFgHnAd9J+ycB/1VtkWJemtofB9wgqZXsIrl3rsqWkuYCXwbOTvvGA6W0MHAZ2eJEyKbcP1pepFgx1GTgqvJivtz4K4GvATOBFmBBRPy20YMHJpZjAS4COvpuhJrnIiL+CvwH2eOcW4F5uWaTWf+4LgR+LGk22axHLbXOYaXDgPsl7QL8qbJQ0g8krQD6SVqRZjfMzKwLaN2s9z82ScvJptVXd3UstnGUSqVobm7eaP35o5bNrKfYkI9aljQ/Ikr16nWHNQhm3cKm+GxzM7OeYrNJECJiUFfHYGZm1lN0tzUIZmZm1g04QTAzM7MCJwhmZmZW4ATBzMzMCpwgmJmZWYETBDMzMytwgmBmZmYFThDMzMyswAmCmZmZFThBMDMzs4LN5qOWzerxlzVtOH+fhdk/Ds8gmJmZWYETBDMzMyvYpAmCpLWSFuV+Bkma084+zpLUr7NirDHmEZIOyr0+Q9Kn0/Y4Se/OlV0taXAnxrKlpN+n8zemomzvtH+hpD060PcmObeSHky/b5W0Y27/hyU9JukJSRM6Ow4zM6ttU69BeCUimir2HVRZSVLviFhbo4+zgOuBv7c1UJ0+2usI4CVgDkBEXJUrGwcsAZ5JZadtpDFrGQL0qXIeAUYBv42Ib3aw77No4NzmSdoiIta0o/57gCckCXhXRKxM+3sDPwWOBlYA8yRNi4hl7YjfzMw2ki5/xCDppfT7CEkzJf0SWCypv6TbJbVIWiJpjKTxwLuBmZJmVulruaQLJN0PnCTpGEkPSlog6UZJW+fqXSzp4fTznrR/oKSbJc1LPwdLGgScAZyd7s4PlTRR0jmSRgMlYEoq20rSLEml1N/Jkhan+C/OH7Ok76Zje0jSO6scy/bpDrs11dlP0g5kF/CmNN4eufrHkl3gTyufG0mfTMe3SNLP0kUYSf8uqVnSUkkXpn2Fc1v+t0nboyVNTtuTJf2/VO9iSXtIulPSfEmzJe1d5Xi2krQIuIcs4XoEeG+KrQk4AHgiIv4YEa8DvwJOqPFnY2ZmnWxTJwhb5R4v/KZK+QHAeRExGPgw8ExE7B8R+wB3RsTlZHfqIyNiZI0xXo2IQ4DfA+cDR0XEUKAZ+Equ3gsRcQBwBfCjtO/HwGURMRw4Ebg6IpYDV6X9TRExu9xBRNyU+h2byl4pl6XHDhcDRwJNwHBJo1Jxf+ChiNgfuA/4XJXjuBBYGBH7AV8HrouI54DTgNlpvCdzsdyRi3OkpPcBY4CD02zDWmBsqn5eRJSA/YDDJe3X4LnNey/Zuf0/wCTgzIgYBpwDXFlZOSLKs0e3kc10XAR8Ix3HImAn4OlckxVp33oknZ6Sm+ZVq1Y1EKaZmXVEd3jEkPdwRDyVthcDl6Y779vyF+Y6pqbfHwAGAw9ks9m8DXgwV++G3O/L0vZRwOBUH+DtkrZpcNxKw4FZEbEKQNIU4DDgVuB1sgslwHyyafVKh5AlKUTEPZLeIWnbdoz/QWAY2VQ9wFbAc6ns/5N0Otm//45k56m1HX0D3BgRa9OszEHAjbnztmUb7fYleyTzCeCW3H5VqRuFHRGTyBISSqVSodzMzDaO7vY5CC+XNyLiD5KGAccC35c0PSK+1Y4+BMyIiJNr1Isq272AEfmZAIDcha892mr0RkSUx1xL9X+Hhi6Ydca/NiK+tt5OaXeyu/zhEfGX9Nigb40+8uNV1imf517AX+skfki6gCzh2QOYC/wzcIykOyPiXLIZg11yTXYmreswM7NNr8vXINSSpuj/HhHXA5cCQ1PRi0Ajd/
UPAQfn1hf0k/TeXPmY3O/yzMJ04Eu5GJoaGLNW2Vyy6fsB6dn/ycC9DcRddh/pkYCkI4DVEfFCO9rfDYxO6xbKaxp2A95OdnH/W1r78C9tHMuzkt4nqRfw0WqDpJieknRSGkeS9q9S71tkj0d+DhwItETEvik5AJgH7Clpd0lvAz4OTGvH8ZqZ2UbU3WYQ8vYFLpH0JvAG8IW0fxLwX5JWtvWsPCJWSRoH3CCpPOV9PvCHtL2lpLlkSVJ5lmE88FNJrWTn5j6yBYq/A26SdAJwZsVQk4GrJL0CjMiNv1LS14CZZHfzd0TEb9tx/BOBn6dY/g6c0o62RMQySecD09MF/g3g3yLiIUkLgaXAH4EHcs0qz+0EskchT5M9Fti6xnBjgX9P4/UhW2DYUqXe4cBssrUmD1XEu0bSl4C7gN7ANRGxtD3HbGZmG4/WzXRvPiQtB0oRsbqrY7GOK5VK0dzcvNH680ctbzh/1LJZ9ydpflqo3qbuPINgtkn54mZmts5mmSBExKCujsHMzKw767aLFM3MzKzrOEEwMzOzAicIZmZmVuAEwczMzAqcIJiZmVmBEwQzMzMrcIJgZmZmBU4QzMzMrMAJgpmZmRU4QTAzM7MCJwhmyaAJt/sLm8zMEicIZmZmVuAEwczMzAqcIHQCSSHph7nX50iaWKfNGZI+vYHjbinp95IWSRrTgfajJA3ekBgaHOcGSYMknSXp47n9UyQ9JmmJpGsk9ensWMzMrDonCJ3jNeBjkgY02iAiroqI6zZw3CFAn4hoioipHWg/CmhXgiCpI18ZvntELAcOB2bn9k8B9gb2BbYCTutA32ZmthE4Qegca4BJwNmVBZJ2k3S3pNb0e9e0f6Kkc9L2eEnLUp1fSeol6XFJA1N5L0lP5BMQSTsA1wNNaQZhD0nDJN0rab6kuyTtmOp+TtI8SS2SbpbUT9JBwPHAJbn2sySVUpsBkpan7XGSbpT0O2C6pP7pjn+epIWSTqh2UtIMwTJgL0mLgGOA2yWdBhARd0QCPAzsvKH/EGZm1jFOEDrPT4Gxkrat2H8FcF1E7Ed2x3x5lbYTgCGpzhkR8SbZxX9sKj8KaImI1eUGEfEc2R337IhoAv4M/AQYHRHDgGuA76bqt0TE8IjYH3gEODUi5gDTgHPTDMSTdY5vBHBKRBwJnAfcExHDgZFkSUb/ygYRMRaYCHyHbLbijjTW1fl66dHCp4A7K/uQdLqkZknNq1atqhOimZl1lBOEThIRLwDXAeMrikYAv0zbvwAOqdK8FZgi6ZNksxGQXeDLaxQ+C/y8Tgh7AfsAM9Ld+vmsuyPfR9JsSYvJko73N3JMFWZExPNp+xhgQhpnFtAX2LVGuyHAIrLHCItq1LkSuC8iZlcWRMSkiChFRGngwIEdCNvMzBrRkefH1rgfAQto+2IeVfZ9BDiMbMr/G5LeHxFPS3pW0pHAgaybTahFwNKIGFGlbDIwKiJaJI0DjqjRxxrWJZF9K8perhjrxIh4rGYw0rHA94DdgeOAgcDLko6KiJG5et9MZZ+v1ZeZmXU+zyB0onSH/Wvg1NzuOUB55f5Y4P58G0m9gF0iYibwVWA7YOtUfDXZo4ZfR8TaOsM/BgyUNCL120dSeaZgG2BlmsrPJxovprKy5cCwtD26jbHuAs6UpDTWkMoKEXFH6mtJROwLLCV7jJJPDk4DPgScnB6rmJlZF3GC0Pl+COTfzTAe+IykVrLn7F+uqN8buD5N/y8ELouIv6ayaWTJQr3HC0TE62QX9YsltZBN5x+Uir8BzAVmAI/mmv0KODctNNwDuBT4gqQ5FcdQ6dtAH6BV0pL0upohQIukt5G92+KFivKrgHcCD6aFkhfUO04zM+scyhaMW0+Q3lFwWUQc2tWxdAelUimam5s3Wn/lj1leftFHNlqfZmbdjaT5EVGqV89rEHoISROAL1B/7YF1kBMDM7N1/Iihh4iIiyJit4i4v35tMzOzDeMEwczMzAqcIJiZmV
mBEwQzMzMrcIJgZmZmBU4QzMzMrMAJgpmZmRU4QTAzM7MCJwhmZmZW4ATBzMzMCpwgmJmZWYG/i8E2W+UvZyrzdzGYma3jGQQzMzMrcIJgZmZmBU4QOoGkkPTD3OtzJE2s0+YMSZ/ewHG3lPR7SYskjelA+1GSBm9IDA2Oc4OkQZLOkvTx3P4vSXoinb8BnR2HmZnV5gShc7wGfKw9F7mIuCoirtvAcYcAfSKiKSKmdqD9KKBdCYKkjqxj2T0ilgOHA7Nz+x8AjgL+1IE+zcxsI3KC0DnWAJOAsysLJO0m6W5Jren3rmn/REnnpO3xkpalOr+S1EvS45IGpvJe6U57QK7fHYDrgaY0g7CHpGGS7pU0X9JdknZMdT8naZ6kFkk3S+on6SDgeOCSXPtZkkqpzQBJy9P2OEk3SvodMF1Sf0nXpD4XSjqh2kmRNEXSMmAvSYuAY4DbJZ0GEBELU+JgZmZdzAlC5/kpMFbSthX7rwCui4j9gCnA5VXaTgCGpDpnRMSbZBf/san8KKAlIlaXG0TEc8BpwOyIaAL+DPwEGB0Rw4BrgO+m6rdExPCI2B94BDg1IuYA04Bz0wzEk3WObwRwSkQcCZwH3BMRw4GRZElG/8oGETEWmAh8h2y24o401tV1xnqLpNMlNUtqXrVqVaPNzMysnZwgdJKIeAG4DhhfUTQC+GXa/gVwSJXmrcAUSZ8km42A7AJfXqPwWeDndULYC9gHmJHu1s8Hdk5l+0iaLWkxWdLx/kaOqcKMiHg+bR8DTEjjzAL6ArvWaDcEWATsm363S0RMiohSRJQGDhzY3uZmZtYgfw5C5/oRsIC2L+ZRZd9HgMPIpvy/Ien9EfG0pGclHQkcyLrZhFoELI2IEVXKJgOjIqJF0jjgiBp9rGFdEtm3ouzlirFOjIjHagYjHQt8D9gdOA4YCLws6aiIGNn2oZiZ2abmGYROlO6wfw2cmts9Byiv3B8L3J9vI6kXsEtEzAS+CmwHbJ2KryZ71PDriFhbZ/jHgIGSRqR++0gqzxRsA6yU1If1E40XU1nZcmBY2h7dxlh3AWdKUhprSGWFiLgj9bUkIvYFlpI9RnFyYGbWDTlB6Hw/BPLvZhgPfEZSK/Ap4MsV9XsD16fp/4XAZRHx11Q2jSxZqPd4gYh4neyifrGkFrLp/INS8TeAucAM4NFcs18B56aFhnsAlwJfkDSn4hgqfRvoA7RKWpJeVzMEaJH0NrJ3W7yQL0yLM1eQPQppldTw2gQzM9u4FFFthtu6o/SOgssi4tCujqU7KJVK0dzc3OH2/qhlM9scSZofEaV69bwGoYeQNAH4AvXXHliDnBCYmdXmRww9RERcFBG7RcT99WubmZltGCcIZmZmVuAEwczMzAqcIJiZmVmBEwQzMzMrcIJgZmZmBU4QzMzMrMAJgpmZmRU4QTAzM7MCJwhmZmZW4ATBzMzMCvxdDLZZqvyiJvB3M5iZ5XkGwczMzAqcIJiZmVmBE4ROICkk/TD3+hxJE+u0OUPSpzdw3C0l/V7SIkljOtB+lKTBGxJDg+PcIGmQpLMkfTy3f3dJcyU9LmmqpLd1dixmZladE4TO8RrwMUkDGm0QEVdFxHUbOO4QoE9ENEXE1A60HwW0K0GQ1JF1LLtHxHLgcGB2bv/FwGURsSfwF+DUDvRtZmYbgROEzrEGmAScXVkgaTdJd0tqTb93TfsnSjonbY+XtCzV+ZWkXumuemAq7yXpiXwCImkH4HqgKc0g7CFpmKR7Jc2XdJekHVPdz0maJ6lF0s2S+kk6CDgeuCTXfpakUmozQNLytD1O0o2SfgdMl9Rf0jWpz4WSTqh2UiRNkbQM2EvSIuAY4HZJp0kScCRwU6p+LVnCYmZmXcAJQuf5KTBW0rYV+68ArouI/YApwOVV2k4AhqQ6Z0TEm2QX/7Gp/CigJSJWlxtExHPAacDsiGgC/gz8BBgdEcOAa4Dvpuq3RMTwiNgfeAQ4NSLmANOAc9MMxJN1jm
8EcEpEHAmcB9wTEcOBkWRJRv/KBhExFpgIfIfs4n9HGutq4B3AXyNiTaq+Atipsg9Jp0tqltS8atWqOiGamVlHOUHoJBHxAnAdML6iaATwy7T9C+CQKs1bgSmSPkk2GwHZBb68RuGzwM/rhLAXsA8wI92tnw/snMr2kTRb0mKypOP9jRxThRkR8XzaPgaYkMaZBfQFdq3RbgiwCNg3/S5TlbpR2BExKSJKEVEaOHBgB8I2M7NG+HMQOtePgAW0fTEvXASBjwCHkU35f0PS+yPiaUnPSjoSOJB1swm1CFgaESOqlE0GRkVEi6RxwBE1+ljDuiSyb0XZyxVjnRgRj9UMRjoW+B6wO3AcMBB4WdJRETESWA1sJ2mLNIuwM/BM7cMzM7PO5BmETpTusH/N+ovt5gDllftjgfvzbST1AnaJiJnAV4HtgK1T8dVkjxp+HRFr6wz/GDBQ0ojUbx9J5ZmCbYCVkvqwfqLxYiorWw4MS9uj2xjrLuDMtI4ASUMqK0TEHamvJRGxL7CU7DHKyFQewMzcOKcAv61zjGZm1kmcIHS+HwL5dzOMBz4jqRX4FPDlivq9gevT9P9CslX9f01l08iShXqPF4iI18kuthdLaiGbzj8oFX8DmAvMAB7NNfsVcG5aaLgHcCnwBUlzKo6h0reBPkCrpCXpdTVDgJb09sU+6TFM3v8FviLpCbI1Cf9Z7zjNzKxzKLtxs54gvaPgsog4tKtj6Q5KpVI0Nzd3qK0/atnMNleS5kdEqV49r0HoISRNAL5A/bUH1gAnA2ZmbfMjhh4iIi6KiN0i4v76tc3MzDaMEwQzMzMrcIJgZmZmBU4QzMzMrMAJgpmZmRU4QTAzM7MCJwhmZmZW4ATBzMzMCpwgmJmZWYETBDMzMytwgmBmZmYFThDMzMyswF/WZJZU+4ZHM7PuaFN84ZxnEMzMzKzACUIFSWslLZK0RNLvJG1Xp/5ESedsotgGSfpEG+WXSFoq6ZIO9N0k6dgNi7ChcT4vaVwa76rc/sMkLZC0RtLozo7DzMza5gSh6JWIaIqIfYDngX/r6oByBgE1EwTg88DQiDi3A303Ae1KEJRp79/QocBs4PD0u+zPwDjgl+3sz8zMOoEThLY9COwEIGkPSXdKmi9ptqS9KytXqyNpW0nLyxdSSf0kPS2pj6TPSZonqUXSzZL6pTqTJV0uaY6kP+buqC8CDk0zHGdXjD0N6A/MlTRG0sDU57z0c3Cqd0Dqd2H6vZektwHfAsakvsdUzoykGZVB6ecRSVcCC4BdJJ2bxmiVdGG1EynpbEmLgI8CNwMXAueVZxEiYnlEtAJvduhfyszMNionCDVI6g18EJiWdk0CzoyIYcA5wJVVmhXqRMTfgBayO2aAfwXuiog3gFsiYnhE7A88Apya62tH4BDgOLLEAGACMDvNcFyWHzgijmfd7MdU4MfAZRExHDgRuDpVfRQ4LCKGABcA34uI19P21Fz7tuwFXJf62AvYEziAbBZimKTDKhukeI8G7o6IJuDxiBgcEWfUGcvMzLqA38VQtFW60x0EzAdmSNoaOAi4UVK53pb5RnXqTAXGADOBj7MuudhH0neA7YCtgbtyXd4aEW8CyyS9swPHcRQwOBfL2yVtA2wLXCtpTyCAPh3o+08R8VDaPib9LEyvtyZLGO6r0m4o0JLi+EsHxkXS6cDpALvuumtHujAzswY4QSh6JSKaJG0L3Ea2BmEy8Nd051tLrzbqTAO+L2l7YBhwT9o/GRgVES2SxgFH5Nq8ltsW7dcLGBERr+R3SvoJMDMiPippEDCrRvs1rD/D1De3/XJFbN+PiJ/VCkTSDsB0YAfgVeBkYJuUiJ0YEU82ckAAETGJbKaGUqkUjbYzM7P28SOGGtKjgfFkjwpeAZ6SdBK8tThv/4r6L9SqExEvAQ+TTfvfFhFrU7NtgJWS+gBjGwjrxdSmEdOBL5VfSGpKm9sC/522x7XR93KyO34kDQV2rzHOXcBn0wwKkn
ZKCcFbIuK5lDgtIHsUcT3wmfQ4o+HkwMzMNh0nCG2IiIVk6wc+TnYBP1VSC7AUOKFKk7bqTAU+mX6XfQOYC8wgWxtQTyuwJi1qPLtO3fFAKS0cXAaUn/X/gGw24wGgd67+TLJHEoskjSFbSLh9usv/AvCHaoNExHSydx48KGkxcBNVkpi0puMdEbGa7FHM/RXlwyWtAE4CfiZpaZ3jMzOzTqQIz9Jaz1QqlaK5uXmj9edPUjSznmJDPklR0vyIKNWr5xkEMzMzK/AiRbNkU3y2uZlZT+EZBDMzMytwgmBmZmYFThDMzMyswAmCmZmZFThBMDMzswInCGZmZlbgBMHMzMwKnCCYmZlZgRMEMzMzK3CCYGZmZgVOEMzMzKzA38VglvjbHDecv8/C7B+HZxDMzMyswAmCmZmZFThBqCBpraRFkpZI+p2k7erUnyjpnE0U2yBJn2ij/BJJSyVd0oG+myQdu2ERNjTO5yWNS+Ndldu/paSpkp6QNFfSoM6OxczManOCUPRKRDRFxD7A88C/dXVAOYOAmgkC8HlgaESc24G+m4B2JQjKtPdv6FBgNnB4+l12KvCXiHgPcBlwcTv7NTOzjcgJQtseBHYCkLSHpDslzZc0W9LelZWr1ZG0raTl5QuppH6SnpbUR9LnJM2T1CLpZkn9Up3Jki6XNEfSHyWNTkNcBByaZjjOrhh7GtAfmCtpjKSBqc956efgVO+A1O/C9HsvSW8DvgWMSX2PqZwZSTMqg9LPI5KuBBYAu0g6N43RKunCaidS0tmSFgEfBW4GLgTOy80inABcm7ZvAj4oSe35xzIzs43HCUINknoDHwSmpV2TgDMjYhhwDnBllWaFOhHxN6CF7I4Z4F+BuyLiDeCWiBgeEfsDj5DdRZftCBwCHEeWGABMAGanGY7L8gNHxPGsm/2YCvwYuCwihgMnAlenqo8Ch0XEEOAC4HsR8Xranppr35a9gOtSH3sBewIHkM1CDJN0WGWDFO/RwN0R0QQ8HhGDI+KMVGUn4OlUdw3wN+Adlf1IOl1Ss6TmVatW1QnTzMw6ym9zLNoq3ekOAuYDMyRtDRwE3Ji7qd0y36hOnanAGGAm8HHWJRf7SPoOsB2wNXBXrstbI+JNYJmkd3bgOI4CBudiebukbYBtgWsl7QkE0KcDff8pIh5K28ekn4Xp9dZkCcN9VdoNBVpSHH+pKKs2WxCFHRGTyBIxSqVSodzMzDYOJwhFr0REk6RtgdvI1iBMBv6a7nxr6dVGnWnA9yVtDwwD7kn7JwOjIqJF0jjgiFyb13LbHZlq7wWMiIhX8jsl/QSYGREfTQsBZ9Vov4b1Z5j65rZfrojt+xHxs1qBSNoBmA7sALwKnAxskxKxEyPiSWAFsAuwQtIWZInM83WO0czMOokfMdSQHg2MJ3tU8ArwlKST4K3FeftX1H+hVp2IeAl4mGza/7aIWJuabQOslNQHGNtAWC+mNo2YDnyp/EJSU9rcFvjvtD2ujb6Xk93xI2kosHuNce4CPptmUJC0U0oI3hIRz6XEaQHZo4jrgc+kxxlPpmrTgFPS9mjgnojwDIGZWRdxgtCGiFhItn7g42QX8FMltQBLyRbVVWqrzlTgk+l32TeAucAMsrUB9bQCa9KixrPr1B0PlNLCwWVA+Vn/D8hmMx4AeufqzyR7JLFI0hiyhYTbp7v8LwB/qDZIREwHfgk8KGkx2QLDQhKT1nS8IyJWkz2Kub+iyn8C75D0BPAVsvUWZmbWReSbNOupSqVSNDc3b7T+/FHLG84ftWzW/UmaHxGlevW8BsEs8cXNzGwdP2IwMzOzAicIZmZmVuAEwczMzAqcIJiZmVmBEwQzMzMr8NscrceStAr400bscgCweiP2tyn0tJh7WrzQ82LuafFCz4u5p8UL68e8W0QMrNfACYJZIqm5kfcGdyc9LeaeFi/0vJh7WrzQ82LuafFCx2L2IwYzMzMrcIJgZmZmBU4QzNaZ1NUBdEBPi7mnxQ
s9L+aeFi/0vJh7WrzQgZi9BsHMzMwKPINgZmZmBU4QzMzMrMAJgm32JH1Y0mOSnpA0oavjqUfSLpJmSnpE0lJJX+7qmBohqbekhZJu6+pYGiFpO0k3SXo0nesRXR1TWySdnf4elki6QVLfro6pkqRrJD0naUlu3/aSZkh6PP3+p66MsVKNmC9Jfxetkn4jabsuDHE91eLNlZ0jKSQNaKQvJwi2WZPUG/gp8C/AYOBkSYO7Nqq61gD/JyLeB3wA+LceEDPAl4FHujqIdvgxcGdE7A3sTzeOXdJOwHigFBH7AL2Bj3dtVFVNBj5csW8CcHdE7AncnV53J5MpxjwD2Cci9gP+AHxtUwfVhskU40XSLsDRwJ8b7cgJgm3uDgCeiIg/RsTrwK+AE7o4pjZFxMqIWJC2XyS7cO3UtVG1TdLOwEeAq7s6lkZIejtwGPCfABHxekT8tUuDqm8LYCtJWwD9gGe6OJ6CiLgPeL5i9wnAtWn7WmDUpoypnmoxR8T0iFiTXj4E7LzJA6uhxjkGuAz4KtDwOxOcINjmbifg6dzrFXTzi22epEHAEGBuF4dSz4/I/uf0ZhfH0ah/BlYBP0+PRa6W1L+rg6olIv4buJTs7nAl8LeImN61UTXsnRGxErLkF9ihi+Npr88C/9XVQbRF0vHAf0dES3vaOUGwzZ2q7OsR7/2VtDVwM3BWRLzQ1fHUIuk44LmImN/VsbTDFsBQ4N8jYgjwMt1v6vst6bn9CcDuwLuB/pI+2bVR/eOTdB7ZI78pXR1LLZL6AecBF7S3rRME29ytAHbJvd6Zbjg1W0lSH7LkYEpE3NLV8dRxMHC8pOVkj3COlHR914ZU1wpgRUSUZ2ZuIksYuqujgKciYlVEvAHcAhzUxTE16llJOwKk3891cTwNkXQKcBwwNrr3BwrtQZY4tqT/BncGFkh6V72GThBsczcP2FPS7pLeRrawa1oXx9QmSSJ7Nv5IRPy/ro6nnoj4WkTsHBGDyM7vPRHRre9uI+J/gKcl7ZV2fRBY1oUh1fNn4AOS+qW/jw/SjRdVVpgGnJK2TwF+24WxNETSh4H/CxwfEX/v6njaEhGLI2KHiBiU/htcAQxNf+NtcoJgm7W00OhLwF1k/0P9dUQs7dqo6joY+BTZnfii9HNsVwf1D+hMYIqkVqAJ+F7XhlNbmum4CVgALCb7f3u3+zhgSTcADwJ7SVoh6VTgIuBoSY+TrbK/qCtjrFQj5iuAbYAZ6b+/q7o0yJwa8Xasr+49M2JmZmZdwTMIZmZmVuAEwczMzAqcIJiZmVmBEwQzMzMrcIJgZmZmBU4QzMzMrMAJgpmZmRX8/1DYp+u2UWHLAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"feature_names = [\n",
" \"Relevant feature #0\",\n",
" \"Relevant feature #1\",\n",
" \"Noisy feature #0\",\n",
" \"Noisy feature #1\",\n",
" \"Noisy feature #2\",\n",
" \"First repetition of feature #0\",\n",
" \"First repetition of feature #1\",\n",
" \"Second repetition of feature #0\",\n",
" \"Second repetition of feature #1\",\n",
"]\n",
"coef = pd.Series(linear_regression.coef_, index=feature_names)\n",
"_ = coef.plot.barh()"
]
},
{
"cell_type": "markdown",
"id": "d2c7ab37",
"metadata": {
"tags": [
"solution"
]
},
"source": [
"We see that the coefficient values are far from what one could expect.\n",
"By repeating the informative features, one would have expected these\n",
"coefficients to be similarly informative.\n",
"\n",
"Instead, we see that some coefficients have a huge norm ~1e14. It indeed\n",
"means that we try to solve an mathematical ill-posed problem. Indeed, finding\n",
"coefficients in a linear regression involves inverting the matrix\n",
"`np.dot(data.T, data)` which is not possible (or lead to high numerical\n",
"errors)."
]
},
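{
"cell_type": "markdown",
"id": "c41f0a7e",
"metadata": {},
"source": [
"As a hedged sketch of why duplicated columns make the problem ill-posed (the\n",
"notation here is an assumption, not the course's: $X$ stands for `data`, $y$\n",
"for `target`, $w$ for the coefficients, and the intercept is ignored for\n",
"simplicity), the least-squares coefficients satisfy the normal equations:\n",
"\n",
"$$X^\\top X \\, w = X^\\top y$$\n",
"\n",
"If column $j$ of $X$ is an exact copy of column $i$, the vector\n",
"$v = e_i - e_j$ (with $e_i$ the $i$-th standard basis vector) satisfies\n",
"$X v = 0$, hence $X^\\top X \\, v = 0$. The matrix $X^\\top X$ is therefore\n",
"singular, and for any solution $w$ and any scalar $t$,\n",
"\n",
"$$X (w + t v) = X w,$$\n",
"\n",
"so arbitrarily large coefficients of opposite sign on the two copies fit the\n",
"data equally well."
]
},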
{
"cell_type": "markdown",
"id": "2ddd26a3",
"metadata": {},
"source": [
"Create a ridge regressor and fit on the same dataset. Check the coefficients.\n",
"What do you observe?"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "1905d49b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 3.6313933 , 13.46802113, -0.20549345, -0.18929961, 0.11117205,\n",
" 3.6313933 , 13.46802113, 3.6313933 , 13.46802113])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# solution\n",
"from sklearn.linear_model import Ridge\n",
"\n",
"ridge = Ridge()\n",
"ridge.fit(data, target)\n",
"ridge.coef_"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f087bf65",
"metadata": {
"tags": [
"solution"
]
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAggAAAD4CAYAAACJ8R5TAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAyQUlEQVR4nO3de5yVZb3//9cbJBE03QaWecJtppGHARYanjG1trmVEn9kVFKaWTtJ+2pfSjPsqGlfy8xtbLehSUYeMlK3QgqKoshwmOGgpiYlW7bCtvKQJ/Dz++O+ltyse61ZawaGmYn38/GYx9zrvk6f+2b0/tzXfa21FBGYmZmZ5fXq6gDMzMys+3GCYGZmZgVOEMzMzKzACYKZmZkVOEEwMzOzgi26OgCzjhowYEAMGjSoq8MwM+tR5s+fvzoiBtar5wTBeqxBgwbR3Nzc1WGYmfUokv7USD0/YjAzM7MCJwhmZmZW4ATBzMzMCpwgmJmZWYEXKZolgybc3tUhmJk1ZPlFH+n0MTyDYGZmZgVOEMzMzKygboIg6TxJSyW1Slok6cBNEVhFDEdIum0Tj/n1itdz0u9Bkj6R21+SdHknx3KSpEckzaxTb3yqN6UDY6x3XJ1F0ockTZT0T5LuyO3fW9KDkl6TdE5nx2FmZm1rM0GQNAI4DhgaEfsBRwFPb4rA2kvSxl5PsV6CEBEHpc1BwCdy+5sjYvxGHrvSqcAXI2JknXpfBI6NiLEdGGMQueNqlKTe7WxyKDAbOAx4ILf/eWA8cGl7YzAzs42v3gzCjsDqiHgNICJWR8QzAJKGSbpX0nxJd0naMe1/j6TfS2qRtEDSHspcImmJpMWSxqS6R0iaJekmSY9KmiJJqezDad/9wMeqBSdpnKQbJf0OmC6pv6RrJM2TtFDSCbl6v5V0p6THJH0z18cnJT2cZkd+Jqm3pIuArdK+KaneS6nJRcChqezs/OyGpO0l3ZpmWx6StF/aPzHFNUvSHyVVTSgknZzOzxJJF6d9FwCHAFdJuqTWP5Skq4B/BqaluGqdi0GSZqd/mwWSyolP5XGNk3RFrv/bJB1RPheSviVpLjCi2jmsEt8YSYvIkoAfAf8BfEbSNICIeC4i5gFv1DpGMzPbdOolCNOBXST9QdKVkg4HkNQH+AkwOiKGAdcA301tpgA/jYj9gYOAlWQX+CZgf7JZiEvKCQUwBDgLGEx2gTtYUl+yC8i/kt1xvquNGEcAp0TEkcB5wD0RMRwYmcbpn+odAIxNcZyk7NHA+4AxwMER0QSsBcZGxATglYhoqnI3PgGYncouqyi7EFiYZlu+DlyXK9sb+FCK45vpHL5F0ruBi4EjU4zDJY2KiG8BzSmuc2udhIg4A3gGGJniqnUungOOjoih6djLj0faOq5K/YElEXEg8L9UOYdV4psKDE3t9gWWAEMi4vg6Y61H0umSmiU1r1q1qj1NzcysHdqclo+IlyQNI7tIjwSmSppAdsHaB5iRbvh7AyslbQPsFBG/Se1fBZB0CHBDRKwFnpV0LzAceAF4OCJWpHqLyKa6XwKeiojH0/7rgdNrhDkjIp5P28cAx2vdM+y+wK65ev+b+ruF7K58DTAMmJeOYyuyC2hHHQKcmI79HknvkLRtKrs9zcS8Juk54J3Ailzb4cCsiFiVYpxCNg1/awdjqXUungGukNREdjF/bwf6XgvcnLY/SOPncE/gybTdLyJebO/AETEJmARQKpWive3NzKwxdZ/bp4v6LGCWpMXAKcB8YGlEjMjXlfT2Gt2ojSFey22vzcXU6P/8X64Y58SIeKwirgOr9Bep/rUR8bUGx6qn2nGWx611nG213dBYqp2LicCzZLM5vYBXa7Rfw/ozTH1z26+mv4vyOHXPoaRmYACwhaRlwI4pITwzImY3dERmZrbJ1FukuJekPXO7moA/AY8BA5UtYkRSH0nvj4gXgBWSRqX9W0rqB9wHjEnP9weS3Rk/3MbQjwK7S9
ojvT65weO5CzhTemsdw5Bc2dFpjcBWwCiyBXJ3A6Ml7ZDqby9pt1T/jcrHAMmLwDY1xr+PNL2entevTuekEXOBwyUNSM/wTwbubbBtNbXOxbbAyoh4E/gU2ewPFI9rOdAkqZekXcgejVTT1jl8S0SUgNuBE4AfAOelxxlODszMuqF6axC2Bq6VtExSK9k6gYkR8TowGrhYUguwiGy9AWQXnfGp/hyy9QO/AVqBFuAe4KsR8T+1Bk2PJk4Hble2SLGhr6YEvg30AVolLUmvy+4HfpFivTm9+2AZcD7ZAsdWYAbZwkzIprFbVXzLYCuwRtkizLMryiYCpdTXRWSzLQ2JiJXA14CZZOdpQUT8ttH2VdQ6F1cCp0h6iOzxQnkGpvK4HgCeAhaTvbNgQY242zqHlYaSnf9DqUh+JL1L0grgK8D5kla0MSNlZmadTBH/+I9xJY0DShHxpa6OxTaeUqkUzc3NG60/f9SymfUUG/JRy5Lmp1ndNvm7GMySTfHZ5mZmPcVmkSBExGRgcheHYWZm1mP4uxjMzMyswAmCmZmZFThBMDMzswInCGZmZlbgBMHMzMwKnCCYmZlZgRMEMzMzK3CCYGZmZgVOEMzMzKzACYKZmZkVOEEwMzOzgs3iuxjMGuFvc9xw/sIrs38cnkEwMzOzgroJgqTzJC2V1CppkaQDN0VgFTEcIem2TTzm1ytez0m/B0n6RG5/SdLlnRzLSZIekTSzTr3xqd6UDoyx3nF1FkkfkjRR0j9JuiO3X5Iul/RE+lsb2tmxmJlZbW0mCJJGAMcBQyNiP+Ao4OlNEVh7SdrYj0vWSxAi4qC0OQj4RG5/c0SM38hjVzoV+GJEjKxT74vAsRExtgNjDCJ3XI2S1LudTQ4FZgOHAQ/k9v8LsGf6OR349/bGYmZmG0+9GYQdgdUR8RpARKyOiGcAJA2TdK+k+ZLukrRj2v8eSb+X1CJpgaQ90t3hJZKWSFosaUyqe4SkWZJukvSopCmSlMo+nPbdD3ysWnCSxkm6UdLvgOmS+ku6RtI8SQslnZCr91tJd0p6TNI3c318UtLDaXbkZ5J6S7oI2Crtm5LqvZSaXAQcmsrOzs9uSNpe0q3pDvghSful/RNTXLMk/VFS1YRC0snp/CyRdHHadwFwCHCVpEtq/UNJugr4Z2BaiqvWuRgkaXb6t1kgqZz4VB7XOElX5Pq/TdIR5XMh6VuS5gIjqp3DKvGNkbQIGA/8CPgP4DOSpqUqJwDXReYhYLvy35SZmW169RKE6cAukv4g6UpJhwNI6gP8BBgdEcOAa4DvpjZTgJ9GxP7AQcBKsgt8E7A/2SzEJbn/+Q8BzgIGk13gDpbUl+wC8q9kd5zvaiPGEcApEXEkcB5wT0QMB0amcfqnegcAY1McJyl7NPA+YAxwcEQ0AWuBsRExAXglIpqq3I1PAGansssqyi4EFqbZlq8D1+XK9gY+lOL4ZjqHb5H0buBi4MgU43BJoyLiW0BziuvcWichIs4AngFGprhqnYvngKMjYmg69vLjkbaOq1J/YElEHAj8L1XOYZX4pgJDU7t9gSXAkIg4PlXZifVnp1akfWZm1gXanJaPiJckDSO7SI8EpkqaQHbB2geYkW74ewMrJW0D7BQRv0ntXwWQdAhwQ0SsBZ6VdC8wHHgBeDgiVqR6i8imul8CnoqIx9P+68mmnauZERHPp+1jgOMlnZNe9wV2zdX739TfLWR35WuAYcC8dBxbkV1AO+oQ4MR07PdIeoekbVPZ7Wkm5jVJzwHvJLsIlg0HZkXEqhTjFLJp+Fs7GEutc/EMcIWkJrKL+Xs70Pda4Oa0/UEaP4d7Ak+m7X4R8WKuTFXqR+UOSaeT/hZ23XXXQgMzM9s46j63Txf1WcAsSYuBU4D5wNKIGJGvK+ntNbqp9j//stdy22tzMRUuDjW8XDHOiRHxWEVcB1bpL1L9ayPiaw2OVU9bF7lax9lW2w2Npdq5mAg8Szab0wt4tUb7Naw/w9
Q3t/1q+rsoj1P3HEpqBgYAW0haBuyYEsIzI2I2WbK0S67JzmTJzHoiYhIwCaBUKjX6N2JmZu1Ub5HiXpL2zO1qAv4EPAYMVLaIEUl9JL0/Il4AVkgalfZvKakfcB8wJj3fH0h2Z/xwG0M/CuwuaY/0+uQGj+cu4EzprXUMQ3JlR6c1AlsBo8gWyN0NjJa0Q6q/vaTdUv03Kh8DJC8C29QY/z7S9Hp6Xr86nZNGzAUOlzQgPcM/Gbi3wbbV1DoX2wIrI+JN4FNksz9QPK7lQJOkXpJ2IXs0Uk1b5/AtEVECbidba/AD4Lz0OGN2qjIN+LQyHwD+FhErO3LgZma24eqtQdgauFbSMkmtZOsEJkbE68Bo4GJJLcAisvUGkF10xqf6c8jWD/wGaAVagHuAr0bE/9QaND2aOB24XdkixT81eDzfBvoArZKWpNdl9wO/SLHenN59sAw4n2yBYyswg2xhJmR3qa0qvmWwFVijbBHm2RVlE4FS6usistmWhqSL4deAmWTnaUFE/LbR9lXUOhdXAqdIeojs8UJ5BqbyuB4AngIWA5cCC2rE3dY5rDSU7PwfSjH5uQP4I/AE2fqTL7bnYM3MbONSxD/+LK2kcUApIr7U1bHYxlMqlaK5uXmj9edPUtxw/iRFs+5P0vw0q9smf5KimZmZFWwW38UQEZOByV0chnVzvvs1M1vHMwhmZmZW4ATBzMzMCpwgmJmZWYETBDMzMytwgmBmZmYFThDMzMyswAmCmZmZFThBMDMzswInCGZmZlbgBMHMzMwKnCCYmZlZwWbxXQxmjfC3OZpZT7EpvjvGMwhmZmZW4ATBzMzMCjo9QZC0VtKi3M8gSXPa2cdZkvp1Vow1xjxC0kG512dI+nTaHifp3bmyqyUN7sRYtpT0+3T+xrRRb6CkuZIWSjq0A+Osd1ydRdIsSX0l/UjSB3L7vyvpaUkvdXYMZmbWtk2xBuGViGiq2HdQZSVJvSNibY0+zgKuB/7e1kB1+mivI4CXgDkAEXFVrmwcsAR4JpWdtpHGrGUI0KfKeaz0QeDRiDilg+OMI3dcjZC0RUSsaUf9rYC1EfGqpOHAubni3wFXAI832p+ZmXWOLnnEUL5DTHfpMyX9Elgsqb+k2yW1SFoiaYyk8cC7gZmSZlbpa7mkCyTdD5wk6RhJD0paIOlGSVvn6l0s6eH08560f6CkmyXNSz8HSxoEnAGcne7aD5U0UdI5kkYDJWBKKtsq3RGXUn8nS1qc4r84f8zpDrlF0kOS3lnlWLaXdKuk1lRnP0k7kCVHTWm8PWqc0ybgB8CxubhqnYsL0rEukTRJmWrHtVzSgNSmJGlW2p6Y2k0Hrqt2DmvEOBNYDOwjaTGwLzBP0rEAEfFQRKys/ldjZmab0qZIELbSuscLv6lSfgBwXkQMBj4MPBMR+0fEPsCdEXE52R3tyIgYWWOMVyPiEOD3wPnAURExFGgGvpKr90JEHEB2l/qjtO/HwGURMRw4Ebg6IpYDV6X9TRExu9xBRNyU+h2byl4pl6Xp+YuBI4EmYLikUam4P/BQROwP3Ad8rspxXAgsjIj9gK8D10XEc8BpwOw03pPVTkBELAIuAKammYb+bZyLKyJieDrHWwHHtXVcNQwDToiIT1Q7hzViHAlMAr4InAn8LI11R52x3iLpdEnNkppXrVrVaDMzM2unrnrEkPdwRDyVthcDl6Y779vyF+Y6pqbfHwAGAw9IAngb8GCu3g2535el7aOAwak+wNslbdPguJWGA7MiYhWApCnAYcCtwOvAbanefODoKu0PIbvAEhH3SHqHpG07GEtb52KkpK8C/YDtgaVk0/vtMS2XRFQ9hxHxYpV2Q4CbgWOBRe0ck4iYRJZkUCqVor3tzcysMd3hcxBeLm9ExB8kDSO7eHxf0vSI+FY7+hAwIyJOrlEvqmz3AkZU3jHnLnbt0VajNyKiPOZaqp/7au07ehGsei4k9QWuBEoR8bSkiUDfGn2sYd0sU2Wdl3
PbVc9hxbinAV8C3gO8D9gVeFbSsRExtrFDMjOzTaVbvc0xTdH/PSKuBy4FhqaiF4FG7uofAg7OrS/oJ+m9ufIxud/lu+npZBeucgxNDYxZq2wucLikAZJ6AycD9zYQd9l9wNgUxxHA6oh4oR3t82qdi/KFfnVakzA616byuJaTPUqANLNRQ61z+JaIuBo4BrgnzSg9ERHvc3JgZtY9dasEgWzR2sOSFgHnAd9J+ycB/1VtkWJemtofB9wgqZXsIrl3rsqWkuYCXwbOTvvGA6W0MHAZ2eJEyKbcP1pepFgx1GTgqvJivtz4K4GvATOBFmBBRPy20YMHJpZjAS4COvpuhJrnIiL+CvwH2eOcW4F5uWaTWf+4LgR+LGk22axHLbXOYaXDgPsl7QL8qbJQ0g8krQD6SVqRZjfMzKwLaN2s9z82ScvJptVXd3UstnGUSqVobm7eaP35o5bNrKfYkI9aljQ/Ikr16nWHNQhm3cKm+GxzM7OeYrNJECJiUFfHYGZm1lN0tzUIZmZm1g04QTAzM7MCJwhmZmZW4ATBzMzMCpwgmJmZWYETBDMzMytwgmBmZmYFThDMzMyswAmCmZmZFThBMDMzs4LN5qOWzerxlzVtOH+fhdk/Ds8gmJmZWYETBDMzMyvYpAmCpLWSFuV+Bkma084+zpLUr7NirDHmEZIOyr0+Q9Kn0/Y4Se/OlV0taXAnxrKlpN+n8zemomzvtH+hpD060PcmObeSHky/b5W0Y27/hyU9JukJSRM6Ow4zM6ttU69BeCUimir2HVRZSVLviFhbo4+zgOuBv7c1UJ0+2usI4CVgDkBEXJUrGwcsAZ5JZadtpDFrGQL0qXIeAUYBv42Ib3aw77No4NzmSdoiIta0o/57gCckCXhXRKxM+3sDPwWOBlYA8yRNi4hl7YjfzMw2ki5/xCDppfT7CEkzJf0SWCypv6TbJbVIWiJpjKTxwLuBmZJmVulruaQLJN0PnCTpGEkPSlog6UZJW+fqXSzp4fTznrR/oKSbJc1LPwdLGgScAZyd7s4PlTRR0jmSRgMlYEoq20rSLEml1N/Jkhan+C/OH7Ok76Zje0jSO6scy/bpDrs11dlP0g5kF/CmNN4eufrHkl3gTyufG0mfTMe3SNLP0kUYSf8uqVnSUkkXpn2Fc1v+t0nboyVNTtuTJf2/VO9iSXtIulPSfEmzJe1d5Xi2krQIuIcs4XoEeG+KrQk4AHgiIv4YEa8DvwJOqPFnY2ZmnWxTJwhb5R4v/KZK+QHAeRExGPgw8ExE7B8R+wB3RsTlZHfqIyNiZI0xXo2IQ4DfA+cDR0XEUKAZ+Equ3gsRcQBwBfCjtO/HwGURMRw4Ebg6IpYDV6X9TRExu9xBRNyU+h2byl4pl6XHDhcDRwJNwHBJo1Jxf+ChiNgfuA/4XJXjuBBYGBH7AV8HrouI54DTgNlpvCdzsdyRi3OkpPcBY4CD02zDWmBsqn5eRJSA/YDDJe3X4LnNey/Zuf0/wCTgzIgYBpwDXFlZOSLKs0e3kc10XAR8Ix3HImAn4OlckxVp33oknZ6Sm+ZVq1Y1EKaZmXVEd3jEkPdwRDyVthcDl6Y779vyF+Y6pqbfHwAGAw9ks9m8DXgwV++G3O/L0vZRwOBUH+DtkrZpcNxKw4FZEbEKQNIU4DDgVuB1sgslwHyyafVKh5AlKUTEPZLeIWnbdoz/QWAY2VQ9wFbAc6ns/5N0Otm//45k56m1HX0D3BgRa9OszEHAjbnztmUb7fYleyTzCeCW3H5VqRuFHRGTyBISSqVSodzMzDaO7vY5CC+XNyLiD5KGAccC35c0PSK+1Y4+BMyIiJNr1Isq272AEfmZAIDcha892mr0RkSUx1xL9X+Hhi6Ydca/NiK+tt5OaXeyu/zhEfGX9Nigb40+8uNV1imf517AX+skfki6gCzh2QOYC/wzcIykOyPiXLIZg11yTXYmreswM7NNr8vXINSSpuj/HhHXA5cCQ1PRi0Ajd/
UPAQfn1hf0k/TeXPmY3O/yzMJ04Eu5GJoaGLNW2Vyy6fsB6dn/ycC9DcRddh/pkYCkI4DVEfFCO9rfDYxO6xbKaxp2A95OdnH/W1r78C9tHMuzkt4nqRfw0WqDpJieknRSGkeS9q9S71tkj0d+DhwItETEvik5AJgH7Clpd0lvAz4OTGvH8ZqZ2UbU3WYQ8vYFLpH0JvAG8IW0fxLwX5JWtvWsPCJWSRoH3CCpPOV9PvCHtL2lpLlkSVJ5lmE88FNJrWTn5j6yBYq/A26SdAJwZsVQk4GrJL0CjMiNv1LS14CZZHfzd0TEb9tx/BOBn6dY/g6c0o62RMQySecD09MF/g3g3yLiIUkLgaXAH4EHcs0qz+0EskchT5M9Fti6xnBjgX9P4/UhW2DYUqXe4cBssrUmD1XEu0bSl4C7gN7ANRGxtD3HbGZmG4/WzXRvPiQtB0oRsbqrY7GOK5VK0dzcvNH680ctbzh/1LJZ9ydpflqo3qbuPINgtkn54mZmts5mmSBExKCujsHMzKw767aLFM3MzKzrOEEwMzOzAicIZmZmVuAEwczMzAqcIJiZmVmBEwQzMzMrcIJgZmZmBU4QzMzMrMAJgpmZmRU4QTAzM7MCJwhmyaAJt/sLm8zMEicIZmZmVuAEwczMzAqcIHQCSSHph7nX50iaWKfNGZI+vYHjbinp95IWSRrTgfajJA3ekBgaHOcGSYMknSXp47n9UyQ9JmmJpGsk9ensWMzMrDonCJ3jNeBjkgY02iAiroqI6zZw3CFAn4hoioipHWg/CmhXgiCpI18ZvntELAcOB2bn9k8B9gb2BbYCTutA32ZmthE4Qegca4BJwNmVBZJ2k3S3pNb0e9e0f6Kkc9L2eEnLUp1fSeol6XFJA1N5L0lP5BMQSTsA1wNNaQZhD0nDJN0rab6kuyTtmOp+TtI8SS2SbpbUT9JBwPHAJbn2sySVUpsBkpan7XGSbpT0O2C6pP7pjn+epIWSTqh2UtIMwTJgL0mLgGOA2yWdBhARd0QCPAzsvKH/EGZm1jFOEDrPT4Gxkrat2H8FcF1E7Ed2x3x5lbYTgCGpzhkR8SbZxX9sKj8KaImI1eUGEfEc2R337IhoAv4M/AQYHRHDgGuA76bqt0TE8IjYH3gEODUi5gDTgHPTDMSTdY5vBHBKRBwJnAfcExHDgZFkSUb/ygYRMRaYCHyHbLbijjTW1fl66dHCp4A7K/uQdLqkZknNq1atqhOimZl1lBOEThIRLwDXAeMrikYAv0zbvwAOqdK8FZgi6ZNksxGQXeDLaxQ+C/y8Tgh7AfsAM9Ld+vmsuyPfR9JsSYvJko73N3JMFWZExPNp+xhgQhpnFtAX2LVGuyHAIrLHCItq1LkSuC8iZlcWRMSkiChFRGngwIEdCNvMzBrRkefH1rgfAQto+2IeVfZ9BDiMbMr/G5LeHxFPS3pW0pHAgaybTahFwNKIGFGlbDIwKiJaJI0DjqjRxxrWJZF9K8perhjrxIh4rGYw0rHA94DdgeOAgcDLko6KiJG5et9MZZ+v1ZeZmXU+zyB0onSH/Wvg1NzuOUB55f5Y4P58G0m9gF0iYibwVWA7YOtUfDXZo4ZfR8TaOsM/BgyUNCL120dSeaZgG2BlmsrPJxovprKy5cCwtD26jbHuAs6UpDTWkMoKEXFH6mtJROwLLCV7jJJPDk4DPgScnB6rmJlZF3GC0Pl+COTfzTAe+IykVrLn7F+uqN8buD5N/y8ELouIv6ayaWTJQr3HC0TE62QX9YsltZBN5x+Uir8BzAVmAI/mmv0KODctNNwDuBT4gqQ5FcdQ6dtAH6BV0pL0upohQIukt5G92+KFivKrgHcCD6aFkhfUO04zM+scyhaMW0+Q3lFwWUQc2tWxdAelUimam5s3Wn/lj1leftFHNlqfZmbdjaT5EVGqV89rEHoISROAL1B/7YF1kBMDM7N1/Iihh4iIiyJit4i4v35tMzOzDeMEwczMzAqcIJiZmV
mBEwQzMzMrcIJgZmZmBU4QzMzMrMAJgpmZmRU4QTAzM7MCJwhmZmZW4ATBzMzMCpwgmJmZWYG/i8E2W+UvZyrzdzGYma3jGQQzMzMrcIJgZmZmBU4QOoGkkPTD3OtzJE2s0+YMSZ/ewHG3lPR7SYskjelA+1GSBm9IDA2Oc4OkQZLOkvTx3P4vSXoinb8BnR2HmZnV5gShc7wGfKw9F7mIuCoirtvAcYcAfSKiKSKmdqD9KKBdCYKkjqxj2T0ilgOHA7Nz+x8AjgL+1IE+zcxsI3KC0DnWAJOAsysLJO0m6W5Jren3rmn/REnnpO3xkpalOr+S1EvS45IGpvJe6U57QK7fHYDrgaY0g7CHpGGS7pU0X9JdknZMdT8naZ6kFkk3S+on6SDgeOCSXPtZkkqpzQBJy9P2OEk3SvodMF1Sf0nXpD4XSjqh2kmRNEXSMmAvSYuAY4DbJZ0GEBELU+JgZmZdzAlC5/kpMFbSthX7rwCui4j9gCnA5VXaTgCGpDpnRMSbZBf/san8KKAlIlaXG0TEc8BpwOyIaAL+DPwEGB0Rw4BrgO+m6rdExPCI2B94BDg1IuYA04Bz0wzEk3WObwRwSkQcCZwH3BMRw4GRZElG/8oGETEWmAh8h2y24o401tV1xnqLpNMlNUtqXrVqVaPNzMysnZwgdJKIeAG4DhhfUTQC+GXa/gVwSJXmrcAUSZ8km42A7AJfXqPwWeDndULYC9gHmJHu1s8Hdk5l+0iaLWkxWdLx/kaOqcKMiHg+bR8DTEjjzAL6ArvWaDcEWATsm363S0RMiohSRJQGDhzY3uZmZtYgfw5C5/oRsIC2L+ZRZd9HgMPIpvy/Ien9EfG0pGclHQkcyLrZhFoELI2IEVXKJgOjIqJF0jjgiBp9rGFdEtm3ouzlirFOjIjHagYjHQt8D9gdOA4YCLws6aiIGNn2oZiZ2abmGYROlO6wfw2cmts9Byiv3B8L3J9vI6kXsEtEzAS+CmwHbJ2KryZ71PDriFhbZ/jHgIGSRqR++0gqzxRsA6yU1If1E40XU1nZcmBY2h7dxlh3AWdKUhprSGWFiLgj9bUkIvYFlpI9RnFyYGbWDTlB6Hw/BPLvZhgPfEZSK/Ap4MsV9XsD16fp/4XAZRHx11Q2jSxZqPd4gYh4neyifrGkFrLp/INS8TeAucAM4NFcs18B56aFhnsAlwJfkDSn4hgqfRvoA7RKWpJeVzMEaJH0NrJ3W7yQL0yLM1eQPQppldTw2gQzM9u4FFFthtu6o/SOgssi4tCujqU7KJVK0dzc3OH2/qhlM9scSZofEaV69bwGoYeQNAH4AvXXHliDnBCYmdXmRww9RERcFBG7RcT99WubmZltGCcIZmZmVuAEwczMzAqcIJiZmVmBEwQzMzMrcIJgZmZmBU4QzMzMrMAJgpmZmRU4QTAzM7MCJwhmZmZW4ATBzMzMCvxdDLZZqvyiJvB3M5iZ5XkGwczMzAqcIJiZmVmBE4ROICkk/TD3+hxJE+u0OUPSpzdw3C0l/V7SIkljOtB+lKTBGxJDg+PcIGmQpLMkfTy3f3dJcyU9LmmqpLd1dixmZladE4TO8RrwMUkDGm0QEVdFxHUbOO4QoE9ENEXE1A60HwW0K0GQ1JF1LLtHxHLgcGB2bv/FwGURsSfwF+DUDvRtZmYbgROEzrEGmAScXVkgaTdJd0tqTb93TfsnSjonbY+XtCzV+ZWkXumuemAq7yXpiXwCImkH4HqgKc0g7CFpmKR7Jc2XdJekHVPdz0maJ6lF0s2S+kk6CDgeuCTXfpakUmozQNLytD1O0o2SfgdMl9Rf0jWpz4WSTqh2UiRNkbQM2EvSIuAY4HZJp0kScCRwU6p+LVnCYmZmXcAJQuf5KTBW0rYV+68ArouI/YApwOVV2k4AhqQ6Z0TEm2QX/7Gp/CigJSJWlxtExHPAacDsiGgC/gz8BBgdEcOAa4Dvpuq3RMTwiNgfeAQ4NSLmANOAc9MMxJN1jm
8EcEpEHAmcB9wTEcOBkWRJRv/KBhExFpgIfIfs4n9HGutq4B3AXyNiTaq+Atipsg9Jp0tqltS8atWqOiGamVlHOUHoJBHxAnAdML6iaATwy7T9C+CQKs1bgSmSPkk2GwHZBb68RuGzwM/rhLAXsA8wI92tnw/snMr2kTRb0mKypOP9jRxThRkR8XzaPgaYkMaZBfQFdq3RbgiwCNg3/S5TlbpR2BExKSJKEVEaOHBgB8I2M7NG+HMQOtePgAW0fTEvXASBjwCHkU35f0PS+yPiaUnPSjoSOJB1swm1CFgaESOqlE0GRkVEi6RxwBE1+ljDuiSyb0XZyxVjnRgRj9UMRjoW+B6wO3AcMBB4WdJRETESWA1sJ2mLNIuwM/BM7cMzM7PO5BmETpTusH/N+ovt5gDllftjgfvzbST1AnaJiJnAV4HtgK1T8dVkjxp+HRFr6wz/GDBQ0ojUbx9J5ZmCbYCVkvqwfqLxYiorWw4MS9uj2xjrLuDMtI4ASUMqK0TEHamvJRGxL7CU7DHKyFQewMzcOKcAv61zjGZm1kmcIHS+HwL5dzOMBz4jqRX4FPDlivq9gevT9P9CslX9f01l08iShXqPF4iI18kuthdLaiGbzj8oFX8DmAvMAB7NNfsVcG5aaLgHcCnwBUlzKo6h0reBPkCrpCXpdTVDgJb09sU+6TFM3v8FviLpCbI1Cf9Z7zjNzKxzKLtxs54gvaPgsog4tKtj6Q5KpVI0Nzd3qK0/atnMNleS5kdEqV49r0HoISRNAL5A/bUH1gAnA2ZmbfMjhh4iIi6KiN0i4v76tc3MzDaMEwQzMzMrcIJgZmZmBU4QzMzMrMAJgpmZmRU4QTAzM7MCJwhmZmZW4ATBzMzMCpwgmJmZWYETBDMzMytwgmBmZmYFThDMzMyswF/WZJZU+4ZHM7PuaFN84ZxnEMzMzKzACUIFSWslLZK0RNLvJG1Xp/5ESedsotgGSfpEG+WXSFoq6ZIO9N0k6dgNi7ChcT4vaVwa76rc/sMkLZC0RtLozo7DzMza5gSh6JWIaIqIfYDngX/r6oByBgE1EwTg88DQiDi3A303Ae1KEJRp79/QocBs4PD0u+zPwDjgl+3sz8zMOoEThLY9COwEIGkPSXdKmi9ptqS9KytXqyNpW0nLyxdSSf0kPS2pj6TPSZonqUXSzZL6pTqTJV0uaY6kP+buqC8CDk0zHGdXjD0N6A/MlTRG0sDU57z0c3Cqd0Dqd2H6vZektwHfAsakvsdUzoykGZVB6ecRSVcCC4BdJJ2bxmiVdGG1EynpbEmLgI8CNwMXAueVZxEiYnlEtAJvduhfyszMNionCDVI6g18EJiWdk0CzoyIYcA5wJVVmhXqRMTfgBayO2aAfwXuiog3gFsiYnhE7A88Apya62tH4BDgOLLEAGACMDvNcFyWHzgijmfd7MdU4MfAZRExHDgRuDpVfRQ4LCKGABcA34uI19P21Fz7tuwFXJf62AvYEziAbBZimKTDKhukeI8G7o6IJuDxiBgcEWfUGcvMzLqA38VQtFW60x0EzAdmSNoaOAi4UVK53pb5RnXqTAXGADOBj7MuudhH0neA7YCtgbtyXd4aEW8CyyS9swPHcRQwOBfL2yVtA2wLXCtpTyCAPh3o+08R8VDaPib9LEyvtyZLGO6r0m4o0JLi+EsHxkXS6cDpALvuumtHujAzswY4QSh6JSKaJG0L3Ea2BmEy8Nd051tLrzbqTAO+L2l7YBhwT9o/GRgVES2SxgFH5Nq8ltsW7dcLGBERr+R3SvoJMDMiPippEDCrRvs1rD/D1De3/XJFbN+PiJ/VCkTSDsB0YAfgVeBkYJuUiJ0YEU82ckAAETGJbKaGUqkUjbYzM7P28SOGGtKjgfFkjwpeAZ6SdBK8tThv/4r6L9SqExEvAQ+TTfvfFhFrU7NtgJWS+gBjGwjrxdSmEdOBL5VfSGpKm9sC/522x7XR93KyO34kDQV2rzHOXcBn0wwKkn
ZKCcFbIuK5lDgtIHsUcT3wmfQ4o+HkwMzMNh0nCG2IiIVk6wc+TnYBP1VSC7AUOKFKk7bqTAU+mX6XfQOYC8wgWxtQTyuwJi1qPLtO3fFAKS0cXAaUn/X/gGw24wGgd67+TLJHEoskjSFbSLh9usv/AvCHaoNExHSydx48KGkxcBNVkpi0puMdEbGa7FHM/RXlwyWtAE4CfiZpaZ3jMzOzTqQIz9Jaz1QqlaK5uXmj9edPUjSznmJDPklR0vyIKNWr5xkEMzMzK/AiRbNkU3y2uZlZT+EZBDMzMytwgmBmZmYFThDMzMyswAmCmZmZFThBMDMzswInCGZmZlbgBMHMzMwKnCCYmZlZgRMEMzMzK3CCYGZmZgVOEMzMzKzA38VglvjbHDecv8/C7B+HZxDMzMyswAmCmZmZFThBqCBpraRFkpZI+p2k7erUnyjpnE0U2yBJn2ij/BJJSyVd0oG+myQdu2ERNjTO5yWNS+Ndldu/paSpkp6QNFfSoM6OxczManOCUPRKRDRFxD7A88C/dXVAOYOAmgkC8HlgaESc24G+m4B2JQjKtPdv6FBgNnB4+l12KvCXiHgPcBlwcTv7NTOzjcgJQtseBHYCkLSHpDslzZc0W9LelZWr1ZG0raTl5QuppH6SnpbUR9LnJM2T1CLpZkn9Up3Jki6XNEfSHyWNTkNcBByaZjjOrhh7GtAfmCtpjKSBqc956efgVO+A1O/C9HsvSW8DvgWMSX2PqZwZSTMqg9LPI5KuBBYAu0g6N43RKunCaidS0tmSFgEfBW4GLgTOy80inABcm7ZvAj4oSe35xzIzs43HCUINknoDHwSmpV2TgDMjYhhwDnBllWaFOhHxN6CF7I4Z4F+BuyLiDeCWiBgeEfsDj5DdRZftCBwCHEeWGABMAGanGY7L8gNHxPGsm/2YCvwYuCwihgMnAlenqo8Ch0XEEOAC4HsR8Xranppr35a9gOtSH3sBewIHkM1CDJN0WGWDFO/RwN0R0QQ8HhGDI+KMVGUn4OlUdw3wN+Adlf1IOl1Ss6TmVatW1QnTzMw6ym9zLNoq3ekOAuYDMyRtDRwE3Ji7qd0y36hOnanAGGAm8HHWJRf7SPoOsB2wNXBXrstbI+JNYJmkd3bgOI4CBudiebukbYBtgWsl7QkE0KcDff8pIh5K28ekn4Xp9dZkCcN9VdoNBVpSHH+pKKs2WxCFHRGTyBIxSqVSodzMzDYOJwhFr0REk6RtgdvI1iBMBv6a7nxr6dVGnWnA9yVtDwwD7kn7JwOjIqJF0jjgiFyb13LbHZlq7wWMiIhX8jsl/QSYGREfTQsBZ9Vov4b1Z5j65rZfrojt+xHxs1qBSNoBmA7sALwKnAxskxKxEyPiSWAFsAuwQtIWZInM83WO0czMOokfMdSQHg2MJ3tU8ArwlKST4K3FeftX1H+hVp2IeAl4mGza/7aIWJuabQOslNQHGNtAWC+mNo2YDnyp/EJSU9rcFvjvtD2ujb6Xk93xI2kosHuNce4CPptmUJC0U0oI3hIRz6XEaQHZo4jrgc+kxxlPpmrTgFPS9mjgnojwDIGZWRdxgtCGiFhItn7g42QX8FMltQBLyRbVVWqrzlTgk+l32TeAucAMsrUB9bQCa9KixrPr1B0PlNLCwWVA+Vn/D8hmMx4AeufqzyR7JLFI0hiyhYTbp7v8LwB/qDZIREwHfgk8KGkx2QLDQhKT1nS8IyJWkz2Kub+iyn8C75D0BPAVsvUWZmbWReSbNOupSqVSNDc3b7T+/FHLG84ftWzW/UmaHxGlevW8BsEs8cXNzGwdP2IwMzOzAicIZmZmVuAEwczMzAqcIJiZmVmBEwQzMzMr8NscrceStAr400bscgCweiP2tyn0tJh7WrzQ82LuafFCz4u5p8UL68e8W0QMrNfACYJZIqm5kfcGdyc9LeaeFi/0vJh7WrzQ82LuafFCx2L2IwYzMzMrcIJgZmZmBU4QzNaZ1NUBdEBPi7mnxQ
s9L+aeFi/0vJh7WrzQgZi9BsHMzMwKPINgZmZmBU4QzMzMrMAJgm32JH1Y0mOSnpA0oavjqUfSLpJmSnpE0lJJX+7qmBohqbekhZJu6+pYGiFpO0k3SXo0nesRXR1TPZLOTn8TSyTdIKlvV8eUJ+kaSc9JWpLbt72kGZIeT7//qStjrFQj5kvS30WrpN9I2q4LQ1xPtXhzZedICkkDGunLCYJt1iT1Bn4K/AswGDhZ0uCujaquNcD/iYj3AR8A/q0HxAzwZeCRrg6iHX4M3BkRewP7081jl7QTMB4oRcQ+QG/g410bVcFk4MMV+yYAd0fEnsDd6XV3MplizDOAfSJiP+APwNc2dVBtmEwxXiTtAhwN/LnRjpwg2ObuAOCJiPhjRLwO/Ao4oYtjalNErIyIBWn7RbIL105dG1XbJO0MfAS4uqtjaYSktwOHAf8JEBGvR8RfuzSoxmwBbCVpC6Af8EwXx7OeiLgPeL5i9wnAtWn7WmDUpoypnmoxR8T0iFiTXj4E7LzJA6uhxjkGuAz4KtDwOxOcINjmbifg6dzrFXTzi22epEHAEGBuF4dSz4/I/uf0ZhfH0ah/BlYBP0+PRa6W1L+rg2pLRPw3cCnZHeJK4G8RMb1ro2rIOyNiJWTJL7BDF8fTXp8F/qurg2iLpOOB/46Ilva0c4JgmztV2dcj3vsraWvgZuCsiHihq+OpRdJxwHMRMb+rY2mHLYChwL9HxBDgZbrf1Pd60rP7E4DdgXcD/SV9smuj+scm6TyyR35TujqWWiT1A84DLmhvWycItrlbAeySe70z3WxathpJfciSgykRcUtXx1PHwcDxkpaTPcI5UtL1XRtSXSuAFRFRnpm5iSxh6M6OAp6KiFUR8QZwC3BQF8fUiGcl7QiQfj/XxfE0RNIpwHHA2OjeHyi0B1nS2JL+G9wZWCDpXfUaOkGwzd08YE9Ju0t6G9mirmldHFObJIns2fgjEfH/ujqeeiLiaxGxc0QMIju/90REt76zjYj/AZ6WtFfa9UFgWReG1Ig/Ax+Q1C/9jXyQbr6wMpkGnJK2TwF+24WxNETSh4H/CxwfEX/v6njaEhGLI2KHiBiU/htcAQxNf+NtcoJgm7W00OhLwF1k/zP9dUQs7dqo6joY+BTZnfii9HNsVwf1D+hMYIqkVqAJ+F7XhtO2NNtxE7AAWEz2//du9ZHAkm4AHgT2krRC0qnARcDRkh4nW2V/UVfGWKlGzFcA2wAz0n9/V3VpkDk14u1YX917ZsTMzMy6gmcQzMzMrMAJgpmZmRU4QTAzM7MCJwhmZmZW4ATBzMzMCpwgmJmZWYETBDMzMyv4/wG1MafrVo8W3AAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"coef = pd.Series(ridge.coef_, index=feature_names)\n",
"_ = coef.plot.barh()"
]
},
{
"cell_type": "markdown",
"id": "5b15c35b",
"metadata": {
"tags": [
"solution"
]
},
"source": [
"We see that the penalty applied on the weights give a better results: the\n",
"values of the coefficients do not suffer from numerical issues. Indeed, the\n",
"matrix to be inverted internally is `np.dot(data.T, data) + alpha * I`.\n",
"Adding this penalty `alpha` allow the inversion without numerical issue."
]
},
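{
"cell_type": "markdown",
"id": "7b9e2d15",
"metadata": {},
"source": [
"A short derivation may help see why the penalty removes the numerical issue.\n",
"It is only a sketch under simplifying assumptions: $X$ stands for `data`, $y$\n",
"for `target`, $w$ for the coefficients, $\\alpha > 0$ for the `alpha` parameter\n",
"of `Ridge`, and the intercept is ignored. Ridge minimizes\n",
"\n",
"$$\\lVert X w - y \\rVert_2^2 + \\alpha \\lVert w \\rVert_2^2,$$\n",
"\n",
"whose minimizer is\n",
"\n",
"$$w = (X^\\top X + \\alpha I)^{-1} X^\\top y.$$\n",
"\n",
"The matrix $X^\\top X$ is positive semi-definite, so its eigenvalues satisfy\n",
"$\\lambda_k \\ge 0$, with some equal to zero when columns are duplicated. The\n",
"eigenvalues of $X^\\top X + \\alpha I$ are $\\lambda_k + \\alpha \\ge \\alpha > 0$,\n",
"so this matrix is always invertible."
]
},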
{
"cell_type": "markdown",
"id": "e5825c28",
"metadata": {},
"source": [
"Can you find the relationship between the ridge coefficients and the original\n",
"coefficients?"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "83ef8489",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([10.89417991, 40.40406338, -0.61648035, -0.56789883, 0.33351616])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# solution\n",
"ridge.coef_[:5] * 3"
]
},
{
"cell_type": "markdown",
"id": "6df695f6",
"metadata": {
"tags": [
"solution"
]
},
"source": [
"Repeating each informative feature three times divides the corresponding\n",
"ridge coefficients by three."
]
},
{
"cell_type": "markdown",
"id": "a64488eb",
"metadata": {
"tags": [
"solution"
]
},
"source": [
"<div class=\"admonition tip alert alert-warning\">\n",
"<p class=\"first admonition-title\" style=\"font-weight: bold;\">Tip</p>\n",
"<p>We advise always using a penalty that shrinks the magnitude of the weights\n",
"toward zero (also called \"l2 penalty\"). In scikit-learn, <tt class=\"docutils literal\">LogisticRegression</tt>\n",
"applies such a penalty by default. For regression, however, one needs to use <tt class=\"docutils literal\">Ridge</tt> (and even\n",
"<tt class=\"docutils literal\">RidgeCV</tt> to tune the parameter <tt class=\"docutils literal\">alpha</tt>) instead of <tt class=\"docutils literal\">LinearRegression</tt>.</p>\n",
"<p class=\"last\">Other kinds of regularizations exist but will not be covered in this course.</p>\n",
"</div>\n",
"\n",
"## Dealing with correlation between one-hot encoded features\n",
"\n",
"In this section, we will focus on how to deal with correlated features that\n",
"arise naturally when one-hot encoding categorical features.\n",
"\n",
"Let's first load the Ames housing dataset and take a subset of features that\n",
"are only categorical features."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "03cc10c3",
"metadata": {
"tags": [
"solution"
]
},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"ames_housing = pd.read_csv(\"../datasets/house_prices.csv\", na_values='?')\n",
"ames_housing = ames_housing.drop(columns=\"Id\")\n",
"\n",
"categorical_columns = [\"Street\", \"Foundation\", \"CentralAir\", \"PavedDrive\"]\n",
"target_name = \"SalePrice\"\n",
"X, y = ames_housing[categorical_columns], ames_housing[target_name]\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
" X, y, test_size=0.2, random_state=0\n",
")"
]
},
{
"cell_type": "markdown",
"id": "65787f48",
"metadata": {
"tags": [
"solution"
]
},
"source": [
"\n",
"We previously saw that a `OneHotEncoder` creates as many columns as\n",
"categories. Therefore, there is always one column (i.e. one encoded category)\n",
"that can be inferred from the others. Thus, `OneHotEncoder` creates\n",
"collinear features.\n",
"\n",
"We illustrate this behaviour by considering the \"CentralAir\" feature that\n",
"contains only two categories:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "f724aa4a",
"metadata": {
"tags": [
"solution"
]
},
"outputs": [
{
"data": {
"text/plain": [
"618 Y\n",
"870 N\n",
"92 Y\n",
"817 Y\n",
"302 Y\n",
" ..\n",
"763 Y\n",
"835 Y\n",
"1216 Y\n",
"559 Y\n",
"684 Y\n",
"Name: CentralAir, Length: 1168, dtype: object"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train[\"CentralAir\"]"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "c4bd2c18",
"metadata": {
"tags": [
"solution"
]
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>CentralAir_N</th>\n",
" <th>CentralAir_Y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1163</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1164</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1165</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1166</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1167</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1168 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" CentralAir_N CentralAir_Y\n",
"0 0 1\n",
"1 1 0\n",
"2 0 1\n",
"3 0 1\n",
"4 0 1\n",
"... ... ...\n",
"1163 0 1\n",
"1164 0 1\n",
"1165 0 1\n",
"1166 0 1\n",
"1167 0 1\n",
"\n",
"[1168 rows x 2 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.preprocessing import OneHotEncoder\n",
"\n",
"single_feature = [\"CentralAir\"]\n",
"encoder = OneHotEncoder(sparse=False, dtype=np.int32)\n",
"X_trans = encoder.fit_transform(X_train[single_feature])\n",
"X_trans = pd.DataFrame(\n",
" X_trans,\n",
" columns=encoder.get_feature_names_out(input_features=single_feature),\n",
")\n",
"X_trans"
]
},
{
"cell_type": "markdown",
"id": "3c7e6b09",
"metadata": {
"tags": [
"solution"
]
},
"source": [
"\n",
"Here, we see that the encoded category \"CentralAir_N\" is the opposite of the\n",
"encoded category \"CentralAir_Y\". Therefore, we observe that using a\n",
"`OneHotEncoder` creates two features having the problematic pattern observed\n",
"earlier in this exercise. Training a linear regression model on such a\n",
"one-hot encoded binary feature can therefore lead to numerical\n",
"problems, especially without regularization. Furthermore, the two one-hot\n",
"features are redundant as they encode exactly the same information in\n",
"opposite ways.\n",
"\n",
"Using regularization helps to overcome the numerical issues that we highlighted\n",
"earlier in this exercise.\n",
"\n",
"Another strategy is to arbitrarily drop one of the encoded categories.\n",
"Scikit-learn provides such an option by setting the parameter `drop` in the\n",
"`OneHotEncoder`. This parameter can be set to `first` to always drop the\n",
"first encoded category or `if_binary` to only drop a column in the case of\n",
"binary categories."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "100b438b",
"metadata": {
"tags": [
"solution"
]
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>CentralAir_Y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1163</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1164</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1165</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1166</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1167</th>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1168 rows × 1 columns</p>\n",
"</div>"
],
"text/plain": [
" CentralAir_Y\n",
"0 1\n",
"1 0\n",
"2 1\n",
"3 1\n",
"4 1\n",
"... ...\n",
"1163 1\n",
"1164 1\n",
"1165 1\n",
"1166 1\n",
"1167 1\n",
"\n",
"[1168 rows x 1 columns]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"encoder = OneHotEncoder(drop=\"first\", sparse=False, dtype=np.int32)\n",
"X_trans = encoder.fit_transform(X_train[single_feature])\n",
"X_trans = pd.DataFrame(\n",
" X_trans,\n",
" columns=encoder.get_feature_names_out(input_features=single_feature),\n",
")\n",
"X_trans"
]
},
{
"cell_type": "markdown",
"id": "dae0c25e",
"metadata": {
"tags": [
"solution"
]
},
"source": [
"\n",
"We see that only the second column of the previous encoded data is kept.\n",
"Dropping one of the one-hot encoded columns is a common practice,\n",
"especially for binary categorical features. Note however that this breaks\n",
"symmetry between categories and impacts the number of coefficients of the\n",
"model, their values, and thus their meaning, especially when applying\n",
"strong regularization.\n",
"\n",
"Let's finally illustrate how to use this option in a machine-learning pipeline:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "0654d3b3",
"metadata": {
"tags": [
"solution"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"R2 score on the testing set: 0.24\n",
"Our model contains 9 features while 13 categories are originally available.\n"
]
}
],
"source": [
"from sklearn.pipeline import make_pipeline\n",
"\n",
"model = make_pipeline(OneHotEncoder(drop=\"first\", dtype=np.int32), Ridge())\n",
"model.fit(X_train, y_train)\n",
"n_categories = [X_train[col].nunique() for col in X_train.columns]\n",
"print(\n",
" f\"R2 score on the testing set: {model.score(X_test, y_test):.2f}\"\n",
")\n",
"print(\n",
" f\"Our model contains {model[-1].coef_.size} features while \"\n",
" f\"{sum(n_categories)} categories are originally available.\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce7e70b0-fe29-4602-98aa-8005e9c4e2d2",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "tags,-all",
"main_language": "python",
"notebook_metadata_filter": "-all"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
qdpham commented Feb 25, 2022

Hi,

Here are the two files reproducing the issue we discussed in the forum: LinearRegression and Ridge yield exactly the same coefficients on my machine.

The first file, fun_linear_models_ex_04.ipynb, was run on the fun-inria server; the second, my_linear_models_ex_04.ipynb, was run on my local machine.

Note that I’ve added a cell for computing the coefficients by hand.
On the fun-inria server this cell raised an error, because the determinant is computed as exactly 0. `LinearRegression` nevertheless returns results without any explicit warning (although we can tell that the coefficients are unreasonable).
On my machine, on the other hand, the determinant is computed as approximately 0, so I could proceed and got somewhat reasonable coefficients.
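
For reference, here is a minimal sketch of this kind of hand computation. It is not the exact cell from the two notebooks: the synthetic data, the value `alpha = 1.0` and all variable names are assumptions made for illustration. It builds perfectly collinear features by repeating two columns three times, then solves the normal equations with and without the `alpha * I` term.

```python
import numpy as np

# Synthetic stand-in for the exercise data (assumed, not the notebook's dataset).
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = X @ np.array([3.0, 5.0]) + 0.1 * rng.randn(100)

# Repeat the two features three times: columns are [x0, x1, x0, x1, x0, x1].
X_dup = np.concatenate([X, X, X], axis=1)
rank = np.linalg.matrix_rank(X_dup)  # only 2 independent columns out of 6

gram = X_dup.T @ X_dup
# Depending on the platform, this is exactly 0.0 or merely very close to 0.
print("det of X^T X:", np.linalg.det(gram))

try:
    # Unregularized normal equations: may raise or return unreasonable values.
    coef_ols = np.linalg.inv(gram) @ X_dup.T @ y
    print("unregularized coefficients:", coef_ols)
except np.linalg.LinAlgError as exc:
    print("inversion failed:", exc)

# Adding alpha * I makes the matrix well conditioned, so the solve is stable.
alpha = 1.0
coef_ridge = np.linalg.solve(gram + alpha * np.eye(gram.shape[1]), X_dup.T @ y)
print("ridge coefficients * 3:", coef_ridge * 3)
```

Whether the `try` branch raises or returns values depends on floating-point rounding, which matches the difference seen between the two environments. The regularized solve does not have this sensitivity.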

At the end of the day, I get the same coefficients from LinearRegression and Ridge (regularized) on my machine.
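
One possible explanation, offered as an assumption and not something confirmed in the forum thread: when a least-squares solver detects the rank deficiency and returns the minimum-norm solution, that solution spreads each weight evenly over the duplicated columns, which is also what ridge converges to for a small `alpha`. The sketch below illustrates this with plain NumPy on assumed synthetic data. The explicit `rcond=1e-10` cutoff and `alpha = 1e-3` are illustrative choices, and the code does not reproduce what `LinearRegression` does internally.

```python
import numpy as np

# Same kind of synthetic data as an illustration (assumed, not the notebook's).
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = X @ np.array([3.0, 5.0]) + 0.1 * rng.randn(100)
X_dup = np.concatenate([X, X, X], axis=1)

# Minimum-norm least squares: singular values below rcond * (largest singular
# value) are treated as zero, so the rank deficiency is handled explicitly.
coef_min_norm, _, rank, _ = np.linalg.lstsq(X_dup, y, rcond=1e-10)

# Ridge via the normal equations with a small penalty.
alpha = 1e-3
gram = X_dup.T @ X_dup
coef_ridge = np.linalg.solve(gram + alpha * np.eye(gram.shape[1]), X_dup.T @ y)

print("effective rank:", rank)
print("minimum-norm coefficients:", coef_min_norm)
print("ridge coefficients:       ", coef_ridge)
```

If the solver's cutoff does not flag the tiny singular values as zero, the unregularized coefficients can instead become very large, which would be consistent with the unreasonable values seen on the server.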
