@izmailovpavel
Created March 2, 2016 22:22
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning\n",
"## CMC MSU, Spring 2015/2016\n",
"## Lab 1. Linear Models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Full name: Izmailov Pavel Alekseevich\n",
"\n",
"Group: 317\n",
"\n",
"#### Python version used: 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This assignment covers applying linear models to classification and regression problems. You will learn to:\n",
"* one-hot encode categorical features\n",
"* train logistic and linear regression\n",
"* select features with LASSO\n",
"* compute classification and regression quality metrics\n",
"* choose the best classifier under precision or recall constraints\n",
"* calibrate probabilities\n",
"* implement gradient descent (optional)\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 1. Research Grants in Australia"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](http://imgur.com/cBdDBO3.jpg)\n",
"\n",
"In this part we work with the \"Predict Grant Applications\" competition (https://www.kaggle.com/c/unimelb/data), where for each grant application we must predict whether it will be approved. We use only 40 of the 249 available features. The data files are available here:\n",
"* https://db.tt/iYzRzQYP (training)\n",
"* https://db.tt/NGSHb5Qs (test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Task 1\n",
"Read the training and test sets from the files with pd.read_csv. Extract the target variable (Grant.Status) into a separate vector."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"data_train = pd.read_csv('data/unimel_train.csv')\n",
"data_test = pd.read_csv('data/unimel_test.csv')\n",
"y_train = np.array(data_train['Grant.Status'])\n",
"y_test = np.array(data_test['Grant.Status'])\n",
"del data_train['Grant.Status']\n",
"del data_test['Grant.Status']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Task 2\n",
"In this task we bring the data into a form suitable for training linear classifiers. To do this, real-valued features must be scaled and categorical features converted to numbers. Missing values must also be filled in."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's understand why scaling is needed here by plotting the distributions of three features."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x108b4acc0>"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x10975acc0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_train['Number.of.Successful.Grant.1'].hist(bins=20)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x1087417f0>"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x109f11128>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_train['RFCD.Percentage.1'].hist(bins=20)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x109fa5438>"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x109f70eb8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_train['Year.of.Birth.1'].hist(bins=20)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What problem do you see in these plots? How does scaling help fix it?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
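"The plotted features take values on very different scales (for example, RFCD.Percentage.1 is a percentage, while Year.of.Birth.1 is a calendar year). Without normalization, gradient-based optimization methods converge slowly, because the loss surface is stretched along the large-scale features. Scale also matters for regularized linear models: the penalty acts on the coefficients, whose magnitudes depend on each feature's units, so unscaled features are penalized unequally."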
]
},
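{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of what standardization does, on hypothetical toy values (not taken from the grant data) whose two columns mimic a percentage and a year of birth. StandardScaler subtracts each column's mean and divides by its standard deviation, so afterwards every column has mean 0 and standard deviation 1 regardless of its original units."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Hypothetical toy values (not from the dataset) on very different scales:\n",
"# column 0 mimics a percentage, column 1 mimics a year of birth\n",
"X_toy = np.array([[100.0, 1950.0],\n",
"                  [50.0, 1965.0],\n",
"                  [25.0, 1980.0],\n",
"                  [75.0, 1970.0]])\n",
"print('std before scaling:', X_toy.std(axis=0))\n",
"\n",
"scaler = StandardScaler()\n",
"X_scaled = scaler.fit_transform(X_toy)  # (x - column mean) / column std\n",
"\n",
"# Every column now has mean 0 and standard deviation 1\n",
"print('column means used:', scaler.mean_)\n",
"print('mean after scaling:', X_scaled.mean(axis=0).round(6))\n",
"print('std after scaling:', X_scaled.std(axis=0).round(6))"
]
},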
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our data contains missing values. Answer the following questions:\n",
"1. How many missing elements are there in total in the training table?\n",
"2. How many objects have at least one missing value?\n",
"3. How many features have at least one missing value?"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total missing values in the training data: 26843\n",
"Objects with at least one missing value: 5879\n",
"Features with at least one missing value: 38\n"
]
}
],
"source": [
"print('Total missing values in the training data:', data_train.isnull().sum().sum())\n",
"print('Objects with at least one missing value:', data_train.isnull().any(axis=1).sum())\n",
"print('Features with at least one missing value:', data_train.isnull().any(axis=0).sum())"
]
},
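{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the difference between the three questions concrete, here is a small sketch on a hypothetical toy table (not the grant data). isnull() returns a boolean mask; summing the whole mask counts missing cells, while any(axis=1) and any(axis=0) flag the rows (objects) and columns (features) that contain at least one gap."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy table: 4 objects, 3 features\n",
"toy = pd.DataFrame({'a': [np.nan, 2.0, 3.0, 4.0],\n",
"                    'b': [np.nan, 'q', np.nan, 'x'],\n",
"                    'c': [np.nan, 1.0, 2.0, 3.0]})\n",
"\n",
"missing = toy.isnull()                           # True where a value is missing\n",
"n_missing_total = missing.sum().sum()            # all missing cells\n",
"n_rows_with_missing = missing.any(axis=1).sum()  # objects with at least one gap\n",
"n_cols_with_missing = missing.any(axis=0).sum()  # features with at least one gap\n",
"print(n_missing_total, n_rows_with_missing, n_cols_with_missing)  # 4 2 3"
]
},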
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Most features in our dataset are categorical. The typical way to handle them is binary, or one-hot, encoding (there is also the counter-based approach we used in earlier labs). First, let's practice one-hot encoding on three toy objects."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>City</th>\n",
" <th>Weather</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Msk</td>\n",
" <td>good</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>SPb</td>\n",
" <td>bad</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Msk</td>\n",
" <td>worst</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" City Weather\n",
"0 Msk good\n",
"1 SPb bad\n",
"2 Msk worst"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"simple_data = pd.DataFrame({'City': ['Msk', 'SPb', 'Msk'], 'Weather': ['good', 'bad', 'worst']})\n",
"simple_data.head()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{'City': 'Msk', 'Weather': 'good'}, {'City': 'SPb', 'Weather': 'bad'}, {'City': 'Msk', 'Weather': 'worst'}]\n"
]
}
],
"source": [
"# convert each object (row) into a dict mapping column names to values\n",
"simple_data_dict = simple_data.to_dict(orient='records')\n",
"print(simple_data_dict)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 1. 0. 0. 1. 0.]\n",
" [ 0. 1. 1. 0. 0.]\n",
" [ 1. 0. 0. 0. 1.]]\n"
]
}
],
"source": [
"# one-hot encoding\n",
"from sklearn.feature_extraction import DictVectorizer\n",
"transformer = DictVectorizer(sparse=False)\n",
"# apply DictVectorizer to simple_data_dict to obtain the binary matrix, then print it\n",
"encoded_data = transformer.fit_transform(simple_data_dict)\n",
"print(encoded_data)"
]
},
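{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make explicit what DictVectorizer computes, here is a minimal hand-written sketch of the same encoding on the toy objects above. The helper `one_hot_encode` is hypothetical and is not part of scikit-learn: it collects every `column=value` pair, sorts these feature names, and puts a 1 in the matching column of each row. With its default settings (separator `=`, sorted feature names) DictVectorizer orders its output columns the same way, which is why the matrix above has the columns City=Msk, City=SPb, Weather=bad, Weather=good, Weather=worst.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def one_hot_encode(records):\n",
"    # hypothetical helper: collect every 'column=value' pair seen in the data, sorted\n",
"    names = sorted({'{}={}'.format(k, v) for rec in records for k, v in rec.items()})\n",
"    index = {name: j for j, name in enumerate(names)}\n",
"    # one row per object, one column per 'column=value' pair\n",
"    matrix = np.zeros((len(records), len(names)))\n",
"    for i, rec in enumerate(records):\n",
"        for k, v in rec.items():\n",
"            matrix[i, index['{}={}'.format(k, v)]] = 1.0\n",
"    return matrix, names\n",
"\n",
"toy = [{'City': 'Msk', 'Weather': 'good'},\n",
"       {'City': 'SPb', 'Weather': 'bad'},\n",
"       {'City': 'Msk', 'Weather': 'worst'}]\n",
"encoded, names = one_hot_encode(toy)\n",
"print(names)    # ['City=Msk', 'City=SPb', 'Weather=bad', 'Weather=good', 'Weather=worst']\n",
"print(encoded)  # the same 0/1 matrix as the DictVectorizer output above\n",
"```"
]
},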
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Implement a function transform_data that takes a DataFrame of features, StandardScaler and DictVectorizer objects, and a boolean flag is_test (True when processing the test set, False when processing the training set). The function should perform the following steps:\n",
"1. Replace missing values with zeros for numeric features and with the string 'nan' for categorical ones, using the fillna function. The names of all numeric variables are listed in numeric_cols.\n",
"2. Scale the numeric features with StandardScaler (the fit_transform method if is_test == False, the transform method otherwise).\n",
"3. One-hot encode the categorical features with DictVectorizer (the fit_transform method if is_test == False, the transform method otherwise).\n",
"\n",
"The function should return a tuple of three elements: the transformed dataset, the StandardScaler object and the DictVectorizer object. The transformed dataset must consist of the scaled numeric features and the encoded categorical ones (none of the original features may remain in it)."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"numeric_cols = ['RFCD.Percentage.1', 'RFCD.Percentage.2', 'RFCD.Percentage.3',\n",
"                'RFCD.Percentage.4', 'RFCD.Percentage.5',\n",
"                'SEO.Percentage.1', 'SEO.Percentage.2', 'SEO.Percentage.3',\n",
"                'SEO.Percentage.4', 'SEO.Percentage.5',\n",
"                'Year.of.Birth.1', 'Number.of.Successful.Grant.1', 'Number.of.Unsuccessful.Grant.1']\n",
"# every column that is not numeric is treated as categorical\n",
"categorical_cols = [col for col in data_train.columns if col not in numeric_cols]\n",
"example = data_train[categorical_cols].fillna('nan')\n",
"ex_dict = example.to_dict(orient='records')"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def transform_data(data, scaler, transformer, is_test, scale=True):\n",
"    # work on a copy so that the caller's DataFrame is not modified in place\n",
"    data = data.copy()\n",
"    # numeric_cols is defined in the previous cell; every other column is categorical\n",
"    categorical_cols = [col for col in data.columns if col not in numeric_cols]\n",
"    # 1. fill gaps: zeros for numeric features, the string 'nan' for categorical ones\n",
"    data[numeric_cols] = data[numeric_cols].fillna(0)\n",
"    data[categorical_cols] = data[categorical_cols].fillna('nan').astype(str)\n",
"    numeric_features = data[numeric_cols].values\n",
"    records = data[categorical_cols].to_dict(orient='records')\n",
"    if is_test:\n",
"        # test set: reuse the statistics and the vocabulary learned on the training set\n",
"        if scale:\n",
"            numeric_features = scaler.transform(numeric_features)\n",
"        categorical_features = transformer.transform(records)\n",
"    else:\n",
"        # training set: fit the scaler and the vectorizer, then transform\n",
"        if scale:\n",
"            numeric_features = scaler.fit_transform(numeric_features)\n",
"        categorical_features = transformer.fit_transform(records)\n",
"    # DictVectorizer returns a sparse matrix by default; densify it before stacking\n",
"    if hasattr(categorical_features, 'toarray'):\n",
"        categorical_features = categorical_features.toarray()\n",
"    new_features = np.hstack((numeric_features, categorical_features))\n",
"    return new_features, scaler, transformer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Apply the function to the data:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler\n",
"scaler = StandardScaler()\n",
"transformer = DictVectorizer()\n",
"X, scaler, transformer = transform_data(data_train, scaler, transformer, False)\n",
"X_test, _, _ = transform_data(data_test, scaler, transformer, True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How many features did you get after the transformation?"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"11734\n"
]
}
],
"source": [
"print(X.shape[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Task 3\n",
"Choose the best value of the regularization parameter C for logistic regression with an L2 regularizer (sklearn.linear_model.LogisticRegression) using 5-fold cross-validation. Use AUC-ROC as the quality metric. The grid to search over is given below. Report the best regularization parameter value found by cross-validation. Train the classifier with this parameter on the whole training set and compute its quality (AUC-ROC) on the test set."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# sklearn.cross_validation was removed in scikit-learn 0.20; KFold now lives in sklearn.model_selection\n",
"from sklearn.model_selection import KFold\n",
"cv = KFold(n_splits=5, shuffle=True, random_state=241)"
]
},
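{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the grid search for this task can also be written as an explicit loop over C with cross_val_score, which makes the 5-fold procedure visible: for each C the model is trained on four folds, scored by ROC-AUC on the fifth, and the five scores are averaged. This is a sketch, not the notebook's solution: it runs on synthetic data from make_classification (an assumption, standing in for the grant data) and uses the sklearn.model_selection API of current scikit-learn.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import KFold, cross_val_score\n",
"\n",
"# synthetic stand-in for the grant data (assumption, for illustration only)\n",
"X_demo, y_demo = make_classification(n_samples=500, n_features=20, random_state=0)\n",
"cv_demo = KFold(n_splits=5, shuffle=True, random_state=241)\n",
"\n",
"C_grid = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]\n",
"mean_auc = {}\n",
"for C in C_grid:\n",
"    model = LogisticRegression(C=C, solver='liblinear')\n",
"    # five ROC-AUC values, one per held-out fold\n",
"    fold_scores = cross_val_score(model, X_demo, y_demo, scoring='roc_auc', cv=cv_demo)\n",
"    mean_auc[C] = fold_scores.mean()\n",
"\n",
"# the best C maximizes the mean cross-validated ROC-AUC\n",
"best_C = max(mean_auc, key=mean_auc.get)\n",
"print('Mean CV ROC-AUC for each C:', mean_auc)\n",
"print('Best C:', best_C)\n",
"```"
]
},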
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Best regularization parameter value: 0.1\n",
"Test-set ROC-AUC: 0.880340804287\n"
]
}
],
"source": [
"# GridSearchCV (or cross_val_score) with the cv object above as the split generator\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import GridSearchCV\n",
"from sklearn.metrics import roc_auc_score\n",
"\n",
"C_grid = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]\n",
"# liblinear was the default solver before scikit-learn 0.22; the penalty defaults to L2\n",
"logreg = LogisticRegression(solver='liblinear')\n",
"params = {'C': C_grid}\n",
"# 5-fold cross-validation with the KFold splitter defined above, scored by ROC-AUC\n",
"clf = GridSearchCV(logreg, params, scoring='roc_auc', cv=cv)\n",
"clf.fit(X, y_train)\n",
"\n",
"print('Best regularization parameter value:', clf.best_params_['C'])\n",
"\n",
"# with refit=True (the default) the best model has already been retrained on the whole training set\n",
"best_estimator = clf.best_estimator_\n",
"predicted_y = best_estimator.predict_proba(X_test)[:, 1]\n",
"print('Test-set ROC-AUC:', roc_auc_score(y_test, predicted_y))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's try logistic regression with an L1 regularizer (penalty='l1'). Print the number of nonzero coefficients (clf.coef\\_) for each value of the regularization parameter in the grid."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<table max-width=100%><tr><td>C</td><td>Number of nonzero coefficients</td></tr><tr><td>0.001</td><td>0</td></tr><tr><td>0.01</td><td>9</td></tr><tr><td>0.1</td><td>58</td></tr><tr><td>1.0</td><td>667</td></tr><tr><td>10.0</td><td>2460</td></tr><tr><td>100.0</td><td>2870</td></tr></table>"
],
"text/plain": [
"[('C', 'Number of nonzero coefficients'),\n",
" (0.001, 0),\n",
" (0.01, 9),\n",
" (0.1, 58),\n",
" (1.0, 667),\n",
" (10.0, 2460),\n",
" (100.0, 2870)]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# helper: a list of pairs that Jupyter renders as a two-column HTML table\n",
"class ListTable(list):\n",
"    def _repr_html_(self):\n",
"        html = [\"<table max-width=100%>\"]\n",
"        for elem in self:\n",
"            html.append(\"<tr>\")\n",
"            html.append(\"<td>{0}</td>\".format(elem[0]))\n",
"            html.append(\"<td>{0}</td>\".format(elem[1]))\n",
"            html.append(\"</tr>\")\n",
"        html.append(\"</table>\")\n",
"        return ''.join(html)\n",
"\n",
"table = [('C', 'Number of nonzero coefficients')]\n",
"for C in C_grid:\n",
"    # the L1 penalty needs a solver that supports it, e.g. liblinear (the default before scikit-learn 0.22)\n",
"    logreg = LogisticRegression(penalty='l1', C=C, solver='liblinear')\n",
"    logreg.fit(X, y_train)\n",
"    table.append((C, np.sum(logreg.coef_ != 0)))\n",
"ListTable(table)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What test quality does logistic regression with an L1 regularizer and C=0.01 achieve? How many features does it need to reach this quality?"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Test-set ROC-AUC: 0.852546744568\n",
"Number of nonzero coefficients (features used): 9\n"
]
}
],
"source": [
"logreg = LogisticRegression(penalty='l1', C=0.01, solver='liblinear')\n",
"logreg.fit(X, y_train)\n",
"predicted_y = logreg.predict_proba(X_test)[:, 1]\n",
"print('Test-set ROC-AUC:', roc_auc_score(y_test, predicted_y))\n",
"# count the features that received a nonzero weight instead of hard-coding the number\n",
"print('Number of nonzero coefficients (features used):', np.sum(logreg.coef_ != 0))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Task 4\n",
"In this task we will take a close look at various aspects of classifier quality. To do so, take the L2-regularized classifier with the best regularization parameter value (you found it in the previous task), train it on the full training set, and compute the predicted probabilities on the test set. From here on we will work only with these predictions."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
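},
"outputs": [],
"source": [
"# L2-regularized model with the best C found by the grid search in Task 3\n",
"logreg = LogisticRegression(C=clf.best_params_['C'], solver='liblinear')\n",
"logreg.fit(X, y_train)\n",
"predicted_y = logreg.predict_proba(X_test)[:, 1]"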
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compute AUC-ROC, AUC-PR (average_precision_score) and log-loss for the predictions."
]
},
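{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a cross-check of what these metrics measure, here is a small sketch that computes log-loss and average precision by hand on a few made-up labels and probabilities and compares them with sklearn.metrics. Log-loss is the mean over objects of -(y log p + (1 - y) log(1 - p)); when no two scores are tied, average precision equals the mean of the precision values measured at the rank of each positive object. The helpers manual_log_loss and manual_average_precision are hypothetical names introduced for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.metrics import average_precision_score, log_loss\n",
"\n",
"def manual_log_loss(y_true, p, eps=1e-15):\n",
"    # clip probabilities away from 0 and 1 so that the logarithms stay finite\n",
"    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)\n",
"    y = np.asarray(y_true, dtype=float)\n",
"    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))\n",
"\n",
"def manual_average_precision(y_true, scores):\n",
"    # rank objects by decreasing score (assumes no tied scores)\n",
"    order = np.argsort(-np.asarray(scores, dtype=float))\n",
"    y = np.asarray(y_true)[order]\n",
"    # precision among the top-k objects, for k = 1..n\n",
"    precision = np.cumsum(y) / np.arange(1, len(y) + 1)\n",
"    # average the precision values at the ranks of the positive objects\n",
"    return precision[y == 1].mean()\n",
"\n",
"# made-up labels and predicted probabilities\n",
"y_true = np.array([0, 1, 1, 0, 1, 0])\n",
"p = np.array([0.1, 0.8, 0.35, 0.4, 0.9, 0.2])\n",
"print(manual_log_loss(y_true, p), log_loss(y_true, p))\n",
"print(manual_average_precision(y_true, p), average_precision_score(y_true, p))\n",
"```"
]
},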
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<table max-width=100%><tr><td>ROC-AUC</td><td>0.8803408042872678</td></tr><tr><td>AUC-PR</td><td>0.8387493987323461</td></tr><tr><td>Log-loss</td><td>0.4427420368281853</td></tr></table>"
],
"text/plain": [
"[('ROC-AUC', 0.8803408042872678),\n",
" ('AUC-PR', 0.83874939873234611),\n",
" ('Log-loss', 0.4427420368281853)]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.metrics import roc_auc_score, average_precision_score, log_loss\n",
"table = [('ROC-AUC', roc_auc_score(y_test, predicted_y))]\n",
"table.append(('AUC-PR', average_precision_score(y_test, predicted_y)))\n",
"table.append(('Log-loss', log_loss(y_test, predicted_y)))\n",
"ListTable(table)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the ROC and PR curves. Don't forget to label the axes."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEZCAYAAACTsIJzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYXEW5x/HvLwuQQEKACCgEQXYwbErYRAOihEVREBEQ\nL+pVBIN64SqIW0RxRUTBhR1EBa8siogsKlEMkMiaRAkmQJSENew7CXnvH1VDeiYzPT0zffp09/w+\nzzPPdJ+urvPOgfTbVXWqShGBmZlZhyFlB2BmZs3FicHMzDpxYjAzs06cGMzMrBMnBjMz68SJwczM\nOnFiMDOzTpwYrOVJmi/peUnPSHpI0oWSRle8vrOkP0t6WtKTkq6QtHmXOkZLOlXSv3M98yR9X9Ia\njf+LzMrlxGDtIIB9I2IUsDUwHvgigKSdgGuAy4HXAhsAdwLTJG2Qy6wA/AnYHNgz17MTsAiYUGTg\nkoYWWb9ZfzgxWFuJiIeBa4Et8qHvABdExGkR8VxEPBERXwJuBqbkMh8CxgHvjYg5uZ5HI+KkiPhD\nd+eRNETSCbll8bSkWyStI2l9SUslDakoO1XSR/PjwyVNk3SKpEXA1yQ9IWnLivKvyS2gsfn5vpLu\nyOWmSRpfx0tmthwnBmsXApC0LjAJmCFpJOmb/6+7Kf9/wDvy4z2AP0TE830437HAB4C9ImI08GHg\nhR7KRv7pMAG4B1gTOBG4DDi44vX3A1MjYpGkbYFzgI8BqwNnAFfkVo5ZIZwYrB0I+I2kp4H/kD50\nv076IB0CPNjNex4CxubHa/RQppqPAl+IiLkAETErIh6v8b0PRMSPImJpRLwI/JKUZDocko8BfBw4\nIyL+HsnPgJeAHfsYr1nNnBisHQSwX/7mPhHYHXgz8ASwlDS20NVrgUfz40XA63qqXNKheUD6GUm/\nz4fHkRJQf9zf5flUYKSkCZLWJ42TXJ5fez1wbO5GekLSE8C6dP83mdWFE4O1lYj4K3Aa8O2IeA64\nidQ109X7SQPOAH8E9sxdT93V+YuIGJV/9smH7wc26qb4c/l3ZV1rd62yS/2vkLq2Ds4/v8uxQ2oB\nnRQRq1X8rBIRv+ouVrN6cGKwdnQqMEHSDsDxwH9JOlrSKEmrSfo6sAPw1Vz+QtIH/aWSNs0Dy2vk\nweW9ejjH2aSB442UbCVp9Yh4FFgIHCZpqKSPABvWEHNHd1JlNxLAWcAncmtCklaWtI+kVfp4Tcxq\n5sRgbSciFgEXAMdFxDRgT2B/4AFgPqmr5i0RcU8u/zJpAHoOcB3wFDCdNEZxcw+nOYX0Lf/aXP4s\nYKX82seAz5K6qLYAplWGR5cWQ45hBvAsqYvoDxXHb831nQ48Dswl3UVlVhgVuVGPpHOBfYBHIqLb\nW+wk/RDYC3geODwibi8sIDMz61XRLYbzSLcOdkvS3sBGEbEx6e6LnxQcj5mZ9aLQxBARN5DuDOnJ\nu0lNfiJiOjBG0lpFxmRmZtWVPcawDp1v3VtAuhXPzMxKUnZigDxjtUJxgx5mZtarYSWffyFpolCH\ndfOxTiQ5WZiZ9UNEdP3y3auyE8MVwGTgYkk7Ak/mRdCW058/rh1JmhIRU8qOoxn4WizTrtdCYiRp\nFvuupLknr+SffVl+Bnn2udXhO7UuT1KpY8Li88DfqH1mu/J75gG3AXdG8GI/zl93/f1SXWhikHQR\n8DZgrKT7ga8AwwEi4oyIuErS3pLmkWaMfrjIeMxs4CQ2AEaTuqLfBLzcy1u2BFYAtgFWBIaSFhKs\n5cN79fz7n6Q5JdeT1oq6CJjZ/Vt+fTR857Qa6u4qgHsjWNqP97aVQhNDRBxcQ5nJRcZgZtVJrEL+\nwpbtTpoQuLib4vsDo4DZpA/7YXSewNedYcDDwNXA3cDTwDPAvTWEtzSi6p2Ny5HmPx7B3L68xzor\nuyvJ+m5q2QE0kallB9BEpgJIjAc2JX37HUpa
hXVN0kqyL5PG8TYFHsvvG0aabf1kRV1jSOtI/byb\n80wDrozo82q0jTS17ABaXaEzn+tFUniMwQY7ico9GI4kLbexCmmJj7VYtoz4HFLXy1DSMh3TSF21\n8/Lrj9G5f/7ZiJq6dazF9Pez04nBrAQSo4CV89OdSOshPQXsTVrTqes/zPVy+cWkvv2hwK+AG4Al\npEQwD3gsotc+fxsknBjMmoDEqqRlYLrby3lNUh/9UtJNGU+Tdn1bi9T3/lXSHS539FD9wgieqnfM\n1r6cGMwaTOJjpFsph5M+7LcDts0v/5r0Tb7SUFKf/oWkb/Y3NChUG6ScGMzqQOJtpMFXSN03R5Am\nYS7qUvQN+feFpC6cB0jf9mdHcFMDQjXrlRODWQ2kV5eBGQlsDPwvaSOdlUiDuJAmXkJqCbxI2pTn\n7m6qeziCZ4uL1mxg+vvZ6dtVra1ICFgD2Jx0n/3WwBeBR/LxsaSBXZH69xcBJ5AWcHwS+FcEzzc+\ncrPm4RaDtSyJYaR78o8g7X62J6mfv8P1pJbA48AXSAlhYcSr9/CbtTV3JdmgkVsF+5Fu11yBdHfP\nyaQP/tnAVb5l08xdSTYISIwBjgeOy4duAA6N6GkxNTPrDycGa2p5dc0NgQ+QxgIgDQZ/0q0Cs2K4\nK8mahsQI4ChSi6CjFbAdaY2fh4GrgMkRy80PMLNuuCvJWorEcNKCbtuQlnA+v+LlK4AT8+PngLsj\nvLOfWaO4xWANk5PBeOA9wJfy4ZeAP5LmC3wygm43ajKzvvNdSdZ08vjAV0jLRIwH1s4vzQN+Dxzj\nTVHMiuPEYE1FYjvgStJa/xeQuof+BcyN4KUyYzMbLDzGYE1D4lukAeTbgZ0jmF9uRGbWF0N6L2LW\nOwlJfFniTlJS+DqwvZOCWetxi8EGLM9E/j2wF3A68NEIbik3KjPrLycG6xeJtYCjgcOBdfLhvSK4\nurSgzKwu3JVkfSLxSYl7gYeAzwGXkpavHuKkYNYe3GKwqvIKphsCZ5ImpL0BOAv4AXCXbzc1az9O\nDLYciZVJu5hNAf47H54HfBaYFcHckkIzswZwYhikJFYCdgb2Ie1NvDVp74I1gDdWFD0qgp80PkIz\nK4snuA0yEquQNrY5OR+aDlxO2qj+36QdzR6MYGY5EZpZvXiCm/VKYijwTH76S+C4CBaUGJKZNSEn\nhkEitxS+l5++IYL7yozHzJqXE8MgILE+8DNgV+BjTgpmVo0TQ5uTWBO4D1hKWqLCM5LNrCpPcGtj\nEgfAq/sbrOKkYGa1cGJoQ3lBu8nAJaQ7jkZG8ELJYZlZi3BiaDMSGwA3AqcBp0Swv5OCmfWFxxja\ngMRupLkJB+VDTwJ7RnBteVGZWatyi6FFSQyR2FNiLvBnYBxwJDA0gtWcFMysv9xiaDESrwOOAr6Q\nD80m7YNwTQTNP43dzJpeoS0GSZMkzZE0V9Jx3bw+VtLVku6QNFvS4UXG08oktpJ4ElhIWu76bGB4\nBOMjuNpJwczqpbC1kiQNBe4G9iB9mP0dODgi7qooMwVYMSI+L2lsLr9WRCzpUtegXStJ4mDgf4Dt\ngVeATSO4p9yozKwV9Pezs8gWwwRgXkTMj4jFwMXAfl3KPAiMzo9HA491TQqDVR5D2J60ptFsYG9g\njJOCmRWtyDGGdYD7K54vAHboUuYs4M+SHgBGAe8vMJ6WIbEicDuwOTCXtNjdo+VGZWaDRZGJoZY+\nqhOAOyJioqQNgeskbR0Rz3QtmLudOkyNiKn1CbO5SIwnLYU9Atgxguklh2RmLULSRGDiQOspMjEs\nJN1C2WEcLLfE887ASQARcY+k+4BNYfmlGyJiSjFhNgeJ4aRrNBO4DXhXBA+UG5WZtZL8hXlqx3NJ\nX+lPPUWOMdwCbCxpfUkrkCZfXdGlzBzS4DSS1iIlhXsLjKmZTQXuAZ4CDnNSMLOyFNZiiIglkiYD\n15B2Bzsn
Iu6SdER+/QzgG8B5ku4kJanPRcTjRcXUrCQ+R2o9bRPBnWXHY2aDm7f2LJnE50kJ8vQI\nji47HjNrH97as4VIrEoaYN4AWAE4x0nBzJqFE0ODSYwkLXJ3C7AJ8GwEj5UblZnZMl5Er4Ek9iNN\n6gN4ewT/dlIws2bjxNAAeRbzJcBvgLuALSJ4uuSwzMy65a6kgkkMAe4AxgMfiOBXJYdkZlaVE0Px\nDiQlhXdE8MeygzEz641vVy2QxFBgETArgreWHY+ZDS7NuLqqwXnAGOAjZQdiZlYrJ4aCSJwHHEZa\n82he2fGYmdXKYwwFkFhMurafiODKsuMxM+sLJ4Y6k9iSdF1HRPBi2fGYmfWVu5LqSGJl0m5rC5wU\nzKxVOTHUV8fKqDuVGoWZ2QA4MdSJxAeBDYHdIpbbkMjMrGU4MdSBxK7AhcBVwLSSwzEzGxAnhvr4\nAGnnufdGsLjsYMzMBsJ3JQ2QxCeBo4AdI3i57HjMzAbKLYYBkDgYOB24IILpZcdjZlYPXiupnyRO\nAyYDfwd2iuCVkkMyM+uk8LWSJI3sa+XtSGIPictJSeGLwA5OCmbWTnpNDJJ2lvRP4O78fBtJPy48\nsiYksQ9wHbAeaaD5pAiav8llZtYHvXYlSZoBvA/4bURsm4/9IyK2bEB8HTGU3pUksQHpzqM/R/D2\nMmMxM6tFoV1JEfGfLoeW9PVEbeCU/PszpUZhZlawWm5X/Y+kXQAkrQB8irRv8WDzPPDBCGaVHYiZ\nWZFqaTEcCXwSWAdYCGybnw8KEpL4OfBu8CCzmbW/WloMm0TEIZUHcgui7Zd+kDgAOAHYDjgG+G25\nEZmZFa+WwefbOwadqx0rUhmDzxJrAQ+R7sY6MoLrG3l+M7OB6u9nZ48tBkk7ATsDr5F0DNBR+SgG\nx4zpj+bfm/uWVDMbTKp1Ja1ASgJD8+8OT5NuX21bEqOAk4DTnBTMbLCppStp/YiY35hweoyhoV1J\nEn8DdgGGODGYWauqe1dSheclnQxsAYzIxyIidu/ryVqBxBakpLC7k4KZDUa1jBX8ApgDvAGYAswH\nbikupPJIjAB+BDzowWYzG6xq6Uq6LSK2kzQzIrbKx26JiDc3JEIa05Uk8TZgan66cwQ3FXk+M7Oi\nFdmV1LH5zEOS9gUeAFbr64lawFeBa4F9vQubmQ1mtSSGkySNAY4FTgNGA/9TaFTl2AY4zEnBzAa7\nfm3UI2lCRMyoodwk4FTSLa9nR8S3uykzEfg+MBxYFBETuynTiK6kANaO4OEiz2Nm1ij9/ezsMTFI\nGgK8F9gQmB0RV0l6M/ANYM2I2KaXgIaSZg3vQVpj6e/AwRFxV0WZMaSlNfaMiAWSxkbEonr9cX0h\n8RSwXgRPFXkeM7NGKWLZ7TNJm9yvBnxR0qXABcCPSQvp9WYCMC8i5kfEYuBiYL8uZQ4BLo2IBQDd\nJQUzM2usamMMOwJbRcRSSSuR1g3aMCIeq7HudYD7K54vAHboUmZjYLik60mzq38QERfWWH/dSKxE\nGjsxMxv0qiWGxRGxFCAiXpR0Xx+SAlDT5LDhpJVL3w6MBG6SdHNEzO3Deephk/z7mQaf18ys6VRL\nDJtJqtyUZsOK59Exp6GKhcC4iufjSK2GSveTBpxfAF6Q9Fdga2C5xCBpSsXTqRExtZfz98WVwOMR\nLK1jnWZmDZVv5pk44HqqDD6vX+2Nva2fJGkYafD57aS5DzNYfvB5M+B0YE9gRWA6cFBE/LNLXYUN\nPuc9Fy4BNovg7iLOYWZWhrpPcBvownkRsUTSZOAa0u2q50TEXZKOyK+fERFzJF0NzASWAmd1TQoN\nsBcpIf2rwec1M2tK/ZrH0GgFtxj+AvwigjOLqN/MrCxF3K46WLyV1NVlZmbUmBgkjZS0adHBNJrE\nqfnh30oNxMysifSaGCS9G7idNFaApG0lXVF0YEWTGAp8Gvh0BE+WHY+ZWb
OopcUwhTQx7QmAiLid\ntDdDqzsx/z6t1CjMzJpMLYlhcUR0/UbdDvf7nwCc6F3azMw6q2XZ7X9IOhQYJmlj4FPAjcWGVSyJ\n4/PDr5caiJlZE6qlxXA0sCXwEnAR8DTwmSKDaoC3AKd57wUzs+XVsrXndhFxW4Pi6SmGus5jkHgW\n+FAEl9WrTjOzZlP3/RgqKp4KrA38GvhVRMzuV4QDUM/EILEWaaXYsRH0ZVFAM7OWUtgEt7yj2m7A\nIuAMSbMkfanvITaN1YH7nRTMzLrXpyUxJI0HjiMtdDe8sKiWP289WwzfA/47glXrUZ+ZWbMqrMUg\naQtJUyTNJq2EeiNpE55WtTZwdtlBmJk1q1puVz2XtC3nnhGxsOB4GuXOsgMwM2tWg251VYkADong\nonrUZ2bWrOq+H4OkX0fEgV12cetQyw5uTUdiUn54bamBmJk1sWpdSZ/Ov/cFumac5m9mdG8McLnv\nSDIz61mPg88R0bFHwVERMb/yBziqIdHV37q0xzpPZmaFqWVJjHd2c2zvegfSIBsDz5UdhJlZM6s2\nxnAkqWWwYZdxhlHAtKIDK8gSYEbZQZiZNbMe70qStCqwGvAt0qS2jnGGZyKioX309borKd+RdEwE\n369DWGZmTa3uayVJGh0RT0tag24GmyPi8b6H2T/1SAwSk0mb8niNJDMbFOp+uyppie19gFvp/i6k\nDfp6spKNA853UjAzq27QTHCTWACcHcGU+kRlZtbcilwraRdJq+THh0k6RdLr+xNkWSReS1rf6cKy\nYzEza3a13K76U+B5SVsDxwD3Aj8rNKr6mwy8HME9ZQdiZtbsakkMSyJiKfAe4EcRcTrpltVWsjvw\no7KDMDNrBbWsrvqMpBOADwK7ShoKNGwvhjp5Eriu7CDMzFpBLS2Gg4CXgI9ExEOkvvrvFhpV/U0C\nFpcdhJlZK6jpriRJawPbk25bnRERjxQdWJfzD+iuJInFwGoRPFvHsMzMmlqRdyW9H5gOHAi8H5gh\n6cC+h1iql8oOwMysVfTaYpA0E9ijo5Ug6TXAnxq5H8NAWgwS6wP3ASMieLGugZmZNbHCWgykNZIe\nrXj+GMvvz9DMxgD/dFIwM6tNLXclXQ1cI+mXpIRwEPCHQqOqPw88m5nVqNfEEBGflbQ/8JZ86IyI\nuLzYsMzMrCzV9mPYhHRb6kbATOCzEbGgUYHV0RbAqmUHYWbWKqqNMZwLXAkcANwG/LAhEdXfGsDs\nsoMwM2sV1RLDKhFxVkTMiYjv0o9ltiVNkjRH0lxJx1Upt72kJbnLqm4kViMltH/Us14zs3ZWbYxh\nJUnb5ccCRuTnAiIibqtWcV4643RgD2Ah8HdJV0TEXd2U+zZpkLvedzudkH9/vs71mpm1rWqJ4SHg\ne1We79ZL3ROAeRExH0DSxcB+wF1dyh0NXEKaWV1vY4FTIrrdaMjMzLrRY2KIiIkDrHsd4P6K5wuA\nHSoLSFqHlCx2Z9mSG/X0buD4OtdpZtbWapng1l+1fMifChwfafq1qGNXksR4YHXgT/Wq08xsMKhl\nglt/LSTts9xhHKnVUOlNwMWSIHX77CVpcURc0bUySVMqnk6NiKm9nH8nYF4E9/YxbjOzliRpIjBx\nwPUUteezpGHA3cDbgQeAGcDBXQefK8qfB/wuIi7r5rU+r/chMQ14MIL39Tl4M7M20N+1knptMUga\nAhwKbBARJ0paD1g7ImZUe19ELJE0GbgGGAqcExF3SToiv35GX4OtlcRIYGdg16LOYWbWrmpZXfWn\nwFJg94jYTNLqwLUR8eZGBJhj6FPWk5gFvDGipRb7MzOrq8JaDMAOEbGtpNsBIuJxSc2+tecbqUM/\nm5nZYFTLXUkv50lowKv7MSwtLqS6mVZ2AGZmraiWxHAacDmwpqRvkD5wv1loVAMg8cb8sBWSl5lZ\n06ll2e2fS7qVdHcRwH493VnUJFYBpk
c4MZiZ9UctdyWtBzwH/C4fCknrRcR/Co2s/8aTkoOZmfVD\nLYPPV7FsFvNKpFVW7wa2LCqoAToKmF92EGZmraqWrqQ3Vj7PK6x+srCIBm4b4DNlB2Fm1qr6vFZS\nXm57h14LlucJYFbZQZiZtapaxhiOrXg6BNiOtA6SmZm1oVrGGCoHcpeQtvu8tJhwzMysbFUTQ57Y\nNjoijq1WrllI7AqMKDsOM7NW1uMYg6RhEfEKsIvyutgt4POkyXhPlh2ImVmrqtZimEEaT7gD+K2k\nXwPP59eiu+WxyySxHrAX8C5PbjMz679qiaGjlbAS8Bhp+81KTZUYgLcCL5LmXZiZWT9VSwyvkXQM\nrXPrZwCXubVgZjYw1RLDUGBUowIxM7PmUC0xPBQRX21YJAO3CimZmZnZAPR55nMTe2vZAZiZtYNq\niWGPhkUxQBIrAYcAfyg7FjOzVtdjYoiIxxoZyAB9BCCCC8oOxMys1Skiei9Vst42tJaYD9wawQGN\ni8rMrLn19tnZk3YZY1gROLnsIMzM2kHLJwaJYcDawH1lx2Jm1g5aPjFkr0TwUNlBmJm1g3ZIDB/E\n8xfMzOqmHRLDOOAXZQdhZtYu2iExgMcXzMzqph0SwwEsWwnWzMwGqOXnMUgEsG0EdzQ4LDOzpjaY\n5zG8APyr7CDMzNpFOyQG35FkZlZHLZ0YJDYDVgCWlB2LmVm7aOnEAGwN/CuCl8sOxMysXbR6YjgI\neLDsIMzM2kmrJ4b3AD8rOwgzs3bS6onhOeCqsoMwM2snhScGSZMkzZE0V9Jx3bx+qKQ7Jc2UNE3S\nVn2o/nmg+SdimJm1kEITg6ShwOnAJGAL4GBJm3cpdi/w1ojYCvgacGaRMZmZWXVFtxgmAPMiYn5E\nLAYuBvarLBARN0XEU/npdGDdgmMyM7Mqik4M6wD3VzxfkI/15KPUOGYgsSKwZv9DMzOz7gwruP6a\n+/8l7QZ8BNilh9enVDydCvFAfryov8GZmbUTSROBiQOtp+jEsJC0X0KHcaRWQyd5wPksYFJEPNFd\nRRExpfN72ASYG8ErdYvWzKyFRcRUYGrHc0lf6U89RXcl3QJsLGl9SSuQJqRdUVlA0nrAZcAHI2Je\nH+renurdUmZm1g+FthgiYomkycA1pMXuzomIuyQdkV8/A/gysBrwE0kAiyNiQg3Vr0SXJGNmZgPX\nsvsxSHwJeEMEHy4pLDOzpjao9mOQGA6cCCwuOxYzs3bTkomBZaPuk8sMwsysHbVqYtgP+KuX2zYz\nq79WTQxLgUvLDsLMrB21amIwM7OCODGYmVknLZcYJCYAnwCeLjsWM7N21HKJAVgd+AtwQdmBmJm1\no1ZMDACvRHiDHjOzIrRiYvg0aTkMMzMrQNGrqxZhEnBw2UGYmbWrllsrSWIpMCyCpSWHZWbW1AbV\nWklmZlYcJwYzM+vEicHMzDpxYjAzs05aKjFIrAf0eSDFzMxq11KJARgH3Oc7kszMitNqiQHgwbID\nMDNrZ62WGDYDRpYdhJlZO2u1xLA7ML/sIMzM2lnLJAaJ4cAhwCVlx2Jm1s5aJjEAP86/f1VqFGZm\nba6VEsOqwBcjWFJ2IGZm7ayVEsMrwL1lB2Fm1u5aKTGYmVkDODGYmVknTgxmZtaJE4OZmXXixGBm\nZp04MZiZWSetlBh2orXiNTNrSa30QbsmcGvZQZiZtbtWSgwjgGfLDsLMrN21UmIAeLjsAMzM2l2h\niUHSJElzJM2VdFwPZX6YX79T0rZVqjs5gsUFhWpmZllhiUHSUOB0YBKwBXCwpM27lNkb2CgiNgY+\nDvykSpU/LSrWViJpYtkxNAtfi2V8LZbxtRi4IlsME4B5ETE/IhYDFwP7dSnzbuACgIiYDoyRtFZ3\nlUVwT4GxtpKJZQfQRCaWHUATmVh2AE1kYtkBtLoiE8M6wP0VzxfkY72VWbfAmMzMrBdFJoaosZz6\n+T
4zMyvAsALrXgiMq3g+jtQiqFZm3XxsOZKcMDJJXyk7hmbha7GMr8UyvhYDU2RiuAXYWNL6wAPA\nQcDBXcpcAUwGLpa0I/BkRCx3S2pEdG1VmJlZQQpLDBGxRNJk4BpgKHBORNwl6Yj8+hkRcZWkvSXN\nA54DPlxUPGZmVhtFuIfGzMyWaaqZz3WeENfSersWkg7N12CmpGmStiojzkao5f+LXG57SUsk7d/I\n+Bqlxn8fEyXdLmm2pKkNDrFhavj3MVbS1ZLuyNfi8BLCbAhJ50p6WNKsKmX69rkZEU3xQ+pumges\nDwwH7gA271Jmb+Cq/HgH4Oay4y7xWuwErJofTxrM16Ki3J+BK4EDyo67pP8nxgD/ANbNz8eWHXeJ\n12IK8M2O6wA8BgwrO/aCrseuwLbArB5e7/PnZjO1GOo6Ia7F9XotIuKmiHgqP51O+87/qOX/C4Cj\ngUuARxsZXAPVch0OAS6NiAUAEbGowTE2Si3X4kFgdH48GngsIpY0MMaGiYgbgCeqFOnz52YzJQZP\niFumlmtR6aPAVYVGVJ5er4WkdUgfDB1LqrTjwFkt/09sDKwu6XpJt0g6rGHRNVYt1+IsYEtJDwB3\nAp9uUGzNqM+fm0XertpXnhC3TM1/k6TdgI8AuxQXTqlquRanAsdHREgSy/8/0g5quQ7Dge2AtwMj\ngZsk3RwRcwuNrPFquRYnAHdExERJGwLXSdo6Ip4pOLZm1afPzWZKDHWdENfiarkW5AHns4BJEVGt\nKdnKarkWbyLNhYHUn7yXpMURcUVjQmyIWq7D/cCiiHgBeEHSX4GtgXZLDLVci52BkwAi4h5J9wGb\nkuZXDTZ9/txspq6kVyfESVqBNCGu6z/sK4APAVSbENcGer0WktYDLgM+GBHzSoixUXq9FhHxhojY\nICI2II0zHNlmSQFq+/fxW+AtkoZKGkkaaPxng+NshFquxRxgD4Dcn74pcG9Do2weff7cbJoWQ3hC\n3KtquRbAl4HVgJ/kb8qLI2JCWTEXpcZr0fZq/PcxR9LVwExgKXBWRLRdYqjx/4lvAOdJupP0Bfhz\nEfF4aUEXSNJFwNuAsZLuB75C6lbs9+emJ7iZmVknzdSVZGZmTcCJwczMOnFiMDOzTpwYzMysEycG\nMzPrxInBzMw6cWKwpiHplbxkdMfPelXKPluH850v6d58rlvz5J++1nGWpM3y4xO6vDZtoDHmejqu\ny0xJl0lapZfyW0vaqx7ntsHJ8xisaUh6JiJG1btslTrOA34XEZdJegdwckRsPYD6BhxTb/VKOp+0\nvPL3qpQHi2JqAAADoklEQVQ/HHhTRBxd71hscHCLwZqWpJUl/TF/m58p6d3dlHmtpL/mb9SzJL0l\nH3+npBvze/9P0so9nSb/vgHYKL/3mFzXLEmfrojl93njl1mSDszHp0p6k6RvASNyHBfm157Nvy+W\ntHdFzOdL2l/SEEnflTQjb6Dy8Rouy03AhrmeCflvvE1ps6ZN8hIRJwIH5VgOzLGfK2l6LrvcdTTr\npOxNJvzjn44fYAlwe/65lLTcwaj82lhgbkXZZ/LvY4ET8uMhwCq57F+AEfn4ccCXujnfeeRNfYAD\nSR+625GWlBgBrAzMBrYBDgDOrHjv6Pz7emC7ypi6ifE9wPn58QrAf4AVgY8DX8jHVwT+DqzfTZwd\n9QzN1+Wo/HwUMDQ/3gO4JD/+L+CHFe//BnBofjwGuBsYWfZ/b/8070/TrJVkBrwQEa9uOyhpOPBN\nSbuS1v55naQ1I+KRivfMAM7NZX8TEXdKmghsAdyY15FaAbixm/MJ+K6kLwKPkPa1eAdwWaQVSpF0\nGWmHrKuBk3PL4MqI+Fsf/q6rgR/kb/N7AX+JiJckvRMYL+l9udxoUqtlfpf3j5B0O2ld/fnAT/Px\nMcDPJG1EWka5499z16XH3wm8S9L/5ucrklbbvLsPf4MNIk4M1swO
JX373y4iXlFaOnmlygIRcUNO\nHPsC50s6hbSb1XURcUgv9QfwvxFxWccBSXvQ+UNV6TQxV2mv3H2Ar0v6U0R8rZY/IiJeVNp/eU/g\n/cBFFS9PjojreqnihYjYVtII0sJx+wGXA18D/hQR75X0emBqlTr2j/bbl8EK4jEGa2ajgUdyUtgN\neH3XAvnOpUcj4mzgbNLetzcDuyht0NIxPrBxD+fouoHJDcB7JI3I4xLvAW6Q9FrgxYj4BXByPk9X\niyX19GXrV6QNlTpaH5A+5I/qeE8eIxjZw/vJrZhPAScpNYVGAw/klytXzHya1M3U4Zr8PvJ5et8M\n3gY1JwZrJl1vkfsF8GZJM4HDgLu6KbsbcIek20jfxn8Qaa/jw4GL8rLLN5LW4+/1nBFxO3A+qYvq\nZtLS1XcC44HpuUvny8DXu6nrTGBmx+Bzl7qvBd5Kasl07D18Nmm/hNskzSJtTdpdYnm1noi4A5iX\n/9bvkLrabiONP3SUux7YomPwmdSyGJ4H8GcDX+3hWpgBvl3VzMy6cIvBzMw6cWIwM7NOnBjMzKwT\nJwYzM+vEicHMzDpxYjAzs06cGMzMrBMnBjMz6+T/AeLQrdcMKa7HAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10d7bbe10>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEZCAYAAACTsIJzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm8HFWd/vHPQ8IW2UVQISwCQkCWCETEoHFACciiqKwC\nOorIyIw66iD8dIiOCs44IyqKiICICCqIoqMwiEQRFGQJewIBgmFfZQcT8vz+OHVJ9829uX2X3u59\n3q9Xv2531emqbxWkvn3qnDpHtomIiOixTLsDiIiIzpLEEBERdZIYIiKiThJDRETUSWKIiIg6SQwR\nEVEniSE6mqSDJF3UQLmTJH22FTG1gqR5kv6hej9D0pntjinGjiSGGLLq4vWspKckPSDpdEkvG8l9\n2D7L9q4NlDvC9hdHct89JC2S9HR1nPdK+oak8c3YVw33876v+FaRdIKku6sY50r6mqSXNznGGKWS\nGGI4DOxhe2Xg9cB2wBK/2ltwEW2FrarjfDOwD/DhFu5b/a6QlgMuASYBu1YxvhF4BJgy6B2Njv9W\nMUxJDDEibN8HXAhsAS/9yv4nSbcDc6ple0iaJelxSZdL2rLn+5ImSvqZpIckPSLpm9Xy90u6rHqv\n6pfwg5KekHSDpM2rdd+X9B812ztM0u2SHpX0C0mvqlm3SNLhkm6rYjlxEMd5B3A5sHnN9oZyXBtJ\n+l217GFJP5S06qBOenEIMBF4l+3ZVYwP2/6S7d/UHO9ramJ66VxJmibpHkn/Jul+4DRJt0h6R035\n8VWM21Sfd5B0RXW8syS9ZQhxRwdLYojhEpQLILAbcF3Nur2B7YHNJU0GTgUOA9YATgYukLSspHHA\nr4C7gPWBdYCz+9jX24GdgE1srwq8F3isWufqRXVv/svV+lcBdwPn9NrWOyg1nK2AfSUNdLuq5zg3\nq2K4qvo82OOqjeNLVXyTKBf3GQPE0JddgN/YfnYQ33npXFXWBlYH1qPUhM4GDqhZvyvwkO1Zktah\nHNMXbK8OfAo4T9KaQ4g9OlQSQwyHgJ9Lehy4DJhJuSD3OM7232y/QLngnGz7Ly5+ALxAue0xhXKB\n/LTt52y/YPuKPva3AFgZmCRpGdtzbD/QR7mDgFNtz7L9d+Bo4I2S1qspc7ztJ23PBy4FthngWK+V\n9DRwC3BuFT9DOK7LodQ8bF9ie4HtR4CvAUP55b0GcP8Qvld7e2oRcGwVy/PAj4C9JK1QrT+QxYn6\nfcCvbV9YHcdvgauB3YcQQ3SoJIYYDgN7217d9ga2j6ySQI/5Ne/XBz5Z3X54vEom61IunBOBu20v\nWurO7N8BJwLfAh6UdLKklfso2lNL6PneM8CjlF/sPWoTyrPAywAk3Vw14D4l6U01ZSbbXgnYDzhE\n0vrDOS5Ja0s6p7qN8wRwJjCUxuJHgVcP4Xu1Hq4SKPDS7bJbKclhArAnJVlAOd739jreNwGvHGYM\n0UGSGKKZam9X/BX4UpVEel4r2f4xJYGsV916WfoG7W/a3o5yj/+1wKf7KHYfsEHPh6qn1MuBe5ey\naVXb38L2ytXr8j72/1PKrZQZwzyuLwMvAq+rbosdzND+Pf4W2LW6gPfnWaB2/asYuNdTz+2kvYFb\nbN9ZLf8rcGav413Z9n8OIfboUEkM0SqnAB+RNKVqRH6ZpHdIWgm4knI75HhJEyStIGnH3huQtJ2k\nN0halnKxe55ycYVyYe+5PXI28AFJW0tannIR/rPtv/YTW7+9fvpxPHCApHWHcVwrAc8AT1b37ftK\ncI04k5KAzpO0qaRlJL1c0jGSdqvKzAIOkjRO0nRKz6qBnENpW/gIcFbN8h8Ce0p6e7W9FaoG7HX6\n3Ep0pSSGaJa6X6G2r6E00J5IaTC+ndKjhupWy57AxpRfpPOBfWu207OtVYDvVt+fR+mS+V+9y9m+\nBPgccB6l9rAhsH9/sbFkY+xAx3IT8DvgX4dx
XJ+ndPF9AvhlFWt/MfQbX3ULaBdgNnBxtb0rKW0P\nf66KfayK43FKe8H5Szu+arsPAFdQ2kp+XLP8Hkot4hjgoeq4PkmuJaOKmjlRj6TTKL0/HrK9ZT9l\nvkHpzfIs8H7b1/VVLiIiWqPZWf50YHp/KyXtDmxsexNK746TmhxPREQMoKmJwfZllOprf/YCzqjK\nXgmsJmntZsYUERFL1+77gutQ36XxHkpXv4iIaJN2JwZYskdI8xo9IiJiQO0eMOteykNAPdalj77m\nkpIsIiKGwPZgu2O3PTFcABwJnCNpB+Bvth/su6hXAh6yGdFhnbuNpBm2Z7Q7jk6Qc7FYzsViOReL\nDfVHdVMTg6SzKeO/rClpPnAssCyA7ZNt/1rS7pLmUh72+UAz44mIiIE1NTHYPqCBMkc2M4aIiBic\nTmh8jsGZ2e4AOsjMdgfQQWa2O4AOMrPdAXS7pj75PFLKfbK0MUREDIYkD6XxOTWGiIiok8QQERF1\nkhgiIqJOEkNERNRJYoiIiDpJDBERUSeJISIi6iQxREREnSSGiIiok8QQERF1khgiIqJOEkNERNRJ\nYoiIiDpJDBERUSeJISIi6iQxREREnSSGiIiok8QQERF1khgiIqJOEkNERNRJYoiIiDpJDBERUSeJ\nISIi6iQxABJTJf6n3XFERHSC8e0OoN0kNgZ+Dtzb7lgiIjrBmK4xSKwG/C/wE0ASGub2JHGYxN0S\nK4xIkBERLdbUxCBpuqTZkm6XdFQf61eXdL6k6yVdKWmLATY5brgX78X7RsB3gUuqv1sC7xvG9tYG\nfgH8E/BKYLkRCDMiouWalhgkjQNOBKYDmwMHSJrUq9gxwLW2twYOAb4+wGaXB/YcoRA/BGwK/Ctw\nW7VsSLfWJKYB1wE3A28AXui1fjOJ/5PYacjRRkS0SDNrDFOAubbn2V4AnAPs3avMJOBSANtzgA0k\nvWKA7Q60fkASmwBfBva3ed7mWeBsShIbzHYk8QnKsR1ic7TN34GVgZUlxkscBfwReC2wbq/vryhx\nrMR90tBrGFUcb5I4T+L7Eu/sr2YlsbrEkRIfrVm2vMQ+EpsONYaIGD2amRjWAebXfL6nWlbremAf\nAElTgPXpdfHsw8PDCaq6YH4HON7m1ppV2wL7NnqrSmJ54EzK7acdbH7bq8hhwBXA24DtgT/VxiCx\nN3AL8DpgTYZQW6kSz37An4EzgMeAg4HzgAsltpc4RmKKxE4SPwDuAg4C9pPYTuJESsP7D4AfSGwp\ncbzEXRJflzi/SlyrDTa+iOhOzUwMbqDM8cBqkq4DjqTcjnmxn7I9F+y/DTOug4HVWfK21a7A34Er\nJN7aZwDiZdXfVYHfACsAU23m9Sp6LvAZysX6bTZ31WxjHeACyrEfZvNeYCHwU4nvLy1wic2rC/fy\nEkcAc4GPAsdRbot9BFgL+CKLa2MfAmYCpwCzgI2BfwOmUhrdHwK2o5z/KcCvq939EtgMuAh4FXBL\nVbvZpJ/YlpfYQ+I4iXFLO46I6HC2m/ICdgAurPl8NHDUAN+5C1ipj+UGZsDRz8Dk04BpQ4vJK4Hv\nB2/fx7rVwa5e7+lj/X7gZ8DrgK8Hnwge189+dgK/rteys8E/BT8EngFermbdVeCfV/veslom8HvB\nHwRvAD4DvAA8Gzwf/L/gHZZyrJuAV6v+TgWrZt2y4G3By9QsG997Wc26qeBras7Pa8AbglcA7w0+\nE/w4+A/V+lPAK4K3qd1vzfYEntjAf6/l+vp+Xnnl1fcLmFaulS+9PKTtNDHA8cAdwAaUHjqzgEm9\nyqwKLFe9Pwz4fj/bcvnry6qL7vmNXFiW3I7/HXxWP+uWBX+luuC+p9e6PcEPVBe928CfG+wFC3w6\neBZ4m37Wr1xt/07w5Ooi+zD4b+BHwZ8HTwf/pK/E1pr/6bx+TXJwFdul4I+CX12VOaFm/SLwujXn\ndxfwN8F/
rda/AP4NeF3wruBvg+8CfwY8syrz4ySHvPIa2qvjEkMV1G7AHMotj6OrZYcDh1fv31it\nn025/bLq0g6uJjEYvPvgYvFa4EfAGw5Q7lzwe8CHVt/5h+pX/vbg58BHD+1ceAJ4/ABlPlEd2wPg\nD4Mngr8AXqvd/4PVxDipShAfBK/dT5ltq9jvBv8L+CzwY+ArwUdX23gf+Fs1SeQy8LHVOf4GeHfw\nUdW6H4B3Bvf5/0deeeXV92uoiUHVlzuaJNuWxGXAf1Due28LbAWcY/P8wNvgq8ByNv8yQLlzKff8\n9wM+Dvw/YF+bmRLr2M17QlpiAnAE8D2bJ5q1n1aR+D3wHOXJ8gts7uujzNrAApvH+tnG54F/B54G\nvg9cCDxJ6RJ8o81zIxyzgFf1FWtEt+m5dg76e12YGM6gNKRuC1wGbG0zd+nfZ3VKrWUbu66nVF9l\nzwX2AO4HXg0cYXPaCBxGDEF1oV6R0mD+MViid9Q/U/47HUZpdP/p0i7qVfLdntIjbnPKPdl3A5tQ\nHnacSuklBnAVpSvy9yiJaUVgZZsHR+DQIppuLCWGmcBngb0ovXs2aSAxHANsanPowPvic5RG8H2A\nO2w+Pdz4Y2RUSWJl4FlgQ+BiykOP36Nc4KdWRd9G+dGwFiXBbwfsDOxCSQovq8pdB/ye0m34jdX2\nLqN0q34HJWFsXpV9EV7qbfVt4HOUZDEZeBMwj9J7642U7sD/RRluZT7loce3VDG/n9L7a8dq/3dX\n8Uym1BQfH8YpiqgzlhLDHcChwKcp//iWmhgklqX849vV5sbG98kKwAt2Q91uowNIrEx5mHACpVsu\nwBPAX4HfUmoEf6B0M17YyEW4qm2+lpKArgW+SmknA3iGUhPdjHL78RzKsyv7U5ITlMT0Z8pT8Z+t\nll0EvJlSA4GSjHai/H/6O2BZ4B9tFgzm+CN6G0uJ4WnKE8qnAh9k4MSwD/Bxmze3JtpoJ4ntKT3h\nLqHUDmaN9K2f6in1N1bb7rctSGIl4JmeHxfVbaznan9sSCxjs0hic6BnPLFDgJ9Saq6XA//XSDta\nRG9jKTH0VLsvp1ThB0oMFwFn2vywNdFGDI/EoZRax66UBztvpgwnsxrlKfXtKbfHjhtMwpBQasBj\ny1hKDK+hNDY+Qmkk7DcxSGxIaUCcmF9c0Y0kPgB1nR+eotyaehvl9thqlBr044Ap7SCTKIljO0r7\nxraU22CvB74JfCkN6GPDUBNDN07Us1b1d00Y8B7sAZReKkkK0ZVsTq/GuDKwEvCUjSXeSbnQf47y\nI6nHM5Rxya4GrqG0e7yGMoLwuyhDpPyFMs5XRJ+6scYwtWbVfcChXnIAu+p73AAcafOHVsQZ0WoS\nr6f8SJpH6RU1ULvH/wK7A5NsZrckyGibsXQraWrv9faSI6JWjXkXU24jLWpBmBEdT+IVwK3Ay6tF\nfwX2sbmmfVFFsww1MXTb1J7L1rx/coCy+wE/SVKIWMzmYZs1KaPx/hOwHnC1hCU+WQ0Jv3x7o4x2\n67bEsGbN+z6fYJY4XGI6Zaa381sSVUSXsbnN5iTKMx17UoZZ/yqwCHhe4pR2xhft1W2JYe2a93+t\n/vZugP4A8FZKX/YrWhBTRNeyecHmVzZ7UWoP61HmKvmQNGLT6EaX6bZeSSuxeGiC+ZSq8O49K6sH\nirajPFH6W5uF7QgyohvVjCP2cYnNgHMk9rK5pJ1xRet1W40BFs/gNp/SLa/WjpSksRWLZyKLiME7\njTK0yG+rdoeVJN5S9YKKUa5bE8PDlKk1AfaQWKN6/5aache2NKqIUcTmJ5SJtABuAh6kjON0jcRG\n0kvjPMUo1JWJwWatXt3rDqr+vqlngc0DrQ0rYnSxeZIyymzPPOnLVavmAs9K/LJdsUVzdVsbAyy+\nlQSL47+jmoC+p5r799aGFDE62fW3ZKt/ZxMoA/19S2LtDK8x+nRljaHm/Y
bV3+co/bIfAras3kfE\nCLNZZPM0cBLwPHCCxPptDitGWLcnhqervx+jDIP8F5ubbOa1PKqIMaQapfVoyiiw86pJlGKU6PbE\ncAplQLC9gXdW7yOiBWxOYPFkQ4skjkiCGB26MTFc2/Om+tXSU43dkjKiZES0SDVy8auAOZQpTxdJ\nbFr1XFp16d+OTtVtg+gt23u6Q6lu4pHV7boaRUS0QFVTWJsylWmt84AD7XQIaYcxMYjeAHPg3puk\nENEeNrZ5wEbVaMdrACcC72bxIH0fk7rrmjNWdVWNoe91L9UYLrKZ3sKwImIAEntRnn84HtioWvxN\nyphMf7L5SrtiGwvGRI2hHz2jQN7U1igiYgk2F9ica7MxsBNlWtKJwMbA8dJLD6dGB+n6GkNZzwPA\nZ2y+37qoImKoJMYDNwKbUXoavg24pupQEiNkLNcYetzc7gAiojE2C20mUYbcWJnS1XyRxP0S35WY\n3N4Ix7bRUmP4OHCSzQstDCsiRkA1XP4WwK7Am4Gdq1X3A1Nt7mxXbN1uTMz53O44IqI1JHZj8dD5\n29hc3854ulVH3kqSNF3SbEm3Szqqj/VrSrpQ0ixJN0l6fzPjiYjuYPMbFg+SOUvite2MZ6xpWmKQ\nNI7Sj3k6sDlwgKRJvYodCVxnextgGvDfkrpxxNeIGGE2L7J4oMw51bMQh7czprGimTWGKcBc2/Ns\nLwDOoYxpVOt+YJXq/SrAo7YzHWdEAFANiDmeMtfKN4DvSExpa1BjQDN/na8DL80hC2Uazjf0KnMK\n8DtJ91F6JuzbxHgiogtVNYcrgCsktgZe2eaQRr1mJoZGWrWPAWbZniZpI+BiSVvbfqp3QUkzaj7O\ntD1zZMKMiC6yIvCL6tmlqTZ3tDugTiJpGuW2/LA0MzHcS3nCscdESq2h1o7AlwBs3yHpLsokO0uM\nkmp7RnPCjIgusgPwGeDLwFyJZ4DLbHZrb1idofrBPLPns6Rjh7KdZrYxXA1sImkDScsB+wEX9Coz\nG9gFQNLalKSQPssR0adqsL7jqoH6pgP/AUyX2KfNoY0qTX2OQdJuwAnAOOBU28dJOhzA9smS1gRO\npwyotQxwnO0f9bGdPMcQEX2SuA3YBJgHnGDz9fZG1DnygFtEjEnVXBD7U3pCfrxavInN3PZF1RmS\nGCJizKsehJsD3AAcAjxvM6e9UbVPEkNEBCCxBTXD8FftEWNSRw6JERHRajY3U3pcrgUg8alqmO9o\nUGoMETEqVdOIXgNsA8yuhvkeU1JjiIioYbPIZjLwHmAzafgPfo0VqTFExKgncRewAfBKmwfbHE7L\npMYQEdG/nsl/MoRGA5IYImLUq2aB2wl4mcRJEsu2O6ZOlsQQEWOCzR+B7wEfAZYYYSEWSxtDRIwp\nEkcA3wa2trmh3fE0Ux5wi4hogMQ4YCEwq+q1NGql8TkiogHVxD/7AttI3COxbrtj6jQDJgZJUyVd\nLOl2SXdVrwyNHRFdy+anwIFUM01W80lPb3NYHWPAW0mS5lBGLLwWeLFnue1HmhtaXQy5lRQRI05i\nRWA34Fx4aUylk4BP2LzQtsBGSNPaGCRdabv3XM0tlcQQEc0msTPwfuB91aIzgB8Cv7dZ0K64hqOZ\nieF4ykQ7P4PFGdT2tYPd2VAlMUREq0hsCXyKMmx3j0/Z/HebQhqyZiaGmcAShWy/dbA7G6okhoho\ntar3koDzgT2AI2y+096oBifdVSMimkBiBeAKYDKwgc3dbQ6pYU3rrippNUlfk3RN9fpvSasOLcyI\niO5i8zzwJuAp4BVtDqclGnmO4TTgSeC9lL6/TwGnNzOoiIhOYvMcpVdmy26ht1MjsxptZHufms8z\nJF3frIAiIjrUr4AV2x1EKzRSY3hO0k49HyRNBZ5tXkgRER3pb8Dnq0bpUa2RXknbAD8AetoVHgcO\ntd2yWkManyOi3SQmAM8AU20ub3c8jW
h6ryRJqwDYfnKwOxmuJIaI6AQSzwErAO+y+Xm74xnIiCcG\nSQfbPlPSJ6l/jkGAbf/P0EIdvCSGiOgEVa3hKuDlwB4217Q5pKVqRnfVCdXflft5RUSMKTbPAocC\nrwROrxLFqJMH3CIiBkniQOCs6uOqNi2/xd6IZj7g9p+SVpG0rKRLJD0i6eChhRkR0f1sfgS8tvp4\nyWibQ7qR7qq7Vg3OewDzgI2ATzeycUnTJc2u5nI4qo/1n5J0XfW6UdJCSasN5gAiItrB5nbgYGA7\nylzSo0YjiaHnIbg9gHNtP0Efg+r1JmkccCIwHdgcOEDSpNoytr9qe7LtycDRwEzbfxvMAUREtIvN\nD4EZwCES35N4XZtDGhGNJIZfSpoNbAtcImkt4PkGvjcFmGt7nu0FwDnA3kspfyBwdgPbjYjoJMcB\nZwIfBG6UOFCiq9tEB0wMtj9DGUBqW9t/pzzgsbQLfI91gPk1n++pli1B0gRgV+C8BrYbEdExbP5u\nc4iNgEsojdKLJM6U6MoBR/sdK0nSzrYvkfRuqltHknqyoCkT9yzNYLo77Qn8cWm3kSTNqPk40/bM\nQWw/IqLpbHaRWAX4HWUmuPdJnG+zzwBfHRGSpgHThrudpQ2i92ZK9tuTvi/yAyWGe4GJNZ8nUmoN\nfdmfAW4j2Z4xwP4iItqu6rq6XVVb+DpwqMQqrejSWv1gntnzWdKxQ9lO055jkDQemAPsDNxHeVrw\nANu39iq3KnAnsK7t5/rZVp5jiIiuI7EMZbjuE2w+0fr9N+85hi/XdiGVtLqkLw70PdsLgSOBi4Bb\ngB/bvlXS4ZIOryn6TuCi/pJCRES3slkEXAh8XGLXdsfTqEZGV51le5tey66rupi2RGoMEdGtJNZl\ncUeclj4l3bQaA7CMpBVqdrQisNxgdxQRMRbZ3FP1WAJa0wg9XI3M4HYW5fmF0ygjq36AMj9DREQ0\n7mzKwHu32lzZ7mCWpqHGZ0m7URqRAS62fVFTo1py/7mVFBFdrerG+kT1sSWT/Qz12tlIjQHgVmCh\n7YslTZC0su2nBruziIixyuZJidcCtwH/DJ07C1wjvZI+DPwU+E61aF3o/JmLIiI6TTXw3keA3Tt5\n2IxGGp8/CkyF0pJu+zZgrWYGFRExis2iTHa2Y7sD6U8jieEF2y/0fKgeXOv82X0iIjpQ1fB8P/BH\niTlSw7f0W6aRxPB7Sf8PmCDpbZTbSr9sblgREaOXzauBD1Mm+1kgsUObQ6rTyANuywAfAt5eLboI\n+J5bOCdoeiVFxGgksQLQM+rDLOD19sjdkRnqtXOpiaG6bXST7c2GE9xwJTFExGhVNUIfQHlmbAub\nW0Zu20148rka72iOpPWHHFlERPTLxtUc0gCXtjWYSiONHmsAN0u6ijJJD4Bt79W8sCIixpx3A+dJ\nvNrmvnYG0kgbw1t63tYstu3fNy2qJWPIraSIGPUkDBxUU4MY5vZGuI2hGizvI8DGwA3AadXczS2X\nxBARY4HEHylTKa9p8+jwtzfybQxnANtSksLuwFeHGFtERDRmp+rv1u0MYmltDJNsbwkg6VTgL60J\nKSJibLKxxJ3AJMq80W2xtBrDwp43Ve+kiIhovhuAI6vRWNtiaYlhK0lP9byALWs+t2wGooiIMeYk\nYDPgU+0KoKH5GNotjc8RMZZIfAk4hmFOBdrMqT0jIqK1zqz+7tmOnScxRER0GJvZwPnA29qx/ySG\niIjOdBtwqETLx6pLYoiI6EA2nwGuAya0et9JDBERUSe9kiIiOlQ1dhLAijbPD/776ZUUETHaTK7+\nvn2ppUZYagwRER2sZmC98TYvDu67qTFERIxGJ1V/P9aqHTY1MUiaLmm2pNslHdVPmWmSrpN0k6SZ\nzYwnIqLb2JwFnAi0bNqDRmZwGxJJ4ygHswtwL/AXSRfYvrWmzGrAt4Bdbd8jac1mxRMREY1pZo1h\nCj
DX9rxqgp9zgL17lTkQOM/2PQC2H2liPBER3WoN4But2lkzE8M6wPyaz/dUy2ptAqwh6VJJV0s6\nuInxRER0qy8ASBzWip017VYS0Eh3p2WB1wM7U57u+5OkP9u+vXdBSTNqPs60PXMkgoyI6HQ2cyTO\nBo6ROMvm2b7KSZoGTBvu/pqZGO4FJtZ8nkipNdSaDzxi+zngOUl/oExpt0RisD2jSXFGRHSDHwMH\nUOZp+EJfBaofzDN7Pks6dig7auatpKuBTSRtIGk5YD/ggl5lfgFMlTRO0gTgDcAtTYwpIqIr2fwC\nOAN4S7P31bQag+2Fko4ELgLGAafavlXS4dX6k23PlnQhZSq7RcAptpMYIiL6diHwzmbvJE8+R0R0\nCYn9gX+1mdJY+Tz5HBEx2j0EbN/snSQxRER0j6sBJL7dzJ0kMUREdAmbJ4GvAUdITRy5Im0MERHd\nQ2I54AXgxzb7L71s2hgiIkY9m78DX6c8AtAUqTFERHQZiVcC9wMb29zRf7nUGCIixgSbB6q3Tak1\nJDFERHSns4APNmPDSQwREd3pRzQ2WOmgJTFERHSnu4GNJEa8/TWJISKiC9ncXL0d8Tka0ispIqJL\nSVwHbGGzXN/r0yspImKsORD67646VEkMERHd63lgM4lPjuRGcyspIqKLSZwCfMheshF6qNfOJIaI\niC4msR5w90gmhtxKiojobo8Bz4zkBpMYIiK634jeUUliiIjobguBCRLvGKkNpo0hIqLLSVwDvGCz\nY/3ytDFERIxVpwATR2pjSQwREd3vWsr8DCMiiSEiIuokMUREjA7bS2w0EhtK43NERJeTWJ4yPAa1\nD7ql8TkiYoyyeQF4JYDE+sPdXhJDRMQoYPMgcCew2nC3lcQQETF6PD0SG2lqYpA0XdJsSbdLOqqP\n9dMkPSHpuur12WbGExERAxvfrA1LGgecCOwC3Av8RdIFtm/tVfT3tvdqVhwRETE4zawxTAHm2p5n\newFwDrB3H+XS2ygiooM0MzGsA8yv+XxPtayWgR0lXS/p15I2b2I8ERHRgGYmhkYekLgWmGh7a+Cb\nwM+bGE9ExGj3WuDLw91I09oYKO0KtYM6TaTUGl5i+6ma97+R9G1Ja9h+rPfGJM2o+TjT9syRDTci\notudfgHcva/0xS/Ai4uGupWmPfksaTwwB9gZuA+4CjigtvFZ0trAQ7YtaQrwE9sb9LGtPPkcEdEA\niQXABJsFQ712Nq3GYHuhpCOBi4BxwKm2b5V0eLX+ZOA9wBGSFgLPAvs3K56IiGhMxkqKiBhFJAy8\nwuaRjJWf+Zp/AAAGi0lEQVQUERE9HpZYZahfTmKIiBhdXlX9fcVQN5DEEBExitg8QBlMb8iSGCIi\nok4SQ0RE1EliiIgYfTYA1hvql9NdNSJilJF4BrgPtHG6q0ZEBMCBwC1D/XISQ0RE1EliiIiIOkkM\nERGjz0JgyDNjpvE5ImKUkRgPbA26eijXziSGiIhRKoPoRUTEiEhiiIiIOkkMERFRJ4khIiLqJDFE\nRESdJIaIiKiTxBAREXWSGCIiok4SQ0RE1EliiIiIOkkMERFRJ4khIiLqJDFERESdJIaIiKiTxBAR\nEXWamhgkTZc0W9Ltko5aSrntJS2UtE8z44mIiIE1LTFIGgecCEwHNgcOkDSpn3JfAS4EMhnPACRN\na3cMnSLnYrGci8VyLoavmTWGKcBc2/NsLwDOAfbuo9w/A+cCDzcxltFkWrsD6CDT2h1AB5nW7gA6\nyLR2B9DtmpkY1gHm13y+p1r2EknrUJLFSdWizp9nNCJilGtmYmjkIn8C8BmXiadFbiVFRLSdyjW5\nCRuWdgBm2J5efT4aWGT7KzVl7mRxMlgTeBY4zPYFvbaVmkRExBDYHvQP7mYmhvHAHGBn4D7gKuAA\n27f2U/504Je2f9aUgCIioiHjm7Vh2wslHQlcBIwDTrV9q6TDq/Un
N2vfERExdE2rMURERHfqqCef\nG3kgTtI3qvXXS5rc6hhbZaBzIemg6hzcIOlySVu1I85WyIOSRYP/PqZJuk7STZJmtjjElmng38ea\nki6UNKs6F+9vQ5gtIek0SQ9KunEpZQZ33bTdES/K7aa5wAbAssAsYFKvMrsDv67evwH4c7vjbuO5\neCOwavV++lg+FzXlfgf8Cnh3u+Nu0/8TqwE3A+tWn9dsd9xtPBczgON6zgPwKDC+3bE36XzsBEwG\nbuxn/aCvm51UY2jkgbi9gDMAbF8JrCZp7daG2RIDngvbf7L9RPXxSmDdFsfYKnlQsmjkPBwInGf7\nHgDbj7Q4xlZp5FzcD6xSvV8FeNT2whbG2DK2LwMeX0qRQV83OykxDPhAXD9lRuMFsZFzUeuDwK+b\nGlH75EHJopH/JzYB1pB0qaSrJR3csuhaq5FzcQqwhaT7gOuBj7Uotk406Otm03olDUGj/5h798kd\njReBho9J0luBfwTe1Lxw2mpQD0pKGq0PSjZyHpYFXk/pIj4B+JOkP9u+vamRtV4j5+IYYJbtaZI2\nAi6WtLXtp5ocW6ca1HWzkxLDvcDEms8TKZltaWXWrZaNNo2cC6oG51OA6baXVpXsZo2ci22Bc0pO\nYE1gN0kL3OtByS7XyHmYDzxi+zngOUl/ALYGRltiaORc7Ah8CcD2HZLuAjYFrm5JhJ1l0NfNTrqV\ndDWwiaQNJC0H7Af0/od9AXAIvPRk9d9sP9jaMFtiwHMhaT3gZ8D7bM9tQ4ytMuC5sP0a2xva3pDS\nznDEKEsK0Ni/j18AUyWNkzSB0tB4S4vjbIVGzsVsYBeA6n76psCdLY2ycwz6utkxNQY38ECc7V9L\n2l3SXOAZ4ANtDLlpGjkXwL8DqwMnVb+UF9ie0q6Ym6XBczHqNfjvY7akC4EbgEXAKbZHXWJo8P+J\nLwOnS7qe8gP432w/1ragm0jS2cBbgDUlzQeOpdxWHPJ1Mw+4RUREnU66lRQRER0giSEiIuokMURE\nRJ0khoiIqJPEEBERdZIYIiKiThJDBCDpxWq46hsk/UzSSiO8/XmS1qjePz2S244YaUkMEcWztifb\n3gp4Ejh8hLfvft5HdJwkhogl/QnYCEDSRpJ+U41W+gdJm1bL15Z0fjURzKxqqAGqZVdXk8Mc1sZj\niBiyjhkSI6ITSBoHvB24pFr0XeBw23MlvQH4NmX00m8Al9p+l6RlgJ5bT/9o+3FJKwJXSTp3FA9w\nGKNUhsSIACQtBG6kjF0/D9iBMnT1Q8CcmqLL2d5C0kPAOtVEMbXbmQG8s/q4AfB221dVo3tua/sx\nSU/ZXrmZxxMxHKkxRBTP2Z5c/dK/iDLxz28pI1H2N0du3Rj3kqZRahM72H5e0qXACk2MOaIp0sYQ\nUaOay+BfKGP5Pw3cJek9ACq2qopeAhxRLR8naRXKFJKPV0lhM0qtI6LrJDFEFC/dU7U9izLZ/L7A\nQcAHJc0CbqLMnwtlqsi3SrqBMj/AJOBCYLykW4DjKI3YS91XRCdKG0NERNRJjSEiIuokMURERJ0k\nhoiIqJPEEBERdZIYIiKiThJDRETUSWKIiIg6SQwREVHn/wOARqzvQk/y1gAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10d7bbdd8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.metrics import roc_curve, precision_recall_curve\n",
"fpr, tpr, thresholds_roc = roc_curve(y_test, predicted_y)\n",
"plt.plot(fpr, tpr)\n",
"plt.ylabel('True Positive Rate')\n",
"plt.xlabel('False Positive Rate')\n",
"plt.title('ROC-curve')\n",
"plt.show()\n",
"precision, recall, thresholds_pr = precision_recall_curve(y_test, predicted_y)\n",
"plt.plot(recall, precision)\n",
"plt.ylabel('Precision')\n",
"plt.xlabel('Recall')\n",
"plt.title('Precision-Recall Curve')\n",
"plt.show()"
]
},
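  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, let $TP$, $FP$, $TN$, $FN$ be the counts of true/false positives/negatives at a fixed threshold. The plotted quantities then follow the standard definitions:\n",
    "\n",
    "$$\\text{TPR} = \\text{Recall} = \\frac{TP}{TP + FN}, \\qquad \\text{FPR} = \\frac{FP}{FP + TN}, \\qquad \\text{Precision} = \\frac{TP}{TP + FP}.$$\n",
    "\n",
    "Each curve traces these values as the threshold moves across the range of predicted scores."
   ]
  },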
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "Suppose we want a classifier that finds at least 90% of the successful grants, which means a lower bound of 90% on recall. What is the highest precision we can get under this constraint, and at which threshold? To answer, analyze the arrays returned by metrics.precision_recall_curve."
]
},
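  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The question amounts to a constrained choice of the threshold $t$. Here $P(t)$ and $R(t)$ are the precision and recall of the rule \"predict 1 if the score is at least $t$\":\n",
    "\n",
    "$$t^* = \\arg\\max_{t} P(t) \\quad \\text{subject to} \\quad R(t) \\ge 0.9.$$\n",
    "\n",
    "$P(t)$ is not monotonic in $t$. The maximum therefore has to be taken over all thresholds that satisfy the constraint, not only over the one at the boundary."
   ]
  },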
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
   "outputs": [],
   "source": [
    "# precision_recall_curve returns len(thresholds) + 1 points. The last point\n",
    "# (precision = 1, recall = 0) has no threshold, so it is dropped here.\n",
    "feasible = recall[:-1] >= 0.9  # thresholds that satisfy the recall constraint\n",
    "best_idx = np.argmax(np.where(feasible, precision[:-1], -1.0))\n",
    "print('The highest achievable precision under this recall constraint is', precision[best_idx])\n",
    "print('It is reached at threshold', thresholds_pr[best_idx], end=\"\")\n",
    "print('.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "Take the classifier with the threshold from the previous step and compute its F-measure. Why is it low despite the high recall?"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
   "outputs": [],
   "source": [
    "# Classifier at the threshold chosen under the recall >= 0.9 constraint\n",
    "feasible = recall[:-1] >= 0.9\n",
    "best_idx = np.argmax(np.where(feasible, precision[:-1], -1.0))\n",
    "opt_precision = precision[best_idx]\n",
    "opt_recall = recall[best_idx]\n",
    "# F-measure: harmonic mean of precision and recall\n",
    "F_measure = 2 * (opt_precision * opt_recall) / (opt_precision + opt_recall)\n",
    "print('F-measure:', F_measure)\n",
    "print('The F-measure is the harmonic mean of precision and recall, so it is pulled towards the')\n",
    "print('smaller of the two. Precision at the chosen threshold is only', opt_precision, end=\"\")\n",
    "print(', so the F-measure is not high despite the high recall.')"
]
},
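  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small worked example shows how the harmonic mean behaves. The numbers are illustrative and are not taken from the output above:\n",
    "\n",
    "$$F = \\frac{2PR}{P + R}, \\qquad P = 0.70,\\ R = 0.90 \\;\\Rightarrow\\; F = \\frac{2 \\cdot 0.70 \\cdot 0.90}{0.70 + 0.90} = \\frac{1.26}{1.60} \\approx 0.79.$$\n",
    "\n",
    "In general $\\min(P, R) \\le F \\le \\frac{P + R}{2}$, with equality only when $P = R$. A low precision therefore limits the F-measure, however high the recall is."
   ]
  },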
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "Suppose the university wants to submit as few applications as possible that turn out to be unsuccessful, because they badly hurt its reputation. Let us set a lower bound of 80% on precision. What is the maximum recall we can achieve under this constraint, and at which threshold?"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
   "outputs": [],
   "source": [
    "# Precision is not monotonic in the threshold, so maximize recall over all\n",
    "# thresholds whose precision is at least 0.8 (the last point has no threshold).\n",
    "feasible = precision[:-1] >= 0.8\n",
    "assert feasible.any(), 'no threshold reaches precision 0.8'\n",
    "best_idx = np.argmax(np.where(feasible, recall[:-1], -1.0))\n",
    "print('The maximum recall under this precision constraint is', recall[best_idx])\n",
    "print('It is reached at threshold', thresholds_pr[best_idx], end=\"\")\n",
    "print('.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### Task 5\n",
    "In this task we learn to evaluate how well a classifier predicts probabilities, and we look at calibration.\n",
    "\n",
    "Let's start with calibration curves. Suppose an algorithm returns numbers between zero and one. Are they good probability estimates? To check, split the segment $[0, 1]$ into several small segments of equal length. Consider the $i$-th segment $[a_i, b_i]$ and the predictions $p_1, p_2, \\dots, p_k$ that fall into it. Let the corresponding true labels be $y_1, y_2, \\dots, y_k$. If the algorithm outputs correct probabilities, the fraction of ones among these labels should be approximately $(a_i + b_i) / 2$. In other words, plot a curve with the segment midpoints on the X axis and the fraction of positive labels in each segment on the Y axis. For correct probabilities this curve should be close to the diagonal. The function below is meant to draw such curves, but it contains two errors. Find and fix them."
]
},
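  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In formulas, take the bin $B_i = [a_i, b_i)$ (half-open, matching the `preds >= l` and `preds < r` test in the code below). The curve passes through the point\n",
    "\n",
    "$$\\left(\\frac{a_i + b_i}{2},\\ \\frac{1}{|I_i|} \\sum_{j \\in I_i} y_j\\right), \\qquad I_i = \\{\\, j : p_j \\in B_i \\,\\}.$$\n",
    "\n",
    "For a well-calibrated classifier the second coordinate should be close to the first."
   ]
  },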
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
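    "def plot_calibration_curve(y_test, preds):\n",
    "    # Split [0, 1] into n_bins equal bins. For each bin, plot its midpoint (X)\n",
    "    # against the fraction of positive labels among the predictions in it (Y).\n",
    "    bin_middle_points = []\n",
    "    bin_real_ratios = []\n",
    "    n_bins = 20\n",
    "    for i in range(n_bins):\n",
    "        l = 1.0 / n_bins * i\n",
    "        r = 1.0 / n_bins * (i + 1)\n",
    "        bin_middle_points.append((l + r) / 2)\n",
    "        # An empty bin gives NaN here, which matplotlib draws as a gap in the line.\n",
    "        bin_real_ratios.append(np.mean(y_test[(preds >= l) & (preds < r)] == 1))\n",
    "    plt.plot(bin_middle_points, bin_real_ratios)"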
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "Let's check how it works on logistic regression (rename the variables if necessary). Is the calibration curve close to the diagonal?"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEACAYAAABI5zaHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGahJREFUeJzt3XmUXGWdxvHvQ0gYkCVokDmGsMdAUBDBgCDYLEoTlMyo\noJHFILskEhwEdzLDIKCD7IMxRFxA8MiigDGgQIvGEDbDEhNMgGACDqs4bGoy/OaPe0OKprtr6Vt1\nb916PufU6VTVza1f7ul++s1730URgZmZlcsaeRdgZmbZc7ibmZWQw93MrIQc7mZmJeRwNzMrIYe7\nmVkJVQ13Sd+V9KSkBwY45gJJiyXdJ2nHbEs0M7N61dJyvwzo7u9NSeOBrSNiNHAMcElGtZmZWYOq\nhntE/Ab4ywCHHAh8Pz12HjBc0sbZlGdmZo3Ios99JLCs4vlyYJMMzmtmZg3K6oaqej33mgZmZjla\nM4NzPA6Mqni+Sfra60hy4JuZNSAiejegq8oi3K8HJgNXSdoVeD4inuzrwEYKLCNJ0yJiWt51FIGv\nxWq+Fqv5WqzWaMO4arhLuhJ4PzBC0jLgNGAoQERMj4hZksZLWgK8BBzRSCFmZpadquEeERNrOGZy\nNuWYmVkWPEM1Hz15F1AgPXkXUCA9eRdQID15F9Du1KrNOiSF+9zNzOrTaHa65W5mVkIOdzOzEnK4\nm5mVkMPdzKyEHO5mZiXkcDczKyGHu5lZCTnczcxKyOFuZlZCDnczsxJyuJuZlZDD3cyshBzuZmYl\n5HA3M8uBxDoSmzfr/Flss2dmZhUkBAwHNuvjsXn6dV3gbmCPptTg9dzNzOonsSHwdvoPcIDHKh5L\nez1/KoJXq39OY9npcDczq4PEUOBk4BTgYV4f2JVB/nwEgw7YRrPT3TJmZjWSeA9wKfBnYMcIluZb\nUf98Q9XMrAqJdSXOBW4AvgHsX+RgB4e7mdmAJLqBB4C3AO+I4Iosuluazd0yZmZ9kNgIOBfYHTg2\ngptzLqkubrmbmVWQkMRhwIPA/5C01tsq2MEtdzOz10hsAUwHNgLGR3BPziU1zC13M+t4EmtKnAzc\nBfwKGNfOwQ5uuZtZh5PYkWR44/PArhEsybmkTLjlbmYdKV3b5WzgJuBCYN+yBDs43M2sA0m8H7gf\n2BR4ZwTfa4fhjfVwt4yZdYx0Qa+Tgc8BR0dwY84lNY3D3cw6gsR6wHdJFvbaJYI/5VxSU7lbxsxa\nSuItOXzmGGAeyU3TPcse7OBwN7MWktgJeEpipsRbW/SZ/wL8Bjg3gqMj+FsrPjdvDncza6UpwFkk\nLegFElOk5nQPSwyROAO4APhQBDOa8TlF5fXczawl0rVa/ghsHcGzEtuRBO9GwOQIbs/ws94CXAEM\nAz4RwVNZnbvVGs1Ot9zNrFWOBq6N4FmACBYA+wKnA5dLXCHxtsF+SDop6S6SlRw/2M7BPhhVw11S\nt6RFkhZLOrWP90dImi1pvqQHJU1qSqVm1rbSrpfjSSYLvSaCiOAnwLYkuxfdL/F5iWENfs7hwM3A\nFyL4fAQrB1d5+xqwW0bSEOAhkt+uj5P8NpwYEQsrjpkGrBURX5Q0Ij1+44hY2etc7pYx61ASHwNO\njBh4M2iJ0cD5wJbAlAh+WeP5hwHfAj4I/Gv6v4JSaFa3zDhgSUQsjYgVwFXAhF7H/BlYP/3z+sCz\nvYPdzDreFHq12vsSwWLgAODzwHSJayQ2G+jvpF05twGjgPeUKdgHo1q4jwSWVTxfnr5WaQawnaQn\ngPuAE7Mrz8zancT2wFbAdbUcn3bV3ACMBeYD90p8VeKf+jj3HiQ9CrNIWux/za7y9lZtCFItQ2m+\nBMyPiC5JWwG/lLRDRLzQ+8C0C2eVnojoqblSM2tXU4BvR7Cinr+Ujkc/XeKHwDkkQyenRnBDuozA\nZOArwKcimJ151TmR1AV0Dfo8VfrcdwWmRUR3
+vyLwKsRcXbFMbOAMyJiTvr8FuDUiLi717nc527W\nYSTeDDwMjBnsqBWJD5IMnXwY+CvJTdiPRvDIoAstsGb1ud8NjJa0uaRhwMeB63sds4jkhiuSNgbG\nQLkvtpnV7EjghiyGI6Zb3W0P9ABPAruXPdgHo+okJkn7A+cBQ4CZEXGmpGMBImJ6OkLmMpKlM9cA\nzoyIH/VxHrfczTqIxBBgCXBwBHflXU+7ajQ7PUPVzJpCYgLwxQh2zbuWduYZqmZWNDUNf7TmcMvd\nzDKXrhvzK2CzCP6Rdz3tzC13MyuSycB3HOz5ccvdzDIlMRx4FNgugifyrqfdueVuZkVxBDDbwZ4v\n76FqZpmRWAM4ATg871o6nVvuZpal/Ulmj87Nu5BO53A3syxNAS6MqGldKmsi31A1s0xIjCHZiHrT\nTtmEuhV8Q9XM8nYCcKmDvRjccjezQZNYD3gM2CHidXtA2CC55W5mefoUcKuDvTg8FNLMBiUd/jgZ\nOCbvWmw1t9zNbLD2Bf5OcjPVCsLhbmaD5eGPBeQbqmbWMIktgXkkqz++nHc9ZeQbqmaWhxOAyxzs\nxeOWu5k1ROJNJMMfd45gac7llJZb7mbWaocCv3WwF5PD3czqJiG8jV6hOdzNrBFdgIBbc67D+uFw\nN7NGTAEu8vDH4vINVTOri8RmwL0kwx9fzLuesvMNVTNrleOBHzjYi80tdzOrmcTaJMMfd4tgSd71\ndAK33M2sFSYCdznYi8+rQppZTSQ+AJyGV39sCw53MxuQxNuB/wLGAlMjuCnnkqwG7pYxsz5JDJf4\nFvA7kuV8t4vgupzLsho53M3sdSTWlDgeeAhYlyTUvxnB33Muzergbhkze43EvsC5wDPAfhHMz7kk\na5DD3cyQGA2cA2wHnAz81LNP25u7Zcw6WNqvfg4wF/gtMDaC6xzs7c/hbtaB0n7144BFwPok/erf\ncL96ebhbxqzDSOxD0q/+HNDtfvVyqtpyl9QtaZGkxZJO7eeYLkm/l/SgpJ7MqzSzQZEYJrGrxM+A\nGcA0YC8He3kNuLaMpCEkw6H2BR4H7gImRsTCimOGA3OA/SJiuaQREfFMH+fy2jJmLSIxCti14vEu\nYAlwBXBBBH/LsTyrQ6PZWa1bZhywJCKWph9yFTABWFhxzCeBayJiOUBfwW5mzZMu5rUTrw/zYcAd\n6eMrwN0RvJBbkdZy1cJ9JLCs4vlyYJdex4wGhkq6DVgPOD8ifphdiWa2Srq93Va8Psi3BRaQBPm1\nwCnAox7x0tmqhXst3xxDgXcD+wDrAHMl3RERiwdbnJkl/eXAx4GDScL8b6xulV8J3BvBK/lVaEVU\nLdwfB0ZVPB9F0nqvtAx4JiJeAV6RdDuwA/CGcJc0reJpT0T01FuwWaeQ2Ag4jmRzjAXATOD4iDf8\nDFqJSOoi2aN2cOepckN1TZIbqvsATwB38sYbqtsAFwH7AWsB84CPR8Qfep3LN1TNaiDxTuBE4KPA\n1SQ3QB/ItyrLS1NuqEbESkmTgZuAIcDMiFgo6dj0/ekRsUjSbOB+4FVgRu9gN7OBSawBjAemkiyt\nezHw9giezrUwa1veZs8sRxLrApNIWup/JZlc9JMI/pFnXVYczRoKaWZNILE5MBk4AriVJOB/5xEu\nlhWvLWPWIhKSeJ/E1cA96cs7RXBQBHMc7JYlt9zNmkxiLeBjJP3pw4HzgSM8qciayeFu1iTpEgDH\nAkcBDwL/DsyK4NVcC7OO4G4ZswylXS97S1wD3EeynG5XBPtGcKOD3VrFLXezDEisDxwGnEAyJPhi\nYJK7XiwvDnezQZAYSxLoE4FbgM8Av/bNUcubw92sThJrkqyOegLJol0zgO29LIAVicPdrEYSGwNH\nk9wkfYyk6+UaTziyInK4m1Uh8W7g30iWB7ga+LB3MLKi82gZswFI7EmyttI9wJYRHO1gt3bglrtZ\nPyS2B34C
TIzgV3nXY1YPt9yt0NKdh/L43M2BWcBnHezWjhzuVlgS44E/SGzd4s/diKQr5uwIftzK\nzzbLisPdiuwUkn0Cbk9vajZdugTvLJJldy9sxWeaNYPD3QpJYkeSjaAPJVkad7bEPk3+zGEkG0zP\nB77azM8yazaHuxXVicBFEayI4FrgIOBKiYOb8WHpTkjfA14m2afUM0ytrXknJisciX8GFgJbRfBc\nxes7AD8Hzorgogw/TyQ7IO0EfDCCV7I6t9lgeScmK5PjgB9XBjtABPdJ7AHclP4C+GpGLexTSTaB\n39PBbmXhlrsVSrqxxWPA3hH0udF6OpplFsmSusdFsHIQn/dp4GvAbhE80eh5zJql0ex0n7sVzSeA\n+f0FO0AETwN7AaOAqyXWbuSDJA4EzgD2c7Bb2TjcrTDSvu+pwHnVjo3gReDDwEvAzRIb1vlZ7wNm\nAhMieKiBcs0KzeFuRbInsDZwcy0Hp6sxHgbcTTIWfmQtf0/iHcA1wCER3NlgrWaF5nC3IpkKnF/P\nVnTpsZ8DLgfmSGwz0PESmwG/AE6KqO2XiFk78g1VKwSJLYE7gc0ieKnBc0wCziLpapnXx/sjgN8C\n346o3vVjVgS+oWrtbjIws9FgB4jge8BRwI0S3ZXvpcsK/By4zsFuncAtd8tdurn0o8COEfwpg/O9\nF7gOODmCyyWGAjcATwBHevaptRNPYrJ2Ngm4JYtgB4hgrsTewC/SrfHeBawAjnGwW6dwy91yla7p\n8kfg8Ah+l/G5RwGzgeeBD0TwcpbnN2sFt9ytXR0APAfMzfrEESyT2BmICP6W9fnNiszhbnmbCpzX\nrO4SrxVjncqjZSw36R6l2wBX512LWdk43C1PJwIXpzNNzSxDvqFquUhXdvwjMDqCZ/Kux6yoPInJ\n2s2xwNUOdrPmcMvdWi7dq3Qpya5HD+ZcjlmhNa3lLqlb0iJJiyWdOsBx75G0UtJH6i3COs7BwAIH\nu1nzDBjukoYAFwHdwFhgoqRt+znubJIJI26dW7/qWbPdzBpXreU+DlgSEUsjYgVwFTChj+OmkAxn\nezrj+qx8dgPWJ1l218yapFq4jwSWVTxfnr72GkkjSQL/kvQlr91hA5kKXFDPmu1mVr9q4V5LUJ8H\nfCGSO7PC3TLWj3SjjL2B7+VcilnpVVt+4HGSTYhXGUXSeq+0E3CVJIARwP6SVkTE9b1PJmlaxdOe\niOipt2Bra5OBy9L9T82sD5K6gK5Bn2egoZCS1gQeAvYhWQv7TmBiRCzs5/jLgBsi4to+3vNQyA6W\nbpaxFNg5gqX5VmPWPpqyKmRErJQ0GbgJGALMjIiFko5N35/eULXWiQ4Hfu1gN2sNT2KypkvXbF8I\nHBXBb/Kux6ydePkBK7Ju4EWSzanNrAUc7tYKTV2z3czeyN0y1lQS2wG/BLaI4O9512PWbtwtY0X1\nWeDbDnaz1nLL3ZpG4i3AEmBMBE/lXY9ZO3LL3YroGOA6B7tZ63mD7BKS2B84juSXdy2PIf28HsCr\nAzz+r8r7u5MsN2BmLeZumZKRWB9YBHwNeJKBw3eggF71jVHrL4i+Hs9FMKeZ/16zsms0Ox3uJSPx\nTeDNERyZdy1mNnhNWX7A2ovEGGAS8I6cSzGznPmGakmkOxydB5wZwZN512Nm+XK4l8cBwBYk2yKa\nWYdzt0wJSKwFnAtMieAfeddjZvlzy70cpgILI5iddyFmVgweLdPmJN4G3A/sGsGSvOsxs2x5hmrn\nOguY4WA3s0ruc29jEu8l2QJxm7xrMbNiccu9TaW7G10AnBrBC3nXY2bF4nBvX5OAFcAVOddhZgXk\nG6ptSGIDkvVjPhzB3XnXY2bN47VlOojEOcAGERyVdy1m1lxeW6ZDSGwDHA5sl3ctZlZc7nNvIxXr\nx3zdG2CY2UAc7u3lQ8CmeP0YM6vC3TJtomL9mM9EsCLvesys2Nxybx8nAQ
siuDnvQsys+Dxapg1I\njATuA3aJ4OG86zGz1vHaMuV2FvAdB7uZ1cp97gUnsRuwF14/xszq4JZ7gUkMYfX6MS/mXY+ZtQ+H\ne7EdAfwd+FHehZhZe/EN1YKSGE6yfsz4CO7Nux4zy4fXlikZiXOBN0VwTN61mFl+vLZMiUiMBQ4F\nxuZdi5m1J4d7hiSGkiwNMBx4CXi54uvLfbzW39fzgP+M4OkW/xPMrCQc7tn6NElr+2JgHeBNFV83\n7vV8nT6OWfV1AfDfLa7dzEqkpj53Sd0krckhwKURcXav9w8BTgEEvAAcHxH39zqm1H3uEm8CFpNs\noHFP3vWYWTk0bYaqpCEkXQ3dJK3SiZK27XXYI8CeEbE9cDrwnXoLKYGpwK8d7GZWBLV0y4wDlkTE\nUgBJVwETgIWrDoiIuRXHzwM2ybDGwpMYQbKw1y5512JmBrVNYhoJLKt4vjx9rT9HArMGU1Qb+jJw\npdd+MbOiqKXlXvNAeEl7kdxU3L2f96dVPO2JiJ5az11UElsAh+Fhi2aWAUldQNdgz1NLuD8OjKp4\nPoqk9d67oO2BGUB3RPylrxNFxLQGaiy6/wAu9LZ3ZpaFtNHbs+q5pNMaOU/V0TKS1gQeAvYBngDu\nBCZGxMKKYzYFbgUOjYg7+jlP6UbLSLwL+AXw9gheyLseMyufps1QjYiVkiYDN5EMhZwZEQslHZu+\nPx34GrAhcIkkgBURMa7eYtrQmSSTjRzsZlYoXlumQRJ7kwz5HBvBP/Kux8zKyTsxtZCEgLOBLzvY\nzayIHO6N+RjJbNyf5F2ImVlfvLZMndLFwb4OHBfBq3nXY2bWF7fc63cU8EgEt+RdiJlZf3xDtQ4S\n6wJ/BA6I4Pd512Nm5ecbqq1xEnCbg93Mis4t9xpJbESyWNq4CB7Jux4z6wzeQ7XJJM4HiODEvGsx\ns87hPVSbSGJL4BCg9zr2ZmaF5D732pwOnO89Tc2sXbhbpgqJHYGfkywO9mLe9ZhZZ/FomeY5Czjd\nwW5m7cThPgCJfYEtgUvzrsXMrB4O935IrEHSav9SBCvyrsfMrB4O9/4dBLwKXJ13IWZm9fJQyD5I\nDAPOAI6OqH0PWTOzoihFy11iQ4lrJJZJ/EBikvS6fV/rdTSwOILbsqrRzKyV2j7cJXYG7gH+BHwA\nmAPsD9wrsVji2xIHp8sH1HK+9YCvAF9oVs1mZs3WtuPc092QPgOcBhwfwTW93l8DeAfJxt77AHsA\nS4FbSDbzvj2C/+3jvKcBW0dwWFa1mpk1qqPWlklb1zOAMcBBESyp4e8MBXYG9iYJ+3HAA6wO+98B\nGwALgJ0jWJpFrWZmg9Ex4S6xPcn2dj3A1AheafA8awO7kQT93iSt/KeBn0Zw0mDrNDPLQkeEu8QR\nwDeAkyK4PJvKXjv3BiRhP6ev7hozszyUOtwl1gEuJulKOSiCP2RanJlZQZV2bRmJMcA8kjH54xzs\nZmbVFTrcJT4B/Ba4ADg8gpdyLsnMrC0UcoaqxFrAuSTj1j8QwfycSzIzayuFa7mnux7NAd5KMiTR\nwW5mVqdChbvEBGAu8AOSG6d/zbkkM7O2VIhumXSC0ZnAx4ADI5iXc0lmZm2tEOEOjEofO0XwbN7F\nmJm1u7YY525m1qlKO87dzMzq53A3Myshh7uZWQk53M3MSqhquEvqlrRI0mJJp/ZzzAXp+/dJ2jH7\nMs3MrB4DhrukIcBFQDcwFpgoadtex4wHto6I0cAxwCVNqrU0JHXlXUNR+Fqs5muxmq/F4FVruY8D\nlkTE0ohYAVwFTOh1zIHA9wEiYh4wXNLGmVdaLl15F1AgXXkXUCBdeRdQIF15F9DuqoX7SGBZxfPl\n6WvVjtlk8KWZmVmjqoV7rTOceg+wb83MKDMz61O15QceJ1kWYJVRJC3zgY7ZJH3tDSQ59FOSTsu7\nhqLwtVjN12I1X4vBqRbudwOjJW0OPA
F8HJjY65jrgcnAVZJ2BZ6PiCd7n8hLD5iZtc6A4R4RKyVN\nBm4ChgAzI2KhpGPT96dHxCxJ4yUtAV4Cjmh61WZmNqCWLRxmZmatk/kMVU96Wq3atZB0SHoN7pc0\nR9L2edTZCrV8X6THvUfSSkkfaWV9rVLjz0eXpN9LelBST4tLbJkafj5GSJotaX56LSblUGZLSPqu\npCclPTDAMfXlZkRk9iDpulkCbA4MBeYD2/Y6ZjwwK/3zLsAdWdZQlEeN1+K9wAbpn7s7+VpUHHcr\ncCPw0bzrzul7YjiwANgkfT4i77pzvBbTgDNXXQfgWWDNvGtv0vXYA9gReKCf9+vOzaxb7p70tFrV\naxERcyNi1VaC8yjv/IBavi8ApgBXA0+3srgWquU6fBK4JiKWA0TEMy2usVVquRZ/BtZP/7w+8GxE\nrGxhjS0TEb8B/jLAIXXnZtbh7klPq9VyLSodCcxqakX5qXotJI0k+eFetXxFGW8G1fI9MRp4s6Tb\nJN0t6bCWVddatVyLGcB2kp4A7gNObFFtRVR3bma9zZ4nPa1W879J0l7Ap4Hdm1dOrmq5FucBX4iI\nkCTe+D1SBrVch6HAu4F9gHWAuZLuiIjFTa2s9Wq5Fl8C5kdEl6StgF9K2iEiXmhybUVVV25mHe6Z\nTnpqc7VcC9KbqDOA7ogY6L9l7ayWa7ETyVwJSPpX95e0IiKub02JLVHLdVgGPBMRrwCvSLod2AEo\nW7jXci12A84AiIiHJT0KjCGZf9Np6s7NrLtlXpv0JGkYyaSn3j+c1wOHAww06akEql4LSZsC1wKH\nRsSSHGpslarXIiK2jIgtImILkn7340sW7FDbz8fPgPdJGiJpHZKbZ39ocZ2tUMu1WATsC5D2L48B\nHmlplcVRd25m2nIPT3p6TS3XAvgasCFwSdpiXRER4/KquVlqvBalV+PPxyJJs4H7gVeBGRFRunCv\n8Xvi68Blku4jaYieEhHP5VZ0E0m6Eng/MELSMuA0ki66hnPTk5jMzErI2+yZmZWQw93MrIQc7mZm\nJeRwNzMrIYe7mVkJOdzNzErI4W5mVkIOdzOzEvp/phKGqGjNysYAAAAASUVORK5CYII=\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10cb98fd0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"clf = LogisticRegression(C=1.0)\n",
"clf.fit(X, y_train)\n",
"preds = clf.predict_proba(X_test)[:, 1]\n",
"plot_calibration_curve(y_test, preds)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "Let's also look at how often the classifier outputs each range of probabilities."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAEACAYAAAC9Gb03AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFGRJREFUeJzt3X+s3fV93/HnCwhNkzZhiMkY21u8Faq4UgPr6nVJply2\n1gLUAdmk4HRJ2YYmVhaCIq0b5I/mptVIFim0miZQFUjiZouL1azMrGmDoVhLpwbEZBPCjRd7wRKX\nwqVNadaSVrLHe3+c750P7vU9554f91x//HxIX/n7/Zzvj/f9yud1P/dzvt/zTVUhSWrXebMuQJI0\nXQa9JDXOoJekxhn0ktQ4g16SGmfQS1Ljhgr6JOcnOZTk4W55Psli13YoybV9696V5GiSI0l2Tatw\nSdJwLhhyvTuABeAHu+UC7qmqe/pXSrIDuAnYAWwBHk1yRVW9NqF6JUlrNLBHn2QrcB1wP5Dl5r75\nfjcAe6vqRFUdB44BOydTqiRpFMMM3fwy8PNAf6+8gNuTPJ3kgSQXde2XAYt96y3S69lLkmZk1aBP\n8tPAy1V1iNf34O8DtgNXAi8Cn15lN37HgiTN0KAx+ncC1ye5Dngj8JYkv1ZVP7u8QpL7gYe7xReA\nbX3bb+3aXieJ4S9JI6iqlYbNV5Vhv9QsyXuAf11V/zDJ5qp6sWv/CPDjVfUz3YexX6Q3Lr8FeBT4\noTrtIElqlGJblGS+quZnXcdG4Lk4xXNxiufilFGzc9irbqA3dLMc2J9K8o5u+TngVoCqWkiyj94V\nOieB204P+b6CP7zWYk/zaFUtjLkPSWre0EFfVQeBg938B1dZ727g7sF7/LlPDXvsv+wrwLf/Jb1f\nKJKkVaylRz9h937f6NvufhW+PblSZuvgrAvYQA7OuoAN5OCsC9hADs66gLOdX4EwY91fSsJz0c9z\ncYrnYnwGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiD\nXpIaZ9BLUuMMeklq3FBBn+T8JIeSPNwtX5zkQJJvJXkkyUV9696V5GiSI0l2TatwSdJwhu3R30Hv\naU7LjwW8EzhQVVcAj3XLdM+MvQnYAVwD3JvEvxokaYYGhnCSrcB1wP30nhsLcD2wp5vfA9zYzd8A\n7K2qE1V1HDhG70HhkqQZGaa3/cvAzwOv9bVtqqqlbn4J2NTNXwYs9q23CGwZt0hJ0uhWfWZskp8G\nXq6qQ0nmVlqnqipJrfTa8iorN8/3zc91kyRpWZe7c+PuZ9DDwd8JXJ/kOuCNwFuSfAFYSnJpVb2U\nZDPwcrf+C8C2vu23dm0rmB+jbElqX/e83IPLy0k+Nsp+Vh26qaqPVtW2qtoO7AZ+t6o+COwHbu5W\nuxl4qJvfD+xOcmGS7cDlwJOjFCZJmoxBPfrTLQ/DfBLYl+QW4DjwPoCqWkiyj94VOieB26pqtWEd\nSdKUZRY53BvTH+e4u1+FBz9UVZ+fVE2StNElqarK4DVfz2vcJalxBr0kNc6gl6TGGfSS1DiDXpIa\nZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNWzXok7wx\nyRNJDidZSPKJrn0+yWKSQ910bd82dyU5muRIkl3T/gEkSatb9VGCVfUXSa6uqu8luQD4vSTvpvd4\nqHuq6p7+9ZPsAG4CdgBbgEeTXFFVr02pfknSAAOHbqrqe93shcD5wCvd8kqPs7oB2FtVJ6rqOHAM\n2DmBOiVJIxoY9EnOS3IYWAIer6pnu5duT/J0kgeSXNS1XQYs9m2+SK9nL0makVWHbgC6YZcrk7wV\n+EqSOeA+4Be7VX4J+DRwy5l2sXLzfN/8XDdJkpZ1eTs37n4GBv2yqvpukt8C/nZVHewr5H7g4W7x\nBWBb32Zbu7YVzK+tUkk6x3RZe3B5OcnHRtnP
oKtuLlkelkny/cBPAYeSXNq32nuBZ7r5/cDuJBcm\n2Q5cDjw5SmGSpMkY1KPfDOxJch69XwpfqKrHkvxakivpDcs8B9wKUFULSfYBC8BJ4LaqOsPQjSRp\nPWQWOZykzjh0P5Tdr8KDH6qqz0+qJkna6JJUVa10xeOqvDNWkhpn0EtS4wx6SWqcQS9JjTPoJalx\nBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktS4QY8SfGOS\nJ5IcTrKQ5BNd+8VJDiT5VpJHlh832L12V5KjSY4k2TXtH0CStLpVg76q/gK4uqquBH4UuDrJu4E7\ngQNVdQXwWLdMkh3ATcAO4Brg3u4xhJKkGRkYwlX1vW72QuB84BXgemBP174HuLGbvwHYW1Unquo4\ncAzYOcmCJUlrMzDok5yX5DCwBDxeVc8Cm6pqqVtlCdjUzV8GLPZtvghsmWC9kqQ1umDQClX1GnBl\nkrcCX0ly9WmvV+9h32fexcrN833zc90kSW0akJNTNTDol1XVd5P8FvBjwFKSS6vqpSSbgZe71V4A\ntvVttrVrW8H8KPVK0lls3KzPSFsNuurmkuUrapJ8P/BTwCFgP3Bzt9rNwEPd/H5gd5ILk2wHLgee\nHKkySdJEDOrRbwb2dFfOnAd8oaoeS3II2JfkFuA48D6AqlpIsg9YAE4Ct1XVzP5ckSRBZpHDvbGq\ncY67+1V48ENV9flJ1SRJ0zR+7gGEqlrz+I3XuEtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS\n1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktS4YR4Ovi3J40meTfKNJB/u2ueT\nLCY51E3X9m1zV5KjSY4k2TXNH0CStLphnhl7AvhIVR1O8gPA/0xygN436N9TVff0r5xkB3ATsAPY\nAjya5IruIeOSpHU2sEdfVS9V1eFu/s+Ab9ILcFj5SbU3AHur6kRVHQeOATsnU64kaa3WNEaf5G3A\nVcDXuqbbkzyd5IHlh4gDlwGLfZstcuoXgyRpnQ0d9N2wzW8Ad3Q9+/uA7cCVwIvAp1fZ3AeES9KM\nDDNGT5I3AF8C/lNVPQRQVS/3vX4/8HC3+AKwrW/zrV3baeb75ue6SZJ0ysFuGs/AoE8S4AFgoap+\npa99c1W92C2+F3imm98PfDHJPfSGbC4HnvzLe54fp25JOgfM8fpO8MdH2sswPfp3AR8Avp7kUNf2\nUeD9Sa6kNyzzHHArQFUtJNkHLAAngduqyqEbSZqRgUFfVb/HymP5v73KNncDd49RlyRpQrwzVpIa\nZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXubA76zyWpcadZ/xCSNG1DfQXCxjVuTq/05ZuS1JazuUcv\nSRqCQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMGBn2SbUkeT/Jskm8k+XDXfnGSA0m+\nleSRJBf1bXNXkqNJjiTZNc0fQJK0umF69CeAj1TVjwA/AfyrJG8H7gQOVNUVwGPdMkl2ADcBO4Br\ngHuT+JeDJM3IwACuqpeq6nA3/2fAN+k99Pt6YE+32h7gxm7+BmBvVZ2oquPAMWDnhOuWJA1pTT3t\nJG8DrgKeADZV1VL30hKwqZu/DFjs22yR3i8GSdIMDB30SX4A+BJwR1X9af9rVVWs/g1jfkukJM3I\nUN9emeQN9EL+C1X1UNe8lOTSqnopyWbg5a79BWBb3+Zbu7bTzPfNz3WTJOmUg900nvQ646uskITe\nGPx3quojfe2f6tr+fZI7gYuq6s7uw9gv0huX3wI8CvxQ9R2o9z3w43Tyd78KD755El9TXFV+V7Gk\nqRs/92DUzBqmR/8u4APA15Mc6truAj4J7EtyC3AceB9AVS0k2QcsACeB22rQbxNJ0tQM7NFP5aD2\n6CWdY2bZ
o/f6dklqnEEvSY0z6CWpcWf5w8Elafp64+tnL4NekoYy/geps+LQjSQ1zqCXpMYZ9JLU\nOINekhpn0EtS4wx6SWqcQS9JjTPoJalx5/wNU+Pe8ea3X0ra6M75oB/vbjczXtLGZ9BLat7Z/l01\n4xo4Rp/ks0mWkjzT1zafZDHJoW66tu+1u5IcTXIkya5pFS5Ja1NjTGe3YT6M/RxwzWltBdxTVVd1\n028DdM+LvQnY0W1zbxI/8JWkGRoYwlX1VeCVFV5aaYD6BmBvVZ2oquPAMXoPCZckzcg4ve3bkzyd\n5IEkF3VtlwGLfessAlvGOIYkaUyjfhh7H/CL3fwvAZ8GbjnDumcY4Jrvm5/rJknSKQe7aTwjBX1V\nvbw8n+R+4OFu8QVgW9+qW7u2FcyPcmhJOofM8fpO8MdH2stIQzdJNvctvhdYviJnP7A7yYVJtgOX\nA0+OVJkkaSIG9uiT7AXeA1yS5HngY8BckivpDcs8B9wKUFULSfYBC8BJ4LaqOvuvTZKks1hmkcO9\nmxfGOe7uV+HBN0/mGY7j3RnrVyBIG9/4mTNuVkxuH6Nkjte4S1LjDHpJapxBL0mNM+glqXEGvSQ1\nzqCXpMYZ9JLUOINekhrnE6bGNIkn13jTlaRpMujHNom75SRpehy6kaTGGfSS1DiDXpIaZ9BLUuMM\neklqnFfdSNrQJnEJ87luYI8+yWeTLCV5pq/t4iQHknwrySNJLup77a4kR5McSbJrWoVLOpfUmNO5\nbZihm88B15zWdidwoKquAB7rlkmyA7gJ2NFtc28Sh4ckaYYGhnBVfRV45bTm64E93fwe4MZu/gZg\nb1WdqKrjwDFg52RKlSSNYtTe9qaqWurml4BN3fxlwGLfeovAlhGPIUmagLE/jK2qGvBhyRlem++b\nn+smSdIpB7tpPKMG/VKSS6vqpSSbgZe79heAbX3rbe3aVjA/4qEl6Vwxx+s7wR8faS+jDt3sB27u\n5m8GHupr353kwiTbgcuBJ0c8hiRpAgb26JPsBd4DXJLkeeAXgE8C+5LcAhwH3gdQVQtJ9gELwEng\ntqry2iZJmqHMIod7Y/rjHHf3q/DgmyfzFcHj7GPc7Xv78PvopTMbPy9go7zXZ5UX3hkraUWTuiPV\njszsGfSSVuGDdVpg0G8APo5Q0jQZ9BuCvSZJ02PQN2LWfxU4nittXAZ9MzbCXwUboYY2+ItTk2TQ\nSxtWG784/T752TPoJU1ZG7+wzmZ+V7wkNc4evdQwh00EBr02mFlfPdSecW/7VwsMem0wjudKk2bQ\nSxPmcIk2GoNezdkYwz/+ZaKNw6BXgwxZqZ9Br//PIQepTWMFfZLjwP8B/i9woqp2JrkYeBD463RP\nn6qqPxmzTq0Lr9CQWjTuDVMFzFXVVVW1s2u7EzhQVVcAj3XL0lklSY06zbp26XSTuDP29K7c9cCe\nbn4PcOMEjiGtsxpjkjaWSfToH03yVJJ/0bVtqqqlbn4J2DTmMSRJYxj3w9h3VdWLSf4qcCDJkf4X\nq2qVP2Xn++bnukmSdMrBbhrPWEFfVS92//5hkt8EdgJLSS6tqpeSbAZeXnnr+XEOLUnngDle3wn+\n+Eh7GXnoJsmbkvxgN/9mYBfwDLAfuLlb7WbgoVGPIUka3zg9+k3AbyZZ3s9/rqpHkjwF7EtyC93l\nlWNXKUka2chBX1XPAVeu0P7HwE+OU5QkaXJ88IgkNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEv\nSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNm0rQJ7kmyZEkR5P822kcQ5I0\nnIkHfZLzgf8IXAPsAN6f5O2TPo4kaTjT6NHvBI5V1fGqOgH8OnDDFI4jSRrCNIJ+C/B83/Ji1yZJ\nmoFxHg5+JjXcan//u6Mf4pnvG31bSTq3TCPoXwC29S1vo9erP83jbx3/UB
l/F2PvYyPUsFH2sRFq\n2Cj72Ag1TGIfG6GGjbKPjVDDiEetGrIDPuwOkwuA/wX8A+APgCeB91fVNyd6IEnSUCbeo6+qk0k+\nBHwFOB94wJCXpNmZeI9ekrSxTPXO2GFunEryH7rXn05y1TTrmaVB5yLJP+nOwdeT/I8kPzqLOtfD\nsDfUJfnxJCeT/KP1rG89DfkemUtyKMk3khxc5xLXzRDvkUuS/E6Sw925+KczKHPqknw2yVKSZ1ZZ\nZ225WVVTmegN2xwD3ga8ATgMvP20da4DvtzN/x3ga9OqZ5bTkOfi7wJv7eavOZfPRd96vwv8N+Af\nz7ruGf6/uAh4FtjaLV8y67pneC7mgU8snwfgO8AFs659Cufi7wFXAc+c4fU15+Y0e/TD3Dh1PbAH\noKqeAC5KsmmKNc3KwHNRVb9fVcuXnD4BbF3nGtfLsDfU3Q78BvCH61ncOhvmXPwM8KWqWgSoqj9a\n5xrXyzDn4kXgLd38W4DvVNXJdaxxXVTVV4FXVlllzbk5zaAf5sapldZpMeDWehPZLcCXp1rR7Aw8\nF0m20HuT39c1tfpB0jD/Ly4HLk7yeJKnknxw3apbX8Oci88AP5LkD4CngTvWqbaNZs25OY3r6JcN\n++Y8/cLSFt/UQ/9MSa4G/jnwrumVM1PDnItfAe6sqkoSZnXx8fQNcy7eAPwtepcrvwn4/SRfq6qj\nU61s/Q1zLj4KHK6quSR/EziQ5B1V9adTrm0jWlNuTjPoh7lx6vR1tnZtrRnqJrLuA9jPANdU1Wp/\nup3NhjkXPwb8ei/juQS4NsmJqtq/PiWum2HOxfPAH1XVnwN/nuS/A+8AWgv6Yc7FO4F/B1BV/zvJ\nc8APA0+tS4Ubx5pzc5pDN08Blyd5W5ILgZuA09+o+4GfBUjyE8CfVNXSFGualYHnIslfA/4L8IGq\nOjaDGtfLwHNRVX+jqrZX1XZ64/Q/12DIw3Dvkf8KvDvJ+UneRO/Dt4V1rnM9DHMujgA/CdCNSf8w\n8O11rXJjWHNuTq1HX2e4cSrJrd3rv1pVX05yXZJjwKvAP5tWPbM0zLkAfgH4K8B9XU/2RFXtnFXN\n0zLkuTgnDPkeOZLkd4CvA68Bn6mq5oJ+yP8XdwOfS/I0vU7qv6mqP55Z0VOSZC/wHuCSJM8DH6M3\nhDdybnrDlCQ1zkcJSlLjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhr3/wBRsht0jwg0\nPQAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x108ca49e8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.hist(preds, bins=20)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "Now let's train a classifier with a piecewise-linear loss function (hinge loss, as in SVM)."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
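    "from sklearn.linear_model import SGDClassifier\n",
    "# Linear classifier trained with hinge loss (a linear SVM fitted by SGD)\n",
    "clf = SGDClassifier(loss='hinge')\n",
    "clf.fit(X, y_train)\n",
    "# decision_function returns unbounded scores, not probabilities. The sigmoid only\n",
    "# squashes them into (0, 1) so they can be compared with logistic regression outputs.\n",
    "preds = clf.decision_function(X_test)\n",
    "preds = 1.0 / (1.0 + np.exp(-preds))"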
]
},
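  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For comparison, take labels $y \\in \\{-1, +1\\}$ and the margin $m = y \\cdot f(x)$, where $f(x)$ is the linear score (`decision_function`). The two losses are\n",
    "\n",
    "$$L_{\\text{hinge}}(m) = \\max(0,\\ 1 - m), \\qquad L_{\\text{log}}(m) = \\log\\left(1 + e^{-m}\\right).$$\n",
    "\n",
    "The logistic loss is the negative log-likelihood of the model $P(y = 1 \\mid x) = \\sigma(f(x))$, so its scores are fitted as log-odds. The hinge loss is exactly zero for $m \\ge 1$ and has no probabilistic interpretation. Applying $\\sigma$ to SVM scores is therefore just a monotone rescaling. For a confidently wrong prediction, e.g. $m = -5$, the two penalties are similar: $L_{\\text{hinge}} = 6$ and $L_{\\text{log}} = \\log(1 + e^{5}) \\approx 5.01$."
   ]
  },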
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "Examine the distribution of the classifier's outputs. How does it differ from the distribution for logistic regression? How would you explain this?"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEACAYAAABbMHZzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEtFJREFUeJzt3X+s3Xddx/Hny5Uq1cFcZrqurVklnaEECUxXBAlVcKmL\nWRdNtqnMCo0x1h9ojLhiAuUPEaMi+McWBTaLkZqKZBYz55pBI5Ef5eco3NW1anX3Yu8EJvijzta9\n/eN+Sw/X9vbcc+495+x+no/kJJ/v5/v5fs/7fHvPq5/z/Z4fqSokSW35hnEXIEkaPcNfkhpk+EtS\ngwx/SWqQ4S9JDTL8JalBC4Z/knuSzCY5eoF1v5LkqSRX9vTtSXI8ybEkN/b0X5/kaLfu7Uv7ECRJ\ni3Wpmf+9wPb5nUk2Aj8I/FNP3xbgNmBLt81dSdKtvhvYVVWbgc1J/t8+JUmjs2D4V9WHgCcusOqt\nwOvm9e0A9lfVmao6CZwAtiZZB1xeVUe6ce8GbhmqaknSUBZ9zj/JDmC6qj47b9U1wHTP8jSw/gL9\nM12/JGlMVi1mcJI1wOuZO+Xzte4lrUiStOwWFf7Ac4BrgYe70/kbgE8m2crcjH5jz9gNzM34Z7p2\nb//MhXaexC8akqQBVNWiJuKLCv+qOgqsPbec5B+B66vqy0kOAu9J8lbmTutsBo5UVSX5avcfxBHg\nDuD3l+oBrFRJ9lbV3nHXMQk8Fud5LM7zWJw3yMT5Um/13A98GLguyWNJXj1vyNfusKqmgAPAFPBX\nwO46/5Whu4F3AseBE1X1wGILlSQtnQVn/lX1Y5dY/x3zlt8MvPkC4z4JPH+QAiVJS89P+E6uw+Mu\nYIIcHncBE+TwuAuYIIfHXcDTWSbpx1ySlOf8JWlxBslOZ/6S1CDDX5IaZPhLUoMMf0lqkOEvSQ0y\n/CWpQYa/JDVosV/stuySb77o9/5c2pmn4Mzrq+q/lq4iSVp5Ju5DXvA7Q+xhz//CmW+rqgv9AI0k\nrUiDfMhrAsN/mHrWPAmn1xn+klriJ3wlSX0x/CWpQYa/JDXI8JekBhn+ktQgw1+SGmT4S1KDDH9J\napDhL0kNMvwlqUGGvyQ1aMHwT3JPktkkR3v6fjvJI0keTvK+JM/uWbcnyfEkx5Lc2NN/fZKj3bq3\nL89DkST161Iz/3uB7fP6HgSeV1UvAB4F9gAk2QLcBmzptrkrybkvGrob2FVVm4HNSebvU5I0QguG\nf1V9CHhiXt+hqnqqW/wYsKFr7wD2V9WZqjoJnAC2JlkHXF5VR7px7wZuWaL6JUkDGPac/2uA+7v2\nNcB0z7ppYP0F+me6fknSmAz8S15Jfh34n6p6zxLWA+ztaW/rbpKkc5JsY8hwHCj8k/wUcBPwip7u\nGWBjz/IG5mb8M5w/NXSuf+bie987SEmS1IyqOgwcPrec5I2L3ceiT/t0F2t/FdhRVf/ds+ogcHuS\n1Uk2AZuBI1V1Cvhqkq3dBeA7gPsWe7+SpKWz4Mw/yX7g5cBVSR4D3sjcu3tWA4e6N/N8pKp2V9VU\nkgPAFHAW2F3nfyNyN/BHwDOB+6vqgeV4MJKk/vgbvpL0NOdv+EqS+mL4S1KDDH9JapDhL0kNMvwl\nqUGGvyQ1yPCXpAYZ/pLUIMNfkhpk+EtSgwx/SWqQ4S9JDTL8JalBhr8kNcjwl6QGGf6S1CDDX5Ia\nZPhLUoMMf0lqkOEvSQ0y/CWpQYa/JDXI8JekBi0Y/knuSTKb5GhP35VJDiV5NMmDSa7oWbcnyfEk\nx5Lc2NN/fZKj3bq3L89DkST161Iz/3uB7fP67gQOVdV1wEPdMkm2ALcBW7pt7kqSbpu7gV1VtRnY\nnGT+PiVJI7Rg+FfVh4An5nXfDOzr2vuAW7r2DmB/VZ2pqpPACWBrknXA5VV1pBv37p5tJEljMMg5\n/7VVNdu1Z4G1XfsaYLpn3DSw/gL9M12/JGlM
Vg2zcVVVklqqYubs7Wlv626SpHOSbGPIcBwk/GeT\nXF1Vp7pTOo93/TPAxp5xG5ib8c907d7+mYvvfu8AJUlSO6rqMHD43HKSNy52H4Oc9jkI7OzaO4H7\nevpvT7I6ySZgM3Ckqk4BX02ytbsAfEfPNpKkMVhw5p9kP/By4KokjwFvAN4CHEiyCzgJ3ApQVVNJ\nDgBTwFlgd1WdOyW0G/gj4JnA/VX1wNI/FElSv3I+n8dv7vrBMPWseRJOr6uq+e9QkqQVK0lVVS49\n8jw/4StJDTL8JalBhr8kNcjwl6QGGf6S1CDDX5IaZPhLUoMMf0lqkOEvSQ0y/CWpQYa/JDXI8Jek\nBhn+ktQgw1+SGmT4S1KDDH9JapDhL0kNMvwlqUGGvyQ1yPCXpAYZ/pLUIMNfkhpk+EtSgwYO/yR7\nknw+ydEk70nyjUmuTHIoyaNJHkxyxbzxx5McS3Lj0pQvSRrEQOGf5Frgp4EXVdXzgcuA24E7gUNV\ndR3wULdMki3AbcAWYDtwVxJfdUjSmAwawF8FzgBrkqwC1gBfAG4G9nVj9gG3dO0dwP6qOlNVJ4ET\nwA2DFi1JGs5A4V9VXwZ+F/hn5kL/36rqELC2qma7YbPA2q59DTDds4tpYP1AFUvSCpKkhr0Ncr+r\nBiz2OcAvAdcCXwH+LMmresdU1aWKusi6vT3tbd1Nklayxeb34e52zpsWfY8DhT/w3cCHq+pLAEne\nB3wvcCrJ1VV1Ksk64PFu/AywsWf7DV3fBewdsCRJasU2vn5ivPjwH/Sc/zHgxUmemSTAK4Ep4P3A\nzm7MTuC+rn0QuD3J6iSbgM3AkQHvW5I0pIFm/lX1cJJ3A58AngI+BfwhcDlwIMku4CRwazd+KskB\n5v6DOAvsrqqBzlNJkoaXScrguWsEw9Sz5kk4va6qnliyoiRpGQ2fewChqrKYLXyvvSQ1yPCXpAYZ\n/pLUIMNfkhpk+EtSgwx/SWqQ4S9JDTL8JalBhr8kNcjwl6QGGf6S1CDDX5IaZPhLUoMMf0lqkOEv\nSQ0y/CWpQYa/JDXI8JekBhn+ktQgw1+SGmT4S1KDDH9JapDhL0kNGjj8k1yR5L1JHkkylWRrkiuT\nHEryaJIHk1zRM35PkuNJjiW5cWnKlyQNYpiZ/9uB+6vqucB3AceAO4FDVXUd8FC3TJItwG3AFmA7\ncFcSX3VI0pgMFMBJng28rKruAaiqs1X1FeBmYF83bB9wS9feAeyvqjNVdRI4AdwwTOGSpMENOvve\nBPxrknuTfCrJO5J8M7C2qma7MbPA2q59DTDds/00sH7A+5YkDWnVENu9CPj5qvp4krfRneI5p6oq\nSS2wj4us29vT3tbdJEnnHe5ugxs0/KeB6ar6eLf8XmAPcCrJ1VV1Ksk64PFu/QywsWf7DV3fBewd\nsCRJasU2vn5i/KZF72Gg0z5VdQp4LMl1Xdcrgc8D7wd2dn07gfu69kHg9iSrk2wCNgNHBrlvSdLw\nBp35A/wC8CdJVgN/D7wauAw4kGQXcBK4FaCqppIcAKaAs8DuqlrolJAkaRllkjJ47hrBMPWseRJO\nr6uqJ5asKElaRsPnHkCoqixmC99rL0kNMvwlqUGGvyQ1yPCXpAYZ/pLUIMNfkhpk+EtSgwx/SWqQ\n4S9JDTL8JalBhr8kNcjwl6QGGf6S1CDDX5IaZPhLUoMMf0lqkOEvSQ0y/CWpQYa/JDXI8JekBhn+\nktQgw1+SGmT4S1KDhgr/JJcl+XSS93fLVyY5lOTRJA8muaJn7J4kx5McS3LjsIVLkgY37Mz/tcAU\nUN3yncChqroOeKhbJskW4DZgC7AduCuJrzokaUwGDuAkG4CbgHcC6bpvBvZ17X3ALV17B7C/qs5U\n1UngBHDDoPctSRrOMLPv3wN+FXiqp29tVc127Vlgbde+BpjuGTcNrB/iviVJQ1g1yEZJfhh4vKo+\nnWTbhcZU
VSWpC607N+TC3Xt72tu6myTpvMPdbXADhT/wEuDmJDcB3wQ8K8kfA7NJrq6qU0nWAY93\n42eAjT3bb+j6LmDvgCVJUiu28fUT4zcteg8DnfapqtdX1caq2gTcDnygqu4ADgI7u2E7gfu69kHg\n9iSrk2wCNgNHBrlvSdLwBp35z3fuFM5bgANJdgEngVsBqmoqyQHm3hl0FthdVQudEpIkLaNMUgbP\nXSMYpp41T8LpdVX1xJIVJUnLaPjcAwhVlUuPO8/32ktSgwx/SWqQ4S9JDTL8JalBhr8kNcjwl6QG\nGf6S1CDDX5IaZPhLUoMMf0lqkOEvSQ0y/CWpQYa/JDXI8JekBhn+ktQgw1+SGmT4S1KDDH9JapDh\nL0kNMvwlqUGGvyQ1yPCXpAYZ/pLUoIHCP8nGJB9M8vkkn0vyi13/lUkOJXk0yYNJrujZZk+S40mO\nJblxqR6AJGnxBp35nwF+uaqeB7wY+LkkzwXuBA5V1XXAQ90ySbYAtwFbgO3AXUl81SFJYzJQAFfV\nqar6TNf+D+ARYD1wM7CvG7YPuKVr7wD2V9WZqjoJnABuGKJuSdIQhp59J7kWeCHwMWBtVc12q2aB\ntV37GmC6Z7Np5v6zkCSNwaphNk7yLcCfA6+tqn9P8rV1VVVJaoHNL7Jub097W3eTJJ13uLsNbuDw\nT/IM5oL/j6vqvq57NsnVVXUqyTrg8a5/BtjYs/mGru8C9g5akiQ1YhtfPzF+06L3MOi7fQK8C5iq\nqrf1rDoI7OzaO4H7evpvT7I6ySZgM3BkkPuWJA1v0Jn/S4FXAZ9N8umubw/wFuBAkl3ASeBWgKqa\nSnIAmALOAruraqFTQpKkZZRJyuC5awTD1LPmSTi9rqqeWLKiJGkZDZ97AKGqculx5/lee0lqkOEv\nSQ0y/CWpQYa/JDXI8JekBhn+ktQgw1+SGmT4S1KDDH9JapDhL0kNMvwlqUGGvyQ1yPCXpAYZ/pLU\nIMNfkhpk+EtSg4b6AfcJ9eXeH5IfxGJ/FEFSm+Z+iOXpaSWGP8P9Ko65L2kxhv8VrnHwtI8kNcjw\nl6QGGf6S1KAVes5/vJbqIpAXnqXl83S+WLsURhr+SbYDbwMuA95ZVb81yvsfreEvAk3CH6f/AWlS\nLc3z4+l5sXYppGo0+ZLkMuDvgFcCM8DHgR+rqkd6xtRw/xhrnoTT3zgZ7/ZZij+qce8jExH+SbZV\n1eFx1zGslfKKcFIex/B5MQnPsaXbx2KP5yhn/jcAJ6rqJECSPwV2AI8stNF4tDsbmG8SXn10VkTg\nrZRXhCvncbRrlOG/HnisZ3ka2DrC+9dAJmNWMxlBMSmTgkmY7S6FSamjTaMM/z7/pX/gK4PfxZOX\nD76tJttKCTxpMowy/GeAjT3LG5mb/c/zwWcPf1fDPlGX4om+UvYxCTUsxT4moYZJ2cck1DAp+5iE\nGpZqH4u8xxFe8F3F3AXfVwBfAI4w74KvJGk0Rjbzr6qzSX4e+Gvm3ur5LoNfksZjZDN/SdLkGMvX\nOyTZnuRYkuNJfu0iY36/W/9wkheOusZRudSxSPIT3TH4bJK/TfJd46hzufXzN9GN+54kZ5P8yCjr\nG6U+nx/bknw6yeeSHB5xiSPTx/PjqiQPJPlMdyx+agxljkSSe5LMJjm6wJj+c7OqRnpj7pTPCeBa\n4BnAZ4DnzhtzE3B/194KfHTUdU7Qsfhe4Nlde/tKPBb9HIeecR8A/hL40XHXPca/iSuAzwMbuuWr\nxl33GI/FXuA3zx0H4EvAqnHXvkzH42XAC4GjF1m/qNwcx8z/ax/2qqozwLkPe/W6GdgHUFUfA65I\nsna0ZY7EJY9FVX2kqs69/fVjwIYR1zgK/fxNAPwC8F7gX0dZ3Ij1cyx+HPjzqpoGqKovjrjGUenn\nWPwL8Kyu/SzgS1V1doQ1jkxVfQh4YoEhi8rNcYT/hT7stb6PMSsx9Po5Fr
12Afcva0XjccnjkGQ9\nc0/8u7uulXqxqp+/ic3AlUk+mOQTSe4YWXWj1c+xeAfwvCRfAB4GXjui2ibRonJzHN/q2e+Tdv4b\nX1fik73vx5Tk+4HXAC9dvnLGpp/j8DbgzqqqzP1O50r91FU/x+IZwIuYe9v0GuAjST5aVceXtbLR\n6+dYvB74TFVtS/Ic4FCSF1TVvy9zbZOq79wcR/j382Gv+WM2dH0rTV8ffOsu8r4D2F5VC73se7rq\n5zhcD/xp9/vMVwE/lORMVR0cTYkj08+xeAz4YlWdBk4n+RvgBcBKC/9+jsVLgN8AqKq/T/KPwHcC\nnxhJhZNlUbk5jtM+nwA2J7k2yWrgNmD+E/gg8JMASV4M/FtVzY62zJG45LFI8u3A+4BXVdWJMdQ4\nCpc8DlX1HVW1qao2MXfe/2dXYPBDf8+PvwC+L8llSdYwd3FvasR1jkI/x+IYc98UTHd++zuBfxhp\nlZNjUbk58pl/XeTDXkl+plv/B1V1f5KbkpwA/hN49ajrHIV+jgXwBuBbgbu7We+ZqrphXDUvhz6P\nQxP6fH4cS/IA8FngKeAdVbXiwr/Pv4s3A/cmeZi5yezrqurLYyt6GSXZD7wcuCrJY8AbmTsFOFBu\n+iEvSWqQv+ErSQ0y/CWpQYa/JDXI8JekBhn+ktQgw1+SGmT4S1KDDH9JatD/AfTRbl5Vxx6zAAAA\nAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x108c7b748>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"The distribution of the outputs differs from the distribution of the logistic regression outputs.\n",
"This classifier almost always outputs values close to 0 or 1.\n",
"This can be explained by the fact that this method penalizes high confidence in a wrong\n",
"answer less than logistic regression does (because the hinge loss is smaller than the logistic loss for\n",
"large negative margins). Therefore this method \"doubts\" its predictions less often.\n"
]
}
],
"source": [
"plt.hist(preds, bins=20)\n",
"plt.show()\n",
"print('The distribution of the outputs differs from the distribution of the logistic regression outputs.')\n",
"print('This classifier almost always outputs values close to 0 or 1.')\n",
"print('This can be explained by the fact that this method penalizes high confidence in a wrong')\n",
"print('answer less than logistic regression does (because the hinge loss is smaller than the logistic loss for')\n",
"print('large negative margins). Therefore this method \"doubts\" its predictions less often.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's plot the calibration curve. Do you see any problems with it?"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEACAYAAABI5zaHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYHVW57/Hv7ySgMggqChqiQYwQQBJAgqIeW+HRFuGi\nOGAcEMdcFVHUI+r1SpzFowyCIiIqDkecgqAC8Rylj4BMkRCmBBMg5yaAKKMgiIm894+qJjud7r1r\n713Trv59nqefdO1dXfWmkn579VrrXUsRgZmZNcu/VB2AmZnlz8ndzKyBnNzNzBrIyd3MrIGc3M3M\nGsjJ3cysgTomd0nDkpZLWiHp6HHef5yksyQtlXSZpF2LCdXMzLJqm9wlTQFOBoaBXYB5kmaNOe1j\nwJURMRs4DDixiEDNzCy7Ti33ucDKiFgVEWuBM4GDx5wzC7gAICJuAGZIemLukZqZWWadkvs0YHXL\n8Zr0tVZLgUMAJM0FngZsn1eAZmbWvU7JPcvaBF8Atpa0BDgCWAL8s9/AzMysd1M7vH8LML3leDpJ\n6/0REXEf8NbRY0k3AzeNvZAkL2JjZtaDiFC3X9MpuS8GZkqaAdwKHArMaz1B0lbAgxHxD0nvAP47\nIu7PK8AmkrQgIhZUHUcd+Fms52exnp/Fer02jNsm94hYJ+kIYBEwBTg9IpZJmp++fyrJLJrvpAFc\nC7ytl0DMzCw/nVruRMR5wHljXju15fNLgJ3yD83MzHrlCtVqjFQdQI2MVB1AjYxUHUCNjFQdwKBT\nWZt1SAr3uZuZdafX3OmWu5lZAzm5m5k1kJO7mVkDObmbmTWQk7uZWQM5uZuZNZCTu5lZAzm5m5k1\nkJO7mVkDObmbmTWQk7uZWQUktpT4bFHXd3I3M6vGs4Ghoi7u5G5mVo19gMuKuriTu5lZNZzczcwa\naC5weVEXd3I3MyuZxPbAJsCqou7h5G5mVr59gMsiKGy3pI7JXdKwpOWSVkg6epz3t5F0vqSrJF0r\n6fBCIjUza459KLBLBjokd0lTgJOBYWAXYJ6kWWNOOwJYEhFzSKb1fFlSx423zcwmsbkUOJgKnVvu\nc4GVEbEqItYCZwIHjznnNuCx6eePBe6MiHX5hmlm1gwSU4C9gCuKvE+nFvY0YHXL8RqSXydanQb8\nVtKtwJbAa/MLz8yscXYFbo3g7iJv0im5Z+ns/xhwVUQMSdoR+E9JsyPivrEnSlrQcjgSESOZIzUz\na4a2XTKShsihcrVTcr8FmN5yPJ2k9d5qX0jWR4iIGyXdDOwELB57sYhY0HOkZmbN0LZ4KW30jowe\nSzqml5t06nNfDMyUNEPSpsChwDljzlkO7J8GsS1JYr+pl2DMzCaBQitTR7VtuUfEOklHAIuAKcDp\nEbFM0vz0/VOBzwHflrSU5IfFhyPiroLjNjMbOBJbADsCVxd+r4jC5tBveCMpIkKl3MzMrIYkXgh8\nIYLnZv+a3nKnK1TNzMpTSpcMOLmbmZWp8MrUUU7uZmblccvdzKxJJKYBj6Kk2YRO7mZm5ZgLXF7k\nSpCtnNzNzMpRWn87OLmbmZWl8JUgW3meu5lZwdKVIO8GZkTQVZGn57mbWeEk3EDrzSzg9m4Tez+c\n3M0skzSxXy4xt+pYBlBpUyBHObmbWVazgGcDb686kAFUan87OLmbWXYHkqwK+2qJx1QdzIBxy93M\nausg4OskS4GP3W7TJiCxOTATWFrmfZ3czawjiScAs4ELgDOAN1cb0UDZE7g2gofKvKmTu5llcQDw\n2wj+DpwFPEfiKRXHNChK75IBJ3czy+Yg4BcAETwALATeUGlEg6PUytRRTu5m1pbEpsBLgF+1vHwG\n8GbPe8/ELXczq6UXADdE8KeW1y4CNiPpT7YJSDwZ2BxYWfa9ndzNrJODgF+2vhDBw8B38cBqJ6Wu\nBNmqY3KXNCxpuaQVko4e5/0PSVqSflwjaZ2k
rYsJ18zKlHa7PNLfPsZ3gXlpt42Nr5L+duiQ3CVN\nAU4GhoFdgHmSZrWeExFfiog9ImIP4KPASETcU1TAZlaqWcAmjDNHO4KbgGUkM2lsfKVXpo7q1HKf\nC6yMiFURsRY4k/bFC68HfphXcGZWuQOBX7bpVvCc9wlI/AuwN3VsuQPTgNUtx2vS1zYiaTPgpcDP\n8gnNzGpgoi6ZUT8BXiSxTUnxDJKdgTsiuKOKm0/t8H43gwAHARe165KRtKDlcCQiRrq4vpmVaExV\n6rgi+KvEr4B5wEllxTYgepoCKWkIGOr35p2S+y3A9Jbj6SSt9/G8jg5dMhGxIHNkZla11qrUds4A\nPoeT+1g99benjd6R0WNJx/Ry807dMouBmZJmSNoUOJRkVbgNSNoK+Ffg7F6CMLNa6tQlM+o3wHYS\nuxYcT24kTpE4vODbVFK8NKptco+IdcARwCLgeuBHEbFM0nxJ81tOfQWwKCIeLC5Us2pIvF3iSVXH\nUaYJqlLHFcE/ge8zIAOrEs8l6Ub6eLr9XRH32AzYCbiqiOtnisF7qJq1J/En4AsRnFB1LGWR2A/4\nXAT7ZDx/FkkL/qkRrCs0uD6k8/Z/T7J08btI/l1/XsB9ngecEMHe/V/Le6ia5U5iO2Bb4JCqYynZ\nRlWp7USwjGQ8br/CIsrH60jm7X8POA44qqD7VNolA07uZp3MJmnp7S6xbdXBlKFDVWo7tZ7znu4e\n9QXgqHT5hIXADIlnF3C7yipTRzm5m7U3m+Sb9Dwmz+5DE1aldnAmcIDEVvmHlIsPAFdEcCFA2n30\nFYppvbvlblZzc0gGxRYyebpmOlWljiuCO4HfAq8pJKo+pN1rHwDGro/1TWBYYvsc77UtsBWwIq9r\n9sLJ3ay92SQt2POAfSUmw6J4vXTJjKpr18xngG9FcGPrixHcS9L/fkSO9xpdCfLhHK/ZNSd3swmk\nfbRPB66P4H6SSs0Dq42qWFmqUjs4D9hJYsf8ouqPxBySf7fPTHDKV4C3S2yR0y0r728HJ3ezdnYF\n/hjBP9LjydA1k7UqdVzps/ohcFiuUfUoHRw+Dvhk2krfSLq65Qj5/cZR2UqQrZzczSY2mw2LUH4B\n7C+xeUXxlKGfLplRZwCHpasiVu0gkqmsp3U473jg/f0WNVW9EmSrOjx8s7qaQ8uMkQjuIvmmfWll\nERWom6rUDpYA95Nsz1eZ9O/zJeCDGQqrfg/cRf/dbs8E7ongz31ep29O7mYTGx1MbbUQeGUFsZRh\nvL1Su5bOsqnDwOq7gRsjOL/TiWnMx9P/tMjKp0COcnI3G0faVztecv858PKGbi3XVVVqBz8AXllV\nF1Y6MPwx4INdfNnPgKdL7NXHrWvR3w5O7mYTmQHcN3ajhQhuBZYDL6oiqKL0UZU6rghuAy6hut9y\njgF+EsH1Wb8ggrX0X9TklrtZzY0WL42nibNmdqa3qtR2KumakdiZZNXHBT18+TdJqmzH3XGuw30f\nQ7LX9JIe7ps7J3ez8Y3XJTPqLOAVRS0XW5GD6KEqtYOzgT2lDTb8KcO/A8dG8JduvzCCe+i9qGkO\nsCyCWix97uRuNr6x0yAfkVY53gbsW2pExcqtS2ZUOlf+p8Ab87xuOxL7k6yN08+uUCeSFDV1O15Q\nmy4ZcHI3m8gG0yDH0ZiumRyqUts5A3hz2qdfqPQ3qeOAD0fwUK/XSYuaLqT7LqVaVKaOcnI3GyNd\nP+aJsOE6JGMsBA4pI2mVoK+q1A4uIckzcwu49lhvBe4m6Tbr13EkRU3d5Ei33M1qbnfgmnT7uIlc\nBzwE7FlOSIU6kJy7ZEalffjfpeCBVYnHAp8CPpDTuMHFwL3AyzPe/4nA44Ebcrh3LpzczTbWbjAV\neCRpDXzXTI5Vqe18D3itxKMKvMdHgUUR/CGPi6X/vseRLBOcxVySteIrXQmyVcfkLmlY0nJJKySN\nXQt59Jwh
SUskXStpJPcozcrVbhpkq4FP7iRVqX/styq1nQj+B7iaZNA2dxIzgHeSFC3l6afAMyT2\nyHBurfrboUNylzQFOBkYJpm/OU/SrDHnbA18FTgoInYDXl1QrGZl6dhyTy0Gtkg3hx5Uuc+SmUCR\nc96/AJyYFpjlJi1qOolsRU21qUwd1anlPhdYGRGrImItyTZaY7caez3ws4hYAxARd2A2oCSmkjRk\nrul0bvor+FkMaOu9pSo1ryUH2vkZ8IK896GV2Bd4HskCYUU4DTiwXVFT+hznMkgtd2AasLrleE36\nWquZwOMlXSBpsaQ35RmgWcl2Atakm3NkMcgLiRVRlTqu9HmeTdIYzEU6k+V44GMRPJDXdVtFcDfw\nfeA9bU6bSbJURWFdW72Y2uH9LKPOm5DMGNgP2Ay4RNKlEbHR/oGSFrQcjkTESMY4zcqStUtm1EXA\n0ySelvYtD5IiqlLbOYNkkPL4nK43j6SB+oOcrjeRE4FLJT4bwd/GeT/XKZCShoChfq/TKbnfAhuU\nDk8nab23Wg3cEREPAg9K+h3JN8hGyT0iFvQeqlkpsg6mAhDBOolzSFrvJxQWVTEOAj5X4v1GgMdJ\nzI7o77cFic2AzwOvL3qGSgQ3SlxEsrvUKeOckmt/e9roHRk9lnRML9fp1C2zGJgpaYakTYFDgXPG\nnHM28HxJUyRtRvJTLPNKbGY1023LHQZw1kxalbo7xVSljitNwt8jn4HVDwCXRnBRDtfK4njgqAmK\nmmpVvDSqbcs9ItZJOgJYBEwBTo+IZZLmp++fGhHLJZ1PMtXpYeC0iHByt0E14ZoybfwG+IHEthHc\nXkBMRTgAuKCgqtR2zgCulHgN8GD68fcOn499bS3wfpLt7MpyIfBXkuf2yAC0xKNJ9tq9ssRYMlFE\nOd1tkiIimlCqbQ0lsR1J5ek23fZDS/yQJFl+o5DgcibxI+DXEZxewb23Bh4LPCb9eHQPn49E8J2S\n43498PYIXtzy2j7A1yMyzYXv8b695c5Ofe5mk8lsYGmPA4wLgbdB/ZN7S1Xq+6q4f7qs7j1V3LtP\nPwG+KDEn4pHf7mrZJQNefsCsVVeDqWOcB+ybtkrrrvCq1CaaoKipdpWpo5zczdbrZTAVeGQe9wUk\ni3DVXVlVqU30DeAgiaekx265mw2AflruMACzZkquSm2ctKjpP4D3pDOOnkSyp27tuM+9Yul83d1I\nWo1z0j/vBV4ZwT+qjG0ySfe/3AFY1sdlfgF8RWLzCYpd6qC0qtQGOxH4PckMmcUdloaujFvuJZGQ\nxFMkXibxUYkzJZYBdwBfJ9mybQXwcZIppQuqi3ZS2pWkH7rnH6gR3EXS//rS3KLKX9lVqY0TwQqS\n5P5FatolA265F0JiE5IWUmtrfDYgkl/7l5Ksn/1ZYHk6UNP69dcDSyXOi+DCMmOfxPrtkhk1utbM\nwhyuVYSyq1Kb6jiSKtLaJnfPc+9DWq02g6TV1/qxM/D/SJL4VS1/3pa1xSRxIMlyy7MjuDf34G0D\nEicBN0dwXJ/XeTLJXPnt6tatlvYR3wRsW0HxUqOkYxf/Dnwu/Y2twHv1ljud3DNI/yGfyvrkvRvr\nk/jdJN/MrR/X5tHnKvE1YMsIvNJmwSQuBI6J4Lc5XOti4FMRLOo/svxIvAl4VQSvqDoWy87JPQdp\nEt+ejVviuwD3kSZu1ifx64tsVaeDrVcCCyI4s6j7THbpv/s9wNMjuDOH630QeGYE8/sOLicSzycp\nwnlnhKdBDhIn9z5JTCeZHrYdGybw64Dr0ilQVcT1bOBcYK+IDdbWt5xI7AD8LmKDFVD7ud7TgUuA\np9RhJoXE4SSDf4dFcH7F4ViXvPxAHySeRZJATwCOq9NMgggWS5wAnCGxf5024G2QOeQ4NTCCmyRu\nI5kBVdmAuMQUkmVxDwFeGNHXNE8bMJN+KqTEi0lW9fu3CL5cp8Te4liSuc
lZd2K37vRcmdpGpQVN\nElsCPydZOXEfJ/bJZ1In93SVtzOB19a5Tzv91f5NwNESc6qOp4HymgbZaiFwSNqfXyqJGSTzsG8D\nXprHOIINnkmZ3NOCog+T7Jr+4oj1u57UVQSrSFruP0irKWtNYorE/25Zg6POimi5Xwc8RLIFZWnS\ngdNLgG8C8+s2HdPKM+mSe9oP+RXgjcC+EVxbcUjd+D5wDckPpbr7MPBB4BqJr0o8teqAxpOu4vhE\n4MY8r5t275XaNZMOnC4E3hLBiTXtYrSSTKrknrZ4f0IyvfEFERvtB1tr6Tfru4BXSvUtcU9n+BwF\nvBiYRTKNdInEN9KZKXWyO3BNQbNaSknu6W9JXyRZuuKFnhFjMImSe1qd918k23S9bFCrPtMpmYcD\n35LYpuJwNiKxOclu9EdGsDqCP0fwEeCZwO3AFRLflphZaaDr9bKtXlaLgS0kZhV0fQ+c2oQmRXJP\nW4sXAxcBb4zgoYpD6ktaRflD4LQqBuw6+DJw+dgB6gjujOD/As8AbgZ+L/EDiV2qCLJFrtMgW6XT\nVs+ioNa7B06tnY7JXdKwpOWSVkg6epz3hyTdK2lJ+vHxYkLtjcSeJEn95AiObtA88f8DPB14a9WB\njJI4iGRFxCMmOieCeyL4FLAjSbHYBRI/lti9pDDHKmIwtdXoQmK58sCpdRQRE34AU4CVJItjbULy\n6+usMecMAee0u056XnQ6J+8PiJdC/BnikLLvXdLfbzeIOyCeUYNYtoW4DeL5XX7dFhAfSr/2LIg9\nS4x5KsQDEFsUfI8/Qzwtx2senl5zuOp/d38U/9Fr7uzUcp8LrIyIVRGxlmRO+MHjnFe3roHRmQNn\nkGx6UdflV/sSyUyfTwPfl6qrNk67hr4FnB7BRd18bQT3R/Alkpb8CPALiV+mu8oXbSdgTSRb5BUi\ngnXAOeTQevfAqXWjU0KYBhusZ7IGNvqmC2BfSUuBW4APRcT1+YXYnTTRfJyku2Ioop5bYOXoJOAA\nkr/zgopieDfJdmOf7PUCETwAnChxKsm/3U/SzUw+FsEf8glzI0UUL41nIfBpqe/plu8AtiQZOHX/\nurXVKblnmSd7JTA9Ih6Q9DKSkftnjneipAUthyMRMZIlyKzS1uvXgGeTzGG/Lc/r11EED0u8hWSq\n4aIILinz/umA6CdJnvfaTud3Esk641+T+CbwduAciR2imD7lovvbR/0GeA3wzj6vcx3wiYKehdWE\npCGS7u7+rpP26Ux0k+cACyJiOD3+KPBwRBzb5mtuBvaKiLvGvB5R4KqQ6RS8H5GMDbw6gvuKulcd\nSRxCsvLfHmX93SU2JdmJ5pQIvlHQPX4DfDuC7xdw7UXASRHeLNrqq9fc2anPfTEwU9IMSZsCh5L0\nH7beeFtJSj+fS/IDo9CdSSZwJMneowdOtsQOkI4r/I5kZcuyfJpkx6nTCrzH8cBRBU35LKtbxqx0\nbZN7RKwjmda2CLge+FFELJM0X9LoRgSvBq6RdBVJYnldkQG3sTfwH3l0DQyw9wFDaSu+UBJDJIuZ\nvT2i0DL3c0n6mZ+f50UltiPplrwlz+ua1UVjNuuQ+B9g/0h2Jp+0JJ5LMu6xRwS3FnSPx5H0Vc+P\n4Lwi7jHmfu8B9ovI74eWxDDJMs/75XVNsyIU1S0zENIy/K3IefGnQZQOqJ4AXCKNO221L2n3yCnA\n2WUk9tQZwL+mOxzlpazBVLNKNCK5A3sBS6I51ad9ieDzwFuAYyXOlnhajpd/A/AsklUfS5HOQ/8W\n8N4cL1vkmjJmlWtSci9qLvRAimT9mdnAFcAfJP5NYpN+rpmuZXI88IYIHuw/yq6cBLxZ4rE5Xa+w\nNWXM6qBJyf3KqoOomwgeiuAzwHOA/YErJZ7Xy7XSdfC/B3wxovwWbySbg/+aHNbSSZd+3gG8gqI1\nV1OS+5645T6hCFYCwyRTF38scVq6BH
I3PgKsJVn1sSonAEemP2j6sRtwg4uBrMkGPrmnSeoJMLln\nyXSSriX0Y2AX4EHgOok3Z5k/LrE3yTTLN1c5rhHBpSRrwvc7UOzBVGu8gU/uJK12D6ZmFMG9ERwJ\nHEgyQHlBu80k0srf7wNHpF0jVTseeH+f13DxkjVeE5K7B1N7EMFikkXgfgr8TuKzEpuNc+pxwGVp\nq78OFgIzJPbq4xpuuVvjOblPYhH8M4KTSZLdM0g2sx4efT+dJ/8S2my+UbZ0Cd2T6LH1LvEvJPum\nOrlbow18harETcDLw3tH9i3ddPurJDOPjgV+BbwqgosrDWyMtEL2JmDXbqtw00Ko/45geiHBmeVs\nUlaopt/kTwT+WHUsTRDBIpICpeUke3OeVrfEDo9sEv4D4D09fLm7ZGxSGOjkTjKYelUE/6w6kKaI\n4MEIPgE8DTim6njaOBF45wTjBO14MNUmhUFP7u5vL0gEf6rzDKR0gbhLgTd2+aVuuduk4ORug+x4\n4P1drvXuNWVsUnByt0F2AUnV7EuynCyxNbANXj3UJoGBTe7pN+p2wA1Vx2LVSDcJOQE4KuOX7A5c\nW+fuJrO8DGxyB/YAlnowddL7ITAn3ai7Ew+m2qQxyMndXTJGBH8n2TzkfRlO92CqTRpO7tYEXwde\nm+7I1Y5b7jZpdEzukoYlLZe0QtLRbc7bW9I6SYVvzpxycjcAIrgdOAuYP9E5ElOBWcA1ZcVlVqW2\nyV3SFOBkkrXAdwHmSdpoBcH0vGOB86GraWk9kdgKmEZSSWkGycDquyU2neD9nYDVEfytxJjMKtOp\n5T4XWBkRqyJiLXAm46+l/V6S1QX/knN8E9kDuDpdRMqMCK4m+WH/2glO8bZ6Nql0Su7TYIM1vNek\nrz1C0jSShH9K+lIZK5F55yUbz/HAURMUNXkw1SaVqR3ez5KoTwA+EhEhSbTplpG0oOVwJCJGMlx/\nPHsB/9Xj11pznUuy/vzzgQvHvDeHZD0as1qTNAQM9X2ddkv+SnoOsCAihtPjjwIPR8SxLefcxPqE\nvg3wAPCOiDhnzLVyW/JXSn79Tn8VN3uExHuA/SI4ZMzrtwN7RbCmmsjMetNr7uyU3KeSVIDuB9wK\nXA7Mi4hx106X9G3gFxGxMK8AN74OWwJ/ArZyn7uNJbEFsAqYG8FN6WvbAdcCT0yrWs0GRiHruUfE\nOpJdeBYB1wM/iohlkuZLmnDaWcH2ICkhd2K3jURwP3A6ySD/qDkk1cxO7DZpDNxOTBJHATMjeHcO\nYVkDSUwnGTydEcFfJY4Gto3gAxWHZta1ybQTk2fKWFsRrAZ+Dbw1fcmVqTbpDGJyd2WqZXE8cKTE\nFDwN0iahgUru6WDZDOC6ikOxmovgMuB24HXADuAN1G1y6TTPvW7mANdFsLbqQGwgHA98Bbghgn9U\nHYxZmQaq5Y67ZKw7C4F/4P52m4QGreW+J3BR1UHYYIhgXTq76r6qYzEr20BNhZS4FjgsgitzCsvM\nrNYKqVDNU7/JXWJzklUnt3b/qZlNFpNhnvts4HondjOzzgYpue8F7o4xM8ti0JK7Z8qYmWUwSMnd\nyw6YmWU0EAOqEo8B7gQeF8FD+UZmZlZfTR9QnQ0sd2I3M8tmUJK7+9vNzLowSMndM2XMzDIapOTu\nlruZWUa1H1CVeDRwF/D4CP6ef2RmZvXV5AHV3YE/OrGbmWXXMblLGpa0XNIKSUeP8/7BkpZKWiLp\nD5JenHOM7pIxM+tS224ZSVOAG4D9gVuAK4B5EbGs5ZzNI+Jv6efPAs6KiGeMc61eu2W+CSyJ4Kvd\nfq2Z2aArqltmLrAyIlZFxFrgTODg1hNGE3tqC+COboPowC13M7MudUru04DVLcdr0tc2IOkVkpYB\n5wFH5hVcOpi6M97c2MysK512Yso0lSYifg78XNILgO8BO413nqQFLYcjETHS4dK7ASsieDBLHGZm\ng0
7SEDDU73U6JfdbgOktx9NJWu/jiogLJU2V9ISIuHOc9xd0GZ+7ZMxsUkkbvSOjx5KO6eU6nbpl\nFgMzJc2QtClwKHBO6wmSdpSk9PM90+A2Suw9cnI3M+tB25Z7RKyTdASwCJgCnB4RyyTNT98/FXgV\ncJiktcD9wOtyjG8v4Ds5Xs/MbFKobYWqxKOAu4FtIniguMjMzOqriRWquwE3ObGbmXWvzsndOy+Z\nmfWozsndg6lmZj2qe3L3Gu5mZj2o5YCqxKbAPcCTIri/2MjMzOqraQOquwKrnNjNzHpT1+Tu/nYz\nsz7UNbl7poyZWR/qmtzdcjcz60PtBlQlNiEZTN0ugvuKj8zMrL6aNKC6C7Daid3MrHd1TO7ukjEz\n65OTu5lZA9UxuXumjJlZn2o1oCoxFbgXeHIEfy0lMDOzGmvKgOos4BYndjOz/tQtubu/3cwsB07u\nZmYNlCm5SxqWtFzSCklHj/P+GyQtlXS1pIsl7d5jPE7uZmY56DigKmkKcAOwP3ALcAUwLyKWtZzz\nXOD6iLhX0jCwICKeM+Y6bQcFJKaQDKZuH8E9vf6FzMyapMgB1bnAyohYFRFrgTOBg1tPiIhLIuLe\n9PAyYPtuAwF2Bv7kxG5m1r8syX0asLrleE362kTeBpzbQyzukjEzy8nUDOdknggv6UXAW4Hn9RCL\nk7uZWU6yJPdbgOktx9NJWu8bSAdRTwOGI+Lu8S4kaUHL4UhEjLQc7wWckyEeM7PGkjQEDPV9nQwD\nqlNJBlT3A24FLmfjAdWnAr8F3hgRl05wnQkHBdLB1HuAp0Yw7g8GM7PJqNcB1Y4t94hYJ+kIYBEw\nBTg9IpZJmp++fyrwCeBxwCmSANZGxNwu4ngm8GcndjOzfNRibRmJTUla7StLCcbMbED02nKvRXI3\nM7PxNWXhMDMzy4GTu5lZAzm5m5k1kJO7mVkDObmbmTWQk7uZWQM5uZuZNZCTu5lZAzm5m5k1kJO7\nmVkDObmbmTWQk7uZWQM5uZuZNZCTu5lZAzm5m5k1kJO7mVkDObmbmTWQk7uZWQNlSu6ShiUtl7RC\n0tHjvL+zpEsk/V3SB/MP08zMutExuUuaApwMDAO7APMkzRpz2p3Ae4Ev5R5hA0kaqjqGuvCzWM/P\nYj0/i/5labnPBVZGxKqIWAucCRzcekJE/CUiFgNrC4ixiYaqDqBGhqoOoEaGqg6gRoaqDmDQZUnu\n04DVLcdr0tfMzKymsiT3KDwKMzPL1dQM59wCTG85nk7Seu+aJP+gSEk6puoY6sLPYj0/i/X8LPqT\nJbkvBmb41DCKAAADdElEQVRKmgHcChwKzJvgXE10kYiY8D0zM8uXIjo3piW9DDgBmAKcHhGflzQf\nICJOlbQdcAXwWOBh4D5gl4i4v7DIzcxsQpmSu5mZDZbcK1Q7FTyl53wlfX+ppD3yjqEuMhR/vSF9\nBldLuljS7lXEWYYs/y/S8/aWtE7SIWXGV5aM3x9DkpZIulbSSMkhlibD98c2ks6XdFX6LA6vIMxS\nSPqWpNslXdPmnO7yZkTk9kHSbbMSmAFsAlwFzBpzzgHAuenn+wCX5hlDXT4yPovnAlulnw9P5mfR\nct5vgV8Cr6o67or+T2wNXAdsnx5vU3XcFT6LBcDnR58DSbHk1KpjL+h5vADYA7hmgve7zpt5t9w7\nFjwB/ws4AyAiLgO2lrRtznHUQZbir0si4t708DJg+5JjLEuW/xeQVDn/FPhLmcGVKMtzeD3ws4hY\nAxARd5QcY1myPIvbSMbxSP+8MyLWlRhjaSLiQuDuNqd0nTfzTu5ZCp7GO6eJSa3b4q+3AecWGlF1\nOj4LSdNIvrlPSV9q4mBQlv8TM4HHS7pA0mJJbyotunJleRanAbtKuhVYCryvpNjqqOu8mWUqZDey\nfkOOnRbZxG/kzH8nSS8C3go8r7hwKpXlWZwAfCQiQpJoM612gGV5
DpsAewL7AZsBl0i6NCJWFBpZ\n+bI8i48BV0XEkKQdgf+UNDsi7is4trrqKm/mndyzFDyNPWf79LWmyVT8lQ6ingYMR0S7X8sGWZZn\nsRdwZpLX2QZ4maS1EXFOOSGWIstzWA3cEREPAg9K+h0wG2hacs/yLPYFPgsQETdKuhnYiaT2ZrLp\nOm/m3S3zSMGTpE1JCp7GfnOeAxwGIOk5wD0RcXvOcdRBx2ch6anAQuCNEbGyghjL0vFZRMTTI2KH\niNiBpN/9XQ1L7JDt++Ns4PmSpkjajGTw7PqS4yxDlmexHNgfIO1f3gm4qdQo66PrvJlryz0i1kk6\nAljE+oKnZa0FTxFxrqQDJK0E/ga8Jc8Y6iLLswA+ATwOOCVtsa6NiLlVxVyUjM+i8TJ+fyyXdD5w\nNUlB4GkR0bjknvH/xOeAb0taStIQ/XBE3FVZ0AWS9EPghcA2klYDx5B00fWcN13EZGbWQN5mz8ys\ngZzczcwayMndzKyBnNzNzBrIyd3MrIGc3M3MGsjJ3cysgZzczcwa6P8Djk2fXTFqQ4kAAAAASUVO\nRK5CYII=\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10be55f28>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"The calibration curve deviates strongly from the diagonal and has a sawtooth shape.\n"
]
}
],
"source": [
"plot_calibration_curve(y_test, preds)\n",
"plt.show()\n",
"print('The calibration curve deviates strongly from the diagonal and has a sawtooth shape.')"
]
},
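{
"cell_type": "markdown",
"metadata": {},
"source": [
"A calibration curve compares, within each bin of predicted probabilities, the mean predicted probability with the observed fraction of positive objects; for a well-calibrated classifier the points lie close to the diagonal. Below is a minimal NumPy sketch of that computation under our own assumptions (equal-width bins on $[0, 1]$, labels in $\\{0, 1\\}$). The name `calibration_points` is hypothetical: it is not the `plot_calibration_curve` helper used above, and `sklearn.calibration.calibration_curve` performs a similar computation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def calibration_points(y_true, probs, n_bins=10):\n",
"    # Hypothetical helper: equal-width bins on [0, 1], labels assumed to be 0/1\n",
"    y_true = np.asarray(y_true, dtype=float)\n",
"    probs = np.asarray(probs, dtype=float)\n",
"    edges = np.linspace(0.0, 1.0, n_bins + 1)\n",
"    # Bin index of each prediction (0 .. n_bins - 1); a value of exactly 1.0 goes to the last bin\n",
"    idx = np.digitize(probs, edges[1:-1])\n",
"    mean_pred, frac_pos = [], []\n",
"    for b in range(n_bins):\n",
"        mask = idx == b\n",
"        if mask.any():  # empty bins are skipped\n",
"            mean_pred.append(probs[mask].mean())\n",
"            frac_pos.append(y_true[mask].mean())\n",
"    return np.array(mean_pred), np.array(frac_pos)\n",
"\n",
"# Toy check: predictions that match the empirical frequencies give points on the diagonal\n",
"y_toy = np.array([0, 0, 0, 1, 1, 1, 1, 0])\n",
"p_toy = np.array([0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75])\n",
"mean_pred, frac_pos = calibration_points(y_toy, p_toy, n_bins=2)\n",
"print(mean_pred, frac_pos)"
]
},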
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try to calibrate the probabilities. Roughly speaking, this is a procedure that, for each interval $[a_i, b_i]$, builds a transformation that corrects the probabilities within it, thereby bringing the calibration curve closer to the diagonal.\n",
"\n",
"Use the sklearn.calibration.CalibratedClassifierCV class to calibrate the probabilities on the training set, and use it to make predictions for the test set. Plot the calibration curve for these predictions. Has it improved?"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEACAYAAABI5zaHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGklJREFUeJzt3Xu4HVV9xvHvSwJYKog0NmKIIhBuShBoEoQKB4ISqYrK\nzaBQ5ZJoiUitGkSBqEW81WKxjQFDWq0SFalEm4Ktmio0hHsASYAgwSQod4rV+JjAr3/MhJzs5Jy9\nzz6zZ82e/X6eJw/ZZ09m/xjOefllrTVrFBGYmVm9bJW6ADMzK57D3cyshhzuZmY15HA3M6shh7uZ\nWQ053M3MaqhpuEu6QtIjku4a5Jh/kHS/pKWSDii2RDMzG6pWOvd5wJSB3pR0DLBHRIwDpgGzC6rN\nzMza1DTcI+JnwFODHPIW4F/yY5cAO0oaXUx5ZmbWjiLG3McAq/q9Xg3sUsB5zcysTUVNqKrhtfc0\nMDNLaGQB51gDjO33epf8a5uQ5MA3M2tDRDQ20E0VEe4LgBnAfEkHA09HxCNbOrCdAutI0qyImJW6\njirwtdjI12KjOl8LiW2AHwP/FcGs5se31xg3DXdJVwKHA6MkrQIuBLYGiIg5EbFQ0jGSVgC/Bd7T\nTiFmZj3iEuAJ4JOd/JCm4R4RU1s4ZkYx5ZiZ1ZfE6cCRwMQInuvkZxUxLGNDtyh1ARWyKHUBFbIo\ndQEVsih1AUWTmARcDBwWwTMd/7yyHtYhKTzmbma9SOKlwM3AWREsGNqfbS87vbeMmVkH5ROoVwFz\nhxrsw/pcd+5mZp0j8U9kN3u+rZ1x9naz02PuZmYdUuYEaiOHu5lZB5Q9gdrIY+5mZgXLJ1CvAs6I\nYHmKGhzuZmYFSjWB2sjhbmalkRgpcbzE9qlr6aBS7kBtxuFuZmW6EPgisFLiCxKvSF1QkfpNoJ5S\n9gRqI4e7mZVC4gjgdGAicBDZVuG3SXxb4rVJiytAvwnUt6aYQN2sHq9zN7NOk3gJcDtwWgQ/7Pf1\n7YHTgA8AjwJ/D3w3gvVJCm3TcO5AbX7u9rLT4W5mHSUh4AfA3RHMHOCYEWSP7PxrYFfgUuDyCJ4u\nq852DXUL36Gf39sPmFk1nQOMAj4+0AERPBvBv0VwGPA2YH/gFxKXSuxRUp3tqsQEaiOHu5l1jMRB\nwEeBqRGsa+XPRHBrBO8C9gN+AyyW+J7E4fnfAiqjShOojTwsY2YdkY+n3wZ8PIJvDeM82wGnkv0N\n4Hdk4/LfiuAPhRTafl2TgO+T3YHasRuVPOZuZpWRd9hfB9ZGcGZB59wKmEI2Lr8ncAHwrxE8W8T5\nh1DHtsB7gY8BZ0ZwTWc/z2PuZlYdpwIHkq2CKUQEz0WwMILXAycD04DbJY4pY7hGYoTEKcBy4PXA\n5E4H+3C4czezQknsBVwPHBnBXR38HAFvBj5DtoxyZgRLOvQ5U/LP+W3+OT8r+nMG/nwPy5hZYhIv\nABYDcyL4SkmfORL4S+ATwI3AxyK4t6BzTwI+C4wmmxi+JoJyQvP5GjwsY2bpfQ54AJhT1gdGsD6C\nuWTj8DcD10vMlti53XNK7C3xXbINwL4O7BfB98oO9uFwuJtZISTeQnYj0pkpQjCC30XwWWAvsuGT\nuyU+JbFDq+eQGCNxGfAzYAmwZwRzu+2OWXC4m1kBJHYBLgdOjuCplLVE8GQEHyKb0B0L3Cdxdr7K\nZYskdpS4GLgTeIos1D8Xwdpyqi6ew93MhiUf8/4m8KUI/id1PRtE8FAE7yZb2XI0sEzi5HxJJZDN\nEUh8CLgPeAmwfwQzU/8PqgieUDWzYZGYBbwOeEPZa86HQqKPbHJ0G7LJ0Z3JJmFvBc6LYFm66gbm\n1TJmVjqJw4H5wEERPJy6nmbyZY1vBz4FPAacG8HitFUNzuFuZqWS+BPgDmBaBP+Rup66cribWWny\nDvga4L588tI6pN3sHNmJYsys9t5PNmZ9fOpC
bMvcuZvZkEgcAPwQODiCB1LXU3e+Q9XMOk7ihWQT\nqGc72KvNnbuZtSTfV/2fgf+L4LTE5fQMd+5m1jESh5KtjHmObLzdKs4TqmY2oLxb/xTZ/ulnRXB1\n4pKsRe7czWyL+nXrLyPbFdHB3kXcuZvZJtyt10PTzl3SFEnLJd0vaeYW3h8l6VpJd0i6W9K7O1Kp\nmXWcu/X6GHS1jKQRwL3AUcAaso3wp0bEsn7HzAK2jYiPShqVHz86ItY3nMurZcwqKu/W/xZ4BzDD\noV4dnVotMxFYERErI2Id2frWYxuO+RU8vxn+DsATjcFuZtXVr1vfGRjvYK+HZmPuY4BV/V6vBiY1\nHHM58GNJDwPbAycWV56ZdYq79XprFu6t3OF0HnBHRPRJ2h34T0n7R8RvGg/Mh3A2WBQRi1qu1MwK\nk3fr88j2Mh8fweOJS7KcpD6gb7jnaRbua8geU7XBWLLuvb9DgIsAIuIBSQ+SPcPwlsaTRcSstis1\ns2Fzt159edO7aMNrSRe2c55mY+63AOMk7SppG+AkYEHDMcvJJlyRNJos2H/RTjFm1jn9xtZfilfC\n1N6gnXtErJc0A7gOGAHMjYhlkqbn788BPg3Mk7SU7H8WH4mIJztct5m1SOIFwCeBU/C69Z7hjcPM\nakziIOBrZH/Dfm8EjyUuyYbIG4eZ2fMktpa4EPgPsjmx4x3svcXbD5jVjMSryLr1R4DXdMODq614\n7tzNakJihMSHyVZazAb+wsHeu9y5m9WAxB5kD9JYD0yM4MG0FVlq7tzNupjEVhJ/BdwIfAc40sFu\n4M7drGtJvByYS7btx6ER3Ju4JKsQd+5mXUZCEu8m2zrgJ8CfO9itkTt3sy4i8VJgDrArcFQES9NW\nZFXlzt2sS0icQLZ9wF3ABAe7Dcadu1kXyIP9s8CxESxJXY9Vn7cfMOsCEv8NXBrBValrsXJ5+wGz\nmpLYm2y31WtS12Ldw+FuVn3TgHkRrEtdiHUPD8uYVVi+Xe8qYFKEn5PQizwsY1ZPxwG3OdhtqBzu\nZtU2HbgsdRHWfRzuZhUlsS8wjs0fbWnWlMPdrLrOBK7wRKq1wxOqZhUk8UdkE6kTvMtjb/OEqlm9\nHA/c4mC3djnczappGtkGYWZtcbibVUz+DNTdgR+krsW6l8PdrHo8kWrD5glVswrpN5H6ZxGsTFyO\nVYAnVM3q4QTgZge7DZfD3axaPJFqhXC4m1VEPpG6G/DvqWux7udwN6uO6cBcT6RaETyhalYBEtuR\nTaQeGMFDqeux6vCEqll3OwFY4mC3ojjczaphOp5ItQI53M0Sk9gPeAWeSLUCOdzN0ptGNpG6PnUh\nVh+eUDVLqN9E6gER/DJ1PVY9nlA1604nAosd7Fa0puEuaYqk5ZLulzRzgGP6JN0u6W5Jiwqv0qy+\n/IxU64hBh2UkjQDuBY4C1gA3A1MjYlm/Y3YEbgCOjojVkkZFxONbOJeHZcz6kRgPLAR29Xi7DaRT\nwzITgRURsTIi1gHzgWMbjjkZ+G5ErAbYUrCb2RZ5ItU6plm4jyGb7Nlgdf61/sYBO0n6iaRbJJ1S\nZIFmdSTxx2SN0VdT12L1NLLJ+60spdkaOBCYDGwHLJZ0Y0TcP9zizGrsROCGiE2aJ7PCNAv3NcDY\nfq/HknXv/a0CHo+ItcBaST8F9gc2C3dJs/q9XBQRi4ZasFlNTAcuSl2EVY+kPqBv2OdpMqE6kmxC\ndTLwMHATm0+o7g18GTga2BZYApwUEfc0nMsTqmaAxP5kz0d9pcfbrZl2s3PQzj0i1kuaAVwHjADm\nRsQySdPz9+dExHJJ1wJ3As8BlzcGu5ltYhrwVQe7dZLvUDUrUT6RugoYH7HZEKfZZnyHqll3eAdw\nvYPdOs3hblYuPyPVSuFwNyuJxGuAnYFrU9di9ddsKaRZz5EQ2bLfifmvA8lWi91MtmJsaQS/b+PU\n08kmUp8t
qlazgXhC1XqexIuBP2NjmE8CRLas9ybgNuBlwIT8/b2Ae/L3biIL/eWDhbbEC4Ff4olU\nG6J2s9Phbj1FYluym+w2hPhEsuC+jY1hvQRYFbHlO7TzPdhfk//ZDYE/GriVjd39Tf3PIXEG8OaI\nzfZmMhuUw91sC/Ku/M1s7MpfDdzHpkG+bLhrziV2YmP3P4GN3f+Gzv544NwIFg7nc6z3ONzNtkDi\n+2R3Tl9HFuS3R/DbEj5XwC5s2tmf4fF2GyqHu1kDiQOBBcAebU6AmiXnm5jMNncB8DkHu/Uid+5W\nS/ma8oXA7hGsTV2PWbvcuZtt6nzg8w5261Xu3K12JPYDfkjWtf8udT1mw+HO3Wyj84G/c7BbL3Pn\nbrUi8Srgx8BuZSx5NOs0d+5mmY8DX3SwW69z5261IbE38FOysfbfpK7HrAju3M3gY8AlDnYzd+5W\nExJ7AjeQde3PpK7HrCju3K3XnQdc6mA3y/hhHdb1JHYH3gTskboWs6pw5251cB7wjxE8nboQs6pw\n525dTeKVwFtx1262CXfu1u0+CsyO4KnUhZhViTt361oSrwCOA/ZMXYtZ1bhzt252LnBZBE+kLsSs\naty5W1eSGAucBOyVuhazKnLnbt1qJvDVCB5LXYhZFfkOVes6EmOAu4B9IngkdT1mneQ7VK2XfASY\n52A3G5g7d+sqEjsDPwf2jeDXqesx6zR37tYrPgx8zcFuNjh37tY1JEYDy4BXR/Bw6nrMyuDO3XrB\nh4BvONjNmnPnbl1B4iXAvcD4CFanrsesLO7cre7+BviWg92sNU3DXdIUScsl3S9p5iDHTZC0XtLb\niy3Rep3EKGAacHHqWsy6xaDhLmkE8GVgCrAvMFXSPgMc91ngWsBDL1a0c4CrIvhl6kLMukWzvWUm\nAisiYiWApPnAsWQrFvp7P3AVMKHoAq23SewEvA84KHUtZt2k2bDMGGBVv9er8689T9IYssCfnX+p\nnBla6xUfAL4XwcrUhZh1k2adeytBfQlwbkSEJOFhGSuIxI7AWcCk1LWYdZtm4b4GGNvv9VjYbLXC\nQcD8LNcZBbxR0rqIWNB4Mkmz+r1cFBGLhlqw9QYJAXPIVsg8kLoes7JI6gP6hn2ewda5SxpJtrZ4\nMvAwcBMwNSIax9w3HD8P+H5EXL2F97zO3VomMQM4HTgkgrWp6zFLpd3sHLRzj4j1kmYA1wEjgLkR\nsUzS9Pz9OW1VazYIiYnABcBrHexm7fEdqlYp+eqY24APRrDZ3wDNek272elwt8qQ2ApYANwXwQdT\n12NWBd5+wOrgI8BOZI/QM7Nh8AOyrRIk+sjuRJ0QwbrE5Zh1PXfulpzES4FvAH8ZsclNc2bWJoe7\nJSUxErgS+GoE16Wux6wuHO6W2ieA9cAnUxdiVicec7dkJI4BTgUOiuDZ1PWY1YnD3ZKQeDkwDzgu\ngkdT12NWNx6WsdJJbAN8G/hCBNenrsesjnwTk5VO4kvArsBbI7xFtNlgOrK3jFnRJE4A3kw2zu5g\nN+sQh7uVRmJP4J+AKRE8lboeszrzmLuVQmI7skcxnh/BranrMas7j7lbKSSuALYF3uXhGLPWeczd\nKkviPcDBwEQHu1k53LlbR0mMB34EHB7BPanrMes23vLXKkdiB7Jx9nMc7GblcuduHZE/eGM+8GQE\n701dj1m38pi7VYaEgC8DY8j2jjGzkjncrVB5sF8KHAAcHcHvE5dk1pMc7laYPNgvASYAb4jgmcQl\nmfUsh7sVIg/2LwKHAK+P4H8Tl2TW0xzuNmx5sH8eOAw4KoKnE5dk1vMc7jYsebB/BpgMTPaeMWbV\n4HC3tuXBfhFwNFmwP5m4JDPLOdytLXmwfxJ4E3BkBE8kLsnM+nG4W7suBN4GHBHB46mLMbNNOdxt\nyCTOB04kC/bHUtdjZptzuNuQSHwMOJks2B9JXY+ZbZnD3VomcS7ZdgJ9Ef
w6dT1mNjCHu7VE4sPA\n6WTB/qvU9ZjZ4Bzu1pTEB4HpZMG+JnU9Ztacw90GJXEOcBZZsK9OXY+ZtcbhbgOSeD9wNtnk6arU\n9ZhZ6xzutpn8BqWzgb8m69gfSlySmQ2Rw902IfFC4CvA/mQd+8q0FZlZO1p6hqqkKZKWS7pf0swt\nvP9OSUsl3SnpBknjiy/VOi1/mPUtwO+BSRE8mLgkM2tT03CXNILskWlTgH2BqZL2aTjsF8BhETEe\n+BRwWdGFWudISOIM4EfARRGcEcHvUtdlZu1rZVhmIrAiIlYCSJoPHAss23BARCzud/wSYJcCa7QO\nahiGOSxi439XM+terQzLjIFNVkqszr82kNOBhcMpysqxhWEYB7tZTbTSuUerJ5N0BHAacOgA78/q\n93JRRCxq9dxWnHw1zOnAxcAHI/h64pLMLCepD+gb7nlaCfc1wNh+r8fC5jez5JOolwNTImKLT+OJ\niFlt1GgF8jCMWbXlTe+iDa8lXdjOeVoZlrkFGCdpV0nbACcBC/ofIOnlwNXAuyJiRTuFWOd5GMas\ndzTt3CNivaQZwHXACGBuRCyTND1/fw5wAfBiYLYkgHURMbFzZdtQeBjGrPcoouUh9eF9kBQRoVI+\nzJ7XMAxzort1s+7Sbna2dBOTdScPw5j1Lod7DTXclPRp35Rk1nu8t0zNSGwNzAYOxqthzHqWw71G\nJHYAvgOsBw6O4P8Sl2RmiXhYpiYkxgA/BR4EjnWwm/U2h3sNSOwHLAbmA++LYH3ikswsMQ/LdDmJ\nycCVwDkRfDN1PWZWDe7cu5jEqcA3gRMc7GbWnzv3LpTfcfpxsk3ajojgnsQlmVnFONy7TL+ljgcA\nh0Twq8QlmVkFOdy7iMT2ZEsdnwUO94oYMxuIx9y7hMTLyJY6PoSXOppZEw73LiDxarKljt8C3uul\njmbWjIdlKk7iSLL1617qaGYtc+deYRKnkK1hP9HBbmZD4c69gvKljucBZ+KljmbWBod7xUhsB3wJ\nOBB4rZc6mlk7PCxTERK7SXwe+CWwPdlSRwe7mbXF4Z6QxFYSb5T4AbAk//LECN7hpY5mNhwelklA\n4sXAe4C/Ap4Bvkw2aeqnJZlZIRzuJZJ4DXAWcDzw78ApwI0RlPOUcjPrGQ73DpPYBjiOLNRfAXwF\n2DuCR5IWZma15nDP5RtyXQCMBR4FHmn456PAYxGsa/F8Y4DpZMsZlwF/D1zju0vNrAwOd54fA/8O\n8If8n38KjCHbeXF0/vpPgVESz7Bp4Df+/lngVOAosr3Wj4rg52X++5iZKaKc4V5JEREq5cOGQGIP\n4AfAtcCHBuusJbYCdmJj4PcP/g2/3x64GvhaBM90tnozq7t2s7Onw13icLLNuGZF8JXU9ZiZNWo3\nO3t2WEbiNOBi4J0R/FfqeszMitRz4S4xAvgM8Fayu0CXJy7JzKxwlQl3CXV6vbfEC4FvAC8CDo7g\niU5+nplZKpXYfkBib+B2idMkXtChzxgLXA88DrzBwW5mdVaJcAfuA2aS3ezzkMTf5o+VK4TEROBG\n4F+BMyL4Q1HnNjOrokqEewTPRXBdBH8BvI5s2ORuiW9KHDycc0ucSLbU8X0RfMG3+ptZL6jsUkiJ\nFwGnAe8HHiPb4/yqVrvu/IEX5wOnA2+JYOnQqzYzS6u269zz1S1vAs4G9gZmA5dF8Oggf+YFwBXA\n7sCxEfy6varNzNJqNzsrMSwzmAiejeCaCCYDU8g237pXYl6+y+ImJEYDPwFGAH0OdjPrRU3DXdIU\nScsl3S9p5gDH/EP+/lJJBxRfZiaCuyI4ExhHNgn7fYn/lni7xEiJ/cgeenEd8I4I1naqFjOzKhs0\n3CWNIHuQxBRgX2CqpH0ajjkG2CMixgHTyIZNOiqCxyO4GNgN+Efgb4AVwI+Aj0Ywq8oTp5L6UtdQ\nFb4WG/labORrMXzNOveJwIqIWBkR64
D5wLENx7wF+BeAiFgC7ChpdOGVbkEE6yL4dgSHki2jPCKC\nK8v47GHqS11AhfSlLqBC+lIXUCF9qQvods3CfQywqt/r1fnXmh2zy/BLG5oIbvXWumZmmWbh3urQ\nRuNMbmWHRMzMekGzvWXWkD2ZaIOxZJ35YMfskn9tM5Ic+jlJF6auoSp8LTbytdjI12J4moX7LcA4\nSbsCDwMnAVMbjlkAzADmSzoYeDoiNns+aNX2cjczq7NBwz0i1kuaQba0cAQwNyKWSZqevz8nIhZK\nOkbSCuC3wHs6XrWZmQ2qtDtUzcysPIXfoVqlm55Sa3YtJL0zvwZ3SrpB0vgUdZahle+L/LgJktZL\nenuZ9ZWlxZ+PPkm3S7pb0qKSSyxNCz8foyRdK+mO/Fq8O0GZpZB0haRHJN01yDFDy82IKOwX2dDN\nCmBXYGvgDmCfhmOOARbmv58E3FhkDVX51eK1eC3wovz3U3r5WvQ77sdku3gel7ruRN8TOwI/B3bJ\nX49KXXfCazELuHjDdQCeAEamrr1D1+N1wAHAXQO8P+TcLLpzr/RNTyVrei0iYnFE/G/+cgkJ7g8o\nSSvfF5DtAHoV2S6gddTKdTgZ+G5ErAaIiMdLrrEsrVyLXwE75L/fAXgiItaXWGNpIuJnwFODHDLk\n3Cw63LvmpqcStHIt+jsdWNjRitJpei0kjSH74d6wfUUdJ4Na+Z4YB+wk6SeSbpF0SmnVlauVa3E5\n8CpJDwNLgQ+UVFsVDTk3i36Gqm962qjlfydJR5DtXX9o58pJqpVrcQlwbkSEJLH590gdtHIdtgYO\nBCYD2wGLJd0YEfd3tLLytXItzgPuiIg+SbsD/ylp/4j4TYdrq6oh5WbR4V7oTU9drpVrQT6Jejkw\nJSIG+2tZN2vlWhxEdq8EZOOrb5S0LiIWlFNiKVq5DquAxyNiLbBW0k+B/YG6hXsr1+IQ4CKAiHhA\n0oPAXmT33/SaIedm0cMyz9/0JGkbspueGn84FwCnAgx201MNNL0Wkl4OXA28KyJWJKixLE2vRUTs\nFhGvjIhXko27v69mwQ6t/XxcA/y5pBGStiObPLun5DrL0Mq1WA4cBZCPL+8F/KLUKqtjyLlZaOce\nvunpea1cC+AC4MXA7LxjXRcRE1PV3CktXovaa/HnY7mka4E7geeAyyOiduHe4vfEp4F5kpaSNaIf\niYgnkxXdQZKuBA4HRklaBVxINkTXdm76JiYzsxqq/GP2zMxs6BzuZmY15HA3M6shh7uZWQ053M3M\nasjhbmZWQw53M7MacribmdXQ/wOQgNxeWEapggAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10a13cc18>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
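"The calibration curve has improved.\n"
]
}
],
"source": [
"from sklearn.calibration import CalibratedClassifierCV\n",
"from sklearn.linear_model import SGDClassifier\n",
"# The base estimator is passed positionally: the keyword is base_estimator in older\n",
"# scikit-learn versions and estimator in newer ones\n",
"clf = CalibratedClassifierCV(SGDClassifier(loss='hinge'))\n",
"clf.fit(X, y_train)\n",
"preds = clf.predict_proba(X_test)[:, 1]\n",
"plot_calibration_curve(y_test, preds)\n",
"plt.show()\n",
"print('The calibration curve has improved.')"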
]
},
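{
"cell_type": "markdown",
"metadata": {},
"source": [
"The description above (a separate correction on each interval $[a_i, b_i]$) corresponds most directly to histogram binning. Note that `CalibratedClassifierCV` does not bin by default: its `method` parameter selects sigmoid (Platt) scaling, which is the default, or isotonic regression. Below is a minimal sketch of histogram binning under our own assumptions (equal-width bins, scores already in $[0, 1]$, labels in $\\{0, 1\\}$, a separate calibration sample); `fit_binning` and `apply_binning` are hypothetical names. Each score is replaced by the fraction of positive objects in its bin of the calibration sample."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def fit_binning(scores, y, n_bins=10):\n",
"    # Hypothetical helper: one corrected probability per equal-width bin on [0, 1]\n",
"    scores = np.asarray(scores, dtype=float)\n",
"    y = np.asarray(y, dtype=float)\n",
"    edges = np.linspace(0.0, 1.0, n_bins + 1)\n",
"    idx = np.digitize(scores, edges[1:-1])\n",
"    # Empty bins fall back to the bin centre, i.e. leave the score roughly unchanged\n",
"    centres = (edges[:-1] + edges[1:]) / 2\n",
"    values = np.array([y[idx == b].mean() if np.any(idx == b) else centres[b]\n",
"                       for b in range(n_bins)])\n",
"    return edges, values\n",
"\n",
"def apply_binning(scores, edges, values):\n",
"    # Replace each score with the fraction of positives observed in its bin\n",
"    idx = np.digitize(np.asarray(scores, dtype=float), edges[1:-1])\n",
"    return values[idx]\n",
"\n",
"# Toy example: overconfident raw scores (always 0.1 or 0.9)\n",
"s_cal = np.array([0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9])\n",
"y_cal = np.array([0, 0, 1, 1, 1, 1, 1, 0])\n",
"edges, values = fit_binning(s_cal, y_cal, n_bins=2)\n",
"calibrated = apply_binning([0.1, 0.9], edges, values)\n",
"print(calibrated)"
]
},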
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Task 6\n",
"Here you can insert your favorite picture of Australia."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I don't have internet access on my laptop right now, so I can't find a picture."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 2. Gradient descent by hand\n",
"**(optional part; a correct solution adds 3 points to the grade)**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this part you will implement gradient descent for the logistic loss yourself, that is, essentially train logistic regression by hand. We will use the data from the previous part."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Add a constant feature equal to one to the training and test sets. Map the values of the target vector to the set $\\{-1, +1\\}$."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X = np.hstack((X, np.ones((X.shape[0], 1))))\n",
"X_test = np.hstack((X_test, np.ones((X_test.shape[0], 1))))\n",
"y_train[y_train == 0] = -1\n",
"y_test[y_test == 0] = -1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, implement functions that compute the functional, its gradient, and the model's predictions."
]
},
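{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference (our own derivation, not part of the assignment text): with labels $y_i \\in \\{-1, +1\\}$ and $\\ell$ training objects, `get_func` below computes the functional\n",
"\n",
"$$Q(w) = \\frac{1}{\\ell} \\sum_{i=1}^{\\ell} \\log\\left(1 + \\exp(-y_i \\langle w, x_i \\rangle)\\right),$$\n",
"\n",
"and `get_grad` computes its gradient\n",
"\n",
"$$\\nabla Q(w) = -\\frac{1}{\\ell} \\sum_{i=1}^{\\ell} \\frac{y_i x_i}{1 + \\exp(y_i \\langle w, x_i \\rangle)} = -\\frac{1}{\\ell} \\sum_{i=1}^{\\ell} \\sigma(-y_i \\langle w, x_i \\rangle)\\, y_i x_i,$$\n",
"\n",
"where $\\sigma(t) = 1 / (1 + e^{-t})$ is the sigmoid implemented by `sigmoid` below."
]
},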
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def sigmoid(arr):\n",
" return 1 / (1 + np.exp(-arr))\n",
"# returns an (n, 2) array of predicted probabilities for sample X (for a column weight vector w of shape (d, 1)): column 0 is P(y = -1), column 1 is P(y = +1)\n",
"def make_pred(X, w):\n",
" pred = sigmoid(X.dot(w))\n",
" return np.hstack((1 - pred, pred))"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
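"# returns the value of the logistic functional (averaged over objects) for sample (X, y) and weight vector w\n",
"def get_func(w, X, y):\n",
"    y = np.asarray(y).ravel()\n",
"    Z = -y[:, None] * X\n",
"    # log(1 + exp(t)) is computed as logaddexp(0, t) to avoid overflow in exp\n",
"    return np.sum(np.logaddexp(0, Z.dot(w)), axis=0) / y.size"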
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# returns the gradient of the logistic functional for sample (X, y) and weight vector w\n",
"def get_grad(w, X, y):\n",
"    y = np.asarray(y).ravel()\n",
"    Z = -y[:, None] * X\n",
"    anc_var = np.exp(-Z.dot(w))\n",
"    return np.sum(Z / (1 + anc_var), axis=0).reshape(w.shape) / y.size"
]
},
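{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running gradient descent it is worth checking the analytic gradient against finite differences. The cell below is a minimal, self-contained sketch of a central-difference check; `logistic_loss`, `logistic_grad` and `numeric_grad` are hypothetical helpers that re-implement the formulas behind `get_func` and `get_grad` (weight vector as a column of shape (d, 1), labels in $\\{-1, +1\\}$), and the data is random and purely synthetic. A maximum discrepancy close to zero indicates that the analytic gradient is consistent with the functional."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def logistic_loss(w, X, y):\n",
"    # Mean of log(1 + exp(-y_i <w, x_i>)), computed stably with logaddexp\n",
"    margins = y * X.dot(w).ravel()\n",
"    return np.mean(np.logaddexp(0, -margins))\n",
"\n",
"def logistic_grad(w, X, y):\n",
"    # -(1/l) * sum_i y_i x_i / (1 + exp(y_i <w, x_i>)), returned with the shape of w\n",
"    margins = y * X.dot(w).ravel()\n",
"    coef = -y / (1 + np.exp(margins))\n",
"    return (X.T.dot(coef) / y.size).reshape(w.shape)\n",
"\n",
"def numeric_grad(f, w, h=1e-6):\n",
"    # Central differences, one coordinate at a time\n",
"    g = np.zeros_like(w)\n",
"    for j in range(w.size):\n",
"        e = np.zeros_like(w)\n",
"        e.flat[j] = h\n",
"        g.flat[j] = (f(w + e) - f(w - e)) / (2 * h)\n",
"    return g\n",
"\n",
"rng = np.random.RandomState(0)\n",
"X_toy = rng.normal(size=(50, 4))\n",
"y_toy = np.where(rng.rand(50) > 0.5, 1.0, -1.0)\n",
"w_toy = rng.normal(size=(4, 1))\n",
"\n",
"analytic = logistic_grad(w_toy, X_toy, y_toy)\n",
"numeric = numeric_grad(lambda v: logistic_loss(v, X_toy, y_toy), w_toy)\n",
"print('max abs difference:', np.max(np.abs(analytic - numeric)))"
]
},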
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now implement (full-batch, non-stochastic) gradient descent. The function should return the weight vector and the list of values of the functional at each iteration. Gradient descent must perform at most max_iter iterations."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from IPython.display import clear_output\n",
"\n",
"def _linesearch_armiho(fun, point, gradient, step_0=1.0, theta=0.5, eps=1e-2,\n",
" direction=None, point_loss=None, maxstep=np.inf):\n",
" \"\"\"\n",
"    One-dimensional line search for choosing the step size, based on the Armijo rule.\n",
"    fun: the function being minimized\n",
"    point: the current iterate of the method\n",
"    direction: the current search direction; defaults to the negative gradient\n",
"    gradient: the gradient at the current point\n",
"    step_0: initial step length\n",
"    theta, eps: parameters of the Armijo rule (step shrink factor and sufficient-decrease constant)\n",
"    point_loss: value of the function being minimized at the current iterate\n",
"    maxstep: currently unused\n",
"    Returns a tuple (new_point, step): the new iterate and the step length\n",
" \"\"\"\n",
" if point_loss is None:\n",
" current_loss = fun(point)\n",
" else:\n",
" current_loss = point_loss\n",
" if len(gradient.shape) == 2:\n",
" gradient = gradient[:, 0]\n",
" if direction is None:\n",
" direction = -gradient\n",
"\n",
" step = step_0/theta\n",
" new_point = point + step * direction.reshape(point.shape)\n",
" while fun(new_point) > current_loss + eps * step * direction.dot(gradient):\n",
" step *= theta\n",
" new_point = point + step * direction.reshape(point.shape)\n",
" return new_point, step\n",
"\n",
"def grad_descent(X, y, step_size, max_iter):\n",
" if step_size == 'armiho':\n",
" step = 1.0\n",
" np.random.seed(2)\n",
"# w = np.random.normal(size=(X.shape[1], 1))/1000\n",
" w = np.zeros((X.shape[1], 1))\n",
" def loss_fun(point):\n",
" return get_func(point, X, y)\n",
" def grad_fun(point):\n",
" return get_grad(point, X, y)\n",
" loss_lst = []\n",
" for i in range(max_iter):\n",
" clear_output(wait=True)\n",
" print('step size', step_size)\n",
" print('Iteration ', i+1, '/', max_iter)\n",
" loss = loss_fun(w)\n",
" grad = grad_fun(w)\n",
" loss_lst.append(loss)\n",
" if step_size == 'armiho':\n",
" w, step = _linesearch_armiho(fun=loss_fun, gradient=grad, point_loss=loss, point=w, step_0=step)\n",
" else:\n",
" w -= step_size * grad\n",
" return w, loss_lst\n",
" "
]
},
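{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the loop in `_linesearch_armiho` above starts from the step `step_0 / theta` and multiplies the step $\\alpha$ by `theta` until the sufficient-decrease (Armijo) condition holds, where $d$ is the search direction (by default $d = -\\nabla Q(w)$) and $\\varepsilon$ is the `eps` parameter. In `grad_descent` with `step_size='armiho'`, `step_0` is the step accepted at the previous iteration.\n",
"\n",
"$$Q(w + \\alpha d) \\le Q(w) + \\varepsilon \\alpha \\langle \\nabla Q(w), d \\rangle.$$\n",
"\n",
"For $d = -\\nabla Q(w)$ the right-hand side equals $Q(w) - \\varepsilon \\alpha \\|\\nabla Q(w)\\|^2$, so every accepted step strictly decreases the functional whenever the gradient is nonzero."
]
},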
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run gradient descent with step sizes from the set [0.001, 1, 10]. Plot the value of the functional against the iteration number and analyze the curves. Compute ROC AUC on the test set for the best of the trained models."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"step size armiho\n",
"Iteration 20 / 20\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYoAAAESCAYAAADjS5I+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xe8m3X5//HXuwNaCoWyRKADkELZQ5ZSCEMpIkOmVRAQ\ncDLUn6iIksYFiPJFRUBkg5Ylu18ZApH5lVlmQQq0FAotKKPsjuv3x+cTTk56Tk5yzp3cd3Ku5+Nx\nP5LcuXPfV0+TXPlsmRnOOedcdwakHYBzzrls80ThnHOuKk8UzjnnqvJE4ZxzripPFM4556ryROGc\nc64qTxTOOeeq8kThnHOuqkFpB9AdSXsCuwHDgXPN7JaUQ3LOuX5JWR+ZLWk54DdmdnjasTjnXH/U\n1KonSedJmiPpsYr9EyQ9JekZST+seNlPgNObF6VzzrlyzW6jOB+YUL5D0kBCIpgArAdMlDROwcnA\n381sapPjdM45FzW1jcLM7pQ0pmL3lsB0M5sBIOlSYE9gZ2AnYLikT5jZn5oYqnPOuSgLjdmrAbPK\nHr8IbGVmRwF/SCck55xzJVlIFL1uTZeU7ZZ455zLKDNTrcdmIVG8BIwsezySUKqoST3/WFedpElm\nNintONqF/z2T43/LZNX7IzsLA+4eANaWNEbSEsABwHW1vljSJEm5RgXnnHPtQlJO0qR6X9fs7rGT\ngXuAsZJmSTrUzBYARwI3AU8Cl5nZtFrPaWaTzKzYkICdc66NmFmxNyWzZvd6mtjN/r8Df+/NOWN2\nLHqySEQx7QDaTDHtANpIMe0A2kGsfcnV/bqsj8yuRpJ5G4VzztWn3u/OLDRmO+cc4D0ZGyGJH9Mt\nnyi86sm59uK1BMmpTLxe9eSca3n+mU5Wd3/Pev/OWege65xzLsO86sk55/qJ3lY9tXyJwsdROOea\nQdLykq6W9LakGZK67O4fj/2upJclvSnp3DiYuMfzSBos6UpJz0taJGn7JP8NvR1H0fKJwjnnmuSP\nwPvAysCXgTMlrVd5kKRdgB8COwKjgTWBQh3nuQM4EHiFPsyFl6SWb8wm/Ad41ZNzbSCrjdmShgH/\nBdY3s+lx34XAbDM7ruLYvwLPmdlP4uMdgL+a2cfrPM8s4Mtmdkcf4u709yyresr3q8Zsr3pyzjXB\nWGBB6cs9egRYv4tj14vPlTwKfEzSiDrPkzivenLOucZZGnirYt88YJlujn2z7HHpdcvUeZ7MaPle\nT865fiSpkdv1V2+9DQyv2Lcs4Uu+p2OXjbfz6jxPZrR8icKnGXeuHzFTIlv9/g0MkvSJsn0bA493\ncewTwCYVx80xs9frPE/iejvNeMs3Zmex4cs51ztZ/kzHZRIMOBzYDLgB2KZyWYTY6+kCQq+nV4Cr\ngXvM7Me1nEfSkoCAZ4CvAnea2fu9jNlHZjvnXBN9CxgKzAUuAb5hZtMkjZI0T9LqAGZ2E/Br4HZg\nBvAskO/pPGXPPw28C6xKWKfnHUmjGvkP64mXKJxzmeGf6WR5icI551xTeKJwzjlXVct3j/VJAZ1z\nrja+HoVzruX5ZzpZ3kbhnHOuKTxROOecq8oThXPOuao8UTjnnKvKE4VzzrmqPFE451wPJB0p6QFJ\n70s6v4dju10GtVW1fKJoh9ljVVDL/z841+ZeAn4OnFftoBqWQU2Vzx7bolTQEsB04NuWt+vTjse5\nNGX9My3p58DqZnZoN893uwxqE8Msj8fHUbSJ7YFFwLkqaFzawTjnqurpy7XaMqgtyxNF+vYCzgJ+\nAFyrgpZLOR7nXPd6qoKptgxqy/JEkaLYNrEncI3l7QLgf4HJKmhgqoE5l1ESlsTWlxB6eL7aMqgt\nyxNFujYH3ra8PRUffx9YAvhleiE5l11mKImtLyH08Hy1ZVBblieKdO0FXFN6YHlbAOwP7K+CJqYW\nlXOuE0kDJQ0hzLg9UNKSUpcl/4uAwySNi+0S
PwWqdqdtBZ4o0tUpUQBY3v4T9/9OBW2WSlTOuUo/\nJSxP+kPgQOA94PheLIPakjLbPVbSGsDxwLJmtl83x2S6K101Kmgs8E9gNcvboi6e3wc4FdjC8ja3\nCfEMA0ZY3l5s9LWc604rf6azKKnusZlNFCWSrmjTRHEssJbl7RtVjvk5ofvszpa3DxsYyybAlcAK\nwIvAlLjdG6vDnGuKVv5MZ1FLjqOQdJ6kOZIeq9g/QdJTkp6R9MNmxpSixaqdupAH3gBOa1QQKugw\n4BZC0XpF4GvAAuD3wBwVNFkFHaiCVmxUDM65bGtqiULSeEL3sYvMbMO4byDwNLAzYZj8/cBEM5sW\nn2+7EoUKWgV4CviY5e2DHo4dDvwfcJrl7ewEY1gK+COwFbCP5cPfu+KY1YDPAbsBOxB6dJRKG49Y\nPuPFUddyWvUznVUtW/UkaQxwfVmi2AbIm9mE+PhH8dCzgV8BOwHnmNnJXZyrJd9UKugIYCfL2xdr\nPH5t4C7CF/pdCVx/LKGq6VHgG5a3t2t4zZKEarDd4jaEMO5jCnBrLedwriet+pnOqqQSxaBkw+qV\n1YBZZY9fBLYys/8C3dbfl1RMcFU0s2Ki0TXGXsDFtR5seXtGBR0MXK6CtrK8zerxRd1QQfsBZxCq\nmv5Ua6kglnxuBm5WQd8BxhISxtHAJSroHkLSuNzy9kpv4+sLFfQJ4BjCNArfLhuf4ly/FidOzfX6\n9RkoUewDTDCzI+LjAwmJ4qgaztVyvz5U0DKEKraRlrc3ezq+4rXHAgcA4y1v79X52iUI3fb2APaz\nvD1Yz+t7OPdw4DPA7oSR5jcTktEdja6eUkECxgPfBbYllETnEBLh9yxvNSdkl75W/ExnWTuVKF4C\nRpY9HkkoVdQklihapSQBMAG4p94kEf2GMOrzzyrooFq/hFXQKOAy4FVgc8snO0rU8vYW8Dfgbypo\nWeAg4EzAVNBZwEW9/Pd2SwUNBvYFvkeYJuF/gAMtb+/E528nlMB2BI4s7XeuP+ttySILJYpBhMbs\nnYDZwH2UNWb3cK6W+/Whgv4C3Gl5O6uXrx9KaK/4q+XttzUcPwG4gDAm4zddjdlohPhLfzvgW8Bn\ngSuAMy1vD/fxvMsBRxCqvKYT/l1TuhmLsjRwOrAlsL/l7fG+XNs1Xit+prOsJRuzJU0mNIiuAMwF\nTjCz8yXtSugCOhA418xOrPF8LfWmitU/c4D1LW+z+3CeUYSeUIda3m7q5piBwAnA4cBEy9sdvb1e\nX8VeXocBXyeUIM8ktGW8X8c51iS0PxxEaAv5H8vbQzW+9mBCaew44FzvrZVdrfaZzrqWTBRJk2SE\n1aNaoupJBX0G+LnlbesEzjWe0HPp05a36RXPrQz8hVC1ODGtxuVKMXntBnyTMCHihcBZlrdnuzle\nwKcI1UvbA38GTre8vdSLa48jVL89Rujp1ZDZPFXQCEJC/CbhR8G1wHXA481OUPHvPbSVeqRlNVFI\nOhI4BNgAmFy+cJGknQhdzUcC/wIOMbMXujnP8sC5hDa914DjzGxyA+Pu9Pcsq3rK96tEkcU3VXdU\n0B+BWZa3kxI639cJv7K3ju0EqKBPA5cSJifLZ3VktQpai/CFegjwEKGUMcXytkAFDQL2ISSIFQjt\nDxf29QsvVtudRhgTckBfq8Eqzr0G8B1CiecG4HfAcoTOA3sSFqe6Nm53NeL/JU5bvz7h37cDIbkO\nBq4C/mB5eyDpayYtq59pSV8g/B/uAgwtJQpJKxKqQA8Drgd+AYw3s226OU8pKRwGbEooHX/KzJ5s\nUNxeomilEkX8EL9AmI4jsW6bsbH448DehC+qHwBftbxNSeoajaSChhAapb8FrE74It2DMKHaqcAN\nlreFCV/zi8AfgEnAGX35pa+CtgL+H3G8D+EL+cWKYwRsREfSWIMwBuVa4Kbelm7ieccS1mfegfBL\n8U3ChHS3
AUVgPuFL6VvAy4R/95WNnBKmL7KaKEoql0KV9DXgK2a2bXy8FKGksImZ/bvitcOA/wLr\nm4VaAEkXArPN7LgGxesliqy/qcqpoC2Aiy1v6yZ83iWAW4GVCKtp7W95m5HkNZpFBW1MGGMypdG/\nfuMgxsuA54DDLW9v1PHagYSuwP+PUN1wGqHto6YvfBW0Oh1diT8F3E2soqrWdhUTwxqEpFBKDvMJ\nSeF24PbuxtiUxXwUMA74E2EcTSaqJUuy/pmW9AtgtbJE8TtgkJl9u+yYR4FJZnZVxWs3Be4ys2Fl\n+74H5MxsjwbF6yWKrL+pyqmgXwIDLW8/6vHg+s+9MqHK4/SepgRxHeJo81OAzwNftLzd18PxSwEH\nE8ZsvAH8FvhbX6qR4hiUXQhJY1dC4ipVUT1OKGWVqpJ2JCxsVUoMtwHP11siUkEbAEcSxuT8L/B7\ny9u/evtvSFLWP9NdlCjOAV4tLxFIugs428wuqnjteOByM/t42b4jgC+Z2Q4NirdtxlH0SQuNo9iL\nUAWQuDgNeY9dZV1nMakerYKKwA0q6CRCb6pOX7wq6GPAtwkzBdxL6El2ZxKN07Ft6Qrgijg2ZFtC\n0riO0MaxgFCFdBtwMvB0X68buwl/QwUdBxwK/FUF/YdQLXV5ln9sqKBEftlavtfJqPJ1lUufQhjX\n01Xpsp5jG6JlxlEkKeu/Pkri3EpFYPVmjWNw9VFBYwhVUXOBQyxv/4k9pb5HaEO5jJBEnm5SPAJW\nBV5u9HsmVkt9jlAttRGhd9lZtfYui7GuBIyK28iy+6MI0/S8Qphb7LHSbVfrrGT9M91FieII4OCy\nNophhIGttbZRXAzMMrMfNyher3rK+puqJE69sabl7Ztpx+K6F9t7fkmoknkM+CSh2+OZlrdX04yt\nWVTQuoRqqS8Rpp//PTCVxb/8R1bcf5vQWWNWvC2/P5vQ4WIjYMOy2w+oSB5M4sEsfqbjLNeDCVP/\nr0YY9LkAGEHo9fRVQjXez4BtzexT3ZxnMmHd7cOBzQg95LapZYBxL+P2RNFCieJu4GfdDY5z2RLH\nu4wEJtc7p1a7iFOxHExIGiPpnAAqE8KL9U6REkshq9M5cWzEJDZgEk9RmUBgZqlkFV9b+uIeFLee\n7g8mJKbn6um48FG8oYr7hIrdk8zsZ3EcxenAaMJA2I/GUUj6MSFxfC4+HgGcR8c4ih+Z2aX1xlNH\n3J4oWqF7bByVPI2w9kQmuyQ61534pUyzBgtKMiaxMZUJJCyqZXQkgIWEX/Tz421398v3DQXWjPue\nrdiei7cvtVP1cNO6x0o6GrjYLNmJ5JLQCiWKuPbEjpa3iWnH4lzWdfsLOPQ4M+KXfm8TV1l7yprA\nWhXbmoSqpBl0Th6l7fl6pp3JgqaVKCSV6mwfIhSZbrKMFENaJFFMIcyeelnasTiXdWl/plXQMMJY\nlfLkUbo/CngHeJ3QPbp0+0YP+94AXk+jN1lTq54kDSDMAHoIoYHvcsLkfV3O0dMsab+pelK29sTq\npSk2nHPdy/JnOvYOGxG35cpua7k/glBd9jqh59N/4lZ+v6vtv5a3+b2OuZnjKMxskaRXCJOcLST8\no6+U9A8zO7bWizVCxsdRTADu9iThXOuLU8m8Fre6xCqvoYTvzuUJc5iVbysTRsxX7h+hgt6hc2J5\nFZgJPF+2zaoloTRsHIWkY4CvxCDPAa42s/mxlPGMma1V70WTkuVfH0Bp7Yk7LG9/SjsW51pB1j/T\nzRbniBvO4kllTNzWiNsqhLm8ZlCeQCZxEZMYCcwub6RvRIlieWBvM5tZvjOWMnav9UL9TeyTvyvw\n/bRjcc61pvjlXmrn6LaqP37fjKRz8tg1Pn0/oWTyAh1JpC61lCi2AZ4wi9NYS8OBcWbpzw2T5V8f\nsS/+zyzf9XTDzrnFZfkz3YpKf8/Ya2w0pSQyidOT7vU0FdjMLA52CSMUHz
CzTXsffjKy/KaKa0+8\nYHk7Oe1YnGsVWf5Mt6KkGrMH1HJQKUnE+wsJoyJdN2K94p7ANWnH4pzLPkmPS9ou3p8U54DKjFoS\nxfOSjpY0WNISsXH7uUYHVqv4R82lHUeFzYF5zZpAzjnX2sxsA7OP1rVv2Dg1SbnYU7QutSSKbwCf\nJowHeBHYGvhavRdqFDOblMGusXvhpQnn+qVYPd+nUyQSSBfMrGhmk+p9XY+JwszmmNkBZrZy3Caa\nLT49sOvEE4VzbUbSjyRNl/SWpCck7RX3HyLpbkmnSnoNmCTpfElnSPpfSfMk3SlpFUm/k/S6pGmS\nNik79wxJO8aHBiwh6cJ4rcclbV527DhJxXiex5vR+7THRCFpZUnHS/pz/MefL+m8RgfWquLaEyMI\nXdKcc+1jOmEm2OGEyUgvkbRKfG5LQvfVlQlT1QvYDzieMKHhh4SZZe8nDDm4krAmfEl5dZMI66tP\nJixsdB1hdlokDQauB24kzFl1FPAXSWMT/rd2UkvV07WEAR+3AFPKNte1PYFr22kGSuccmNmVZmGN\ncTO7HHiGkCAAZpvZH81skZm9T/jiv8rMHjazD4CrgXfM7JI4V97lQLWeo3ea2Y3x2EuAjeP+rYFh\nZnaSmS0ws9sJa1o0dNLRWgbcDTWzHzYyiDazF2HxEudcworFZJZCzeXq74Ir6SuE9dLHxF1LE0oL\nCwnrc1Qqr6J/v+Lxe/H13ZlTdv9dYEicDWPVLq41k7CYUsPUkihukLSbmXkpogdx7Yn1CAvfO+cS\n1psv+CRIGg2cDewI3BtGselhOhqemzWj9mxgpOJAiLhvNPBUIy9aS9XTd4DrJb0fG2XmScrMJHcZ\n6x67O3CjL1DkXNsZRkgGrwEDJB0KbBCf6yp5NSqh/YtQwvhBHLKQAz4P1LRKXsO6x5rZ0mY2wMyG\nmNkycRte74UaJWPdY723k3NtyMyeBH4L3Au8QkgSdxGSR2nr9JKKfd0d0+XlujvWzD4k/CDdlTCL\n7OnAQWb27xr/Hb3qHlvrehQjgLWBIWUXvKP7VzRHlob7+9oTzvVdlj7T7aBp61FIOgI4mjAz4cOE\nVvd7CXV1roOvPeGca0u1tFEcQ+gCNsPMdiB06XqzoVG1Jq92cs61pVoSxftm9h6ApCFm9hSwTmPD\nai1la09cl3YszjmXtFq6x86KbRTXALdIep2wipLrsD3wtOXt5bQDcc65pPWYKMzsC/HuJElFwijt\nGxsZVAvyaifnXNuqpURBnJBqW0IXrbtiF62GkjQMOAP4ACia2V8bfc3eKFt7Yqe0Y3HOuUaoZVLA\nE4ALCBNZrQicL+mnDY4LYG/gcjP7GmGCrKzytSecc22tlhLFgcBGcaIrJJ0IPAL8vN6LxVlndwPm\nmtmGZfsnAKcRVs47x8xOJsxd8kg8ZGG912qivfFqJ+cSIyUzn5NLTi2J4iVgKGFSKwiD7l7s5fXO\nB/4AXFTaERf5OB3YOV7rfknXxWuMBB6lxiVbm00FDQcOIzRmO+f6yAfbZVMtieIt4AlJN8fHnwHu\nk/QHwMzs6FovZmZ3ShpTsXtLYLqZzQCQdCmhzv/3wOmSdiO73U6/A9xkeZuWdiDOOdcotSSKq+NW\nKg4W432RzIyJq9F52twXga3M7F3gqwmcvyFU0PKEEetbpx2Lc841Ui3dYy+QtBQwKg62S1qfkk3F\nTIjFJk4Q+H3gasvb9CZdzznneiXOMpvr7etrmetpD+AUYElgjKRNgYKZJdUT6SVCW0TJSOpoA+nN\nTIh9pYJWBr5O9RWqnHMuE+IP6GLpsaR8Pa+vpZF4ErAV8Hq84MPAmvVcpAcPAGtLGiNpCeAA6miT\nSGk9iuOAv1jeXmjydZ1zrtcath4FMN/M3qjY16v1oCVNBu4BxkqaJelQM1sAHAncBDwJXGZWe+Nw\ns9ejUEGrAwcDJzbrms45l4TerkdRS2
P2E5K+DAyStDahAfeeei8EYGZdLgBuZn8H/t6bc8bs2My2\nieOBc3xeJ+dcq+ltW0WPCxfFhuyfAJ+Nu24Cfl4agJemZi9yooLWIFSVrWN5e61Z13XOuSQlunCR\npEHAlLgOxY/7GlwbOAH4oycJ51x/UjVRmNkCSYskLddFO0UmNKvqSQWtQ1jEfO1GXsc55xqlkVVP\n1xG6gd4CvBN31zUiu1GaWfWkgiYDj1neftWM6znnXKMkvmY2cFXcShklqRHZLUMFbQTsAByRdizO\nOddstSSKK4H3zGwhfDSJ35CGRlWHJlU9/Qw42fL2dgOv4ZxzDdXIqqf/A3Y2C1+SkpYBbjKzT9Uf\nZrKaUfWkgrYgzHW1tuXD2uHOOdfK6v3urGXA3ZBSkgAws3nAUr0JrkX9HPilJwnnXH9VS6J4Jy6F\nCoCkTwKZ+dJs5BQeKmg8sA5wbiPO75xzzdTbKTxqqXraArgUKI1E/jhwgJk9UO/FktbIqicVJMIk\nWhdY3s5vxDWccy4Nifd6MrP7JY0j/LI24Gkzm9+HGFvFToSkeHHagTjnXJp6rHqStD+hneIx4AvA\nZZI2a3hkKYqliV8AecvbgrTjcc65NNXSRvFTM3tL0raEX9nnAWc1NqzU7QYMAy5LOxDnnEtbLYli\nYbz9PPBnM7sBGNy4kOqTdGO2ChpAKE381PLWq+nUnXMuixq5HsVLks4mLCg0RdKQGl/XFA1Yj2If\nYAFwbYLndM651PV2PYpaej0NAyYAj5rZM5I+DmxoZjf3KtIEJd3rSQUNBB4Hvmt5uzGp8zrnXJYk\n1utJ0nAze4uwVvbtcd/ywAeENRna0ZeA/xDW3HDOOUeVEoWkKWa2m6QZLD4JoJlZkutm90qSJQoV\nNBh4CjjM8s1bWtU555otsRKFme0Wb8ckEFfDJDgp4CHA854knHPtKvFJAXsaK2FmD9V7saQlVaJQ\nQUOAfwMHWN7u7XtkzjmXXUmOzD6VUOU0FNgceDTu34jQRrFNb4PMoCOARz1JOOfc4rrt5mpmubhW\n9mxgMzPb3Mw2J6x2N7tZATaaChpGWA/8p2nH4pxzWVTLeIh14/QdAJjZ48C4xoXUdN8G7rK8PZx2\nIM45l0W1rHD3qKRzgEsIy6B+CXikoVE1yZe/vN06Q8byg/HT2R1phSqH9rT0ay1LwyZxjmZco6/n\naMq/YxarcwgXrPg064x6m6VHfcgSoxYwaPQiBowytMwAFj01mPnTluLdJ9flqSfvYvxrvYizr/+W\n1nhf9DSYyvV7tQy4Gwp8Exgfd90BnGlm7zc4th71tTH7u8du9Miuuzyx0XuvL8/Cl5deNPzlDxeO\nfPm1Bcu99OHCoS+xcOiLLBo8r+cwagk1gXM04xp9PUdi/46FDOAVVuF51hgwk9GayWjNYIxeYJRm\nMlovMEpL8gGjeMFG8YKNZqaNYYaNZqYN4x2eYt0BT7C+nmQ9Pcl6WpIPWI8nbT2etPV5wjbgcVuf\nJ2wF/tuof0urvC+Skn7Ca58fQg2PQ7BUPd+dPSaKLEui19PILS5dYa1hAz83fPh/dll66Te3XHbZ\nV8esscZjb48ePW3+Ciu8vMyAAQs/GDDApgPPAqXb0vZyLufzQVWSWAJYGlgm3na1dffcssBIYHXg\nDWBm3GaU3Z8JzDTjrRrjEbAqsH7ZtgGwHvA28ETlZsabvf8LtBGpVRJeS/0QSjkOCeb1q0QBFEhm\nHEU8J0OALYHx0qLtll321U+NGfPk3A02uPu5TTe97Y2xYx9k6aXfWhVYCxgOPAdcAZyay1nP5Y82\nITGA8DfYpGzbGFiZ8EaeR/gSLt12tXX13DzgReAFs8aupBgTyEgWTyDjCO13VsNGN/sXEXoK3hS3\naWaJ/Bp1rtfKxlHk+1WiaNQKdx3XYBDhS3C7uI0n/NK9Y6WVXrjviCOOn73zzpfsK7Ez8Cvg7FzO\nPm
hkTM0mMZTwBVqeFDYCXgOmlm2PALPN+DClUBMRk+BQQsKrtlHluUHAFsAucRsI3ExIGv8w677O\ny7lGq/e70xNF3ddkAOEXZ3niWG7cuH/NPvLIY4avttr0IbfeOvGqs8465dL584c8A8wyo2UWP5JY\nic4JYRNgDeBpOieFR814Pa04W0ksuYylI2mMB56ko7RxXyu9R1zrSzxRSFoH+D4who5eUmZmO/Y2\nyKSkkSi6joNlCdUwa+2xx5k777ffqXuZaak//enX79199x7DYcAsFm/fmA4834TqlYHACGBFYIUe\nbtcAlqJzQphKqDZp6VJClkgsCXyajsQxGriNmDjMmJlieK4faESieBQ4E3iIjkWMzMwe7HWUCclK\noqhULErAXsCvFi0a8Oo//vGlM0488eK3CcnkE/F2LcIXxGvA88C7hL9vV9uCKs+Vnh9A+MKv/PJf\nFniTMCvuaz3cvkBoJG7dYmYLklgF+AwhaXwWeJ2QNG4lNOLPAV41++jz51yfNCJRPBhHZGdOVhNF\nSbGoQcDBwCTgfuD4XM6mlZ6Pv/ZHEkprQwj12F1tg6o8V9qM8GVfmQBe9y+Y1hGrNjchJI3tgdWA\njxFKha8TksYc4JWy+5XbXK/KctU0IlFMAl4FriKsRQGAmaXeGJf1RFFSLGoocCRwLHAdMCmXsxfT\njcq1ktipYkVC0lgl3na3rUDocDGHUM15PXC9GXOaH7nLokYkihn0g/UomqFY1HLAD4GvAecCJ+Vy\n6Sdc115iSbWUVDYA9iSUUB4DrgGuNuO59CJ0aWubXk+S1gCOB5Y1s/26OaalEkVJsahVgTywN/Bb\n4Pe5nL2bblSuncXxQTsS2s72JJQ2riYkjqneLtW/NKJEsQRhCo/tCCWLfwJnmdn8vgRaK0lXtFui\nKCkWtQ7wC2BbwtQopbEIjwCzc7mMZnHX0mKJY2tC0vgCoQ3smrjd5e0b7a8RieJcwhvpQsJAooOA\nBWZ2eI0BnQfsBsw1sw3L9k8ATiM0xJ5jZid38/q2TRQlxaLGEkaDb0zHCGfRkTQeISSRp3I5826q\nLjFxjMf6hISxFzAKuIFQ2ril0d23XToa0j3WzDbqaV+V148nTM1wUSlRSBpIGMC1M/ASoUfQROCT\nwGbAKWY2Ox7b9omiUuxeuwodSaOUQEYTVuIrL3k8ksvZf1IK1bUZidGEqqm9CAuW/QMoAtPiNtur\nqVpfIxLFQ8D+ZjY9Pl4LuMLMqi6VWnGOMcD1ZYliG8JcIxPi4x8BmNlJZa9ZnjAlxk50U+Jo10TR\nnWJRSxFadLIPAAAUqklEQVR+/ZUnkI2AtwjzCk0r257yhnLXFxIrAJ8nVFONi9sQ4Ck6v9emAc95\nN+zW0YhEsRNwPmFQGIQ+/4ea2W11BDWGzoliX2AXMzsiPj4Q2MrMjqr1nPF1/SpRdKVY1ADC/8mG\ndHyY142379H5Q126P8vbP1xvSCxPx/usfFuFMNtAZQL5t1dfZU+Sa2YDYGa3ShoLrENozH7arM+T\n3iX2JRXHeZQkNotsq4jTnD8Xt2tL+2P11ap0JI1xhCqFdYHhxaKeZvEE8my7TWjokhUnM7w7bh+R\nWIrwHVF6r+1LmMZ9TYkXCSPNLwXuNsOn5m+ysllje/f67koUknaKSWIfwhd7KfsYgJldVUeQY+hc\notgamFRW9XQcsKi7Bu0q5+33JYreiOM51qVzElmXUDKZS5g2YgahFDmjbJvljemuHhKDCQlkD+CL\nwPLAZYSk8YC3d6QjsaonSQUzy0u6gC5KAGZ2aB1BjaFzohhEaMzeCZgN3AdMNOuY3qLG8ya+HkV/\nFqccWY2QMLraViUkksoEUtpm5XLN6TbtWpPE+sABhM4rAwgJ41IzHks1sH6iYetRSFrTzJ7raV+V\n108mzFmzAuFL5gQzO1/SrnR0jz3XzE6sNeiyc3uJooliIlmd7hPJ
x4H/0v0cROXba7mceX/9fip2\ny92UUMr4IqFDxqXAZWY8k2Zs/UFDej1V9nDKykSBXqLIlmJRg4GVWHzuoa7mJqqc5O6jCe0I3anf\njdt7Pdx/z5ejbW1xIsStCQljf8IKh5cCl5vxQpqxtZvESxSSxhEao04hrEchQhXUcOBYM1u/byH3\nnZcoWlcsnZTmIyrfVgaGEdbFGBpvK++XPx4CfEhHAnkXeB+YX7Z9WPG4lufuzuXslsb9BVxX4qjx\n7QlJY29CR4tLgSvNeCXN2NpJkm0UexJGa+5OmPG0ZB5wqZnd05dAk+AlChd7dw2hcwIZCgwu25ao\neNzVtkTF/S8R1mD5Ti4XBn+65pJYgjAo94uExvDZwF109Lp61hvD69PINoptzOzePkXXIF6icI0S\np4Y/Hvg64cfImbmc+YCylMRp1jcirAz4acL8aIMICaOUPB42wztT1KARbRQXAUeb2Rvx8Qjgt2b2\n1T5FmgBPFK7RikWtB5xFKLV8PZezh1MOyfFRY/goQsIoJY+1CNMBlZLHvWa8mVqQGdaIRDHVzDbp\naV8avOrJNUMc/X4IcCLwV+CEXM7mpRqUW4zEcoRG8VLy+CRhIGqpqup2M/p1NWIjq54eAXawuKJd\nnIPpn+UzwabFSxSumYpFrUTo3LEjcHQuZ9ekHJKrIg7225SOEseOwL3An4Ep/bmaqhEliq8Q6mov\nJ/R82g/4pZld1JdAk+CJwqWhWFSOUB31NHBULmfehbMFxGlG9gOOANYELgDONePZNONKQ0NWuJO0\nPiEbG3CbmT3Z+xCT44nCpaVY1JLAD4BjCFVSv/MBhK1DYj3gcML6Oo8SShlXm9Ev5jprVKIYSBg0\nNYiOuZ5S/xXlbRQubcWi1gbOIAw0/HouZ/9KOSRXB4klCcMAjiD0qroY+LMZdU0n1Coa2UZxFGF9\n57nQMd+8t1E4F8SxHF8CfgNcBfw4lzPvbdNiJNYCDgMOBZ4llDKuMKPt1rNvRBvFs8CWZtlbRc0T\nhcuSYlEjgJMIg1S/B1zu04u0ntgIvhuhlLE1YWT4n82YmmpgCWpEorgd+KxZ9mYF9UThsqhY1KeA\nPwBrE+q/p5Ztj+dy9n6K4bk6SIwEvkooacwBLgL+1urdbBuRKM4DxgJTCPPiAJiZndrrKBPibRQu\ny2IJo7Te+abxdm1CtUZ58pjq655nW5yD6rOE6dF3Bx4HriTMQfVSmrHVo5FtFJPi3dKBIiSKQt1R\nJsxLFK7VxN5S6xGSRimBbAy8SUXyAJ73JWuzJzaA70zoarsHYXXIKwgljVlpxlarhvR6yipPFK4d\nlK17vgmdE8jShCVErwP+N5cL0+i47IgTF+5ESBp7Av8mJI0rszxFeqPaKCqZme1Yb3BJ80Th2lmx\nqI8TGlX3IFQX3E9IGtflcvZ8iqG5LsRG8B0JSWMvQhVjKWnMSDG0xTQiUXyy7OEQYB9ggZkd27sQ\nk+OJwvUXxaKGEao79gA+T2hYvS5uD3jvqmyJSWMHYF/COI0ZdCSNmlYHbaSmVD1Jut/Mtqj7hQnz\nROH6o2JRA4EtCVUdewDLAdcTksat3qsqW+IU6TlC0tibMN/UCWY8kl5MyZcoli97OIAwI+PvzGyd\n3oWYHO/15NxHo8N3JySOTYBbCUljSi5nr6YZm+tMYihhjZMfEqZCn2TGE827fuN6Pc2go8fTAkIR\nqmBmd/UizkR5icK5zopFrQB8jlDS+AyhcfVB4OG4PZ7L2XvpRegAJIYB3yYsM30LUDDj3827fnJL\noY7KwnxO1XiicK57sSvuloQeVKVtLGGNhofLtqm5nL2eVpz9mcRw4GjgO4Tqw5+Z0fCOCkkmiofN\nbNN4/29mtk9CMSbGE4Vz9Skbx1GePDYG/kNZ4oi3L/o4juaIiy59j1DKuBL4RSPHZDQqUXx0P0s8\nUTjXd3Ecx1p0Th6bAgMJSeNB
4AFC99yZnjwaR2IF4Fjga8BfgF+Z8XLy1/FE4ZxLQBzHsSmwGbAF\noSPLknQkjQcIXXNbZgqLViHxMUKD9yHA+cDJZsxN7vzJJYqF8NH0ukOB8gYwM7PhvY4yIZ4onGuu\nYlGrEhLGJ+lIHvOJSYOYQLy3VTIkVgV+TJjG/mzgFDP6PC+YT+HhnGuauBbHKDqSRml7g87J475c\nzualFWerkxgF/IQw4Pl04LdmvNX78/WzRIGPo3AuU8raPEqlji0IVVhPAXcAdwJ3eamjfhJrEr7z\nPgP8jLBORs1LQDRsHEWWeYnCudYQe1ttAYwHtgM+BbxESBp3AHfmctnujp8lEpsCvwZGAz8irPdd\n85d5vytReKJwrvXEaUg2piNxjCe0id5JR/J42ntYdU9ChDUyfg28A3zfjHtqe60nCudci4ltHWPp\nSBrbAUsRprm4I25TffLDxcVFlQ4EfgHcBxzX0yhvTxTOubZQLGokHUkjB6wA3ExYo+PmXM5eSS+6\n7InzSB1NGIdxGWGU95yuj/VE4ZxrQ8WiRhOqWiYQ1n2YCdxISBx353L2YZWX9xsSKwLHA18BTgNO\nNeOdzsd4onDOtbliUYOArYBd4rYuoXrqRuCmXM6mpxheJkisBfySUCrLAxeYsSA854nCOdfPFIta\nkbCwUylxvEcoadwE3Nafx3BIbAmcAqxIGO09BbSoLRKFpD0Jy0AOB841s1u6OMYThXOuk9gwviEd\nSWMrwnxVNwCT++OUI7GH1G6EHlJzQLm2SBQlkpYDfmNmh3fxnCcK51xVcRnZ7QlLku4DPARcDFzV\n30oacbW9Q0FnZypRSDqPkMnmmtmGZfsnEBpaBgLnmNnJ3bz+N8AlZja1i+c8UTjnalYsaghhNcCv\nEOrupwAXEZaQXZBmbM2UuTYKSeOBt4GLSolC0kDgaUKd4kuEuWAmEob8b0aoT3sZOAm42cxu7ebc\nniicc71SLGol4IvAQcBIYDKhpDG13Qf6ZS5RAEgaA1xflii2Icw1MiE+/hGAmZ1U9pqjCVn/fmCq\nmf2pi/N6onDO9VmxqHUJg9YOBOYREsZfczl7MdXAGqRVEsW+wC5mdkR8fCCwlZkdVed5S5MClvjk\ngM65XosTGo4nlDL2pk3aM8omAyzJ3qSAXSSKfYAJSSQKL1E45xqhWNRQQnvGQXS0Z5yRy9ndqQaW\ngHq/Owc1MpgqXiLUCZaMBHpVxJM0CS9JOOcSlsvZe8DlwOWxPWMicHGxqJmEKb6LrdaW0UXJorbX\npVSiGERozN4JmE2YyGqimU2r87xeonDONU2xqMGE1eaOB+YCPyfMO9VqCSNbbRSSJhP6MK9A+MOe\nYGbnS9qVju6x55rZib04ty9c5JxrujhN+v6EVefeJszcekPWE4YvXOScc00WG7/3JiSMRYSEcU3W\np0PPXImikTxROOeyIE4bsjvwU2AIYTK+K3I5W5hqYN3od4kCr3pyzmVETBi7ACcAywO/IozHyMSo\nb696cs65jIgJY0dCCWMkIWFcnJU1M/pdicIThXMuy4pFjSckjLHAycCf0y5h9LtEgVc9OedaQLGo\nrQmJ4l3gi7mcvdnsGLzqyTnnMi6uzPc7wpCBz+dyNiONOOr97hzQyGCcc851iFVORwJ/Au4pFrVN\nyiHVxEsUzjmXgmJRnwMuAI7J5WxyM6/tbRTOOdciikVtCFwPnA/8rNEju72NwjnnWlCxqFWAa4Hp\nwGG5nL3f6Gt6G4VzzrWQXM5eIfzKHwTcWixq5XQjWpwnCuecS1mc0nwicBvwf8Wi1k85pE5avuoJ\nb6NwzrWRYlEHAb8FDsrl7KYkz+1tFM451yaKRW0LXElo4D4j6fP3u15Pniicc+2oWNRawA3AzcD3\nkpyJ1huznXOuDeRy9iywDbAecF2xqGXSisUThXPOZVQuZ28AnwNmAXcXixqdRhyeKJxzLsNyOZ
sP\nfJMwKO/eYlFbNTuGlm+jwHs9Oef6iWJRuxNWz9s8JpC6eK8n55zrB4pFDe5NkijnvZ6cc85V5b2e\nnHPOJcoThXPOuao8UTjnnKvKE4VzzrmqBqUdQF9JmoR3j3XOuR6VdY+t73Xe68k55/oX7/XknHMu\nUZ4onHPOVeWJwjnnXFWeKJxzzlXlicI551xVniicc85VldlEIWldSWdKulzSYWnH45xz/VXmx1FI\nGgBcamb7d/Gcj6Nwzrk6ZW4chaTzJM2R9FjF/gmSnpL0jKQfdvPa3YEpwKWNjtN9NGrTJcT/nsnx\nv2W6mlH1dD4woXyHpIHA6XH/esBESeMkHSTpfyStCmBm15vZrsDBTYjT9WJov6sql3YAbSSXdgD9\nWcPnejKzOyWNqdi9JTDdzGYASLoU2NPMTgIujvu2B/YGhgC3NzpO55xzXUtrUsDVgFllj18EOi0Y\nbmb/BP7ZzKCcc84tLq1EkVgLuqRst8a3GEn5tGNoJ/73TI7/LdOTVqJ4CRhZ9ngkoVRRF+/x5Jxz\njZfWOIoHgLUljZG0BHAAcF1KsTjnnKuiGd1jJwP3AGMlzZJ0qJktAI4EbgKeBC4zs2mNjsU551z9\nMj/grjuSJgCnAQOBc8zs5JRDammSZgBvAQuB+Wa2ZboRtQ5J5wG7AXPNbMO4b3ngMmA0MAPY38ze\nSC3IFtLN33MScDjwajzsODO7MZ0IW4ukkcBFwMqE9uGzzez39bxHMzuFRzXdjcNIN6qWZ0DOzDb1\nJFG3xcYKAT8CbjGzscCt8bGrTVd/TwNOje/PTT1J1GU+8F0zWx/YGvh2/L6s+T3akomCsnEYZjaf\nMHJ7z5RjagfeOaAXzOxO4PWK3XsAF8b7FwJ7NTWoFtbN3xP8/dkrZvaKmU2N998GphGGKNT8Hm3V\nRNHVOIzVUoqlXRjwD0kPSDoi7WDawMfMbE68Pwf4WJrBtImjJD0i6VxJy6UdTCuKg583Bf5FHe/R\nVk0Urdmwkm2fNrNNgV0JRdPxaQfULiw0BPp7tm/OBNYANgFeBn6bbjitR9LSwN+AY8xsXvlzPb1H\nWzVRJDIOw3Uws5fj7avA1YTqPdd7cyStAiDp48DclONpaWY21yLgHPz9WRdJgwlJ4mIzuyburvk9\n2qqJwsdhJEjSUpKWifeHAZ8FHqv+KteD6+iYzPJg4Joqx7oexC+yki/g78+aSRJwLvCkmZ1W9lTN\n79FW7h67Kx3dY881sxNTDqllSVqDUIqAMFr/L/73rF0cK7Q9sCKhrvcE4FrgcmAU3j22Ll38PfOE\n2WM3IVSPPA98vax+3VUhaVvgDuBROqqXjgPuo8b3aMsmCuecc83RqlVPzjnnmsQThXPOuao8UTjn\nnKvKE4VzzrmqPFE455yryhOFc865qjxRuJYk6e14O1rSxITP/eOKx3cnef6kSTpE0h/SjsO1L08U\nrlWVBgCtAXypnhdK6mkJ4OM6Xcjs0/WcPwV9Ggwlyb8HXFX+BnGt7iRgvKSHJR0jaYCkUyTdF2ca\n/RqApJykOyVdCzwe910TZ8t9vDRjrqSTgKHxfBfHfaXSi+K5H5P0qKT9y85dlHSFpGmSLukq0HjM\nSZL+JenpOGJ2sRKBpBskbVe6tqRfxxhvkbS1pH9KelbS7mWnHynpdkn/lnRC2bkOjNd7WNJZpaQQ\nz/sbSVMJaxQ41z0z8823ltuAefF2e+D6sv1fA46P95cE7gfGEKaAeBsYXXbsiHg7lDB30Ijyc3dx\nrX2AmwnrIqwMzARWied+A1g1PncPYTbeyphvB06J93clLBoDcAjwh7Ljrge2i/cXAbvE+1fF6w8E\nNgIeLnv9bGAEMCT+WzYHxhHm8xkYjzsDOKjsvPum/f/oW2tsPRXBncu6ysVsPgtsKGnf+Hg48Alg\nAXCfmc0sO/YYSaXFWkYCaxPmv+nOtsBfzcyAuZL+CWxBWE
L2PjObDRB/pY8BumrbuCrePhSP6cmH\nZnZTvP8Y8L6ZLZT0eMXrbzaz1+P1r4qxLiQkjAfCvHAMBV6Jxy8kzCbqXI88Ubh2dKSZ3VK+Q1IO\neKfi8U7A1mb2vqTbCb/GqzEWT0yl9oEPyvYtpPvP1gddHLOAztXA5XHML7u/CPgQwMwWVWlrUVlc\nF5rZj7s45v2Y8JzrkbdRuFY3D1im7PFNwLdKX6KSxkpaqovXDQdej0liXTrX08/v5kv4TuCA2A6y\nErAdoQTS1yU6ZwCbxDaQkfRurYXPSBohaShhWeC7COsg7xtjRdLykkb1MVbXD3mJwrWq0q/hR4CF\nsbrnfOD3hCqZh+I8/HMJ6xdUruB1I/ANSU8CTwP3lj13NvCopAfN7KDS68zsaknbxGsacKyZzVVY\nqL7y13ktv9ZL571L0vPAk4T1jB+sch7r4r4REtbfgNUJi9M8BCDpJ8DNsRF7PvAt4IUa43MO8GnG\nnXPO9cCrnpxzzlXlicI551xVniicc85V5YnCOedcVZ4onHPOVeWJwjnnXFWeKJxzzlXlicI551xV\n/x9YXMM0iqc13QAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10e2d1b38>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"step_lst = [0.001, 1.0, 10.0, 'armiho']\n",
"color_lst = ['-r', '-b', '-g', '-y']\n",
"loss_lists = []\n",
"for step_size in step_lst:\n",
"    # 'armiho' selects the Armijo step-size rule implemented in grad_descent\n",
"    w, loss_lst = grad_descent(X, y_train, step_size, max_iter=20)\n",
"    loss_lists.append(loss_lst)\n",
"\n",
"# The smallest objective value reached by any run serves as a proxy for the optimum\n",
"fun_min = min(min(loss_list) for loss_list in loss_lists)\n",
"for loss_list, clr, step_size in zip(loss_lists, color_lst, step_lst):\n",
"    plt.semilogy(range(1, len(loss_list) + 1), [loss - fun_min for loss in loss_list], clr, label=str(step_size))\n",
"plt.legend()\n",
"plt.xlabel('Iteration number')\n",
"plt.ylabel('Function discrepancy')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"step size armiho\n",
"Iteration 20 / 20\n",
"ROC-AUC: 0.856175389725\n"
]
}
],
"source": [
"w, loss_lst = grad_descent(X, y_train, 'armiho', max_iter=20)\n",
"predicted = make_pred(X_test, w)[:, 1]\n",
"print('ROC-AUC:', roc_auc_score(y_test, predicted))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now implement stochastic gradient descent. The function should return the weight vector and the list of objective values at each iteration. SGD must perform at most max_iter iterations."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
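"import numpy as np\n",
"from IPython.display import clear_output\n",
"\n",
"def stoch_grad_approximation(w, X, y, i):\n",
"    # Gradient of the logistic loss log(1 + exp(-y_i * <w, x_i>)) on the single object i\n",
"    # (labels are assumed to be in {-1, +1})\n",
"    Z = -y[i, None] * X[i, :]\n",
"    anc_var = np.exp(-Z.dot(w))\n",
"    return (Z / (1 + anc_var)).reshape(w.shape)\n",
"\n",
"def sgd(X, y, step_size, max_iter):\n",
"    step0 = step_size\n",
"    gamma = 0.5  # after epoch k the step is step0 / (k + 1) ** gamma\n",
"    step = step0\n",
"    w = np.zeros((X.shape[1], 1))\n",
"    loss_lst = []\n",
"\n",
"    for i in range(max_iter):\n",
"        clear_output(wait=True)\n",
"        print('Iteration', i + 1, '/', max_iter)\n",
"        loss_lst.append(get_func(w, X, y))\n",
"        # randint excludes its upper bound, so y.size lets every object be drawn\n",
"        index = np.random.randint(0, y.size)\n",
"        gradient = stoch_grad_approximation(w, X, y, index)\n",
"        w = w - step * gradient\n",
"        # Shrink the step once per full pass (epoch) over the training set\n",
"        if i != 0 and i % y.size == 0:\n",
"            step = step0 / np.power(i // y.size + 1, gamma)\n",
"    return w, loss_lst"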
]
},
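{
"cell_type": "markdown",
"metadata": {},
"source": [
"The per-object gradient used in `stoch_grad_approximation` can be sanity-checked numerically. The cell below is a minimal sketch with hypothetical helper names (`logistic_loss_single`, `analytic_grad_single`, `numeric_grad_single`): it assumes labels in {-1, +1} and the per-object loss `log(1 + exp(-y * <w, x>))`, restates the same analytic formula for one random object and compares it with central finite differences. A difference far below 1e-6 suggests the formula is right."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def logistic_loss_single(w, x, y):\n",
"    # Loss on one object, labels assumed to be in {-1, +1}\n",
"    return np.log1p(np.exp(-y * x.dot(w)))\n",
"\n",
"def analytic_grad_single(w, x, y):\n",
"    # Same formula as stoch_grad_approximation: -y * x / (1 + exp(y * <w, x>))\n",
"    return -y * x / (1.0 + np.exp(y * x.dot(w)))\n",
"\n",
"def numeric_grad_single(w, x, y, eps=1e-6):\n",
"    # Central finite differences, one coordinate at a time\n",
"    grad = np.zeros_like(w)\n",
"    for j in range(w.size):\n",
"        shift = np.zeros_like(w)\n",
"        shift[j] = eps\n",
"        grad[j] = (logistic_loss_single(w + shift, x, y)\n",
"                   - logistic_loss_single(w - shift, x, y)) / (2 * eps)\n",
"    return grad\n",
"\n",
"rng = np.random.RandomState(0)\n",
"w_check = rng.randn(5)\n",
"x_check = rng.randn(5)\n",
"y_check = -1.0\n",
"max_diff = np.max(np.abs(analytic_grad_single(w_check, x_check, y_check)\n",
"                         - numeric_grad_single(w_check, x_check, y_check)))\n",
"print('max |analytic - numeric|:', max_diff)"
]
},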
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Choose a step size for which SGD converges. Plot the convergence curve. Compute ROC-AUC on the test set."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Iteration 100 / 100\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY0AAAESCAYAAAABl4lHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHM1JREFUeJzt3Xm0XGWd7vHvk0BIGJOoTDEYhoCAIJMMV8SIStOgCMIV\nsaUVrrCkpQH1Yl+wVejVvdrbdrfSeoHbKkjjBUFlthUiEhChBSSYMI9hCkkEEgmRIcPv/vG+xSkO\nZ3irztk1nHo+a+11qvap2vutDannvONWRGBmZlZiXLsLYGZm3cOhYWZmxRwaZmZWzKFhZmbFHBpm\nZlbMoWFmZsUcGmZmVsyhYWZmxdZqdwEGI2k94GzgFWBORFzU5iKZmfW8Tq5pfBS4NCKOBw5pd2HM\nzKzFoSHpPEmLJc3vt/9ASfdLekjS3+Td04An8+PVrSynmZkNrNU1jfOBA+t3SBoPfCfv3wE4StL2\nwFPA9PyyTq4RmZn1jJZ+GUfEr4Gl/XbvCTwcEQsiYiXwI+AjwGXA4ZLOBq5qZTnNzGxgndARXt8M\nBamGsVdE/Ak4tj1FMjOzgXRCaDS9Nrskr+tuZtaEiFAz7+uE0Hiavr4L8uOnSt/c7AcfaySdERFn\ntLscncDXoo+vRR9fiz4j+YO7EzqY7wBmSpohaQJwJO7DMDPrSK0ecnsxcAuwraQnJR0TEauAE4Fr\ngXuBSyLivlaWy8zMyrS0eSoijhpk/8+Bn7eyLGPQnHYXoIPMaXcBOsicdhegg8xpdwHGAnXzPcIl\nhfs0zMwaM5Lvzk7oCDczGxUeUflGo/2HtUPDzMYUtz70qSJEO2H0lJmZdQmHhpmZFev65ilJZ5Du\ntzGnzUUxM+tokmYBs0Z0DI+eMrOxotO/EyTtC/wTaUXv1cB9wCkRcYekzYC/Aw4CNgSWADcBX4+I\nByTNAB4FVuTDrQBuB86KiF8Ocr4Br8dIrpObp8zMWkDShsA1wFnAFNJirWcCr0h6E2ni80Rg34jY\nANgNuBH4YL9DbZR/vzMwG7hc0qda8ylc0zCzMaSTvxMk7QHMjogpA/zu74GDI2LXId4/g1TTWCsi\n1tTt/yJwakRsOsB7XNMwM+tSDwCrJf0g3620Pjw+AFze5HEvBzaWtN2IS1jAoWFm1gIRsRzYl3Q7\niO8CSyRdKWkT4E3AotprJR0iaamkFyRdO8yhF+afU6sod38ODTPrKRIxGlsz546I+yPimIiYDrwD\n2Bz4JvBcflx73VW5GevzwIRhDjst/3y+mTI1yqFhZj0lAo3GNvJyxAPABaTwuB44VFL/45ac5zBg\ncT5e5RwaZmYtIGk7SV+QNC0/nw4cBdwK/CtpRNWFkrZSsgGwC2+8u6ny+zeRdCLwVeC0Vn0Oh4aZ\nWWssB/YCfivpRVJYzAO+GBHPAXsDLwM3Ay8Ac4H1gBP6HWdZfv884EDgiIj4QUs+AR5ya2ZjiL8T\nXq+KIbdeRsTMrEd4GRH/VWFmdfyd8Hqe3GdmZm3l0DAzs2IODTMzK+bQMDOzYg4NMzMr1vVDbs3M\n6knq3iGhXcChYWZjhofbVs/NU2ZmVsyhYWZmxRwaZmZWzKFhZmbFur4j3AsWmpmV8YKFXpzMzKxh\nXrDQzMxawqFhZmbFHBpmZlbMoWFmZsUcGmZmVsyhYWZmxRwaZmZWzKFhZmbFHBpmZlbMoWFmZsUc\nGmZmVsyhYWZmxRwaZmZWzEujm5n1CC+N7qXRzcwa5qXRzcysJRwaZmZWzKFhZmbFHBpmZlbMoWFm\nZsUcGmZmVsyhYWZmxRwaZmZWbNjQkHSSpCmtKIyZmXW2kprGJsDtki6VdKAkz8A2M+tRRcuISBoH\nHAB8GtgDuBT4fkQ8Umnphi+XlxExM2tQ5cuIRMQaYBGwGFgNTAF+IukbzZzUzMy607A1DUknA38J\nPAd8D7g8Ilbm2sdDEbF19cUctGyuaZiZNWgk
350lS6NPBT4aEY/X74yINZI+3MxJzcysO5U0T/0C\nWFp7ImlDSXsBRMS9VRXMzMw6T0lonAO8WPd8BXBuNcUxM7NO1khHeO3xamB8ZSVqkKQz8t2ozMxs\nCJJm5budNn+Mgo7wy4EbSDUOAScA74uIQ0dy4tHgjnAzs8ZVPeT2s8C7gaeBp4C9geObOZmZmXU3\n3yPczKzHVDrkVtLGwHHAjLrXR0Qc28wJzcyse5XM07gSuAmYDdQ6xLu3emJmZk0r6Qi/KyJ2aVF5\nGuLmKTOzxlXdEX6NpIObObiZmY0tJTWNF4F1gVeBlXl3RMSGFZdtWK5pmJk1rtKO8IhYv5kDm5nZ\n2FPSEU6+c99MYGJtX0TcVFWhzMysM5UMuT0OOAmYDswlTe67Fdi/2qKZmVmnKekIPxnYE1gQEe8D\ndgX+WGmpzMysI5WExssR8RKApIkRcT+wXbXFMjOzTlTSp/Fk7tO4ApgtaSmwoNJSmZlZR2po7am8\nBPmGwC8i4tWqClXKQ27NzBpX9e1ekbQ7sC9p+ZCbOyEwzMys9Ybt05D0VeAHpHuFvxk4X9JXKi6X\nmZl1oJIZ4Q8CO0fEy/n5JOD3EbFtC8o3JDdPmZk1ruq1p54GJtU9n0i6GZOZmfWYkj6NF4B7JF2X\nn38QuE3St0lrUJ1UWenMzKyjlITG5XmrtWPNyY9FB9xXI98kfU5EzGlzUczMOloeATtrRMcoGXIr\naV1gizyxr2O4T8PMrHGV9mlIOoS05tQv8vNdJV3VzMnMzKy7lXSEnwHsBSwFiIi5wFYVlsnMzDpU\nSWisjIhl/fatGfCVZmY2ppV0hN8j6S+AtSTNJC2Tfku1xTIzs05UUtM4EdgReAW4mDQE95QqC2Vm\nZp1pyNFTktYCZuf7aHQcj54yM2tcZaOnImIVsEbS5KZKZmZmY0pJn8YKYL6k2fkxeCa4mVlPKgmN\ny/JWa8fqiJngZmbWeiWr3K4PvBQRq/Pz8cDEiFgx5BtbwH0aZmaNq3qV21/y+lVu1wVmN3MyMzPr\nbiWhMTEiXqw9iYjlpOAwM7MeUxIaK/LtXgGQtAfwUnVFMjOzTlXSEX4KcKmkZ/LzzYAjqyuSmZl1\nqtKl0ScA25FGTT0QESurLlgJd4SbmTWu6qXRP0bq15gPHAZcImm3Zk5mZmbdraRP4ysR8YKkfYH3\nA+cB51ZbLDMz60QlobE6//wQ8N2IuAZYu7oimZlZpyoJjacl/Tup8/tnkiYWvs/MzMaYkhnh6wEH\nAvMi4iFJmwE7RcR1rSjgUNwRbmbWuJF8dw4aGpI2zH0ZUwf6fUQ838wJR5NDw8yscVWFxs8i4mBJ\nC3jjAoUREW2/T7hDw8yscZWERjdwaJiZNW4k352Dzggfbi5GRNzZzAnNzKx7DdU8NYfULDUJ2B2Y\nl3+1M3BHROzTigIOxTUNM7PGVTIjPCJm5XuDLwR2i4jdI2J3YNe8z8zMekzJfIu35yVEAIiIu4Ht\nqytSYySdIWlWu8thZtbpJM2SdMaIjlEwT+NHwIvAD0m3ev0EsH5EHDWSE48GN0+ZmTWu0tFTkiYB\nJwDvybtuAs6JiJebOeFocmiYmTXOQ27NzKxY1fcINzMzAxwaZmbWAIeGmZkVG/Ye4ZK2A/4nMKPu\n9RER+1dYLjMz60Alo6fmAecAd9J3Q6aIiN9VXLZhuSPczKxxlaw9VWdlRJzTzMHNzGxsKenTuFrS\n5yRtJmlqbau8ZGZm1nFKmqcW4PtpmJmNGZ7cZ2ZmxSrt05A0gbSMyH6kGseNwLkRsbKZE5qZWfcq\naZ76PilcLiAtWHg0sCoiPlN98YbmmoaZWeOqXrBwXkTsPNy+dnBomJk1ruq1p1ZJ2qbuZFsDq5o5\nmZmZdbeSeRqnAr+S9Fh+PgM4prISmZlZxyoaPSVpIrAdqSP8gYh4peqClXDzlJlZ4yrp05D0/oi4\nXtLhpLCo
nSAAIuKyZk44mhwaZmaNq2rI7X7A9cCHeePkPoC2h4aZmbVWyeiprSLi0eH2tYNrGmZm\njat69NRPBtj342ZOZmZm3W3Q5ilJ2wM7AJMlfZTUpxHAhsDE1hTPzMw6yVB9GtuS+jM2yj9rlgPH\nVVkoMzPrTCV9GvtExK0tKk9D3KdhZta4qvs0TpA0ue5kUySd18zJzMysu5WExs4Rsaz2JCKWArtV\nVyQzM+tUJaGh+jv15cfjqyuSmZl1qpK1p/4FuFXSpaQRVP8d+IdKS2VmZh2pdO2pHYH9SUNufxUR\n91ZdsBLuCDcza1zlt3uVNB7YlFQzqa099UQzJxxNkgJiLnAlcAUwL2LAJU/MzCyr+iZMfw18DVgC\nrK7tj4idmjnhaMqhMQv4CHAYsBK4BLgkgrvbWTYzs05VdWg8AuwZEc81c4Iq1X9wCQF7AEcCHwPW\nAe6t2+4G7ong2brXr59fVzM+P59EmvW+GniVFEaLInipBR/LzKxSVYfGDcABEbGymRNUabAPngNh\nc9IyKNsDO+btHaS7Do4jLYfyct5qAnipbv84YELeJgPXktbi+s8IVgxcJsaRZtG/iRRKfwSeA5a7\n6czMOkHVoXEeaUmRn5H+6gaIiPjXZk44mhr94DlMNiHVHF6IoDgIJd4CHEoaPbYfKQieBJ4C1iP1\n+WxGCos/5d8vpy9AJub9r+Ttpfz7F4BlwCPAA3lbmsu4Mv/+2dEMHIm1SIG6HTA7gmXDvMXMxpCq\nQ+OM/LD2QpFC48xmTjia2jV6SmJtUkBMB6YBK4BngEXAHwYKI4kJpGavdUgBMgnYgFTjmQpsQ/oS\n35YUNGvnbTKppvMosIAURstINZg35/e8HdiYFEorSEFzH3BX3sbn425HqnHtRAq8x4HdgXOAb0XQ\ncU2QZjb6Kh891al6ZcitxEbAlqT7s08hBclkUo3kflLtZBEpiNbLv9sR2CVvq+irxdwH3BXB8nzs\nrYDTgI8CPwL+H3Brfc1GYlwEa6r+nGbWGq3o0+gvImL/Zk44mnolNFpBYjrwl8AnSbWhG4G3AjPz\nz0eB/8rbXOBB10zMulPVobFH3dOJwOHAqog4tZkTjiaHxujL/T67AnuSmsMeIvXbzAT2yds7SE1d\nq0jNXONITWnjSM1hs0l9JY+3uPhmVqDlzVOSbo+IdzVzwtGU5mlwJjAnIua0uTg9JYfLW0h9OqtJ\nnfYihc0HgQ+Q+lh+m7f5pMmhtX6dhcC9/YZATyX18Tzu5jCz0SdpFjAL+FqVNY2pdU/HkeZCnBUR\n2zVzwtHkmkbnykOPZwJ75W17UrC8RBo9Np00gmslaRTZ5vl3L5KC4zZS2Cwh1WhWAU8A1zcy6s3M\n3qjq5qkF9I2cWkVqsjgzIm5u5oSjyaHR3XLtYlPSfJaFtbkveXjzXqRay2RSDWUtUrPYTOCnwGWk\nZrPaSLL1SCPaatvmeZtC6oe5ltQPE/kca+ff1ebhrE36//vVvD0b0bcCgtlYUkloSNqiE9aXGopD\no/dIzAA+DhxMGnJcG0lWG/Zc254mNYG9COwL/BmpGe0FUqhMJgXOK6SQWEUamjyB1Hw2HrgeuA64\nidRk9kpdOSYAW5Ca4J5pZh6NxGRSDWzeYJNFrXUk3gZ8kfTf/gv1/73HmqpCY25E7Jof/zQiDh9B\nGSvh0LBSuVbzdlIgPMMwNQmJaaR+mQNInf/TgGdJYbQxKXgWkmpJQRoAMB94mDRR8zF4rTYzmRRw\nm5Iml25BGmywSX7tNOA84P948EBrSYwnDUv/a+DDwHdJtdnJwGERvNDG4lWmFaHx2uNO4tCwVsmz\n6DcnfcEvBp6MYGXdkjW7kCZNbg1sRZpT8wqpNrOUNClzcd6eJoXMgxGsznNlPgd8GrgVuBi4qjaX\nxkaXxDakoeXvJjWBLgL+gxTay3KQfJv0x8JBETzTtsJWxKFhNgZIrE9aqu
bjwHuAXwJzSM1jd9fX\njHJYrUsacfZm+gJtM9KcmjkRPNnv+FNIzXQfAg4k1ZSuyNvcblsbTWJz4BOkQP7BUOXPQXAQKZx3\nA35Ian78r4HmG+XrezrwGVKN467R/wTtU1VorCa110IaJlm/wmtExIbNnHA0OTRsrJJ4E6nfZj9S\ngGxKGmVWW15mXVIfzfN5W0iqwSwmzaF5L2mAwAL61kWbCNwAXAP8nBQyh5JuKzAZ+H3eHiKF0My8\nbUQaOal8/AuBi0ayZln+Ut6MNIJu57zNBC4C/m8EqwZ4/cak5XZ2AI4A3gVcnt/7B+B/9K8VSOwA\nfIpUs3gSOBu4NOJ1C5UOVc4jge8AX4rg/KY+7Ajkz70faa28uaN3XC8jYjam5RBZl76l+v801Bdf\nHvK8A2k2f21wwLMDzX/JX0zTSM1rO5O+mBcCD5ICZCmp32YN6Yv9WFKN5XpSmEwjhcyivO9XwDz6\nFut8Sy7HdFJ/Tm2dtZdJy+DUwupp4FRSOJySj3EwcAjwPlJIPpS3a0hNeC/lkXBfAY4H/p40ZHtH\n4J2kPqUfAhdE0NQdRyW2J43W+w1wC32TXF8Cjo1gfjPHHeacU0hh91nSII3JpCbNv4vgtpEf36Fh\nZi1UVxNaQfqyfwZ4G+m20O8nDTpYSho88BxpePQTeXsUeCCC5wc4rki1n38mBc51wNXAtREsGaZM\n+5DC5nHgnrzNHY2h0xIbAN8gDXy4lTSM+53A/wb+Afg30gKkRwJ/QRoQcWYET9QdYzLphnE70bc4\n6R3AlyNYkF8zATiJtB7ctaSa0W9IAziOzfvvyeed02yTokPDzMaUPPBgXMRrt2PoSBJbk5rr1ifV\noq4ndaq/Czgh/242qe/l4Pz720mLhz5Maho8mTR67mZSGDwKnBLBgwOcbx3gaFKN7AVSkF3Wvzlv\n+HI7NMzM2iIH3IGkTvVn6/ZvAnyZFCAXkfqBBup034y0HNJewN8C1wxXg8jNj4cAXyL1WZ0FnFc6\n4s6hYWbWo3Kz3BdI/T7/AXyv1n+Tm7sOB04Ejuvb79AwM+tpElsCx5Hm+zxGWrvtKFIfyHdINZhV\n6bUODTMz47XmsoNIExcvGmjUmEPDzMyKjeS7c9xoF8bMzMYuh4aZmRVzaJiZWTGHhpmZFXNomJlZ\nMYeGmZkVc2iYmVkxh4aZmRVzaJiZWTGHhpmZFXNomJlZMYeGmZkVc2iYmVkxh4aZmRVzaJiZWTGH\nhpmZFXNomJlZMYeGmZkVc2iYmVkxh4aZmRVzaJiZWTGHhpmZFXNomJlZMYeGmZkVc2iYmVkxh4aZ\nmRVzaJiZWTGHhpmZFXNomJlZMYeGmZkVc2iYmVkxh4aZmRVzaJiZWTGHhpmZFXNomJlZMYeGmZkV\nc2iYmVkxh4aZmRVzaJiZWTGHhpmZFXNomJlZMYeGmZkVc2iYmVkxh4aZmRVzaJiZWbGODQ1JW0r6\nnqQft7ssZmaWdGxoRMRjEfGZdpfDzMz6VB4aks6TtFjS/H77D5R0v6SHJP1N1eUY6yTNancZOoWv\nRR9fiz6+FqOjFTWN84ED63dIGg98J+/fAThK0vaSjpb0TUmbt6BcY82sdhegg8xqdwE6yKx2F6CD\nzGp3AcaCykMjIn4NLO23e0/g4YhYEBErgR8BH4mICyPi8xGxUNJUSecCu7gmYmbWGdZq03mnAU/W\nPX8K2Kv+BRHxPPDZVhbKzMyG1q7QiNE6kKRRO1a3k/S1dpehU/ha9PG16ONrMXLtCo2ngel1z6eT\nahsNiQiNWonMzGxY7RpyewcwU9IMSROAI4Gr2lQWMzMr1IohtxcDtwDbSnpS0jERsQo4EbgWuBe4\nJCLuq7osZmY2Mq0YPXVURGweEetExPSIOD/v/3lEbBcR20TEPzZ63F6e5yFpuqQbJN0j6W5JJ+X9\nUyXNlvSgpOskTW53WVtB0nhJcyVdnZ
/35HUAkDRZ0k8k3SfpXkl79eL1kHRa/vcxX9JFktbplesw\n0Ny4oT57vlYP5e/TA4Y7fsfOCB/KYPM82luqlloJfD4idgT2Bj6XP///AmZHxLbA9fl5LziZVGOt\nDYro1esAcBbwnxGxPbAzcD89dj0kzQCOA3aLiJ2A8cDH6Z3r8Ia5cQzy2SXtQOoe2CG/52xJQ+ZC\nV4YGg8zzaHOZWiYiFkXEXfnxi8B9pGHMhwAX5JddABzanhK2jqS3AgcB3wNqAyN67joASNoIeE9E\nnAcQEasi4o/03vV4gfSH1bqS1gLWBRbSI9dhkLlxg332jwAXR8TKiFgAPEz6fh1Ut4bGQPM8prWp\nLG2V/6raFfgtsElELM6/Wgxs0qZitdI3gVOBNXX7evE6AGwJ/EHS+ZLulPRdSevRY9cjz/H6F+AJ\nUlgsi4jZ9Nh16Gewz745rx+5Oux3abeGhudmAJLWB34KnBwRy+t/FxHBGL9Okj4ELImIufTVMl6n\nF65DnbWA3YCzI2I3YAX9mmB64XpI2ho4BZhB+lJcX9In61/TC9dhMAWffcjr0q2hMSrzPLqZpLVJ\ngXFhRFyRdy+WtGn+/WbAknaVr0X+G3CIpMeAi4H9JV1I712HmqeApyLi9vz8J6QQWdRj12MP4JaI\neC6P1LwM2Ifeuw71Bvs30f+79K1536C6NTR6ep6HJAHfB+6NiG/V/eoq4FP58aeAK/q/dyyJiNPz\niLwtSR2dv4qIo+mx61ATEYuAJyVtm3d9ALgHuJreuh73A3tLmpT/rXyANFCi165DvcH+TVwFfFzS\nBElbAjOB24Y6kFJNpftI+nPgW6SREd9vZthut5K0L3ATMI++quRppP/YlwJbAAuAj0XEsnaUsdUk\nvRf4YkQcImkqvXsd3kkaFDABeAQ4hvRvpKeuh6Qvkb4c1wB3Ap8BNqAHrkOeG/de4M2k/ouvAlcy\nyGeXdDpwLLCK1NR97ZDH79bQMDOz1uvW5ikzM2sDh4aZmRVzaJiZWTGHhpmZFXNomJlZMYeGmZkV\nc2hYV5L0Yv75NklHjfKxT+/3/DejefzRJunTkr7d7nJYb3BoWLeqTTDaEvhEI2/MK58O5bTXnSji\n3Y0cvw1GNNlquKWwzer5fxbrdl8H3pNvwnSypHGSviHpNkm/l3Q8gKRZkn4t6Urg7rzvCkl35BtZ\nHZf3fR2YlI93Yd5Xq9UoH3u+pHmSPlZ37DmSfpxvfvTDgQqaX/N1Sb+V9ECe2f+GmoKkayTtVzu3\npH/KZZwtaW9JN0p6RNKH6w5fuzHXg5K+WnesT+bzzZV0bi0g8nH/WdJdpHuymJWJCG/eum4Dluef\n7wWurtt/PPDl/Hgd4HbSaqezgBeBt9W9dkr+OQmYX/d8+SDnOhy4jrSi7sbA48Cm+djLSCuqinR7\n43cPUOYbgG/kx39OuikOwKeBb9e97mpgv/x4DfBn+fFl+fzjSTdYmlv3/oXAFGBi/iy7A9uT1hYa\nn193NnB03XGPaPd/R2/dtw1XTTfrdP2XRD8A2EnSEfn5hsA2pHV1bouIx+tee7Kk2s1opjP8Ym37\nAhdFRABLJN0IvIt005/bImIhQP7rfQYwUF/IZfnnnfk1w3k1+tYCmg+8HBGrJd3d7/3XRcTSfP7L\ncllXk8LjjrRuH5OARfn1q0mrJJs1xKFhY9GJkW668xpJs0j3l6h//n5g74h4WdINpL/ShxK8MaRq\n/Qmv1O1bzeD/tl4Z4DWreH1TcX05VtY9XgO8ChARa4bom1FduS6IiNMHeM3LOfzMGuI+Det2y0mr\nl9ZcC/xV7QtV0raS1h3gfRsCS3NgvJ3Xt+uvHOQL+dfAkbnf5C3AfqSayYA3gGrAAmCX3GcynWFu\ntzmID0qaImkS6RaeN5PuBX1ELiuSpkraYoRltR7nmoZ1q9pfyb8HVucmofOBfyM129yZ76WwBDgs\nv7
7+L+tfAJ+VdC/wAHBr3e/+HZgn6XeR7s8RABFxuaR98jkDODUilkjanjeOYCr5K7523JuVbiR1\nL+l+778b4jgxwOMghddPSTfRuTAi7gSQ9LfAdbkDfCXwV6TboLqWYU3x0uhmZlbMzVNmZlbMoWFm\nZsUcGmZmVsyhYWZmxRwaZmZWzKFhZmbFHBpmZlbMoWFmZsX+PwrR6zf0u12zAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10cb28320>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"ROC-AUC: 0.728851134442\n"
]
}
],
"source": [
"w, loss_lst = sgd(X, y_train, step_size=1e-2, max_iter=100)\n",
"plt.semilogy(range(len(loss_lst)), [loss - fun_min for loss in loss_lst], label='SGD')\n",
"plt.legend()\n",
"plt.xlabel('Iteration number')\n",
"plt.ylabel('Function discrepancy')\n",
"plt.show()\n",
"\n",
"predicted = make_pred(X_test, w)[:, 1]\n",
"print('ROC-AUC:', roc_auc_score(y_test, predicted))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the hardest part: rebuild the training set with transform_data, but this time do not scale the real-valued features. Run gradient descent on this data. What do you observe? Can you reach the same quality as with scaling?"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"step size armiho\n",
"Iteration 100 / 100\n",
"ROC-AUC: 0.512250178675\n",
"Even though the method was given more iterations than\n",
"in the scaled-feature case, the result is worse.\n"
]
}
],
"source": [
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.feature_extraction import DictVectorizer\n",
"scaler = StandardScaler()\n",
"transformer = DictVectorizer()\n",
"# scale=False leaves the real-valued features unscaled\n",
"X_noscale, scaler, transformer = transform_data(data_train, scaler, transformer, False, scale=False)\n",
"X_test_noscale, _, _ = transform_data(data_test, scaler, transformer, True, scale=False)\n",
"w, loss_lst = grad_descent(X_noscale, y_train, 'armiho', max_iter=100)\n",
"predicted = make_pred(X_test_noscale, w)[:, 1]\n",
"print('ROC-AUC:', roc_auc_score(y_test, predicted))\n",
"print('Even though the method was given more iterations than')\n",
"print('in the scaled-feature case, the result is worse.')"
]
},
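{
"cell_type": "markdown",
"metadata": {},
"source": [
"A likely reason for the drop in quality: for linear models the speed of gradient descent is roughly governed by the condition number of `X^T X` (for logistic regression the Hessian is `X^T D X` with a diagonal weight matrix `D`), and features on very different scales make this number huge. The cell below is a minimal sketch on synthetic data; the three feature distributions are invented for illustration and do not come from the grant dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"rng = np.random.RandomState(0)\n",
"n_objects = 1000\n",
"# Hypothetical features on very different scales (a count, a ratio, a sum of money)\n",
"X_raw = np.column_stack([\n",
"    rng.poisson(3, n_objects),\n",
"    rng.rand(n_objects),\n",
"    rng.normal(5e4, 1e4, n_objects),\n",
"]).astype(float)\n",
"X_scaled = StandardScaler().fit_transform(X_raw)\n",
"\n",
"cond_raw = np.linalg.cond(X_raw.T.dot(X_raw))\n",
"cond_scaled = np.linalg.cond(X_scaled.T.dot(X_scaled))\n",
"print('condition number without scaling: %.3g' % cond_raw)\n",
"print('condition number with scaling:    %.3g' % cond_scaled)"
]
},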
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here you can share your thoughts on this part."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition, I implemented gradient descent with the Armijo rule for choosing the step size and used it as the best option."
]
},
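{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a minimal sketch of the Armijo (sufficient decrease) backtracking rule mentioned above. It is not the `grad_descent` used in this notebook: the name `armijo_step`, the constant `c = 1e-4` and the halving factor are assumptions chosen for illustration, and the demo minimizes a simple ill-conditioned quadratic rather than the logistic loss. New variable names are used so that `w` from earlier cells is not overwritten."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def armijo_step(f, grad_f, w, alpha0=1.0, c=1e-4, shrink=0.5, max_halvings=50):\n",
"    # Halve the step until the sufficient-decrease (Armijo) condition holds:\n",
"    # f(w - alpha * g) <= f(w) - c * alpha * ||g||^2\n",
"    g = grad_f(w)\n",
"    f_w = f(w)\n",
"    alpha = alpha0\n",
"    for _ in range(max_halvings):\n",
"        if f(w - alpha * g) <= f_w - c * alpha * g.dot(g):\n",
"            break\n",
"        alpha *= shrink\n",
"    return alpha\n",
"\n",
"# Demo on the ill-conditioned quadratic f(w) = 0.5 * w^T A w\n",
"A_demo = np.diag([1.0, 100.0])\n",
"\n",
"def quad_f(w):\n",
"    return 0.5 * w.dot(A_demo.dot(w))\n",
"\n",
"def quad_grad(w):\n",
"    return A_demo.dot(w)\n",
"\n",
"w_demo = np.array([1.0, 1.0])\n",
"f_history = [quad_f(w_demo)]\n",
"for _ in range(100):\n",
"    w_demo = w_demo - armijo_step(quad_f, quad_grad, w_demo) * quad_grad(w_demo)\n",
"    f_history.append(quad_f(w_demo))\n",
"print('f at start:', f_history[0], 'after 100 steps:', f_history[-1])"
]
},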
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 3. Linear regression on a simple example"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this part we take a very brief look at linear regression and at measuring the quality of its predictions. We will use the diabetes dataset."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20\n",
"data = datasets.load_diabetes()\n",
"X, X_test, y, y_test = train_test_split(data.data, data.target, train_size=0.7, random_state=241)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Train a linear regression with an L2 regularizer, choosing the best value of the regularization parameter on the test set. Use MSE as the quality metric. For which value of this parameter is the best quality achieved?"
]
},
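{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a cross-check on what `Ridge` fits, here is a minimal sketch of the closed-form solution. It assumes scikit-learn's default fit_intercept=True. The objective ||y - Xw - b||^2 + alpha ||w||^2 leaves the intercept b unpenalized, so after centering X and y the weights are w = (Xc^T Xc + alpha I)^(-1) Xc^T yc and the intercept is b = mean(y) - mean(X) w. The helper `ridge_closed_form` and the variable names are hypothetical. The full diabetes data is loaded under separate names so that the train/test split above is left untouched."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn import datasets\n",
"from sklearn.linear_model import Ridge\n",
"\n",
"\n",
"def ridge_closed_form(X, y, alpha):\n",
"    # Hypothetical helper. Center X and y so that the intercept is not penalized.\n",
"    X_mean, y_mean = X.mean(axis=0), y.mean()\n",
"    Xc, yc = X - X_mean, y - y_mean\n",
"    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc rather than forming the inverse explicitly.\n",
"    w = np.linalg.solve(Xc.T.dot(Xc) + alpha * np.eye(X.shape[1]), Xc.T.dot(yc))\n",
"    b = y_mean - X_mean.dot(w)\n",
"    return w, b\n",
"\n",
"\n",
"# Separate names so that X and y from the train/test split above are not overwritten.\n",
"diabetes = datasets.load_diabetes()\n",
"X_all, y_all = diabetes.data, diabetes.target\n",
"w_cf, b_cf = ridge_closed_form(X_all, y_all, alpha=0.1)\n",
"ridge = Ridge(alpha=0.1).fit(X_all, y_all)\n",
"print(np.allclose(w_cf, ridge.coef_), np.allclose(b_cf, ridge.intercept_))"
]
},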
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Lowest MSE 3096.03015101 is achieved at alpha = 0.1\n"
]
}
],
"source": [
"from sklearn.linear_model import Ridge\n",
"alpha_grid = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]\n",
"mse_lst = []\n",
"for alpha in alpha_grid:\n",
" linreg = Ridge(alpha)\n",
" linreg.fit(X, y)\n",
" preds = linreg.predict(X_test)\n",
" mse_lst.append(np.linalg.norm(preds - y_test)**2 / y_test.size)\n",
"print('Lowest MSE', np.min(mse_lst), 'is achieved at alpha =', alpha_grid[np.argmin(mse_lst)])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the best model, compute the RMSE and the coefficient of determination (r2_score) on the test set. What can you say about the value of the coefficient of determination? How close is this model to optimal?"
]
},
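{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the interpretation of r2_score concrete, here is a minimal sketch on made-up numbers; the arrays are illustrative, not results from this notebook. Since R^2 = 1 - SS_res / SS_tot, the quantity 1 - R^2 equals the ratio of the model's MSE to the MSE of a constant prediction equal to the mean of y_true. RMSE is simply the square root of the model's MSE."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics import mean_squared_error, r2_score\n",
"\n",
"# Made-up targets and predictions, for illustration only.\n",
"y_true = np.array([3.0, -0.5, 2.0, 7.0])\n",
"y_pred = np.array([2.5, 0.0, 2.0, 8.0])\n",
"\n",
"mse_model = mean_squared_error(y_true, y_pred)\n",
"rmse = np.sqrt(mse_model)\n",
"# Baseline that predicts the mean of y_true for every object.\n",
"mse_mean = mean_squared_error(y_true, np.full_like(y_true, y_true.mean()))\n",
"r2_manual = 1.0 - mse_model / mse_mean\n",
"\n",
"print('RMSE:', rmse)\n",
"print('R^2 (manual):', r2_manual, 'R^2 (sklearn):', r2_score(y_true, y_pred))"
]
},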
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"RMSE: 55.6419819113\n",
"Coefficient of determination: 0.417024184822\n",
"The coefficient of determination shows that the model\n",
"is far from good: its MSE is only about 42% lower than that of\n",
"a model that predicts the mean of the test targets for every object.\n"
]
}
],
"source": [
"from sklearn.metrics import r2_score\n",
"alpha = alpha_grid[np.argmin(mse_lst)]\n",
"linreg = Ridge(alpha)\n",
"linreg.fit(X, y)\n",
"preds = linreg.predict(X_test)\n",
"print('RMSE:', np.linalg.norm(preds - y_test) / np.sqrt(y_test.size))\n",
"print('Coefficient of determination:', r2_score(y_test, preds))\n",
"print('The coefficient of determination shows that the model')\n",
"print('is far from good: its MSE is only about 42% lower than that of')\n",
"print('a model that predicts the mean of the test targets for every object.')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}