Created
June 19, 2020 10:00
Machine learning assignment on classification algorithms, part of the IBM Data Science week 6 project
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "<a href=\"https://www.bigdatauniversity.com\"><img src=\"https://ibm.box.com/shared/static/cw2c7r3o20w9zn8gkecaeyjhgw3xdgbj.png\" width=\"400\" align=\"center\"></a>\n\n<h1 align=\"center\"><font size=\"5\">Classification with Python</font></h1>" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "In this notebook we try to practice all the classification algorithms that we learned in this course.\n\nWe load a dataset using Pandas library, and apply the following algorithms, and find the best one for this specific dataset by accuracy evaluation methods.\n\nLets first load required libraries:" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [], | |
"source": "import itertools\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom matplotlib.ticker import NullFormatter\nimport pandas as pd\nimport numpy as np\nimport matplotlib.ticker as ticker\nfrom sklearn import preprocessing\n%matplotlib inline" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "### About dataset" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "This dataset is about past loans. The __Loan_train.csv__ data set includes details of 346 customers whose loan are already paid off or defaulted. It includes following fields:\n\n| Field | Description |\n|----------------|---------------------------------------------------------------------------------------|\n| Loan_status | Whether a loan is paid off on in collection |\n| Principal | Basic principal loan amount at the |\n| Terms | Origination terms which can be weekly (7 days), biweekly, and monthly payoff schedule |\n| Effective_date | When the loan got originated and took effects |\n| Due_date | Since it\u2019s one-time payoff schedule, each loan has one single due date |\n| Age | Age of applicant |\n| Education | Education of applicant |\n| Gender | The gender of applicant |" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "Lets download the dataset" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "--2020-06-19 06:37:28-- https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_train.csv\nResolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.196\nConnecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.196|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 23101 (23K) [text/csv]\nSaving to: \u2018loan_train.csv\u2019\n\n100%[======================================>] 23,101 --.-K/s in 0.001s \n\n2020-06-19 06:37:28 (15.9 MB/s) - \u2018loan_train.csv\u2019 saved [23101/23101]\n\n" | |
} | |
], | |
"source": "!wget -O loan_train.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_train.csv" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "### Load Data From CSV File " | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Unnamed: 0</th>\n <th>Unnamed: 0.1</th>\n <th>loan_status</th>\n <th>Principal</th>\n <th>terms</th>\n <th>effective_date</th>\n <th>due_date</th>\n <th>age</th>\n <th>education</th>\n <th>Gender</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>0</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/8/2016</td>\n <td>10/7/2016</td>\n <td>45</td>\n <td>High School or Below</td>\n <td>male</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>2</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/8/2016</td>\n <td>10/7/2016</td>\n <td>33</td>\n <td>Bechalor</td>\n <td>female</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>3</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>15</td>\n <td>9/8/2016</td>\n <td>9/22/2016</td>\n <td>27</td>\n <td>college</td>\n <td>male</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>4</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/9/2016</td>\n <td>10/8/2016</td>\n <td>28</td>\n <td>college</td>\n <td>female</td>\n </tr>\n <tr>\n <th>4</th>\n <td>6</td>\n <td>6</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/9/2016</td>\n <td>10/8/2016</td>\n <td>29</td>\n <td>college</td>\n <td>male</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n0 0 0 PAIDOFF 1000 30 9/8/2016 \n1 2 2 PAIDOFF 1000 30 9/8/2016 \n2 3 3 PAIDOFF 1000 15 9/8/2016 \n3 4 4 PAIDOFF 1000 30 9/9/2016 \n4 6 6 PAIDOFF 1000 30 9/9/2016 \n\n due_date age education Gender \n0 10/7/2016 45 High School or Below male \n1 10/7/2016 33 Bechalor female \n2 9/22/2016 27 college male \n3 10/8/2016 28 college female \n4 10/8/2016 29 college male " | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "df = pd.read_csv('loan_train.csv')\ndf.head()" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "(346, 10)" | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "df.shape" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "### Convert to date time object " | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Unnamed: 0</th>\n <th>Unnamed: 0.1</th>\n <th>loan_status</th>\n <th>Principal</th>\n <th>terms</th>\n <th>effective_date</th>\n <th>due_date</th>\n <th>age</th>\n <th>education</th>\n <th>Gender</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>0</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-08</td>\n <td>2016-10-07</td>\n <td>45</td>\n <td>High School or Below</td>\n <td>male</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>2</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-08</td>\n <td>2016-10-07</td>\n <td>33</td>\n <td>Bechalor</td>\n <td>female</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>3</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>15</td>\n <td>2016-09-08</td>\n <td>2016-09-22</td>\n <td>27</td>\n <td>college</td>\n <td>male</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>4</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-09</td>\n <td>2016-10-08</td>\n <td>28</td>\n <td>college</td>\n <td>female</td>\n </tr>\n <tr>\n <th>4</th>\n <td>6</td>\n <td>6</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-09</td>\n <td>2016-10-08</td>\n <td>29</td>\n <td>college</td>\n <td>male</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n0 0 0 PAIDOFF 1000 30 2016-09-08 \n1 2 2 PAIDOFF 1000 30 2016-09-08 \n2 3 3 PAIDOFF 1000 15 2016-09-08 \n3 4 4 PAIDOFF 1000 30 2016-09-09 \n4 6 6 PAIDOFF 1000 30 2016-09-09 \n\n due_date age education Gender \n0 2016-10-07 45 High School or Below male \n1 2016-10-07 33 Bechalor female \n2 2016-09-22 27 college male \n3 2016-10-08 28 college female \n4 2016-10-08 29 college male " | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "df['due_date'] = pd.to_datetime(df['due_date'])\ndf['effective_date'] = pd.to_datetime(df['effective_date'])\ndf.head()" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "# Data visualization and pre-processing\n\n" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "Let\u2019s see how many of each class is in our data set " | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "PAIDOFF 260\nCOLLECTION 86\nName: loan_status, dtype: int64" | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "df['loan_status'].value_counts()" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "260 people have paid off the loan on time while 86 have gone into collection \n" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Lets plot some columns to underestand data better:" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "Solving environment: done\n\n## Package Plan ##\n\n environment location: /opt/conda/envs/Python36\n\n added / updated specs: \n - seaborn\n\n\nThe following packages will be downloaded:\n\n package | build\n ---------------------------|-----------------\n certifi-2020.4.5.2 | py36_0 160 KB anaconda\n openssl-1.1.1g | h7b6447c_0 3.8 MB anaconda\n ca-certificates-2020.1.1 | 0 132 KB anaconda\n seaborn-0.10.1 | py_0 160 KB anaconda\n ------------------------------------------------------------\n Total: 4.2 MB\n\nThe following packages will be UPDATED:\n\n ca-certificates: 2020.1.1-0 --> 2020.1.1-0 anaconda\n certifi: 2020.4.5.1-py36_0 --> 2020.4.5.2-py36_0 anaconda\n openssl: 1.1.1g-h7b6447c_0 --> 1.1.1g-h7b6447c_0 anaconda\n seaborn: 0.9.0-pyh91ea838_1 --> 0.10.1-py_0 anaconda\n\n\nDownloading and Extracting Packages\ncertifi-2020.4.5.2 | 160 KB | ##################################### | 100% \nopenssl-1.1.1g | 3.8 MB | ##################################### | 100% \nca-certificates-2020 | 132 KB | ##################################### | 100% \nseaborn-0.10.1 | 160 KB | ##################################### | 100% \nPreparing transaction: done\nVerifying transaction: done\nExecuting transaction: done\n" | |
} | |
], | |
"source": "# notice: installing seaborn might takes a few minutes\n!conda install -c anaconda seaborn -y" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [], | |
"source": "count, binedges = np.histogram(df.Principal)" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "[ 3 0 2 0 0 0 0 81 2 258] [ 300. 370. 440. 510. 580. 650. 720. 790. 860. 930. 1000.]\n" | |
} | |
], | |
"source": "print(count,binedges)" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAG5VJREFUeJzt3Xt8VeWd7/HPV8yIClWRqAhiIqKISgNmtF6Hwsig1tuxWqxHceo5VKt1mNbjreelnfoa64XWtkel4sih0ypq6YAObbVU5Si2XgJGBC+UatQoIFCn1iIU8Hf+WCvpBncgyd7JXtn7+3691itrPev22yt58tvr2Ws/jyICMzOzrNmh1AGYmZnl4wRlZmaZ5ARlZmaZ5ARlZmaZ5ARlZmaZ5ARlZmaZ5ATVRSTtLek+Sa9LWijpt5LOLNKxR0uaW4xjdQdJ8yXVlzoOK41yqguSqiU9K+kFScd34Xk+7Kpj9yROUF1AkoA5wJMRcUBEHAFMAAaVKJ4dS3FeszKsC2OBVyNiZEQ8VYyYrG1OUF1jDPCXiPhhS0FEvBkR/wdAUi9Jt0p6XtJiSV9Oy0endxuzJL0q6d60giNpfFq2APhvLceVtKuk6emxXpB0elp+oaSfSvpP4FeFvBhJMyRNlfRE+i7479JzviJpRs52UyU1SFoq6V/aONa49B30ojS+PoXEZplXNnVBUh1wC3CypEZJO7f19yypSdKN6boGSaMkPSrp95IuTrfpI+mxdN+XWuLNc97/lXN98tarshURnoo8AZcDt21j/STgf6fzOwENQC0wGvgjybvLHYDfAscBvYG3gaGAgAeBuen+NwL/PZ3fHVgG7ApcCDQD/dqI4SmgMc/093m2nQHcn577dOAD4PA0xoVAXbpdv/RnL2A+MCJdng/UA/2BJ4Fd0/KrgOtK/fvy1HVTGdaFC4Hb0/k2/56BJuCSdP42YDHQF6gG3kvLdwQ+lXOs5YDS5Q/Tn+OAaelr3QGYC5xQ6t9rd01u+ukGku4gqVx/iYi/JfmjGyHp8+kmu5FUuL8Az0VEc7pfI1ADfAi8ERG/S8t/QlKxSY91mqQr0uXewOB0fl5E/CFfTBHR0fbz/4yIkPQSsCoiXkpjWZrG2AicI2kSScUbAAwnqZgtPpOWPZ2+Gf4bkn88ViHKpC602N7f88Ppz5eAPhHxJ+BPktZL2h34M3CjpBOAj4GBwN7AypxjjEunF9LlPiTX58lOxtyjOEF1jaXAWS0LEXGppP4k7w4heTf01Yh4NHcnSaOBDTlFm/nr76itThMFnBURr211rKNIKkD+naSnSN7Rbe2KiPh1nvKWuD7eKsaPgR0l1QJXAH8bEe+nTX+988Q6LyLObSsuKzvlWBdyz7etv+dt1hngPJI7qiMiYqOkJvLXmW9HxF3biKNs+TOorvE40FvSJTllu+TMPwpcIqkKQNJBknbdxvFeBWolDUmXcyvEo8BXc9rnR7YnwIg4PiLq8kzbqpDb8imSfwJ/lLQ3cFKebZ4BjpV0YBrrLpIO6uT5rGco57pQ6N/zbiTNfRslfRbYP882jwJfyvlsa6CkvTpwjh7NCaoLRNJ4fAbwd5LekPQc8COSNmqAfwNeBhZJWgLcxTbuZiNiPUkzxs/TD4bfzFl9A1AFLE6PdUOxX097RMSLJM0QS4HpwNN5tllN0oY/U9Jikgo+rBvDtG5WznWhCH/P9wL1khpI7qZezXOOXwH3Ab9Nm9dnkf9uryy1fCBnZmaWKb6DMjOzTHKCMjOzTHKCMjOzTHKCMjOzTMpEgho/fnyQfLfBk6dymYrG9cNTmU3tlokEtWbNmlKHYJZZrh9WqTKRoMzMzLbmBGVmZpnkBGVmZpnkzmLNrKxs3LiR5uZm1q9fX+pQKlrv3r0ZNGgQVVVVnT6GE5SZlZXm5mb69u1LTU0Nab+x1s0igrVr19Lc3ExtbW2nj+MmPjMrK+vXr2
fPPfd0ciohSey5554F38U6QVnF2X/AACQVNO0/YECpX4Ztg5NT6RXjd+AmPqs4b61cSfO+gwo6xqB3m4sUjZm1xXdQZlbWinHH3NG75169elFXV8dhhx3G2Wefzbp161rXzZ49G0m8+upfh39qamrisMMOA2D+/PnstttujBw5koMPPpgTTjiBuXPnbnH8adOmMWzYMIYNG8aRRx7JggULWteNHj2agw8+mLq6Ourq6pg1a9YWMbVMTU1NhVzWbuE7KDMra8W4Y87VnrvnnXfemcbGRgDOO+88fvjDH/K1r30NgJkzZ3Lcccdx//33881vfjPv/scff3xrUmpsbOSMM85g5513ZuzYscydO5e77rqLBQsW0L9/fxYtWsQZZ5zBc889xz777APAvffeS319fZsx9RTbvYOSNF3Se+kIlS1l35T0jqTGdDo5Z901kpZLek3SP3RV4GZmPcHxxx/P8uXLAfjwww95+umnueeee7j//vvbtX9dXR3XXXcdt99+OwA333wzt956K/379wdg1KhRTJw4kTvuuKNrXkAJtaeJbwYwPk/5bRFRl06/AJA0HJgAHJruc6ekXsUK1sysJ9m0aRO//OUvOfzwwwGYM2cO48eP56CDDqJfv34sWrSoXccZNWpUa5Pg0qVLOeKII7ZYX19fz9KlS1uXzzvvvNamvLVr1wLw0UcftZadeeaZxXh5XW67TXwR8aSkmnYe73Tg/ojYALwhaTlwJPDbTkdoZtbDtCQDSO6gLrroIiBp3ps8eTIAEyZMYObMmYwaNWq7x4vYdifgEbHFU3Pl0sRXyGdQl0m6AGgAvh4R7wMDgWdytmlOyz5B0iRgEsDgwYMLCMOs/Lh+9Gz5ksHatWt5/PHHWbJkCZLYvHkzkrjlllu2e7wXXniBQw45BIDhw4ezcOFCxowZ07p+0aJFDB8+vLgvIgM6+xTfVGAIUAesAL6Tlud78D1v6o+IaRFRHxH11dXVnQzDrDy5fpSfWbNmccEFF/Dmm2/S1NTE22+/TW1t7RZP4OWzePFibrjhBi699FIArrzySq666qrWprvGxkZmzJjBV77ylS5/Dd2tU3dQEbGqZV7S3UDLM5DNwH45mw4C3u10dGZmBRq8zz5F/d7a4PRJuY6aOXMmV1999RZlZ511Fvfddx9XXXXVFuVPPfUUI0eOZN26dey111784Ac/YOzYsQCcdtppvPPOOxxzzDFIom/fvvzkJz9hQBl+eVzba9sESD+DmhsRh6XLAyJiRTr/z8BRETFB0qHAfSSfO+0LPAYMjYjN2zp+fX19NDQ0FPI6zNpNUlG+qLudulO0rgxcPzrmlVdeaW0Os9Jq43fR7rqx3TsoSTOB0UB/Sc3A9cBoSXUkzXdNwJcBImKppAeBl4FNwKXbS05mZmb5tOcpvnPzFN+zje3/FfjXQoIyMzNzV0dmZpZJTlBmZpZJTlBmZpZJTlBmZpZJTlBmVtb2HTS4qMNt7DuofT17rFy5kgkTJjBkyBCGDx/OySefzLJly1i6dCljxozhoIMOYujQodxwww2tX1mYMWMGl1122SeOVVNTw5o1a7YomzFjBtXV1VsMofHyyy8DsGzZMk4++WQOPPBADjnkEM455xweeOCB1u369OnTOiTHBRdcwPz58/nc5z7Xeuw5c+YwYsQIhg0bxuGHH86cOXNa11144YUMHDiQDRs2ALBmzRpqamo69DtpLw+3YWZlbcU7b3PUdY8U7XjPfitf39lbigjOPPNMJk6c2NpreWNjI6tWreLCCy9k6tSpjBs3jnXr1nHWWWdx5513tvYU0RFf+MIXWns5b7F+/XpOOeUUvvvd73LqqacC8MQTT1BdXd3a/dLo0aOZMmVKa3998+fPb93/xRdf5IorrmDevHnU1tbyxhtvcOKJJ3LAAQcwYsQIIBlbavr06VxyySUdjrkjfAdlZlZkTzzxBFVVVVx88cWtZXV1dSxbtoxjjz2WcePGAbDLLrtw++
23c9NNNxXt3Pfddx9HH310a3IC+OxnP9s6IOL2TJkyhWuvvZba2loAamtrueaaa7j11ltbt5k8eTK33XYbmzZtKlrc+ThBmZkV2ZIlSz4xJAbkHypjyJAhfPjhh3zwwQcdPk9us11dXR0fffRRm+dur/YM5zF48GCOO+44fvzjH3f6PO3hJj4zs26y9bAYudoq35Z8TXyFyhdjvrJrr72W0047jVNOOaWo58/lOygzsyI79NBDWbhwYd7yrftVfP311+nTpw99+/bt0nN3ZP+tY8w3nMeBBx5IXV0dDz74YKfPtT1OUGZmRTZmzBg2bNjA3Xff3Vr2/PPPM3ToUBYsWMCvf/1rIBnY8PLLL+fKK68s2rm/+MUv8pvf/Iaf//znrWWPPPIIL730Urv2v+KKK/j2t79NU1MTAE1NTdx44418/etf/8S23/jGN5gyZUpR4s7HTXxmVtYGDNyvXU/edeR42yOJ2bNnM3nyZG666SZ69+5NTU0N3/ve93jooYf46le/yqWXXsrmzZs5//zzt3i0fMaMGVs81v3MM8kYsCNGjGCHHZJ7inPOOYcRI0bwwAMPbDGe1J133skxxxzD3LlzmTx5MpMnT6aqqooRI0bw/e9/v12vr66ujptvvplTTz2VjRs3UlVVxS233NI6QnCuQw89lFGjRrV76PqOatdwG13NwwlYd/JwG+XNw21kR6HDbWy3iU/SdEnvSVqSU3arpFclLZY0W9LuaXmNpI8kNabTD9sbiJmZWa72fAY1A9j6/ngecFhEjACWAdfkrPt9RNSl08WYmZl1wnYTVEQ8Cfxhq7JfRUTLN7SeIRna3cwsE7Lw0UWlK8bvoBhP8X0J+GXOcq2kFyT9P0nHt7WTpEmSGiQ1rF69ughhmJUP14/O6927N2vXrnWSKqGIYO3atfTu3bug4xT0FJ+kb5AM7X5vWrQCGBwRayUdAcyRdGhEfOIr0hExDZgGyYfAhcRhVm5cPzpv0KBBNDc348ReWr1792bQoMIa1zqdoCRNBD4HjI30rUpEbAA2pPMLJf0eOAjwI0hm1i2qqqpa+5Gznq1TTXySxgNXAadFxLqc8mpJvdL5A4ChwOvFCNTMzCrLdu+gJM0ERgP9JTUD15M8tbcTMC/tn+mZ9Im9E4BvSdoEbAYujog/5D2wmZnZNmw3QUXEuXmK72lj258BPys0KDMzM/fFZ2ZmmeQEZWZmmeQEZWZmmeQEZWZmmeQEZWZmmeQEZWZmmeQEZWZmmeQEZWZmmeQEZWZmmeQEZWZmmeQEZWZmmeQEZWZmmeQEZWZmmeQEZWZmmdSuBCVpuqT3JC3JKesnaZ6k36U/90jLJekHkpZLWixpVFcFb2Zm5au9d1AzgPFblV0NPBYRQ4HH0mWAk0hG0h0KTAKmFh6mmZlVmnYlqIh4Eth6ZNzTgR+l8z8Czsgp//dIPAPsLmlAMYI1M7PKUchnUHtHxAqA9OdeaflA4O2c7ZrTsi1ImiSpQVLD6tWrCwjDrPy4fph1zUMSylMWnyiImBYR9RFRX11d3QVhmPVcrh9mhSWoVS1Nd+nP99LyZmC/nO0GAe8WcB4zM6tAhSSoh4GJ6fxE4KGc8gvSp/k+A/yxpSnQzMysvXZsz0aSZgKjgf6SmoHrgZuAByVdBLwFnJ1u/gvgZGA5sA74xyLHbGZmFaBdCSoizm1j1dg82wZwaSFBmZmZuScJMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLpHb1Zp6PpIOBB3KKDgCuA3YH/ifQMk71tRHxi05HaGZmFanTCSoiXgPqACT1At4BZpOM/3RbREwpSoRmZlaRitXENxb4fUS8WaTjmZlZhStWgpoAzMxZvkzSYknTJe2RbwdJkyQ1SGpYvXp1vk3MKpbrh1kREpSkvwFOA36aFk0FhpA0/60AvpNvv4
iYFhH1EVFfXV1daBhmZcX1w6w4d1AnAYsiYhVARKyKiM0R8TFwN3BkEc5hZmYVphgJ6lxymvckDchZdyawpAjnMDOzCtPpp/gAJO0CnAh8Oaf4Fkl1QABNW60zMzNrl4ISVESsA/bcquz8giIyMzPDPUmYmVlGOUGZmVkmOUGZmVkmOUGZmVkmOUGZmVkmOUGZmVkmFfSYuVlPpF5VDHq3ueBjmFnXcoKyihObN3LUdY8UdIxnvzW+SNGYWVvcxGdmZpnkBGVmZpnkBGVmZpnkBGVmZpnkBGVmZpnkBGVmZplU8GPmkpqAPwGbgU0RUS+pH/AAUEMyJtQ5EfF+oecyM7PKUaw7qM9GRF1E1KfLVwOPRcRQ4LF02Yz9BwxAUkHT/gMGbP9EZtbjddUXdU8HRqfzPwLmA1d10bmsB3lr5Uqa9x1U0DEK7QXCzHqGYtxBBfArSQslTUrL9o6IFQDpz7223knSJEkNkhpWr15dhDDMyofrh1lxEtSxETEKOAm4VNIJ7dkpIqZFRH1E1FdXVxchDLPy4fphVoQEFRHvpj/fA2YDRwKrJA0ASH++V+h5zMysshSUoCTtKqlvyzwwDlgCPAxMTDebCDxUyHnMzKzyFPqQxN7AbEktx7ovIh6R9DzwoKSLgLeAsws8j5mZVZiCElREvA58Ok/5WmBsIcc2M7PK5p4kzMwsk5ygzMwsk5ygzMwsk5ygzMwsk5ygzMwsk5ygzMwsk5ygzMwsk5ygzMwsk5ygzMwsk5ygzMwsk5ygzMwMyN6I1101oq6ZmfUwWRvx2ndQZmaWSZ1OUJL2k/SEpFckLZX0T2n5NyW9I6kxnU4uXrhmZlYpCmni2wR8PSIWpYMWLpQ0L113W0RMKTw8MzOrVJ1OUBGxAliRzv9J0ivAwGIFZmZmla0on0FJqgFGAs+mRZdJWixpuqQ92thnkqQGSQ2rV68uRhhmZcP1w6wICUpSH+BnwOSI+ACYCgwB6kjusL6Tb7+ImBYR9RFRX11dXWgYZmXF9cOswAQlqYokOd0bEf8BEBGrImJzRHwM3A0cWXiYZmZWaQp5ik/APcArEfHdnPLcb2mdCSzpfHhmZlapCnmK71jgfOAlSY1p2bXAuZLqgACagC8XFKGZmVWkQp7iWwAoz6pfdD4cMzOzhHuSMDOzTHJffNat1Kuq4L661KuqSNGYWZY5QVm3is0bOeq6Rwo6xrPfGl+kaMwsy9zEZ2ZmmeQEZWZmmeQEZWZmmeQEZWZmmeQEZWaWEYUOuV7M4dazwE/xmZllRKFDrhdzuPUs8B2UmZllkhOUmZllkpv4zMwMyF5PL05QZmYGZK+nFzfxmZlZJnVZgpI0XtJrkpZLurrQ4/nxSzOzytIlTXySegF3ACcCzcDzkh6OiJc7e0w/fmlmVlm66jOoI4HlEfE6gKT7gdOBTieoUtt/wADeWrmyoGMM3mcf3lyxokgRVTYp31iZViqF1g/XjUShDyns0KuqrOqGIqL4B5U+D4yPiP+RLp8PHBURl+VsMwmYlC4eDLxW9EDarz+wpoTnL4RjL43txb4mIjr9aXGG6kc5/46yrJxjb3fd6Ko7qHwpfItMGBHTgGlddP4OkdQQEfWljqMzHHtpdHXsWakf/h2VhmNPdNVDEs3AfjnLg4B3u+hcZmZWhroqQT0PDJVUK+lvgAnAw110LjMzK0Nd0sQXEZskXQY8CvQCpkfE0q44V5GUvCmlAI69NHpy7B3Rk1+nYy+NosXeJQ9JmJmZFco9SZiZWSY5QZmZWSZVTIKS1EvSC5Lmpsu1kp6V9DtJD6QPcyBpp3R5ebq+psRx7y5plqRXJb0i6WhJ/STNS2OfJ2mPdFtJ+kEa+2JJo0oc+z9LWippiaSZknpn9bpLmi7pPUlLcso6fJ0lTUy3/52kid35GjrLdaMksbtutEPFJCjgn4BXcpZvBm6LiKHA+8BFaflFwPsRcSBwW7pdKX0feCQihgGfJnkNVwOPpbE/li
4DnAQMTadJwNTuDzchaSBwOVAfEYeRPCwzgexe9xnA1l8e7NB1ltQPuB44iqQ3letbKm7GuW50I9eNDtSNiCj7ieR7WI8BY4C5JF8kXgPsmK4/Gng0nX8UODqd3zHdTiWK+1PAG1ufn6RXgQHp/ADgtXT+LuDcfNuVIPaBwNtAv/Q6zgX+IcvXHagBlnT2OgPnAnfllG+xXRYn1w3XjXbGXJK6USl3UN8DrgQ+Tpf3BP4rIjaly80kfzTw1z8e0vV/TLcvhQOA1cD/TZtg/k3SrsDeEbEijXEFsFe6fWvsqdzX1a0i4h1gCvAWsILkOi6kZ1z3Fh29zpm5/h3gutHNXDe2KN+msk9Qkj4HvBcRC3OL82wa7VjX3XYERgFTI2Ik8Gf+eiudT2ZiT2/fTwdqgX2BXUlu/7eWxeu+PW3F2pNeg+uG60ZXKGrdKPsEBRwLnCapCbifpCnje8Duklq+qJzbFVNrN03p+t2AP3RnwDmageaIeDZdnkVSKVdJGgCQ/nwvZ/usdDH198AbEbE6IjYC/wEcQ8+47i06ep2zdP3bw3WjNFw32nn9yz5BRcQ1ETEoImpIPoh8PCLOA54APp9uNhF4KJ1/OF0mXf94pI2m3S0iVgJvSzo4LRpLMmRJboxbx35B+iTNZ4A/ttyGl8BbwGck7SJJ/DX2zF/3HB29zo8C4yTtkb5LHpeWZZLrhutGAbqnbpTiQ8JSTcBoYG46fwDwHLAc+CmwU1reO11enq4/oMQx1wENwGJgDrAHSfvzY8Dv0p/90m1FMlDk74GXSJ4SKmXs/wK8CiwBfgzslNXrDswk+TxgI8m7vYs6c52BL6WvYTnwj6X+m+/A63fd6N7YXTfacW53dWRmZplU9k18ZmbWMzlBmZlZJjlBmZlZJjlBmZlZJjlBmZlZJjlBZZikzZIa0x6Pfypplza2+4Wk3Ttx/H0lzSogviZJ/Tu7v1lnuW5UBj9mnmGSPoyIPun8vcDCiPhuznqR/A4/busYXRxfE8n3HNaU4vxWuVw3KoPvoHqOp4ADJdUoGfvmTmARsF/Lu7WcdXcrGWvmV5J2BpB0oKRfS3pR0iJJQ9Ltl6TrL5T0kKRHJL0m6fqWE0uaI2lhesxJJXn1Zm1z3ShTTlA9QNr/1kkk38wGOBj494gYGRFvbrX5UOCOiDgU+C/grLT83rT80yT9fuXr5uVI4DySb+ifLak+Lf9SRBwB1AOXSyp1T8pmgOtGuXOCyradJTWSdOfyFnBPWv5mRDzTxj5vRERjOr8QqJHUFxgYEbMBImJ9RKzLs++8iFgbER+RdGB5XFp+uaQXgWdIOnwcWvArMyuM60YF2HH7m1gJfRQRdbkFSdM6f97GPhty5jcDO5O/q/t8tv5AMiSNJul9+eiIWCdpPknfYGal5LpRAXwHVQEi4gOgWdIZAJJ2auOppxMl9Uvb5s8Anibp2v/9tAIOAz7TbYGbdTHXjWxzgqoc55M0RywGfgPsk2ebBSQ9KzcCP4uIBuARYMd0vxtImjLMyonrRkb5MXMDkieVSB6LvazUsZhlietG6fgOyszMMsl3UGZmlkm+gzIzs0xygjIzs0xygjIzs0xygjIzs0xygjIzs0z6/xyTZnBsvgucAAAAAElFTkSuQmCC\n", | |
"text/plain": "<Figure size 432x216 with 2 Axes>" | |
}, | |
"metadata": { | |
"needs_background": "light" | |
}, | |
"output_type": "display_data" | |
} | |
], | |
"source": "import seaborn as sns\n\nbins = np.linspace(df.Principal.min(), df.Principal.max(), 10)\ng = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\ng.map(plt.hist, 'Principal', bins=binedges, ec=\"k\")\n\ng.axes[-1].legend()\nplt.show()" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAGfZJREFUeJzt3XuQVOW57/HvTxgdFbygo4yMwKgoopIBZ3tDDYJy2N49XuKOR7GOJx4Naqjo8ZZTVrLdZbyVmhwvkUQLK1HUmA26SUWDCidi4gVwRBBv0UFHQS7RKAchgs/5o9fMHqBhembWTK/u+X2qVnWvt1e/61lMvzy93vX2uxQRmJmZZc02xQ7AzMwsHycoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCeolEjaU9Ijkt6XNE/SXySdkVLdoyXNSKOu7iBptqT6YsdhxVdO7UJSlaSXJb0m6Zgu3M/qrqq71DhBpUCSgOnAnyJin4g4FDgXqClSPL2LsV+z1sqwXYwF3oqIERHxQhox2dY5QaVjDPCPiPhFc0FELImI/wMgqZek2yS9KmmBpP+ZlI9OzjaekPSWpIeTRo2k8UnZHOC/NtcraUdJDyZ1vSbptKT8Qkm/lfQfwB87czCSpki6T9Ks5Jvvt5N9LpY0pdV290maK2mRpJ9soa5xybfm+Ul8fToTm5WUsmkXkuqAW4ETJTVI2n5Ln21JjZJuSl6bK2mkpGck/VXSJck2fSQ9l7z3jeZ48+z3f7X698nbxspaRHjp5AJcAdy5ldcvBv538nw7YC5QC4wG/k7uG+U2wF+Ao4FK4CNgCCDgcWBG8v6bgP+WPN8FeAfYEbgQaAL6bSGGF4CGPMvxebadAjya7Ps04AvgkCTGeUBdsl2/5LEXMBsYnqzPBuqB3YE/ATsm5dcANxT77+Wle5YybBcXAncnz7f42QYagUuT53cCC4C+QBWwPCnvDezUqq73ACXrq5PHccDk5Fi3AWYAxxb779qdi7uCuoCke8g1qH9ExD+R+6ANl3RWssnO5BrZP4BXIqIpeV8DMBhYDXwQEe8m5b8h15hJ6jpV0lXJeiUwMHk+MyL+li+miGhvn/l/RERIegP4NCLeSGJZlMTYAJwj6WJyja0aGEauMTY7Iil7MfkCvC25/2ysByqTdtGsrc/2U8njG0CfiPgS+FLSWkm7AP8PuEnSscA3wABgT2BZqzrGJctryXofcv8+f+pgzCXHCSodi4Azm1ciYqKk3cl9I4TcN6DLI+KZ1m+SNBpY16poA//5N9nSJIkCzoyItzep63ByH/r8b5JeIPctblNXRcSzecqb4/pmkxi/AXpLqgWuAv4pIj5Luv4q88Q6MyL+ZUtxWVkrx3bRen9b+2xvtf0A55E7ozo0Ir6W1Ej+9vPTiLh/K3GUNV+DSsfzQKWkS1uV7dDq+TPApZIqACTtL2nHrdT3FlArad9kvXUjeAa4vFWf/IhCAoyIYyKiLs+ytUa4NTuRa/h/l7Qn8M95tnkJGCVpvyTWHSTt38H9Wekp53bR2c/2zuS6+76WdBwwKM82zwD/vdW1rQGS9mjHPkqeE1QKItdhfDrwbUkfSHoFeIhcvzTAr4A3gfmSFgL3s5Wz14hYS67r4vfJxeAlrV6+EagAFiR13Zj28RQiIl4n1/WwCHgQeDHPNivI9dtPlbSAXKMe2o1hWhGVc7tI4bP9MFAvaS65s6m38uzjj8AjwF+SrvYnyH+2V7aaL8qZmZllis+gzMwsk5ygzMwsk5ygzMwsk5ygzMwsk7o1QY0fPz7I/Y7Bi5dyXTrN7cRLD1gK0q0JauXKld25O7OS5HZiluMuPjMzyyQnKDMzyyQnKDMzyyRPFmtmZefrr7+mqamJtWvXFjuUHq2yspKamhoqKio69H4nKDMrO01NTfTt25fBgweTzB9r3SwiWLVqFU1NTdTW1naoDnfxmVnZWb
t2LbvttpuTUxFJYrfdduvUWawTVDcaVF2NpFSWQdXVxT4cs0xzciq+zv4N3MXXjT5ctoymvWpSqavmk6ZU6jEzyyqfQZlZ2Uuz96LQHoxevXpRV1fHwQcfzNlnn82aNWtaXps2bRqSeOut/7wNVGNjIwcffDAAs2fPZuedd2bEiBEccMABHHvsscyYMWOj+idPnszQoUMZOnQohx12GHPmzGl5bfTo0RxwwAHU1dVRV1fHE088sVFMzUtjY2Nn/lm7nM+gzKzspdl7AYX1YGy//fY0NDQAcN555/GLX/yCH/7whwBMnTqVo48+mkcffZQf//jHed9/zDHHtCSlhoYGTj/9dLbffnvGjh3LjBkzuP/++5kzZw6777478+fP5/TTT+eVV16hf//+ADz88MPU19dvMaZS4DMoM7Mudswxx/Dee+8BsHr1al588UUeeOABHn300YLeX1dXxw033MDdd98NwC233MJtt93G7rvvDsDIkSOZMGEC99xzT9ccQJE4QZmZdaH169fzhz/8gUMOOQSA6dOnM378ePbff3/69evH/PnzC6pn5MiRLV2CixYt4tBDD93o9fr6ehYtWtSyft5557V05a1atQqAr776qqXsjDPOSOPwupS7+MzMukBzMoDcGdRFF10E5Lr3Jk2aBMC5557L1KlTGTlyZJv1RWx9EvCI2GjUXDl08RWUoCQ1Al8CG4D1EVEvqR/wGDAYaATOiYjPuiZMM7PSki8ZrFq1iueff56FCxciiQ0bNiCJW2+9tc36XnvtNQ488EAAhg0bxrx58xgzZkzL6/Pnz2fYsGHpHkSRtaeL77iIqIuI5pR8LfBcRAwBnkvWzcxsC5544gkuuOAClixZQmNjIx999BG1tbUbjcDLZ8GCBdx4441MnDgRgKuvvpprrrmmpeuuoaGBKVOm8P3vf7/Lj6E7daaL7zRgdPL8IWA2cE0n4zEzS93A/v1T/e3gwGSkXHtNnTqVa6/d+Lv8mWeeySOPPMI112z83+cLL7zAiBEjWLNmDXvssQc///nPGTt2LACnnnoqH3/8MUcddRSS6Nu3L7/5zW+oLrMf8Kutfk0ASR8An5G7E+L9ETFZ0ucRsUurbT6LiF3zvPdi4GKAgQMHHrpkyZLUgi81klL9oW4hfzvrdh366bzbSboWL17c0h1mxbWFv0VB7aTQLr5RETES+GdgoqRjCw0uIiZHRH1E1FdVVRX6NrMexe3EbHMFJaiI+CR5XA5MAw4DPpVUDZA8Lu+qIM3MrOdpM0FJ2lFS3+bnwDhgIfAUMCHZbALwZFcFaWZmPU8hgyT2BKYl4+t7A49ExNOSXgUel3QR8CFwdteFaWZmPU2bCSoi3ge+lad8FTC2K4IyMzPzVEdmZpZJTlBmVvb2qhmY6u029qoZWNB+ly1bxrnnnsu+++7LsGHDOPHEE3nnnXdYtGgRY8aMYf/992fIkCHceOONLT8bmTJlCpdddtlmdQ0ePJiVK1duVDZlyhSqqqo2uoXGm2++CcA777zDiSeeyH777ceBBx7IOeecw2OPPdayXZ8+fVpuyXHBBRcwe/ZsTj755Ja6p0+fzvDhwxk6dCiHHHII06dPb3ntwgsvZMCAAaxbtw6AlStXMnjw4Hb9TQrhufgKMKi6mg+XLSt2GGbWQUs//ojDb3g6tfpe/tfxbW4TEZxxxhlMmDChZdbyhoYGPv30Uy688ELuu+8+xo0bx5o1azjzzDO59957W2aKaI/vfOc7LbOcN1u7di0nnXQSd9xxB6eccgoAs2bNoqqqqmX6pdGjR3P77be3zNc3e/bslve//vrrXHXVVcycOZPa2lo++OADTjjhBPbZZx+GDx8O5O4t9eCDD3LppZe2O+ZCOUEVIK17yfguuGY9x6xZs6ioqOCSSy5pKaurq+OBBx5g1KhRjBs3DoAddtiBu+++m9GjR3coQeXzyCOPcOSRR7YkJ4Djjjuu4PfffvvtXH/99dTW1gJQW1vLdd
ddx2233cavf/1rACZNmsSdd97J9773vVRizsddfGZmXWDhwoWb3RID8t8qY99992X16tV88cUX7d5P6267uro6vvrqqy3uu1CF3M5j4MCBHH300S0Jqyv4DMrMrBtteluM1rZUvjX5uvg6K1+M+cquv/56Tj31VE466aRU99/MZ1BmZl3goIMOYt68eXnL586du1HZ+++/T58+fejbt2+X7rs97980xny389hvv/2oq6vj8ccf7/C+tsYJysysC4wZM4Z169bxy1/+sqXs1VdfZciQIcyZM4dnn30WyN3Y8IorruDqq69Obd/f/e53+fOf/8zvf//7lrKnn36aN954o6D3X3XVVfz0pz+lsbERgMbGRm666SauvPLKzbb90Y9+xO23355K3JtyF5+Zlb3qAXsXNPKuPfW1RRLTpk1j0qRJ3HzzzVRWVjJ48GDuuusunnzySS6//HImTpzIhg0bOP/88zcaWj5lypSNhnW/9NJLAAwfPpxttsmdV5xzzjkMHz6cxx57bKP7Sd17770cddRRzJgxg0mTJjFp0iQqKioYPnw4P/vZzwo6vrq6Om655RZOOeUUvv76ayoqKrj11ltb7hDc2kEHHcTIkSMLvnV9exR0u4201NfXx6anjaUgrdtk1HzS5NttlL8O3W6jtVJtJ1ni221kR3fcbsPMzKxbOUGZmVkmOUGZWVlyF3jxdfZv4ARlZmWnsrKSVatWOUkVUUSwatUqKisrO1yHR/GZWdmpqamhqamJFStWFDuUHq2yspKamo4PDHOCKlHb0bFfneczsH9/lixdmkpdZllQUVHRMo+clS4nqBK1DlIdsm5mljUFX4OS1EvSa5JmJOu1kl6W9K6kxyRt23VhmplZT9OeQRI/ABa3Wr8FuDMihgCfARelGZiZmfVsBSUoSTXAScCvknUBY4Ankk0eAk7vigDNzKxnKvQM6i7gauCbZH034POIWJ+sNwED8r1R0sWS5kqa6xE1Zvm5nZhtrs0EJelkYHlEtJ67Pd/wsbw/OIiIyRFRHxH1VVVVHQzTrLy5nZhtrpBRfKOAUyWdCFQCO5E7o9pFUu/kLKoG+KTrwjQzs56mzTOoiLguImoiYjBwLvB8RJwHzALOSjabADzZZVGamVmP05mpjq4BfijpPXLXpB5IJyQzM7N2/lA3ImYDs5Pn7wOHpR+SmZmZJ4s1M7OMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMajNBSaqU9Iqk1yUtkvSTpLxW0suS3pX0mKRtuz5cMzPrKQo5g1oHjImIbwF1wHhJRwC3AHdGxBDgM+CirgvTzMx6mjYTVOSsTlYrkiWAMcATSflDwOldEqGZmfVIBV2DktRLUgOwHJgJ/BX4PCLWJ5s0AQO28N6LJc2VNHfFihVpxGxWdtxOzDZXUIKKiA0RUQfUAIcBB+bbbAvvnRwR9RFRX1VV1fFIzcqY24nZ5to1ii8iPgdmA0cAu0jqnbxUA3ySbmhmZtaTFTKKr0rSLsnz7YHjgcXALOCsZLMJwJNdFaSZmfU8vdvehGrgIUm9yCW0xyNihqQ3gUcl/RvwGvBAF8ZpZmY9TJsJKiIWACPylL9P7nqUmZlZ6jyThJmZZZITlJmZZZITlJmZZZITlJmZZVLZJqhB1dVISmUxM7PuV8gw85L04bJlNO1Vk0pdNZ80pVKPmZkVrmzPoMzMrLQ5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSa1maAk7S1plqTFkhZJ+kFS3k/STEnvJo+7dn24ZmbWUxRyBrUeuDIiDgSOACZKGgZcCzwXEUOA55J1MzOzVLSZoCJiaUTMT55/CSwGBgCnAQ8lmz
0EnN5VQZqZWc/TrmtQkgYDI4CXgT0jYinkkhiwxxbec7GkuZLmrlixonPRmpUptxOzzRWcoCT1AX4HTIqILwp9X0RMjoj6iKivqqrqSIxmZc/txGxzBSUoSRXkktPDEfHvSfGnkqqT16uB5V0TopmZ9USFjOIT8ACwOCLuaPXSU8CE5PkE4Mn0w7PusB20edv7QpZB1dXFPhQzKyOF3PJ9FHA+8IakhqTseuBm4HFJFwEfAmd3TYjW1dYBTXvVdLqemk+aOh+MmVmizQQVEXMAbeHlsemGk03qVZHKf77qvW1q/4mrV0Uq9ZiZZVUhZ1A9Xmz4msNveLrT9bz8r+NTqae5LjOzcuapjszMLJOcoMzMLJOcoMzMLJOcoMzMLJOcoMzMLJOcoMzMLJOcoMzMLJOcoMzMLJOcoMzMLJPKdiaJtKYnMjOz4ijbBJXW9ETgaYXMzIrBXXxmZpZJTlBmZpZJTlBmZpZJZXsNqtylOQjE95ayrBlUXc2Hy5Z1up7tt+nFV99sSCEiGNi/P0uWLk2lLiuME1SJ8iAQK2cfLluW2l2e06inuS7rXm128Ul6UNJySQtblfWTNFPSu8njrl0bppmZ9TSFXIOaAmz6Ffta4LmIGAI8l6xbD7cdICmVZVB1dbEPx8yKrM0uvoj4k6TBmxSfBoxOnj8EzAauSTEuK0HrwN0pZpaajo7i2zMilgIkj3tsaUNJF0uaK2nuihUrOrg7s/JWDu1kUHV1amfQZtANgyQiYjIwGaC+vj66en9mpagc2klaAxvAZ9CW09EzqE8lVQMkj8vTC8nMzKzjCeopYELyfALwZDrhmJmZ5RQyzHwq8BfgAElNki4CbgZOkPQucEKybmZmlppCRvH9yxZeGptyLGZmZi0yNRefRwGZmVmzTE115FFAZmbWLFMJyoojrYlnPemsmaXJCcpSm3jWk86aWZoydQ3KzMysmROUmZllkhOUmZllkhOUmZllkhOUZZLvLdU9/NtDyzKP4rNM8r2luod/e2hZ5gRlqUnr91TNdZlZz+YEZalJ6/dU4N9UmZmvQZmZWUb5DMoyKc3uwm16VaRyEX9g//4sWbo0hYjKU6pdvL239fRbBRhUXc2Hy5alUlcWP99OUJZJaXcXpjEQwIMAti7tv5mn32pbuQ9ycRefmZllUqbOoNLsIjAzs9KWqQTlUWBmZtasUwlK0njgZ0Av4FcRcXMqUZmlqBzvd5XmxXErTFqDbQC26V3BN+u/TqWuctbhBCWpF3APcALQBLwq6amIeDOt4MzSUI73u0rr4ri71Av3jQfudLvODJI4DHgvIt6PiH8AjwKnpROWmZn1dIqIjr1ROgsYHxH/I1k/Hzg8Ii7bZLuLgYuT1QOAtzsebovdgZUp1JMFPpZs6uixrIyIdp9qdVE7Af9NsqqnH0tB7aQz16DydcZulu0iYjIwuRP72XzH0tyIqE+zzmLxsWRTdx9LV7QT8N8kq3wshelMF18TsHer9Rrgk86FY2ZmltOZBPUqMERSraRtgXOBp9IJy8zMeroOd/FFxHpJlwHPkBtm/mBELEotsq1LvSukiHws2VQux1IuxwE+lqzqsmPp8CAJMzOzruS5+MzMLJOcoMzMLJMyn6Ak7S1plqTFkhZJ+kFS3k/STEnvJo+7FjvWtkiqlPSKpNeTY/lJUl4r6eXkWB5LBp1knqRekl6TNCNZL8njAJDUKOkNSQ2S5iZlJfMZczvJtnJpK93dTjKfoID1wJURcSBwBDBR0jDgWuC5iBgCPJesZ906YExEfAuoA8ZLOgK4BbgzOZbPgIuKGGN7/ABY3Gq9VI+j2XERUdfqNx2l9BlzO8m2cmor3ddOIqKkFuBJcvP/vQ1UJ2XVwNvFjq2dx7EDMB84nNyvsHsn5UcCzxQ7vgLir0k+jGOAGeR+uF1yx9HqeBqB3TcpK9nPmNtJdpZyaivd3U5K4QyqhaTBwAjgZWDPiFgKkDzuUbzICpec6jcAy4GZwF
+BzyNifbJJEzCgWPG1w13A1cA3yfpulOZxNAvgj5LmJdMOQel+xgbjdpIl5dRWurWdZOp+UFsjqQ/wO2BSRHyR1rT33S0iNgB1knYBpgEH5tuse6NqH0knA8sjYp6k0c3FeTbN9HFsYlREfCJpD2CmpLeKHVBHuJ1kSxm2lW5tJyWRoCRVkGt0D0fEvyfFn0qqjoilkqrJfdMqGRHxuaTZ5K4X7CKpd/KNqhSmjBoFnCrpRKAS2Inct8RSO44WEfFJ8rhc0jRys/WX1GfM7SSTyqqtdHc7yXwXn3JfAR8AFkfEHa1eegqYkDyfQK7PPdMkVSXfCJG0PXA8uQuns4Czks0yfywRcV1E1ETEYHJTXD0fEedRYsfRTNKOkvo2PwfGAQspoc+Y20k2lVNbKUo7KfZFtwIuyh1N7vR3AdCQLCeS68d9Dng3eexX7FgLOJbhwGvJsSwEbkjK9wFeAd4DfgtsV+xY23FMo4EZpXwcSdyvJ8si4EdJecl8xtxOsr+UelspRjvxVEdmZpZJme/iMzOznskJyszMMskJyszMMskJyszMMskJyszMMskJyszMMskJyszMMskJqsRJmp5M3LioefJGSRdJekfSbEm/lHR3Ul4l6XeSXk2WUcWN3qx7uJ2UJv9Qt8RJ6hcRf0umhHkV+C/Ai8BI4EvgeeD1iLhM0iPAvRExR9JAclP855uE06ysuJ2UppKYLNa26gpJZyTP9wbOB/5vRPwNQNJvgf2T148HhrWa4XonSX0j4svuDNisCNxOSpATVAlLpu8/HjgyItYksz6/Tf5bE0CuS/fIiPiqeyI0Kz63k9Lla1ClbWfgs6TRDSV3S4IdgG9L2lVSb+DMVtv/EbiseUVSXbdGa1YcbiclygmqtD0N9Ja0ALgReAn4GLiJ3N1UnwXeBP6ebH8FUC9pgaQ3gUu6P2Szbud2UqI8SKIMSeoTEauTb4bTgAcjYlqx4zLLEreT7PMZVHn6saQGcvfS+QCYXuR4zLLI7STjfAZlZmaZ5DMoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLpP8PlTlGZbaTvVAAAAAASUVORK5CYII=\n", | |
"text/plain": "<Figure size 432x216 with 2 Axes>" | |
}, | |
"metadata": { | |
"needs_background": "light" | |
}, | |
"output_type": "display_data" | |
} | |
], | |
"source": "bins = np.linspace(df.age.min(), df.age.max(), 10)\ng = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\ng.map(plt.hist, 'age', bins=bins, ec=\"k\")\n\ng.axes[-1].legend()\nplt.show()" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "# Pre-processing: Feature selection/extraction" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "### Let's look at the day of the week people get the loan" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAGp1JREFUeJzt3XuYVNWZ7/HvT+gMKHil1ZYWaRFBVKbBjncMQsJD8IbjJSSOgmPGo/ESjuGImhyTE88xqIyXxCsRA4mIGhLRIYkOKkTwDtiCiEFGUVtBgUk0BlHE9/xRu3saUkB3V1XX7urf53nqqapVe6/1broWb+1Vu9ZSRGBmZpY2OxQ7ADMzs2ycoMzMLJWcoMzMLJWcoMzMLJWcoMzMLJWcoMzMLJWcoApI0l6S7pP0hqSFkp6VdGqe6h4saVY+6moNkuZKqil2HNb6SqkfSCqX9LyklyQNKmA7Hxeq7rbECapAJAmYCTwVEftHxGHAKKCySPF0LEa71r6VYD8YCrwWEQMiYl4+YrKtc4IqnCHAZxFxZ31BRLwVET8DkNRB0g2SXpS0WNL/SMoHJ2cbMyS9Jmla0smRNDwpmw/8U329knaSdE9S10uSTknKx0j6taR/B/4jl4ORNEXSHZLmJJ+Ev5K0uUzSlEbb3SFpgaSlkv7PVuoalnyKXpTE1yWX2CzVSqYfSKoGrgdGSKqV1Hlr72VJKyVdm7y2QNJASY9J+k9JFyTbdJH0RLLvkvp4s7T7vxr9+2TtUyUrInwrwA24FLhpG6+fD/wgefwPwAKgChgMfEjmE+YOwLPAsUAn4B2gNyDgQWBWsv+1wD8nj3cFlgM7AWOAOmD3rcQwD6jNcvtqlm2nAPcnbZ8CfAQcmsS4EKhOtts9ue8AzAX6J8/nAjVAN+ApYKekfDxwdbH/Xr4V5laC/WAMcGvyeKvvZWAlcGHy+CZgMdAVKAc+SMo7Ajs3qmsFoOT5x8n9MGBScqw7ALOA44r9d22tm4d9Womk28h0sM8i4stk3nj9JZ2ebLILmU73GfBCRNQl+9UCPYGPgTcj4vWk/F4ynZukrpMljUuedwJ6JI9nR8R/ZYspIpo7hv7vERGSlgDvR8SSJJalSYy1wJmSzifT+SqAfmQ6Z70jk7Knkw/EXyLzn4+1AyXSD+pt7738SHK/BOgSEX8F/ippg6Rdgb8B10o6DvgC6A7sBaxuVMew5PZS8rwLmX+fp1oYc5viBFU4S4HT6p9ExEWSupH5hAiZT0SXRMRjjXeSNBj4tFHRJv7777S1iRMFnBYRf9qiriPIdILsO0nzyHyq29K4iHg8S3l9XF9sEeMXQEdJVcA44MsR8edk6K9TllhnR8Q3txaXlZRS7AeN29vWe3mb/QU4i8wZ1WERsVHSSrL3l59ExF3biKNk+TuownkS6CTpwkZlOzZ6/BhwoaQyAEkHStppG/W9BlRJ6pU8b9wpHgMuaTRGP6ApAUbEoIioznLbVqfclp3J/EfwoaS9gK9n2eY54BhJBySx7ijpwBa2Z+lXyv0g1/fyLmSG+zZKOh7YL8s2jwH/0ui7re6S9mxGG22aE1SBRGYAeSTwFUlvSnoBmEpmnBrgbuBVYJGkV4C72MYZbURsIDOU8bvky+G3Gr18DVAGLE7quibfx9MUEfEymaGIpcA9wNNZtllDZhx/uqTFZDp531YM01pRKfeDPLyXpwE1khaQOZt6LUsb/wHcBzybDK3PIPvZXkmq/0LOzMwsVXwGZWZmqeQEZWZmqeQEZWZmqeQEZWZmqZSKBDV8+PAg89sG33wrpVvO3Dd8K9Fbk6QiQa1du7bYIZilkvuGtWepSFBmZmZbcoIyM7NU2m6CSqav/yD5ZXZ92e6SZkt6PbnfLSmXpJ9KWpFMDT+wkMGbmVnpaspksVOAW4FfNiq7AngiIiZIuiJ5Pp7M3Gu9k9sRwB3JvZlZq9m4cSN1dXVs2L
Ch2KG0a506daKyspKysrIW7b/dBBURT0nquUXxKWTWa4HMvFpzySSoU4BfJvNvPSdpV0kVEbGqRdGZmbVAXV0dXbt2pWfPniRzx1oriwjWrVtHXV0dVVVVLaqjpd9B7VWfdJL7+tl1u5NZTKxeXVJmZtZqNmzYwB577OHkVESS2GOPPXI6i833RRLZ3g1Zr3mXdH6yFPKCNWvW5DkMa4/2q6hAUqvc9quoKNhxuG/kh5NT8eX6N2jpgoXv1w/dSaoAPkjK64B9G21XCbyXrYKImERmKWNqamqa/MMts615e/Vq6vapbJW2Kt+rK1jd7htmGS09g3oEGJ08Hg083Kj8nORqviOBD/39k5kVW77PrptyBt2hQweqq6s55JBDOOOMM1i/fn3Daw899BCSeO21/14CauXKlRxyyCEAzJ07l1122YUBAwbQp08fjjvuOGbNmrVZ/ZMmTaJv37707duXww8/nPnz5ze8NnjwYPr06UN1dTXV1dXMmDFjs5jqbytXrszln7XgtnsGJWk6mQsiukmqA34ITAAelHQe8DZwRrL574ERwApgPXBuAWI2M2uWfJ9dN+UMunPnztTW1gJw1llnceedd3LZZZcBMH36dI499ljuv/9+fvSjH2Xdf9CgQQ1Jqba2lpEjR9K5c2eGDh3KrFmzuOuuu5g/fz7dunVj0aJFjBw5khdeeIG9994bgGnTplFTU7PVmNqC7Z5BRcQ3I6IiIsoiojIiJkfEuogYGhG9k/v/SraNiLgoInpFxKERsaDwh2Bmlm6DBg1ixYoVAHz88cc8/fTTTJ48mfvvv79J+1dXV3P11Vdz6623AnDddddxww030K1bNwAGDhzI6NGjue222wpzAEXimSTMzAro888/5w9/+AOHHnooADNnzmT48OEceOCB7L777ixatKhJ9QwcOLBhSHDp0qUcdthhm71eU1PD0qVLG56fddZZDUN569atA+CTTz5pKDv11FPzcXgF1dKLJMzMbBvqkwFkzqDOO+88IDO8N3bsWABGjRrF9OnTGThw+5PuZH5euu3XG181VwpDfE5QZmYFkC0ZrFu3jieffJJXXnkFSWzatAlJXH/99dut76WXXuKggw4CoF+/fixcuJAhQ4Y0vL5o0SL69euX34MoMg/xmZm1khkzZnDOOefw1ltvsXLlSt555x2qqqo2uwIvm8WLF3PNNddw0UUXAXD55Zczfvz4hqG72tpapkyZwne+852CH0Nr8hmUmZW8HnvvndffrvVIrpRrrunTp3PFFVdsVnbaaadx3333MX78+M3K582bx4ABA1i/fj177rknP/3pTxk6dCgAJ598Mu+++y5HH300kujatSv33nsvFQX8AXkxaHvjmq2hpqYmFizwBX+WG0mt+kPdJvSdnKcycN9omWXLljUMh1lxbeVv0aS+4SE+MzNLJScoMzNLJScoMzNLJScoMzNLJScoMzNLJScoMzNLJScoMyt5+1T2yOtyG/tU9mhSu6tXr2bUqFH06tWLfv36MWLECJYvX87SpUsZMmQIBx54IL179+aaa65p+NnClClTuPjii/+urp49e7J27drNyqZMmUJ5eflmS2i8+uqrACxfvpwRI0ZwwAEHcNBBB3HmmWfywAMPNGzXpUuXhiU5zjnnHObOncuJJ57YUPfMmTPp378/ffv25dBDD2XmzJkNr40ZM4bu3bvz6aefArB27Vp69uzZrL9JU/iHumZW8la9+w5HXP1o3up7/sfDt7tNRHDqqacyevTohlnLa2tref/99xkzZgx33HEHw4YNY/369Zx22mncfvvtDTNFNMc3vvGNhlnO623YsIETTjiBG2+8kZNOOgmAOXPmUF5e3jD90uDBg5k4cWLDfH1z585t2P/ll19m3LhxzJ49m6qqKt58802+9rWvsf/++9O/f38gs7bUPffcw4UXXtjsmJvKZ1BmZgUwZ84cysrKuOCCCxrKqqurWb58OccccwzDhg0DYMcdd+
TWW29lwoQJeWv7vvvu46ijjmpITgDHH398w4KI2zNx4kSuuuoqqqqqAKiqquLKK6/khhtuaNhm7Nix3HTTTXz++ed5i3tLTlBmZgXwyiuv/N2SGJB9qYxevXrx8ccf89FHHzW7ncbDdtXV1XzyySdbbbupmrKcR48ePTj22GP51a9+1eJ2tsdDfGZmrWjLZTEa21r5tmQb4stVthizlV111VWcfPLJnHDCCXltv57PoMzMCuDggw9m4cKFWcu3nF/xjTfeoEuXLnTt2rWgbTdn/y1jzLacxwEHHEB1dTUPPvhgi9vaFicoM7MCGDJkCJ9++ik///nPG8pefPFFevfuzfz583n88ceBzMKGl156KZdffnne2v7Wt77FM888w+9+97uGskcffZQlS5Y0af9x48bxk5/8hJUrVwKwcuVKrr32Wr73ve/93bbf//73mThxYl7i3pKH+Mys5FV037dJV941p77tkcRDDz3E2LFjmTBhAp06daJnz57cfPPNPPzww1xyySVcdNFFbNq0ibPPPnuzS8unTJmy2WXdzz33HAD9+/dnhx0y5xVnnnkm/fv354EHHthsPanbb7+do48+mlmzZjF27FjGjh1LWVkZ/fv355ZbbmnS8VVXV3Pddddx0kknsXHjRsrKyrj++usbVghu7OCDD2bgwIFNXrq+OXJabkPS/wS+DQSwBDgXqADuB3YHFgFnR8Rn26rHSwpYPni5Davn5TbSoyjLbUjqDlwK1ETEIUAHYBRwHXBTRPQG/gyc19I2zMys/cr1O6iOQGdJHYEdgVXAEGBG8vpUYGSObZiZWTvU4gQVEe8CE4G3ySSmD4GFwF8iov6XW3VA92z7Szpf0gJJC9asWdPSMMxKjvtGfqRhtfD2Lte/QS5DfLsBpwBVwD7ATsDXs2yaNcKImBQRNRFRU15e3tIwzEqO+0buOnXqxLp165ykiigiWLduHZ06dWpxHblcxfdV4M2IWAMg6bfA0cCukjomZ1GVwHs5tGFm1myVlZXU1dXhM9Di6tSpE5WVLb9wKZcE9TZwpKQdgU+AocACYA5wOpkr+UYDD+fQhplZs5WVlTXMI2dtVy7fQT1P5mKIRWQuMd8BmASMBy6TtALYA5ichzjNzKydyemHuhHxQ+CHWxS/ARyeS71mZmae6sjMzFLJCcrMzFLJCcrMzFLJCcrMzFLJCcrMzFLJy21YyVCHMirfq2u1tsyssJygrGTEpo0ccfWjrdJWPtcWMrPsPMRnZmap5ARlZmap5ARlZmap5ARlZmap1CYS1H4VFUhqtdt+FRXFPmQzs3avTVzF9/bq1dTt0/I1RZqrtS5VNjOzrWsTZ1BmZtb+OEGZmVkqOUGZmVkqOUGZmVkqOUGZmVkqOUGZmVkq5ZSgJO0qaYak1yQtk3SUpN0lzZb0enK/W76CNTOz9iPXM6hbgEcjoi/wj8Ay4ArgiYjoDTyRPDczM2uWFicoSTsDxwGTASLis4j4C3AKMDXZbCowMtcgzcys/cnlDGp/YA3wC0kvSbpb0k7AXhGxCiC53zPbzpLOl7RA0oI1a9bkEIZZaXHfMMvIJUF1BAYCd0TEAOBvNGM4LyImRURNRNSUl5fnEIZZaXHfMMvIJUHVAXUR8XzyfAaZhPW+pAqA5P6D3EI0M7P2qMUJKiJWA+9I6pMUDQVeBR4BRidlo4GHc4rQzMzapVxnM78EmCbpS8AbwLlkkt6Dks4D3gbOyLENMzNrh3JKUBFRC9RkeWloLvWamZl5JgkzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0slJygzM0ulnBOUpA6SXpI0K3leJel5Sa9LekDSl3IP08zM2pt8nEF9F1jW6Pl1wE0R0Rv4M3BeHtowM7N2Jq
cEJakSOAG4O3kuYAgwI9lkKjAylzbMzKx9yvUM6mbgcuCL5PkewF8i4vPkeR3QPduOks6XtEDSgjVr1uQYhlnpcN8wy2hxgpJ0IvBBRCxsXJxl08i2f0RMioiaiKgpLy9vaRhmJcd9wyyjYw77HgOcLGkE0AnYmcwZ1a6SOiZnUZXAe7mHaWZm7U2Lz6Ai4sqIqIyInsAo4MmIOAuYA5yebDYaeDjnKM3MrN0pxO+gxgOXSVpB5jupyQVow8zMSlwuQ3wNImIuMDd5/AZweD7qNTOz9sszSZiZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QRXZfhUVSGq1234VFcU+ZDOzJsnLZLHWcm+vXk3dPpWt1l7le3Wt1paZWS58BmVmZqnkBGVmZqnkBGVmZqnkBGVmZqnkBGVmZqnkBGVmZqnU4gQlaV9JcyQtk7RU0neT8t0lzZb0enK/W/7CNTOz9iKXM6jPge9FxEHAkcBFkvoBVwBPRERv4InkuZmZWbO0OEFFxKqIWJQ8/iuwDOgOnAJMTTabCozMNUgzM2t/8vIdlKSewADgeWCviFgFmSQG7LmVfc6XtEDSgjVr1uQjDLOS4L5hlpFzgpLUBfgNMDYiPmrqfhExKSJqIqKmvLw81zDMSob7hllGTglKUhmZ5DQtIn6bFL8vqSJ5vQL4ILcQzcysPcrlKj4Bk4FlEXFjo5ceAUYnj0cDD7c8PDMzy1VbXTUhl9nMjwHOBpZIqk3KrgImAA9KOg94GzgjtxDNzCwXbXXVhBYnqIiYD2grLw9tab1mZmbgmSTM2rzWHL7xgpfWmrxgoVkb15rDN17w0lqTz6DMzCyVfAaVxT8AmYsUzcysWJygsvgUPGRiZlZkHuIzM7NUcoIyM7NUcoIyM7NUcoIyM7NUcoIyM7NUcoIyM7NUcoIyM7NUcoIyM7NU8g91zcxKnDqUteqkAOpQlpd6nKDMrMlaexqwHnvvzVurVrVae6UqNm3kiKsfbbX2nv/x8LzU4wRlZk3WmtOAgacCa+/8HZSZmaWSE5QVzD6VPVptIT3PPm9WejzEZwWz6t132uS4t5mlQ5tIUG31ChQzy01rXpThCzLSpyAJStJw4BagA3B3REzIpb62egVKGnkxRmtLvDZb+5b3BCWpA3Ab8DWgDnhR0iMR8Wq+27Lmc4c3y661P7x13qEDn3yxqdXaa4sKcQZ1OLAiIt4AkHQ/cArgBGVmqVWMS+j9YXHbFBH5rVA6HRgeEd9Onp8NHBERF2+x3fnA+cnTPsCftlFtN2BtXgNNDx9b29SUY1sbEc0eL25m32hqLG2Vj61t2t6xNalvFOIMKts58t9lwYiYBExqUoXSgoioyTWwNPKxtU2FPLbm9I1Cx1JsPra2KV/HVojfQdUB+zZ6Xgm8V4B2zMyshBUiQb0I9JZUJelLwCjgkQK0Y2ZmJSzvQ3wR8bmki4HHyFxmfk9ELM2x2iYPd7RBPra2KU3HlqZY8s3H1jbl5djyfpGEmZlZPnguPjMzSyUnKDMzS6XUJyhJwyX9SdIKSVcUO558kbSvpDmSlklaKum7xY4p3yR1kPSSpFnFjiWfJO0qaYak15K/31FFisN9o41y32hifWn+DiqZNmk5jaZNAr5ZCtMmSaoAKiJikaSuwEJgZCkcWz1JlwE1wM4RcWKx48kXSVOBeRFxd3Kl6o4R8ZdWjsF9ow1z32iatJ9BNUybFBGfAfXTJrV5EbEqIhYlj/8KLAO6Fzeq/JFUCZwA3F3sWPJJ0s7AccBkgIj4rLWTU8J9o41y32i6tCeo7sA7jZ7XUUJv1HqSegIDgOeLG0le3QxcDnxR7EDybH9gDfCLZIjmbkk7FSEO9422y32jidKeoJo0bVJbJqkL8BtgbER8VOx48kHSicAHEbGw2LEUQEdgIHBHRAwA/gYU4/sf9402yH2jedKeoE
p62iRJZWQ64LSI+G2x48mjY4CTJa0kM/Q0RNK9xQ0pb+qAuoio/0Q/g0ynLEYc7httj/tGM6Q9QZXstEnKLDwzGVgWETcWO558iogrI6IyInqS+Zs9GRH/XOSw8iIiVgPvSOqTFA2lOEvJuG+0Qe4bzZPqJd8LNG1SWhwDnA0skVSblF0VEb8vYkzWNJcA05LE8AZwbmsH4L5hKZXXvpHqy8zNzKz9SvsQn5mZtVNOUGZmlkpOUGZmlkpOUGZmlkpOUGZmlkpOUCkk6UeSxuWxvr6SapPpR3rlq95G9c+VVJPves2ycf9oP5yg2oeRwMMRMSAi/rPYwZiljPtHSjlBpYSk7ydr+zwO9EnK/lXSi5JelvQbSTtK6irpzWQqGCTtLGmlpDJJ1ZKek7RY0kOSdpM0AhgLfDtZY+dySZcm+94k6cnk8dD6KVckDZP0rKRFkn6dzImGpMMk/VHSQkmPJcsiND6GHSRNlfR/W+0fztoF94/2yQkqBSQdRmbakwHAPwFfTl76bUR8OSL+kcySA+clyw/MJTNdP8l+v4mIjcAvgfER0R9YAvww+fX9ncBNEXE88BQwKNm3BuiSdOZjgXmSugE/AL4aEQOBBcBlyTY/A06PiMOAe4D/1+gwOgLTgOUR8YM8/vNYO+f+0X6leqqjdmQQ8FBErAeQVD+n2iHJp61dgS5kprWBzDoylwMzyUwl8q+SdgF2jYg/JttMBX6dpa2FwGHKLAT3KbCITEccBFwKHAn0A57OTInGl4BnyXxqPQSYnZR3AFY1qvcu4MGIaNwpzfLB/aOdcoJKj2xzTk0hs5Loy5LGAIMBIuJpST0lfQXoEBGvJB1w+41EbFRmJuVzgWeAxcDxQC8yn0J7AbMj4puN95N0KLA0Ira2hPMzwPGS/i0iNjQlFrNmcP9ohzzElw5PAadK6px8cjspKe8KrEqGD87aYp9fAtOBXwBExIfAnyXVD0+cDfyR7J4CxiX384ALgNrITMz4HHCMpAMAknH9A4E/AeWSjkrKyyQd3KjOycDvgV9L8gcfyyf3j3bKCSoFkuWtHwBqyayBMy956X+TWUl0NvDaFrtNA3Yj0wnrjQZukLQYqAZ+vJUm5wEVwLMR8T6wob7NiFgDjAGmJ/U8B/RNlhU/HbhO0stJrEdvcRw3khkS+ZUkv7csL9w/2i/PZt5GSTodOCUizi52LGZp4/5RGnyq2QZJ+hnwdWBEsWMxSxv3j9LhMygzM0slj4OamVkqOUGZmVkqOUGZmVkqOUGZmVkqOUGZmVkq/X+B9kdy9nlKPwAAAABJRU5ErkJggg==\n", | |
"text/plain": "<Figure size 432x216 with 2 Axes>" | |
}, | |
"metadata": { | |
"needs_background": "light" | |
}, | |
"output_type": "display_data" | |
} | |
], | |
"source": "df['dayofweek'] = df['effective_date'].dt.dayofweek\nbins = np.linspace(df.dayofweek.min(), df.dayofweek.max(), 7)\ng = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\ng.map(plt.hist, 'dayofweek', bins=bins, ec=\"k\")\ng.axes[-1].legend()\nplt.show()\n" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "We see that people who get the loan at the end of the week tend not to pay it off, so let's use feature binarization with a threshold: days with dayofweek greater than 3 (Friday to Sunday) are flagged as weekend." | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Unnamed: 0</th>\n <th>Unnamed: 0.1</th>\n <th>loan_status</th>\n <th>Principal</th>\n <th>terms</th>\n <th>effective_date</th>\n <th>due_date</th>\n <th>age</th>\n <th>education</th>\n <th>Gender</th>\n <th>dayofweek</th>\n <th>weekend</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>0</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-08</td>\n <td>2016-10-07</td>\n <td>45</td>\n <td>High School or Below</td>\n <td>male</td>\n <td>3</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>2</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-08</td>\n <td>2016-10-07</td>\n <td>33</td>\n <td>Bechalor</td>\n <td>female</td>\n <td>3</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>3</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>15</td>\n <td>2016-09-08</td>\n <td>2016-09-22</td>\n <td>27</td>\n <td>college</td>\n <td>male</td>\n <td>3</td>\n <td>0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>4</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-09</td>\n <td>2016-10-08</td>\n <td>28</td>\n <td>college</td>\n <td>female</td>\n <td>4</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>6</td>\n <td>6</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-09</td>\n <td>2016-10-08</td>\n <td>29</td>\n <td>college</td>\n <td>male</td>\n <td>4</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n0 0 0 PAIDOFF 1000 30 2016-09-08 \n1 2 2 PAIDOFF 1000 30 2016-09-08 \n2 3 3 PAIDOFF 1000 15 2016-09-08 \n3 4 4 PAIDOFF 1000 30 2016-09-09 \n4 6 6 PAIDOFF 1000 30 2016-09-09 \n\n due_date age education Gender dayofweek weekend \n0 2016-10-07 45 High School or Below male 3 0 \n1 2016-10-07 33 Bechalor female 3 0 \n2 2016-09-22 27 college male 3 0 \n3 2016-10-08 28 college female 4 1 \n4 2016-10-08 29 college male 4 1 " | |
}, | |
"execution_count": 14, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "df['weekend'] = df['dayofweek'].apply(lambda x: 1 if (x>3) else 0)\ndf.head()" | |
}, | |
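{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "The same binarization can be written as a vectorized comparison instead of `apply`. The cell below is a minimal sketch on a small made-up frame; `demo` and its values are illustrative assumptions, not part of the loan dataset." | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": "import pandas as pd\n\n# Hypothetical toy frame; pandas numbers days Monday=0 ... Sunday=6\ndemo = pd.DataFrame({'dayofweek': [0, 3, 4, 6]})\n# Days after Thursday (dayofweek > 3) are flagged as weekend\ndemo['weekend'] = (demo['dayofweek'] > 3).astype(int)\ndemo" | |
}, | |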
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "## Convert Categorical features to numerical values" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "Let's look at gender:" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 105, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "Gender loan_status\n0 PAIDOFF 0.731293\n COLLECTION 0.268707\n1 PAIDOFF 0.865385\n COLLECTION 0.134615\nName: loan_status, dtype: float64" | |
}, | |
"execution_count": 105, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "df.groupby(['Gender'])['loan_status'].value_counts(normalize=True)" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "86.5% of females pay off their loans, while only 73.1% of males do.\n" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "Let's convert male to 0 and female to 1:\n" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Unnamed: 0</th>\n <th>Unnamed: 0.1</th>\n <th>loan_status</th>\n <th>Principal</th>\n <th>terms</th>\n <th>effective_date</th>\n <th>due_date</th>\n <th>age</th>\n <th>education</th>\n <th>Gender</th>\n <th>dayofweek</th>\n <th>weekend</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>0</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-08</td>\n <td>2016-10-07</td>\n <td>45</td>\n <td>High School or Below</td>\n <td>0</td>\n <td>3</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>2</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-08</td>\n <td>2016-10-07</td>\n <td>33</td>\n <td>Bechalor</td>\n <td>1</td>\n <td>3</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>3</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>15</td>\n <td>2016-09-08</td>\n <td>2016-09-22</td>\n <td>27</td>\n <td>college</td>\n <td>0</td>\n <td>3</td>\n <td>0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>4</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-09</td>\n <td>2016-10-08</td>\n <td>28</td>\n <td>college</td>\n <td>1</td>\n <td>4</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>6</td>\n <td>6</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-09</td>\n <td>2016-10-08</td>\n <td>29</td>\n <td>college</td>\n <td>0</td>\n <td>4</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n0 0 0 PAIDOFF 1000 30 2016-09-08 \n1 2 2 PAIDOFF 1000 30 2016-09-08 \n2 3 3 PAIDOFF 1000 15 2016-09-08 \n3 4 4 PAIDOFF 1000 30 2016-09-09 \n4 6 6 PAIDOFF 1000 30 2016-09-09 \n\n due_date age education Gender dayofweek weekend \n0 2016-10-07 45 High School or Below 0 3 0 \n1 2016-10-07 33 Bechalor 1 3 0 \n2 2016-09-22 27 college 0 3 0 \n3 2016-10-08 28 college 1 4 1 \n4 2016-10-08 29 college 0 4 1 " | |
}, | |
"execution_count": 16, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
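], | |
"source": "# Assign the result back instead of using inplace=True on a column selection,\n# which may not update df under pandas copy-on-write\ndf['Gender'] = df['Gender'].replace(to_replace=['male','female'], value=[0,1])\ndf.head()" | |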
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "## One Hot Encoding \n#### How about education?" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "education loan_status\nBechalor PAIDOFF 0.750000\n COLLECTION 0.250000\nHigh School or Below PAIDOFF 0.741722\n COLLECTION 0.258278\nMaster or Above COLLECTION 0.500000\n PAIDOFF 0.500000\ncollege PAIDOFF 0.765101\n COLLECTION 0.234899\nName: loan_status, dtype: float64" | |
}, | |
"execution_count": 17, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "df.groupby(['education'])['loan_status'].value_counts(normalize=True)" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "#### Feature before One Hot Encoding" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Principal</th>\n <th>terms</th>\n <th>age</th>\n <th>Gender</th>\n <th>education</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1000</td>\n <td>30</td>\n <td>45</td>\n <td>0</td>\n <td>High School or Below</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1000</td>\n <td>30</td>\n <td>33</td>\n <td>1</td>\n <td>Bechalor</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1000</td>\n <td>15</td>\n <td>27</td>\n <td>0</td>\n <td>college</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1000</td>\n <td>30</td>\n <td>28</td>\n <td>1</td>\n <td>college</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1000</td>\n <td>30</td>\n <td>29</td>\n <td>0</td>\n <td>college</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " Principal terms age Gender education\n0 1000 30 45 0 High School or Below\n1 1000 30 33 1 Bechalor\n2 1000 15 27 0 college\n3 1000 30 28 1 college\n4 1000 30 29 0 college" | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "df[['Principal','terms','age','Gender','education']].head()" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "#### Use the one-hot encoding technique to convert categorical variables to binary variables and append them to the feature DataFrame" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Principal</th>\n <th>terms</th>\n <th>age</th>\n <th>Gender</th>\n <th>weekend</th>\n <th>Bechalor</th>\n <th>High School or Below</th>\n <th>college</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1000</td>\n <td>30</td>\n <td>45</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1000</td>\n <td>30</td>\n <td>33</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1000</td>\n <td>15</td>\n <td>27</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1000</td>\n <td>30</td>\n <td>28</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1000</td>\n <td>30</td>\n <td>29</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " Principal terms age Gender weekend Bechalor High School or Below \\\n0 1000 30 45 0 0 0 1 \n1 1000 30 33 1 0 1 0 \n2 1000 15 27 0 0 0 0 \n3 1000 30 28 1 1 0 0 \n4 1000 30 29 0 1 0 0 \n\n college \n0 0 \n1 0 \n2 1 \n3 1 \n4 1 " | |
}, | |
"execution_count": 19, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "Feature = df[['Principal','terms','age','Gender','weekend']]\nFeature = pd.concat([Feature,pd.get_dummies(df['education'])], axis=1)\nFeature.drop(['Master or Above'], axis = 1,inplace=True)\nFeature.head()\n" | |
}, | |
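{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "`pd.get_dummies` creates one indicator column per category. The sketch below shows this on a made-up series (`edu_demo` and `dummies_demo` are hypothetical names) that reuses the dataset's category labels and drops one column, the way the cell above drops `Master or Above`." | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": "import pandas as pd\n\n# Hypothetical toy series reusing the dataset's education labels\nedu_demo = pd.Series(['college', 'Bechalor', 'High School or Below', 'Master or Above'])\n# One indicator column per category\ndummies_demo = pd.get_dummies(edu_demo)\n# Drop one category, as the notebook does with 'Master or Above'\ndummies_demo = dummies_demo.drop(columns=['Master or Above'])\ndummies_demo" | |
}, | |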
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "### Feature selection" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "Let's define the feature set, X:" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Principal</th>\n <th>terms</th>\n <th>age</th>\n <th>Gender</th>\n <th>weekend</th>\n <th>Bechalor</th>\n <th>High School or Below</th>\n <th>college</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1000</td>\n <td>30</td>\n <td>45</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1000</td>\n <td>30</td>\n <td>33</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1000</td>\n <td>15</td>\n <td>27</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1000</td>\n <td>30</td>\n <td>28</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1000</td>\n <td>30</td>\n <td>29</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " Principal terms age Gender weekend Bechalor High School or Below \\\n0 1000 30 45 0 0 0 1 \n1 1000 30 33 1 0 1 0 \n2 1000 15 27 0 0 0 0 \n3 1000 30 28 1 1 0 0 \n4 1000 30 29 0 1 0 0 \n\n college \n0 0 \n1 0 \n2 1 \n3 1 \n4 1 " | |
}, | |
"execution_count": 20, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "X = Feature\nX[0:5]" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "What are our labels?" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'],\n dtype=object)" | |
}, | |
"execution_count": 21, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "y = df['loan_status'].values\ny[0:5]" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "## Normalize Data " | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "Data standardization gives the data zero mean and unit variance (technically, it should be done after the train/test split)." | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 107, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "array([[ 0.52, 0.92, 2.33, -0.42, -1.21],\n [ 0.52, 0.92, 0.34, 2.38, -1.21],\n [ 0.52, -0.96, -0.65, -0.42, -1.21],\n [ 0.52, 0.92, -0.49, 2.38, 0.83],\n [ 0.52, 0.92, -0.32, -0.42, 0.83]])" | |
}, | |
"execution_count": 107, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "X= preprocessing.StandardScaler().fit(X).transform(X)\nX[0:5]" | |
}, | |
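{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "As noted above, the scaler is better fit on the training split only, so that no statistics from the test split leak into training. A minimal sketch on synthetic data, assuming nothing about the loan dataset (`X_demo`, `y_demo` and the other `_demo` names are hypothetical):" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": "import numpy as np\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import StandardScaler\n\n# Synthetic stand-in for a feature matrix and binary labels\nrng_demo = np.random.RandomState(0)\nX_demo = rng_demo.normal(loc=5.0, scale=2.0, size=(100, 3))\ny_demo = rng_demo.randint(0, 2, size=100)\n\nXtr_demo, Xte_demo, ytr_demo, yte_demo = train_test_split(X_demo, y_demo, test_size=0.3, random_state=4)\n\n# Fit on the training split only, then apply the same statistics to both splits\nscaler_demo = StandardScaler().fit(Xtr_demo)\nXtr_scaled = scaler_demo.transform(Xtr_demo)\nXte_scaled = scaler_demo.transform(Xte_demo)\n\n# Training columns have mean ~0 and std ~1; test columns are only approximately so\nXtr_scaled.mean(axis=0).round(2), Xtr_scaled.std(axis=0).round(2)" | |
}, | |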
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "# Classification " | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "Now it is your turn: use the training set to build an accurate model, then use the test set to report the accuracy of the model.\n\nYou should use the following algorithms:\n- K Nearest Neighbor (KNN)\n- Decision Tree\n- Support Vector Machine\n- Logistic Regression\n\n__Notice:__\n- You can go back and change the pre-processing, feature selection, feature extraction, and so on, to make a better model.\n- You should use the scikit-learn, SciPy, or NumPy libraries to develop the classification algorithms.\n- You should include the code for each algorithm in the following cells." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "# K Nearest Neighbor (KNN)\nNotice: You should find the k that gives the model the best accuracy.\n\n**Warning:** You should not use __loan_test.csv__ to find the best k; however, you can split train_loan.csv into train and test sets to find the best __k__." | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "array([0.71153846, 0.625 , 0.72115385, 0.72115385, 0.73076923,\n 0.71153846, 0.72115385, 0.72115385, 0.75 ])" | |
}, | |
"execution_count": 23, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "from sklearn.neighbors import KNeighborsClassifier\nfrom sklearn.model_selection import train_test_split\nfrom sklearn import metrics\n\n# Hold out 30% of the training data to compare values of k\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)\n\nKs = 10\nmean_acc = np.zeros((Ks-1))\nstd_acc = np.zeros((Ks-1))\nfor n in range(1,Ks):\n    # Train the model with k = n and predict on the held-out split\n    neigh = KNeighborsClassifier(n_neighbors = n).fit(X_train,y_train)\n    yhat = neigh.predict(X_test)\n    mean_acc[n-1] = metrics.accuracy_score(y_test, yhat)\n    # Standard error of the accuracy estimate\n    std_acc[n-1] = np.std(yhat==y_test)/np.sqrt(yhat.shape[0])\n\nmean_acc" | |
}, | |
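{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "The best k can be read off the accuracy array with `argmax`. The sketch below copies the accuracies printed above into a hypothetical array `acc_demo` so that it runs on its own; in the notebook, `mean_acc` could be used directly." | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": "import numpy as np\n\n# Accuracies for k = 1..9, copied from the output above\nacc_demo = np.array([0.71153846, 0.625, 0.72115385, 0.72115385, 0.73076923,\n                     0.71153846, 0.72115385, 0.72115385, 0.75])\n# Index i corresponds to k = i + 1\nbest_k_demo = int(np.argmax(acc_demo)) + 1\nprint('Best accuracy', acc_demo.max(), 'at k =', best_k_demo)" | |
}, | |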
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Accuracy is highest when K=9 (0.75 on the held-out split)." | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": {}, | |
"outputs": [], | |
"source": "neigh_high = KNeighborsClassifier(n_neighbors = 9).fit(X,y)\n" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "# Decision Tree" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "0.7403846153846154" | |
}, | |
"execution_count": 25, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "from sklearn.tree import DecisionTreeClassifier\n\nloanTree = DecisionTreeClassifier(criterion=\"entropy\", max_depth = 4)\nX_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.3, random_state=4)\nloanTree.fit(X_train,y_train)\n\npredTree = loanTree.predict(X_test)\nmetrics.accuracy_score(y_test, predTree)" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 45, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=4,\n max_features=None, max_leaf_nodes=None,\n min_impurity_decrease=0.0, min_impurity_split=None,\n min_samples_leaf=1, min_samples_split=2,\n min_weight_fraction_leaf=0.0, presort=False, random_state=None,\n splitter='best')" | |
}, | |
"execution_count": 45, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "loanTree_high = DecisionTreeClassifier(criterion=\"entropy\", max_depth = 4)\nloanTree_high.fit(X,y)" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 34, | |
"metadata": {}, | |
"outputs": [], | |
"source": "#!conda install -c conda-forge pydotplus -y\n#!conda install -c conda-forge python-graphviz -y" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 29, | |
"metadata": {}, | |
"outputs": [], | |
"source": "# sklearn.externals.six was removed in later scikit-learn releases; the standard library provides StringIO\nfrom io import StringIO\nimport pydotplus\nimport matplotlib.image as mpimg\nfrom sklearn import tree\n%matplotlib inline " | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "Index(['Principal', 'terms', 'age', 'Gender', 'weekend', 'Bechalor',\n 'High School or Below'],\n dtype='object')" | |
}, | |
"execution_count": 32, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "Feature.columns[0:7]\n" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 36, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "['PAIDOFF', 'COLLECTION']" | |
}, | |
"execution_count": 36, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "df[\"loan_status\"].unique().tolist()" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": "dot_data = StringIO()\nfilename = \"drugtree.png\"\nfeatureNames = Feature.columns[0:8]\ntargetNames = df[\"loan_status\"].unique().tolist()\nout=tree.export_graphviz(loanTree_high,feature_names=featureNames, out_file=dot_data, class_names= np.unique(y), filled=True, special_characters=True,rotate=False) \ngraph = pydotplus.graph_from_dot_data(dot_data.getvalue()) \ngraph.write_png(filename)\nimg = mpimg.imread(filename)\nplt.figure(figsize=(100, 200))\nplt.imshow(img,interpolation='nearest')" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "# Support Vector Machine" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "1.Linear\n2.Polynomial\n3.Radial basis function (RBF)\n4.Sigmoid" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 72, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": "/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/svm/base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.\n \"avoid this warning.\", FutureWarning)\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'],\n dtype=object)" | |
}, | |
"execution_count": 72, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "from sklearn import svm\nclf = svm.SVC(kernel='poly')\n\nX_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.3, random_state=4)\nclf.fit(X_train, y_train) \nyhat = clf.predict(X_test)\nyhat [0:5]" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 73, | |
"metadata": {}, | |
"outputs": [], | |
"source": "def plot_confusion_matrix(cm, classes,\n normalize=False,\n title='Confusion matrix',\n cmap=plt.cm.Blues):\n \"\"\"\n This function prints and plots the confusion matrix.\n Normalization can be applied by setting `normalize=True`.\n \"\"\"\n if normalize:\n cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]\n print(\"Normalized confusion matrix\")\n else:\n print('Confusion matrix, without normalization')\n\n print(cm)\n\n plt.imshow(cm, interpolation='nearest', cmap=cmap)\n plt.title(title)\n plt.colorbar()\n tick_marks = np.arange(len(classes))\n plt.xticks(tick_marks, classes, rotation=45)\n plt.yticks(tick_marks, classes)\n\n fmt = '.2f' if normalize else 'd'\n thresh = cm.max() / 2.\n for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):\n plt.text(j, i, format(cm[i, j], fmt),\n horizontalalignment=\"center\",\n color=\"white\" if cm[i, j] > thresh else \"black\")\n\n plt.tight_layout()\n plt.ylabel('True label')\n plt.xlabel('Predicted label')" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 74, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": " precision recall f1-score support\n\n COLLECTION 0.67 0.07 0.13 27\n PAIDOFF 0.75 0.99 0.85 77\n\n micro avg 0.75 0.75 0.75 104\n macro avg 0.71 0.53 0.49 104\nweighted avg 0.73 0.75 0.67 104\n\nConfusion matrix, without normalization\n[[76 1]\n [25 2]]\n" | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAa0AAAFuCAYAAAAyKkctAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzt3XecXHX9/fHXSUJJDC2kEEF6FyWEgHTp0oSooYMBIhEQBJEOX0QUBUVBBPGHIgREikioSjGIgiKaQGhSIiVSQkLooZny/v1xP4uzO7uzM8nO3nuz5+ljHjv3zp07701wTj7lfq4iAjMzszLolXcBZmZm9XJomZlZaTi0zMysNBxaZmZWGg4tMzMrDYeWmZmVhkPLLJHUV9Itkt6S9NsFOM/+ku7sytryImlLSU/lXYdZC/k6LSsbSfsBxwJrA+8Ak4GzIuK+BTzvgcBRwGYRMWeBCy04SQGsERH/zrsWs3q5pWWlIulY4Hzge8AQYEXgZ8AeXXD6lYCne0Jg1UNSn7xrMGvLoWWlIWkp4EzgaxFxQ0S8GxGzI+KWiDg+HbOYpPMlvZwe50taLL22taQXJX1T0gxJ0yQdnF77NnA6sLekWZLGSDpD0q8rPn9lSdHyZS7pIEnPSnpH0nOS9q/Yf1/F+zaT9M/U7fhPSZtVvHaPpO9I+ms6z52SBnbw+7fUf0JF/SMl7SLpaUmvSzql4viNJd0v6c107IWSFk2v/SUd9nD6ffeuOP+Jkl4BLmvZl96zWvqM4Wn745JmStp6gf5izRrg0LIy2RRYHBhf45hTgU2AYcD6wMbAaRWvLwcsBSwPjAEukrRMRHyLrPV2bUT0j4hLaxUi6WPABcDOEbEEsBlZN2Xb4wYAt6VjlwV+DNwmadmKw/YDDgYGA4sCx9X46OXI/gyWJwvZXwAHABsCWwKnS1o1HTsX+AYwkOzPbjvgCICI2Cods376fa+tOP8Aslbn2MoPjohngBOBqyT1Ay4DLo+Ie2rUa9alHFpWJssCMzvpvtsfODMiZkTEq8C3gQMrXp+dXp8dEb8HZgFrzWc984D1JPWNiGkR8Xg7x+wKTImIKyNiTkRcDTwJfL7imMsi4umIeB+4jixwOzKbbPxuNnANWSD9JCLeSZ//OPBpgIiYFBF/T5/7PPD/gM/W8Tt9KyI+TPW0EhG/AKYADwBDyf6RYNZtHFpWJq8BAzsZa/k4MLVie2ra99E52oTee0D/RguJiHeBvYHDgGmSbpO0dh31tNS0fMX2Kw3U81pEzE3PW0JlesXr77e8X9Kakm6V9Iqkt8laku12PVZ4NSI+6OSYXwDrAT+NiA87OdasSzm0rEzuBz4ARtY45mWyrq0WK6Z98+NdoF/F9nKVL0bEHRGxA1mL40myL/PO6mmp6aX5rKkRF5PVtUZELAmcAqiT99ScTiypP9lEmEuBM1L3p1m3cWhZaUTEW2TjOBelCQj9JC0iaWdJP0iHXQ2cJmlQmtBwOvDrjs7ZicnAVpJWTJNATm55QdIQSbunsa0PyboZ57Zzjt8Da0raT1IfSXsD6wK3zmdNjVgCeBuYlVqBh7d5fTqwatW7avsJMCkivkI2VvfzBa7SrAEOLSuViPgx2TVapwGvAi8ARwI3pkO+C0wEHgEeBR5M++bns+4Crk3nmkTroOkFfJOsJfU62VjREe2c4zVgt3Tsa8AJwG4RMXN+amrQcWSTPN4hawVe2+b1M4BxaXbhXp2dTNIewE5kXaKQ/T0Mb5k1adYdfHGxmZmVhltaZmZWGg4tMzMrDYeWmZmVhkPLzMxKwwtidiP16RtadIm8y7CC2GCdFfMuwQrkwQcnzYyIQXnXAdB7yZUi5lQtiFJTvP/qHRGxU5NK+ohDqxtp0SVYbK1OZxZbD/HXBy7MuwQrkL6LqO3KKbmJOe83/F31weSLOlttpUs4tMzMrA2Bijl65NAyM7PWBKizFb/y4d
AyM7NqbmmZmVlpuKVlZmbl4DEtMzMrE7e0zMysFIRbWmZmVhZyS8vMzErELS0zMysNt7TMzKwcPHvQzMzKwitimJlZqbilZWZm5eDuQTMzK5Ne7h40M7MyKPDFxcWsyszM8iU19uj0dFpL0uSKx9uSjpE0QNJdkqakn8vUOo9Dy8zM2khjWo08OhERT0XEsIgYBmwIvAeMB04CJkTEGsCEtN0hh5aZmVXr4pZWG9sBz0TEVGAPYFzaPw4YWeuNDi0zM+tu+wBXp+dDImIaQPo5uNYbPRHDzMyqNT4RY6CkiRXbl0TEJVWnlRYFdgdOnp+yHFpmZtba/HX5zYyIEXUctzPwYERMT9vTJQ2NiGmShgIzar3Z3YNmZlatiydiVNiX/3UNAtwMjE7PRwM31XqzQ8vMzKo1YSKGpH7ADsANFbvPBnaQNCW9dnatc7h70MzM2mjOMk4R8R6wbJt9r5HNJqyLQ8vMzKp5lXczMyuFAi/j5NAyM7M2vMq7mZmVibsHzcysNNzSMjOz0nBLy8zMSkEe0zIzszJxS8vMzMpCDi0zMysD4dAyM7OyUHoUkEPLzMzakFtaZmZWHg4tMzMrDYeWmZmVhkPLzMzKwRMxzMysLOSJGGZmViYOLTMzKw2HlpmZlYZDy8zMysETMczMrEzc0jIzs1Lw7EEzMysVh5aZmZVHMTPLoWVmZm2ouC2tXnkXYGZmxSOpoUed51xa0vWSnpT0hKRNJQ2QdJekKennMrXO4dAyM7MqzQgt4CfA7RGxNrA+8ARwEjAhItYAJqTtDjm0zMys6SQtCWwFXAoQEf+NiDeBPYBx6bBxwMha53FomZlZKy1T3ru4pbUq8CpwmaSHJP1S0seAIRExDSD9HFzrJA4tMzOrpgYfMFDSxIrH2DZn7AMMBy6OiA2Ad+mkK7A9nj1oZmatzd/swZkRMaLG6y8CL0bEA2n7erLQmi5paERMkzQUmFHrQxxa1jRrrDSYK8855KPtVZZflu9cfBsX/uYeDt/nsxy291bMmTuP2+99jFN/clN+hVouvvqVQ/jD729l0ODBTJr8WN7lWBtdPeU9Il6R9IKktSLiKWA74F/pMRo4O/2s+WXg0LKmmTJ1BpvsczYAvXqJZ+44i5v/9DBbjViD3bb+FBvt9X3+O3sOg5bpn3OllocDRx/EYUccyVcO+XLepVg7mnSd1lHAVZIWBZ4FDiYbprpO0hjgP8CetU7g0LJusc3Ga/Hci6/yn2lv8L1jvsC5l93Ff2fPAeDVN2blXJ3lYYstt2Lq88/nXYZ1pAmZFRGTgfa6ELer9xyeiGHdYs/Pbch1t08CYPWVBrP5BqvxlyuO485fHs2G666Yc3Vm1laTrtNaYKUJLUlzJU2W9Jik30rqV/HaFySFpLUr9q0s6bH0fGtJb6Vplk9J+ouk3dqcf2y6SvtJSf+QtEXFa/ek901Oj1Ftamp5rNzsP4cyWqRPb3b97Ke44a6HAOjTuxfLLNmPrb58LqecdyO//sEhnZzBzLpTo4HVnaFVpu7B9yNiGICkq4DDgB+n1/YF7gP2Ac7o4P33RsRu6f3DgBslvR8RE1KAfRXYIiJmShqeXt84Il5J798/IiZ2VJN17HNbrMvkJ19gxuvvAPDS9De5ccLDAEx8fCrz5gUDl+nPTHcTmhWG1x7sWvcCqwNI6g9sDowhC61OpX7VM4Ej064TgeMjYmZ6/UGyK7O/1rVl90x77TTio65BgFvueYStN14TgNVXHMyii/RxYJkVTFFbWqULLUl9gJ2BR9OukWRrWT0NvJ5aSfV4EGjpTvwkMKnN6xPT/hZXVXQDLpv29a3YN76Dese2XGwXc96vs7SFR9/FF2Hbz6zNTXdP/mjfuBvvZ5Xll2Xib0/hirMP5iunX5ljhZaXLx+wL1tvuSlPP/UUq628Apf/6tK8S7JKjV9c3C3K1D3YV1LLN9+9pPWryLoGz0/Pr0nbD9Zxvs7+mAVExfZ8dQ9GxCXAJQ
C9+g2OWscujN7/YDYrbHNiq32z58zlkNOuyKkiK4orfn113iVYDUXtHixTaFUFRGrxbAusJymA3kBIOqGO821AtsIwZBe3bQjcXfH68LTfzKxn8f20mmYUcEVErBQRK0fEJ4DngC1qvUnSp4H/Ay5Ku34AnNPS7ZcmahwE/KxZhZuZFZUAqbFHdylTS6s9+5It/VHpd8B+wDlt9m8p6SGgH9naVl+PiAkAEXGzpOWBv6UW2zvAAS0rD5uZ9SzdO7miEaUJrYioWusnIrZuZ98FFZvrpX33AEt1cv6LgYs7eK3qczqqycxsYVDQzCpPaJmZWfdxS8vMzMqhm8epGuHQMjOzVkR2Z4YicmiZmVkVt7TMzKw0PKZlZmbl4DEtMzMri+zi4mKmlkPLzMza8MXFZmZWIgXNLIeWmZlVc0vLzMzKwRMxzMysLDwRw8zMSqWgmeXQMjOzam5pmZlZaRQ0sxxaZmbWhorb0uqVdwFmZmb1ckvLzMxayWYPNuG80vPAO8BcYE5EjJA0ALgWWBl4HtgrIt7o6BxuaZmZWRvZMk6NPBqwTUQMi4gRafskYEJErAFMSNsdcmiZmVkVqbHHAtgDGJeejwNG1jrYoWVmZlXmo6U1UNLEisfYdk4bwJ2SJlW8PiQipgGkn4Nr1eUxLTMza23+Wk8zK7r8OrJ5RLwsaTBwl6QnG/0Qh5aZmbXSrGWcIuLl9HOGpPHAxsB0SUMjYpqkocCMWudw96CZmVXp6okYkj4maYmW58COwGPAzcDodNho4KZa53FLy8zMqjShoTUEGJ8Crg/wm4i4XdI/geskjQH+A+xZ6yQOLTMzq9LV3YMR8Sywfjv7XwO2q/c8Di0zM2vN99MyM7OyEA1fMNxtHFpmZlaloJnl0DIzs2q9CppaDi0zM6tS0MxyaJmZWWsq8P20HFpmZlalVzEzy6FlZmbV3NIyM7PSKGhmObTMzKw1kV2rVUQOLTMzq1K6MS1JS9Z6Y0S83fXlmJlZ7upcuT0PtVpaj5PdZbKy8pbtAFZsYl1mZpajgmZWx6EVEZ/ozkLMzKwYRHFXxKjrJpCS9pF0Snq+gqQNm1uWmZnlSWrs0V06DS1JFwLbAAemXe8BP29mUWZmlq+uvnNxV6ln9uBmETFc0kMAEfG6pEWbXJeZmeWku1tPjagntGZL6kU2+QJJywLzmlqVmZnlqsxjWhcBvwMGSfo2cB9wTlOrMjOzXKnBR3fptKUVEVdImgRsn3btGRGPNbcsMzPLUxmv06rUG5hN1kVY14xDMzMrp2zKe95VtK+e2YOnAlcDHwdWAH4j6eRmF2ZmZjlpcOZg0WYPHgBsGBHvAUg6C5gEfL+ZhZmZWX4K2jtYV2hNbXNcH+DZ5pRjZmZFULoxLUnnkY1hvQc8LumOtL0j2QxCMzOzblWrpdUyQ/Bx4LaK/X9vXjlmZpa3Ik/EqLVg7qXdWYiZmRVHM7oHJfUGJgIvRcRuklYBrgEGAA8CB0bEf2udo57Zg6tJukbSI5Kebnl0xS9gZmbF1KSLi48GnqjYPgc4LyLWAN4AxnR2gnquubocuCzVtTNwHVkympnZQkjKlnFq5NH5ObUCsCvwy7QtYFvg+nTIOGBkZ+epJ7T6RcQdABHxTEScRrbqu5mZLaTm49YkAyVNrHiMbXPK84ET+N/atcsCb0bEnLT9IrB8Z3XVM+X9w5SIz0g6DHgJGFzH+8zMrKTmY0xrZkSM6OBcuwEzImKSpK1bdrdzaHT2IfWE1jeA/sDXgbOApYBD6nifmZmVVBfPw9gc2F3SLsDiwJJkLa+lJfVJra0VgJc7O1E9C+Y+kJ6+w/9uBGlmZgspUd84Vb0i4mTgZIDU0jouIvaX9FtgFNk8idHATZ2dq9bFxeOp0VSLiC82VraZmZVC990E8kTgGknfBR4COr3UqlZL68Kuqsoyq64ylB+NOy3vMqwg5s7rtPveLDfNWsYpIu4B7knPnwU2buT9tS4unrAghZmZWX
kV9R5U9d5Py8zMeghRwgVzzcys5yrd2oNtSVosIj5sZjFmZlYMRQ2tetYe3FjSo8CUtL2+pJ82vTIzM8tFtspFMe9cXM9Y2wXAbsBrABHxMF7GycxsodZLjT26Sz3dg70iYmqbJJ3bpHrMzKwACjoPo67QekHSxkCke6EcBfjWJGZmC6nsJpDFTK16Qutwsi7CFYHpwB/TPjMzW0iV9jqtiJgB7NMNtZiZWUEUtKHVeWhJ+gXtrEEYEW3vlWJmZgsB1XljxzzU0z34x4rniwNfAF5oTjlmZlYEBc2suroHr63clnQlcFfTKjIzs9wV9eLi+VnGaRVgpa4uxMzMiqHUswclvcH/xrR6Aa8DJzWzKDMzy1dBM6t2aCm7onh94KW0a15E+CZAZmYLs25e5aIRNafip4AaHxFz08OBZWbWA6jB/3WXeq4f+4ek4U2vxMzMCiEb0yrZ2oOS+kTEHGAL4FBJzwDvkv0+EREOMjOzhVRRuwdrjWn9AxgOjOymWszMrCDKeOdiAUTEM91Ui5mZFUBL92AR1QqtQZKO7ejFiPhxE+oxM7O8qZxT3nsD/aEbp4WYmZnVUCu0pkXEmd1WiZmZFUYZV8QoZsVmZtZUZR3T2q7bqjAzs0IpaEOr44uLI+L17izEzMyKQvRq8NHpGaXFJf1D0sOSHpf07bR/FUkPSJoi6VpJi9Y6T1HvqGxmZjkRWUurkUcdPgS2jYj1gWHATpI2Ac4BzouINYA3gDG1TuLQMjOz1hpcwqme8a/IzEqbi6RHANsC16f94+hkQQuHlpmZVeklNfQABkqaWPEY2/acknpLmgzMILuZ8DPAm2nJQIAXgeVr1TU/N4E0M7OFWEv3YINmRsSIWgdExFxgmKSlgfHAOu0dVuscDi0zM6vSzOu0IuJNSfcAmwBLVyzQvgLwcs26mlaVmZmVVldPxJA0KLWwkNQX2B54AvgTMCodNhq4qdZ53NIyM7NWRFNaNEOBcZJ6p9NfFxG3SvoXcI2k7wIPAZfWOolDy8zMWlPX35okIh4BNmhn/7PAxvWex6FlZmZVCroghkPLzMxay9YeLGZsObTMzKxKMSPLoWVmZu0oaEPLoWVmZm2pyydidBWHlpmZtdKkKe9dwqFlZmZV3NIyM7PSKGZkObTMzKytJlxc3FUcWmZm1orHtMzMrFTc0jIzs9IoZmQ5tMzMrB0FbWg5tMzMrLVsTKuYqeXQMjOzKm5pmZlZSQi5pWVmZmXhlpaZmZWCx7TMzKw85JaWmZmViEPLzMxKo6gTMYq6vJSZmVkVt7TMzKwVAb2K2dByaFnzvPrKS/zk1K/z5mszkHqx46gD+Pz+h3L1xedy1++uYskBywJwwFEnM2LL7XKu1rrbiy+8wKFjRjP9lVfo1asXB485lK8ddXTeZVlS1O5Bh5Y1Te/efTj4uG+x2jqf5v13Z/HNfT7HsE22AmD3A8cycvThOVdoeerTpw/fP+dchm0wnHfeeYctNxnBttvvwDrrrJt3aYYnYlgPNGDQEAYMGgJA34/1Z4VV1+C1Ga/kXJUVxXJDh7Lc0KEALLHEEqy19jpMe+klh1ZBFLWl5YkY1i2mv/QCzz75KGt+ajgAt13zK44etS0/Pf0bzHr7zZyrs7xNff55Hn74IUZs/Jm8SzH+N6bVyKPTc0qfkPQnSU9IelzS0Wn/AEl3SZqSfi5T6zxNDS1Jy0m6RtIzkv4l6feS1pT0SUl3S3o6Ffp/Sncck3SQpAvbOdfzkga22XeQpFclTa54rJteWzN93r/TH9J1kvauOG6WpKfS8yskbS3p1opzj5T0iKQnJT0qaWTFa5dLeknSYml7oKTnm/THWHrvv/cu53xzDGOOP5N+/Zdg571G8/Nb/8551/2RZQYN5rJzv513iZajWbNmsf8+ozjn3PNYcskl8y7HgJa1Bxv5Xx3mAN+MiHWATYCvpe/rk4AJEbEGMCFtd6hpoZVCaDxwT0SsFhHrAqcAQ4CbgbMjYk
1gfWAz4Ij5/KhrI2JYxeNfkhYHbgMujojV0x/SxcDjLccBE4H90/aX29S+PnAusEdErA3sDpwr6dMVh80FDpnPmnuMObNnc86xY/jsLl9k0+13BWDpZQfRu3dvevXqxQ5fPIApjz2Uc5WWl9mzZ7P/3qPYe5/92GPkF/Mux1qkFTEaeXQmIqZFxIPp+TvAE8DywB7AuHTYOGBk+2fINLOltQ0wOyJ+3rIjIiYDawJ/jYg70773gCPpJF0btB9wf0TcUvHZf4qIx+p8/3HA9yLiufTe54DvA8dXHHM+8A1JHhfsQERw4RnHssKqa7DHlw/7aP/rr07/6PkDd/+eFVdfO4/yLGcRwRFf/Qprrb02Rx1zbN7lWBtq8AEMlDSx4jG2w3NLKwMbAA8AQyJiGmTBBgyuVVczv3DXAya1s/+TbfdHxDOS+kuan76BvSVtUbG9aY3PrtcnyVpalSYCX6vY/g9wH3AgcAsdSH9xYwEGDV1+AUoqnyce+gf33Ho9K62xDsfstT2QTW+/9w/jee6px5HE4I9/gsP/7wc5V2p5uP9vf+Xqq67kk+t9ik032gCAM848i8/tvEvOlVk2ptXwRIyZETGi03NL/YHfAcdExNtq8HPyaCUIiA5e62h/LddGxJGtPmDB52q2V2N7+75H1tV5W0cniohLgEsAVv/k+vPz+5XWusM/w40PT6va72uyDGCzzbdg1ofz8i7DOtCMuYOSFiELrKsi4oa0e7qkoRExTdJQYEatczSze/BxYMMO9rdKY0mrArNSP2czP7uR97f9F8Nw4F+VOyLi38BkYK8F+Cwzs+KZj/7BmqfLWhOXAk9ExI8rXroZGJ2ejwZuqnWeZobW3cBikg5t2SFpI2AKsIWk7dO+vsAFQFf2Ef0G2EzSrhWfvZOkT9X5/nOBk1O/a0v/6ynAj9o59iyyMTAzs4VGE2YPbk42nLJtxSzuXYCzgR0kTQF2SNsdalr3YESEpC8A50s6CfgAeB44hmy2yE8lXQT0Bq4EKqe5H1Q5xZxseiTAI5Ja+hOuAx6hekzriIj4m6Td0mefD8xOx9a1RkxETJZ0InBLas7OBk5IE0naHvu4pAfJWmJmZguFrl4RIyLuo+M2Wd1jBk0d04qIl+m462zrDt5zOXB5Oy+t3MF52juWiHgS2KlGbVu32b4HuKdi+wbgBtoREQe12fZcXTNbqBRzPQwv42RmZu0paGo5tMzMrJVsbkUxU8uhZWZmrdW5ykUeHFpmZlaloJnl0DIzs3YUNLUcWmZm1kbd1151O4eWmZlV8ZiWmZmVQp0rM+XCoWVmZtUKmloOLTMzq+IxLTMzKw2PaZmZWWkUNLMcWmZm1kaBZ2I4tMzMrIrHtMzMrBSEx7TMzKxECppZDi0zM2tHQVPLoWVmZlWKOqbVK+8CzMzM6uWWlpmZVfFEDDMzK42CZpZDy8zM2lHQ1HJomZlZK9mCGMVMLYeWmZm1Jo9pmZlZiRQ0szzl3czM2qEGH52dTvqVpBmSHqvYN0DSXZKmpJ/LdHYeh5aZmbWhhv9Xh8uBndrsOwmYEBFrABPSdk0OLTMzqyI19uhMRPwFeL3N7j2Acen5OGBkZ+fxmJaZmbXSjbfTGhIR0wAiYpqkwZ29waFlZmbVGk+tgZImVmxfEhGXdF1BGYeWmZlVmY/rtGZGxIgG3zNd0tDUyhoKzOjsDR7TMjOzKl09ptWBm4HR6flo4KbO3uDQMjOzKl084x1JVwP3A2tJelHSGOBsYAdJU4Ad0nZN7h40M7PWmrAiRkTs28FL2zVyHoeWmZm1o5hrYji0zMysFeG1B83MrEQKmlkOLTMzq+aWlpmZlYbvp2VmZuVRzMxyaJmZWbWCZpZDy8zMWlvAVS6ayqFlZmZVPKZlZmblUczMcmiZmVm1gmaWQ8vMzKp5TMvMzEpCHtMyM7NyKPLag76flpmZlYZbWmZmVqWoLS2HlpmZVfGYlpmZlUOBV8TwmJaZmZWGW1pmZt
aK8MXFZmZWJgVNLYeWmZlV8UQMMzMrjaJOxHBomZlZlYJmlkPLzMzaUdDUcmiZmVmVoo5pKSLyrqHHkPQqMDXvOgpgIDAz7yKsMPzfQ2aliBiUdxEAkm4n+3tpxMyI2KkZ9VRyaFm3kzQxIkbkXYcVg/97sEZ4RQwzMysNh5aZmZWGQ8vycEneBVih+L8Hq5vHtMzMrDTc0jIzs9JwaJmZWWk4tMxsoSDJ32c9gP+SrbAkrS2pb951WLFJGg4QEfMcXAs//wVbIUnaCbgNWD7vWqzwTpM0ARxcPYH/cq1wJO0GnAx8NSL+LeljeddkhbYX8IakW8DBtbDzX6wViqQlgfHAAxHxR0krA+MlrZNrYVYokjaXNFzSMhExB9gTeFfSH8DBtTDzdVpWGJJWTy2rzwFXAd8CRgK3RMQF+VZnRSFpeeAeYAhwLzAZuA54FjgfWCoiRqVje0XEvJxKtSbwv0SsECTtAlwuadWIuAPYD/gxML0lsCT1zrNGK4aIeAn4IfBn4Pdk456HArcDDwA7SLo2HevAWsg4tCx3qWX1HeD0iHhWUt+IuBPYBfi8pC+lQ/0F1INJWlbSCgARcQlwC7AmcD3wdeBHQJDd/uczkj6eV63WPL4JpOVK0krAz4AfRMTdklYELpQ0NiImSNoL+I2kxSPiqnyrtbykyTmnpefPAc8DpwIHkU3EmAPcFBFzJY0HekXEjHyqtWZyaFmuImJq+pLZVNJUsi+m8RHxShqPuEPSQcBFkm4GZoUHYnsUSTuQdRWPAR4BPp2eXwPsA/QGRgF9Jd0VEb6h5ELMEzEsF2mW4GIR8WraPgn4CvDbiDg57esNRJoJ9rGIeDe/ii0vkk4FnouI31TsW5asO/DZiDhT0mnAYOBk/3eycHNLy7qdpF2BE4BFJD0P/A34AfAesKWkTwH/Sl09Sm97L5dirQiWAwYBH4VWRLwm6Rrgy2n7u5IGOLAWfp6IYd0qTbo4HzgT+AJwK1l3z3lpluDDwOnA+pA1syp/Ws8g6dOSvpE2/5Dt0lJtDpsEDJU0ECAiXu/OGi0fDi3rNuliz+2AUyJiQkRMB64GLiIbjxgTEd8FpgNHS1osx3ItJ5LWBAYCO6bxzD8B6wInSFq64tBntZ28AAAKX0lEQVSdyL7D3u/2Ii037h60bpPGpvoCq0M2ZpW6AB8B/gFsA1waEUdKGhQRH+ZZr3W/tObkccBRZJMvjgJmkc0QvBY4S9LiwBPAIcDe7hLsWRxa1t0eJ/tXMymw+kTEHEm3A3tJ6h8Rs1omaFjPkbqOzwWOiIgn0mxSkQWXgN2BjYAdgf8CX4qIJ/Kq1/Lh0LKmkrQ+8AlgKbLxqiuAf0j6YUQcn9aNg+yLCLLrbayHSYF1LdkqF/emyx3ek3RvOuRIYImI+BXZ0k3WQzm0rGnS0kznki21swEwF7gD2BT4q6R+wDvAK8BhwKiI+CCnci0nkjYFLiBbumtPsm7BHwIvR8T7KbgCOCl1Kf8iv2otb75Oy5oiTWs/HfhGRPwtBdTawM/JLgq9GPhS2vch8LuI+Fde9Vo+0ur9nwKmRcS9kgYAlwD/AX4YEdPScf2AzwBTIuLF3Aq23Dm0rEul66qWJevC+XNEHNbm9Y2AbwNfj4h/t7zHU9p7nvQPm1OBS8luR/N2Gt9cCvgl8AIVwWUGnvJuXW9gWkbnBGAZSV9r8/oUsunM67bscGD1PJJ2JLte7zjgVxHxegqs3hHxFtnqKB8HzpC0XJ61WrE4tKzLSBoE3CPpSxFxC3A52bU2h7ccExFvAv8E3sinSiuI3YFTI+JvpO+h1OKemyZhvAUcDixGNp5lBji0rAulaerfAk6VtHtE/IFsDGvnluCStB+wJfBcfpVaXiqW5VqtYnfbVU9WSreneQM4OF2EbgZ49qB1sYi4XtIcsotAiYib0/fUQZK2J+vy2c+D6T2PpK2AD8guJP8DsIWkuyNiZsWF5v2APYBfA++769jackvLFoiknS
UdK2m9ln0RcSPZzMGzKlpcvyG7w+zYiHgsp3ItJ2kM63L+9w/liUA/YHdJAyNibtq/B9malP5usna5pWXzLa0luDNwALCRpNlkA+vvRcT4dGuRM9INHK+T9EcvudPzpFmCZwAHpMsfBgGPAn8Etia7l9p9wADgUGAv38DROuJ/zVjDWsYlImIeWTfP08A3yJbWORH4kaSVI+J6sjGuoyQt4cDqedLMvzOAB1JgDQHuBz4XEdcAvyJbrX0Hsq7jUW6JWy2+Tssa1rJeYMX2dcCjEfEdSUcD3wOeIlv94g6yLyyvxN0DSepPdhH5esBMYDfgyoi4JNfCrLQcWtaQdOvzQ8jWEXwyIm5My/DsQjZOcR7ZjfneA7YAbvCki56p5aLxdDuRXcgWvn0xIvasOOaLwNvA3anlblaTQ8vqlm4bcSZwJdmtzVcAfgo8S7aiwVbADhFxdzq+VYvMegZJQ1qmqVcEV3/g88DmwKSIuEzSKOC7wOcjYkqOJVuJeEzL6pLWhPs98J2I+CnZ+nD9gFXSBcPHABPI7nMEgAOrZ1FmINkq/vtBdu1VCq5ZwO3A34Bhkq4lm7TzBQeWNcKhZXVJtzL/PHC2pCUj4gVgNjAwTcyYSrbKxeYVF5BaDxKZmWS3ETlD0p4t+9N1WG+QLZb8MNAfONT3w7JGecq71S0ibpM0D5gk6Q6yltYV6QLQNyXdRNb14z7nHihd4hARcUu6/OHi1Mq6DmgZr9oVWBLYNyLezqtWKy+PaVnD0soWdwLLRcQMSf0i4r2867J8SFqLbPWK/6Rr94iIeWkM9GLgxHSd3liyi863jYincyzZSsyhZfNF0s5kN3jcxheC9lzpNiInAksD34+IFypaXC3BdQHwd7Kbf46KiIfzq9jKzqFl803SHmQXD48gDWnkXJLlQNLGwEhgceD81OJaBJiX1hPcCTgHODAiHsmzVis/h5YtEEn908ww60EkrQksBbwJPAOsDIwFFgUuiIjn03FHAq8At0bEB7kUawsVh5aZNUTS7mSrnkwlu63IELKLhweQXXi+CHA82UoYPwF2iYiH8qnWFjYOLTOrm6TNgUuB/SNiUtr3M7J7pH2WbGzrYGB7YCWyNQYfzalcWwj5Oi0za8RywC8jYpKkxQAi4gjgPuBmspt73kx2zywHlnU5t7TMrFOSRqSnI4GNIuJzaf8iETE7TbwYDxwWES9KWiwiPsyrXlt4uaVlZjWl+2H9ClgNuBV4QdIeknqlwOoTEbPJvk+WAnBgWbN4RQwz65Ckz5JNptg/Ih6QtDjZAsnbkIXU+IiYk5Zs+jjwan7VWk/g7kEz65CkY4G5EfETSYtGxH/ToriHAOsCq5AtgrsbWbD5OixrKre0zKxKyy1FyELprbR7duoSnCnpAmB9snumPUo2OeOZnMq1HsRjWmZWpWJ1k/HAJpI2bNmXxrA+ADYCJkTEnQ4s6y4OLTOr5QGy6ex7p+Cal8aw9gZGA6/nW571NB7TMrOaJC0PjAG2A/4JfACMIlv89rE8a7Oex6FlZp2S1BfYkGyli2nAn3x7EcuDQ8vMzErDY1pmZlYaDi0zMysNh5aZmZWGQ8vMzErDoWVmZqXh0DIzs9JwaFmPIWmupMmSHpP0W0n9FuBcW0u6NT3fXdJJNY5dWtIR8/EZZ0g6rt79bY65XNKoBj5rZUm+UNgKz6FlPcn7ETEsItYD/gscVvmiMg3/fyIibo6Is2scsjTQcGiZWTWHlvVU9wKrpxbGE5J+BjwIfELSjpLul/RgapH1B5C0k6QnJd0HfLHlRJIOknRhej5E0nhJD6fHZsDZwGqplffDdNzxkv4p6RFJ364416mSnpL0R2Ctzn4JSYem8zws6XdtWo/bS7pX0tOSdkvH95b0w4rP/uqC/kGadSeHlvU4kvoAO5PdUgOycLgiIjYA3gVOA7aPiOHARODYdPPDXwCfB7YEluvg9BcAf46I9YHhwOPAScAzqZV3vKQdgTWAjYFhwI
aStpK0IbAPsAFZKG5Ux69zQ0RslD7vCbI1AlusDHwW2BX4efodxgBvRcRG6fyHSlqljs8xKwTfT8t6kr6SJqfn9wKXkt1td2pE/D3t34Ts5oZ/lQSwKHA/sDbwXERMAZD0a2BsO5+xLfBlgIiYC7wlaZk2x+yYHg+l7f5kIbYE2Z2A30ufcXMdv9N6kr5L1gXZH7ij4rXrImIeMEXSs+l32BH4dMV411Lps72OoJWCQ8t6kvcjYljljhRM71buAu6KiH3bHDcM6KqFOgV8PyL+X5vPOGY+PuNyYGREPCzpIGDritfanivSZx8VEZXhhqSVG/xcs1y4e9Cstb8Dm0taHUBSP0lrAk8Cq0haLR23bwfvnwAcnt7bW9KSwDtkragWdwCHVIyVLS9pMPAX4AuS+kpagqwrsjNLANMkLQLs3+a1PSX1SjWvCjyVPvvwdDyS1pT0sTo+x6wQ3NIyqxARr6YWy9WSFku7T4uIpyWNBW6TNJPsxojrtXOKo4FLJI0B5gKHR8T9kv6appT/IY1rrQPcn1p6s4ADIuJBSdcCk4GpZF2Ynfk/shs1TiUbo6sMx6eAPwNDgMMi4gNJvyQb63pQ2Ye/Coys70/HLH++NYmZmZWGuwfNzKw0HFpmZlYaDi0zMysNh5aZmZWGQ8vMzErDoWVmZqXh0DIzs9L4/4lZgvcfVwp7AAAAAElFTkSuQmCC\n", | |
"text/plain": "<Figure size 720x360 with 2 Axes>" | |
}, | |
"metadata": { | |
"needs_background": "light" | |
}, | |
"output_type": "display_data" | |
} | |
], | |
"source": "# Compute confusion matrix\nfrom sklearn.metrics import classification_report, confusion_matrix\n\ncnf_matrix = confusion_matrix(y_test, yhat, labels=['PAIDOFF','COLLECTION'])\nnp.set_printoptions(precision=2)\n\nprint (classification_report(y_test, yhat))\n\n# Plot non-normalized confusion matrix\nplt.figure(figsize=(10,5))\nplot_confusion_matrix(cnf_matrix, classes=['PAIDOFF','COLLECTION'],normalize= False, title='Confusion matrix')" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 75, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "0.6668539325842696" | |
}, | |
"execution_count": 75, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "from sklearn.metrics import f1_score\nf1_score(y_test, yhat, average='weighted') " | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 76, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "0.75" | |
}, | |
"execution_count": 76, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "from sklearn.metrics import jaccard_similarity_score\njaccard_similarity_score(y_test, yhat)" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Model for entire data using poly as it gave more f1_score and similarity score than others" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 77, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n decision_function_shape='ovr', degree=3, gamma='auto_deprecated',\n kernel='poly', max_iter=-1, probability=False, random_state=None,\n shrinking=True, tol=0.001, verbose=False)" | |
}, | |
"execution_count": 77, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "from sklearn import svm\nclf_high = svm.SVC(kernel='poly')\nclf_high.fit(X, y) " | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "# Logistic Regression" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "can use any of these'newton-cg\u2019, \u2018lbfgs\u2019, \u2018liblinear\u2019, \u2018sag\u2019, \u2018saga\u2019" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 87, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "LogisticRegression(C=0.01, class_weight=None, dual=False, fit_intercept=True,\n intercept_scaling=1, max_iter=100, multi_class='warn',\n n_jobs=None, penalty='l2', random_state=None, solver='sag',\n tol=0.0001, verbose=0, warm_start=False)" | |
}, | |
"execution_count": 87, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "from sklearn.linear_model import LogisticRegression\nX_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.3, random_state=4)\n\nLR = LogisticRegression(C=0.01, solver='sag').fit(X_train,y_train)\nLR" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 88, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'],\n dtype=object)" | |
}, | |
"execution_count": 88, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "yhat = LR.predict(X_test)\nyhat[0:5]" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 89, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "array([[0.31, 0.69],\n [0.28, 0.72],\n [0.17, 0.83],\n [0.2 , 0.8 ],\n [0.18, 0.82]])" | |
}, | |
"execution_count": 89, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "yhat_prob = LR.predict_proba(X_test)\nyhat_prob[0:5]" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 90, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "0.7403846153846154" | |
}, | |
"execution_count": 90, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "from sklearn.metrics import jaccard_similarity_score\njaccard_similarity_score(y_test, yhat)" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 84, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "Confusion matrix, without normalization\n[[73 4]\n [25 2]]\n" | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAVIAAAEmCAYAAAAwZhg4AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzt3XecXFX9//HXexNKQkIJKQT40rtIQigqvRt6UHpASiSCUgRBQFABQVFREUH8oQiIIkQk0kEMoDTRBEKTEimhhYRQEwgYwuf3xzkDwyY7M9mZ2ZnJvp8+7mNn7r1z7mcW95Nzzr3nHEUEZmbWeW2NDsDMrNU5kZqZVcmJ1MysSk6kZmZVciI1M6uSE6mZWZWcSK1hJPWSdL2ktyT9qYpyRkr6ay1jaxRJm0t6stFx2PyRnyO1ciTtDxwHrAXMACYCZ0XE3VWWeyBwFLBJRHxQdaBNTlIAq0fEfxsdi9WWa6RWkqTjgHOB7wODgBWAXwK716D4FYGnukMSrYSkno2OwTopIrx5m+cGLAHMBPYqcc4ipET7ct7OBRbJx7YCXgS+AUwDpgCH5GOnA/8DZudrjAJOA35fVPZKQAA98/uDgWdIteJngZFF++8u+twmwL+Bt/LPTYqO3Ql8D7gnl/NXoH8H360Q/zeL4h8B7AQ8BbwOfKvo/I2B+4A387nnAwvnY//I3+Wd/H33KSr/ROAV4PLCvvyZVfM1huX3ywLTga0a/f8Nb5/cXCO1Uj4HLAqMLXHOKcBngaHAEFIyObXo+DKkhLwcKVleIGmpiPguqZZ7VUT0iYiLSwUiaTHgPGDHiOhLSpYT53FeP+DGfO7SwE+BGyUtXXTa/sAhwEBgYeD4EpdehvQ7WA74DvBr4ABgA2Bz4DuSVsnnzgGOBfqTfnfbAl8FiIgt8jlD8ve9qqj8fqTa+ejiC0fE06Qk+wdJvYFLgEsj4s4S8VoDOJFaKUsD06N003skcEZETIuIV0k1zQOLjs/Ox2dHxE2k2tianYznQ2BdSb0iYkpEPDaPc3YGJkXE5RHxQUT8EXgC2LXonEsi4qmImAWMIf0j0JHZpP7g2cCVpCT584iYka//GLAeQERMiIh/5us+B/w/YMsKvtN3I+L9HM8nRMSvgUnA/cBg0j9c1mScSK2U14D+ZfrulgUmF72fnPd9VEa7RPwu0Gd+A4mId0jN4cOBKZJulLRWBfEUYlqu6P0r8xHPaxExJ78uJLqpRcdnFT4vaQ1JN0h6RdLbpBp3/xJlA7waEe+VOefXwLrALyLi/TLnWgM4kVop9wHvkfoFO/IyqVlasELe1xnvAL2L3i9TfDAibo2I7Uk1sydICaZcPIWYXupkTPPjQlJcq0fE4sC3AJX5TMnHZiT1IfU7XwyclrsurMk4kVqHIuItUr/gBZJGSOotaSFJO0r6UT7tj8CpkgZI6p/P/30nLzkR2ELSCpKWAE4uHJA0SNJuua/0fVIXwZx5lHETsIak/SX1lLQPsA5wQydjmh99gbeBmbm2fES741OBVeb6VGk/ByZExJdJfb+/qjpKqzknUispIn5Keob0VOBV4AXgSOAv+ZQzgfHAw8AjwAN5X2eudRtwVS5rAp9Mfm2ku/8vk+5kb0m+kdOujNeAXfK5r5HuuO8SEdM7E9N8Op50I2sGqbZ8VbvjpwGXSXpT0t7lCpO0OzCc1J0B6b/DMEkjaxax1YQfyDczq5JrpGZmVXIiNTOrkhOpmVmVnEjNzKrkSRK6kHr2Ci3ct9FhdFtD1l6h0SF0axMfmDA9IgbUoqwei68Y8cFcA8HmErNevTUihtfimqU4kXYhLdyXRdYs+9SL1cnf7zmv0SF0a0v06tF+xFmnxQezKvpbem/iBeVGltWEE6mZtR4J2no0OoqPOJGaWWtS89zicSI1s9akctMYdJ3mSelmZhXLTftyW7lSpDUlTSza3pb0dUn9JN0maVL+uV
SpcpxIzaz1iNS0L7eVERFPRsTQiBhKmqz7XdJE5icB4yJidWBcft8hJ1Iza0FKTfty2/zZFng6IiaT1iS7LO+/jNJTSbqP1MxaVO3v2u9LmhYSYFBETAGIiCmSBpYMpdaRmJnVnypt2veXNL5oGz3P0qSFgd2AP3UmGtdIzaz1iEqb7tMjYsMKztsReCAiCsvITJU0ONdGB5NWke2Qa6Rm1oIEbT3Lb5Xbj4+b9QDXAQfl1wcB15b6sBOpmbWmNpXfKpCXut4euKZo99nA9pIm5WNnlyrDTXszaz2Fx59qICLeJS09XrzvNdJd/Io4kZpZC/JYezOz6jXREFEnUjNrTZ60xMysCp5Gz8ysBty0NzOrhty0NzOrinDT3sysOq6RmplVz32kZmZVctPezKwKctPezKx6btqbmXWegLY210jNzDpPeWsSTqRm1oKE3LQ3M6uOm/ZmZlVyjdTMrBpN1kfaPHVjM7MKCdHW1lZ2q6gsaUlJV0t6QtLjkj4nqZ+k2yRNyj+XKlWGE6mZtSRJZbcK/Ry4JSLWAoYAjwMnAeMiYnVgXH7fISdSM2tJtUikkhYHtgAuBoiI/0XEm8DuwGX5tMuAEaXKcSI1s9YjUJvKbhVYBXgVuETSg5J+I2kxYFBETAHIPweWKsSJ1MxajihfG8010v6Sxhdto9sV1RMYBlwYEesD71CmGT8vvmtvZi2pwj7Q6RGxYYnjLwIvRsT9+f3VpEQ6VdLgiJgiaTAwrdRFXCM1s9ZTo6Z9RLwCvCBpzbxrW+A/wHXAQXnfQcC1pcpxjdTMWlINH8g/CviDpIWBZ4BDSJXMMZJGAc8De5UqwInUzFpSrRJpREwE5tX837bSMpxIbZ5WX3Egl//w0I/er7zc0nzvwhvpt+Ri7LLlenwYwauvz2D0d3/PlFffamCk3cOcOXPYctONWXbZZRlzzfWNDqfhRMV35buEE6nN06TJ0/jsvmcD0NYmnr71LK674yHeeHsWZ/zyRgC+ut+WnDx6R44+68pGhtotXHj+eay55lrMmPF2o0NpDmqusfa+2WRlbb3xmjz74qs8P+UNZrzz3kf7e/dahIhoYGTdw0svvsitt9zElw4Z1ehQmkoNRzZVzTVSK2uvz2/AmFsmfPT+tK/tyshdNuatmbMYPvq8BkbWPZx0wrGccdbZzJw5o9GhNJVmatq3TI1U0hxJEyU9KulPknoXHdtDUkhaq2jfSpIeza+3kvRWHrnwpKR/SNqlXfmj86QFT0j6l6TNio7dmT83MW97toupsK1U799DV1uoZw923vLTXHPbgx/tO+2C61l9x29z5c3jOXyfLRoY3YLvlptuYMDAgaw/bINGh9J0mqlG2jKJFJgVEUMjYl3gf8DhRcf2A+4G9i3x+bsiYv2IWBM4Gjhf0rYAOal+BdgsT1xwOHCFpGWKPj8yX39oRFzdLqbC9lxNvmkT+fxm6zDxiReY9vrctaExN/+bEdsObUBU3cc/77uXm2+4nk+vuQqHfml//nHnHRx2yIGNDqvhKkmiTqTl3QWsBiCpD7ApMIrSifQj+XGHM4Aj864TgRMiYno+/gBpooKv1Tbs1rP38A0/0axfdYUBH73eecv1eOq5qY0Iq9s47Xvf5/Gnn+eRJ5/ht7+7gi222ppfX3J5o8NqCrWaRq8WWq6PVFJPYEfglrxrBGkKrKckvS5pWE6E5TwAnJBffwqY0O74eD4e2QDpgd1Z+fW2EfEa0EvSxLzv2YjYYx7xjgbS+N6F+lQQVvPotehCbPOZtTjyzD9+tO/Mo3dn9RUH8uGHwfNTXvcde2uc5ukibalEWpy07iJPe0Vq1p+bX1+Z31eSSMv9ZxBQfEt6ZESMb3fOrIgo2baNiIuAiwDaeg9sqVvcs96bzfJbn/iJffsd/5sGRWObb7EVm2+xVaPDaBrN9PhTKyXSuZKWpKWBbYB1JQXQAwhJ36ygvPVJE7hCGlu7AXB70fFheb+ZNRkpPd/cLFq1j7
RgT+B3EbFiRKwUEf8HPAtsVupDktYDvg1ckHf9CPhhTsxIGgocDPyyXoGbWTWa62ZTK9VI52U/4Ox2+/4M7A/8sN3+zSU9CPQmTYl1dESMA4iI6yQtB9yba7YzgAMKE7uaWfNpopZ96yTSiJjrTk1EbDWPfcVPiK+b990JLFGm/AuBCzs4Ntd1OorJzLpAkzXtWyaRmpkVCCdSM7OquWlvZlYNN+3NzKoj/BypmVmVuvbxpnKcSM2sJdWqaS/pOdIjj3OADyJiQ0n9gKuAlYDngL0j4o0OY6lJJGZmXUnpZlO5bT5snWdwK6zddBIwLiJWB8ZRZq17J1IzazmFPtI6jmzanTQDHPnniFInO5GaWUtqa1PZrUIB/FXShDxbG8CgwsjG/HNgqQLcR2pmLanCCmd/ScWztl2UZ2QrtmlEvCxpIHCbpCfmNxYnUjNrPZWvIjq9qN9zniLi5fxzmqSxwMbAVEmDI2KKpMGk+Tk65Ka9mbUcUb5ZX0nTXtJikvoWXgM7AI8C1/HxxO4HAdeWKsc1UjNrSTV6jHQQMDbXbnsCV0TELZL+DYyRNAp4HtirVCFOpGbWkmrxQH5EPAMMmcf+14BtKy3HidTMWk6zzZDvRGpmLclDRM3MqtREedSJ1MxakJv2ZmbVUavM/iRp8VIfjIi3ax+OmVllmiiPlqyRPkYag1ocbuF9ACvUMS4zs5J6tELTPq8Rb2bWdFT5ENEuUdEQUUn7SvpWfr28pA3qG5aZWWltKr91WSzlTpB0PrA1cGDe9S7wq3oGZWZWTg2n0ataJXftN4mIYZIeBIiI1yUtXOe4zMw6JNKd+2ZRSSKdLamNdIMJSUsDH9Y1KjOzMproXlNFifQC4M/AAEmnA3sDp9c1KjOzUtS1TfdyyibSiPidpAnAdnnXXhHxaH3DMjPrmIC2JrprX+nIph7AbFLz3pNBm1nDNVEereiu/SnAH4FlgeWBKySdXO/AzMw6UphGr5Xu2h8AbBAR7wJIOguYAPygnoGZmZXSak37ye3O6wk8U59wzMwq0zxptPSkJT8j9Ym+Czwm6db8fgfg7q4Jz8xsbqK2Y+0l9QDGAy9FxC6SVgauBPoBDwAHRsT/Ovp8qRpp4c78Y8CNRfv/WV3IZmZVUs2n0TsGeBwozHr3Q+BnEXGlpF8Bo4ALO/pwqUlLLq5llGZmtVSrPCppeWBn4CzgOKUMvQ2wfz7lMuA0OpNIiy6yar7AOsCihf0RsUZnAzczq0aNm/bnAt8E+ub3SwNvRsQH+f2LwHKlCqjkmdBLgUtIse8IjCH1HZiZNYxy877UBvSXNL5oG92ujF2AaRExoXj3PC4XpWKp5K5974i4VdI5EfE0cKqkuyr4nJlZ3VRYH50eERuWOL4psJuknUgt7sVJNdQlJfXMtdLlgZdLXaSSGun7uc/gaUmHS9oVGFjRVzAzqwMpNe3LbeVExMkRsXxErATsC9weESOBO4A982kHAdeWKqeSRHos0Ac4mpS9DwMOreBzZmZ1U2HTvrNOJN14+i+pz7TkzfdKJi25P7+cwceTO5uZNVStBzZFxJ3Anfn1M8DGlX621AP5YynRwRoRX6g4QjOzGpIqa7p3lVI10vO7LIpuYpWVB3POZac0Ooxua+GenrhsQdJMi9+VeiB/XFcGYmY2P5rpn8VK5yM1M2satR5rXy0nUjNrSU2URytPpJIWiYj36xmMmVklpObqI61khvyNJT0CTMrvh0j6Rd0jMzMroUdb+a2rVHKp84BdgNcAIuIhYOt6BmVmVkph8btyW1eppGnfFhGT21Wj59QpHjOzirTaXfsXJG0MRJ5F+ijgqfqGZWbWsVZ6IL/gCFLzfgVgKvC3vM/MrGGa6F5TRWPtp5FmRTEzaxpNVCGtaIb8XzOPMfcRMXoep5uZ1V0rPpD/t6LXiwJ7AC/UJxwzswqoxWqkEXFV8XtJlwO31S0iM7MKqIlWtu/MENGVgRVrHYiZWaUENNNkXpX0kb
7Bx32kbcDrwEn1DMrMrJxmGiJaMpHmtZqGAC/lXR9GRMnV9MzM6i2NbGp0FB8rWTnOSXNsRMzJm5OomTVejRa/k7SopH9JekjSY5JOz/tXlnS/pEmSrpK0cKlyKull+JekYZV9OzOz+ivUSMttFXgf2CYihgBDgeGSPgv8EPhZRKwOvAGMKlVIh4lUUqHZvxkpmT4p6QFJD0p6oKIQzczqJE2lV3orJ5KZ+e1CeQtgG+DqvP8yYESpckr1kf4LGFauADOzriZEjxrdbMpziEwAVgMuAJ4G3oyID/IpLwLLlSqjVCIVQEQ8XX2oZmY1VHnTvb+k8UXvL4qIi4pPiIg5wFBJSwJjgbXnUU7J+0OlEukAScd1dDAiflqqYDOzeqpwvtHpEbFhJSdGxJuS7gQ+CywpqWeulS4PvFwylhLHegB9gL4dbGZmDVEYa1+Du/YDck0USb2A7YDHgTuAPfNpBwHXliqnVI10SkScUcF3MjPrcjXqIh0MXJb7SduAMRFxg6T/AFdKOhN4ELi4VCFl+0jNzJqNqM0M+RHxMLD+PPY/A2xcaTmlEum2nYjLzKz+VHEfaZfoMJFGxOtdGYiZWaUKi981i87M/mRm1nDNk0adSM2sJYm2Jpq1xInUzFpOrW421YoTqZm1pJaZj9TMrCm1yl17M7Nm5aa9mVkNuGlvZlalJrpp70RqZq0nNe2bJ5M6kZpZS2qilr0TqZm1IvmuvZlZNdy0NzOrVoWL23UVJ1Kby/RXXuLnpxzDG69No01tbL/nAew68stceeE53PbnK1i8Xz8ADjjqZDbY3LMt1tsLL7zAlw/5ElOnvkJbWxuHjhrNkUcf0+iwGs5Ne2tqbT16cvDx32HVtddj1jsz+ca+wxn62S0A2PXAwxhx0BENjrB76dmzJ2f/6CesP2wYM2bMYJPPbMC2223P2uus0+jQGqawrn2zcCK1ufQbMIh+AwYB0GuxPiy/ymq8Nm1Kg6PqvgYPHszgwYMB6Nu3L2uttTYvv/xSt06kkJZkbhbNNMrKmtC0l17g2SceZY1PDwPgpisv4et7bssvvnMsM99+s8HRdT+Tn3uOiRMfZKONP9PoUBquTSq7lSPp/yTdIelxSY9JOibv7yfpNkmT8s+lSsZSo+/UUZDLSLpS0tOS/iPpJklrSPqUpNslPZUD/bbyeC9JB0s6fx5lPSepf7t9B0t6VdLEom2dfGyNfL3/5l/SGEn7FJ03U9KT+fXvJG0l6YaiskdIeljSE5IekTSi6Nilkl6StEh+31/Sc3X6NTbMrHff4Yff+DKHnnAGvfv0ZfjeB3HhDffx0zG3sdSAQVxyzumNDrFbmTlzJvvt/UV+/JNzWXzxxRsdTkMVmvbltgp8AHwjItYmLcP8tZxDTgLGRcTqwLj8vkN1S6Q5MY4F7oyIVSNiHeBbwCDgOuDsiFgDGAJsAny1k5e6KiKGFm3/kbQocCNwYUSsln9JFwKPFc4DxgMj8/svtYt9CHAOsHtErAXsBpwjab2i0+YAh3Yy5qb3wezZ/Oi4L7PFTl/gc9vtBMCSSw+gR48etLW1scMXRjLp0YkNjrL7mD17Nvvt/UX22W8kI/b4QqPDaQKq6H/lRMSUiHggv55BWop5OWB34LJ82mXAiHmXkNSzRro1MDsiflXYERETgTWAeyLir3nfu8CRlMn482l/4L6IuL7o2ndExKMVfv544PsR8Wz+7LPAD4ATis45FzhW0gLXzxwRXHDaN1h+ldXZ/Utf+Wj/669O/ej1P2+/mRVXW7MR4XU7EcHhh41izbXW5phjj2t0OM2hgtro/N6MkrQSaUXR+4FBETEFUrIFBpb6bD2TwLrAhHns/1T7/RHxtKQ+kjrTXtlH0mZF7z9X4tqV+hSpRlpsPPC1ovfPA3cDBwLX0wFJo4HRAAMGL1dFSF3n8Qf/xZ03XM2Kq6/NsXtvB6RHne66+S88++RjSGLgsstz+Ld/1OBIu4d777mHK/5wOeuu+2
k+s8FQAE4/8/sM33GnBkfWOPOx+F1/SeOL3l8UERfNVZ7UB/gz8PWIeHt+Z5ZqRG1KQHRwrKP9pVwVEUd+4gLVP182rxjnte/7pG6KGzsqKP9HuwhgtU8N6cz363LrDPsMYx96ea79fma0MTbdbDNmzW6J/+t0qQr/yqdHxIYly5EWIiXRP0TENXn3VEmDI2KKpMHAtFJl1LNp/xiwQQf7P/HFJK0CzMx9FPW89vx8vv0vfxjwn+IdEfFfYCKwdxXXMrNOkFR2q6AMARcDj0fET4sOXQcclF8fBFxbqpx6JtLbgUUkHVbYIWkjYBKwmaTt8r5ewHlALduJVwCbSNq56NrDJX26ws+fA5yc+0wKfSffAn4yj3PPIvWpmlkXkspvFdiU1D23TdETPTsBZwPbS5oEbJ/fd6huTfuICEl7AOdKOgl4D3gO+DrpjtgvJF0A9AAuB4ofeTq4+HEj0mMJAA9L+jC/HgM8zNx9pF+NiHsl7ZKvfS4wO59b0bi6iJgo6UTg+lztnw18M98sa3/uY5IeINVYzayL1OJx/Ii4u0RRFfdl1bWPNCJepuNm71YdfOZS4NJ5HFqpg3LmdS4R8QQwvERsW7V7fydwZ9H7a4BrmIeIOLjdez+PYtaFhJcaMTOrjmd/MjOrXhPlUSdSM2tFld2V7ypOpGbWkpoojzqRmlnrEW7am5lVzU17M7MqNVEedSI1s9bURHnUidTMWpDctDczq0oa2dToKD7mRGpmLamJ8qgTqZm1Jjftzcyq1ER51InUzFpTE+VRJ1Izaz2eRs/MrFqeRs/MrHpNlEfrumaTmVmdlF/4rsLF734raZqkR4v29ZN0m6RJ+edS5cpxIjWzllSjxe8uZe4liU4CxkXE6sC4/L4kJ1IzazmqcCsnIv4BvN5u9+7AZfn1ZcAIynAfqZm1pDretR8UEVMAImKKpIHlPuBEamYtqcI82l/S+KL3F0XERbWOxYnUzFpShfXR6RGx4XwWPVXS4FwbHQxMK/cB95GaWevJ0+hVe9e+A9cBB+XXBwHXlvuAE6mZtZzCNHrV3rWX9EfgPmBNSS9KGgWcDWwvaRKwfX5fkpv2ZtaSanGrKSL26+DQtvNTjhOpmbWktiYaI+pEamatqXnyqBOpmbWmJsqjTqRm1nokN+3NzKrXPHnUidTMWlMT5VEnUjNrRXLT3sysGs22rr1HNpmZVck1UjNrSW7am5lVw4vfmZlVp9IZ8LuKE6mZtSSva29mVqUmyqNOpGbWmpoojzqRmllraqamvSKi0TF0G5JeBSY3Oo4q9AemNzqIbqzVf/8rRsSAWhQk6RbS76Oc6RHRft36mnMitYpJGt+JhcSsRvz7b14e2WRmViUnUjOzKjmR2vy4qNEBdHP+/Tcp95GamVXJNVIzsyo5kZqZVcmJ1KybkeS/+xrzL9RqTtJakno1Og77JEnDACLiQyfT2vIv02pK0nDgRmC5RsdiczlV0jhwMq01/yKtZiTtApwMfCUi/itpsUbHZJ+wN/CGpOvBybSW/Eu0mpC0ODAWuD8i/iZpJWCspLUbGlg3J2lTScMkLRURHwB7Ae9IuhmcTGvFz5Fa1SStlmugnwf+AHwXGAFcHxHnNTa67kvScsCdwCDgLmAiMAZ4BjgXWCIi9szntkXEhw0KteX5XyKriqSdgEslrRIRtwL7Az8FphaSqKQejYyxu4qIl4AfA38HbiL1Wx8G3ALcD2wv6ap8rpNoFZxIrdNyDfR7wHci4hlJvSLir8BOwK6SvphP9R9pF5K0tKTlASLiIuB6YA3gauBo4CdAkKZ0/IykZRsV64LCEztbp0haEfgl8KOIuF3SCsD5kkZHxDhJewNXSFo0Iv7Q2Gi7j3zD79T8+lngOeAU4GDSzaYPgGsjYo6ksUBbRExrTLQLDidS65SImJz/ED8naTLpj3dsRLyS+9tulXQwcIGk64CZ4Q75upK0PalbZRTwMLBefn0lsC/QA9gT6CXptoho5U
mim4pvNtl8yXfnF4mIV/P7k4AvA3+KiJPzvh5A5DvCi0XEO42LuPuQdArwbERcUbRvaVJT/pmIOEPSqcBA4GT/d6kd10itYpJ2Br4JLCTpOeBe4EfAu8Dmkj4N/Cc3GwsL6rzbkGC7p2WAAcBHiTQiXpN0JfCl/P5MSf2cRGvLN5usIvnG0rnAGcAewA2kpuPP8t35h4DvAEMgVUeLf1p9SFpP0rH57c1pl5Zod9oEYLCk/gAR8XpXxtgdOJFaWfmB7W2Bb0XEuIiYCvwRuIDU3zYqIs4EpgLHSFqkgeF2G5LWIC0At0Puj74DWAf4pqQli04dTvpbn9XlQXYTbtpbWbmvsxewGqQ+0Nx8fxj4F7A1cHFEHClpQES838h4u4M8p8HxwFGkG0xHATNJd+avAs6StCjwOHAosI+b8/XjRGqVeoxU2yEn0Z4R8UFeFndvSX0iYmbhJpTVT+5mOQf4akQ8np+aECmZCtgN2AjYAfgf8MWIeLxR8XYHTqQ2T5KGAP8HLEHq//wd8C9JP46IE/K4bUh/rJCeT7Q6y0n0KtJopbvyo2bvSrorn3Ik0DcifksaFmpdwInU5pKHfZ5DGla4PjAHuBX4HHCPpN7ADOAV4HBgz4h4r0HhdhuSPgecRxqGuxepSf9j4OWImJWTaQAn5e6XXzcu2u7Fz5HaJ+RHnL4DHBsR9+akuRbwK9KD3RcCX8z73gf+HBH/aVS83UWeRevTwJSIuEtSP9Kqos8DP46IKfm83sBngEkR8WLDAu5mnEgNSM/MAEuTmoN/j4jD2x3fCDgdODoi/lv4jB9vqr/8j9spwMWkqQrfzv3TSwC/AV6gKJla1/PjT1bQPw8Z/CawlKSvtTs+ifSozTqFHU6i9SdpB9Lzu8cDv42I13MS7RERb5FGlS0LnCZpmUbG2p05kRqSBgB3SvpiRFwPXEp6NvGIwjkR8Sbwb+CNxkTZbe0GnBIR95L/XnNLYE6+0fQWcASwCKl/1BrAidTIjyx9FzhF0m4RcTOpT3THQjKVtD+wOfBs4yLtPoqG2K5atLv9aLEV89SFbwCH5IES1gC+a28ARMTVkj4gPchNRFyX/5YPlrQdqfm4v29g1J+kLYD3SIMdbgY2k3R7REy6S9OpAAAHv0lEQVQvGgzRG9gd+D0wy90sjeUaaTclaUdJx0lat7AvIv5CumN/VlHN9ArSzOqjI+LRBoXbbeQ+0Uv5uJIzHugN7Capf0TMyft3J8154L/hJuAaaTeUx87vCBwAbCRpNulmxrsRMTZPg3danpR5jKS/eXhh/eW786cBB+RHzwYAjwB/A7Yizf16N9CPtGTI3p6UuTn4X7NupNDvltfnuRl4CjiWNIzwROAnklaKiKtJfaZHSerrJFp/+Y77aaRVWO+VNAi4D/h8RFwJ/JY0i9P2pG6WPd1CaB5+jrQbKYyPL3o/BngkIr4n6Rjg+8CTpFFMt5L+qD1jUBeQ1Ic00GFdYDqwC3B5XnPJmpwTaTeRl6E4lDRu/omI+EsecrgTqR/uZ6TJf98FNgOu8Y2lrlEY2JCnvtuJNPnIixGxV9E5XwDeBm73ip/Nx4m0G8hTrp0BXE5aZmJ54Bek9c3HAlsA20fE7fn8T9RcrT4kDSo8slSUTPsAuwKbAhMi4hJJewJnArtGxKQGhmwdcB/pAi6Pyb4J+F5E/II0Prs3sHJ+yP7rwDjSvJUAOInWl5L+pNm09of0bGhOpjNJ687fCwxVWnf+eGAPJ9Hm5US6gMvLSuwKnC1p8Yh4AZgN9M83nyaTRittWvQQuNVRJNNJU96dJmmvwv78nOgbpAliHgL6AId5PtHm5sefuoGIuFHSh8AESbeSaqS/yw9xvynpWlIz0v08XUAfr7J6fX707MJcGx0DFPo/dwYWB/aLiLcbFatVxn2k3UgeofRXYJmImCapd0R4lc8uImlN0iik5/OzvIVlXIaTpic8MT+3O5o0MGKbiHiqgSFbhZxIuxlJO5Imbd
7aD3N3nTzl3YnAksAPIuKFopppIZmeB/yTNIH2nhHxUOMitvnhRNoNSdqd9MD9huQuuwaH1C1I2hgYASwKnJtrpgsBH+bx88OBHwIHRsTDjYzV5o8TaTdVWKyu0XEs6JSWTF4CeBN4GlgJGA0sDJwXEc/l844kLd1yg5dtaT1OpGZ1Imk30mixyaQp8AaRHrjvRxocsRBwAmlE08+BnSLiwcZEa9VwIjWrA0mbkpYGGRkRE/K+X5LmdN2S1Fd6CLAdsCJpTP0jDQrXquTnSM3qYxngNxExQdIiABHxVeBu4DrSBNnXkeYcdRJtca6RmtWQpA3zyxHARhHx+bx/oYiYnW8ujQUOj4gXJS0SEe83Kl6rDddIzWokzyf6W9LyIDcAL0jaPa+tNDvPYTCb9He3BICT6ILBI5vMakDSlqQbRiMj4n5Ji5ImhdmalDjH5tU/9yLNJ/pq46K1WnPT3qwGJB0HzImIn0taOCL+lycmOZS0hPXKpIlIdiElWz8nugBxjdSsCoXp70iJ8q28e3Zuzk+XdB4whDTH6yOkG1BPNyhcqxP3kZpVoWhU2Fjgs5I2KOzLfaLvARsB4yLir06iCyYnUrPauJ/0aNM+OZl+mPtE9wEOAl5vbHhWT+4jNasRScsBo4BtgX+T1qbfEy9Ut8BzIjWrIUm9gA1II5amAHd4KrwFnxOpmVmV3EdqZlYlJ1Izsyo5kZqZVcmJ1MysSk6kZmZVciI1M6uSE6nVjaQ5kiZKelTSnyT1rqKsrSTdkF/vJumkEucuKemrnbjGaZKOr3R/u3MulbTnfFxrJUl+SH8B4URq9TQrIoZGxLrA/4DDiw8qme//D0bEdRFxdolTlgTmO5GadZYTqXWVu4DVck3s8bx+0QPA/0naQdJ9kh7INdc+AJKGS3pC0t3AFwoFSTpY0vn59SBJYyU9lLdNgLOBVXNt+Mf5vBMk/VvSw5JOLyrrFElPSvobsGa5LyHpsFzOQ5L+3K6WvZ2kuyQ9JWmXfH4PST8uuvZXqv1FWvNxIrW6k9QT2JE0jRykhPW7iFgfeAc4FdguIoYB44Hj8sTIvwZ2JS0Yt0wHxZ8H/D0ihgDDgMeAk4Cnc234BEk7AKsDGwNDgQ0kbSFpA2BfYH1Sot6ogq9zTURslK/3OGlsfcFKpIXtdgZ+lb/DKOCtiNgol3+YpJUruI61EM9HavXUS9LE/Pou0qqaywKTI+Kfef9nSRMf3yMJ0nrv9wFrAc9GxCQASb8nrQff3jbAlwAiYg7wlqSl2p2zQ94KSx33ISXWvqSZ69/N17iugu+0rqQzSd0HfYBbi46NiYgPgUmSnsnfYQdgvaL+0yXytT3+fgHiRGr1NCsihhbvyMnyneJdwG0RsV+784aS1oKvBQE/iIj/1+4aX+/ENS4FRkTEQ5IOBrYqOta+rMjXPioiihMuklaaz+taE3PT3hrtn8CmklYDkNRb0hrAE8DKklbN5+3XwefHAUfkz/aQtDgwg1TbLLgVOLSo73U5SQOBfwB7SOolqS+pG6GcvsCUvBroyHbH9pLUlmNeBXgyX/uIfD6S1pC0WAXXsRbiGqk1VES8mmt2f1Re/x04NSKekjQauFHSdNKkyevOo4hjgIskjQLmAEdExH2S7smPF92c+0nXBu7LNeKZwAER8YCkq4CJwGRS90M53yZN4jyZ1OdbnLCfBP4ODCItt/yepN+Q+k4fULr4q6Slmm0B4mn0zMyq5Ka9mVmVnEjNzKrkRGpmViUnUjOzKjmRmplVyYnUzKxKTqRmZlX6//9FlrpSDnGuAAAAAElFTkSuQmCC\n", | |
"text/plain": "<Figure size 432x288 with 2 Axes>" | |
}, | |
"metadata": { | |
"needs_background": "light" | |
}, | |
"output_type": "display_data" | |
} | |
], | |
"source": "# Compute confusion matrix\ncnf_matrix = confusion_matrix(y_test, yhat, labels=['PAIDOFF','COLLECTION'])\nnp.set_printoptions(precision=2)\n\n\n# Plot non-normalized confusion matrix\nplt.figure()\nplot_confusion_matrix(cnf_matrix, classes=['PAIDOFF','COLLECTION'],normalize= False, title='Confusion matrix')" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 91, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": " precision recall f1-score support\n\n COLLECTION 0.00 0.00 0.00 27\n PAIDOFF 0.74 1.00 0.85 77\n\n micro avg 0.74 0.74 0.74 104\n macro avg 0.37 0.50 0.43 104\nweighted avg 0.55 0.74 0.63 104\n\n" | |
}, | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": "/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/metrics/classification.py:1143: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.\n 'precision', 'predicted', average, warn_for)\n/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/metrics/classification.py:1143: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.\n 'precision', 'predicted', average, warn_for)\n/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/metrics/classification.py:1143: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.\n 'precision', 'predicted', average, warn_for)\n" | |
} | |
], | |
"source": "print (classification_report(y_test, yhat))" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 92, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "0.5225909174948803" | |
}, | |
"execution_count": 92, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "from sklearn.metrics import log_loss\nlog_loss(y_test, yhat_prob)" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "sag gave better accuracy , so lets build model using sag for entire dataset" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 96, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "LogisticRegression(C=0.01, class_weight=None, dual=False, fit_intercept=True,\n intercept_scaling=1, max_iter=100, multi_class='warn',\n n_jobs=None, penalty='l2', random_state=None, solver='sag',\n tol=0.0001, verbose=0, warm_start=False)" | |
}, | |
"execution_count": 96, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "from sklearn.linear_model import LogisticRegression\n\nLR_high = LogisticRegression(C=0.01, solver='sag').fit(X,y)\nLR_high" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "# Model Evaluation using Test set" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 93, | |
"metadata": {}, | |
"outputs": [], | |
"source": "from sklearn.metrics import jaccard_similarity_score\nfrom sklearn.metrics import f1_score\nfrom sklearn.metrics import log_loss" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "First, download and load the test set:" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": "!wget -O loan_test.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_test.csv" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "### Load Test set for evaluation " | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 95, | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Unnamed: 0</th>\n <th>Unnamed: 0.1</th>\n <th>loan_status</th>\n <th>Principal</th>\n <th>terms</th>\n <th>effective_date</th>\n <th>due_date</th>\n <th>age</th>\n <th>education</th>\n <th>Gender</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>1</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/8/2016</td>\n <td>10/7/2016</td>\n <td>50</td>\n <td>Bechalor</td>\n <td>female</td>\n </tr>\n <tr>\n <th>1</th>\n <td>5</td>\n <td>5</td>\n <td>PAIDOFF</td>\n <td>300</td>\n <td>7</td>\n <td>9/9/2016</td>\n <td>9/15/2016</td>\n <td>35</td>\n <td>Master or Above</td>\n <td>male</td>\n </tr>\n <tr>\n <th>2</th>\n <td>21</td>\n <td>21</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/10/2016</td>\n <td>10/9/2016</td>\n <td>43</td>\n <td>High School or Below</td>\n <td>female</td>\n </tr>\n <tr>\n <th>3</th>\n <td>24</td>\n <td>24</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/10/2016</td>\n <td>10/9/2016</td>\n <td>26</td>\n <td>college</td>\n <td>male</td>\n </tr>\n <tr>\n <th>4</th>\n <td>35</td>\n <td>35</td>\n <td>PAIDOFF</td>\n <td>800</td>\n <td>15</td>\n <td>9/11/2016</td>\n <td>9/25/2016</td>\n <td>29</td>\n <td>Bechalor</td>\n <td>male</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n0 1 1 PAIDOFF 1000 30 9/8/2016 \n1 5 5 PAIDOFF 300 7 9/9/2016 \n2 21 21 PAIDOFF 1000 30 9/10/2016 \n3 24 24 PAIDOFF 1000 30 9/10/2016 \n4 35 35 PAIDOFF 800 15 9/11/2016 \n\n due_date age education Gender \n0 10/7/2016 50 Bechalor female \n1 9/15/2016 35 Master or Above male \n2 10/9/2016 43 High School or Below female \n3 10/9/2016 26 college male \n4 9/25/2016 29 Bechalor male " | |
}, | |
"execution_count": 95, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "test_df = pd.read_csv('loan_test.csv')\ntest_df.head()" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 98, | |
"metadata": {}, | |
"outputs": [], | |
"source": "test_df['due_date'] = pd.to_datetime(test_df['due_date'])\ntest_df['effective_date'] = pd.to_datetime(test_df['effective_date'])" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 99, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Unnamed: 0</th>\n <th>Unnamed: 0.1</th>\n <th>loan_status</th>\n <th>Principal</th>\n <th>terms</th>\n <th>effective_date</th>\n <th>due_date</th>\n <th>age</th>\n <th>education</th>\n <th>Gender</th>\n <th>dayofweek</th>\n <th>weekend</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>1</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-08</td>\n <td>2016-10-07</td>\n <td>50</td>\n <td>Bechalor</td>\n <td>1</td>\n <td>3</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>5</td>\n <td>5</td>\n <td>PAIDOFF</td>\n <td>300</td>\n <td>7</td>\n <td>2016-09-09</td>\n <td>2016-09-15</td>\n <td>35</td>\n <td>Master or Above</td>\n <td>0</td>\n <td>4</td>\n <td>1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>21</td>\n <td>21</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-10</td>\n <td>2016-10-09</td>\n <td>43</td>\n <td>High School or Below</td>\n <td>1</td>\n <td>5</td>\n <td>1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>24</td>\n <td>24</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-10</td>\n <td>2016-10-09</td>\n <td>26</td>\n <td>college</td>\n <td>0</td>\n <td>5</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>35</td>\n <td>35</td>\n <td>PAIDOFF</td>\n <td>800</td>\n <td>15</td>\n <td>2016-09-11</td>\n <td>2016-09-25</td>\n <td>29</td>\n <td>Bechalor</td>\n <td>0</td>\n <td>6</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n0 1 1 PAIDOFF 1000 30 2016-09-08 \n1 5 5 PAIDOFF 300 7 2016-09-09 \n2 21 21 PAIDOFF 1000 30 2016-09-10 \n3 24 24 PAIDOFF 1000 30 2016-09-10 \n4 35 35 PAIDOFF 800 15 2016-09-11 \n\n due_date age education Gender dayofweek weekend \n0 2016-10-07 50 Bechalor 1 3 0 \n1 2016-09-15 35 Master or Above 0 4 1 \n2 2016-10-09 43 High School or Below 1 5 1 \n3 2016-10-09 26 college 0 5 1 \n4 2016-09-25 29 Bechalor 0 6 1 " | |
}, | |
"execution_count": 99, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "test_df['dayofweek'] = test_df['effective_date'].dt.dayofweek\ntest_df['weekend'] = test_df['dayofweek'].apply(lambda x: 1 if (x>3) else 0)\ntest_df['Gender'].replace(to_replace=['male','female'], value=[0,1],inplace=True)\ntest_df.head()" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 109, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Principal</th>\n <th>terms</th>\n <th>age</th>\n <th>Gender</th>\n <th>weekend</th>\n <th>Bechalor</th>\n <th>High School or Below</th>\n <th>college</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1000</td>\n <td>30</td>\n <td>45</td>\n <td>0</td>\n <td>0</td>\n <td>1.0</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1000</td>\n <td>30</td>\n <td>33</td>\n <td>1</td>\n <td>0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1000</td>\n <td>15</td>\n <td>27</td>\n <td>0</td>\n <td>0</td>\n <td>0.0</td>\n <td>1.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1000</td>\n <td>30</td>\n <td>28</td>\n <td>1</td>\n <td>1</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1000</td>\n <td>30</td>\n <td>29</td>\n <td>0</td>\n <td>1</td>\n <td>1.0</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " Principal terms age Gender weekend Bechalor High School or Below \\\n0 1000 30 45 0 0 1.0 0.0 \n1 1000 30 33 1 0 0.0 0.0 \n2 1000 15 27 0 0 0.0 1.0 \n3 1000 30 28 1 1 0.0 0.0 \n4 1000 30 29 0 1 1.0 0.0 \n\n college \n0 0.0 \n1 0.0 \n2 0.0 \n3 1.0 \n4 0.0 " | |
}, | |
"execution_count": 109, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "Feature_test = test_df[['Principal','terms','age','Gender','weekend']]\nFeature_test = pd.concat([Feature,pd.get_dummies(test_df['education'])], axis=1)\nFeature_test.drop(['Master or Above'], axis = 1,inplace=True)\nFeature_test.head()\n" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 111, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Principal</th>\n <th>terms</th>\n <th>age</th>\n <th>Gender</th>\n <th>weekend</th>\n <th>Bechalor</th>\n <th>High School or Below</th>\n <th>college</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1000</td>\n <td>30</td>\n <td>45</td>\n <td>0</td>\n <td>0</td>\n <td>1.0</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1000</td>\n <td>30</td>\n <td>33</td>\n <td>1</td>\n <td>0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1000</td>\n <td>15</td>\n <td>27</td>\n <td>0</td>\n <td>0</td>\n <td>0.0</td>\n <td>1.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1000</td>\n <td>30</td>\n <td>28</td>\n <td>1</td>\n <td>1</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1000</td>\n <td>30</td>\n <td>29</td>\n <td>0</td>\n <td>1</td>\n <td>1.0</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " Principal terms age Gender weekend Bechalor High School or Below \\\n0 1000 30 45 0 0 1.0 0.0 \n1 1000 30 33 1 0 0.0 0.0 \n2 1000 15 27 0 0 0.0 1.0 \n3 1000 30 28 1 1 0.0 0.0 \n4 1000 30 29 0 1 1.0 0.0 \n\n college \n0 0.0 \n1 0.0 \n2 0.0 \n3 1.0 \n4 0.0 " | |
}, | |
"execution_count": 111, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "X_test_final = Feature_test\nX_test_final[0:5]" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 112, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'],\n dtype=object)" | |
}, | |
"execution_count": 112, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "y_test_final = df['loan_status'].values\ny_test_final[0:5]" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 113, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": "/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/preprocessing/data.py:645: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.\n return self.partial_fit(X, y)\n/opt/conda/envs/Python36/lib/python3.6/site-packages/ipykernel/__main__.py:1: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.\n if __name__ == '__main__':\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "array([[ 0.52, 0.92, 2.33, -0.42, -1.21, 2.4 , -0.8 , -0.86],\n [ 0.52, 0.92, 0.34, 2.38, -1.21, -0.42, -0.8 , -0.86],\n [ 0.52, -0.96, -0.65, -0.42, -1.21, -0.42, 1.25, -0.86],\n [ 0.52, 0.92, -0.49, 2.38, 0.83, -0.42, -0.8 , 1.16],\n [ 0.52, 0.92, -0.32, -0.42, 0.83, 2.4 , -0.8 , -0.86]])" | |
}, | |
"execution_count": 113, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": "X_test_final= preprocessing.StandardScaler().fit(X_test_final).transform(X_test_final)\nX_test_final[0:5]" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 132, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "total null values: 876\ntotal values: 2768\ntotal non null values: 1892\n" | |
} | |
], | |
"source": "#Before:\nprint('total null values: ',np. isnan(X_test_final).sum())\nprint('total values: ',X_test_final.size)\nprint('total non null values: ',np.count_nonzero(~np.isnan(X_test_final)))" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 134, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "[ 3.80e-16 8.21e-17 -2.05e-16 -4.88e-17 8.21e-17 2.47e-17 1.64e-17\n -1.64e-17]\n" | |
} | |
], | |
"source": "col_mean = np.nanmean(X_test_final, axis=0)\nprint(col_mean)\n#Find indices that you need to replace\ninds = np.where(np.isnan(X_test_final))\n#Place column means in the indices. Align the arrays using take\nX_test_final[inds] = np.take(col_mean, inds[1])" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 135, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "total null values: 0\ntotal values: 2768\ntotal non null values: 2768\n" | |
} | |
], | |
"source": "#After:\nprint('total null values: ',np. isnan(X_test_final).sum())\nprint('total values: ',X_test_final.size)\nprint('total non null values: ',np.count_nonzero(~np.isnan(X_test_final)))" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 159, | |
"metadata": {}, | |
"outputs": [], | |
"source": "yhat1_knn = neigh_high.predict(X_test_final)\nyhat2_Dtree = loanTree_high.predict(X_test_final)\nyhat3_svm = clf_high.predict(X_test_final)\nyhat4_LR = LR_high.predict(X_test_final)" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 160, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "jaccard KNN: 0.7283236994219653\njaccard decision tree: 0.7514450867052023\njaccard SVM : 0.7427745664739884\njaccard LR: 0.7514450867052023\n" | |
} | |
], | |
"source": "print('jaccard KNN: ',jaccard_similarity_score(y_test_final, yhat1_knn))\nprint('jaccard decision tree:',jaccard_similarity_score(y_test_final,yhat2_Dtree))\nprint('jaccard SVM :',jaccard_similarity_score(y_test_final, yhat3_svm))\nprint('jaccard LR: ',jaccard_similarity_score(y_test_final, yhat4_LR))" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 168, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "f1 score knn: 0.7169912838520613\nf1 score Dtree: 0.673085302317342\nf1 score svm: 0.6405352812047661\nf1 score LR: 0.6448043648295465\n" | |
}, | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": "/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/metrics/classification.py:1143: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.\n 'precision', 'predicted', average, warn_for)\n" | |
} | |
], | |
"source": "print('f1 score knn: ',f1_score(y_test_final, yhat1_knn, average='weighted')) \nprint('f1 score Dtree:',f1_score(y_test_final, yhat2_Dtree,average='weighted'))\nprint('f1 score svm: ',f1_score(y_test_final, yhat3_svm, average='weighted'))\nprint('f1 score LR: ',f1_score(y_test_final, yhat4_LR, average='weighted'))" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 169, | |
"metadata": {}, | |
"outputs": [], | |
"source": "#yhat1_prob_knn = neighTree_high.predict_proba(X_test_final)\nyhat2_prob_dtree = loanTree_high.predict_proba(X_test_final)\n#yhat3_prob_svm = clf_high.predict_proba(X_test_final)\nyhat4_prob_LR= LR_high.predict_proba(X_test_final)\n" | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 170, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "dtree log loss: 0.6704484261331349\nLR log loss: 0.4933433171847397\n" | |
} | |
], | |
"source": "print('dtree log loss: ',log_loss(y_test_final, yhat2_prob_dtree))\nprint('LR log loss: ',log_loss(y_test_final, yhat4_prob_LR))" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "# Report\nYou should be able to report the accuracy of the built model using different evaluation metrics:" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "| Algorithm | Jaccard | F1-score | LogLoss |\n|--------------------|---------|----------|---------|\n| KNN | 0.73 | 0.72 | NA |\n| Decision Tree | 0.75 | 0.67 | NA |\n| SVM | 0.74 | 0.64 | NA |\n| LogisticRegression | 0.75 | 0.64 | 0.49 |" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"button": false, | |
"new_sheet": false, | |
"run_control": { | |
"read_only": false | |
} | |
}, | |
"source": "<h2>Want to learn more?</h2>\n\nIBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems \u2013 by your enterprise as a whole. A free trial is available through this course, available here: <a href=\"http://cocl.us/ML0101EN-SPSSModeler\">SPSS Modeler</a>\n\nAlso, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at <a href=\"https://cocl.us/ML0101EN_DSX\">Watson Studio</a>\n\n<h3>Thanks for completing this lesson!</h3>\n\n<h4>Author: <a href=\"https://ca.linkedin.com/in/saeedaghabozorgi\">Saeed Aghabozorgi</a></h4>\n<p><a href=\"https://ca.linkedin.com/in/saeedaghabozorgi\">Saeed Aghabozorgi</a>, PhD is a Data Scientist in IBM with a track record of developing enterprise level applications that substantially increases clients\u2019 ability to turn data into actionable knowledge. He is a researcher in data mining field and expert in developing advanced analytic methods like machine learning and statistical modelling on large datasets.</p>\n\n<hr>\n\n<p>Copyright © 2018 <a href=\"https://cocl.us/DX0108EN_CC\">Cognitive Class</a>. This notebook and its source code are released under the terms of the <a href=\"https://bigdatauniversity.com/mit-license/\">MIT License</a>.</p>" | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3.6", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.9" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
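The Report table in the notebook above summarises each model with a Jaccard score, a weighted F1-score and, where class probabilities are available, a log loss. As a rough, dependency-free sketch of what those three numbers measure, the snippet below recomputes them by hand on a tiny set of made-up labels. The label lists, the probabilities and the helper names (`accuracy`, `f1_for_label`, `weighted_f1`, `binary_log_loss`) are all hypothetical and are not taken from the notebook or from `loan_test.csv`. It also assumes, following the documentation of older scikit-learn releases, that `jaccard_similarity_score` on a single-label problem like this one is equivalent to plain accuracy; that function was later deprecated in favour of `jaccard_score`, which computes a per-class set overlap instead, so numbers from the two are not interchangeable.

```python
import math
from collections import Counter

# Hypothetical labels and predictions, for illustration only.
y_true = ["PAIDOFF", "PAIDOFF", "COLLECTION", "PAIDOFF", "COLLECTION", "PAIDOFF"]
y_pred = ["PAIDOFF", "PAIDOFF", "PAIDOFF", "PAIDOFF", "COLLECTION", "COLLECTION"]
# Hypothetical predicted probability of the PAIDOFF class for each sample.
p_paidoff = [0.9, 0.8, 0.7, 0.6, 0.2, 0.4]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = sum(n * f1_for_label(y_true, y_pred, lab) for lab, n in support.items())
    return total / len(y_true)


def binary_log_loss(y_true, p_pos, pos_label, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for t, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p if t == pos_label else 1 - p)
    return total / len(y_true)


print("accuracy:   ", accuracy(y_true, y_pred))
print("weighted F1:", weighted_f1(y_true, y_pred))
print("log loss:   ", binary_log_loss(y_true, p_paidoff, "PAIDOFF"))
```

On real data these helpers should agree with `f1_score(..., average='weighted')` and `log_loss` from `sklearn.metrics` up to floating-point error, provided the probability passed in is the one for the label named in `pos_label`.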