Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"<a href=\"https://www.bigdatauniversity.com\"><img src = \"https://ibm.box.com/shared/static/cw2c7r3o20w9zn8gkecaeyjhgw3xdgbj.png\" width = 400, align = \"center\"></a>\n",
"\n",
"<h1 align=center><font size = 5> Classification with Python</font></h1>"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"In this notebook we try to practice all the classification algorithms that we learned in this course.\n",
"\n",
"We load a dataset using Pandas library, and apply the following algorithms, and find the best one for this specific dataset by accuracy evaluation methods.\n",
"\n",
"Lets first load required libraries:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [],
"source": [
"import itertools\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib.ticker import NullFormatter\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.ticker as ticker\n",
"from sklearn import preprocessing\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### About dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"This dataset is about past loans. The __Loan_train.csv__ data set includes details of 346 customers whose loan are already paid off or defaulted. It includes following fields:\n",
"\n",
"| Field | Description |\n",
"|----------------|---------------------------------------------------------------------------------------|\n",
"| Loan_status | Whether a loan is paid off on in collection |\n",
"| Principal | Basic principal loan amount at the |\n",
"| Terms | Origination terms which can be weekly (7 days), biweekly, and monthly payoff schedule |\n",
"| Effective_date | When the loan got originated and took effects |\n",
"| Due_date | Since it’s one-time payoff schedule, each loan has one single due date |\n",
"| Age | Age of applicant |\n",
"| Education | Education of applicant |\n",
"| Gender | The gender of applicant |"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Lets download the dataset"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2018-11-21 02:25:29-- https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_train.csv\n",
"Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.193\n",
"Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.193|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 23101 (23K) [text/csv]\n",
"Saving to: ‘loan_train.csv’\n",
"\n",
"loan_train.csv 100%[=====================>] 22.56K --.-KB/s in 0.02s \n",
"\n",
"2018-11-21 02:25:30 (1.06 MB/s) - ‘loan_train.csv’ saved [23101/23101]\n",
"\n"
]
}
],
"source": [
"!wget -O loan_train.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_train.csv"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Load Data From CSV File "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>Unnamed: 0.1</th>\n",
" <th>loan_status</th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>effective_date</th>\n",
" <th>due_date</th>\n",
" <th>age</th>\n",
" <th>education</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>9/8/2016</td>\n",
" <td>10/7/2016</td>\n",
" <td>45</td>\n",
" <td>High School or Below</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>9/8/2016</td>\n",
" <td>10/7/2016</td>\n",
" <td>33</td>\n",
" <td>Bechalor</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>9/8/2016</td>\n",
" <td>9/22/2016</td>\n",
" <td>27</td>\n",
" <td>college</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>9/9/2016</td>\n",
" <td>10/8/2016</td>\n",
" <td>28</td>\n",
" <td>college</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>6</td>\n",
" <td>6</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>9/9/2016</td>\n",
" <td>10/8/2016</td>\n",
" <td>29</td>\n",
" <td>college</td>\n",
" <td>male</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n",
"0 0 0 PAIDOFF 1000 30 9/8/2016 \n",
"1 2 2 PAIDOFF 1000 30 9/8/2016 \n",
"2 3 3 PAIDOFF 1000 15 9/8/2016 \n",
"3 4 4 PAIDOFF 1000 30 9/9/2016 \n",
"4 6 6 PAIDOFF 1000 30 9/9/2016 \n",
"\n",
" due_date age education Gender \n",
"0 10/7/2016 45 High School or Below male \n",
"1 10/7/2016 33 Bechalor female \n",
"2 9/22/2016 27 college male \n",
"3 10/8/2016 28 college female \n",
"4 10/8/2016 29 college male "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('loan_train.csv')\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(346, 10)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Convert to date time object "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>Unnamed: 0.1</th>\n",
" <th>loan_status</th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>effective_date</th>\n",
" <th>due_date</th>\n",
" <th>age</th>\n",
" <th>education</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-10-07</td>\n",
" <td>45</td>\n",
" <td>High School or Below</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-10-07</td>\n",
" <td>33</td>\n",
" <td>Bechalor</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-09-22</td>\n",
" <td>27</td>\n",
" <td>college</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-09</td>\n",
" <td>2016-10-08</td>\n",
" <td>28</td>\n",
" <td>college</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>6</td>\n",
" <td>6</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-09</td>\n",
" <td>2016-10-08</td>\n",
" <td>29</td>\n",
" <td>college</td>\n",
" <td>male</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n",
"0 0 0 PAIDOFF 1000 30 2016-09-08 \n",
"1 2 2 PAIDOFF 1000 30 2016-09-08 \n",
"2 3 3 PAIDOFF 1000 15 2016-09-08 \n",
"3 4 4 PAIDOFF 1000 30 2016-09-09 \n",
"4 6 6 PAIDOFF 1000 30 2016-09-09 \n",
"\n",
" due_date age education Gender \n",
"0 2016-10-07 45 High School or Below male \n",
"1 2016-10-07 33 Bechalor female \n",
"2 2016-09-22 27 college male \n",
"3 2016-10-08 28 college female \n",
"4 2016-10-08 29 college male "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['due_date'] = pd.to_datetime(df['due_date'])\n",
"df['effective_date'] = pd.to_datetime(df['effective_date'])\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# Data visualization and pre-processing\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Let’s see how many of each class is in our data set "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"PAIDOFF 260\n",
"COLLECTION 86\n",
"Name: loan_status, dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['loan_status'].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"260 people have paid off the loan on time while 86 have gone into collection \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets plot some columns to underestand data better:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Solving environment: done\n",
"\n",
"## Package Plan ##\n",
"\n",
" environment location: /home/jupyterlab/conda\n",
"\n",
" added / updated specs: \n",
" - seaborn\n",
"\n",
"\n",
"The following packages will be downloaded:\n",
"\n",
" package | build\n",
" ---------------------------|-----------------\n",
" numpy-base-1.15.0 | py36h3dfced4_0 4.2 MB anaconda\n",
" openssl-1.0.2p | h14c3975_0 3.5 MB anaconda\n",
" matplotlib-2.2.2 | py36hb69df0a_2 6.6 MB anaconda\n",
" ca-certificates-2018.03.07 | 0 124 KB anaconda\n",
" mkl_fft-1.0.4 | py36h4414c95_1 150 KB anaconda\n",
" seaborn-0.9.0 | py36_0 379 KB anaconda\n",
" conda-4.5.11 | py36_0 1.0 MB anaconda\n",
" scipy-1.1.0 | py36hc49cb51_0 18.1 MB anaconda\n",
" statsmodels-0.9.0 | py36h035aef0_0 9.0 MB anaconda\n",
" pandas-0.23.4 | py36h04863e7_0 10.1 MB anaconda\n",
" numpy-1.15.0 | py36h1b885b7_0 35 KB anaconda\n",
" patsy-0.5.1 | py36_0 380 KB anaconda\n",
" mkl_random-1.0.1 | py36h4414c95_1 373 KB anaconda\n",
" certifi-2018.10.15 | py36_0 139 KB anaconda\n",
" ------------------------------------------------------------\n",
" Total: 54.0 MB\n",
"\n",
"The following packages will be UPDATED:\n",
"\n",
" certifi: 2018.8.24-py36_1001 conda-forge --> 2018.10.15-py36_0 anaconda\n",
" conda: 4.5.11-py36_0 conda-forge --> 4.5.11-py36_0 anaconda\n",
" matplotlib: 2.2.2-py36h8e2386c_2 conda-forge --> 2.2.2-py36hb69df0a_2 anaconda\n",
" mkl_fft: 1.0.1-py36h3010b51_0 --> 1.0.4-py36h4414c95_1 anaconda\n",
" mkl_random: 1.0.1-py36h629b387_0 --> 1.0.1-py36h4414c95_1 anaconda\n",
" numpy: 1.14.3-py36hcd700cb_1 --> 1.15.0-py36h1b885b7_0 anaconda\n",
" numpy-base: 1.14.3-py36h9be14a7_1 --> 1.15.0-py36h3dfced4_0 anaconda\n",
" openssl: 1.0.2p-h470a237_0 conda-forge --> 1.0.2p-h14c3975_0 anaconda\n",
" pandas: 0.23.0-py36h637b7d7_0 --> 0.23.4-py36h04863e7_0 anaconda\n",
" patsy: 0.5.0-py36_0 --> 0.5.1-py36_0 anaconda\n",
" scipy: 1.1.0-py36hfc37229_0 --> 1.1.0-py36hc49cb51_0 anaconda\n",
" seaborn: 0.8.1-py36hfad7ec4_0 --> 0.9.0-py36_0 anaconda\n",
" statsmodels: 0.9.0-py36h3010b51_0 --> 0.9.0-py36h035aef0_0 anaconda\n",
"\n",
"The following packages will be DOWNGRADED:\n",
"\n",
" ca-certificates: 2018.8.24-ha4d7672_0 conda-forge --> 2018.03.07-0 anaconda\n",
"\n",
"\n",
"Downloading and Extracting Packages\n",
"numpy-base-1.15.0 | 4.2 MB | ##################################### | 100% \n",
"openssl-1.0.2p | 3.5 MB | ##################################### | 100% \n",
"matplotlib-2.2.2 | 6.6 MB | ##################################### | 100% \n",
"ca-certificates-2018 | 124 KB | ##################################### | 100% \n",
"mkl_fft-1.0.4 | 150 KB | ##################################### | 100% \n",
"seaborn-0.9.0 | 379 KB | ##################################### | 100% \n",
"conda-4.5.11 | 1.0 MB | ##################################### | 100% \n",
"scipy-1.1.0 | 18.1 MB | ##################################### | 100% \n",
"statsmodels-0.9.0 | 9.0 MB | ##################################### | 100% \n",
"pandas-0.23.4 | 10.1 MB | ##################################### | 100% \n",
"numpy-1.15.0 | 35 KB | ##################################### | 100% \n",
"patsy-0.5.1 | 380 KB | ##################################### | 100% \n",
"mkl_random-1.0.1 | 373 KB | ##################################### | 100% \n",
"certifi-2018.10.15 | 139 KB | ##################################### | 100% \n",
"Preparing transaction: done\n",
"Verifying transaction: done\n",
"Executing transaction: done\n"
]
}
],
"source": [
"# notice: installing seaborn might takes a few minutes\n",
"!conda install -c anaconda seaborn -y"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAG4xJREFUeJzt3XucFOWd7/HPV5wVFaIioyKIMyKKqGTAWY3XJbCyqPF2jAbjUdx4DtFoXDbxeMt5aTa+1nghMclRibhyyCaKGrKgSxINUTmKiRfAEcELITrqKCAQN8YgBPB3/qiaSYM9zKV7pmu6v+/Xq15T9VTVU7+umWd+XU9XP6WIwMzMLGt2KHUAZmZm+ThBmZlZJjlBmZlZJjlBmZlZJjlBmZlZJjlBmZlZJjlBdRFJe0u6T9LrkhZJ+q2kM4tU92hJc4tRV3eQNF9SfanjsNIop7YgqVrSs5JekHR8Fx7nw66quydxguoCkgTMAZ6MiAMi4ghgAjCoRPHsWIrjmpVhWxgLvBoRIyPiqWLEZK1zguoaY4C/RMQPmwsi4s2I+D8AknpJulXS85KWSPpyWj46vdqYJelVSfemDRxJ49OyBcB/a65X0q6Spqd1vSDp9LT8Qkk/lfSfwK8KeTGSZkiaKumJ9F3w36XHfEXSjJztpkpaKGmZpH9ppa5x6TvoxWl8fQqJzTKvbNqCpDrgFuBkSQ2Sdm7t71lSo6Qb03ULJY2S9Kik30u6ON2mj6TH0n1fao43z3H/V875yduuylZEeCryBFwO3Lad9ZOA/53O7wQsBGqB0cAfSd5d7gD8FjgO6A28DQwFBDwIzE33vxH47+n87sByYFfgQqAJ6NdKDE8BDXmmv8+z7Qzg/vTYpwMfAIenMS4C6tLt+qU/ewHzgRHp8nygHugPPAnsmpZfBVxX6t+Xp66byrAtXAjcns63+vcMNAKXpPO3AUuAvkA18F5aviPwqZy6VgBKlz9Mf44DpqWvdQdgLnBCqX+v3TW566cbSLqDpHH9JSL+luSPboSkz6eb7EbS4P4CPBcRTel+DUAN8CHwRkT8Li3/CUnDJq3rNElXpMu9gcHp/LyI+EO+mCKio/3n/xkRIeklYHVEvJTGsiyNsQE4R9IkkoY3ABhO0jCbfSYtezp9M/w3JP94rEKUSVto1tbf88Ppz5eAPhHxJ+BPkjZI2h34M3CjpBOAj4GBwN7Aqpw6xqXTC+lyH5Lz82QnY+5RnKC6xjLgrOaFiLhUUn+Sd4eQvBv6akQ8mruTpNHAxpyiLfz1d9TaoIkCzoqI17ap6yiSBpB/J+kpknd027oiIn6dp7w5ro+3ifFjYEdJtcAVwN9GxPtp11/vPLHOi4hzW4vLyk45toXc423v73m7bQY4j+SK6oiI2CSpkfxt5tsRcdd24ihb/gyqazwO9JZ0SU7ZLjnzjwKXSKoCkHSQpF23U9+rQK2kIelyboN4FPhqTv/8yPYEGBHHR0Rdnml7DXJ7PkXyT+CPkvYGTsqzzTPAsZIOTGPdRdJBnTye9Qzl3BYK/XvejaS7b5OkzwL759nmUeBLOZ9tDZS0VweO0aM5QXWBSDqPzwD+TtIbkp4DfkTSRw3wb8DLwGJJS4G72M7VbERsIOnG+Hn6wfCbOatvAKqAJWldNxT79bRHRLxI0g2xDJgOPJ1nmzUkffgzJS0haeDDujFM62bl3BaK8Pd8L1AvaSHJ1dSreY7xK+A+4Ldp9/os8l/tlaXmD+TMzMwyxVdQZmaWSU5QZmaWSU5QZmaWSU5QZmaWSZlIUOPHjw+S7zZ48lQuU9G4fXgqs6ndMpGg1q5dW+oQzDLL7cMqVSYSlJmZ2bacoMzMLJOcoMzMLJM8WKyZlZVNmzbR1NTEhg0bSh1KRevduzeDBg2iqqqq03U4QZlZWWlqaqJv377U1NSQjhtr3SwiWLduHU1NTdTW1na6HnfxmVlZ2bBhA3
vuuaeTUwlJYs899yz4KtYJyirG/gMGIKko0/4DBpT65dh2ODmVXjF+B+7is4rx1qpVNO07qCh1DXq3qSj1mFnrfAVlZmWtmFfO7b167tWrF3V1dRx22GGcffbZrF+/vmXd7NmzkcSrr/718U+NjY0cdthhAMyfP5/ddtuNkSNHcvDBB3PCCScwd+7creqfNm0aw4YNY9iwYRx55JEsWLCgZd3o0aM5+OCDqauro66ujlmzZm0VU/PU2NhYyGntFr6CMrOyVswrZ2jf1fPOO+9MQ0MDAOeddx4//OEP+drXvgbAzJkzOe6447j//vv55je/mXf/448/viUpNTQ0cMYZZ7DzzjszduxY5s6dy1133cWCBQvo378/ixcv5owzzuC5555jn332AeDee++lvr6+1Zh6ijavoCRNl/Re+oTK5rJvSnpHUkM6nZyz7hpJKyS9JukfuipwM7Oe4Pjjj2fFihUAfPjhhzz99NPcc8893H///e3av66ujuuuu47bb78dgJtvvplbb72V/v37AzBq1CgmTpzIHXfc0TUvoITa08U3Axifp/y2iKhLp18ASBoOTAAOTfe5U1KvYgVrZtaTbN68mV/+8pccfvjhAMyZM4fx48dz0EEH0a9fPxYvXtyuekaNGtXSJbhs2TKOOOKIrdbX19ezbNmyluXzzjuvpStv3bp1AHz00UctZWeeeWYxXl6Xa7OLLyKelFTTzvpOB+6PiI3AG5JWAEcCv+10hGZmPUxzMoDkCuqiiy4Cku69yZMnAzBhwgRmzpzJqFGj2qwvYvuDgEfEVnfNlUsXXyGfQV0m6QJgIfD1iHgfGAg8k7NNU1r2CZImAZMABg8eXEAYZuXH7aNny5cM1q1bx+OPP87SpUuRxJYtW5DELbfc0mZ9L7zwAocccggAw4cPZ9GiRYwZM6Zl/eLFixk+fHhxX0QGdPYuvqnAEKAOWAl8Jy3Pd+N73tQfEdMioj4i6qurqzsZhll5cvsoP7NmzeKCCy7gzTffpLGxkbfffpva2tqt7sDLZ8mSJdxwww1ceumlAFx55ZVcddVVLV13DQ0NzJgxg6985Std/hq6W6euoCJidfO8pLuB5nsgm4D9cjYdBLzb6ejMzAo0eJ99ivq9tcHpnXIdNXPmTK6++uqtys466yzuu+8+rrrqqq3Kn3rqKUaOHMn69evZa6+9+MEPfsDYsWMBOO2003jnnXc45phjkETfvn35yU9+woAy/PK42urbBEg/g5obEYelywMiYmU6/8/AURExQdKhwH0knzvtCzwGDI2ILdurv76+PhYuXFjI6zBrk6SiflG3jbZTtKEM3D465pVXXmnpDrPSauV30e620eYVlKSZwGigv6Qm4HpgtKQ6ku67RuDLABGxTNKDwMvAZuDStpKTmZlZPu25i+/cPMX3bGf7fwX+tZCgzMzMPNSRmZllkhOUmZllkhOUmZllkhOUmZllkhOUmZW1fQcNLurjNvYd1L6RPVatWsWECRMYMmQIw4cP5+STT2b58uUsW7aMMWPGcNBBBzF06FBuuOGGlq8szJgxg8suu+wTddXU1LB27dqtymbMmEF1dfVWj9B4+eWXAVi+fDknn3wyBx54IIcccgjnnHMODzzwQMt2ffr0aXkkxwUXXMD8+fP53Oc+11L3nDlzGDFiBMOGDePwww9nzpw5LesuvPBCBg4cyMaNGwFYu3YtNTU1HfqdtJcft2FmZW3lO29z1HWPFK2+Z7+Vb+zsrUUEZ555JhMnTmwZtbyhoYHVq1dz4YUXMnXqVMaNG8f69es566yzuPPOO1tGiuiIL3zhCy2jnDfbsGEDp5xyCt/97nc59dRTAXjiiSeorq5uGX5p9OjRTJkypWW8vvnz57fs/+KLL3LFFVcwb948amtreeONNzjxxBM54IADGDFiBJA8W2r69OlccsklHY65I3wFZWZWZE888QRVVVVcfPHFLWV1dXUsX76cY489lnHjxgGwyy67cPvtt3
PTTTcV7dj33XcfRx99dEtyAvjsZz/b8kDEtkyZMoVrr72W2tpaAGpra7nmmmu49dZbW7aZPHkyt912G5s3by5a3Pk4QZmZFdnSpUs/8UgMyP+ojCFDhvDhhx/ywQcfdPg4ud12dXV1fPTRR60eu73a8ziPwYMHc9xxx/HjH/+408dpD3fxmZl1k20fi5GrtfLtydfFV6h8MeYru/baaznttNM45ZRTinr8XL6CMjMrskMPPZRFixblLd92XMXXX3+dPn360Ldv3y49dkf23zbGfI/zOPDAA6mrq+PBBx/s9LHa4gRlZlZkY8aMYePGjdx9990tZc8//zxDhw5lwYIF/PrXvwaSBxtefvnlXHnllUU79he/+EV+85vf8POf/7yl7JFHHuGll15q1/5XXHEF3/72t2lsbASgsbGRG2+8ka9//euf2PYb3/gGU6ZMKUrc+biLz8zK2oCB+7XrzruO1NcWScyePZvJkydz00030bt3b2pqavje977HQw89xFe/+lUuvfRStmzZwvnnn7/VreUzZszY6rbuZ55JngE7YsQIdtghuaY455xzGDFiBA888MBWz5O68847OeaYY5g7dy6TJ09m8uTJVFVVMWLECL7//e+36/XV1dVx8803c+qpp7Jp0yaqqqq45ZZbWp4QnOvQQw9l1KhR7X50fUe163EbXc2PE7Du4MdtVAY/biM7Cn3cRptdfJKmS3pP0tKcslslvSppiaTZknZPy2skfSSpIZ1+2N5AzMzMcrXnM6gZwLbXx/OAwyJiBLAcuCZn3e8joi6dLsbMzKwT2kxQEfEk8Idtyn4VEc3f0HqG5NHuZmaZkIWPLipdMX4HxbiL70vAL3OWayW9IOn/STq+tZ0kTZK0UNLCNWvWFCEMs/Lh9tF5vXv3Zt26dU5SJRQRrFu3jt69exdUT0F38Un6Bsmj3e9Ni1YCgyNinaQjgDmSDo2IT3xFOiKmAdMg+RC4kDjMyo3bR+cNGjSIpqYmnNhLq3fv3gwaVFjnWqcTlKSJwOeAsZG+VYmIjcDGdH6RpN8DBwG+BcnMukVVVVXLOHLWs3Wqi0/SeOAq4LSIWJ9TXi2pVzp/ADAUeL0YgZqZWWVp8wpK0kxgNNBfUhNwPcldezsB89LxmZ5J79g7AfiWpM3AFuDiiPhD3orNzMy2o80EFRHn5im+p5Vtfwb8rNCgzMzMPBafmZllkhOUmZllkhOUmZllkhOUmZllkhOUmZllkhOUmZllkhOUmZllkhOUmZllkhOUmZllkhOUmZllkhOUmZllkhOUmZllkhOUmZllkhOUmZllUrsSlKTpkt6TtDSnrJ+keZJ+l/7cIy2XpB9IWiFpiaRRXRW8mZmVr/ZeQc0Axm9TdjXwWEQMBR5LlwFOInmS7lBgEjC18DDNzKzStCtBRcSTwLZPxj0d+FE6/yPgjJzyf4/EM8DukgYUI1gzM6schXwGtXdErARIf+6Vlg8E3s7Zrikt24qkSZIWSlq4Zs2aAsIwKz9uH2Zdc5OE8pTFJwoipkVEfUTUV1dXd0EYZj2X24dZYQlqdXPXXfrzvbS8CdgvZ7tBwLsFHMfMzCpQIQnqYWBiOj8ReCin/IL0br7PAH9s7go0MzNrrx3bs5GkmcBooL+kJuB64CbgQUkXAW8BZ6eb/wI4GVgBrAf+scgxm5lZBWhXgoqIc1tZNTbPtgFcWkhQZmZmHknCzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyqV2jmecj6WDggZyiA4DrgN2B/wk0P6f62oj4RacjNDOzitTpBBURrwF1AJJ6Ae8As0me/3RbREwpSoRmZlaRitXFNxb4fUS8WaT6zMyswhUrQU0AZuYsXyZpiaTpkvbIt4OkSZIWSlq4Zs2afJuYVSy3D7MiJChJfwOcBvw0LZoKDCHp/lsJfCfffh
ExLSLqI6K+urq60DDMyorbh1lxrqBOAhZHxGqAiFgdEVsi4mPgbuDIIhzDzMwqTDES1LnkdO9JGpCz7kxgaRGOYWZmFabTd/EBSNoFOBH4ck7xLZLqgAAat1lnZmbWLgUlqIhYD+y5Tdn5BUVkZmaGR5IwM7OMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMKug2c7OeRL2qGPRuU9HqMrOu5QRlFSO2bOKo6x4pSl3Pfmt8Ueoxs9a5i8/MzDLJCcrMzDLJCcrMzDLJCcrMzDLJCcrMzDLJCcrMzDKp4NvMJTUCfwK2AJsjol5SP+ABoIbkmVDnRMT7hR7LzMwqR7GuoD4bEXURUZ8uXw08FhFDgcfSZasw+w8YgKSCp/0HDGj7YGZWdrrqi7qnA6PT+R8B84GruuhYllFvrVpF076DCq6nWKM/mFnPUowrqAB+JWmRpElp2d4RsRIg/bnXtjtJmiRpoaSFa9asKUIYZuXD7cOsOAnq2IgYBZwEXCrphPbsFBHTIqI+Iuqrq6uLEIZZ+XD7MCtCgoqId9Of7wGzgSOB1ZIGAKQ/3yv0OGZmVlkKSlCSdpXUt3keGAcsBR4GJqabTQQeKuQ4ZmZWeQq9SWJvYLak5rrui4hHJD0PPCjpIuAt4OwCj2NmZhWmoAQVEa8Dn85Tvg4YW0jdZmZW2TyShJmZZZITlJmZZZITlJmZZZITlJmZZZITlJmZZZITlJmZZZITlJmZZZITlJmZZZITlJmZZZITlJmZZZITlJmZZfIJ2F31RF0zM+tBsvgEbF9BmZlZJnU6QUnaT9ITkl6RtEzSP6Xl35T0jqSGdDq5eOGamVmlKKSLbzPw9YhYnD60cJGkeem62yJiSuHhmZlZpep0goqIlcDKdP5Pkl4BBhYrMDMzq2xF+QxKUg0wEng2LbpM0hJJ0yXt0co+kyQtlLRwzZo1xQjDrGy4fZgVIUFJ6gP8DJgcER8AU4EhQB3JFdZ38u0XEdMioj4i6qurqwsNw6ysuH2YFZigJFWRJKd7I+I/ACJidURsiYiPgbuBIwsP08zMKk0hd/EJuAd4JSK+m1Oe+y2tM4GlnQ/PzMwqVSF38R0LnA+8JKkhLbsWOFdSHRBAI/DlgiI0M7OKVMhdfAsA5Vn1i86HY2ZmlvBIEmZmlkkei8+6jHpVFWVcLvWqKkI0ZtbTOEFZl4ktmzjqukcKrufZb40vQjRm1tO4i8/MzDLJCcrMzDLJCcrMzDLJCcrMzDLJCcrMrJtl8fHqWeS7+MzMulkWH6+eRb6CMjOzTHKCMjOzTHIXn5mZZXLkFycoMzPL5Mgv7uIzM7NM6rIEJWm8pNckrZB0daH1+bZMM7PK0iVdfJJ6AXcAJwJNwPOSHo6Ilztbp2/LNDOrLF31GdSRwIqIeB1A0v3A6UCnE1TW7D9gAG+tWlVwPYP32Yc3V64sQkTlTcr3bEzLIreNthXrhoQdelWVddtQRBS/UunzwPiI+B/p8vnAURFxWc42k4BJ6eLBwGtFD6T9+gNrS3j8Qjj20mgr9rUR0elPizPUPsr5d5Rl5Rx7u9tGV11B5UvpW2XCiJgGTOui43eIpIURUV/qODrDsZdGV8eelfbh31FpOPZEV90k0QTsl7M8CHi3i45lZmZlqKsS1PPAUEm1kv4GmAA83EXHMjOzMtQlXXwRsVnSZcCjQC9gekQs64pjFUnJu1IK4NhLoyfH3hE9+XU69tIoWuxdcpOEmZlZoTyShJmZZZITlJmZZVLFJChJvSS9IGluulwr6VlJv5P0QHozB5J2SpdXpOtrShz37pJmSXpV0iuSjpbUT9K8NPZ5kvZIt5WkH6SxL5E0qsSx/7OkZZKWSpopqXdWz7uk6ZLek7Q0p6zD51nSxHT730ma2J2vobPcNkoSu9tGO1RMggL+CXglZ/lm4LaIGAq8D1yUll8EvB8RBwK3pduV0veBRyJiGPBpktdwNfBYGvtj6TLAScDQdJoETO
3+cBOSBgKXA/URcRjJzTITyO55nwFs++XBDp1nSf2A64GjSEZTub654Wac20Y3ctvoQNuIiLKfSL6H9RgwBphL8kXitcCO6fqjgUfT+UeBo9P5HdPtVKK4PwW8se3xSUYVGJDODwBeS+fvAs7Nt10JYh8IvA30S8/jXOAfsnzegRpgaWfPM3AucFdO+VbbZXFy23DbaGfMJWkblXIF9T3gSuDjdHlP4L8iYnO63ETyRwN//eMhXf/HdPtSOABYA/zftAvm3yTtCuwdESvTGFcCe6Xbt8Seyn1d3Soi3gGmAG8BK0nO4yJ6xnlv1tHznJnz3wFuG93MbWOr8u0q+wQl6XPAexGxKLc4z6bRjnXdbUdgFDA1IkYCf+avl9L5ZCb29PL9dKAW2BfYleTyf1tZPO9taS3WnvQa3DbcNrpCUdtG2Sco4FjgNEmNwP0kXRnfA3aX1PxF5dyhmFqGaUrX7wb8oTsDztEENEXEs+nyLJJGuVrSAID053s522dliKm/B96IiDURsQn4D+AYesZ5b9bR85yl898ebhul4bbRzvNf9gkqIq6JiEERUUPyQeTjEXEe8ATw+XSzicBD6fzD6TLp+scj7TTtbhGxCnhb0sFp0ViSR5bkxrht7Bekd9J8Bvhj82V4CbwFfEbSLpLEX2PP/HnP0dHz/CgwTtIe6bvkcWlZJrltuG0UoHvaRik+JCzVBIwG5qbzBwDPASuAnwI7peW90+UV6foDShxzHbAQWALMAfYg6X9+DPhd+rNfuq1IHhT5e+AlkruEShn7vwCvAkuBHwM7ZfW8AzNJPg/YRPJu76LOnGfgS+lrWAH8Y6n/5jvw+t02ujd2t412HNtDHZmZWSaVfRefmZn1TE5QZmaWSU5QZmaWSU5QZmaWSU5QZmaWSU5QGSZpi6SGdMTjn0rapZXtfiFp907Uv6+kWQXE1yipf2f3N+sst43K4NvMM0zShxHRJ52/F1gUEd/NWS+S3+HHrdXRxfE1knzPYW0pjm+Vy22jMvgKqud4CjhQUo2SZ9/cCSwG9mt+t5az7m4lz5r5laSdASQdKOnXkl6UtFjSkHT7pen6CyU9JOkRSa9Jur75wJLmSFqU1jmpJK/erHVuG2XKCaoHSMffOonkm9kABwP/HhEjI+LNbTYfCtwREYcC/wWclZbfm5Z/mmTcr3zDvBwJnEfyDf2zJdWn5V+KiCOAeuBySaUeSdkMcNsod05Q2bazpAaS4VzeAu5Jy9+MiGda2eeNiGhI5xcBNZL6AgMjYjZARGyIiPV59p0XEesi4iOSASyPS8svl/Qi8AzJgI9DC35lZoVx26gAO7a9iZXQRxFRl1uQdK3z5+3sszFnfguwM/mHus9n2w8kQ9JoktGXj46I9ZLmk4wNZlZKbhsVwFdQFSAiPgCaJJ0BIGmnVu56OlFSv7Rv/gzgaZKh/d9PG+Aw4DPdFrhZF3PbyDYnqMpxPkl3xBLgN8A+ebZZQDKycgPws4hYCDwC7JjudwNJV4ZZOXHbyCjfZm5AcqcSyW2xl5U6FrMscdsoHV9BmZlZJvkKyszMMslXUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlkn/H+LDZoiBEQ8dAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x216 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import seaborn as sns\n",
"\n",
"bins = np.linspace(df.Principal.min(), df.Principal.max(), 10)\n",
"g = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\n",
"g.map(plt.hist, 'Principal', bins=bins, ec=\"k\")\n",
"\n",
"g.axes[-1].legend()\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAGfZJREFUeJzt3XuQVOW57/HvTxgdFbygo4yMwKgoopIBZ3tDDYJy2N49XuKOR7GOJx4Naqjo8ZZTVrLdZbyVmhwvkUQLK1HUmA26SUWDCidi4gVwRBBv0UFHQS7RKAchgs/5o9fMHqBhembWTK/u+X2qVnWvt1e/61lMvzy93vX2uxQRmJmZZc02xQ7AzMwsHycoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCeolEjaU9Ijkt6XNE/SXySdkVLdoyXNSKOu7iBptqT6YsdhxVdO7UJSlaSXJb0m6Zgu3M/qrqq71DhBpUCSgOnAnyJin4g4FDgXqClSPL2LsV+z1sqwXYwF3oqIERHxQhox2dY5QaVjDPCPiPhFc0FELImI/wMgqZek2yS9KmmBpP+ZlI9OzjaekPSWpIeTRo2k8UnZHOC/NtcraUdJDyZ1vSbptKT8Qkm/lfQfwB87czCSpki6T9Ks5Jvvt5N9LpY0pdV290maK2mRpJ9soa5xybfm+Ul8fToTm5WUsmkXkuqAW4ETJTVI2n5Ln21JjZJuSl6bK2mkpGck/VXSJck2fSQ9l7z3jeZ48+z3f7X698nbxspaRHjp5AJcAdy5ldcvBv538nw7YC5QC4wG/k7uG+U2wF+Ao4FK4CNgCCDgcWBG8v6bgP+WPN8FeAfYEbgQaAL6bSGGF4CGPMvxebadAjya7Ps04AvgkCTGeUBdsl2/5LEXMBsYnqzPBuqB3YE/ATsm5dcANxT77+Wle5YybBcXAncnz7f42QYagUuT53cCC4C+QBWwPCnvDezUqq73ACXrq5PHccDk5Fi3AWYAxxb779qdi7uCuoCke8g1qH9ExD+R+6ANl3RWssnO5BrZP4BXIqIpeV8DMBhYDXwQEe8m5b8h15hJ6jpV0lXJeiUwMHk+MyL+li+miGhvn/l/RERIegP4NCLeSGJZlMTYAJwj6WJyja0aGEauMTY7Iil7MfkCvC25/2ysByqTdtGsrc/2U8njG0CfiPgS+FLSWkm7AP8PuEnSscA3wABgT2BZqzrGJctryXofcv8+f+pgzCXHCSodi4Azm1ciYqKk3cl9I4TcN6DLI+KZ1m+SNBpY16poA//5N9nSJIkCzoyItzep63ByH/r8b5JeIPctblNXRcSzecqb4/pmkxi/AXpLqgWuAv4pIj5Luv4q88Q6MyL+ZUtxWVkrx3bRen9b+2xvtf0A55E7ozo0Ir6W1Ej+9vPTiLh/K3GUNV+DSsfzQKWkS1uV7dDq+TPApZIqACTtL2nHrdT3FlArad9kvXUjeAa4vFWf/IhCAoyIYyKiLs+ytUa4NTuRa/h/l7Qn8M95tnkJGCVpvyTWHSTt38H9Wekp53bR2c/2zuS6+76WdBwwKM82zwD/vdW1rQGS9mjHPkqeE1QKItdhfDrwbUkfSHoFeIhcvzTAr4A3gfmSFgL3s5Wz14hYS67r4vfJxeAlrV6+EagAFiR13Zj28RQiIl4n1/WwCHgQeDHPNivI9dtPlbSAXKMe2o1hWhGVc7tI4bP9MFAvaS65s6m38uzjj8AjwF+SrvYnyH+2V7aaL8qZmZllis+gzMwsk5ygzMwsk5ygzMwsk5ygzMwsk7o1QY0fPz7I/Y7Bi5dyXTrN7cRLD1gK0q0JauXKld25O7OS5HZiluMuPjMzyyQnKDMzyyQnKDMzyyRPFmtmZefrr7+mqamJtWvXFjuUHq2yspKamhoqKio69H4nKDMrO01NTfTt25fBgweTzB9r3SwiWLVqFU1NTdTW1naoDnfxmVnZWb
t2LbvttpuTUxFJYrfdduvUWawTVDcaVF2NpFSWQdXVxT4cs0xzciq+zv4N3MXXjT5ctoymvWpSqavmk6ZU6jEzyyqfQZlZ2Uuz96LQHoxevXpRV1fHwQcfzNlnn82aNWtaXps2bRqSeOut/7wNVGNjIwcffDAAs2fPZuedd2bEiBEccMABHHvsscyYMWOj+idPnszQoUMZOnQohx12GHPmzGl5bfTo0RxwwAHU1dVRV1fHE088sVFMzUtjY2Nn/lm7nM+gzKzspdl7AYX1YGy//fY0NDQAcN555/GLX/yCH/7whwBMnTqVo48+mkcffZQf//jHed9/zDHHtCSlhoYGTj/9dLbffnvGjh3LjBkzuP/++5kzZw6777478+fP5/TTT+eVV16hf//+ADz88MPU19dvMaZS4DMoM7Mudswxx/Dee+8BsHr1al588UUeeOABHn300YLeX1dXxw033MDdd98NwC233MJtt93G7rvvDsDIkSOZMGEC99xzT9ccQJE4QZmZdaH169fzhz/8gUMOOQSA6dOnM378ePbff3/69evH/PnzC6pn5MiRLV2CixYt4tBDD93o9fr6ehYtWtSyft5557V05a1atQqAr776qqXsjDPOSOPwupS7+MzMukBzMoDcGdRFF10E5Lr3Jk2aBMC5557L1KlTGTlyZJv1RWx9EvCI2GjUXDl08RWUoCQ1Al8CG4D1EVEvqR/wGDAYaATOiYjPuiZMM7PSki8ZrFq1iueff56FCxciiQ0bNiCJW2+9tc36XnvtNQ488EAAhg0bxrx58xgzZkzL6/Pnz2fYsGHpHkSRtaeL77iIqIuI5pR8LfBcRAwBnkvWzcxsC5544gkuuOAClixZQmNjIx999BG1tbUbjcDLZ8GCBdx4441MnDgRgKuvvpprrrmmpeuuoaGBKVOm8P3vf7/Lj6E7daaL7zRgdPL8IWA2cE0n4zEzS93A/v1T/e3gwGSkXHtNnTqVa6/d+Lv8mWeeySOPPMI112z83+cLL7zAiBEjWLNmDXvssQc///nPGTt2LACnnnoqH3/8MUcddRSS6Nu3L7/5zW+oLrMf8Kutfk0ASR8An5G7E+L9ETFZ0ucRsUurbT6LiF3zvPdi4GKAgQMHHrpkyZLUgi81klL9oW4hfzvrdh366bzbSboWL17c0h1mxbWFv0VB7aTQLr5RETES+GdgoqRjCw0uIiZHRH1E1FdVVRX6NrMexe3EbHMFJaiI+CR5XA5MAw4DPpVUDZA8Lu+qIM3MrOdpM0FJ2lFS3+bnwDhgIfAUMCHZbALwZFcFaWZmPU8hgyT2BKYl4+t7A49ExNOSXgUel3QR8CFwdteFaWZmPU2bCSoi3ge+lad8FTC2K4IyMzPzVEdmZpZJTlBmVvb2qhmY6u029qoZWNB+ly1bxrnnnsu+++7LsGHDOPHEE3nnnXdYtGgRY8aMYf/992fIkCHceOONLT8bmTJlCpdddtlmdQ0ePJiVK1duVDZlyhSqqqo2uoXGm2++CcA777zDiSeeyH777ceBBx7IOeecw2OPPdayXZ8+fVpuyXHBBRcwe/ZsTj755Ja6p0+fzvDhwxk6dCiHHHII06dPb3ntwgsvZMCAAaxbtw6AlStXMnjw4Hb9TQrhufgKMKi6mg+XLSt2GGbWQUs//ojDb3g6tfpe/tfxbW4TEZxxxhlMmDChZdbyhoYGPv30Uy688ELuu+8+xo0bx5o1azjzzDO59957W2aKaI/vfOc7LbOcN1u7di0nnXQSd9xxB6eccgoAs2bNoqqqqmX6pdGjR3P77be3zNc3e/bslve//vrrXHXVVcycOZPa2lo++OADTjjhBPbZZx+GDx8O5O4t9eCDD3LppZe2O+ZCOUEVIK17yfguuGY9x6xZs6ioqOCSSy5pKaurq+OBBx5g1KhRjBs3DoAddtiBu+++m9GjR3coQeXzyCOPcOSRR7YkJ4Djjjuu4PfffvvtXH/99dTW1gJQW1vLdd
ddx2233cavf/1rACZNmsSdd97J9773vVRizsddfGZmXWDhwoWb3RID8t8qY99992X16tV88cUX7d5P6267uro6vvrqqy3uu1CF3M5j4MCBHH300S0Jqyv4DMrMrBtteluM1rZUvjX5uvg6K1+M+cquv/56Tj31VE466aRU99/MZ1BmZl3goIMOYt68eXnL586du1HZ+++/T58+fejbt2+X7rs97980xny389hvv/2oq6vj8ccf7/C+tsYJysysC4wZM4Z169bxy1/+sqXs1VdfZciQIcyZM4dnn30WyN3Y8IorruDqq69Obd/f/e53+fOf/8zvf//7lrKnn36aN954o6D3X3XVVfz0pz+lsbERgMbGRm666SauvPLKzbb90Y9+xO23355K3JtyF5+Zlb3qAXsXNPKuPfW1RRLTpk1j0qRJ3HzzzVRWVjJ48GDuuusunnzySS6//HImTpzIhg0bOP/88zcaWj5lypSNhnW/9NJLAAwfPpxttsmdV5xzzjkMHz6cxx57bKP7Sd17770cddRRzJgxg0mTJjFp0iQqKioYPnw4P/vZzwo6vrq6Om655RZOOeUUvv76ayoqKrj11ltb7hDc2kEHHcTIkSMLvnV9exR0u4201NfXx6anjaUgrdtk1HzS5NttlL8O3W6jtVJtJ1ni221kR3fcbsPMzKxbOUGZmVkmOUGZWVlyF3jxdfZv4ARlZmWnsrKSVatWOUkVUUSwatUqKisrO1yHR/GZWdmpqamhqamJFStWFDuUHq2yspKamo4PDHOCKlHb0bFfneczsH9/lixdmkpdZllQUVHRMo+clS4nqBK1DlIdsm5mljUFX4OS1EvSa5JmJOu1kl6W9K6kxyRt23VhmplZT9OeQRI/ABa3Wr8FuDMihgCfARelGZiZmfVsBSUoSTXAScCvknUBY4Ankk0eAk7vigDNzKxnKvQM6i7gauCbZH034POIWJ+sNwED8r1R0sWS5kqa6xE1Zvm5nZhtrs0EJelkYHlEtJ67Pd/wsbw/OIiIyRFRHxH1VVVVHQzTrLy5nZhtrpBRfKOAUyWdCFQCO5E7o9pFUu/kLKoG+KTrwjQzs56mzTOoiLguImoiYjBwLvB8RJwHzALOSjabADzZZVGamVmP05mpjq4BfijpPXLXpB5IJyQzM7N2/lA3ImYDs5Pn7wOHpR+SmZmZJ4s1M7OMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMajNBSaqU9Iqk1yUtkvSTpLxW0suS3pX0mKRtuz5cMzPrKQo5g1oHjImIbwF1wHhJRwC3AHdGxBDgM+CirgvTzMx6mjYTVOSsTlYrkiWAMcATSflDwOldEqGZmfVIBV2DktRLUgOwHJgJ/BX4PCLWJ5s0AQO28N6LJc2VNHfFihVpxGxWdtxOzDZXUIKKiA0RUQfUAIcBB+bbbAvvnRwR9RFRX1VV1fFIzcqY24nZ5to1ii8iPgdmA0cAu0jqnbxUA3ySbmhmZtaTFTKKr0rSLsnz7YHjgcXALOCsZLMJwJNdFaSZmfU8vdvehGrgIUm9yCW0xyNihqQ3gUcl/RvwGvBAF8ZpZmY9TJsJKiIWACPylL9P7nqUmZlZ6jyThJmZZZITlJmZZZITlJmZZZITlJmZZVLZJqhB1dVISmUxM7PuV8gw85L04bJlNO1Vk0pdNZ80pVKPmZkVrmzPoMzMrLQ5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSY5QZmZWSa1maAk7S1plqTFkhZJ+kFS3k/STEnvJo+7dn24ZmbWUxRyBrUeuDIiDgSOACZKGgZcCzwXEUOA55J1MzOzVLSZoCJiaUTMT55/CSwGBgCnAQ8lmz
0EnN5VQZqZWc/TrmtQkgYDI4CXgT0jYinkkhiwxxbec7GkuZLmrlixonPRmpUptxOzzRWcoCT1AX4HTIqILwp9X0RMjoj6iKivqqrqSIxmZc/txGxzBSUoSRXkktPDEfHvSfGnkqqT16uB5V0TopmZ9USFjOIT8ACwOCLuaPXSU8CE5PkE4Mn0w7PusB20edv7QpZB1dXFPhQzKyOF3PJ9FHA+8IakhqTseuBm4HFJFwEfAmd3TYjW1dYBTXvVdLqemk+aOh+MmVmizQQVEXMAbeHlsemGk03qVZHKf77qvW1q/4mrV0Uq9ZiZZVUhZ1A9Xmz4msNveLrT9bz8r+NTqae5LjOzcuapjszMLJOcoMzMLJOcoMzMLJOcoMzMLJOcoMzMLJOcoMzMLJOcoMzMLJOcoMzMLJOcoMzMLJPKdiaJtKYnMjOz4ijbBJXW9ETgaYXMzIrBXXxmZpZJTlBmZpZJTlBmZpZJZXsNqtylOQjE95ayrBlUXc2Hy5Z1up7tt+nFV99sSCEiGNi/P0uWLk2lLiuME1SJ8iAQK2cfLluW2l2e06inuS7rXm128Ul6UNJySQtblfWTNFPSu8njrl0bppmZ9TSFXIOaAmz6Ffta4LmIGAI8l6xbD7cdICmVZVB1dbEPx8yKrM0uvoj4k6TBmxSfBoxOnj8EzAauSTEuK0HrwN0pZpaajo7i2zMilgIkj3tsaUNJF0uaK2nuihUrOrg7s/JWDu1kUHV1amfQZtANgyQiYjIwGaC+vj66en9mpagc2klaAxvAZ9CW09EzqE8lVQMkj8vTC8nMzKzjCeopYELyfALwZDrhmJmZ5RQyzHwq8BfgAElNki4CbgZOkPQucEKybmZmlppCRvH9yxZeGptyLGZmZi0yNRefRwGZmVmzTE115FFAZmbWLFMJyoojrYlnPemsmaXJCcpSm3jWk86aWZoydQ3KzMysmROUmZllkhOUmZllkhOUmZllkhOUZZLvLdU9/NtDyzKP4rNM8r2luod/e2hZ5gRlqUnr91TNdZlZz+YEZalJ6/dU4N9UmZmvQZmZWUb5DMoyKc3uwm16VaRyEX9g//4sWbo0hYjKU6pdvL239fRbBRhUXc2Hy5alUlcWP99OUJZJaXcXpjEQwIMAti7tv5mn32pbuQ9ycRefmZllUqbOoNLsIjAzs9KWqQTlUWBmZtasUwlK0njgZ0Av4FcRcXMqUZmlqBzvd5XmxXErTFqDbQC26V3BN+u/TqWuctbhBCWpF3APcALQBLwq6amIeDOt4MzSUI73u0rr4ri71Av3jQfudLvODJI4DHgvIt6PiH8AjwKnpROWmZn1dIqIjr1ROgsYHxH/I1k/Hzg8Ii7bZLuLgYuT1QOAtzsebovdgZUp1JMFPpZs6uixrIyIdp9qdVE7Af9NsqqnH0tB7aQz16DydcZulu0iYjIwuRP72XzH0tyIqE+zzmLxsWRTdx9LV7QT8N8kq3wshelMF18TsHer9Rrgk86FY2ZmltOZBPUqMERSraRtgXOBp9IJy8zMeroOd/FFxHpJlwHPkBtm/mBELEotsq1LvSukiHws2VQux1IuxwE+lqzqsmPp8CAJMzOzruS5+MzMLJOcoMzMLJMyn6Ak7S1plqTFkhZJ+kFS3k/STEnvJo+7FjvWtkiqlPSKpNeTY/lJUl4r6eXkWB5LBp1knqRekl6TNCNZL8njAJDUKOkNSQ2S5iZlJfMZczvJtnJpK93dTjKfoID1wJURcSBwBDBR0jDgWuC5iBgCPJesZ906YExEfAuoA8ZLOgK4BbgzOZbPgIuKGGN7/ABY3Gq9VI+j2XERUdfqNx2l9BlzO8m2cmor3ddOIqKkFuBJcvP/vQ1UJ2XVwNvFjq2dx7EDMB84nNyvsHsn5UcCzxQ7vgLir0k+jGOAGeR+uF1yx9HqeBqB3TcpK9nPmNtJdpZyaivd3U5K4QyqhaTBwAjgZWDPiFgKkDzuUbzICpec6jcAy4GZwF
+BzyNifbJJEzCgWPG1w13A1cA3yfpulOZxNAvgj5LmJdMOQel+xgbjdpIl5dRWurWdZOp+UFsjqQ/wO2BSRHyR1rT33S0iNgB1knYBpgEH5tuse6NqH0knA8sjYp6k0c3FeTbN9HFsYlREfCJpD2CmpLeKHVBHuJ1kSxm2lW5tJyWRoCRVkGt0D0fEvyfFn0qqjoilkqrJfdMqGRHxuaTZ5K4X7CKpd/KNqhSmjBoFnCrpRKAS2Inct8RSO44WEfFJ8rhc0jRys/WX1GfM7SSTyqqtdHc7yXwXn3JfAR8AFkfEHa1eegqYkDyfQK7PPdMkVSXfCJG0PXA8uQuns4Czks0yfywRcV1E1ETEYHJTXD0fEedRYsfRTNKOkvo2PwfGAQspoc+Y20k2lVNbKUo7KfZFtwIuyh1N7vR3AdCQLCeS68d9Dng3eexX7FgLOJbhwGvJsSwEbkjK9wFeAd4DfgtsV+xY23FMo4EZpXwcSdyvJ8si4EdJecl8xtxOsr+UelspRjvxVEdmZpZJme/iMzOznskJyszMMskJyszMMskJyszMMskJyszMMskJyszMMskJyszMMskJqsRJmp5M3LioefJGSRdJekfSbEm/lHR3Ul4l6XeSXk2WUcWN3qx7uJ2UJv9Qt8RJ6hcRf0umhHkV+C/Ai8BI4EvgeeD1iLhM0iPAvRExR9JAclP855uE06ysuJ2UppKYLNa26gpJZyTP9wbOB/5vRPwNQNJvgf2T148HhrWa4XonSX0j4svuDNisCNxOSpATVAlLpu8/HjgyItYksz6/Tf5bE0CuS/fIiPiqeyI0Kz63k9Lla1ClbWfgs6TRDSV3S4IdgG9L2lVSb+DMVtv/EbiseUVSXbdGa1YcbiclygmqtD0N9Ja0ALgReAn4GLiJ3N1UnwXeBP6ebH8FUC9pgaQ3gUu6P2Szbud2UqI8SKIMSeoTEauTb4bTgAcjYlqx4zLLEreT7PMZVHn6saQGcvfS+QCYXuR4zLLI7STjfAZlZmaZ5DMoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLpP8PlTlGZbaTvVAAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x216 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"bins = np.linspace(df.age.min(), df.age.max(), 10)\n",
"g = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\n",
"g.map(plt.hist, 'age', bins=bins, ec=\"k\")\n",
"\n",
"g.axes[-1].legend()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# Pre-processing: Feature selection/extraction"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Lets look at the day of the week people get the loan "
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAGepJREFUeJzt3XmcVPW55/HPV2gvIriC2tIBWkQQldtgR+OCQUh4EdzwuoTEKGTMdTQuYQyDSzImN84YF8YlcSVq8EbEhUTMJTcaVIjgztKCiCFebbEVFJgYYxQFfeaPOt1poKGr6VPU6erv+/WqV1edOud3ntNdTz91fnXq91NEYGZmljU7FDsAMzOzprhAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlApUTS3pLuk/S6pAWSnpV0ckptD5U0M422tgdJcyRVFzsOK75SygtJ3SU9L2mRpCEF3M+HhWq7rXGBSoEkATOApyJiv4g4FBgDVBQpno7F2K9ZYyWYF8OBVyNiUETMTSMm2zoXqHQMAz6NiNvrF0TEmxHxcwBJHSRdJ+lFSYsl/fdk+dDkbGO6pFclTU2SGkkjk2XzgH+pb1fSzpLuTtpaJOmkZPk4SQ9J+g/gD605GElTJN0maXbyzvfLyT6XSZrSaL3bJM2XtFTSv22hrRHJu+aFSXxdWhObtSklkxeSqoBrgVGSaiTttKXXtqRaSVclz82XNFjSY5L+S9K5yTpdJD2RbLukPt4m9vs/G/1+msyxkhYRvrXyBlwE3LCV588Bfpjc/ydgPlAJDAX+Su4d5Q7As8DRQCfgLaAvIOBBYGay/VXAt5L7uwHLgZ2BcUAdsMcWYpgL1DRx+0oT604B7k/2fRLwAXBIEuMCoCpZb4/kZwdgDjAweTwHqAa6AU8BOyfLLwGuKPbfy7ftcyvBvBgH3Jzc3+JrG6gFzkvu3wAsBroC3YH3kuUdgV0atfUaoOTxh8nPEcDk5Fh3AGYCxxT777o9b+4KKgBJt5BLqE8j4ovkXmgDJZ2arLIruST7FHghIuqS7WqA3sCHwBsR8edk+b3kkpmkrRMlTUgedwJ6JvdnRcT/ayqmiGhpn/l/RERIWgK8GxFLkliWJjHWAKdLOodcspUDA8glY70vJcueTt4A70jun421QyWSF/Wae23/Nvm5BOgSEX8D/iZpnaTdgL8DV0k6Bvgc6AHsDaxq1MaI5LYoedyF3O/nqW2Muc1xgUrHUuCU+gcRcb6kbuTeEULuHdCFEfFY440kDQU+abToM/7xN9nSIIkCTomIP23S1uHkXvRNbyTNJfcublMTIuLxJpbXx/X5JjF+DnSUVAlMAL4YEX9Juv46NRHrrIj4xpbispJWinnReH9be21vNX+AM8idUR0aEesl1dJ0/vw0Iu7YShwlzZ9BpeNJoJOk8xot69zo/mPAeZLKACQdIGnnrbT3KlApqU/yuHESPAZc2KhPflA+AUbEkIioauK2tSTcml3IJf5fJe0NfK2JdZ4DjpK0fxJrZ0kHbOP+rO0p5bxo7Wt7V3LdfeslHQv0amKdx4D/1uizrR6S9mrBPto8F6gURK7DeDTwZUlvSHoBuIdcvzTAncArwEJJLwN3sJWz14hYR67r4nfJh8FvNnr6SqAMWJy0dWXax5OPiHiJXNfDUuBu4Okm1llNrt9+mqTF5JK6/3YM04qolPMihdf2VKBa0nxyZ1OvNrGPPwD3Ac8mXe3Tafpsr2TVfyhnZmaWKT6DMjOzTHKBMjOzTHKBMjOzTHKBMjOzTNquBWrkyJFB7nsMvvlWqrdWc5741g5uedmuBWrNmjXbc3dmbZLzxCzHXXxmZpZJLlBmZpZJLlBmZpZJHizWzErO+vXrqaurY926dcUOpV3r1KkTFRUVlJWVbdP2LlBmVnLq6uro2rUrvXv3Jhk/1raziGDt2rXU1dVRWVm5TW24i8/MSs66de
vYc889XZyKSBJ77rlnq85iXaCs5PUqL0dSq2+9ysuLfSjWAi5Oxdfav4G7+KzkrVi1irp9K1rdTsU7dSlEY2b58hmUmZW8tM6iW3I23aFDB6qqqjj44IM57bTT+Oijjxqee/jhh5HEq6/+Yxqo2tpaDj74YADmzJnDrrvuyqBBg+jXrx/HHHMMM2fO3Kj9yZMn079/f/r3789hhx3GvHnzGp4bOnQo/fr1o6qqiqqqKqZPn75RTPW32tra1vxaCy6vMyhJ/wP4DrkhKpYA3wbKgfuBPYCFwJkR8WmB4jQz22ZpnUXXy+dseqeddqKmpgaAM844g9tvv52LL74YgGnTpnH00Udz//338+Mf/7jJ7YcMGdJQlGpqahg9ejQ77bQTw4cPZ+bMmdxxxx3MmzePbt26sXDhQkaPHs0LL7zAPvvsA8DUqVOprq7eYkxtQbNnUJJ6ABcB1RFxMNABGANcA9wQEX2BvwBnFzJQM7O2asiQIbz22msAfPjhhzz99NPcdddd3H///XltX1VVxRVXXMHNN98MwDXXXMN1111Ht27dABg8eDBjx47llltuKcwBFEm+XXwdgZ0kdQQ6AyuBYeSmIIbcNM6j0w/PzKxt27BhA7///e855JBDAJgxYwYjR47kgAMOYI899mDhwoV5tTN48OCGLsGlS5dy6KGHbvR8dXU1S5cubXh8xhlnNHTlrV27FoCPP/64YdnJJ5+cxuEVVLNdfBHxtqRJwArgY+APwALg/YjYkKxWB/RoantJ5wDnAPTs2TONmM1KjvOk9NQXA8idQZ19dq6Tadq0aYwfPx6AMWPGMG3aNAYPHtxsexFbHwQ8Ija6aq4UuviaLVCSdgdOAiqB94GHgK81sWqTv72ImAxMBqiurs57mHWz9sR5UnqaKgZr167lySef5OWXX0YSn332GZK49tprm21v0aJFHHjggQAMGDCABQsWMGzYsIbnFy5cyIABA9I9iCLLp4vvK8AbEbE6ItYDvwGOBHZLuvwAKoB3ChSjmVlJmD59OmeddRZvvvkmtbW1vPXWW1RWVm50BV5TFi9ezJVXXsn5558PwMSJE7nkkksauu5qamqYMmUK3/3udwt+DNtTPlfxrQC+JKkzuS6+4cB8YDZwKrkr+cYCjxQqSDOz1ui5zz6pfo+tZ3KlXEtNmzaNSy+9dKNlp5xyCvfddx+XXHLJRsvnzp3LoEGD+Oijj9hrr7342c9+xvDhwwE48cQTefvttznyyCORRNeuXbn33nspL7Evk6u5fk0ASf8GfB3YACwid8l5D/5xmfki4FsR8cnW2qmuro758+e3NmazFpGU2hd188iXVg9f4DxpvWXLljV0h1lxbeFvkVee5PU9qIj4EfCjTRa/DhyWz/ZmZmYt5ZEkzMwsk1ygzMwsk1ygzMwsk1ygzMwsk1ygzMwsk1ygzKzk7VvRM9XpNvatyG84qlWrVjFmzBj69OnDgAEDGDVqFMuXL2fp0qUMGzaMAw44gL59+3LllVc2fIVhypQpXHDBBZu11bt3b9asWbPRsilTptC9e/eNptB45ZVXAFi+fDmjRo1i//3358ADD+T000/ngQceaFivS5cuDVNynHXWWcyZM4fjjz++oe0ZM2YwcOBA+vfvzyGHHMKMGTManhs3bhw9evTgk09y3yxas2YNvXv3btHfJB+esNDMSt7Kt9/i8CseTa29538ystl1IoKTTz6ZsWPHNoxaXlNTw7vvvsu4ceO47bbbGDFiBB999BGnnHIKt956a8NIES3x9a9/vWGU83rr1q3juOOO4/rrr+eEE04AYPbs2XTv3r1h+KWhQ4cyadKkhvH65syZ07D9Sy+9xIQJE5g1axaVlZW88cYbfPWrX2W//fZj4MCBQG5uqbvvvpvzzjuvxTHny2dQZmYFMHv2bMrKyjj33HMbllVVVbF8+XKOOuooRowYAUDnzp25+eabufrqq1Pb93333ccRRxzRUJwAjj322IYJEZszad
IkLr/8ciorKwGorKzksssu47rrrmtYZ/z48dxwww1s2LBhS820mguUmVkBvPzyy5tNiQFNT5XRp08fPvzwQz744IMW76dxt11VVRUff/zxFvedr3ym8+jZsydHH300v/rVr7Z5P81xF5+Z2Xa06bQYjW1p+dY01cXXWk3F2NSyyy+/nBNPPJHjjjsu1f3X8xmUmVkBHHTQQSxYsKDJ5ZuOtfj666/TpUsXunbtWtB9t2T7TWNsajqP/fffn6qqKh588MFt3tfWuECZmRXAsGHD+OSTT/jFL37RsOzFF1+kb9++zJs3j8cffxzITWx40UUXMXHixNT2/c1vfpNnnnmG3/3udw3LHn30UZYsWZLX9hMmTOCnP/0ptbW1ANTW1nLVVVfx/e9/f7N1f/CDHzBp0qRU4t6Uu/jMrOSV9/hCXlfetaS95kji4YcfZvz48Vx99dV06tSJ3r17c+ONN/LII49w4YUXcv755/PZZ59x5plnbnRp+ZQpUza6rPu5554DYODAgeywQ+684vTTT2fgwIE88MADG80ndeutt3LkkUcyc+ZMxo8fz/jx4ykrK2PgwIHcdNNNeR1fVVUV11xzDSeccALr16+nrKyMa6+9tmGG4MYOOuggBg8enPfU9S2R13QbafE0AlYMnm6j/fF0G9nRmuk23MVnZmaZlKkC1au8PLVvevcqsZklzczam0x9BrVi1apUumKAVKd3NrO2Z2uXc9v20dqPkDJ1BmVmloZOnTqxdu3aVv+DtG0XEaxdu5ZOnTptcxuZOoMyM0tDRUUFdXV1rF69utihtGudOnWiomLbe8VcoMys5JSVlTWMI2dtl7v4zMwsk1ygzMwsk1ygzMwsk1ygzMwsk1ygzMwsk/IqUJJ2kzRd0quSlkk6QtIekmZJ+nPyc/dCB2tmZu1HvmdQNwGPRkR/4J+BZcClwBMR0Rd4InlsZmaWimYLlKRdgGOAuwAi4tOIeB84CbgnWe0eYHShgjQzs/YnnzOo/YDVwC8lLZJ0p6Sdgb0jYiVA8nOvpjaWdI6k+ZLm+1vdZk1znphtLp8C1REYDNwWEYOAv9OC7ryImBwR1RFR3b17920M06y0OU/MNpdPgaoD6iLi+eTxdHIF611J5QDJz/cKE6KZmbVHzRaoiFgFvCWpX7JoOPAK8FtgbLJsLPBIQSI0M7N2Kd/BYi8EpkraEXgd+Da54vagpLOBFcBphQnRrHXUoSyV+cHUoSyFaMwsX3kVqIioAaqbeGp4uuGYpS8+W8/hVzza6nae/8nIFKIxs3x5JAkzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8ukvAuUpA6SFkmamTyulPS8pD9LekDSjoUL08zM2puWnEF9D1jW6PE1wA0R0Rf4C3B2moGZmVn7lleBklQBHAfcmTwWMAyYnqxyDzC6EAGamVn7lO8Z1I3ARODz5PGewPsRsSF5XAf0aGpDSedImi9p/urVq1sVrFmpcp6Yba7ZAiXpeOC9iFjQeHETq0ZT20fE5Iiojojq7t27b2OYZqXNeWK2uY55rHMUcKKkUUAnYBdyZ1S7SeqYnEVVAO8ULkwzM2tvmj2DiojLIqIiInoDY4AnI+IMYDZwarLaWOCRgkVpZmbtTmu+B3UJcLGk18h9JnVXOiGZmZnl18XXICLmAHOS+68Dh6UfkpmZmUeSMDOzjHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKB2o56lZcjKZVbr/LyYh+OmVlBtWg+KGudFatWUb
dvRSptVbxTl0o7ZmZZ5TMoMzPLJBcoMzPLJBcoMzPLJBcoMzPLJBcoMzPLJBcoMzPLJBcoMzPLJBcoMzPLJBcoMzPLpGYLlKQvSJotaZmkpZK+lyzfQ9IsSX9Ofu5e+HDNzKy9yOcMagPw/Yg4EPgScL6kAcClwBMR0Rd4InlsZmaWimYLVESsjIiFyf2/AcuAHsBJwD3JavcAowsVpJmZtT8t+gxKUm9gEPA8sHdErIRcEQP22sI250iaL2n+6tWrWxetWYlynphtLu8CJakL8GtgfER8kO92ETE5Iqojorp79+7bEqNZyXOemG0urwIlqYxccZoaEb9JFr8rqTx5vhx4rzAhmplZe5TPVXwC7gKWRcT1jZ76LTA2uT8WeCT98MzMrL3KZ8LCo4AzgSWSapJllwNXAw9KOhtYAZxWmBDNzKw9arZARcQ8QFt4eni64ZiZWTH0Ki9nxapVqbTVc599eHPlyla34ynfzcyMFatWUbdvRSptVbxTl0o7HurIMqlXeTmSUrmVorR+P73Ky4t9KGZb5DMoy6QsvpvLkrR+P6X4u7HS4TMoMzPLpJI9g/onSK17J60P/Cx/6lDmd/dm7VzJFqhPwF1EbVh8tp7Dr3g0lbae/8nIVNoxs+3LXXxmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJJTvUkZmZ5S/N8S/VoSyVdlygzMwsk+NfuovPrB2rH/Xfkx9aFvkMyqwd86j/lmU+gzIzs0xygbLU7FvRM7XuIjMzd/FZala+/VbmPmQ1s7YrUwUqi5c5mtn216u8nBWrVrW6nZ777MObK1emEJEVQ6YKVBYvc8yq+quv0uAktqxZsWpVKhdv+MKNtq1VBUrSSOAmoANwZ0RcnUpU1ixffWVmpW6bL5KQ1AG4BfgaMAD4hqQBaQVmZtZaWf2eV6/y8lRi6tyhY0lfmNSaM6jDgNci4nUASfcDJwGvpBGYmVlrZbWnIc0uzCweX1oUEdu2oXQqMDIivpM8PhM4PCIu2GS9c4Bzkof9gD9tpdluwJptCqht8PG1bfkc35qIaPEHoC3Mk3xjact8fG1bc8eXV5605gyqqXPCzapdREwGJufVoDQ/IqpbEVOm+fjatkIeX0vypNCxZIGPr21L6/ha80XdOuALjR5XAO+0LhwzM7Oc1hSoF4G+kiol7QiMAX6bTlhmZtbebXMXX0RskHQB8Bi5y8zvjoilrYwn7y6ONsrH17Zl6fiyFEsh+PjatlSOb5svkjAzMyskDxZrZmaZ5AJlZmaZlJkCJWmkpD9Jek3SpcWOJ02SviBptqRlkpZK+l6xY0qbpA6SFkmaWexYCkHSbpKmS3o1+TseUaQ4nCdtXCnnStp5konPoJJhk5YDXyV3+fqLwDcioiRGpZBUDpRHxEJJXYEFwOhSOT4ASRcD1cAuEXF8seNJm6R7gLkRcWdy1WrniHh/O8fgPCkBpZwraedJVs6gGoZNiohPgfphk0pCRKyMiIXJ/b8By4AexY0qPZIqgOOAO4sdSyFI2gU4BrgLICI+3d7FKeE8aeNKOVcKkSdZKVA9gLcaPa6jxF6Y9ST1BgYBzxc3klTdCEwEPi92IAWyH7Aa+GXSNXOnpJ2LEIfzpO0r5VxJPU+yUqDyGjaprZPUBfg1MD4iPih2PGmQdDzwXkQsKHYsBdQRGAzcFhGDgL8Dxfj8x3nShrWDXEk9T7JSoEp+2CRJZeSSbmpE/KbY8aToKOBESbXkupyGSbq3uCGlrg6oi4j6d/PTySViMeJwnrRdpZ4rqedJVgpUSQ+bpNxkK3cByyLi+mLHk6aIuCwiKiKiN7m/25MR8a0ih5WqiFgFvCWpX7JoOMWZVsZ50oaVeq4UIk8yMeV7gYZNypKjgDOBJZJqkmWXR8R/FjEma5kLgalJYXgd+Pb2DsB5Ym1AqnmSicvMzczMNpWVLj4zM7ONuE
CZmVkmuUCZmVkmuUCZmVkmuUCZmVkmuUBlgKQfS5qQYnv9JdUkw430SavdRu3PkVSddrtmW+M8aX9coErTaOCRiBgUEf9V7GDMMsp5knEuUEUi6QfJvD6PA/2SZf8q6UVJL0n6taTOkrpKeiMZAgZJu0iqlVQmqUrSc5IWS3pY0u6SRgHjge8kc+tMlHRRsu0Nkp5M7g+vH2ZF0ghJz0paKOmhZCw0JB0q6Y+SFkh6LJkOofEx7CDpHkn/e7v94qxdcZ60by5QRSDpUHJDnQwC/gX4YvLUbyLiixHxz+SmGjg7mXZgDrkh+km2+3VErAf+HbgkIgYCS4AfJd+6vx24ISKOBZ4ChiTbVgNdkiQ+GpgrqRvwQ+ArETEYmA9cnKzzc+DUiDgUuBv4P40OoyMwFVgeET9M8ddjBjhPLCNDHbVDQ4CHI+IjAEn146kdnLzL2g3oQm5IG8jNHTMRmEFu6JB/lbQrsFtE/DFZ5x7goSb2tQA4VLkJ4D4BFpJLwCHARcCXgAHA07mh0NgReJbcu9WDgVnJ8g7Aykbt3gE8GBGNk9EsTc6Tds4FqniaGmNqCrkZRF+SNA4YChART0vqLenLQIeIeDlJvOZ3ErFeudGTvw08AywGjgX6kHv32QeYFRHfaLydpEOApRGxpSmbnwGOlfR/I2JdPrGYbQPnSTvmLr7ieAo4WdJOyTu2E5LlXYGVSbfBGZts8+/ANOCXABHxV+Avkuq7Jc4E/kjTngImJD/nAucCNZEbiPE54ChJ+wMk/fkHAH8Cuks6IlleJumgRm3eBfwn8JAkv9GxQnCetHMuUEWQTGv9AFBDbu6buclT/4vcDKKzgFc32WwqsDu55Ks3FrhO0mKgCvjJFnY5FygHno2Id4F19fuMiNXAOGBa0s5zQP9kSvFTgWskvZTEeuQmx3E9ua6QX0nya8lS5Twxj2beRkg6FTgpIs4sdixmWeU8KS0+5WwDJP0c+BowqtixmGWV86T0+AzKzMwyyf2hZmaWSS5QZmaWSS5QZmaWSS5QZmaWSS5QZmaWSf8feZ3K8s9z83MAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x216 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df['dayofweek'] = df['effective_date'].dt.dayofweek\n",
"bins = np.linspace(df.dayofweek.min(), df.dayofweek.max(), 10)\n",
"g = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\n",
"g.map(plt.hist, 'dayofweek', bins=bins, ec=\"k\")\n",
"g.axes[-1].legend()\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"We see that people who get the loan at the end of the week dont pay it off, so lets use Feature binarization to set a threshold values less then day 4 "
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>Unnamed: 0.1</th>\n",
" <th>loan_status</th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>effective_date</th>\n",
" <th>due_date</th>\n",
" <th>age</th>\n",
" <th>education</th>\n",
" <th>Gender</th>\n",
" <th>dayofweek</th>\n",
" <th>weekend</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-10-07</td>\n",
" <td>45</td>\n",
" <td>High School or Below</td>\n",
" <td>male</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-10-07</td>\n",
" <td>33</td>\n",
" <td>Bechalor</td>\n",
" <td>female</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-09-22</td>\n",
" <td>27</td>\n",
" <td>college</td>\n",
" <td>male</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-09</td>\n",
" <td>2016-10-08</td>\n",
" <td>28</td>\n",
" <td>college</td>\n",
" <td>female</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>6</td>\n",
" <td>6</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-09</td>\n",
" <td>2016-10-08</td>\n",
" <td>29</td>\n",
" <td>college</td>\n",
" <td>male</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n",
"0 0 0 PAIDOFF 1000 30 2016-09-08 \n",
"1 2 2 PAIDOFF 1000 30 2016-09-08 \n",
"2 3 3 PAIDOFF 1000 15 2016-09-08 \n",
"3 4 4 PAIDOFF 1000 30 2016-09-09 \n",
"4 6 6 PAIDOFF 1000 30 2016-09-09 \n",
"\n",
" due_date age education Gender dayofweek weekend \n",
"0 2016-10-07 45 High School or Below male 3 0 \n",
"1 2016-10-07 33 Bechalor female 3 0 \n",
"2 2016-09-22 27 college male 3 0 \n",
"3 2016-10-08 28 college female 4 1 \n",
"4 2016-10-08 29 college male 4 1 "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['weekend'] = df['dayofweek'].apply(lambda x: 1 if (x>3) else 0)\n",
"df.head()"
]
},
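{
"cell_type": "markdown",
"metadata": {},
"source": [
"The threshold above relies on pandas numbering weekdays from Monday = 0 to Sunday = 6, so `x > 3` flags Friday through Sunday. The cell below is a minimal sketch of that mapping on a few hand-picked dates; `demo_dates` and `demo` are illustrative names that are not part of the original notebook, and the comparison is written in vectorized form rather than with `apply`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical sample: four consecutive dates starting on a Thursday\n",
"demo_dates = pd.to_datetime(['2016-09-08', '2016-09-09', '2016-09-10', '2016-09-11'])\n",
"demo = pd.DataFrame({'effective_date': demo_dates})\n",
"\n",
"# dt.dayofweek counts Monday as 0, so Thursday is 3 and Friday is 4\n",
"demo['dayofweek'] = demo['effective_date'].dt.dayofweek\n",
"\n",
"# Same rule as the notebook, written as a vectorized comparison\n",
"demo['weekend'] = (demo['dayofweek'] > 3).astype(int)\n",
"demo"
]
},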
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## Convert Categorical features to numerical values"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Lets look at gender:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"Gender loan_status\n",
"female PAIDOFF 0.865385\n",
" COLLECTION 0.134615\n",
"male PAIDOFF 0.731293\n",
" COLLECTION 0.268707\n",
"Name: loan_status, dtype: float64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['Gender'])['loan_status'].value_counts(normalize=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"86 % of female pay there loans while only 73 % of males pay there loan\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Lets convert male to 0 and female to 1:\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>Unnamed: 0.1</th>\n",
" <th>loan_status</th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>effective_date</th>\n",
" <th>due_date</th>\n",
" <th>age</th>\n",
" <th>education</th>\n",
" <th>Gender</th>\n",
" <th>dayofweek</th>\n",
" <th>weekend</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-10-07</td>\n",
" <td>45</td>\n",
" <td>High School or Below</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-10-07</td>\n",
" <td>33</td>\n",
" <td>Bechalor</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-09-22</td>\n",
" <td>27</td>\n",
" <td>college</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-09</td>\n",
" <td>2016-10-08</td>\n",
" <td>28</td>\n",
" <td>college</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>6</td>\n",
" <td>6</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-09</td>\n",
" <td>2016-10-08</td>\n",
" <td>29</td>\n",
" <td>college</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n",
"0 0 0 PAIDOFF 1000 30 2016-09-08 \n",
"1 2 2 PAIDOFF 1000 30 2016-09-08 \n",
"2 3 3 PAIDOFF 1000 15 2016-09-08 \n",
"3 4 4 PAIDOFF 1000 30 2016-09-09 \n",
"4 6 6 PAIDOFF 1000 30 2016-09-09 \n",
"\n",
" due_date age education Gender dayofweek weekend \n",
"0 2016-10-07 45 High School or Below 0 3 0 \n",
"1 2016-10-07 33 Bechalor 1 3 0 \n",
"2 2016-09-22 27 college 0 3 0 \n",
"3 2016-10-08 28 college 1 4 1 \n",
"4 2016-10-08 29 college 0 4 1 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Gender'].replace(to_replace=['male','female'], value=[0,1],inplace=True)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## One Hot Encoding \n",
"#### How about education?"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"education loan_status\n",
"Bechalor PAIDOFF 0.750000\n",
" COLLECTION 0.250000\n",
"High School or Below PAIDOFF 0.741722\n",
" COLLECTION 0.258278\n",
"Master or Above COLLECTION 0.500000\n",
" PAIDOFF 0.500000\n",
"college PAIDOFF 0.765101\n",
" COLLECTION 0.234899\n",
"Name: loan_status, dtype: float64"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['education'])['loan_status'].value_counts(normalize=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"#### Feature befor One Hot Encoding"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>age</th>\n",
" <th>Gender</th>\n",
" <th>education</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>45</td>\n",
" <td>0</td>\n",
" <td>High School or Below</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>33</td>\n",
" <td>1</td>\n",
" <td>Bechalor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>27</td>\n",
" <td>0</td>\n",
" <td>college</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>28</td>\n",
" <td>1</td>\n",
" <td>college</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>29</td>\n",
" <td>0</td>\n",
" <td>college</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Principal terms age Gender education\n",
"0 1000 30 45 0 High School or Below\n",
"1 1000 30 33 1 Bechalor\n",
"2 1000 15 27 0 college\n",
"3 1000 30 28 1 college\n",
"4 1000 30 29 0 college"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['Principal','terms','age','Gender','education']].head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"#### Use one hot encoding technique to conver categorical varables to binary variables and append them to the feature Data Frame "
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>age</th>\n",
" <th>Gender</th>\n",
" <th>weekend</th>\n",
" <th>Bechalor</th>\n",
" <th>High School or Below</th>\n",
" <th>college</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>45</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>33</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>27</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>28</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>29</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Principal terms age Gender weekend Bechalor High School or Below \\\n",
"0 1000 30 45 0 0 0 1 \n",
"1 1000 30 33 1 0 1 0 \n",
"2 1000 15 27 0 0 0 0 \n",
"3 1000 30 28 1 1 0 0 \n",
"4 1000 30 29 0 1 0 0 \n",
"\n",
" college \n",
"0 0 \n",
"1 0 \n",
"2 1 \n",
"3 1 \n",
"4 1 "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Feature = df[['Principal','terms','age','Gender','weekend']]\n",
"Feature = pd.concat([Feature,pd.get_dummies(df['education'])], axis=1)\n",
"Feature.drop(['Master or Above'], axis = 1,inplace=True)\n",
"Feature.head()\n"
]
},
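{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the encoding step concrete, here is a small sketch on a hand-made Series rather than the loan data. `demo_edu` and `demo_dummies` are illustrative names, and the four values are simply one example per education level seen above (keeping the dataset's own spelling). After one level is dropped, a row of all zeros stands for that dropped level."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical education values, one per level seen in the dataset\n",
"demo_edu = pd.Series(['college', 'Bechalor', 'High School or Below', 'Master or Above'], name='education')\n",
"\n",
"# One indicator column per category; dtype=int keeps 0/1 instead of booleans\n",
"demo_dummies = pd.get_dummies(demo_edu, dtype=int)\n",
"\n",
"# Dropping one level mirrors the notebook: an all-zero row then means 'Master or Above'\n",
"demo_dummies = demo_dummies.drop(columns=['Master or Above'])\n",
"demo_dummies"
]
},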
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Feature selection"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Lets defind feature sets, X:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>age</th>\n",
" <th>Gender</th>\n",
" <th>weekend</th>\n",
" <th>Bechalor</th>\n",
" <th>High School or Below</th>\n",
" <th>college</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>45</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>33</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>27</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>28</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>29</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Principal terms age Gender weekend Bechalor High School or Below \\\n",
"0 1000 30 45 0 0 0 1 \n",
"1 1000 30 33 1 0 1 0 \n",
"2 1000 15 27 0 0 0 0 \n",
"3 1000 30 28 1 1 0 0 \n",
"4 1000 30 29 0 1 0 0 \n",
"\n",
" college \n",
"0 0 \n",
"1 0 \n",
"2 1 \n",
"3 1 \n",
"4 1 "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = Feature\n",
"X[0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"What are our lables?"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'],\n",
" dtype=object)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y = df['loan_status'].values\n",
"y[0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Try to understand the corelation of loan_status and the selected features"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>age</th>\n",
" <th>Gender</th>\n",
" <th>weekend</th>\n",
" <th>Bechalor</th>\n",
" <th>High School or Below</th>\n",
" <th>college</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Principal</th>\n",
" <td>1.000000</td>\n",
" <td>0.521876</td>\n",
" <td>-0.060893</td>\n",
" <td>-0.005134</td>\n",
" <td>0.089006</td>\n",
" <td>0.022212</td>\n",
" <td>0.011206</td>\n",
" <td>-0.021506</td>\n",
" </tr>\n",
" <tr>\n",
" <th>terms</th>\n",
" <td>0.521876</td>\n",
" <td>1.000000</td>\n",
" <td>-0.064762</td>\n",
" <td>-0.032399</td>\n",
" <td>0.084842</td>\n",
" <td>-0.057337</td>\n",
" <td>0.101787</td>\n",
" <td>-0.052172</td>\n",
" </tr>\n",
" <tr>\n",
" <th>age</th>\n",
" <td>-0.060893</td>\n",
" <td>-0.064762</td>\n",
" <td>1.000000</td>\n",
" <td>-0.010519</td>\n",
" <td>0.000431</td>\n",
" <td>0.057065</td>\n",
" <td>0.066836</td>\n",
" <td>-0.131585</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Gender</th>\n",
" <td>-0.005134</td>\n",
" <td>-0.032399</td>\n",
" <td>-0.010519</td>\n",
" <td>1.000000</td>\n",
" <td>-0.079157</td>\n",
" <td>0.082229</td>\n",
" <td>-0.043927</td>\n",
" <td>-0.006420</td>\n",
" </tr>\n",
" <tr>\n",
" <th>weekend</th>\n",
" <td>0.089006</td>\n",
" <td>0.084842</td>\n",
" <td>0.000431</td>\n",
" <td>-0.079157</td>\n",
" <td>1.000000</td>\n",
" <td>0.016430</td>\n",
" <td>-0.064819</td>\n",
" <td>0.044184</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bechalor</th>\n",
" <td>0.022212</td>\n",
" <td>-0.057337</td>\n",
" <td>0.057065</td>\n",
" <td>0.082229</td>\n",
" <td>0.016430</td>\n",
" <td>1.000000</td>\n",
" <td>-0.335888</td>\n",
" <td>-0.331958</td>\n",
" </tr>\n",
" <tr>\n",
" <th>High School or Below</th>\n",
" <td>0.011206</td>\n",
" <td>0.101787</td>\n",
" <td>0.066836</td>\n",
" <td>-0.043927</td>\n",
" <td>-0.064819</td>\n",
" <td>-0.335888</td>\n",
" <td>1.000000</td>\n",
" <td>-0.765299</td>\n",
" </tr>\n",
" <tr>\n",
" <th>college</th>\n",
" <td>-0.021506</td>\n",
" <td>-0.052172</td>\n",
" <td>-0.131585</td>\n",
" <td>-0.006420</td>\n",
" <td>0.044184</td>\n",
" <td>-0.331958</td>\n",
" <td>-0.765299</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Principal terms age Gender weekend \\\n",
"Principal 1.000000 0.521876 -0.060893 -0.005134 0.089006 \n",
"terms 0.521876 1.000000 -0.064762 -0.032399 0.084842 \n",
"age -0.060893 -0.064762 1.000000 -0.010519 0.000431 \n",
"Gender -0.005134 -0.032399 -0.010519 1.000000 -0.079157 \n",
"weekend 0.089006 0.084842 0.000431 -0.079157 1.000000 \n",
"Bechalor 0.022212 -0.057337 0.057065 0.082229 0.016430 \n",
"High School or Below 0.011206 0.101787 0.066836 -0.043927 -0.064819 \n",
"college -0.021506 -0.052172 -0.131585 -0.006420 0.044184 \n",
"\n",
" Bechalor High School or Below college \n",
"Principal 0.022212 0.011206 -0.021506 \n",
"terms -0.057337 0.101787 -0.052172 \n",
"age 0.057065 0.066836 -0.131585 \n",
"Gender 0.082229 -0.043927 -0.006420 \n",
"weekend 0.016430 -0.064819 0.044184 \n",
"Bechalor 1.000000 -0.335888 -0.331958 \n",
"High School or Below -0.335888 1.000000 -0.765299 \n",
"college -0.331958 -0.765299 1.000000 "
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merge = pd.concat([X, df['loan_status']], axis=1, sort=False)\n",
"merge.head()\n",
"merge.corr(method='pearson')"
]
},
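{
"cell_type": "markdown",
"metadata": {},
"source": [
"The matrix above contains only the feature columns, because `loan_status` holds text and cannot take part in a Pearson correlation. One possible way to bring the label in, sketched here under the assumption that `X` and `df` from the earlier cells are still in memory, is to encode it as 1 for PAIDOFF and 0 for COLLECTION first. `status_num` is an illustrative name, and this encoding is a choice made for the sketch, not something the original notebook does."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumes X and df from the cells above; status_num is an illustrative name\n",
"status_num = (df['loan_status'] == 'PAIDOFF').astype(int)  # 1 = PAIDOFF, 0 = COLLECTION\n",
"\n",
"# Pearson correlation of each feature column with the encoded label\n",
"X.astype(float).corrwith(status_num).sort_values()"
]
},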
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Split the dataset into training and test sets"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X_train size is (276, 8) \n",
" X_test size is (70, 8) \n",
" y_train size is (276,) \n",
" y_test size is (70,)\n",
" Principal terms age Gender weekend Bechalor High School or Below \\\n",
"188 1000 15 35 0 0 0 0 \n",
"299 1000 30 26 0 1 0 1 \n",
"239 1000 30 31 0 0 0 0 \n",
"46 1000 15 25 0 1 0 0 \n",
"259 1000 30 28 0 0 0 0 \n",
"\n",
" college \n",
"188 1 \n",
"299 0 \n",
"239 1 \n",
"46 1 \n",
"259 1 \n"
]
},
{
"data": {
"text/plain": [
"array(['PAIDOFF', 'COLLECTION', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'],\n",
" dtype=object)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)\n",
"print(\"X_train size is \", X_train.shape, \"\\n\", \"X_test size is \", X_test.shape, \"\\n\",\n",
" \"y_train size is \", y_train.shape, \"\\n\", \"y_test size is \", y_test.shape)\n",
"print(X_train[0:5])\n",
"y_train[0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## Normalize Data "
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Data standardization gives the data zero mean and unit variance (technically, it should be done after the train/test split)."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0.51578458, 0.92071769, 2.33152555, -0.42056004, -1.20577805,\n",
" -0.38170062, 1.13639374, -0.86968108],\n",
" [ 0.51578458, 0.92071769, 0.34170148, 2.37778177, -1.20577805,\n",
" 2.61985426, -0.87997669, -0.86968108],\n",
" [ 0.51578458, -0.95911111, -0.65321055, -0.42056004, -1.20577805,\n",
" -0.38170062, -0.87997669, 1.14984679],\n",
" [ 0.51578458, 0.92071769, -0.48739188, 2.37778177, 0.82934003,\n",
" -0.38170062, -0.87997669, 1.14984679],\n",
" [ 0.51578458, 0.92071769, -0.3215732 , -0.42056004, 0.82934003,\n",
" -0.38170062, -0.87997669, 1.14984679]])"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = preprocessing.StandardScaler().fit(X).transform(X.astype(float))\n",
"X[0:5]"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0.33474248, 0.83916906, -0.19614926, -0.47756693, 0.74535599,\n",
" -0.2773501 , 1.26197963, -1.05887304],\n",
" [-1.70282047, -0.9301633 , -0.19614926, -0.47756693, 0.74535599,\n",
" -0.2773501 , -0.79240582, 0.94440028],\n",
" [ 0.33474248, -0.9301633 , -0.04012144, -0.47756693, -1.34164079,\n",
" -0.2773501 , 1.26197963, -1.05887304],\n",
" [ 0.33474248, 0.83916906, -1.13231619, -0.47756693, -1.34164079,\n",
" -0.2773501 , -0.79240582, 0.94440028],\n",
" [ 0.33474248, 0.83916906, 0.42796202, -0.47756693, -1.34164079,\n",
" -0.2773501 , -0.79240582, 0.94440028]])"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
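"# Standardize the train and test sets as well. Each set is scaled with its own\n",
"# StandardScaler here; a stricter approach would reuse the scaler fitted on X_train for X_test.\n",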
"X_train = preprocessing.StandardScaler().fit(X_train).transform(X_train.astype(float))\n",
"X_train[0:5]\n",
"X_test = preprocessing.StandardScaler().fit(X_test).transform(X_test.astype(float))\n",
"X_test[0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# Classification "
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Now it is your turn: use the training set to build an accurate model, then use the test set to report the model's accuracy.\n",
"You should use the following algorithms:\n",
"- K Nearest Neighbor(KNN)\n",
"- Decision Tree\n",
"- Support Vector Machine\n",
"- Logistic Regression\n",
"\n",
"\n",
"\n",
"__Notice:__ \n",
"- You can go back and change the pre-processing, feature selection, feature extraction, and so on, to build a better model.\n",
"- You should use the scikit-learn, SciPy, or NumPy libraries to develop the classification algorithms.\n",
"- You should include the code for each algorithm in the following cells."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# K Nearest Neighbor(KNN)\n",
"Notice: You should find the best k to build the model with the best accuracy. \n",
"**Warning:** You should not use __loan_test.csv__ to find the best k; however, you can split __Loan_train.csv__ into train and test sets to find the best __k__."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### k-Nearest Neighbors test - find the best k value"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Test set Accuracy at k= 1 : 0.6714285714285714\n",
"Test set Accuracy at k= 2 : 0.6428571428571429\n",
"Test set Accuracy at k= 3 : 0.7285714285714285\n",
"Test set Accuracy at k= 4 : 0.6571428571428571\n",
"Test set Accuracy at k= 5 : 0.7142857142857143\n",
"Test set Accuracy at k= 6 : 0.6571428571428571\n",
"Test set Accuracy at k= 7 : 0.7428571428571429\n",
"Test set Accuracy at k= 8 : 0.7428571428571429\n",
"Test set Accuracy at k= 9 : 0.7142857142857143\n"
]
},
{
"data": {
"text/plain": [
"Text(0,0.5,'Testing Accuracy')"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# finding a suitable k value\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.metrics import jaccard_similarity_score\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"k_range = range(1, 10)\n",
"accuracy_score = []\n",
"for k in k_range:\n",
"    KNN = KNeighborsClassifier(n_neighbors = k).fit(X_train, y_train)\n",
"    # predict on the held-out test split and compute the score once per k\n",
"    knn_yhat = KNN.predict(X_test)\n",
"    score = jaccard_similarity_score(y_test, knn_yhat)\n",
"    print(\"Test set Accuracy at k=\", k, \": \", score)\n",
"    accuracy_score.append(score)\n",
"\n",
"# plot the relationship between K and testing accuracy\n",
"plt.plot(k_range, accuracy_score)\n",
"plt.xlabel('Value of K for KNN')\n",
"plt.ylabel('Testing Accuracy')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The results show that the best test accuracy (about 0.743) came from k = 7, tied with k = 8"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Perform k-Nearest Neighbors test using k = 7"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
" metric_params=None, n_jobs=1, n_neighbors=7, p=2,\n",
" weights='uniform')"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# K Nearest Neighbors\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"# train the final model with k = 7\n",
"KNN = KNeighborsClassifier(n_neighbors = 7).fit(X_train, y_train)\n",
"KNN"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Decision Tree"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Decision Trees test - find the best Depth"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Depth</th>\n",
" <th>F1-score</th>\n",
" <th>Jacard</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>d=3</th>\n",
" <td>0.620577</td>\n",
" <td>0.585714</td>\n",
" </tr>\n",
" <tr>\n",
" <th>d=4</th>\n",
" <td>0.620577</td>\n",
" <td>0.585714</td>\n",
" </tr>\n",
" <tr>\n",
" <th>d=5</th>\n",
" <td>0.648789</td>\n",
" <td>0.614286</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Depth F1-score Jacard\n",
"d=3 0.620577 0.585714\n",
"d=4 0.620577 0.585714\n",
"d=5 0.648789 0.614286"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# find the best depth level\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.metrics import f1_score\n",
"from sklearn.metrics import jaccard_similarity_score\n",
"\n",
"# Compare accuracy result for depth = 3, 4 and 5\n",
"d_range = range(3, 6)\n",
"f1 = []\n",
"ja = []\n",
"for d in d_range:\n",
" DT = DecisionTreeClassifier(criterion=\"entropy\", max_depth=d)\n",
" DT.fit(X_train, y_train)\n",
" dt_yhat = DT.predict(X_test)\n",
" f1.append(f1_score(y_test, dt_yhat, average='weighted'))\n",
" ja.append(jaccard_similarity_score(y_test, dt_yhat))\n",
"\n",
"result = pd.DataFrame(f1, index=['d=3','d=4', 'd=5'])\n",
"result.columns = ['F1-score']\n",
"result.insert(loc=1, column='Jacard', value=ja)\n",
"result.columns.name = \"Depth\"\n",
"result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The results show that Depth = 5 gives the highest F1-score and Jaccard score of the three depths tested"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Perform Decision Trees using Depth = 5"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=5,\n",
" max_features=None, max_leaf_nodes=None,\n",
" min_impurity_decrease=0.0, min_impurity_split=None,\n",
" min_samples_leaf=1, min_samples_split=2,\n",
" min_weight_fraction_leaf=0.0, presort=False, random_state=None,\n",
" splitter='best')"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# for Decision Trees\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"# prepare DT setting\n",
"DT = DecisionTreeClassifier(criterion=\"entropy\", max_depth=5)\n",
"# perform the test\n",
"DT.fit(X_train, y_train)\n",
"DT"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Support Vector Machine"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Support Vector Machines test - find the best kernel function"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/jupyterlab/conda/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.\n",
" 'precision', 'predicted', average, warn_for)\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEWCAYAAACJ0YulAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3XmYHVW57/Hvj4RAEGRKjgqJJEpQg0CQECcQEPAwmeBwNVGPoAyHc42RSQ8qIjeeixccuD4Sh4AIyBCBwxAwGgYZBAHTYC4QQiCEIW1Em3mGhLz3j7W6qGx2997d6eqdJr/P8+ynq1atqnqruna9u1btWlsRgZmZGcA6rQ7AzMzWHE4KZmZWcFIwM7OCk4KZmRWcFMzMrOCkYGZmBScFe8OT9C1JZ/TzOj8s6X5Jz0k6sD/XvaaRFJK2bnUcXWnF8bEmc1LoA5Kul/SkpPVaHUtVlEyTdLek5yW1S7pI0natjq2RiDgpIg7t59VOB06LiA0j4rK+WqikIZLuldTeTZ3dy9PzPJdIulnSm/sqlr6Q3zsv5eTZ+fpghevbvXbftej4WGM5KawmSaOAXYEAJvbzugf34+p+AnwNmAZsBmwDXAbs348x9Fg/76OyrYAFvZmxQcxfB/7Zg2WtB1wCbAJ8LCKe6cNY+srUnDw7X7f0wzqtKxHh12q8gBOAm4EfA1fWTBsK/Ah4GHgauAkYmqftAvwZeApYChycy68HDi0t42DgptJ4AF8B7gcezGU/yct4Brgd2LVUfxDwLeAB4Nk8fSQwA/hRTbxXAEfW2cYxwKvAhG72w8bAOUBH3t7jgXVK23AzcGre3iXAh3L5UtJJ7qDSss4CfgFcnWO+AdiqNL277T0RuBg4N08/NJedm6evn6c9nmOZB7wlT9sCmA08ASwGDqtZ7oV5G58lnfDHd7EvHgBWAi8CzwHrNbHsVWLuYrmjgYXAvkB7N/+L3YF2YAPgKmAu+bjL09cBjstxPp63a7M8bRTpGDsEeAS4sVR2UC57DPh2aXkTgFvy/vw7cBowpOaY3bqLWK+vt72ldQ6uVzcfOzcBPwSeBB4E9i3V3Qz4NbAsT78MeFP+n6zM/5fn8v+lOD7yvBPz//epvM73lKY9BBwL3El6T/8WWD9PGwZcmed7AvgT+T0wkF4tD2Cgv/Ib/H8COwHLySeYPG1GPqi2JJ2cP5RPEG8nnVimAOsCmwPj8jyrvEmonxSuzgd9Z4L5Ql7GYOAY4NHSgfp14C7gXYCAHXLdCfkN03niHga8UI6/tM4jgIcb7IdzgMuBjfIb+j7gkNI2rAC+lPfDf5FOLjPy/vhY3h8b5vpn5fGP5Ok/qdkH3W3vifn/cCDp5DeUVZPCv5OS3wY5lp2AN+dpNwA/IyWOcaQEt2dpuS8B++X5vg/c2s3+eAjYqzTeaNmrxNzFMq8EPkE+6Xez7t3z8m8gJaL1aqYfCdwKjMj795fABXnaKNIxdg7pJDq0VHZ6Ht8BeJl8ssz78AP5/zGKlLiOLK2vqqSwHDgs/z/+g3Q8K0//HemEvSnpPbZbad+016yrfHxsAzwP7J3n+wbpPT6k9H/9CymZbJa39Yg87fukDzPr5teunfEMpFfLAxjIL9Kn/eXAsDx+L3BUHl6H9KlkhzrzfRO4tItlrvImoX5S+GiDuJ7sXC+wCJjURb2FwN55eCowp4t636b7E+CgfJIYWyr7d+D60jbcX5q2Xd6OcgJ9nNcS41nArNK0DUlXKiOb2N4TgRtrppff9F8mXaFtX1NnZF7HRqWy7wNnlZZxTWnaWODFbvbJQ+Sk0OSyb+xqWbnOJ4A/5OHdaZwUXgJeAT7Vxf99z9L42/Jx3HlSD+AdpemdZSNKZX8BJnex/iMpHd80TgovkD5dPwXcUbPO7pLC4tK0DXL9t+btWQls2sW+6S4pfAe4sDRtHeBvwO6l/+sXStNPAX6Rh6eTPhjV3d
aB8vI9hdVzEHBVRDyWx8/PZZA+ea9PukSvNbKL8mYtLY9IOkbSQklPS3qK1JQzrIl1nU361E3++5su6j1OeqN1ZRgwhNRs1Olh0hVSp3+Uhl8EiIjasg1L48U2RsRzpMvxLaDh9q4ybx2/ITWnzJK0TNIpktbNy34iIp7tZhseLQ2/AKzfZJt7M8vuMmZJbyKdfL7axLo6PQZMBs6W9K8107YCLpX0VN5/C0lJ6y0N4qnd/g1zfNtIulLSo5KeAU5i1f9HI9MiYpP8el8P5iviiYgX8uCGpGP+iYh4sgfL6rQFpeM4IlaS9kV3x0HncfsD0lXFVZKWSDquF+tvOSeFXpI0FPgMsFt+MzwKHAXsIGkH0pvyJeCddWZf2kU5pEvXDUrjb61TJ0px7Ar8Z45l04jYhNTWqSbWdS4wKcf7HlK7az3XAiMkje9i+mOkT5pblcreTvqE1VsjOwckbUi6VF/WxPZCaf/UiojlEfG/ImIsqTnvAOCLpKaHzSRt1Ifb0KmZZXcZM+mezijgT/k4uwR4Wz7uRnU1U0RcQmpeuVjSHqVJS0nt75uUXutHRLPx1Po56Sp5TES8mXQPS93P0tDz+W+j90I9S0n7e5M60xpt1zJKx7EkkY7FhsdBRDwbEcdExDuAjwNHS9qzyZjXGE4KvXcg6dPVWFIb8TjSifVPwBfzJ4wzgR9L2kLSIEkfzN8GOQ/YS9JnJA2WtLmkcXm584FPStogf7f7kAZxbERqr+8ABks6ASh/7fAM4HuSxuSvlW4vaXOAiGgn3Wj9DfDfEfFivRVExP2k9vAL8lf6hkhaX9JkScdFxKukm5X/W9JGkrYCjiYlnd7aT9IukoYA3wNui4ilTWxvtyTtIWk7SYNIN3WXA6/mZf8Z+H7etu1J+/681dgGAPpg2XeTTkydx9mhpCuvcXR/VUREXEBqGrxc0odz8S9I/6utACQNlzSpZ1u1io1I+/I5Se8mte+vlojoIJ2Iv5DfO1+m6w83tfP+Hfg98DNJm0paV9JH8uR/AJtL2riL2S8E9pe0Z76CPIbUNPrnRuuVdICkrXMieYZ0fni1mZjXJE4KvXcQ8OuIeCQiHu18kb558fncrHAs6SbvPFLzx8mkG7uPkG5YHpPL55Nu3kH6hs4rpIP3bBqfOOaS3gD3kS57X2LVE8WPSQf6VaQD9Vekm4Wdzia18XfVdNRpWt62GaS23wdI7dxX5OlfJX26W0L6Vsj5pKTYW+cD3yXtn52Az+fyRtvbyFtJ3/R5htRscgOvJa8ppE/ky4BLge9GxNWrsQ1lvV52RKyoOcaeAFbm8YYnnYg4m3Ss/U7SBNKN+9mkZo5nSTed39+bjcqOBT5H+nLA6aQbvH3hMNIXJR4HtqWJE3PJv5ES/r2kb7cdCRAR9wIXAEty89kW5ZkiYhGpKfWnpCvgjwMfj4hXmljnGOAa0reabgF+FhHX9yDmNULnnXpbS+VPUOcCo/LVTctJOot0M/D4VsditrbxlcJaLF8efw04Y01JCGbWWk4KaylJ7yE1A70N+L8tDsfM1hBuPjIzs4KvFMzMrNCqzsJ6bdiwYTFq1KhWh2FmNqDcfvvtj0XE8Eb1BlxSGDVqFG1tba0Ow8xsQJH0cONabj4yM7MSJwUzMys4KZiZWcFJwczMCk4KZmZWcFIwM7OCk4KZmRWcFMzMrOCkYGZmhQH3RLPZQHXq1fe1OoSWO2rvbVodgjXgKwUzMys4KZiZWcFJwczMCpUmBUn7SFokabGk4+pMP1XS/Py6T9JTVcZjZmbdq+xGs6RBwAxgb6AdmCdpdkTc01knIo4q1f8qsGNV8ZiZWWNVXilMABZHxJKIeAWYBUzqpv4U4IIK4zEzswaqTApbAktL4+257HUkbQWMBv5YYTxmZtZAlUlBdcqii7qTgYsj4tW6C5IOl9Qmqa2jo6PPAjQzs1VVmRTagZGl8RHAsi7qTqabpqOImBkR4yNi/PDhDX9i1MzMeqnKJ5
rnAWMkjQb+Rjrxf662kqR3AZsCt1QYC+AnSv00qZk1UllSiIgVkqYCc4FBwJkRsUDSdKAtImbnqlOAWRHRVdOSrSGcVJ1U7Y2v0r6PImIOMKem7ISa8ROrjMHMzJrnDvHMbMDw1Wr1V6vu5sLMzApOCmZmVnBSMDOzgpOCmZkVnBTMzKzgpGBmZgUnBTMzKzgpmJlZwUnBzMwKTgpmZlZwUjAzs4KTgpmZFZwUzMys4KRgZmYFJwUzMys4KZiZWcFJwczMCk4KZmZWcFIwM7NCpUlB0j6SFklaLOm4Lup8RtI9khZIOr/KeMzMrHuDq1qwpEHADGBvoB2YJ2l2RNxTqjMG+Cbw4Yh4UtK/VBWPmZk1VuWVwgRgcUQsiYhXgFnApJo6hwEzIuJJgIj4Z4XxmJlZA1UmhS2BpaXx9lxWtg2wjaSbJd0qaZ96C5J0uKQ2SW0dHR0VhWtmZlUmBdUpi5rxwcAYYHdgCnCGpE1eN1PEzIgYHxHjhw8f3ueBmplZUmVSaAdGlsZHAMvq1Lk8IpZHxIPAIlKSMDOzFqgyKcwDxkgaLWkIMBmYXVPnMmAPAEnDSM1JSyqMyczMulFZUoiIFcBUYC6wELgwIhZImi5pYq42F3hc0j3AdcDXI+LxqmIyM7PuVfaVVICImAPMqSk7oTQcwNH5ZWZmLeYnms3MrOCkYGZmBScFMzMrOCmYmVnBScHMzApOCmZmVnBSMDOzgpOCmZkVnBTMzKzgpGBmZgUnBTMzKzgpmJlZwUnBzMwKTgpmZlZwUjAzs4KTgpmZFZwUzMys4KRgZmYFJwUzMytUmhQk7SNpkaTFko6rM/1gSR2S5ufXoVXGY2Zm3Rtc1YIlDQJmAHsD7cA8SbMj4p6aqr+NiKlVxWFmZs2r8kphArA4IpZExCvALGBSheszM7PVVGVS2BJYWhpvz2W1PiXpTkkXSxpZb0GSDpfUJqmto6OjiljNzIxqk4LqlEXN+BXAqIjYHrgGOLvegiJiZkSMj4jxw4cP7+MwzcysU5VJoR0of/IfASwrV4iIxyPi5Tx6OrBThfGYmVkDVSaFecAYSaMlDQEmA7PLFSS9rTQ6EVhYYTxmZtZAZd8+iogVkqYCc4FBwJkRsUDSdKAtImYD0yRNBFYATwAHVxWPmZk1VllSAIiIOcCcmrITSsPfBL5ZZQxmZtY8P9FsZmYFJwUzMys4KZiZWcFJwczMCk4KZmZWcFIwM7OCk4KZmRWcFMzMrNAwKUiaKmnT/gjGzMxaq5krhbeSfiDnwvxLavV6PzUzszeAhkkhIo4HxgC/IvVNdL+kkyS9s+LYzMysnzV1TyEiAng0v1YAmwIXSzqlwtjMzKyfNewQT9I04CDgMeAM4OsRsVzSOsD9wDeqDdHMzPpLM72kDgM+GREPlwsjYqWkA6oJy8zMWqGZ5qM5pN86AEDSRpLeDxAR/lEcM7M3kGaSws+B50rjz+cyMzN7g2kmKSjfaAZSsxEV/ziPmZm1RjNJYYmkaZLWza+vAUuqDszMzPpfM0nhCOBDwN+AduD9wOFVBmVmZq3RsBkoIv4JTO6HWMzMrMWaeU5hfeAQYFtg/c7yiPhyE/PuA/wEGAScERH/p4t6nwYuAnaOiLbmQjczs77WTPPRb0j9H/0rcAMwAni20UySBgEzgH2BscAUSWPr1NsImAbc1nzYZmZWhWaSwtYR8R3g+Yg4G9gf2K6J+SYAiyNiSUS8AswCJtWp9z3gFOClJmM2M7OKNJMUlue/T0l6L7AxMKqJ+bYElpbG23NZQdKOwMiIuLK7BUk6XFKbpLaOjo4mVm1mZr3RTFKYmX9P4XhgNnAPcHIT89XrYrt43iH3nXQqcEyjBUXEzIgYHxHjhw8f3sSqzcysN7q90ZxP3M9ExJPAjcA7erDsdmBkaXwEsKw0vhHwXuD6/BMNbwVmS5rom81mZq3R7ZVCfnp5ai+XPQ8YI2m0pCGkr7XOLi376YgYFhGjImIUcCvghG
Bm1kLNNB9dLelYSSMlbdb5ajRTRKwgJZS5wELgwohYIGm6pImrGbeZmVWgmT6MOp9H+EqpLGiiKSki5pB6WS2XndBF3d2biMXMzCrUzBPNo/sjEDMza71mnmj+Yr3yiDin78MxM7NWaqb5aOfS8PrAnsAdgJOCmdkbTDPNR18tj0vamNT1hZmZvcE08+2jWi8AY/o6EDMza71m7ilcwWtPIq9D6tzuwiqDMjOz1mjmnsIPS8MrgIcjor2ieMzMrIWaSQqPAH+PiJcAJA2VNCoiHqo0MjMz63fN3FO4CFhZGn81l5mZ2RtMM0lhcP49BADy8JDqQjIzs1ZpJil0lPsqkjQJeKy6kMzMrFWauadwBHCepNPyeDtQ9ylnMzMb2Jp5eO0B4AOSNgQUEQ1/n9nMzAamhs1Hkk6StElEPBcRz0raVNJ/9UdwZmbWv5q5p7BvRDzVOZJ/hW2/6kIyM7NWaSYpDJK0XueIpKHAet3UNzOzAaqZG83nAtdK+nUe/xJwdnUhmZlZqzRzo/kUSXcCewEC/gBsVXVgZmbW/5rtJfVR0lPNnyL9nsLCyiIyM7OW6TIpSNpG0gmSFgKnAUtJX0ndIyJO62q+mmXsI2mRpMWSjqsz/QhJd0maL+kmSWN7vSVmZrbaurtSuJd0VfDxiNglIn5K6veoKZIGATOAfUndbU+pc9I/PyK2i4hxwCnAj3sUvZmZ9anuksKnSM1G10k6XdKepHsKzZoALI6IJbm/pFnApHKFiHimNPomXvvdBjMza4Euk0JEXBoRnwXeDVwPHAW8RdLPJX2siWVvSWpy6tSey1Yh6SuSHiBdKUyrtyBJh0tqk9TW0dHRxKrNzKw3Gt5ojojnI+K8iDgAGAHMB153f6COelcVr7sSiIgZEfFO4D+B47uIYWZEjI+I8cOHD29i1WZm1hs9+o3miHgiIn4ZER9tono7MLI0PgJY1k39WcCBPYnHzMz6Vo+SQg/NA8ZIGi1pCDAZmF2uIGlMaXR/4P4K4zEzswaaeaK5VyJihaSpwFxgEHBmRCyQNB1oi4jZwFRJewHLgSeBg6qKx8zMGqssKQBExBxgTk3ZCaXhr1W5fjMz65kqm4/MzGyAcVIwM7OCk4KZmRWcFMzMrOCkYGZmBScFMzMrOCmYmVnBScHMzApOCmZmVnBSMDOzgpOCmZkVnBTMzKzgpGBmZgUnBTMzKzgpmJlZwUnBzMwKTgpmZlZwUjAzs4KTgpmZFZwUzMysUGlSkLSPpEWSFks6rs70oyXdI+lOSddK2qrKeMzMrHuVJQVJg4AZwL7AWGCKpLE11f4KjI+I7YGLgVOqisfMzBqr8kphArA4IpZExCvALGBSuUJEXBcRL+TRW4ERFcZjZmYNVJkUtgSWlsbbc1lXDgF+X2+CpMMltUlq6+jo6MMQzcysrMqkoDplUbei9AVgPPCDetMjYmZEjI+I8cOHD+/DEM3MrGxwhctuB0aWxkcAy2orSdoL+DawW0S8XGE8ZmbWQJVXCvOAMZJGSxoCTAZmlytI2hH4JTAxIv5ZYSxmZtaEypJCRKwApgJzgYXAhRGxQNJ0SRNztR8AGwIXSZovaXYXizMzs35QZfMRETEHmFNTdkJpeK8q129mZj3jJ5rNzKzgpGBmZgUnBTMzKzgpmJlZwUnBzMwKTgpmZlZwUjAzs4KTgpmZFZwUzMys4KRgZmYFJwUzMys4KZiZWcFJwczMCk4KZmZWcFIwM7OCk4KZmRWcFMzMrOCkYGZmBScFMzMrVJoUJO0jaZGkxZKOqzP9I5LukLRC0qerjMXMzBqrLClIGgTMAPYFxgJTJI2tqfYIcDBwflVxmJlZ8wZXuOwJwOKIWAIgaRYwCbins0JEPJSnrawwDjMza1KVzUdbAktL4+25rMckHS6pTVJbR0dHnwRnZmavV2VSUJ2y6M2CImJmRIyPiPHDhw9fzbDMzKwrVSaFdmBkaXwEsKzC9ZmZ2WqqMinMA8ZIGi1pCD
AZmF3h+szMbDVVlhQiYgUwFZgLLAQujIgFkqZLmgggaWdJ7cD/AH4paUFV8ZiZWWNVfvuIiJgDzKkpO6E0PI/UrGRmZmsAP9FsZmYFJwUzMys4KZiZWcFJwczMCk4KZmZWcFIwM7OCk4KZmRWcFMzMrOCkYGZmBScFMzMrOCmYmVnBScHMzApOCmZmVnBSMDOzgpOCmZkVnBTMzKzgpGBmZgUnBTMzKzgpmJlZwUnBzMwKlSYFSftIWiRpsaTj6kxfT9Jv8/TbJI2qMh4zM+teZUlB0iBgBrAvMBaYImlsTbVDgCcjYmvgVODkquIxM7PGqrxSmAAsjoglEfEKMAuYVFNnEnB2Hr4Y2FOSKozJzMy6MbjCZW8JLC2NtwPv76pORKyQ9DSwOfBYuZKkw4HD8+hzkhZVEnH1hlGzbf3p6FatuO94/60+78PVM5D331bNVKoyKdT7xB+9qENEzARm9kVQrSSpLSLGtzqOgcr7b/V5H66etWH/Vdl81A6MLI2PAJZ1VUfSYGBj4IkKYzIzs25UmRTmAWMkjZY0BJgMzK6pMxs4KA9/GvhjRLzuSsHMzPpHZc1H+R7BVGAuMAg4MyIWSJoOtEXEbOBXwG8kLSZdIUyuKp41xIBvAmsx77/V5324et7w+0/+YG5mZp38RLOZmRWcFMzMrOCk0EuSnst/t5B0cavjWdtIul7SG/qrgX2l81itU/5uSfMl/VXSO/s7rlaSdEadHhb6eh1zJG1Sp/xEScdWue7VUeVzCmuFiFhG+uZUZSQNjogVVa7D3phyDwFdffg7ELg8Ir7bjyGtESLi0H5Yx35Vr6MKvlJYTZJGSbo7Dx8s6RJJf5B0v6RTSvU+JukWSXdIukjShrn8BEnzJN0taWZnNx/5k/BJkm4AvtaSjetHeT/eK+lsSXdKuljSBpL2zJ9k75J0pqT1auY7RNKppfHDJP24/7dgzZH35UJJPwPuAIZK+lE+9q6VNFzSfsCRwKGSrmttxNWS9CZJv5P0//L77LPlK818DN2Xy06XdFouP0vSzyVdJ2mJpN3yMbhQ0lml5U/Jx+fdkk4ulT8kaVge/nbuHPQa4F39uwd6xkmh740DPgtsB3xW0sh8YBwP7BUR7wPaeO2J9dMiYueIeC8wFDigtKxNImK3iPhRP8bfSu8CZkbE9sAzpH10FvDZiNiOdGX7HzXzzAImSlo3j38J+HX/hLtGexdwTkTsmMfvyMfeDcB3I2IO8Avg1IjYo1VB9pN9gGURsUN+n/2hc4KkLYDvAB8A9gbeXTPvpsBHgaOAK0gdd24LbCdpXJ7/5FxnHLCzpAPLC5C0E+nr9jsCnwR27vMt7ENOCn3v2oh4OiJeAu4h9TfyAVJPsTdLmk96YK+zH5I9crfhd5EOrG1Ly/ptP8a9JlgaETfn4XOBPYEHI+K+XHY28JHyDBHxPPBH4ABJ7wbWjYi7+ivgNdjDEXFrHl7Ja8fSucAurQmpZe4C9pJ0sqRdI+Lp0rQJwA0R8URELAcuqpn3ivxA7V3APyLirohYCSwARpFO8NdHREdu4j2PmmMU2BW4NCJeiIhneP1DvGsU31Poey+Xhl8l7WMBV0fElHJFSesDPwPGR8RSSScC65eqPF9xrGua3j40cwbwLeBefJXQqbtjZ616OCki7suf1vcDvi/pqtLkRr0yd76fV7Lqe3sl6b3d7L2+AbPPfaXQP24FPixpa4DcVr4NryWAx/I9hkpvWA8Ab5f0wTw8BbgGGNW534B/IzV/rCIibiP1ofU54IL+CHSAWYfXjq3PATe1MJZ+l5t4XoiIc4EfAu8rTf4LsJukTXP/a5/q4eJvy/MPU/oNmSm8/hi9EfiEpKGSNgI+3qsN6Se+UugHEdEh6WDggtKN0uPzJ5jTSZemD5H6i1qbLQQOkvRL4H7SDfZbgYvyG3YeqR28nguBcRHxZL9EOrA8D2wr6XbgadI9r7XJdsAPJK0ElpPuS/0QICL+Jukk0sl9Ga
nJ9+muFlQrIv4u6ZvAdaSrjjkRcXlNnTsk/RaYDzwM/Gn1N6k67ubC1ghKP8V6Zb4R2Jv5ryTdNL22L+OyNz5JG0bEc/mDx6WkftoubXVcreLmIxvQJG0i6T7gRScE66UT8xdA7gYeBC5rcTwt5SsFMzMr+ErBzMwKTgpmZlZwUjAzs4KTgg0YKvX2KWm/3L/U2yteZ93eWHP5IqVeRudL6tNnTCQdKWmD0njdHjfN+pqfU7ABR9KewE+Bj0XEI03OU0VPs5+PiLY+XmanI0ldUrwAA7fHTRt4fKVgA4qkXYHTgf0j4oFcNlzSfyv1NjtP0odz+YlKPc9eBZyjXvRi28PYih5z8/ixueuSziuLkyX9JffIuWsuHyTph7mXzTslfVXSNGAL4DrlHkxretw8OvfIebekI0vrXph7+Vwg6SpJQ/O0aZLuycuf1fO9bmsTXynYQLIecDmwe0TcWyr/CenBtZtyc9Jc4D152k7ALhHxYn6qfBypt8qXgUWSfgq8yGu92D4v6T9JPbRObxDPeZJezMN7NhH/4IiYoNRt9XeBvYDDgdHAjhGxQtJmEfGEpKOBPSLisfICch8+XwLeT3qC9jal7tWfBMYAUyLiMEkXkrpsOBc4DhgdES+7CcoacVKwgWQ58GfgEFb9jYm9gLFS0bfZm3MfMwCzI+LFUt1rO3vJlNTZi+0mvNaLLcAQ4JYm4lml+ai0zq5ckv/eTuphszP2X3Q2bUXEEw2WsQupx83n8zovIfXCOZvUo+z8Ouu4k5TALmMtfzDLGnNSsIFkJfAZ4BpJ34qIk3L5OsAHa07+5BN8bW+hTfdi2wsrWLVJdv2a6Z3r7lwved09eYK0u149a7dtaB7en9Sd80TgO5K29S/5WVd8T8EGlIh4gfRDRJ+XdEguvgqY2llH0rgeLrarXmx76h/Av0jaPHd8eECjGUixH5H73UHSZrn8WaDelceNwIE5xjcBn6CbDtYkrQOMjIjrgG+Qrop6fL/E1h6+UrABJ7e57wPcKOkxYBowQ9KdpGP6RuCIHiyvbi+2wH1dz1V3OcslTSf1uPkg6fcdGjkD2Aa4U9Jy0k3004CZwO8l/b38y2i5x82zSF0+A5wREX/NHQrWMwg4V9IbQOCYAAAAP0lEQVTGpKuMUyPiqZ5sl61d3PeRmZkV3HxkZmYFJwUzMys4KZiZWcFJwczMCk4KZmZWcFIwM7OCk4KZmRX+P3yjLdhS0AywAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# for SVM\n",
"from sklearn import svm\n",
"from sklearn.metrics import jaccard_similarity_score\n",
"from sklearn.metrics import f1_score\n",
"\n",
"# import Matplotlib (scientific plotting library)\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"func_list = ['linear', 'poly', 'rbf', 'sigmoid']\n",
"accuracy_score = []\n",
"\n",
"for func in func_list:\n",
" SVM = svm.SVC(kernel=func)\n",
" SVM.fit(X_train, y_train)\n",
" svm_yhat = SVM.predict(X_test)\n",
" accuracy_score.append(f1_score(y_test, svm_yhat, average='weighted'))\n",
" \n",
"# plot the comparison among 4 kernel functions\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"y_pos = np.arange(len(func_list))\n",
"plt.bar(y_pos, accuracy_score, align='center', alpha=0.5)\n",
"plt.xticks(y_pos, func_list)\n",
"plt.ylabel('Accuracy')\n",
"plt.xlabel('Kernel Functions')\n",
"plt.title('Accuracy Comparison for 4 Kernal Functions')\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The found best kernel function is rbf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Perform Support Vector Machines using rbf kernel function"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n",
" decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',\n",
" max_iter=-1, probability=False, random_state=None, shrinking=True,\n",
" tol=0.001, verbose=False)"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# for SVM\n",
"from sklearn import svm\n",
"# prepare SVM setting\n",
"SVM = svm.SVC(kernel='rbf')\n",
"# perform the test\n",
"SVM.fit(X_train, y_train)\n",
"SVM"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Logistic Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Logistic Regression test - find the best parameters"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Test 0 : Accuracy at c = 0.1 solver= newton-cg is : 0.477460130698766\n",
"Test 1 : Accuracy at c = 0.1 solver= lbfgs is : 0.47746026240380063\n",
"Test 2 : Accuracy at c = 0.1 solver= liblinear is : 0.49096560818457907\n",
"Test 3 : Accuracy at c = 0.1 solver= sag is : 0.47746120029530975\n",
"Test 4 : Accuracy at c = 0.1 solver= saga is : 0.47746371482697025\n",
"Test 5 : Accuracy at c = 0.01 solver= newton-cg is : 0.48933564178286426\n",
"Test 6 : Accuracy at c = 0.01 solver= lbfgs is : 0.48933560490693945\n",
"Test 7 : Accuracy at c = 0.01 solver= liblinear is : 0.5699980927778155\n",
"Test 8 : Accuracy at c = 0.01 solver= sag is : 0.48934954495811284\n",
"Test 9 : Accuracy at c = 0.01 solver= saga is : 0.4893356454870894\n",
"Test 10 : Accuracy at c = 0.001 solver= newton-cg is : 0.5177257828275373\n",
"Test 11 : Accuracy at c = 0.001 solver= lbfgs is : 0.5177257382214536\n",
"Test 12 : Accuracy at c = 0.001 solver= liblinear is : 0.6691108543335518\n",
"Test 13 : Accuracy at c = 0.001 solver= sag is : 0.5176993195513844\n",
"Test 14 : Accuracy at c = 0.001 solver= saga is : 0.517725322622093\n"
]
},
{
"data": {
"text/plain": [
"Text(0,0.5,'Testing Accuracy')"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZIAAAEKCAYAAAA4t9PUAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl83HWd+PHXO1dztE2bSQq9kvQUSikFQoHGCxQsiIDicniB68LqyqKLy0JFUcEDXXfR34K6iAiuKJcCVYsFFDxaCg1QelKapEkbWtpkkl65k3n//vh+JwzTSWaazHfO9/PxmEdnvvP9fuc9kPadz/X+iKpijDHGjFZOsgMwxhiT3iyRGGOMGRNLJMYYY8bEEokxxpgxsURijDFmTCyRGGOMGRNLJMYYY8bEEokxxpgxsURijDFmTPKSHUAilJeXa3V1dbLDMMaYtPLSSy+1qWpFtPM8TSQisgz4IZAL3KOqt0c451Lg64ACr6rqx0TkLOCOkNOOAy5X1cdF5D7gPcAB972rVHX9SHFUV1dTV1c31q9jjDFZRUSaYznPs0QiIrnAXcA5QAuwTkRWqOqWkHPmAcuBWlXtEJEpAKr6LLDYPacMqAeeCrn9Dar6qFexG2OMiZ2XYyRLgHpVbVTVPuBB4KKwc64G7lLVDgBV3RfhPh8FnlTVLg9jNcYYM0peJpLpwK6Q1y3usVDzgfkislpE1rpdYeEuB34dduxbIrJBRO4QkXGRPlxErhGROhGpa21tHe13MMYYE4WXiUQiHAuvWZ8HzAPeC1wB3CMik4ZuIDIVOBFYFXLNcpwxk9OAMuDGSB+uqnerao2q1lRURB0rMsYYM0peJpIWYGbI6xnA7gjnPKGq/aq6A9iGk1iCLgUeU9X+4AFV3aOOXuDnOF1oxhhjksTLRLIOmCcis0SkAKeLakXYOY8DZwGISDlOV1djyPtXENat5bZSEBEBLgY2eRK9McaYmHg2a0tVB0TkWpxuqVzgXlXdLCK3AnWqusJ971wR2QIM4szG8gOISDVOi+YvYbd+QEQqcLrO1gOf9eo7GGOMiU6yYavdmpoatXUkxpixeqHRz8SifI6fOjHZoSSEiLykqjXRzrMSKcYYE6MbHt3A7U++luwwUk5WlEgxxpix6hsI0NLRRU6k+ahZzlokxhgTg5aOLgIKLR3dDAwGkh1OSrFEYowxMWhud4prDASU3ft7khxNarFEYowxMWhu6xx63uTvHOHM7GOJxBhjYtDkf2t8pNkSydtYIjHGmBjsbO9i/jETKMzPodlvNWRD2awtY4yJQZO/k3ccMwFVp3Vi3mItEmOMiWIwoOxq76LKV0KVr9i6tsJYIjHGmCj2HOimf1Cp8hVT5StmZ3sXgUDmVwWJlXVtGWNMFMExkSpfMYMBpXcgwN5DPUwtLUpyZKnBEokxxkQRnO5b7Ssh4K5FbGrrskTisq4tY4yJYqe/i4K8HI6dWEiVr9g51m7jJEGWSIwxJoomfyeVZcXk5AhTSwvJzxWbuRXCEokxxkTR7O+i2m2J5OXmMGNyMTstkQyxRGKMMSNQVZr9XVSWlQwdq/IVW5mUEJ4mEhFZJiLbRKReRG4a5pxLRWSLiGwWkV+FHB8UkfXuY0XI8Vki8oKIbBeRh9xtfI0xxhOth3rp7h+kurx46Fi1r4RmfxfZsDFgLDxLJCKSC9wFnAcsAK4QkQVh58wDlgO1qnoC8MWQt7tVdbH7uDDk+HeBO1R1HtABfMar72CMMcGqv5VlbyWSyrJiDvcO0N7Zl6ywUoqXLZIlQL2qNqpqH/AgcFHYOVcDd6lqB4Cq7hvphiIiwNnAo+6h+4GL4xq1McaEaGp7a+pvULB1YgPuDi8TyXRgV8jrFvdYqPnAfBFZLSJrRWRZyHuFIlLnHg8mCx+wX1UHRrinMcbETbO/i9wcYfrkt9aMVLlJxUqlOLxckBhpQ8rwDsU8YB
7wXmAG8DcRWaiq+4FKVd0tIrOBP4vIRuBgDPd0PlzkGuAagMrKytF9A2NM1mtu72L6pCLyc9/6vXvG5CJEsCrALi9bJC3AzJDXM4DdEc55QlX7VXUHsA0nsaCqu90/G4HngJOBNmCSiOSNcE/c6+5W1RpVramoqIjPNzLGZJ1mf+fQIsSgcXm5TCstshaJy8tEsg6Y586yKgAuB1aEnfM4cBaAiJTjdHU1ishkERkXcrwW2KLOFIlngY+6118JPOHhdzDGZLmmts63jY8EVZcX2xiJy7NE4o5jXAusArYCD6vqZhG5VUSCs7BWAX4R2YKTIG5QVT9wPFAnIq+6x29X1S3uNTcC14tIPc6Yyc+8+g7GmOy2v6uPgz0DR7RIACrLStjZbokEPC7aqKorgZVhx24Jea7A9e4j9Jw1wInD3LMRZ0aYMcZ4qmmo6m+EFomvmPbOPg5091NalJ/o0FKKrWw3xphhNA9V/T2yRTJUvNG6tyyRGGPMcIKzsmaWRUok7hRgqwJsicQYY4bT5O9kamkhhfm5R7wXbJHYFGBLJMYYM6yd/q6IA+0AxQV5VEwYZ1OAsURijDHDavJ3UVV25EB7ULXPpgCDJRJjjInocO8AbYd7qSqP3CIBZ5zEWiSWSIwxJqLgbKyRWiRVZcXsPdhLd99gosJKSZZIjDEmgmBLY7gxEoCqcifJZPvCREskxhgTwVuLEYdPJMH1Jdm+W6IlEmOMiWBneye+kgImFA6/aj3Y7ZXtixItkRhjTARNbcNP/Q0qLc5nUnG+tUiSHYAxxqSiZn/kqr/hqsqKs35RoiUSY4wJ09M/yJ6DPVRGaZGAOwU4y8ukWCIxxpgwLR1dqBJTi6TaV8wbHd30DQQSEFlqskRijDFhmmOYsRVU6SshoE7yyVaWSIwxJsxI+5CEC04Bbs7itSSWSIwxJkyzv5MJhXlMLo6+YdVQOfm27B0n8TSRiMgyEdkmIvUictMw51wqIltEZLOI/Mo9tlhEnnePbRCRy0LOv09EdojIevex2MvvYIzJPs3+Lqp9JYhI1HPLxxdQXJCb1S0Sz7baFZFc4C7gHKAFWCciK0L2XkdE5gHLgVpV7RCRKe5bXcCnVHW7iEwDXhKRVaq6333/BlV91KvYjTHZrdnfyQnTS2M6V0Tc4o3Zm0i8bJEsAepVtVFV+4AHgYvCzrkauEtVOwBUdZ/75+uqut19vhvYB1R4GKsxxgAwMBigpaM74va6w3HKyVvXlhemA7tCXre4x0LNB+aLyGoRWSsiy8JvIiJLgAKgIeTwt9wurztEZFy8AzfGZK/d+3sYCOiIVX/DVfqKaWnvZjCgHkaWurxMJJE6F8P/K+cB84D3AlcA94jIpKEbiEwF/g/4tKoGJ2kvB44DTgPKgBsjfrjINSJSJyJ1ra2tY/kexpgs0hRD1d9w1b4S+gYD7DnQ7VVYKc3LRNICzAx5PQPYHeGcJ1S1X1V3ANtwEgsiMhH4A/AVVV0bvEBV96ijF/g5ThfaEVT1blWtUdWaigrrFTPGxCZYPr66PPYWSVVZdu/f7mUiWQfME5FZIlIAXA6sCDvnceAsABEpx+nqanTPfwz4hao+EnqB20pBnOkUFwObPPwOxpgs0+zvojA/hykTYu81D+5Lkq2JxLNZW6o6ICLXAquAXOBeVd0sIrcCdaq6wn3vXBHZAgzizMbyi8gngHcDPhG5yr3lVaq6HnhARCpwus7WA5/16jsYY7JPcJ/2WKb+Bk2dWEhBXk7WbrvrWSIBUNWVwMqwY7eEPFfgevcRes4vgV8Oc8+z4x+pMcY4mv2dzDqKbi2AnBxh5uSirJ25ZSvbjTHGFQgoO9uj70MSSXUWryWxRGKMMa69h3roHQjEVGMrXHBRotPRkl0skRhjjCvYooilfHy4Kl8x3f2DtB7qjXdYKc8SiTHGuJpHsYYkqCqLqwBbIjHGGFeTv4v8XGFqaeFRXxtsxTRlYRVgSyTGGOPa6e9i5uRi8n
KP/p/G6ZOLyM0RdlqL5Egi8lkRia0MpjHGpLEmf2dM+7RHkp+bw/RJRUObYmWTWNJuNfCyiPxKRN7vcTzGGJMUqjq0D8loVfmKs3JRYtREoqo34dS/egD4rIhsF5FbRaTa49iMMSZh2jv7ONw7QGXZ6FokEEwk1iKJyK282+Q+AsBU4AkR+Y5nkRljTAIFu6Sqy0efSKp9JRzo7md/V1+8wkoLsYyR/IuIvAj8EHgJWKSqVwMnA5eNeLExxqSJt6b+jr5rK9iaybZxklhqbc0ALlfVxtCDqhoQkQu9CcsYYxKr2d+FCMyYXDTqe1QPVQHuZPHMSVHOzhyxdG09hrPVLQAiMkFEagBU1Uq4G2MyQrO/k2mlRYzLyx31PSqzdF+SWBLJ3UDof5VO4H+9CccYY5Kjub1rTOMjAIX5uRw7sTDrqgDHkkhyQra5DQ6853sXkjHGJF6zv4vKo9infThVvmJ2WovkCDtE5HMikisiOSLyeZzZW8YYkxEO9vTT3tlH9SgXI4aq9pVk3WB7LInkn4H3AXvdx3uAq70MyhhjEinYghjLjK2gSl8xbYd76ewdGPO90kUsCxL3qupHVbVcVStU9VJV3RvLzUVkmYhsE5F6EblpmHMuFZEtIrJZRH4VcvxKd/HjdhG5MuT4qSKy0b3n/5Oj2Q/TGGMiaBpD1d9wwZXx2TTgHnX6r4iMA64CTgCGSmKq6jVRrssF7gLOAVqAdSKyQlW3hJwzD1gO1Kpqh4hMcY+XAV8DagAFXnKv7QB+DFwDrMXZxncZ8GSsX9gYY8I1D7VIxp5IhsrJ+ztZMG3imO+XDmLp2voFTr2tC4AXgDlATwzXLQHqVbVRVfuAB4GLws65GrjLTRCoanCa8QeAp1W13X3vaWCZiEwFJqrq8+5+778ALo4hFmOMGVazv5MpE8ZRXBDL0rqRVWbhviSxJJL5qrocOKyqP8NpASyM4brpwK6Q1y3usbfdG5gvIqtFZK2ILIty7XT3+Uj3NMaYo9LkH90+7ZFMLMzHV1KQVcUbY0kk/e6f+0XkeGACUBXDdZHGLsI3M87DKQj5XuAK4B4RmTTCtbHc0/lwkWtEpE5E6lpbW2MI1xiTrZr9nXEZaA+q9BXT1GYtklA/E5HJOGMWq4DXgf+K4boWYGbI6xnA7gjnPKGq/aq6A9iGk1iGu7bFfT7SPQFQ1btVtUZVayoqKmII1xiTjbr7Btl7sJeqMVT9DVftK8mqDa5GTCTugHmbqnao6rOqWunO3vpRDPdeB8wTkVkiUgBcDqwIO+dx4Cz3s8pxuroacRLWuSIy2U1i5wKrVHUPcEhEznBna30KeCL2r2uMMW8X/Ae/qjx+LZIqXzG7D3TT0z8Yt3umshETiaoOAl8czY1VdQC4FicpbAUeVtXN7l4mwWKPqwC/iGwBngVuUFW/qrYDt+Eko3XAre4xgM8B9wD1QAM2Y8sYMwbBqb/xWIwYVOUrRhVaOrKjVRLLFIVVIvJF4CGcOlsAqOrBaBeq6kqcKbqhx24Jea7A9e4j/Np7gXsjHK8jtsF+Y4yJamgxYhzKowRVhawlmTtlQtzum6piSST/7P75pZBjClTGPxxjjEmsJn8nk4rzKS2OXwnB4KLEbCmVEjWRqOrMaOcYY0y62tneFdcZWwCTi/OZMC4va6YAx7Ky/WORjqvqryIdN8aYdNLk7+TkmZPjek8Roao8e/Zvj6Vr610hzwuBs3G23LVEYoxJa30DAd7o6ObDi+O/rrmqrITNuw/E/b6pKJaurc+Fvnan497nVUDGGJMob+zvJqDxqfobrspXzKrNbzIwGCAvN5Yle+lrNN/uEM56D2OMSWvxrPobrtpXwkBA2b0/ltKE6S2WMZLHeKsMSQ5OFWBbBGiMSXvNbcFEEv8WSbB4Y5O/c+h5popljOTOkOcDQLOqNnkTjjHGJE5zexclBbmUjy+I+72H9iXJglIpsSSS7cA+Ve0BEJEiEZmpqruiXGeMMSmt2d
9Fpa8EL/bHmzJhHIX5OUOtnkwWyxjJb4FAyOsA8BtvwjHGmMRp8nfGtTRKqJwcobKsOCsWJcaSSPLcjakAUNVeYJx3IRljjPcGA0pLe7en4xdVvhJ2tluLBJyiiucHX4jIBUD7COcbY0zK23Ogm77BwNBYhheqfc6ixEAg4rZJGSOWMZLPAb8SkbtwZm+1AZ/wNCpjjPFYPPdpH06lr4TegQB7D/UwtbTIs89JtlgWJL4O1Lg7F6Kq+z2PyhhjPPZWIvG2RRL8rExOJFG7tkTkNhGZpKr7VXW/u9nUNxIRnDHGeKXZ30lBXg5TJxZ69hnB0vSZXrwxljGSC0JbIaraAXzIu5CMMcZ7zf4uKsuKycmJ/9TfoGmTCsnLkYyfuRVLIsl1t8oFQEQKgfiv3jHGmARq8nfGdZ/2SPJyc5hZVjy0eVamiiWRPAg8LSJXisincLbHjanyr4gsE5FtIlIvIjdFeP8qEWkVkfXu45/c42eFHFsvIj0icrH73n0isiPkvcWxf11jjAFV9WQfkkictSSZ3bUVy2D7t0VkA/B+QIDvqeofol0nIrnAXcA5QAuwTkRWqOqWsFMfUtVrwz7zWWCxe58ynP3Znwo55QZVfTRaDMYYE0nr4V66+gapLve+Bla1r5iXmztQVU9W0KeCmKr/qurvVfWLqvoFoE1EfhjDZUuAelVtdBc0PghcNIoYPwo8qaqZ3TY0xiRMcMZWpcddW+DMCjvUO0B7Z1/0k9NUTIlERBaKyLdEpAH4PrAjhsumA6H1uFrcY+EuEZENIvKoiETa1vdy4Ndhx77lXnOHiERcZS8i14hInYjUtba2xhCuMSZbNLn1r7xcjBhUNVQFOHN/Fx42kYjIbBH5sohsAu7BWYiYr6rvUtUfxHDvSG248OWdvwOqVXUR8Axwf1gMU4ETccZlgpYDxwGnAWXAjZE+XFXvVtUaVa2pqKiIIVxjTLbY2d5Fbo4wfbL3azuC4zCZXCplpBZJPfAB4COqeoaq3oFTRj5WLUBoC2MGsDv0BFX1u7W7AH4KnBp2j0uBx1S1P+SaPeroBX6O04VmjDExa/J3MX1SEfkJ2LlwZlkRItDUloUtEuAynFbIn0TkRyLyHiK3MoazDpgnIrPc6cOXAytCT3BbHEEXAlvD7nEFYd1awWvEGbW6GNh0FDEZYwzN/k5PS6OEGpeXy7TSooxelDhsIlHVR1T1EmAB8AJOl9KxIvI/InJ2tBur6gBwLU631FbgYVXdLCK3isiF7mnXichmEXkVuA64Kni9iFTjtGj+EnbrB0RkI7ARKAe+GcsXNSbVdPcNZvz6glTV7O9KWCIBZ5wkkze4imX67yGcsYv7RaQcp6XydeDPMVy7ElgZduyWkOfLcRJUpGubiDA4r6pRk5gx6eAHz7zOL9c2U/eVcygqyE12OFljf1cfB7r7EzLQHlTlK2bV5r0J+7xEO6oOQlVtU9W7VPXdXgVkTLb4y+utdPYN8lJzR7JDySqJKNYYrspXQntnHwd7+qOfnIa8H2kyxhyh7XAvr715CIDVDW1Jjia7BFeZJ7JrK1gFOFO7Mi2RGJMEaxr8AJSVFLCm3hJJIiVyMWJQpVsFOFNLpVgiMSYJ1tS3MaEwj4+fXsnGNw5woDszuzxSkbM3SCGF+Ykbl6oK2ZckE8WyH0mHiLSHPXaIyCPuzCpjzFFa3dDGGbN9vGteBQGFtY3+ZIeUNZr9nQltjQCUjMujYsK4jJ0CHEuL5H+ArwJzgLnAV4D7gMdxFgQaY47CrvYudrV3UzvHx+KZkyjKz7XurQRq8ncldMZWUFVZccaWSYklkZzrztTqUNV2Vf0RcJ6qPoBTosQYcxRWu0mjdm45BXk5LJlVxuoGa5EkQmfvAG2He6lKQNXfcFW+kuwebBeRj4Q9D65wD3gRlDGZbHWDnykTxjF3yngAauf6qN93mL0He5IcWeYbmvpblvgWSbWvmDcP9tDdN5jwz/ZaLInkE8DV7tiIH7
ga+KSIFANf9DQ6YzKMqvJ8QxtL5/iG9qZYOqccgDU2DdhzzUmY+htUGZwCnIEr3KMmElWtV9XzVLVMVX3u89dVtUtVw8uXGGNGsG3vIdoO97F0bvnQsQVTJzKpOJ/V9da95bVgmZJkJJLguEwmDrhHLZHilkX5R6A69HxVvca7sIzJTGvcZLF0jm/oWE6OcOZsH2vq2zJ6F71U0OzvxFdSwITC/IR/diZPAY6aSIAngLXA34HM69wzJoHWNLRR5StmxuS3/0a8dG45T256k2Z/F9Xlie+/zxZNbYkt1hhqUnEBpUX5GbkoMZZEUqKqX/I8EmMy3MBggBca27ngpGlHvFfrtlBWN7RZIvHQzvYulsxK3mTTal9xdo6RAE+KyLmeR2JMhtvwxgEO9Q5QO9d3xHuzykuYWlo41PVl4q93YJDdB7qT1iIBqPSVZGSLJJZE8lngjyJy2J251SEi7V4HZkymCS46PHP2kYlERFg6p5w1DW0EAuE7Upt42NXejWpi9mkfTrWvmDc6uukbyKyVE7EkknIgHygFKtzXtgm6MUdpdb2f46dOxDd+XMT3a+f66OjqZ+ubBxMcWXYIzpaqTGKLpMpXQkDhjf3dSYvBC8MmEhGZ5z49YZhHVCKyTES2iUi9iNwU4f2rRKRVRNa7j38KeW8w5PiKkOOzROQFEdkuIg+52/gak9J6+gd5aWfH0FhIJLXulGDr3vJGsDxJMlskwW61TOveGmmw/SbgM8BdEd5TYMTNrUQk1732HKAFWCciK1R1S9ipD6nqtRFu0a2qiyMc/y5wh6o+KCI/cWP88UixGJNsdU0d9A0EhpJFJMdMLGRORQmrG9q4+t2zExhddtjp72RCYR6TixM/9TdoaApwWye8I2lhxN2wiURVP+M+PVtV31bjWkRi+T+xBKhX1Ub3mgeBi4DwRBIzcSbYnw18zD10P862v5ZITEpb3dBGXo5EnTFUO7ecR19qoW8gQEGe7fIQT03uPu3JXKdTMX4cxQW5Gbd/eyw/qS/EeCzcdGBXyOsWIuzBDlwiIhtE5FERmRlyvFBE6kRkrYhc7B7zAftVdSDKPY1JKWsa/CyeOYmScSPPuF86x0dX3yCvtuxPUGTZo9nfmdDtdSMRESrLijNuUeJIYyRTROQkoEhEThSRRe7jnUAso1WR0n74dJTfAdWqugh4BqeFEVSpqjU4rY8fiMicGO8ZjP8aNxHVtba2xhCuMd440N3Pxpb9b1vNPpwzZvsQeatCsImPgcEALR3dQ1veJlO1ryTjyqSM1CL5IHAnMANnrCP4+DLO/iTRtAChLYwZwO7QE1TVr6q97sufAqeGvLfb/bMReA44GWgDJolI8Ne6I+4Zcv3dqlqjqjUVFTbJzCTPC41+Asrb6msNZ1JxAQunldqAe5zt3t/DQECTUvU3XJWvmF3t3Qxm0DTvYROJqv5cVd8FfEZV362q73If56vqIzHcex0wz51lVQBcDqwIPUFEpoa8vBDY6h6fLCLj3OflQC2wRVUVeBb4qHvNlTglXIxJWWsa/BTm53By5aSYzl8618cruzro6huIfrKJSVMSq/6Gq/KV0DcYYM+BzJkCHMsYyRQRmQggIj8RkRdF5H3RLnLHMa4FVuEkiIdVdbOI3CoiF7qnXScim0XkVeA64Cr3+PFAnXv8WeD2kNleNwLXi0g9zpjJz2L6psYkyer6Nk6rLmNcXmx7hNfOKad/UFnX1OFxZNnjraq/yW+RBLvXMmmTq1hqbV2jqne6ZVJmAJ8D7iakG2o4qroSWBl27JaQ58uB5RGuWwOcOMw9G3FmhBmT8vYd7GH7vsNccuqMmK85rbqMgtwc1tS38Z751i0bD81tnRTm5zBlQuTFoIlUObSWpIulc5McTJzE0iIJduSdB/xcVV+K8Tpjst4adwvd2jnRx0eCigpyOblyEqtto6u4afJ3UVVWQk5O8kv0Ty0toiA3J6MG3GNJCK+KyErgQzgFHMczzEwpY8
zbra5vo7QonwXTJh7VdbVzy9m8+yD7u/o8iiy77GzvTGpplFC5OcLMsqKMmgIcSyL5NM6ivyWq2gUU4qwmN8aMQFVZ0+DnzNk+co/yN+HauT5U4fkGm701VoGAOvu8pEgiAWesJpPKpMSy1e4gMBtnbASgKJbrjMl2zf4u3tjfHbFsfDSLZkyipCDXurfiYN+hXnoHAikx0B5U5e5L4kxETX9RE4KI3AmcBXzCPdQJ/MTLoIzJBMHxkVjWj4TLz81hyawyW08SB6k09Teo2ldCV98grYd7o5+cBmJpWSxV1X8GegBUtR2wirvGRLG6oY1jJo5j9ih3PKydW05jW2dGrTdIhuCgdjKr/oarzLD922NJJP0ikoM7wC4iPiCzdmUxJs4CAeX5Bj+1c8pHXSRwqTvTa7W1Ssak2d9Ffq4wtbQw2aEMCSa1jE8kIWVI7gJ+A1SIyDeAv+OUcjfGDOO1Nw/R3tk3qm6toOOOnUBZScHQzopmdJr9XcyYXExebuoM7U6fVESOkDFTgEdakPgicIqq/kJEXgLej1M08R9UdVNCojMmTa1xB8lHM9AelJMjnDnHx5oGP6qa1PLn6azJ35lS4yMABXk5TJ9cNLTZVrobKZEM/dSq6mZgs/fhGJMZVte3Mbu8hKmlRWO6T+2ccv6wYQ+NbZ3MqRgfp+iyh6qy09/FadUj7wOTDNW+EnZmQYukQkSuH+5NVf1vD+IxJu31DwZ4cUc7Hz5l7FvlBFs0a+rbLJGMQntnH4d6B6gsS60WCUBlWTG/37An2WHExUidhrnAeGDCMA9jTASv7tpPZ9/gUZVFGU5lWTHTJxXZgPsoDe3TXp56iaTaV8KB7v6MqF4wUotkj6remrBIjMkQq+v9iMCZMWxkFY2IUDvXx6rNexkM6FGvkM92O9udrqPKFNiHJFxVyBTgScXpvaJipBaJ/cQaMwprGto4YdrEuP3jUDu3nAPd/WzZfTAu98smTW1diMDMsrGNVXkhuNI+E0qljJRIou45Yox5u+6+QV7ZuT8u3VpBZ852WjZWLuXo7WzvYlppUcx7wSRScNwmE9aSjLRDYnsiAzEmE6xraqdvMBCXbq2gKRMLmTdlvO3jPgqpOPU3qKggl2MmjsvyPNd9AAAYyElEQVTsRGKMOXqrG9rIzxWWzIrvdNPaueWsa2qnd2AwrvfNdM3+rpQq1hiuyleSEYsSPU0kIrJMRLaJSL2I3BTh/atEpFVE1ruPf3KPLxaR591teDeIyGUh19wnIjtCrlns5Xcw5misqfdz8szJFBfEsvlo7JbO8dHTH+CVnfvjet9MdrCnn/bOvpQqHx+u2lecEYsSPUskIpKLU17lPGABcIWILIhw6kOquth93OMe6wI+paonAMuAH4jIpJBrbgi5Zr1X38GYo7G/q49Nuw+wdAyr2Ydz+mwfOYKVSzkKwT3RU7VrC5wWSdvhXjp7B5Idyph42SJZAtSraqOq9gEPAhfFcqGqvq6q293nu4F9gG1ebVLa2kY/qk43VLyVFuVz4oxJQ6XpTXRvlY9P5a6tzBhw9zKRTAd2hbxucY+Fu8TtvnpURGaGvykiS3DK1jeEHP6We80dIjIurlEbM0qr6/0UF+Ry0oxJ0U8ehdo5Ptbv2p/2v70mSnMatEiCVYCD613SlZeJJNI6lPDtwH4HVKvqIuAZ4P633UBkKvB/wKdVNVi6fjlwHHAaUAbcGPHDRa4RkToRqWttbR39tzAmRqsb2lgyq4yCPG/+WtXOLWcgoLy4wyZUxqLZ30nFhHFxH6+Kp+C+JOk+TuJlImkBQlsYM4DdoSeoql9Vg1uE/RQ4NfieiEwE/gB8RVXXhlyzRx29wM9xutCOoKp3q2qNqtZUVFivmPHWmwd6aGztjOv6kXCnVk2mIC/HpgHHqCnF9mmPZGJhPmUlBWk/c8vLRLIOmCcis0SkALgcWBF6gtviCLoQ2OoeLwAeA36hqo9EukacmtoXA1bS3iRdsGy8Fw
PtQYX5udRUTWa1jZPEZGeKT/0NqvIV2xjJcFR1ALgWWIWTIB5W1c0icquIXOiedp07xfdV4DrgKvf4pcC7gasiTPN9QEQ2AhuBcuCbXn0HY2K1ut7P5OJ8jj92oqefs3SOj617DuLPkL2+vdLdN8ibB3uoSsGqv+GqytI/kXjaeaiqK4GVYcduCXm+HGfMI/y6XwK/HOaeZ8c5TGPGRFVZ09DGmXN85HhcVHHp3HJ46nWeb/RzwaJpnn5WOtvZ7g60l6dDi6SEJ17dTe/AYEqWcomFrWw3Zox2tHWy50DP0B7rXlo0vZQJ4/KsrHwUwTGHtGiR+IpRhV3t3ckOZdRSdzqDMWkiOGbhxfqRcHm5OZw+u2xoTCbd9Q8G+NPWfXT1xXdK89+3O/99qtNijMSJ8bcvt3DcVKdrVDV8guvbhb6tYZNhwy99/4JjmFiYP/ZAR2CJxJgxWlPfxrTSwoTNEFo6p5xntu6jpaOLGZNT/zfukXzz91u4//lmT+49fVIRpcXe/gMaD3MrxlOQl8OPnmuIfvIoPHP9eyyRGJPKAgHl+UY/7z/+GJyJhN4LtnzW1Pu59LT0TSR/3LSH+59v5qql1Xy6tjru9/eNT4+1yqXF+fz9xrM42N0fcvTtP0uhP1rytuMyzPG3nk8t9X4vFkskxozBlj0H2d/VP7S3eiLMP2Y85ePHsaahjUtPO6IYRFrY1d7FDY9u4KQZpXz5/OM9W8SZLqZMKGTKhMJkhzFq2f1/z5gxCi4OTMRAe5CIsHSOj9UN/qh96amobyDAtb9+BYA7P3ZK1ieRTGD/B40Zg9UNfuZOGc8xExP722TtXB+th3qp33c4oZ8bD/+56jVe3bWf716yiJlpMKvKRGeJxJhR6hsIsG5HO7Vx3A0xVsEWULqVS/nT1r389G87+OQZVZx/4tToF5i0YInEmFFav2s/3f2DziLBBJtZVszMsqK0Kpey50A3X3rkVY6fOpGbP3h8ssMxcWSJxJhRWl3fRo7AGbMS3yIBqJ1TztpGPwODgegnJ9nAYIDrfv0KfQMB7vrYyRTmp+cKbhOZJRJjRmlNQxsLp5cmba3C0rnlHOoZYNPug0n5/KPxg2e2s66pg29/+ERmV4xPdjgmziyRGDMKnb0DvLJzf0Jna4Vb6o7NpPo4yd+2t3LXc/VcWjODi0+OtLedSXeWSIwZhReb2hkIaELXj4QrHz+O446dkNLlUvYd6uHfHlrP3IrxfP3CE5IdjvGIJRJjRmFNfRsFuTnUVJUlNY6lc8qpa+qgp38wqXFEMhhQvvjgeg73DnDXx09J6Z0KzdhYIjFmFFbX+zmlahJFBckdNK6d66N3IMDLzR1JjSOSHz1bz5oGP9+48ATmHzMh2eEYD1kiMeYotXf2sWXPQU+31Y3Vklll5OYIa1JsGvALjX7ueOZ1Ll48jUtr0rOMi4mdJRJjjtLaRucf7WSsHwk3oTCfk2aUsjqFxkn8h3u57sFXqPKV8M0Pn5iwYpYmeTxNJCKyTES2iUi9iNwU4f2rRKQ1ZDvdfwp570oR2e4+rgw5fqqIbHTv+f/EfkpNgq2ub2P8uDxOmlGa7FAAZ5xkQ8sBDvX0Rz/ZY4GA8qVHXqWjq587P3Yy48fZuEg28CyRiEgucBdwHrAAuEJEFkQ49SFVXew+7nGvLQO+BpwOLAG+JiKT3fN/DFwDzHMfy7z6DsZEsqbBz+mzysjLTY0G/dK5PgYDyguN7ckOhXv+3shz21r56geP54RpqZFojfe8/JuwBKhX1UZV7QMeBC6K8doPAE+raruqdgBPA8tEZCowUVWfV6fs6S+Ai70I3phIdu/vZkdbJ2cmob7WcE6pnMy4vJykd2+9vLOD7/1xG+ctPJZPnFGV1FhMYnmZSKYDu0Jet7jHwl0iIhtE5FERCY7KDXftdPd5tHsa44ng4r9EbKsbq8L8XE6rLmNNEvdxP9DVz7/+6hWOLS3k9ksW2bhIlvEykUT6SQrfPOF3QL
WqLgKeAe6Pcm0s93RuIHKNiNSJSF1ra2uMIRszsjUNfnwlBbwjxaazLp3rY9veQ7Qe6k34Z6sqNzz6KnsP9nDnx06htCj1t7c18eVlImkBQuf9zQB2h56gqn5VDf7k/xQ4Ncq1Le7zYe8Zcu+7VbVGVWsqKipG/SWMCVJVVte3ceYcHzk5qfUbd3AqcjJWud+/pomntuzlpvOOY/HMSQn/fJN8XiaSdcA8EZklIgXA5cCK0BPcMY+gC4Gt7vNVwLkiMtkdZD8XWKWqe4BDInKGO1vrU8ATHn4HY4Y0tB5m36HelOrWClo4vZSJhXkJ797a2HKAb698jfcdN4XPvHNWQj/bpA7P5uap6oCIXIuTFHKBe1V1s4jcCtSp6grgOhG5EBgA2oGr3GvbReQ2nGQEcKuqBqekfA64DygCnnQfxnhutfuPdCosRAyXmyOcMduX0AH3Qz39XPvrl/GNL+D7/3CSjYtkMU8neavqSmBl2LFbQp4vB5YPc+29wL0RjtcBC+MbqTHRra5vY8bkIip9qbk9bO3ccp7aspdd7V2eb2Grqiz/7UZaOrp58JozmFxS4OnnmdSWGhPhjUlxgwFlbaM/JVsjQcFKxIkoK//gul38fsMerj9nPqdVJ7dwpUk+SyTGxGDz7gMc7BlgaRLLxkczp2I8UyaM83z73dfePMjXV2zmXfPK+dx75nj6WSY9WCIxJgbB8ZFkbmQVjYiwdI6P5xvacNbrxl9X3wCff+BlJhbl89+XLk652WsmOSyRZIg19W1c/Ys6ntu2L9mhZKQ1DW3MP2Y8FRPGJTuUES2dW07b4T627T3kyf1veWIzjW2d/PCyxSn/38IkjlVUS3MHuvr51sotPFzXQl6O8PSWvVy8eBpfvWABvvHZ9xe9p3+Qzt6BuN5zIKCsa2rn8tMq43pfLwSnJj+zZS9lxREGwCM0ICTiOl8In4T1zJa9PPpSC9e9b15KVD42qcMSSZpSVZ7c9Ca3PLGZjq4+PvueOXzuvXP42d938OPn6vnL663c8qEFXLx4elZMy+zqG+Cnf93BT/7SQLdHuwW+Mw3+8Zw+qYhZ5SV8/6nX+f5Tr8f9/qfPKuML75sX9/ua9CZe9aWmkpqaGq2rq0t2GHHz5oEevvrEJp7espeF0ydy+0cWsXD6W5VWt715iBt/s4H1u/bz7vkVfOvihZ5PB02WQEB57JU3+M9V23jzYA/nn3gsZ8yO/4B4cUEeHz55OrlpMCawZfdBXtl15I6Jkf6qD/u3P8LJebk5nL9wKqXFVgIlW4jIS6paE/U8SyTpIxBQfvXiTr775Gv0BwJcf858/rF2VsRy5oMB5f+eb+J7q7ahCv/+gXdw1dLqtPiHMFZrG/188w9b2PTGQU6aUcpXL1hAjU1FNSZuLJGEyIREUr/vMMt/u4F1TR3UzvXx7Q+fSJWvJOp1b+zv5ubHNvLctlZOmjmJ715yIscdOzEBEXunqa2T7zy5lVWb9zKttJAbzzuODy2aZjOIjIkzSyQh0jmR9A0E+N+/NPA/f66nqCCXr3zweD566oyjGvdQVVa8uptv/G4LB7v7+ex75nDt2XMpzM/1MPL4O9DVz//8eTv3P99Efm4O//LeOXzmnbMpKkiv72FMuog1kdhgewp7ZWcHN/1mI9v2HuKCRVP52odOGNWUSxHhosXTede8Cr75hy3c+Ww9Kzfu4TsfOZHTPRhPiLf+wQAPrG3mB3/azoHufi6rmcn1585nyoTCZIdmjMFaJCmps3eA7z+1jfvWNHHsxEJuu2gh719wTNzu/9fXW/nyY06dpI+dXslN5x3HxMLUG0BVVf782j6+tXIrja2d1M71cfP5C1gwLb275oxJF9a1FSKdEslz2/Zx82Ob2H2gm0+eUcUNH3gHEzz4R76rb4D/fup17l29g4oJ47j1ooV84IRj4/45o7Vl90G+tXILq+v9zK4o4ebzj+fs46ZkxVRmY1KFJZIQ6ZBI/Id7ue33W3h8/W
7mThnP7R85MSEzkF7dtZ8bf7OB1948xHkLj+UbF57AlInJ6zLad7CH/3rqdR5+aRelRfn82/vn87HTK8mPMDPNGOMtSyQhUjmRqCqPr3+DW3+3hcO9A/zLe+fyL2fNYVxe4gaQ+wcD3P3XRn74p+2My8vh5vOP57LTZib0t/+e/kF++tdGfvyXBvoHA1x5ZjX/evY8W7NgTBJZIgmRqolkV3sXNz++ib++3srJlZP47iWLmJ/EvcAbWw+z/LcbeWFHO2fMLuM7H1nErPLoU4zHIhBwZpR974+vsftAD8tOOJabzjuOao8/1xgTnSWSEKmWSAYDyn1rmvj+qm3kCPzHsuP4xBlVKbFYMBBQHqrbxbdXbqVvIMAX3j+Pq98125Oupbqmdm77w1Ze3bWfE6eX8pUPHp8Ws8iMyRYpkUhEZBnwQ5ytdu9R1duHOe+jwCPAaapaJyIfB24IOWURcIqqrheR54CpQLf73rmqOmLJ29Emkpsf28iLO9qjn3iUDvcOsOdAD2e9o4JvfvhEpk8qivtnjNXegz187YnN/HHzmxwzcVzcB/wDAaWxrZNjJxbyH8vewcWLp9uCQmNSTNLXkYhILnAXcA7QAqwTkRWquiXsvAnAdcALwWOq+gDwgPv+icATqro+5LKPu1vuemrapCLmHTM+7vcVhGULj+WCRVNTdhbSMRML+cknT+WPm97k9xt2R6zTNFaXnDqDT9dWU1xgy5mMSWde/g1eAtSraiOAiDwIXARsCTvvNuB7wL8Pc58rgF97FeRIPn/W3GR8bEpZtvBYli1MnWnBxpjU4+WcyunArpDXLe6xISJyMjBTVX8/wn0u48hE8nMRWS8iX5VhfqUXkWtEpE5E6lpbW0cRvjHGmFh4mUgi/QM/1EEiIjnAHcCXhr2ByOlAl6puCjn8cVU9EXiX+/hkpGtV9W5VrVHVmoqKitHEb4wxJgZeJpIWYGbI6xnA7pDXE4CFwHMi0gScAawQkdCBncsJa42o6hvun4eAX+F0oRljjEkSLxPJOmCeiMwSkQKcpLAi+KaqHlDVclWtVtVqYC1wYXAQ3W2x/APwYPAaEckTkXL3eT5wARDaWjHGGJNgng22q+qAiFwLrMKZ/nuvqm4WkVuBOlVdMfIdeDfQEhysd40DVrlJJBd4BvipB+EbY4yJkS1INMYYE1Gs60isEp4xxpgxsURijDFmTLKia0tEWoHmUV5eDrTFMRyvpVO8Fqt30inedIoV0ivescZapapR109kRSIZCxGpi6WPMFWkU7wWq3fSKd50ihXSK95ExWpdW8YYY8bEEokxxpgxsUQS3d3JDuAopVO8Fqt30inedIoV0ivehMRqYyTGGGPGxFokxhhjxsQSyQhEZJmIbBORehG5KdnxDEdEZorIsyKyVUQ2i8gXkh1TNCKSKyKviMhIWwikBBGZJCKPishr7n/jM5Md03BE5N/cn4FNIvJrESlMdkyhROReEdknIptCjpWJyNMist39c3IyYww1TLz/6f4sbBCRx0RkUjJjDIoUa8h7/y4iGqxVGG+WSIYRssPjecAC4AoRWZDcqIY1AHxJVY/HqaL8+RSONegLwNZkBxGjHwJ/VNXjgJNI0bhFZDrObqM1qroQpx7d5cmN6gj3AcvCjt0E/ElV5wF/cl+nivs4Mt6ngYWqugh4HVie6KCGcR9HxoqIzMTZqXanVx9siWR4Qzs8qmofThXii5IcU0SqukdVX3afH8L5h276yFclj4jMAD4I3JPsWKIRkYk4BUR/BqCqfaq6P7lRjSgPKBKRPKCYt2/dkHSq+legPezwRcD97vP7gYsTGtQIIsWrqk+p6oD7ci3OFhlJN8x/W3D2ffoPQvaDijdLJMOLusNjKhKRauBk4IXkRjKiH+D8YAeSHUgMZgOtOLtyviIi94hISbKDisTdq+f7OL957gEOqOpTyY0qJseo6h5wfikCpiQ5nqPxj8CTyQ5iOCJyIfCGqr7q5edYIhneiD
s8piIRGQ/8Bviiqh5MdjyRiMgFwD5VfSnZscQoDzgF+LGqngx0klpdL0PcsYWLgFnANKBERD6R3Kgyl4jcjNOt/ECyY4lERIqBm4FbvP4sSyTDi7bDY0px92j5DfCAqv422fGMoBa40N0V80HgbBH5ZXJDGlELzr44wRbeoziJJRW9H9ihqq2q2g/8Flia5JhisVdEpgK4f+5LcjxRiciVOBvrfVxTdw3FHJxfKl51/77NAF4WkWPj/UGWSIY34g6PqUREBKcPf6uq/ney4xmJqi5X1RnurpiXA39W1ZT9rVlV3wR2icg73EPvA7YkMaSR7ATOEJFi92fifaToxIAwK4Ar3edXAk8kMZaoRGQZcCPOjq5dyY5nOKq6UVWnhOxC2wKc4v5Mx5UlkmG4g2nBHR63Ag+r6ubkRjWsWuCTOL/dr3cf5yc7qAzyr8ADIrIBWAx8O8nxROS2mh4FXgY24vz9TqlV2CLya+B54B0i0iIinwFuB84Rke04s4tuT2aMoYaJ905gAvC0+3ftJ0kN0jVMrIn57NRtlRljjEkH1iIxxhgzJpZIjDHGjIklEmOMMWNiicQYY8yYWCIxxhgzJpZITEYTkUF3iuYmEXnEXe2bdCLy5SR85n0i8tFEf67JfJZITKbrVtXFbjXcPuCzsV7oVoD2ylEnEo/jMWbULJGYbPI3YC6AiDwuIi+5e3dcEzxBRA6LyK0i8gJwpojcIiLr3BbN3e6KcUTkORG5Q0T+6u5RcpqI/NbdU+ObIff7hIi86LaK/tfdh+V2nAq960XkgeHOixRPyH2PF5EXQ15XuwsmGS7mUCLSFNybQkRqROQ593mJu6/FOrdIZUpWvDapxRKJyQpuWfXzcFZ8A/yjqp4K1ADXiYjPPV4CbFLV01X178Cdqnqa26IpwqmvFNSnqu8GfoJT1uPzwELgKhHxicjxwGVAraouBgZxajPdxFstpY8Pd94w8QCgqluBAhGZ7R66DHjYfT5SzNHcjFO25jTgLOA/U7XasUkdeckOwBiPFYnIevf533D3FcFJHh92n88E5gF+nH/EfxNy/Vki8h84e3uUAZuB37nvBWuvbQQ2B0uhi0ije893AqcC69xGQRGRCxK+b4TzwuMJ9TBwKU5JkcvcR7SYozkXp6jmv7uvC4FK0qNml0kSSyQm03W7v+UPEZH34lTKPVNVu9xuneCWtD2qOuieVwj8CGfHwV0i8vWQ8wB63T8DIc+Dr/NwtiK4X1Wj7aA30nlD8UTwEPCIiPwWUFXdHkPMQQO81SMR+r4Al6jqtigxGzPEurZMNioFOtwkchzO9sSRBP+BbRNnr5ejnfH0J+CjIjIFhvYmr3Lf6xen9H+084alqg04LZav4iSVo4m5CacVBHBJyPFVwL+GjAWdHC0OYyyRmGz0RyDPHZy+DWe71CO4W+r+FKfr6nGcrQVipqpbgK8AT7mf9TQw1X37bmCDiDwQ5bxoHgI+gTs+chQxfwP4oYj8DScZBd0G5LuxbXJfGzMiq/5rjDFmTKxFYowxZkwskRhjjBkTSyTGGGPGxBKJMcaYMbFEYowxZkwskRhjjBkTSyTGGGPGxBKJMcaYMfn/+QOjBjzdb3YAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# for Logistic Regression\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.metrics import log_loss\n",
"\n",
"# import Matplotlib (scientific plotting library)\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"c_list = [0.1, 0.01, 0.001]\n",
"solver_list = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']\n",
"idx = []\n",
"\n",
"accuracy_score = []\n",
"for idx1, c in enumerate(c_list):\n",
"    for idx2, sol in enumerate(solver_list):\n",
"        idx.append(idx2 + idx1 * 5)\n",
"        # fit the model with this C and solver\n",
"        LR = LogisticRegression(C=c, solver=sol).fit(X_train, y_train)\n",
"        # predict labels and class probabilities\n",
"        lr_yhat = LR.predict(X_test)\n",
"        lr_prob = LR.predict_proba(X_test)\n",
"        # note: despite the list's name, the metric stored is log loss (lower is better)\n",
"        ll = log_loss(y_test, lr_prob)\n",
"        print(\"Test \", (idx2 + idx1 * 5), \": Log loss at C =\", c, \"solver =\", sol, \"is : \", ll)\n",
"        accuracy_score.append(ll)\n",
"# plot log loss for each (C, solver) combination\n",
"plt.plot(idx, accuracy_score)\n",
"plt.xlabel('Test index (C, solver combination)')\n",
"plt.ylabel('Log loss')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Based on the results above, C=0.001 with solver=liblinear is selected. Note that the metric computed is log loss, for which lower values indicate a better fit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Perform Logistic Regression test using c=0.001 and solver=liblinear"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LogisticRegression(C=0.001, class_weight=None, dual=False, fit_intercept=True,\n",
" intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n",
" penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n",
" verbose=0, warm_start=False)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# for Logistic Regression\n",
"from sklearn.linear_model import LogisticRegression\n",
"# prepare LR setting\n",
"LR = LogisticRegression(C=0.001, solver='liblinear').fit(X_train, y_train)\n",
"LR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model Evaluation using Test set"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### First, download and load the test set:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2018-11-21 02:28:14-- https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_test.csv\n",
"Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.193\n",
"Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.193|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 3642 (3.6K) [text/csv]\n",
"Saving to: ‘loan_test.csv’\n",
"\n",
"loan_test.csv 100%[=====================>] 3.56K --.-KB/s in 0s \n",
"\n",
"2018-11-21 02:28:14 (88.2 MB/s) - ‘loan_test.csv’ saved [3642/3642]\n",
"\n"
]
}
],
"source": [
"!wget -O loan_test.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_test.csv"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Load Test set for evaluation "
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 0.49362588 0.92844966 3.05981865 1.97714211 -1.30384048 2.39791576\n",
" -0.79772404 -0.86135677]\n",
" [-3.56269116 -1.70427745 0.53336288 -0.50578054 0.76696499 -0.41702883\n",
" -0.79772404 -0.86135677]\n",
" [ 0.49362588 0.92844966 1.88080596 1.97714211 0.76696499 -0.41702883\n",
" 1.25356634 -0.86135677]\n",
" [ 0.49362588 0.92844966 -0.98251057 -0.50578054 0.76696499 -0.41702883\n",
" -0.79772404 1.16095912]\n",
" [-0.66532184 -0.78854628 -0.47721942 -0.50578054 0.76696499 2.39791576\n",
" -0.79772404 -0.86135677]]\n",
"(54, 8)\n",
"['PAIDOFF' 'PAIDOFF' 'PAIDOFF' 'PAIDOFF' 'PAIDOFF']\n",
"(54,)\n"
]
}
],
"source": [
"test_df = pd.read_csv('loan_test.csv')\n",
"# convert date time\n",
"test_df['due_date'] = pd.to_datetime(test_df['due_date'])\n",
"test_df['effective_date'] = pd.to_datetime(test_df['effective_date'])\n",
"test_df['dayofweek'] = test_df['effective_date'].dt.dayofweek\n",
"# flag loans that took effect on Friday through Sunday (dayofweek > 3)\n",
"test_df['weekend'] = test_df['dayofweek'].apply(lambda x: 1 if (x>3) else 0)\n",
"# convert male to 0 and female to 1\n",
"test_df['Gender'] = test_df['Gender'].replace(to_replace=['male','female'], value=[0,1])\n",
"# work out education level\n",
"test_feature = test_df[['Principal','terms','age','Gender','weekend']]\n",
"test_feature = pd.concat([test_feature,pd.get_dummies(test_df['education'])], axis=1)\n",
"test_feature.drop(['Master or Above'], axis = 1,inplace=True)\n",
"# Testing feature\n",
"X_loan_test = test_feature\n",
"# normalize the test data (note: the scaler is fitted on the test set itself here)\n",
"X_loan_test = preprocessing.StandardScaler().fit(X_loan_test).transform(X_loan_test)\n",
"# and target result\n",
"y_loan_test = test_df['loan_status'].values\n",
"print(X_loan_test[0:5])\n",
"print(X_loan_test.shape)\n",
"print(y_loan_test[0:5])\n",
"print(y_loan_test.shape)"
]
},
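{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above fits a `StandardScaler` on the test features themselves. A common alternative is to estimate the mean and standard deviation on the training data only and reuse them for the test data, so both sets share one scale. The sketch below shows the idea in plain Python; `train_col`, `test_col` and `standardize` are hypothetical names with made-up values, not data or helpers from this notebook.\n",
"\n",
"```python\n",
"from statistics import fmean, pstdev\n",
"\n",
"# Hypothetical single feature column (e.g. a loan principal); not notebook data\n",
"train_col = [1000.0, 800.0, 1000.0, 300.0]\n",
"test_col = [1000.0, 500.0]\n",
"\n",
"# Statistics are estimated on the training data only\n",
"mu = fmean(train_col)\n",
"sigma = pstdev(train_col)  # population std, the convention StandardScaler uses\n",
"\n",
"def standardize(values, mean, std):\n",
"    # Convert raw values to z-scores using the supplied mean and std\n",
"    return [(v - mean) / std for v in values]\n",
"\n",
"train_scaled = standardize(train_col, mu, sigma)\n",
"test_scaled = standardize(test_col, mu, sigma)  # reuse the training statistics\n",
"print(train_scaled)\n",
"print(test_scaled)\n",
"```\n",
"\n",
"With scikit-learn, the same effect is usually obtained by keeping the fitted scaler object from training and calling its `transform` method on the test features."
]
},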
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evaluate Result\n",
"\n",
"Evaluate the results using three different metrics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Jaccard"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.67, 0.74, 0.8, 0.78]"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Jaccard setup\n",
"# note: jaccard_similarity_score was removed in scikit-learn 0.23; for single-label data like this it equals accuracy\n",
"from sklearn.metrics import jaccard_similarity_score\n",
"\n",
"# evaluate KNN\n",
"knn_yhat = KNN.predict(X_loan_test)\n",
"jc1 = round(jaccard_similarity_score(y_loan_test, knn_yhat), 2)\n",
"# evaluate Decision Trees\n",
"dt_yhat = DT.predict(X_loan_test)\n",
"jc2 = round(jaccard_similarity_score(y_loan_test, dt_yhat), 2)\n",
"# evaluate SVM\n",
"svm_yhat = SVM.predict(X_loan_test)\n",
"jc3 = round(jaccard_similarity_score(y_loan_test, svm_yhat), 2)\n",
"# evaluate Logistic Regression\n",
"lr_yhat = LR.predict(X_loan_test)\n",
"jc4 = round(jaccard_similarity_score(y_loan_test, lr_yhat), 2)\n",
"\n",
"list_jc = [jc1, jc2, jc3, jc4]\n",
"list_jc"
]
},
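{
"cell_type": "markdown",
"metadata": {},
"source": [
"For single-label predictions, the `jaccard_similarity_score` function used above is equivalent to plain accuracy, which differs from the per-class Jaccard index TP / (TP + FP + FN). The sketch below computes both by hand on made-up labels to show the difference; `y_true`, `y_pred` and the two helper functions are hypothetical and not part of this notebook.\n",
"\n",
"```python\n",
"# Hypothetical labels for illustration; not taken from loan_test.csv\n",
"y_true = ['PAIDOFF', 'PAIDOFF', 'COLLECTION', 'PAIDOFF', 'COLLECTION']\n",
"y_pred = ['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'COLLECTION']\n",
"\n",
"def accuracy(y_true, y_pred):\n",
"    # Fraction of samples whose predicted label matches the true label\n",
"    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)\n",
"\n",
"def jaccard_for_class(y_true, y_pred, label):\n",
"    # Jaccard index TP / (TP + FP + FN) for one class\n",
"    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))\n",
"    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))\n",
"    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))\n",
"    return tp / (tp + fp + fn)\n",
"\n",
"print(accuracy(y_true, y_pred))                         # 0.8\n",
"print(jaccard_for_class(y_true, y_pred, 'PAIDOFF'))     # 0.75\n",
"print(jaccard_for_class(y_true, y_pred, 'COLLECTION'))  # 0.5\n",
"```\n",
"\n",
"In newer scikit-learn releases, `jaccard_score` computes the per-class form, so its values are not directly comparable with the numbers reported in this notebook."
]
},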
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### F1-score"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.63, 0.76, 0.76, 0.73]"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# F1-score setup\n",
"from sklearn.metrics import f1_score\n",
"\n",
"# evaluate KNN\n",
"fs1 = round(f1_score(y_loan_test, knn_yhat, average='weighted'), 2)\n",
"# evaluate Decision Trees\n",
"fs2 = round(f1_score(y_loan_test, dt_yhat, average='weighted'), 2)\n",
"# evaluate SVM\n",
"fs3 = round(f1_score(y_loan_test, svm_yhat, average='weighted'), 2)\n",
"# evaluate Logistic Regression\n",
"fs4 = round(f1_score(y_loan_test, lr_yhat, average='weighted'), 2)\n",
"\n",
"list_fs = [fs1, fs2, fs3, fs4]\n",
"list_fs"
]
},
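{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `average='weighted'` option used above computes one F1 score per class and averages them, weighting each class by its number of true samples. A minimal hand-rolled sketch of that calculation, on made-up labels (`y_true`, `y_pred`, `f1_for_class` and `weighted_f1` are hypothetical names, not part of this notebook):\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"# Hypothetical labels for illustration; not taken from loan_test.csv\n",
"y_true = ['PAIDOFF', 'PAIDOFF', 'COLLECTION', 'PAIDOFF', 'COLLECTION']\n",
"y_pred = ['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'COLLECTION']\n",
"\n",
"def f1_for_class(y_true, y_pred, label):\n",
"    # F1 = 2TP / (2TP + FP + FN), taken as 0 when the denominator is 0\n",
"    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))\n",
"    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))\n",
"    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))\n",
"    denom = 2 * tp + fp + fn\n",
"    return 2 * tp / denom if denom else 0.0\n",
"\n",
"def weighted_f1(y_true, y_pred):\n",
"    # Average the per-class F1 scores, weighted by each class's true count\n",
"    support = Counter(y_true)\n",
"    total = sum(f1_for_class(y_true, y_pred, c) * n for c, n in support.items())\n",
"    return total / len(y_true)\n",
"\n",
"print(round(weighted_f1(y_true, y_pred), 4))\n",
"```\n",
"\n",
"Because the weights follow class frequency, a model that does well on the majority class can still score a high weighted F1 while doing poorly on the minority class."
]
},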
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### LogLoss"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['NA', 'NA', 'NA', 0.67]"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# LogLoss\n",
"from sklearn.metrics import log_loss\n",
"lr_prob = LR.predict_proba(X_loan_test)\n",
"list_ll = ['NA', 'NA', 'NA', round(log_loss(y_loan_test, lr_prob), 2)]\n",
"list_ll"
]
},
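{
"cell_type": "markdown",
"metadata": {},
"source": [
"Log loss is the mean negative log-likelihood of the true labels under the predicted probabilities, so lower values are better. The sketch below computes it by hand for the binary case; the labels, probabilities and the `binary_log_loss` helper are hypothetical and only illustrate the formula.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"# Hypothetical true labels and predicted probabilities of the positive class\n",
"y_true = [1, 0, 1, 1]\n",
"p_pos = [0.9, 0.2, 0.6, 0.5]\n",
"\n",
"def binary_log_loss(y_true, p_pos):\n",
"    # Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples\n",
"    total = 0.0\n",
"    for y, p in zip(y_true, p_pos):\n",
"        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))\n",
"    return total / len(y_true)\n",
"\n",
"loss = binary_log_loss(y_true, p_pos)\n",
"print(round(loss, 4))\n",
"\n",
"# Reference point: predicting 0.5 for every sample gives ln(2)\n",
"baseline = binary_log_loss([1, 0], [0.5, 0.5])\n",
"print(round(baseline, 4))\n",
"```\n",
"\n",
"The ln(2) baseline, roughly 0.693, is a useful yardstick when reading a binary log-loss value such as the one reported above."
]
},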
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Report\n",
"You should be able to report the accuracy of the built model using different evaluation metrics:"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Algorithm</th>\n",
" <th>Jaccard</th>\n",
" <th>F1-score</th>\n",
" <th>LogLoss</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>KNN</th>\n",
" <td>0.67</td>\n",
" <td>0.63</td>\n",
" <td>NA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Decision Tree</th>\n",
" <td>0.74</td>\n",
" <td>0.76</td>\n",
" <td>NA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SVM</th>\n",
" <td>0.80</td>\n",
" <td>0.76</td>\n",
" <td>NA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Logistic Regression</th>\n",
" <td>0.78</td>\n",
" <td>0.73</td>\n",
" <td>0.67</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Algorithm Jaccard F1-score LogLoss\n",
"KNN 0.67 0.63 NA\n",
"Decision Tree 0.74 0.76 NA\n",
"SVM 0.80 0.76 NA\n",
"Logistic Regression 0.78 0.73 0.67"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"# build the report table\n",
"df = pd.DataFrame(list_jc, index=['KNN','Decision Tree','SVM','Logistic Regression'])\n",
"df.columns = ['Jaccard']\n",
"df.insert(loc=1, column='F1-score', value=list_fs)\n",
"df.insert(loc=2, column='LogLoss', value=list_ll)\n",
"df.columns.name = 'Algorithm'\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| Algorithm | Jaccard | F1-score | LogLoss |\n",
"|--------------------|---------|----------|---------|\n",
"| KNN | 0.67 | 0.63 | NA |\n",
"| Decision Tree | 0.74 | 0.76 | NA |\n",
"| SVM | 0.80 | 0.76 | NA |\n",
"| LogisticRegression | 0.78 | 0.73 | 0.67 |"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## Want to learn more?\n",
"\n",
"IBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems – by your enterprise as a whole. A free trial is available through this course: [SPSS Modeler](http://cocl.us/ML0101EN-SPSSModeler).\n",
"\n",
"Also, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at [Watson Studio](https://cocl.us/ML0101EN_DSX)\n",
"\n",
"\n",
"<hr>\n",
"Copyright &copy; 2018 [Cognitive Class](https://cocl.us/DX0108EN_CC). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/)."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Thanks for completing this lesson!\n",
"\n",
"Notebook created by: <a href = \"https://ca.linkedin.com/in/saeedaghabozorgi\">Saeed Aghabozorgi</a>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}