Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"<a href=\"https://www.bigdatauniversity.com\"><img src = \"https://ibm.box.com/shared/static/cw2c7r3o20w9zn8gkecaeyjhgw3xdgbj.png\" width = 400, align = \"center\"></a>\n",
"\n",
"<h1 align=center><font size = 5> Classification with Python</font></h1>"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"In this notebook we practice all the classification algorithms covered in this course.\n",
"\n",
"We load a dataset using the Pandas library, apply each of the following algorithms, and use accuracy evaluation methods to find the best one for this specific dataset.\n",
"\n",
"Let's first load the required libraries:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [],
"source": [
"import itertools\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib.ticker import NullFormatter\n",
"import pandas as pd\n",
"import matplotlib.ticker as ticker\n",
"from sklearn import preprocessing\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### About the dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"This dataset is about past loans. The __Loan_train.csv__ data set includes details of 346 customers whose loans are already paid off or defaulted. It includes the following fields:\n",
"\n",
"| Field | Description |\n",
"|----------------|---------------------------------------------------------------------------------------|\n",
"| loan_status | Whether a loan is paid off or in collection |\n",
"| Principal | Basic principal loan amount |\n",
"| terms | Origination terms, which can be a weekly (7 days), biweekly, or monthly payoff schedule |\n",
"| effective_date | When the loan was originated and took effect |\n",
"| due_date | Since it is a one-time payoff schedule, each loan has a single due date |\n",
"| age | Age of applicant |\n",
"| education | Education of applicant |\n",
"| Gender | Gender of applicant |"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Let's download the dataset:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2018-11-21 02:25:29-- https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_train.csv\n",
"Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.193\n",
"Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.193|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 23101 (23K) [text/csv]\n",
"Saving to: ‘loan_train.csv’\n",
"\n",
"loan_train.csv 100%[=====================>] 22.56K --.-KB/s in 0.02s \n",
"\n",
"2018-11-21 02:25:30 (1.06 MB/s) - ‘loan_train.csv’ saved [23101/23101]\n",
"\n"
]
}
],
"source": [
"!wget -O loan_train.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_train.csv"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Load Data From CSV File "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>Unnamed: 0.1</th>\n",
" <th>loan_status</th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>effective_date</th>\n",
" <th>due_date</th>\n",
" <th>age</th>\n",
" <th>education</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>9/8/2016</td>\n",
" <td>10/7/2016</td>\n",
" <td>45</td>\n",
" <td>High School or Below</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>9/8/2016</td>\n",
" <td>10/7/2016</td>\n",
" <td>33</td>\n",
" <td>Bechalor</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>9/8/2016</td>\n",
" <td>9/22/2016</td>\n",
" <td>27</td>\n",
" <td>college</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>9/9/2016</td>\n",
" <td>10/8/2016</td>\n",
" <td>28</td>\n",
" <td>college</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>6</td>\n",
" <td>6</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>9/9/2016</td>\n",
" <td>10/8/2016</td>\n",
" <td>29</td>\n",
" <td>college</td>\n",
" <td>male</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n",
"0 0 0 PAIDOFF 1000 30 9/8/2016 \n",
"1 2 2 PAIDOFF 1000 30 9/8/2016 \n",
"2 3 3 PAIDOFF 1000 15 9/8/2016 \n",
"3 4 4 PAIDOFF 1000 30 9/9/2016 \n",
"4 6 6 PAIDOFF 1000 30 9/9/2016 \n",
"\n",
" due_date age education Gender \n",
"0 10/7/2016 45 High School or Below male \n",
"1 10/7/2016 33 Bechalor female \n",
"2 9/22/2016 27 college male \n",
"3 10/8/2016 28 college female \n",
"4 10/8/2016 29 college male "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('loan_train.csv')\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(346, 10)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Convert to datetime objects"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>Unnamed: 0.1</th>\n",
" <th>loan_status</th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>effective_date</th>\n",
" <th>due_date</th>\n",
" <th>age</th>\n",
" <th>education</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-10-07</td>\n",
" <td>45</td>\n",
" <td>High School or Below</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-10-07</td>\n",
" <td>33</td>\n",
" <td>Bechalor</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-09-22</td>\n",
" <td>27</td>\n",
" <td>college</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-09</td>\n",
" <td>2016-10-08</td>\n",
" <td>28</td>\n",
" <td>college</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>6</td>\n",
" <td>6</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-09</td>\n",
" <td>2016-10-08</td>\n",
" <td>29</td>\n",
" <td>college</td>\n",
" <td>male</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n",
"0 0 0 PAIDOFF 1000 30 2016-09-08 \n",
"1 2 2 PAIDOFF 1000 30 2016-09-08 \n",
"2 3 3 PAIDOFF 1000 15 2016-09-08 \n",
"3 4 4 PAIDOFF 1000 30 2016-09-09 \n",
"4 6 6 PAIDOFF 1000 30 2016-09-09 \n",
"\n",
" due_date age education Gender \n",
"0 2016-10-07 45 High School or Below male \n",
"1 2016-10-07 33 Bechalor female \n",
"2 2016-09-22 27 college male \n",
"3 2016-10-08 28 college female \n",
"4 2016-10-08 29 college male "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['due_date'] = pd.to_datetime(df['due_date'])\n",
"df['effective_date'] = pd.to_datetime(df['effective_date'])\n",
"df.head()"
]
},
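{
"cell_type": "markdown",
"metadata": {},
"source": [
"The raw dates are month/day/year strings such as `9/8/2016`. Letting pandas infer the layout works here, but passing an explicit `format` documents the expected layout and raises an error on values that do not match it instead of guessing. A minimal sketch on two values copied from the table above; the `format` string is an assumption based on those values:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Two date strings in the same month/day/year layout as loan_train.csv\n",
"raw = pd.Series(['9/8/2016', '10/7/2016'])\n",
"\n",
"# %m/%d/%Y: month first, then day, then four-digit year\n",
"parsed = pd.to_datetime(raw, format='%m/%d/%Y')\n",
"\n",
"# dayofweek counts Monday as 0, so Thursday 2016-09-08 gives 3 and Friday 2016-10-07 gives 4\n",
"print(parsed.dt.dayofweek.tolist())  # [3, 4]"
]
},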
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# Data visualization and pre-processing\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Let's see how many samples of each class are in our dataset:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"PAIDOFF 260\n",
"COLLECTION 86\n",
"Name: loan_status, dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['loan_status'].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"260 people have paid off the loan on time, while 86 have gone into collection.\n"
]
},
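{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the classes are imbalanced, it helps to know the accuracy of a trivial model that always predicts the majority class (`PAIDOFF`); any classifier compared later should beat this baseline. A minimal sketch using the counts printed above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Class counts copied from the value_counts() output above\n",
"counts = {'PAIDOFF': 260, 'COLLECTION': 86}\n",
"\n",
"# Accuracy of always predicting the most frequent class\n",
"baseline = max(counts.values()) / sum(counts.values())\n",
"print(round(baseline, 3))  # 0.751"
]
},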
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's plot some columns to understand the data better:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Solving environment: done\n",
"\n",
"## Package Plan ##\n",
"\n",
" environment location: /home/jupyterlab/conda\n",
"\n",
" added / updated specs: \n",
" - seaborn\n",
"\n",
"\n",
"The following packages will be downloaded:\n",
"\n",
" package | build\n",
" ---------------------------|-----------------\n",
" numpy-base-1.15.0 | py36h3dfced4_0 4.2 MB anaconda\n",
" openssl-1.0.2p | h14c3975_0 3.5 MB anaconda\n",
" matplotlib-2.2.2 | py36hb69df0a_2 6.6 MB anaconda\n",
" ca-certificates-2018.03.07 | 0 124 KB anaconda\n",
" mkl_fft-1.0.4 | py36h4414c95_1 150 KB anaconda\n",
" seaborn-0.9.0 | py36_0 379 KB anaconda\n",
" conda-4.5.11 | py36_0 1.0 MB anaconda\n",
" scipy-1.1.0 | py36hc49cb51_0 18.1 MB anaconda\n",
" statsmodels-0.9.0 | py36h035aef0_0 9.0 MB anaconda\n",
" pandas-0.23.4 | py36h04863e7_0 10.1 MB anaconda\n",
" numpy-1.15.0 | py36h1b885b7_0 35 KB anaconda\n",
" patsy-0.5.1 | py36_0 380 KB anaconda\n",
" mkl_random-1.0.1 | py36h4414c95_1 373 KB anaconda\n",
" certifi-2018.10.15 | py36_0 139 KB anaconda\n",
" ------------------------------------------------------------\n",
" Total: 54.0 MB\n",
"\n",
"The following packages will be UPDATED:\n",
"\n",
" certifi: 2018.8.24-py36_1001 conda-forge --> 2018.10.15-py36_0 anaconda\n",
" conda: 4.5.11-py36_0 conda-forge --> 4.5.11-py36_0 anaconda\n",
" matplotlib: 2.2.2-py36h8e2386c_2 conda-forge --> 2.2.2-py36hb69df0a_2 anaconda\n",
" mkl_fft: 1.0.1-py36h3010b51_0 --> 1.0.4-py36h4414c95_1 anaconda\n",
" mkl_random: 1.0.1-py36h629b387_0 --> 1.0.1-py36h4414c95_1 anaconda\n",
" numpy: 1.14.3-py36hcd700cb_1 --> 1.15.0-py36h1b885b7_0 anaconda\n",
" numpy-base: 1.14.3-py36h9be14a7_1 --> 1.15.0-py36h3dfced4_0 anaconda\n",
" openssl: 1.0.2p-h470a237_0 conda-forge --> 1.0.2p-h14c3975_0 anaconda\n",
" pandas: 0.23.0-py36h637b7d7_0 --> 0.23.4-py36h04863e7_0 anaconda\n",
" patsy: 0.5.0-py36_0 --> 0.5.1-py36_0 anaconda\n",
" scipy: 1.1.0-py36hfc37229_0 --> 1.1.0-py36hc49cb51_0 anaconda\n",
" seaborn: 0.8.1-py36hfad7ec4_0 --> 0.9.0-py36_0 anaconda\n",
" statsmodels: 0.9.0-py36h3010b51_0 --> 0.9.0-py36h035aef0_0 anaconda\n",
"\n",
"The following packages will be DOWNGRADED:\n",
"\n",
" ca-certificates: 2018.8.24-ha4d7672_0 conda-forge --> 2018.03.07-0 anaconda\n",
"\n",
"\n",
"Downloading and Extracting Packages\n",
"numpy-base-1.15.0 | 4.2 MB | ##################################### | 100% \n",
"openssl-1.0.2p | 3.5 MB | ##################################### | 100% \n",
"matplotlib-2.2.2 | 6.6 MB | ##################################### | 100% \n",
"ca-certificates-2018 | 124 KB | ##################################### | 100% \n",
"mkl_fft-1.0.4 | 150 KB | ##################################### | 100% \n",
"seaborn-0.9.0 | 379 KB | ##################################### | 100% \n",
"conda-4.5.11 | 1.0 MB | ##################################### | 100% \n",
"scipy-1.1.0 | 18.1 MB | ##################################### | 100% \n",
"statsmodels-0.9.0 | 9.0 MB | ##################################### | 100% \n",
"pandas-0.23.4 | 10.1 MB | ##################################### | 100% \n",
"numpy-1.15.0 | 35 KB | ##################################### | 100% \n",
"patsy-0.5.1 | 380 KB | ##################################### | 100% \n",
"mkl_random-1.0.1 | 373 KB | ##################################### | 100% \n",
"certifi-2018.10.15 | 139 KB | ##################################### | 100% \n",
"Preparing transaction: done\n",
"Verifying transaction: done\n",
"Executing transaction: done\n"
]
}
],
"source": [
"# note: installing seaborn might take a few minutes\n",
"!conda install -c anaconda seaborn -y"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x216 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import seaborn as sns\n",
"\n",
"bins = np.linspace(df.Principal.min(), df.Principal.max(), 10)\n",
"g = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\n",
"g.map(plt.hist, 'Principal', bins=bins, ec=\"k\")\n",
"\n",
"g.axes[-1].legend()\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x216 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"bins = np.linspace(df.age.min(), df.age.max(), 10)\n",
"g = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\n",
"g.map(plt.hist, 'age', bins=bins, ec=\"k\")\n",
"\n",
"g.axes[-1].legend()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# Pre-processing: Feature selection/extraction"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Let's look at the day of the week people get the loan"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x216 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df['dayofweek'] = df['effective_date'].dt.dayofweek\n",
"bins = np.linspace(df.dayofweek.min(), df.dayofweek.max(), 10)\n",
"g = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\n",
"g.map(plt.hist, 'dayofweek', bins=bins, ec=\"k\")\n",
"g.axes[-1].legend()\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"We see that people who get the loan at the end of the week are less likely to pay it off, so let's use feature binarization with a threshold at day 4: days 0-3 (Monday to Thursday) are encoded as 0 and days 4-6 (Friday to Sunday) as 1."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>Unnamed: 0.1</th>\n",
" <th>loan_status</th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>effective_date</th>\n",
" <th>due_date</th>\n",
" <th>age</th>\n",
" <th>education</th>\n",
" <th>Gender</th>\n",
" <th>dayofweek</th>\n",
" <th>weekend</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-10-07</td>\n",
" <td>45</td>\n",
" <td>High School or Below</td>\n",
" <td>male</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-10-07</td>\n",
" <td>33</td>\n",
" <td>Bechalor</td>\n",
" <td>female</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-09-22</td>\n",
" <td>27</td>\n",
" <td>college</td>\n",
" <td>male</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-09</td>\n",
" <td>2016-10-08</td>\n",
" <td>28</td>\n",
" <td>college</td>\n",
" <td>female</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>6</td>\n",
" <td>6</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-09</td>\n",
" <td>2016-10-08</td>\n",
" <td>29</td>\n",
" <td>college</td>\n",
" <td>male</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n",
"0 0 0 PAIDOFF 1000 30 2016-09-08 \n",
"1 2 2 PAIDOFF 1000 30 2016-09-08 \n",
"2 3 3 PAIDOFF 1000 15 2016-09-08 \n",
"3 4 4 PAIDOFF 1000 30 2016-09-09 \n",
"4 6 6 PAIDOFF 1000 30 2016-09-09 \n",
"\n",
" due_date age education Gender dayofweek weekend \n",
"0 2016-10-07 45 High School or Below male 3 0 \n",
"1 2016-10-07 33 Bechalor female 3 0 \n",
"2 2016-09-22 27 college male 3 0 \n",
"3 2016-10-08 28 college female 4 1 \n",
"4 2016-10-08 29 college male 4 1 "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['weekend'] = df['dayofweek'].apply(lambda x: 1 if (x>3) else 0)\n",
"df.head()"
]
},
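{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same thresholding can also be written with scikit-learn's `preprocessing.Binarizer`, which maps values strictly greater than `threshold` to 1 and all other values to 0. A minimal sketch on a few made-up day-of-week values; it is an alternative to the `apply` call above, not what this notebook uses:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn import preprocessing\n",
"\n",
"# Hypothetical day-of-week values (Monday = 0 ... Sunday = 6), one sample per row\n",
"days = np.array([[0], [3], [4], [6]])\n",
"\n",
"# threshold=3 reproduces the rule 1 if x > 3 else 0 used above\n",
"weekend = preprocessing.Binarizer(threshold=3).fit_transform(days)\n",
"print(weekend.ravel())  # days 0 and 3 map to 0, days 4 and 6 map to 1"
]
},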
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## Convert Categorical features to numerical values"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Let's look at gender:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"Gender loan_status\n",
"female PAIDOFF 0.865385\n",
" COLLECTION 0.134615\n",
"male PAIDOFF 0.731293\n",
" COLLECTION 0.268707\n",
"Name: loan_status, dtype: float64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['Gender'])['loan_status'].value_counts(normalize=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"86% of females pay their loans, while only 73% of males pay theirs.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Let's convert male to 0 and female to 1:\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>Unnamed: 0.1</th>\n",
" <th>loan_status</th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>effective_date</th>\n",
" <th>due_date</th>\n",
" <th>age</th>\n",
" <th>education</th>\n",
" <th>Gender</th>\n",
" <th>dayofweek</th>\n",
" <th>weekend</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-10-07</td>\n",
" <td>45</td>\n",
" <td>High School or Below</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-10-07</td>\n",
" <td>33</td>\n",
" <td>Bechalor</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>2016-09-08</td>\n",
" <td>2016-09-22</td>\n",
" <td>27</td>\n",
" <td>college</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-09</td>\n",
" <td>2016-10-08</td>\n",
" <td>28</td>\n",
" <td>college</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>6</td>\n",
" <td>6</td>\n",
" <td>PAIDOFF</td>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>2016-09-09</td>\n",
" <td>2016-10-08</td>\n",
" <td>29</td>\n",
" <td>college</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n",
"0 0 0 PAIDOFF 1000 30 2016-09-08 \n",
"1 2 2 PAIDOFF 1000 30 2016-09-08 \n",
"2 3 3 PAIDOFF 1000 15 2016-09-08 \n",
"3 4 4 PAIDOFF 1000 30 2016-09-09 \n",
"4 6 6 PAIDOFF 1000 30 2016-09-09 \n",
"\n",
" due_date age education Gender dayofweek weekend \n",
"0 2016-10-07 45 High School or Below 0 3 0 \n",
"1 2016-10-07 33 Bechalor 1 3 0 \n",
"2 2016-09-22 27 college 0 3 0 \n",
"3 2016-10-08 28 college 1 4 1 \n",
"4 2016-10-08 29 college 0 4 1 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
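"# assign the result back; inplace=True on a single column is chained assignment, which recent pandas versions warn will stop updating df\n",
"df['Gender'] = df['Gender'].replace(to_replace=['male','female'], value=[0,1])\n",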
"df.head()"
]
},
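{
"cell_type": "markdown",
"metadata": {},
"source": [
"An equivalent encoding uses `Series.map` with a dictionary. One difference worth knowing: `map` turns any value missing from the dictionary into `NaN`, whereas `replace` leaves unmatched values untouched, so `map` surfaces unexpected categories instead of hiding them. A minimal sketch on made-up values:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical sample of the Gender column before encoding\n",
"gender = pd.Series(['male', 'female', 'female'])\n",
"\n",
"# Same mapping as above: male -> 0, female -> 1\n",
"encoded = gender.map({'male': 0, 'female': 1})\n",
"print(encoded.tolist())  # [0, 1, 1]"
]
},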
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## One Hot Encoding \n",
"#### How about education?"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"education loan_status\n",
"Bechalor PAIDOFF 0.750000\n",
" COLLECTION 0.250000\n",
"High School or Below PAIDOFF 0.741722\n",
" COLLECTION 0.258278\n",
"Master or Above COLLECTION 0.500000\n",
" PAIDOFF 0.500000\n",
"college PAIDOFF 0.765101\n",
" COLLECTION 0.234899\n",
"Name: loan_status, dtype: float64"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['education'])['loan_status'].value_counts(normalize=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"#### Features before One Hot Encoding"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>age</th>\n",
" <th>Gender</th>\n",
" <th>education</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>45</td>\n",
" <td>0</td>\n",
" <td>High School or Below</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>33</td>\n",
" <td>1</td>\n",
" <td>Bechalor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>27</td>\n",
" <td>0</td>\n",
" <td>college</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>28</td>\n",
" <td>1</td>\n",
" <td>college</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>29</td>\n",
" <td>0</td>\n",
" <td>college</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Principal terms age Gender education\n",
"0 1000 30 45 0 High School or Below\n",
"1 1000 30 33 1 Bechalor\n",
"2 1000 15 27 0 college\n",
"3 1000 30 28 1 college\n",
"4 1000 30 29 0 college"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['Principal','terms','age','Gender','education']].head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"#### Use the one hot encoding technique to convert categorical variables to binary variables and append them to the feature DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>age</th>\n",
" <th>Gender</th>\n",
" <th>weekend</th>\n",
" <th>Bechalor</th>\n",
" <th>High School or Below</th>\n",
" <th>college</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>45</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>33</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>27</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>28</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>29</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Principal terms age Gender weekend Bechalor High School or Below \\\n",
"0 1000 30 45 0 0 0 1 \n",
"1 1000 30 33 1 0 1 0 \n",
"2 1000 15 27 0 0 0 0 \n",
"3 1000 30 28 1 1 0 0 \n",
"4 1000 30 29 0 1 0 0 \n",
"\n",
" college \n",
"0 0 \n",
"1 0 \n",
"2 1 \n",
"3 1 \n",
"4 1 "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Feature = df[['Principal','terms','age','Gender','weekend']]\n",
"Feature = pd.concat([Feature,pd.get_dummies(df['education'])], axis=1)\n",
"Feature.drop(['Master or Above'], axis = 1,inplace=True)\n",
"Feature.head()\n"
]
},
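{
"cell_type": "markdown",
"metadata": {},
"source": [
"In pandas 2.0 and later, `pd.get_dummies` returns boolean columns by default; passing `dtype=int` gives the 0/1 integers shown in the output above. One common reason to drop a level (here `Master or Above`) is to keep it as the reference category, so the remaining dummy columns are not perfectly collinear. A minimal sketch on made-up education values:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical education values, including the level that gets dropped\n",
"edu = pd.Series(['college', 'Bechalor', 'Master or Above', 'college'])\n",
"\n",
"# One 0/1 column per category; columns come out in sorted order\n",
"dummies = pd.get_dummies(edu, dtype=int)\n",
"\n",
"# Drop the reference category, as the notebook does with Master or Above\n",
"dummies = dummies.drop(columns=['Master or Above'])\n",
"print(dummies.columns.tolist())  # ['Bechalor', 'college']"
]
},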
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Feature selection"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Let's define the feature set, X:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>age</th>\n",
" <th>Gender</th>\n",
" <th>weekend</th>\n",
" <th>Bechalor</th>\n",
" <th>High School or Below</th>\n",
" <th>college</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>45</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>33</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1000</td>\n",
" <td>15</td>\n",
" <td>27</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>28</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1000</td>\n",
" <td>30</td>\n",
" <td>29</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Principal terms age Gender weekend Bechalor High School or Below \\\n",
"0 1000 30 45 0 0 0 1 \n",
"1 1000 30 33 1 0 1 0 \n",
"2 1000 15 27 0 0 0 0 \n",
"3 1000 30 28 1 1 0 0 \n",
"4 1000 30 29 0 1 0 0 \n",
"\n",
" college \n",
"0 0 \n",
"1 0 \n",
"2 1 \n",
"3 1 \n",
"4 1 "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = Feature\n",
"X[0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"What are our labels?"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'],\n",
" dtype=object)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y = df['loan_status'].values\n",
"y[0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Try to understand the correlation between loan_status and the selected features"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Principal</th>\n",
" <th>terms</th>\n",
" <th>age</th>\n",
" <th>Gender</th>\n",
" <th>weekend</th>\n",
" <th>Bechalor</th>\n",
" <th>High School or Below</th>\n",
" <th>college</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Principal</th>\n",
" <td>1.000000</td>\n",
" <td>0.521876</td>\n",
" <td>-0.060893</td>\n",
" <td>-0.005134</td>\n",
" <td>0.089006</td>\n",
" <td>0.022212</td>\n",
" <td>0.011206</td>\n",
" <td>-0.021506</td>\n",
" </tr>\n",
" <tr>\n",
" <th>terms</th>\n",
" <td>0.521876</td>\n",
" <td>1.000000</td>\n",
" <td>-0.064762</td>\n",
" <td>-0.032399</td>\n",
" <td>0.084842</td>\n",
" <td>-0.057337</td>\n",
" <td>0.101787</td>\n",
" <td>-0.052172</td>\n",
" </tr>\n",
" <tr>\n",
" <th>age</th>\n",
" <td>-0.060893</td>\n",
" <td>-0.064762</td>\n",
" <td>1.000000</td>\n",
" <td>-0.010519</td>\n",
" <td>0.000431</td>\n",
" <td>0.057065</td>\n",
" <td>0.066836</td>\n",
" <td>-0.131585</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Gender</th>\n",
" <td>-0.005134</td>\n",
" <td>-0.032399</td>\n",
" <td>-0.010519</td>\n",
" <td>1.000000</td>\n",
" <td>-0.079157</td>\n",
" <td>0.082229</td>\n",
" <td>-0.043927</td>\n",
" <td>-0.006420</td>\n",
" </tr>\n",
" <tr>\n",
" <th>weekend</th>\n",
" <td>0.089006</td>\n",
" <td>0.084842</td>\n",
" <td>0.000431</td>\n",
" <td>-0.079157</td>\n",
" <td>1.000000</td>\n",
" <td>0.016430</td>\n",
" <td>-0.064819</td>\n",
" <td>0.044184</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bechalor</th>\n",
" <td>0.022212</td>\n",
" <td>-0.057337</td>\n",
" <td>0.057065</td>\n",
" <td>0.082229</td>\n",
" <td>0.016430</td>\n",
" <td>1.000000</td>\n",
" <td>-0.335888</td>\n",
" <td>-0.331958</td>\n",
" </tr>\n",
" <tr>\n",
" <th>High School or Below</th>\n",
" <td>0.011206</td>\n",
" <td>0.101787</td>\n",
" <td>0.066836</td>\n",
" <td>-0.043927</td>\n",
" <td>-0.064819</td>\n",
" <td>-0.335888</td>\n",
" <td>1.000000</td>\n",
" <td>-0.765299</td>\n",
" </tr>\n",
" <tr>\n",
" <th>college</th>\n",
" <td>-0.021506</td>\n",
" <td>-0.052172</td>\n",
" <td>-0.131585</td>\n",
" <td>-0.006420</td>\n",
" <td>0.044184</td>\n",
" <td>-0.331958</td>\n",
" <td>-0.765299</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Principal terms age Gender weekend \\\n",
"Principal 1.000000 0.521876 -0.060893 -0.005134 0.089006 \n",
"terms 0.521876 1.000000 -0.064762 -0.032399 0.084842 \n",
"age -0.060893 -0.064762 1.000000 -0.010519 0.000431 \n",
"Gender -0.005134 -0.032399 -0.010519 1.000000 -0.079157 \n",
"weekend 0.089006 0.084842 0.000431 -0.079157 1.000000 \n",
"Bechalor 0.022212 -0.057337 0.057065 0.082229 0.016430 \n",
"High School or Below 0.011206 0.101787 0.066836 -0.043927 -0.064819 \n",
"college -0.021506 -0.052172 -0.131585 -0.006420 0.044184 \n",
"\n",
" Bechalor High School or Below college \n",
"Principal 0.022212 0.011206 -0.021506 \n",
"terms -0.057337 0.101787 -0.052172 \n",
"age 0.057065 0.066836 -0.131585 \n",
"Gender 0.082229 -0.043927 -0.006420 \n",
"weekend 0.016430 -0.064819 0.044184 \n",
"Bechalor 1.000000 -0.335888 -0.331958 \n",
"High School or Below -0.335888 1.000000 -0.765299 \n",
"college -0.331958 -0.765299 1.000000 "
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merge = pd.concat([X, df['loan_status']], axis=1, sort=False)\n",
"merge.head()\n",
"merge.corr(method='pearson')"
]
},
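{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `loan_status` holds strings (`PAIDOFF`, `COLLECTION`), `merge.corr()` leaves it out. The table above therefore only shows correlations among the features themselves, and recent pandas versions may raise an error on the string column instead. The sketch below assumes a 0/1 encoding with `PAIDOFF` = 1 and `COLLECTION` = 0, and correlates each selected feature with the label:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Encode the label numerically (assumed mapping: PAIDOFF -> 1, COLLECTION -> 0)\n",
"status_num = df['loan_status'].map({'PAIDOFF': 1, 'COLLECTION': 0})\n",
"\n",
"# Pearson correlation of every feature column with the encoded label, smallest first\n",
"Feature.astype(float).corrwith(status_num).sort_values()"
]
},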
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Split the dataset into training and test sets"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X_train size is (276, 8) \n",
" X_test size is (70, 8) \n",
" y_train size is (276,) \n",
" y_test size is (70,)\n",
" Principal terms age Gender weekend Bechalor High School or Below \\\n",
"188 1000 15 35 0 0 0 0 \n",
"299 1000 30 26 0 1 0 1 \n",
"239 1000 30 31 0 0 0 0 \n",
"46 1000 15 25 0 1 0 0 \n",
"259 1000 30 28 0 0 0 0 \n",
"\n",
" college \n",
"188 1 \n",
"299 0 \n",
"239 1 \n",
"46 1 \n",
"259 1 \n"
]
},
{
"data": {
"text/plain": [
"array(['PAIDOFF', 'COLLECTION', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'],\n",
" dtype=object)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)\n",
"print(\"X_train size is \", X_train.shape, \"\\n\", \"X_test size is \", X_test.shape, \"\\n\",\n",
" \"y_train size is \", y_train.shape, \"\\n\", \"y_test size is \", y_test.shape)\n",
"print(X_train[0:5])\n",
"y_train[0:5]"
]
},
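{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the classes are imbalanced (for example, more `PAIDOFF` than `COLLECTION` loans), a plain random 80/20 split can give the test set a different class mix than the training set. One possible variant passes `stratify=y` to `train_test_split` so that both parts keep the same label proportions. This is only a sketch. The `_s`-suffixed names are illustrative and are not used by the rest of the notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"import numpy as np\n",
"\n",
"# Stratified 80/20 split: both parts keep the label proportions of y\n",
"X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(\n",
"    X, y, test_size=0.2, random_state=4, stratify=y)\n",
"\n",
"# Compare the class counts in each part\n",
"for name, labels in [('train', y_train_s), ('test', y_test_s)]:\n",
"    classes, counts = np.unique(labels, return_counts=True)\n",
"    print(name, dict(zip(classes, counts)))"
]
},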
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## Normalize Data "
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Data standardization gives each feature zero mean and unit variance. (Strictly, the scaler should be fitted on the training split only, after the train/test split.)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0.51578458, 0.92071769, 2.33152555, -0.42056004, -1.20577805,\n",
" -0.38170062, 1.13639374, -0.86968108],\n",
" [ 0.51578458, 0.92071769, 0.34170148, 2.37778177, -1.20577805,\n",
" 2.61985426, -0.87997669, -0.86968108],\n",
" [ 0.51578458, -0.95911111, -0.65321055, -0.42056004, -1.20577805,\n",
" -0.38170062, -0.87997669, 1.14984679],\n",
" [ 0.51578458, 0.92071769, -0.48739188, 2.37778177, 0.82934003,\n",
" -0.38170062, -0.87997669, 1.14984679],\n",
" [ 0.51578458, 0.92071769, -0.3215732 , -0.42056004, 0.82934003,\n",
" -0.38170062, -0.87997669, 1.14984679]])"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = preprocessing.StandardScaler().fit(X).transform(X.astype(float))\n",
"X[0:5]"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0.33474248, 0.83916906, -0.19614926, -0.47756693, 0.74535599,\n",
" -0.2773501 , 1.26197963, -1.05887304],\n",
" [-1.70282047, -0.9301633 , -0.19614926, -0.47756693, 0.74535599,\n",
" -0.2773501 , -0.79240582, 0.94440028],\n",
" [ 0.33474248, -0.9301633 , -0.04012144, -0.47756693, -1.34164079,\n",
" -0.2773501 , 1.26197963, -1.05887304],\n",
" [ 0.33474248, 0.83916906, -1.13231619, -0.47756693, -1.34164079,\n",
" -0.2773501 , -0.79240582, 0.94440028],\n",
" [ 0.33474248, 0.83916906, 0.42796202, -0.47756693, -1.34164079,\n",
" -0.2773501 , -0.79240582, 0.94440028]])"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# also need to normalize the test and train dataset\n",
"X_train = preprocessing.StandardScaler().fit(X_train).transform(X_train.astype(float))\n",
"X_train[0:5]\n",
"X_test = preprocessing.StandardScaler().fit(X_test).transform(X_test.astype(float))\n",
"X_test[0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# Classification "
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"Now it is your turn: use the training set to build an accurate model, then use the test set to report the accuracy of the model.\n",
"You should use the following algorithms:\n",
"- K Nearest Neighbor (KNN)\n",
"- Decision Tree\n",
"- Support Vector Machine\n",
"- Logistic Regression\n",
"\n",
"__Notice:__ \n",
"- You can go above and change the pre-processing, feature selection, feature extraction, and so on, to make a better model.\n",
"- You should use the scikit-learn, SciPy or NumPy libraries to develop the classification algorithms.\n",
"- You should include the code of the algorithm in the following cells."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# K Nearest Neighbor (KNN)\n",
"Notice: You should find the best k to build the model with the best accuracy.  \n",
"**Warning:** You should not use __loan_test.csv__ to find the best k; however, you can split __loan_train.csv__ into training and test sets to find the best __k__."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### k-Nearest Neighbors test - find the best k value"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Test set Accuracy at k= 1 : 0.6714285714285714\n",
"Test set Accuracy at k= 2 : 0.6428571428571429\n",
"Test set Accuracy at k= 3 : 0.7285714285714285\n",
"Test set Accuracy at k= 4 : 0.6571428571428571\n",
"Test set Accuracy at k= 5 : 0.7142857142857143\n",
"Test set Accuracy at k= 6 : 0.6571428571428571\n",
"Test set Accuracy at k= 7 : 0.7428571428571429\n",
"Test set Accuracy at k= 8 : 0.7428571428571429\n",
"Test set Accuracy at k= 9 : 0.7142857142857143\n"
]
},
{
"data": {
"text/plain": [
"Text(0,0.5,'Testing Accuracy')"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# finding a suitable k value\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.metrics import jaccard_similarity_score\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"k_range = range(1, 10)\n",
"accuracy_score = []\n",
"for k in k_range:\n",
" KNN = KNeighborsClassifier(n_neighbors = k).fit(X_train, y_train)\n",
" # perform the test\n",
" knn_yhat = KNN.predict(X_test)\n",
" print(\"Test set Accuracy at k=\", k, \": \", jaccard_similarity_score(y_test, knn_yhat))\n",
" accuracy_score.append(jaccard_similarity_score(y_test, knn_yhat))\n",
"\n",
"# plot the relationship between K and testing accuracy\n",
"plt.plot(k_range, accuracy_score)\n",
"plt.xlabel('Value of K for KNN')\n",
"plt.ylabel('Testing Accuracy')"
]
},
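{
"cell_type": "markdown",
"metadata": {},
"source": [
"Scoring each k on a single 70-sample test split is noisy; the accuracy above jumps between neighboring k values. An alternative is to score each k by 5-fold cross-validation on the training split only. The sketch below reuses the already standardized `X_train`/`y_train` from the cells above, and its choice of 5 folds is an assumption:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import cross_val_score\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"\n",
"# Mean 5-fold cross-validated accuracy on the training split for each k\n",
"cv_scores = {}\n",
"for k in range(1, 10):\n",
"    knn = KNeighborsClassifier(n_neighbors=k)\n",
"    cv_scores[k] = cross_val_score(knn, X_train, y_train, cv=5, scoring='accuracy').mean()\n",
"\n",
"# pick the k with the highest mean CV accuracy\n",
"best_k = max(cv_scores, key=cv_scores.get)\n",
"print('CV accuracy per k:', {k: round(s, 3) for k, s in cv_scores.items()})\n",
"print('best k by cross-validation:', best_k)"
]
},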
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The best test accuracy (about 0.743) is reached at k = 7, tied with k = 8; we keep the smaller value, k = 7"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build the k-Nearest Neighbors model with k = 7"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
" metric_params=None, n_jobs=1, n_neighbors=7, p=2,\n",
" weights='uniform')"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# for KNN\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"# train the final KNN model on the training split\n",
"KNN = KNeighborsClassifier(n_neighbors = 7).fit(X_train, y_train)\n",
"KNN"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Decision Tree"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Decision Tree test - find the best depth"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Depth</th>\n",
" <th>F1-score</th>\n",
" <th>Jacard</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>d=3</th>\n",
" <td>0.620577</td>\n",
" <td>0.585714</td>\n",
" </tr>\n",
" <tr>\n",
" <th>d=4</th>\n",
" <td>0.620577</td>\n",
" <td>0.585714</td>\n",
" </tr>\n",
" <tr>\n",
" <th>d=5</th>\n",
" <td>0.648789</td>\n",
" <td>0.614286</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Depth F1-score Jacard\n",
"d=3 0.620577 0.585714\n",
"d=4 0.620577 0.585714\n",
"d=5 0.648789 0.614286"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# findinng the best depth level\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.metrics import f1_score\n",
"from sklearn.metrics import jaccard_similarity_score\n",
"\n",
"# Compare accuracy result for depth = 3, 4 and 5\n",
"d_range = range(3, 6)\n",
"f1 = []\n",
"ja = []\n",
"for d in d_range:\n",
" DT = DecisionTreeClassifier(criterion=\"entropy\", max_depth=d)\n",
" DT.fit(X_train, y_train)\n",
" dt_yhat = DT.predict(X_test)\n",
" f1.append(f1_score(y_test, dt_yhat, average='weighted'))\n",
" ja.append(jaccard_similarity_score(y_test, dt_yhat))\n",
"\n",
"result = pd.DataFrame(f1, index=['d=3','d=4', 'd=5'])\n",
"result.columns = ['F1-score']\n",
"result.insert(loc=1, column='Jacard', value=ja)\n",
"result.columns.name = \"Depth\"\n",
"result"
]
},
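{
"cell_type": "markdown",
"metadata": {},
"source": [
"Only three depths are compared above, and each score comes from one fit on one split. The tree is also unseeded, so ties between equally good splits can change from run to run. The sketch below uses `GridSearchCV` to pick `max_depth` by 5-fold cross-validation on the training split. The depth range 1-10, the weighted-F1 scoring and `random_state=4` are assumptions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import GridSearchCV\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"# Search max_depth = 1..10 with 5-fold cross-validation on the training split\n",
"grid = GridSearchCV(\n",
"    DecisionTreeClassifier(criterion='entropy', random_state=4),\n",
"    param_grid={'max_depth': list(range(1, 11))},\n",
"    cv=5,\n",
"    scoring='f1_weighted')\n",
"grid.fit(X_train, y_train)\n",
"print('best depth:', grid.best_params_['max_depth'])\n",
"print('mean CV weighted F1:', round(grid.best_score_, 3))"
]
},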
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of the depths tried (3, 4 and 5), depth = 5 gives the highest F1-score and Jaccard score"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build the Decision Tree model with depth = 5"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=5,\n",
" max_features=None, max_leaf_nodes=None,\n",
" min_impurity_decrease=0.0, min_impurity_split=None,\n",
" min_samples_leaf=1, min_samples_split=2,\n",
" min_weight_fraction_leaf=0.0, presort=False, random_state=None,\n",
" splitter='best')"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# for Decision Trees\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"# configure the tree: entropy split criterion, maximum depth 5\n",
"DT = DecisionTreeClassifier(criterion=\"entropy\", max_depth=5)\n",
"# train the final model on the training split\n",
"DT.fit(X_train, y_train)\n",
"DT"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Support Vector Machine"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Support Vector Machines test - find the best kernel function"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/jupyterlab/conda/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.\n",
" 'precision', 'predicted', average, warn_for)\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# for SVM: compare kernel functions by weighted F1-score on the held-out split\n",
"from sklearn import svm\n",
"from sklearn.metrics import f1_score\n",
"import numpy as np\n",
"# import Matplotlib (scientific plotting library)\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"func_list = ['linear', 'poly', 'rbf', 'sigmoid']\n",
"svm_scores = []\n",
"\n",
"for func in func_list:\n",
"    SVM = svm.SVC(kernel=func)\n",
"    SVM.fit(X_train, y_train)\n",
"    svm_yhat = SVM.predict(X_test)\n",
"    svm_scores.append(f1_score(y_test, svm_yhat, average='weighted'))\n",
"\n",
"# plot the comparison among the 4 kernel functions\n",
"y_pos = np.arange(len(func_list))\n",
"plt.bar(y_pos, svm_scores, align='center', alpha=0.5)\n",
"plt.xticks(y_pos, func_list)\n",
"plt.ylabel('Weighted F1-score')\n",
"plt.xlabel('Kernel Functions')\n",
"plt.title('F1-score Comparison for 4 Kernel Functions')\n",
"plt.show()"
]
},
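{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `UndefinedMetricWarning` printed above means that, for at least one kernel, some label was never predicted on the test split, so its F1-score was set to 0. The missing label is most likely `COLLECTION`, but that is an assumption. A quick check, reusing the same `X_train`/`X_test` split as above, prints which labels each kernel actually predicts:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import svm\n",
"import numpy as np\n",
"\n",
"# For each kernel, count the distinct labels predicted on the test split\n",
"for func in ['linear', 'poly', 'rbf', 'sigmoid']:\n",
"    yhat = svm.SVC(kernel=func).fit(X_train, y_train).predict(X_test)\n",
"    classes, counts = np.unique(yhat, return_counts=True)\n",
"    print(func, dict(zip(classes, counts)))"
]
},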
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The best-scoring kernel function is rbf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build the Support Vector Machine model with the rbf kernel"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n",
" decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',\n",
" max_iter=-1, probability=False, random_state=None, shrinking=True,\n",
" tol=0.001, verbose=False)"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# for SVM\n",
"from sklearn import svm\n",
"# configure the SVM with the rbf kernel\n",
"SVM = svm.SVC(kernel='rbf')\n",
"# train the final model on the training split\n",
"SVM.fit(X_train, y_train)\n",
"SVM"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Logistic Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Logistic Regression test - find the best parameters"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Test 0 : Accuracy at c = 0.1 solver= newton-cg is : 0.477460130698766\n",
"Test 1 : Accuracy at c = 0.1 solver= lbfgs is : 0.47746026240380063\n",
"Test 2 : Accuracy at c = 0.1 solver= liblinear is : 0.49096560818457907\n",
"Test 3 : Accuracy at c = 0.1 solver= sag is : 0.47746120029530975\n",
"Test 4 : Accuracy at c = 0.1 solver= saga is : 0.47746371482697025\n",
"Test 5 : Accuracy at c = 0.01 solver= newton-cg is : 0.48933564178286426\n",
"Test 6 : Accuracy at c = 0.01 solver= lbfgs is : 0.48933560490693945\n",
"Test 7 : Accuracy at c = 0.01 solver= liblinear is : 0.5699980927778155\n",
"Test 8 : Accuracy at c = 0.01 solver= sag is : 0.48934954495811284\n",
"Test 9 : Accuracy at c = 0.01 solver= saga is : 0.4893356454870894\n",
"Test 10 : Accuracy at c = 0.001 solver= newton-cg is : 0.5177257828275373\n",
"Test 11 : Accuracy at c = 0.001 solver= lbfgs is : 0.5177257382214536\n",
"Test 12 : Accuracy at c = 0.001 solver= liblinear is : 0.6691108543335518\n",
"Test 13 : Accuracy at c = 0.001 solver= sag is : 0.5176993195513844\n",
"Test 14 : Accuracy at c = 0.001 solver= saga is : 0.517725322622093\n"
]
},
{
"data": {
"text/plain": [
"Text(0,0.5,'Testing Accuracy')"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZIAAAEKCAYAAAA4t9PUAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl83HWd+PHXO1dztE2bSQq9kvQUSikFQoHGCxQsiIDicniB68LqyqKLy0JFUcEDXXfR34K6iAiuKJcCVYsFFDxaCg1QelKapEkbWtpkkl65k3n//vh+JwzTSWaazHfO9/PxmEdnvvP9fuc9kPadz/X+iKpijDHGjFZOsgMwxhiT3iyRGGOMGRNLJMYYY8bEEokxxpgxsURijDFmTCyRGGOMGRNLJMYYY8bEEokxxpgxsURijDFmTPKSHUAilJeXa3V1dbLDMMaYtPLSSy+1qWpFtPM8TSQisgz4IZAL3KOqt0c451Lg64ACr6rqx0TkLOCOkNOOAy5X1cdF5D7gPcAB972rVHX9SHFUV1dTV1c31q9jjDFZRUSaYznPs0QiIrnAXcA5QAuwTkRWqOqWkHPmAcuBWlXtEJEpAKr6LLDYPacMqAeeCrn9Dar6qFexG2OMiZ2XYyRLgHpVbVTVPuBB4KKwc64G7lLVDgBV3RfhPh8FnlTVLg9jNcYYM0peJpLpwK6Q1y3usVDzgfkislpE1rpdYeEuB34dduxbIrJBRO4QkXGRPlxErhGROhGpa21tHe13MMYYE4WXiUQiHAuvWZ8HzAPeC1wB3CMik4ZuIDIVOBFYFXLNcpwxk9OAMuDGSB+uqnerao2q1lRURB0rMsYYM0peJpIWYGbI6xnA7gjnPKGq/aq6A9iGk1iCLgUeU9X+4AFV3aOOXuDnOF1oxhhjksTLRLIOmCcis0SkAKeLakXYOY8DZwGISDlOV1djyPtXENat5bZSEBEBLgY2eRK9McaYmHg2a0tVB0TkWpxuqVzgXlXdLCK3AnWqusJ971wR2QIM4szG8gOISDVOi+YvYbd+QEQqcLrO1gOf9eo7GGOMiU6yYavdmpoatXUkxpixeqHRz8SifI6fOjHZoSSEiLykqjXRzrMSKcYYE6MbHt3A7U++luwwUk5WlEgxxpix6hsI0NLRRU6k+ahZzlokxhgTg5aOLgIKLR3dDAwGkh1OSrFEYowxMWhud4prDASU3ft7khxNarFEYowxMWhu6xx63uTvHOHM7GOJxBhjYtDkf2t8pNkSydtYIjHGmBjsbO9i/jETKMzPodlvNWRD2awtY4yJQZO/k3ccMwFVp3Vi3mItEmOMiWIwoOxq76LKV0KVr9i6tsJYIjHGmCj2HOimf1Cp8hVT5StmZ3sXgUDmVwWJlXVtGWNMFMExkSpfMYMBpXcgwN5DPUwtLUpyZKnBEokxxkQRnO5b7Ssh4K5FbGrrskTisq4tY4yJYqe/i4K8HI6dWEiVr9g51m7jJEGWSIwxJoomfyeVZcXk5AhTSwvJzxWbuRXCEokxxkTR7O+i2m2J5OXmMGNyMTstkQyxRGKMMSNQVZr9XVSWlQwdq/IVW5mUEJ4mEhFZJiLbRKReRG4a5pxLRWSLiGwWkV+FHB8UkfXuY0XI8Vki8oKIbBeRh9xtfI0xxhOth3rp7h+kurx46Fi1r4RmfxfZsDFgLDxLJCKSC9wFnAcsAK4QkQVh58wDlgO1qnoC8MWQt7tVdbH7uDDk+HeBO1R1HtABfMar72CMMcGqv5VlbyWSyrJiDvcO0N7Zl6ywUoqXLZIlQL2qNqpqH/AgcFHYOVcDd6lqB4Cq7hvphiIiwNnAo+6h+4GL4xq1McaEaGp7a+pvULB1YgPuDi8TyXRgV8jrFvdYqPnAfBFZLSJrRWRZyHuFIlLnHg8mCx+wX1UHRrinMcbETbO/i9wcYfrkt9aMVLlJxUqlOLxckBhpQ8rwDsU8YB
7wXmAG8DcRWaiq+4FKVd0tIrOBP4vIRuBgDPd0PlzkGuAagMrKytF9A2NM1mtu72L6pCLyc9/6vXvG5CJEsCrALi9bJC3AzJDXM4DdEc55QlX7VXUHsA0nsaCqu90/G4HngJOBNmCSiOSNcE/c6+5W1RpVramoqIjPNzLGZJ1mf+fQIsSgcXm5TCstshaJy8tEsg6Y586yKgAuB1aEnfM4cBaAiJTjdHU1ishkERkXcrwW2KLOFIlngY+6118JPOHhdzDGZLmmts63jY8EVZcX2xiJy7NE4o5jXAusArYCD6vqZhG5VUSCs7BWAX4R2YKTIG5QVT9wPFAnIq+6x29X1S3uNTcC14tIPc6Yyc+8+g7GmOy2v6uPgz0DR7RIACrLStjZbokEPC7aqKorgZVhx24Jea7A9e4j9Jw1wInD3LMRZ0aYMcZ4qmmo6m+EFomvmPbOPg5091NalJ/o0FKKrWw3xphhNA9V/T2yRTJUvNG6tyyRGGPMcIKzsmaWRUok7hRgqwJsicQYY4bT5O9kamkhhfm5R7wXbJHYFGBLJMYYM6yd/q6IA+0AxQV5VEwYZ1OAsURijDHDavJ3UVV25EB7ULXPpgCDJRJjjInocO8AbYd7qSqP3CIBZ5zEWiSWSIwxJqLgbKyRWiRVZcXsPdhLd99gosJKSZZIjDEmgmBLY7gxEoCqcifJZPvCREskxhgTwVuLEYdPJMH1Jdm+W6IlEmOMiWBneye+kgImFA6/aj3Y7ZXtixItkRhjTARNbcNP/Q0qLc5nUnG+tUiSHYAxxqSiZn/kqr/hqsqKs35RoiUSY4wJ09M/yJ6DPVRGaZGAOwU4y8ukWCIxxpgwLR1dqBJTi6TaV8wbHd30DQQSEFlqskRijDFhmmOYsRVU6SshoE7yyVaWSIwxJsxI+5CEC04Bbs7itSSWSIwxJkyzv5MJhXlMLo6+YdVQOfm27B0n8TSRiMgyEdkmIvUictMw51wqIltEZLOI/Mo9tlhEnnePbRCRy0LOv09EdojIevex2MvvYIzJPs3+Lqp9JYhI1HPLxxdQXJCb1S0Sz7baFZFc4C7gHKAFWCciK0L2XkdE5gHLgVpV7RCRKe5bXcCnVHW7iEwDXhKRVaq6333/BlV91KvYjTHZrdnfyQnTS2M6V0Tc4o3Zm0i8bJEsAepVtVFV+4AHgYvCzrkauEtVOwBUdZ/75+uqut19vhvYB1R4GKsxxgAwMBigpaM74va6w3HKyVvXlhemA7tCXre4x0LNB+aLyGoRWSsiy8JvIiJLgAKgIeTwt9wurztEZFy8AzfGZK/d+3sYCOiIVX/DVfqKaWnvZjCgHkaWurxMJJE6F8P/K+cB84D3AlcA94jIpKEbiEwF/g/4tKoGJ2kvB44DTgPKgBsjfrjINSJSJyJ1ra2tY/kexpgs0hRD1d9w1b4S+gYD7DnQ7VVYKc3LRNICzAx5PQPYHeGcJ1S1X1V3ANtwEgsiMhH4A/AVVV0bvEBV96ijF/g5ThfaEVT1blWtUdWaigrrFTPGxCZYPr66PPYWSVVZdu/f7mUiWQfME5FZIlIAXA6sCDvnceAsABEpx+nqanTPfwz4hao+EnqB20pBnOkUFwObPPwOxpgs0+zvojA/hykTYu81D+5Lkq2JxLNZW6o6ICLXAquAXOBeVd0sIrcCdaq6wn3vXBHZAgzizMbyi8gngHcDPhG5yr3lVaq6HnhARCpwus7WA5/16jsYY7JPcJ/2WKb+Bk2dWEhBXk7WbrvrWSIBUNWVwMqwY7eEPFfgevcRes4vgV8Oc8+z4x+pMcY4mv2dzDqKbi2AnBxh5uSirJ25ZSvbjTHGFQgoO9uj70MSSXUWryWxRGKMMa69h3roHQjEVGMrXHBRotPRkl0skRhjjCvYooilfHy4Kl8x3f2DtB7qjXdYKc8SiTHGuJpHsYYkqCqLqwBbIjHGGFeTv4v8XGFqaeFRXxtsxTRlYRVgSyTGGOPa6e9i5uRi8n
KP/p/G6ZOLyM0RdlqL5Egi8lkRia0MpjHGpLEmf2dM+7RHkp+bw/RJRUObYmWTWNJuNfCyiPxKRN7vcTzGGJMUqjq0D8loVfmKs3JRYtREoqo34dS/egD4rIhsF5FbRaTa49iMMSZh2jv7ONw7QGXZ6FokEEwk1iKJyK282+Q+AsBU4AkR+Y5nkRljTAIFu6Sqy0efSKp9JRzo7md/V1+8wkoLsYyR/IuIvAj8EHgJWKSqVwMnA5eNeLExxqSJt6b+jr5rK9iaybZxklhqbc0ALlfVxtCDqhoQkQu9CcsYYxKr2d+FCMyYXDTqe1QPVQHuZPHMSVHOzhyxdG09hrPVLQAiMkFEagBU1Uq4G2MyQrO/k2mlRYzLyx31PSqzdF+SWBLJ3UDof5VO4H+9CccYY5Kjub1rTOMjAIX5uRw7sTDrqgDHkkhyQra5DQ6853sXkjHGJF6zv4vKo9infThVvmJ2WovkCDtE5HMikisiOSLyeZzZW8YYkxEO9vTT3tlH9SgXI4aq9pVk3WB7LInkn4H3AXvdx3uAq70MyhhjEinYghjLjK2gSl8xbYd76ewdGPO90kUsCxL3qupHVbVcVStU9VJV3RvLzUVkmYhsE5F6EblpmHMuFZEtIrJZRH4VcvxKd/HjdhG5MuT4qSKy0b3n/5Oj2Q/TGGMiaBpD1d9wwZXx2TTgHnX6r4iMA64CTgCGSmKq6jVRrssF7gLOAVqAdSKyQlW3hJwzD1gO1Kpqh4hMcY+XAV8DagAFXnKv7QB+DFwDrMXZxncZ8GSsX9gYY8I1D7VIxp5IhsrJ+ztZMG3imO+XDmLp2voFTr2tC4AXgDlATwzXLQHqVbVRVfuAB4GLws65GrjLTRCoanCa8QeAp1W13X3vaWCZiEwFJqrq8+5+778ALo4hFmOMGVazv5MpE8ZRXBDL0rqRVWbhviSxJJL5qrocOKyqP8NpASyM4brpwK6Q1y3usbfdG5gvIqtFZK2ILIty7XT3+Uj3NMaYo9LkH90+7ZFMLMzHV1KQVcUbY0kk/e6f+0XkeGACUBXDdZHGLsI3M87DKQj5XuAK4B4RmTTCtbHc0/lwkWtEpE5E6lpbW2MI1xiTrZr9nXEZaA+q9BXT1GYtklA/E5HJOGMWq4DXgf+K4boWYGbI6xnA7gjnPKGq/aq6A9iGk1iGu7bFfT7SPQFQ1btVtUZVayoqKmII1xiTjbr7Btl7sJeqMVT9DVftK8mqDa5GTCTugHmbqnao6rOqWunO3vpRDPdeB8wTkVkiUgBcDqwIO+dx4Cz3s8pxuroacRLWuSIy2U1i5wKrVHUPcEhEznBna30KeCL2r2uMMW8X/Ae/qjx+LZIqXzG7D3TT0z8Yt3umshETiaoOAl8czY1VdQC4FicpbAUeVtXN7l4mwWKPqwC/iGwBngVuUFW/qrYDt+Eko3XAre4xgM8B9wD1QAM2Y8sYMwbBqb/xWIwYVOUrRhVaOrKjVRLLFIVVIvJF4CGcOlsAqOrBaBeq6kqcKbqhx24Jea7A9e4j/Np7gXsjHK8jtsF+Y4yJamgxYhzKowRVhawlmTtlQtzum6piSST/7P75pZBjClTGPxxjjEmsJn8nk4rzKS2OXwnB4KLEbCmVEjWRqOrMaOcYY0y62tneFdcZWwCTi/OZMC4va6YAx7Ky/WORjqvqryIdN8aYdNLk7+TkmZPjek8Roao8e/Zvj6Vr610hzwuBs3G23LVEYoxJa30DAd7o6ObDi+O/rrmqrITNuw/E/b6pKJaurc+Fvnan497nVUDGGJMob+zvJqDxqfobrspXzKrNbzIwGCAvN5Yle+lrNN/uEM56D2OMSWvxrPobrtpXwkBA2b0/ltKE6S2WMZLHeKsMSQ5OFWBbBGiMSXvNbcFEEv8WSbB4Y5O/c+h5popljOTOkOcDQLOqNnkTjjHGJE5zexclBbmUjy+I+72H9iXJglIpsSSS7cA+Ve0BEJEiEZmpqruiXGeMMSmt2d
9Fpa8EL/bHmzJhHIX5OUOtnkwWyxjJb4FAyOsA8BtvwjHGmMRp8nfGtTRKqJwcobKsOCsWJcaSSPLcjakAUNVeYJx3IRljjPcGA0pLe7en4xdVvhJ2tluLBJyiiucHX4jIBUD7COcbY0zK23Ogm77BwNBYhheqfc6ixEAg4rZJGSOWMZLPAb8SkbtwZm+1AZ/wNCpjjPFYPPdpH06lr4TegQB7D/UwtbTIs89JtlgWJL4O1Lg7F6Kq+z2PyhhjPPZWIvG2RRL8rExOJFG7tkTkNhGZpKr7VXW/u9nUNxIRnDHGeKXZ30lBXg5TJxZ69hnB0vSZXrwxljGSC0JbIaraAXzIu5CMMcZ7zf4uKsuKycmJ/9TfoGmTCsnLkYyfuRVLIsl1t8oFQEQKgfiv3jHGmARq8nfGdZ/2SPJyc5hZVjy0eVamiiWRPAg8LSJXisincLbHjanyr4gsE5FtIlIvIjdFeP8qEWkVkfXu45/c42eFHFsvIj0icrH73n0isiPkvcWxf11jjAFV9WQfkkictSSZ3bUVy2D7t0VkA/B+QIDvqeofol0nIrnAXcA5QAuwTkRWqOqWsFMfUtVrwz7zWWCxe58ynP3Znwo55QZVfTRaDMYYE0nr4V66+gapLve+Bla1r5iXmztQVU9W0KeCmKr/qurvVfWLqvoFoE1EfhjDZUuAelVtdBc0PghcNIoYPwo8qaqZ3TY0xiRMcMZWpcddW+DMCjvUO0B7Z1/0k9NUTIlERBaKyLdEpAH4PrAjhsumA6H1uFrcY+EuEZENIvKoiETa1vdy4Ndhx77lXnOHiERcZS8i14hInYjUtba2xhCuMSZbNLn1r7xcjBhUNVQFOHN/Fx42kYjIbBH5sohsAu7BWYiYr6rvUtUfxHDvSG248OWdvwOqVXUR8Axwf1gMU4ETccZlgpYDxwGnAWXAjZE+XFXvVtUaVa2pqKiIIVxjTLbY2d5Fbo4wfbL3azuC4zCZXCplpBZJPfAB4COqeoaq3oFTRj5WLUBoC2MGsDv0BFX1u7W7AH4KnBp2j0uBx1S1P+SaPeroBX6O04VmjDExa/J3MX1SEfkJ2LlwZlkRItDUloUtEuAynFbIn0TkRyLyHiK3MoazDpgnIrPc6cOXAytCT3BbHEEXAlvD7nEFYd1awWvEGbW6GNh0FDEZYwzN/k5PS6OEGpeXy7TSooxelDhsIlHVR1T1EmAB8AJOl9KxIvI/InJ2tBur6gBwLU631FbgYVXdLCK3isiF7mnXichmEXkVuA64Kni9iFTjtGj+EnbrB0RkI7ARKAe+GcsXNSbVdPcNZvz6glTV7O9KWCIBZ5wkkze4imX67yGcsYv7RaQcp6XydeDPMVy7ElgZduyWkOfLcRJUpGubiDA4r6pRk5gx6eAHz7zOL9c2U/eVcygqyE12OFljf1cfB7r7EzLQHlTlK2bV5r0J+7xEO6oOQlVtU9W7VPXdXgVkTLb4y+utdPYN8lJzR7JDySqJKNYYrspXQntnHwd7+qOfnIa8H2kyxhyh7XAvr715CIDVDW1Jjia7BFeZJ7JrK1gFOFO7Mi2RGJMEaxr8AJSVFLCm3hJJIiVyMWJQpVsFOFNLpVgiMSYJ1tS3MaEwj4+fXsnGNw5woDszuzxSkbM3SCGF+Ykbl6oK2ZckE8WyH0mHiLSHPXaIyCPuzCpjzFFa3dDGGbN9vGteBQGFtY3+ZIeUNZr9nQltjQCUjMujYsK4jJ0CHEuL5H+ArwJzgLnAV4D7gMdxFgQaY47CrvYudrV3UzvHx+KZkyjKz7XurQRq8ncldMZWUFVZccaWSYklkZzrztTqUNV2Vf0RcJ6qPoBTosQYcxRWu0mjdm45BXk5LJlVxuoGa5EkQmfvAG2He6lKQNXfcFW+kuwebBeRj4Q9D65wD3gRlDGZbHWDnykTxjF3yngAauf6qN93mL0He5IcWeYbmvpblvgWSbWvmDcP9tDdN5jwz/ZaLInkE8DV7tiIH7
ga+KSIFANf9DQ6YzKMqvJ8QxtL5/iG9qZYOqccgDU2DdhzzUmY+htUGZwCnIEr3KMmElWtV9XzVLVMVX3u89dVtUtVw8uXGGNGsG3vIdoO97F0bvnQsQVTJzKpOJ/V9da95bVgmZJkJJLguEwmDrhHLZHilkX5R6A69HxVvca7sIzJTGvcZLF0jm/oWE6OcOZsH2vq2zJ6F71U0OzvxFdSwITC/IR/diZPAY6aSIAngLXA34HM69wzJoHWNLRR5StmxuS3/0a8dG45T256k2Z/F9Xlie+/zxZNbYkt1hhqUnEBpUX5GbkoMZZEUqKqX/I8EmMy3MBggBca27ngpGlHvFfrtlBWN7RZIvHQzvYulsxK3mTTal9xdo6RAE+KyLmeR2JMhtvwxgEO9Q5QO9d3xHuzykuYWlo41PVl4q93YJDdB7qT1iIBqPSVZGSLJJZE8lngjyJy2J251SEi7V4HZkymCS46PHP2kYlERFg6p5w1DW0EAuE7Upt42NXejWpi9mkfTrWvmDc6uukbyKyVE7EkknIgHygFKtzXtgm6MUdpdb2f46dOxDd+XMT3a+f66OjqZ+ubBxMcWXYIzpaqTGKLpMpXQkDhjf3dSYvBC8MmEhGZ5z49YZhHVCKyTES2iUi9iNwU4f2rRKRVRNa7j38KeW8w5PiKkOOzROQFEdkuIg+52/gak9J6+gd5aWfH0FhIJLXulGDr3vJGsDxJMlskwW61TOveGmmw/SbgM8BdEd5TYMTNrUQk1732HKAFWCciK1R1S9ipD6nqtRFu0a2qiyMc/y5wh6o+KCI/cWP88UixGJNsdU0d9A0EhpJFJMdMLGRORQmrG9q4+t2zExhddtjp72RCYR6TixM/9TdoaApwWye8I2lhxN2wiURVP+M+PVtV31bjWkRi+T+xBKhX1Ub3mgeBi4DwRBIzcSbYnw18zD10P862v5ZITEpb3dBGXo5EnTFUO7ecR19qoW8gQEGe7fIQT03uPu3JXKdTMX4cxQW5Gbd/eyw/qS/EeCzcdGBXyOsWIuzBDlwiIhtE5FERmRlyvFBE6kRkrYhc7B7zAftVdSDKPY1JKWsa/CyeOYmScSPPuF86x0dX3yCvtuxPUGTZo9nfmdDtdSMRESrLijNuUeJIYyRTROQkoEhEThSRRe7jnUAso1WR0n74dJTfAdWqugh4BqeFEVSpqjU4rY8fiMicGO8ZjP8aNxHVtba2xhCuMd440N3Pxpb9b1vNPpwzZvsQeatCsImPgcEALR3dQ1veJlO1ryTjyqSM1CL5IHAnMANnrCP4+DLO/iTRtAChLYwZwO7QE1TVr6q97sufAqeGvLfb/bMReA44GWgDJolI8Ne6I+4Zcv3dqlqjqjUVFTbJzCTPC41+Asrb6msNZ1JxAQunldqAe5zt3t/DQECTUvU3XJWvmF3t3Qxm0DTvYROJqv5cVd8FfEZV362q73If56vqIzHcex0wz51lVQBcDqwIPUFEpoa8vBDY6h6fLCLj3OflQC2wRVUVeBb4qHvNlTglXIxJWWsa/BTm53By5aSYzl8618cruzro6huIfrKJSVMSq/6Gq/KV0DcYYM+BzJkCHMsYyRQRmQggIj8RkRdF5H3RLnLHMa4FVuEkiIdVdbOI3CoiF7qnXScim0XkVeA64Cr3+PFAnXv8WeD2kNleNwLXi0g9zpjJz2L6psYkyer6Nk6rLmNcXmx7hNfOKad/UFnX1OFxZNnjraq/yW+RBLvXMmmTq1hqbV2jqne6ZVJmAJ8D7iakG2o4qroSWBl27JaQ58uB5RGuWwOcOMw9G3FmhBmT8vYd7GH7vsNccuqMmK85rbqMgtwc1tS38Z751i0bD81tnRTm5zBlQuTFoIlUObSWpIulc5McTJzE0iIJduSdB/xcVV+K8Tpjst4adwvd2jnRx0eCigpyOblyEqtto6u4afJ3UVVWQk5O8kv0Ty0toiA3J6MG3GNJCK+KyErgQzgFHMczzEwpY8
zbra5vo7QonwXTJh7VdbVzy9m8+yD7u/o8iiy77GzvTGpplFC5OcLMsqKMmgIcSyL5NM6ivyWq2gUU4qwmN8aMQFVZ0+DnzNk+co/yN+HauT5U4fkGm701VoGAOvu8pEgiAWesJpPKpMSy1e4gMBtnbASgKJbrjMl2zf4u3tjfHbFsfDSLZkyipCDXurfiYN+hXnoHAikx0B5U5e5L4kxETX9RE4KI3AmcBXzCPdQJ/MTLoIzJBMHxkVjWj4TLz81hyawyW08SB6k09Teo2ldCV98grYd7o5+cBmJpWSxV1X8GegBUtR2wirvGRLG6oY1jJo5j9ih3PKydW05jW2dGrTdIhuCgdjKr/oarzLD922NJJP0ikoM7wC4iPiCzdmUxJs4CAeX5Bj+1c8pHXSRwqTvTa7W1Ssak2d9Ffq4wtbQw2aEMCSa1jE8kIWVI7gJ+A1SIyDeAv+OUcjfGDOO1Nw/R3tk3qm6toOOOnUBZScHQzopmdJr9XcyYXExebuoM7U6fVESOkDFTgEdakPgicIqq/kJEXgLej1M08R9UdVNCojMmTa1xB8lHM9AelJMjnDnHx5oGP6qa1PLn6azJ35lS4yMABXk5TJ9cNLTZVrobKZEM/dSq6mZgs/fhGJMZVte3Mbu8hKmlRWO6T+2ccv6wYQ+NbZ3MqRgfp+iyh6qy09/FadUj7wOTDNW+EnZmQYukQkSuH+5NVf1vD+IxJu31DwZ4cUc7Hz5l7FvlBFs0a+rbLJGMQntnH4d6B6gsS60WCUBlWTG/37An2WHExUidhrnAeGDCMA9jTASv7tpPZ9/gUZVFGU5lWTHTJxXZgPsoDe3TXp56iaTaV8KB7v6MqF4wUotkj6remrBIjMkQq+v9iMCZMWxkFY2IUDvXx6rNexkM6FGvkM92O9udrqPKFNiHJFxVyBTgScXpvaJipBaJ/cQaMwprGto4YdrEuP3jUDu3nAPd/WzZfTAu98smTW1diMDMsrGNVXkhuNI+E0qljJRIou45Yox5u+6+QV7ZuT8u3VpBZ852WjZWLuXo7WzvYlppUcx7wSRScNwmE9aSjLRDYnsiAzEmE6xraqdvMBCXbq2gKRMLmTdlvO3jPgqpOPU3qKggl2MmjsvyPNd9AAAYyElEQVTsRGKMOXqrG9rIzxWWzIrvdNPaueWsa2qnd2AwrvfNdM3+rpQq1hiuyleSEYsSPU0kIrJMRLaJSL2I3BTh/atEpFVE1ruPf3KPLxaR591teDeIyGUh19wnIjtCrlns5Xcw5misqfdz8szJFBfEsvlo7JbO8dHTH+CVnfvjet9MdrCnn/bOvpQqHx+u2lecEYsSPUskIpKLU17lPGABcIWILIhw6kOquth93OMe6wI+paonAMuAH4jIpJBrbgi5Zr1X38GYo7G/q49Nuw+wdAyr2Ydz+mwfOYKVSzkKwT3RU7VrC5wWSdvhXjp7B5Idyph42SJZAtSraqOq9gEPAhfFcqGqvq6q293nu4F9gG1ebVLa2kY/qk43VLyVFuVz4oxJQ6XpTXRvlY9P5a6tzBhw9zKRTAd2hbxucY+Fu8TtvnpURGaGvykiS3DK1jeEHP6We80dIjIurlEbM0qr6/0UF+Ry0oxJ0U8ehdo5Ptbv2p/2v70mSnMatEiCVYCD613SlZeJJNI6lPDtwH4HVKvqIuAZ4P633UBkKvB/wKdVNVi6fjlwHHAaUAbcGPHDRa4RkToRqWttbR39tzAmRqsb2lgyq4yCPG/+WtXOLWcgoLy4wyZUxqLZ30nFhHFxH6+Kp+C+JOk+TuJlImkBQlsYM4DdoSeoql9Vg1uE/RQ4NfieiEwE/gB8RVXXhlyzRx29wM9xutCOoKp3q2qNqtZUVFivmPHWmwd6aGztjOv6kXCnVk2mIC/HpgHHqCnF9mmPZGJhPmUlBWk/c8vLRLIOmCcis0SkALgcWBF6gtviCLoQ2OoeLwAeA36hqo9EukacmtoXA1bS3iRdsGy8Fw
PtQYX5udRUTWa1jZPEZGeKT/0NqvIV2xjJcFR1ALgWWIWTIB5W1c0icquIXOiedp07xfdV4DrgKvf4pcC7gasiTPN9QEQ2AhuBcuCbXn0HY2K1ut7P5OJ8jj92oqefs3SOj617DuLPkL2+vdLdN8ibB3uoSsGqv+GqytI/kXjaeaiqK4GVYcduCXm+HGfMI/y6XwK/HOaeZ8c5TGPGRFVZ09DGmXN85HhcVHHp3HJ46nWeb/RzwaJpnn5WOtvZ7g60l6dDi6SEJ17dTe/AYEqWcomFrWw3Zox2tHWy50DP0B7rXlo0vZQJ4/KsrHwUwTGHtGiR+IpRhV3t3ckOZdRSdzqDMWkiOGbhxfqRcHm5OZw+u2xoTCbd9Q8G+NPWfXT1xXdK89+3O/99qtNijMSJ8bcvt3DcVKdrVDV8guvbhb6tYZNhwy99/4JjmFiYP/ZAR2CJxJgxWlPfxrTSwoTNEFo6p5xntu6jpaOLGZNT/zfukXzz91u4//lmT+49fVIRpcXe/gMaD3MrxlOQl8OPnmuIfvIoPHP9eyyRGJPKAgHl+UY/7z/+GJyJhN4LtnzW1Pu59LT0TSR/3LSH+59v5qql1Xy6tjru9/eNT4+1yqXF+fz9xrM42N0fcvTtP0uhP1rytuMyzPG3nk8t9X4vFkskxozBlj0H2d/VP7S3eiLMP2Y85ePHsaahjUtPO6IYRFrY1d7FDY9u4KQZpXz5/OM9W8SZLqZMKGTKhMJkhzFq2f1/z5gxCi4OTMRAe5CIsHSOj9UN/qh96amobyDAtb9+BYA7P3ZK1ieRTGD/B40Zg9UNfuZOGc8xExP722TtXB+th3qp33c4oZ8bD/+56jVe3bWf716yiJlpMKvKRGeJxJhR6hsIsG5HO7Vx3A0xVsEWULqVS/nT1r389G87+OQZVZx/4tToF5i0YInEmFFav2s/3f2DziLBBJtZVszMsqK0Kpey50A3X3rkVY6fOpGbP3h8ssMxcWSJxJhRWl3fRo7AGbMS3yIBqJ1TztpGPwODgegnJ9nAYIDrfv0KfQMB7vrYyRTmp+cKbhOZJRJjRmlNQxsLp5cmba3C0rnlHOoZYNPug0n5/KPxg2e2s66pg29/+ERmV4xPdjgmziyRGDMKnb0DvLJzf0Jna4Vb6o7NpPo4yd+2t3LXc/VcWjODi0+OtLedSXeWSIwZhReb2hkIaELXj4QrHz+O446dkNLlUvYd6uHfHlrP3IrxfP3CE5IdjvGIJRJjRmFNfRsFuTnUVJUlNY6lc8qpa+qgp38wqXFEMhhQvvjgeg73DnDXx09J6Z0KzdhYIjFmFFbX+zmlahJFBckdNK6d66N3IMDLzR1JjSOSHz1bz5oGP9+48ATmHzMh2eEYD1kiMeYotXf2sWXPQU+31Y3Vklll5OYIa1JsGvALjX7ueOZ1Ll48jUtr0rOMi4mdJRJjjtLaRucf7WSsHwk3oTCfk2aUsjqFxkn8h3u57sFXqPKV8M0Pn5iwYpYmeTxNJCKyTES2iUi9iNwU4f2rRKQ1ZDvdfwp570oR2e4+rgw5fqqIbHTv+f/EfkpNgq2ub2P8uDxOmlGa7FAAZ5xkQ8sBDvX0Rz/ZY4GA8qVHXqWjq587P3Yy48fZuEg28CyRiEgucBdwHrAAuEJEFkQ49SFVXew+7nGvLQO+BpwOLAG+JiKT3fN/DFwDzHMfy7z6DsZEsqbBz+mzysjLTY0G/dK5PgYDyguN7ckOhXv+3shz21r56geP54RpqZFojfe8/JuwBKhX1UZV7QMeBC6K8doPAE+raruqdgBPA8tEZCowUVWfV6fs6S+Ai70I3phIdu/vZkdbJ2cmob7WcE6pnMy4vJykd2+9vLOD7/1xG+ctPJZPnFGV1FhMYnmZSKYDu0Jet7jHwl0iIhtE5FERCY7KDXftdPd5tHsa44ng4r9EbKsbq8L8XE6rLmNNEvdxP9DVz7/+6hWOLS3k9ksW2bhIlvEykUT6SQrfPOF3QL
WqLgKeAe6Pcm0s93RuIHKNiNSJSF1ra2uMIRszsjUNfnwlBbwjxaazLp3rY9veQ7Qe6k34Z6sqNzz6KnsP9nDnx06htCj1t7c18eVlImkBQuf9zQB2h56gqn5VDf7k/xQ4Ncq1Le7zYe8Zcu+7VbVGVWsqKipG/SWMCVJVVte3ceYcHzk5qfUbd3AqcjJWud+/pomntuzlpvOOY/HMSQn/fJN8XiaSdcA8EZklIgXA5cCK0BPcMY+gC4Gt7vNVwLkiMtkdZD8XWKWqe4BDInKGO1vrU8ATHn4HY4Y0tB5m36HelOrWClo4vZSJhXkJ797a2HKAb698jfcdN4XPvHNWQj/bpA7P5uap6oCIXIuTFHKBe1V1s4jcCtSp6grgOhG5EBgA2oGr3GvbReQ2nGQEcKuqBqekfA64DygCnnQfxnhutfuPdCosRAyXmyOcMduX0AH3Qz39XPvrl/GNL+D7/3CSjYtkMU8neavqSmBl2LFbQp4vB5YPc+29wL0RjtcBC+MbqTHRra5vY8bkIip9qbk9bO3ccp7aspdd7V2eb2Grqiz/7UZaOrp58JozmFxS4OnnmdSWGhPhjUlxgwFlbaM/JVsjQcFKxIkoK//gul38fsMerj9nPqdVJ7dwpUk+SyTGxGDz7gMc7BlgaRLLxkczp2I8UyaM83z73dfePMjXV2zmXfPK+dx75nj6WSY9WCIxJgbB8ZFkbmQVjYiwdI6P5xvacNbrxl9X3wCff+BlJhbl89+XLk652WsmOSyRZIg19W1c/Ys6ntu2L9mhZKQ1DW3MP2Y8FRPGJTuUES2dW07b4T627T3kyf1veWIzjW2d/PCyxSn/38IkjlVUS3MHuvr51sotPFzXQl6O8PSWvVy8eBpfvWABvvHZ9xe9p3+Qzt6BuN5zIKCsa2rn8tMq43pfLwSnJj+zZS9lxREGwCM0ICTiOl8In4T1zJa9PPpSC9e9b15KVD42qcMSSZpSVZ7c9Ca3PLGZjq4+PvueOXzuvXP42d938OPn6vnL663c8qEFXLx4elZMy+zqG+Cnf93BT/7SQLdHuwW+Mw3+8Zw+qYhZ5SV8/6nX+f5Tr8f9/qfPKuML75sX9/ua9CZe9aWmkpqaGq2rq0t2GHHz5oEevvrEJp7espeF0ydy+0cWsXD6W5VWt715iBt/s4H1u/bz7vkVfOvihZ5PB02WQEB57JU3+M9V23jzYA/nn3gsZ8yO/4B4cUEeHz55OrlpMCawZfdBXtl15I6Jkf6qD/u3P8LJebk5nL9wKqXFVgIlW4jIS6paE/U8SyTpIxBQfvXiTr775Gv0BwJcf858/rF2VsRy5oMB5f+eb+J7q7ahCv/+gXdw1dLqtPiHMFZrG/188w9b2PTGQU6aUcpXL1hAjU1FNSZuLJGEyIREUr/vMMt/u4F1TR3UzvXx7Q+fSJWvJOp1b+zv5ubHNvLctlZOmjmJ715yIscdOzEBEXunqa2T7zy5lVWb9zKttJAbzzuODy2aZjOIjIkzSyQh0jmR9A0E+N+/NPA/f66nqCCXr3zweD566oyjGvdQVVa8uptv/G4LB7v7+ex75nDt2XMpzM/1MPL4O9DVz//8eTv3P99Efm4O//LeOXzmnbMpKkiv72FMuog1kdhgewp7ZWcHN/1mI9v2HuKCRVP52odOGNWUSxHhosXTede8Cr75hy3c+Ww9Kzfu4TsfOZHTPRhPiLf+wQAPrG3mB3/azoHufi6rmcn1585nyoTCZIdmjMFaJCmps3eA7z+1jfvWNHHsxEJuu2gh719wTNzu/9fXW/nyY06dpI+dXslN5x3HxMLUG0BVVf782j6+tXIrja2d1M71cfP5C1gwLb275oxJF9a1FSKdEslz2/Zx82Ob2H2gm0+eUcUNH3gHEzz4R76rb4D/fup17l29g4oJ47j1ooV84IRj4/45o7Vl90G+tXILq+v9zK4o4ebzj+fs46ZkxVRmY1KFJZIQ6ZBI/Id7ue33W3h8/W
7mThnP7R85MSEzkF7dtZ8bf7OB1948xHkLj+UbF57AlInJ6zLad7CH/3rqdR5+aRelRfn82/vn87HTK8mPMDPNGOMtSyQhUjmRqCqPr3+DW3+3hcO9A/zLe+fyL2fNYVxe4gaQ+wcD3P3XRn74p+2My8vh5vOP57LTZib0t/+e/kF++tdGfvyXBvoHA1x5ZjX/evY8W7NgTBJZIgmRqolkV3sXNz++ib++3srJlZP47iWLmJ/EvcAbWw+z/LcbeWFHO2fMLuM7H1nErPLoU4zHIhBwZpR974+vsftAD8tOOJabzjuOao8/1xgTnSWSEKmWSAYDyn1rmvj+qm3kCPzHsuP4xBlVKbFYMBBQHqrbxbdXbqVvIMAX3j+Pq98125Oupbqmdm77w1Ze3bWfE6eX8pUPHp8Ws8iMyRYpkUhEZBnwQ5ytdu9R1duHOe+jwCPAaapaJyIfB24IOWURcIqqrheR54CpQLf73rmqOmLJ29Emkpsf28iLO9qjn3iUDvcOsOdAD2e9o4JvfvhEpk8qivtnjNXegz187YnN/HHzmxwzcVzcB/wDAaWxrZNjJxbyH8vewcWLp9uCQmNSTNLXkYhILnAXcA7QAqwTkRWquiXsvAnAdcALwWOq+gDwgPv+icATqro+5LKPu1vuemrapCLmHTM+7vcVhGULj+WCRVNTdhbSMRML+cknT+WPm97k9xt2R6zTNFaXnDqDT9dWU1xgy5mMSWde/g1eAtSraiOAiDwIXARsCTvvNuB7wL8Pc58rgF97FeRIPn/W3GR8bEpZtvBYli1MnWnBxpjU4+WcyunArpDXLe6xISJyMjBTVX8/wn0u48hE8nMRWS8iX5VhfqUXkWtEpE5E6lpbW0cRvjHGmFh4mUgi/QM/1EEiIjnAHcCXhr2ByOlAl6puCjn8cVU9EXiX+/hkpGtV9W5VrVHVmoqKitHEb4wxJgZeJpIWYGbI6xnA7pDXE4CFwHMi0gScAawQkdCBncsJa42o6hvun4eAX+F0oRljjEkSLxPJOmCeiMwSkQKcpLAi+KaqHlDVclWtVtVqYC1wYXAQ3W2x/APwYPAaEckTkXL3eT5wARDaWjHGGJNgng22q+qAiFwLrMKZ/nuvqm4WkVuBOlVdMfIdeDfQEhysd40DVrlJJBd4BvipB+EbY4yJkS1INMYYE1Gs60isEp4xxpgxsURijDFmTLKia0tEWoHmUV5eDrTFMRyvpVO8Fqt30inedIoV0ivescZapapR109kRSIZCxGpi6WPMFWkU7wWq3fSKd50ihXSK95ExWpdW8YYY8bEEokxxpgxsUQS3d3JDuAopVO8Fqt30inedIoV0ivehMRqYyTGGGPGxFokxhhjxsQSyQhEZJmIbBORehG5KdnxDEdEZorIsyKyVUQ2i8gXkh1TNCKSKyKviMhIWwikBBGZJCKPishr7n/jM5Md03BE5N/cn4FNIvJrESlMdkyhROReEdknIptCjpWJyNMist39c3IyYww1TLz/6f4sbBCRx0RkUjJjDIoUa8h7/y4iGqxVGG+WSIYRssPjecAC4AoRWZDcqIY1AHxJVY/HqaL8+RSONegLwNZkBxGjHwJ/VNXjgJNI0bhFZDrObqM1qroQpx7d5cmN6gj3AcvCjt0E/ElV5wF/cl+nivs4Mt6ngYWqugh4HVie6KCGcR9HxoqIzMTZqXanVx9siWR4Qzs8qmofThXii5IcU0SqukdVX3afH8L5h276yFclj4jMAD4I3JPsWKIRkYk4BUR/BqCqfaq6P7lRjSgPKBKRPKCYt2/dkHSq+legPezwRcD97vP7gYsTGtQIIsWrqk+p6oD7ci3OFhlJN8x/W3D2ffoPQvaDijdLJMOLusNjKhKRauBk4IXkRjKiH+D8YAeSHUgMZgOtOLtyviIi94hISbKDisTdq+f7OL957gEOqOpTyY0qJseo6h5wfikCpiQ5nqPxj8CTyQ5iOCJyIfCGqr7q5edYIhneiD
s8piIRGQ/8Bviiqh5MdjyRiMgFwD5VfSnZscQoDzgF+LGqngx0klpdL0PcsYWLgFnANKBERD6R3Kgyl4jcjNOt/ECyY4lERIqBm4FbvP4sSyTDi7bDY0px92j5DfCAqv422fGMoBa40N0V80HgbBH5ZXJDGlELzr44wRbeoziJJRW9H9ihqq2q2g/8Flia5JhisVdEpgK4f+5LcjxRiciVOBvrfVxTdw3FHJxfKl51/77NAF4WkWPj/UGWSIY34g6PqUREBKcPf6uq/ney4xmJqi5X1RnurpiXA39W1ZT9rVlV3wR2icg73EPvA7YkMaSR7ATOEJFi92fifaToxIAwK4Ar3edXAk8kMZaoRGQZcCPOjq5dyY5nOKq6UVWnhOxC2wKc4v5Mx5UlkmG4g2nBHR63Ag+r6ubkRjWsWuCTOL/dr3cf5yc7qAzyr8ADIrIBWAx8O8nxROS2mh4FXgY24vz9TqlV2CLya+B54B0i0iIinwFuB84Rke04s4tuT2aMoYaJ905gAvC0+3ftJ0kN0jVMrIn57NRtlRljjEkH1iIxxhgzJpZIjDHGjIklEmOMMWNiicQYY8yYWCIxxhgzJpZITEYTkUF3iuYmEXnEXe2bdCLy5SR85n0i8tFEf67JfJZITKbrVtXFbjXcPuCzsV7oVoD2ylEnEo/jMWbULJGYbPI3YC6AiDwuIi+5e3dcEzxBRA6LyK0i8gJwpojcIiLr3BbN3e6KcUTkORG5Q0T+6u5RcpqI/NbdU+ObIff7hIi86LaK/tfdh+V2nAq960XkgeHOixRPyH2PF5EXQ15XuwsmGS7mUCLSFNybQkRqROQ593mJu6/FOrdIZUpWvDapxRKJyQpuWfXzcFZ8A/yjqp4K1ADXiYjPPV4CbFLV01X178Cdqnqa26IpwqmvFNSnqu8GfoJT1uPzwELgKhHxicjxwGVAraouBgZxajPdxFstpY8Pd94w8QCgqluBAhGZ7R66DHjYfT5SzNHcjFO25jTgLOA/U7XasUkdeckOwBiPFYnIevf533D3FcFJHh92n88E5gF+nH/EfxNy/Vki8h84e3uUAZuB37nvBWuvbQQ2B0uhi0ije893AqcC69xGQRGRCxK+b4TzwuMJ9TBwKU5JkcvcR7SYozkXp6jmv7uvC4FK0qNml0kSSyQm03W7v+UPEZH34lTKPVNVu9xuneCWtD2qOuieVwj8CGfHwV0i8vWQ8wB63T8DIc+Dr/NwtiK4X1Wj7aA30nlD8UTwEPCIiPwWUFXdHkPMQQO81SMR+r4Al6jqtigxGzPEurZMNioFOtwkchzO9sSRBP+BbRNnr5ejnfH0J+CjIjIFhvYmr3Lf6xen9H+084alqg04LZav4iSVo4m5CacVBHBJyPFVwL+GjAWdHC0OYyyRmGz0RyDPHZy+DWe71CO4W+r+FKfr6nGcrQVipqpbgK8AT7mf9TQw1X37bmCDiDwQ5bxoHgI+gTs+chQxfwP4oYj8DScZBd0G5LuxbXJfGzMiq/5rjDFmTKxFYowxZkwskRhjjBkTSyTGGGPGxBKJMcaYMbFEYowxZkwskRhjjBkTSyTGGGPGxBKJMcaYMfn/+QOjBjzdb3YAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# for Logistic Regression\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.metrics import log_loss\n",
"\n",
"# import Matplotlib (scientific plotting library)\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"c_list = [0.1, 0.01, 0.001]\n",
"solver_list = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']\n",
"idx = []\n",
"\n",
"# log loss is an error measure: lower values mean better probability estimates\n",
"loss_list = []\n",
"for idx1, c in enumerate(c_list):\n",
"    for idx2, sol in enumerate(solver_list):\n",
"        test_no = idx2 + idx1 * len(solver_list)\n",
"        idx.append(test_no)\n",
"        # fit the model with this (C, solver) pair\n",
"        LR = LogisticRegression(C=c, solver=sol).fit(X_train, y_train)\n",
"        # predicted class probabilities on the held-out split\n",
"        lr_prob = LR.predict_proba(X_test)\n",
"        loss = log_loss(y_test, lr_prob)\n",
"        print(\"Test \", test_no, \": Log loss at c =\", c, \"solver =\", sol, \"is :\", loss)\n",
"        loss_list.append(loss)\n",
"\n",
"# plot log loss against the test number (one point per (C, solver) pair)\n",
"plt.plot(idx, loss_list)\n",
"plt.xlabel('Test number (C, solver combination)')\n",
"plt.ylabel('Testing log loss (lower is better)')\n"
]
},
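{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sweep above records log loss, which is an error measure, so the best (C, solver) pair is the one with the *smallest* value. A minimal sketch of that selection step, assuming synthetic data from `make_classification` in place of the notebook's `X_train`/`X_test` split (the names `results`, `best_c` and `best_solver` are illustrative only):\n",
"\n",
"```python\n",
"# Sketch: choose the (C, solver) pair with the LOWEST log loss.\n",
"# Assumption: synthetic data stands in for the notebook's loan features.\n",
"import itertools\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.metrics import log_loss\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"X, y = make_classification(n_samples=300, n_features=8, random_state=4)\n",
"X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=4)\n",
"\n",
"results = {}\n",
"for c, sol in itertools.product([0.1, 0.01, 0.001], ['newton-cg', 'lbfgs', 'liblinear']):\n",
"    model = LogisticRegression(C=c, solver=sol).fit(X_tr, y_tr)\n",
"    results[(c, sol)] = log_loss(y_te, model.predict_proba(X_te))\n",
"\n",
"# log loss is an error measure: take the minimum, not the maximum\n",
"best_c, best_solver = min(results, key=results.get)\n",
"print('best:', best_c, best_solver, round(results[(best_c, best_solver)], 3))\n",
"```\n",
"\n",
"`GridSearchCV` with `scoring='neg_log_loss'` performs the same kind of search with cross-validation instead of a single split; scikit-learn negates the loss there so that a higher score is still better."
]
},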
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Based on the sweep above, c=0.001 and solver=liblinear were selected. Note that the plotted metric is log loss, where lower values are better, not accuracy."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train the final Logistic Regression model using c=0.001 and solver=liblinear"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LogisticRegression(C=0.001, class_weight=None, dual=False, fit_intercept=True,\n",
" intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n",
" penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n",
" verbose=0, warm_start=False)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# for Logistic Regression\n",
"from sklearn.linear_model import LogisticRegression\n",
"# prepare LR setting\n",
"LR = LogisticRegression(C=0.001, solver='liblinear').fit(X_train, y_train)\n",
"LR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model Evaluation using Test set"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### First, download and load the test set:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2018-11-21 02:28:14-- https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_test.csv\n",
"Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.193\n",
"Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.193|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 3642 (3.6K) [text/csv]\n",
"Saving to: ‘loan_test.csv’\n",
"\n",
"loan_test.csv 100%[=====================>] 3.56K --.-KB/s in 0s \n",
"\n",
"2018-11-21 02:28:14 (88.2 MB/s) - ‘loan_test.csv’ saved [3642/3642]\n",
"\n"
]
}
],
"source": [
"!wget -O loan_test.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_test.csv"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Load the test set for evaluation"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 0.49362588 0.92844966 3.05981865 1.97714211 -1.30384048 2.39791576\n",
" -0.79772404 -0.86135677]\n",
" [-3.56269116 -1.70427745 0.53336288 -0.50578054 0.76696499 -0.41702883\n",
" -0.79772404 -0.86135677]\n",
" [ 0.49362588 0.92844966 1.88080596 1.97714211 0.76696499 -0.41702883\n",
" 1.25356634 -0.86135677]\n",
" [ 0.49362588 0.92844966 -0.98251057 -0.50578054 0.76696499 -0.41702883\n",
" -0.79772404 1.16095912]\n",
" [-0.66532184 -0.78854628 -0.47721942 -0.50578054 0.76696499 2.39791576\n",
" -0.79772404 -0.86135677]]\n",
"(54, 8)\n",
"['PAIDOFF' 'PAIDOFF' 'PAIDOFF' 'PAIDOFF' 'PAIDOFF']\n",
"(54,)\n"
]
}
],
"source": [
"test_df = pd.read_csv('loan_test.csv')\n",
"# convert the date columns to datetime\n",
"test_df['due_date'] = pd.to_datetime(test_df['due_date'])\n",
"test_df['effective_date'] = pd.to_datetime(test_df['effective_date'])\n",
"test_df['dayofweek'] = test_df['effective_date'].dt.dayofweek\n",
"# evaluate the weekend field (dayofweek 4-6, i.e. Friday to Sunday, counts as weekend)\n",
"test_df['weekend'] = test_df['dayofweek'].apply(lambda x: 1 if (x>3) else 0)\n",
"# convert male to 0 and female to 1\n",
"test_df['Gender'] = test_df['Gender'].replace(to_replace=['male','female'], value=[0,1])\n",
"# select the features and one-hot encode the education level\n",
"test_feature = test_df[['Principal','terms','age','Gender','weekend']]\n",
"test_feature = pd.concat([test_feature,pd.get_dummies(test_df['education'])], axis=1)\n",
"test_feature.drop(['Master or Above'], axis = 1,inplace=True)\n",
"# Testing feature\n",
"X_loan_test = test_feature\n",
"# normalize the test data\n",
"# note: this fits a new scaler on the test set itself; reusing the scaler fitted on the\n",
"# training features would put both sets on the same scale\n",
"X_loan_test = preprocessing.StandardScaler().fit(X_loan_test).transform(X_loan_test)\n",
"# and target result\n",
"y_loan_test = test_df['loan_status'].values\n",
"print(X_loan_test[0:5])\n",
"print(X_loan_test.shape)\n",
"print(y_loan_test[0:5])\n",
"print(y_loan_test.shape)"
]
},
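{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above fits a new `StandardScaler` on the test features, so test rows are scaled with test-set statistics rather than the training ones, and `pd.get_dummies` could yield different columns if an education level were missing from the test file. A minimal sketch of the usual alternative, assuming two tiny made-up frames (`train`, `test`) and a hypothetical helper `to_features` in place of the notebook's real feature tables:\n",
"\n",
"```python\n",
"# Sketch: fit preprocessing on training data only, then apply it to the test data.\n",
"# Assumption: the frames and the to_features helper are made up for illustration.\n",
"import pandas as pd\n",
"from sklearn import preprocessing\n",
"\n",
"train = pd.DataFrame({'Principal': [1000, 800, 300, 1000],\n",
"                      'age': [45, 33, 27, 28],\n",
"                      'education': ['High School or Below', 'college', 'college', 'High School or Below']})\n",
"test = pd.DataFrame({'Principal': [1000, 300],\n",
"                     'age': [50, 35],\n",
"                     'education': ['college', 'college']})\n",
"\n",
"def to_features(df):\n",
"    # numeric columns plus one 0/1 column per education level\n",
"    return pd.concat([df[['Principal', 'age']], pd.get_dummies(df['education'], dtype=float)], axis=1)\n",
"\n",
"X_train_df = to_features(train)\n",
"# give the test matrix exactly the training columns; absent dummies become 0\n",
"X_test_df = to_features(test).reindex(columns=X_train_df.columns, fill_value=0)\n",
"\n",
"scaler = preprocessing.StandardScaler().fit(X_train_df)  # training statistics only\n",
"X_train_s = scaler.transform(X_train_df)\n",
"X_test_s = scaler.transform(X_test_df)  # reuse the training mean and std\n",
"print(X_test_s.shape)\n",
"```"
]
},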
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evaluate Result\n",
"\n",
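"Evaluate the four trained models (KNN, Decision Tree, SVM and Logistic Regression) on the test set using three metrics: Jaccard, F1-score and LogLoss."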
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Jaccard"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.67, 0.74, 0.8, 0.78]"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
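"# Jaccard setup\n",
"try:\n",
"    from sklearn.metrics import jaccard_similarity_score\n",
"except ImportError:\n",
"    # removed in scikit-learn 0.23; for 1-D labels it returned the same value as accuracy_score\n",
"    from sklearn.metrics import accuracy_score as jaccard_similarity_score\n",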
"\n",
"# evaluate KNN\n",
"knn_yhat = KNN.predict(X_loan_test)\n",
"jc1 = round(jaccard_similarity_score(y_loan_test, knn_yhat), 2)\n",
"# evaluate Decision Trees\n",
"dt_yhat = DT.predict(X_loan_test)\n",
"jc2 = round(jaccard_similarity_score(y_loan_test, dt_yhat), 2)\n",
"# evaluate SVM\n",
"svm_yhat = SVM.predict(X_loan_test)\n",
"jc3 = round(jaccard_similarity_score(y_loan_test, svm_yhat), 2)\n",
"# evaluate Logistic Regression\n",
"lr_yhat = LR.predict(X_loan_test)\n",
"jc4 = round(jaccard_similarity_score(y_loan_test, lr_yhat), 2)\n",
"\n",
"list_jc = [jc1, jc2, jc3, jc4]\n",
"list_jc"
]
},
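{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the scikit-learn versions this notebook targets, `jaccard_similarity_score` applied to 1-D label arrays returns the fraction of matching labels, the same value as `accuracy_score`. Its replacement `jaccard_score` (added in scikit-learn 0.21) instead returns the Jaccard index of a single class: the number of samples that are positive in both `y_true` and `y_pred`, divided by the number that are positive in either. A small sketch with made-up labels, treating `PAIDOFF` as the positive class:\n",
"\n",
"```python\n",
"# Sketch: Jaccard index of one class vs. plain accuracy (made-up labels).\n",
"from sklearn.metrics import accuracy_score, jaccard_score\n",
"\n",
"y_true = ['PAIDOFF', 'PAIDOFF', 'COLLECTION', 'PAIDOFF', 'COLLECTION']\n",
"y_pred = ['PAIDOFF', 'COLLECTION', 'COLLECTION', 'PAIDOFF', 'PAIDOFF']\n",
"\n",
"# |true PAIDOFF AND predicted PAIDOFF| / |true PAIDOFF OR predicted PAIDOFF|\n",
"jac = jaccard_score(y_true, y_pred, pos_label='PAIDOFF')\n",
"acc = accuracy_score(y_true, y_pred)  # fraction of matching labels\n",
"print(jac, acc)\n",
"```\n",
"\n",
"Here two samples are PAIDOFF in both lists and four are PAIDOFF in at least one, so the Jaccard index is 2/4 = 0.5, while three of the five labels match, so the accuracy is 0.6."
]
},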
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### F1-score"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.63, 0.76, 0.76, 0.73]"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# F1-score setup\n",
"from sklearn.metrics import f1_score\n",
"\n",
"# evaluate KNN\n",
"fs1 = round(f1_score(y_loan_test, knn_yhat, average='weighted'), 2)\n",
"# evaluate Decision Trees\n",
"fs2 = round(f1_score(y_loan_test, dt_yhat, average='weighted'), 2)\n",
"# evaluate SVM\n",
"fs3 = round(f1_score(y_loan_test, svm_yhat, average='weighted'), 2)\n",
"# evaluate Logistic Regression\n",
"fs4 = round(f1_score(y_loan_test, lr_yhat, average='weighted'), 2)\n",
"\n",
"list_fs = [fs1, fs2, fs3, fs4]\n",
"list_fs"
]
},
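{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `average='weighted'`, `f1_score` averages the per-class F1-scores, weighting each class by its support (its number of true samples). A small sketch with made-up labels that recomputes the weighted score by hand:\n",
"\n",
"```python\n",
"# Sketch: weighted F1 recomputed from per-class F1 and class support (made-up labels).\n",
"from collections import Counter\n",
"from sklearn.metrics import f1_score\n",
"\n",
"y_true = ['PAIDOFF', 'PAIDOFF', 'COLLECTION', 'PAIDOFF', 'COLLECTION']\n",
"y_pred = ['PAIDOFF', 'COLLECTION', 'COLLECTION', 'PAIDOFF', 'PAIDOFF']\n",
"\n",
"labels = ['COLLECTION', 'PAIDOFF']\n",
"per_class = f1_score(y_true, y_pred, labels=labels, average=None)\n",
"support = Counter(y_true)  # number of true samples per class\n",
"manual = sum(f * support[lab] for f, lab in zip(per_class, labels)) / len(y_true)\n",
"weighted = f1_score(y_true, y_pred, average='weighted')\n",
"print(round(manual, 4), round(weighted, 4))\n",
"```\n",
"\n",
"Here COLLECTION has F1 = 0.5 with support 2 and PAIDOFF has F1 = 2/3 with support 3, so the weighted F1 is `(0.5 * 2 + 2/3 * 3) / 5 = 0.6`."
]
},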
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### LogLoss"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['NA', 'NA', 'NA', 0.67]"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# LogLoss\n",
"from sklearn.metrics import log_loss\n",
"lr_prob = LR.predict_proba(X_loan_test)\n",
"list_ll = ['NA', 'NA', 'NA', round(log_loss(y_loan_test, lr_prob), 2)]\n",
"list_ll"
]
},
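{
"cell_type": "markdown",
"metadata": {},
"source": [
"Log loss scores the predicted probabilities rather than the hard labels, and lower is better; a binary classifier that always predicts 0.5 scores ln 2, about 0.693. For binary labels it is `-mean(y*log(p) + (1-y)*log(1-p))`, where `p` is the predicted probability of the positive class, and `predict_proba` returns its columns in the order of the model's `classes_` attribute. A small sketch with made-up values that checks the formula against `sklearn.metrics.log_loss`:\n",
"\n",
"```python\n",
"# Sketch: binary log loss by hand vs. sklearn.metrics.log_loss (made-up values).\n",
"import math\n",
"from sklearn.metrics import log_loss\n",
"\n",
"y_true = [1, 0, 1, 1]\n",
"p_pos = [0.9, 0.2, 0.6, 0.75]  # predicted probability of class 1\n",
"\n",
"manual = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)\n",
"              for y, p in zip(y_true, p_pos)) / len(y_true)\n",
"# log_loss expects one probability column per class, in sorted label order\n",
"sk = log_loss(y_true, [[1 - p, p] for p in p_pos])\n",
"print(round(manual, 4), round(sk, 4))\n",
"```"
]
},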
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Report\n",
"You should be able to report the accuracy of the built model using different evaluation metrics:"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Algorithm</th>\n",
" <th>Jaccard</th>\n",
" <th>F1-score</th>\n",
" <th>LogLoss</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>KNN</th>\n",
" <td>0.67</td>\n",
" <td>0.63</td>\n",
" <td>NA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Decision Tree</th>\n",
" <td>0.74</td>\n",
" <td>0.76</td>\n",
" <td>NA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SVM</th>\n",
" <td>0.80</td>\n",
" <td>0.76</td>\n",
" <td>NA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Logistic Regression</th>\n",
" <td>0.78</td>\n",
" <td>0.73</td>\n",
" <td>0.67</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Algorithm Jaccard F1-score LogLoss\n",
"KNN 0.67 0.63 NA\n",
"Decision Tree 0.74 0.76 NA\n",
"SVM 0.80 0.76 NA\n",
"Logistic Regression 0.78 0.73 0.67"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"# formulate the report format\n",
"df = pd.DataFrame(list_jc, index=['KNN','Decision Tree','SVM','Logistic Regression'])\n",
"df.columns = ['Jaccard']\n",
"df.insert(loc=1, column='F1-score', value=list_fs)\n",
"df.insert(loc=2, column='LogLoss', value=list_ll)\n",
"df.columns.name = 'Algorithm'\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| Algorithm          | Jaccard | F1-score | LogLoss |\n",
"|--------------------|---------|----------|---------|\n",
"| KNN                | 0.67    | 0.63     | NA      |\n",
"| Decision Tree      | 0.74    | 0.76     | NA      |\n",
"| SVM                | 0.80    | 0.76     | NA      |\n",
"| LogisticRegression | 0.78    | 0.73     | 0.67    |"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## Want to learn more?\n",
"\n",
"IBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems – by your enterprise as a whole. A free trial is available through this course here: [SPSS Modeler](http://cocl.us/ML0101EN-SPSSModeler).\n",
"\n",
"Also, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at [Watson Studio](https://cocl.us/ML0101EN_DSX)\n",
"\n",
"\n",
"<hr>\n",
"Copyright &copy; 2018 [Cognitive Class](https://cocl.us/DX0108EN_CC). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/)."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Thanks for completing this lesson!\n",
"\n",
"Notebook created by: <a href = \"https://ca.linkedin.com/in/saeedaghabozorgi\">Saeed Aghabozorgi</a>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}