Skip to content

Instantly share code, notes, and snippets.

@kezeh32
Created February 17, 2022 04:16
Show Gist options
  • Save kezeh32/abe7db292d5c281413f13cda99dc22be to your computer and use it in GitHub Desktop.
Save kezeh32/abe7db292d5c281413f13cda99dc22be to your computer and use it in GitHub Desktop.
Machine prediction
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": "<a href=\"https://www.github.com.com\"><img src = \"https://github.com/kezeh32/testRepo/raw/main/lantern_analyst_logo.png\" width = 150, align = \"center\"></a>\n\n<h1 align=center><font size = 5> Classification with Python</font></h1>"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "This notebook uses classification algorithms to predict outcomes.\n\nDataset is loaded using Pandas library, algorithms are applied to find the best one for this specific dataset by accuracy evaluation methods.\n\nLets first load required libraries:"
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": "import itertools\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom matplotlib.ticker import NullFormatter\nimport pandas as pd\nimport numpy as np\nimport matplotlib.ticker as ticker\nfrom sklearn import preprocessing\n%matplotlib inline"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "This dataset is about past loans. The __Loan_train.csv__ data set includes details of 346 customers whose loan are already paid off or defaulted. It includes following fields:\n\n| Field | Description |\n|----------------|---------------------------------------------------------------------------------------|\n| Loan_status | Whether a loan is paid off on in collection |\n| Principal | Basic principal loan amount at the |\n| Terms | Origination terms which can be weekly (7 days), biweekly, and monthly payoff schedule |\n| Effective_date | When the loan got originated and took effects |\n| Due_date | Since it\u2019s one-time payoff schedule, each loan has one single due date |\n| Age | Age of applicant |\n| Education | Education of applicant |\n| Gender | The gender of applicant |"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "I download the dataset:"
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "--2022-02-17 03:21:04-- https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_train.csv\nResolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.196\nConnecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.196|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 23101 (23K) [text/csv]\nSaving to: \u2018loan_train.csv\u2019\n\nloan_train.csv 100%[===================>] 22.56K --.-KB/s in 0.001s \n\n2022-02-17 03:21:05 (15.4 MB/s) - \u2018loan_train.csv\u2019 saved [23101/23101]\n\n"
}
],
"source": "!wget -O loan_train.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_train.csv"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "### Load Data From CSV File "
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Unnamed: 0</th>\n <th>Unnamed: 0.1</th>\n <th>loan_status</th>\n <th>Principal</th>\n <th>terms</th>\n <th>effective_date</th>\n <th>due_date</th>\n <th>age</th>\n <th>education</th>\n <th>Gender</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>0</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/8/2016</td>\n <td>10/7/2016</td>\n <td>45</td>\n <td>High School or Below</td>\n <td>male</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>2</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/8/2016</td>\n <td>10/7/2016</td>\n <td>33</td>\n <td>Bechalor</td>\n <td>female</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>3</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>15</td>\n <td>9/8/2016</td>\n <td>9/22/2016</td>\n <td>27</td>\n <td>college</td>\n <td>male</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>4</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/9/2016</td>\n <td>10/8/2016</td>\n <td>28</td>\n <td>college</td>\n <td>female</td>\n </tr>\n <tr>\n <th>4</th>\n <td>6</td>\n <td>6</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/9/2016</td>\n <td>10/8/2016</td>\n <td>29</td>\n <td>college</td>\n <td>male</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n0 0 0 PAIDOFF 1000 30 9/8/2016 \n1 2 2 PAIDOFF 1000 30 9/8/2016 \n2 3 3 PAIDOFF 1000 15 9/8/2016 \n3 4 4 PAIDOFF 1000 30 9/9/2016 \n4 6 6 PAIDOFF 1000 30 9/9/2016 \n\n due_date age education Gender \n0 10/7/2016 45 High School or Below male \n1 10/7/2016 33 Bechalor female \n2 9/22/2016 27 college male \n3 10/8/2016 28 college female \n4 10/8/2016 29 college male "
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "df = pd.read_csv('loan_train.csv')\ndf.head()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": " Get summary of rows and columns:"
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "(346, 10)"
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "df.shape"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Convert date columns to date objects:"
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Unnamed: 0</th>\n <th>Unnamed: 0.1</th>\n <th>loan_status</th>\n <th>Principal</th>\n <th>terms</th>\n <th>effective_date</th>\n <th>due_date</th>\n <th>age</th>\n <th>education</th>\n <th>Gender</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>0</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-08</td>\n <td>2016-10-07</td>\n <td>45</td>\n <td>High School or Below</td>\n <td>male</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>2</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-08</td>\n <td>2016-10-07</td>\n <td>33</td>\n <td>Bechalor</td>\n <td>female</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>3</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>15</td>\n <td>2016-09-08</td>\n <td>2016-09-22</td>\n <td>27</td>\n <td>college</td>\n <td>male</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>4</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-09</td>\n <td>2016-10-08</td>\n <td>28</td>\n <td>college</td>\n <td>female</td>\n </tr>\n <tr>\n <th>4</th>\n <td>6</td>\n <td>6</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-09</td>\n <td>2016-10-08</td>\n <td>29</td>\n <td>college</td>\n <td>male</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n0 0 0 PAIDOFF 1000 30 2016-09-08 \n1 2 2 PAIDOFF 1000 30 2016-09-08 \n2 3 3 PAIDOFF 1000 15 2016-09-08 \n3 4 4 PAIDOFF 1000 30 2016-09-09 \n4 6 6 PAIDOFF 1000 30 2016-09-09 \n\n due_date age education Gender \n0 2016-10-07 45 High School or Below male \n1 2016-10-07 33 Bechalor female \n2 2016-09-22 27 college male \n3 2016-10-08 28 college female \n4 2016-10-08 29 college male "
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "df['due_date'] = pd.to_datetime(df['due_date'])\ndf['effective_date'] = pd.to_datetime(df['effective_date'])\ndf.head()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Data visualization and pre-processing\nLet's see how many of each class is in our dataset:"
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "PAIDOFF 260\nCOLLECTION 86\nName: loan_status, dtype: int64"
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "df['loan_status'].value_counts()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Plotting columns in Seaborn:"
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAbBklEQVR4nO3de5xVdb3/8ddbnBwRzQuTIoQzKpIg/HY6aWZ2EI3wBnEsxcyk4zmkqcUps9CyTj4yE0rzeAtPhI+OoFSKhnmL4BiWF8BRwAveJpwEBOyRkkCAn98fe824Gfcwl71nZs3e7+fjsR57re9el89i9pfP/n7X2uuriMDMzCxtduruAMzMzPJxgjIzs1RygjIzs1RygjIzs1RygjIzs1RygjIzs1RyguokkvaVNFPSy5IWS/qzpHFF2vcISXOLsa+uIGmBpNrujsO6RynVBUlVkh6T9KSkYzvxOBs6a989iRNUJ5AkYA7wcEQcGBFHAOOBAd0Uz87dcVyzEqwLxwPPRcSHI+KPxYjJWuYE1TlGAv+MiJsbCyLiLxHx3wCSekmaIukJSU9L+lJSPiJpbfxa0nOSbksqOJJGJ2ULgX9t3K+k3SRNT/b1pKSxSfkESb+S9FvgwUJORtIMSTdJmp98C/6X5JjPSpqRs95NkhZJWi7pv1rY16jkG/SSJL4+hcRmqVcydUFSBrgaOElSnaRdW/o8S6qXdGXy3iJJh0t6QNJLks5L1ukjaV6y7dLGePMc9xs5/z5561XJighPRZ6ArwDX7OD9icC3k/ldgEVADTAC+DvZb5c7AX8GPg5UAq8CgwABs4G5yfZXAp9P5vcEVgC7AROABmDvFmL4I1CXZzohz7ozgNuTY48F3gSGJTEuBjLJensnr72ABcDwZHkBUAv0BR4GdkvKvwlc3t1/L0+dN5VgXZgAXJ/Mt/h5BuqB85P5a4Cngd2BKuD1pHxnYI+cfb0IKFnekLyOAqYl57oTMBf4RHf/XbtqctdPF5B0A9nK9c+I+AjZD91wSZ9JVnk/2Qr3T+DxiGhItqsDqoENwCsR8UJS/r9kKzbJvsZIujhZrgQGJvMPRcQb+WKKiPb2n/82IkLSUmBNRCxNYlmexFgHnC5pItmK1w8YQrZiNvpoUvZI8mX4fWT/47EyUSJ1oVFrn+d7ktelQJ+IeAt4S9ImSXsC/wCulPQJ4B2gP7AvsDpnH6OS6clkuQ/Zf5+HOxhzj+IE1TmWA6c1LkTEBZL6kv12CNlvQxdFxAO5G0kaAWzOKdrGu3+jlh6aKOC0iHi+2b6OIlsB8m8k/ZHsN7rmLo6I3+cpb4zrnWYxvgPsLKkGuBj4SET8Len6q8wT60MRcWZLcVnJKcW6kHu8HX2ed1hngLPItqiOiIgtkurJX2d+GBE/20EcJcvXoDrHH4BKSefnlPXOmX8AOF9SBYCkQyTttoP9PQfUSDooWc6tEA8AF+X0z3+4LQFGxLERkckz7ahC7sgeZP8T+LukfYET86zzKHCMpIOTWHtLOqSDx7OeoZTrQqGf5/eT7e7bIuk44IA86zwA/FvOta3+kj7QjmP0aE5QnSCyncefBv5F0iuSHgduJdtHDfA/wDPAEknLgJ+xg9ZsRGwi241xb3Jh+C85b18BVABPJ/u6osin0yYR8RTZbojlwHTgkTzrrCXbhz9L0tNkK/iHujBM62KlXBeK8Hm+DaiVtIhsa+q5PMd4EJgJ/DnpXv81+Vt7JanxgpyZmVmquAVlZmap5ARlZmap5ARlZmap5ARlZmaplIoENXr06CD72wZPnkphKirXD08lNrVZKhLUunXrujsEs9Ry/bBylYoEZWZm1pwTlJmZpZITlJmZpZIfFmtmJWXLli00NDSwadOm7g6lrFVWVjJgwAAqKio6vA8nKDMrKQ0NDey+++5UV1eTPDfWulhEsH79ehoaGqipqenwftzFZ2YlZdOmTeyzzz5OTt1IEvvss0/BrVgnKCsbB/Trh6SCpwP69evuU7FWODl1v2L8DdzFZ2Vj5erVNOw/oOD9DHitoQjRmFlr3IIys5JWrJZze1rQvXr1IpPJcNhhh/HZz36Wt99+G4CtW7fSt29fJk+evN36I0aMYNGi7CDD1dXVDBs2jGHDhjFkyBC+/e1vs3nzuwPyLl++nJEjR3LIIYcwaNAgrrjiChqHTZoxYwZVVVVkMhkymQxf+MIXAJgwYQI1NTVN5dddd11R/m07m1tQZlbSitVybtSWFvSuu+5KXV0dAGeddRY333wzX/va13jwwQcZPHgws2fP5sorr2yxG2z+/Pn07duXDRs2MHHiRCZOnMitt97Kxo0bGTNmDDfddBOjRo3i7bff5rTTTuPGG2/kggsuAOCMM87g+uuvf88+p0yZwmc+85mOn3g3aLUFJWm6pNeTESoby74n6a+S6pLppJz3Jkt6UdLzkj7VWYGbmfUExx57LC+++CIAs2bN4qtf/SoDBw7k0UcfbXXbPn36cPPNNzNnzhzeeOMNZs6cyTHHHMOoUaMA6N27N9dffz1XXXVVp55Dd2lLF98MYHSe8msiIpNMvwOQNAQYDwxNtrlRUq9iBWtm1pNs3bqV++67j2HDhrFx40bmzZvHKaecwplnnsmsWbPatI899tiDmpoaXnjhBZYvX84RRxyx3fsHHXQQGzZs4M033wTgjjvuaOrK+8UvftG03je+8Y2m8qVLlxbvJDtRqwkqIh4G3mjj/sYCt0fE5oh4BXgROLKA+MzMepyNGzeSyWSora1l4MCBnHvuucydO5fjjjuO3r17c9ppp3HXXXexbdu2Nu2v8RpTRLTYLdhYfsYZZ1BXV0ddXR1f/OIXm96fMmVKU/mwYcMKPMOuUcg1qAslfQFYBHw9Iv4G9Ady260NSdl7SJoITAQYOHBgAWGYlR7Xj54t9xpUo1mzZvHII49QXV0NwPr165k/fz4nnHDCDvf11ltvUV9fzyGHHMLQoUN5+OGHt3v/5Zdfpk+fPuy+++7FPIVU6OhdfDcBBwEZYBXw46Q8X2rPO/5HREyLiNqIqK2qqupgGGalyfWjtLz55pssXLiQlStXUl9fT319PTfccEOr3XwbNmzgy1/+Mp/+9KfZa6+9OOuss1i4cCG///3vgWxL7Stf+QqXXHJJV5xGl+tQCyoi1jTOS7oFmJssNgAfzFl1APBah6MzMyvQwP32K+pv1wbut1+7t7nzzjsZOXIku+yyS1PZ2LFjueSSS7a7hbzRcccdR0TwzjvvMG7cOL7zne8A2ZbZ3XffzUUXXcQFF1zAtm3bOPvss7nwwgs7fkIppsa+zR2uJFUDcyPisGS5X0SsSub/EzgqIsZLGgrMJHvdaX9gHjAoInbY0VpbWxuNvwEw6yySivZD3VbqTVEfY+D60T7PPvsshx56aHeHYbT4t2hz/Wi1BSVpFjAC6CupAfguMEJShmz3XT3wJYCIWC5pNvAMsBW4oLXkZGZmlk+rCSoizsxT/PMdrP8D4AeFBGVmZuZHHZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZlZSdt/wMCiDrex/4DWn+yxevVqxo8fz0EHHcSQIUM46aSTWLFiRatDZeT7PVN1dTXr1q3brqz5sBqZTIZnnnkGgBUrVnDSSSdx8MEHc+ihh3L66adv93y+Pn36MHjw4KbhOBYsWMApp5zStO85c+YwfPhwPvShDzFs2DDmzJnT9N6ECRPo379/02+31q1b1/RkjM7g4TbMrKSt+uurHHX5/UXb32Pfz/fs7HdFBOPGjeOcc87h9ttvB6Curo41a9YwYcKEHQ6V0R75htXYtGkTJ598Mj/5yU849dRTgezQHVVVVU2PXhoxYgRTp06ltrYWgAULFjRt/9RTT3HxxRfz0EMPUVNTwyuvvMInP/lJDjzwQIYPHw5kx7qaPn06559/frtjbi+3oMzMimj+/PlUVFRw3nnnNZVlMhlWrFjR6UNlzJw5k6OPPropOUH2qRSHHXZYm7afOnUql156KTU1NQDU1NQwefJkpkyZ0rTOpEmTuOaaa9i6dWvR4m6JE5SZWREtW7bsPUNiAG0aKqM9crvtMpkMGzdubPHYbZUvxtraWpYvX960PHDgQD7+8Y/zy1/+ssPHaSt38ZmZdYG2DJXRHi2NnFuIfDHmK7v00ksZM2YMJ598clGP35xbUGZmRTR06FAWL16ct7z5MxWLPVRGS8duz/bNY1yyZAlDhgzZruzggw8mk8kwe/bsDh+rLZygzMyKaOTIkWzevJlbbrmlqeyJJ55g0KBBnT5Uxuc+9zn+9Kc/ce+99zaV3X///W0eQffiiy/mhz/8IfX19QDU19dz5ZVX8vWvf/0961522WVMnTq1KHG3xF18ZlbS+vX/YKt33rV3fzsiibvuuotJkyZx1VVXUVlZSXV1Nddee22rQ2XMmDFju9u6H300O/7r8OHD2WmnbHvi9NNPZ/jw4dxxxx0sXLiwad0bb7yRj33sY8ydO5dJkyYxadIkKioqGD58OD/96U/bdG6ZTIYf/ehHnHrqqWzZsoWKigquvvpqMpnMe9YdOnQohx9+OEuWLGnTvjuiTcNtdDYPJ2BdwcNtlAcPt5EehQ630WoXn6Tpkl6XtCynbIqk5yQ9LekuSXsm5dWSNkqqS6ab2xqImZlZrrZcg5oBNG8fPwQcFhHDgRXA5Jz3XoqITDKdh5mZWQe0mqAi4mHgjWZlD0ZE46+0HiU7tLuZWSqk4dJFuSvG36AYd/H9G3BfznKNpCcl/Z+kY1vaSNJESYskLVq7dm0RwjArHa4fHVdZWcn69eudpLpRRLB+/XoqKysL2k9Bd/FJuozs0O63JUWrgIERsV7SEcAcSUMj4j0/k46IacA0yF4ELiQOs1Lj+tFxAwYMoKGhASf27lVZWcmAAYV1rnU4QUk6BzgFOD6SryoRsRnYnMwvlvQScAjgW5DMrEtUVFQ0PUvOerYOdfFJGg18ExgTEW/nlFdJ6pXMHwgMAl4uRqBmZlZeWm1BSZoFjAD6SmoAvkv2rr1dgIeSZzQ9mtyx9wng+5K2AtuA8yLijbw7NjMz24FWE1REnJmn+OctrPsb4DeFBmVmZuZn8ZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSo5QZmZWSq1mqAkTZf0uqRlOWV7S3pI0gvJ6145702W9KKk5yV9qrMCNzOz0taWFtQMYHSzsm8B8yJiEDAvWUbSEGA8MDTZ5sbGEXbNzMzao9UEFREPA81HxR0L3JrM3wp8Oqf89ojYHBGvAC8CRxYnVDMzKycdvQa1b0SsAkheP5CU9wdezVmvISl7D0kTJS2StGjt2rUdDMOsNLl+mBX/JgnlKYt8K0bEtIiojYjaqqqqIodh1rO5fph1PEGtkdQPIHl9PSlvAD6Ys94A4LWOh2dmZuWqownqHuCcZP4c4O6c8vGSdpFUAwwCHi8sRDMzK0c7t7aCpFnACKCvpAbgu8BVwGxJ5wIrgc8CRMRySbOBZ4CtwAURsa2TYjczsxLWaoKKiDNbeOv4Ftb/AfCDQoIyMzPzkyTMzCyVnKDMzCyVnKDMzCyVnKDMzCyVnKDMzCyVnKDMzCyVnKDMzCyVnKDMzCyVnKDMzCyVnKDMzCyVnKDMzCyVnKDMzCyVnKDMzCyVWn2aeUskDQbuyCk6ELgc2BP4D6BxnOpLI+J3HT2OmZmVpw4nqIh4HsgASOoF/BW4C/gicE1ETC1GgGZmVp6K1cV3PPBSRPylSPszM7MyV6wENR6YlbN8oaSnJU2XtFe+DSRNlLRI0qK1a9fmW8WsbLl+mBUhQUl6HzAG+FVSdBNwENnuv1XAj/NtFxHTIqI2ImqrqqoKDcOspLh+mBWnBXUisCQi1gBExJqI2BYR7wC3AEcW4RhmZlZmipGgziSne09Sv5z3xgHLinAMMzMrMx2+iw9AUm/gk8CXcoqvlpQBAqhv9p6ZmVmbFJSgIuJtYJ9mZWcXFJGZmRl+koSZmaWUE5SZmaWSE5SZmaWSE5SZmaWSE5SZmaWSE5SZmaVSQbeZm/Uk6lXBgNcairIfM+t8TlBWNmLbFo66/P6C9/PY90cXIRoza427+MzMLJWcoMzMLJWcoMzMLJWcoMzMLJWcoMzMLJWcoMzMLJUKHQ+qHngL2AZsjYhaSXsDdwDVZMeDOj0i/lZYmGZmVm6K0YI6LiIyEVGbLH8LmBcRg4B5ybKVoQP69UNSwdMB/fq1fjAzKzmd8UPdscCIZP5WYAHwzU44jqXcytWradh/QMH7KcbTH8ys5ym0BRXAg5IWS5qYlO0bEasAktcP5NtQ0kRJiyQtWrt2bYFhmJUW1w+zwhPUMRFxOHAicIGkT7R1w4iYFhG1EVFbVVVVYBhmpcX1w6zABBURryWvrwN3AUcCayT1A0heXy80SDMzKz8dTlCSdpO0e+M8MApYBtwDnJOsdg5wd6FBmplZ+SnkJol9gbskNe5nZkTcL+kJYLakc4GVwGcLD9PMzMpNhxNURLwM/L885euB4wsJyszMzE+SMDOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjMrc2kd/bozRtQ1M7MeJK2jX7sFZWZmqVTIeFAflDRf0rOSlkv6alL+PUl/lVSXTCcVL1wzMysXhXTxbQW+HhFLkoELF0t6KHnvmoiYWnh4ZmZWrgoZD2oVsCqZf0vSs0D/YgVmZmblrSjXoCRVAx8GHkuKLpT0tKTpkvZqYZuJkhZJWrR27dpihGFWMlw/zIqQoCT1AX4DTIqIN4GbgIOADNkW1o/zbRcR0yKiNiJqq6qqCg3DrKS4fpgVmKAkVZBNTrdFxJ0AEbEmIrZFxDvALcCRhYdpZmblppC7+AT8HHg2In6SU577S61xwLKOh2dmZuWqkLv4jgHOBpZKqkvKLgXOlJQBAqgHvlTAMczMrEwVchffQkB53vpdx8MxMzPL8pMkzMwslfwsPus06lVRlGdzqVdFEaIxs57GCco6TWzbwlGX31/wfh77/ugiRGNmPY27+MzMLJWcoMzMLJWcoMzMLJWcoMzMLJWcoMzMulhah1hPG9/FZ2bWxdI6xHrauAVlZmap5ARlZmap5C4+M7Myl9anvjhBmZmVubQ+9cVdfGZmlkqdlqAkjZb0vKQXJX2r0P35tkwzs/LSKV18knoBNwCfBBqAJyTdExHPdHSfvi3TzKy8dNY1qCOBFyPiZQBJtwNjgQ4nqLQ5oF8/Vq5eXfB+Bu63H39ZtaoIEZU2Kd/YmJZGrhutK9ZNCTv1qijpuqGIKP5Opc8AoyPi35Pls4GjIuLCnHUmAhOTxcHA80UPpO36Auu68fiF6Kmx99S4ofXY10VEQVeLU1Q/SvnvlGY9Nfa2xN3m+tFZLah8KX27TBgR04BpnXT8dpG0KCJquzuOjuipsffUuKFrYk9L/fDfqXv01NiLHXdn3STRAHwwZ3kA8FonHcvMzEpQZyWoJ4BBkmokvQ8YD9zTSccyM7MS1CldfBGxVdKFwANAL2B6RCzvjGMVSbd3pRSgp8beU+OGnh17e/Xkc3XsXa+ocXfKTRJmZmaF8pMkzMwslZygzMwslcomQUnqJelJSXOT5b0lPSTpheR1r5x1JyePaHpe0qe6L2qQtKekX0t6TtKzko7uCbFL+k9JyyUtkzRLUmVa45Y0XdLrkpbllLU7VklHSFqavHedesgvKF03uiV214+21I+IKIsJ+BowE5ibLF8NfCuZ/xbwo2R+CPAUsAtQA7wE9OrGuG8F/j2Zfx+wZ9pjB/oDrwC7JsuzgQlpjRv4BHA4sCynrN2xAo8DR5P9HeB9wInd9blp5/m7bnRt3K4fbawf3V45uugfeAAwDxiZUwmfB/ol8/2A55P5ycDknG0fAI7uprj3SD7Ialae6tiTCvgqsDfZO0XnAqPSHDdQ3awCtivWZJ3ncsrPBH7WHZ+bdp6360bXx+760cb6US5dfNcClwDv5JTtGxGrAJLXDyTljR+eRg1JWXc4EFgL/CLpgvkfSbuR8tgj4q/AVGAlsAr4e0Q8SMrjbqa9sfZP5puXp921uG50KdeP7cp3qOQTlKRTgNcjYnFbN8lT1l334u9Mtml9U0R8GPgH2eZ0S1IRe9IfPZZsE39/YDdJn9/RJnnK0vr7h5Zi7UnnALhu0E2xu35sV75DJZ+ggGOAMZLqgduBkZL+F1gjqR9A8vp6sn6aHtPUADRExGPJ8q/JVsq0x34C8EpErI2ILcCdwMdIf9y52htrQzLfvDzNXDe6h+tHG8+h5BNUREyOiAERUU32kUt/iIjPk3300jnJaucAdyfz9wDjJe0iqQYYRPbiXpeLiNXAq5IGJ0XHkx2yJO2xrwQ+Kql3cqfO8cCzpD/uXO2KNenmeEvSR5Nz/kLONqnkutFtnzHXj7bWj+64SNhdEzCCdy8E70P24vALyeveOetdRvbuk+fp5juxgAywCHgamAPs1RNiB/4LeA5YBvyS7F09qYwbmEX2WsAWst/0zu1IrEBtcr4vAdfT7AJ+mifXjS6P3fWjDfXDjzoyM7NUKvkuPjMz65mcoMzMLJWcoMzMLJWcoMzMLJWcoMzMLJWcoFJM0jZJdckTj38lqXcL6/2pg/uvlXRdAfFt6Oi2ZoVw3SgPvs08xSRtiIg+yfxtwOKI+EnO+70iYlsa4jPrSq4b5cEtqJ7jj8DBkkZImi9pJrAU3v22lry3QO+OkXNb45grkj4i6U+SnpL0uKTdk/UbxwD6nqRfSvpDMsbLfyTlfSTNk7QkGctlbPecvlmLXDdK1M7dHYC1TtLOwInA/UnRkcBhEfFKntU/DAwl+5yrR4BjJD0O3AGcERFPSNoD2Jhn2+HAR4HdgCcl3Uv2GVvjIuJNSX2BRyXdE256Wwq4bpQ2t6DSbVdJdWQf57IS+HlS/ngLFbDxvYaIeAeoIzuOy2BgVUQ8ARARb0bE1jzb3h0RGyNiHTCfbGUXcKWkp4Hfk31E/r7FODmzArhulAG3oNJtY0RkcguSXol/7GCbzTnz28j+jUXbHs/ffJ0AzgKqgCMiYouyT76ubMO+zDqT60YZcAuqPDwH7C/pIwBJH3u+LydjJVVK2ofsw0OfAN5PdsygLZKOAw7oqqDNuoDrRoq5BVUGIuKfks4A/lvSrmT72E/Is+rjwL3AQOCKiHgtuUPqt5IWke0Wea6LwjbrdK4b6ebbzA3I3qkEbIiIqd0di1mauG50H3fxmZlZKrkFZWZmqeQWlJmZpZITlJmZpZITlJmZpZITlJmZpZITlJmZpdL/B7A+/1urYJiLAAAAAElFTkSuQmCC\n",
"text/plain": "<Figure size 432x216 with 2 Axes>"
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": "import seaborn as sns\n\nbins = np.linspace(df.Principal.min(), df.Principal.max(), 10)\ng = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\ng.map(plt.hist, 'Principal', bins=bins, ec=\"k\")\n\ng.axes[-1].legend()\nplt.show()"
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAZB0lEQVR4nO3de5QU5bnv8e9PmDgiGEFGGR1hRsULChl1djTBJIjKYXtDj5dojIF1POFo8MKKxqi5rJPtWoREl5psbyHRwEoCyt5RcJMVFQkcg1EjIl4QIx4d2bPlrolyBALynD+6ZjLAwPQM1dPVPb/PWrW66+3qt56X6Zen663qehURmJmZZc1exQ7AzMysLU5QZmaWSU5QZmaWSU5QZmaWSU5QZmaWSU5QZmaWSU5QKZN0kKTpkt6W9KKkZyWdn1LdIyTNSaOuriBpgaSGYsdhxVdO/UJSlaTnJb0k6QsF3M+GQtVdKpygUiRJwCzg6Yg4LCJOBC4BaooUT89i7NestTLsF6cBb0TE8RHxxzRisrY5QaVrJPD3iLi/uSAi3o2IfwWQ1EPSbZJekPSKpP+VlI9Ijjb+XdIbkn6TdGokjU7KFgL/vbleSftKejCp6yVJY5LycZL+TdJ/AE/uSWMkTZV0n6T5yTffLyX7XCZpaqvt7pO0SNJSST/YRV2jkm/Ni5P4eu9JbFZSyqZfSKoHfgycKWmJpH129dmW1ChpUvLaIkknSHpC0v+VdGWyTW9J85L3vtocbxv7/Varf582+1hZiggvKS3AtcCdu3l9PPDd5PnewCKgDhgB/I3cN8q9gGeBU4BK4D+BwYCAmcCc5P2TgK8mz/cH3gT2BcYBTUC/XcTwR2BJG8vpbWw7FXgo2fcY4ENgaBLji0B9sl2/5LEHsAAYlqwvABqA/sDTwL5J+beB7xf77+Wla5Yy7BfjgLuT57v8bAONwFXJ8zuBV4A+QBWwJinvCezXqq63ACXrG5LHUcCUpK17AXOALxb779oVi4eACkjSPeQ61N8j4p/IfdCGSbow2eTT5DrZ34E/R0RT8r4lQC2wAXgnIpYn5b8m15lJ6jpX0g3JeiUwMHk+NyLebyumiOjomPl/RERIehVYHRGvJrEsTWJcAlwsaTy5zlYNDCHXGZudnJQ9k3wB/hS5/2ysGyqTftGsvc/2Y8njq0DviPgI+EjSJkn7A/8PmCTpi8A24BDgIGBVqzpGJctLyXpvcv8+T3cy5pLhBJWupcAFzSsRMUFSf3LfCCH3DeiaiHii9ZskjQA2tyr6hH/8bXZ1s0QBF0TEX3ao6yRyH/q23yT9kdy3uB3dEBFPtVHeHNe2HWLcBvSUVAfcAPxTRHyQDP1VthHr3Ii4dFdxWVkrx37Ren+7+2zvtv8Al5E7ojoxIrZIaqTt/vPDiPjZbuIoSz4Hla4/AJWSrmpV1qvV8yeAqyRVAEg6UtK+u6nvDaBO0uHJeutO8ARwTasx+ePzCTAivhAR9W0su+uEu7MfuY7/N0kHAf/cxjbPAcMlHZHE2kvSkZ3cn5Wecu4Xe/rZ/jS54b4tkk4FBrWxzRPA/2h1busQSQd2YB8lywkqRZEbMD4P+JKkdyT9GZhGblwa4BfA68BiSa8BP2M3R7ERsYnc0MXvkpPB77Z6+VagAnglqevWlJuTl4h4mdzQw1LgQeCZNrZZS27cfoakV8h16qO7MEwronLuFyl8tn8DNEhaRO5o6o029vEkMB14Nhlq/3faPtorO80n48zMzDLFR1BmZpZJTlBmZpZJTlBmZpZJTlBmZpZJXZqgRo8eHeR+v+DFS3dYOsX9xEs3XNrUpQlq3bp1Xbk7s5LkfmKW4yE+MzPLJCcoMzPLJCcoMzPLJN8s1szK3pYtW2hqamLTpk3FDqVbq6yspKamhoqKiry2d4Iys7LX1NREnz59qK2tJbmPrHWxiGD9+vU0NTVRV1eX13s8xGdmZW/Tpk0ccMABTk5FJIkDDjigQ0exTlBFMKi6GkmpLIOqq4vdHLOS4ORUfB39G3iIrwhWrFpF08E1qdRV815TKvWYmWWNj6DMrNtJcxQj35GMHj16UF9fz3HHHcdFF13Exx9/DMDWrVvp378/N99883bbjxgxgkWLcpMO19bWMnToUIYOHcqQIUP47ne/y+bN/5igd+nSpYwcOZIjjzySwYMHc+utt9I8ldLUqVOpqqqivr6e+vp6vva1rwEwbtw46urqWsp/+tOfpvJvmyYfQZlZt5PmKAbkN5Kxzz77sGTJEgAuu+wy7r//fr75zW/y5JNPctRRRzFz5kwmTZq0y2Gw+fPn079/fzZs2MD48eMZP34806ZNY+PGjZx77rncd999jBo1io8//pgLLriAe++9lwkTJgDw5S9/mbvvvnunOm+77TYuvPDCzje8wHwEZWbWxb7whS/w1ltvATBjxgyuu+46Bg4cyHPPPdfue3v37s3999/PrFmzeP/995k+fTrDhw9n1KhRAPTq1Yu7776byZMnF7QNXcEJysysC23dupXf//73DB06lI0bNzJv3jzOPvtsLr30UmbMmJFXHfvttx91dXUsX76cpUuXcuKJJ273+uGHH86GDRv48MMPAXj44YdbhvJ++ctftmz3rW99q6X81VdfTa+RKfEQn5lZF9i4cSP19fVA7gjqiiuuYPbs2Zx66qn06tWLCy64gFtvvZU777yTHj16tFtf8zmmiNjlsGBzeakO8eWVoCQ1Ah8BnwBbI6JBUj/gYaAWaAQujogPChOmmVlpa30OqtmMGTN45plnqK2tBWD9+vXMnz+f008/fbd1ffTRRzQ2NnLkkUdy7LHH8vTTT2/3+ttvv03v3r3p06dPmk3och0Z4js1IuojoiFZvwmYFxGDgXnJupmZ5eHDDz9k4cKFrFixgsbGRhobG7nnnnvaHebbsGED3/jGNzjvvPPo27cvl112GQsXLuSpp54Cckdq1157LTfeeGNXNKOg9mSIbwwwInk+DVgAfHsP4zEzK7iBAwak+hvCgQMGdPg9jzzyCCNHjmTvvfduKRszZgw33njjdpeQNzv11FOJCLZt28b555/P9773PSB3ZDZ79myuueYaJkyYwCeffMLll1/O1Vdf3fkGZYSaxzF3u5H0DvABuZkPfxYRUyT9NSL2b7XNBxHRt433jgfGAwwcOPDEd999N63YS5akVH+om8/f0Ioi75/Nu58U1rJlyzjmmGOKHYaxy79Fm30l3yG+4RFxAvDPwARJX8w3mIiYEhENEdFQVVWV79vMuhX3E7Od5ZWgIuK95HEN8CjwWWC1pGqA5HFNoYI0M7Pup90EJWlfSX2anwOjgNeAx4CxyWZjgdmFCtLMzLqffC6SOAh4NLmevicwPSIel/QCMFPSFcAK4KLChWlmZt1NuwkqIt4GPtNG+XrgtEIEZWZm5lsdmZlZJjlBmVm3c3DNwFSn2zi4ZmC7+1y1ahWXXHIJhx9+OEOGDOHMM8/kzTffbHeqjLZ+z1RbW8u6deu2K9txWo36+npef/11AN58803OPPNMjjjiCI455hguvvji7e7P17t3b4466qiW6TgWLFjA2Wef3VL3rFmzGDZsGEcffTRDhw5l1qxZLa+NGzeOQw45pOW3W+vWrWu5M8ae8r348jSoupoVq1YVOwwzS8HK//pPTvr+46nV9/y/jN7t6xHB+eefz9ixY3nooYcAWLJkCatXr2bcuHG7nSqjI9q6596mTZs466yzuOOOOzjnnHOA3NQdVVVVLbdeGjFiBLfffjsNDbkbBS1YsKDl/S+//DI33HADc+fOpa6ujnfeeYczzjiDww47jGHDhgG5ua4efPBBrrrqqg7HvDtOUHnyLLhm1lnz58+noqKCK6+8sqWsvr6eBx54oM2pMkaMGNGpBNWW6dOn87nPfa4lOUHurhT5uv3227nllluoq6sDoK6ujptvvpnbbruNX/3qVwBMnDiRO++8k69//eupxNzMQ3xmZgX22muv7TQlBpDXVBkd0XrYrr6+no0bN+5y3/lqK8aGhgaWLl3asj5w4EBOOeWUloSVFh9BmZkVST5TZXTErqbV2BNtxdhW2S233MK5557LWWedldq+fQRlZlZgxx57LC+++GKb5YsWLdquLO2pMna17468f8cYFy9ezJAhQ7YrO+KII6ivr2fmzJmd3teOnKDMzAps5MiRbN68mZ///OctZS+88AKDBw8u+FQZX/nKV/jTn/7E7373u5ayxx9/PO8ZdG+44QZ++MMf0tjYCEBjYyOTJk3i+uuv32nb73znO9x+++2pxA0e4jOzbqj6kEPbvfKuo/XtjiQeffRRJk6cyOTJk6msrKS2tpa77rqr3akypk6dut1l3c899xwAw4YNY6+9cscYF198McOGDePhhx9m4cKFLdvee++9fP7zn2fOnDlMnDiRiRMnUlFRwbBhw/jJT36SV9vq6+v50Y9+xDnnnMOWLVuoqKjgxz/+ccvswK0de+yxnHDCCSxevDivutuT13QbaWloaIgdDxVLRdpTZHi6jW6h4ycRKO1+klWebiM7CjHdhpmZWZdygjIzs0xygjKzbsFD4cXX0b+BE5SZlb3KykrWr1/vJFVEEcH69euprKzM+z2+is/Myl5NTQ1NTU2sXbu22KF0a5WVldTU5H+BmBNUidubzv3ivC0DBwzg3ZUrU6nLLEsqKipa7iVnpcMJqsRtBt/E1szKUt7noCT1kPSSpDnJej9JcyUtTx77Fi5MMzPrbjpykcR1wLJW6zcB8yJiMDAvWTczM0tFXglKUg1wFvCLVsVjgGnJ82nAealGZmZm3Vq+R1B3ATcC21qVHRQRKwGSxwPbeqOk8ZIWSVrkK2jM2uZ+YrazdhOUpLOBNRHRqfu1R8SUiGiIiIaqqqrOVGFW9txPzHaWz1V8w4FzJZ0JVAL7Sfo1sFpSdUSslFQNrClkoGZm1r20ewQVETdHRE1E1AKXAH+IiK8CjwFjk83GArMLFqWZmXU7e3Kro8nAGZKWA2ck62ZmZqno0A91I2IBsCB5vh44Lf2QzMzMfLNYMzPLKCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLpHYTlKRKSX+W9LKkpZJ+kJT3kzRX0vLksW/hwzUzs+4inyOozcDIiPgMUA+MlnQycBMwLyIGA/OSdTMzs1S0m6AiZ0OyWpEsAYwBpiXl04DzChGgmZl1T3mdg5LUQ9ISYA0wNyKeBw6KiJUAyeOBu3jveEmLJC1au3ZtSmGblRf3E7Od5ZWgIuKTiKgHaoDPSjou3x1ExJSIaIiIhqqqqk6GaVbe3E/Mdtahq/gi4q/AAmA0sFpSNUDyuCbt4MzMrPvK5yq+Kkn7J8/3AU4H3gAeA8Ymm40FZhcoRjMz64Z65rFNNTBNUg9yCW1mRMyR9CwwU9IVwArgogLGaWZm3Uy7CSoiXgGOb6N8PXBaIYIyMzPznSTMzCyTnKDMzCyTnKDMzCyTnKDMzCyTyjpBDaquRlIqi5mZda18LjMvWStWraLp4JpU6qp5rymVeszMLD9lfQRlZmalywnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyyQnKzMwyqd0EJelQSfMlLZO0VNJ1SXk/SXMlLU8e+xY+XDMz6y7yOYLaClwfEccAJwMTJA0BbgLmRcRgYF6ybmZmlop2E1RErIyIxcnzj4BlwCHAGGBastk04LwCxWhmZt1Qh85BSaoFjgeeBw6KiJWQS2LAgbt4z3hJiyQtWrt27R6Ga1ae3E/MdpZ3gpLUG/gtMDEiPsz3fRExJSIaIqKhqqqqMzGalT33E7Od5ZWgJFWQS06/iYhHkuLVkqqT16uBNYUJ0czMuqN8ruIT8ACwLCLuaPXSY8DY5PlYYHb64VlX2ht2O+19R5ZB1dXFbo6Zlbh8pnwfDlwOvCppSVJ2CzAZmCnpCmAFcFFBIrQusxloOrgmlbpq3mtKpR4z677aTVARsRDQLl4+Ld1wsks9KlL7T1c9P5VeXT0qUqnHzCxr8jmCMiA+2cJJ3388lbqe/5fRqdZlZlaOfKsjMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLpLK+k0SatycyM7OuVdYJKu3bE5mZWdfxEJ+ZmWWSE5SZmWWSE5SZmWVSWZ+D6g5SnafKc0tZhgyqrmbFqlWp1LXPXj3YuO2TVOoaOGAA765cmUpdtntOUCXOF4JYuVqxalWqMzx7tujS0+4Qn6QHJa2R9Fqrsn6S5kpanjz2LWyYZmbW3eRzDmoqsONX65uAeRExGJiXrJu12BuQlMoyqLq62M0xsyJod4gvIp6WVLtD8RhgRPJ8GrAA+HaagVlp2wweUjGzPdLZq/gOioiVAMnjgbvaUNJ4SYskLVq7dm0nd2dW3sqlnwyqrk7tyNms4BdJRMQUYApAQ0NDFHp/ZqWoXPpJ2hc2WPfW2SOo1ZKqAZLHNemFZGZm1vkE9RgwNnk+FpidTjhmZmY5+VxmPgN4FjhKUpOkK4DJwBmSlgNnJOtmZmapyecqvkt38dJpKcdiZmbWInP34vNVQGZmBhm81ZGvAjIzM8hggrLi8Y1nzSxLnKCshW88a2ZZkrlzUGZmZuAEZWZmGeUEZWZmmeQEZWZmmeQEZZnnuaUKy789tKzyVXyWeZ5bqrD820PLKicoKwj/psrM9pQTlBWEf1NlZnvK56DMzCyTfARlmZfmcOFePSpSO5k/cMAA3l25MpW6ykWqQ7s9P+Vh4g4YVF3NilWrUqkrK59tJyjLvLSHC31BQOGk/bfyMHH+yvFiFw/xmZlZJmXuCCrNIQIzMytdmUtQvvrLzMxgDxOUpNHAT4AewC8iYnIqUZkVSLn8PivNE+LWMWleaLNXzwq2bd2SSl3lqNMJSlIP4B7gDKAJeEHSYxHxelrBmaWtXI7Qy/GEeKnY5ot2usyeXCTxWeCtiHg7Iv4OPASMSScsMzPr7hQRnXujdCEwOiL+Z7J+OXBSRFy9w3bjgfHJ6lHAXzof7nb6A+tSqisL3J7s6mxb1kVEXodZ7id5c3uyLdW+sifnoNoahN0p20XEFGDKHuyn7Z1LiyKiIe16i8Xtya6uaIv7SX7cnmxLuz17MsTXBBzaar0GeG/PwjEzM8vZkwT1AjBYUp2kTwGXAI+lE5aZmXV3nR7ii4itkq4GniB3mfmDEbE0tcjal/pwSJG5PdlVym0p5djb4vZkW6rt6fRFEmZmZoXke/GZmVkmOUGZmVkmZT5BSTpU0nxJyyQtlXRdUt5P0lxJy5PHvsWONR+SKiX9WdLLSXt+kJSXZHuaSeoh6SVJc5L1km2PpEZJr0paImlRUpb59rivZJ/7ScdkPkEBW4HrI+IY4GRggqQhwE3AvIgYDMxL1kvBZmBkRHwGqAdGSzqZ0m1Ps+uAZa3WS709p0ZEfavfdJRCe9xXss/9pCMioqQWYDa5+//9BahOyqqBvxQ7tk60pRewGDiplNtD7jdw84CRwJykrJTb0wj036Gs5NrjvpKtxf2k40spHEG1kFQLHA88DxwUESsBkscDixhahySH+UuANcDciCjp9gB3ATcC21qVlXJ7AnhS0ovJLYigxNrjvpJJd+F+0iGZmw9qVyT1Bn4LTIyID9O63X0xRMQnQL2k/YFHJR1X5JA6TdLZwJqIeFHSiCKHk5bhEfGepAOBuZLeKHZAHeG+kj3uJ51TEkdQkirIdbjfRMQjSfFqSdXJ69XkvmGVlIj4K7AAGE3ptmc4cK6kRnJ3tB8p6deUbnuIiPeSxzXAo+Tu3F8S7XFfySz3k07IfIJS7uvfA8CyiLij1UuPAWOT52PJjbdnnqSq5NsgkvYBTgfeoETbExE3R0RNRNSSu93VHyLiq5RoeyTtK6lP83NgFPAaJdAe95Xscj/ppGKfaMvjRNwp5MY6XwGWJMuZwAHkTjguTx77FTvWPNszDHgpac9rwPeT8pJszw5tG8E/Tv6WZHuAw4CXk2Up8J1SaY/7Smks7if5L77VkZmZZVLmh/jMzKx7coIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIyM7NMcoIqA5JmJTdsXNp800ZJV0h6U9ICST+XdHdSXiXpt5JeSJbhxY3erOu4r5QW/1C3DEjqFxHvJ7eDeQH4b8AzwAnAR8AfgJcj4mpJ04F7I2KhpIHAE5GbP8is7LmvlJaSuZu57da1ks5Pnh8KXA78n4h4H0DSvwFHJq+fDgxpdYfr/ST1iYiPujJgsyJxXykhTlAlLrl1/+nA5yLiY0kLyE0atqtvensl227skgDNMsJ9pfT4HFTp+zTwQdLhjiY31Xcv4EuS+krqCVzQavsngaubVyTVd2WwZkXkvlJinKBK3+NAT0mvALcCzwH/BUwiN5vqU8DrwN+S7a8FGiS9Iul14MquD9msKNxXSowvkihTknpHxIbkW+GjwIMR8Wix4zLLGveV7PIRVPn635KWkJtH5x1gVlGjMcsu95WM8hGUmZllko+gzMwsk5ygzMwsk5ygzMwsk5ygzMwsk5ygzMwsk/4/w0FgvqziN4oAAAAASUVORK5CYII=\n",
"text/plain": "<Figure size 432x216 with 2 Axes>"
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": "bins=np.linspace(df.age.min(), df.age.max(), 10)\ng = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\ng.map(plt.hist, 'age', bins=bins, ec=\"k\")\n\ng.axes[-1].legend()\nplt.show()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Pre-processing....\nDay of the week people get loan:"
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAZtklEQVR4nO3de3hU9b3v8fdHSI0I1htqJIVExQsIO2p6rFVbxMtDvYHbe9GCx25OrTeOpW61tj27nsdS8fHS7a3WqrQVlFpvpacqUtiKFStiFBGLbk0xFRSwrVJBQb/nj1lJAwQySdZkFjOf1/PMMzNr1vqt7wr58p3fbya/nyICMzOzrNmq2AGYmZm1xQXKzMwyyQXKzMwyyQXKzMwyyQXKzMwyyQXKzMwyyQUqZZJ2lTRF0huSnpf0jKSTUmp7mKTpabTVHSTNllRf7Dis+EopLyT1lfSspBckHV7A86wqVNtbCheoFEkS8BDwZETsEREHAWcA1UWKp2cxzmvWWgnmxZHAqxFxQEQ8lUZM1jYXqHQNBz6OiNuaN0TEnyPiPwEk9ZA0SdJzkl6S9L+S7cOS3sb9kl6VdE+S1EgakWybA/xrc7uStpV0Z9LWC5JGJtvHSvqVpN8Aj3flYiTdLelWSbOSd75fTs65SNLdrfa7VdI8SQsl/ccm2jomedc8P4mvd1disy1KyeSFpDrgGuBYSQ2SttnU77akRklXJ6/Nk3SgpMck/bekbyT79JY0Mzl2QXO8bZz3261+Pm3mWEmKCN9SugEXAddv5vVxwJXJ462BeUAtMAz4O7l3lFsBzwCHAZXAW8BAQMA0YHpy/NXAWcnj7YHFwLbAWKAJ2HETMTwFNLRxO6qNfe8G7k3OPRJ4HxiSxPg8UJfst2Ny3wOYDQxNns8G6oGdgSeBbZPt/w58r9j/Xr51z60E82IscFPyeJO/20AjcF7y+HrgJaAP0Bd4N9neE9iuVVuvA0qer0rujwFuT651K2A68KVi/7t2x81DQAUk6WZyCfVxRHye3C/aUEmnJLt8llySfQz8MSKakuMagBpgFfBmRLyWbP8luWQmaetESROS55VA/+TxjIh4r62YIqKjY+a/iYiQtAB4JyIWJLEsTGJsAE6TNI5cslUBg8glY7MvJNueTt4Af4bcfzZWhkokL5q197v9SHK/AOgdER8AH0haI2l74B/A1ZK+BHwK9AN2BZa1auOY5PZC8rw3uZ/Pk52MeYvhApWuhcDJzU8i4nxJO5N7Rwi5d0AXRsRjrQ+SNAz4qNWmT/jnv82mJksUcHJE/GmDtg4m90vf9kHSU+TexW1oQkQ80cb25rg+3SDGT4GekmqBCcDnI+KvydBfZRuxzoiIMzcVl5W0UsyL1ufb3O/2ZvMHGE2uR3VQRKyV1Ejb+fPDiPjJZuIoSf4MKl2/ByolnddqW69Wjx8DzpNUASBpb0nbbqa9V4FaSXsmz1snwWPAha3G5A/IJ8CIODwi6tq4bS4JN2c7con/d0m7Al9pY5+5wKGS9kpi7SVp706ez7Y8pZwXXf3d/iy54b61ko4ABrSxz2PA/2z12VY/Sbt04BxbLBeoFEVuwHgU8GVJb0r6IzCZ3Lg0wB3AK8B8SS8DP2EzvdiIWENu6OK3yYfBf2718lVABfBS0tZVKV9OXiLiRXJDDwuBO4Gn29hnOblx+6mSXiKX1Pt2Y5hWRKWcFyn8bt8D1EuaR6439Wob53gcmAI8kwy130/bvb2S0/xhnJmZWaa4B2VmZpnkAmVmZpnkAmVmZpnkAmVmZpnUrQVqxIgRQe7vF3zzrRxuneI88a0Mb23q1gK1YsWK7jyd2RbJeWKW4yE+MzPLJBcoMzPLJBcoMzPLJE8Wa2Ylb+3atTQ1NbFmzZpih1LWKisrqa6upqKiIq/9XaDMrOQ1NTXRp08fampqSOaRtW4WEaxcuZKmpiZqa2vzOsZDfGZW8tasWcNOO+3k4lREkthpp5061It1gbKyMqCqCkmp3AZUVRX7cqwDXJyKr6P/Bh7is7KyZNkymnavTqWt6rebUmnHzNrmHpSZlZ00e9L59qZ79OhBXV0d+++/P6eeeioffvghAOvWrWPnnXfm8ssvX2//YcOGMW9ebtHhmpoahgwZwpAhQxg0aBBXXnklH330zwV6Fy5cyPDhw9l7770ZOHAgV111Fc1LKd1999307duXuro66urq+NrXvgbA2LFjqa2tbdn+4x//OJWfbZry6kFJ+t/A18lNSbEAOIfcipj3ATVAI3BaRPy1IFGamaUozZ405Neb3mabbWhoaABg9OjR3HbbbVxyySU8/vjj7LPPPkybNo2rr756k8Ngs2bNYuedd2bVqlWMGzeOcePGMXnyZFavXs2JJ57IrbfeyjHHHMOHH37IySefzC233ML5558PwOmnn85NN920UZuTJk3ilFNO6fyFF1i7PShJ/YCLgPqI2B/oAZwBXAbMjIiBwMzkuZmZtePwww/n9ddfB2Dq1KlcfPHF9O/fn7lz57Z7bO/evbntttt46KGHeO+995gyZQqHHnooxxxzDAC9evXipptuYuLEiQW9hu6Q7xBfT2AbST3J9ZzeBkaSW7aZ5H5U6tGZmZWYdevW8bvf/Y4hQ4awevVqZs6cyfHHH8+ZZ57J1KlT82pju+22o7a2ltdee42FCxdy0EEHrff6nnvuyapVq3j//fcBuO+++1qG8u66666W/b797W+3bF+wYEF6F5mSdgtURPwFuBZYAiwF/h4RjwO7RsTSZJ+lwC5tHS9pnKR5kuYtX748vcjNSojzpPStXr2auro66uvr6d+/P+eeey7Tp0/niCOOoFevXpx88sk8+OCDfPLJJ3m11/wZU0Rscliwefvpp59OQ0MDDQ0NnHPOOS2vT5o0qWX7kCFDuniF6Wv3MyhJO5DrLdUCfwN+JemsfE8QEbcDtwPU19dvclp1s3LmPCl9rT+DajZ16lSefvppampqAFi5ciWzZs3iqKOO2mxbH3zwAY2Njey9994MHjyYJ598cr3X33jjDXr37k2fPn3SvIRul88Q31HAmxGxPCLWAg8AXwTekVQFkNy/W7gwzcxKy/vvv8+cOXNYsmQJjY2NNDY2cvPNN7c7zLdq1Sq++c1vMmrUKHbYYQdGjx7NnDlzeOKJJ4BcT+2iiy7i0ksv7Y7LKKh8vsW3BPiCpF7AauBIYB7wD2AMMDG5f7hQQZqZpan/brul+nds/XfbrcPHPPDAAwwfPpytt966ZdvIkSO59NJL1/sKebMjjjiCiODTTz/lpJNO4rvf/S6Q65k9/PDDXHjhhZx//vl88sknnH322VxwwQWdv6CMUPM45mZ3kv4DOB1YB7xA7ivnvYFpQH9yRezUiHhvc+3U19dH8/f6zYpBUqp/qNtO/nRq6gLnSfoWLVrEfvvtV+wwjE3+W7SZK3n9HVREfB/4/gabPyLXmzIzM0udZ5IwM7NMcoEyM7NMcoEyM7NMcoEyM7NMcoEyM7NMcoEys7Kze3X/VJfb2L26f7vnXLZsGWeccQZ77rkngwYN4thjj2Xx4sXtLpXR1t8z1dTUsGLFivW2bbisRl1dHa+88goAixcv5thjj2WvvfZiv/3247TTTltvfr7evXuzzz77tCzHMXv2bI4//viWth966CGGDh3Kvvvuy5AhQ3jooYdaXhs7diz9+vVr+dutFStWtMyM0VVesNDMys7Sv7zFwd97NLX2nv3BiM2+HhGcdNJJjBkzhnvvvReAhoYG3nnnHcaOHbvZpTI6oq1lNdasWcNxxx3HddddxwknnADklu7o27dvy9RLw4YN49prr6W+vh6A2bNntxz/4osvMmHCBGbMmEFtbS1vvvkmRx99NHvssQdDhw4Fcmtd3XnnnZx33nkdjnlz3IMyMyuwWbNmUVFRwTe+8Y2WbXV1dSxevLjgS2VMmTKFQw45pKU4QW5Wiv333z+v46+99lquuOIKamtrAaitreXyyy9n0qRJLfuMHz+e66+/nnXr1qUWN7hAmZkV3Msvv7zRkhhAXktldETrYbu6ujpWr169yXPnq60Y6+vrWbhwYcvz/v37c9hhh/GLX/yi0+dpi4f4zMyKJJ+lMjpiUyvndkVbMba17YorruDEE0/kuOOOS+3c7kGZmRXY4MGDef7559vcvuG8i2kvlbGpc3fk+A1jnD9/PoMGDVpv21577UVdXR3Tpk3r9Lk25AJlZlZgw4cP56OPPuKnP/1py7bnnnuOgQMHFnypjK9+9av84Q9/4Le//W3LtkcffTTvFXQnTJjAD3/4QxobGwFobGzk6quv5lvf+tZG+37nO9/h2muvTSVu8BCfmZWhqn6fa/ebdx1tb3Mk8eCDDzJ+/HgmTpxIZWUlNTU13HDDDe0ulXH33Xev97XuuXPnAjB06FC22irXxzjttNMYOnQo9913H3PmzGnZ95ZbbuGLX/wi06dPZ/z48YwfP56KigqGDh3KjTfemNe11dXV8aMf/YgTTjiBtWvXUlFRwTXXXENdXd1G+w4ePJgDDzyQ+fPn59V2e/JabiMtXkbAis3LbZQnL7eRHR1ZbsNDfGZmlkmZK1ADqqpS++vuAVVVxb4cMzPrpMx9BrVk2bJUh2DMzGDzX+m27tHRj5Qy14MyM0tbZWUlK1eu7PB/kJaeiGDlypVUVlbmfUzmelBmZmmrrq6mqamJ5cuXFzuUslZZWUl1df4jZC5QZlbyKioqWuaSsy2Hh/jMzCyTXKDMzCyTXKDMzCyTXKDMzCyTXKDMzCyT8ipQkraXdL+kVyUtknSIpB0lzZD0WnK/Q6GDNTOz8pFvD+pG4NGI2Bf4F2ARcBkwMyIGAjOT52ZmZqlot0BJ2g74EvAzgIj4OCL+BowEJie7TQZGFSZEMzMrR/n0oPYAlgN3SXpB0h2StgV2jYilAMn9Lm0dLGmcpHmS5vmvuM3a5jwx21g+BaoncCBwa0QcAPyDDgznRcTtEVEfEfV9+/btZJhmpc15YraxfApUE9AUEc8mz+8nV7DekVQFkNy/W5gQzcysHLVboCJiGfCWpH2STUcCrwCPAGOSbWOAhwsSoZmZlaV8J4u9ELhH0meAN4BzyBW3aZLOBZYApxYmRLP0qEdFauuEqUdFKu2YWdvyKlAR0QDUt/HSkalGY1Zg8claDv7eo6m09ewPRqTSjpm1zTNJmJlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJuVdoCT1kPSCpOnJ8x0lzZD0WnK/Q+HCNDOzctORHtTFwKJWzy8DZkbEQGBm8tzMzCwVeRUoSdXAccAdrTaPBCYnjycDo1KNzMzMylq+PagbgEuBT1tt2zUilgIk97u0daCkcZLmSZq3fPnyrsRqVrKcJ2Yba7dASToeeDcinu/MCSLi9oioj4j6vn37dqYJs5LnPDHbWM889jkUOFHSsUAlsJ2kXwLvSKqKiKWSqoB3CxmomZmVl3Z7UBFxeURUR0QNcAbw+4g4C3gEGJPsNgZ4uGBRmplZ2enK30FNBI6W9BpwdPLczMwsFfkM8bWIiNnA7OTxSuDI9EMyMzPzTBJmZpZRLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBFMKCqCkmp3AZUVRX7cszMCqJD60FZOpYsW0bT7tWptFX9dlMq7ZiZZY17UGZmlkkuUGZmlkkuUGZmlkkuUGZmlkkuUGZmlkkuUGZmlkkuUGZmlkkuUGZmlkkuUGZmlkntFihJn5M0S9IiSQslXZxs31HSDEmvJfc7FD5cMzMrF/n0oNYB34qI/YAvAOdLGgRcBsyMiIHAzOS5mZlZKtotUBGxNCLmJ48/ABYB/YCRwORkt8nAqALFaGZmZahDn0FJqgEOAJ4Fdo2IpZArYsAumzhmnKR5kuYtX768i+GalSbnidnG8i5QknoDvwbGR8T7+R4XEbdHRH1E1Pft27czMZqVPOeJ2cbyKlCSKsgVp3si4oFk8zuSqpLXq4B3CxOimZmVo3y+xSfgZ8CiiLiu1UuPAGOSx2OAh9MPz8zMylU+CxYeCpwNLJDUkGy7ApgITJN0LrAEOLUgEZqZWVlqt0BFxBxAm3j5yHTDMTOzYhtQVcWSZctSaav/brvx56VLO3Wsl3w3M7P1LFm2jKbdq1Npq/rtpk4f66mOLPMGVFUhKZVbqUjzZzKgqqrYl2PWJvegLPOy8m4uS/wzsXLgHpSZmWVSSfegtobUhnW68kGfdY16VPhdvlkZKukC9RF4GKQExCdrOfh7j6bS1rM/GJFKO2ZWeB7iMzOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTCrpqY7MzKzj0pz/Uj0qOn2sC5SZma0nK/NfeojPrMw1z/rvxQ8ta9yDMitznvXfsso9KDMzyyQXKCuI3av7pzZsZGblyUN8VhBL//JWJj5kNbMtV+YKVFa+3mhmxTWgqooly5al0lb/3Xbjz0uXptKWdZ/MFaisfL1xS9H8Daw0OIktS5YsW+Yvb5S5LhUoSSOAG4EewB0RMTGVqCxv/gaWmZWqTn9JQlIP4GbgK8Ag4ExJg9IKzMwsLVn9W68BVVWpxdWrR8+S+2JSV3pQ/wN4PSLeAJB0LzASeCWNwMzM0pLVkYa0hzGzeI1doYjo3IHSKcCIiPh68vxs4OCIuGCD/cYB45Kn+wB/aqfpnYEVnQpqy+FrLA3tXeOKiMjrg1DnSZt8jaUhn2tsM1e60oNqqx+4UbWLiNuB2/NuVJoXEfVdiCvzfI2lIc1rdJ5szNdYGrpyjV35Q90m4HOtnlcDb3ehPTMzsxZdKVDPAQMl1Ur6DHAG8Eg6YZmZWbnr9BBfRKyTdAHwGLmvmd8ZEQtTiCnvYY4tmK+xNBTzGv3zLQ2+xs3o9JckzMzMCsmTxZqZWSa5QJmZWSZlpkBJGiHpT5Jel3RZseNJm6TPSZolaZGkhZIuLnZMhSKph6QXJE0vdiyFIGl7SfdLejX59zykG89d0nkC5ZMrpZ4n0PVcycRnUMm0SYuBo8l9ff054MyIKJlZKSRVAVURMV9SH+B5YFQpXWMzSZcA9cB2EXF8seNJm6TJwFMRcUfyDdZeEfG3bjhvyecJlE+ulHqeQNdzJSs9qJZpkyLiY6B52qSSERFLI2J+8vgDYBHQr7hRpU9SNXAccEexYykESdsBXwJ+BhARH3dHcUqUfJ5AeeRKqecJpJMrWSlQ/YC3Wj1vosR+IVuTVAMcADxb5FAK4QbgUuDTIsdRKHsAy4G7kuGZOyRt203nLqs8gZLOlRso7TyBFHIlKwUqr2mTSoGk3sCvgfER8X6x40mTpOOBdyPi+WLHUkA9gQOBWyPiAOAfQHd9FlQ2eQKlmytlkieQQq5kpUCVxbRJkirIJdw9EfFAseMpgEOBEyU1kht+Gi7pl8UNKXVNQFNENL+jv59cEnbXuUs+T6Dkc6Uc8gRSyJWsFKiSnzZJuUVWfgYsiojrih1PIUTE5RFRHRE15P4Nfx8RZxU5rFRFxDLgLUn7JJuOpPuWmCn5PIHSz5VyyBNIJ1cyseR7AadNypJDgbOBBZIakm1XRMT/K15I1kkXAvckReIN4JzuOGmZ5Ak4V0pJl3IlE18zNzMz21BWhvjMzMzW4wJlZmaZ5AJlZmaZ5AJlZmaZ5AJlZmaZ5AKVIZL+j6QJKba3r6SGZJqRPdNqt1X7jZJ2Trtds81xnpQPF6jSNgp4OCIOiIj/LnYwZhk1CudJJrlAFZmk7yTr+zwB7JNs+zdJz0l6UdKvJfWS1EfSm8kUMEjaLnlnViGpTtJcSS9JelDSDpKOBcYDX0/W1rlF0onJsQ9KujN5fK6k/5s8PkvSH5N3kz9JlndA0jGSnpE0X9KvkjnSWl/DNpIelfRv3fVzs/LiPClPLlBFJOkgclOdHAD8K/D55KUHIuLzEfEv5JYaODdZdmA2uSn6SY77dUSsBX4O/HtEDAUWAN9P/ur+NuD6iDgCeBI4PDm2HzAoeXwY8JSk/YDTgUMjog74BBidDE1cCRwVEQcC84BLWl1Gb+A3wJSI+Gk6Pxmzf3KelC8XqOI6HHgwIj5MZmtunldtf0lPSVoAjAYGJ9vv4J9ThZxDbhr7zwLbR8R/Jdsnk1uDZUNPAYdLGkRuPqx3lFsY7hDgD+TmyToIeC6ZXuZIctPlf4Fckj6dbB8DDGjV7sPAXRHx887/GMw2y3lSpjIxF1+Za2uuqbvJrSD6oqSxwDCAiHhaUo2kLwM9IuLlJPHaP0nEXyTtAIwg9y5xR+A0YFVEfCBJwOSIuLz1cZJOAGZExJmbaPpp4CuSpoTnzbLCcZ6UIfegiutJ4KRkbLoPcEKyvQ+wNBlHH73BMT8HpgJ3AUTE34G/Smoeljgb+C/a9gy58fYnyb1TnJDcA8wETpG0C4CkHSUNAOYCh0raK9neS9Lerdr8HrASuKWD126WL+dJmXKBKqJkWev7gAZya980J8F3ya0gOgN4dYPD7gF2IJd8zcYAkyS9BNQBP9jEKZ8CekbE68B8cu8On0pieYXcGPrjSTszgKqIWA6MBaYm2+cC+27Q7nigUtI1+V25Wf6cJ+XLs5lvYSSdAoyMiLOLHYtZVjlPSoM/g9qCSPpP4CvAscWOxSyrnCelwz0oMzPLJH8GZWZmmeQCZWZmmeQCZWZmmeQCZWZmmeQCZWZmmfT/AcKH/fljK6RSAAAAAElFTkSuQmCC\n",
"text/plain": "<Figure size 432x216 with 2 Axes>"
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": "df['dayofweek'] = df['effective_date'].dt.dayofweek\nbins=np.linspace(df.dayofweek.min(), df.dayofweek.max(), 10)\ng = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\ng.map(plt.hist, 'dayofweek', bins=bins, ec=\"k\")\ng.axes[-1].legend()\nplt.show()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "We see that people who get the loan at the end of the week dont pay it off, \nso lets use Feature binarization to set a threshold values less then day 4 "
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Unnamed: 0</th>\n <th>Unnamed: 0.1</th>\n <th>loan_status</th>\n <th>Principal</th>\n <th>terms</th>\n <th>effective_date</th>\n <th>due_date</th>\n <th>age</th>\n <th>education</th>\n <th>Gender</th>\n <th>dayofweek</th>\n <th>weekend</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>0</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-08</td>\n <td>2016-10-07</td>\n <td>45</td>\n <td>High School or Below</td>\n <td>male</td>\n <td>3</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>2</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-08</td>\n <td>2016-10-07</td>\n <td>33</td>\n <td>Bechalor</td>\n <td>female</td>\n <td>3</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>3</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>15</td>\n <td>2016-09-08</td>\n <td>2016-09-22</td>\n <td>27</td>\n <td>college</td>\n <td>male</td>\n <td>3</td>\n <td>0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>4</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-09</td>\n <td>2016-10-08</td>\n <td>28</td>\n <td>college</td>\n <td>female</td>\n <td>4</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>6</td>\n <td>6</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-09</td>\n <td>2016-10-08</td>\n <td>29</td>\n <td>college</td>\n <td>male</td>\n <td>4</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n0 0 0 PAIDOFF 1000 30 2016-09-08 \n1 2 2 PAIDOFF 1000 30 2016-09-08 \n2 3 3 PAIDOFF 1000 15 2016-09-08 \n3 4 4 PAIDOFF 1000 30 2016-09-09 \n4 6 6 PAIDOFF 1000 30 2016-09-09 \n\n due_date age education Gender dayofweek weekend \n0 2016-10-07 45 High School or Below male 3 0 \n1 2016-10-07 33 Bechalor female 3 0 \n2 2016-09-22 27 college male 3 0 \n3 2016-10-08 28 college female 4 1 \n4 2016-10-08 29 college male 4 1 "
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "df['weekend']= df['dayofweek'].apply(lambda x: 1 if (x>3) else 0)\ndf.head()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Convert Categorical features to numerical values\nLooking at gender:"
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "Gender loan_status\nfemale PAIDOFF 0.865385\n COLLECTION 0.134615\nmale PAIDOFF 0.731293\n COLLECTION 0.268707\nName: loan_status, dtype: float64"
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "df.groupby(['Gender'])['loan_status'].value_counts(normalize=True)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## 86 % of female pay their loans while only 73 % of males pay theirs\nLet's convert male = 0 and female = 1"
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Unnamed: 0</th>\n <th>Unnamed: 0.1</th>\n <th>loan_status</th>\n <th>Principal</th>\n <th>terms</th>\n <th>effective_date</th>\n <th>due_date</th>\n <th>age</th>\n <th>education</th>\n <th>Gender</th>\n <th>dayofweek</th>\n <th>weekend</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>0</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-08</td>\n <td>2016-10-07</td>\n <td>45</td>\n <td>High School or Below</td>\n <td>0</td>\n <td>3</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>2</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-08</td>\n <td>2016-10-07</td>\n <td>33</td>\n <td>Bechalor</td>\n <td>1</td>\n <td>3</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>3</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>15</td>\n <td>2016-09-08</td>\n <td>2016-09-22</td>\n <td>27</td>\n <td>college</td>\n <td>0</td>\n <td>3</td>\n <td>0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>4</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-09</td>\n <td>2016-10-08</td>\n <td>28</td>\n <td>college</td>\n <td>1</td>\n <td>4</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>6</td>\n <td>6</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>2016-09-09</td>\n <td>2016-10-08</td>\n <td>29</td>\n <td>college</td>\n <td>0</td>\n <td>4</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n0 0 0 PAIDOFF 1000 30 2016-09-08 \n1 2 2 PAIDOFF 1000 30 2016-09-08 \n2 3 3 PAIDOFF 1000 15 2016-09-08 \n3 4 4 PAIDOFF 1000 30 2016-09-09 \n4 6 6 PAIDOFF 1000 30 2016-09-09 \n\n due_date age education Gender dayofweek weekend \n0 2016-10-07 45 High School or Below 0 3 0 \n1 2016-10-07 33 Bechalor 1 3 0 \n2 2016-09-22 27 college 0 3 0 \n3 2016-10-08 28 college 1 4 1 \n4 2016-10-08 29 college 0 4 1 "
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "df['Gender'].replace(to_replace=['male','female'], value=[0,1],inplace=True)\ndf.head()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "### For Education:"
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "education loan_status\nBechalor PAIDOFF 0.750000\n COLLECTION 0.250000\nHigh School or Below PAIDOFF 0.741722\n COLLECTION 0.258278\nMaster or Above COLLECTION 0.500000\n PAIDOFF 0.500000\ncollege PAIDOFF 0.765101\n COLLECTION 0.234899\nName: loan_status, dtype: float64"
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "df.groupby(['education'])['loan_status'].value_counts(normalize=True)"
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Principal</th>\n <th>terms</th>\n <th>age</th>\n <th>Gender</th>\n <th>education</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1000</td>\n <td>30</td>\n <td>45</td>\n <td>0</td>\n <td>High School or Below</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1000</td>\n <td>30</td>\n <td>33</td>\n <td>1</td>\n <td>Bechalor</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1000</td>\n <td>15</td>\n <td>27</td>\n <td>0</td>\n <td>college</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1000</td>\n <td>30</td>\n <td>28</td>\n <td>1</td>\n <td>college</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1000</td>\n <td>30</td>\n <td>29</td>\n <td>0</td>\n <td>college</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Principal terms age Gender education\n0 1000 30 45 0 High School or Below\n1 1000 30 33 1 Bechalor\n2 1000 15 27 0 college\n3 1000 30 28 1 college\n4 1000 30 29 0 college"
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "df[['Principal','terms','age','Gender','education']].head()"
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Principal</th>\n <th>terms</th>\n <th>age</th>\n <th>Gender</th>\n <th>weekend</th>\n <th>Bechalor</th>\n <th>High School or Below</th>\n <th>college</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1000</td>\n <td>30</td>\n <td>45</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1000</td>\n <td>30</td>\n <td>33</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1000</td>\n <td>15</td>\n <td>27</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1000</td>\n <td>30</td>\n <td>28</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1000</td>\n <td>30</td>\n <td>29</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Principal terms age Gender weekend Bechalor High School or Below \\\n0 1000 30 45 0 0 0 1 \n1 1000 30 33 1 0 1 0 \n2 1000 15 27 0 0 0 0 \n3 1000 30 28 1 1 0 0 \n4 1000 30 29 0 1 0 0 \n\n college \n0 0 \n1 0 \n2 1 \n3 1 \n4 1 "
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "Feature = df[['Principal','terms','age','Gender','weekend']]\nFeature = pd.concat([Feature,pd.get_dummies(df['education'])], axis=1)\nFeature.drop(['Master or Above'], axis = 1,inplace=True)\nFeature.head()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "### Feature selection:\nLet's define X"
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Principal</th>\n <th>terms</th>\n <th>age</th>\n <th>Gender</th>\n <th>weekend</th>\n <th>Bechalor</th>\n <th>High School or Below</th>\n <th>college</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1000</td>\n <td>30</td>\n <td>45</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1000</td>\n <td>30</td>\n <td>33</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1000</td>\n <td>15</td>\n <td>27</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1000</td>\n <td>30</td>\n <td>28</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1000</td>\n <td>30</td>\n <td>29</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Principal terms age Gender weekend Bechalor High School or Below \\\n0 1000 30 45 0 0 0 1 \n1 1000 30 33 1 0 1 0 \n2 1000 15 27 0 0 0 0 \n3 1000 30 28 1 1 0 0 \n4 1000 30 29 0 1 0 0 \n\n college \n0 0 \n1 0 \n2 1 \n3 1 \n4 1 "
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "X = Feature\nX[0:5]"
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'],\n dtype=object)"
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "y = df['loan_status'].values\ny[0:5]"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "### Normalization:"
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "array([[ 0.51578458, 0.92071769, 2.33152555, -0.42056004, -1.20577805,\n -0.38170062, 1.13639374, -0.86968108],\n [ 0.51578458, 0.92071769, 0.34170148, 2.37778177, -1.20577805,\n 2.61985426, -0.87997669, -0.86968108],\n [ 0.51578458, -0.95911111, -0.65321055, -0.42056004, -1.20577805,\n -0.38170062, -0.87997669, 1.14984679],\n [ 0.51578458, 0.92071769, -0.48739188, 2.37778177, 0.82934003,\n -0.38170062, -0.87997669, 1.14984679],\n [ 0.51578458, 0.92071769, -0.3215732 , -0.42056004, 0.82934003,\n -0.38170062, -0.87997669, 1.14984679]])"
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "X = preprocessing.StandardScaler().fit(X).transform(X)\nX[0:5]"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Now the Classification Begins"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## K Nearest Neighbor Algorithm:"
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "Train set: (276, 8) (276,)\nTest set: (70, 8) (70,)\n"
}
],
"source": "# We split the X into train and test to find the best k\nfrom sklearn.model_selection import train_test_split\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)\nprint ('Train set:', X_train.shape, y_train.shape)\nprint ('Test set:', X_test.shape, y_test.shape)"
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "KNeighborsClassifier(n_neighbors=3)"
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "# Modeling\nfrom sklearn.neighbors import KNeighborsClassifier\nk = 3\n#Train Model and Predict \nkNN_model = KNeighborsClassifier(n_neighbors=k).fit(X_train,y_train)\nkNN_model"
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'],\n dtype=object)"
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "# for sanity check...\nyhat = kNN_model.predict(X_test)\nyhat[0:5]"
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "array([0.67142857, 0.65714286, 0.71428571, 0.68571429, 0.75714286,\n 0.71428571, 0.78571429, 0.75714286, 0.75714286, 0.67142857,\n 0.7 , 0.72857143, 0.7 , 0.7 ])"
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "# Best k\nKs=15\nmean_acc=np.zeros((Ks-1))\nstd_acc=np.zeros((Ks-1))\nConfustionMx=[];\nfor n in range(1,Ks):\n \n #Train Model and Predict \n kNN_model = KNeighborsClassifier(n_neighbors=n).fit(X_train,y_train)\n yhat = kNN_model.predict(X_test)\n \n \n mean_acc[n-1]=np.mean(yhat==y_test);\n \n std_acc[n-1]=np.std(yhat==y_test)/np.sqrt(yhat.shape[0])\nmean_acc"
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "KNeighborsClassifier(n_neighbors=7)"
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "# Building the model again, using k=7\nfrom sklearn.neighbors import KNeighborsClassifier\nk = 7\n#Train Model and Predict \nkNN_model = KNeighborsClassifier(n_neighbors=k).fit(X_train,y_train)\nkNN_model"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Decision Tree Algorithm:"
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "DecisionTreeClassifier(criterion='entropy', max_depth=4)"
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "from sklearn.tree import DecisionTreeClassifier\nDT_model = DecisionTreeClassifier(criterion=\"entropy\", max_depth = 4)\nDT_model.fit(X_train,y_train)\nDT_model"
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "array(['COLLECTION', 'COLLECTION', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'COLLECTION',\n 'PAIDOFF', 'COLLECTION', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'COLLECTION', 'PAIDOFF', 'COLLECTION', 'PAIDOFF',\n 'PAIDOFF', 'COLLECTION', 'COLLECTION', 'COLLECTION', 'PAIDOFF',\n 'COLLECTION', 'COLLECTION', 'PAIDOFF', 'COLLECTION', 'PAIDOFF',\n 'COLLECTION', 'COLLECTION', 'COLLECTION', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'COLLECTION', 'PAIDOFF', 'COLLECTION', 'PAIDOFF',\n 'COLLECTION', 'PAIDOFF', 'PAIDOFF', 'COLLECTION', 'PAIDOFF',\n 'COLLECTION', 'COLLECTION', 'COLLECTION', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'COLLECTION', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'COLLECTION', 'PAIDOFF', 'COLLECTION',\n 'PAIDOFF', 'COLLECTION', 'PAIDOFF', 'PAIDOFF'], dtype=object)"
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "yhat = DT_model.predict(X_test)\nyhat"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Support Vector Machine Algorithm:"
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "SVC()"
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "from sklearn import svm\nSVM_model = svm.SVC()\nSVM_model.fit(X_train, y_train)"
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "array(['COLLECTION', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'COLLECTION', 'COLLECTION', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'COLLECTION', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'COLLECTION', 'COLLECTION', 'PAIDOFF', 'COLLECTION',\n 'COLLECTION', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'COLLECTION', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'COLLECTION', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'COLLECTION',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'],\n dtype=object)"
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "yhat = SVM_model.predict(X_test)\nyhat"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Logistic Regression Algorithm:"
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "LogisticRegression(C=0.01)"
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "from sklearn.linear_model import LogisticRegression\nLR_model = LogisticRegression(C=0.01).fit(X_train,y_train)\nLR_model"
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',\n 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'], dtype=object)"
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "yhat = LR_model.predict(X_test)\nyhat"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Model Evaluation using test set"
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": "from sklearn.metrics import jaccard_score\nfrom sklearn.metrics import f1_score\nfrom sklearn.metrics import log_loss"
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "--2022-02-17 03:59:28-- https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_test.csv\nResolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.196\nConnecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.196|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 3642 (3.6K) [text/csv]\nSaving to: \u2018loan_test.csv\u2019\n\nloan_test.csv 100%[===================>] 3.56K --.-KB/s in 0s \n\n2022-02-17 03:59:28 (74.7 MB/s) - \u2018loan_test.csv\u2019 saved [3642/3642]\n\n"
}
],
"source": "!wget -O loan_test.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_test.csv"
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Unnamed: 0</th>\n <th>Unnamed: 0.1</th>\n <th>loan_status</th>\n <th>Principal</th>\n <th>terms</th>\n <th>effective_date</th>\n <th>due_date</th>\n <th>age</th>\n <th>education</th>\n <th>Gender</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>1</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/8/2016</td>\n <td>10/7/2016</td>\n <td>50</td>\n <td>Bechalor</td>\n <td>female</td>\n </tr>\n <tr>\n <th>1</th>\n <td>5</td>\n <td>5</td>\n <td>PAIDOFF</td>\n <td>300</td>\n <td>7</td>\n <td>9/9/2016</td>\n <td>9/15/2016</td>\n <td>35</td>\n <td>Master or Above</td>\n <td>male</td>\n </tr>\n <tr>\n <th>2</th>\n <td>21</td>\n <td>21</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/10/2016</td>\n <td>10/9/2016</td>\n <td>43</td>\n <td>High School or Below</td>\n <td>female</td>\n </tr>\n <tr>\n <th>3</th>\n <td>24</td>\n <td>24</td>\n <td>PAIDOFF</td>\n <td>1000</td>\n <td>30</td>\n <td>9/10/2016</td>\n <td>10/9/2016</td>\n <td>26</td>\n <td>college</td>\n <td>male</td>\n </tr>\n <tr>\n <th>4</th>\n <td>35</td>\n <td>35</td>\n <td>PAIDOFF</td>\n <td>800</td>\n <td>15</td>\n <td>9/11/2016</td>\n <td>9/25/2016</td>\n <td>29</td>\n <td>Bechalor</td>\n <td>male</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Unnamed: 0 Unnamed: 0.1 loan_status Principal terms effective_date \\\n0 1 1 PAIDOFF 1000 30 9/8/2016 \n1 5 5 PAIDOFF 300 7 9/9/2016 \n2 21 21 PAIDOFF 1000 30 9/10/2016 \n3 24 24 PAIDOFF 1000 30 9/10/2016 \n4 35 35 PAIDOFF 800 15 9/11/2016 \n\n due_date age education Gender \n0 10/7/2016 50 Bechalor female \n1 9/15/2016 35 Master or Above male \n2 10/9/2016 43 High School or Below female \n3 10/9/2016 26 college male \n4 9/25/2016 29 Bechalor male "
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "test_df = pd.read_csv('loan_test.csv')\ntest_df.head()"
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "array([[ 0.49362588, 0.92844966, 3.05981865, 1.97714211, -1.30384048,\n 2.39791576, -0.79772404, -0.86135677],\n [-3.56269116, -1.70427745, 0.53336288, -0.50578054, 0.76696499,\n -0.41702883, -0.79772404, -0.86135677],\n [ 0.49362588, 0.92844966, 1.88080596, 1.97714211, 0.76696499,\n -0.41702883, 1.25356634, -0.86135677],\n [ 0.49362588, 0.92844966, -0.98251057, -0.50578054, 0.76696499,\n -0.41702883, -0.79772404, 1.16095912],\n [-0.66532184, -0.78854628, -0.47721942, -0.50578054, 0.76696499,\n 2.39791576, -0.79772404, -0.86135677]])"
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "## Preprocessing\ntest_df['due_date'] = pd.to_datetime(test_df['due_date'])\ntest_df['effective_date'] = pd.to_datetime(test_df['effective_date'])\ntest_df['dayofweek'] = test_df['effective_date'].dt.dayofweek\ntest_df['weekend'] = test_df['dayofweek'].apply(lambda x: 1 if (x>3) else 0)\ntest_df['Gender'].replace(to_replace=['male','female'], value=[0,1],inplace=True)\ntest_Feature = test_df[['Principal','terms','age','Gender','weekend']]\ntest_Feature = pd.concat([test_Feature,pd.get_dummies(test_df['education'])], axis=1)\ntest_Feature.drop(['Master or Above'], axis = 1,inplace=True)\ntest_X = preprocessing.StandardScaler().fit(test_Feature).transform(test_Feature)\ntest_X[0:5]"
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'],\n dtype=object)"
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": "test_y = test_df['loan_status'].values\ntest_y[0:5]"
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "KNN Jaccard index: 0.65\nKNN F1-score: 0.63\n"
}
],
"source": "knn_yhat = kNN_model.predict(test_X)\nprint(\"KNN Jaccard index: %.2f\" % jaccard_score(test_y, knn_yhat, pos_label = \"PAIDOFF\"))\nprint(\"KNN F1-score: %.2f\" % f1_score(test_y, knn_yhat, average='weighted') )"
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "DT Jaccard index: 0.66\nDT F1-score: 0.74\n"
}
],
"source": "DT_yhat = DT_model.predict(test_X)\nprint(\"DT Jaccard index: %.2f\" % jaccard_score(test_y, DT_yhat, pos_label = \"PAIDOFF\"))\nprint(\"DT F1-score: %.2f\" % f1_score(test_y, DT_yhat, average='weighted') )"
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "SVM Jaccard index: 0.78\nSVM F1-score: 0.76\n"
}
],
"source": "SVM_yhat = SVM_model.predict(test_X)\nprint(\"SVM Jaccard index: %.2f\" % jaccard_score(test_y, SVM_yhat, pos_label = \"PAIDOFF\"))\nprint(\"SVM F1-score: %.2f\" % f1_score(test_y, SVM_yhat, average='weighted') )"
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "LR Jaccard index: 0.74\nLR F1-score: 0.63\nLR LogLoss: 0.52\n"
}
],
"source": "LR_yhat = LR_model.predict(test_X)\nLR_yhat_prob = LR_model.predict_proba(test_X)\nprint(\"LR Jaccard index: %.2f\" % jaccard_score(test_y, LR_yhat, pos_label = \"PAIDOFF\"))\nprint(\"LR F1-score: %.2f\" % f1_score(test_y, LR_yhat, average='weighted') )\nprint(\"LR LogLoss: %.2f\" % log_loss(test_y, LR_yhat_prob))"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Report Summary:"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "| Algorithm | Jaccard | F1-score | LogLoss |\n|--------------------|---------|----------|---------|\n| KNN | 0.65 | 0.63 | NA |\n| Decision Tree | 0.66 | 0.74 | NA |\n| SVM | 0.78 | 0.76 | NA |\n| LogisticRegression | 0.74 | 0.63 | 0.52 |"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": ""
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment