Created
November 6, 2016 01:22
Identifying Fraudulent Activities
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": false | |
}, | |
"source": [ | |
"# Identifying Fraudulent Activities\n", | |
"\n", | |
"Company XYZ is an e-commerce site that sells hand-made clothes.\n", | |
"The task is to build a model that predicts whether a user is likely to use the site for illegal activity. The only information provided is about the user's first transaction on the site." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"import pandas as pd\n", | |
"from pandas import DataFrame, Series\n", | |
"import matplotlib\n", | |
"import matplotlib.pyplot as plt \n", | |
"from sklearn.tree import DecisionTreeClassifier \n", | |
"from sklearn.model_selection import train_test_split \n", | |
"from sklearn.pipeline import Pipeline\n", | |
"from sklearn.model_selection import GridSearchCV\n", | |
"from sklearn.metrics import classification_report, roc_curve, auc\n", | |
"from sklearn.ensemble import RandomForestClassifier\n", | |
"from sklearn.preprocessing import LabelEncoder\n", | |
"from sklearn.ensemble.partial_dependence import plot_partial_dependence\n", | |
"from sklearn.ensemble.partial_dependence import partial_dependence" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"%matplotlib inline" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"##### Data preparation. This part takes a couple of minutes to run and can be skipped without losing the flow of the problem." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>user_id</th>\n", | |
" <th>signup_time</th>\n", | |
" <th>purchase_time</th>\n", | |
" <th>purchase_value</th>\n", | |
" <th>device_id</th>\n", | |
" <th>source</th>\n", | |
" <th>browser</th>\n", | |
" <th>sex</th>\n", | |
" <th>age</th>\n", | |
" <th>ip_address</th>\n", | |
" <th>class</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>22058</td>\n", | |
" <td>2015-02-24 22:55:49</td>\n", | |
" <td>2015-04-18 02:47:11</td>\n", | |
" <td>34</td>\n", | |
" <td>QVPSPJUOCKZAR</td>\n", | |
" <td>SEO</td>\n", | |
" <td>Chrome</td>\n", | |
" <td>M</td>\n", | |
" <td>39</td>\n", | |
" <td>7.327584e+08</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>333320</td>\n", | |
" <td>2015-06-07 20:39:50</td>\n", | |
" <td>2015-06-08 01:38:54</td>\n", | |
" <td>16</td>\n", | |
" <td>EOGFQPIZPYXFZ</td>\n", | |
" <td>Ads</td>\n", | |
" <td>Chrome</td>\n", | |
" <td>F</td>\n", | |
" <td>53</td>\n", | |
" <td>3.503114e+08</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>1359</td>\n", | |
" <td>2015-01-01 18:52:44</td>\n", | |
" <td>2015-01-01 18:52:45</td>\n", | |
" <td>15</td>\n", | |
" <td>YSSKYOSJHPPLJ</td>\n", | |
" <td>SEO</td>\n", | |
" <td>Opera</td>\n", | |
" <td>M</td>\n", | |
" <td>53</td>\n", | |
" <td>2.621474e+09</td>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>150084</td>\n", | |
" <td>2015-04-28 21:13:25</td>\n", | |
" <td>2015-05-04 13:54:50</td>\n", | |
" <td>44</td>\n", | |
" <td>ATGTXKYKUDUQN</td>\n", | |
" <td>SEO</td>\n", | |
" <td>Safari</td>\n", | |
" <td>M</td>\n", | |
" <td>41</td>\n", | |
" <td>3.840542e+09</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>221365</td>\n", | |
" <td>2015-07-21 07:09:52</td>\n", | |
" <td>2015-09-09 18:40:53</td>\n", | |
" <td>39</td>\n", | |
" <td>NAUITBZFJKHWW</td>\n", | |
" <td>Ads</td>\n", | |
" <td>Safari</td>\n", | |
" <td>M</td>\n", | |
" <td>45</td>\n", | |
" <td>4.155831e+08</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" user_id signup_time purchase_time purchase_value \\\n", | |
"0 22058 2015-02-24 22:55:49 2015-04-18 02:47:11 34 \n", | |
"1 333320 2015-06-07 20:39:50 2015-06-08 01:38:54 16 \n", | |
"2 1359 2015-01-01 18:52:44 2015-01-01 18:52:45 15 \n", | |
"3 150084 2015-04-28 21:13:25 2015-05-04 13:54:50 44 \n", | |
"4 221365 2015-07-21 07:09:52 2015-09-09 18:40:53 39 \n", | |
"\n", | |
" device_id source browser sex age ip_address class \n", | |
"0 QVPSPJUOCKZAR SEO Chrome M 39 7.327584e+08 0 \n", | |
"1 EOGFQPIZPYXFZ Ads Chrome F 53 3.503114e+08 0 \n", | |
"2 YSSKYOSJHPPLJ SEO Opera M 53 2.621474e+09 1 \n", | |
"3 ATGTXKYKUDUQN SEO Safari M 41 3.840542e+09 0 \n", | |
"4 NAUITBZFJKHWW Ads Safari M 45 4.155831e+08 0 " | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"#Reading in the data\n", | |
"\n", | |
"fraud_data = pd.read_csv('fraud_data.csv')\n", | |
"ip_address = pd.read_csv('IpAddress_to_Country.csv')\n", | |
"fraud_data.head(5)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>lower_bound_ip_address</th>\n", | |
" <th>upper_bound_ip_address</th>\n", | |
" <th>country</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>16777216.0</td>\n", | |
" <td>16777471</td>\n", | |
" <td>Australia</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>16777472.0</td>\n", | |
" <td>16777727</td>\n", | |
" <td>China</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>16777728.0</td>\n", | |
" <td>16778239</td>\n", | |
" <td>China</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>16778240.0</td>\n", | |
" <td>16779263</td>\n", | |
" <td>Australia</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>16779264.0</td>\n", | |
" <td>16781311</td>\n", | |
" <td>China</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" lower_bound_ip_address upper_bound_ip_address country\n", | |
"0 16777216.0 16777471 Australia\n", | |
"1 16777472.0 16777727 China\n", | |
"2 16777728.0 16778239 China\n", | |
"3 16778240.0 16779263 Australia\n", | |
"4 16779264.0 16781311 China" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ip_address.head(5)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"False" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"#Comparing both the tables\n", | |
"\n", | |
"len(fraud_data) == len(ip_address)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(151112, 11)" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"fraud_data.shape" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(138846, 3)" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ip_address.shape" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"country = len(fraud_data) * [0]\n", | |
"\n", | |
"for ind, row in fraud_data.iterrows():\n", | |
"    temp = ip_address[(ip_address['lower_bound_ip_address'] < row['ip_address']) & \n", | |
"                      (ip_address['upper_bound_ip_address'] > row['ip_address'])]['country']\n", | |
"    \n", | |
"    if len(temp) == 1:\n", | |
"        country[ind] = temp.values[0]\n", | |
"\n", | |
"fraud_data['country'] = country" | |
] | |
}, | |
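{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"##### Optional: a faster country lookup (sketch)\n", | |
"\n", | |
"The row-by-row loop above is what makes the data preparation slow. The cell below is a minimal sketch of a vectorized alternative, not the method used for the results in this notebook. It assumes the ranges in `ip_address` do not overlap, sorts the lower bounds once, and uses `np.searchsorted` to find the candidate range for every IP. The helper name `lookup_country` is hypothetical, and unlike the loop above it treats both range bounds as inclusive." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"import pandas as pd\n", | |
"\n", | |
"def lookup_country(ips, ranges):\n", | |
"    #Hypothetical helper: map numeric IPs to countries with a binary search.\n", | |
"    #Assumes the [lower, upper] ranges in `ranges` do not overlap.\n", | |
"    ranges = ranges.sort_values('lower_bound_ip_address')\n", | |
"    lower = np.asarray(ranges['lower_bound_ip_address'], dtype=float)\n", | |
"    upper = np.asarray(ranges['upper_bound_ip_address'], dtype=float)\n", | |
"    names = np.asarray(ranges['country'], dtype=object)\n", | |
"    ips = np.asarray(ips, dtype=float)\n", | |
"    #Index of the last range whose lower bound is <= ip (-1 if there is none)\n", | |
"    idx = np.searchsorted(lower, ips, side='right') - 1\n", | |
"    safe = np.clip(idx, 0, len(lower) - 1)\n", | |
"    found = (idx >= 0) & (ips <= upper[safe])\n", | |
"    #Keep 0 as the placeholder for unmatched IPs, as in the loop above\n", | |
"    return np.where(found, names[safe], 0)\n", | |
"\n", | |
"#Possible usage, left commented out so the column built above is not overwritten:\n", | |
"#fraud_data['country'] = lookup_country(fraud_data['ip_address'], ip_address)" | |
] | |
}, | |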
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"fraud_data.to_csv('full_data.csv')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"##### Beginning of the problem" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": false, | |
"scrolled": true | |
}, | |
"outputs": [], | |
"source": [ | |
"data = pd.read_csv('full_data.csv')\n", | |
"data = data.drop('Unnamed: 0', axis = 1)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"user_id int64\n", | |
"signup_time object\n", | |
"purchase_time object\n", | |
"purchase_value int64\n", | |
"device_id object\n", | |
"source object\n", | |
"browser object\n", | |
"sex object\n", | |
"age int64\n", | |
"ip_address float64\n", | |
"class int64\n", | |
"country object\n", | |
"dtype: object" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"data.dtypes" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>user_id</th>\n", | |
" <th>purchase_value</th>\n", | |
" <th>age</th>\n", | |
" <th>ip_address</th>\n", | |
" <th>class</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>count</th>\n", | |
" <td>151112.000000</td>\n", | |
" <td>151112.000000</td>\n", | |
" <td>151112.000000</td>\n", | |
" <td>1.511120e+05</td>\n", | |
" <td>151112.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>mean</th>\n", | |
" <td>200171.040970</td>\n", | |
" <td>36.935372</td>\n", | |
" <td>33.140704</td>\n", | |
" <td>2.152145e+09</td>\n", | |
" <td>0.093646</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>std</th>\n", | |
" <td>115369.285024</td>\n", | |
" <td>18.322762</td>\n", | |
" <td>8.617733</td>\n", | |
" <td>1.248497e+09</td>\n", | |
" <td>0.291336</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>min</th>\n", | |
" <td>2.000000</td>\n", | |
" <td>9.000000</td>\n", | |
" <td>18.000000</td>\n", | |
" <td>5.209350e+04</td>\n", | |
" <td>0.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>25%</th>\n", | |
" <td>100642.500000</td>\n", | |
" <td>22.000000</td>\n", | |
" <td>27.000000</td>\n", | |
" <td>1.085934e+09</td>\n", | |
" <td>0.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>50%</th>\n", | |
" <td>199958.000000</td>\n", | |
" <td>35.000000</td>\n", | |
" <td>33.000000</td>\n", | |
" <td>2.154770e+09</td>\n", | |
" <td>0.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>75%</th>\n", | |
" <td>300054.000000</td>\n", | |
" <td>49.000000</td>\n", | |
" <td>39.000000</td>\n", | |
" <td>3.243258e+09</td>\n", | |
" <td>0.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>max</th>\n", | |
" <td>400000.000000</td>\n", | |
" <td>154.000000</td>\n", | |
" <td>76.000000</td>\n", | |
" <td>4.294850e+09</td>\n", | |
" <td>1.000000</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" user_id purchase_value age ip_address \\\n", | |
"count 151112.000000 151112.000000 151112.000000 1.511120e+05 \n", | |
"mean 200171.040970 36.935372 33.140704 2.152145e+09 \n", | |
"std 115369.285024 18.322762 8.617733 1.248497e+09 \n", | |
"min 2.000000 9.000000 18.000000 5.209350e+04 \n", | |
"25% 100642.500000 22.000000 27.000000 1.085934e+09 \n", | |
"50% 199958.000000 35.000000 33.000000 2.154770e+09 \n", | |
"75% 300054.000000 49.000000 39.000000 3.243258e+09 \n", | |
"max 400000.000000 154.000000 76.000000 4.294850e+09 \n", | |
"\n", | |
" class \n", | |
"count 151112.000000 \n", | |
"mean 0.093646 \n", | |
"std 0.291336 \n", | |
"min 0.000000 \n", | |
"25% 0.000000 \n", | |
"50% 0.000000 \n", | |
"75% 0.000000 \n", | |
"max 1.000000 " | |
] | |
}, | |
"execution_count": 12, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"data.describe()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Quick Insights\n", | |
"\n", | |
"From the table above, the average purchase value is around 37 with a median of 35. Because the mean and median are close, the distribution of purchase values is not heavily skewed. \n", | |
"\n", | |
"The minimum age entered by users is 18 and the maximum is 76, with a mean and median of 33. This indicates that the site has many young users.\n", | |
"\n", | |
"About 9% of the transactions are fraudulent. This is on the high side and needs to be looked into. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"#Converting signup time and purchase time to datetime objects\n", | |
"\n", | |
"data['signup_time'] = pd.to_datetime(data['signup_time'])\n", | |
"data['purchase_time'] = pd.to_datetime(data['purchase_time'])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"count 151112\n", | |
"unique 3\n", | |
"top SEO\n", | |
"freq 60615\n", | |
"Name: source, dtype: object" | |
] | |
}, | |
"execution_count": 14, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"data['source'].describe()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"count 151112\n", | |
"unique 182\n", | |
"top United States\n", | |
"freq 58049\n", | |
"Name: country, dtype: object" | |
] | |
}, | |
"execution_count": 15, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"data['country'].describe()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": true | |
}, | |
"source": [ | |
"#### Let's perform feature engineering by creating more powerful variables\n", | |
"\n", | |
"1. Difference between signup time and purchase time\n", | |
"\n", | |
"2. Different user IDs using the same device could indicate a fake transaction\n", | |
"\n", | |
"3. Different user IDs from the same IP address could indicate a fake transaction" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"#Difference between signup time and purchase time\n", | |
"data['diff_time'] = (data['purchase_time'] - data['signup_time'])/np.timedelta64(1, 's')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"#Number of user IDs sharing each device\n", | |
"device_user_count = len(data) * [0]\n", | |
"device_count = data.groupby('device_id')['user_id'].count()\n", | |
"device_user_count = device_count[data['device_id']]\n", | |
"device_user_count = device_user_count.reset_index().drop('device_id', axis = 1)\n", | |
"device_user_count.columns = ['device_user_count']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"data = pd.concat([data, device_user_count], axis = 1)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"#Number of users sharing a given IP address\n", | |
"\n", | |
"ip_count = data.groupby('ip_address')['user_id'].count()\n", | |
"ip_count = ip_count[data['ip_address']].reset_index().drop('ip_address', axis = 1)\n", | |
"ip_count.columns = ['ip_count']\n", | |
"data = pd.concat([data, ip_count], axis = 1)" | |
] | |
}, | |
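{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"##### Alternative: group sizes with `transform` (sketch)\n", | |
"\n", | |
"The two count features above can also be built without the intermediate lookup and `pd.concat`. The cell below is a minimal sketch on a toy frame (the `toy` data is made up for illustration): `groupby(...).transform('count')` returns one value per row, aligned with the original index, so the result can be assigned directly as a new column." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import pandas as pd\n", | |
"\n", | |
"#Toy data, made up for illustration only\n", | |
"toy = pd.DataFrame({'user_id': [1, 2, 3, 4],\n", | |
"                    'device_id': ['A', 'A', 'B', 'A'],\n", | |
"                    'ip_address': [10.0, 10.0, 20.0, 30.0]})\n", | |
"\n", | |
"#Number of rows (users) sharing each device and each IP address, one value per row\n", | |
"toy['device_user_count'] = toy.groupby('device_id')['user_id'].transform('count')\n", | |
"toy['ip_count'] = toy.groupby('ip_address')['user_id'].transform('count')\n", | |
"toy" | |
] | |
}, | |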
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"#Keeping only the top 50 countries\n", | |
"#Replacing everything else (including unmatched IPs) with 'other'\n", | |
"\n", | |
"temp = data.groupby('country')[['user_id']].count().sort_values('user_id', ascending = False)\n", | |
"temp = temp.iloc[:50,:].loc[data['country']].reset_index()\n", | |
"temp.loc[temp.isnull().any(axis = 1), 'country'] = 'other'\n", | |
"temp.loc[temp['country'] == '0','country'] = 'other'\n", | |
"temp = temp.drop('user_id', axis = 1)\n", | |
"temp.columns = ['country_revised']\n", | |
"data = pd.concat([data, temp], axis = 1)\n", | |
"data = data.drop('country', axis = 1)" | |
] | |
}, | |
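{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"##### Alternative: top-N countries with `value_counts` (sketch)\n", | |
"\n", | |
"The relabelling above can be sketched more directly with `value_counts` and `Series.where`. The helper name `keep_top_countries` is hypothetical. One difference from the cell above is an assumption of this sketch: the `'0'` placeholder for unmatched IPs is always mapped to `'other'`, but it can still occupy one of the top-N slots, as it can in the cell above." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import pandas as pd\n", | |
"\n", | |
"def keep_top_countries(country, n=50, other='other'):\n", | |
"    #Hypothetical helper: keep the n most frequent values, relabel the rest\n", | |
"    country = country.astype(str)\n", | |
"    #value_counts sorts by frequency, most common first\n", | |
"    top = country.value_counts().index[:n]\n", | |
"    keep = country.isin(top) & (country != '0')\n", | |
"    #where keeps values for which `keep` is True and replaces the others\n", | |
"    return country.where(keep, other)\n", | |
"\n", | |
"#Toy usage, made up for illustration only\n", | |
"keep_top_countries(pd.Series(['United States', 'United States', 'Japan', '0']), n=1)" | |
] | |
}, | |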
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>user_id</th>\n", | |
" <th>signup_time</th>\n", | |
" <th>purchase_time</th>\n", | |
" <th>purchase_value</th>\n", | |
" <th>device_id</th>\n", | |
" <th>source</th>\n", | |
" <th>browser</th>\n", | |
" <th>sex</th>\n", | |
" <th>age</th>\n", | |
" <th>ip_address</th>\n", | |
" <th>class</th>\n", | |
" <th>diff_time</th>\n", | |
" <th>device_user_count</th>\n", | |
" <th>ip_count</th>\n", | |
" <th>country_revised</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>22058</td>\n", | |
" <td>2015-02-24 22:55:49</td>\n", | |
" <td>2015-04-18 02:47:11</td>\n", | |
" <td>34</td>\n", | |
" <td>QVPSPJUOCKZAR</td>\n", | |
" <td>SEO</td>\n", | |
" <td>Chrome</td>\n", | |
" <td>M</td>\n", | |
" <td>39</td>\n", | |
" <td>7.327584e+08</td>\n", | |
" <td>0</td>\n", | |
" <td>4506682.0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>Japan</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>333320</td>\n", | |
" <td>2015-06-07 20:39:50</td>\n", | |
" <td>2015-06-08 01:38:54</td>\n", | |
" <td>16</td>\n", | |
" <td>EOGFQPIZPYXFZ</td>\n", | |
" <td>Ads</td>\n", | |
" <td>Chrome</td>\n", | |
" <td>F</td>\n", | |
" <td>53</td>\n", | |
" <td>3.503114e+08</td>\n", | |
" <td>0</td>\n", | |
" <td>17944.0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>United States</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>1359</td>\n", | |
" <td>2015-01-01 18:52:44</td>\n", | |
" <td>2015-01-01 18:52:45</td>\n", | |
" <td>15</td>\n", | |
" <td>YSSKYOSJHPPLJ</td>\n", | |
" <td>SEO</td>\n", | |
" <td>Opera</td>\n", | |
" <td>M</td>\n", | |
" <td>53</td>\n", | |
" <td>2.621474e+09</td>\n", | |
" <td>1</td>\n", | |
" <td>1.0</td>\n", | |
" <td>12</td>\n", | |
" <td>12</td>\n", | |
" <td>United States</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>150084</td>\n", | |
" <td>2015-04-28 21:13:25</td>\n", | |
" <td>2015-05-04 13:54:50</td>\n", | |
" <td>44</td>\n", | |
" <td>ATGTXKYKUDUQN</td>\n", | |
" <td>SEO</td>\n", | |
" <td>Safari</td>\n", | |
" <td>M</td>\n", | |
" <td>41</td>\n", | |
" <td>3.840542e+09</td>\n", | |
" <td>0</td>\n", | |
" <td>492085.0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>other</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>221365</td>\n", | |
" <td>2015-07-21 07:09:52</td>\n", | |
" <td>2015-09-09 18:40:53</td>\n", | |
" <td>39</td>\n", | |
" <td>NAUITBZFJKHWW</td>\n", | |
" <td>Ads</td>\n", | |
" <td>Safari</td>\n", | |
" <td>M</td>\n", | |
" <td>45</td>\n", | |
" <td>4.155831e+08</td>\n", | |
" <td>0</td>\n", | |
" <td>4361461.0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>United States</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" user_id signup_time purchase_time purchase_value \\\n", | |
"0 22058 2015-02-24 22:55:49 2015-04-18 02:47:11 34 \n", | |
"1 333320 2015-06-07 20:39:50 2015-06-08 01:38:54 16 \n", | |
"2 1359 2015-01-01 18:52:44 2015-01-01 18:52:45 15 \n", | |
"3 150084 2015-04-28 21:13:25 2015-05-04 13:54:50 44 \n", | |
"4 221365 2015-07-21 07:09:52 2015-09-09 18:40:53 39 \n", | |
"\n", | |
" device_id source browser sex age ip_address class diff_time \\\n", | |
"0 QVPSPJUOCKZAR SEO Chrome M 39 7.327584e+08 0 4506682.0 \n", | |
"1 EOGFQPIZPYXFZ Ads Chrome F 53 3.503114e+08 0 17944.0 \n", | |
"2 YSSKYOSJHPPLJ SEO Opera M 53 2.621474e+09 1 1.0 \n", | |
"3 ATGTXKYKUDUQN SEO Safari M 41 3.840542e+09 0 492085.0 \n", | |
"4 NAUITBZFJKHWW Ads Safari M 45 4.155831e+08 0 4361461.0 \n", | |
"\n", | |
" device_user_count ip_count country_revised \n", | |
"0 1 1 Japan \n", | |
"1 1 1 United States \n", | |
"2 12 12 United States \n", | |
"3 1 1 other \n", | |
"4 1 1 United States " | |
] | |
}, | |
"execution_count": 21, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"data.head(5)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Building a Machine Learning Model" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"#Response Variable\n", | |
"y = data['class']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"#Predictors\n", | |
"data = data.drop(['user_id', 'signup_time','purchase_time','class'], axis = 1)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"X = data" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"purchase_value 0\n", | |
"device_id 0\n", | |
"source 0\n", | |
"browser 0\n", | |
"sex 0\n", | |
"age 0\n", | |
"ip_address 0\n", | |
"diff_time 0\n", | |
"device_user_count 0\n", | |
"ip_count 0\n", | |
"country_revised 0\n", | |
"dtype: int64" | |
] | |
}, | |
"execution_count": 25, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"X.isnull().sum()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"#Label Encoding string variables\n", | |
"lb = LabelEncoder()\n", | |
"X['device_id'] = lb.fit_transform(X['device_id'])\n", | |
"X['source'] = lb.fit_transform(X['source'])\n", | |
"X['browser'] = lb.fit_transform(X['browser'])\n", | |
"X['sex'] = lb.fit_transform(X['sex'])\n", | |
"X['country_revised'] = lb.fit_transform(X['country_revised'])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"#Splitting data into train and test dataset\n", | |
"X_train, X_test, y_train, y_test = train_test_split(X,y)" | |
] | |
}, | |
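{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"##### Optional: a stratified, repeatable split (sketch)\n", | |
"\n", | |
"Only about 9% of the rows are fraud, and the split above is random, so the numbers reported below change from run to run. The cell below is a minimal sketch, assuming scikit-learn 0.18 or later and the `X` and `y` defined above. It is not the split used for the results in this notebook. The variable names ending in `_s` and the seed value are hypothetical choices made here." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.model_selection import train_test_split\n", | |
"\n", | |
"#Assumes X and y from the cells above. The *_s names are used so that\n", | |
"#the split used in the rest of the notebook is left untouched.\n", | |
"#stratify=y keeps the fraud rate the same in both parts; random_state=0 is an arbitrary seed.\n", | |
"X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(X, y, stratify=y, random_state=0)\n", | |
"\n", | |
"#Fraud rate in each part\n", | |
"y_train_s.mean(), y_test_s.mean()" | |
] | |
}, | |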
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"#Creating a pipeline\n", | |
"pipeline = Pipeline(steps = [('clf', RandomForestClassifier(criterion = 'entropy'))])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 29, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"clf_forest = RandomForestClassifier(n_estimators= 20, criterion = 'entropy', max_depth= 50, min_samples_leaf= 3,\n", | |
" min_samples_split= 3, oob_score= True)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"C:\\Users\\Deepak\\Anaconda2\\lib\\site-packages\\sklearn\\ensemble\\forest.py:403: UserWarning: Some inputs do not have OOB scores. This probably means too few trees were used to compute any reliable oob estimates.\n", | |
" warn(\"Some inputs do not have OOB scores. \"\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',\n", | |
" max_depth=50, max_features='auto', max_leaf_nodes=None,\n", | |
" min_samples_leaf=3, min_samples_split=3,\n", | |
" min_weight_fraction_leaf=0.0, n_estimators=20, n_jobs=1,\n", | |
" oob_score=True, random_state=None, verbose=0, warm_start=False)" | |
] | |
}, | |
"execution_count": 30, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"clf_forest.fit(X_train, y_train)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)" | |
] | |
}, | |
"execution_count": 31, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"preds = clf_forest.predict(X_test)\n", | |
"preds" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"             precision    recall  f1-score   support\n", | |
"\n", | |
"          0       0.95      1.00      0.98     34184\n", | |
"          1       0.99      0.55      0.70      3594\n", | |
"\n", | |
"avg / total       0.96      0.96      0.95     37778\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print(classification_report(y_test, preds))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([ 0.06105041, 0.0794475 , 0.0114642 , 0.01611104, 0.00809526,\n", | |
" 0.05038098, 0.07786234, 0.34906319, 0.14008505, 0.17718896,\n", | |
" 0.02925106])" | |
] | |
}, | |
"execution_count": 33, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"#Variable importance\n", | |
"clf_forest.feature_importances_" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 34, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array(['purchase_value', 'device_id', 'source', 'browser', 'sex', 'age',\n", | |
" 'ip_address', 'diff_time', 'device_user_count', 'ip_count',\n", | |
" 'country_revised'], dtype=object)" | |
] | |
}, | |
"execution_count": 34, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"#Features used, in the same order as the importances above\n", | |
"data.columns.values" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 35, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.95492967688425368" | |
] | |
}, | |
"execution_count": 35, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"#Out-of-bag (OOB) score\n", | |
"clf_forest.oob_score_" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": true | |
}, | |
"source": [ | |
"#### Some quick insights\n", | |
"\n", | |
"From the classification report above, the model predicts fraud with a precision of 99% and a recall of 55%.\n", | |
"This means that of all the transactions we flagged as fraud, 99% really were fraudulent.\n", | |
"However, of all the fraud that took place, we correctly identified only 55%. \n", | |
"We need to improve recall, even if that reduces precision.\n", | |
"This is an act of balancing false positives and false negatives.\n", | |
"\n", | |
"A false positive means extra checks on a customer who is not fraudulent.\n", | |
"A false negative means an act of fraud goes undetected. \n", | |
"\n", | |
"Thus we need to decrease false negatives, even at the cost of more false positives.\n", | |
"Fewer false negatives directly raises our recall (sensitivity)." | |
] | |
}, | |
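{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"##### Trading precision for recall with a lower threshold (sketch)\n", | |
"\n", | |
"`predict` flags fraud only when the predicted probability of fraud exceeds 0.5. One way to act on the trade-off described above is to flag transactions at a lower probability. The cell below is a minimal sketch, assuming `clf_forest`, `X_test` and `y_test` from the cells above. The helper name `fraud_at_threshold` is hypothetical and the thresholds 0.5, 0.3 and 0.1 are arbitrary examples, not tuned values." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"from sklearn.metrics import precision_score, recall_score\n", | |
"\n", | |
"def fraud_at_threshold(fraud_prob, threshold):\n", | |
"    #Hypothetical helper: flag fraud when the predicted probability reaches the threshold\n", | |
"    return (np.asarray(fraud_prob) >= threshold).astype(int)\n", | |
"\n", | |
"#Assumes clf_forest, X_test and y_test from the cells above.\n", | |
"#Column 1 of predict_proba is the probability of class 1 (fraud).\n", | |
"fraud_prob = clf_forest.predict_proba(X_test)[:, 1]\n", | |
"for t in [0.5, 0.3, 0.1]:\n", | |
"    flagged = fraud_at_threshold(fraud_prob, t)\n", | |
"    p = precision_score(y_test, flagged)\n", | |
"    r = recall_score(y_test, flagged)\n", | |
"    print('threshold %.1f: precision %.2f, recall %.2f' % (t, p, r))" | |
] | |
}, | |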
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### ROC analysis" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 36, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"prob_score = clf_forest.predict_proba(X_test)\n", | |
"prob_score = DataFrame(prob_score).iloc[:,0]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 37, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"fpr,tpr,thresholds = roc_curve(y_test,1-prob_score)\n", | |
"roc_auc = auc(fpr,tpr) #named roc_auc so that it does not shadow sklearn's auc()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 38, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEZCAYAAACervI0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcXFWd9/HPNxtr9kASkkAWVlkSthAQpSEoQXgGhXFk\nkUFnEWdEHWdej+LMM0MexxkGX6PDKKM+qIOKIjqCgorIIq0sCQQhrAlJWEI2sgdCSEKW3/PHuU0q\nTXV3dXfdqu663/frVa/UrTp17++GcH73nnPuOYoIzMysePrUOwAzM6sPJwAzs4JyAjAzKygnADOz\ngnICMDMrKCcAM7OCcgIwMysoJwBrGJJekvSGpNckLZd0g6S9W5U5RdK9WZn1km6TdESrMgMlXStp\ncVZuoaSvSBpW2zMyy5cTgDWSAM6JiEHAFOBY4PMtX0o6GfgN8DNgNDABeBJ4UNL4rEx/4LfAEcB7\ns32dDKwBpuYVuKS+ee3brC1OANZoBBARq0iV/ZSS764BvhsR10XEpojYEBH/CMwGZmZlLgPGAu+P\niOeyfa2JiH+NiDvLHlA6UtJdktZKWiHpyuzzGyR9oaTcaZKWlGy/KOmzkp4AXs/e/0+rff+npGuz\n94MkfTu7u1ki6Z8lqRt/V1ZwTgDWkCSNBc4GFmbbewGnAD8tU/wnwHuy99OBOyNic4XH2Re4G7iD\ndFdxMHBvOz9pPffKhVmcQ4CbgbMl7ZPtuw/wQeCHWdnvAW8CE0l3N+8B/qKSOM3KcQKwRvNzSa8B\nLwMr2XVlP4z0731Fmd+sAEZk74e3UaYt5wIrIuLaiHgzu7OY04nf/2dELI+IrRHxMvAY8IHsu+nA\npoiYI2kkKVF8JiK2RMQa4Frgok4cy2w3TgDWaM7L2u1PAw5nV8W+HthJukpvbTSpjR9gbRtl2jIO\neL5roQKwtNX2j9hVqV8E3JS9PxDoD6yQtE7SeuCb7Do/s05zArBG09IHcD+pyeTL2fYbwCxSk0pr\nfwLck72/BzgrazKqxBJgUhvfbQJKRyGVSyytm4T+B2iSNIZ0J9CSAJYAW4DhETEsIoZGxJCIOKbC\nOM3exgnAGtm1wHskHZ1tXwlcJukKSftKGirpi8A0oKWz9kZSZXuLpMOUDJf0eUkzyhzjl8AoSZ+S\nNCDbb8toobnA+7LjjAI+3VHAWdPO74AbgBdKOqJfAe4C/iMbpipJEyW9uyt/MWbgBGCNZber6awy\n/R7wT9n2g8BZwAWkdv4XgcnAOyPi+azMm8CZwHxS5+6rpFFCw4GH33bAiNdJnbF/BLwCLACasq9v\nJA0zfQm4k9TJ22a8JW4itf//sNXnfwoMAJ4F1pHuFka1sQ+zDskLwpiZFZPvAMzMCsoJwMysoJwA\nzMwKygnAzKyg+tU7gEpJcm+1mVkXRETZOaN61R1ARHTpddVVV3X5t7315XMuxsvnXIxXd865Pb0q\nAZiZWfU4AZiZFVQhEkBTU1O9Q6g5n3Mx+JyLIa9zzvVJYEnfIU2XuzLamLRK0ldJ09xuAj4SEXPb\nKBd5xmpm1ogkEXXqBL6BNPdKWZLOBiZFxCHA5aTpbc3MrAZyTQAR8QBpHva2nAd8Pyv7MDA4W/jC\nzMxyVu8+gDGkqXdbLMs+MzOznPWaB8HMzAohdsL6RbB6LqyaC6seh1NmwuiTqn6oeieAZaQl9VqM\nzT4ra+bMmW+9b2pqKuRoADNrINu3wJqnU0W/OqvsVz8Jew2H/Y+F/abA5I/D0EMr3mVzczPNzc0V\nlc19PQBJ44FfRMTRZb57H/CJiDhH0jTg2oiY1sZ+PArIzHqvzetg9RNZJZ9V9hsWwZBDUmW//5RU\n4e8/BfYcWrXDtjcKKO9hoDeRVkcaDqwEriKtaBQRcX1W5jpgBmkY6Ecj4rE29uUEYGY9XwRsfHlX\n803Ln1vWwX6TUwXfUuEPPxL67ZlrOHVLANXk
BGBmPc6ObbBu/q4r+pamnL57llzRZ5X9kEmg2o+7\ncQIwM+uuN1/PmnDm7mrGWfssDBy3q71+ZPbnPj1nNLsTgJlZZ2x6Zffmm9VzYePS1GTTckW//7Ew\n4mgYsG+9o22XE4CZWTktQy7f6pjNKvyd27ImnJLKfthh0KfeAyc7zwnAzOytIZclbfWrn4S9Ruyq\n5Fva7AeOBZWtM3sdJwAzK5bN697eMbthURpPX9oxu9/kqg657ImcAMysMUXAa4t3f5Bq1VzYuj5V\n7qWV/fB35D7ksidyAjCz3m/HNlg37+2Vfb+9du+Y3W8KDJlYlyGXPZETgJn1Ljt3pCGWa56CRbel\n5pt182Dgga0q+8k9ashlT+QEYGY9VwRsXg0bnoe18+Dle+Clu9J8OMOPhAPPgJEnwH5HQ/996h1t\nr+MEYGb1t3NHGnWz9hlYvxDWL4ANC9P7Pv1gyMEw9BAYexqMnwGDxnW8T+uQE4CZ1V5EasZ5+bew\n5Lew9Hew98jURj/00FTZDz0kTYa217B6R9uwnADMLH+xE9Y9B8vuzyr9+6D/vjDu9NSMM+502Hd0\nvaMsHCcAM6u+bW/AK3Ng+YOw/CFYPgv2GAwHvHNXhT94fL2jLDwnADPrvteXw7IHd1X4a55JHbMH\nnJIq/QNO8RV+D+QEYGads2MbrHkSls/Oru4fTLNhHnBKeo15ZxqZ03+vekdqHXACMLO2vbkxXc2v\neSq9Vs2FVY/BoPEwehoccHK6wh92WMPMj1MkTgBmlq7q1y/YVdGvzv58YxUMPwJGHJWmNx5xDIye\nmtrzrddzAjArkog0d31LRd/yWr8gLV4y4ujdX0MmQZ++9Y7acuIEYNaotm+Bpfendvot6+DVF9P7\nvv1bVfRHpcnQ+u9d74itxpwAzBpFRLqSf+lOeOk3sOyBVMGPPS3NibPvmGw0zgH1jtR6CCcAs95s\ny4Y0P87iu1Olv3MHTJgB48+CA6c3/Hz21j3tJYDet76ZWaPavgU2rYDXXoZXHoGFP4Mta9N890Mm\nwaipcP4dMOwIj8axqvAdgFk97NyRJkVbPgtWzEp/vvYS7D0qLUe4/7FpvpyDzoRBEzze3rrMTUBm\n9bZ5HayYvauyf2UO7DMqjbEffXL6c/iRHo1jVecEYFZrKx+HlXNSZb98FmxaDqNO3FXZj56W5rs3\ny5kTgFletr0BS5rTPPevvpBeGxal9vzxZ+26wh9xlK/urS6cAMyq4bUlqQln9ZOw5mlY+zS8vgxG\nnpiu6IdMhMETYPDENI1C3/71jtjMCcCs07ZvSZ20yx7MZsB8CHZsTWPs95+SPVh1FAw9OK1mZdZD\nOQGYtWX7ljTkcvWTaTGT9QvSa9OKNPRyzLt2zYA5ZJKHX1qv4wRg1uLNjelqfunv0xQKqx5L4+pH\nHp9muxx6KAw9LC1k4it7awBOAFZMLbNfLp+VNec8AOvmpcp+7Luzq/uTYcDAekdqlhsnAGtcEbB+\nYarcX30+PTXb8npjJQwYnFatOui9aU77USdCvz3qHbVZzdQ1AUiaAVwL9AG+ExHXtPp+EPAD4ECg\nL/DliPhumf04AdiuCn9pcxp+uaQ5NdWMfXdqvhl0UPYanyZG80gcK7i6JQBJfYAFwHRgOTAHuDAi\n5peU+TwwKCI+L2kE8BwwMiK2t9qXE0ARRcDGl+GRL6U57dc+m6Y0Hnc6jGtKr0Hj3Tlr1oZ6TgY3\nFVgYEYuzQG4GzgPml5QJoKURdiCwtnXlbwWwfWt6Wnbj0lThr3g4td+vehxiJxxxCZx8VRp+uff+\nrvDNqiDvBDAGWFKyvZSUFEpdB9wuaTmwL/ChnGOyeouAlY/Cy/fBkt+meXHe3Aj7jE4Toe07Nk1x\nPPmv06RoA8e6wjfLQU8Y53YW8HhEnCFpEnC3pGMi4vXWBWfOnPnW+6amJpqammoWpHXTts2w8JZU\n2S/9HWx7
HSacA++4FN5zfWqv91QJZt3W3NxMc3NzRWXz7gOYBsyMiBnZ9pVAlHYES/olcHVEPJht\n3wt8LiIebbUv9wH0Nq+9DC/8Cl78VRp3P3gCHPYhGHNqeqlPvSM0a3j17AOYAxws6SBgBXAhcFGr\nMouBM4EHJY0EDgVeyDkuq7ad22HlH2DNM6mj9qU74Y1VaeWqIy6Fs2/0ylVmPUythoH+J7uGgf6b\npMtJdwLXSxoNfBcYnf3k6oj4UZn9+A6gp9i+NT1QteapNIXCqrmw7P50hT9qapoMbcIMGHmCm3XM\n6swPglnX7dyRHqp6ZU56mnbhremBq8ETYcQxsN8x2aLk74Y9BtU7WjNrxQnAKrdjG7x4R3qy9oVf\nwrr50HcAjD87zZUzrimNwe+3Z70jNbMKOAFYeds2wesrUiW/YVFqu194a5rieOK5ac6cA06BPQbX\nO1Iz6yInAIOtr6V2+1dfSrNhLn8ojcUfND5d2Q85BIYekir+IRPrHa2ZVYkTQJFEpKdpV86BVx5N\nbfdrnoTNa3c9VLXXfnD4RWkmTDflmDU0J4BGFZGGWq78Q6roV2YVfuxMs16OOjE14+y1X+qs7b93\nvSM2sxpzAmgUm9elqRPWPgtP3wBvvJJWtBpz6q6pjkedCAPHeeoEMwOcAHq/bZvg4avhiW+kTtmh\nh8B+U+Dg89xBa2btqueTwNYVO3fAy/fAotvS1f7qJ9IDVh9+ND1sZWZWBb4D6Cm2rE8jc575Przw\nCxh2eJo3Z+Txac3agWPqHaGZ9UJuAuppdmxL0yi8sRLWPA2L74EVs2HUCekhqwnvSyN23I5vZt3k\nBFBvEbDkvlTRL/09rJgF+4yCwZPSAidj3w2T/hf036fekZpZg3ECqIetr8JzP4YXf51eAwbC5I+n\nK/zBE2Hw+HpHaGYF0O0EIGkAcGBELKp2cJXqFQkgIk2D/NxP4LmbYeSJae6cg8/LmnQ8/72Z1Va3\nRgFJOgf4CjAAmCBpCnBVRHygumH2Ytu3wrM3wpxr0nq2h/wx/Nkid9yaWY9WyTDQLwAnAfcBRMRc\nSQfnGlVvsXY+zJqZmnoOPAPe+y0Ye5o7b82sV6gkAWyLiA3avVLr4W0xOVs7H+77FCy+O61p+5cv\nwaCD6h2VmVmnVJIA5kn6E6CPpAnAp4DZ+YbVQ616Amb/c6r4J38czv0J7Dmk3lGZmXVJJb2SVwDH\nAzuBW4GtwKfzDKrHiYBZX4Abp6R1bf98Ebz7Glf+ZtardTgKSNL5EXFrR5/lrW6jgHbugJtPTQ9s\nnX8HjH1X7WMwM+uibg0DlfRYRBzX6rM/RMTxVYyxQ3VJAG++Dj89M03T8Md3w6ADa3t8M7Nu6tIw\nUElnATOAMZK+UvLVIFJzUON75OqUBC57Kq2La2bWQNrrBF4FPA1sAZ4p+XwjcGWeQfUIi26Dh/81\ntfe78jezBlRJE9CeEbGlRvG0F0ftmoBWPAz/Mx3OuRkmnVubY5qZ5aC76wGMkfQvwDuAtxaQjYhD\nqxRfz7JjG9z7CTjxs678zayhVTIM9LvADYCAs4GfAD/OMab6+sNX0sRtUxu/lcvMiq2SJqA/RMTx\nkp6KiKOzzx6NiBNqEuGuOPJvAlr5GNx0ElwyB/afku+xzMxqoLtNQFsl9QGel/RxYBkwsJoB9gg7\nt8OvLk5NP678zawAKkkAnwH2IU0B8S/AYODP8gyqLp75PvTbC079l3pHYmZWE11aEEbSmIhYlkM8\n7R0zvyag7Vvhvw+BM76W5u43M2sQ7TUBtdsJLOlESe+XNCLbPlLS94GHc4izfp7+bxgwyJW/mRVK\nmwlA0tXAD4FLgDslzSStCfAE0DhDQDevhYf+CU7793pHYmZWU202AUl6Fjg+IjZLGgYsAY6OiBdq\nGWBJPNVvAtq5HX54Euy9P1zw6+ru28ysB+hqE9CWiNgMEBHrgAVdqfwlzZ
A0X9ICSZ9ro0yTpMcl\nPS3pvs4eo8ue/BZsXALnNu5jDWZmbWnvDmAD8NuWTeD0km0i4vwOd56Gjy4ApgPLgTnAhRExv6TM\nYOAh4L0RsUzSiIhYU2Zf1b0D2L4Fvj0BzvlRWrjdzKwBdfU5gAtabV/XhWNPBRZGxOIskJuB84D5\nJWUuBm5pGVVUrvLPxSPXwP7HuvI3s8JqMwFExL1V2P8YUt9Bi6WkpFDqUKB/1vSzL/DViLixCsdu\n27oFaTH3Dz+a62HMzHqySh4Ey1s/4DjgDNIDZ7MkzYqIRa0Lzpw58633TU1NNDU1de2I918JR/05\njKzpmjZmZrlrbm6mubm5orJdehCsUpKmATMjYka2fSUQEXFNSZnPAXtGxP/Ntr8N/Doibmm1r+r0\nAax4GG6aBn+1GvYe0f39mZn1YF1+EKzVTvbowrHnAAdLOkjSAOBC4PZWZW4DTpXUV9LewEnAvC4c\nqzJz/wumft6Vv5kVXocJQNJUSU8BC7PtyZK+VsnOI2IHcAVwF2lVsZsjYp6kyyV9LCszH/gN8CQw\nG7g+Ip7t0tl0ZP1CeO4ncMzHctm9mVlvUsl00LOBDwE/j4hjs8+ejoijahBfaRzdbwK69RwYMgnO\n+Gp1gjIz6+G6Ox10n4hYLO32+x1ViayW1i+EF++Ajy3puKyZWQFUkgCWSJoKhKS+wCdJD3f1Lgtu\ngSM+DAPH1jsSM7MeoZJO4L8C/hY4EFgJTMs+611e+AWMPqneUZiZ9RiV3AFsj4gLc48kTysegVWP\nwQV31jsSM7Meo5I7gDmS7pB0maTeuRTk/JvguL9Ji72bmRlQQQKIiEnAF4Hjgack/VxS77ojWD4L\nxryr3lGYmfUonXoSOFsX4Frgkojom1tU5Y/d9WGgX+kLn9oE/fasblBmZj1ct54ElrSvpEsk/QJ4\nBFgNnFLlGPMVAX361zsKM7MepZJO4KeBXwBfioj7c44nJwEqmwDNzAqrkgQwMSJ25h5J7pwAzMxK\ntZkAJH05Iv4OuEXS2xrfK1kRrEdo6TfwHYCZ2W7auwNoWSi3KyuB9SD5TXdtZtabtbci2CPZ2yMi\nYrckIOkKoBorhtWIr/7NzFqr5EGwPyvz2Z9XO5Dc5LjgjZlZb9ZeH8CHSAu4TJB0a8lXA4ENeQdW\nPR4BZGZWTnt9AI8Aa4GxwH+VfL4ReDzPoKoqAjcBmZm9XXt9AC8CLwL31C6cPPgOwMysnPaagH4X\nEadJWs/uQ2lEWth9WO7RVYPvAMzMymqvCej07M9evnq67wDMzMppcxRQydO/44C+2QLvJwOXA/vU\nILbq8B2AmVlZlQwD/TlpOchJwA3AIcBNuUZVbb4DMDN7m0oSwM6I2AacD3wtIj4DjMk3rGrycwBm\nZuVUkgC2S/ogcCnwy+yzXjS3spuAzMzKqfRJ4NNJ00G/IGkC8KN8w6oi9wGYmZVV0YpgkvoBB2eb\niyJie65RlY+hayuCvbkRvjkaPvV69YMyM+vh2lsRrMP1ACS9C7gRWEa6lB4l6dKIeLC6YebEdwBm\nZmVVsiDMfwDvi4hnASQdQUoIJ+QZWPX4OQAzs3Iq6QMY0FL5A0TEPGBAfiFVme8AzMzKquQO4DFJ\n3wR+kG1fQm+aDA58B2BmVkYlCeDjwKeAz2bb9wNfyy2iqvNzAGZm5bSbACQdDUwCfhYRX6pNSFXm\nJiAzs7La7AOQ9PekaSAuAe6WVG5lsA5JmiFpvqQFkj7XTrkTJW2TVOXF5t0JbGZWTnt3AJcAx0TE\nJkn7AXcA/92ZnUvqQ1pUfjqwHJgj6baImF+m3L8Bv+nM/iviOwAzs7LaGwW0NSI2AUTE6g7KtmUq\nsDAiFmfzCd0MnFem3CeBnwKrunCMDjgBmJmV094dwMSStYAFTCpdGzgiKmmqGQMsKdleSkoKb5F0\nAPD+iDhd0m7fVYX6wL6jq75bM7Perr
0EcEGr7etyiuFaoLRvoM3L9ZkzZ771vqmpiaampo73vtdw\nuOypLgdnZtabNDc309zcXFHZiuYC6ipJ04CZETEj276StJzkNSVlXmh5S1p9bBPwsYi4vdW+ujYX\nkJlZgbU3F1DeCaAv8BypE3gF8AhwUfY0cbnyNwC/iIhby3znBGBm1kndmgyuOyJih6QrgLtIncjf\niYh5ki5PX8f1rX+SZzxmZrZLxXcAkvaIiK05x9Pe8X0HYGbWSe3dAXQ4tFPSVElPAQuz7cmSetFU\nEGZmVk4lY/u/CpwLrAWIiCdIK4SZmVkvVkkC6BMRi1t9tiOPYMzMrHYq6QRekj2gFdmonk8CC/IN\ny8zM8tZhJ7Ck/UnNQGdmH90DXBERa3KOrXUc7gQ2M+ukuj0HUE1OAGZmndfdReG/RZnx+RHxsSrE\nZmZmdVJJH8A9Je/3BD7A7hO8mZlZL9TpJqBs7v4HIuKUfEJq87huAjIz66RuPQhWxgRgZPdCMjOz\nequkD2A9u/oA+gDrgCvzDMrMzPLXbhOQJAHjgGXZRzvr1Q7jJiAzs87rchNQVuPeERE7spdrYDOz\nBlFJH8BcScfmHomZmdVUm01AkvpFxHZJzwCHAc+TVusS6ebguNqF6SYgM7Ou6OqDYI8AxwF/lEtU\nZmZWV+0lAAFExPM1isXMzGqovQSwn6S/bevLiPhKDvGYmVmNtJcA+gL7kt0JmJlZY2mvE/ixWnf0\ntsedwGZmndfV5wB85W9m1sDauwMYFhHrahxPm3wHYGbWeV4QxsysoKo9G6iZmTUAJwAzs4JyAjAz\nKygnADOzgnICMDMrKCcAM7OCcgIwMysoJwAzs4JyAjAzK6jcE4CkGZLmS1og6XNlvr9Y0hPZ6wFJ\nR+cdk5mZ5TwVhKQ+wAJgOrAcmANcGBHzS8pMA+ZFxKuSZgAzI2JamX15Kggzs06q51QQU4GFEbE4\nIrYBNwPnlRaIiNkR8Wq2ORsYk3NMZmZG/glgDLCkZHsp7VfwfwH8OteIzMwMaH9FsJqSdDrwUeDU\ntsrMnDnzrfdNTU00NTXlHpeZWW/S3NxMc3NzRWXz7gOYRmrTn5FtXwlERFzTqtwxwC3AjLYWoXcf\ngJlZ59WzD2AOcLCkgyQNAC4Ebm8V3IGkyv/Stip/MzOrvlybgCJih6QrgLtIyeY7ETFP0uXp67ge\n+EdgGPB1SQK2RcTUPOMyMzOvCGZm1tC8IpiZmb2NE4CZWUE5AZiZFZQTgJlZQTkBmJkVlBOAmVlB\nOQGYmRWUE4CZWUE5AZiZFZQTgJlZQTkBmJkVlBOAmVlBOQGYmRWUE4CZWUE5AZiZFZQTgJlZQTkB\nmJkVlBOAmVlBOQGYmRWUE4CZWUE5AZiZFZQTgJlZQTkBmJkVlBOAmVlBOQGYmRWUE4CZWUE5AZiZ\nFZQTgJlZQTkBmJkVlBOAmVlBOQGYmRVU7glA0gxJ8yUtkPS5Nsp8VdJCSXMlTck7JjMzyzkBSOoD\nXAecBRwJXCTp8FZlzgYmRcQhwOXAN/OMyczMkrzvAKYCCyNicURsA24GzmtV5jzg+wAR8TAwWNLI\nnOMyMyu8vBPAGGBJyfbS7LP2yiwrU8bMzKrMncBmZgXVL+f9LwMOLNkem33Wusy4DsoAMHPmzLfe\nNzU10dTUVI0YzcwaRnNzM83NzRWVVUTkFoikvsBzwHRgBfAIcFFEzCsp8z7gExFxjqRpwLURMa3M\nviLPWM3MGpEkIkLlvsv1DiAidki6AriL1Nz0nYiYJ+ny9HVcHxF3SHqfpEXAJuCjecZkZmZJrncA\n1eQ7ADOzzmvvDsCdwGZmBeUEYGZWUIVIAJX2iDcSn3Mx+JyLIa9zdgJoUD7nYvA5F4MTgJmZVZUT\ngJlZQfWqYaD1jsHMrDdqaxhor0kAZmZWXW4CMjMrKCcAM7OCaqgEUMTlJzs6Z0kXS3oiez0g6eh6\nxF
lNlfx3zsqdKGmbpPNrGV8eKvy33STpcUlPS7qv1jFWWwX/tgdJuj37f/kpSR+pQ5hVI+k7klZK\nerKdMtWtvyKiIV6kZLYIOAjoD8wFDm9V5mzgV9n7k4DZ9Y67Buc8DRicvZ9RhHMuKXcv8Evg/HrH\nXYP/zoOBZ4Ax2faIesddg3P+PHB1y/kCa4F+9Y69G+d8KjAFeLKN76tefzXSHUARl5/s8JwjYnZE\nvJptzqb3r7ZWyX9ngE8CPwVW1TK4nFRyzhcDt0TEMoCIWFPjGKutknMOYGD2fiCwNiK21zDGqoqI\nB4D17RSpev3VSAmgiMtPVnLOpf4C+HWuEeWvw3OWdADw/oj4BlB2+FsvU8l/50OBYZLukzRH0qU1\niy4flZzzdcA7JC0HngA+XaPY6qXq9VfeK4JZDyHpdNJaC6fWO5YauBYobTNuhCTQkX7AccAZwD7A\nLEmzImJRfcPK1VnA4xFxhqRJwN2SjomI1+sdWG/RSAmgqstP9hKVnDOSjgGuB2ZERHu3mL1BJed8\nAnCzJJHahs+WtC0ibq9RjNVWyTkvBdZExBZgi6TfA5NJ7ei9USXn/FHgaoCIeF7Si8DhwKM1ibD2\nql5/NVIT0BzgYEkHSRoAXAi0/h/+duBPAbLlJzdExMrahllVHZ6zpAOBW4BLI+L5OsRYbR2ec0RM\nzF4TSP0Af92LK3+o7N/2bcCpkvpK2pvUSTiP3quSc14MnAmQtYUfCrxQ0yirT7R9x1r1+qth7gCi\ngMtPVnLOwD8Cw4CvZ1fE2yJiav2i7p4Kz3m3n9Q8yCqr8N/2fEm/AZ4EdgDXR8SzdQy7Wyr87/xF\n4LslwyY/GxHr6hRyt0m6CWgChkt6GbgKGECO9ZengjAzK6hGagIyM7NOcAIwMysoJwAzs4JyAjAz\nKygnADOzgnICMDMrKCcA6zEk7ZD0WDal8WPZQ2xtlT1I0lNVOOZ92ZTDcyXdL+mQLuzjckkfzt5f\nJmlUyXfXSzq8ynE+nD3d3dFvPi1pz+4e2xqXE4D1JJsi4riIODb78+UOylfrIZaLImIKaabFf+/s\njyPi/0XED7LNj1AyQVdEfCwi5lclyl1xfoPK4vwbYO8qHdsakBOA9SRvewQ+u9L/vaRHs9e0MmXe\nkV0VP5ZdIU/KPr+k5PNvZE9Ct3fc3wMtv52e/e4JSd+W1D/7/N+yBVfmSvpS9tlVkv5O0gWkeYh+\nkP12z+zK/bjsLuFLJTFfJumrXYxzFnBAyb6+LukRpUVRrso++2RW5j5J92afvVfSQ9nf44+zKSOs\nwJwArCddpZtfAAACuUlEQVTZq6QJ6Jbss5XAmRFxAmk+mK+V+d3HgWsj4jhSBbw0a3b5EHBK9vlO\n4JIOjv9HwFOS9gBuAD4YEZNJC5L8laRhpGmmj8quxL9Y8tuIiFtIE5FdnN3BbCn5/hbgAyXbHyJN\nWNeVOGcAPy/Z/vtseo/JQJOkoyLia6SJwpoiYrqk4cA/ANOzv8s/AH/XwXGswTXMXEDWEN7IKsFS\nA4DrlJa/2wGUa6OfBfyDpHHArRGxSNJ00vTIc7Ir6j1JyaScH0raDLxEWkjmMOCFksnzvgf8NfBf\nwGZJ3wZ+RVptrJy3XcFHxBpJz0uaSpqh87CIeEjSJzoZ5x6k6Z5LlwO8UNJfkv5/HgW8A3ia3ScW\nm5Z9/mB2nP6kvzcrMCcA6+k+A7wSEcdI6gtsbl0gIn4kaTZwLvCrbMIwAd+LiH+o4BgXR8TjLRvZ\n1XK5SnxHVoFPBz4IXJG9r9SPSVf784GftRyus3FmTUnXARdIGk+6kj8+Il6TdAMpibQm4K6I6Oju\nwgrETUDWk5Rr+x4MrMje/ynQ920/kiZExItZs8ftwDGk9YD/WNJ+WZmh7Ywqan3c54CDJE3Mti8F\nfpe1mQ+JiDuBv82O09pGYFAbx/kZaVm/C0lLHNLFOP8JOEnSodmx
Xgc2Kk2JfHZJ+ddKYpkNvLOk\nf2Tvrox4ssbiBGA9SblRPV8HPiLpcdJ875vKlPmTrGP2ceBI4PsRMQ/4P8Bdkp4gTSs8qsxv33bM\niNhKmmr3p9lvdwDfJFWmv8w++z3p7qS17wLfbOkELt1/RGwgzdF/YEQ8mn3W6TizvoUvA/87Ip4k\nLZg+D/gB8EDJb74F3Cnp3myN4I8CP8qO8xCpqcsKzNNBm5kVlO8AzMwKygnAzKygnADMzArKCcDM\nrKCcAMzMCsoJwMysoJwAzMwKygnAzKyg/j/eesD8BtlQBAAAAABJRU5ErkJggg==\n", | |
"text/plain": [ | |
"<matplotlib.figure.Figure at 0x99cd048>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"#Plotting the ROC curve\n", | |
"plt.plot(fpr,tpr, color = 'darkorange')\n", | |
"plt.xlim([-.05, 1.05])\n", | |
"plt.ylim([-.05, 1.05])\n", | |
"plt.xlabel('False Positive Rate')\n", | |
"plt.ylabel('True Positive Rate')\n", | |
"plt.title('ROC curve')\n", | |
"plt.show()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 39, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([ 0.40539789, 0.40623261, 0.40651085, ..., 0.94073456,\n", | |
" 0.94073456, 1. ])" | |
] | |
}, | |
"execution_count": 39, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"tpr" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 40, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([ 0. , 0. , 0. , ..., 0.79072081,\n", | |
" 0.7911011 , 1. ])" | |
] | |
}, | |
"execution_count": 40, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"fpr" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 41, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([ 1.00000000e+00, 9.97727273e-01, 9.97222222e-01, ...,\n", | |
" 9.25925926e-04, 4.34782609e-04, 0.00000000e+00])" | |
] | |
}, | |
"execution_count": 41, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"thresholds" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 42, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>1-fpr</th>\n", | |
" <th>fpr</th>\n", | |
" <th>tf</th>\n", | |
" <th>thresholds</th>\n", | |
" <th>tpr</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>2163</th>\n", | |
" <td>0.765621</td>\n", | |
" <td>0.234379</td>\n", | |
" <td>0.000099</td>\n", | |
" <td>0.057143</td>\n", | |
" <td>0.765721</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" 1-fpr fpr tf thresholds tpr\n", | |
"2163 0.765621 0.234379 0.000099 0.057143 0.765721" | |
] | |
}, | |
"execution_count": 42, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"#ROC Analysis\n", | |
"i = np.arange(len(fpr))\n", | |
"roc = DataFrame({'fpr' : Series(fpr, index=i),'tpr' : Series(tpr, index = i), '1-fpr' : Series(1-fpr, index = i), \n", | |
" 'tf' : Series(tpr - (1-fpr), index = i), 'thresholds' : Series(thresholds, index = i)})\n", | |
"# Row where TPR and 1-FPR are closest (tf nearest 0), i.e. the candidate cutoff;\n", | |
"# .iloc replaces the deprecated .ix indexer for this positional lookup\n", | |
"roc.iloc[(roc['tf']-0).abs().argsort()[:1]]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 43, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[]" | |
] | |
}, | |
"execution_count": 43, | |
"metadata": {}, | |
"output_type": "execute_result" | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEPCAYAAABLIROyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XecVPXVx/HPAQGl2gsIiICCBREBxcZixRIxKhH1McYW\njVGfqEnEGGNNotGoUWMNjyZWjCWKGjurggVExQYK0hERBQWlyp7nj3NXhnW2wc7e2Znv+/W6r51y\nZ+6Z3dl77q+buyMiIsWnUdoBiIhIOpQARESKlBKAiEiRUgIQESlSSgAiIkVKCUBEpEgpAUi1zOxY\nM3s67TjyiZktMrOtUjhuRzMrM7OC+N81s/fNbO81eJ2+k3WgIL5ExcTMppnZYjNbaGafmtmdZtY8\nl8d09/vcfWAuj5HJzHY3sxeSz7jAzB4zs+71dfws8Yw0s5MyH3P3Vu4+LUfH28bMHjSzecnnf8fM\nzjEzKz98Lo5bW0ki2npt3sPdd3D3l6s5zg+SXn1/JwuVEkDD48Ah7t4a6AnsDFyQbkhrxswaZ3ms\nH/AM8CiwBdAJeBcYnYsr7mwxpMnMOgOvA9OBHdx9A2Aw0AtoVcfHWtvPvsaJqJbHtuRYVt2OUkvu\nrq0BbcBUYJ+M+1cBIzLuNwWuIU4gc4CbgWYZzw8C3ga+BiYBBySPtwb+AXwKzAQuByx57gTgleT2\nzcDVFWL6D/Cr5PYWwEPA58AnwFkZ+10M/Bu4G/gKOCnL53sZuDHL408BdyW3+ycxXgDMA6YAx9bk\nd5Dx2t8mz/0TWB8YkcT8ZXK7bbL/FcB3wGJgIXBD8ngZsHVy+07gJuCJZJ/XgE4Z8RwATAQWAH8H\nSrN99mTfuzP/nlme75gc+6fJ5/sc+F3G832AV5NjzQZuBNbJeL4MOAP4GPgkeex6YEbynRgL7Jmx\nfyPgd8Dk5LONBbYEXkre65vk8cHJ/ocS368FwChgxwrf3d8C44ElQGMyvs9J7GOTOOYA1ySPTwdW\nAouSY+1Kxncy2Wd74Nnk7zcHGJr2/2pD2FIPQFst/2Cr/8NsSVwdX5vx/HXECbkN0AJ4DPhj8lxf\n4sRb/votgG2S248SJ8p1gY2Jq9BTk+dOAF5Obu8FTM843vrEyXEz4grtTeDC5J97q+TEsX+y78XA\nMuBHyf1mFT7besTJtn+Wz/0zYHZyuz+wArgaaALsnZyIutbgd1D+2j8lr20GbAj8OLndAhgOPJpx\n7JFUOGEnJ6TMBDAP2IU4Yd4D3Jc8t1FyQhuUPHd28juoLAHMAU6o4u9fngBuIxJdD2ApsG3yfK/k\n72xAB+AD4OyM15cRJaw2rEqKxyZ/x0bAOUkMTZPnfkOcsLsk93cENsh4r8xEtzMwF+idHP944vva\nJOO7+xbQNuPYmd/nV4HjktvNgb4Zn3klyQVJlu9kS+LC5VfJ76QF0Cft/9WGsKUegLZa/sHiH2Zh\nspUBzwGtM57/psI/ZT9gSnL7VuCvWd5z0+QkkllSGAK8mNz+/p8tuT+N5CoROAV4Prm9KzCtwnsP\nBYYlty8GSqv4bO2Sz7RNlucOBJYlt/sDy4F1M54fDlxYg99B/+SzNqkijp7Alxn3syWAiiWA2zOe\nOwj4MLl9PDC6wmtnVHy/jOeWk5TKKnm+/GS4RcZjbwA/qWT//wUerhB3/2q+Y/NJrtyJksuhlez3\n/e8guX8zcGmFfSYCe2V8d0/I8n0uTwClyXdko0o+c6OMxzITwBBgXK7+5wp5WwdpiAa5+0gz2wu4\nj7hiX2hmmxBXTuNWtRfSiFV1p+2BJ7O8X0fianhO8jpLthmVHH84cAxRxD+WqLaAuOJsZ2bzk/uW\nHD+zkW9mFZ9rAXFS2YKoosi0BfBF5r7uvjTj/nSgbQ1+BwDz3H1F+R0zW4+oBjmQuBI2oKWZmSdn\nmBr4LOP2YuKqFOJqt+JnnlXF+3xJfNbqzM12
PDPrClxLXIWvB6wDjKvq+Gb2a+CkjOO2Ir5TEN+Z\nKTWIB+J79FMzO6v8rYnvVdvKjl3ByUTV40QzmwJc5u7Zvq8VtSeqG6WW1AjcMBmAu79C1GH/NXn8\nC+JksL27b5hs67t7m+T5mUDnLO83k7gq3ih5zQbJ63pUcvz7gaPMrANx1f9wxvtMyTj2Bu7ext1/\nlPHaSk+o7r6YqD8fnOXpnwDPZ9zfIDlxl+tAVANU9zvIFsN5QFei2mB9okoJViWNmiaBbOYQJ6hM\nW1ax//PAkWtxvFuACUDn5LNcyA8bT7//PGa2J1HNc1Ty99qAKF2Wv6ay70w2M4mqtsy/f0t3H57t\n2BW5+yfufqy7bwL8BXgo+RtX9/uvTYySQQmg4bse2N/MdkyuVu8Ark+uhDGzdmZ2QLLvMOBEMxtg\noa2ZbevunxENaNeZWavkua0r65/t7u8QV6r/AJ5294XJU2OARWb2WzNb18wam9n2Zta7Fp9nKHCC\nmZ1pZi3NbAMzuwLYDbg0Yz8DLjWzJklJ6BDgwRr8DrJpRTRKLjSzDYFLKjw/F1jT7o5PAjuY2WHJ\n7+NMor2kMhcDu5vZVWa2WRJ/FzO728xaJ/tU1RumFbDQ3RebWTfgF9XE14poE/nSzJqa2R9YvbfR\nP4DLzaxLEsuOZrZB8txnrP57uQM43cz6Jvu2MLODzaxFNTGQ7H+cmZWXPL4mTvxlRPtKGZWf5J8A\nNjezs5PP0LI8BqmaEkDDs9rVkLt/QZQC/pA8NJRoeH3dzL4iTuzbJPuOBU4kksbXRJ1rh+R1PyUa\n0D4k6oD/DWxeRRz3AfsC92bEUkb0AulJ1O1+TpwUWmd7g6wfzn00URVzJHH1PBXYCdjD3TOrIuYQ\nVUafElVQp7n7pOS58yv7HVTieqLa6AuiIfKpCs//DRhsZl+a2fXlodbw83xJlGiuTt6/G9FQvqyS\n/acQbRadgA/MbAHxtxhL9ILJduzM+78GjjOzhURD8QNV7AvRIPwMUeU2lSg9ZVZZXQs8CDxrZl8T\nCaG85HUp8C8zm29mR7n7OOBU4KakGvBjoq6+smNXfGxg8pkXEg35R7v7MndfAvyR6Ao8v+LJ3d2/\nAfYHDiOS0sdASZZjSQXl3fxy8+Zmw4gTwtzKqhPM7Aai0exb4GfJ1aVIpcysP3C3u3eoduc8kwzm\nmkV0W30p7XikuOW6BHAncTWXlZkdRNRVdgVOI3qpiBQUMzvAzNqYWTOiTh6im61IqnKaANx9FFFM\nr8wg4F/Jvm8AbcrrPUUKSD+il8rnRFvFIHfPWgUkUp/S7gbajtXrG2cnj83NvrsIJFUnDab6x90v\nZfUGbJG8oEZgEZEilXYJYDar95HeMnnsB8wsd63VIiIFzN2zdh2ujxJA+ajSbB4nuh9iZrsBX7l7\npdU/azTcedEifPx4/Kmn8GHD8Msvx085Be/fH+/YEV9vPXyXXfCjjsKHDsVvvBF/5BF83Dh86dI6\nG3J98cUXpz7sW/GnH0cxxt+QY8+Mf/BgZ/jw9OOp7VaVnJYAzOw+oj/uRmY2gxjk0jTO5X67uz+V\nDBSZTHQDPbHOg2jZEnr0iC2bJUvgnXdg2jSYMgU+/BCefx4++QQmT4auXaFdO2jfHrbcMn62bw+d\nOsHWW4NVNSZHRApFWRk0KrBK85wmAHc/tgb7nJnLGKq13nrQr19sFS1ZAhMnwuzZMGsWzJwJpaXx\nc8IEWLkSevaMxNCuXfzs3DkSRLt2kXxEpCAoARSb9daDnXeOrSL3SArvvx8JYvZsGDsW7r9/VcJo\n1iySwY47UtK8OTz2GGy1FXTsCOuvX+8fZ22UlJSkHcJaUfzpacixw6r4CzEB5HQkcF2q3cSMecAd\nFiyIaqUJE6Ka6aOPoqpp2jRYZ51VyWCrrVZtnTqtShCqXhLJG4cdBqecEj8bEjPDK2kEVgkgV8xg\nww1j690b
jj9+1XPuMH8+TJ++KiFMmwYjR8LUqfE4QIcOkQzKf26zDWy/fVQzraM/nUh9KsQSgM4i\naTCDjTaKrVev7Pt89RXMmBHJYMaMSBB33hmN1HPmRFLo0AG6dYvG6PKG6W23hRY1mnxRRGqhrKzw\nCuVKAPlq/fVjy9Z76ZtvViWHCROimqm0NH5+8kk0QLdtGwmie3fYbjvYccdopG7atN4/ikghcFcJ\nQPJBy5ZxUt9uOzjooNWfW7w4eil9+mlUJ02YADffHL2Z5s2DPfeMZLDNNqu2tm0L79JGpI6pCkjy\nX/PmUQ207bYwYMDqzy1cGCWFiROjx9K998LHH0eJomvXSAY9esAuu0TJoX37wvvGi6whJQBp2Fq3\nji4MFbsxfP01TJoUyeDNN+Gvf42Sw4IF0ejcrVs0Qu+2G/TtC5tskk78IikqxASgbqBSuUWLovvq\n5MnRvvDqqzBuXIyP6NIlSgnbb79q23xzVSVJQZk6NfpdfPgh/Pa30VGvoQ1rqKobqBKA1E5ZWQx0\nmzw5SgkffBDb++9D48bR5XWPPeK/ZNdd4zGRBujii+HWW1f1o+jXDwYPhnXXTTuy2lECkNwrHxk9\nZgyMHg3PPRelhm7dYIcdIjH07Blbq1bVv59IioYPhyFD4tpmu+3SjmbtKAFIOhYtirLze+9Fo/P4\n8XG7ffsoHfTtG+0KPXpAkyZpRysCwFFHwcMPx8wtDW3UbzZKAJI/Vq6MRDBuHLzxBrz+egxy69kz\nksFuu0X10cYbpx2pFKkmTWI4TYcGs+Zc1ZQAJL8tXBglhDfeiOqjV16JUdL77BNdUnv3jq3QumBI\n3njgAZg7F0aNgmeeiY5xhdKfQQlAGpayspg478UXoxfSq6/G3EkDB0bV0e67x2C2QvkPlVR9/TVs\nuimcfnrU9x94YMzLWCiUAKThmzw5Fup5/XV44YWY0mL33aOE0KsX7LQTtGmTdpTSAE2dGmMmp01L\nO5LcUAKQwvLdd9EFdezYGLj29tvRrtCjB/zP/8Dee6uEINWaORMuvTS+QsuXR3+FQqQEIIXvu+/g\n7rvhtdfg2Wfjfo8e0Zfv4IPVqCxAfC3+8IcY9D5mTPTtHzIk+vp365Z2dLmhBCDFxT2mthgzBh56\nKIZvbr89HHpojOTp2jXtCCUln3wSJ/q7747psnbYofB7ICsBSHFbtgxeegkefTQSQvPmsN9+MedR\nA1uaU9bOmDFwxhlR7VMsqkoA6lcnha9ZMzjgALjlFvj886gimj8/Onofemj0/Vu5Mu0oJUdmzYJH\nHoGf/hR+8pNYpE+CSgBSvBYuhH/8A+66KzqBDxgAv/99VBepAbkgfP551PjtsQf06RP9A3r3Lq4O\nY6oCEqnO1Klw5ZUwYkTM9lVSEnMC7L9/4VcSF7DXX4ezz46qn2KlKiCR6nTqBLfdFkttjhgRrYNX\nXBGrpR17bIxOlgbj3Xdj+uZDDokhIpKdSgAiVZk6Fe6/PzqMd+8ecxUdemgsxamprvPGpEnw1FPw\n2Wdxu7Q0/kTnnhvTTBVzjZ6qgETW1rJlMdhs1KhoN1ixAgYNgsMPj3WWJVXnnRc9e/bbLwptBx0U\nP0UJQKRuucfEdf/9L/zpTzHq+E9/ivYClQpScdJJ0dB78slpR5J/1AYgUpfMoiro0ktj3eRTT426\nhk6d4MgjY7zBd9+lHWVRWLEi5u4fP15DOtaESgAidcE95icaPToak2fNgl/9Klohu3VTT6I69Oqr\nMbj7rbdiwthOnSIfX3ghtGuXdnT5R1VAIvXttdfgX/+KQWdbbRWTzK+zTtpRNWiTJ8dJ/sEH4cwz\nYyLYPfeEzp2Lu5G3OkoAImlZvDhaJqdMgeOOg7POgo4ddcaqpaVL40p/113h+usLa77+XFMCEEnb\nxIlwzTXw5JPRUHzaaXDEETHqWH5g0aIoRE2dGss/PPkkbLFFTNnctGna0T
UsSgAi+eSNN2JswT//\nGaOU7r1XldcZpk+PK/yePWPbccfo5aNG3jWjBCCSjxYvht/8JuYiuuiimJi+yOs2li2Dxx+Hv/89\nBnPJ2lM3UJF81Lx5nOleeCG6teyyS3RmLy2NXkUFbtmymJ//ySdj2oZddomZOs89F/r3Tzu64qAS\ngEi+WLkShg2LSem22w5+/GM4+mho2TLtyOrUO+/EtMyzZ8dCbdtuGwWfwYNjha4C+7ipS7UKyMwG\nAtcTpY1h7n5VhedbA/cAHYDGwF/d/a4s76MEIMVhwYKYbuLaa+Gcc+KSuAC6kK5cGUsx3ngjfPll\n/Nx007SjKnypJQAzawR8DOwLfAqMBYa4+8SMfS4AWrv7BWa2MfARsJm7f1fhvZQApLg89RQMHQrz\n5sEpp8Qlco8eaUdVa998E7NmXHhhVPvsuGP0hj3wwLQjKw5VJYBcX1b0BSa5+/QkkAeAQcDEjH0c\naJXcbgV8WfHkL1KUDj44trFjYfjwmGto2LCYjTTPffMN3HBDzK5dWho9Xy+6CI4/Pu3IJFOuE0A7\nYGbG/VlEUsh0E/C4mX0KtASOznFMIg1Lnz6x7bUX/Oxn0T5w990xoCyPTJoEd94ZBZdJk2JNnUMO\nibx16KGxMqfkl3yoWDwQeNvd9zGzzsBzZtbD3b+puOMll1zy/e2SkhJKSkrqLUiR1A0aFPUml10W\n9SgvvQQ775xqSJ98Ai+/HCtv3X57tFlffXXkqLZtNeA5DaWlpZTWsA9trtsAdgMucfeByf2hgGc2\nBJvZE8Cf3X10cv8F4Hx3f7PCe6kNQKTcr38Njz0GN98cl9j17KOP4kQ/bFj06OnVK+bgb4BNFAUv\nzTaAsUAXM+sIzAGGAMdU2Gc6sB8w2sw2A7YBpuQ4LpGG7fLL4zJ70KBIAKefDgMH5vSSe+ZMeO+9\n6Ld/881wxhlR1dOlS84OKTlWX91A/8aqbqBXmtlpREngdjPbArgL2CJ5yZ/d/f4s76MSgEhF8+dH\nA/E110SbwIsv1snbfvddTGD69tsxRm36dPjqq7jSb98+eqZqGqOGQVNBiBS6lStjzeLmzWOZysMP\nj3mGalEiKCuLNXU/+ijyyccfw4AB0YDbpQtss01BDEcoOkoAIsVg5cpYLeWxx+C+++Cqqyrtd7lw\nYVTnjB0bE5WWlsZVfuvWsNlmMRXDSSel3sYsdSDNNgARqS+NG0dX0b32itlF//a3KAEcdxyY4R71\n+H//O9xxR5zoBwyIgsPJJ0eVTvPmaX8IqU8qAYgUoKXzF/PB7+9n23t+z1ubDOS8pjfy3tSWLFsG\nRx0F558PvXunHaXUB1UBiRQgd1i+HKZNi+qcl1+OOXaWLIExY2L+/MF7zuGk0p+yQeOFNP73A6y3\nXae0w5Z6pgQgUiCmTo215++6K0bcrlgRA646doyan223hXXXjdWz+vZN1qJfujTaA266CU49NVYj\ny7NRxJI7SgAiDURZWUyY9s03sSTiI4/At9/GBKHjxsVyiN27r5oVonPnWrz5u+/GjGyffhorkm2z\nTa4+huQRJQCRPPTpp/DmmzFh2pw5sVLkmDFxVb/eetGOe8IJ0KEDtGkTJ/vOnddyrNeKFbEK2f/9\nH3zxhRbYLQJKACJ5oKwMPvggBlg98UQMsurdG7p1g002iS6Xu+0WvXNybu+9Y5rpq66Cww6rhwNK\nWpQAROrZF1/EgKrJk+NEP2lS1MA0axbn3qOOihP+FltU/145UVYGDzwQbQKHHRbdgnr2TCkYySUl\nAJEcWbgQPv88TvJvvhkX1W+9FbNkdu8e1ezdu8fknVtuGVMp5JUZM2JGt9tug4svhp//PMYTSMFQ\nAhBZC2VlMVp28eKYKmHcuNimT4/pEjbccFXDbPv2sPnmsO++DWz++5deimW6Nt88Gh7694+MJQ2e\nEoBIFl99FVfqn30WV/FffRX96d99Ny
ZDK99mzIiT/OabwwYbRE1Jz54xcrZ9e9hoo7Q/SR1ZuhTu\nuSf6l44cCVdcAb/8ZdpRyVpSApCitmIFjB4dJ/kFC+D992ON2pkzoVMn2GqrOLm3bBkzLPfqFX3p\n11knto02go03LrLFTa68MjLilVemHYmsJc0FJAVv3jx49tnoM79sWQyWeuKJGBm7fHnUxW+3XXSn\n7NIlqrwHDIBGjdKOPE+1bh11XFLQlACkQViyJKpi5s6NKpoPP4RZs+Drr+PK/rPPYNddo0qmadPo\nO//UU3GF36yZpjGutfXXj5lF33wTdtmlyIo/xUP/FpI3ysqi++SSJVFNM25c1MlPmRL957fcMqpq\nunePqe733z+u6DfbLAZNrb9+2p+ggPz4x1ECOOSQWOfx0kuhXz8lggKjNgCpV+5xXvn005ju4P33\nY0766dPjhN+yZUxJ3KVLXNH36AFbbw077BDPST1bvBhuuQX++MdIAP/+t+aMbmDUCCz1bsmSuJp/\n7z0YPz5O8vPmxZV8o0Zxgm/ePK7m+/aNKps+fXSSz1vLl8eAsa+/jmQwYIBKAw3EWicAM2sKdHD3\nyXUdXE0pAeSnWbPipD59ekxLPGXKqmmJN9ooGl932SVO8pttFtMebLaZzh0N0sKFcO210YL+r39F\nHZzkvbVKAGZ2CHAt0NTdO5lZT+Bid/9x3YdaZRxKAClavjyu5t97L0a6vvlmnPBXrIj6+I4doztl\n+/YxhqhjR/WwKVh33QUnnhiZfq+90o5GqrG2CWAcsC8w0t13Th57z913rPNIq45DCaCerFwZc9iM\nHw+PPw6vvx6zVbZtC7vvHnPO77FH9LDZckud6IvSBRfAK6/E+pI77ZR2NFKFtR0HsMLdv7LVy+w6\nExcQ9xgR+/DDcVH31lsxHXHbtrGc7AUXxNV9ixZpRyp54/zzo9vWQQfB7Nmq02ugalICuBP4L3Ah\ncDhwNtDC3X+e+/BWi0MlgDo2bRqMGAE33BCNtvvtB0ccEVMc1GqhESlO7tFyP3dujBg++mgNuMhD\na1sF1AL4A3BA8tAzwKXuvqROo6yGEsDaW748Fhx58skovU+YEO14Z54ZVTq6iJM18uSTMHRoLG5w\nxx1KAnlmbRPAEe7+SHWP5ZoSQO25R919aWlMk/Dww9C1KwwcGA21JSUNbMZKyV+zZsHBB8ciB8OG\nKQnkkbVNAG+5e68Kj41z913qMMZqKQHU3MKFcVF25ZVRPbvbbnGlf+yxsfKUSE7MmwcHHhiDOy69\ntA7Wr5S6sEYJwMwOBAYCxwL3ZjzVGtjJ3fvUdaBVUQKo3MqVUaXzyiux3ODbb0eVzkknRbWs/gel\n3syZA+ecE8XOc86JxmJJ1Zr2AvoceB9YCnyQ8fgiYGjdhSdrYsmSmDrhlVdiTM7KlTF9ywUXRGOu\nqnYkFVtsEUtNDhsWc3BLXqs0Abj728DbZnavuy+tx5ikEmVlMGoUvPgi3HRTzJHTpw/ceCPss4/6\n40se2XDDmKZV8lpNWmramdkfge2AdcsfdPdtchaVfG/WrJjX/oEHYmBWhw7RgDtqVEyrIJKXdtop\nSgBnnhnF0nbt0o5IsqjJNeNdwJ2AAQcBDwLDcxhT0Vu2LKp1Bg2KWTBffBHOOw8++iiSwA036OQv\neW7rrVetrdm1a8wGKHmnRlNBuPsumdM/mNmb7t67XiJcFUdBNwJ/8w08/zw8/TT85z+xetUxx8Dg\nwZrnXhq4006LEsEZZ6QdSVFa26kglplZI+ATMzsdmA20qssAi9XixfDII/Dcc7F6VY8e0ZV65Mjo\nSSdSELp3j6qg9dePvsiSN2pSAtgV+BDYAPgj0Aa4yt3rtYm/UEoA330XPXeeey7m0erXD370Izj0\n0JhBU6QgvfxyjEC87TY4/vi0oykqdb4gjJm1c/fZax1Z7Y7ZYBPAZ59Fl82RI+Huu6M9bP/9Y6K1\nHj
3Sjk6knowdG0tMnnpqTB3RShUJ9WGNq4DMrA/QDhjl7l+Y2fbA+cA+wJY1PPhA4HqiwXmYu1+V\nZZ8S4DqgCTDP3QfU5L3zlXvMlz9iRPTWeecd6NUrFkUpr97R4CwpOn36xACxE0+Mf4Arrkg7oqJX\n1UjgPwNHAuOBTsATwBnAVcAt7r642jePtoOPifUEPgXGAkPcfWLGPm2AV4ED3H22mW3s7l9kea+8\nLwHMmgX//OeqgVmHHx5X+n36RLdoESHWGB4/Hm69Ne1IisKalgAGEVM+LDGzDYGZwI7uPqUWx+4L\nTHL36UkgDyTvOzFjn2OBh8urlLKd/POVeyxqPnx4zL0zbRr85CexYNKuu2pglkhWrVrBokVpRyFU\nPQ5gafmUz+4+H/i4lid/iOqjmRn3ZyWPZdoG2NDMRprZWDPL+xaisjJ44w3Yc89owF2yBG6+OaZB\nue22aNjVyV+kEq1axT/QjBlpR1L0qioBbG1m5VM+G9Ap4z7ufkQdxtCLaFdoAbxmZq+luQB9Nl98\nEStlPf10jMw1g5//HM49V/X5IrWy225RRO7YET7+OAaKSSqqSgBHVrh/0xq8/2ygQ8b9LZPHMs0C\nvkjmG1pqZi8DOwE/SACXXHLJ97dLSkooKSlZg5Bqrqws5tC/775ou+rZM67u77kn6vV14hdZA5tt\nBvcmEwyffDJcf330kpA6UVpaSmlpaY32XaNuoDVlZo2Bj4hG4DnAGOAYd5+QsU834EZi6ulmwBvA\n0e7+YYX3qrdG4OXLowfPZZdFVc6vfhVVPWrIFalD33wDd94JZ58N770X855InVvbkcBrzN1XmtmZ\nwLOs6gY6wcxOi6f9dnefaGbPAO8CK4HbK57868vixfDXv8I118R38Xe/i0ZdXemL5EDLlnDWWbGC\nUa9eMHFizCEk9SanJYC6lKsSwPLlsVzif/8L998fc+lffjlsu22dH0pEKjNoELRtGwPENCS+TlVV\nAqhxXxUzK6glRpYtg7/8Jdav+NOf4udbb8GDD+rkL1LvLroo1g/YbrsYKTylth0OZU3UZC6gvsAw\noI27dzCznYBT3P2s+ggwI446KQHMmAF33BFjUPr1iySgqZVF8sTMmVEt1LRpXI3JWlvbEsANwKHA\nlwDuPh5ocFM1TJkSs9LusEN06XzlFXj8cZ38RfJK+/Zwyinw7bdpR1IUapIAGpWP5M2wMhfB5MKK\nFXHF37s3bL45TJgQI9F14hfJU82bx+hKybma9AKamVQDedKt8yxifp+89vXXMS/PdddF/f5TT8X4\nExHJc+tMepw0AAARsElEQVStF13yJOdqUgL4BXAuMaBrLrBb8lhemjAhVtLaait46aUYbzJ6tE7+\nIg2GEkC9qUkC+M7dh7j7xsk2JB8nbCsrg/PPjymXu3ePLsUPPwy7765+/CINStu2cSU3ZEjakRS8\nmlQBjTWzj4iF4B9x97yaxm/mzFgk/f77YeONYdKkqOsXkQZq001hwYL4h375Zdh777QjKljVlgDc\nvTNwBbAL8J6Z/cfM8iI1f/55DCBcsiQGc73zjk7+IgWhZcvoq33IIfDLX8bc61LnajUSOFkX4Hrg\nOHdvnLOosh97tXEAS5fCPvvExcGVV9ZnJCJSbz7/HDp0iKX1evdOO5oGaa3GAZhZSzM7zsxGEJO5\nzQN2r+MYa+3mm6F16xjFKyIFatNNY5TwEUfET5UE6lRNRgJPA0YAD7r7K/URVCVxrFYC2GWXmLRt\nQIMbkiYiteIOY8ZASUmUCLSYfK1UVQKoSQJo5O5lOYmsFjITwLRpMR//nDmwTk7nMxWRvNG2LYwd\nC+0qLiooVVmj6aDN7K/ufh7wsJn9IEvU4YpgtfboozF5oE7+IkWkTRuYN08JoA5VdQodnvxck5XA\ncurRR6PPv4gUkYEDoyF4+nQlgTpSaSOwu49JbnZ39xcyN6B7/YT3
Q3Pnwrvvwr77phWBiKTiuutg\nzz3h/ffTjqRg1GQk8ElZHju5rgOpqccfjwuBdddNKwIRSc2uu8Lvf592FAWj0kZgMzsaGAKUACMz\nnmoFrOPu9dr/prwReOBAOOmkWKpRRIrM0qUxV9CTT8KBB0Ljeh2O1CCtUS8gM+sEdAb+DAzNeGoR\n8La7r6jrQKtiZv7NN84mm0RPsJYt6/PoIpI3bropFu/u3TvmgFFvkCqtVTfQfGFm/sorzumnqwpQ\npOjNmROTxX35Jbz+uq4Iq7BGI4HN7KXk5wIzm5+xLTCz+bkKtipTp0KPHmkcWUTyyhZbwMiR0Rg4\ndmza0TRYVTUCl9fxbwxskrGV3693n30Gm22WxpFFJO80agT77w+DB8N558UYAamVqrqBlo/+bQ80\ndveVQD/gNKBFPcT2A5MnQ+fOaRxZRPLSn/8cDcLPPRdVQVIrNekG+h9iOcjOwJ1AV+C+nEZViYUL\nYYMN0jiyiOStXXeNK8MV9dovpSDUJAGUJT1+jgBudPdzgFSG4S1apHmgRCSLpk1h+fK0o2hwarQk\npJkNBo4Hnkgea5K7kCo3f74SgIhk0bSpSgBroKYjgQcAf3H3Kcn4gPtzG1Z2kydD165pHFlE8lqT\nJioBrIFqR1C4+/tmdjbQxcy6AZPd/Y+5D+2H5s/Xko8ikoVKAGuk2gRgZnsBdwOzAQM2N7Pj3X10\nroOraNNNNehPRLJQCWCN1OR0eh1wsLt/CGBm3YmEUO8LdHboUN9HFJEGQY3Aa6QmbQBNy0/+AO4+\nAWiau5Aqp0FgIpJV8+YxHkDzxNRKTRLAW2Z2q5ntmWy3AG/nOrBslABEJKtzzol1Yg84AJ5/Pu1o\nGoyaJIDTgSnAb5NtCjEauN5tvHEaRxWRvLfhhnDVVXDZZfCjH8G//512RA1ClW0AZrYjMSX0o+7+\nl/oJqXLrr592BCKSt8zglFOiPeDqq2HAAF01VqOq2UB/R0wDcRzwnJllWxmsXikBiEi1jjwy+ov3\n65d2JHmvqiqg44Ae7j4Y6AP8on5CqpySuYhUq0ULePRRmDULhg9PO5q8VlUCWObu3wK4+7xq9q2U\nmQ00s4lm9rGZnV/Ffn3MbIWZHVHZPu1SmYFIRBqcxo0jCZx8Mtx7b9rR5K2qloT8Cnix/C4xHUT5\nfdy90hN1xns0Aj4G9gU+BcYCQ9x9Ypb9ngOWAP/n7o9keS+fMcNp374Gn0pEBOCWW+Cxx+Dpp9OO\nJDVVrQhWVSPwkRXu37QGx+4LTHL36UkgDwCDgIkV9jsLeIioaqqUuoGKSK1svz3cl8rs9Q1CpQnA\n3V+og/dvB8zMuD+LSArfM7O2wOHuPsDMVnuuoqapDD8TkQarRQv49tu0o8hba1SvX8euBzLbBrIW\nVUREaq1FC1i8OO0o8laup1abDWTO4LNl8lim3sADZmbEesMHmdkKd3+84ptdcskl398uKSmhpKSk\nruMVkUKy0Ubw0Ufwt7/BGWfEpHEFrrS0lNLS0hrtW2kj8A92NGvm7stqE4iZNQY+IhqB5wBjgGOS\n+YSy7X8nMKKyRuCaxioi8r1XXoHTT4cLLoD/+Z+0o6l3VTUCV1sFZGZ9zew9YFJyfyczu7EmB04W\nkj8TeBb4AHjA3SeY2Wlm9vNsL6nJ+4qI1Nhee8EvfgE//zl88EHa0eSVaksAZvY6cDTwH3ffOXns\nfXffoR7iy4xDJQARWTPuMVfQlVfC+edHaaBIrFUJAGhU3o0zw8q1D0tEpJ6YwdChcOON8EJddHAs\nDDVpBJ6ZdM/0pE7/LGJwl4hIw9KuHazU9Wu5mpQAfgGcS/TmmQvsRh7MCyQiUmtNmmjt4Aw1WRT+\nc2BIPcQiIpJb66wD332XdhR5oyaLwt9Blt457p6tF4+ISP5SCWA1NWkDyFxfbV3gx6w+vYOISMOg\nBLCamlQBrTahtpndDYzKWUQi
IrmiBLCaNZkLqBOgeTlFpOFRG8BqatIGsIBVbQCNgPnA0FwGJSKS\nEyoBrKa6ReEN2IlVE7iVaTiuiDRYTZrA11/HFNEtWqQdTeqqrAJKTvZPufvKZNPJX0Qark02gZ12\ngvbt4bzzYO7ctCNKVU3aAN4xs51zHomISK41awalpfDGGzBpUkwNUcQqTQBmVl49tDMw1sw+MrO3\nzOxtM3urfsITEcmBrl3hsMNg9GhYsCDtaFJTVQlgTPLzMGBb4GBgMHBU8lNEpOHab7/42akTXHxx\nurGkpNLpoM3s7fLpn/OBpoMWkZx49NFIAO++m3YkOVHVdNBV9QLaxMzOrexJd792rSMTEUlbly6x\nXkARqioBNAZaokXaRaSQNW0Ky5enHUUqqkoAc9z9snqLREQkDU2bwrJaLXdeMKpqBNaVv4gUvmbN\nirYEUFUC2LfeohARSUsRVwFVmgDcfX59BiIikgolABGRIrXuurB0KRx6KNx1V9rR1CslABEpbk2b\nwkcfwfHHw9lnx1YkKh0Ilm80EExEcu6dd6BfP1iyJO1I6kxVA8GUAEREypWVrVo0plFhVJBUlQAK\n4xOKiNSFRo2iW+jSpWlHUi+UAEREMq23XkFVAVVFCUBEJFNZGTz/fFHMD6QEICKS6fe/hyFD4L33\n0o4k55QAREQy/frXsNdeRbFQjBKAiEhFrVvDwoVpR5FzSgAiIhW1agWPP552FDmnBCAiUtEJJ8BD\nD6UdRc5pIJiISEUrVsQUEd9+C82bpx3NWtFAMBGR2mjSBPr0gWeeSTuSnFICEBHJpn9/uOWWtKPI\nqZwnADMbaGYTzexjMzs/y/PHmtn4ZBtlZjvmOiYRkWr97nfw/vvw05/Chx+mHU1O5DQBmFkj4Cbg\nQGB74Bgz61ZhtynA3u6+E3AFcEcuYxIRqZENNoDx46FbNxgwAI44ItoGCkiuSwB9gUnuPt3dVwAP\nAIMyd3D319396+Tu60C7HMckIlIzm2wSJYEZM2Kq6FdfTTuiOpXrBNAOmJlxfxZVn+BPAf6b04hE\nRGqrWTPo2DHmCSog66QdQDkzGwCcCOxZ2T6XXHLJ97dLSkooKSnJeVwiIt9rAF3RS0tLKS0trdG+\nOR0HYGa7AZe4+8Dk/lDA3f2qCvv1AB4GBrr7J5W8l8YBiEh69tkHLrwQ9t037UhqJc1xAGOBLmbW\n0cyaAkOA1cZXm1kH4uR/fGUnfxGR1FnWc2iDltMqIHdfaWZnAs8SyWaYu08ws9Piab8duAjYELjZ\nzAxY4e59cxmXiMgaKbBaiJy3Abj708C2FR67LeP2qcCpuY5DRGStFGAJQCOBRURqqsBKAEoAIiI1\noRKAiEgRUwlARKQIqQQgIlLEVAIQESlCKgGIiBQxlQBERIqQSgAiIkVMJQARkSKkEoCISBFTCUBE\npAiZKQGIiBQlVQGJiBQxlQBERIqQSgAiIkVMJQARkSKkEoCISBFTCUBEpAipBCAiUsRUAhARKUIq\nAYiIFDGVAEREipBKACIiRUwlABGRIqQSgIhIEVMJQESkCGk6aBGRIqUqIBGRIqYSgIhIEVIJQESk\niKkEICJShFQCEBEpYioBiIgUIZUARESKmEoAtWNmA81sopl9bGbnV7LPDWY2yczeMbOeuY5JRKTW\nVAKoHTNrBNwEHAhsDxxjZt0q7HMQ0NnduwKnAbfmMqa0lJaWph3CWlH86WrI8Tfk2KFC/CoB1Epf\nYJK7T3f3FcADwKAK+wwC/gXg7m8AbcxssxzHVe8K6p+gAVL86WnIsUNG/CoB1Fo7YGbG/VnJY1Xt\nMzvLPiIi6VMJQESkCBXgZHDmOfxAZrYbcIm7D0zuDwXc3a/K2OdWYKS7D0/uTwT6u/vcCu9VWL95\nEZF64u5Z66/WyfFxxwJdzKwjMAcYAhxTYZ/HgV8Cw5OE8VXFkz9U/gFERGTN5DQBuPtKMzsTeJ
ao\nbhrm7hPM7LR42m9396fM7GAzmwx8C5yYy5hERCTktApIRETylxqBRUSKlBKAiEiRUgIQESlSSgAi\nIkVKCUBEpEgpAUjqzGyYmc01s3er2e9sM/vQzO6uYp/+Zjai7qOsPTP7kZn9Nrk9KHMiRDO71Mz2\nqcdY+ptZv/o6njQMSgCSD+4kZoytzi+A/dz9+Gr2y4u+ze4+wt3/ktw9nJgRt/y5i939xbo8npk1\nruLpEmD3ujyeNHxKAJI6dx8FLKhqHzO7Bdga+K+Z/a+Z9TGzV81snJmNMrOuWV7T38zeNrO3kv1a\nJI//2szGJOtPXFzJ8RaZ2bVm9r6ZPWdmGyWP9zSz15LXPmxmbZLHzzazD5LH70seO8HMbkyuvA8D\n/pLE0snM7jSzI8zsQDN7sELMI5LbBySf8U0zG25mzbPEOdLMrjOzMcDZZnaomb2efN5nzWyTZCT+\n6cCvkuPvYWYbm9lDZvZGsik5FCN316Yt9Q3oCLxbzT5TgA2S2y2BRsntfYGHktv9gceT248D/ZLb\nzYHGwP7AbcljBowA9sxyrDJgSHL7IuCG5Pb48v2BS4Frk9uzgSbJ7dbJzxMyXncncETG+98JHJHE\nNA1YL3n8ZmK6lI2AlzIe/y1wUZY4RwI3Zdxvk3H7ZODq5PbFwLkZz90L7J7cbg98mPZ3QFv9b7me\nC0ikLlmyAawP/Cu58neyT2syGrjOzO4FHnH32WZ2ALC/mb2VvFcLoCswqsJrVwLlV+b3AA+bWWvi\nBFu+7z8z9hkP3Gdm/wH+U9MP5DFdytPAj8zsYeAQ4DdElc12wGgzM6AJ8FolbzM843b7pESxRfKa\nqZW8Zj+ge/LeAC3NrLm7L65p7NLwKQFIXjKzLYmrcwdudffbK+xyOfCiux+RVHGMrPge7n6VmT1B\nnFRHmdlA4qT/Z3e/o5YhlbcrVDYp4SHA3kRVz4VmtkMt3ns4cCZRDTbW3b9NTszPuvtxNXj9txm3\nbwSucfcnzaw/ceWfjQG7eizUJEVKbQCSLzKv7nH3We6+s7v3ynLyB2hNVLtAJRMImtnW7v6BR0Ps\nm8C2wDPASRntAW3NbJMsL28MHJXcPg4Y5e4Lgflmtkfy+PFENQ1AB3d/CRiaxNaywvstSh7P5iWg\nF3AqsWoewOvAHmbWOYmzebZ2jixaA58mt0+o4vjPAv9bfsfMdqrBe0uBUQKQ1CWNpq8C25jZDDOr\nbEbYzN49VwNXmtk4Kv8e/8rM3jOzd4DlwH/d/TngPuC1pNvpv/nhyRriqrqvmb1HVMdcljx+AnBN\n8p47AZeZ2TrAPWY2HhgH/C1JFpkeAH6TNM52yvws7l4GPAEMTH7i7l8APwPuT973VSKBVfU7gWiX\neMjMxgLzMh4fAfy4vBEYOBvobWbjzex9Yj1uKTKaDVQkCzNb5O6t0o5DJJdUAhDJTldGUvBUAhAR\nKVIqAYiIFCklABGRIqUEICJSpJQARESKlBKAiEiRUgIQESlS/w/fhk21wz6W/AAAAABJRU5ErkJg\ngg==\n", | |
"text/plain": [ | |
"<matplotlib.figure.Figure at 0x137af5c0>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"fig, ax = plt.subplots(1)\n", | |
"plt.plot(roc['tpr'])\n", | |
"plt.plot(roc['1-fpr'], color = 'red')\n", | |
"plt.xlabel('1-false positive rate')\n", | |
"plt.ylabel('True Positive Rate')\n", | |
"plt.title('Receiver Operating Characteristic')\n", | |
"ax.set_xticklabels([])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Quick Insights\n", | |
"1. The optimal cutoff, where the TPR and 1 - FPR curves cross in the graph above, is approximately 0.06 (0.057 in the ROC table)\n", | |
"\n", | |
"2. Any predicted probability above this value is labelled 1 (fraud)\n", | |
"\n", | |
"3. Any predicted probability at or below it is labelled 0 (not fraud)\n", | |
"\n", | |
"4. The TPR at this threshold is about 76.6%\n", | |
"\n", | |
"5. The FPR at this threshold is about 23.4%" | |
] | |
}, | |
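A minimal sketch of how a cut-off like this might be found programmatically rather than read off the plot, assuming the rule is to pick the threshold where TPR and 1 - FPR are closest. The helper names `rates_at` and `balanced_cutoff` and the toy scores and labels are hypothetical; in the notebook the rates would presumably come from `roc_curve` applied to the forest's predicted probabilities.

```python
def rates_at(threshold, scores, labels):
    """Return (TPR, FPR) when scores above `threshold` are labelled 1."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / float(pos), fp / float(neg)


def balanced_cutoff(scores, labels):
    """Pick the candidate threshold where TPR and 1 - FPR are closest."""
    best = None
    for t in sorted(set(scores)):
        tpr, fpr = rates_at(t, scores, labels)
        gap = abs(tpr - (1 - fpr))
        if best is None or gap < best[0]:
            best = (gap, t, tpr, fpr)
    return best[1], best[2], best[3]


# Toy data (hypothetical): predicted fraud probabilities and true labels.
scores = [0.01, 0.02, 0.03, 0.04, 0.08, 0.30, 0.50, 0.90]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

cutoff, tpr, fpr = balanced_cutoff(scores, labels)
print(cutoff, tpr, fpr)
```

On this toy data the two rates meet at a cut-off of 0.04, with a TPR of 0.75 and an FPR of 0.25.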
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Re-scoring the random forest model with the new cut-off" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 44, | |
"metadata": { | |
"collapsed": false | |
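}, | |
"outputs": [], | |
"source": [ | |
"# Predicted probability of the positive class (1) for each test row\n", | |
"prob = clf_forest.predict_proba(X_test)[:,1]\n", | |
"# Apply the 0.06 cut-off: 1.0 if above it, 0.0 otherwise\n", | |
"prob = (prob > 0.06).astype(float)" | |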
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 45, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([ 0., 0., 1., ..., 0., 0., 1.])" | |
] | |
}, | |
"execution_count": 45, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"prob" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 46, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" precision recall f1-score support\n", | |
"\n", | |
" 0 0.97 0.78 0.87 34184\n", | |
" 1 0.27 0.76 0.40 3594\n", | |
"\n", | |
"avg / total 0.90 0.78 0.82 37778\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print(classification_report(y_test, prob))" | |
] | |
}, | |
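As a rough consistency check, the confusion-matrix counts can be approximated from the report above. This is only a sketch: the recalls in the report are rounded to two decimals, so the counts are approximate, and the variable names are illustrative rather than taken from the notebook.

```python
# Values copied from the classification report above (recalls are rounded).
support_neg, recall_neg = 34184, 0.78   # class 0
support_pos, recall_pos = 3594, 0.76    # class 1

tp = recall_pos * support_pos    # class-1 rows correctly flagged
fn = support_pos - tp            # class-1 rows missed
tn = recall_neg * support_neg    # class-0 rows correctly passed
fp = support_neg - tn            # class-0 rows wrongly flagged

precision_pos = tp / (tp + fp)   # share of flagged rows that are truly class 1
fpr = fp / support_neg           # share of class-0 rows that get flagged

print("approx. TP=%d FN=%d TN=%d FP=%d" % (round(tp), round(fn), round(tn), round(fp)))
print("precision (class 1) ~ %.2f, FPR ~ %.2f" % (precision_pos, fpr))
```

The recomputed precision of about 0.27 agrees with the report. The FPR of about 0.22 is close to, though not identical to, the 23% quoted under Quick Insights, which is plausible given the rounding in the report.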
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Conclusion\n", | |
"\n", | |
"As the table above shows, precision for class 1 has dropped to 27%, whereas recall (sensitivity) has risen to 76% from a mere 56% in the previous model.\n", | |
"\n", | |
"For fraudulent activity, a false negative costs much more than a false positive.\n", | |
"\n", | |
"Hence, it is acceptable to flag more legitimate customers as potential fraud rather than let a fraudulent customer get away with the act.\n", | |
"\n", | |
"Predicting more customers as 1 decreases precision but increases sensitivity.\n", | |
"\n", | |
"Wrongly suspected customers can be put through an additional security check, such as answering a\n", | |
"personal question, providing an SSN, or having the account temporarily frozen.\n", | |
"\n", | |
"At the same time, with this new cut-off roughly three out of four fraudulent customers are caught, so far fewer get away with fraud." | |
] | |
} | |
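The cost argument above can be made concrete with a small sketch. The per-error costs are purely hypothetical, and the counts are the approximate ones implied by the classification report (recall 0.76 on 3594 class-1 rows, recall 0.78 on 34184 class-0 rows); none of these cost figures come from the notebook.

```python
# Approximate counts implied by the classification report (rounded recalls).
tp = 0.76 * 3594            # frauds caught at the 0.06 cut-off
fn = 3594 - tp              # frauds missed
fp = 34184 - 0.78 * 34184   # legitimate users sent to an extra check


def total_cost(cost_fn, cost_fp):
    """Total cost of errors at the 0.06 cut-off for given per-error costs."""
    return cost_fn * fn + cost_fp * fp


def cost_without_model(cost_fn):
    """Total cost if nobody is flagged, so every fraud is missed."""
    return cost_fn * (tp + fn)


# Flagging pays off once a missed fraud costs more than FP / TP false alarms.
break_even_ratio = fp / tp

# Hypothetical costs: a missed fraud costs 50 units, an extra check costs 1.
print(round(break_even_ratio, 2))
print(total_cost(50, 1) < cost_without_model(50))
```

Under these assumptions the cut-off beats flagging nobody as long as a missed fraud costs more than roughly 2.75 times a false alarm, which fits the claim that false negatives are the more expensive error.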
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 2", | |
"language": "python", | |
"name": "python2" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 2 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython2", | |
"version": "2.7.11" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |