@DeepakRavi
Created November 6, 2016 01:22
Identifying Fraudulent Activities
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"# Identifying Fraudulent Activities\n",
"\n",
"Company XYZ is an e-commerce site that sells hand-made clothes.\n",
"The task is to build a model that predicts whether a user is likely to use the site for illegal activity. The only information provided is about the user's first transaction on the site."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from pandas import DataFrame, Series\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.model_selection import GridSearchCV\n",
"from sklearn.metrics import classification_report, roc_curve, auc\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.preprocessing import LabelEncoder\n",
"from sklearn.ensemble.partial_dependence import plot_partial_dependence\n",
"from sklearn.ensemble.partial_dependence import partial_dependence"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Data preparation. This part takes a couple of minutes to run and can be skipped without losing the flow of the problem"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>user_id</th>\n",
" <th>signup_time</th>\n",
" <th>purchase_time</th>\n",
" <th>purchase_value</th>\n",
" <th>device_id</th>\n",
" <th>source</th>\n",
" <th>browser</th>\n",
" <th>sex</th>\n",
" <th>age</th>\n",
" <th>ip_address</th>\n",
" <th>class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>22058</td>\n",
" <td>2015-02-24 22:55:49</td>\n",
" <td>2015-04-18 02:47:11</td>\n",
" <td>34</td>\n",
" <td>QVPSPJUOCKZAR</td>\n",
" <td>SEO</td>\n",
" <td>Chrome</td>\n",
" <td>M</td>\n",
" <td>39</td>\n",
" <td>7.327584e+08</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>333320</td>\n",
" <td>2015-06-07 20:39:50</td>\n",
" <td>2015-06-08 01:38:54</td>\n",
" <td>16</td>\n",
" <td>EOGFQPIZPYXFZ</td>\n",
" <td>Ads</td>\n",
" <td>Chrome</td>\n",
" <td>F</td>\n",
" <td>53</td>\n",
" <td>3.503114e+08</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1359</td>\n",
" <td>2015-01-01 18:52:44</td>\n",
" <td>2015-01-01 18:52:45</td>\n",
" <td>15</td>\n",
" <td>YSSKYOSJHPPLJ</td>\n",
" <td>SEO</td>\n",
" <td>Opera</td>\n",
" <td>M</td>\n",
" <td>53</td>\n",
" <td>2.621474e+09</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>150084</td>\n",
" <td>2015-04-28 21:13:25</td>\n",
" <td>2015-05-04 13:54:50</td>\n",
" <td>44</td>\n",
" <td>ATGTXKYKUDUQN</td>\n",
" <td>SEO</td>\n",
" <td>Safari</td>\n",
" <td>M</td>\n",
" <td>41</td>\n",
" <td>3.840542e+09</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>221365</td>\n",
" <td>2015-07-21 07:09:52</td>\n",
" <td>2015-09-09 18:40:53</td>\n",
" <td>39</td>\n",
" <td>NAUITBZFJKHWW</td>\n",
" <td>Ads</td>\n",
" <td>Safari</td>\n",
" <td>M</td>\n",
" <td>45</td>\n",
" <td>4.155831e+08</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" user_id signup_time purchase_time purchase_value \\\n",
"0 22058 2015-02-24 22:55:49 2015-04-18 02:47:11 34 \n",
"1 333320 2015-06-07 20:39:50 2015-06-08 01:38:54 16 \n",
"2 1359 2015-01-01 18:52:44 2015-01-01 18:52:45 15 \n",
"3 150084 2015-04-28 21:13:25 2015-05-04 13:54:50 44 \n",
"4 221365 2015-07-21 07:09:52 2015-09-09 18:40:53 39 \n",
"\n",
" device_id source browser sex age ip_address class \n",
"0 QVPSPJUOCKZAR SEO Chrome M 39 7.327584e+08 0 \n",
"1 EOGFQPIZPYXFZ Ads Chrome F 53 3.503114e+08 0 \n",
"2 YSSKYOSJHPPLJ SEO Opera M 53 2.621474e+09 1 \n",
"3 ATGTXKYKUDUQN SEO Safari M 41 3.840542e+09 0 \n",
"4 NAUITBZFJKHWW Ads Safari M 45 4.155831e+08 0 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Reading in the data\n",
"\n",
"fraud_data = pd.read_csv('fraud_data.csv')\n",
"ip_address = pd.read_csv('IpAddress_to_Country.csv')\n",
"fraud_data.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lower_bound_ip_address</th>\n",
" <th>upper_bound_ip_address</th>\n",
" <th>country</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16777216.0</td>\n",
" <td>16777471</td>\n",
" <td>Australia</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>16777472.0</td>\n",
" <td>16777727</td>\n",
" <td>China</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>16777728.0</td>\n",
" <td>16778239</td>\n",
" <td>China</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>16778240.0</td>\n",
" <td>16779263</td>\n",
" <td>Australia</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>16779264.0</td>\n",
" <td>16781311</td>\n",
" <td>China</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" lower_bound_ip_address upper_bound_ip_address country\n",
"0 16777216.0 16777471 Australia\n",
"1 16777472.0 16777727 China\n",
"2 16777728.0 16778239 China\n",
"3 16778240.0 16779263 Australia\n",
"4 16779264.0 16781311 China"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ip_address.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Comparing both the tables\n",
"\n",
"len(fraud_data) == len(ip_address)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(151112, 11)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fraud_data.shape"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(138846, 3)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ip_address.shape"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Mapping each IP address to a country; rows without exactly one matching range keep the placeholder 0\n",
"country = len(fraud_data) * [0]\n",
"\n",
"for ind, row in fraud_data.iterrows():\n",
"    temp = ip_address[(ip_address['lower_bound_ip_address'] < row['ip_address']) & \n",
"                      (ip_address['upper_bound_ip_address'] > row['ip_address'])]['country']\n",
"    \n",
"    if len(temp) == 1:\n",
"        country[ind] = temp.values[0]\n",
"\n",
"fraud_data['country'] = country"
]
},
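{
"cell_type": "markdown",
"metadata": {},
"source": [
"The row-by-row lookup above is the slow part of data preparation. As a rough sketch only, and not part of the original analysis, the same mapping could be vectorised with `np.searchsorted`. The helper name `lookup_country` is hypothetical. The sketch assumes that the IP ranges do not overlap and treats both bounds as inclusive (the loop above uses strict inequalities), so results may differ slightly at range boundaries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"#Hypothetical helper (not in the original notebook): vectorised IP-to-country lookup\n",
"#Assumes non-overlapping ranges with inclusive bounds\n",
"def lookup_country(ips, ranges, default = 0):\n",
"    ranges = ranges.sort_values('lower_bound_ip_address')\n",
"    lower = ranges['lower_bound_ip_address'].values\n",
"    upper = ranges['upper_bound_ip_address'].values\n",
"    names = ranges['country'].values\n",
"    ips = np.asarray(ips, dtype = float)\n",
"    #Position of the last range whose lower bound is <= ip (-1 if there is none)\n",
"    idx = np.searchsorted(lower, ips, side = 'right') - 1\n",
"    safe = np.clip(idx, 0, len(lower) - 1)\n",
"    #A hit needs a candidate range and an ip that does not exceed its upper bound\n",
"    hit = (idx >= 0) & (ips <= upper[safe])\n",
"    out = np.full(len(ips), default, dtype = object)\n",
"    out[hit] = names[safe[hit]]\n",
"    return out\n",
"\n",
"#Possible usage: fraud_data['country'] = lookup_country(fraud_data['ip_address'], ip_address)"
]
},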
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"fraud_data.to_csv('full_data.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Beginning of the problem"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"data = pd.read_csv('full_data.csv')\n",
"data = data.drop('Unnamed: 0', axis = 1)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"user_id int64\n",
"signup_time object\n",
"purchase_time object\n",
"purchase_value int64\n",
"device_id object\n",
"source object\n",
"browser object\n",
"sex object\n",
"age int64\n",
"ip_address float64\n",
"class int64\n",
"country object\n",
"dtype: object"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.dtypes"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>user_id</th>\n",
" <th>purchase_value</th>\n",
" <th>age</th>\n",
" <th>ip_address</th>\n",
" <th>class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>151112.000000</td>\n",
" <td>151112.000000</td>\n",
" <td>151112.000000</td>\n",
" <td>1.511120e+05</td>\n",
" <td>151112.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>200171.040970</td>\n",
" <td>36.935372</td>\n",
" <td>33.140704</td>\n",
" <td>2.152145e+09</td>\n",
" <td>0.093646</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>115369.285024</td>\n",
" <td>18.322762</td>\n",
" <td>8.617733</td>\n",
" <td>1.248497e+09</td>\n",
" <td>0.291336</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>2.000000</td>\n",
" <td>9.000000</td>\n",
" <td>18.000000</td>\n",
" <td>5.209350e+04</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>100642.500000</td>\n",
" <td>22.000000</td>\n",
" <td>27.000000</td>\n",
" <td>1.085934e+09</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>199958.000000</td>\n",
" <td>35.000000</td>\n",
" <td>33.000000</td>\n",
" <td>2.154770e+09</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>300054.000000</td>\n",
" <td>49.000000</td>\n",
" <td>39.000000</td>\n",
" <td>3.243258e+09</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>400000.000000</td>\n",
" <td>154.000000</td>\n",
" <td>76.000000</td>\n",
" <td>4.294850e+09</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" user_id purchase_value age ip_address \\\n",
"count 151112.000000 151112.000000 151112.000000 1.511120e+05 \n",
"mean 200171.040970 36.935372 33.140704 2.152145e+09 \n",
"std 115369.285024 18.322762 8.617733 1.248497e+09 \n",
"min 2.000000 9.000000 18.000000 5.209350e+04 \n",
"25% 100642.500000 22.000000 27.000000 1.085934e+09 \n",
"50% 199958.000000 35.000000 33.000000 2.154770e+09 \n",
"75% 300054.000000 49.000000 39.000000 3.243258e+09 \n",
"max 400000.000000 154.000000 76.000000 4.294850e+09 \n",
"\n",
" class \n",
"count 151112.000000 \n",
"mean 0.093646 \n",
"std 0.291336 \n",
"min 0.000000 \n",
"25% 0.000000 \n",
"50% 0.000000 \n",
"75% 0.000000 \n",
"max 1.000000 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Quick Insights\n",
"\n",
"From the above table, it can be seen that the average purchase value is around 37 with the median at 35. The mean and median are close, which suggests the purchase value is roughly symmetric around its centre with only a mild right skew. \n",
"\n",
"The minimum age as entered by the user is 18 and the maximum is 76, with both the mean and the median around 33. This indicates that the site has a lot of young users.\n",
"\n",
"The percentage of fraudulent activity is around 9%. This is on the high end and needs to be looked into. "
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Converting signup time and purchase time to datetime objects\n",
"\n",
"data['signup_time'] = pd.to_datetime(data['signup_time'])\n",
"data['purchase_time'] = pd.to_datetime(data['purchase_time'])"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"count 151112\n",
"unique 3\n",
"top SEO\n",
"freq 60615\n",
"Name: source, dtype: object"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data['source'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"count 151112\n",
"unique 182\n",
"top United States\n",
"freq 58049\n",
"Name: country, dtype: object"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data['country'].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"#### Let's perform feature engineering by creating more powerful variables\n",
"\n",
"1. Difference between signup time and purchase time\n",
"\n",
"2. Different user IDs using the same device could be an indication of a fake transaction\n",
"\n",
"3. Different user IDs from the same IP address could be an indication of a fake transaction"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Difference between signup time and purchase time\n",
"data['diff_time'] = (data['purchase_time'] - data['signup_time'])/np.timedelta64(1, 's')"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Number of user ids that share each row's device\n",
"device_count = data.groupby('device_id')['user_id'].count()\n",
"device_user_count = device_count[data['device_id']]\n",
"device_user_count = device_user_count.reset_index().drop('device_id', axis = 1)\n",
"device_user_count.columns = ['device_user_count']"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"data = pd.concat([data, device_user_count], axis = 1)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Number of users sharing a given IP address\n",
"\n",
"ip_count = data.groupby('ip_address')['user_id'].count()\n",
"ip_count = ip_count[data['ip_address']].reset_index().drop('ip_address', axis = 1)\n",
"ip_count.columns = ['ip_count']\n",
"data = pd.concat([data, ip_count], axis = 1)"
]
},
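{
"cell_type": "markdown",
"metadata": {},
"source": [
"The count features above are built by indexing a grouped Series and concatenating the result. A more compact alternative, sketched here on a small made-up frame so that it does not touch `data`, is `groupby(...).transform('count')`, which returns the per-group count aligned to the original rows. The frame `toy` and its values are invented for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"#Made-up example data (not from the fraud dataset)\n",
"toy = pd.DataFrame({'user_id': [1, 2, 3, 4],\n",
"                    'device_id': ['A', 'A', 'B', 'A']})\n",
"#Number of user ids seen on each row's device, aligned to the original rows\n",
"toy['device_user_count'] = toy.groupby('device_id')['user_id'].transform('count')\n",
"toy"
]
},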
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Keeping only the top 50 countries\n",
"#Replacing everything else with 'other'\n",
"\n",
"temp = data.groupby('country')[['user_id']].count().sort_values('user_id', ascending = False)\n",
"#reindex gives NaN for countries outside the top 50 (.loc raises a KeyError on missing labels in recent pandas)\n",
"temp = temp.iloc[:50,:].reindex(data['country']).reset_index()\n",
"temp.loc[temp.isnull().any(axis = 1), 'country'] = 'other'\n",
"#'0' is the placeholder written for IP addresses that matched no country\n",
"temp.loc[temp['country'] == '0','country'] = 'other'\n",
"temp = temp.drop('user_id', axis = 1)\n",
"temp.columns = ['country_revised']\n",
"data = pd.concat([data, temp], axis = 1)\n",
"data = data.drop('country', axis = 1)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>user_id</th>\n",
" <th>signup_time</th>\n",
" <th>purchase_time</th>\n",
" <th>purchase_value</th>\n",
" <th>device_id</th>\n",
" <th>source</th>\n",
" <th>browser</th>\n",
" <th>sex</th>\n",
" <th>age</th>\n",
" <th>ip_address</th>\n",
" <th>class</th>\n",
" <th>diff_time</th>\n",
" <th>device_user_count</th>\n",
" <th>ip_count</th>\n",
" <th>country_revised</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>22058</td>\n",
" <td>2015-02-24 22:55:49</td>\n",
" <td>2015-04-18 02:47:11</td>\n",
" <td>34</td>\n",
" <td>QVPSPJUOCKZAR</td>\n",
" <td>SEO</td>\n",
" <td>Chrome</td>\n",
" <td>M</td>\n",
" <td>39</td>\n",
" <td>7.327584e+08</td>\n",
" <td>0</td>\n",
" <td>4506682.0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Japan</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>333320</td>\n",
" <td>2015-06-07 20:39:50</td>\n",
" <td>2015-06-08 01:38:54</td>\n",
" <td>16</td>\n",
" <td>EOGFQPIZPYXFZ</td>\n",
" <td>Ads</td>\n",
" <td>Chrome</td>\n",
" <td>F</td>\n",
" <td>53</td>\n",
" <td>3.503114e+08</td>\n",
" <td>0</td>\n",
" <td>17944.0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>United States</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1359</td>\n",
" <td>2015-01-01 18:52:44</td>\n",
" <td>2015-01-01 18:52:45</td>\n",
" <td>15</td>\n",
" <td>YSSKYOSJHPPLJ</td>\n",
" <td>SEO</td>\n",
" <td>Opera</td>\n",
" <td>M</td>\n",
" <td>53</td>\n",
" <td>2.621474e+09</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>12</td>\n",
" <td>12</td>\n",
" <td>United States</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>150084</td>\n",
" <td>2015-04-28 21:13:25</td>\n",
" <td>2015-05-04 13:54:50</td>\n",
" <td>44</td>\n",
" <td>ATGTXKYKUDUQN</td>\n",
" <td>SEO</td>\n",
" <td>Safari</td>\n",
" <td>M</td>\n",
" <td>41</td>\n",
" <td>3.840542e+09</td>\n",
" <td>0</td>\n",
" <td>492085.0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>221365</td>\n",
" <td>2015-07-21 07:09:52</td>\n",
" <td>2015-09-09 18:40:53</td>\n",
" <td>39</td>\n",
" <td>NAUITBZFJKHWW</td>\n",
" <td>Ads</td>\n",
" <td>Safari</td>\n",
" <td>M</td>\n",
" <td>45</td>\n",
" <td>4.155831e+08</td>\n",
" <td>0</td>\n",
" <td>4361461.0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>United States</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" user_id signup_time purchase_time purchase_value \\\n",
"0 22058 2015-02-24 22:55:49 2015-04-18 02:47:11 34 \n",
"1 333320 2015-06-07 20:39:50 2015-06-08 01:38:54 16 \n",
"2 1359 2015-01-01 18:52:44 2015-01-01 18:52:45 15 \n",
"3 150084 2015-04-28 21:13:25 2015-05-04 13:54:50 44 \n",
"4 221365 2015-07-21 07:09:52 2015-09-09 18:40:53 39 \n",
"\n",
" device_id source browser sex age ip_address class diff_time \\\n",
"0 QVPSPJUOCKZAR SEO Chrome M 39 7.327584e+08 0 4506682.0 \n",
"1 EOGFQPIZPYXFZ Ads Chrome F 53 3.503114e+08 0 17944.0 \n",
"2 YSSKYOSJHPPLJ SEO Opera M 53 2.621474e+09 1 1.0 \n",
"3 ATGTXKYKUDUQN SEO Safari M 41 3.840542e+09 0 492085.0 \n",
"4 NAUITBZFJKHWW Ads Safari M 45 4.155831e+08 0 4361461.0 \n",
"\n",
" device_user_count ip_count country_revised \n",
"0 1 1 Japan \n",
"1 1 1 United States \n",
"2 12 12 United States \n",
"3 1 1 other \n",
"4 1 1 United States "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Building a Machine Learning Model"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Response Variable\n",
"y = data['class']"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Predictors\n",
"data = data.drop(['user_id', 'signup_time','purchase_time','class'], axis = 1)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X = data"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"purchase_value 0\n",
"device_id 0\n",
"source 0\n",
"browser 0\n",
"sex 0\n",
"age 0\n",
"ip_address 0\n",
"diff_time 0\n",
"device_user_count 0\n",
"ip_count 0\n",
"country_revised 0\n",
"dtype: int64"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.isnull().sum()"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Label Encoding string variables\n",
"lb = LabelEncoder()\n",
"X['device_id'] = lb.fit_transform(X['device_id'])\n",
"X['source'] = lb.fit_transform(X['source'])\n",
"X['browser'] = lb.fit_transform(X['browser'])\n",
"X['sex'] = lb.fit_transform(X['sex'])\n",
"X['country_revised'] = lb.fit_transform(X['country_revised'])"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Splitting data into train and test dataset\n",
"X_train, X_test, y_train, y_test = train_test_split(X,y)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Creating a pipeline\n",
"pipeline = Pipeline(steps = [('clf', RandomForestClassifier(criterion = 'entropy'))])"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"clf_forest = RandomForestClassifier(n_estimators= 20, criterion = 'entropy', max_depth= 50, min_samples_leaf= 3,\n",
" min_samples_split= 3, oob_score= True)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\Deepak\\Anaconda2\\lib\\site-packages\\sklearn\\ensemble\\forest.py:403: UserWarning: Some inputs do not have OOB scores. This probably means too few trees were used to compute any reliable oob estimates.\n",
" warn(\"Some inputs do not have OOB scores. \"\n"
]
},
{
"data": {
"text/plain": [
"RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',\n",
" max_depth=50, max_features='auto', max_leaf_nodes=None,\n",
" min_samples_leaf=3, min_samples_split=3,\n",
" min_weight_fraction_leaf=0.0, n_estimators=20, n_jobs=1,\n",
" oob_score=True, random_state=None, verbose=0, warm_start=False)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clf_forest.fit(X_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"preds = clf_forest.predict(X_test)\n",
"preds"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.95 1.00 0.98 34184\n",
" 1 0.99 0.55 0.70 3594\n",
"\n",
"avg / total 0.96 0.96 0.95 37778\n",
"\n"
]
}
],
"source": [
"print classification_report(y_test, preds)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.06105041, 0.0794475 , 0.0114642 , 0.01611104, 0.00809526,\n",
" 0.05038098, 0.07786234, 0.34906319, 0.14008505, 0.17718896,\n",
" 0.02925106])"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Variable importance\n",
"clf_forest.feature_importances_"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array(['purchase_value', 'device_id', 'source', 'browser', 'sex', 'age',\n",
" 'ip_address', 'diff_time', 'device_user_count', 'ip_count'], dtype=object)"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Features used, in the same order as the importances (the [:-1] slice leaves out the last feature, country_revised)\n",
"data.columns.values[:-1]"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.95492967688425368"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Out-of-bag (OOB) score\n",
"clf_forest.oob_score_"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"#### Some quick insights\n",
"\n",
"From the classification report above, we predict fraud with a precision of 99% and a recall of 55%.\n",
"This means that when we predicted fraud, we were right 99% of the time.\n",
"However, of all the fraud that took place, we correctly identified only 55%.\n",
"We need to improve recall, even if that reduces precision.\n",
"This is an act of balancing false positives and false negatives.\n",
"\n",
"A false positive means extra checks on a customer who is not fraudulent.\n",
"A false negative means an act of fraud goes undetected.\n",
"\n",
"Thus we need to decrease false negatives, even at the cost of more false positives.\n",
"Fewer false negatives directly raise the recall/sensitivity score."
]
},
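{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to trade precision for recall, sketched here under assumptions rather than taken from the original analysis, is to flag a transaction as fraud whenever its predicted fraud probability reaches a threshold lower than 0.5. The helper name `classify_at_threshold` and the threshold 0.3 are hypothetical choices for illustration, and the probabilities below are made up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"#Hypothetical helper: label a row as fraud (1) when its fraud probability reaches the threshold\n",
"def classify_at_threshold(fraud_prob, threshold = 0.5):\n",
"    return (np.asarray(fraud_prob) >= threshold).astype(int)\n",
"\n",
"#Made-up probabilities, for illustration only\n",
"classify_at_threshold([0.10, 0.35, 0.80], threshold = 0.3)\n",
"\n",
"#Possible usage with the fitted forest: classify_at_threshold(clf_forest.predict_proba(X_test)[:, 1], 0.3)"
]
},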
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### ROC analysis"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"prob_score = clf_forest.predict_proba(X_test)\n",
"prob_score = DataFrame(prob_score).iloc[:,0]"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"fpr,tpr,thresholds = roc_curve(y_test,1-prob_score)\n",
"#auc = auc(fpr,tpr)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEZCAYAAACervI0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcXFWd9/HPNxtr9kASkkAWVlkSthAQpSEoQXgGhXFk\nkUFnEWdEHWdej+LMM0MexxkGX6PDKKM+qIOKIjqCgorIIq0sCQQhrAlJWEI2sgdCSEKW3/PHuU0q\nTXV3dXfdqu663/frVa/UrTp17++GcH73nnPuOYoIzMysePrUOwAzM6sPJwAzs4JyAjAzKygnADOz\ngnICMDMrKCcAM7OCcgIwMysoJwBrGJJekvSGpNckLZd0g6S9W5U5RdK9WZn1km6TdESrMgMlXStp\ncVZuoaSvSBpW2zMyy5cTgDWSAM6JiEHAFOBY4PMtX0o6GfgN8DNgNDABeBJ4UNL4rEx/4LfAEcB7\ns32dDKwBpuYVuKS+ee3brC1OANZoBBARq0iV/ZSS764BvhsR10XEpojYEBH/CMwGZmZlLgPGAu+P\niOeyfa2JiH+NiDvLHlA6UtJdktZKWiHpyuzzGyR9oaTcaZKWlGy/KOmzkp4AXs/e/0+rff+npGuz\n94MkfTu7u1ki6Z8lqRt/V1ZwTgDWkCSNBc4GFmbbewGnAD8tU/wnwHuy99OBOyNic4XH2Re4G7iD\ndFdxMHBvOz9pPffKhVmcQ4CbgbMl7ZPtuw/wQeCHWdnvAW8CE0l3N+8B/qKSOM3KcQKwRvNzSa8B\nLwMr2XVlP4z0731Fmd+sAEZk74e3UaYt5wIrIuLaiHgzu7OY04nf/2dELI+IrRHxMvAY8IHsu+nA\npoiYI2kkKVF8JiK2RMQa4Frgok4cy2w3TgDWaM7L2u1PAw5nV8W+HthJukpvbTSpjR9gbRtl2jIO\neL5roQKwtNX2j9hVqV8E3JS9PxDoD6yQtE7SeuCb7Do/s05zArBG09IHcD+pyeTL2fYbwCxSk0pr\nfwLck72/BzgrazKqxBJgUhvfbQJKRyGVSyytm4T+B2iSNIZ0J9CSAJYAW4DhETEsIoZGxJCIOKbC\nOM3exgnAGtm1wHskHZ1tXwlcJukKSftKGirpi8A0oKWz9kZSZXuLpMOUDJf0eUkzyhzjl8AoSZ+S\nNCDbb8toobnA+7LjjAI+3VHAWdPO74AbgBdKOqJfAe4C/iMbpipJEyW9uyt/MWbgBGCNZber6awy\n/R7wT9n2g8BZwAWkdv4XgcnAOyPi+azMm8CZwHxS5+6rpFFCw4GH33bAiNdJnbF/BLwCLACasq9v\nJA0zfQm4k9TJ22a8JW4itf//sNXnfwoMAJ4F1pHuFka1sQ+zDskLwpiZFZPvAMzMCsoJwMysoJwA\nzMwKygnAzKyg+tU7gEpJcm+1mVkXRETZOaN61R1ARHTpddVVV3X5t7315XMuxsvnXIxXd865Pb0q\nAZiZWfU4AZiZFVQhEkBTU1O9Q6g5n3Mx+JyLIa9zzvVJYEnfIU2XuzLamLRK0ldJ09xuAj4SEXPb\nKBd5xmpm1ogkEXXqBL6BNPdKWZLOBiZFxCHA5aTpbc3MrAZyTQAR8QBpHva2nAd8Pyv7MDA4W/jC\nzMxyVu8+gDGkqXdbLMs+MzOznPWaB8HMzAohdsL6RbB6LqyaC6seh1NmwuiTqn6oeieAZaQl9VqM\nzT4ra+bMmW+9b2pqKuRoADNrINu3wJqnU0W/OqvsVz8Jew2H/Y+F/abA5I/D0EMr3mVzczPNzc0V\nlc19PQBJ44FfRMTRZb57H/CJiDhH0jTg2oiY1sZ+PArIzHqvzetg9RNZJZ9V9hsWwZBDUmW//5RU\n4e8/BfYcWrXDtjcKKO9hoDeRVkcaDqwEriKtaBQRcX1W5jpgBmkY6Ecj4rE29uUEYGY9XwRsfHlX\n803Ln1vWwX6TUwXfUuEPPxL67ZlrOHVLANXk
BGBmPc6ObbBu/q4r+pamnL57llzRZ5X9kEmg2o+7\ncQIwM+uuN1/PmnDm7mrGWfssDBy3q71+ZPbnPj1nNLsTgJlZZ2x6Zffmm9VzYePS1GTTckW//7Ew\n4mgYsG+9o22XE4CZWTktQy7f6pjNKvyd27ImnJLKfthh0KfeAyc7zwnAzOytIZclbfWrn4S9Ruyq\n5Fva7AeOBZWtM3sdJwAzK5bN697eMbthURpPX9oxu9/kqg657ImcAMysMUXAa4t3f5Bq1VzYuj5V\n7qWV/fB35D7ksidyAjCz3m/HNlg37+2Vfb+9du+Y3W8KDJlYlyGXPZETgJn1Ljt3pCGWa56CRbel\n5pt182Dgga0q+8k9ashlT+QEYGY9VwRsXg0bnoe18+Dle+Clu9J8OMOPhAPPgJEnwH5HQ/996h1t\nr+MEYGb1t3NHGnWz9hlYvxDWL4ANC9P7Pv1gyMEw9BAYexqMnwGDxnW8T+uQE4CZ1V5EasZ5+bew\n5Lew9Hew98jURj/00FTZDz0kTYa217B6R9uwnADMLH+xE9Y9B8vuzyr9+6D/vjDu9NSMM+502Hd0\nvaMsHCcAM6u+bW/AK3Ng+YOw/CFYPgv2GAwHvHNXhT94fL2jLDwnADPrvteXw7IHd1X4a55JHbMH\nnJIq/QNO8RV+D+QEYGads2MbrHkSls/Oru4fTLNhHnBKeo15ZxqZ03+vekdqHXACMLO2vbkxXc2v\neSq9Vs2FVY/BoPEwehoccHK6wh92WMPMj1MkTgBmlq7q1y/YVdGvzv58YxUMPwJGHJWmNx5xDIye\nmtrzrddzAjArkog0d31LRd/yWr8gLV4y4ujdX0MmQZ++9Y7acuIEYNaotm+Bpfendvot6+DVF9P7\nvv1bVfRHpcnQ+u9d74itxpwAzBpFRLqSf+lOeOk3sOyBVMGPPS3NibPvmGw0zgH1jtR6CCcAs95s\ny4Y0P87iu1Olv3MHTJgB48+CA6c3/Hz21j3tJYDet76ZWaPavgU2rYDXXoZXHoGFP4Mta9N890Mm\nwaipcP4dMOwIj8axqvAdgFk97NyRJkVbPgtWzEp/vvYS7D0qLUe4/7FpvpyDzoRBEzze3rrMTUBm\n9bZ5HayYvauyf2UO7DMqjbEffXL6c/iRHo1jVecEYFZrKx+HlXNSZb98FmxaDqNO3FXZj56W5rs3\ny5kTgFletr0BS5rTPPevvpBeGxal9vzxZ+26wh9xlK/urS6cAMyq4bUlqQln9ZOw5mlY+zS8vgxG\nnpiu6IdMhMETYPDENI1C3/71jtjMCcCs07ZvSZ20yx7MZsB8CHZsTWPs95+SPVh1FAw9OK1mZdZD\nOQGYtWX7ljTkcvWTaTGT9QvSa9OKNPRyzLt2zYA5ZJKHX1qv4wRg1uLNjelqfunv0xQKqx5L4+pH\nHp9muxx6KAw9LC1k4it7awBOAFZMLbNfLp+VNec8AOvmpcp+7Luzq/uTYcDAekdqlhsnAGtcEbB+\nYarcX30+PTXb8npjJQwYnFatOui9aU77USdCvz3qHbVZzdQ1AUiaAVwL9AG+ExHXtPp+EPAD4ECg\nL/DliPhumf04AdiuCn9pcxp+uaQ5NdWMfXdqvhl0UPYanyZG80gcK7i6JQBJfYAFwHRgOTAHuDAi\n5peU+TwwKCI+L2kE8BwwMiK2t9qXE0ARRcDGl+GRL6U57dc+m6Y0Hnc6jGtKr0Hj3Tlr1oZ6TgY3\nFVgYEYuzQG4GzgPml5QJoKURdiCwtnXlbwWwfWt6Wnbj0lThr3g4td+vehxiJxxxCZx8VRp+uff+\nrvDNqiDvBDAGWFKyvZSUFEpdB9wuaTmwL/ChnGOyeouAlY/Cy/fBkt+meXHe3Aj7jE4Toe07Nk1x\nPPmv06RoA8e6wjfLQU8Y53YW8HhEnCFpEnC3pGMi4vXWBWfOnPnW+6amJpqammoWpHXTts2w8JZU\n2S/9HWx7
HSacA++4FN5zfWqv91QJZt3W3NxMc3NzRWXz7gOYBsyMiBnZ9pVAlHYES/olcHVEPJht\n3wt8LiIebbUv9wH0Nq+9DC/8Cl78VRp3P3gCHPYhGHNqeqlPvSM0a3j17AOYAxws6SBgBXAhcFGr\nMouBM4EHJY0EDgVeyDkuq7ad22HlH2DNM6mj9qU74Y1VaeWqIy6Fs2/0ylVmPUythoH+J7uGgf6b\npMtJdwLXSxoNfBcYnf3k6oj4UZn9+A6gp9i+NT1QteapNIXCqrmw7P50hT9qapoMbcIMGHmCm3XM\n6swPglnX7dyRHqp6ZU56mnbhremBq8ETYcQxsN8x2aLk74Y9BtU7WjNrxQnAKrdjG7x4R3qy9oVf\nwrr50HcAjD87zZUzrimNwe+3Z70jNbMKOAFYeds2wesrUiW/YVFqu194a5rieOK5ac6cA06BPQbX\nO1Iz6yInAIOtr6V2+1dfSrNhLn8ojcUfND5d2Q85BIYekir+IRPrHa2ZVYkTQJFEpKdpV86BVx5N\nbfdrnoTNa3c9VLXXfnD4RWkmTDflmDU0J4BGFZGGWq78Q6roV2YVfuxMs16OOjE14+y1X+qs7b93\nvSM2sxpzAmgUm9elqRPWPgtP3wBvvJJWtBpz6q6pjkedCAPHeeoEMwOcAHq/bZvg4avhiW+kTtmh\nh8B+U+Dg89xBa2btqueTwNYVO3fAy/fAotvS1f7qJ9IDVh9+ND1sZWZWBb4D6Cm2rE8jc575Przw\nCxh2eJo3Z+Txac3agWPqHaGZ9UJuAuppdmxL0yi8sRLWPA2L74EVs2HUCekhqwnvSyN23I5vZt3k\nBFBvEbDkvlTRL/09rJgF+4yCwZPSAidj3w2T/hf036fekZpZg3ECqIetr8JzP4YXf51eAwbC5I+n\nK/zBE2Hw+HpHaGYF0O0EIGkAcGBELKp2cJXqFQkgIk2D/NxP4LmbYeSJae6cg8/LmnQ8/72Z1Va3\nRgFJOgf4CjAAmCBpCnBVRHygumH2Ytu3wrM3wpxr0nq2h/wx/Nkid9yaWY9WyTDQLwAnAfcBRMRc\nSQfnGlVvsXY+zJqZmnoOPAPe+y0Ye5o7b82sV6gkAWyLiA3avVLr4W0xOVs7H+77FCy+O61p+5cv\nwaCD6h2VmVmnVJIA5kn6E6CPpAnAp4DZ+YbVQ616Amb/c6r4J38czv0J7Dmk3lGZmXVJJb2SVwDH\nAzuBW4GtwKfzDKrHiYBZX4Abp6R1bf98Ebz7Glf+ZtardTgKSNL5EXFrR5/lrW6jgHbugJtPTQ9s\nnX8HjH1X7WMwM+uibg0DlfRYRBzX6rM/RMTxVYyxQ3VJAG++Dj89M03T8Md3w6ADa3t8M7Nu6tIw\nUElnATOAMZK+UvLVIFJzUON75OqUBC57Kq2La2bWQNrrBF4FPA1sAZ4p+XwjcGWeQfUIi26Dh/81\ntfe78jezBlRJE9CeEbGlRvG0F0ftmoBWPAz/Mx3OuRkmnVubY5qZ5aC76wGMkfQvwDuAtxaQjYhD\nqxRfz7JjG9z7CTjxs678zayhVTIM9LvADYCAs4GfAD/OMab6+sNX0sRtUxu/lcvMiq2SJqA/RMTx\nkp6KiKOzzx6NiBNqEuGuOPJvAlr5GNx0ElwyB/afku+xzMxqoLtNQFsl9QGel/RxYBkwsJoB9gg7\nt8OvLk5NP678zawAKkkAnwH2IU0B8S/AYODP8gyqLp75PvTbC079l3pHYmZWE11aEEbSmIhYlkM8\n7R0zvyag7Vvhvw+BM76W5u43M2sQ7TUBtdsJLOlESe+XNCLbPlLS94GHc4izfp7+bxgwyJW/mRVK\nmwlA0tXAD4FLgDslzSStCfAE0DhDQDevhYf+CU7793pHYmZWU202AUl6Fjg+IjZLGgYsAY6OiBdq\nGWBJPNVvAtq5HX54Euy9P1zw6+ru28ysB+hqE9CWiNgMEBHrgAVdqfwlzZ
A0X9ICSZ9ro0yTpMcl\nPS3pvs4eo8ue/BZsXALnNu5jDWZmbWnvDmAD8NuWTeD0km0i4vwOd56Gjy4ApgPLgTnAhRExv6TM\nYOAh4L0RsUzSiIhYU2Zf1b0D2L4Fvj0BzvlRWrjdzKwBdfU5gAtabV/XhWNPBRZGxOIskJuB84D5\nJWUuBm5pGVVUrvLPxSPXwP7HuvI3s8JqMwFExL1V2P8YUt9Bi6WkpFDqUKB/1vSzL/DViLixCsdu\n27oFaTH3Dz+a62HMzHqySh4Ey1s/4DjgDNIDZ7MkzYqIRa0Lzpw58633TU1NNDU1de2I918JR/05\njKzpmjZmZrlrbm6mubm5orJdehCsUpKmATMjYka2fSUQEXFNSZnPAXtGxP/Ntr8N/Doibmm1r+r0\nAax4GG6aBn+1GvYe0f39mZn1YF1+EKzVTvbowrHnAAdLOkjSAOBC4PZWZW4DTpXUV9LewEnAvC4c\nqzJz/wumft6Vv5kVXocJQNJUSU8BC7PtyZK+VsnOI2IHcAVwF2lVsZsjYp6kyyV9LCszH/gN8CQw\nG7g+Ip7t0tl0ZP1CeO4ncMzHctm9mVlvUsl00LOBDwE/j4hjs8+ejoijahBfaRzdbwK69RwYMgnO\n+Gp1gjIz6+G6Ox10n4hYLO32+x1ViayW1i+EF++Ajy3puKyZWQFUkgCWSJoKhKS+wCdJD3f1Lgtu\ngSM+DAPH1jsSM7MeoZJO4L8C/hY4EFgJTMs+611e+AWMPqneUZiZ9RiV3AFsj4gLc48kTysegVWP\nwQV31jsSM7Meo5I7gDmS7pB0maTeuRTk/JvguL9Ji72bmRlQQQKIiEnAF4Hjgack/VxS77ojWD4L\nxryr3lGYmfUonXoSOFsX4Frgkojom1tU5Y/d9WGgX+kLn9oE/fasblBmZj1ct54ElrSvpEsk/QJ4\nBFgNnFLlGPMVAX361zsKM7MepZJO4KeBXwBfioj7c44nJwEqmwDNzAqrkgQwMSJ25h5J7pwAzMxK\ntZkAJH05Iv4OuEXS2xrfK1kRrEdo6TfwHYCZ2W7auwNoWSi3KyuB9SD5TXdtZtabtbci2CPZ2yMi\nYrckIOkKoBorhtWIr/7NzFqr5EGwPyvz2Z9XO5Dc5LjgjZlZb9ZeH8CHSAu4TJB0a8lXA4ENeQdW\nPR4BZGZWTnt9AI8Aa4GxwH+VfL4ReDzPoKoqAjcBmZm9XXt9AC8CLwL31C6cPPgOwMysnPaagH4X\nEadJWs/uQ2lEWth9WO7RVYPvAMzMymqvCej07M9evnq67wDMzMppcxRQydO/44C+2QLvJwOXA/vU\nILbq8B2AmVlZlQwD/TlpOchJwA3AIcBNuUZVbb4DMDN7m0oSwM6I2AacD3wtIj4DjMk3rGrycwBm\nZuVUkgC2S/ogcCnwy+yzXjS3spuAzMzKqfRJ4NNJ00G/IGkC8KN8w6oi9wGYmZVV0YpgkvoBB2eb\niyJie65RlY+hayuCvbkRvjkaPvV69YMyM+vh2lsRrMP1ACS9C7gRWEa6lB4l6dKIeLC6YebEdwBm\nZmVVsiDMfwDvi4hnASQdQUoIJ+QZWPX4OQAzs3Iq6QMY0FL5A0TEPGBAfiFVme8AzMzKquQO4DFJ\n3wR+kG1fQm+aDA58B2BmVkYlCeDjwKeAz2bb9wNfyy2iqvNzAGZm5bSbACQdDUwCfhYRX6pNSFXm\nJiAzs7La7AOQ9PekaSAuAe6WVG5lsA5JmiFpvqQFkj7XTrkTJW2TVOXF5t0JbGZWTnt3AJcAx0TE\nJkn7AXcA/92ZnUvqQ1pUfjqwHJgj6baImF+m3L8Bv+nM/iviOwAzs7LaGwW0NSI2AUTE6g7KtmUq\nsDAiFmfzCd0MnFem3CeBnwKrunCMDjgBmJmV094dwMSStYAFTCpdGzgiKmmqGQMsKdleSkoKb5F0\nAPD+iDhd0m7fVYX6wL6jq75bM7Perr
0EcEGr7etyiuFaoLRvoM3L9ZkzZ771vqmpiaampo73vtdw\nuOypLgdnZtabNDc309zcXFHZiuYC6ipJ04CZETEj276StJzkNSVlXmh5S1p9bBPwsYi4vdW+ujYX\nkJlZgbU3F1DeCaAv8BypE3gF8AhwUfY0cbnyNwC/iIhby3znBGBm1kndmgyuOyJih6QrgLtIncjf\niYh5ki5PX8f1rX+SZzxmZrZLxXcAkvaIiK05x9Pe8X0HYGbWSe3dAXQ4tFPSVElPAQuz7cmSetFU\nEGZmVk4lY/u/CpwLrAWIiCdIK4SZmVkvVkkC6BMRi1t9tiOPYMzMrHYq6QRekj2gFdmonk8CC/IN\ny8zM8tZhJ7Ck/UnNQGdmH90DXBERa3KOrXUc7gQ2M+ukuj0HUE1OAGZmndfdReG/RZnx+RHxsSrE\nZmZmdVJJH8A9Je/3BD7A7hO8mZlZL9TpJqBs7v4HIuKUfEJq87huAjIz66RuPQhWxgRgZPdCMjOz\nequkD2A9u/oA+gDrgCvzDMrMzPLXbhOQJAHjgGXZRzvr1Q7jJiAzs87rchNQVuPeERE7spdrYDOz\nBlFJH8BcScfmHomZmdVUm01AkvpFxHZJzwCHAc+TVusS6ebguNqF6SYgM7Ou6OqDYI8AxwF/lEtU\nZmZWV+0lAAFExPM1isXMzGqovQSwn6S/bevLiPhKDvGYmVmNtJcA+gL7kt0JmJlZY2mvE/ixWnf0\ntsedwGZmndfV5wB85W9m1sDauwMYFhHrahxPm3wHYGbWeV4QxsysoKo9G6iZmTUAJwAzs4JyAjAz\nKygnADOzgnICMDMrKCcAM7OCcgIwMysoJwAzs4JyAjAzK6jcE4CkGZLmS1og6XNlvr9Y0hPZ6wFJ\nR+cdk5mZ5TwVhKQ+wAJgOrAcmANcGBHzS8pMA+ZFxKuSZgAzI2JamX15Kggzs06q51QQU4GFEbE4\nIrYBNwPnlRaIiNkR8Wq2ORsYk3NMZmZG/glgDLCkZHsp7VfwfwH8OteIzMwMaH9FsJqSdDrwUeDU\ntsrMnDnzrfdNTU00NTXlHpeZWW/S3NxMc3NzRWXz7gOYRmrTn5FtXwlERFzTqtwxwC3AjLYWoXcf\ngJlZ59WzD2AOcLCkgyQNAC4Ebm8V3IGkyv/Stip/MzOrvlybgCJih6QrgLtIyeY7ETFP0uXp67ge\n+EdgGPB1SQK2RcTUPOMyMzOvCGZm1tC8IpiZmb2NE4CZWUE5AZiZFZQTgJlZQTkBmJkVlBOAmVlB\nOQGYmRWUE4CZWUE5AZiZFZQTgJlZQTkBmJkVlBOAmVlBOQGYmRWUE4CZWUE5AZiZFZQTgJlZQTkB\nmJkVlBOAmVlBOQGYmRWUE4CZWUE5AZiZFZQTgJlZQTkBmJkVlBOAmVlBOQGYmRWUE4CZWUE5AZiZ\nFZQTgJlZQTkBmJkVlBOAmVlBOQGYmRVU7glA0gxJ8yUtkPS5Nsp8VdJCSXMlTck7JjMzyzkBSOoD\nXAecBRwJXCTp8FZlzgYmRcQhwOXAN/OMyczMkrzvAKYCCyNicURsA24GzmtV5jzg+wAR8TAwWNLI\nnOMyMyu8vBPAGGBJyfbS7LP2yiwrU8bMzKrMncBmZgXVL+f9LwMOLNkem33Wusy4DsoAMHPmzLfe\nNzU10dTUVI0YzcwaRnNzM83NzRWVVUTkFoikvsBzwHRgBfAIcFFEzCsp8z7gExFxjqRpwLURMa3M\nviLPWM3MGpEkIkLlvsv1DiAidki6AriL1Nz0nYiYJ+ny9HVcHxF3SHqfpEXAJuCjecZkZmZJrncA\n1eQ7ADOzzmvvDsCdwGZmBeUEYGZWUIVIAJX2iDcSn3Mx+JyLIa9zdgJoUD7nYvA5F4MTgJmZVZUT\ngJlZQfWqYaD1jsHMrDdqaxhor0kAZmZWXW4CMjMrKCcAM7OCaqgEUMTlJzs6Z0kXS3oiez0g6eh6\nxF
lNlfx3zsqdKGmbpPNrGV8eKvy33STpcUlPS7qv1jFWWwX/tgdJuj37f/kpSR+pQ5hVI+k7klZK\nerKdMtWtvyKiIV6kZLYIOAjoD8wFDm9V5mzgV9n7k4DZ9Y67Buc8DRicvZ9RhHMuKXcv8Evg/HrH\nXYP/zoOBZ4Ax2faIesddg3P+PHB1y/kCa4F+9Y69G+d8KjAFeLKN76tefzXSHUARl5/s8JwjYnZE\nvJptzqb3r7ZWyX9ngE8CPwVW1TK4nFRyzhcDt0TEMoCIWFPjGKutknMOYGD2fiCwNiK21zDGqoqI\nB4D17RSpev3VSAmgiMtPVnLOpf4C+HWuEeWvw3OWdADw/oj4BlB2+FsvU8l/50OBYZLukzRH0qU1\niy4flZzzdcA7JC0HngA+XaPY6qXq9VfeK4JZDyHpdNJaC6fWO5YauBYobTNuhCTQkX7AccAZwD7A\nLEmzImJRfcPK1VnA4xFxhqRJwN2SjomI1+sdWG/RSAmgqstP9hKVnDOSjgGuB2ZERHu3mL1BJed8\nAnCzJJHahs+WtC0ibq9RjNVWyTkvBdZExBZgi6TfA5NJ7ei9USXn/FHgaoCIeF7Si8DhwKM1ibD2\nql5/NVIT0BzgYEkHSRoAXAi0/h/+duBPAbLlJzdExMrahllVHZ6zpAOBW4BLI+L5OsRYbR2ec0RM\nzF4TSP0Af92LK3+o7N/2bcCpkvpK2pvUSTiP3quSc14MnAmQtYUfCrxQ0yirT7R9x1r1+qth7gCi\ngMtPVnLOwD8Cw4CvZ1fE2yJiav2i7p4Kz3m3n9Q8yCqr8N/2fEm/AZ4EdgDXR8SzdQy7Wyr87/xF\n4LslwyY/GxHr6hRyt0m6CWgChkt6GbgKGECO9ZengjAzK6hGagIyM7NOcAIwMysoJwAzs4JyAjAz\nKygnADOzgnICMDMrKCcA6zEk7ZD0WDal8WPZQ2xtlT1I0lNVOOZ92ZTDcyXdL+mQLuzjckkfzt5f\nJmlUyXfXSzq8ynE+nD3d3dFvPi1pz+4e2xqXE4D1JJsi4riIODb78+UOylfrIZaLImIKaabFf+/s\njyPi/0XED7LNj1AyQVdEfCwi5lclyl1xfoPK4vwbYO8qHdsakBOA9SRvewQ+u9L/vaRHs9e0MmXe\nkV0VP5ZdIU/KPr+k5PNvZE9Ct3fc3wMtv52e/e4JSd+W1D/7/N+yBVfmSvpS9tlVkv5O0gWkeYh+\nkP12z+zK/bjsLuFLJTFfJumrXYxzFnBAyb6+LukRpUVRrso++2RW5j5J92afvVfSQ9nf44+zKSOs\nwJwArCddpZtfAAACuUlEQVTZq6QJ6Jbss5XAmRFxAmk+mK+V+d3HgWsj4jhSBbw0a3b5EHBK9vlO\n4JIOjv9HwFOS9gBuAD4YEZNJC5L8laRhpGmmj8quxL9Y8tuIiFtIE5FdnN3BbCn5/hbgAyXbHyJN\nWNeVOGcAPy/Z/vtseo/JQJOkoyLia6SJwpoiYrqk4cA/ANOzv8s/AH/XwXGswTXMXEDWEN7IKsFS\nA4DrlJa/2wGUa6OfBfyDpHHArRGxSNJ00vTIc7Ir6j1JyaScH0raDLxEWkjmMOCFksnzvgf8NfBf\nwGZJ3wZ+RVptrJy3XcFHxBpJz0uaSpqh87CIeEjSJzoZ5x6k6Z5LlwO8UNJfkv5/HgW8A3ia3ScW\nm5Z9/mB2nP6kvzcrMCcA6+k+A7wSEcdI6gtsbl0gIn4kaTZwLvCrbMIwAd+LiH+o4BgXR8TjLRvZ\n1XK5SnxHVoFPBz4IXJG9r9SPSVf784GftRyus3FmTUnXARdIGk+6kj8+Il6TdAMpibQm4K6I6Oju\nwgrETUDWk5Rr+x4MrMje/ynQ920/kiZExItZs8ftwDGk9YD/WNJ+WZmh7Ywqan3c54CDJE3Mti8F\nfpe1mQ+JiDuBv82O09pGYFAbx/kZaVm/C0lLHNLFOP8JOEnSodmx
Xgc2Kk2JfHZJ+ddKYpkNvLOk\nf2Tvrox4ssbiBGA9SblRPV8HPiLpcdJ875vKlPmTrGP2ceBI4PsRMQ/4P8Bdkp4gTSs8qsxv33bM\niNhKmmr3p9lvdwDfJFWmv8w++z3p7qS17wLfbOkELt1/RGwgzdF/YEQ8mn3W6TizvoUvA/87Ip4k\nLZg+D/gB8EDJb74F3Cnp3myN4I8CP8qO8xCpqcsKzNNBm5kVlO8AzMwKygnAzKygnADMzArKCcDM\nrKCcAMzMCsoJwMysoJwAzMwKygnAzKyg/j/eesD8BtlQBAAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x99cd048>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#Plotting the ROC curve\n",
"plt.plot(fpr,tpr, color = 'darkorange')\n",
"plt.xlim([-.05, 1.05])\n",
"plt.ylim([-.05, 1.05])\n",
"plt.xlabel('False Positive Rate')\n",
"plt.ylabel('True Positive Rate')\n",
"plt.title('ROC curve')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.40539789, 0.40623261, 0.40651085, ..., 0.94073456,\n",
" 0.94073456, 1. ])"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tpr"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0. , 0. , 0. , ..., 0.79072081,\n",
" 0.7911011 , 1. ])"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fpr"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1.00000000e+00, 9.97727273e-01, 9.97222222e-01, ...,\n",
" 9.25925926e-04, 4.34782609e-04, 0.00000000e+00])"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"thresholds"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>1-fpr</th>\n",
" <th>fpr</th>\n",
" <th>tf</th>\n",
" <th>thresholds</th>\n",
" <th>tpr</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2163</th>\n",
" <td>0.765621</td>\n",
" <td>0.234379</td>\n",
" <td>0.000099</td>\n",
" <td>0.057143</td>\n",
" <td>0.765721</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 1-fpr fpr tf thresholds tpr\n",
"2163 0.765621 0.234379 0.000099 0.057143 0.765721"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#ROC Analysis\n",
"i = np.arange(len(fpr))\n",
"roc = DataFrame({'fpr' : Series(fpr, index=i),'tpr' : Series(tpr, index = i), '1-fpr' : Series(1-fpr, index = i), \n",
" 'tf' : Series(tpr - (1-fpr), index = i), 'thresholds' : Series(thresholds, index = i)})\n",
"roc.ix[(roc['tf']-0).abs().argsort()[[0]]]"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEPCAYAAABLIROyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XecVPXVx/HPAQGl2gsIiICCBREBxcZixRIxKhH1McYW\njVGfqEnEGGNNotGoUWMNjyZWjCWKGjurggVExQYK0hERBQWlyp7nj3NXhnW2wc7e2Znv+/W6r51y\nZ+6Z3dl77q+buyMiIsWnUdoBiIhIOpQARESKlBKAiEiRUgIQESlSSgAiIkVKCUBEpEgpAUi1zOxY\nM3s67TjyiZktMrOtUjhuRzMrM7OC+N81s/fNbO81eJ2+k3WgIL5ExcTMppnZYjNbaGafmtmdZtY8\nl8d09/vcfWAuj5HJzHY3sxeSz7jAzB4zs+71dfws8Yw0s5MyH3P3Vu4+LUfH28bMHjSzecnnf8fM\nzjEzKz98Lo5bW0ki2npt3sPdd3D3l6s5zg+SXn1/JwuVEkDD48Ah7t4a6AnsDFyQbkhrxswaZ3ms\nH/AM8CiwBdAJeBcYnYsr7mwxpMnMOgOvA9OBHdx9A2Aw0AtoVcfHWtvPvsaJqJbHtuRYVt2OUkvu\nrq0BbcBUYJ+M+1cBIzLuNwWuIU4gc4CbgWYZzw8C3ga+BiYBBySPtwb+AXwKzAQuByx57gTgleT2\nzcDVFWL6D/Cr5PYWwEPA58AnwFkZ+10M/Bu4G/gKOCnL53sZuDHL408BdyW3+ycxXgDMA6YAx9bk\nd5Dx2t8mz/0TWB8YkcT8ZXK7bbL/FcB3wGJgIXBD8ngZsHVy+07gJuCJZJ/XgE4Z8RwATAQWAH8H\nSrN99mTfuzP/nlme75gc+6fJ5/sc+F3G832AV5NjzQZuBNbJeL4MOAP4GPgkeex6YEbynRgL7Jmx\nfyPgd8Dk5LONBbYEXkre65vk8cHJ/ocS368FwChgxwrf3d8C44ElQGMyvs9J7GOTOOYA1ySPTwdW\nAouSY+1Kxncy2Wd74Nnk7zcHGJr2/2pD2FIPQFst/2Cr/8NsSVwdX5vx/HXECbkN0AJ4DPhj8lxf\n4sRb/votgG2S248SJ8p1gY2Jq9BTk+dOAF5Obu8FTM843vrEyXEz4grtTeDC5J97q+TEsX+y78XA\nMuBHyf1mFT7besTJtn+Wz/0zYHZyuz+wArgaaALsnZyIutbgd1D+2j8lr20GbAj8OLndAhgOPJpx\n7JFUOGEnJ6TMBDAP2IU4Yd4D3Jc8t1FyQhuUPHd28juoLAHMAU6o4u9fngBuIxJdD2ApsG3yfK/k\n72xAB+AD4OyM15cRJaw2rEqKxyZ/x0bAOUkMTZPnfkOcsLsk93cENsh4r8xEtzMwF+idHP944vva\nJOO7+xbQNuPYmd/nV4HjktvNgb4Zn3klyQVJlu9kS+LC5VfJ76QF0Cft/9WGsKUegLZa/sHiH2Zh\nspUBzwGtM57/psI/ZT9gSnL7VuCvWd5z0+QkkllSGAK8mNz+/p8tuT+N5CoROAV4Prm9KzCtwnsP\nBYYlty8GSqv4bO2Sz7RNlucOBJYlt/sDy4F1M54fDlxYg99B/+SzNqkijp7Alxn3syWAiiWA2zOe\nOwj4MLl9PDC6wmtnVHy/jOeWk5TKKnm+/GS4RcZjbwA/qWT//wUerhB3/2q+Y/NJrtyJksuhlez3\n/e8guX8zcGmFfSYCe2V8d0/I8n0uTwClyXdko0o+c6OMxzITwBBgXK7+5wp5WwdpiAa5+0gz2wu4\nj7hiX2hmmxBXTuNWtRfSiFV1p+2BJ7O8X0fianhO8jpLthmVHH84cAxRxD+WqLaAuOJsZ2bzk/uW\nHD+zkW9mFZ9rAXFS2YKoosi0BfBF5r7uvjTj/nSgbQ1+BwDz3H1F+R0zW4+oBjmQuBI2oKWZmSdn\nmBr4LOP2YuKqFOJqt+JnnlXF+3xJfNbqzM12
PDPrClxLXIWvB6wDjKvq+Gb2a+CkjOO2Ir5TEN+Z\nKTWIB+J79FMzO6v8rYnvVdvKjl3ByUTV40QzmwJc5u7Zvq8VtSeqG6WW1AjcMBmAu79C1GH/NXn8\nC+JksL27b5hs67t7m+T5mUDnLO83k7gq3ih5zQbJ63pUcvz7gaPMrANx1f9wxvtMyTj2Bu7ext1/\nlPHaSk+o7r6YqD8fnOXpnwDPZ9zfIDlxl+tAVANU9zvIFsN5QFei2mB9okoJViWNmiaBbOYQJ6hM\nW1ax//PAkWtxvFuACUDn5LNcyA8bT7//PGa2J1HNc1Ty99qAKF2Wv6ay70w2M4mqtsy/f0t3H57t\n2BW5+yfufqy7bwL8BXgo+RtX9/uvTYySQQmg4bse2N/MdkyuVu8Ark+uhDGzdmZ2QLLvMOBEMxtg\noa2ZbevunxENaNeZWavkua0r65/t7u8QV6r/AJ5294XJU2OARWb2WzNb18wam9n2Zta7Fp9nKHCC\nmZ1pZi3NbAMzuwLYDbg0Yz8DLjWzJklJ6BDgwRr8DrJpRTRKLjSzDYFLKjw/F1jT7o5PAjuY2WHJ\n7+NMor2kMhcDu5vZVWa2WRJ/FzO728xaJ/tU1RumFbDQ3RebWTfgF9XE14poE/nSzJqa2R9YvbfR\nP4DLzaxLEsuOZrZB8txnrP57uQM43cz6Jvu2MLODzaxFNTGQ7H+cmZWXPL4mTvxlRPtKGZWf5J8A\nNjezs5PP0LI8BqmaEkDDs9rVkLt/QZQC/pA8NJRoeH3dzL4iTuzbJPuOBU4kksbXRJ1rh+R1PyUa\n0D4k6oD/DWxeRRz3AfsC92bEUkb0AulJ1O1+TpwUWmd7g6wfzn00URVzJHH1PBXYCdjD3TOrIuYQ\nVUafElVQp7n7pOS58yv7HVTieqLa6AuiIfKpCs//DRhsZl+a2fXlodbw83xJlGiuTt6/G9FQvqyS\n/acQbRadgA/MbAHxtxhL9ILJduzM+78GjjOzhURD8QNV7AvRIPwMUeU2lSg9ZVZZXQs8CDxrZl8T\nCaG85HUp8C8zm29mR7n7OOBU4KakGvBjoq6+smNXfGxg8pkXEg35R7v7MndfAvyR6Ao8v+LJ3d2/\nAfYHDiOS0sdASZZjSQXl3fxy8+Zmw4gTwtzKqhPM7Aai0exb4GfJ1aVIpcysP3C3u3eoduc8kwzm\nmkV0W30p7XikuOW6BHAncTWXlZkdRNRVdgVOI3qpiBQUMzvAzNqYWTOiTh6im61IqnKaANx9FFFM\nr8wg4F/Jvm8AbcrrPUUKSD+il8rnRFvFIHfPWgUkUp/S7gbajtXrG2cnj83NvrsIJFUnDab6x90v\nZfUGbJG8oEZgEZEilXYJYDar95HeMnnsB8wsd63VIiIFzN2zdh2ujxJA+ajSbB4nuh9iZrsBX7l7\npdU/azTcedEifPx4/Kmn8GHD8Msvx085Be/fH+/YEV9vPXyXXfCjjsKHDsVvvBF/5BF83Dh86dI6\nG3J98cUXpz7sW/GnH0cxxt+QY8+Mf/BgZ/jw9OOp7VaVnJYAzOw+oj/uRmY2gxjk0jTO5X67uz+V\nDBSZTHQDPbHOg2jZEnr0iC2bJUvgnXdg2jSYMgU+/BCefx4++QQmT4auXaFdO2jfHrbcMn62bw+d\nOsHWW4NVNSZHRApFWRk0KrBK85wmAHc/tgb7nJnLGKq13nrQr19sFS1ZAhMnwuzZMGsWzJwJpaXx\nc8IEWLkSevaMxNCuXfzs3DkSRLt2kXxEpCAoARSb9daDnXeOrSL3SArvvx8JYvZsGDsW7r9/VcJo\n1iySwY47UtK8OTz2GGy1FXTsCOuvX+8fZ22UlJSkHcJaUfzpacixw6r4CzEB5HQkcF2q3cSMecAd\nFiyIaqUJE6Ka6aOPoqpp2jRYZ51VyWCrrVZtnTqtShCqXhLJG4cdBqecEj8bEjPDK2kEVgkgV8xg\nww1j690b
jj9+1XPuMH8+TJ++KiFMmwYjR8LUqfE4QIcOkQzKf26zDWy/fVQzraM/nUh9KsQSgM4i\naTCDjTaKrVev7Pt89RXMmBHJYMaMSBB33hmN1HPmRFLo0AG6dYvG6PKG6W23hRY1mnxRRGqhrKzw\nCuVKAPlq/fVjy9Z76ZtvViWHCROimqm0NH5+8kk0QLdtGwmie3fYbjvYccdopG7atN4/ikghcFcJ\nQPJBy5ZxUt9uOzjooNWfW7w4eil9+mlUJ02YADffHL2Z5s2DPfeMZLDNNqu2tm0L79JGpI6pCkjy\nX/PmUQ207bYwYMDqzy1cGCWFiROjx9K998LHH0eJomvXSAY9esAuu0TJoX37wvvGi6whJQBp2Fq3\nji4MFbsxfP01TJoUyeDNN+Gvf42Sw4IF0ejcrVs0Qu+2G/TtC5tskk78IikqxASgbqBSuUWLovvq\n5MnRvvDqqzBuXIyP6NIlSgnbb79q23xzVSVJQZk6NfpdfPgh/Pa30VGvoQ1rqKobqBKA1E5ZWQx0\nmzw5SgkffBDb++9D48bR5XWPPeK/ZNdd4zGRBujii+HWW1f1o+jXDwYPhnXXTTuy2lECkNwrHxk9\nZgyMHg3PPRelhm7dYIcdIjH07Blbq1bVv59IioYPhyFD4tpmu+3SjmbtKAFIOhYtirLze+9Fo/P4\n8XG7ffsoHfTtG+0KPXpAkyZpRysCwFFHwcMPx8wtDW3UbzZKAJI/Vq6MRDBuHLzxBrz+egxy69kz\nksFuu0X10cYbpx2pFKkmTWI4TYcGs+Zc1ZQAJL8tXBglhDfeiOqjV16JUdL77BNdUnv3jq3QumBI\n3njgAZg7F0aNgmeeiY5xhdKfQQlAGpayspg478UXoxfSq6/G3EkDB0bV0e67x2C2QvkPlVR9/TVs\nuimcfnrU9x94YMzLWCiUAKThmzw5Fup5/XV44YWY0mL33aOE0KsX7LQTtGmTdpTSAE2dGmMmp01L\nO5LcUAKQwvLdd9EFdezYGLj29tvRrtCjB/zP/8Dee6uEINWaORMuvTS+QsuXR3+FQqQEIIXvu+/g\n7rvhtdfg2Wfjfo8e0Zfv4IPVqCxAfC3+8IcY9D5mTPTtHzIk+vp365Z2dLmhBCDFxT2mthgzBh56\nKIZvbr89HHpojOTp2jXtCCUln3wSJ/q7747psnbYofB7ICsBSHFbtgxeegkefTQSQvPmsN9+MedR\nA1uaU9bOmDFwxhlR7VMsqkoA6lcnha9ZMzjgALjlFvj886gimj8/Onofemj0/Vu5Mu0oJUdmzYJH\nHoGf/hR+8pNYpE+CSgBSvBYuhH/8A+66KzqBDxgAv/99VBepAbkgfP551PjtsQf06RP9A3r3Lq4O\nY6oCEqnO1Klw5ZUwYkTM9lVSEnMC7L9/4VcSF7DXX4ezz46qn2KlKiCR6nTqBLfdFkttjhgRrYNX\nXBGrpR17bIxOlgbj3Xdj+uZDDokhIpKdSgAiVZk6Fe6/PzqMd+8ecxUdemgsxamprvPGpEnw1FPw\n2Wdxu7Q0/kTnnhvTTBVzjZ6qgETW1rJlMdhs1KhoN1ixAgYNgsMPj3WWJVXnnRc9e/bbLwptBx0U\nP0UJQKRuucfEdf/9L/zpTzHq+E9/ivYClQpScdJJ0dB78slpR5J/1AYgUpfMoiro0ktj3eRTT426\nhk6d4MgjY7zBd9+lHWVRWLEi5u4fP15DOtaESgAidcE95icaPToak2fNgl/9Klohu3VTT6I69Oqr\nMbj7rbdiwthOnSIfX3ghtGuXdnT5R1VAIvXttdfgX/+KQWdbbRWTzK+zTtpRNWiTJ8dJ/sEH4cwz\nYyLYPfeEzp2Lu5G3OkoAImlZvDhaJqdMgeOOg7POgo4ddcaqpaVL40p/113h+usLa77+XFMCEEnb\nxIlwzTXw5JPRUHzaaXDEETHqWH5g0aIoRE2dGss/PPkkbLFFTNnctGna0T
UsSgAi+eSNN2JswT//\nGaOU7r1XldcZpk+PK/yePWPbccfo5aNG3jWjBCCSjxYvht/8JuYiuuiimJi+yOs2li2Dxx+Hv/89\nBnPJ2lM3UJF81Lx5nOleeCG6teyyS3RmLy2NXkUFbtmymJ//ySdj2oZddomZOs89F/r3Tzu64qAS\ngEi+WLkShg2LSem22w5+/GM4+mho2TLtyOrUO+/EtMyzZ8dCbdtuGwWfwYNjha4C+7ipS7UKyMwG\nAtcTpY1h7n5VhedbA/cAHYDGwF/d/a4s76MEIMVhwYKYbuLaa+Gcc+KSuAC6kK5cGUsx3ngjfPll\n/Nx007SjKnypJQAzawR8DOwLfAqMBYa4+8SMfS4AWrv7BWa2MfARsJm7f1fhvZQApLg89RQMHQrz\n5sEpp8Qlco8eaUdVa998E7NmXHhhVPvsuGP0hj3wwLQjKw5VJYBcX1b0BSa5+/QkkAeAQcDEjH0c\naJXcbgV8WfHkL1KUDj44trFjYfjwmGto2LCYjTTPffMN3HBDzK5dWho9Xy+6CI4/Pu3IJFOuE0A7\nYGbG/VlEUsh0E/C4mX0KtASOznFMIg1Lnz6x7bUX/Oxn0T5w990xoCyPTJoEd94ZBZdJk2JNnUMO\nibx16KGxMqfkl3yoWDwQeNvd9zGzzsBzZtbD3b+puOMll1zy/e2SkhJKSkrqLUiR1A0aFPUml10W\n9SgvvQQ775xqSJ98Ai+/HCtv3X57tFlffXXkqLZtNeA5DaWlpZTWsA9trtsAdgMucfeByf2hgGc2\nBJvZE8Cf3X10cv8F4Hx3f7PCe6kNQKTcr38Njz0GN98cl9j17KOP4kQ/bFj06OnVK+bgb4BNFAUv\nzTaAsUAXM+sIzAGGAMdU2Gc6sB8w2sw2A7YBpuQ4LpGG7fLL4zJ70KBIAKefDgMH5vSSe+ZMeO+9\n6Ld/881wxhlR1dOlS84OKTlWX91A/8aqbqBXmtlpREngdjPbArgL2CJ5yZ/d/f4s76MSgEhF8+dH\nA/E110SbwIsv1snbfvddTGD69tsxRm36dPjqq7jSb98+eqZqGqOGQVNBiBS6lStjzeLmzWOZysMP\nj3mGalEiKCuLNXU/+ijyyccfw4AB0YDbpQtss01BDEcoOkoAIsVg5cpYLeWxx+C+++Cqqyrtd7lw\nYVTnjB0bE5WWlsZVfuvWsNlmMRXDSSel3sYsdSDNNgARqS+NG0dX0b32itlF//a3KAEcdxyY4R71\n+H//O9xxR5zoBwyIgsPJJ0eVTvPmaX8IqU8qAYgUoKXzF/PB7+9n23t+z1ubDOS8pjfy3tSWLFsG\nRx0F558PvXunHaXUB1UBiRQgd1i+HKZNi+qcl1+OOXaWLIExY2L+/MF7zuGk0p+yQeOFNP73A6y3\nXae0w5Z6pgQgUiCmTo215++6K0bcrlgRA646doyan223hXXXjdWz+vZN1qJfujTaA266CU49NVYj\ny7NRxJI7SgAiDURZWUyY9s03sSTiI4/At9/GBKHjxsVyiN27r5oVonPnWrz5u+/GjGyffhorkm2z\nTa4+huQRJQCRPPTpp/DmmzFh2pw5sVLkmDFxVb/eetGOe8IJ0KEDtGkTJ/vOnddyrNeKFbEK2f/9\nH3zxhRbYLQJKACJ5oKwMPvggBlg98UQMsurdG7p1g002iS6Xu+0WvXNybu+9Y5rpq66Cww6rhwNK\nWpQAROrZF1/EgKrJk+NEP2lS1MA0axbn3qOOihP+FltU/145UVYGDzwQbQKHHRbdgnr2TCkYySUl\nAJEcWbgQPv88TvJvvhkX1W+9FbNkdu8e1ezdu8fknVtuGVMp5JUZM2JGt9tug4svhp//PMYTSMFQ\nAhBZC2VlMVp28eKYKmHcuNimT4/pEjbccFXDbPv2sPnmsO++DWz++5deimW6Nt88Gh7694+MJQ2e\nEoBIFl99FVfqn30WV/FffRX96d99Ny
ZDK99mzIiT/OabwwYbRE1Jz54xcrZ9e9hoo7Q/SR1ZuhTu\nuSf6l44cCVdcAb/8ZdpRyVpSApCitmIFjB4dJ/kFC+D992ON2pkzoVMn2GqrOLm3bBkzLPfqFX3p\n11knto02go03LrLFTa68MjLilVemHYmsJc0FJAVv3jx49tnoM79sWQyWeuKJGBm7fHnUxW+3XXSn\n7NIlqrwHDIBGjdKOPE+1bh11XFLQlACkQViyJKpi5s6NKpoPP4RZs+Drr+PK/rPPYNddo0qmadPo\nO//UU3GF36yZpjGutfXXj5lF33wTdtmlyIo/xUP/FpI3ysqi++SSJVFNM25c1MlPmRL957fcMqpq\nunePqe733z+u6DfbLAZNrb9+2p+ggPz4x1ECOOSQWOfx0kuhXz8lggKjNgCpV+5xXvn005ju4P33\nY0766dPjhN+yZUxJ3KVLXNH36AFbbw077BDPST1bvBhuuQX++MdIAP/+t+aMbmDUCCz1bsmSuJp/\n7z0YPz5O8vPmxZV8o0Zxgm/ePK7m+/aNKps+fXSSz1vLl8eAsa+/jmQwYIBKAw3EWicAM2sKdHD3\nyXUdXE0pAeSnWbPipD59ekxLPGXKqmmJN9ooGl932SVO8pttFtMebLaZzh0N0sKFcO210YL+r39F\nHZzkvbVKAGZ2CHAt0NTdO5lZT+Bid/9x3YdaZRxKAClavjyu5t97L0a6vvlmnPBXrIj6+I4doztl\n+/YxhqhjR/WwKVh33QUnnhiZfq+90o5GqrG2CWAcsC8w0t13Th57z913rPNIq45DCaCerFwZc9iM\nHw+PPw6vvx6zVbZtC7vvHnPO77FH9LDZckud6IvSBRfAK6/E+pI77ZR2NFKFtR0HsMLdv7LVy+w6\nExcQ9xgR+/DDcVH31lsxHXHbtrGc7AUXxNV9ixZpRyp54/zzo9vWQQfB7Nmq02ugalICuBP4L3Ah\ncDhwNtDC3X+e+/BWi0MlgDo2bRqMGAE33BCNtvvtB0ccEVMc1GqhESlO7tFyP3dujBg++mgNuMhD\na1sF1AL4A3BA8tAzwKXuvqROo6yGEsDaW748Fhx58skovU+YEO14Z54ZVTq6iJM18uSTMHRoLG5w\nxx1KAnlmbRPAEe7+SHWP5ZoSQO25R919aWlMk/Dww9C1KwwcGA21JSUNbMZKyV+zZsHBB8ciB8OG\nKQnkkbVNAG+5e68Kj41z913qMMZqKQHU3MKFcVF25ZVRPbvbbnGlf+yxsfKUSE7MmwcHHhiDOy69\ntA7Wr5S6sEYJwMwOBAYCxwL3ZjzVGtjJ3fvUdaBVUQKo3MqVUaXzyiux3ODbb0eVzkknRbWs/gel\n3syZA+ecE8XOc86JxmJJ1Zr2AvoceB9YCnyQ8fgiYGjdhSdrYsmSmDrhlVdiTM7KlTF9ywUXRGOu\nqnYkFVtsEUtNDhsWc3BLXqs0Abj728DbZnavuy+tx5ikEmVlMGoUvPgi3HRTzJHTpw/ceCPss4/6\n40se2XDDmKZV8lpNWmramdkfge2AdcsfdPdtchaVfG/WrJjX/oEHYmBWhw7RgDtqVEyrIJKXdtop\nSgBnnhnF0nbt0o5IsqjJNeNdwJ2AAQcBDwLDcxhT0Vu2LKp1Bg2KWTBffBHOOw8++iiSwA036OQv\neW7rrVetrdm1a8wGKHmnRlNBuPsumdM/mNmb7t67XiJcFUdBNwJ/8w08/zw8/TT85z+xetUxx8Dg\nwZrnXhq4006LEsEZZ6QdSVFa26kglplZI+ATMzsdmA20qssAi9XixfDII/Dcc7F6VY8e0ZV65Mjo\nSSdSELp3j6qg9dePvsiSN2pSAtgV+BDYAPgj0Aa4yt3rtYm/UEoA330XPXeeey7m0erXD370Izj0\n0JhBU6QgvfxyjEC87TY4/vi0oykqdb4gjJm1c/fZax1Z7Y7ZYBPAZ59Fl82RI+Huu6M9bP/9Y6K1\nHj
3Sjk6knowdG0tMnnpqTB3RShUJ9WGNq4DMrA/QDhjl7l+Y2fbA+cA+wJY1PPhA4HqiwXmYu1+V\nZZ8S4DqgCTDP3QfU5L3zlXvMlz9iRPTWeecd6NUrFkUpr97R4CwpOn36xACxE0+Mf4Arrkg7oqJX\n1UjgPwNHAuOBTsATwBnAVcAt7r642jePtoOPifUEPgXGAkPcfWLGPm2AV4ED3H22mW3s7l9kea+8\nLwHMmgX//OeqgVmHHx5X+n36RLdoESHWGB4/Hm69Ne1IisKalgAGEVM+LDGzDYGZwI7uPqUWx+4L\nTHL36UkgDyTvOzFjn2OBh8urlLKd/POVeyxqPnx4zL0zbRr85CexYNKuu2pglkhWrVrBokVpRyFU\nPQ5gafmUz+4+H/i4lid/iOqjmRn3ZyWPZdoG2NDMRprZWDPL+xaisjJ44w3Yc89owF2yBG6+OaZB\nue22aNjVyV+kEq1axT/QjBlpR1L0qioBbG1m5VM+G9Ap4z7ufkQdxtCLaFdoAbxmZq+luQB9Nl98\nEStlPf10jMw1g5//HM49V/X5IrWy225RRO7YET7+OAaKSSqqSgBHVrh/0xq8/2ygQ8b9LZPHMs0C\nvkjmG1pqZi8DOwE/SACXXHLJ97dLSkooKSlZg5Bqrqws5tC/775ou+rZM67u77kn6vV14hdZA5tt\nBvcmEwyffDJcf330kpA6UVpaSmlpaY32XaNuoDVlZo2Bj4hG4DnAGOAYd5+QsU834EZi6ulmwBvA\n0e7+YYX3qrdG4OXLowfPZZdFVc6vfhVVPWrIFalD33wDd94JZ58N770X855InVvbkcBrzN1XmtmZ\nwLOs6gY6wcxOi6f9dnefaGbPAO8CK4HbK57868vixfDXv8I118R38Xe/i0ZdXemL5EDLlnDWWbGC\nUa9eMHFizCEk9SanJYC6lKsSwPLlsVzif/8L998fc+lffjlsu22dH0pEKjNoELRtGwPENCS+TlVV\nAqhxXxUzK6glRpYtg7/8Jdav+NOf4udbb8GDD+rkL1LvLroo1g/YbrsYKTylth0OZU3UZC6gvsAw\noI27dzCznYBT3P2s+ggwI446KQHMmAF33BFjUPr1iySgqZVF8sTMmVEt1LRpXI3JWlvbEsANwKHA\nlwDuPh5ocFM1TJkSs9LusEN06XzlFXj8cZ38RfJK+/Zwyinw7bdpR1IUapIAGpWP5M2wMhfB5MKK\nFXHF37s3bL45TJgQI9F14hfJU82bx+hKybma9AKamVQDedKt8yxifp+89vXXMS/PdddF/f5TT8X4\nExHJc+tMepw0AAARsElEQVStF13yJOdqUgL4BXAuMaBrLrBb8lhemjAhVtLaait46aUYbzJ6tE7+\nIg2GEkC9qUkC+M7dh7j7xsk2JB8nbCsrg/PPjymXu3ePLsUPPwy7765+/CINStu2cSU3ZEjakRS8\nmlQBjTWzj4iF4B9x97yaxm/mzFgk/f77YeONYdKkqOsXkQZq001hwYL4h375Zdh777QjKljVlgDc\nvTNwBbAL8J6Z/cfM8iI1f/55DCBcsiQGc73zjk7+IgWhZcvoq33IIfDLX8bc61LnajUSOFkX4Hrg\nOHdvnLOosh97tXEAS5fCPvvExcGVV9ZnJCJSbz7/HDp0iKX1evdOO5oGaa3GAZhZSzM7zsxGEJO5\nzQN2r+MYa+3mm6F16xjFKyIFatNNY5TwEUfET5UE6lRNRgJPA0YAD7r7K/URVCVxrFYC2GWXmLRt\nQIMbkiYiteIOY8ZASUmUCLSYfK1UVQKoSQJo5O5lOYmsFjITwLRpMR//nDmwTk7nMxWRvNG2LYwd\nC+0qLiooVVmj6aDN7K/ufh7wsJn9IEvU4YpgtfboozF5oE7+IkWkTRuYN08JoA5VdQodnvxck5XA\ncurRR6PPv4gUkYEDoyF4+nQlgTpSaSOwu49JbnZ39xcyN6B7/YT3
Q3Pnwrvvwr77phWBiKTiuutg\nzz3h/ffTjqRg1GQk8ElZHju5rgOpqccfjwuBdddNKwIRSc2uu8Lvf592FAWj0kZgMzsaGAKUACMz\nnmoFrOPu9dr/prwReOBAOOmkWKpRRIrM0qUxV9CTT8KBB0Ljeh2O1CCtUS8gM+sEdAb+DAzNeGoR\n8La7r6jrQKtiZv7NN84mm0RPsJYt6/PoIpI3bropFu/u3TvmgFFvkCqtVTfQfGFm/sorzumnqwpQ\npOjNmROTxX35Jbz+uq4Iq7BGI4HN7KXk5wIzm5+xLTCz+bkKtipTp0KPHmkcWUTyyhZbwMiR0Rg4\ndmza0TRYVTUCl9fxbwxskrGV3693n30Gm22WxpFFJO80agT77w+DB8N558UYAamVqrqBlo/+bQ80\ndveVQD/gNKBFPcT2A5MnQ+fOaRxZRPLSn/8cDcLPPRdVQVIrNekG+h9iOcjOwJ1AV+C+nEZViYUL\nYYMN0jiyiOStXXeNK8MV9dovpSDUJAGUJT1+jgBudPdzgFSG4S1apHmgRCSLpk1h+fK0o2hwarQk\npJkNBo4Hnkgea5K7kCo3f74SgIhk0bSpSgBroKYjgQcAf3H3Kcn4gPtzG1Z2kydD165pHFlE8lqT\nJioBrIFqR1C4+/tmdjbQxcy6AZPd/Y+5D+2H5s/Xko8ikoVKAGuk2gRgZnsBdwOzAQM2N7Pj3X10\nroOraNNNNehPRLJQCWCN1OR0eh1wsLt/CGBm3YmEUO8LdHboUN9HFJEGQY3Aa6QmbQBNy0/+AO4+\nAWiau5Aqp0FgIpJV8+YxHkDzxNRKTRLAW2Z2q5ntmWy3AG/nOrBslABEJKtzzol1Yg84AJ5/Pu1o\nGoyaJIDTgSnAb5NtCjEauN5tvHEaRxWRvLfhhnDVVXDZZfCjH8G//512RA1ClW0AZrYjMSX0o+7+\nl/oJqXLrr592BCKSt8zglFOiPeDqq2HAAF01VqOq2UB/R0wDcRzwnJllWxmsXikBiEi1jjwy+ov3\n65d2JHmvqiqg44Ae7j4Y6AP8on5CqpySuYhUq0ULePRRmDULhg9PO5q8VlUCWObu3wK4+7xq9q2U\nmQ00s4lm9rGZnV/Ffn3MbIWZHVHZPu1SmYFIRBqcxo0jCZx8Mtx7b9rR5K2qloT8Cnix/C4xHUT5\nfdy90hN1xns0Aj4G9gU+BcYCQ9x9Ypb9ngOWAP/n7o9keS+fMcNp374Gn0pEBOCWW+Cxx+Dpp9OO\nJDVVrQhWVSPwkRXu37QGx+4LTHL36UkgDwCDgIkV9jsLeIioaqqUuoGKSK1svz3cl8rs9Q1CpQnA\n3V+og/dvB8zMuD+LSArfM7O2wOHuPsDMVnuuoqapDD8TkQarRQv49tu0o8hba1SvX8euBzLbBrIW\nVUREaq1FC1i8OO0o8laup1abDWTO4LNl8lim3sADZmbEesMHmdkKd3+84ptdcskl398uKSmhpKSk\nruMVkUKy0Ubw0Ufwt7/BGWfEpHEFrrS0lNLS0hrtW2kj8A92NGvm7stqE4iZNQY+IhqB5wBjgGOS\n+YSy7X8nMKKyRuCaxioi8r1XXoHTT4cLLoD/+Z+0o6l3VTUCV1sFZGZ9zew9YFJyfyczu7EmB04W\nkj8TeBb4AHjA3SeY2Wlm9vNsL6nJ+4qI1Nhee8EvfgE//zl88EHa0eSVaksAZvY6cDTwH3ffOXns\nfXffoR7iy4xDJQARWTPuMVfQlVfC+edHaaBIrFUJAGhU3o0zw8q1D0tEpJ6YwdChcOON8EJddHAs\nDDVpBJ6ZdM/0pE7/LGJwl4hIw9KuHazU9Wu5mpQAfgGcS/TmmQvsRh7MCyQiUmtNmmjt4Aw1WRT+\nc2BIPcQiIpJb66wD332XdhR5oyaLwt9Blt457p6tF4+ISP5SCWA1NWkDyFxfbV3gx6w+vYOISMOg\nBLCamlQBrTahtpndDYzKWUQi
IrmiBLCaNZkLqBOgeTlFpOFRG8BqatIGsIBVbQCNgPnA0FwGJSKS\nEyoBrKa6ReEN2IlVE7iVaTiuiDRYTZrA11/HFNEtWqQdTeqqrAJKTvZPufvKZNPJX0Qark02gZ12\ngvbt4bzzYO7ctCNKVU3aAN4xs51zHomISK41awalpfDGGzBpUkwNUcQqTQBmVl49tDMw1sw+MrO3\nzOxtM3urfsITEcmBrl3hsMNg9GhYsCDtaFJTVQlgTPLzMGBb4GBgMHBU8lNEpOHab7/42akTXHxx\nurGkpNLpoM3s7fLpn/OBpoMWkZx49NFIAO++m3YkOVHVdNBV9QLaxMzOrexJd792rSMTEUlbly6x\nXkARqioBNAZaokXaRaSQNW0Ky5enHUUqqkoAc9z9snqLREQkDU2bwrJaLXdeMKpqBNaVv4gUvmbN\nirYEUFUC2LfeohARSUsRVwFVmgDcfX59BiIikgolABGRIrXuurB0KRx6KNx1V9rR1CslABEpbk2b\nwkcfwfHHw9lnx1YkKh0Ilm80EExEcu6dd6BfP1iyJO1I6kxVA8GUAEREypWVrVo0plFhVJBUlQAK\n4xOKiNSFRo2iW+jSpWlHUi+UAEREMq23XkFVAVVFCUBEJFNZGTz/fFHMD6QEICKS6fe/hyFD4L33\n0o4k55QAREQy/frXsNdeRbFQjBKAiEhFrVvDwoVpR5FzSgAiIhW1agWPP552FDmnBCAiUtEJJ8BD\nD6UdRc5pIJiISEUrVsQUEd9+C82bpx3NWtFAMBGR2mjSBPr0gWeeSTuSnFICEBHJpn9/uOWWtKPI\nqZwnADMbaGYTzexjMzs/y/PHmtn4ZBtlZjvmOiYRkWr97nfw/vvw05/Chx+mHU1O5DQBmFkj4Cbg\nQGB74Bgz61ZhtynA3u6+E3AFcEcuYxIRqZENNoDx46FbNxgwAI44ItoGCkiuSwB9gUnuPt3dVwAP\nAIMyd3D319396+Tu60C7HMckIlIzm2wSJYEZM2Kq6FdfTTuiOpXrBNAOmJlxfxZVn+BPAf6b04hE\nRGqrWTPo2DHmCSog66QdQDkzGwCcCOxZ2T6XXHLJ97dLSkooKSnJeVwiIt9rAF3RS0tLKS0trdG+\nOR0HYGa7AZe4+8Dk/lDA3f2qCvv1AB4GBrr7J5W8l8YBiEh69tkHLrwQ9t037UhqJc1xAGOBLmbW\n0cyaAkOA1cZXm1kH4uR/fGUnfxGR1FnWc2iDltMqIHdfaWZnAs8SyWaYu08ws9Piab8duAjYELjZ\nzAxY4e59cxmXiMgaKbBaiJy3Abj708C2FR67LeP2qcCpuY5DRGStFGAJQCOBRURqqsBKAEoAIiI1\noRKAiEgRUwlARKQIqQQgIlLEVAIQESlCKgGIiBQxlQBERIqQSgAiIkVMJQARkSKkEoCISBFTCUBE\npAiZKQGIiBQlVQGJiBQxlQBERIqQSgAiIkVMJQARkSKkEoCISBFTCUBEpAipBCAiUsRUAhARKUIq\nAYiIFDGVAEREipBKACIiRUwlABGRIqQSgIhIEVMJQESkCGk6aBGRIqUqIBGRIqYSgIhIEVIJQESk\niKkEICJShFQCEBEpYioBiIgUIZUARESKmEoAtWNmA81sopl9bGbnV7LPDWY2yczeMbOeuY5JRKTW\nVAKoHTNrBNwEHAhsDxxjZt0q7HMQ0NnduwKnAbfmMqa0lJaWph3CWlH86WrI8Tfk2KFC/CoB1Epf\nYJK7T3f3FcADwKAK+wwC/gXg7m8AbcxssxzHVe8K6p+gAVL86WnIsUNG/CoB1Fo7YGbG/VnJY1Xt\nMzvLPiIi6VMJQESkCBXgZHDmOfxAZrYbcIm7D0zuDwXc3a/K2OdWYKS7D0/uTwT6u/vcCu9VWL95\nEZF64u5Z66/WyfFxxwJdzKwjMAcYAhxTYZ/HgV8Cw5OE8VXFkz9U/gFERGTN5DQBuPtKMzsTeJ
ao\nbhrm7hPM7LR42m9396fM7GAzmwx8C5yYy5hERCTktApIRETylxqBRUSKlBKAiEiRUgIQESlSSgAi\nIkVKCUBEpEgpAUjqzGyYmc01s3er2e9sM/vQzO6uYp/+Zjai7qOsPTP7kZn9Nrk9KHMiRDO71Mz2\nqcdY+ptZv/o6njQMSgCSD+4kZoytzi+A/dz9+Gr2y4u+ze4+wt3/ktw9nJgRt/y5i939xbo8npk1\nruLpEmD3ujyeNHxKAJI6dx8FLKhqHzO7Bdga+K+Z/a+Z9TGzV81snJmNMrOuWV7T38zeNrO3kv1a\nJI//2szGJOtPXFzJ8RaZ2bVm9r6ZPWdmGyWP9zSz15LXPmxmbZLHzzazD5LH70seO8HMbkyuvA8D\n/pLE0snM7jSzI8zsQDN7sELMI5LbBySf8U0zG25mzbPEOdLMrjOzMcDZZnaomb2efN5nzWyTZCT+\n6cCvkuPvYWYbm9lDZvZGsik5FCN316Yt9Q3oCLxbzT5TgA2S2y2BRsntfYGHktv9gceT248D/ZLb\nzYHGwP7AbcljBowA9sxyrDJgSHL7IuCG5Pb48v2BS4Frk9uzgSbJ7dbJzxMyXncncETG+98JHJHE\nNA1YL3n8ZmK6lI2AlzIe/y1wUZY4RwI3Zdxvk3H7ZODq5PbFwLkZz90L7J7cbg98mPZ3QFv9b7me\nC0ikLlmyAawP/Cu58neyT2syGrjOzO4FHnH32WZ2ALC/mb2VvFcLoCswqsJrVwLlV+b3AA+bWWvi\nBFu+7z8z9hkP3Gdm/wH+U9MP5DFdytPAj8zsYeAQ4DdElc12wGgzM6AJ8FolbzM843b7pESxRfKa\nqZW8Zj+ge/LeAC3NrLm7L65p7NLwKQFIXjKzLYmrcwdudffbK+xyOfCiux+RVHGMrPge7n6VmT1B\nnFRHmdlA4qT/Z3e/o5YhlbcrVDYp4SHA3kRVz4VmtkMt3ns4cCZRDTbW3b9NTszPuvtxNXj9txm3\nbwSucfcnzaw/ceWfjQG7eizUJEVKbQCSLzKv7nH3We6+s7v3ynLyB2hNVLtAJRMImtnW7v6BR0Ps\nm8C2wDPASRntAW3NbJMsL28MHJXcPg4Y5e4Lgflmtkfy+PFENQ1AB3d/CRiaxNaywvstSh7P5iWg\nF3AqsWoewOvAHmbWOYmzebZ2jixaA58mt0+o4vjPAv9bfsfMdqrBe0uBUQKQ1CWNpq8C25jZDDOr\nbEbYzN49VwNXmtk4Kv8e/8rM3jOzd4DlwH/d/TngPuC1pNvpv/nhyRriqrqvmb1HVMdcljx+AnBN\n8p47AZeZ2TrAPWY2HhgH/C1JFpkeAH6TNM52yvws7l4GPAEMTH7i7l8APwPuT973VSKBVfU7gWiX\neMjMxgLzMh4fAfy4vBEYOBvobWbjzex9Yj1uKTKaDVQkCzNb5O6t0o5DJJdUAhDJTldGUvBUAhAR\nKVIqAYiIFCklABGRIqUEICJSpJQARESKlBKAiEiRUgIQESlS/w/fhk21wz6W/AAAAABJRU5ErkJg\ngg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x137af5c0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig, ax = plt.subplots(1)\n",
"plt.plot(roc['tpr'])\n",
"plt.plot(roc['1-fpr'], color = 'red')\n",
"plt.xlabel('1-false positive rate')\n",
"plt.ylabel('True Positive Rate')\n",
"plt.title('Receiver Operating Characteristic')\n",
"ax.set_xticklabels([])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Quick Insights\n",
"1.The optimal cut off point from the above graph can be deduced to be 0.06\n",
"\n",
"2.Anything above this value can be labelled as 1 \n",
"\n",
"3.Anything below can be labelled as 0\n",
"\n",
"4.The TPR at the threshold is 76%\n",
"\n",
"5.The FPR at threshold is 23%"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Rebuilding the random forest model with this additional information"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"prob = clf_forest.predict_proba(X_test)[:,1]\n",
"prob[prob > 0.06] = 1\n",
"prob[prob <= 0.06] = 0"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0., 0., 1., ..., 0., 0., 1.])"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prob"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.97 0.78 0.87 34184\n",
" 1 0.27 0.76 0.40 3594\n",
"\n",
"avg / total 0.90 0.78 0.82 37778\n",
"\n"
]
}
],
"source": [
"print classification_report(y_test, prob)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Conclusion\n",
"\n",
"As it can be seen from the above table, precision has come down to 26% whereas recall/sensitivity has gone up to 77% from a mere 56% in the previous model\n",
"\n",
"In case of fraudulent activities the cost of a False Negative is much more expensive than the cost of a False Positive. \n",
"\n",
"Hence, it is alright to predict more customers as falsely positive of fraud rather than let a fraudulent customer get away with the act\n",
"\n",
"With more customers predicted as 1, it will decrease precision but increase sensitivity\n",
"\n",
"The wrongly suspected customers can be made to go through an additional security check either in the form of answering a\n",
"personal question or request for SSN or temporarily freezing the account etc. \n",
"\n",
"At the same time with this new model, customers wouldn't be able to get away with fraud"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
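The cutoff selection in the notebook's ROC analysis cell (picking the row where `tpr - (1 - fpr)` is closest to zero) can be sketched without pandas or scikit-learn. This is a minimal illustration under stated assumptions, not the notebook's own code: the helper names `rates_at` and `balanced_threshold` and the toy labels and scores are hypothetical, every distinct score is tried as a candidate cutoff, and a score is labelled 1 only when it is strictly above the cutoff, matching the `prob > 0.06` rule used above.

```python
# Toy labels and predicted fraud probabilities (hypothetical, for illustration only).
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.01, 0.02, 0.03, 0.05, 0.04, 0.30, 0.60, 0.90]


def rates_at(y_true, scores, threshold):
    """Return (tpr, fpr) when scores strictly above `threshold` are labelled 1."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s > threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    # Guard against division by zero when a class is absent.
    tpr = tp / float(tp + fn) if (tp + fn) else 0.0
    fpr = fp / float(fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def balanced_threshold(y_true, scores):
    """Return (threshold, tpr, fpr) minimising |tpr - (1 - fpr)|.

    Each distinct score is tried as a candidate cutoff; the first
    candidate with the smallest gap wins.
    """
    best = None
    for t in sorted(set(scores)):
        tpr, fpr = rates_at(y_true, scores, t)
        gap = abs(tpr - (1.0 - fpr))
        if best is None or gap < best[0]:
            best = (gap, t, tpr, fpr)
    return best[1], best[2], best[3]


threshold, tpr, fpr = balanced_threshold(y_true, scores)
print("threshold=%.2f tpr=%.3f fpr=%.3f" % (threshold, tpr, fpr))
```

On this toy data the crossing falls at a cutoff of 0.04, where two of the three fraud cases are caught (TPR 2/3) at the cost of flagging two of the five legitimate ones (FPR 0.4). Balancing sensitivity against specificity is only one possible criterion; if false negatives are judged costlier, as the conclusion argues, a lower cutoff could reasonably be chosen.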