@codistwa
Created December 21, 2022 16:02
Machine Learning Project: Credit Card Fraud Detection
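
The notebook below reports 284,315 legitimate and 492 fraudulent transactions in its `class` column. As a rough illustration of what that imbalance means for evaluation, the sketch here recomputes the class shares from those two counts, along with the accuracy a hypothetical model would reach by always predicting class 0. The helper names `class_shares` and `majority_baseline_accuracy` are made up for this example and are not part of the notebook.

```python
# Class counts as reported by the notebook's value_counts() output.
class_counts = {0: 284315, 1: 492}


def class_shares(counts):
    """Return each class's share of all rows (hypothetical helper)."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_baseline_accuracy(counts):
    """Accuracy of a hypothetical model that always predicts the most common class."""
    return max(counts.values()) / sum(counts.values())


shares = class_shares(class_counts)
baseline = majority_baseline_accuracy(class_counts)

print(f"fraud share: {shares[1]:.6f}")
print(f"always-predict-0 accuracy: {baseline:.6f}")
```

Since such a trivial baseline already scores very high accuracy on these counts, metrics focused on the fraud class (for example precision and recall) are usually more informative for data like this.
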
{
"cells": [
{
"cell_type": "markdown",
"id": "turkish-zealand",
"metadata": {},
"source": [
"# Fraud detection\n",
"\n",
"## Problem statement\n",
"\n",
"Predict whether a credit card transaction is fraudulent or not.\n",
"\n",
"## Dataset\n",
"[https://www.kaggle.com/datasets/yashpaloswal/fraud-detection-credit-card](https://www.kaggle.com/datasets/yashpaloswal/fraud-detection-credit-card)"
]
},
{
"cell_type": "markdown",
"id": "alpine-recording",
"metadata": {},
"source": [
"## 1. Importing Libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "external-listing",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np \n",
"import pandas as pd \n",
"\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"id": "hawaiian-edgar",
"metadata": {},
"source": [
"## 2. Loading the dataset"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "efficient-garlic",
"metadata": {},
"outputs": [],
"source": [
"train = './creditcard.csv'\n",
"df_train = pd.read_csv(train)"
]
},
{
"cell_type": "markdown",
"id": "immediate-dollar",
"metadata": {},
"source": [
"## 3. Exploratory data analysis"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "passing-attendance",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Time</th>\n",
" <th>V1</th>\n",
" <th>V2</th>\n",
" <th>V3</th>\n",
" <th>V4</th>\n",
" <th>V5</th>\n",
" <th>V6</th>\n",
" <th>V7</th>\n",
" <th>V8</th>\n",
" <th>V9</th>\n",
" <th>...</th>\n",
" <th>V21</th>\n",
" <th>V22</th>\n",
" <th>V23</th>\n",
" <th>V24</th>\n",
" <th>V25</th>\n",
" <th>V26</th>\n",
" <th>V27</th>\n",
" <th>V28</th>\n",
" <th>Amount</th>\n",
" <th>class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.0</td>\n",
" <td>-1.359807</td>\n",
" <td>-0.072781</td>\n",
" <td>2.536347</td>\n",
" <td>1.378155</td>\n",
" <td>-0.338321</td>\n",
" <td>0.462388</td>\n",
" <td>0.239599</td>\n",
" <td>0.098698</td>\n",
" <td>0.363787</td>\n",
" <td>...</td>\n",
" <td>-0.018307</td>\n",
" <td>0.277838</td>\n",
" <td>-0.110474</td>\n",
" <td>0.066928</td>\n",
" <td>0.128539</td>\n",
" <td>-0.189115</td>\n",
" <td>0.133558</td>\n",
" <td>-0.021053</td>\n",
" <td>149.62</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.0</td>\n",
" <td>1.191857</td>\n",
" <td>0.266151</td>\n",
" <td>0.166480</td>\n",
" <td>0.448154</td>\n",
" <td>0.060018</td>\n",
" <td>-0.082361</td>\n",
" <td>-0.078803</td>\n",
" <td>0.085102</td>\n",
" <td>-0.255425</td>\n",
" <td>...</td>\n",
" <td>-0.225775</td>\n",
" <td>-0.638672</td>\n",
" <td>0.101288</td>\n",
" <td>-0.339846</td>\n",
" <td>0.167170</td>\n",
" <td>0.125895</td>\n",
" <td>-0.008983</td>\n",
" <td>0.014724</td>\n",
" <td>2.69</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1.0</td>\n",
" <td>-1.358354</td>\n",
" <td>-1.340163</td>\n",
" <td>1.773209</td>\n",
" <td>0.379780</td>\n",
" <td>-0.503198</td>\n",
" <td>1.800499</td>\n",
" <td>0.791461</td>\n",
" <td>0.247676</td>\n",
" <td>-1.514654</td>\n",
" <td>...</td>\n",
" <td>0.247998</td>\n",
" <td>0.771679</td>\n",
" <td>0.909412</td>\n",
" <td>-0.689281</td>\n",
" <td>-0.327642</td>\n",
" <td>-0.139097</td>\n",
" <td>-0.055353</td>\n",
" <td>-0.059752</td>\n",
" <td>378.66</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1.0</td>\n",
" <td>-0.966272</td>\n",
" <td>-0.185226</td>\n",
" <td>1.792993</td>\n",
" <td>-0.863291</td>\n",
" <td>-0.010309</td>\n",
" <td>1.247203</td>\n",
" <td>0.237609</td>\n",
" <td>0.377436</td>\n",
" <td>-1.387024</td>\n",
" <td>...</td>\n",
" <td>-0.108300</td>\n",
" <td>0.005274</td>\n",
" <td>-0.190321</td>\n",
" <td>-1.175575</td>\n",
" <td>0.647376</td>\n",
" <td>-0.221929</td>\n",
" <td>0.062723</td>\n",
" <td>0.061458</td>\n",
" <td>123.50</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2.0</td>\n",
" <td>-1.158233</td>\n",
" <td>0.877737</td>\n",
" <td>1.548718</td>\n",
" <td>0.403034</td>\n",
" <td>-0.407193</td>\n",
" <td>0.095921</td>\n",
" <td>0.592941</td>\n",
" <td>-0.270533</td>\n",
" <td>0.817739</td>\n",
" <td>...</td>\n",
" <td>-0.009431</td>\n",
" <td>0.798278</td>\n",
" <td>-0.137458</td>\n",
" <td>0.141267</td>\n",
" <td>-0.206010</td>\n",
" <td>0.502292</td>\n",
" <td>0.219422</td>\n",
" <td>0.215153</td>\n",
" <td>69.99</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 31 columns</p>\n",
"</div>"
],
"text/plain": [
" Time V1 V2 V3 V4 V5 V6 V7 \\\n",
"0 0.0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599 \n",
"1 0.0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803 \n",
"2 1.0 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 0.791461 \n",
"3 1.0 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 0.237609 \n",
"4 2.0 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 0.592941 \n",
"\n",
" V8 V9 ... V21 V22 V23 V24 V25 \\\n",
"0 0.098698 0.363787 ... -0.018307 0.277838 -0.110474 0.066928 0.128539 \n",
"1 0.085102 -0.255425 ... -0.225775 -0.638672 0.101288 -0.339846 0.167170 \n",
"2 0.247676 -1.514654 ... 0.247998 0.771679 0.909412 -0.689281 -0.327642 \n",
"3 0.377436 -1.387024 ... -0.108300 0.005274 -0.190321 -1.175575 0.647376 \n",
"4 -0.270533 0.817739 ... -0.009431 0.798278 -0.137458 0.141267 -0.206010 \n",
"\n",
" V26 V27 V28 Amount class \n",
"0 -0.189115 0.133558 -0.021053 149.62 0 \n",
"1 0.125895 -0.008983 0.014724 2.69 0 \n",
"2 -0.139097 -0.055353 -0.059752 378.66 0 \n",
"3 -0.221929 0.062723 0.061458 123.50 0 \n",
"4 0.502292 0.219422 0.215153 69.99 0 \n",
"\n",
"[5 rows x 31 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# View the first five rows\n",
"df_train.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "durable-morocco",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(284807, 31)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train.shape"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "center-oakland",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 284807 entries, 0 to 284806\n",
"Data columns (total 31 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Time 284807 non-null float64\n",
" 1 V1 284807 non-null float64\n",
" 2 V2 284807 non-null float64\n",
" 3 V3 284807 non-null float64\n",
" 4 V4 284807 non-null float64\n",
" 5 V5 284807 non-null float64\n",
" 6 V6 284807 non-null float64\n",
" 7 V7 284807 non-null float64\n",
" 8 V8 284807 non-null float64\n",
" 9 V9 284807 non-null float64\n",
" 10 V10 284807 non-null float64\n",
" 11 V11 284807 non-null float64\n",
" 12 V12 284807 non-null float64\n",
" 13 V13 284807 non-null float64\n",
" 14 V14 284807 non-null float64\n",
" 15 V15 284807 non-null float64\n",
" 16 V16 284807 non-null float64\n",
" 17 V17 284807 non-null float64\n",
" 18 V18 284807 non-null float64\n",
" 19 V19 284807 non-null float64\n",
" 20 V20 284807 non-null float64\n",
" 21 V21 284807 non-null float64\n",
" 22 V22 284807 non-null float64\n",
" 23 V23 284807 non-null float64\n",
" 24 V24 284807 non-null float64\n",
" 25 V25 284807 non-null float64\n",
" 26 V26 284807 non-null float64\n",
" 27 V27 284807 non-null float64\n",
" 28 V28 284807 non-null float64\n",
" 29 Amount 284807 non-null float64\n",
" 30 class 284807 non-null int64 \n",
"dtypes: float64(30), int64(1)\n",
"memory usage: 67.4 MB\n"
]
}
],
"source": [
"# checking data information\n",
"df_train.info()"
]
},
{
"cell_type": "markdown",
"id": "informed-elements",
"metadata": {},
"source": [
"**Interpreting Data Information**\n",
"\n",
"- The dataset has 284807 rows; any column with a lower non-null count would contain missing values.\n",
"- There are 31 columns.\n",
"- 30 numerical features have data type float64.\n",
"- 1 numerical feature (`class`) has data type int64.\n",
"\n",
"Every column has 284807 non-null entries, so no column has missing values."
]
},
{
"cell_type": "markdown",
"id": "acoustic-prefix",
"metadata": {},
"source": [
"**Putting all column names in lowercase**"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "african-timothy",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>time</th>\n",
" <th>v1</th>\n",
" <th>v2</th>\n",
" <th>v3</th>\n",
" <th>v4</th>\n",
" <th>v5</th>\n",
" <th>v6</th>\n",
" <th>v7</th>\n",
" <th>v8</th>\n",
" <th>v9</th>\n",
" <th>...</th>\n",
" <th>v21</th>\n",
" <th>v22</th>\n",
" <th>v23</th>\n",
" <th>v24</th>\n",
" <th>v25</th>\n",
" <th>v26</th>\n",
" <th>v27</th>\n",
" <th>v28</th>\n",
" <th>amount</th>\n",
" <th>class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.0</td>\n",
" <td>-1.359807</td>\n",
" <td>-0.072781</td>\n",
" <td>2.536347</td>\n",
" <td>1.378155</td>\n",
" <td>-0.338321</td>\n",
" <td>0.462388</td>\n",
" <td>0.239599</td>\n",
" <td>0.098698</td>\n",
" <td>0.363787</td>\n",
" <td>...</td>\n",
" <td>-0.018307</td>\n",
" <td>0.277838</td>\n",
" <td>-0.110474</td>\n",
" <td>0.066928</td>\n",
" <td>0.128539</td>\n",
" <td>-0.189115</td>\n",
" <td>0.133558</td>\n",
" <td>-0.021053</td>\n",
" <td>149.62</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.0</td>\n",
" <td>1.191857</td>\n",
" <td>0.266151</td>\n",
" <td>0.166480</td>\n",
" <td>0.448154</td>\n",
" <td>0.060018</td>\n",
" <td>-0.082361</td>\n",
" <td>-0.078803</td>\n",
" <td>0.085102</td>\n",
" <td>-0.255425</td>\n",
" <td>...</td>\n",
" <td>-0.225775</td>\n",
" <td>-0.638672</td>\n",
" <td>0.101288</td>\n",
" <td>-0.339846</td>\n",
" <td>0.167170</td>\n",
" <td>0.125895</td>\n",
" <td>-0.008983</td>\n",
" <td>0.014724</td>\n",
" <td>2.69</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1.0</td>\n",
" <td>-1.358354</td>\n",
" <td>-1.340163</td>\n",
" <td>1.773209</td>\n",
" <td>0.379780</td>\n",
" <td>-0.503198</td>\n",
" <td>1.800499</td>\n",
" <td>0.791461</td>\n",
" <td>0.247676</td>\n",
" <td>-1.514654</td>\n",
" <td>...</td>\n",
" <td>0.247998</td>\n",
" <td>0.771679</td>\n",
" <td>0.909412</td>\n",
" <td>-0.689281</td>\n",
" <td>-0.327642</td>\n",
" <td>-0.139097</td>\n",
" <td>-0.055353</td>\n",
" <td>-0.059752</td>\n",
" <td>378.66</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1.0</td>\n",
" <td>-0.966272</td>\n",
" <td>-0.185226</td>\n",
" <td>1.792993</td>\n",
" <td>-0.863291</td>\n",
" <td>-0.010309</td>\n",
" <td>1.247203</td>\n",
" <td>0.237609</td>\n",
" <td>0.377436</td>\n",
" <td>-1.387024</td>\n",
" <td>...</td>\n",
" <td>-0.108300</td>\n",
" <td>0.005274</td>\n",
" <td>-0.190321</td>\n",
" <td>-1.175575</td>\n",
" <td>0.647376</td>\n",
" <td>-0.221929</td>\n",
" <td>0.062723</td>\n",
" <td>0.061458</td>\n",
" <td>123.50</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2.0</td>\n",
" <td>-1.158233</td>\n",
" <td>0.877737</td>\n",
" <td>1.548718</td>\n",
" <td>0.403034</td>\n",
" <td>-0.407193</td>\n",
" <td>0.095921</td>\n",
" <td>0.592941</td>\n",
" <td>-0.270533</td>\n",
" <td>0.817739</td>\n",
" <td>...</td>\n",
" <td>-0.009431</td>\n",
" <td>0.798278</td>\n",
" <td>-0.137458</td>\n",
" <td>0.141267</td>\n",
" <td>-0.206010</td>\n",
" <td>0.502292</td>\n",
" <td>0.219422</td>\n",
" <td>0.215153</td>\n",
" <td>69.99</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>284802</th>\n",
" <td>172786.0</td>\n",
" <td>-11.881118</td>\n",
" <td>10.071785</td>\n",
" <td>-9.834783</td>\n",
" <td>-2.066656</td>\n",
" <td>-5.364473</td>\n",
" <td>-2.606837</td>\n",
" <td>-4.918215</td>\n",
" <td>7.305334</td>\n",
" <td>1.914428</td>\n",
" <td>...</td>\n",
" <td>0.213454</td>\n",
" <td>0.111864</td>\n",
" <td>1.014480</td>\n",
" <td>-0.509348</td>\n",
" <td>1.436807</td>\n",
" <td>0.250034</td>\n",
" <td>0.943651</td>\n",
" <td>0.823731</td>\n",
" <td>0.77</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>284803</th>\n",
" <td>172787.0</td>\n",
" <td>-0.732789</td>\n",
" <td>-0.055080</td>\n",
" <td>2.035030</td>\n",
" <td>-0.738589</td>\n",
" <td>0.868229</td>\n",
" <td>1.058415</td>\n",
" <td>0.024330</td>\n",
" <td>0.294869</td>\n",
" <td>0.584800</td>\n",
" <td>...</td>\n",
" <td>0.214205</td>\n",
" <td>0.924384</td>\n",
" <td>0.012463</td>\n",
" <td>-1.016226</td>\n",
" <td>-0.606624</td>\n",
" <td>-0.395255</td>\n",
" <td>0.068472</td>\n",
" <td>-0.053527</td>\n",
" <td>24.79</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>284804</th>\n",
" <td>172788.0</td>\n",
" <td>1.919565</td>\n",
" <td>-0.301254</td>\n",
" <td>-3.249640</td>\n",
" <td>-0.557828</td>\n",
" <td>2.630515</td>\n",
" <td>3.031260</td>\n",
" <td>-0.296827</td>\n",
" <td>0.708417</td>\n",
" <td>0.432454</td>\n",
" <td>...</td>\n",
" <td>0.232045</td>\n",
" <td>0.578229</td>\n",
" <td>-0.037501</td>\n",
" <td>0.640134</td>\n",
" <td>0.265745</td>\n",
" <td>-0.087371</td>\n",
" <td>0.004455</td>\n",
" <td>-0.026561</td>\n",
" <td>67.88</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>284805</th>\n",
" <td>172788.0</td>\n",
" <td>-0.240440</td>\n",
" <td>0.530483</td>\n",
" <td>0.702510</td>\n",
" <td>0.689799</td>\n",
" <td>-0.377961</td>\n",
" <td>0.623708</td>\n",
" <td>-0.686180</td>\n",
" <td>0.679145</td>\n",
" <td>0.392087</td>\n",
" <td>...</td>\n",
" <td>0.265245</td>\n",
" <td>0.800049</td>\n",
" <td>-0.163298</td>\n",
" <td>0.123205</td>\n",
" <td>-0.569159</td>\n",
" <td>0.546668</td>\n",
" <td>0.108821</td>\n",
" <td>0.104533</td>\n",
" <td>10.00</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>284806</th>\n",
" <td>172792.0</td>\n",
" <td>-0.533413</td>\n",
" <td>-0.189733</td>\n",
" <td>0.703337</td>\n",
" <td>-0.506271</td>\n",
" <td>-0.012546</td>\n",
" <td>-0.649617</td>\n",
" <td>1.577006</td>\n",
" <td>-0.414650</td>\n",
" <td>0.486180</td>\n",
" <td>...</td>\n",
" <td>0.261057</td>\n",
" <td>0.643078</td>\n",
" <td>0.376777</td>\n",
" <td>0.008797</td>\n",
" <td>-0.473649</td>\n",
" <td>-0.818267</td>\n",
" <td>-0.002415</td>\n",
" <td>0.013649</td>\n",
" <td>217.00</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>284807 rows × 31 columns</p>\n",
"</div>"
],
"text/plain": [
" time v1 v2 v3 v4 v5 \\\n",
"0 0.0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 \n",
"1 0.0 1.191857 0.266151 0.166480 0.448154 0.060018 \n",
"2 1.0 -1.358354 -1.340163 1.773209 0.379780 -0.503198 \n",
"3 1.0 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 \n",
"4 2.0 -1.158233 0.877737 1.548718 0.403034 -0.407193 \n",
"... ... ... ... ... ... ... \n",
"284802 172786.0 -11.881118 10.071785 -9.834783 -2.066656 -5.364473 \n",
"284803 172787.0 -0.732789 -0.055080 2.035030 -0.738589 0.868229 \n",
"284804 172788.0 1.919565 -0.301254 -3.249640 -0.557828 2.630515 \n",
"284805 172788.0 -0.240440 0.530483 0.702510 0.689799 -0.377961 \n",
"284806 172792.0 -0.533413 -0.189733 0.703337 -0.506271 -0.012546 \n",
"\n",
" v6 v7 v8 v9 ... v21 v22 \\\n",
"0 0.462388 0.239599 0.098698 0.363787 ... -0.018307 0.277838 \n",
"1 -0.082361 -0.078803 0.085102 -0.255425 ... -0.225775 -0.638672 \n",
"2 1.800499 0.791461 0.247676 -1.514654 ... 0.247998 0.771679 \n",
"3 1.247203 0.237609 0.377436 -1.387024 ... -0.108300 0.005274 \n",
"4 0.095921 0.592941 -0.270533 0.817739 ... -0.009431 0.798278 \n",
"... ... ... ... ... ... ... ... \n",
"284802 -2.606837 -4.918215 7.305334 1.914428 ... 0.213454 0.111864 \n",
"284803 1.058415 0.024330 0.294869 0.584800 ... 0.214205 0.924384 \n",
"284804 3.031260 -0.296827 0.708417 0.432454 ... 0.232045 0.578229 \n",
"284805 0.623708 -0.686180 0.679145 0.392087 ... 0.265245 0.800049 \n",
"284806 -0.649617 1.577006 -0.414650 0.486180 ... 0.261057 0.643078 \n",
"\n",
" v23 v24 v25 v26 v27 v28 amount \\\n",
"0 -0.110474 0.066928 0.128539 -0.189115 0.133558 -0.021053 149.62 \n",
"1 0.101288 -0.339846 0.167170 0.125895 -0.008983 0.014724 2.69 \n",
"2 0.909412 -0.689281 -0.327642 -0.139097 -0.055353 -0.059752 378.66 \n",
"3 -0.190321 -1.175575 0.647376 -0.221929 0.062723 0.061458 123.50 \n",
"4 -0.137458 0.141267 -0.206010 0.502292 0.219422 0.215153 69.99 \n",
"... ... ... ... ... ... ... ... \n",
"284802 1.014480 -0.509348 1.436807 0.250034 0.943651 0.823731 0.77 \n",
"284803 0.012463 -1.016226 -0.606624 -0.395255 0.068472 -0.053527 24.79 \n",
"284804 -0.037501 0.640134 0.265745 -0.087371 0.004455 -0.026561 67.88 \n",
"284805 -0.163298 0.123205 -0.569159 0.546668 0.108821 0.104533 10.00 \n",
"284806 0.376777 0.008797 -0.473649 -0.818267 -0.002415 0.013649 217.00 \n",
"\n",
" class \n",
"0 0 \n",
"1 0 \n",
"2 0 \n",
"3 0 \n",
"4 0 \n",
"... ... \n",
"284802 0 \n",
"284803 0 \n",
"284804 0 \n",
"284805 0 \n",
"284806 0 \n",
"\n",
"[284807 rows x 31 columns]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train.columns = df_train.columns.str.lower()\n",
"df_train"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "pediatric-manchester",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>time</th>\n",
" <th>v1</th>\n",
" <th>v2</th>\n",
" <th>v3</th>\n",
" <th>v4</th>\n",
" <th>v5</th>\n",
" <th>v6</th>\n",
" <th>v7</th>\n",
" <th>v8</th>\n",
" <th>v9</th>\n",
" <th>...</th>\n",
" <th>v21</th>\n",
" <th>v22</th>\n",
" <th>v23</th>\n",
" <th>v24</th>\n",
" <th>v25</th>\n",
" <th>v26</th>\n",
" <th>v27</th>\n",
" <th>v28</th>\n",
" <th>amount</th>\n",
" <th>class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>284807.000000</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>...</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>2.848070e+05</td>\n",
" <td>284807.000000</td>\n",
" <td>284807.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>94813.859575</td>\n",
" <td>1.759061e-12</td>\n",
" <td>-8.251130e-13</td>\n",
" <td>-9.654937e-13</td>\n",
" <td>8.321385e-13</td>\n",
" <td>1.649999e-13</td>\n",
" <td>4.248366e-13</td>\n",
" <td>-3.054600e-13</td>\n",
" <td>8.777971e-14</td>\n",
" <td>-1.179749e-12</td>\n",
" <td>...</td>\n",
" <td>-3.405756e-13</td>\n",
" <td>-5.723197e-13</td>\n",
" <td>-9.725856e-13</td>\n",
" <td>1.464150e-12</td>\n",
" <td>-6.987102e-13</td>\n",
" <td>-5.617874e-13</td>\n",
" <td>3.332082e-12</td>\n",
" <td>-3.518874e-12</td>\n",
" <td>88.349619</td>\n",
" <td>0.001727</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>47488.145955</td>\n",
" <td>1.958696e+00</td>\n",
" <td>1.651309e+00</td>\n",
" <td>1.516255e+00</td>\n",
" <td>1.415869e+00</td>\n",
" <td>1.380247e+00</td>\n",
" <td>1.332271e+00</td>\n",
" <td>1.237094e+00</td>\n",
" <td>1.194353e+00</td>\n",
" <td>1.098632e+00</td>\n",
" <td>...</td>\n",
" <td>7.345240e-01</td>\n",
" <td>7.257016e-01</td>\n",
" <td>6.244603e-01</td>\n",
" <td>6.056471e-01</td>\n",
" <td>5.212781e-01</td>\n",
" <td>4.822270e-01</td>\n",
" <td>4.036325e-01</td>\n",
" <td>3.300833e-01</td>\n",
" <td>250.120109</td>\n",
" <td>0.041527</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>0.000000</td>\n",
" <td>-5.640751e+01</td>\n",
" <td>-7.271573e+01</td>\n",
" <td>-4.832559e+01</td>\n",
" <td>-5.683171e+00</td>\n",
" <td>-1.137433e+02</td>\n",
" <td>-2.616051e+01</td>\n",
" <td>-4.355724e+01</td>\n",
" <td>-7.321672e+01</td>\n",
" <td>-1.343407e+01</td>\n",
" <td>...</td>\n",
" <td>-3.483038e+01</td>\n",
" <td>-1.093314e+01</td>\n",
" <td>-4.480774e+01</td>\n",
" <td>-2.836627e+00</td>\n",
" <td>-1.029540e+01</td>\n",
" <td>-2.604551e+00</td>\n",
" <td>-2.256568e+01</td>\n",
" <td>-1.543008e+01</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>54201.500000</td>\n",
" <td>-9.203734e-01</td>\n",
" <td>-5.985499e-01</td>\n",
" <td>-8.903648e-01</td>\n",
" <td>-8.486401e-01</td>\n",
" <td>-6.915971e-01</td>\n",
" <td>-7.682956e-01</td>\n",
" <td>-5.540759e-01</td>\n",
" <td>-2.086297e-01</td>\n",
" <td>-6.430976e-01</td>\n",
" <td>...</td>\n",
" <td>-2.283949e-01</td>\n",
" <td>-5.423504e-01</td>\n",
" <td>-1.618463e-01</td>\n",
" <td>-3.545861e-01</td>\n",
" <td>-3.171451e-01</td>\n",
" <td>-3.269839e-01</td>\n",
" <td>-7.083953e-02</td>\n",
" <td>-5.295979e-02</td>\n",
" <td>5.600000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>84692.000000</td>\n",
" <td>1.810880e-02</td>\n",
" <td>6.548556e-02</td>\n",
" <td>1.798463e-01</td>\n",
" <td>-1.984653e-02</td>\n",
" <td>-5.433583e-02</td>\n",
" <td>-2.741871e-01</td>\n",
" <td>4.010308e-02</td>\n",
" <td>2.235804e-02</td>\n",
" <td>-5.142873e-02</td>\n",
" <td>...</td>\n",
" <td>-2.945017e-02</td>\n",
" <td>6.781943e-03</td>\n",
" <td>-1.119293e-02</td>\n",
" <td>4.097606e-02</td>\n",
" <td>1.659350e-02</td>\n",
" <td>-5.213911e-02</td>\n",
" <td>1.342146e-03</td>\n",
" <td>1.124383e-02</td>\n",
" <td>22.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>139320.500000</td>\n",
" <td>1.315642e+00</td>\n",
" <td>8.037239e-01</td>\n",
" <td>1.027196e+00</td>\n",
" <td>7.433413e-01</td>\n",
" <td>6.119264e-01</td>\n",
" <td>3.985649e-01</td>\n",
" <td>5.704361e-01</td>\n",
" <td>3.273459e-01</td>\n",
" <td>5.971390e-01</td>\n",
" <td>...</td>\n",
" <td>1.863772e-01</td>\n",
" <td>5.285536e-01</td>\n",
" <td>1.476421e-01</td>\n",
" <td>4.395266e-01</td>\n",
" <td>3.507156e-01</td>\n",
" <td>2.409522e-01</td>\n",
" <td>9.104512e-02</td>\n",
" <td>7.827995e-02</td>\n",
" <td>77.165000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>172792.000000</td>\n",
" <td>2.454930e+00</td>\n",
" <td>2.205773e+01</td>\n",
" <td>9.382558e+00</td>\n",
" <td>1.687534e+01</td>\n",
" <td>3.480167e+01</td>\n",
" <td>7.330163e+01</td>\n",
" <td>1.205895e+02</td>\n",
" <td>2.000721e+01</td>\n",
" <td>1.559499e+01</td>\n",
" <td>...</td>\n",
" <td>2.720284e+01</td>\n",
" <td>1.050309e+01</td>\n",
" <td>2.252841e+01</td>\n",
" <td>4.584549e+00</td>\n",
" <td>7.519589e+00</td>\n",
" <td>3.517346e+00</td>\n",
" <td>3.161220e+01</td>\n",
" <td>3.384781e+01</td>\n",
" <td>25691.160000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>8 rows × 31 columns</p>\n",
"</div>"
],
"text/plain": [
" time v1 v2 v3 v4 \\\n",
"count 284807.000000 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 \n",
"mean 94813.859575 1.759061e-12 -8.251130e-13 -9.654937e-13 8.321385e-13 \n",
"std 47488.145955 1.958696e+00 1.651309e+00 1.516255e+00 1.415869e+00 \n",
"min 0.000000 -5.640751e+01 -7.271573e+01 -4.832559e+01 -5.683171e+00 \n",
"25% 54201.500000 -9.203734e-01 -5.985499e-01 -8.903648e-01 -8.486401e-01 \n",
"50% 84692.000000 1.810880e-02 6.548556e-02 1.798463e-01 -1.984653e-02 \n",
"75% 139320.500000 1.315642e+00 8.037239e-01 1.027196e+00 7.433413e-01 \n",
"max 172792.000000 2.454930e+00 2.205773e+01 9.382558e+00 1.687534e+01 \n",
"\n",
" v5 v6 v7 v8 v9 \\\n",
"count 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 \n",
"mean 1.649999e-13 4.248366e-13 -3.054600e-13 8.777971e-14 -1.179749e-12 \n",
"std 1.380247e+00 1.332271e+00 1.237094e+00 1.194353e+00 1.098632e+00 \n",
"min -1.137433e+02 -2.616051e+01 -4.355724e+01 -7.321672e+01 -1.343407e+01 \n",
"25% -6.915971e-01 -7.682956e-01 -5.540759e-01 -2.086297e-01 -6.430976e-01 \n",
"50% -5.433583e-02 -2.741871e-01 4.010308e-02 2.235804e-02 -5.142873e-02 \n",
"75% 6.119264e-01 3.985649e-01 5.704361e-01 3.273459e-01 5.971390e-01 \n",
"max 3.480167e+01 7.330163e+01 1.205895e+02 2.000721e+01 1.559499e+01 \n",
"\n",
" ... v21 v22 v23 v24 \\\n",
"count ... 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 \n",
"mean ... -3.405756e-13 -5.723197e-13 -9.725856e-13 1.464150e-12 \n",
"std ... 7.345240e-01 7.257016e-01 6.244603e-01 6.056471e-01 \n",
"min ... -3.483038e+01 -1.093314e+01 -4.480774e+01 -2.836627e+00 \n",
"25% ... -2.283949e-01 -5.423504e-01 -1.618463e-01 -3.545861e-01 \n",
"50% ... -2.945017e-02 6.781943e-03 -1.119293e-02 4.097606e-02 \n",
"75% ... 1.863772e-01 5.285536e-01 1.476421e-01 4.395266e-01 \n",
"max ... 2.720284e+01 1.050309e+01 2.252841e+01 4.584549e+00 \n",
"\n",
" v25 v26 v27 v28 amount \\\n",
"count 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 284807.000000 \n",
"mean -6.987102e-13 -5.617874e-13 3.332082e-12 -3.518874e-12 88.349619 \n",
"std 5.212781e-01 4.822270e-01 4.036325e-01 3.300833e-01 250.120109 \n",
"min -1.029540e+01 -2.604551e+00 -2.256568e+01 -1.543008e+01 0.000000 \n",
"25% -3.171451e-01 -3.269839e-01 -7.083953e-02 -5.295979e-02 5.600000 \n",
"50% 1.659350e-02 -5.213911e-02 1.342146e-03 1.124383e-02 22.000000 \n",
"75% 3.507156e-01 2.409522e-01 9.104512e-02 7.827995e-02 77.165000 \n",
"max 7.519589e+00 3.517346e+00 3.161220e+01 3.384781e+01 25691.160000 \n",
"\n",
" class \n",
"count 284807.000000 \n",
"mean 0.001727 \n",
"std 0.041527 \n",
"min 0.000000 \n",
"25% 0.000000 \n",
"50% 0.000000 \n",
"75% 0.000000 \n",
"max 1.000000 \n",
"\n",
"[8 rows x 31 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# statistical summary of numerical variables\n",
"df_train.describe()"
]
},
{
"cell_type": "markdown",
"id": "unsigned-mystery",
"metadata": {},
"source": [
"### Univariate Analysis"
]
},
{
"cell_type": "markdown",
"id": "geological-inclusion",
"metadata": {},
"source": [
"**Analyze the target variable**\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "leading-athens",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# checking for missing values\n",
"df_train['class'].isnull().sum()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "developing-multimedia",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# number of unique values\n",
"df_train['class'].nunique()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "patent-scenario",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 284315\n",
"1 492\n",
"Name: class, dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# frequency distribution\n",
"df_train['class'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "separated-gender",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 0.998273\n",
"1 0.001727\n",
"Name: class, dtype: float64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Proportion of each class in the target (normalized frequency distribution)\n",
"df_train['class'].value_counts(normalize=True)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "later-bloom",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZcAAAHgCAYAAABpQSB0AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAWJklEQVR4nO3df8zudX3f8ddbjrZ21YpyyixgcZYsQdehnihp94fTBA8mHbZRo1vlzBHpJm41cZnWbMP5Y2nXWlOtJaHxFDBWarVOtuAYQVPXRZBjpQo64wnVeAgKclBsrO1g7/1xf0+8ON4cbvB93/fhnMcjuXKu6319f3zuhOSZ67q+10V1dwBg0qO2ewEAHHvEBYBx4gLAOHEBYJy4ADBOXAAYt2O7F3C0OOmkk/r000/f7mUAPKJ85jOf+WZ37zx8Li6L008/Pfv27dvuZQA8olTVV9ebe1sMgHHiAsA4cQFgnLgAME5cABgnLgCMExcAxokLAOPEBYBx4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWDcju1ewLHk9R+7YruXwFHoHeeev91LgC3nlQsA48QFgHHiAsA4cQFgnLgAME5cABgnLgCMExcAxokLAOPEBYBx4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGiQsA48QFgHHiAsA4cQFgnLgAME5cABgnLgCMExcAxokLAOPEBYBx4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwbtPiUlWnVdUnquoLVXVLVf3qMn9zVd1WVTcttxet7PNrVbW/qr5UVS9cme9eZvur6o0r86dW1Q3L/I+q6jHL/EeWx/uX50/frL8TgB+0ma9c7k3y+u4+M8nZSS6qqjOX597Z3Wctt6uTZHnu5UmenmR3kt+rqhOq6oQk70lybpIzk7xi5Ti/sRzrZ5LcneSCZX5BkruX+TuX7QDYIpsWl+6+vbv/fLn/nSRfTHLKEXY5L8mV3f033f2XSfYnec5y29/dt3b33ya5Msl5VVVJnp/kQ8v+lyd58cqxLl/ufyjJC5btAdgCW/KZy/K21DOT3LCMXltVn6uqvVV14jI7JcnXVnY7sMweaP6kJN/q7nsPm9/vWMvz3162B2ALbHpcqurHk3w4yeu6+54klyR5WpKzktye5B2bvYYjrO3CqtpXVfvuvPPO7VoGwDFnU+NSVY/OWlje391/kiTd/Y3uvq+7/1+S38/a215JcluS01Z2P3WZPdD8riRPqKodh83vd6zl+Z9Ytr+f7r60u3d1966dO3f+sH8uAIvNvFqskrw3yRe7+7dX5k9e2ewXk9y83L8qycuXK72emuSMJJ9OcmOSM5Yrwx6TtQ/9r+ruTvKJJC9Z9t+T5KMrx9qz3H9Jko8v2wOwBXY8+CYP288neWWSz1fVTcvsTVm72uusJJ3kK0l+JUm6+5aq+mCSL2TtSrOLuvu+JKmq1ya5JskJSfZ29y3L8d6Q5MqqeluSz2YtZln+fV9V7U9yMGtBAmCLbFpcuvvPkqx3hdbVR9jn7Unevs786vX26+5b8/231Vbn30vy0oeyXgDm+IY+AOPEBYBx4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGiQsA48QFgHHiAsA4cQFgnLgAME5cABgnLgCMExcAxokLAOPEBYBx4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGiQsA48QFgHHiAsA4cQFgnLgAME5cABgnLgCMExcAxokLAOPEBYBx4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGbVpcquq0qv
pEVX2hqm6pql9d5k+sqmur6svLvycu86qqd1XV/qr6XFU9a+VYe5btv1xVe1bmz66qzy/7vKuq6kjnAGBrbOYrl3uTvL67z0xydpKLqurMJG9Mcl13n5HkuuVxkpyb5IzldmGSS5K1UCS5OMlzkzwnycUrsbgkyatX9tu9zB/oHABsgU2LS3ff3t1/vtz/TpIvJjklyXlJLl82uzzJi5f75yW5otdcn+QJVfXkJC9Mcm13H+zuu5Ncm2T38tzju/v67u4kVxx2rPXOAcAW2JLPXKrq9CTPTHJDkpO7+/blqa8nOXm5f0qSr63sdmCZHWl+YJ15jnCOw9d1YVXtq6p9d95558P4ywBYz6bHpap+PMmHk7yuu+9ZfW55xdGbef4jnaO7L+3uXd29a+fOnZu5DIDjyqbGpaoenbWwvL+7/2QZf2N5SyvLv3cs89uSnLay+6nL7EjzU9eZH+kcAGyBzbxarJK8N8kXu/u3V566KsmhK772JPnoyvz85aqxs5N8e3lr65ok51TVicsH+eckuWZ57p6qOns51/mHHWu9cwCwBXZs4rF/Pskrk3y+qm5aZm9K8utJPlhVFyT5apKXLc9dneRFSfYn+W6SVyVJdx+sqrcmuXHZ7i3dfXC5/5oklyV5bJKPLbcc4RwAbIFNi0t3/1mSeoCnX7DO9p3kogc41t4ke9eZ70vyjHXmd613DgC2hm/oAzBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGiQsA48QFgHHiAsA4cQFgnLgAME5cABgnLgCMExcAxokLAOPEBYBx4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGiQsA48QFgHHiAsA4cQFgnLgAME5cABgnLgCMExcAxokLAOPEBYBx4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwbkNxqarrNjIDgCTZcaQnq+pHk/xYkpOq6sQktTz1+CSnbPLaAHiEOmJckvxKktcl+akkn8n343JPkt/dvGUB8Eh2xLh09+8k+Z2q+tfd/e4tWhMAj3AP9solSdLd766qn0ty+uo+3X3FJq0LgEewDcWlqt6X5GlJbkpy3zLuJOICwA/YUFyS7EpyZnf3Zi4GgGPDRr/ncnOSv/tQDlxVe6vqjqq6eWX25qq6rapuWm4vWnnu16pqf1V9qapeuDLfvcz2V9UbV+ZPraoblvkfVdVjlvmPLI/3L8+f/lDWDcAPb6NxOSnJF6rqmqq66tDtQfa5LMnudebv7O6zltvVSVJVZyZ5eZKnL/v8XlWdUFUnJHlPknOTnJnkFcu2SfIby7F+JsndSS5Y5hckuXuZv3PZDoAttNG3xd78UA/c3Z98CK8azktyZXf/TZK/rKr9SZ6zPLe/u29Nkqq6Msl5VfXFJM9P8k+XbS5f1njJcqxD6/1Qkt+tqvKWHsDW2ejVYn86eM7XVtX5SfYleX133521L2Rev7LNgXz/S5pfO2z+3CRPSvKt7r53ne1PObRPd99bVd9etv/m4QupqguTXJgkT3nKU374vwyAJBv/+ZfvVNU9y+17VXVfVd3zMM53SdauOjsrye1J3vEwjjGmuy/t7l3dvWvnzp3buRSAY8pGX7k87tD9qqqsvfV09kM9WXd/Y+U4v5/kvy8Pb0ty2sqmpy6zPMD8riRPqKody6uX1e0PHetAVe1I8hPL9gBskYf8q8i95r8meeGDbXu4qnryysNfzNpVaElyVZKXL1d6PTXJGUk+neTGJGcsV4Y9Jmsf+l+1fH7yiSQvWfbfk+SjK8fas9x/SZKP+7wFYGtt9EuUv7Ty8FFZ+97L9x5knw8keV7WfvTyQJKLkzyvqs7K2hcwv5K13y5Ld99SVR9M8oUk9ya5qLvvW47z2iTXJDkhyd7uvmU5xRuSXFlVb0vy2STvXebvTf
K+5aKAg1kLEgBbaKNXi/3Cyv17sxaG8460Q3e/Yp3xe9eZHdr+7Unevs786iRXrzO/Nd+/omx1/r0kLz3S2gDYXBv9zOVVm70QAI4dG71a7NSq+sjyjfs7qurDVXXqZi8OgEemjX6g/wdZ+6D8p5bbf1tmAPADNhqXnd39B91973K7LIkvhgCwro3G5a6q+uVDv/dVVb8c3x0B4AFsNC7/IsnLknw9a9+sf0mSf75JawLgEW6jlyK/Jcme5XfAUlVPTPJbWYsOANzPRl+5/OyhsCRJdx9M8szNWRIAj3QbjcujqurEQw+WVy4bfdUDwHFmo4F4R5JPVdUfL49fmnW+TQ8Ayca/oX9FVe3L2v+gK0l+qbu/sHnLAuCRbMNvbS0xERQAHtRD/sl9AHgw4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGiQsA48QFgHHiAsA4cQFgnLgAME5cABgnLgCMExcAxokLAOPEBYBx4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGiQsA48QFgHHiAsA4cQFgnLgAME5cABgnLgCMExcAxokLAOPEBYBx4gLAOHEBYNymxaWq9lbVHVV188rsiVV1bVV9efn3xGVeVfWuqtpfVZ+rqmet7LNn2f7LVbVnZf7sqvr8ss+7qqqOdA4Ats5mvnK5LMnuw2ZvTHJdd5+R5LrlcZKcm+SM5XZhkkuStVAkuTjJc5M8J8nFK7G4JMmrV/bb/SDnAGCLbFpcuvuTSQ4eNj4vyeXL/cuTvHhlfkWvuT7JE6rqyUlemOTa7j7Y3XcnuTbJ7uW5x3f39d3dSa447FjrnQOALbLVn7mc3N23L/e/nuTk5f4pSb62st2BZXak+YF15kc6BwBbZNs+0F9ecfR2nqOqLqyqfVW1784779zMpQAcV7Y6Lt9Y3tLK8u8dy/y2JKetbHfqMjvS/NR15kc6xw/o7ku7e1d379q5c+fD/qMAuL+tjstVSQ5d8bUnyUdX5ucvV42dneTby1tb1yQ5p6pOXD7IPyfJNctz91TV2ctVYucfdqz1zgHAFtmxWQeuqg8keV6Sk6rqQNau+vr1JB+sqguSfDXJy5bNr07yoiT7k3w3yauSpLsPVtVbk9y4bPeW7j50kcBrsnZF2mOTfGy55QjnAGCLbFpcuvsVD/DUC9bZtpNc9ADH2Ztk7zrzfUmesc78rvXOAcDW8Q19AMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGiQsA48QFgHHiAsA4cQFgnLgAME5cABgnLgCMExcAxokLAOPEBYBx4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGiQsA48QFgHHiAsA4cQFgnLgAME5cABgnLgCMExcAxokLAOPEBYBx4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGiQsA48QFgHHiAsA4cQFgnLgAME5cABgnLgCM25a4VNVXqurzVXVTVe1bZk+sqmur6svLvycu86qqd1XV/qr6XFU9a+U4e5btv1xVe1bmz16Ov3/Zt7b+rwQ4fm3nK5d/3N1ndfeu5fEbk1zX3WckuW55nCTnJjljuV2Y5JJkLUZJLk7y3CTPSXLxoSAt27x6Zb/dm//nAHDI0fS22HlJLl/uX57kxSvzK3rN9UmeUFVPTvLCJNd298HuvjvJtUl2L889vruv7+5OcsXKsQDYAtsVl07yP6vqM1V14TI7ubtvX+5/PcnJy/1TknxtZd8Dy+xI8wPrzAHYIju26bz/qLtvq6qfTHJtVf2f1Se7u6uqN3sRS9guTJKnPOUpm306gOPGtrxy6e7bln/vSPKRrH1m8o3lLa0s/96xbH
5bktNWdj91mR1pfuo68/XWcWl37+ruXTt37vxh/ywAFlsel6r6O1X1uEP3k5yT5OYkVyU5dMXXniQfXe5fleT85aqxs5N8e3n77Jok51TVicsH+eckuWZ57p6qOnu5Suz8lWMBsAW2422xk5N8ZLk6eEeSP+zu/1FVNyb5YFVdkOSrSV62bH91khcl2Z/ku0lelSTdfbCq3prkxmW7t3T3weX+a5JcluSxST623ADYIlsel+6+Nck/XGd+V5IXrDPvJBc9wLH2Jtm7znxfkmf80IsF4GE5mi5FBuAYIS4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGiQsA48QFgHHiAsA4cQFgnLgAME5cABgnLgCMExcAxokLAOPEBYBx4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGiQsA48QFgHHiAsA4cQFgnLgAME5cABgnLgCMExcAxokLAOPEBYBx4gLAOHEBYJy4ADBOXAAYJy4AjBMXAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIwTFwDGiQsA48QFgHHiAsA4cQFgnLgAMO6YjUtV7a6qL1XV/qp643avB+B4ckzGpapOSPKeJOcmOTPJK6rqzO1dFcDx45iMS5LnJNnf3bd2998muTLJedu8JoDjxo7tXsAmOSXJ11YeH0jy3G1aC2y7Oy75d9u9BI5CP/mv/sumHftYjcuGVNWFSS5cHv5VVX1pO9dzjDkpyTe3exFHg9/Onu1eAvfnv81DXvObE0f56fWGx2pcbkty2srjU5fZ/XT3pUku3apFHU+qal9379rudcDh/Le5NY7Vz1xuTHJGVT21qh6T5OVJrtrmNQEcN47JVy7dfW9VvTbJNUlOSLK3u2/Z5mUBHDeOybgkSXdfneTq7V7HcczbjRyt/Le5Baq7t3sNABxjjtXPXADYRuLCKD+7w9GqqvZW1R1VdfN2r+V4IC6M8bM7HOUuS7J7uxdxvBAXJvnZHY5a3f3JJAe3ex3HC3Fh0no/u3PKNq0F2EbiAsA4cWHShn52Bzj2iQuT/OwOkERcGNTd9yY59LM7X0zyQT+7w9Giqj6Q5FNJ/n5VHaiqC7Z7Tccy39AHYJxXLgCMExcAxokLAOPEBYBx4gLAOHGBo0BVvbmq/u12rwOmiAsA48QFtkFVnV9Vn6uqv6iq9x323Kur6sbluQ9X1Y8t85dW1c3L/JPL7OlV9emqumk53hnb8ffA4XyJErZYVT09yUeS/Fx3f7Oqnpjk3yT5q+7+rap6UnfftWz7tiTf6O53V9Xnk+zu7tuq6gnd/a2qeneS67v7/ctP7pzQ3X+9XX8bHOKVC2y95yf54+7+ZpJ09+H/j5FnVNX/WmLyz5I8fZn/7ySXVdWrk5ywzD6V5E1V9YYkPy0sHC3EBY4+lyV5bXf/gyT/KcmPJkl3/8sk/z5rvzz9meUVzh8m+SdJ/jrJ1VX1/O1ZMtyfuMDW+3iSl1bVk5JkeVts1eOS3F5Vj87aK5cs2z2tu2/o7v+Y5M4kp1XV30tya3e/K8lHk/zslvwF8CB2bPcC4HjT3bdU1duT/GlV3Zfks0m+srLJf0hyQ9YCckPWYpMkv7l8YF9JrkvyF0nekOSVVfV/k3w9yX/ekj8CHoQP9AEY520xAMaJCwDjxAWAceICwDhxAWCcuAAwTlwAGCcuAIz7/ztJQk5mNAKnAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x576 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Visualize the class frequency distribution (0 = legitimate, 1 = fraudulent)\n",
"f, ax = plt.subplots(figsize=(6, 8))\n",
"sns.countplot(x=\"class\", data=df_train, palette=\"Set2\", ax=ax)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "handmade-holder",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"492 cards out of 284807\n"
]
}
],
"source": [
"# Count the rows labeled fraudulent (class == 1) out of all rows\n",
"print(df_train['class'].sum(), 'cards out of', len(df_train))"
]
},
{
"cell_type": "markdown",
"id": "detailed-hampshire",
"metadata": {},
"source": [
"### Bivariate Analysis\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "confused-endorsement",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>amount</th>\n",
" <th>class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>149.62</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2.69</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>378.66</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>123.50</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>69.99</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>284802</th>\n",
" <td>0.77</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>284803</th>\n",
" <td>24.79</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>284804</th>\n",
" <td>67.88</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>284805</th>\n",
" <td>10.00</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>284806</th>\n",
" <td>217.00</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>284807 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" amount class\n",
"0 149.62 0\n",
"1 2.69 0\n",
"2 378.66 0\n",
"3 123.50 0\n",
"4 69.99 0\n",
"... ... ...\n",
"284802 0.77 0\n",
"284803 24.79 0\n",
"284804 67.88 0\n",
"284805 10.00 0\n",
"284806 217.00 0\n",
"\n",
"[284807 rows x 2 columns]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Keep only the amount and the class label to compare them side by side\n",
"compare_class_amount = df_train[['amount', 'class']]\n",
"compare_class_amount"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "restricted-implement",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>amount</th>\n",
" <th>class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>541</th>\n",
" <td>0.00</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>623</th>\n",
" <td>529.00</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4920</th>\n",
" <td>239.93</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6108</th>\n",
" <td>59.00</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6329</th>\n",
" <td>1.00</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>279863</th>\n",
" <td>390.00</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>280143</th>\n",
" <td>0.76</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>280149</th>\n",
" <td>77.89</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>281144</th>\n",
" <td>245.00</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>281674</th>\n",
" <td>42.53</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>492 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" amount class\n",
"541 0.00 1\n",
"623 529.00 1\n",
"4920 239.93 1\n",
"6108 59.00 1\n",
"6329 1.00 1\n",
"... ... ...\n",
"279863 390.00 1\n",
"280143 0.76 1\n",
"280149 77.89 1\n",
"281144 245.00 1\n",
"281674 42.53 1\n",
"\n",
"[492 rows x 2 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Show only the rows labeled fraudulent (class == 1)\n",
"compare_class_amount[compare_class_amount[\"class\"] == 1]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "metallic-accountability",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"88.34961925093133"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Mean amount across all rows (both classes combined)\n",
"compare_class_amount[\"amount\"].mean()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "addressed-stress",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1584x1440 with 36 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Plot a histogram of every column to inspect the feature distributions\n",
"df_train.hist(figsize=(22, 20), bins=30, edgecolor=\"black\")\n",
"plt.subplots_adjust(hspace=0.7, wspace=0.4)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "loved-warning",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>time</th>\n",
" <th>v1</th>\n",
" <th>v2</th>\n",
" <th>v3</th>\n",
" <th>v4</th>\n",
" <th>v5</th>\n",
" <th>v6</th>\n",
" <th>v7</th>\n",
" <th>v8</th>\n",
" <th>v9</th>\n",
" <th>...</th>\n",
" <th>v21</th>\n",
" <th>v22</th>\n",
" <th>v23</th>\n",
" <th>v24</th>\n",
" <th>v25</th>\n",
" <th>v26</th>\n",
" <th>v27</th>\n",
" <th>v28</th>\n",
" <th>amount</th>\n",
" <th>class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>time</th>\n",
" <td>1.000000</td>\n",
" <td>1.173963e-01</td>\n",
" <td>-1.059333e-02</td>\n",
" <td>-4.196182e-01</td>\n",
" <td>-1.052602e-01</td>\n",
" <td>1.730721e-01</td>\n",
" <td>-6.301647e-02</td>\n",
" <td>8.471437e-02</td>\n",
" <td>-3.694943e-02</td>\n",
" <td>-8.660434e-03</td>\n",
" <td>...</td>\n",
" <td>4.473573e-02</td>\n",
" <td>1.440591e-01</td>\n",
" <td>5.114236e-02</td>\n",
" <td>-1.618187e-02</td>\n",
" <td>-2.330828e-01</td>\n",
" <td>-4.140710e-02</td>\n",
" <td>-5.134591e-03</td>\n",
" <td>-9.412688e-03</td>\n",
" <td>-0.010596</td>\n",
" <td>-0.012323</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v1</th>\n",
" <td>0.117396</td>\n",
" <td>1.000000e+00</td>\n",
" <td>3.777823e-12</td>\n",
" <td>-2.118614e-12</td>\n",
" <td>-1.733159e-13</td>\n",
" <td>-3.473231e-12</td>\n",
" <td>-1.306165e-13</td>\n",
" <td>-1.116494e-13</td>\n",
" <td>2.114527e-12</td>\n",
" <td>3.016285e-14</td>\n",
" <td>...</td>\n",
" <td>-3.276238e-12</td>\n",
" <td>2.281863e-12</td>\n",
" <td>-2.969746e-12</td>\n",
" <td>-1.029876e-12</td>\n",
" <td>1.144179e-12</td>\n",
" <td>1.835263e-12</td>\n",
" <td>7.624804e-12</td>\n",
" <td>-9.769215e-13</td>\n",
" <td>-0.227709</td>\n",
" <td>-0.101347</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v2</th>\n",
" <td>-0.010593</td>\n",
" <td>3.777823e-12</td>\n",
" <td>1.000000e+00</td>\n",
" <td>2.325661e-12</td>\n",
" <td>-2.314981e-12</td>\n",
" <td>-1.831952e-12</td>\n",
" <td>9.438444e-13</td>\n",
" <td>5.403436e-12</td>\n",
" <td>2.133785e-14</td>\n",
" <td>3.238513e-13</td>\n",
" <td>...</td>\n",
" <td>2.280202e-12</td>\n",
" <td>-2.548560e-13</td>\n",
" <td>-4.856120e-12</td>\n",
" <td>6.431308e-13</td>\n",
" <td>-9.423730e-13</td>\n",
" <td>-4.129100e-13</td>\n",
" <td>-9.856545e-13</td>\n",
" <td>2.525513e-12</td>\n",
" <td>-0.531409</td>\n",
" <td>0.091289</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v3</th>\n",
" <td>-0.419618</td>\n",
" <td>-2.118614e-12</td>\n",
" <td>2.325661e-12</td>\n",
" <td>1.000000e+00</td>\n",
" <td>2.046235e-13</td>\n",
" <td>-4.032993e-12</td>\n",
" <td>-1.574471e-13</td>\n",
" <td>3.405586e-12</td>\n",
" <td>-1.272385e-12</td>\n",
" <td>-6.812351e-13</td>\n",
" <td>...</td>\n",
" <td>6.736294e-13</td>\n",
" <td>-8.909339e-13</td>\n",
" <td>4.147209e-12</td>\n",
" <td>3.407636e-12</td>\n",
" <td>5.712956e-13</td>\n",
" <td>-2.577274e-12</td>\n",
" <td>-5.041444e-12</td>\n",
" <td>5.189109e-12</td>\n",
" <td>-0.210880</td>\n",
" <td>-0.192961</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v4</th>\n",
" <td>-0.105260</td>\n",
" <td>-1.733159e-13</td>\n",
" <td>-2.314981e-12</td>\n",
" <td>2.046235e-13</td>\n",
" <td>1.000000e+00</td>\n",
" <td>-2.552389e-13</td>\n",
" <td>1.084041e-12</td>\n",
" <td>8.135064e-13</td>\n",
" <td>7.334818e-13</td>\n",
" <td>-7.143069e-13</td>\n",
" <td>...</td>\n",
" <td>-2.696370e-12</td>\n",
" <td>4.347776e-13</td>\n",
" <td>-4.160969e-12</td>\n",
" <td>-2.368743e-12</td>\n",
" <td>1.619944e-12</td>\n",
" <td>-3.043100e-13</td>\n",
" <td>-1.456066e-12</td>\n",
" <td>-2.832372e-12</td>\n",
" <td>0.098732</td>\n",
" <td>0.133447</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v5</th>\n",
" <td>0.173072</td>\n",
" <td>-3.473231e-12</td>\n",
" <td>-1.831952e-12</td>\n",
" <td>-4.032993e-12</td>\n",
" <td>-2.552389e-13</td>\n",
" <td>1.000000e+00</td>\n",
" <td>-6.934789e-14</td>\n",
" <td>1.573956e-11</td>\n",
" <td>-2.038243e-12</td>\n",
" <td>-1.000756e-12</td>\n",
" <td>...</td>\n",
" <td>-1.751796e-12</td>\n",
" <td>7.095269e-13</td>\n",
" <td>3.616075e-12</td>\n",
" <td>-2.808776e-13</td>\n",
" <td>1.451126e-12</td>\n",
" <td>-1.896141e-13</td>\n",
" <td>-2.124559e-12</td>\n",
" <td>1.010196e-11</td>\n",
" <td>-0.386356</td>\n",
" <td>-0.094974</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v6</th>\n",
" <td>-0.063016</td>\n",
" <td>-1.306165e-13</td>\n",
" <td>9.438444e-13</td>\n",
" <td>-1.574471e-13</td>\n",
" <td>1.084041e-12</td>\n",
" <td>-6.934789e-14</td>\n",
" <td>1.000000e+00</td>\n",
" <td>-2.798968e-12</td>\n",
" <td>-5.446480e-13</td>\n",
" <td>2.036743e-12</td>\n",
" <td>...</td>\n",
" <td>1.476858e-12</td>\n",
" <td>-1.144797e-12</td>\n",
" <td>-1.527842e-12</td>\n",
" <td>1.551854e-12</td>\n",
" <td>-2.723707e-12</td>\n",
" <td>3.351239e-12</td>\n",
" <td>1.481307e-12</td>\n",
" <td>-6.069227e-13</td>\n",
" <td>0.215981</td>\n",
" <td>-0.043643</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v7</th>\n",
" <td>0.084714</td>\n",
" <td>-1.116494e-13</td>\n",
" <td>5.403436e-12</td>\n",
" <td>3.405586e-12</td>\n",
" <td>8.135064e-13</td>\n",
" <td>1.573956e-11</td>\n",
" <td>-2.798968e-12</td>\n",
" <td>1.000000e+00</td>\n",
" <td>5.528803e-12</td>\n",
" <td>5.088082e-13</td>\n",
" <td>...</td>\n",
" <td>2.788246e-12</td>\n",
" <td>-8.133209e-13</td>\n",
" <td>-4.293094e-12</td>\n",
" <td>-2.553518e-12</td>\n",
" <td>-7.406970e-13</td>\n",
" <td>-4.476467e-12</td>\n",
" <td>-1.328637e-11</td>\n",
" <td>2.958679e-13</td>\n",
" <td>0.397311</td>\n",
" <td>-0.187257</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v8</th>\n",
" <td>-0.036949</td>\n",
" <td>2.114527e-12</td>\n",
" <td>2.133785e-14</td>\n",
" <td>-1.272385e-12</td>\n",
" <td>7.334818e-13</td>\n",
" <td>-2.038243e-12</td>\n",
" <td>-5.446480e-13</td>\n",
" <td>5.528803e-12</td>\n",
" <td>1.000000e+00</td>\n",
" <td>-2.243172e-12</td>\n",
" <td>...</td>\n",
" <td>-4.022440e-12</td>\n",
" <td>-2.679560e-12</td>\n",
" <td>9.013064e-13</td>\n",
" <td>-1.074365e-12</td>\n",
" <td>-3.268979e-12</td>\n",
" <td>1.043839e-12</td>\n",
" <td>-3.499804e-12</td>\n",
" <td>1.866598e-12</td>\n",
" <td>-0.103079</td>\n",
" <td>0.019875</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v9</th>\n",
" <td>-0.008660</td>\n",
" <td>3.016285e-14</td>\n",
" <td>3.238513e-13</td>\n",
" <td>-6.812351e-13</td>\n",
" <td>-7.143069e-13</td>\n",
" <td>-1.000756e-12</td>\n",
" <td>2.036743e-12</td>\n",
" <td>5.088082e-13</td>\n",
" <td>-2.243172e-12</td>\n",
" <td>1.000000e+00</td>\n",
" <td>...</td>\n",
" <td>3.040326e-12</td>\n",
" <td>-7.467526e-13</td>\n",
" <td>-1.011003e-12</td>\n",
" <td>8.579072e-13</td>\n",
" <td>-1.590341e-12</td>\n",
" <td>-7.723547e-13</td>\n",
" <td>2.428930e-12</td>\n",
" <td>-1.406856e-12</td>\n",
" <td>-0.044246</td>\n",
" <td>-0.097733</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v10</th>\n",
" <td>0.030617</td>\n",
" <td>-2.615192e-12</td>\n",
" <td>1.463282e-12</td>\n",
" <td>-1.609126e-12</td>\n",
" <td>-1.938143e-12</td>\n",
" <td>-7.200329e-13</td>\n",
" <td>7.429770e-13</td>\n",
" <td>1.674650e-12</td>\n",
" <td>-1.660630e-12</td>\n",
" <td>1.185391e-12</td>\n",
" <td>...</td>\n",
" <td>-5.547700e-13</td>\n",
" <td>-1.320186e-13</td>\n",
" <td>1.173332e-12</td>\n",
" <td>6.405710e-13</td>\n",
" <td>2.794979e-12</td>\n",
" <td>-2.738577e-13</td>\n",
" <td>1.552492e-12</td>\n",
" <td>5.116568e-12</td>\n",
" <td>-0.101502</td>\n",
" <td>-0.216883</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v11</th>\n",
" <td>-0.247689</td>\n",
" <td>1.866551e-12</td>\n",
" <td>-8.314960e-13</td>\n",
" <td>8.707055e-13</td>\n",
" <td>1.874473e-12</td>\n",
" <td>-5.928181e-13</td>\n",
" <td>1.014893e-12</td>\n",
" <td>-8.525291e-13</td>\n",
" <td>1.296877e-12</td>\n",
" <td>-3.970652e-13</td>\n",
" <td>...</td>\n",
" <td>1.100352e-13</td>\n",
" <td>-5.644168e-14</td>\n",
" <td>1.724963e-12</td>\n",
" <td>-1.162239e-12</td>\n",
" <td>-1.351430e-12</td>\n",
" <td>2.718291e-12</td>\n",
" <td>-3.950227e-12</td>\n",
" <td>-4.247931e-12</td>\n",
" <td>0.000104</td>\n",
" <td>0.154876</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v12</th>\n",
" <td>0.124348</td>\n",
" <td>-1.238745e-12</td>\n",
" <td>6.139448e-13</td>\n",
" <td>-2.730043e-12</td>\n",
" <td>5.393827e-13</td>\n",
" <td>1.812994e-12</td>\n",
" <td>-9.265590e-13</td>\n",
" <td>-2.826770e-13</td>\n",
" <td>-3.860109e-13</td>\n",
" <td>-1.904908e-12</td>\n",
" <td>...</td>\n",
" <td>8.106835e-13</td>\n",
" <td>-2.346533e-12</td>\n",
" <td>-6.878556e-13</td>\n",
" <td>-2.911084e-12</td>\n",
" <td>1.102899e-12</td>\n",
" <td>2.808714e-13</td>\n",
" <td>5.953998e-13</td>\n",
" <td>-7.428113e-12</td>\n",
" <td>-0.009542</td>\n",
" <td>-0.260593</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v13</th>\n",
" <td>-0.065902</td>\n",
" <td>7.589589e-13</td>\n",
" <td>-1.181068e-12</td>\n",
" <td>-1.020592e-12</td>\n",
" <td>6.813810e-13</td>\n",
" <td>-7.021996e-14</td>\n",
" <td>1.484679e-12</td>\n",
" <td>-8.171731e-13</td>\n",
" <td>7.722897e-13</td>\n",
" <td>8.754859e-13</td>\n",
" <td>...</td>\n",
" <td>-2.037258e-12</td>\n",
" <td>-5.491535e-13</td>\n",
" <td>3.508022e-12</td>\n",
" <td>1.225112e-13</td>\n",
" <td>-1.513549e-12</td>\n",
" <td>-2.008364e-12</td>\n",
" <td>4.975659e-12</td>\n",
" <td>-6.777880e-12</td>\n",
" <td>0.005293</td>\n",
" <td>-0.004570</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v14</th>\n",
" <td>-0.098757</td>\n",
" <td>-1.871054e-13</td>\n",
" <td>-3.384684e-13</td>\n",
" <td>-5.597874e-13</td>\n",
" <td>-1.404120e-12</td>\n",
" <td>-1.113015e-13</td>\n",
" <td>-1.212766e-12</td>\n",
" <td>2.038217e-12</td>\n",
" <td>-2.596182e-12</td>\n",
" <td>-1.271311e-12</td>\n",
" <td>...</td>\n",
" <td>-4.557223e-13</td>\n",
" <td>2.572021e-12</td>\n",
" <td>8.288666e-13</td>\n",
" <td>-3.382145e-12</td>\n",
" <td>8.299871e-13</td>\n",
" <td>-3.304576e-13</td>\n",
" <td>-2.447674e-12</td>\n",
" <td>-1.700091e-12</td>\n",
" <td>0.033751</td>\n",
" <td>-0.302544</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v15</th>\n",
" <td>-0.183453</td>\n",
" <td>-3.601390e-13</td>\n",
" <td>2.196083e-13</td>\n",
" <td>6.442512e-13</td>\n",
" <td>1.526382e-12</td>\n",
" <td>-1.593594e-12</td>\n",
" <td>-1.053548e-12</td>\n",
" <td>1.074440e-12</td>\n",
" <td>1.648898e-12</td>\n",
" <td>8.628709e-13</td>\n",
" <td>...</td>\n",
" <td>5.921902e-13</td>\n",
" <td>-4.115704e-13</td>\n",
" <td>-9.846654e-13</td>\n",
" <td>-3.256310e-12</td>\n",
" <td>-1.725436e-12</td>\n",
" <td>5.478951e-13</td>\n",
" <td>-4.690702e-12</td>\n",
" <td>-4.214967e-12</td>\n",
" <td>-0.002986</td>\n",
" <td>-0.004223</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v16</th>\n",
" <td>0.011903</td>\n",
" <td>-1.142884e-12</td>\n",
" <td>-8.000510e-13</td>\n",
" <td>-8.748795e-13</td>\n",
" <td>3.095722e-13</td>\n",
" <td>-1.619090e-14</td>\n",
" <td>1.374197e-12</td>\n",
" <td>-1.478776e-12</td>\n",
" <td>-1.830899e-12</td>\n",
" <td>1.239835e-12</td>\n",
" <td>...</td>\n",
" <td>-1.067918e-12</td>\n",
" <td>2.009490e-12</td>\n",
" <td>4.057311e-13</td>\n",
" <td>-4.061029e-13</td>\n",
" <td>7.626529e-13</td>\n",
" <td>-1.323365e-12</td>\n",
" <td>7.022747e-12</td>\n",
" <td>5.737097e-13</td>\n",
" <td>-0.003910</td>\n",
" <td>-0.196539</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v17</th>\n",
" <td>-0.073297</td>\n",
" <td>1.671073e-12</td>\n",
" <td>2.028957e-12</td>\n",
" <td>-1.058101e-12</td>\n",
" <td>1.151414e-14</td>\n",
" <td>1.713794e-13</td>\n",
" <td>7.431528e-13</td>\n",
" <td>-1.231314e-12</td>\n",
" <td>7.025405e-13</td>\n",
" <td>-1.450585e-12</td>\n",
" <td>...</td>\n",
" <td>1.793607e-12</td>\n",
" <td>2.280366e-13</td>\n",
" <td>-9.948639e-13</td>\n",
" <td>-2.073066e-12</td>\n",
" <td>4.514159e-12</td>\n",
" <td>2.940618e-12</td>\n",
" <td>-1.324408e-12</td>\n",
" <td>1.854033e-12</td>\n",
" <td>0.007309</td>\n",
" <td>-0.326481</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v18</th>\n",
" <td>0.090438</td>\n",
" <td>-5.738830e-13</td>\n",
" <td>-1.916566e-14</td>\n",
" <td>-8.846578e-13</td>\n",
" <td>-1.309615e-12</td>\n",
" <td>1.101433e-12</td>\n",
" <td>6.859871e-13</td>\n",
" <td>-4.281952e-13</td>\n",
" <td>1.499555e-12</td>\n",
" <td>7.186934e-13</td>\n",
" <td>...</td>\n",
" <td>-2.185508e-12</td>\n",
" <td>1.392636e-12</td>\n",
" <td>-2.160673e-12</td>\n",
" <td>4.303958e-12</td>\n",
" <td>5.432404e-13</td>\n",
" <td>-1.810692e-12</td>\n",
" <td>-4.949670e-12</td>\n",
" <td>4.113104e-12</td>\n",
" <td>0.035650</td>\n",
" <td>-0.111485</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v19</th>\n",
" <td>0.028975</td>\n",
" <td>-2.770259e-12</td>\n",
" <td>-2.237098e-13</td>\n",
" <td>-1.061131e-12</td>\n",
" <td>-9.754131e-13</td>\n",
" <td>5.956033e-13</td>\n",
" <td>1.148589e-12</td>\n",
" <td>-3.742188e-12</td>\n",
" <td>1.988417e-12</td>\n",
" <td>-8.786777e-13</td>\n",
" <td>...</td>\n",
" <td>-3.315774e-13</td>\n",
" <td>7.050020e-14</td>\n",
" <td>-7.118335e-13</td>\n",
" <td>1.326310e-12</td>\n",
" <td>9.270702e-13</td>\n",
" <td>2.412082e-12</td>\n",
" <td>-2.201365e-12</td>\n",
" <td>3.450583e-12</td>\n",
" <td>-0.056151</td>\n",
" <td>0.034783</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v20</th>\n",
" <td>-0.050866</td>\n",
" <td>2.662926e-13</td>\n",
" <td>5.839893e-13</td>\n",
" <td>1.873059e-12</td>\n",
" <td>-2.347029e-12</td>\n",
" <td>-1.728728e-13</td>\n",
" <td>-2.382062e-12</td>\n",
" <td>8.068665e-12</td>\n",
" <td>-1.884661e-13</td>\n",
" <td>1.270200e-12</td>\n",
" <td>...</td>\n",
" <td>-3.892661e-12</td>\n",
" <td>1.632957e-12</td>\n",
" <td>-1.019668e-11</td>\n",
" <td>1.267519e-12</td>\n",
" <td>-1.593346e-12</td>\n",
" <td>1.469484e-13</td>\n",
" <td>-2.996546e-12</td>\n",
" <td>6.123479e-12</td>\n",
" <td>0.339403</td>\n",
" <td>0.020090</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v21</th>\n",
" <td>0.044736</td>\n",
" <td>-3.276238e-12</td>\n",
" <td>2.280202e-12</td>\n",
" <td>6.736294e-13</td>\n",
" <td>-2.696370e-12</td>\n",
" <td>-1.751796e-12</td>\n",
" <td>1.476858e-12</td>\n",
" <td>2.788246e-12</td>\n",
" <td>-4.022440e-12</td>\n",
" <td>3.040326e-12</td>\n",
" <td>...</td>\n",
" <td>1.000000e+00</td>\n",
" <td>-3.415801e-12</td>\n",
" <td>1.066923e-12</td>\n",
" <td>2.350293e-12</td>\n",
" <td>-3.120502e-12</td>\n",
" <td>8.463789e-13</td>\n",
" <td>-8.527973e-13</td>\n",
" <td>4.256994e-12</td>\n",
" <td>0.105999</td>\n",
" <td>0.040413</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v22</th>\n",
" <td>0.144059</td>\n",
" <td>2.281863e-12</td>\n",
" <td>-2.548560e-13</td>\n",
" <td>-8.909339e-13</td>\n",
" <td>4.347776e-13</td>\n",
" <td>7.095269e-13</td>\n",
" <td>-1.144797e-12</td>\n",
" <td>-8.133209e-13</td>\n",
" <td>-2.679560e-12</td>\n",
" <td>-7.467526e-13</td>\n",
" <td>...</td>\n",
" <td>-3.415801e-12</td>\n",
" <td>1.000000e+00</td>\n",
" <td>-9.443573e-13</td>\n",
" <td>-1.123546e-12</td>\n",
" <td>1.968449e-12</td>\n",
" <td>-1.013828e-12</td>\n",
" <td>-1.726653e-13</td>\n",
" <td>5.948423e-12</td>\n",
" <td>-0.064801</td>\n",
" <td>0.000805</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v23</th>\n",
" <td>0.051142</td>\n",
" <td>-2.969746e-12</td>\n",
" <td>-4.856120e-12</td>\n",
" <td>4.147209e-12</td>\n",
" <td>-4.160969e-12</td>\n",
" <td>3.616075e-12</td>\n",
" <td>-1.527842e-12</td>\n",
" <td>-4.293094e-12</td>\n",
" <td>9.013064e-13</td>\n",
" <td>-1.011003e-12</td>\n",
" <td>...</td>\n",
" <td>1.066923e-12</td>\n",
" <td>-9.443573e-13</td>\n",
" <td>1.000000e+00</td>\n",
" <td>2.354049e-12</td>\n",
" <td>-3.751334e-12</td>\n",
" <td>-1.002379e-12</td>\n",
" <td>9.199153e-12</td>\n",
" <td>3.819775e-12</td>\n",
" <td>-0.112633</td>\n",
" <td>-0.002685</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v24</th>\n",
" <td>-0.016182</td>\n",
" <td>-1.029876e-12</td>\n",
" <td>6.431308e-13</td>\n",
" <td>3.407636e-12</td>\n",
" <td>-2.368743e-12</td>\n",
" <td>-2.808776e-13</td>\n",
" <td>1.551854e-12</td>\n",
" <td>-2.553518e-12</td>\n",
" <td>-1.074365e-12</td>\n",
" <td>8.579072e-13</td>\n",
" <td>...</td>\n",
" <td>2.350293e-12</td>\n",
" <td>-1.123546e-12</td>\n",
" <td>2.354049e-12</td>\n",
" <td>1.000000e+00</td>\n",
" <td>-3.917943e-12</td>\n",
" <td>1.604779e-12</td>\n",
" <td>1.554565e-12</td>\n",
" <td>1.380805e-11</td>\n",
" <td>0.005146</td>\n",
" <td>-0.007221</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v25</th>\n",
" <td>-0.233083</td>\n",
" <td>1.144179e-12</td>\n",
" <td>-9.423730e-13</td>\n",
" <td>5.712956e-13</td>\n",
" <td>1.619944e-12</td>\n",
" <td>1.451126e-12</td>\n",
" <td>-2.723707e-12</td>\n",
" <td>-7.406970e-13</td>\n",
" <td>-3.268979e-12</td>\n",
" <td>-1.590341e-12</td>\n",
" <td>...</td>\n",
" <td>-3.120502e-12</td>\n",
" <td>1.968449e-12</td>\n",
" <td>-3.751334e-12</td>\n",
" <td>-3.917943e-12</td>\n",
" <td>1.000000e+00</td>\n",
" <td>2.111834e-12</td>\n",
" <td>-6.220008e-13</td>\n",
" <td>-8.597190e-12</td>\n",
" <td>-0.047837</td>\n",
" <td>0.003308</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v26</th>\n",
" <td>-0.041407</td>\n",
" <td>1.835263e-12</td>\n",
" <td>-4.129100e-13</td>\n",
" <td>-2.577274e-12</td>\n",
" <td>-3.043100e-13</td>\n",
" <td>-1.896141e-13</td>\n",
" <td>3.351239e-12</td>\n",
" <td>-4.476467e-12</td>\n",
" <td>1.043839e-12</td>\n",
" <td>-7.723547e-13</td>\n",
" <td>...</td>\n",
" <td>8.463789e-13</td>\n",
" <td>-1.013828e-12</td>\n",
" <td>-1.002379e-12</td>\n",
" <td>1.604779e-12</td>\n",
" <td>2.111834e-12</td>\n",
" <td>1.000000e+00</td>\n",
" <td>2.374854e-12</td>\n",
" <td>-1.036858e-11</td>\n",
" <td>-0.003208</td>\n",
" <td>0.004455</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v27</th>\n",
" <td>-0.005135</td>\n",
" <td>7.624804e-12</td>\n",
" <td>-9.856545e-13</td>\n",
" <td>-5.041444e-12</td>\n",
" <td>-1.456066e-12</td>\n",
" <td>-2.124559e-12</td>\n",
" <td>1.481307e-12</td>\n",
" <td>-1.328637e-11</td>\n",
" <td>-3.499804e-12</td>\n",
" <td>2.428930e-12</td>\n",
" <td>...</td>\n",
" <td>-8.527973e-13</td>\n",
" <td>-1.726653e-13</td>\n",
" <td>9.199153e-12</td>\n",
" <td>1.554565e-12</td>\n",
" <td>-6.220008e-13</td>\n",
" <td>2.374854e-12</td>\n",
" <td>1.000000e+00</td>\n",
" <td>-4.441112e-12</td>\n",
" <td>0.028825</td>\n",
" <td>0.017580</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v28</th>\n",
" <td>-0.009413</td>\n",
" <td>-9.769215e-13</td>\n",
" <td>2.525513e-12</td>\n",
" <td>5.189109e-12</td>\n",
" <td>-2.832372e-12</td>\n",
" <td>1.010196e-11</td>\n",
" <td>-6.069227e-13</td>\n",
" <td>2.958679e-13</td>\n",
" <td>1.866598e-12</td>\n",
" <td>-1.406856e-12</td>\n",
" <td>...</td>\n",
" <td>4.256994e-12</td>\n",
" <td>5.948423e-12</td>\n",
" <td>3.819775e-12</td>\n",
" <td>1.380805e-11</td>\n",
" <td>-8.597190e-12</td>\n",
" <td>-1.036858e-11</td>\n",
" <td>-4.441112e-12</td>\n",
" <td>1.000000e+00</td>\n",
" <td>0.010258</td>\n",
" <td>0.009536</td>\n",
" </tr>\n",
" <tr>\n",
" <th>amount</th>\n",
" <td>-0.010596</td>\n",
" <td>-2.277087e-01</td>\n",
" <td>-5.314089e-01</td>\n",
" <td>-2.108805e-01</td>\n",
" <td>9.873167e-02</td>\n",
" <td>-3.863563e-01</td>\n",
" <td>2.159812e-01</td>\n",
" <td>3.973113e-01</td>\n",
" <td>-1.030791e-01</td>\n",
" <td>-4.424560e-02</td>\n",
" <td>...</td>\n",
" <td>1.059989e-01</td>\n",
" <td>-6.480065e-02</td>\n",
" <td>-1.126326e-01</td>\n",
" <td>5.146217e-03</td>\n",
" <td>-4.783686e-02</td>\n",
" <td>-3.208037e-03</td>\n",
" <td>2.882546e-02</td>\n",
" <td>1.025822e-02</td>\n",
" <td>1.000000</td>\n",
" <td>0.005632</td>\n",
" </tr>\n",
" <tr>\n",
" <th>class</th>\n",
" <td>-0.012323</td>\n",
" <td>-1.013473e-01</td>\n",
" <td>9.128865e-02</td>\n",
" <td>-1.929608e-01</td>\n",
" <td>1.334475e-01</td>\n",
" <td>-9.497430e-02</td>\n",
" <td>-4.364316e-02</td>\n",
" <td>-1.872566e-01</td>\n",
" <td>1.987512e-02</td>\n",
" <td>-9.773269e-02</td>\n",
" <td>...</td>\n",
" <td>4.041338e-02</td>\n",
" <td>8.053175e-04</td>\n",
" <td>-2.685156e-03</td>\n",
" <td>-7.220907e-03</td>\n",
" <td>3.307706e-03</td>\n",
" <td>4.455398e-03</td>\n",
" <td>1.757973e-02</td>\n",
" <td>9.536041e-03</td>\n",
" <td>0.005632</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>31 rows × 31 columns</p>\n",
"</div>"
],
"text/plain": [
" time v1 v2 v3 v4 \\\n",
"time 1.000000 1.173963e-01 -1.059333e-02 -4.196182e-01 -1.052602e-01 \n",
"v1 0.117396 1.000000e+00 3.777823e-12 -2.118614e-12 -1.733159e-13 \n",
"v2 -0.010593 3.777823e-12 1.000000e+00 2.325661e-12 -2.314981e-12 \n",
"v3 -0.419618 -2.118614e-12 2.325661e-12 1.000000e+00 2.046235e-13 \n",
"v4 -0.105260 -1.733159e-13 -2.314981e-12 2.046235e-13 1.000000e+00 \n",
"v5 0.173072 -3.473231e-12 -1.831952e-12 -4.032993e-12 -2.552389e-13 \n",
"v6 -0.063016 -1.306165e-13 9.438444e-13 -1.574471e-13 1.084041e-12 \n",
"v7 0.084714 -1.116494e-13 5.403436e-12 3.405586e-12 8.135064e-13 \n",
"v8 -0.036949 2.114527e-12 2.133785e-14 -1.272385e-12 7.334818e-13 \n",
"v9 -0.008660 3.016285e-14 3.238513e-13 -6.812351e-13 -7.143069e-13 \n",
"v10 0.030617 -2.615192e-12 1.463282e-12 -1.609126e-12 -1.938143e-12 \n",
"v11 -0.247689 1.866551e-12 -8.314960e-13 8.707055e-13 1.874473e-12 \n",
"v12 0.124348 -1.238745e-12 6.139448e-13 -2.730043e-12 5.393827e-13 \n",
"v13 -0.065902 7.589589e-13 -1.181068e-12 -1.020592e-12 6.813810e-13 \n",
"v14 -0.098757 -1.871054e-13 -3.384684e-13 -5.597874e-13 -1.404120e-12 \n",
"v15 -0.183453 -3.601390e-13 2.196083e-13 6.442512e-13 1.526382e-12 \n",
"v16 0.011903 -1.142884e-12 -8.000510e-13 -8.748795e-13 3.095722e-13 \n",
"v17 -0.073297 1.671073e-12 2.028957e-12 -1.058101e-12 1.151414e-14 \n",
"v18 0.090438 -5.738830e-13 -1.916566e-14 -8.846578e-13 -1.309615e-12 \n",
"v19 0.028975 -2.770259e-12 -2.237098e-13 -1.061131e-12 -9.754131e-13 \n",
"v20 -0.050866 2.662926e-13 5.839893e-13 1.873059e-12 -2.347029e-12 \n",
"v21 0.044736 -3.276238e-12 2.280202e-12 6.736294e-13 -2.696370e-12 \n",
"v22 0.144059 2.281863e-12 -2.548560e-13 -8.909339e-13 4.347776e-13 \n",
"v23 0.051142 -2.969746e-12 -4.856120e-12 4.147209e-12 -4.160969e-12 \n",
"v24 -0.016182 -1.029876e-12 6.431308e-13 3.407636e-12 -2.368743e-12 \n",
"v25 -0.233083 1.144179e-12 -9.423730e-13 5.712956e-13 1.619944e-12 \n",
"v26 -0.041407 1.835263e-12 -4.129100e-13 -2.577274e-12 -3.043100e-13 \n",
"v27 -0.005135 7.624804e-12 -9.856545e-13 -5.041444e-12 -1.456066e-12 \n",
"v28 -0.009413 -9.769215e-13 2.525513e-12 5.189109e-12 -2.832372e-12 \n",
"amount -0.010596 -2.277087e-01 -5.314089e-01 -2.108805e-01 9.873167e-02 \n",
"class -0.012323 -1.013473e-01 9.128865e-02 -1.929608e-01 1.334475e-01 \n",
"\n",
" v5 v6 v7 v8 v9 \\\n",
"time 1.730721e-01 -6.301647e-02 8.471437e-02 -3.694943e-02 -8.660434e-03 \n",
"v1 -3.473231e-12 -1.306165e-13 -1.116494e-13 2.114527e-12 3.016285e-14 \n",
"v2 -1.831952e-12 9.438444e-13 5.403436e-12 2.133785e-14 3.238513e-13 \n",
"v3 -4.032993e-12 -1.574471e-13 3.405586e-12 -1.272385e-12 -6.812351e-13 \n",
"v4 -2.552389e-13 1.084041e-12 8.135064e-13 7.334818e-13 -7.143069e-13 \n",
"v5 1.000000e+00 -6.934789e-14 1.573956e-11 -2.038243e-12 -1.000756e-12 \n",
"v6 -6.934789e-14 1.000000e+00 -2.798968e-12 -5.446480e-13 2.036743e-12 \n",
"v7 1.573956e-11 -2.798968e-12 1.000000e+00 5.528803e-12 5.088082e-13 \n",
"v8 -2.038243e-12 -5.446480e-13 5.528803e-12 1.000000e+00 -2.243172e-12 \n",
"v9 -1.000756e-12 2.036743e-12 5.088082e-13 -2.243172e-12 1.000000e+00 \n",
"v10 -7.200329e-13 7.429770e-13 1.674650e-12 -1.660630e-12 1.185391e-12 \n",
"v11 -5.928181e-13 1.014893e-12 -8.525291e-13 1.296877e-12 -3.970652e-13 \n",
"v12 1.812994e-12 -9.265590e-13 -2.826770e-13 -3.860109e-13 -1.904908e-12 \n",
"v13 -7.021996e-14 1.484679e-12 -8.171731e-13 7.722897e-13 8.754859e-13 \n",
"v14 -1.113015e-13 -1.212766e-12 2.038217e-12 -2.596182e-12 -1.271311e-12 \n",
"v15 -1.593594e-12 -1.053548e-12 1.074440e-12 1.648898e-12 8.628709e-13 \n",
"v16 -1.619090e-14 1.374197e-12 -1.478776e-12 -1.830899e-12 1.239835e-12 \n",
"v17 1.713794e-13 7.431528e-13 -1.231314e-12 7.025405e-13 -1.450585e-12 \n",
"v18 1.101433e-12 6.859871e-13 -4.281952e-13 1.499555e-12 7.186934e-13 \n",
"v19 5.956033e-13 1.148589e-12 -3.742188e-12 1.988417e-12 -8.786777e-13 \n",
"v20 -1.728728e-13 -2.382062e-12 8.068665e-12 -1.884661e-13 1.270200e-12 \n",
"v21 -1.751796e-12 1.476858e-12 2.788246e-12 -4.022440e-12 3.040326e-12 \n",
"v22 7.095269e-13 -1.144797e-12 -8.133209e-13 -2.679560e-12 -7.467526e-13 \n",
"v23 3.616075e-12 -1.527842e-12 -4.293094e-12 9.013064e-13 -1.011003e-12 \n",
"v24 -2.808776e-13 1.551854e-12 -2.553518e-12 -1.074365e-12 8.579072e-13 \n",
"v25 1.451126e-12 -2.723707e-12 -7.406970e-13 -3.268979e-12 -1.590341e-12 \n",
"v26 -1.896141e-13 3.351239e-12 -4.476467e-12 1.043839e-12 -7.723547e-13 \n",
"v27 -2.124559e-12 1.481307e-12 -1.328637e-11 -3.499804e-12 2.428930e-12 \n",
"v28 1.010196e-11 -6.069227e-13 2.958679e-13 1.866598e-12 -1.406856e-12 \n",
"amount -3.863563e-01 2.159812e-01 3.973113e-01 -1.030791e-01 -4.424560e-02 \n",
"class -9.497430e-02 -4.364316e-02 -1.872566e-01 1.987512e-02 -9.773269e-02 \n",
"\n",
" ... v21 v22 v23 v24 \\\n",
"time ... 4.473573e-02 1.440591e-01 5.114236e-02 -1.618187e-02 \n",
"v1 ... -3.276238e-12 2.281863e-12 -2.969746e-12 -1.029876e-12 \n",
"v2 ... 2.280202e-12 -2.548560e-13 -4.856120e-12 6.431308e-13 \n",
"v3 ... 6.736294e-13 -8.909339e-13 4.147209e-12 3.407636e-12 \n",
"v4 ... -2.696370e-12 4.347776e-13 -4.160969e-12 -2.368743e-12 \n",
"v5 ... -1.751796e-12 7.095269e-13 3.616075e-12 -2.808776e-13 \n",
"v6 ... 1.476858e-12 -1.144797e-12 -1.527842e-12 1.551854e-12 \n",
"v7 ... 2.788246e-12 -8.133209e-13 -4.293094e-12 -2.553518e-12 \n",
"v8 ... -4.022440e-12 -2.679560e-12 9.013064e-13 -1.074365e-12 \n",
"v9 ... 3.040326e-12 -7.467526e-13 -1.011003e-12 8.579072e-13 \n",
"v10 ... -5.547700e-13 -1.320186e-13 1.173332e-12 6.405710e-13 \n",
"v11 ... 1.100352e-13 -5.644168e-14 1.724963e-12 -1.162239e-12 \n",
"v12 ... 8.106835e-13 -2.346533e-12 -6.878556e-13 -2.911084e-12 \n",
"v13 ... -2.037258e-12 -5.491535e-13 3.508022e-12 1.225112e-13 \n",
"v14 ... -4.557223e-13 2.572021e-12 8.288666e-13 -3.382145e-12 \n",
"v15 ... 5.921902e-13 -4.115704e-13 -9.846654e-13 -3.256310e-12 \n",
"v16 ... -1.067918e-12 2.009490e-12 4.057311e-13 -4.061029e-13 \n",
"v17 ... 1.793607e-12 2.280366e-13 -9.948639e-13 -2.073066e-12 \n",
"v18 ... -2.185508e-12 1.392636e-12 -2.160673e-12 4.303958e-12 \n",
"v19 ... -3.315774e-13 7.050020e-14 -7.118335e-13 1.326310e-12 \n",
"v20 ... -3.892661e-12 1.632957e-12 -1.019668e-11 1.267519e-12 \n",
"v21 ... 1.000000e+00 -3.415801e-12 1.066923e-12 2.350293e-12 \n",
"v22 ... -3.415801e-12 1.000000e+00 -9.443573e-13 -1.123546e-12 \n",
"v23 ... 1.066923e-12 -9.443573e-13 1.000000e+00 2.354049e-12 \n",
"v24 ... 2.350293e-12 -1.123546e-12 2.354049e-12 1.000000e+00 \n",
"v25 ... -3.120502e-12 1.968449e-12 -3.751334e-12 -3.917943e-12 \n",
"v26 ... 8.463789e-13 -1.013828e-12 -1.002379e-12 1.604779e-12 \n",
"v27 ... -8.527973e-13 -1.726653e-13 9.199153e-12 1.554565e-12 \n",
"v28 ... 4.256994e-12 5.948423e-12 3.819775e-12 1.380805e-11 \n",
"amount ... 1.059989e-01 -6.480065e-02 -1.126326e-01 5.146217e-03 \n",
"class ... 4.041338e-02 8.053175e-04 -2.685156e-03 -7.220907e-03 \n",
"\n",
" v25 v26 v27 v28 amount \\\n",
"time -2.330828e-01 -4.140710e-02 -5.134591e-03 -9.412688e-03 -0.010596 \n",
"v1 1.144179e-12 1.835263e-12 7.624804e-12 -9.769215e-13 -0.227709 \n",
"v2 -9.423730e-13 -4.129100e-13 -9.856545e-13 2.525513e-12 -0.531409 \n",
"v3 5.712956e-13 -2.577274e-12 -5.041444e-12 5.189109e-12 -0.210880 \n",
"v4 1.619944e-12 -3.043100e-13 -1.456066e-12 -2.832372e-12 0.098732 \n",
"v5 1.451126e-12 -1.896141e-13 -2.124559e-12 1.010196e-11 -0.386356 \n",
"v6 -2.723707e-12 3.351239e-12 1.481307e-12 -6.069227e-13 0.215981 \n",
"v7 -7.406970e-13 -4.476467e-12 -1.328637e-11 2.958679e-13 0.397311 \n",
"v8 -3.268979e-12 1.043839e-12 -3.499804e-12 1.866598e-12 -0.103079 \n",
"v9 -1.590341e-12 -7.723547e-13 2.428930e-12 -1.406856e-12 -0.044246 \n",
"v10 2.794979e-12 -2.738577e-13 1.552492e-12 5.116568e-12 -0.101502 \n",
"v11 -1.351430e-12 2.718291e-12 -3.950227e-12 -4.247931e-12 0.000104 \n",
"v12 1.102899e-12 2.808714e-13 5.953998e-13 -7.428113e-12 -0.009542 \n",
"v13 -1.513549e-12 -2.008364e-12 4.975659e-12 -6.777880e-12 0.005293 \n",
"v14 8.299871e-13 -3.304576e-13 -2.447674e-12 -1.700091e-12 0.033751 \n",
"v15 -1.725436e-12 5.478951e-13 -4.690702e-12 -4.214967e-12 -0.002986 \n",
"v16 7.626529e-13 -1.323365e-12 7.022747e-12 5.737097e-13 -0.003910 \n",
"v17 4.514159e-12 2.940618e-12 -1.324408e-12 1.854033e-12 0.007309 \n",
"v18 5.432404e-13 -1.810692e-12 -4.949670e-12 4.113104e-12 0.035650 \n",
"v19 9.270702e-13 2.412082e-12 -2.201365e-12 3.450583e-12 -0.056151 \n",
"v20 -1.593346e-12 1.469484e-13 -2.996546e-12 6.123479e-12 0.339403 \n",
"v21 -3.120502e-12 8.463789e-13 -8.527973e-13 4.256994e-12 0.105999 \n",
"v22 1.968449e-12 -1.013828e-12 -1.726653e-13 5.948423e-12 -0.064801 \n",
"v23 -3.751334e-12 -1.002379e-12 9.199153e-12 3.819775e-12 -0.112633 \n",
"v24 -3.917943e-12 1.604779e-12 1.554565e-12 1.380805e-11 0.005146 \n",
"v25 1.000000e+00 2.111834e-12 -6.220008e-13 -8.597190e-12 -0.047837 \n",
"v26 2.111834e-12 1.000000e+00 2.374854e-12 -1.036858e-11 -0.003208 \n",
"v27 -6.220008e-13 2.374854e-12 1.000000e+00 -4.441112e-12 0.028825 \n",
"v28 -8.597190e-12 -1.036858e-11 -4.441112e-12 1.000000e+00 0.010258 \n",
"amount -4.783686e-02 -3.208037e-03 2.882546e-02 1.025822e-02 1.000000 \n",
"class 3.307706e-03 4.455398e-03 1.757973e-02 9.536041e-03 0.005632 \n",
"\n",
" class \n",
"time -0.012323 \n",
"v1 -0.101347 \n",
"v2 0.091289 \n",
"v3 -0.192961 \n",
"v4 0.133447 \n",
"v5 -0.094974 \n",
"v6 -0.043643 \n",
"v7 -0.187257 \n",
"v8 0.019875 \n",
"v9 -0.097733 \n",
"v10 -0.216883 \n",
"v11 0.154876 \n",
"v12 -0.260593 \n",
"v13 -0.004570 \n",
"v14 -0.302544 \n",
"v15 -0.004223 \n",
"v16 -0.196539 \n",
"v17 -0.326481 \n",
"v18 -0.111485 \n",
"v19 0.034783 \n",
"v20 0.020090 \n",
"v21 0.040413 \n",
"v22 0.000805 \n",
"v23 -0.002685 \n",
"v24 -0.007221 \n",
"v25 0.003308 \n",
"v26 0.004455 \n",
"v27 0.017580 \n",
"v28 0.009536 \n",
"amount 0.005632 \n",
"class 1.000000 \n",
"\n",
"[31 rows x 31 columns]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# correlations\n",
"corr_matrix = df_train.corr()\n",
"corr_matrix"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "postal-microphone",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"v17 -0.326481\n",
"v14 -0.302544\n",
"v12 -0.260593\n",
"v10 -0.216883\n",
"v16 -0.196539\n",
"v3 -0.192961\n",
"v7 -0.187257\n",
"v18 -0.111485\n",
"v1 -0.101347\n",
"v9 -0.097733\n",
"v5 -0.094974\n",
"v6 -0.043643\n",
"time -0.012323\n",
"v24 -0.007221\n",
"v13 -0.004570\n",
"v15 -0.004223\n",
"v23 -0.002685\n",
"v22 0.000805\n",
"v25 0.003308\n",
"v26 0.004455\n",
"amount 0.005632\n",
"v28 0.009536\n",
"v27 0.017580\n",
"v8 0.019875\n",
"v20 0.020090\n",
"v19 0.034783\n",
"v21 0.040413\n",
"v2 0.091289\n",
"v4 0.133447\n",
"v11 0.154876\n",
"class 1.000000\n",
"Name: class, dtype: float64"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# order the correlations with class\n",
"correlations = corr_matrix[\"class\"].sort_values()\n",
"correlations"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "revolutionary-medicare",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# check correlation matrix, darker means more correlation \n",
"k = 10 #number of variables for heatmap\n",
"cols = corr_matrix.nlargest(k, 'class')['class'].index\n",
"cm = np.corrcoef(df_train[cols].values.T)\n",
"sns.set(font_scale=1.25)\n",
"hm = sns.heatmap(cm, cbar=True, annot=True, square=True, fmt='.2f', cmap=\"YlGnBu\", annot_kws={'size': 10}, yticklabels=cols.values, xticklabels=cols.values)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "accessory-block",
"metadata": {},
"source": [
"### Check outliers"
]
},
{
"cell_type": "markdown",
"id": "changing-flesh",
"metadata": {},
"source": [
"#### Numerical Variables Analysis"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "fresh-boundary",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>time</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>2.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v1</th>\n",
" <td>-1.359807</td>\n",
" <td>1.191857</td>\n",
" <td>-1.358354</td>\n",
" <td>-0.966272</td>\n",
" <td>-1.158233</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v2</th>\n",
" <td>-0.072781</td>\n",
" <td>0.266151</td>\n",
" <td>-1.340163</td>\n",
" <td>-0.185226</td>\n",
" <td>0.877737</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v3</th>\n",
" <td>2.536347</td>\n",
" <td>0.166480</td>\n",
" <td>1.773209</td>\n",
" <td>1.792993</td>\n",
" <td>1.548718</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v4</th>\n",
" <td>1.378155</td>\n",
" <td>0.448154</td>\n",
" <td>0.379780</td>\n",
" <td>-0.863291</td>\n",
" <td>0.403034</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v5</th>\n",
" <td>-0.338321</td>\n",
" <td>0.060018</td>\n",
" <td>-0.503198</td>\n",
" <td>-0.010309</td>\n",
" <td>-0.407193</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v6</th>\n",
" <td>0.462388</td>\n",
" <td>-0.082361</td>\n",
" <td>1.800499</td>\n",
" <td>1.247203</td>\n",
" <td>0.095921</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v7</th>\n",
" <td>0.239599</td>\n",
" <td>-0.078803</td>\n",
" <td>0.791461</td>\n",
" <td>0.237609</td>\n",
" <td>0.592941</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v8</th>\n",
" <td>0.098698</td>\n",
" <td>0.085102</td>\n",
" <td>0.247676</td>\n",
" <td>0.377436</td>\n",
" <td>-0.270533</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v9</th>\n",
" <td>0.363787</td>\n",
" <td>-0.255425</td>\n",
" <td>-1.514654</td>\n",
" <td>-1.387024</td>\n",
" <td>0.817739</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v10</th>\n",
" <td>0.090794</td>\n",
" <td>-0.166974</td>\n",
" <td>0.207643</td>\n",
" <td>-0.054952</td>\n",
" <td>0.753074</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v11</th>\n",
" <td>-0.551600</td>\n",
" <td>1.612727</td>\n",
" <td>0.624501</td>\n",
" <td>-0.226487</td>\n",
" <td>-0.822843</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v12</th>\n",
" <td>-0.617801</td>\n",
" <td>1.065235</td>\n",
" <td>0.066084</td>\n",
" <td>0.178228</td>\n",
" <td>0.538196</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v13</th>\n",
" <td>-0.991390</td>\n",
" <td>0.489095</td>\n",
" <td>0.717293</td>\n",
" <td>0.507757</td>\n",
" <td>1.345852</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v14</th>\n",
" <td>-0.311169</td>\n",
" <td>-0.143772</td>\n",
" <td>-0.165946</td>\n",
" <td>-0.287924</td>\n",
" <td>-1.119670</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v15</th>\n",
" <td>1.468177</td>\n",
" <td>0.635558</td>\n",
" <td>2.345865</td>\n",
" <td>-0.631418</td>\n",
" <td>0.175121</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v16</th>\n",
" <td>-0.470401</td>\n",
" <td>0.463917</td>\n",
" <td>-2.890083</td>\n",
" <td>-1.059647</td>\n",
" <td>-0.451449</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v17</th>\n",
" <td>0.207971</td>\n",
" <td>-0.114805</td>\n",
" <td>1.109969</td>\n",
" <td>-0.684093</td>\n",
" <td>-0.237033</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v18</th>\n",
" <td>0.025791</td>\n",
" <td>-0.183361</td>\n",
" <td>-0.121359</td>\n",
" <td>1.965775</td>\n",
" <td>-0.038195</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v19</th>\n",
" <td>0.403993</td>\n",
" <td>-0.145783</td>\n",
" <td>-2.261857</td>\n",
" <td>-1.232622</td>\n",
" <td>0.803487</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v20</th>\n",
" <td>0.251412</td>\n",
" <td>-0.069083</td>\n",
" <td>0.524980</td>\n",
" <td>-0.208038</td>\n",
" <td>0.408542</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v21</th>\n",
" <td>-0.018307</td>\n",
" <td>-0.225775</td>\n",
" <td>0.247998</td>\n",
" <td>-0.108300</td>\n",
" <td>-0.009431</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v22</th>\n",
" <td>0.277838</td>\n",
" <td>-0.638672</td>\n",
" <td>0.771679</td>\n",
" <td>0.005274</td>\n",
" <td>0.798278</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v23</th>\n",
" <td>-0.110474</td>\n",
" <td>0.101288</td>\n",
" <td>0.909412</td>\n",
" <td>-0.190321</td>\n",
" <td>-0.137458</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v24</th>\n",
" <td>0.066928</td>\n",
" <td>-0.339846</td>\n",
" <td>-0.689281</td>\n",
" <td>-1.175575</td>\n",
" <td>0.141267</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v25</th>\n",
" <td>0.128539</td>\n",
" <td>0.167170</td>\n",
" <td>-0.327642</td>\n",
" <td>0.647376</td>\n",
" <td>-0.206010</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v26</th>\n",
" <td>-0.189115</td>\n",
" <td>0.125895</td>\n",
" <td>-0.139097</td>\n",
" <td>-0.221929</td>\n",
" <td>0.502292</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v27</th>\n",
" <td>0.133558</td>\n",
" <td>-0.008983</td>\n",
" <td>-0.055353</td>\n",
" <td>0.062723</td>\n",
" <td>0.219422</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v28</th>\n",
" <td>-0.021053</td>\n",
" <td>0.014724</td>\n",
" <td>-0.059752</td>\n",
" <td>0.061458</td>\n",
" <td>0.215153</td>\n",
" </tr>\n",
" <tr>\n",
" <th>amount</th>\n",
" <td>149.620000</td>\n",
" <td>2.690000</td>\n",
" <td>378.660000</td>\n",
" <td>123.500000</td>\n",
" <td>69.990000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>class</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2 3 4\n",
"time 0.000000 0.000000 1.000000 1.000000 2.000000\n",
"v1 -1.359807 1.191857 -1.358354 -0.966272 -1.158233\n",
"v2 -0.072781 0.266151 -1.340163 -0.185226 0.877737\n",
"v3 2.536347 0.166480 1.773209 1.792993 1.548718\n",
"v4 1.378155 0.448154 0.379780 -0.863291 0.403034\n",
"v5 -0.338321 0.060018 -0.503198 -0.010309 -0.407193\n",
"v6 0.462388 -0.082361 1.800499 1.247203 0.095921\n",
"v7 0.239599 -0.078803 0.791461 0.237609 0.592941\n",
"v8 0.098698 0.085102 0.247676 0.377436 -0.270533\n",
"v9 0.363787 -0.255425 -1.514654 -1.387024 0.817739\n",
"v10 0.090794 -0.166974 0.207643 -0.054952 0.753074\n",
"v11 -0.551600 1.612727 0.624501 -0.226487 -0.822843\n",
"v12 -0.617801 1.065235 0.066084 0.178228 0.538196\n",
"v13 -0.991390 0.489095 0.717293 0.507757 1.345852\n",
"v14 -0.311169 -0.143772 -0.165946 -0.287924 -1.119670\n",
"v15 1.468177 0.635558 2.345865 -0.631418 0.175121\n",
"v16 -0.470401 0.463917 -2.890083 -1.059647 -0.451449\n",
"v17 0.207971 -0.114805 1.109969 -0.684093 -0.237033\n",
"v18 0.025791 -0.183361 -0.121359 1.965775 -0.038195\n",
"v19 0.403993 -0.145783 -2.261857 -1.232622 0.803487\n",
"v20 0.251412 -0.069083 0.524980 -0.208038 0.408542\n",
"v21 -0.018307 -0.225775 0.247998 -0.108300 -0.009431\n",
"v22 0.277838 -0.638672 0.771679 0.005274 0.798278\n",
"v23 -0.110474 0.101288 0.909412 -0.190321 -0.137458\n",
"v24 0.066928 -0.339846 -0.689281 -1.175575 0.141267\n",
"v25 0.128539 0.167170 -0.327642 0.647376 -0.206010\n",
"v26 -0.189115 0.125895 -0.139097 -0.221929 0.502292\n",
"v27 0.133558 -0.008983 -0.055353 0.062723 0.219422\n",
"v28 -0.021053 0.014724 -0.059752 0.061458 0.215153\n",
"amount 149.620000 2.690000 378.660000 123.500000 69.990000\n",
"class 0.000000 0.000000 0.000000 0.000000 0.000000"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# isolating numerical columns in a dataframe\n",
"numerics = ['int64', 'float64']\n",
"num_df = df_train.select_dtypes(include=numerics)\n",
"num_df.head().T"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "stock-geneva",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>time</th>\n",
" <td>284807.0</td>\n",
" <td>9.481386e+04</td>\n",
" <td>47488.145955</td>\n",
" <td>0.000000</td>\n",
" <td>54201.500000</td>\n",
" <td>84692.000000</td>\n",
" <td>139320.500000</td>\n",
" <td>172792.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v1</th>\n",
" <td>284807.0</td>\n",
" <td>1.759061e-12</td>\n",
" <td>1.958696</td>\n",
" <td>-56.407510</td>\n",
" <td>-0.920373</td>\n",
" <td>0.018109</td>\n",
" <td>1.315642</td>\n",
" <td>2.454930</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v2</th>\n",
" <td>284807.0</td>\n",
" <td>-8.251130e-13</td>\n",
" <td>1.651309</td>\n",
" <td>-72.715728</td>\n",
" <td>-0.598550</td>\n",
" <td>0.065486</td>\n",
" <td>0.803724</td>\n",
" <td>22.057729</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v3</th>\n",
" <td>284807.0</td>\n",
" <td>-9.654937e-13</td>\n",
" <td>1.516255</td>\n",
" <td>-48.325589</td>\n",
" <td>-0.890365</td>\n",
" <td>0.179846</td>\n",
" <td>1.027196</td>\n",
" <td>9.382558</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v4</th>\n",
" <td>284807.0</td>\n",
" <td>8.321385e-13</td>\n",
" <td>1.415869</td>\n",
" <td>-5.683171</td>\n",
" <td>-0.848640</td>\n",
" <td>-0.019847</td>\n",
" <td>0.743341</td>\n",
" <td>16.875344</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v5</th>\n",
" <td>284807.0</td>\n",
" <td>1.649999e-13</td>\n",
" <td>1.380247</td>\n",
" <td>-113.743307</td>\n",
" <td>-0.691597</td>\n",
" <td>-0.054336</td>\n",
" <td>0.611926</td>\n",
" <td>34.801666</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v6</th>\n",
" <td>284807.0</td>\n",
" <td>4.248366e-13</td>\n",
" <td>1.332271</td>\n",
" <td>-26.160506</td>\n",
" <td>-0.768296</td>\n",
" <td>-0.274187</td>\n",
" <td>0.398565</td>\n",
" <td>73.301626</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v7</th>\n",
" <td>284807.0</td>\n",
" <td>-3.054600e-13</td>\n",
" <td>1.237094</td>\n",
" <td>-43.557242</td>\n",
" <td>-0.554076</td>\n",
" <td>0.040103</td>\n",
" <td>0.570436</td>\n",
" <td>120.589494</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v8</th>\n",
" <td>284807.0</td>\n",
" <td>8.777971e-14</td>\n",
" <td>1.194353</td>\n",
" <td>-73.216718</td>\n",
" <td>-0.208630</td>\n",
" <td>0.022358</td>\n",
" <td>0.327346</td>\n",
" <td>20.007208</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v9</th>\n",
" <td>284807.0</td>\n",
" <td>-1.179749e-12</td>\n",
" <td>1.098632</td>\n",
" <td>-13.434066</td>\n",
" <td>-0.643098</td>\n",
" <td>-0.051429</td>\n",
" <td>0.597139</td>\n",
" <td>15.594995</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v10</th>\n",
" <td>284807.0</td>\n",
" <td>7.092545e-13</td>\n",
" <td>1.088850</td>\n",
" <td>-24.588262</td>\n",
" <td>-0.535426</td>\n",
" <td>-0.092917</td>\n",
" <td>0.453923</td>\n",
" <td>23.745136</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v11</th>\n",
" <td>284807.0</td>\n",
" <td>1.874948e-12</td>\n",
" <td>1.020713</td>\n",
" <td>-4.797473</td>\n",
" <td>-0.762494</td>\n",
" <td>-0.032757</td>\n",
" <td>0.739593</td>\n",
" <td>12.018913</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v12</th>\n",
" <td>284807.0</td>\n",
" <td>1.053347e-12</td>\n",
" <td>0.999201</td>\n",
" <td>-18.683715</td>\n",
" <td>-0.405571</td>\n",
" <td>0.140033</td>\n",
" <td>0.618238</td>\n",
" <td>7.848392</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v13</th>\n",
" <td>284807.0</td>\n",
" <td>7.127611e-13</td>\n",
" <td>0.995274</td>\n",
" <td>-5.791881</td>\n",
" <td>-0.648539</td>\n",
" <td>-0.013568</td>\n",
" <td>0.662505</td>\n",
" <td>7.126883</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v14</th>\n",
" <td>284807.0</td>\n",
" <td>-1.474791e-13</td>\n",
" <td>0.958596</td>\n",
" <td>-19.214325</td>\n",
" <td>-0.425574</td>\n",
" <td>0.050601</td>\n",
" <td>0.493150</td>\n",
" <td>10.526766</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v15</th>\n",
" <td>284807.0</td>\n",
" <td>-5.231558e-13</td>\n",
" <td>0.915316</td>\n",
" <td>-4.498945</td>\n",
" <td>-0.582884</td>\n",
" <td>0.048072</td>\n",
" <td>0.648821</td>\n",
" <td>8.877742</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v16</th>\n",
" <td>284807.0</td>\n",
" <td>-2.282250e-13</td>\n",
" <td>0.876253</td>\n",
" <td>-14.129855</td>\n",
" <td>-0.468037</td>\n",
" <td>0.066413</td>\n",
" <td>0.523296</td>\n",
" <td>17.315112</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v17</th>\n",
" <td>284807.0</td>\n",
" <td>-6.425436e-13</td>\n",
" <td>0.849337</td>\n",
" <td>-25.162799</td>\n",
" <td>-0.483748</td>\n",
" <td>-0.065676</td>\n",
" <td>0.399675</td>\n",
" <td>9.253526</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v18</th>\n",
" <td>284807.0</td>\n",
" <td>4.950748e-13</td>\n",
" <td>0.838176</td>\n",
" <td>-9.498746</td>\n",
" <td>-0.498850</td>\n",
" <td>-0.003636</td>\n",
" <td>0.500807</td>\n",
" <td>5.041069</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v19</th>\n",
" <td>284807.0</td>\n",
" <td>7.057397e-13</td>\n",
" <td>0.814041</td>\n",
" <td>-7.213527</td>\n",
" <td>-0.456299</td>\n",
" <td>0.003735</td>\n",
" <td>0.458949</td>\n",
" <td>5.591971</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v20</th>\n",
" <td>284807.0</td>\n",
" <td>1.766111e-12</td>\n",
" <td>0.770925</td>\n",
" <td>-54.497720</td>\n",
" <td>-0.211721</td>\n",
" <td>-0.062481</td>\n",
" <td>0.133041</td>\n",
" <td>39.420904</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v21</th>\n",
" <td>284807.0</td>\n",
" <td>-3.405756e-13</td>\n",
" <td>0.734524</td>\n",
" <td>-34.830382</td>\n",
" <td>-0.228395</td>\n",
" <td>-0.029450</td>\n",
" <td>0.186377</td>\n",
" <td>27.202839</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v22</th>\n",
" <td>284807.0</td>\n",
" <td>-5.723197e-13</td>\n",
" <td>0.725702</td>\n",
" <td>-10.933144</td>\n",
" <td>-0.542350</td>\n",
" <td>0.006782</td>\n",
" <td>0.528554</td>\n",
" <td>10.503090</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v23</th>\n",
" <td>284807.0</td>\n",
" <td>-9.725856e-13</td>\n",
" <td>0.624460</td>\n",
" <td>-44.807735</td>\n",
" <td>-0.161846</td>\n",
" <td>-0.011193</td>\n",
" <td>0.147642</td>\n",
" <td>22.528412</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v24</th>\n",
" <td>284807.0</td>\n",
" <td>1.464150e-12</td>\n",
" <td>0.605647</td>\n",
" <td>-2.836627</td>\n",
" <td>-0.354586</td>\n",
" <td>0.040976</td>\n",
" <td>0.439527</td>\n",
" <td>4.584549</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v25</th>\n",
" <td>284807.0</td>\n",
" <td>-6.987102e-13</td>\n",
" <td>0.521278</td>\n",
" <td>-10.295397</td>\n",
" <td>-0.317145</td>\n",
" <td>0.016594</td>\n",
" <td>0.350716</td>\n",
" <td>7.519589</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v26</th>\n",
" <td>284807.0</td>\n",
" <td>-5.617874e-13</td>\n",
" <td>0.482227</td>\n",
" <td>-2.604551</td>\n",
" <td>-0.326984</td>\n",
" <td>-0.052139</td>\n",
" <td>0.240952</td>\n",
" <td>3.517346</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v27</th>\n",
" <td>284807.0</td>\n",
" <td>3.332082e-12</td>\n",
" <td>0.403632</td>\n",
" <td>-22.565679</td>\n",
" <td>-0.070840</td>\n",
" <td>0.001342</td>\n",
" <td>0.091045</td>\n",
" <td>31.612198</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v28</th>\n",
" <td>284807.0</td>\n",
" <td>-3.518874e-12</td>\n",
" <td>0.330083</td>\n",
" <td>-15.430084</td>\n",
" <td>-0.052960</td>\n",
" <td>0.011244</td>\n",
" <td>0.078280</td>\n",
" <td>33.847808</td>\n",
" </tr>\n",
" <tr>\n",
" <th>amount</th>\n",
" <td>284807.0</td>\n",
" <td>8.834962e+01</td>\n",
" <td>250.120109</td>\n",
" <td>0.000000</td>\n",
" <td>5.600000</td>\n",
" <td>22.000000</td>\n",
" <td>77.165000</td>\n",
" <td>25691.160000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>class</th>\n",
" <td>284807.0</td>\n",
" <td>1.727486e-03</td>\n",
" <td>0.041527</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" count mean std min 25% \\\n",
"time 284807.0 9.481386e+04 47488.145955 0.000000 54201.500000 \n",
"v1 284807.0 1.759061e-12 1.958696 -56.407510 -0.920373 \n",
"v2 284807.0 -8.251130e-13 1.651309 -72.715728 -0.598550 \n",
"v3 284807.0 -9.654937e-13 1.516255 -48.325589 -0.890365 \n",
"v4 284807.0 8.321385e-13 1.415869 -5.683171 -0.848640 \n",
"v5 284807.0 1.649999e-13 1.380247 -113.743307 -0.691597 \n",
"v6 284807.0 4.248366e-13 1.332271 -26.160506 -0.768296 \n",
"v7 284807.0 -3.054600e-13 1.237094 -43.557242 -0.554076 \n",
"v8 284807.0 8.777971e-14 1.194353 -73.216718 -0.208630 \n",
"v9 284807.0 -1.179749e-12 1.098632 -13.434066 -0.643098 \n",
"v10 284807.0 7.092545e-13 1.088850 -24.588262 -0.535426 \n",
"v11 284807.0 1.874948e-12 1.020713 -4.797473 -0.762494 \n",
"v12 284807.0 1.053347e-12 0.999201 -18.683715 -0.405571 \n",
"v13 284807.0 7.127611e-13 0.995274 -5.791881 -0.648539 \n",
"v14 284807.0 -1.474791e-13 0.958596 -19.214325 -0.425574 \n",
"v15 284807.0 -5.231558e-13 0.915316 -4.498945 -0.582884 \n",
"v16 284807.0 -2.282250e-13 0.876253 -14.129855 -0.468037 \n",
"v17 284807.0 -6.425436e-13 0.849337 -25.162799 -0.483748 \n",
"v18 284807.0 4.950748e-13 0.838176 -9.498746 -0.498850 \n",
"v19 284807.0 7.057397e-13 0.814041 -7.213527 -0.456299 \n",
"v20 284807.0 1.766111e-12 0.770925 -54.497720 -0.211721 \n",
"v21 284807.0 -3.405756e-13 0.734524 -34.830382 -0.228395 \n",
"v22 284807.0 -5.723197e-13 0.725702 -10.933144 -0.542350 \n",
"v23 284807.0 -9.725856e-13 0.624460 -44.807735 -0.161846 \n",
"v24 284807.0 1.464150e-12 0.605647 -2.836627 -0.354586 \n",
"v25 284807.0 -6.987102e-13 0.521278 -10.295397 -0.317145 \n",
"v26 284807.0 -5.617874e-13 0.482227 -2.604551 -0.326984 \n",
"v27 284807.0 3.332082e-12 0.403632 -22.565679 -0.070840 \n",
"v28 284807.0 -3.518874e-12 0.330083 -15.430084 -0.052960 \n",
"amount 284807.0 8.834962e+01 250.120109 0.000000 5.600000 \n",
"class 284807.0 1.727486e-03 0.041527 0.000000 0.000000 \n",
"\n",
" 50% 75% max \n",
"time 84692.000000 139320.500000 172792.000000 \n",
"v1 0.018109 1.315642 2.454930 \n",
"v2 0.065486 0.803724 22.057729 \n",
"v3 0.179846 1.027196 9.382558 \n",
"v4 -0.019847 0.743341 16.875344 \n",
"v5 -0.054336 0.611926 34.801666 \n",
"v6 -0.274187 0.398565 73.301626 \n",
"v7 0.040103 0.570436 120.589494 \n",
"v8 0.022358 0.327346 20.007208 \n",
"v9 -0.051429 0.597139 15.594995 \n",
"v10 -0.092917 0.453923 23.745136 \n",
"v11 -0.032757 0.739593 12.018913 \n",
"v12 0.140033 0.618238 7.848392 \n",
"v13 -0.013568 0.662505 7.126883 \n",
"v14 0.050601 0.493150 10.526766 \n",
"v15 0.048072 0.648821 8.877742 \n",
"v16 0.066413 0.523296 17.315112 \n",
"v17 -0.065676 0.399675 9.253526 \n",
"v18 -0.003636 0.500807 5.041069 \n",
"v19 0.003735 0.458949 5.591971 \n",
"v20 -0.062481 0.133041 39.420904 \n",
"v21 -0.029450 0.186377 27.202839 \n",
"v22 0.006782 0.528554 10.503090 \n",
"v23 -0.011193 0.147642 22.528412 \n",
"v24 0.040976 0.439527 4.584549 \n",
"v25 0.016594 0.350716 7.519589 \n",
"v26 -0.052139 0.240952 3.517346 \n",
"v27 0.001342 0.091045 31.612198 \n",
"v28 0.011244 0.078280 33.847808 \n",
"amount 22.000000 77.165000 25691.160000 \n",
"class 0.000000 0.000000 1.000000 "
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# summary statistics of all the columns\n",
"num_df.describe().T"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "institutional-filing",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x720 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(15,10))\n",
"\n",
"plt.subplot(2, 2, 1)\n",
"ax = sns.boxplot(y=df_train[\"amount\"])\n",
"ax.set_xlabel(\"amount\")\n",
"sns.set(style=\"darkgrid\")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "prime-circuit",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Outliers for the amount are < -209.095 or > 291.86\n"
]
}
],
"source": [
"# calculating outlier space for amount\n",
"\n",
"IQR = df_train[\"amount\"].quantile(0.75) - df_train[\"amount\"].quantile(0.25)\n",
"lf = df_train[\"amount\"].quantile(0.25) - (IQR * 3)\n",
"uf = df_train[\"amount\"].quantile(0.75) + (IQR * 3)\n",
"print('Outliers for the amount are < {lbound} or > {ubound}'.format(\n",
" lbound=lf, \n",
" ubound=uf)\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "anonymous-tourism",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x720 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(15,10))\n",
"\n",
"plt.subplot(2, 2, 1)\n",
"ax = sns.boxplot(y=df_train[\"v25\"])\n",
"ax.set_xlabel(\"v25\")\n",
"sns.set(style=\"darkgrid\")"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "closed-african",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Outliers for the v25 are < -2.3207269050000003 or > 2.3542974140000004\n"
]
}
],
"source": [
"# calculating outlier space for V25\n",
"\n",
"IQR = df_train[\"v25\"].quantile(0.75) - df_train[\"v25\"].quantile(0.25)\n",
"lf = df_train[\"v25\"].quantile(0.25) - (IQR * 3)\n",
"uf = df_train[\"v25\"].quantile(0.75) + (IQR * 3)\n",
"\n",
"print('Outliers for the v25 are < {lbound} or > {ubound}'.format(\n",
" lbound=lf, \n",
" ubound=uf)\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "three-detroit",
"metadata": {},
"source": [
"## Baseline model"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "cardiac-butterfly",
"metadata": {},
"outputs": [],
"source": [
"base = ['amount', 'v13', 'v14', 'v15', 'v16', 'v27']"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "excellent-bidder",
"metadata": {},
"outputs": [],
"source": [
"# isolate the target and filter the features we want to use\n",
"def baseline_model(X):\n",
" target = X[\"class\"]\n",
" X = X[base]\n",
" return X, target"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "seeing-cleanup",
"metadata": {},
"outputs": [],
"source": [
"X, y = baseline_model(df_train.copy())"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "armed-conclusion",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>amount</th>\n",
" <th>v13</th>\n",
" <th>v14</th>\n",
" <th>v15</th>\n",
" <th>v16</th>\n",
" <th>v27</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>149.62</td>\n",
" <td>-0.991390</td>\n",
" <td>-0.311169</td>\n",
" <td>1.468177</td>\n",
" <td>-0.470401</td>\n",
" <td>0.133558</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2.69</td>\n",
" <td>0.489095</td>\n",
" <td>-0.143772</td>\n",
" <td>0.635558</td>\n",
" <td>0.463917</td>\n",
" <td>-0.008983</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>378.66</td>\n",
" <td>0.717293</td>\n",
" <td>-0.165946</td>\n",
" <td>2.345865</td>\n",
" <td>-2.890083</td>\n",
" <td>-0.055353</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>123.50</td>\n",
" <td>0.507757</td>\n",
" <td>-0.287924</td>\n",
" <td>-0.631418</td>\n",
" <td>-1.059647</td>\n",
" <td>0.062723</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>69.99</td>\n",
" <td>1.345852</td>\n",
" <td>-1.119670</td>\n",
" <td>0.175121</td>\n",
" <td>-0.451449</td>\n",
" <td>0.219422</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" amount v13 v14 v15 v16 v27\n",
"0 149.62 -0.991390 -0.311169 1.468177 -0.470401 0.133558\n",
"1 2.69 0.489095 -0.143772 0.635558 0.463917 -0.008983\n",
"2 378.66 0.717293 -0.165946 2.345865 -2.890083 -0.055353\n",
"3 123.50 0.507757 -0.287924 -0.631418 -1.059647 0.062723\n",
"4 69.99 1.345852 -1.119670 0.175121 -0.451449 0.219422"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.head()"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "deluxe-parliament",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-1 {color: black;background-color: white;}#sk-container-id-1 pre{padding: 0;}#sk-container-id-1 div.sk-toggleable {background-color: white;}#sk-container-id-1 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-1 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-1 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-1 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-1 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-1 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-1 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-1 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-1 div.sk-label:hover 
label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-1 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-1 div.sk-item {position: relative;z-index: 1;}#sk-container-id-1 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-1 div.sk-item::before, #sk-container-id-1 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-1 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-1 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-1 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-1 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-1 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-1 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-1 div.sk-label-container {text-align: center;}#sk-container-id-1 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-1 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-1\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>LogisticRegression(max_iter=1000)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-1\" type=\"checkbox\" checked><label for=\"sk-estimator-id-1\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">LogisticRegression</label><div class=\"sk-toggleable__content\"><pre>LogisticRegression(max_iter=1000)</pre></div></div></div></div></div>"
],
"text/plain": [
"LogisticRegression(max_iter=1000)"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"lr_model = LogisticRegression(max_iter=1000)\n",
"lr_model.fit(X, y)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "included-distance",
"metadata": {},
"outputs": [],
"source": [
"# Make prediction\n",
"\n",
"lr_base_predicted_labels = lr_model.predict(X)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "verified-sleep",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 1.00 1.00 284315\n",
" 1 0.85 0.55 0.67 492\n",
"\n",
" accuracy 1.00 284807\n",
" macro avg 0.92 0.78 0.84 284807\n",
"weighted avg 1.00 1.00 1.00 284807\n",
"\n"
]
}
],
"source": [
"# Plotting the Confusion Matrix\n",
"\n",
"from sklearn.metrics import confusion_matrix, classification_report\n",
"\n",
"cm = confusion_matrix(y, lr_base_predicted_labels)\n",
"cmdf = pd.DataFrame(cm, index=[\"0\",\"1\"],columns=[\"0\",\"1\"])\n",
"\n",
"fig, ax = plt.subplots(1,1)\n",
"\n",
"sns.heatmap(cmdf,annot=True, fmt='d', ax = ax)\n",
"ax.set_xlabel('Predicted Label')\n",
"ax.set_ylabel('True Label')\n",
"plt.show()\n",
"\n",
"#Printing the Classification Report\n",
"\n",
"print(classification_report(y, lr_base_predicted_labels))"
]
},
{
"cell_type": "markdown",
"id": "acquired-tension",
"metadata": {},
"source": [
"## Feature engineering"
]
},
{
"cell_type": "markdown",
"id": "stuck-mainstream",
"metadata": {},
"source": [
"### Feature scaling"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "level-remedy",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>time</th>\n",
" <td>284807.0</td>\n",
" <td>9.481386e+04</td>\n",
" <td>47488.145955</td>\n",
" <td>0.000000</td>\n",
" <td>54201.500000</td>\n",
" <td>84692.000000</td>\n",
" <td>139320.500000</td>\n",
" <td>172792.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v1</th>\n",
" <td>284807.0</td>\n",
" <td>1.759061e-12</td>\n",
" <td>1.958696</td>\n",
" <td>-56.407510</td>\n",
" <td>-0.920373</td>\n",
" <td>0.018109</td>\n",
" <td>1.315642</td>\n",
" <td>2.454930</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v2</th>\n",
" <td>284807.0</td>\n",
" <td>-8.251130e-13</td>\n",
" <td>1.651309</td>\n",
" <td>-72.715728</td>\n",
" <td>-0.598550</td>\n",
" <td>0.065486</td>\n",
" <td>0.803724</td>\n",
" <td>22.057729</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v3</th>\n",
" <td>284807.0</td>\n",
" <td>-9.654937e-13</td>\n",
" <td>1.516255</td>\n",
" <td>-48.325589</td>\n",
" <td>-0.890365</td>\n",
" <td>0.179846</td>\n",
" <td>1.027196</td>\n",
" <td>9.382558</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v4</th>\n",
" <td>284807.0</td>\n",
" <td>8.321385e-13</td>\n",
" <td>1.415869</td>\n",
" <td>-5.683171</td>\n",
" <td>-0.848640</td>\n",
" <td>-0.019847</td>\n",
" <td>0.743341</td>\n",
" <td>16.875344</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v5</th>\n",
" <td>284807.0</td>\n",
" <td>1.649999e-13</td>\n",
" <td>1.380247</td>\n",
" <td>-113.743307</td>\n",
" <td>-0.691597</td>\n",
" <td>-0.054336</td>\n",
" <td>0.611926</td>\n",
" <td>34.801666</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v6</th>\n",
" <td>284807.0</td>\n",
" <td>4.248366e-13</td>\n",
" <td>1.332271</td>\n",
" <td>-26.160506</td>\n",
" <td>-0.768296</td>\n",
" <td>-0.274187</td>\n",
" <td>0.398565</td>\n",
" <td>73.301626</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v7</th>\n",
" <td>284807.0</td>\n",
" <td>-3.054600e-13</td>\n",
" <td>1.237094</td>\n",
" <td>-43.557242</td>\n",
" <td>-0.554076</td>\n",
" <td>0.040103</td>\n",
" <td>0.570436</td>\n",
" <td>120.589494</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v8</th>\n",
" <td>284807.0</td>\n",
" <td>8.777971e-14</td>\n",
" <td>1.194353</td>\n",
" <td>-73.216718</td>\n",
" <td>-0.208630</td>\n",
" <td>0.022358</td>\n",
" <td>0.327346</td>\n",
" <td>20.007208</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v9</th>\n",
" <td>284807.0</td>\n",
" <td>-1.179749e-12</td>\n",
" <td>1.098632</td>\n",
" <td>-13.434066</td>\n",
" <td>-0.643098</td>\n",
" <td>-0.051429</td>\n",
" <td>0.597139</td>\n",
" <td>15.594995</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v10</th>\n",
" <td>284807.0</td>\n",
" <td>7.092545e-13</td>\n",
" <td>1.088850</td>\n",
" <td>-24.588262</td>\n",
" <td>-0.535426</td>\n",
" <td>-0.092917</td>\n",
" <td>0.453923</td>\n",
" <td>23.745136</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v11</th>\n",
" <td>284807.0</td>\n",
" <td>1.874948e-12</td>\n",
" <td>1.020713</td>\n",
" <td>-4.797473</td>\n",
" <td>-0.762494</td>\n",
" <td>-0.032757</td>\n",
" <td>0.739593</td>\n",
" <td>12.018913</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v12</th>\n",
" <td>284807.0</td>\n",
" <td>1.053347e-12</td>\n",
" <td>0.999201</td>\n",
" <td>-18.683715</td>\n",
" <td>-0.405571</td>\n",
" <td>0.140033</td>\n",
" <td>0.618238</td>\n",
" <td>7.848392</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v13</th>\n",
" <td>284807.0</td>\n",
" <td>7.127611e-13</td>\n",
" <td>0.995274</td>\n",
" <td>-5.791881</td>\n",
" <td>-0.648539</td>\n",
" <td>-0.013568</td>\n",
" <td>0.662505</td>\n",
" <td>7.126883</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v14</th>\n",
" <td>284807.0</td>\n",
" <td>-1.474791e-13</td>\n",
" <td>0.958596</td>\n",
" <td>-19.214325</td>\n",
" <td>-0.425574</td>\n",
" <td>0.050601</td>\n",
" <td>0.493150</td>\n",
" <td>10.526766</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v15</th>\n",
" <td>284807.0</td>\n",
" <td>-5.231558e-13</td>\n",
" <td>0.915316</td>\n",
" <td>-4.498945</td>\n",
" <td>-0.582884</td>\n",
" <td>0.048072</td>\n",
" <td>0.648821</td>\n",
" <td>8.877742</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v16</th>\n",
" <td>284807.0</td>\n",
" <td>-2.282250e-13</td>\n",
" <td>0.876253</td>\n",
" <td>-14.129855</td>\n",
" <td>-0.468037</td>\n",
" <td>0.066413</td>\n",
" <td>0.523296</td>\n",
" <td>17.315112</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v17</th>\n",
" <td>284807.0</td>\n",
" <td>-6.425436e-13</td>\n",
" <td>0.849337</td>\n",
" <td>-25.162799</td>\n",
" <td>-0.483748</td>\n",
" <td>-0.065676</td>\n",
" <td>0.399675</td>\n",
" <td>9.253526</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v18</th>\n",
" <td>284807.0</td>\n",
" <td>4.950748e-13</td>\n",
" <td>0.838176</td>\n",
" <td>-9.498746</td>\n",
" <td>-0.498850</td>\n",
" <td>-0.003636</td>\n",
" <td>0.500807</td>\n",
" <td>5.041069</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v19</th>\n",
" <td>284807.0</td>\n",
" <td>7.057397e-13</td>\n",
" <td>0.814041</td>\n",
" <td>-7.213527</td>\n",
" <td>-0.456299</td>\n",
" <td>0.003735</td>\n",
" <td>0.458949</td>\n",
" <td>5.591971</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v20</th>\n",
" <td>284807.0</td>\n",
" <td>1.766111e-12</td>\n",
" <td>0.770925</td>\n",
" <td>-54.497720</td>\n",
" <td>-0.211721</td>\n",
" <td>-0.062481</td>\n",
" <td>0.133041</td>\n",
" <td>39.420904</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v21</th>\n",
" <td>284807.0</td>\n",
" <td>-3.405756e-13</td>\n",
" <td>0.734524</td>\n",
" <td>-34.830382</td>\n",
" <td>-0.228395</td>\n",
" <td>-0.029450</td>\n",
" <td>0.186377</td>\n",
" <td>27.202839</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v22</th>\n",
" <td>284807.0</td>\n",
" <td>-5.723197e-13</td>\n",
" <td>0.725702</td>\n",
" <td>-10.933144</td>\n",
" <td>-0.542350</td>\n",
" <td>0.006782</td>\n",
" <td>0.528554</td>\n",
" <td>10.503090</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v23</th>\n",
" <td>284807.0</td>\n",
" <td>-9.725856e-13</td>\n",
" <td>0.624460</td>\n",
" <td>-44.807735</td>\n",
" <td>-0.161846</td>\n",
" <td>-0.011193</td>\n",
" <td>0.147642</td>\n",
" <td>22.528412</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v24</th>\n",
" <td>284807.0</td>\n",
" <td>1.464150e-12</td>\n",
" <td>0.605647</td>\n",
" <td>-2.836627</td>\n",
" <td>-0.354586</td>\n",
" <td>0.040976</td>\n",
" <td>0.439527</td>\n",
" <td>4.584549</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v25</th>\n",
" <td>284807.0</td>\n",
" <td>-6.987102e-13</td>\n",
" <td>0.521278</td>\n",
" <td>-10.295397</td>\n",
" <td>-0.317145</td>\n",
" <td>0.016594</td>\n",
" <td>0.350716</td>\n",
" <td>7.519589</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v26</th>\n",
" <td>284807.0</td>\n",
" <td>-5.617874e-13</td>\n",
" <td>0.482227</td>\n",
" <td>-2.604551</td>\n",
" <td>-0.326984</td>\n",
" <td>-0.052139</td>\n",
" <td>0.240952</td>\n",
" <td>3.517346</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v27</th>\n",
" <td>284807.0</td>\n",
" <td>3.332082e-12</td>\n",
" <td>0.403632</td>\n",
" <td>-22.565679</td>\n",
" <td>-0.070840</td>\n",
" <td>0.001342</td>\n",
" <td>0.091045</td>\n",
" <td>31.612198</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v28</th>\n",
" <td>284807.0</td>\n",
" <td>-3.518874e-12</td>\n",
" <td>0.330083</td>\n",
" <td>-15.430084</td>\n",
" <td>-0.052960</td>\n",
" <td>0.011244</td>\n",
" <td>0.078280</td>\n",
" <td>33.847808</td>\n",
" </tr>\n",
" <tr>\n",
" <th>amount</th>\n",
" <td>284807.0</td>\n",
" <td>8.834962e+01</td>\n",
" <td>250.120109</td>\n",
" <td>0.000000</td>\n",
" <td>5.600000</td>\n",
" <td>22.000000</td>\n",
" <td>77.165000</td>\n",
" <td>25691.160000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>class</th>\n",
" <td>284807.0</td>\n",
" <td>1.727486e-03</td>\n",
" <td>0.041527</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" count mean std min 25% \\\n",
"time 284807.0 9.481386e+04 47488.145955 0.000000 54201.500000 \n",
"v1 284807.0 1.759061e-12 1.958696 -56.407510 -0.920373 \n",
"v2 284807.0 -8.251130e-13 1.651309 -72.715728 -0.598550 \n",
"v3 284807.0 -9.654937e-13 1.516255 -48.325589 -0.890365 \n",
"v4 284807.0 8.321385e-13 1.415869 -5.683171 -0.848640 \n",
"v5 284807.0 1.649999e-13 1.380247 -113.743307 -0.691597 \n",
"v6 284807.0 4.248366e-13 1.332271 -26.160506 -0.768296 \n",
"v7 284807.0 -3.054600e-13 1.237094 -43.557242 -0.554076 \n",
"v8 284807.0 8.777971e-14 1.194353 -73.216718 -0.208630 \n",
"v9 284807.0 -1.179749e-12 1.098632 -13.434066 -0.643098 \n",
"v10 284807.0 7.092545e-13 1.088850 -24.588262 -0.535426 \n",
"v11 284807.0 1.874948e-12 1.020713 -4.797473 -0.762494 \n",
"v12 284807.0 1.053347e-12 0.999201 -18.683715 -0.405571 \n",
"v13 284807.0 7.127611e-13 0.995274 -5.791881 -0.648539 \n",
"v14 284807.0 -1.474791e-13 0.958596 -19.214325 -0.425574 \n",
"v15 284807.0 -5.231558e-13 0.915316 -4.498945 -0.582884 \n",
"v16 284807.0 -2.282250e-13 0.876253 -14.129855 -0.468037 \n",
"v17 284807.0 -6.425436e-13 0.849337 -25.162799 -0.483748 \n",
"v18 284807.0 4.950748e-13 0.838176 -9.498746 -0.498850 \n",
"v19 284807.0 7.057397e-13 0.814041 -7.213527 -0.456299 \n",
"v20 284807.0 1.766111e-12 0.770925 -54.497720 -0.211721 \n",
"v21 284807.0 -3.405756e-13 0.734524 -34.830382 -0.228395 \n",
"v22 284807.0 -5.723197e-13 0.725702 -10.933144 -0.542350 \n",
"v23 284807.0 -9.725856e-13 0.624460 -44.807735 -0.161846 \n",
"v24 284807.0 1.464150e-12 0.605647 -2.836627 -0.354586 \n",
"v25 284807.0 -6.987102e-13 0.521278 -10.295397 -0.317145 \n",
"v26 284807.0 -5.617874e-13 0.482227 -2.604551 -0.326984 \n",
"v27 284807.0 3.332082e-12 0.403632 -22.565679 -0.070840 \n",
"v28 284807.0 -3.518874e-12 0.330083 -15.430084 -0.052960 \n",
"amount 284807.0 8.834962e+01 250.120109 0.000000 5.600000 \n",
"class 284807.0 1.727486e-03 0.041527 0.000000 0.000000 \n",
"\n",
" 50% 75% max \n",
"time 84692.000000 139320.500000 172792.000000 \n",
"v1 0.018109 1.315642 2.454930 \n",
"v2 0.065486 0.803724 22.057729 \n",
"v3 0.179846 1.027196 9.382558 \n",
"v4 -0.019847 0.743341 16.875344 \n",
"v5 -0.054336 0.611926 34.801666 \n",
"v6 -0.274187 0.398565 73.301626 \n",
"v7 0.040103 0.570436 120.589494 \n",
"v8 0.022358 0.327346 20.007208 \n",
"v9 -0.051429 0.597139 15.594995 \n",
"v10 -0.092917 0.453923 23.745136 \n",
"v11 -0.032757 0.739593 12.018913 \n",
"v12 0.140033 0.618238 7.848392 \n",
"v13 -0.013568 0.662505 7.126883 \n",
"v14 0.050601 0.493150 10.526766 \n",
"v15 0.048072 0.648821 8.877742 \n",
"v16 0.066413 0.523296 17.315112 \n",
"v17 -0.065676 0.399675 9.253526 \n",
"v18 -0.003636 0.500807 5.041069 \n",
"v19 0.003735 0.458949 5.591971 \n",
"v20 -0.062481 0.133041 39.420904 \n",
"v21 -0.029450 0.186377 27.202839 \n",
"v22 0.006782 0.528554 10.503090 \n",
"v23 -0.011193 0.147642 22.528412 \n",
"v24 0.040976 0.439527 4.584549 \n",
"v25 0.016594 0.350716 7.519589 \n",
"v26 -0.052139 0.240952 3.517346 \n",
"v27 0.001342 0.091045 31.612198 \n",
"v28 0.011244 0.078280 33.847808 \n",
"amount 22.000000 77.165000 25691.160000 \n",
"class 0.000000 0.000000 1.000000 "
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train.describe().T"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "extra-sweden",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>284807.0</td>\n",
" <td>-3.065637e-16</td>\n",
" <td>1.000002</td>\n",
" <td>-1.996583</td>\n",
" <td>-0.855212</td>\n",
" <td>-0.213145</td>\n",
" <td>0.937217</td>\n",
" <td>1.642058</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>284807.0</td>\n",
" <td>2.594615e-18</td>\n",
" <td>1.000002</td>\n",
" <td>-28.798555</td>\n",
" <td>-0.469892</td>\n",
" <td>0.009245</td>\n",
" <td>0.671694</td>\n",
" <td>1.253351</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>284807.0</td>\n",
" <td>-3.991715e-18</td>\n",
" <td>1.000002</td>\n",
" <td>-44.035292</td>\n",
" <td>-0.362471</td>\n",
" <td>0.039657</td>\n",
" <td>0.486720</td>\n",
" <td>13.357750</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>284807.0</td>\n",
" <td>-7.025418e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-31.871733</td>\n",
" <td>-0.587214</td>\n",
" <td>0.118612</td>\n",
" <td>0.677457</td>\n",
" <td>6.187993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>284807.0</td>\n",
" <td>-3.991715e-18</td>\n",
" <td>1.000002</td>\n",
" <td>-4.013919</td>\n",
" <td>-0.599379</td>\n",
" <td>-0.014017</td>\n",
" <td>0.525008</td>\n",
" <td>11.918743</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>284807.0</td>\n",
" <td>3.033703e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-82.408097</td>\n",
" <td>-0.501069</td>\n",
" <td>-0.039367</td>\n",
" <td>0.443346</td>\n",
" <td>25.214135</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>284807.0</td>\n",
" <td>1.197515e-18</td>\n",
" <td>1.000002</td>\n",
" <td>-19.636058</td>\n",
" <td>-0.576682</td>\n",
" <td>-0.205805</td>\n",
" <td>0.299163</td>\n",
" <td>55.020149</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>284807.0</td>\n",
" <td>5.189230e-18</td>\n",
" <td>1.000002</td>\n",
" <td>-35.209396</td>\n",
" <td>-0.447886</td>\n",
" <td>0.032417</td>\n",
" <td>0.461111</td>\n",
" <td>97.478239</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>284807.0</td>\n",
" <td>-4.291094e-18</td>\n",
" <td>1.000002</td>\n",
" <td>-61.302524</td>\n",
" <td>-0.174680</td>\n",
" <td>0.018720</td>\n",
" <td>0.274078</td>\n",
" <td>16.751534</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>284807.0</td>\n",
" <td>-2.020806e-18</td>\n",
" <td>1.000002</td>\n",
" <td>-12.228015</td>\n",
" <td>-0.585363</td>\n",
" <td>-0.046812</td>\n",
" <td>0.543531</td>\n",
" <td>14.194945</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>284807.0</td>\n",
" <td>1.886085e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-22.581908</td>\n",
" <td>-0.491736</td>\n",
" <td>-0.085336</td>\n",
" <td>0.416884</td>\n",
" <td>21.807579</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>284807.0</td>\n",
" <td>3.512709e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-4.700128</td>\n",
" <td>-0.747022</td>\n",
" <td>-0.032093</td>\n",
" <td>0.724586</td>\n",
" <td>11.775038</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>284807.0</td>\n",
" <td>-2.075692e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-18.698680</td>\n",
" <td>-0.405896</td>\n",
" <td>0.140145</td>\n",
" <td>0.618733</td>\n",
" <td>7.854679</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>284807.0</td>\n",
" <td>-2.065713e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-5.819392</td>\n",
" <td>-0.651620</td>\n",
" <td>-0.013633</td>\n",
" <td>0.665652</td>\n",
" <td>7.160735</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>284807.0</td>\n",
" <td>9.580116e-18</td>\n",
" <td>1.000002</td>\n",
" <td>-20.044280</td>\n",
" <td>-0.443957</td>\n",
" <td>0.052787</td>\n",
" <td>0.514451</td>\n",
" <td>10.981465</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>284807.0</td>\n",
" <td>-3.033703e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-4.915191</td>\n",
" <td>-0.636813</td>\n",
" <td>0.052519</td>\n",
" <td>0.708850</td>\n",
" <td>9.699117</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>284807.0</td>\n",
" <td>1.806251e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-16.125344</td>\n",
" <td>-0.534135</td>\n",
" <td>0.075793</td>\n",
" <td>0.597199</td>\n",
" <td>19.760439</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>284807.0</td>\n",
" <td>4.331011e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-29.626452</td>\n",
" <td>-0.569561</td>\n",
" <td>-0.077326</td>\n",
" <td>0.470574</td>\n",
" <td>10.895018</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>284807.0</td>\n",
" <td>-1.676520e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-11.332656</td>\n",
" <td>-0.595162</td>\n",
" <td>-0.004338</td>\n",
" <td>0.597497</td>\n",
" <td>6.014342</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>284807.0</td>\n",
" <td>1.771324e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-8.861402</td>\n",
" <td>-0.560537</td>\n",
" <td>0.004588</td>\n",
" <td>0.563793</td>\n",
" <td>6.869414</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>284807.0</td>\n",
" <td>-1.896065e-18</td>\n",
" <td>1.000002</td>\n",
" <td>-70.691461</td>\n",
" <td>-0.274633</td>\n",
" <td>-0.081047</td>\n",
" <td>0.172573</td>\n",
" <td>51.134640</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>284807.0</td>\n",
" <td>8.482394e-19</td>\n",
" <td>1.000002</td>\n",
" <td>-47.419067</td>\n",
" <td>-0.310943</td>\n",
" <td>-0.040094</td>\n",
" <td>0.253739</td>\n",
" <td>37.034714</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>284807.0</td>\n",
" <td>1.995858e-18</td>\n",
" <td>1.000002</td>\n",
" <td>-15.065646</td>\n",
" <td>-0.747348</td>\n",
" <td>0.009345</td>\n",
" <td>0.728336</td>\n",
" <td>14.473041</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>284807.0</td>\n",
" <td>-3.592544e-18</td>\n",
" <td>1.000002</td>\n",
" <td>-71.754464</td>\n",
" <td>-0.259178</td>\n",
" <td>-0.017924</td>\n",
" <td>0.236432</td>\n",
" <td>36.076675</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>284807.0</td>\n",
" <td>-1.456976e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-4.683638</td>\n",
" <td>-0.585468</td>\n",
" <td>0.067657</td>\n",
" <td>0.725715</td>\n",
" <td>7.569684</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>284807.0</td>\n",
" <td>-1.357183e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-19.750332</td>\n",
" <td>-0.608400</td>\n",
" <td>0.031832</td>\n",
" <td>0.672801</td>\n",
" <td>14.425318</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>284807.0</td>\n",
" <td>-9.692383e-18</td>\n",
" <td>1.000002</td>\n",
" <td>-5.401098</td>\n",
" <td>-0.678072</td>\n",
" <td>-0.108122</td>\n",
" <td>0.499666</td>\n",
" <td>7.293975</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>284807.0</td>\n",
" <td>2.544718e-18</td>\n",
" <td>1.000002</td>\n",
" <td>-55.906596</td>\n",
" <td>-0.175505</td>\n",
" <td>0.003325</td>\n",
" <td>0.225565</td>\n",
" <td>78.319397</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>284807.0</td>\n",
" <td>9.979288e-19</td>\n",
" <td>1.000002</td>\n",
" <td>-46.746117</td>\n",
" <td>-0.160444</td>\n",
" <td>0.034064</td>\n",
" <td>0.237153</td>\n",
" <td>102.543421</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>284807.0</td>\n",
" <td>2.913952e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-0.353229</td>\n",
" <td>-0.330840</td>\n",
" <td>-0.265271</td>\n",
" <td>-0.044717</td>\n",
" <td>102.362243</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>284807.0</td>\n",
" <td>-1.197515e-17</td>\n",
" <td>1.000002</td>\n",
" <td>-0.041599</td>\n",
" <td>-0.041599</td>\n",
" <td>-0.041599</td>\n",
" <td>-0.041599</td>\n",
" <td>24.039052</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" count mean std min 25% 50% 75% \\\n",
"0 284807.0 -3.065637e-16 1.000002 -1.996583 -0.855212 -0.213145 0.937217 \n",
"1 284807.0 2.594615e-18 1.000002 -28.798555 -0.469892 0.009245 0.671694 \n",
"2 284807.0 -3.991715e-18 1.000002 -44.035292 -0.362471 0.039657 0.486720 \n",
"3 284807.0 -7.025418e-17 1.000002 -31.871733 -0.587214 0.118612 0.677457 \n",
"4 284807.0 -3.991715e-18 1.000002 -4.013919 -0.599379 -0.014017 0.525008 \n",
"5 284807.0 3.033703e-17 1.000002 -82.408097 -0.501069 -0.039367 0.443346 \n",
"6 284807.0 1.197515e-18 1.000002 -19.636058 -0.576682 -0.205805 0.299163 \n",
"7 284807.0 5.189230e-18 1.000002 -35.209396 -0.447886 0.032417 0.461111 \n",
"8 284807.0 -4.291094e-18 1.000002 -61.302524 -0.174680 0.018720 0.274078 \n",
"9 284807.0 -2.020806e-18 1.000002 -12.228015 -0.585363 -0.046812 0.543531 \n",
"10 284807.0 1.886085e-17 1.000002 -22.581908 -0.491736 -0.085336 0.416884 \n",
"11 284807.0 3.512709e-17 1.000002 -4.700128 -0.747022 -0.032093 0.724586 \n",
"12 284807.0 -2.075692e-17 1.000002 -18.698680 -0.405896 0.140145 0.618733 \n",
"13 284807.0 -2.065713e-17 1.000002 -5.819392 -0.651620 -0.013633 0.665652 \n",
"14 284807.0 9.580116e-18 1.000002 -20.044280 -0.443957 0.052787 0.514451 \n",
"15 284807.0 -3.033703e-17 1.000002 -4.915191 -0.636813 0.052519 0.708850 \n",
"16 284807.0 1.806251e-17 1.000002 -16.125344 -0.534135 0.075793 0.597199 \n",
"17 284807.0 4.331011e-17 1.000002 -29.626452 -0.569561 -0.077326 0.470574 \n",
"18 284807.0 -1.676520e-17 1.000002 -11.332656 -0.595162 -0.004338 0.597497 \n",
"19 284807.0 1.771324e-17 1.000002 -8.861402 -0.560537 0.004588 0.563793 \n",
"20 284807.0 -1.896065e-18 1.000002 -70.691461 -0.274633 -0.081047 0.172573 \n",
"21 284807.0 8.482394e-19 1.000002 -47.419067 -0.310943 -0.040094 0.253739 \n",
"22 284807.0 1.995858e-18 1.000002 -15.065646 -0.747348 0.009345 0.728336 \n",
"23 284807.0 -3.592544e-18 1.000002 -71.754464 -0.259178 -0.017924 0.236432 \n",
"24 284807.0 -1.456976e-17 1.000002 -4.683638 -0.585468 0.067657 0.725715 \n",
"25 284807.0 -1.357183e-17 1.000002 -19.750332 -0.608400 0.031832 0.672801 \n",
"26 284807.0 -9.692383e-18 1.000002 -5.401098 -0.678072 -0.108122 0.499666 \n",
"27 284807.0 2.544718e-18 1.000002 -55.906596 -0.175505 0.003325 0.225565 \n",
"28 284807.0 9.979288e-19 1.000002 -46.746117 -0.160444 0.034064 0.237153 \n",
"29 284807.0 2.913952e-17 1.000002 -0.353229 -0.330840 -0.265271 -0.044717 \n",
"30 284807.0 -1.197515e-17 1.000002 -0.041599 -0.041599 -0.041599 -0.041599 \n",
"\n",
" max \n",
"0 1.642058 \n",
"1 1.253351 \n",
"2 13.357750 \n",
"3 6.187993 \n",
"4 11.918743 \n",
"5 25.214135 \n",
"6 55.020149 \n",
"7 97.478239 \n",
"8 16.751534 \n",
"9 14.194945 \n",
"10 21.807579 \n",
"11 11.775038 \n",
"12 7.854679 \n",
"13 7.160735 \n",
"14 10.981465 \n",
"15 9.699117 \n",
"16 19.760439 \n",
"17 10.895018 \n",
"18 6.014342 \n",
"19 6.869414 \n",
"20 51.134640 \n",
"21 37.034714 \n",
"22 14.473041 \n",
"23 36.076675 \n",
"24 7.569684 \n",
"25 14.425318 \n",
"26 7.293975 \n",
"27 78.319397 \n",
"28 102.543421 \n",
"29 102.362243 \n",
"30 24.039052 "
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.preprocessing import StandardScaler, MinMaxScaler\n",
"\n",
"pd.DataFrame( # mean of 0 and std of 1 but ranges are different (see min and max)\n",
" StandardScaler().fit_transform(df_train),\n",
").describe().T"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "russian-intranet",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>284807.0</td>\n",
" <td>0.548717</td>\n",
" <td>0.274828</td>\n",
" <td>0.0</td>\n",
" <td>0.313681</td>\n",
" <td>0.490138</td>\n",
" <td>0.806290</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>284807.0</td>\n",
" <td>0.958294</td>\n",
" <td>0.033276</td>\n",
" <td>0.0</td>\n",
" <td>0.942658</td>\n",
" <td>0.958601</td>\n",
" <td>0.980645</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>284807.0</td>\n",
" <td>0.767258</td>\n",
" <td>0.017424</td>\n",
" <td>0.0</td>\n",
" <td>0.760943</td>\n",
" <td>0.767949</td>\n",
" <td>0.775739</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>284807.0</td>\n",
" <td>0.837414</td>\n",
" <td>0.026275</td>\n",
" <td>0.0</td>\n",
" <td>0.821985</td>\n",
" <td>0.840530</td>\n",
" <td>0.855213</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>284807.0</td>\n",
" <td>0.251930</td>\n",
" <td>0.062764</td>\n",
" <td>0.0</td>\n",
" <td>0.214311</td>\n",
" <td>0.251050</td>\n",
" <td>0.284882</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>284807.0</td>\n",
" <td>0.765716</td>\n",
" <td>0.009292</td>\n",
" <td>0.0</td>\n",
" <td>0.761060</td>\n",
" <td>0.765351</td>\n",
" <td>0.769836</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>284807.0</td>\n",
" <td>0.263020</td>\n",
" <td>0.013395</td>\n",
" <td>0.0</td>\n",
" <td>0.255295</td>\n",
" <td>0.260263</td>\n",
" <td>0.267027</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>284807.0</td>\n",
" <td>0.265356</td>\n",
" <td>0.007537</td>\n",
" <td>0.0</td>\n",
" <td>0.261980</td>\n",
" <td>0.265600</td>\n",
" <td>0.268831</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>284807.0</td>\n",
" <td>0.785385</td>\n",
" <td>0.012812</td>\n",
" <td>0.0</td>\n",
" <td>0.783148</td>\n",
" <td>0.785625</td>\n",
" <td>0.788897</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>284807.0</td>\n",
" <td>0.462780</td>\n",
" <td>0.037846</td>\n",
" <td>0.0</td>\n",
" <td>0.440626</td>\n",
" <td>0.461008</td>\n",
" <td>0.483350</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>284807.0</td>\n",
" <td>0.508722</td>\n",
" <td>0.022528</td>\n",
" <td>0.0</td>\n",
" <td>0.497644</td>\n",
" <td>0.506800</td>\n",
" <td>0.518113</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>284807.0</td>\n",
" <td>0.285286</td>\n",
" <td>0.060698</td>\n",
" <td>0.0</td>\n",
" <td>0.239943</td>\n",
" <td>0.283338</td>\n",
" <td>0.329266</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>284807.0</td>\n",
" <td>0.704193</td>\n",
" <td>0.037660</td>\n",
" <td>0.0</td>\n",
" <td>0.688907</td>\n",
" <td>0.709471</td>\n",
" <td>0.727494</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>284807.0</td>\n",
" <td>0.448331</td>\n",
" <td>0.077041</td>\n",
" <td>0.0</td>\n",
" <td>0.398130</td>\n",
" <td>0.447281</td>\n",
" <td>0.499613</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>284807.0</td>\n",
" <td>0.646053</td>\n",
" <td>0.032231</td>\n",
" <td>0.0</td>\n",
" <td>0.631744</td>\n",
" <td>0.647755</td>\n",
" <td>0.662635</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>284807.0</td>\n",
" <td>0.336327</td>\n",
" <td>0.068426</td>\n",
" <td>0.0</td>\n",
" <td>0.292753</td>\n",
" <td>0.339921</td>\n",
" <td>0.384831</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>284807.0</td>\n",
" <td>0.449352</td>\n",
" <td>0.027866</td>\n",
" <td>0.0</td>\n",
" <td>0.434468</td>\n",
" <td>0.451464</td>\n",
" <td>0.465994</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>284807.0</td>\n",
" <td>0.731130</td>\n",
" <td>0.024678</td>\n",
" <td>0.0</td>\n",
" <td>0.717074</td>\n",
" <td>0.729221</td>\n",
" <td>0.742743</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>284807.0</td>\n",
" <td>0.653292</td>\n",
" <td>0.057647</td>\n",
" <td>0.0</td>\n",
" <td>0.618983</td>\n",
" <td>0.653042</td>\n",
" <td>0.687736</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>284807.0</td>\n",
" <td>0.563315</td>\n",
" <td>0.063570</td>\n",
" <td>0.0</td>\n",
" <td>0.527682</td>\n",
" <td>0.563606</td>\n",
" <td>0.599155</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>284807.0</td>\n",
" <td>0.580265</td>\n",
" <td>0.008208</td>\n",
" <td>0.0</td>\n",
" <td>0.578011</td>\n",
" <td>0.579600</td>\n",
" <td>0.581682</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>284807.0</td>\n",
" <td>0.561480</td>\n",
" <td>0.011841</td>\n",
" <td>0.0</td>\n",
" <td>0.557798</td>\n",
" <td>0.561005</td>\n",
" <td>0.564484</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>284807.0</td>\n",
" <td>0.510031</td>\n",
" <td>0.033854</td>\n",
" <td>0.0</td>\n",
" <td>0.484730</td>\n",
" <td>0.510347</td>\n",
" <td>0.534688</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>284807.0</td>\n",
" <td>0.665434</td>\n",
" <td>0.009274</td>\n",
" <td>0.0</td>\n",
" <td>0.663030</td>\n",
" <td>0.665267</td>\n",
" <td>0.667626</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>284807.0</td>\n",
" <td>0.382234</td>\n",
" <td>0.081611</td>\n",
" <td>0.0</td>\n",
" <td>0.334454</td>\n",
" <td>0.387756</td>\n",
" <td>0.441460</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>284807.0</td>\n",
" <td>0.577907</td>\n",
" <td>0.029261</td>\n",
" <td>0.0</td>\n",
" <td>0.560104</td>\n",
" <td>0.578838</td>\n",
" <td>0.597593</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>284807.0</td>\n",
" <td>0.425448</td>\n",
" <td>0.078771</td>\n",
" <td>0.0</td>\n",
" <td>0.372036</td>\n",
" <td>0.416932</td>\n",
" <td>0.464807</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>284807.0</td>\n",
" <td>0.416511</td>\n",
" <td>0.007450</td>\n",
" <td>0.0</td>\n",
" <td>0.415203</td>\n",
" <td>0.416536</td>\n",
" <td>0.418191</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>284807.0</td>\n",
" <td>0.313124</td>\n",
" <td>0.006698</td>\n",
" <td>0.0</td>\n",
" <td>0.312049</td>\n",
" <td>0.313352</td>\n",
" <td>0.314712</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>284807.0</td>\n",
" <td>0.003439</td>\n",
" <td>0.009736</td>\n",
" <td>0.0</td>\n",
" <td>0.000218</td>\n",
" <td>0.000856</td>\n",
" <td>0.003004</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>284807.0</td>\n",
" <td>0.001727</td>\n",
" <td>0.041527</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" count mean std min 25% 50% 75% max\n",
"0 284807.0 0.548717 0.274828 0.0 0.313681 0.490138 0.806290 1.0\n",
"1 284807.0 0.958294 0.033276 0.0 0.942658 0.958601 0.980645 1.0\n",
"2 284807.0 0.767258 0.017424 0.0 0.760943 0.767949 0.775739 1.0\n",
"3 284807.0 0.837414 0.026275 0.0 0.821985 0.840530 0.855213 1.0\n",
"4 284807.0 0.251930 0.062764 0.0 0.214311 0.251050 0.284882 1.0\n",
"5 284807.0 0.765716 0.009292 0.0 0.761060 0.765351 0.769836 1.0\n",
"6 284807.0 0.263020 0.013395 0.0 0.255295 0.260263 0.267027 1.0\n",
"7 284807.0 0.265356 0.007537 0.0 0.261980 0.265600 0.268831 1.0\n",
"8 284807.0 0.785385 0.012812 0.0 0.783148 0.785625 0.788897 1.0\n",
"9 284807.0 0.462780 0.037846 0.0 0.440626 0.461008 0.483350 1.0\n",
"10 284807.0 0.508722 0.022528 0.0 0.497644 0.506800 0.518113 1.0\n",
"11 284807.0 0.285286 0.060698 0.0 0.239943 0.283338 0.329266 1.0\n",
"12 284807.0 0.704193 0.037660 0.0 0.688907 0.709471 0.727494 1.0\n",
"13 284807.0 0.448331 0.077041 0.0 0.398130 0.447281 0.499613 1.0\n",
"14 284807.0 0.646053 0.032231 0.0 0.631744 0.647755 0.662635 1.0\n",
"15 284807.0 0.336327 0.068426 0.0 0.292753 0.339921 0.384831 1.0\n",
"16 284807.0 0.449352 0.027866 0.0 0.434468 0.451464 0.465994 1.0\n",
"17 284807.0 0.731130 0.024678 0.0 0.717074 0.729221 0.742743 1.0\n",
"18 284807.0 0.653292 0.057647 0.0 0.618983 0.653042 0.687736 1.0\n",
"19 284807.0 0.563315 0.063570 0.0 0.527682 0.563606 0.599155 1.0\n",
"20 284807.0 0.580265 0.008208 0.0 0.578011 0.579600 0.581682 1.0\n",
"21 284807.0 0.561480 0.011841 0.0 0.557798 0.561005 0.564484 1.0\n",
"22 284807.0 0.510031 0.033854 0.0 0.484730 0.510347 0.534688 1.0\n",
"23 284807.0 0.665434 0.009274 0.0 0.663030 0.665267 0.667626 1.0\n",
"24 284807.0 0.382234 0.081611 0.0 0.334454 0.387756 0.441460 1.0\n",
"25 284807.0 0.577907 0.029261 0.0 0.560104 0.578838 0.597593 1.0\n",
"26 284807.0 0.425448 0.078771 0.0 0.372036 0.416932 0.464807 1.0\n",
"27 284807.0 0.416511 0.007450 0.0 0.415203 0.416536 0.418191 1.0\n",
"28 284807.0 0.313124 0.006698 0.0 0.312049 0.313352 0.314712 1.0\n",
"29 284807.0 0.003439 0.009736 0.0 0.000218 0.000856 0.003004 1.0\n",
"30 284807.0 0.001727 0.041527 0.0 0.000000 0.000000 0.000000 1.0"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# MinMaxScaler: each column is rescaled to [0, 1],\n",
"# so min and max match across columns while mean and std differ.\n",
"pd.DataFrame(\n",
"    MinMaxScaler().fit_transform(df_train),\n",
").describe().T"
]
},
{
"cell_type": "markdown",
"id": "macro-pavilion",
"metadata": {},
"source": [
"**Update the model with the other features**"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "liked-banks",
"metadata": {},
"outputs": [],
"source": [
"# Save the dataset to CSV without writing the index column\n",
"df_train.to_csv('preprocessed_data.csv', index=False)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "incorrect-paragraph",
"metadata": {},
"outputs": [],
"source": [
"new_data = pd.read_csv('./preprocessed_data.csv')"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "continuous-pharmacology",
"metadata": {},
"outputs": [],
"source": [
"def update_model(X):\n",
"    \"\"\"Split the DataFrame into the feature matrix and the `class` target.\"\"\"\n",
"    target = X[\"class\"]\n",
"    X = X.drop([\"class\"], axis=1)\n",
"    return X, target"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "compressed-ensemble",
"metadata": {},
"outputs": [],
"source": [
"X, y = update_model(new_data.copy())"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "immediate-conservation",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LogisticRegression(max_iter=1000)"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lr_model.fit(X, y)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "accredited-swedish",
"metadata": {},
"outputs": [],
"source": [
"# Make prediction\n",
"\n",
"lr_base_updated_predicted_labels = lr_model.predict(X)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "limiting-blood",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 1.00 1.00 284315\n",
" 1 0.85 0.55 0.67 492\n",
"\n",
" accuracy 1.00 284807\n",
" macro avg 0.92 0.78 0.84 284807\n",
"weighted avg 1.00 1.00 1.00 284807\n",
"\n"
]
}
],
"source": [
"# Plotting the Confusion Matrix\n",
"\n",
"from sklearn.metrics import confusion_matrix, classification_report\n",
"\n",
"cm = confusion_matrix(y, lr_base_updated_predicted_labels)\n",
"cmdf = pd.DataFrame(cm, index=[\"0\", \"1\"], columns=[\"0\", \"1\"])\n",
"fig, ax = plt.subplots(1, 1)\n",
"sns.heatmap(cmdf, annot=True, fmt='d', ax=ax)\n",
"ax.set_xlabel('Predicted Label')\n",
"ax.set_ylabel('True Label')\n",
"plt.show()\n",
"\n",
"# Print the classification report for the updated model's predictions\n",
"print(classification_report(y, lr_base_updated_predicted_labels))"
]
},
{
"cell_type": "markdown",
"id": "stopped-islam",
"metadata": {},
"source": [
"**Weight of variables**"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "seventh-frost",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['time', 'v1', 'v2', 'v3', 'v4', 'v5', 'v6', 'v7', 'v8', 'v9',\n",
" 'v10', 'v11', 'v12', 'v13', 'v14', 'v15', 'v16', 'v17', 'v18',\n",
" 'v19', 'v20', 'v21', 'v22', 'v23', 'v24', 'v25', 'v26', 'v27',\n",
" 'v28', 'amount'], dtype=object)"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"feature_names = lr_model.feature_names_in_\n",
"feature_names"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "every-conversation",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Coefficients</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>v3</th>\n",
" <td>-1.167256</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v15</th>\n",
" <td>-0.897130</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v25</th>\n",
" <td>-0.833816</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v14</th>\n",
" <td>-0.793581</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v11</th>\n",
" <td>-0.606635</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v9</th>\n",
" <td>-0.602953</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v17</th>\n",
" <td>-0.515965</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v13</th>\n",
" <td>-0.505511</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v8</th>\n",
" <td>-0.476484</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v2</th>\n",
" <td>-0.449773</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v10</th>\n",
" <td>-0.413056</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v16</th>\n",
" <td>-0.399403</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v27</th>\n",
" <td>-0.122023</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v5</th>\n",
" <td>-0.121882</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v6</th>\n",
" <td>-0.077509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v24</th>\n",
" <td>-0.013119</td>\n",
" </tr>\n",
" <tr>\n",
" <th>amount</th>\n",
" <td>-0.006276</td>\n",
" </tr>\n",
" <tr>\n",
" <th>time</th>\n",
" <td>-0.000058</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v19</th>\n",
" <td>0.002891</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v23</th>\n",
" <td>0.049393</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v28</th>\n",
" <td>0.067823</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v18</th>\n",
" <td>0.111300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v26</th>\n",
" <td>0.135808</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v12</th>\n",
" <td>0.144170</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v4</th>\n",
" <td>0.183679</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v20</th>\n",
" <td>0.201338</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v1</th>\n",
" <td>0.411559</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v21</th>\n",
" <td>0.615650</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v22</th>\n",
" <td>0.660915</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v7</th>\n",
" <td>0.933920</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Coefficients\n",
"v3 -1.167256\n",
"v15 -0.897130\n",
"v25 -0.833816\n",
"v14 -0.793581\n",
"v11 -0.606635\n",
"v9 -0.602953\n",
"v17 -0.515965\n",
"v13 -0.505511\n",
"v8 -0.476484\n",
"v2 -0.449773\n",
"v10 -0.413056\n",
"v16 -0.399403\n",
"v27 -0.122023\n",
"v5 -0.121882\n",
"v6 -0.077509\n",
"v24 -0.013119\n",
"amount -0.006276\n",
"time -0.000058\n",
"v19 0.002891\n",
"v23 0.049393\n",
"v28 0.067823\n",
"v18 0.111300\n",
"v26 0.135808\n",
"v12 0.144170\n",
"v4 0.183679\n",
"v20 0.201338\n",
"v1 0.411559\n",
"v21 0.615650\n",
"v22 0.660915\n",
"v7 0.933920"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coefficients = pd.DataFrame(\n",
" lr_model.coef_[0],\n",
" columns=[\"Coefficients\"],\n",
" index=feature_names,\n",
")\n",
"\n",
"coefficients.sort_values(by=['Coefficients'])"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "accredited-bride",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"axes = coefficients['Coefficients'].plot.bar()"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "alternative-statistics",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-3.295466040646744"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Intercept (bias term) of the fitted logistic regression\n",
"lr_model.intercept_[0]"
]
},
{
"cell_type": "markdown",
"id": "disciplinary-treaty",
"metadata": {},
"source": [
"## 5. Training other models\n"
]
},
{
"cell_type": "markdown",
"id": "likely-aquarium",
"metadata": {},
"source": [
"**Algorithms to use**\n",
"- Decision tree\n",
"- Naive Bayes\n",
"- Support vector machine (SVM)\n",
"- Random Forest\n",
"- Gradient Boosting"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "junior-honolulu",
"metadata": {},
"outputs": [],
"source": [
"# Split the dataset\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
"    X, y, test_size=0.4, random_state=1)  # 60% kept for train/validation, 40% test\n",
"\n",
"X_train, X_val, y_train, y_val = train_test_split(\n",
"    X_train, y_train, test_size=0.5, random_state=1)  # halves the 60%: 30% train, 30% validation"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "protected-sleep",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LogisticRegression(max_iter=1000)"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Refit the logistic regression on the training split only\n",
"lr_model.fit(X_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "special-packet",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DecisionTreeClassifier()"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"dt_model = DecisionTreeClassifier()\n",
"dt_model.fit(X_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "internal-duncan",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-5 {color: black;background-color: white;}#sk-container-id-5 pre{padding: 0;}#sk-container-id-5 div.sk-toggleable {background-color: white;}#sk-container-id-5 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-5 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-5 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-5 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-5 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-5 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-5 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-5 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-5 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-5 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-5 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-5 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-5 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-5 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-5 div.sk-label:hover 
label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-5 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-5 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-5 div.sk-item {position: relative;z-index: 1;}#sk-container-id-5 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-5 div.sk-item::before, #sk-container-id-5 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-5 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-5 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-5 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-5 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-5 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-5 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-5 div.sk-label-container {text-align: center;}#sk-container-id-5 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-5 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-5\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>GaussianNB()</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-5\" type=\"checkbox\" checked><label for=\"sk-estimator-id-5\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">GaussianNB</label><div class=\"sk-toggleable__content\"><pre>GaussianNB()</pre></div></div></div></div></div>"
],
"text/plain": [
"GaussianNB()"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.naive_bayes import GaussianNB\n",
"\n",
"nb_model = GaussianNB()\n",
"nb_model.fit(X_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "amended-livestock",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-6 {color: black;background-color: white;}#sk-container-id-6 pre{padding: 0;}#sk-container-id-6 div.sk-toggleable {background-color: white;}#sk-container-id-6 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-6 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-6 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-6 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-6 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-6 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-6 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-6 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-6 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-6 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-6 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-6 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-6 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-6 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-6 div.sk-label:hover 
label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-6 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-6 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-6 div.sk-item {position: relative;z-index: 1;}#sk-container-id-6 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-6 div.sk-item::before, #sk-container-id-6 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-6 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-6 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-6 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-6 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-6 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-6 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-6 div.sk-label-container {text-align: center;}#sk-container-id-6 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-6 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-6\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>SVC()</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-6\" type=\"checkbox\" checked><label for=\"sk-estimator-id-6\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">SVC</label><div class=\"sk-toggleable__content\"><pre>SVC()</pre></div></div></div></div></div>"
],
"text/plain": [
"SVC()"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.svm import SVC\n",
"\n",
"svm_model = SVC()\n",
"svm_model.fit(X_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "floral-functionality",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-7 {color: black;background-color: white;}#sk-container-id-7 pre{padding: 0;}#sk-container-id-7 div.sk-toggleable {background-color: white;}#sk-container-id-7 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-7 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-7 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-7 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-7 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-7 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-7 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-7 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-7 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-7 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-7 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-7 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-7 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-7 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-7 div.sk-label:hover 
label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-7 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-7 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-7 div.sk-item {position: relative;z-index: 1;}#sk-container-id-7 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-7 div.sk-item::before, #sk-container-id-7 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-7 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-7 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-7 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-7 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-7 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-7 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-7 div.sk-label-container {text-align: center;}#sk-container-id-7 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-7 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-7\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>RandomForestClassifier()</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-7\" type=\"checkbox\" checked><label for=\"sk-estimator-id-7\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">RandomForestClassifier</label><div class=\"sk-toggleable__content\"><pre>RandomForestClassifier()</pre></div></div></div></div></div>"
],
"text/plain": [
"RandomForestClassifier()"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.ensemble import RandomForestClassifier\n",
"\n",
"rf_model = RandomForestClassifier()\n",
"rf_model.fit(X_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "olive-milan",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-8 {color: black;background-color: white;}#sk-container-id-8 pre{padding: 0;}#sk-container-id-8 div.sk-toggleable {background-color: white;}#sk-container-id-8 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-8 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-8 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-8 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-8 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-8 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-8 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-8 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-8 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-8 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-8 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-8 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-8 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-8 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-8 div.sk-label:hover 
label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-8 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-8 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-8 div.sk-item {position: relative;z-index: 1;}#sk-container-id-8 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-8 div.sk-item::before, #sk-container-id-8 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-8 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-8 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-8 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-8 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-8 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-8 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-8 div.sk-label-container {text-align: center;}#sk-container-id-8 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-8 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-8\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>GradientBoostingClassifier()</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-8\" type=\"checkbox\" checked><label for=\"sk-estimator-id-8\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">GradientBoostingClassifier</label><div class=\"sk-toggleable__content\"><pre>GradientBoostingClassifier()</pre></div></div></div></div></div>"
],
"text/plain": [
"GradientBoostingClassifier()"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.ensemble import GradientBoostingClassifier\n",
"\n",
"gb_model = GradientBoostingClassifier()\n",
"gb_model.fit(X_train, y_train)"
]
},
{
"cell_type": "markdown",
"id": "technical-signature",
"metadata": {},
"source": [
"## 6. Evaluating models"
]
},
{
"cell_type": "code",
"execution_count": 58,
"id": "celtic-burner",
"metadata": {},
"outputs": [],
"source": [
"# Make predictions on the validation data\n",
"\n",
"lr_predicted_labels = lr_model.predict(X_val)\n",
"dt_predicted_labels = dt_model.predict(X_val)\n",
"nb_predicted_labels = nb_model.predict(X_val)\n",
"svm_predicted_labels = svm_model.predict(X_val)\n",
"rf_predicted_labels = rf_model.predict(X_val)\n",
"gb_predicted_labels = gb_model.predict(X_val)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "together-scratch",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 1.00 1.00 85280\n",
" 1 0.84 0.70 0.77 162\n",
"\n",
" accuracy 1.00 85442\n",
" macro avg 0.92 0.85 0.88 85442\n",
"weighted avg 1.00 1.00 1.00 85442\n",
"\n"
]
}
],
"source": [
"# Plot the confusion matrix for logistic regression\n",
"\n",
"cm = confusion_matrix(y_val, lr_predicted_labels)\n",
"cmdf = pd.DataFrame(cm, index=[\"0\", \"1\"], columns=[\"0\", \"1\"])\n",
"fig, ax = plt.subplots(1, 1)\n",
"sns.heatmap(cmdf, annot=True, fmt='d', ax=ax)\n",
"ax.set_xlabel('Predicted Label')\n",
"ax.set_ylabel('True Label')\n",
"plt.show()\n",
"\n",
"# Print the classification report\n",
"print(classification_report(y_val, lr_predicted_labels))"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "sudden-pierce",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 1.00 1.00 85280\n",
" 1 0.77 0.76 0.76 162\n",
"\n",
" accuracy 1.00 85442\n",
" macro avg 0.88 0.88 0.88 85442\n",
"weighted avg 1.00 1.00 1.00 85442\n",
"\n"
]
}
],
"source": [
"# Plot the confusion matrix for the decision tree\n",
"\n",
"cm = confusion_matrix(y_val, dt_predicted_labels)\n",
"cmdf = pd.DataFrame(cm, index=[\"0\", \"1\"], columns=[\"0\", \"1\"])\n",
"fig, ax = plt.subplots(1, 1)\n",
"sns.heatmap(cmdf, annot=True, fmt='d', ax=ax)\n",
"ax.set_xlabel('Predicted Label')\n",
"ax.set_ylabel('True Label')\n",
"plt.show()\n",
"\n",
"# Print the classification report\n",
"print(classification_report(y_val, dt_predicted_labels))"
]
},
{
"cell_type": "code",
"execution_count": 61,
"id": "acceptable-chain",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 1.00 1.00 85280\n",
" 1 0.77 0.76 0.76 162\n",
"\n",
" accuracy 1.00 85442\n",
" macro avg 0.88 0.88 0.88 85442\n",
"weighted avg 1.00 1.00 1.00 85442\n",
"\n"
]
}
],
"source": [
"# Plot the confusion matrix for Naive Bayes\n",
"\n",
"cm = confusion_matrix(y_val, nb_predicted_labels)\n",
"cmdf = pd.DataFrame(cm, index=[\"0\", \"1\"], columns=[\"0\", \"1\"])\n",
"fig, ax = plt.subplots(1, 1)\n",
"sns.heatmap(cmdf, annot=True, fmt='d', ax=ax)\n",
"ax.set_xlabel('Predicted Label')\n",
"ax.set_ylabel('True Label')\n",
"plt.show()\n",
"\n",
"# Print the classification report\n",
"print(classification_report(y_val, nb_predicted_labels))"
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "common-satin",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 1.00 1.00 85280\n",
" 1 0.00 0.00 0.00 162\n",
"\n",
" accuracy 1.00 85442\n",
" macro avg 0.50 0.50 0.50 85442\n",
"weighted avg 1.00 1.00 1.00 85442\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/assitan/.local/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1334: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n",
" _warn_prf(average, modifier, msg_start, len(result))\n"
]
}
],
"source": [
"# Plot the confusion matrix for the SVM\n",
"\n",
"cm = confusion_matrix(y_val, svm_predicted_labels)\n",
"cmdf = pd.DataFrame(cm, index=[\"0\", \"1\"], columns=[\"0\", \"1\"])\n",
"fig, ax = plt.subplots(1, 1)\n",
"sns.heatmap(cmdf, annot=True, fmt='d', ax=ax)\n",
"ax.set_xlabel('Predicted Label')\n",
"ax.set_ylabel('True Label')\n",
"plt.show()\n",
"\n",
"# Print the classification report\n",
"print(classification_report(y_val, svm_predicted_labels))"
]
},
{
"cell_type": "code",
"execution_count": 63,
"id": "offensive-uganda",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 1.00 1.00 85280\n",
" 1 0.87 0.78 0.82 162\n",
"\n",
" accuracy 1.00 85442\n",
" macro avg 0.93 0.89 0.91 85442\n",
"weighted avg 1.00 1.00 1.00 85442\n",
"\n"
]
}
],
"source": [
"# Plot the confusion matrix for the random forest\n",
"\n",
"cm = confusion_matrix(y_val, rf_predicted_labels)\n",
"cmdf = pd.DataFrame(cm, index=[\"0\", \"1\"], columns=[\"0\", \"1\"])\n",
"fig, ax = plt.subplots(1, 1)\n",
"sns.heatmap(cmdf, annot=True, fmt='d', ax=ax)\n",
"ax.set_xlabel('Predicted Label')\n",
"ax.set_ylabel('True Label')\n",
"plt.show()\n",
"\n",
"# Print the classification report\n",
"print(classification_report(y_val, rf_predicted_labels))"
]
},
{
"cell_type": "code",
"execution_count": 64,
"id": "suffering-saudi",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 1.00 1.00 85280\n",
" 1 0.80 0.72 0.76 162\n",
"\n",
" accuracy 1.00 85442\n",
" macro avg 0.90 0.86 0.88 85442\n",
"weighted avg 1.00 1.00 1.00 85442\n",
"\n"
]
}
],
"source": [
"# Plot the confusion matrix for gradient boosting\n",
"\n",
"cm = confusion_matrix(y_val, gb_predicted_labels)\n",
"cmdf = pd.DataFrame(cm, index=[\"0\", \"1\"], columns=[\"0\", \"1\"])\n",
"fig, ax = plt.subplots(1, 1)\n",
"sns.heatmap(cmdf, annot=True, fmt='d', ax=ax)\n",
"ax.set_xlabel('Predicted Label')\n",
"ax.set_ylabel('True Label')\n",
"plt.show()\n",
"\n",
"# Print the classification report\n",
"print(classification_report(y_val, gb_predicted_labels))"
]
},
{
"cell_type": "code",
"execution_count": 67,
"id": "successful-northern",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"F1-scores of the models:\n",
"Logistic regression: 0.7676767676767676\n",
"Decision tree: 0.7639751552795032\n",
"Naive Bayes: 0.7639751552795032\n",
"Support Vector Machine: 0.0\n",
"Random Forest: 0.8208469055374594\n",
"Gradient boosting: 0.7557003257328991\n"
]
}
],
"source": [
"from sklearn.metrics import f1_score\n",
"\n",
"print(\"F1-scores of the models:\")\n",
"\n",
"print(\"Logistic regression:\", f1_score(y_val, lr_predicted_labels))\n",
"print(\"Decision tree:\", f1_score(y_val, dt_predicted_labels))\n",
"print(\"Naive Bayes:\", f1_score(y_val, nb_predicted_labels))\n",
"print(\"Support Vector Machine:\", f1_score(y_val, svm_predicted_labels))\n",
"print(\"Random Forest:\", f1_score(y_val, rf_predicted_labels))\n",
"print(\"Gradient boosting:\", f1_score(y_val, gb_predicted_labels))"
]
},
{
"cell_type": "markdown",
"id": "fluid-canadian",
"metadata": {},
"source": [
"## Hyperparameter tuning"
]
},
{
"cell_type": "code",
"execution_count": 68,
"id": "expressed-objective",
"metadata": {},
"outputs": [],
"source": [
"lr_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000],\n",
"           'solver': ['liblinear', 'lbfgs'],  # saga\n",
"           'max_iter': [5000]}"
]
},
{
"cell_type": "code",
"execution_count": 69,
"id": "laden-mexico",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fitting 3 folds for each of 10 candidates, totalling 30 fits\n",
"[CV] END ..................C=10, max_iter=5000, solver=lbfgs; total time= 6.3s\n",
"[CV] END ..................C=10, max_iter=5000, solver=lbfgs; total time= 5.9s\n",
"[CV] END ..................C=10, max_iter=5000, solver=lbfgs; total time= 4.3s\n",
"[CV] END .................C=100, max_iter=5000, solver=lbfgs; total time= 4.9s\n",
"[CV] END .................C=100, max_iter=5000, solver=lbfgs; total time= 2.7s\n",
"[CV] END .................C=100, max_iter=5000, solver=lbfgs; total time= 2.0s\n",
"[CV] END ...........C=0.001, max_iter=5000, solver=liblinear; total time= 0.7s\n",
"[CV] END ...........C=0.001, max_iter=5000, solver=liblinear; total time= 0.7s\n",
"[CV] END ...........C=0.001, max_iter=5000, solver=liblinear; total time= 0.7s\n",
"[CV] END ............C=1000, max_iter=5000, solver=liblinear; total time= 0.7s\n",
"[CV] END ............C=1000, max_iter=5000, solver=liblinear; total time= 0.8s\n",
"[CV] END ............C=1000, max_iter=5000, solver=liblinear; total time= 0.8s\n",
"[CV] END .................C=0.1, max_iter=5000, solver=lbfgs; total time= 3.8s\n",
"[CV] END .................C=0.1, max_iter=5000, solver=lbfgs; total time= 4.9s\n",
"[CV] END .................C=0.1, max_iter=5000, solver=lbfgs; total time= 3.0s\n",
"[CV] END ..............C=10, max_iter=5000, solver=liblinear; total time= 0.8s\n",
"[CV] END ..............C=10, max_iter=5000, solver=liblinear; total time= 0.9s\n",
"[CV] END ..............C=10, max_iter=5000, solver=liblinear; total time= 0.8s\n",
"[CV] END ............C=0.01, max_iter=5000, solver=liblinear; total time= 0.8s\n",
"[CV] END ............C=0.01, max_iter=5000, solver=liblinear; total time= 0.8s\n",
"[CV] END ............C=0.01, max_iter=5000, solver=liblinear; total time= 0.8s\n",
"[CV] END ...............C=0.001, max_iter=5000, solver=lbfgs; total time= 3.8s\n",
"[CV] END ...............C=0.001, max_iter=5000, solver=lbfgs; total time= 2.7s\n",
"[CV] END ...............C=0.001, max_iter=5000, solver=lbfgs; total time= 4.5s\n",
"[CV] END ................C=1000, max_iter=5000, solver=lbfgs; total time= 7.2s\n",
"[CV] END ................C=1000, max_iter=5000, solver=lbfgs; total time= 3.0s\n",
"[CV] END ................C=1000, max_iter=5000, solver=lbfgs; total time= 3.7s\n",
"[CV] END .............C=0.1, max_iter=5000, solver=liblinear; total time= 0.8s\n",
"[CV] END .............C=0.1, max_iter=5000, solver=liblinear; total time= 0.7s\n",
"[CV] END .............C=0.1, max_iter=5000, solver=liblinear; total time= 0.8s\n"
]
}
],
"source": [
"from sklearn.model_selection import RandomizedSearchCV, GridSearchCV\n",
"\n",
"# Setup random seed\n",
"np.random.seed(42)\n",
"\n",
"# Setup random hyperparameter search for LogisticRegression\n",
"rs_lr = RandomizedSearchCV(LogisticRegression(),\n",
" param_distributions=lr_grid,\n",
" cv=3,\n",
" verbose=2)\n",
"\n",
"rs_lr.fit(X_train, y_train);"
]
},
{
"cell_type": "code",
"execution_count": 70,
"id": "mechanical-mayor",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-9 {color: black;background-color: white;}#sk-container-id-9 pre{padding: 0;}#sk-container-id-9 div.sk-toggleable {background-color: white;}#sk-container-id-9 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-9 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-9 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-9 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-9 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-9 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-9 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-9 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-9 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-9 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-9 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-9 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-9 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-9 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-9 div.sk-label:hover 
label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-9 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-9 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-9 div.sk-item {position: relative;z-index: 1;}#sk-container-id-9 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-9 div.sk-item::before, #sk-container-id-9 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-9 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-9 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-9 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-9 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-9 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-9 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-9 div.sk-label-container {text-align: center;}#sk-container-id-9 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-9 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-9\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>LogisticRegression(C=1000, max_iter=5000)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-9\" type=\"checkbox\" checked><label for=\"sk-estimator-id-9\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">LogisticRegression</label><div class=\"sk-toggleable__content\"><pre>LogisticRegression(C=1000, max_iter=5000)</pre></div></div></div></div></div>"
],
"text/plain": [
"LogisticRegression(C=1000, max_iter=5000)"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Find the best parameters\n",
"rs_lr.best_estimator_"
]
},
{
"cell_type": "code",
"execution_count": 71,
"id": "excited-dinner",
"metadata": {},
"outputs": [],
"source": [
"# Evaluate the randomized search logistic regression model\n",
"rs_lr_predicted_labels = rs_lr.predict(X_val)"
]
},
{
"cell_type": "code",
"execution_count": 73,
"id": "stable-million",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 1.00 1.00 85280\n",
" 1 0.84 0.73 0.79 162\n",
"\n",
" accuracy 1.00 85442\n",
" macro avg 0.92 0.87 0.89 85442\n",
"weighted avg 1.00 1.00 1.00 85442\n",
"\n"
]
}
],
"source": [
"# Plotting the Confusion Matrix\n",
"\n",
"cm = confusion_matrix(y_val, rs_lr_predicted_labels)\n",
"cmdf = pd.DataFrame(cm, index=[\"0\",\"1\"],columns=[\"0\",\"1\"])\n",
"fig, ax = plt.subplots(1,1)\n",
"sns.heatmap(cmdf,annot=True, fmt='d', ax = ax)\n",
"ax.set_xlabel('Predicted Label')\n",
"ax.set_ylabel('True Label')\n",
"plt.show()\n",
"\n",
"#Printing the Classification Report\n",
"print(classification_report(y_val, rs_lr_predicted_labels))"
]
},
{
"cell_type": "code",
"execution_count": 74,
"id": "numeric-meaning",
"metadata": {},
"outputs": [],
"source": [
"# RandomForestClassifier hyperparameters\n",
"rf_grid = {\n",
" 'n_estimators': [100, 200, 300],\n",
" 'max_depth': [2],\n",
" 'min_samples_leaf': [1, 2, 3]\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 75,
"id": "removable-marsh",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fitting 3 folds for each of 9 candidates, totalling 27 fits\n",
"[CV] END ..max_depth=2, min_samples_leaf=1, n_estimators=100; total time= 8.3s\n",
"[CV] END ..max_depth=2, min_samples_leaf=1, n_estimators=100; total time= 6.4s\n",
"[CV] END ..max_depth=2, min_samples_leaf=1, n_estimators=100; total time= 5.7s\n",
"[CV] END ..max_depth=2, min_samples_leaf=1, n_estimators=200; total time= 12.2s\n",
"[CV] END ..max_depth=2, min_samples_leaf=1, n_estimators=200; total time= 12.2s\n",
"[CV] END ..max_depth=2, min_samples_leaf=1, n_estimators=200; total time= 12.0s\n",
"[CV] END ..max_depth=2, min_samples_leaf=1, n_estimators=300; total time= 17.7s\n",
"[CV] END ..max_depth=2, min_samples_leaf=1, n_estimators=300; total time= 17.3s\n",
"[CV] END ..max_depth=2, min_samples_leaf=1, n_estimators=300; total time= 17.0s\n",
"[CV] END ..max_depth=2, min_samples_leaf=2, n_estimators=100; total time= 5.5s\n",
"[CV] END ..max_depth=2, min_samples_leaf=2, n_estimators=100; total time= 5.6s\n",
"[CV] END ..max_depth=2, min_samples_leaf=2, n_estimators=100; total time= 5.7s\n",
"[CV] END ..max_depth=2, min_samples_leaf=2, n_estimators=200; total time= 12.0s\n",
"[CV] END ..max_depth=2, min_samples_leaf=2, n_estimators=200; total time= 12.6s\n",
"[CV] END ..max_depth=2, min_samples_leaf=2, n_estimators=200; total time= 11.0s\n",
"[CV] END ..max_depth=2, min_samples_leaf=2, n_estimators=300; total time= 16.3s\n",
"[CV] END ..max_depth=2, min_samples_leaf=2, n_estimators=300; total time= 16.2s\n",
"[CV] END ..max_depth=2, min_samples_leaf=2, n_estimators=300; total time= 16.2s\n",
"[CV] END ..max_depth=2, min_samples_leaf=3, n_estimators=100; total time= 5.6s\n",
"[CV] END ..max_depth=2, min_samples_leaf=3, n_estimators=100; total time= 5.4s\n",
"[CV] END ..max_depth=2, min_samples_leaf=3, n_estimators=100; total time= 5.4s\n",
"[CV] END ..max_depth=2, min_samples_leaf=3, n_estimators=200; total time= 11.0s\n",
"[CV] END ..max_depth=2, min_samples_leaf=3, n_estimators=200; total time= 10.8s\n",
"[CV] END ..max_depth=2, min_samples_leaf=3, n_estimators=200; total time= 11.2s\n",
"[CV] END ..max_depth=2, min_samples_leaf=3, n_estimators=300; total time= 16.4s\n",
"[CV] END ..max_depth=2, min_samples_leaf=3, n_estimators=300; total time= 17.2s\n",
"[CV] END ..max_depth=2, min_samples_leaf=3, n_estimators=300; total time= 16.5s\n"
]
},
{
"data": {
"text/html": [
"<style>#sk-container-id-10 {color: black;background-color: white;}#sk-container-id-10 pre{padding: 0;}#sk-container-id-10 div.sk-toggleable {background-color: white;}#sk-container-id-10 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-10 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-10 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-10 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-10 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-10 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-10 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-10 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-10 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-10 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-10 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-10 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-10 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-10 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-10 
div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-10 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-10 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-10 div.sk-item {position: relative;z-index: 1;}#sk-container-id-10 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-10 div.sk-item::before, #sk-container-id-10 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-10 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-10 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-10 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-10 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-10 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-10 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-10 div.sk-label-container {text-align: center;}#sk-container-id-10 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-10 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-10\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>GridSearchCV(cv=3, estimator=RandomForestClassifier(),\n",
" param_grid={&#x27;max_depth&#x27;: [2], &#x27;min_samples_leaf&#x27;: [1, 2, 3],\n",
" &#x27;n_estimators&#x27;: [100, 200, 300]},\n",
" verbose=2)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-10\" type=\"checkbox\" ><label for=\"sk-estimator-id-10\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">GridSearchCV</label><div class=\"sk-toggleable__content\"><pre>GridSearchCV(cv=3, estimator=RandomForestClassifier(),\n",
" param_grid={&#x27;max_depth&#x27;: [2], &#x27;min_samples_leaf&#x27;: [1, 2, 3],\n",
" &#x27;n_estimators&#x27;: [100, 200, 300]},\n",
" verbose=2)</pre></div></div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-11\" type=\"checkbox\" ><label for=\"sk-estimator-id-11\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">estimator: RandomForestClassifier</label><div class=\"sk-toggleable__content\"><pre>RandomForestClassifier()</pre></div></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-12\" type=\"checkbox\" ><label for=\"sk-estimator-id-12\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">RandomForestClassifier</label><div class=\"sk-toggleable__content\"><pre>RandomForestClassifier()</pre></div></div></div></div></div></div></div></div></div></div>"
],
"text/plain": [
"GridSearchCV(cv=3, estimator=RandomForestClassifier(),\n",
" param_grid={'max_depth': [2], 'min_samples_leaf': [1, 2, 3],\n",
" 'n_estimators': [100, 200, 300]},\n",
" verbose=2)"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Setup random seed\n",
"np.random.seed(42)\n",
"\n",
"gs_rf = GridSearchCV(RandomForestClassifier(),\n",
" param_grid=rf_grid,\n",
" cv=3,\n",
" verbose=2)\n",
"\n",
"gs_rf.fit(X_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 76,
"id": "legal-quilt",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-11 {color: black;background-color: white;}#sk-container-id-11 pre{padding: 0;}#sk-container-id-11 div.sk-toggleable {background-color: white;}#sk-container-id-11 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-11 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-11 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-11 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-11 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-11 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-11 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-11 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-11 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-11 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-11 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-11 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-11 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-11 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-11 
div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-11 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-11 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-11 div.sk-item {position: relative;z-index: 1;}#sk-container-id-11 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-11 div.sk-item::before, #sk-container-id-11 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-11 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-11 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-11 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-11 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-11 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-11 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-11 div.sk-label-container {text-align: center;}#sk-container-id-11 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-11 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-11\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>RandomForestClassifier(max_depth=2, n_estimators=300)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-13\" type=\"checkbox\" checked><label for=\"sk-estimator-id-13\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">RandomForestClassifier</label><div class=\"sk-toggleable__content\"><pre>RandomForestClassifier(max_depth=2, n_estimators=300)</pre></div></div></div></div></div>"
],
"text/plain": [
"RandomForestClassifier(max_depth=2, n_estimators=300)"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Find the best parameters\n",
"gs_rf.best_estimator_"
]
},
{
"cell_type": "code",
"execution_count": 79,
"id": "unlikely-bolivia",
"metadata": {},
"outputs": [],
"source": [
"# Evaluate the grid search random forest model\n",
"gs_rf_predicted_labels = gs_rf.predict(X_val)"
]
},
{
"cell_type": "code",
"execution_count": 80,
"id": "threatened-cardiff",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 1.00 1.00 85280\n",
" 1 0.86 0.59 0.70 162\n",
"\n",
" accuracy 1.00 85442\n",
" macro avg 0.93 0.80 0.85 85442\n",
"weighted avg 1.00 1.00 1.00 85442\n",
"\n"
]
}
],
"source": [
"# Plotting the Confusion Matrix\n",
"\n",
"cm = confusion_matrix(y_val, gs_rf_predicted_labels)\n",
"cmdf = pd.DataFrame(cm, index=[\"0\",\"1\"],columns=[\"0\",\"1\"])\n",
"fig, ax = plt.subplots(1,1)\n",
"sns.heatmap(cmdf,annot=True, fmt='d', ax = ax)\n",
"ax.set_xlabel('Predicted Label')\n",
"ax.set_ylabel('True Label')\n",
"plt.show()\n",
"\n",
"#Printing the Classification Report\n",
"print(classification_report(y_val, gs_rf_predicted_labels))"
]
},
{
"cell_type": "code",
"execution_count": 81,
"id": "alike-judgment",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Logistic regression evaluation: 0.7854785478547854\n"
]
}
],
"source": [
"# Evaluate f1-score logistic regression model\n",
"print(\"Logistic regression evaluation:\", f1_score(y_val, rs_lr_predicted_labels))"
]
},
{
"cell_type": "code",
"execution_count": 82,
"id": "worst-court",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Random forest evaluation: 0.7032967032967032\n"
]
}
],
"source": [
"# Evaluate f1-score random forest model\n",
"print(\"Random forest evaluation:\", f1_score(y_val, gs_rf_predicted_labels))"
]
},
{
"cell_type": "code",
"execution_count": 83,
"id": "apart-trout",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# ROC curve and AUC for the tuned logistic regression model (validation set).\n",
"# plot_roc_curve was deprecated in scikit-learn 1.0 and removed in 1.2;\n",
"# RocCurveDisplay.from_estimator is the replacement named in the deprecation warning.\n",
"from sklearn.metrics import RocCurveDisplay\n",
"\n",
"RocCurveDisplay.from_estimator(rs_lr, X_val, y_val);"
]
},
{
"cell_type": "code",
"execution_count": 87,
"id": "multiple-robert",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# ROC curve and AUC for the random forest model (validation set)\n",
"RocCurveDisplay.from_estimator(rf_model, X_val, y_val);"
]
},
{
"cell_type": "markdown",
"id": "historic-surprise",
"metadata": {},
"source": [
"## 7. Testing the model\n"
]
},
{
"cell_type": "code",
"execution_count": 108,
"id": "annual-playback",
"metadata": {},
"outputs": [],
"source": [
"y_pred_test = rf_model.predict(X_test)"
]
},
{
"cell_type": "code",
"execution_count": 109,
"id": "seventh-klein",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Random forest evaluation: 0.8231884057971014\n"
]
}
],
"source": [
"# F1 score of the random forest model on the test set\n",
"print(\"Random forest evaluation:\", f1_score(y_test, y_pred_test))"
]
},
{
"cell_type": "code",
"execution_count": 110,
"id": "subjective-welsh",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# ROC curve and AUC for the random forest model (test set)\n",
"RocCurveDisplay.from_estimator(rf_model, X_test, y_test);"
]
},
{
"cell_type": "code",
"execution_count": 112,
"id": "demanding-difficulty",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0., 0., 0., ..., 0., 0., 0.])"
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# make predictions on test data \n",
"rf_predicted_test_labels = rf_model.predict_proba(X_test)[:, 1]\n",
"rf_predicted_test_labels"
]
},
{
"cell_type": "markdown",
"id": "velvet-annotation",
"metadata": {},
"source": [
"**Use the model**"
]
},
{
"cell_type": "code",
"execution_count": 113,
"id": "compound-faculty",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'time': 58022.0,\n",
" 'v1': -1.071075751,\n",
" 'v2': 0.74032389,\n",
" 'v3': 2.138828111,\n",
" 'v4': 0.246285129,\n",
" 'v5': 0.06198512,\n",
" 'v6': -0.371016514,\n",
" 'v7': 0.797548048,\n",
" 'v8': 0.021495732,\n",
" 'v9': -0.129206888,\n",
" 'v10': -0.886336993,\n",
" 'v11': -0.176539632,\n",
" 'v12': 0.723445645,\n",
" 'v13': 0.157480258,\n",
" 'v14': -0.282065003,\n",
" 'v15': -0.674822792,\n",
" 'v16': -0.531534111,\n",
" 'v17': 0.070645295,\n",
" 'v18': -1.153616413,\n",
" 'v19': -1.155834158,\n",
" 'v20': -0.234713579,\n",
" 'v21': -0.11976122,\n",
" 'v22': -0.23017531,\n",
" 'v23': -0.091648999,\n",
" 'v24': 0.602902637,\n",
" 'v25': 0.086165058,\n",
" 'v26': -0.71868059,\n",
" 'v27': 0.016999323,\n",
" 'v28': 0.092905471,\n",
" 'amount': 49.99}"
]
},
"execution_count": 113,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"credit_card = X_test.iloc[1000]\n",
"credit_card_dict = credit_card.to_dict()\n",
"credit_card_dict"
]
},
{
"cell_type": "code",
"execution_count": 114,
"id": "formal-lender",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.0"
]
},
"execution_count": 114,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Pass a one-row DataFrame so the feature names match those seen during fitting\n",
"rf_model.predict_proba(credit_card.to_frame().T)[0, 1]"
]
},
{
"cell_type": "code",
"execution_count": 115,
"id": "vital-denmark",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 115,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_test.iloc[1000]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}