@KhyatiMahendru
Last active February 21, 2022 16:37
CreditCardFraudDetection.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "CreditCardFraudDetection.ipynb",
"version": "0.3.2",
"provenance": [],
"include_colab_link": true
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/KhyatiMahendru/734da147385bcf630a42ba7c11cfc6c4/creditcardfrauddetection.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "LsDYJigFA1tu",
"colab_type": "code",
"colab": {}
},
"source": [
"# import required libraries\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Yi7iv4MTA1t2",
"colab_type": "code",
"colab": {}
},
"source": [
"data = pd.read_csv('creditcard.csv')"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "9k2KAwFbA1t8",
"colab_type": "code",
"colab": {},
"outputId": "31291396-1218-4d8f-f9cf-97276e8be19e"
},
"source": [
"data.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Time</th>\n",
" <th>V1</th>\n",
" <th>V2</th>\n",
" <th>V3</th>\n",
" <th>V4</th>\n",
" <th>V5</th>\n",
" <th>V6</th>\n",
" <th>V7</th>\n",
" <th>V8</th>\n",
" <th>V9</th>\n",
" <th>...</th>\n",
" <th>V21</th>\n",
" <th>V22</th>\n",
" <th>V23</th>\n",
" <th>V24</th>\n",
" <th>V25</th>\n",
" <th>V26</th>\n",
" <th>V27</th>\n",
" <th>V28</th>\n",
" <th>Amount</th>\n",
" <th>Class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.0</td>\n",
" <td>-1.359807</td>\n",
" <td>-0.072781</td>\n",
" <td>2.536347</td>\n",
" <td>1.378155</td>\n",
" <td>-0.338321</td>\n",
" <td>0.462388</td>\n",
" <td>0.239599</td>\n",
" <td>0.098698</td>\n",
" <td>0.363787</td>\n",
" <td>...</td>\n",
" <td>-0.018307</td>\n",
" <td>0.277838</td>\n",
" <td>-0.110474</td>\n",
" <td>0.066928</td>\n",
" <td>0.128539</td>\n",
" <td>-0.189115</td>\n",
" <td>0.133558</td>\n",
" <td>-0.021053</td>\n",
" <td>149.62</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.0</td>\n",
" <td>1.191857</td>\n",
" <td>0.266151</td>\n",
" <td>0.166480</td>\n",
" <td>0.448154</td>\n",
" <td>0.060018</td>\n",
" <td>-0.082361</td>\n",
" <td>-0.078803</td>\n",
" <td>0.085102</td>\n",
" <td>-0.255425</td>\n",
" <td>...</td>\n",
" <td>-0.225775</td>\n",
" <td>-0.638672</td>\n",
" <td>0.101288</td>\n",
" <td>-0.339846</td>\n",
" <td>0.167170</td>\n",
" <td>0.125895</td>\n",
" <td>-0.008983</td>\n",
" <td>0.014724</td>\n",
" <td>2.69</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1.0</td>\n",
" <td>-1.358354</td>\n",
" <td>-1.340163</td>\n",
" <td>1.773209</td>\n",
" <td>0.379780</td>\n",
" <td>-0.503198</td>\n",
" <td>1.800499</td>\n",
" <td>0.791461</td>\n",
" <td>0.247676</td>\n",
" <td>-1.514654</td>\n",
" <td>...</td>\n",
" <td>0.247998</td>\n",
" <td>0.771679</td>\n",
" <td>0.909412</td>\n",
" <td>-0.689281</td>\n",
" <td>-0.327642</td>\n",
" <td>-0.139097</td>\n",
" <td>-0.055353</td>\n",
" <td>-0.059752</td>\n",
" <td>378.66</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1.0</td>\n",
" <td>-0.966272</td>\n",
" <td>-0.185226</td>\n",
" <td>1.792993</td>\n",
" <td>-0.863291</td>\n",
" <td>-0.010309</td>\n",
" <td>1.247203</td>\n",
" <td>0.237609</td>\n",
" <td>0.377436</td>\n",
" <td>-1.387024</td>\n",
" <td>...</td>\n",
" <td>-0.108300</td>\n",
" <td>0.005274</td>\n",
" <td>-0.190321</td>\n",
" <td>-1.175575</td>\n",
" <td>0.647376</td>\n",
" <td>-0.221929</td>\n",
" <td>0.062723</td>\n",
" <td>0.061458</td>\n",
" <td>123.50</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2.0</td>\n",
" <td>-1.158233</td>\n",
" <td>0.877737</td>\n",
" <td>1.548718</td>\n",
" <td>0.403034</td>\n",
" <td>-0.407193</td>\n",
" <td>0.095921</td>\n",
" <td>0.592941</td>\n",
" <td>-0.270533</td>\n",
" <td>0.817739</td>\n",
" <td>...</td>\n",
" <td>-0.009431</td>\n",
" <td>0.798278</td>\n",
" <td>-0.137458</td>\n",
" <td>0.141267</td>\n",
" <td>-0.206010</td>\n",
" <td>0.502292</td>\n",
" <td>0.219422</td>\n",
" <td>0.215153</td>\n",
" <td>69.99</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 31 columns</p>\n",
"</div>"
],
"text/plain": [
" Time V1 V2 V3 V4 V5 V6 V7 \\\n",
"0 0.0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599 \n",
"1 0.0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803 \n",
"2 1.0 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 0.791461 \n",
"3 1.0 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 0.237609 \n",
"4 2.0 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 0.592941 \n",
"\n",
" V8 V9 ... V21 V22 V23 V24 \\\n",
"0 0.098698 0.363787 ... -0.018307 0.277838 -0.110474 0.066928 \n",
"1 0.085102 -0.255425 ... -0.225775 -0.638672 0.101288 -0.339846 \n",
"2 0.247676 -1.514654 ... 0.247998 0.771679 0.909412 -0.689281 \n",
"3 0.377436 -1.387024 ... -0.108300 0.005274 -0.190321 -1.175575 \n",
"4 -0.270533 0.817739 ... -0.009431 0.798278 -0.137458 0.141267 \n",
"\n",
" V25 V26 V27 V28 Amount Class \n",
"0 0.128539 -0.189115 0.133558 -0.021053 149.62 0 \n",
"1 0.167170 0.125895 -0.008983 0.014724 2.69 0 \n",
"2 -0.327642 -0.139097 -0.055353 -0.059752 378.66 0 \n",
"3 0.647376 -0.221929 0.062723 0.061458 123.50 0 \n",
"4 -0.206010 0.502292 0.219422 0.215153 69.99 0 \n",
"\n",
"[5 rows x 31 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 3
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pG5u12G6A1uJ",
"colab_type": "text"
},
"source": [
"## About the Data\n",
"To quote from Kaggle:\n",
"\n",
"\"The datasets contains transactions made by credit cards in September 2013 by european cardholders. This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced, the positive class (frauds) account for 0.172% of all transactions.\n",
"\n",
"It contains only numerical input variables which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features and more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA, the only features which have not been transformed with PCA are 'Time' and 'Amount'.\""
]
},
{
"cell_type": "code",
"metadata": {
"scrolled": true,
"id": "P7RY2YDLA1uL",
"colab_type": "code",
"colab": {},
"outputId": "806f76c4-a88f-488c-c9a8-b235c15c9985"
},
"source": [
"data['Class'].value_counts().plot.bar()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x26f6ee314e0>"
]
},
"metadata": {
"tags": []
},
"execution_count": 4
},
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY0AAAD4CAYAAAAQP7oXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAD4tJREFUeJzt3FGsXVWdx/Hvz1aMGUepUgjTdqZEbzKiyVRtoIkvjiRQmIdiAgk8SEOa1JiSaOKD6AuOSqIPSkKiTWroUIwjEtTQzNTpNJWJMSPYixKgMkxvkJFrCVxsRSZGHeA/D2ddPdye3rt6b+EU7/eT7Jx9/nuttddO2vyy197npqqQJKnH68Y9AUnSa4ehIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSp28pxT+B0O+ecc2r9+vXjnoYkvaY88MADz1bV6oXa/dmFxvr165mcnBz3NCTpNSXJ//S0c3lKktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVK3P7sf971WrL/xX8c9hT8rT3zhH8Y9BWlZ8E5DktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3Q0OS1G3B0EiyLsm9SR5NcjjJx1r9M0l+meTBtl0x1OdTSaaSPJbksqH65labSnLjUP2CJPcnOZLkW0nOavU3tO9T7fj603nxkqRT03On8QLwiap6J7AJ2JHkwnbslqra0LZ9AO3YNcC7gM3AV5OsSLIC+ApwOXAhcO3QOF9sY00Ax4Ftrb4NOF5V7wBuae0kSWOyYGhU1VNV9ZO2/zzwKLBmni5bgDur6vdV9XNgCriobVNV9XhV/QG4E9iSJMAHgbtb/z3AlUNj7Wn7dwOXtPaSpDE4pWcabXnoPcD9rXRDkoeS7E6yqtXWAE8OdZtutZPV3wb8uqpemFN/2Vjt+HOt/dx5bU8ymWRyZmbmVC5JknQKukMjyZuAbwMfr6rfADuBtwMbgKeAL802HdG9FlGfb6yXF6p2VdXGqtq4evXqea9DkrR4XaGR5PUMAuMbVfUdgKp6uqperKqXgK8xWH6CwZ3CuqHua4Gj89SfBc5OsnJO/WVjteNvAY6dygVKkk6fnrenAtwGPFpVXx6qnz/U7EPAI21/L3BNe/PpAmAC+DFwCJhob0qdxeBh+d6qKuBe4KrWfytwz9BYW9v+VcD3W3tJ0hisXLgJ7wc+DDyc5MFW+zSDt582MFguegL4CEBVHU5yF/AzBm9e7aiqFwGS3ADsB1YAu6vqcBvvk8CdST4P/JRBSNE+v55kisEdxjVLuFZJ0hItGBpV9UNGP1vYN0+fm4GbR9T3jepXVY/zp+Wt4frvgKsXmqMk6dXhL8IlSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktRtwdBIsi7JvUkeTXI4ycda/a1JDiQ50j5XtXqS3JpkKslDSd47NNbW1v5Ikq1D9fclebj1uTVJ5juHJGk8eu40XgA+UVXvBDYBO5JcCNwIHKyqCeBg+w5wOTDRtu3AThgEAHATcDFwEXDTUAjsbG1n+21u9ZOdQ5I0BguGRlU9VVU/afvPA48Ca4AtwJ7WbA9wZdvfAtxRA/cBZyc5H7gMOFBVx6rqOHAA2NyOvbmqflRVBdwxZ6xR55AkjcEpPdNIsh54D3A/cF5VPQWDYAHObc3WAE8OdZtutfnq0yPqzHMOSdIYdIdGkjcB3wY+XlW/ma/piFotot4tyf
Ykk0kmZ2ZmTqWrJOkUdIVGktczCIxvVNV3WvnptrRE+3ym1aeBdUPd1wJHF6ivHVGf7xwvU1W7qmpjVW1cvXp1zyVJkhah5+2pALcBj1bVl4cO7QVm34DaCtwzVL+uvUW1CXiuLS3tBy5Nsqo9AL8U2N+OPZ9kUzvXdXPGGnUOSdIYrOxo837gw8DDSR5stU8DXwDuSrIN+AVwdTu2D7gCmAJ+C1wPUFXHknwOONTafbaqjrX9jwK3A28Evtc25jmHJGkMFgyNqvoho587AFwyon0BO04y1m5g94j6JPDuEfVfjTqHJGk8/EW4JKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiSuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiSuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG4LhkaS3UmeSfLIUO0zSX6Z5MG2XTF07FNJppI8luSyofrmVptKcuNQ/YIk9yc5kuRbSc5q9Te071Pt+PrTddGSpMXpudO4Hdg8on5LVW1o2z6AJBcC1wDvan2+mmRFkhXAV4DLgQuBa1tbgC+2sSaA48C2Vt8GHK+qdwC3tHaSpDFaMDSq6gfAsc7xtgB3VtXvq+rnwBRwUdumqurxqvoDcCewJUmADwJ3t/57gCuHxtrT9u8GLmntJUljspRnGjckeagtX61qtTXAk0NtplvtZPW3Ab+uqhfm1F82Vjv+XGt/giTbk0wmmZyZmVnCJUmS5rPY0NgJvB3YADwFfKnVR90J1CLq8411YrFqV1VtrKqNq1evnm/ekqQlWFRoVNXTVfViVb0EfI3B8hMM7hTWDTVdCxydp/4scHaSlXPqLxurHX8L/ctkkqRXwKJCI8n5Q18/BMy+WbUXuKa9+XQBMAH8GDgETLQ3pc5i8LB8b1UVcC9wVeu/FbhnaKytbf8q4PutvSRpTFYu1CDJN4EPAOckmQZuAj6QZAOD5aIngI8AVNXhJHcBPwNeAHZU1YttnBuA/cAKYHdVHW6n+CRwZ5LPAz8Fbmv124CvJ5licIdxzZKvVpK0JAuGRlVdO6J824jabPubgZtH1PcB+0bUH+dPy1vD9d8BVy80P0nSq8dfhEuSuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiSuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiSuhkakqRuhoYkqduCoZFkd5JnkjwyVHtrkgNJjrTPVa2eJLcmmUryUJL3DvXZ2tofSbJ1qP6+JA+3PrcmyXznkCSNT8+dxu3A5jm1G4GDVTUBHGzfAS4HJtq2HdgJgwAAbgIuBi4CbhoKgZ2t7Wy/zQucQ5I0JguGRlX9ADg2p7wF2NP29wBXDtXvqIH7gLOTnA9cBhyoqmNVdRw4AGxux95cVT+qqgLumDPWqHNIksZksc80zquqpwDa57mtvgZ4cqjddKvNV58eUZ/vHJKkMTndD8IzolaLqJ/aSZPtSSaTTM7MzJxqd0lSp8WGxtNtaYn2+UyrTwPrhtqtBY4uUF87oj7fOU5QVbuqamNVbVy9evUiL0mStJDFhsZeYPYNqK3APUP169pbVJuA59rS0n7g0iSr2gPwS4H97djzSTa1t6aumzPWqHNIksZk5UINknwT+ABwTpJpBm9BfQG4K8k24BfA1a35PuAKYAr4LXA9QFUdS/I54FBr99mqmn24/lEGb2i9Efhe25jnHJKkMVkwNKrq2pMcumRE2wJ2nGSc3cDuEfVJ4N0j6r8adQ5J0vj4i3BJUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzd
CQJHUzNCRJ3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3ZYUGkmeSPJwkgeTTLbaW5McSHKkfa5q9SS5NclUkoeSvHdonK2t/ZEkW4fq72vjT7W+Wcp8JUlLczruNP6+qjZU1cb2/UbgYFVNAAfbd4DLgYm2bQd2wiBkgJuAi4GLgJtmg6a12T7Ub/NpmK8kaZFeieWpLcCetr8HuHKofkcN3AecneR84DLgQFUdq6rjwAFgczv25qr6UVUVcMfQWJKkMVhqaBTw70keSLK91c6rqqcA2ue5rb4GeHKo73SrzVefHlE/QZLtSSaTTM7MzCzxkiRJJ7Nyif3fX1VHk5wLHEjyX/O0HfU8ohZRP7FYtQvYBbBx48aRbSRJS7ekO42qOto+nwG+y+CZxNNtaYn2+UxrPg2sG+q+Fji6QH3tiLokaUwWHRpJ/iLJX87uA5cCjwB7gdk3oLYC97T9vcB17S2qTcBzbflqP3BpklXtAfilwP527Pkkm9pbU9cNjSVJGoOlLE+dB3y3vQW7Evjnqvq3JIeAu5JsA34BXN3a7wOuAKaA3wLXA1TVsSSfAw61dp+tqmNt/6PA7cAbge+1TZI0JosOjap6HPi7EfVfAZeMqBew4yRj7QZ2j6hPAu9e7BwlSaeXvwiXJHUzNCRJ3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVK3Mz40kmxO8liSqSQ3jns+krScndGhkWQF8BXgcuBC4NokF453VpK0fJ3RoQFcBExV1eNV9QfgTmDLmOckScvWynFPYAFrgCeHvk8DF89tlGQ7sL19/d8kj70Kc1suzgGeHfckFpIvjnsGGoPXxL/N15C/6Wl0podGRtTqhELVLmDXKz+d5SfJZFVtHPc8pLn8tzkeZ/ry1DSwbuj7WuDomOYiScvemR4ah4CJJBckOQu4Btg75jlJ0rJ1Ri9PVdULSW4A9gMrgN1VdXjM01puXPbTmcp/m2OQqhMeEUiSNNKZvjwlSTqDGBqSpG6GhiSp2xn9IFyvriR/y+AX92sY/B7mKLC3qh4d68QknTG80xAAST7J4M+0BPgxg9edA3zTPxQpaZZvTwmAJP8NvKuq/m9O/SzgcFVNjGdm0vySXF9V/zTueSwX3mlo1kvAX42on9+OSWeqfxz3BJYTn2lo1seBg0mO8Kc/EvnXwDuAG8Y2KwlI8tDJDgHnvZpzWe5cntIfJXkdgz9Hv4bBf8Zp4FBVvTjWiWnZS/I0cBlwfO4h4D+ratRdsl4B3mnoj6rqJeC+cc9DGuFfgDdV1YNzDyT5j1d/OsuXdxqSpG4+CJckdTM0JEndDA1JUjdDQ5LU7f8Bso5iJrY+IPYAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "7KVbEKVAA1uT",
"colab_type": "code",
"colab": {},
"outputId": "c23bc7e3-43e6-47f3-9a60-6a672a2ee693"
},
"source": [
"print('Proportion of the classes in the data:')\n",
"print(data['Class'].value_counts() / len(data))"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Proportion of the classes in the data:\n",
"0 0.998273\n",
"1 0.001727\n",
"Name: Class, dtype: float64\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mEj_b4K5A1ub",
"colab_type": "text"
},
"source": [
"We will build a simple logistic regression classifier and compare its results without SMOTE and with SMOTE."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ZxAD3Va0A1uc",
"colab_type": "code",
"colab": {}
},
"source": [
"data = data.drop(['Time'], axis = 1)\n",
"X = np.array(data.loc[:, data.columns != 'Class'])\n",
"y = np.array(data.loc[:, data.columns == 'Class']).reshape(-1, 1)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ETrCNnWRA1ui",
"colab_type": "code",
"colab": {}
},
"source": [
"# standardize the data\n",
"from sklearn.preprocessing import StandardScaler\n",
"scaler = StandardScaler()\n",
"X = scaler.fit_transform(X)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "orBcshj5A1un",
"colab_type": "code",
"colab": {}
},
"source": [
"# split into training and testing datasets\n",
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 2, shuffle = True, stratify = y)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "cWK4Df0WA1us",
"colab_type": "code",
"colab": {}
},
"source": [
"# import logistic regression model and accuracy_score metric\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.metrics import accuracy_score\n",
"clf = LogisticRegression(solver = 'lbfgs')"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "YHUexHjoA1u0",
"colab_type": "text"
},
"source": [
"# Without SMOTE"
]
},
{
"cell_type": "code",
"metadata": {
"id": "KAzw0KTVA1u2",
"colab_type": "code",
"colab": {}
},
"source": [
"# fit the model\n",
"clf.fit(X_train, y_train.ravel())\n",
"\n",
"# prediction for training dataset\n",
"train_pred = clf.predict(X_train)\n",
"\n",
"# prediction for testing dataset\n",
"test_pred = clf.predict(X_test)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "gQaxjF_PA1u-",
"colab_type": "code",
"colab": {},
"outputId": "2e30f164-9734-4536-8f63-ea1aae4f72b2"
},
"source": [
"print('Accuracy score for Training Dataset = ', accuracy_score(y_train.ravel(), train_pred))\n",
"print('Accuracy score for Testing Dataset = ', accuracy_score(y_test.ravel(), test_pred))"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Accuracy score for Training Dataset = 0.9991248296824232\n",
"Accuracy score for Testing Dataset = 0.9992871354549033\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2F-JK9lrA1vF",
"colab_type": "text"
},
"source": [
"Wow! Such high accuracies!\n",
"\n",
"You might think that the model has performed exceptionally well. Well, that's not the case. Let us examine the confusion matrix for our predictions."
]
},
{
"cell_type": "code",
"metadata": {
"scrolled": false,
"id": "VtpuOTeeA1vG",
"colab_type": "code",
"colab": {},
"outputId": "5565d965-b990-4a11-ab3f-723cc92e2546"
},
"source": [
"print('Confusion Matrix - Training Dataset')\n",
"print(pd.crosstab(y_train.ravel(), train_pred, rownames = ['True'], colnames = ['Predicted'], margins = True))"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Confusion Matrix - Training Dataset\n",
"Predicted 0 1 All\n",
"True \n",
"0 190457 33 190490\n",
"1 134 196 330\n",
"All 190591 229 190820\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6OpP0-GwA1vO",
"colab_type": "text"
},
"source": [
"Now let's interpret the results. \n",
"\n",
"134 out of the 330 instances that belong to class 1 (fraud) have been classified as class 0 (not fraud)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "vkkZnbI0A1vP",
"colab_type": "code",
"colab": {},
"outputId": "7484ae96-b16a-4191-f899-cd6a44a4ea82"
},
"source": [
"134/330"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0.40606060606060607"
]
},
"metadata": {
"tags": []
},
"execution_count": 13
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rIf2Y-eAA1vW",
"colab_type": "text"
},
"source": [
"That is a whopping 41%! We are classifying 41% of the <b>fraud</b> cases as <b>not fraud</b>, which could mean serious losses for the credit card company. The confusion matrix of the Testing Dataset shows the same pattern.\n",
"\n",
"The high accuracy does not come from correctly classifying fraud. The model predicts the majority class (not fraud) for almost every example, and since about 99.8% of the examples actually belong to that class, the accuracy score is high anyway."
]
},
{
"cell_type": "code",
"metadata": {
"id": "pK7xr71GA1vX",
"colab_type": "code",
"colab": {},
"outputId": "0dd41ee3-b259-4cff-8deb-7bcb9baeb168"
},
"source": [
"print('Confusion Matrix - Testing Dataset')\n",
"print(pd.crosstab(y_test.ravel(), test_pred.ravel(), rownames = ['True'], colnames = ['Predicted'], margins = True))"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Confusion Matrix - Testing Dataset\n",
"Predicted 0 1 All\n",
"True \n",
"0 93815 10 93825\n",
"1 57 105 162\n",
"All 93872 115 93987\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Xi40QKODA1vf",
"colab_type": "text"
},
"source": [
"57 out of the 162 instances that belong to class 1 have been classified as class 0. We are missing about 35% of the fraud cases."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DoXu_oieA1vh",
"colab_type": "text"
},
"source": [
"# Using SMOTE\n",
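"Balancing the training data often leads to better classifiers for the minority class. We will try balancing our training data using SMOTE (Synthetic Minority Over-sampling Technique), which creates synthetic fraud examples by interpolating between existing ones."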
]
},
{
"cell_type": "code",
"metadata": {
"id": "kuRQlEN7A1vi",
"colab_type": "code",
"colab": {}
},
"source": [
"from imblearn.over_sampling import SMOTE\n",
"sm = SMOTE(random_state = 33)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "oYd1CvrtA1vq",
"colab_type": "code",
"colab": {}
},
"source": [
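"# oversample the minority class in the training data only (fit_sample was renamed fit_resample in imbalanced-learn)\n",
"X_train_new, y_train_new = sm.fit_resample(X_train, y_train.ravel())"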
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "puhGbwHDA1v3",
"colab_type": "code",
"colab": {},
"outputId": "d8a7d766-bde7-4f1a-d7e8-abd9c576658c"
},
"source": [
"# observe that data has been balanced\n",
"pd.Series(y_train_new).value_counts().plot.bar()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x26f079d7f60>"
]
},
"metadata": {
"tags": []
},
"execution_count": 17
},
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY0AAAD5CAYAAADbY2myAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFL9JREFUeJzt3W+sXdWd3vHvU3uI0qQZTHJB1H8Kk7mZDEStJ1jEUpRRGhqwaTUmVWixqthNkZxQkCbSvAiZviBNgpS0ykRCShg5xcJUKQ4NyWC1ZhjLZSaqCgQTKH9CGF8cAje2wMGEUDElNfn1xVk3OVyO712+1/HxxN+PtHX2+e211l4bWXrYa+9jp6qQJKnH3xn3BCRJf3sYGpKkboaGJKmboSFJ6mZoSJK6GRqSpG7zhkaSlUnuTvJ4kseS/GGrn5Fkd5J97XNZqyfJDUmmkjyc5N1DY21u7fcl2TxUvyDJI63PDUky1zkkSePRc6dxBPijqvpdYC1wdZLzgGuBPVU1Cexp3wHWA5Nt2wLcCIMAAK4D3gNcCFw3FAI3trYz/da1+tHOIUkag3lDo6oOVtV32/5LwOPAcmADsL012w5c1vY3ALfUwL3A6UnOBi4BdlfV4ap6AdgNrGvH3lJV99Tgl4a3zBpr1DkkSWNwTM80kpwD/B5wH3BWVR2EQbAAZ7Zmy4FnhrpNt9pc9ekRdeY4hyRpDJb2NkzyZuB24BNV9dP22GFk0xG1WkC9W5ItDJa3eNOb3nTBO9/5zmPpLkmnvAceeODHVTUxX7uu0EjyGwwC42tV9c1WfjbJ2VV1sC0xPdfq08DKoe4rgAOt/v5Z9b9s9RUj2s91jteoqq3AVoA1a9bU3r17ey5LktQk+WFPu563pwLcBDxeVX8ydGgnMPMG1GbgjqH6pvYW1Vrgxba0dBdwcZJl7QH4xcBd7dhLSda2c22aNdaoc0iSxqDnTuO9wEeAR5I81Gp/DHweuC3JlcDTwOXt2C7gUmAKeBn4KEBVHU7yWeD+1u4zVXW47V8F3Ay8EbizbcxxDknSGOTX7a9Gd3lKko5dkgeqas187fxFuCSpm6EhSepmaEiSuhkakqRuhoYkqVv3L8J1fJ1z7X8f9xR+rTz1+X867in8+vj0b457Br9ePv3iuGdwXHmnIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiSuhkakqRuhoYkqdu8oZFkW5Lnkjw6VPt6kofa9tTMvx2e5JwkfzN07E+H+lyQ5JEkU0luSJJWPyPJ7iT72ueyVk9rN5Xk4STvPv6XL0k6Fj13GjcD64YLVfUvq2p1Va0Gbge+OXT4yZljVfXxofqNwBZgsm0zY14L7KmqSWBP+w6wfqjtltZfkjRG84ZGVX0bODzqWLtb+BfArXONkeRs4C1VdU9VFXALcFk7vAHY3va3z6rfUgP3Aqe3cSRJY7LYZxrvA56tqn1DtXOTPJjkr5K8r9WWA9NDbaZbDeCsqjoI0D7PHOrzzFH6SJLGYLH/CNNGXnuXcRBYVVXPJ7kA+LMk5wMZ0bfmGbu7T5ItDJawWLVq1byTliQtzILvNJIsBf458PWZWlW9UlXPt/0HgCeBdzC4S1gx1H0FcKDtPzuz7NQ+n2v1aWDlUfq8RlVtrao1VbVmYmJioZckSZrHYpan/gnw/ar6xbJTkokkS9r+bzF4iL2/LTu9lGRtew6yCbijddsJbG77m2fVN7W3qNYCL84sY0mSxqPnldtbgXuA30kyneTKdugKXv8A/PeBh5P8b+AbwMerauYh+lXAfwKmGNyB3Nnqnwc+mGQf8MH2HWAXsL+1/yrwb4/98iRJx9O8zzSqauNR6v96RO12Bq/gjmq/F3jXiPrzwEUj6gVcPd/8JEknjr8IlyR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzd
CQJHUzNCRJ3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUrd5QyPJtiTPJXl0qPbpJD9K8lDbLh069qkkU0meSHLJUH1dq00luXaofm6S+5LsS/L1JKe1+hva96l2/JzjddGSpIXpudO4GVg3ov6lqlrdtl0ASc4DrgDOb32+kmRJkiXAl4H1wHnAxtYW4AttrEngBeDKVr8SeKGqfhv4UmsnSRqjeUOjqr4NHO4cbwOwo6peqaofAFPAhW2bqqr9VfUzYAewIUmADwDfaP23A5cNjbW97X8DuKi1lySNyWKeaVyT5OG2fLWs1ZYDzwy1mW61o9XfCvykqo7Mqr9mrHb8xdb+dZJsSbI3yd5Dhw4t4pIkSXNZaGjcCLwdWA0cBL7Y6qPuBGoB9bnGen2xamtVramqNRMTE3PNW5K0CAsKjap6tqperaqfA19lsPwEgzuFlUNNVwAH5qj/GDg9ydJZ9deM1Y7/Jv3LZJKkX4EFhUaSs4e+fgiYebNqJ3BFe/PpXGAS+A5wPzDZ3pQ6jcHD8p1VVcDdwIdb/83AHUNjbW77Hwb+R2svSRqTpfM1SHIr8H7gbUmmgeuA9ydZzWC56CngYwBV9ViS24DvAUeAq6vq1TbONcBdwBJgW1U91k7xSWBHks8BDwI3tfpNwH9OMsXgDuOKRV+tJGlR5g2Nqto4onzTiNpM++uB60fUdwG7RtT388vlreH6/wUun29+kqQTx1+ES5K6GRqSpG6GhiSpm6EhSepmaEiSuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiSuhkakqRu84ZGkm1Jnkvy6FDtPyb5fpKHk3wryemtfk6Sv0nyUNv+dKjPBUkeSTKV5IYkafUzkuxOsq99Lmv1tHZT7TzvPv6XL0k6Fj13GjcD62bVdgPvqqp/CPw18KmhY09W1eq2fXyofiOwBZhs28yY1wJ7qmoS2NO+A6wfarul9ZckjdG8oVFV3wYOz6r9RVUdaV/vBVbMNUaSs4G3VNU9VVXALcBl7fAGYHvb3z6rfksN3Auc3saRJI3J8Xim8W+AO4e+n5vkwSR/leR9rbYcmB5qM91qAGdV1UGA9nnmUJ9njtJHkjQGSxfTOcm/A44AX2ulg8Cqqno+yQXAnyU5H8iI7jXf8L19kmxhsITFqlWreqYuSVqABd9pJNkM/DPgX7UlJ6rqlap6vu0/ADwJvIPBXcLwEtYK4EDbf3Zm2al9Ptfq08DKo/R5jaraWlVrqmrNxMTEQi9JkjSPBYVGknXAJ4E/qKqXh+oTSZa0/d9i8BB7f1t2einJ2vbW1CbgjtZtJ7C57W+eVd/U3qJaC7w4s4wlSRqPeZenktwKvB94W5Jp4DoGb0u9Adjd3py9t70p9fvAZ5IcAV4FPl5VMw/Rr2LwJtYbGTwDmXkO8nngtiRXAk8Dl7f6LuBSYAp4GfjoYi5UkrR484ZGVW0cUb7pKG1vB24/yrG9wLtG1J8HLhpRL+Dq+eYnSTpx/EW4JKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiSuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSerWFRpJtiV5LsmjQ7UzkuxOsq99Lmv1JLkhyVSSh5O8e6jP5tZ+X5LNQ/ULkjzS+tyQ9g+PH+0ckqTx6L3TuBlYN6t2LbCnqiaBPe07wHpgsm1bgBthEADAdcB7gAuB64ZC4MbWdqbfunnOIUkag67QqKpvA4dnlTcA29v+duCyofotNXAvcHqSs4FLgN1VdbiqXgB2A+vasbdU1T1VVcAts8YadQ5J0hgs5pnGWVV1EKB9ntnqy4FnhtpNt9pc9ekR9bnOIUkag1/Fg/CMqNUC6v0nTLYk2Ztk76FDh46lqyTpGCwmNJ5tS0u0z+dafR
pYOdRuBXBgnvqKEfW5zvEaVbW1qtZU1ZqJiYlFXJIkaS6LCY2dwMwbUJuBO4bqm9pbVGuBF9vS0l3AxUmWtQfgFwN3tWMvJVnb3praNGusUeeQJI3B0p5GSW4F3g+8Lck0g7egPg/cluRK4Gng8tZ8F3ApMAW8DHwUoKoOJ/kscH9r95mqmnm4fhWDN7TeCNzZNuY4hyRpDLpCo6o2HuXQRSPaFnD1UcbZBmwbUd8LvGtE/flR55AkjYe/CJckdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3RYcGkl+J8lDQ9tPk3wiyaeT/GiofulQn08lmUryRJJLhurrWm0qybVD9XOT3JdkX5KvJzlt4ZcqSVqsBYdGVT1RVaurajVwAfAy8K12+Eszx6pqF0CS84ArgPOBdcBXkixJsgT4MrAeOA/Y2NoCfKGNNQm8AFy50PlKkhbveC1PXQQ8WVU/nKPNBmBHVb1SVT8ApoAL2zZVVfur6mfADmBDkgAfAL7R+m8HLjtO85UkLcDxCo0rgFuHvl+T5OEk25Isa7XlwDNDbaZb7Wj1twI/qaojs+qSpDFZdGi05wx/APzXVroReDuwGjgIfHGm6YjutYD6qDlsSbI3yd5Dhw4dw+wlScfieNxprAe+W1XPAlTVs1X1alX9HPgqg+UnGNwprBzqtwI4MEf9x8DpSZbOqr9OVW2tqjVVtWZiYuI4XJIkaZTjERobGVqaSnL20LEPAY+2/Z3AFUnekORcYBL4DnA/MNnelDqNwVLXzqoq4G7gw63/ZuCO4zBfSdICLZ2/ydEl+bvAB4GPDZX/Q5LVDJaSnpo5VlWPJbkN+B5wBLi6ql5t41wD3AUsAbZV1WNtrE8CO5J8DngQuGkx85UkLc6iQqOqXmbwwHq49pE52l8PXD+ivgvYNaK+n18ub0mSxsxfhEuSuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiSuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkbosOjSRPJXkkyUNJ9rbaGUl2J9nXPpe1epLckGQqycNJ3j00zubWfl+SzUP1C9r4U61vFjtnSdLCHK87jX9cVaurak37fi2wp6omgT3tO8B6YLJtW4AbYRAywHXAexj8m+DXzQRNa7NlqN+64zRnSdIx+lUtT20Atrf97cBlQ/VbauBe4PQkZwOXALur6nBVvQDsBta1Y2+pqnuqqoBbhsaSJJ1gxyM0CviLJA8k2dJqZ1XVQYD2eWarLweeGeo73Wpz1adH1CVJY7D0OIzx3qo6kORMYHeS78/RdtTziFpA/bWDDsJqC8CqVavmn7EkaUEWfadRVQfa53PAtxg8k3i2LS3RPp9rzaeBlUPdVwAH5qmvGFGfPYetVbWmqtZMTEws9pIkSUexqNBI8qYkf29mH7gYeBTYCcy8AbUZuKPt7wQ2tbeo1gIvtuWru4CLkyxrD8AvBu5qx15Ksra9NbVpaCxJ0gm22OWps4BvtbdglwL/par+PMn9wG1JrgSeBi5v7XcBlwJTwMvARwGq6nCSzwL3t3afqarDbf8q4GbgjcCdbZMkjcGiQqOq9gP/aET9eeCiEfUCrj7KWNuAbSPqe4F3LWaekqTjw1+ES5K6GRqSpG6GhiSpm6EhSepmaEiSuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiSuhkakqRuCw6NJCuT3J3k8SSPJfnDVv90kh8leahtlw71+VSSqSRPJLlkqL6u1aaSXDtUPzfJfUn2Jfl6ktMWOl9J0uIt5k7jCPBHVfW7wFrg6iTntWNfqqrVbdsF0I5dAZwPrAO+kmRJkiXAl4H1wHnAxqFxvt
DGmgReAK5cxHwlSYu04NCoqoNV9d22/xLwOLB8ji4bgB1V9UpV/QCYAi5s21RV7a+qnwE7gA1JAnwA+Ebrvx24bKHzlSQt3nF5ppHkHOD3gPta6ZokDyfZlmRZqy0HnhnqNt1qR6u/FfhJVR2ZVZckjcmiQyPJm4HbgU9U1U+BG4G3A6uBg8AXZ5qO6F4LqI+aw5Yke5PsPXTo0DFegSSp16JCI8lvMAiMr1XVNwGq6tmqerWqfg58lcHyEwzuFFYOdV8BHJij/mPg9CRLZ9Vfp6q2VtWaqlozMTGxmEuSJM1hMW9PBbgJeLyq/mSofvZQsw8Bj7b9ncAVSd6Q5FxgEvgOcD8w2d6UOo3Bw/KdVVXA3cCHW//NwB0Lna8kafGWzt/kqN4LfAR4JMlDrfbHDN5+Ws1gKekp4GMAVfVYktuA7zF48+rqqnoVIMk1wF3AEmBbVT3WxvsksCPJ54AHGYSUJGlMFhwaVfU/Gf3cYdccfa4Hrh9R3zWqX1Xt55fLW5KkMfMX4ZKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiSuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSep20odGknVJnkgyleTacc9Hkk5lJ3VoJFkCfBlYD5wHbExy3nhnJUmnrpM6NIALgamq2l9VPwN2ABvGPCdJOmUtHfcE5rEceGbo+zTwntmNkmwBtrSv/yfJEydgbqeKtwE/Hvck5pMvjHsGGoO/FX82+fcZ9wx6/YOeRid7aIz6r12vK1RtBbb+6qdz6kmyt6rWjHse0mz+2RyPk315ahpYOfR9BXBgTHORpFPeyR4a9wOTSc5NchpwBbBzzHOSpFPWSb08VVVHklwD3AUsAbZV1WNjntapxmU/naz8szkGqXrdIwJJkkY62ZenJEknEUNDktTN0JAkdTupH4RL0owk72TwN0IsZ/B7rQPAzqp6fKwTO8V4p6EuST467jno1JXkkwz+GqEA32HwOn6AW/2LTE8s355SlyRPV9Wqcc9Dp6Ykfw2cX1X/b1b9NOCxqpocz8xOPS5P6ReSPHy0Q8BZJ3Iu0iw/B/4+8MNZ9bPbMZ0ghoaGnQVcArwwqx7gf5346Ui/8AlgT5J9/PIvMV0F/DZwzdhmdQoyNDTsvwFvrqqHZh9I8pcnfjrSQFX9eZJ3MPjnEpYz+B+ZaeD+qnp1rJM7xfhMQ5LUzbenJEndDA1JUjdDQ5LUzdCQJHUzNCRJ3f4/K1V59+afoecAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "bc6FdOh_A1wB",
"colab_type": "code",
"colab": {}
},
"source": [
"# fit the model\n",
"clf.fit(X_train_new, y_train_new)\n",
"\n",
"# prediction for Training data\n",
"train_pred_sm = clf.predict(X_train_new)\n",
"\n",
"# prediction for Testing data\n",
"test_pred_sm = clf.predict(X_test)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"scrolled": true,
"id": "lX2nDmxCA1wH",
"colab_type": "code",
"colab": {},
"outputId": "1f2cfdb9-2690-438f-f34e-c782086c4e67"
},
"source": [
"print('Accuracy score for Training Dataset = ', accuracy_score(y_train_new, train_pred_sm))\n",
"print('Accuracy score for Testing Dataset = ', accuracy_score(y_test.ravel(), test_pred_sm))"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Accuracy score for Training Dataset = 0.9425271667804084\n",
"Accuracy score for Testing Dataset = 0.9720812452786024\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5x5b7DXKA1wP",
"colab_type": "text"
},
"source": [
"Our accuracy has dropped, but the model now catches far more of the fraud cases. Observe the confusion matrices."
]
},
{
"cell_type": "code",
"metadata": {
"scrolled": true,
"id": "psjNVX2mA1wY",
"colab_type": "code",
"colab": {},
"outputId": "5ce3fff5-29fe-4d51-f42c-9bfad268ab5b"
},
"source": [
"print('Confusion Matrix - Training Dataset')\n",
"print(pd.crosstab(y_train_new, train_pred_sm, rownames = ['True'], colnames = ['Predicted'], margins = True))"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Confusion Matrix - Training Dataset\n",
"Predicted 0 1 All\n",
"True \n",
"0 185279 5211 190490\n",
"1 16685 173805 190490\n",
"All 201964 179016 380980\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "BsE0umreA1we",
"colab_type": "code",
"colab": {},
"outputId": "a4ed0ac5-dfe8-4f22-c7c6-399795eef5d1"
},
"source": [
"16685/190490"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0.08758989973226941"
]
},
"metadata": {
"tags": []
},
"execution_count": 21
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XrIY1nC0A1wk",
"colab_type": "text"
},
"source": [
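"16685 out of 190490 <b>fraud</b> cases have been classified as <b>not fraud</b>. This is about 8.8%, compared to the previous 41%.\n",
"\n",
"A vast improvement!\n",
"\n",
"Note that only 330 of these 190490 fraud cases are real; the rest were generated by SMOTE. The Testing Dataset, which contains only real transactions, is therefore the fairer check, and it shows the same improvement."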
]
},
{
"cell_type": "code",
"metadata": {
"id": "3Wl9C6vOA1wl",
"colab_type": "code",
"colab": {},
"outputId": "aa0a8908-18da-4b88-b365-0ffc015ad42d"
},
"source": [
"print('Confusion Matrix - Testing Dataset')\n",
"print(pd.crosstab(y_test.ravel(), test_pred_sm, rownames = ['True'], colnames = ['Predicted'], margins = True))"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Confusion Matrix - Testing Dataset\n",
"Predicted 0 1 All\n",
"True \n",
"0 91213 2612 93825\n",
"1 12 150 162\n",
"All 91225 2762 93987\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"scrolled": true,
"id": "ol1wZAJ4A1ws",
"colab_type": "code",
"colab": {},
"outputId": "b6fdca5c-cd68-499c-dd93-c302d5fe8214"
},
"source": [
"12/162"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0.07407407407407407"
]
},
"metadata": {
"tags": []
},
"execution_count": 23
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7fzpuvPZA1wy",
"colab_type": "text"
},
"source": [
"Roughly 7.4% of the fraud cases (12 out of 162) have been classified as not fraud."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fUwoquLEA1w0",
"colab_type": "text"
},
"source": [
"# Conclusion\n",
"One might argue that the reduced accuracy is a sign of a worse model. On data this imbalanced, however, accuracy is a poor measure of performance.\n",
"\n",
"A prediction can be wrong in two ways:\n",
"1. Classifying <b>not fraud</b> as <b>fraud</b> (a false alarm)\n",
"2. Classifying <b>fraud</b> as <b>not fraud</b> (a missed fraud)\n",
"\n",
"In fraud detection, the second error is usually costlier than the first. With SMOTE, the model makes far fewer errors of the second kind at the price of many more of the first: on the Testing Dataset, missed frauds fell from 57 to 12 while false alarms rose from 10 to 2612.\n",
"\n",
"The objective of each classification problem is different, so evaluate each model against its own objective instead of judging it on accuracy alone."
]
}
]
}
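The notebook fits StandardScaler on the full feature matrix before calling train_test_split, so the testing rows contribute to the mean and standard deviation used for scaling. With 284,807 rows the effect here is likely negligible, but a cleaner pattern is to learn the statistics from the training rows only and reuse them for the testing rows (in scikit-learn terms, scaler.fit_transform(X_train) followed by scaler.transform(X_test)). Below is a minimal NumPy sketch of that idea on hypothetical random data rather than creditcard.csv; the helper names fit_standardizer and apply_standardizer are made up for illustration.

```python
import numpy as np

def fit_standardizer(X_train):
    """Learn per-column mean and standard deviation from the training rows only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0        # leave constant columns unscaled instead of dividing by zero
    return mean, std

def apply_standardizer(X, mean, std):
    """Scale any split with statistics that were learned on the training split."""
    return (X - mean) / std

# Hypothetical toy data standing in for the notebook's feature matrix X.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))

# Split first (indices only), then scale.
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:670], idx[670:]

mean, std = fit_standardizer(X[train_idx])
X_train_scaled = apply_standardizer(X[train_idx], mean, std)
X_test_scaled = apply_standardizer(X[test_idx], mean, std)

# Training columns now have mean 0 and std 1; testing columns are only close to that.
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled.mean(axis=0), X_test_scaled.std(axis=0))
```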
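The split cell passes stratify = y, so train_test_split keeps the fraud rate about the same in both parts: the confusion matrices show 330 frauds out of 190820 training rows and 162 out of 93987 testing rows, both close to 0.17%. The idea behind stratification can be sketched in a few lines of NumPy by splitting each class separately with the same test fraction. This is a simplified illustration with a hypothetical helper name (stratified_split_indices) and toy labels, not scikit-learn's exact allocation logic.

```python
import numpy as np

def stratified_split_indices(y, test_size, seed=0):
    """Split row indices so each class keeps (roughly) the same share in train and test."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        # Shuffle the rows of this class, then cut off the test fraction.
        label_idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(test_size * len(label_idx)))
        test_idx.extend(label_idx[:n_test])
        train_idx.extend(label_idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Hypothetical toy labels with a rare positive class (0.2% positives).
y = np.zeros(10_000, dtype=int)
y[:20] = 1

train_idx, test_idx = stratified_split_indices(y, test_size=0.33)
# Positive rate in each part stays close to the overall 0.002.
print(y[train_idx].mean(), y[test_idx].mean())
```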
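To see how little the 99.9% testing accuracy says, compare it with a trivial classifier that always predicts not fraud. The sketch below uses only the counts from the testing-set confusion matrix of the model trained without SMOTE; no new data is involved.

```python
# Counts copied from the testing-set confusion matrix (without SMOTE).
tn, fp, fn, tp = 93815, 10, 57, 105
total = tn + fp + fn + tp

# Accuracy of the trained logistic regression.
model_accuracy = (tn + tp) / total

# Accuracy of a trivial "model" that always predicts class 0 (not fraud):
# it is right on every non-fraud row and wrong on every fraud row.
baseline_accuracy = (tn + fp) / total

print(f"model:    {model_accuracy:.6f}")
print(f"baseline: {baseline_accuracy:.6f}")
print(f"gain:     {model_accuracy - baseline_accuracy:.6f}")
```

The gain over the baseline is only about 0.1 percentage point. scikit-learn's DummyClassifier(strategy='most_frequent') produces the same kind of baseline directly from data.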
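The notebook reads the confusion matrices by hand as miss rates. Recall is one minus that miss rate, and precision measures how many of the fraud alerts were real frauds. Computing both from the testing-set counts makes the trade-off explicit; the helper name summarize is just for illustration.

```python
def summarize(tn, fp, fn, tp):
    """Return recall, precision and F1 for the positive (fraud) class."""
    recall = tp / (tp + fn)      # share of actual frauds that were caught
    precision = tp / (tp + fp)   # share of fraud alerts that were real frauds
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

# Testing-set counts copied from the two confusion matrices in the notebook.
results = {
    "without SMOTE": summarize(tn=93815, fp=10, fn=57, tp=105),
    "with SMOTE": summarize(tn=91213, fp=2612, fn=12, tp=150),
}

for name, (recall, precision, f1) in results.items():
    print(f"{name:14s} recall={recall:.3f} precision={precision:.3f} f1={f1:.3f}")
```

With these counts, recall rises from about 0.65 to about 0.93 with SMOTE, while precision falls from about 0.91 to about 0.05. On actual predictions, sklearn.metrics.precision_score, recall_score and classification_report compute the same quantities.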
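SMOTE does not simply duplicate the fraud rows. For each new sample it picks a minority (fraud) sample, picks one of its k nearest minority neighbours, and places a new point at a random position on the line segment between them. The NumPy sketch below shows that core idea on hypothetical 2-D data. It is a simplified illustration (brute-force distances, binary classes only, hypothetical function name smote_sketch), not imbalanced-learn's implementation, though k = 5 matches the library's default k_neighbors.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority samples only.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest neighbours

    base = rng.integers(0, len(X_min), size=n_new)             # random starting points
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour for each
    gap = rng.random((n_new, 1))                               # uniform in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Hypothetical toy data: 990 majority rows and 10 minority rows in 2-D.
rng = np.random.default_rng(1)
X_maj = rng.normal(0.0, 1.0, size=(990, 2))
X_min = rng.normal(3.0, 0.5, size=(10, 2))

# Generate enough synthetic rows to make the two classes equal in size.
n_new = len(X_maj) - len(X_min)
X_syn = smote_sketch(X_min, n_new=n_new, k=5)
X_bal = np.vstack([X_maj, X_min, X_syn])
y_bal = np.concatenate([np.zeros(len(X_maj)), np.ones(len(X_min) + n_new)])
print(X_bal.shape, int(y_bal.sum()))
```

In the notebook, balancing the 330 training frauds against 190490 non-fraud rows means SMOTE generated 190160 synthetic samples, which is why the resampled training set has 190490 rows in each class.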
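The conclusion argues that a missed fraud costs more than a false alarm. How much more it has to cost before the SMOTE model wins can be worked out from the testing-set error counts: without SMOTE there are 57 missed frauds and 10 false alarms, with SMOTE 12 and 2612. The per-error costs in the loop below are hypothetical placeholders; only the counts come from the notebook.

```python
# Testing-set error counts copied from the confusion matrices in the notebook.
fn_plain, fp_plain = 57, 10      # without SMOTE
fn_smote, fp_smote = 12, 2612    # with SMOTE

def total_cost(fn, fp, cost_fn, cost_fp):
    """Total cost of the errors, given a cost per missed fraud and per false alarm."""
    return fn * cost_fn + fp * cost_fp

# SMOTE is cheaper when 12*c_fn + 2612*c_fp < 57*c_fn + 10*c_fp,
# i.e. when c_fn / c_fp > (2612 - 10) / (57 - 12).
break_even_ratio = (fp_smote - fp_plain) / (fn_plain - fn_smote)
print(f"SMOTE model is cheaper if a missed fraud costs > {break_even_ratio:.1f}x a false alarm")

# Hypothetical cost ratios, for illustration only.
for ratio in (10, 100):
    plain = total_cost(fn_plain, fp_plain, cost_fn=ratio, cost_fp=1)
    smote = total_cost(fn_smote, fp_smote, cost_fn=ratio, cost_fp=1)
    print(f"ratio {ratio:>3}: without SMOTE = {plain}, with SMOTE = {smote}")
```

With these counts the SMOTE model is the cheaper choice only if a missed fraud costs roughly 58 times as much as a false alarm, or more.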