@kn1kn1
Created December 5, 2022 05:00
imbalanced_data.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/kn1kn1/faa3e8b78afcfb26d2925642f3f7a922/imbalanced_data.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dUeKVCYTbcyT"
},
"source": [
"#### Copyright 2019 The TensorFlow Authors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "4ellrPx7tdxq"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7JfLUlawto_D"
},
"source": [
"# Classification on imbalanced data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DwdpaTKJOoPu"
},
"source": [
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/structured_data/imbalanced_data\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/structured_data/imbalanced_data.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/tutorials/structured_data/imbalanced_data.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" </td>\n",
" <td>\n",
" <a href=\"https://storage.googleapis.com/tensorflow_docs/docs/site/en/tutorials/structured_data/imbalanced_data.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mthoSGBAOoX-"
},
"source": [
"This tutorial demonstrates how to classify a highly imbalanced dataset in which the number of examples in one class greatly outnumbers the examples in another. You will work with the [Credit Card Fraud Detection](https://www.kaggle.com/mlg-ulb/creditcardfraud) dataset hosted on Kaggle. The aim is to detect a mere 492 fraudulent transactions out of 284,807 transactions in total. You will use [Keras](https://www.tensorflow.org/guide/keras/overview) to define the model and [class weights](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/Model) to help the model learn from the imbalanced data.\n",
"\n",
"This tutorial contains complete code to:\n",
"\n",
"* Load a CSV file using Pandas.\n",
"* Create train, validation, and test sets.\n",
"* Define and train a model using Keras (including setting class weights).\n",
"* Evaluate the model using various metrics (including precision and recall).\n",
"* Try common techniques for dealing with imbalanced data like:\n",
" * Class weighting \n",
" * Oversampling\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kRHmSyHxEIhN"
},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JM7hDSNClfoK"
},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"\n",
"import os\n",
"import tempfile\n",
"\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"\n",
"import sklearn\n",
"from sklearn.metrics import confusion_matrix\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import StandardScaler"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "c8o1FHzD-_y_"
},
"outputs": [],
"source": [
"mpl.rcParams['figure.figsize'] = (12, 10)\n",
"colors = plt.rcParams['axes.prop_cycle'].by_key()['color']"
]
},
{
"cell_type": "markdown",
"source": [
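"Seed Python's, NumPy's, and TensorFlow's random number generators so that repeated runs of this notebook are reproducible. The cell below sets each seed explicitly; [`tf.keras.utils.set_random_seed`](https://www.tensorflow.org/api_docs/python/tf/keras/utils/set_random_seed) does the same in a single call. Seeding alone may not make every operation deterministic (for example, some GPU kernels)."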
],
"metadata": {
"id": "7MTt6J-NJ7qb"
}
},
{
"cell_type": "code",
"source": [
"import random\n",
"import numpy as np\n",
"import tensorflow as tf\n",
"\n",
"seed = 42\n",
"random.seed(seed)\n",
"np.random.seed(seed)\n",
"tf.random.set_seed(seed)"
],
"metadata": {
"id": "SZRVGoVIJsy8"
},
"execution_count": null,
"outputs": []
},
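{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sketch below is an added illustration of what seeding does, limited to Python's `random` module and NumPy (TensorFlow's generator is not exercised). It builds fresh generator objects seeded the same way as the cell above, so the notebook's global random state is left untouched, and checks that equal seeds reproduce equal draws while a different seed does not."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"import numpy as np\n",
"\n",
"\n",
"def draw(seed):\n",
"    # Fresh generators seeded like the cell above; the global state is left untouched.\n",
"    py_rng = random.Random(seed)\n",
"    np_rng = np.random.RandomState(seed)  # same legacy generator that np.random.seed controls\n",
"    return [py_rng.random() for _ in range(3)], np_rng.rand(3)\n",
"\n",
"\n",
"py_a, np_a = draw(42)\n",
"py_b, np_b = draw(42)\n",
"py_c, np_c = draw(7)\n",
"\n",
"# Same seed -> identical draws; a different seed -> different draws.\n",
"print(py_a == py_b, np.array_equal(np_a, np_b))  # True True\n",
"print(py_a == py_c, np.array_equal(np_a, np_c))  # False False"
]
},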
{
"cell_type": "markdown",
"metadata": {
"id": "Z3iZVjziKHmX"
},
"source": [
"## Data processing and exploration"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4sA9WOcmzH2D"
},
"source": [
"### Download the Kaggle Credit Card Fraud data set\n",
"\n",
"Pandas is a Python library with many helpful utilities for loading and working with structured data. It can be used to download CSVs into a Pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame).\n",
"\n",
"Note: This dataset has been collected and analysed during a research collaboration of Worldline and the [Machine Learning Group](http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available [here](https://www.researchgate.net/project/Fraud-detection-5) and on the page of the [DefeatFraud](https://mlg.ulb.ac.be/wordpress/portfolio_page/defeatfraud-assessment-and-validation-of-deep-feature-engineering-and-learning-solutions-for-fraud-detection/) project."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pR_SnbMArXr7",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 300
},
"outputId": "790e0b1e-8ced-44d8-8440-6ca5a797fc80"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Time V1 V2 V3 V4 V5 V6 V7 \\\n",
"0 0.0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599 \n",
"1 0.0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803 \n",
"2 1.0 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 0.791461 \n",
"3 1.0 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 0.237609 \n",
"4 2.0 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 0.592941 \n",
"\n",
" V8 V9 ... V21 V22 V23 V24 V25 \\\n",
"0 0.098698 0.363787 ... -0.018307 0.277838 -0.110474 0.066928 0.128539 \n",
"1 0.085102 -0.255425 ... -0.225775 -0.638672 0.101288 -0.339846 0.167170 \n",
"2 0.247676 -1.514654 ... 0.247998 0.771679 0.909412 -0.689281 -0.327642 \n",
"3 0.377436 -1.387024 ... -0.108300 0.005274 -0.190321 -1.175575 0.647376 \n",
"4 -0.270533 0.817739 ... -0.009431 0.798278 -0.137458 0.141267 -0.206010 \n",
"\n",
" V26 V27 V28 Amount Class \n",
"0 -0.189115 0.133558 -0.021053 149.62 0 \n",
"1 0.125895 -0.008983 0.014724 2.69 0 \n",
"2 -0.139097 -0.055353 -0.059752 378.66 0 \n",
"3 -0.221929 0.062723 0.061458 123.50 0 \n",
"4 0.502292 0.219422 0.215153 69.99 0 \n",
"\n",
"[5 rows x 31 columns]"
]
},
"metadata": {},
"execution_count": 5
}
],
"source": [
"raw_df = pd.read_csv('https://storage.googleapis.com/download.tensorflow.org/data/creditcard.csv')\n",
"raw_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-fgdQgmwUFuj",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 364
},
"outputId": "7bd2bbf1-e129-4a1e-e4bd-c087666b11d1"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Time V1 V2 V3 V4 \\\n",
"count 284807.000000 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 \n",
"mean 94813.859575 1.168375e-15 3.416908e-16 -1.379537e-15 2.074095e-15 \n",
"std 47488.145955 1.958696e+00 1.651309e+00 1.516255e+00 1.415869e+00 \n",
"min 0.000000 -5.640751e+01 -7.271573e+01 -4.832559e+01 -5.683171e+00 \n",
"25% 54201.500000 -9.203734e-01 -5.985499e-01 -8.903648e-01 -8.486401e-01 \n",
"50% 84692.000000 1.810880e-02 6.548556e-02 1.798463e-01 -1.984653e-02 \n",
"75% 139320.500000 1.315642e+00 8.037239e-01 1.027196e+00 7.433413e-01 \n",
"max 172792.000000 2.454930e+00 2.205773e+01 9.382558e+00 1.687534e+01 \n",
"\n",
" V5 V26 V27 V28 Amount \\\n",
"count 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 284807.000000 \n",
"mean 9.604066e-16 1.683437e-15 -3.660091e-16 -1.227390e-16 88.349619 \n",
"std 1.380247e+00 4.822270e-01 4.036325e-01 3.300833e-01 250.120109 \n",
"min -1.137433e+02 -2.604551e+00 -2.256568e+01 -1.543008e+01 0.000000 \n",
"25% -6.915971e-01 -3.269839e-01 -7.083953e-02 -5.295979e-02 5.600000 \n",
"50% -5.433583e-02 -5.213911e-02 1.342146e-03 1.124383e-02 22.000000 \n",
"75% 6.119264e-01 2.409522e-01 9.104512e-02 7.827995e-02 77.165000 \n",
"max 3.480167e+01 3.517346e+00 3.161220e+01 3.384781e+01 25691.160000 \n",
"\n",
" Class \n",
"count 284807.000000 \n",
"mean 0.001727 \n",
"std 0.041527 \n",
"min 0.000000 \n",
"25% 0.000000 \n",
"50% 0.000000 \n",
"75% 0.000000 \n",
"max 1.000000 "
]
},
"metadata": {},
"execution_count": 6
}
],
"source": [
"raw_df[['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V26', 'V27', 'V28', 'Amount', 'Class']].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xWKB_CVZFLpB"
},
"source": [
"### Examine the class label imbalance\n",
"\n",
"Let's look at the dataset imbalance:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "HCJFrtuY2iLF",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "0216b95e-abce-45b2-de2d-f97cbf451761"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Examples:\n",
" Total: 284807\n",
" Positive: 492 (0.17% of total)\n",
"\n"
]
}
],
"source": [
"neg, pos = np.bincount(raw_df['Class'])\n",
"total = neg + pos\n",
"print('Examples:\\n Total: {}\\n Positive: {} ({:.2f}% of total)\\n'.format(\n",
" total, pos, 100 * pos / total))"
]
},
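{
"cell_type": "markdown",
"metadata": {},
"source": [
"`np.bincount` returns one count per integer label, starting at 0, which is why its result can be unpacked into `neg, pos` above. The sketch below uses made-up labels to show that behavior and one caveat: for a label array with no positives, `np.bincount` returns a single count and the two-name unpacking would fail, so passing `minlength=2` is the safer habit. It also recomputes the printed percentage from the reported counts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"toy_labels = np.array([0, 0, 1, 0, 1, 0])  # made-up labels for illustration\n",
"toy_neg, toy_pos = np.bincount(toy_labels)\n",
"print(toy_neg, toy_pos)  # 4 2\n",
"\n",
"all_negative = np.array([0, 0, 0])\n",
"print(np.bincount(all_negative))               # [3]: only one count, so unpacking into two names would fail\n",
"print(np.bincount(all_negative, minlength=2))  # [3 0]\n",
"\n",
"# The percentage printed above, recomputed from the reported counts.\n",
"print('{:.2f}%'.format(100 * 492 / 284807))  # 0.17%"
]
},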
{
"cell_type": "markdown",
"metadata": {
"id": "KnLKFQDsCBUg"
},
"source": [
"Positive (fraudulent) samples make up only about 0.17% of the dataset."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6qox6ryyzwdr"
},
"source": [
"### Clean, split and normalize the data\n",
"\n",
"The raw data has a few issues. First, the `Time` and `Amount` columns are too variable to use directly. Drop the `Time` column (since it's not clear what it means) and take the log of the `Amount` column to reduce its range."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Ef42jTuxEjnj"
},
"outputs": [],
"source": [
"cleaned_df = raw_df.copy()\n",
"\n",
"# You don't want the `Time` column.\n",
"cleaned_df.pop('Time')\n",
"\n",
"# The `Amount` column covers a huge range. Convert to log-space.\n",
"eps = 0.001  # Offset that avoids log(0): an amount of 0 becomes 0.001 (0.1¢).\n",
"cleaned_df['Log Amount'] = np.log(cleaned_df.pop('Amount')+eps)"
]
},
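{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why the log transform helps, the sketch below applies the same `np.log(amount + eps)` mapping to a few `Amount` values from the tables above: the minimum (0.00), two values from the first rows (2.69 and 149.62), and the maximum (25691.16). A range of roughly 0 to 25,700 is compressed to roughly -6.9 to 10.2, and the `eps` offset keeps a zero amount finite instead of producing `-inf`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"eps = 0.001  # same offset as above: an amount of 0 maps to log(0.001)\n",
"amounts = np.array([0.0, 2.69, 149.62, 25691.16])  # min, two sample rows, max\n",
"\n",
"log_amounts = np.log(amounts + eps)\n",
"for amount, log_amount in zip(amounts, log_amounts):\n",
"    print(f'{amount:>10.2f} -> {log_amount:7.3f}')\n",
"\n",
"# Without eps, a zero amount would map to -inf (the warning is silenced here).\n",
"with np.errstate(divide='ignore'):\n",
"    print(np.log(np.array([0.0])))  # [-inf]"
]
},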
{
"cell_type": "markdown",
"metadata": {
"id": "uSNgdQFFFQ6u"
},
"source": [
"Split the dataset into train, validation, and test sets. The validation set is used during model fitting to evaluate the loss and any metrics; however, the model is not fit with this data. The test set is completely unused during the training phase and is only used at the end to evaluate how well the model generalizes to new data. This is especially important with imbalanced datasets, where [overfitting](https://developers.google.com/machine-learning/crash-course/generalization/peril-of-overfitting) is a significant concern given how few positive training examples there are."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xfxhKg7Yr1-b"
},
"outputs": [],
"source": [
"# Use a utility from sklearn to split and shuffle your dataset.\n",
"train_df, test_df = train_test_split(cleaned_df, test_size=0.2, random_state=42)\n",
"train_df, val_df = train_test_split(train_df, test_size=0.2, random_state=42)\n",
"\n",
"# Form np arrays of labels and features.\n",
"train_labels = np.array(train_df.pop('Class'))\n",
"bool_train_labels = train_labels != 0\n",
"val_labels = np.array(val_df.pop('Class'))\n",
"test_labels = np.array(test_df.pop('Class'))\n",
"\n",
"train_features = np.array(train_df)\n",
"val_features = np.array(val_df)\n",
"test_features = np.array(test_df)"
]
},
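{
"cell_type": "markdown",
"metadata": {},
"source": [
"The shapes printed after normalization follow directly from these two calls. As a worked check, the sketch below assumes scikit-learn's rounding for a float `test_size` (the test split receives `ceil(test_size * n_samples)` rows and the train split gets the rest) and reproduces the 182,276 / 45,569 / 56,962 split with plain arithmetic. Note that neither call passes `stratify`, so the share of fraudulent transactions can differ slightly between the three sets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"\n",
"def split_sizes(n_samples, test_size):\n",
"    # Assumed rule: the test split is rounded up, the train split gets the rest.\n",
"    n_test = math.ceil(test_size * n_samples)\n",
"    return n_samples - n_test, n_test\n",
"\n",
"\n",
"n_total = 284807\n",
"n_train_val, n_test = split_sizes(n_total, 0.2)  # first call: hold out the test set\n",
"n_train, n_val = split_sizes(n_train_val, 0.2)   # second call: carve validation out of the rest\n",
"print(n_train, n_val, n_test)  # 182276 45569 56962"
]
},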
{
"cell_type": "markdown",
"metadata": {
"id": "8a_Z_kBmr7Oh"
},
"source": [
"Normalize the input features using the sklearn `StandardScaler`.\n",
"This sets the mean of each feature to 0 and its standard deviation to 1. The scaled features are then clipped to the range `[-5, 5]` to limit the influence of extreme outliers.\n",
"\n",
"Note: The `StandardScaler` is only fit using the `train_features` to be sure the model is not peeking at the validation or test sets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "IO-qEUmJ5JQg",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "363e5b2c-d1d3-4ac8-9652-699ab61099eb"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Training labels shape: (182276,)\n",
"Validation labels shape: (45569,)\n",
"Test labels shape: (56962,)\n",
"Training features shape: (182276, 29)\n",
"Validation features shape: (45569, 29)\n",
"Test features shape: (56962, 29)\n"
]
}
],
"source": [
"scaler = StandardScaler()\n",
"train_features = scaler.fit_transform(train_features)\n",
"\n",
"val_features = scaler.transform(val_features)\n",
"test_features = scaler.transform(test_features)\n",
"\n",
"train_features = np.clip(train_features, -5, 5)\n",
"val_features = np.clip(val_features, -5, 5)\n",
"test_features = np.clip(test_features, -5, 5)\n",
"\n",
"\n",
"print('Training labels shape:', train_labels.shape)\n",
"print('Validation labels shape:', val_labels.shape)\n",
"print('Test labels shape:', test_labels.shape)\n",
"\n",
"print('Training features shape:', train_features.shape)\n",
"print('Validation features shape:', val_features.shape)\n",
"print('Test features shape:', test_features.shape)\n"
]
},
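{
"cell_type": "markdown",
"metadata": {},
"source": [
"The combination of `StandardScaler` and `np.clip` can be written out by hand. The sketch below uses small random arrays as stand-ins for the real features: it computes each column's mean and population standard deviation (`ddof=0`, which is what `StandardScaler` uses) from the training rows only, applies those same statistics to a held-out array, and clips the result to `[-5, 5]`. It illustrates the transformation and is not a replacement for the scikit-learn class."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"demo_rng = np.random.default_rng(0)\n",
"demo_train = demo_rng.normal(loc=50.0, scale=10.0, size=(1000, 3))  # stand-in training features\n",
"demo_val = demo_rng.normal(loc=50.0, scale=10.0, size=(200, 3))     # stand-in validation features\n",
"\n",
"# Fit: statistics come from the training rows only (population std, ddof=0).\n",
"demo_mean = demo_train.mean(axis=0)\n",
"demo_std = demo_train.std(axis=0)\n",
"\n",
"# Transform: reuse the same mean and std for every split, then clip.\n",
"train_z = np.clip((demo_train - demo_mean) / demo_std, -5, 5)\n",
"val_z = np.clip((demo_val - demo_mean) / demo_std, -5, 5)\n",
"\n",
"print(train_z.mean(axis=0).round(6))  # approximately 0 for every column\n",
"print(train_z.std(axis=0).round(6))   # approximately 1 for every column\n",
"print(val_z.min() >= -5, val_z.max() <= 5)"
]
},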
{
"cell_type": "markdown",
"metadata": {
"id": "XF2nNfWKJ33w"
},
"source": [
"Caution: If you want to deploy a model, it's critical that you preserve the preprocessing calculations. The easiest way is to implement them as layers and attach them to your model before export.\n"
]
},
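{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are not exporting the preprocessing as layers, a framework-agnostic alternative is to store the fitted statistics next to the model and re-apply them at inference time. The sketch below shows that idea; the helper names `export_preprocessing` and `apply_preprocessing` and the JSON layout are hypothetical, and with the fitted scaler above the `mean` and `scale` values would come from its `mean_` and `scale_` attributes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"import numpy as np\n",
"\n",
"\n",
"def export_preprocessing(mean, scale, clip=5.0):\n",
"    # Hypothetical helper: serialize everything needed to reproduce the transform.\n",
"    return json.dumps({'mean': [float(m) for m in mean],\n",
"                       'scale': [float(s) for s in scale],\n",
"                       'clip': clip})\n",
"\n",
"\n",
"def apply_preprocessing(features, config_json):\n",
"    # Hypothetical helper: re-apply the stored standardization and clipping.\n",
"    cfg = json.loads(config_json)\n",
"    z = (np.asarray(features, dtype=float) - np.array(cfg['mean'])) / np.array(cfg['scale'])\n",
"    return np.clip(z, -cfg['clip'], cfg['clip'])\n",
"\n",
"\n",
"# With the fitted scaler above you would pass scaler.mean_ and scaler.scale_.\n",
"demo_config = export_preprocessing(mean=[10.0, 0.0], scale=[2.0, 0.5])\n",
"demo_result = apply_preprocessing([[12.0, 1.0], [10.0, -10.0]], demo_config)\n",
"print(demo_result)  # rows [1, 2] and [0, -5]; the -20 is clipped to -5"
]
},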
{
"cell_type": "markdown",
"metadata": {
"id": "uQ7m9nqDC3W6"
},
"source": [
"### Look at the data distribution\n",
"\n",
"Next, compare the distributions of the positive and negative examples over a few features. Good questions to ask yourself at this point are:\n",
"\n",
"* Do these distributions make sense?\n",
"  * Yes. You've normalized the input and these are mostly concentrated in the `+/- 2` range.\n",
"* Can you see the difference between the distributions?\n",
"  * Yes, the positive examples contain a much higher rate of extreme values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "raK7hyjd_vf6",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 869
},
"outputId": "dcd3f87c-0e01-401b-b8e3-a34823118444"
},
"outputs": [],
"source": [
"pos_df = pd.DataFrame(train_features[ bool_train_labels], columns=train_df.columns)\n",
"neg_df = pd.DataFrame(train_features[~bool_train_labels], columns=train_df.columns)\n",
"\n",
"sns.jointplot(x=pos_df['V5'], y=pos_df['V6'],\n",
" kind='hex', xlim=(-5,5), ylim=(-5,5))\n",
"plt.suptitle(\"Positive distribution\")\n",
"\n",
"sns.jointplot(x=neg_df['V5'], y=neg_df['V6'],\n",
" kind='hex', xlim=(-5,5), ylim=(-5,5))\n",
"_ = plt.suptitle(\"Negative distribution\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qFK1u4JX16D8"
},
"source": [
"## Define the model and metrics\n",
"\n",
"Define a function that creates a simple neural network with a densely connected hidden layer, a [dropout](https://developers.google.com/machine-learning/glossary/#dropout_regularization) layer to reduce overfitting, and an output sigmoid layer that returns the probability of a transaction being fraudulent:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3JQDzUqT3UYG"
},
"outputs": [],
"source": [
"METRICS = [\n",
" keras.metrics.TruePositives(name='tp'),\n",
" keras.metrics.FalsePositives(name='fp'),\n",
" keras.metrics.TrueNegatives(name='tn'),\n",
" keras.metrics.FalseNegatives(name='fn'), \n",
" keras.metrics.BinaryAccuracy(name='accuracy'),\n",
" keras.metrics.Precision(name='precision'),\n",
" keras.metrics.Recall(name='recall'),\n",
" keras.metrics.AUC(name='auc'),\n",
" keras.metrics.AUC(name='prc', curve='PR'), # precision-recall curve\n",
"]\n",
"\n",
"def make_model(metrics=METRICS, output_bias=None):\n",
"    if output_bias is not None:\n",
"        output_bias = tf.keras.initializers.Constant(output_bias)\n",
"    model = keras.Sequential([\n",
"        keras.layers.Dense(\n",
"            16, activation='relu',\n",
"            input_shape=(train_features.shape[-1],)),\n",
"        keras.layers.Dropout(0.5),\n",
"        keras.layers.Dense(1, activation='sigmoid',\n",
"                           bias_initializer=output_bias),\n",
"    ])\n",
"\n",
"    model.compile(\n",
"        optimizer=keras.optimizers.Adam(learning_rate=1e-3),\n",
"        loss=keras.losses.BinaryCrossentropy(),\n",
"        metrics=metrics)\n",
"\n",
"    return model"
]
},
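{
"cell_type": "markdown",
"metadata": {},
"source": [
"`make_model` accepts an optional `output_bias` that initializes the bias of the final sigmoid unit. When the incoming weighted sum is close to zero, the initial prediction is roughly `sigmoid(bias)`, so one common choice (an assumption here; this part of the notebook does not set it) is `log(pos / neg)`, which makes that initial prediction equal the observed fraud rate. The sketch below checks the arithmetic with the counts reported earlier and also counts the model's trainable parameters for 29 input features."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"n_neg, n_pos = 284315, 492  # class counts reported by the bincount cell above\n",
"\n",
"\n",
"def sigmoid(x):\n",
"    return 1.0 / (1.0 + math.exp(-x))\n",
"\n",
"\n",
"# sigmoid(log(pos / neg)) simplifies to pos / (pos + neg), the positive rate.\n",
"bias_guess = math.log(n_pos / n_neg)\n",
"print(f'bias guess: {bias_guess:.4f}')\n",
"print(f'sigmoid(bias): {sigmoid(bias_guess):.6f} vs rate {n_pos / (n_pos + n_neg):.6f}')\n",
"\n",
"# Trainable parameters of make_model for 29 input features:\n",
"# Dense(16): 29*16 weights + 16 biases; Dropout: none; Dense(1): 16 weights + 1 bias.\n",
"n_features = 29\n",
"n_params = (n_features * 16 + 16) + (16 * 1 + 1)\n",
"print('trainable parameters:', n_params)  # 497"
]
},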
{
"cell_type": "markdown",
"metadata": {
"id": "SU0GX6E6mieP"
},
"source": [
"### Understanding useful metrics\n",
"\n",
"Notice that the model computes several of the metrics defined above, which will be helpful when evaluating its performance.\n",
"\n",
"* **False** negatives and **false** positives are samples that were **incorrectly** classified\n",
"* **True** negatives and **true** positives are samples that were **correctly** classified\n",
"* **Accuracy** is the percentage of examples correctly classified\n",
"> $\\frac{\\text{true positives + true negatives}}{\\text{total samples}}$\n",
"* **Precision** is the percentage of **predicted** positives that were correctly classified\n",
"> $\\frac{\\text{true positives}}{\\text{true positives + false positives}}$\n",
"* **Recall** is the percentage of **actual** positives that were correctly classified\n",
"> $\\frac{\\text{true positives}}{\\text{true positives + false negatives}}$\n",
"* **AUC** refers to the Area Under the Curve of a Receiver Operating Characteristic curve (ROC-AUC). This metric is equal to the probability that a classifier will rank a random positive sample higher than a random negative sample.\n",
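"* **AUPRC** refers to the Area Under the Curve of the Precision-Recall Curve (the `prc` metric in `METRICS`). This metric summarizes the precision-recall pairs obtained at different probability thresholds. Because neither precision nor recall uses true negatives, AUPRC is more sensitive than ROC-AUC to how well the model handles the rare positive class.\n",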
"\n",
"Note: Accuracy is not a helpful metric for this task. You can get 99.8%+ accuracy on this task by always predicting False.\n",
"\n",
"Read more:\n",
"* [True vs. False and Positive vs. Negative](https://developers.google.com/machine-learning/crash-course/classification/true-false-positive-negative)\n",
"* [Accuracy](https://developers.google.com/machine-learning/crash-course/classification/accuracy)\n",
"* [Precision and Recall](https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall)\n",
"* [ROC-AUC](https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc)\n",
"* [Relationship between Precision-Recall and ROC Curves](https://www.biostat.wisc.edu/~page/rocpr.pdf)"
]
},
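{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of the definitions above, the plain-Python functions below compute accuracy, precision, and recall from raw confusion-matrix counts, and estimate ROC-AUC directly from its ranking interpretation. The function names are hypothetical helpers, not part of Keras. The example counts are the final validation counts from the baseline training log later in this tutorial (`val_tp: 52`, `val_fp: 2`, `val_tn: 45503`, `val_fn: 12`); the always-False counts move every sample to the negative side.\n",
"\n",
"```python\n",
"def confusion_metrics(tp, fp, tn, fn):\n",
"    # Threshold-based metrics from the four confusion-matrix counts.\n",
"    total = tp + fp + tn + fn\n",
"    accuracy = (tp + tn) / total\n",
"    # Precision and recall are undefined when their denominator is 0; report 0.0.\n",
"    precision = tp / (tp + fp) if tp + fp else 0.0\n",
"    recall = tp / (tp + fn) if tp + fn else 0.0\n",
"    return {'accuracy': accuracy, 'precision': precision, 'recall': recall}\n",
"\n",
"\n",
"def rank_auc(pos_scores, neg_scores):\n",
"    # ROC-AUC as the probability that a random positive scores higher than a\n",
"    # random negative (ties count as half). O(P * N): for small examples only.\n",
"    wins = 0.0\n",
"    for p in pos_scores:\n",
"        for n in neg_scores:\n",
"            if p > n:\n",
"                wins += 1.0\n",
"            elif p == n:\n",
"                wins += 0.5\n",
"    return wins / (len(pos_scores) * len(neg_scores))\n",
"\n",
"\n",
"# Final validation epoch of the baseline run.\n",
"print(confusion_metrics(tp=52, fp=2, tn=45503, fn=12))\n",
"# A model that always predicts False on the same 45,569 validation samples.\n",
"print(confusion_metrics(tp=0, fp=0, tn=45505, fn=64))\n",
"# Hypothetical scores: 2 positives, 3 negatives; 5 of 6 pairs ranked correctly.\n",
"print(rank_auc([0.9, 0.4], [0.1, 0.5, 0.3]))\n",
"```\n",
"\n",
"The always-False model still reaches about 99.86% accuracy with a recall of 0. Keras computes `AUC` with a threshold-based approximation (see its `num_thresholds` argument), so its values can differ slightly from this exact pairwise count."
]
},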
{
"cell_type": "markdown",
"metadata": {
"id": "FYdhSAoaF_TK"
},
"source": [
"## Baseline model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IDbltVPg2m2q"
},
"source": [
"### Build the model\n",
"\n",
"Now create and train your model using the function defined earlier. Notice that the model is fit with a larger-than-default batch size of 2048. This is important to ensure that each batch has a decent chance of containing a few positive samples. If the batch size were too small, most batches would likely contain no fraudulent transactions to learn from.\n",
"\n",
"Note: this model will not handle the class imbalance well. You will improve it later in this tutorial."
]
},
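{
"cell_type": "markdown",
"metadata": {},
"source": [
"To put rough numbers on the batch-size argument, the sketch below assumes each label is an independent draw with a positive rate of 0.0018 (the `pos/total` value quoted later in this tutorial) and reports, for a few batch sizes, the expected number of positives per batch and the probability that a batch has none. The independence assumption and the helper name are illustrative; real batches come from a shuffled, finite dataset.\n",
"\n",
"```python\n",
"def batch_positive_stats(batch_size, pos_rate):\n",
"    # Expected number of positive examples in one batch.\n",
"    expected = batch_size * pos_rate\n",
"    # Probability that every example in the batch is negative,\n",
"    # assuming independent draws with the same positive rate.\n",
"    p_none = (1 - pos_rate) ** batch_size\n",
"    return expected, p_none\n",
"\n",
"\n",
"for batch_size in (32, 256, 2048):  # 32 is the Keras default for fit\n",
"    expected, p_none = batch_positive_stats(batch_size, pos_rate=0.0018)\n",
"    print(f'batch_size={batch_size}: expected positives={expected:.2f}, '\n",
"          f'P(no positives)={p_none:.3f}')\n",
"```\n",
"\n",
"Under this assumption, about 94% of batches of 32 would contain no positive examples at all, compared with about 2.5% of batches of 2048."
]
},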
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ouUkwPcGQsy3"
},
"outputs": [],
"source": [
"EPOCHS = 100\n",
"BATCH_SIZE = 2048\n",
"\n",
"early_stopping = tf.keras.callbacks.EarlyStopping(\n",
" monitor='val_prc', \n",
" verbose=1,\n",
" patience=10,\n",
" mode='max',\n",
" restore_best_weights=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1xlR_dekzw7C",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "2ef982f0-66b5-4c64-d4fb-48bb3b845a93"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Model: \"sequential\"\n",
"_________________________________________________________________\n",
" Layer (type) Output Shape Param # \n",
"=================================================================\n",
" dense (Dense) (None, 16) 480 \n",
" \n",
" dropout (Dropout) (None, 16) 0 \n",
" \n",
" dense_1 (Dense) (None, 1) 17 \n",
" \n",
"=================================================================\n",
"Total params: 497\n",
"Trainable params: 497\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
],
"source": [
"model = make_model()\n",
"model.summary()"
]
},
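{
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameter counts in this summary follow from the layer shapes: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, and `Dropout` has no parameters. The sketch below checks the arithmetic. The input width of 29 is not printed here; it is inferred from 480 = (29 + 1) * 16, and the helper name is hypothetical.\n",
"\n",
"```python\n",
"def dense_params(n_in, n_out):\n",
"    # One weight per (input, unit) pair plus one bias per unit.\n",
"    return n_in * n_out + n_out\n",
"\n",
"\n",
"n_features = 29  # inferred from the 480 parameters of the first Dense layer\n",
"hidden = dense_params(n_features, 16)  # Dense(16)\n",
"output = dense_params(16, 1)  # Dense(1); Dropout adds no parameters\n",
"print(hidden, output, hidden + output)  # 480 17 497\n",
"```"
]
},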
{
"cell_type": "markdown",
"metadata": {
"id": "Wx7ND3_SqckO"
},
"source": [
"Test run the model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LopSd-yQqO3a",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "5b6e739f-2210-4e91-b7f2-802758015a22"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"1/1 [==============================] - 1s 540ms/step\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([[0.66212314],\n",
" [0.9795485 ],\n",
" [0.9611054 ],\n",
" [0.5962904 ],\n",
" [0.8575357 ],\n",
" [0.82157207],\n",
" [0.85708076],\n",
" [0.85908675],\n",
" [0.7135863 ],\n",
" [0.92740554]], dtype=float32)"
]
},
"metadata": {},
"execution_count": 15
}
],
"source": [
"model.predict(train_features[:10])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YKIgWqHms_03"
},
"source": [
"### Optional: Set the correct initial bias."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qk_3Ry6EoYDq"
},
"source": [
"These initial guesses are not great. You know the dataset is imbalanced. Set the output layer's bias to reflect that (See: [A Recipe for Training Neural Networks: \"init well\"](http://karpathy.github.io/2019/04/25/recipe/#2-set-up-the-end-to-end-trainingevaluation-skeleton--get-dumb-baselines)). This can help with initial convergence."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PdbfWDuVpo6k"
},
"source": [
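"With the default (zero) bias initialization, a model that output 0.5 for every example would have a loss of about `math.log(2)` = 0.6931. The randomly initialized weights can push the outputs well away from 0.5, as in the predictions above, so the measured loss may be noticeably higher:"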
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "H-oPqh3SoGXk",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "b5096542-355a-4392-e646-82f53fe9e8dd"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Loss: 1.8883\n"
]
}
],
"source": [
"results = model.evaluate(train_features, train_labels, batch_size=BATCH_SIZE, verbose=0)\n",
"print(\"Loss: {:0.4f}\".format(results[0]))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hE-JRzfKqfhB"
},
"source": [
"The correct bias to set can be derived from:\n",
"\n",
"$$ p_0 = \\frac{pos}{pos + neg} = \\frac{1}{1+e^{-b_0}} $$\n",
"$$ b_0 = -\\log_e\\left(\\frac{1}{p_0} - 1\\right) $$\n",
"$$ b_0 = \\log_e\\left(\\frac{pos}{neg}\\right) $$"
]
},
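{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a numerical sanity check of this derivation, the sketch below computes $b_0$ from class counts in both of the forms above and confirms that the sigmoid of $b_0$ recovers $p_0$. The counts (492 positives, 284,315 negatives) are an assumption, chosen because $\\log_e(492/284315) \\approx -6.3594$ agrees with the bias printed by the next cell; substitute your own `pos` and `neg`.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"pos, neg = 492, 284_315  # assumed class counts (see the text above)\n",
"\n",
"p0 = pos / (pos + neg)  # desired initial prediction\n",
"b0 = math.log(pos / neg)  # closed form: b_0 = log(pos/neg)\n",
"b0_alt = -math.log(1 / p0 - 1)  # intermediate form: b_0 = -log(1/p_0 - 1)\n",
"sigmoid_b0 = 1 / (1 + math.exp(-b0))\n",
"\n",
"print(f'b0 = {b0:.4f}, sigmoid(b0) = {sigmoid_b0:.6f}, p0 = {p0:.6f}')\n",
"```"
]
},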
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "F5KWPSjjstUS",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "b15a204a-c3b3-496f-bc95-f35bbf6c356a"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([-6.35935934])"
]
},
"metadata": {},
"execution_count": 17
}
],
"source": [
"initial_bias = np.log([pos/neg])\n",
"initial_bias"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "d1juXI9yY1KD"
},
"source": [
"Set that as the initial bias, and the model will give much more reasonable initial guesses. \n",
"\n",
"It should be near: `pos/total = 0.0018`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "50oyu1uss0i-",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "c29ba2a0-363a-4c41-d60b-e5844f3998d3"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"1/1 [==============================] - 0s 70ms/step\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([[0.00171803],\n",
" [0.00110678],\n",
" [0.00230583],\n",
" [0.00033414],\n",
" [0.00359259],\n",
" [0.00960666],\n",
" [0.00155863],\n",
" [0.00273352],\n",
" [0.00070974],\n",
" [0.00354959]], dtype=float32)"
]
},
"metadata": {},
"execution_count": 18
}
],
"source": [
"model = make_model(output_bias=initial_bias)\n",
"model.predict(train_features[:10])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4xqFYb2KqRHQ"
},
"source": [
"With this initialization the initial loss should be approximately:\n",
"\n",
"$$-p_0\\log(p_0)-(1-p_0)\\log(1-p_0) \\approx 0.01317$$"
]
},
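{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both reference losses in this section come from one formula: the mean binary cross-entropy of a model that predicts the same probability $q$ for every example when a fraction $p_0$ of the labels are positive. The sketch below evaluates it for $q = 0.5$ (the idealized naive case) and for $q = p_0$, using the rounded $p_0 = 0.0018$ quoted above. The helper name is hypothetical.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def constant_bce(p0, q):\n",
"    # Mean binary cross-entropy when every prediction equals q and a\n",
"    # fraction p0 of the labels are positive.\n",
"    return -p0 * math.log(q) - (1 - p0) * math.log(1 - q)\n",
"\n",
"\n",
"p0 = 0.0018\n",
"naive = constant_bce(p0, 0.5)  # equals log(2) for any p0\n",
"careful = constant_bce(p0, p0)  # the formula above\n",
"print(f'naive={naive:.5f}, careful={careful:.5f}, ratio={naive / careful:.1f}')\n",
"```\n",
"\n",
"The idealized ratio of roughly 53 matches the 'about 50 times' comparison; measured losses also depend on the randomly initialized weights."
]
},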
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xVDqCWXDqHSc",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "f9aa009a-9bc5-4e34-bd4d-17c8a17a46c8"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Loss: 0.0175\n"
]
}
],
"source": [
"results = model.evaluate(train_features, train_labels, batch_size=BATCH_SIZE, verbose=0)\n",
"print(\"Loss: {:0.4f}\".format(results[0]))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FrDC8hvNr9yw"
},
"source": [
"This initial loss is about 50 times smaller than it would have been with the naive initialization.\n",
"\n",
"This way the model doesn't need to spend the first few epochs just learning that positive examples are unlikely. This also makes it easier to read plots of the loss during training."
]
},
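{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the point about the first few epochs, the sketch below trains a bias-only logistic model (no features, so it predicts $\\sigma(b)$ for every example) with plain gradient descent, using the fact that the gradient of the mean binary cross-entropy with respect to $b$ is $\\sigma(b) - p_0$. It counts the updates needed before the prediction is within a factor of 2 of $p_0$. The learning rate, tolerance, and function names are arbitrary illustrative choices, and the real model also updates its weights at the same time.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def sigmoid(b):\n",
"    return 1.0 / (1.0 + math.exp(-b))\n",
"\n",
"\n",
"def steps_until_close(b, p0, lr=1.0, factor=2.0, max_steps=100_000):\n",
"    # Gradient descent on the bias alone; d(mean BCE)/db = sigmoid(b) - p0.\n",
"    for step in range(max_steps):\n",
"        if p0 / factor <= sigmoid(b) <= p0 * factor:\n",
"            return step\n",
"        b -= lr * (sigmoid(b) - p0)\n",
"    return max_steps\n",
"\n",
"\n",
"p0 = 0.0018\n",
"print('zero bias:   ', steps_until_close(0.0, p0))\n",
"print('careful bias:', steps_until_close(math.log(p0 / (1 - p0)), p0))\n",
"```\n",
"\n",
"Under these assumptions the zero-bias start spends a few hundred updates just learning the base rate, while the careful start, $b = \\log(p_0 / (1 - p_0))$, which equals $\\log_e(pos/neg)$, needs none."
]
},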
{
"cell_type": "markdown",
"metadata": {
"id": "0EJj9ixKVBMT"
},
"source": [
"### Checkpoint the initial weights\n",
"\n",
"To make the various training runs more comparable, keep this initial model's weights in a checkpoint file, and load them into each model before training:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_tSUm4yAVIif"
},
"outputs": [],
"source": [
"initial_weights = os.path.join(tempfile.mkdtemp(), 'initial_weights')\n",
"model.save_weights(initial_weights)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EVXiLyqyZ8AX"
},
"source": [
"### Confirm that the bias fix helps\n",
"\n",
"Before moving on, quickly confirm that the careful bias initialization actually helped.\n",
"\n",
"Train the model for 20 epochs, with and without this careful initialization, and compare the losses: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Dm4-4K5RZ63Q"
},
"outputs": [],
"source": [
"model = make_model()\n",
"model.load_weights(initial_weights)\n",
"model.layers[-1].bias.assign([0.0])\n",
"zero_bias_history = model.fit(\n",
" train_features,\n",
" train_labels,\n",
" batch_size=BATCH_SIZE,\n",
" epochs=20,\n",
" validation_data=(val_features, val_labels), \n",
" verbose=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "j8DsLXHQaSql"
},
"outputs": [],
"source": [
"model = make_model()\n",
"model.load_weights(initial_weights)\n",
"careful_bias_history = model.fit(\n",
" train_features,\n",
" train_labels,\n",
" batch_size=BATCH_SIZE,\n",
" epochs=20,\n",
" validation_data=(val_features, val_labels), \n",
" verbose=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "E3XsMBjhauFV"
},
"outputs": [],
"source": [
"def plot_loss(history, label, n):\n",
" # Use a log scale on y-axis to show the wide range of values.\n",
" plt.semilogy(history.epoch, history.history['loss'],\n",
" color=colors[n], label='Train ' + label)\n",
" plt.semilogy(history.epoch, history.history['val_loss'],\n",
" color=colors[n], label='Val ' + label,\n",
" linestyle=\"--\")\n",
" plt.xlabel('Epoch')\n",
" plt.ylabel('Loss')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dxFaskm7beC7",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 609
},
"outputId": "23f0e39e-3634-438f-f5d3-1e63c5156add"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x720 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"plot_loss(zero_bias_history, \"Zero Bias\", 0)\n",
"plot_loss(careful_bias_history, \"Careful Bias\", 1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fKMioV0ddG3R"
},
"source": [
"The figure above shows that, on this problem, the careful initialization gives a clear advantage in terms of validation loss."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RsA_7SEntRaV"
},
"source": [
"### Train the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yZKAc8NCDnoR",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "d8b9bb6f-eed2-46a4-cc84-52ddb82bf917"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1/100\n",
"90/90 [==============================] - 3s 12ms/step - loss: 0.0153 - tp: 90.0000 - fp: 149.0000 - tn: 227302.0000 - fn: 304.0000 - accuracy: 0.9980 - precision: 0.3766 - recall: 0.2284 - auc: 0.7630 - prc: 0.1514 - val_loss: 0.0059 - val_tp: 13.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 51.0000 - val_accuracy: 0.9988 - val_precision: 0.7222 - val_recall: 0.2031 - val_auc: 0.9345 - val_prc: 0.6460\n",
"Epoch 2/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0081 - tp: 125.0000 - fp: 42.0000 - tn: 181904.0000 - fn: 205.0000 - accuracy: 0.9986 - precision: 0.7485 - recall: 0.3788 - auc: 0.8666 - prc: 0.4829 - val_loss: 0.0043 - val_tp: 30.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 34.0000 - val_accuracy: 0.9991 - val_precision: 0.8571 - val_recall: 0.4688 - val_auc: 0.9365 - val_prc: 0.7418\n",
"Epoch 3/100\n",
"90/90 [==============================] - 0s 3ms/step - loss: 0.0067 - tp: 160.0000 - fp: 36.0000 - tn: 181910.0000 - fn: 170.0000 - accuracy: 0.9989 - precision: 0.8163 - recall: 0.4848 - auc: 0.9196 - prc: 0.5668 - val_loss: 0.0038 - val_tp: 35.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 29.0000 - val_accuracy: 0.9993 - val_precision: 0.8750 - val_recall: 0.5469 - val_auc: 0.9369 - val_prc: 0.7499\n",
"Epoch 4/100\n",
"90/90 [==============================] - 0s 3ms/step - loss: 0.0061 - tp: 176.0000 - fp: 38.0000 - tn: 181908.0000 - fn: 154.0000 - accuracy: 0.9989 - precision: 0.8224 - recall: 0.5333 - auc: 0.9100 - prc: 0.6103 - val_loss: 0.0035 - val_tp: 42.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 22.0000 - val_accuracy: 0.9994 - val_precision: 0.8936 - val_recall: 0.6562 - val_auc: 0.9292 - val_prc: 0.7644\n",
"Epoch 5/100\n",
"90/90 [==============================] - 0s 3ms/step - loss: 0.0057 - tp: 181.0000 - fp: 29.0000 - tn: 181917.0000 - fn: 149.0000 - accuracy: 0.9990 - precision: 0.8619 - recall: 0.5485 - auc: 0.9186 - prc: 0.6417 - val_loss: 0.0032 - val_tp: 47.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 17.0000 - val_accuracy: 0.9995 - val_precision: 0.9038 - val_recall: 0.7344 - val_auc: 0.9293 - val_prc: 0.7797\n",
"Epoch 6/100\n",
"90/90 [==============================] - 0s 3ms/step - loss: 0.0054 - tp: 185.0000 - fp: 29.0000 - tn: 181917.0000 - fn: 145.0000 - accuracy: 0.9990 - precision: 0.8645 - recall: 0.5606 - auc: 0.9162 - prc: 0.6716 - val_loss: 0.0030 - val_tp: 48.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 16.0000 - val_accuracy: 0.9995 - val_precision: 0.9057 - val_recall: 0.7500 - val_auc: 0.9293 - val_prc: 0.7863\n",
"Epoch 7/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0049 - tp: 193.0000 - fp: 29.0000 - tn: 181917.0000 - fn: 137.0000 - accuracy: 0.9991 - precision: 0.8694 - recall: 0.5848 - auc: 0.9227 - prc: 0.6994 - val_loss: 0.0029 - val_tp: 48.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 16.0000 - val_accuracy: 0.9995 - val_precision: 0.9057 - val_recall: 0.7500 - val_auc: 0.9293 - val_prc: 0.7921\n",
"Epoch 8/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0049 - tp: 195.0000 - fp: 30.0000 - tn: 181916.0000 - fn: 135.0000 - accuracy: 0.9991 - precision: 0.8667 - recall: 0.5909 - auc: 0.9227 - prc: 0.6841 - val_loss: 0.0027 - val_tp: 49.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 15.0000 - val_accuracy: 0.9996 - val_precision: 0.9074 - val_recall: 0.7656 - val_auc: 0.9293 - val_prc: 0.8046\n",
"Epoch 9/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0047 - tp: 195.0000 - fp: 27.0000 - tn: 181919.0000 - fn: 135.0000 - accuracy: 0.9991 - precision: 0.8784 - recall: 0.5909 - auc: 0.9274 - prc: 0.7149 - val_loss: 0.0027 - val_tp: 49.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 15.0000 - val_accuracy: 0.9996 - val_precision: 0.9074 - val_recall: 0.7656 - val_auc: 0.9293 - val_prc: 0.8070\n",
"Epoch 10/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0049 - tp: 190.0000 - fp: 33.0000 - tn: 181913.0000 - fn: 140.0000 - accuracy: 0.9991 - precision: 0.8520 - recall: 0.5758 - auc: 0.9154 - prc: 0.6790 - val_loss: 0.0026 - val_tp: 49.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 15.0000 - val_accuracy: 0.9996 - val_precision: 0.9074 - val_recall: 0.7656 - val_auc: 0.9293 - val_prc: 0.8052\n",
"Epoch 11/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0046 - tp: 199.0000 - fp: 26.0000 - tn: 181920.0000 - fn: 131.0000 - accuracy: 0.9991 - precision: 0.8844 - recall: 0.6030 - auc: 0.9215 - prc: 0.7030 - val_loss: 0.0025 - val_tp: 50.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 14.0000 - val_accuracy: 0.9996 - val_precision: 0.9091 - val_recall: 0.7812 - val_auc: 0.9293 - val_prc: 0.8134\n",
"Epoch 12/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0042 - tp: 200.0000 - fp: 39.0000 - tn: 181907.0000 - fn: 130.0000 - accuracy: 0.9991 - precision: 0.8368 - recall: 0.6061 - auc: 0.9277 - prc: 0.7415 - val_loss: 0.0025 - val_tp: 51.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 13.0000 - val_accuracy: 0.9996 - val_precision: 0.9107 - val_recall: 0.7969 - val_auc: 0.9294 - val_prc: 0.8209\n",
"Epoch 13/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0045 - tp: 193.0000 - fp: 33.0000 - tn: 181913.0000 - fn: 137.0000 - accuracy: 0.9991 - precision: 0.8540 - recall: 0.5848 - auc: 0.9278 - prc: 0.7027 - val_loss: 0.0025 - val_tp: 51.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 13.0000 - val_accuracy: 0.9996 - val_precision: 0.9107 - val_recall: 0.7969 - val_auc: 0.9294 - val_prc: 0.8114\n",
"Epoch 14/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0043 - tp: 202.0000 - fp: 28.0000 - tn: 181918.0000 - fn: 128.0000 - accuracy: 0.9991 - precision: 0.8783 - recall: 0.6121 - auc: 0.9308 - prc: 0.7246 - val_loss: 0.0024 - val_tp: 51.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 13.0000 - val_accuracy: 0.9996 - val_precision: 0.9107 - val_recall: 0.7969 - val_auc: 0.9294 - val_prc: 0.8245\n",
"Epoch 15/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0042 - tp: 199.0000 - fp: 34.0000 - tn: 181912.0000 - fn: 131.0000 - accuracy: 0.9991 - precision: 0.8541 - recall: 0.6030 - auc: 0.9309 - prc: 0.7312 - val_loss: 0.0023 - val_tp: 51.0000 - val_fp: 4.0000 - val_tn: 45501.0000 - val_fn: 13.0000 - val_accuracy: 0.9996 - val_precision: 0.9273 - val_recall: 0.7969 - val_auc: 0.9294 - val_prc: 0.8310\n",
"Epoch 16/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0043 - tp: 199.0000 - fp: 27.0000 - tn: 181919.0000 - fn: 131.0000 - accuracy: 0.9991 - precision: 0.8805 - recall: 0.6030 - auc: 0.9247 - prc: 0.7244 - val_loss: 0.0023 - val_tp: 52.0000 - val_fp: 6.0000 - val_tn: 45499.0000 - val_fn: 12.0000 - val_accuracy: 0.9996 - val_precision: 0.8966 - val_recall: 0.8125 - val_auc: 0.9294 - val_prc: 0.8290\n",
"Epoch 17/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0040 - tp: 217.0000 - fp: 34.0000 - tn: 181912.0000 - fn: 113.0000 - accuracy: 0.9992 - precision: 0.8645 - recall: 0.6576 - auc: 0.9295 - prc: 0.7406 - val_loss: 0.0023 - val_tp: 51.0000 - val_fp: 4.0000 - val_tn: 45501.0000 - val_fn: 13.0000 - val_accuracy: 0.9996 - val_precision: 0.9273 - val_recall: 0.7969 - val_auc: 0.9295 - val_prc: 0.8306\n",
"Epoch 18/100\n",
"90/90 [==============================] - 0s 3ms/step - loss: 0.0043 - tp: 207.0000 - fp: 34.0000 - tn: 181912.0000 - fn: 123.0000 - accuracy: 0.9991 - precision: 0.8589 - recall: 0.6273 - auc: 0.9310 - prc: 0.7250 - val_loss: 0.0023 - val_tp: 51.0000 - val_fp: 4.0000 - val_tn: 45501.0000 - val_fn: 13.0000 - val_accuracy: 0.9996 - val_precision: 0.9273 - val_recall: 0.7969 - val_auc: 0.9295 - val_prc: 0.8343\n",
"Epoch 19/100\n",
"90/90 [==============================] - 0s 3ms/step - loss: 0.0041 - tp: 220.0000 - fp: 37.0000 - tn: 181909.0000 - fn: 110.0000 - accuracy: 0.9992 - precision: 0.8560 - recall: 0.6667 - auc: 0.9309 - prc: 0.7281 - val_loss: 0.0023 - val_tp: 50.0000 - val_fp: 2.0000 - val_tn: 45503.0000 - val_fn: 14.0000 - val_accuracy: 0.9996 - val_precision: 0.9615 - val_recall: 0.7812 - val_auc: 0.9295 - val_prc: 0.8387\n",
"Epoch 20/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0042 - tp: 205.0000 - fp: 35.0000 - tn: 181911.0000 - fn: 125.0000 - accuracy: 0.9991 - precision: 0.8542 - recall: 0.6212 - auc: 0.9294 - prc: 0.7311 - val_loss: 0.0022 - val_tp: 51.0000 - val_fp: 2.0000 - val_tn: 45503.0000 - val_fn: 13.0000 - val_accuracy: 0.9997 - val_precision: 0.9623 - val_recall: 0.7969 - val_auc: 0.9294 - val_prc: 0.8390\n",
"Epoch 21/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0041 - tp: 201.0000 - fp: 30.0000 - tn: 181916.0000 - fn: 129.0000 - accuracy: 0.9991 - precision: 0.8701 - recall: 0.6091 - auc: 0.9339 - prc: 0.7383 - val_loss: 0.0022 - val_tp: 51.0000 - val_fp: 4.0000 - val_tn: 45501.0000 - val_fn: 13.0000 - val_accuracy: 0.9996 - val_precision: 0.9273 - val_recall: 0.7969 - val_auc: 0.9294 - val_prc: 0.8379\n",
"Epoch 22/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0039 - tp: 214.0000 - fp: 33.0000 - tn: 181913.0000 - fn: 116.0000 - accuracy: 0.9992 - precision: 0.8664 - recall: 0.6485 - auc: 0.9311 - prc: 0.7588 - val_loss: 0.0022 - val_tp: 51.0000 - val_fp: 2.0000 - val_tn: 45503.0000 - val_fn: 13.0000 - val_accuracy: 0.9997 - val_precision: 0.9623 - val_recall: 0.7969 - val_auc: 0.9294 - val_prc: 0.8386\n",
"Epoch 23/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0042 - tp: 193.0000 - fp: 36.0000 - tn: 181910.0000 - fn: 137.0000 - accuracy: 0.9991 - precision: 0.8428 - recall: 0.5848 - auc: 0.9310 - prc: 0.7141 - val_loss: 0.0022 - val_tp: 52.0000 - val_fp: 4.0000 - val_tn: 45501.0000 - val_fn: 12.0000 - val_accuracy: 0.9996 - val_precision: 0.9286 - val_recall: 0.8125 - val_auc: 0.9294 - val_prc: 0.8380\n",
"Epoch 24/100\n",
"90/90 [==============================] - 0s 3ms/step - loss: 0.0040 - tp: 208.0000 - fp: 36.0000 - tn: 181910.0000 - fn: 122.0000 - accuracy: 0.9991 - precision: 0.8525 - recall: 0.6303 - auc: 0.9310 - prc: 0.7407 - val_loss: 0.0022 - val_tp: 52.0000 - val_fp: 2.0000 - val_tn: 45503.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9630 - val_recall: 0.8125 - val_auc: 0.9295 - val_prc: 0.8387\n",
"Epoch 25/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0040 - tp: 201.0000 - fp: 34.0000 - tn: 181912.0000 - fn: 129.0000 - accuracy: 0.9991 - precision: 0.8553 - recall: 0.6091 - auc: 0.9294 - prc: 0.7305 - val_loss: 0.0022 - val_tp: 52.0000 - val_fp: 2.0000 - val_tn: 45503.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9630 - val_recall: 0.8125 - val_auc: 0.9294 - val_prc: 0.8387\n",
"Epoch 26/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0042 - tp: 200.0000 - fp: 34.0000 - tn: 181912.0000 - fn: 130.0000 - accuracy: 0.9991 - precision: 0.8547 - recall: 0.6061 - auc: 0.9234 - prc: 0.7162 - val_loss: 0.0022 - val_tp: 52.0000 - val_fp: 1.0000 - val_tn: 45504.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9811 - val_recall: 0.8125 - val_auc: 0.9294 - val_prc: 0.8390\n",
"Epoch 27/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0041 - tp: 195.0000 - fp: 39.0000 - tn: 181907.0000 - fn: 135.0000 - accuracy: 0.9990 - precision: 0.8333 - recall: 0.5909 - auc: 0.9340 - prc: 0.7351 - val_loss: 0.0022 - val_tp: 52.0000 - val_fp: 3.0000 - val_tn: 45502.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9455 - val_recall: 0.8125 - val_auc: 0.9295 - val_prc: 0.8376\n",
"Epoch 28/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0039 - tp: 216.0000 - fp: 34.0000 - tn: 181912.0000 - fn: 114.0000 - accuracy: 0.9992 - precision: 0.8640 - recall: 0.6545 - auc: 0.9326 - prc: 0.7486 - val_loss: 0.0022 - val_tp: 52.0000 - val_fp: 4.0000 - val_tn: 45501.0000 - val_fn: 12.0000 - val_accuracy: 0.9996 - val_precision: 0.9286 - val_recall: 0.8125 - val_auc: 0.9295 - val_prc: 0.8399\n",
"Epoch 29/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0040 - tp: 204.0000 - fp: 33.0000 - tn: 181913.0000 - fn: 126.0000 - accuracy: 0.9991 - precision: 0.8608 - recall: 0.6182 - auc: 0.9326 - prc: 0.7350 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 1.0000 - val_tn: 45504.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9811 - val_recall: 0.8125 - val_auc: 0.9372 - val_prc: 0.8449\n",
"Epoch 30/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0039 - tp: 205.0000 - fp: 41.0000 - tn: 181905.0000 - fn: 125.0000 - accuracy: 0.9991 - precision: 0.8333 - recall: 0.6212 - auc: 0.9402 - prc: 0.7496 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 2.0000 - val_tn: 45503.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9630 - val_recall: 0.8125 - val_auc: 0.9373 - val_prc: 0.8455\n",
"Epoch 31/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0038 - tp: 205.0000 - fp: 33.0000 - tn: 181913.0000 - fn: 125.0000 - accuracy: 0.9991 - precision: 0.8613 - recall: 0.6212 - auc: 0.9387 - prc: 0.7518 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 2.0000 - val_tn: 45503.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9630 - val_recall: 0.8125 - val_auc: 0.9372 - val_prc: 0.8451\n",
"Epoch 32/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0038 - tp: 213.0000 - fp: 35.0000 - tn: 181911.0000 - fn: 117.0000 - accuracy: 0.9992 - precision: 0.8589 - recall: 0.6455 - auc: 0.9372 - prc: 0.7557 - val_loss: 0.0021 - val_tp: 51.0000 - val_fp: 1.0000 - val_tn: 45504.0000 - val_fn: 13.0000 - val_accuracy: 0.9997 - val_precision: 0.9808 - val_recall: 0.7969 - val_auc: 0.9372 - val_prc: 0.8455\n",
"Epoch 33/100\n",
"90/90 [==============================] - 0s 3ms/step - loss: 0.0039 - tp: 198.0000 - fp: 29.0000 - tn: 181917.0000 - fn: 132.0000 - accuracy: 0.9991 - precision: 0.8722 - recall: 0.6000 - auc: 0.9356 - prc: 0.7521 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 5.0000 - val_tn: 45500.0000 - val_fn: 12.0000 - val_accuracy: 0.9996 - val_precision: 0.9123 - val_recall: 0.8125 - val_auc: 0.9372 - val_prc: 0.8435\n",
"Epoch 34/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0039 - tp: 212.0000 - fp: 34.0000 - tn: 181912.0000 - fn: 118.0000 - accuracy: 0.9992 - precision: 0.8618 - recall: 0.6424 - auc: 0.9310 - prc: 0.7465 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 3.0000 - val_tn: 45502.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9455 - val_recall: 0.8125 - val_auc: 0.9372 - val_prc: 0.8450\n",
"Epoch 35/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0039 - tp: 210.0000 - fp: 35.0000 - tn: 181911.0000 - fn: 120.0000 - accuracy: 0.9991 - precision: 0.8571 - recall: 0.6364 - auc: 0.9310 - prc: 0.7367 - val_loss: 0.0021 - val_tp: 51.0000 - val_fp: 1.0000 - val_tn: 45504.0000 - val_fn: 13.0000 - val_accuracy: 0.9997 - val_precision: 0.9808 - val_recall: 0.7969 - val_auc: 0.9372 - val_prc: 0.8457\n",
"Epoch 36/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0038 - tp: 215.0000 - fp: 32.0000 - tn: 181914.0000 - fn: 115.0000 - accuracy: 0.9992 - precision: 0.8704 - recall: 0.6515 - auc: 0.9342 - prc: 0.7546 - val_loss: 0.0021 - val_tp: 50.0000 - val_fp: 1.0000 - val_tn: 45504.0000 - val_fn: 14.0000 - val_accuracy: 0.9997 - val_precision: 0.9804 - val_recall: 0.7812 - val_auc: 0.9372 - val_prc: 0.8474\n",
"Epoch 37/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0038 - tp: 208.0000 - fp: 31.0000 - tn: 181915.0000 - fn: 122.0000 - accuracy: 0.9992 - precision: 0.8703 - recall: 0.6303 - auc: 0.9371 - prc: 0.7612 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 3.0000 - val_tn: 45502.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9455 - val_recall: 0.8125 - val_auc: 0.9373 - val_prc: 0.8455\n",
"Epoch 38/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0038 - tp: 210.0000 - fp: 32.0000 - tn: 181914.0000 - fn: 120.0000 - accuracy: 0.9992 - precision: 0.8678 - recall: 0.6364 - auc: 0.9326 - prc: 0.7474 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 3.0000 - val_tn: 45502.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9455 - val_recall: 0.8125 - val_auc: 0.9373 - val_prc: 0.8476\n",
"Epoch 39/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0037 - tp: 217.0000 - fp: 29.0000 - tn: 181917.0000 - fn: 113.0000 - accuracy: 0.9992 - precision: 0.8821 - recall: 0.6576 - auc: 0.9357 - prc: 0.7725 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 4.0000 - val_tn: 45501.0000 - val_fn: 12.0000 - val_accuracy: 0.9996 - val_precision: 0.9286 - val_recall: 0.8125 - val_auc: 0.9372 - val_prc: 0.8464\n",
"Epoch 40/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0039 - tp: 215.0000 - fp: 34.0000 - tn: 181912.0000 - fn: 115.0000 - accuracy: 0.9992 - precision: 0.8635 - recall: 0.6515 - auc: 0.9295 - prc: 0.7487 - val_loss: 0.0021 - val_tp: 51.0000 - val_fp: 1.0000 - val_tn: 45504.0000 - val_fn: 13.0000 - val_accuracy: 0.9997 - val_precision: 0.9808 - val_recall: 0.7969 - val_auc: 0.9372 - val_prc: 0.8451\n",
"Epoch 41/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0038 - tp: 211.0000 - fp: 31.0000 - tn: 181915.0000 - fn: 119.0000 - accuracy: 0.9992 - precision: 0.8719 - recall: 0.6394 - auc: 0.9387 - prc: 0.7512 - val_loss: 0.0021 - val_tp: 51.0000 - val_fp: 1.0000 - val_tn: 45504.0000 - val_fn: 13.0000 - val_accuracy: 0.9997 - val_precision: 0.9808 - val_recall: 0.7969 - val_auc: 0.9372 - val_prc: 0.8462\n",
"Epoch 42/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0038 - tp: 211.0000 - fp: 32.0000 - tn: 181914.0000 - fn: 119.0000 - accuracy: 0.9992 - precision: 0.8683 - recall: 0.6394 - auc: 0.9372 - prc: 0.7551 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 2.0000 - val_tn: 45503.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9630 - val_recall: 0.8125 - val_auc: 0.9372 - val_prc: 0.8439\n",
"Epoch 43/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0040 - tp: 200.0000 - fp: 37.0000 - tn: 181909.0000 - fn: 130.0000 - accuracy: 0.9991 - precision: 0.8439 - recall: 0.6061 - auc: 0.9326 - prc: 0.7454 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 2.0000 - val_tn: 45503.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9630 - val_recall: 0.8125 - val_auc: 0.9372 - val_prc: 0.8451\n",
"Epoch 44/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0038 - tp: 223.0000 - fp: 30.0000 - tn: 181916.0000 - fn: 107.0000 - accuracy: 0.9992 - precision: 0.8814 - recall: 0.6758 - auc: 0.9447 - prc: 0.7534 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 2.0000 - val_tn: 45503.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9630 - val_recall: 0.8125 - val_auc: 0.9294 - val_prc: 0.8402\n",
"Epoch 45/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0039 - tp: 206.0000 - fp: 29.0000 - tn: 181917.0000 - fn: 124.0000 - accuracy: 0.9992 - precision: 0.8766 - recall: 0.6242 - auc: 0.9342 - prc: 0.7476 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 4.0000 - val_tn: 45501.0000 - val_fn: 12.0000 - val_accuracy: 0.9996 - val_precision: 0.9286 - val_recall: 0.8125 - val_auc: 0.9372 - val_prc: 0.8422\n",
"Epoch 46/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0036 - tp: 227.0000 - fp: 31.0000 - tn: 181915.0000 - fn: 103.0000 - accuracy: 0.9993 - precision: 0.8798 - recall: 0.6879 - auc: 0.9357 - prc: 0.7731 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 1.0000 - val_tn: 45504.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9811 - val_recall: 0.8125 - val_auc: 0.9373 - val_prc: 0.8472\n",
"Epoch 47/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0036 - tp: 219.0000 - fp: 30.0000 - tn: 181916.0000 - fn: 111.0000 - accuracy: 0.9992 - precision: 0.8795 - recall: 0.6636 - auc: 0.9403 - prc: 0.7680 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 2.0000 - val_tn: 45503.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9630 - val_recall: 0.8125 - val_auc: 0.9373 - val_prc: 0.8457\n",
"Epoch 48/100\n",
"90/90 [==============================] - ETA: 0s - loss: 0.0038 - tp: 210.0000 - fp: 34.0000 - tn: 181912.0000 - fn: 120.0000 - accuracy: 0.9992 - precision: 0.8607 - recall: 0.6364 - auc: 0.9372 - prc: 0.7622Restoring model weights from the end of the best epoch: 38.\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.0038 - tp: 210.0000 - fp: 34.0000 - tn: 181912.0000 - fn: 120.0000 - accuracy: 0.9992 - precision: 0.8607 - recall: 0.6364 - auc: 0.9372 - prc: 0.7622 - val_loss: 0.0021 - val_tp: 52.0000 - val_fp: 2.0000 - val_tn: 45503.0000 - val_fn: 12.0000 - val_accuracy: 0.9997 - val_precision: 0.9630 - val_recall: 0.8125 - val_auc: 0.9373 - val_prc: 0.8449\n",
"Epoch 48: early stopping\n"
]
}
],
"source": [
"model = make_model()\n",
"model.load_weights(initial_weights)\n",
"baseline_history = model.fit(\n",
" train_features,\n",
" train_labels,\n",
" batch_size=BATCH_SIZE,\n",
" epochs=EPOCHS,\n",
" callbacks=[early_stopping],\n",
" validation_data=(val_features, val_labels))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iSaDBYU9xtP6"
},
"source": [
"### Check training history\n",
"\n",
"In this section, you will produce plots of your model's loss, AUPRC, precision, and recall on the training and validation sets. These are useful to check for overfitting, which you can learn more about in the [Overfit and underfit](https://www.tensorflow.org/tutorials/keras/overfit_and_underfit) tutorial.\n",
"\n",
"Additionally, you can produce these plots for any of the other metrics you created above (for example, `fn` for false negatives) by adding its name to the `metrics` list in `plot_metrics`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "WTSkhT1jyGu6"
},
"outputs": [],
"source": [
"def plot_metrics(history):\n",
"  metrics = ['loss', 'prc', 'precision', 'recall']\n",
"  for n, metric in enumerate(metrics):\n",
"    name = metric.replace(\"_\",\" \").capitalize()\n",
"    plt.subplot(2,2,n+1)\n",
"    plt.plot(history.epoch, history.history[metric], color=colors[0], label='Train')\n",
"    plt.plot(history.epoch, history.history['val_'+metric],\n",
"             color=colors[0], linestyle=\"--\", label='Val')\n",
"    plt.xlabel('Epoch')\n",
"    plt.ylabel(name)\n",
"    if metric == 'loss':\n",
"      plt.ylim([0, plt.ylim()[1]])\n",
"    elif metric == 'auc':\n",
"      plt.ylim([0.8,1])\n",
"    else:\n",
"      plt.ylim([0,1])\n",
"\n",
"    plt.legend()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "u6LReDsqlZlk",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 610
},
"outputId": "467f2042-620b-432d-bd00-d0024979ea09"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x720 with 4 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"plot_metrics(baseline_history)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UCa4iWo6WDKR"
},
"source": [
"Note: The validation curve generally performs better than the training curve. This is mainly because the dropout layer is not active when evaluating the model."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aJC1booryouo"
},
"source": [
"### Evaluate metrics\n",
"\n",
"You can use a [confusion matrix](https://developers.google.com/machine-learning/glossary/#confusion_matrix) to summarize the actual vs. predicted labels, where the X axis is the predicted label and the Y axis is the actual label:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "aNS796IJKrev",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "c4be0499-333a-486c-a18e-7d0aef17c69b"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"90/90 [==============================] - 0s 990us/step\n",
"28/28 [==============================] - 0s 928us/step\n"
]
}
],
"source": [
"train_predictions_baseline = model.predict(train_features, batch_size=BATCH_SIZE)\n",
"test_predictions_baseline = model.predict(test_features, batch_size=BATCH_SIZE)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "MVWBGfADwbWI"
},
"outputs": [],
"source": [
"def plot_cm(labels, predictions, p=0.5):\n",
" cm = confusion_matrix(labels, predictions > p)\n",
" plt.figure(figsize=(5,5))\n",
" sns.heatmap(cm, annot=True, fmt=\"d\")\n",
" plt.title('Confusion matrix @{:.2f}'.format(p))\n",
" plt.ylabel('Actual label')\n",
" plt.xlabel('Predicted label')\n",
"\n",
" print('Legitimate Transactions Detected (True Negatives): ', cm[0][0])\n",
" print('Legitimate Transactions Incorrectly Detected (False Positives): ', cm[0][1])\n",
" print('Fraudulent Transactions Missed (False Negatives): ', cm[1][0])\n",
" print('Fraudulent Transactions Detected (True Positives): ', cm[1][1])\n",
" print('Total Fraudulent Transactions: ', np.sum(cm[1]))"
]
},
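{
"cell_type": "markdown",
"metadata": {},
"source": [
"`plot_cm` reads the matrix returned by `sklearn.metrics.confusion_matrix` by index. As a minimal sketch of that layout, here is a confusion matrix for a handful of made-up labels and scores (the `toy_*` names are hypothetical and not part of the dataset): rows are actual labels and columns are predicted labels, so `cm[0][0]` holds true negatives, `cm[0][1]` false positives, `cm[1][0]` false negatives, and `cm[1][1]` true positives."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics import confusion_matrix\n",
"\n",
"# Hypothetical toy data: three legitimate (0) and two fraudulent (1) transactions.\n",
"toy_labels = np.array([0, 0, 0, 1, 1])\n",
"toy_scores = np.array([0.1, 0.7, 0.2, 0.9, 0.3])\n",
"\n",
"# Threshold the scores at 0.5, the same default that plot_cm uses.\n",
"toy_cm = confusion_matrix(toy_labels, toy_scores > 0.5)\n",
"\n",
"# For binary labels [0, 1], ravel() yields TN, FP, FN, TP in that order.\n",
"toy_tn, toy_fp, toy_fn, toy_tp = toy_cm.ravel()\n",
"print(toy_cm)\n",
"print('TN:', toy_tn, 'FP:', toy_fp, 'FN:', toy_fn, 'TP:', toy_tp)"
]
},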
{
"cell_type": "markdown",
"metadata": {
"id": "nOTjD5Z5Wp1U"
},
"source": [
"Evaluate your model on the test dataset and display the results for the metrics you created above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "poh_hZngt2_9",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 636
},
"outputId": "0b5c7a16-2007-4ae0-bbe0-3af75b91d592"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"loss : 0.0030375574715435505\n",
"tp : 75.0\n",
"fp : 8.0\n",
"tn : 56856.0\n",
"fn : 23.0\n",
"accuracy : 0.9994557499885559\n",
"precision : 0.9036144614219666\n",
"recall : 0.7653061151504517\n",
"auc : 0.9385272860527039\n",
"prc : 0.8193020820617676\n",
"\n",
"Legitimate Transactions Detected (True Negatives): 56856\n",
"Legitimate Transactions Incorrectly Detected (False Positives): 8\n",
"Fraudulent Transactions Missed (False Negatives): 23\n",
"Fraudulent Transactions Detected (True Positives): 75\n",
"Total Fraudulent Transactions: 98\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 360x360 with 2 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"baseline_results = model.evaluate(test_features, test_labels,\n",
" batch_size=BATCH_SIZE, verbose=0)\n",
"for name, value in zip(model.metrics_names, baseline_results):\n",
" print(name, ': ', value)\n",
"print()\n",
"\n",
"plot_cm(test_labels, test_predictions_baseline)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PyZtSr1v6L4t"
},
"source": [
"If the model had predicted everything perfectly, this would be a [diagonal matrix](https://en.wikipedia.org/wiki/Diagonal_matrix), where the values off the main diagonal, which indicate incorrect predictions, would be zero. In this case, the matrix shows that you have relatively few false positives, meaning that relatively few legitimate transactions were incorrectly flagged. However, you would likely want even fewer false negatives, even at the cost of more false positives. This trade-off may be preferable because false negatives allow fraudulent transactions to go through, whereas false positives may cause an email to be sent to a customer asking them to verify their card activity."
]
},
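{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see this trade-off in miniature, the hypothetical sketch below thresholds a few made-up scores (the `toy_*` names are illustrative only) at 0.5 and at a lower 0.25 and counts the errors at each threshold. Lowering the threshold removes the missed fraud case at the cost of one more false alarm. With the real model, you could explore the same effect by passing a lower `p` to `plot_cm`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical toy data: four legitimate (0) and three fraudulent (1) transactions.\n",
"toy_labels = np.array([0, 0, 0, 0, 1, 1, 1])\n",
"toy_scores = np.array([0.05, 0.2, 0.35, 0.6, 0.3, 0.55, 0.9])\n",
"\n",
"toy_counts = {}\n",
"for p in (0.5, 0.25):\n",
"  toy_preds = toy_scores > p\n",
"  # False positives: legitimate transactions flagged as fraud.\n",
"  n_fp = int(np.sum(toy_preds & (toy_labels == 0)))\n",
"  # False negatives: fraudulent transactions that were missed.\n",
"  n_fn = int(np.sum(~toy_preds & (toy_labels == 1)))\n",
"  toy_counts[p] = (n_fp, n_fn)\n",
"  print(f'threshold={p}: false positives={n_fp}, false negatives={n_fn}')"
]
},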
{
"cell_type": "markdown",
"metadata": {
"id": "P-QpQsip_F2Q"
},
"source": [
"### Plot the ROC\n",
"\n",
"Now plot the [ROC](https://developers.google.com/machine-learning/glossary#ROC). This plot is useful because it shows, at a glance, the range of performance the model can reach just by tuning the output threshold."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "lhaxsLSvANF9"
},
"outputs": [],
"source": [
"def plot_roc(name, labels, predictions, **kwargs):\n",
" fp, tp, _ = sklearn.metrics.roc_curve(labels, predictions)\n",
"\n",
" plt.plot(100*fp, 100*tp, label=name, linewidth=2, **kwargs)\n",
" plt.xlabel('False positives [%]')\n",
" plt.ylabel('True positives [%]')\n",
" plt.xlim([-0.5,20])\n",
" plt.ylim([80,100.5])\n",
" plt.grid(True)\n",
" ax = plt.gca()\n",
" ax.set_aspect('equal')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "DfHHspttKJE0",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 606
},
"outputId": "3688edc9-8f83-4390-fd45-673ba4e9c138"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x720 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"plot_roc(\"Train Baseline\", train_labels, train_predictions_baseline, color=colors[0])\n",
"plot_roc(\"Test Baseline\", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')\n",
"plt.legend(loc='lower right');"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Y5twGRLfNwmO"
},
"source": [
"### Plot the AUPRC\n",
"\n",
"Now plot the precision-recall curve and its [AUPRC](https://developers.google.com/machine-learning/glossary?hl=en#PR_AUC): the area under the interpolated precision-recall curve, which is obtained by plotting (recall, precision) points for different values of the classification threshold. Depending on how it's calculated, PR AUC may be equivalent to the average precision of the model.\n"
]
},
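{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the note above, the snippet below computes two summaries of the same precision-recall curve for a few made-up scores (the `toy_*` names are hypothetical): `sklearn.metrics.average_precision_score`, which sums precision weighted by each step in recall, and `sklearn.metrics.auc`, which applies the trapezoidal rule to the curve's points. The two are often close but are generally not identical, which is why the way PR AUC is calculated matters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import sklearn.metrics\n",
"\n",
"# Hypothetical toy data.\n",
"toy_labels = np.array([0, 0, 1, 1, 0, 1])\n",
"toy_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])\n",
"\n",
"toy_precision, toy_recall, _ = sklearn.metrics.precision_recall_curve(toy_labels, toy_scores)\n",
"\n",
"# Average precision: the sum over thresholds of (R_n - R_{n-1}) * P_n.\n",
"toy_ap = sklearn.metrics.average_precision_score(toy_labels, toy_scores)\n",
"# The same step-wise sum written out by hand (recall is returned in decreasing order).\n",
"toy_manual_ap = -np.sum(np.diff(toy_recall) * toy_precision[:-1])\n",
"\n",
"# Trapezoidal area under the (recall, precision) points.\n",
"toy_trapezoid_auprc = sklearn.metrics.auc(toy_recall, toy_precision)\n",
"\n",
"print('average precision:', toy_ap)\n",
"print('trapezoidal AUPRC:', toy_trapezoid_auprc)"
]
},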
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XV6JSlFGEqGI"
},
"outputs": [],
"source": [
"def plot_prc(name, labels, predictions, **kwargs):\n",
"  precision, recall, _ = sklearn.metrics.precision_recall_curve(labels, predictions)\n",
"\n",
"  # Plot recall on the x-axis and precision on the y-axis (the usual PR-curve orientation).\n",
"  plt.plot(recall, precision, label=name, linewidth=2, **kwargs)\n",
"  plt.xlabel('Recall')\n",
"  plt.ylabel('Precision')\n",
"  plt.grid(True)\n",
"  ax = plt.gca()\n",
"  ax.set_aspect('equal')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "FdQs_PcqEsiL",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 606
},
"outputId": "d766dc42-54f8-4e49-954d-519d122216ed"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x720 with 1 Axes>"
],
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlEAAAJNCAYAAAARaCA+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdd3xb1f3/8feRbHlvO3YSJ3ESZ4eEkJBBGA4QSOFbKC27LYUWaEtL6YBvW/ptS9cXfv3SDR200EmhQKHQsgmYmUACScgge9gZ3nvKku7vD8myZcuJYluRbL2ej4cfte49ujrKfVC/H+ec+znGsiwBAADg+Ngi3QEAAICRiBAFAAAwCIQoAACAQSBEAQAADAIhCgAAYBAIUQAAAIMQF+kOHK/c3FyrqKgobNdvbW1VSkpK2K6P0HEvogP3IXpwL6IH9yJ6nIh78e6779ZYlpXX9/iIC1FFRUVav3592K5fWlqqkpKSsF0foeNeRAfuQ/TgXkQP7kX0OBH3whhzINhxpvMAAAAGgRAFAAAwCIQoAACAQSBEAQAADAIhCgAAYBAIUQAAAINAiAIAABgEQhQAAMAgEKIAAAAGgRAFAAAwCIQoAACAQSBEAQAADAIhCgAAYBAIUQAAAINAiAIAABgEQhQAAMAgEKIAAAAGgRAFAAAwCIQoAACAQQhbiDLGPGCMqTLGbBngvDHG/NIYs9sY874x5pRw9QUAAGC4hXMk6k+SVh3l/IckTfP93CjpN2HsCwAAwLAKW4iyLOs1SXVHaXKxpL9YXmslZRpjxoarP6Fq7bJU1dyhTpc70l0BAABRLJJrosZLKu/1+qDvWMR0uty6d2OHFv9otV7fWRPJrgAAgCgXF+kOhMIYc6O8U37Kz89XaWlpWD5nW61b22o9kqTX1r+vuKr4sHwOQtPS0hK2e43QcR+iB/cienAvokck70UkQ9QhSRN6vS70HevHsqz7JN0nSYsWLbJKSkrC0qH43TXSurclSV2p+SopmReWz0FoSktLFa57jdBxH6IH9yJ6cC+iRyTvRSSn856SdI3vKb2lkhotyzoSwf4EeOidcrncnkh3AwAARKmwjUQZYx6SVCIp1xhzUNJ3JcVLkmVZv5X0jKQLJO2W1CbpunD1JVSWFfj6QF2bpualRqYzAAAgqoUtRFmWddUxzluSvhCuzx8MS4Ep6sG1Zfr2f82SMSZCPQIAANGKiuW9pCV6F5InxdslSQ+8uU+3Pvp+JLsEAACiFCGql8wkb4hKT+oZoPvnewd19/M71O6kbhQAAOhBiOplbGaifrA8SQ9ev1RvfeNs//F7XtmtF7ZVRLBnAAAg2hCiekmIs2tCmk3FY1I1LjNJf79hif/cLQ9v1KGG9gj2DgAARBNC1FGcNjVX917dsy/y5oONEewNAACIJoSoPr6/pl0r7i5VQ5tTknThvLEqzEqSJG0or49k1wAAQBQhRPVR2ebRvppWeXpVO5iRnyZJ+t2re/WTF3ZEqGcAACCaEKIG0Lsy1CeWTvL//quXd+vu53eoo4un9QAAiGWEqBCsmDlGj3x2mf/1Pa/s1uPvHVJNS2cEewUAACKJEDWABT94Ud/791b/60WTsvTHa0/VzALv1N7tT2zWoh++pIffKYtUFwEAQAQRovqYlW33/+5y9yyMstmMVswco8+eNUUTspOUlugtyPmNxzerqqnjhPcTAABEVtj2zhupvnBygpadfqYkyRZkz7xLFhTqkgWFenR9uW57zLslzK9e3q3bL5ilJIe9X3sAADA6MRLVhzFGifF2Oew27apq1vaKpqDtLj55vKbkpkiS/rr2gJbdtVrNHV0nsqsAACCCCFEDcLo9uvCXb+iie94Met4RZ9OPL52n4jGpstuMGtq69JvSPSe4lwAAIFIIUQOwfMuhnC6PFnz/Bf/P5//2rizfyUVF2Xrpq2dpzrh0SdKvS/fI5fZEqssAAO
AEIkQNICHO5n8Sr76ty//z7JYKdXQFBqVff7xnaxgTZB0VAAAYfVhYPgCbzejpL52hxvaedU6HG9qVlhinhLjA7Dk+M8n/+22PbtLUMam6qWQqgQoAgFGMEHUUdptRdorD/7r3731lJseroa1Lj284JEk6aXyGJuemqDAriTAFAMAoRIgaBsYYPfrZZdp0sFF3PfuBalqcuuaBdyRJHz1lvH56+ckR7iEAABhuhKgQOV0efeHv7+nFbZUan5kkm0360tnTdNmiCZKkaflpmpafpqrmDv397TJ1ujyqbu7U4+8dknyL1B1xNl1/xhQVj0mN4DcBAADDgRAVojib0Z7qFknSoYZ2SdKTGw/7Q1S3m0qKdVNJsXZXNevcn74mSf4pPkk60tih7188R5KUn56oxHgKdAIAMBIRokJksxk9d8uZqmru0Ft7avXfj72vePvAa52Kx6Tp4RuX6lC9N3C9tqtaT248rFd3Vuus/yuVJE3ITlLprStkt7FmCgCAkYYQdRwccTYVZiUrPdFbxTzefvQKEUun5Ph/n1eYoT3VLWpqd8mSpfK6dpXXtetLD22QI86m04tz9bGFhWHtPwAAGD6EqEEwRkpLiFNKQuj/fNPy0/Sfm8+QJHk8lhb84EU1tnfp6c1HJElPbDik8+cWKPU4rgkAACKHv9iDcP6cAp3/vYJBv99mM/rn55fp/YONanW69e1/bZEkLfvf1XrjG2crIyl+uLoKAADChBAVIcVj0lQ8Jk2WZWndvjo9temwmjtduub+t1Xk29hYklIS4vTFFcUa16ugJwAAiDxC1CA9ur5c7V1uXXzy+CGNHBlj9MurFmhnZbO2VzRr08FGbTrYGNBmTFqCvnzu9KF2GQAADCNC1CD96uXdKqtr01nT84Zl+u3+a0/Vun11Acf++d5Bvb6rRj9/aZduOWcalc8BAIgihKhB6nJ7NyE+1hN6oRqfmaTxC8YHHMtIjtfru2okSV96eKN6R6iSGXn66Ck8zQcAQKQQogapO0Rd+pu3lJuWoFkF6brrYycN62jR0sk5SoizqdPl0b83HQ4499Smw5o/IVOSlJ4Yr7y0hGH7XAAAcGyEqEEan5mkmhanDjd26HBjhw7UtukrK6erICNx2D4jyWHXk19crh0Vzf5jda1Ofe/f2yRJ5/zkVUmSzUgP3bBUS3rVpQIAAOFFiBqkh29cpp2VPeFm7viMsFQen1mQrpkF6f7XHo+lt/fW+T+7oqlDbU63rrhvrS6aP06XnDJeK2aMGfZ+AACAQISoQUpy2P3TaSeSzWb0208u9L/+6iMbvZscyzvFt/qDSj35xdP95/PSEqg7BQBAGBCiRrg7Lpqjc2bma+vhRv26dI9anW6d+9NX/eeTHXa99Y2zlZnsiGAvAQAYfQhRQ+TxWFpy52pVN3dq8eRs9Z3Q++6H52j2OO903J/f2q9nfNu89DU+M0k/veJk/+trHnhHnV3uoG2vWVakC+eNleRdVH7hvLE6vThXG8sbVNHU4W+3v6ZVbU63vvbIJv3yqgXHtU0NAAA4Ov6qDoPu4PROnzpPktTc0eX//UBtm94O0kaSpuSlBLxet69O7QOEqJWz8/sdy0iO199vWBpwbMXdpdpX06rV26v00DtlunLxRPbmAwBgmPAXdYhsNqNnbzlDOytbgp6fObZnUfg1yyYFDUCSd41Vb3+67lR5rOCfOSknOaS+/e36JVp+18uSpB8+/YF+XbpHa755thLi7Md4JwAAOBZC1DDISU3QstRj12kqyk0J2BfvaI5WruC3r+7Ri9sqdcMZk7Vq7tgB243PTNKPLpmrP765X7urWlTX6tQNf3lXab1Go86bk6+LTx4/4DUAAEBwhKgRaGdFs949UK8rT51wzLYfXzJJV506USd//wU1dbj02s7qgPNPbz6i2WPTZYzRlNwU2cJQpgEAgNGIEDUC1bQ6JUm3Pfa+f6H6qrkFuuLUiZKkD4406cfPbQ94z+yx6WrudE
mSrlo8QU0dLv34uR2SpJU/e02SNC4zUYuLsvXNC2YpP334ioYCADAaEaJGoJRe66de2eEdWSoek+o/Vt/m9B8PZnlxnsZmJGrdvjqt21+vFl+4OtzQoX9tPKw91S2akD3wuiubMfrk0klUSAcAxDRC1Ah092XzdfmpE2RZPSvPJ2T1hJ6ZBel64NpFA74/Pz1BifF2/fG6xdpyqFHPb63Qr17e7T+/+VCTNh9qOmofdlY264WvnDWEbwEAwMhGiBqBUhLijrq1S3aKQ2fPDP4UYF9zx2do9th0LZuao6/8Y6Mqmzp1+wWzND4zKWj7v79zQG/urtXOyhb9+a39+tRpRYP5CgAAjHiEKMhmMzptaq4+dVqR9lS16uolA9eTmjU2TWf7Nj7+7lNbtXZvbcifMyYtQbetmkmtKgDAqMBfM/jdVFIsSfrJCzu0t7o1aJslU7J189nF/um/Z7dUHNdnTMhO1unTcgOOJcbZQy79AABAtCBEoZ+39tTq3QP1Qc8lxNt050dP0skTMtXp8oR8zZsefE+St+hnMN+/eI6uWVZ03H0FACBSCFHo56srp6u+zRn03PjMJCXE2XXOrNDWXHX71gWz9Ni7B/sd31HZLEn6zpNb9dbuwKnB6poOPVz+br/3JDvsuvmcaZrM6BUAIIIIUehneXHusRsdpxvOnKIbzpzS7/gP/7NNf3hjnyTpua1BpgYrg08XdrjcuvnsacpJcWgMNa0AABFAiELIfvriTr1/sEE3lRRr8eTsYbnmrefP0NIpOepy958a3LJ1q+bOmRNw7OaHNsjlsfTM5go9s7lCdpvRk19YrrnjM4alPwAAhIoQhZCt21enNXtr1dHlVmGvulQXnzxOZ0zL87bZX6fH3zuk65YXaXp+2jGvmRhv17kDbcpcu0MlJwXuDXivzeiXq3fJ5ba0o7JZbo+lT9z/thYXDU+oC6fUhDjdcu40TcphGhIARgNCFELm9niLe67dWyepzn98zrh0f4h6dH25Hll/UDYj/eiSk4a9D+fPKdD5cwokSat+/pq2VzSroa1LL2yrHPbPCgePZenGM6dG7PNTE+I0MWfgavQAgNARohCyX1x1st7YVSOrz/EFEzL9vzd3eLeQGZMW/nVKf/3MkgGfIow2n/ubd4H8vzYe1r82Ho5oX35x5cm6+OTxEe0DAIwGhCiEbGxGki5bNOGobVy+0aqpY8I/ZZWXlqBVcwvC/jnD4e7L5uuBN/bJY/WNoCfO9grvk5C3PLxR/3n/SMjvq6np0INl6wc8PzUvVbedP0N2mxlyHwFgJCFEYVjtqW6R5P3Dih6XLizUpQsLI9qHq3+/Vm/t8ZaRePF4pz+rBm7/oiq1cFKWxmb0H33MTnFo3ABbCAHASEeIwrDpcnu0r8Zb6TwjKd5//JF15dpyuDHoeybnpui65ZMlSe1Ot+58tqcY56FDnXq5cYv/9WULJ+ikQu9TeK/vqh4wCCTG23X7BbP8r3/24s4B614tL871r7HaU+3dD3Agt5wzTTmpCUP6Tn2dyO80ITtZJXG2gLb/NW+s0hK99+rN3TUqq2vzn4u327RiRp6O7N+luXPn9vucIw3tuuPf2yRJN/wl+EiVzUiPfu40LZyUFfQ8AIxkhCgMm06XR0aSJQWULHh1V7WeHmD66LSpOf7A4XR79Jc1BwIblPW8PrUo2x84th5u6t/WJy0xLiBwPLHhUEA46C0p3u4PHBWNHQNeU5I+c/pkf4ga0nfqJdLf6XsXzfE/LfjUpsMq3VEdcH7amFQtzo9TyZz+06Zdbo/WH6jXnupWBZvI23akSR5L+vSf1unOj56kC/o8aQkAIx0hCsMmNSFOD92wVNsrmpWV4vAfv2xh4YAlCAp6TQElxtv0vYt66kLt2rVL06ZN87/uXQvq9OLcgLa9xdsDR1tuOWeaWjpdQdvOGZfu/31ybsqA15Q0LN+pr2j9Tr9/fa8O1rd7y1Q0Bi94Gm+36Z6rTxnw2kv/d7
UqmjrU2N6lL/79PT31xdMHbBtMemI8TxICiGrGiuBC18FYtGiRtX79wItch6q0tFQlJSVhuz5Cx72IjI4ut+Z/7wV1ujxa/z/nasv6NYO6DxWNHfryPzb4SmIMzu+vWaSVA9QRi0X8NxE9uBfR40TcC2PMu5ZlLep7nJEoAP38+NJ52l3Volzf9OVgFGQk6rsfnqPvPrVVLR0umeN4eG/r4SZJ3rVW58wcM+g+DEZCvE03lRRTBR/AMRGiAARIjLfr4pPH6/H3DurOZz9QeZlTa9oDF8cX5aToqsUTJXkXz/989c4Br/f9i+doZoF3ivGtPTV6dWd10HaJcXZ9ZeV0SdKHfvG6PjjiDVKrt1cN+Tsdr3anWw9ce6rM8SQ/ADGHEAUgqBe3VerZLb71UPv2Bpw7vTjXH6I6XW797tW9fd/uN78w0x+iNpQ1DNg2IyneH6IevH6JzvvZq6pp6f8EYlpinE6ZmKVFk7I0a2x6v/NDcb3vKcNXdlTr/jf26foz+m+aDQDdCFEAgvrIgvGaV5ipvXv3aMqUwK1qCrN6aj8lxtv19VUzB7xO7z0Ul07JGbBtQq/yC9kpDn353On+Cvh9fXj+2ID9G4fLr65aoJsf2iBJ+uHTH/jravVWMiNP1ywrGvbPBjDyEKIABNVdJqFU5SopGXi/v8R4uz5/lPO9LZyUFXLNqE8snRRSu+H04fnj9PT7R/TcVu8I3MtBphJf3l6lk3ttdTQYE7KSA56MBDAyEaIAjBjPbj6ilk6XzpmVr+wwhZCfXjFfV+yd0G+Lnqc3H9Hj7x2SJF10z5tD+owUh13vfnulEuPtQ7oOgMgiRAEYMe57fa82lDVo5ex8zR3X8/TcWTPy/KNDWw836qVtAy9G/+LZxf59/h5ZV64jjR1B280am6bzfKNx1c2demtPrSblJKvd6e7XNivZIYdvOrK5o0ttQdpI3vIRTR0utTrduu6P61TR1K6BqszkpDiU7qv839zhUk1Lp9rb25W07hV/m4Q4u04rztH/XDh7UN/p72+XBf9wSZctKvRv2fPKjiq9Xx68Qn9OqiNg1PBXq3fJM8B3Gup9irMbXTR/nCZkUz8M0YEQBWDE6N5W6MVtlQFb5GQmx/f649ykn7008NOCny+Z6v/j/NC6Mm0oawja7tKFhf7AUdPSqfvf2DfgNX/ziYX+acrvPLllwCrxU3JT1ORb57Vmb//1Vr3trw1ekV5tgcd3VDbrgpPG+guy3v/GPu2obA761r7f6Wj/TqdPy+0JUdurBvxOMwvSAkLUL1/epS538BQ1HPdpV2Wzfn7lggHfB5xIhCgAI8a9V5+it4OEj3mFPaNSs8em60tnFw94DVuvqgWXLZygM4pzg7ab3WukKyfVcdRr9t58+azpecrstXdkb7lpCVo6JUcH671B6MmNh/tNG3abV5jh38j7QG2r3itrUFVllcbke+tm/XtTz7ZDl/12zYB96y0tsef/8ofzO/X2hRXF8gwwFDWU++RyW9p8qFFj2dAaUYQQBWDEWF6cq+UDhJ5uc8dnhFwo8+olE0NqNyYtUV89b0ZIbc+Zla9zZh29ynr3E4tnzwy9Gvt1y7srM3u32lky+YAeXV8uSwq6d2Fvmw56p+L++OZ+/2jesbz/xOaQ+xZsAX7Qax4KPiUoeUPb7RfM8m+I3dvVSybqiQ0HJUmL2MwaUYQQBQAj0CeWTgr5Ccar7lvrnz7su8l0NCkek6ZrTyvyT+N163S5/UEw1Kc7gROBEAUAo9x91yzU+v31ke5GUJYsffpP3iKnP/jPNu2radEPP3JSQJsth5rkdHk0bUyqMpMdemdfnTaVB1/LlpIQF/IIIzBUhCgAGOXSEuO14gTvQXg8vriiWPe8sluS9Le1ZSqraw84f9H8sXrks8vU6vQuyi/dUaVfl+4Jeq3xmUm6eslEVTV3KCneHnR6EBguhCgAQETdev4MJTns+r/nd0iSXuuzv+JrO6tVemuJinKzJU
mnFmXr+tM9Qa+V4VsA/4uXdunv75Tpfy85yb9FETDcCFEAgIj73FlTtXBSljq6empsPbK+XM9s9laPL7m7VA/fuFRLp+RoxcwxxxxZ21DWIMvylpUAwoUQBQCIOLvNaOmUnIBjk3NT1Nrp1qu+kak/vL63X5tgWjtd2l7RJLvNaF7h0LboAY6GEAUAiEqTclL0508v1uIfvaSq5k699EGV3j1Q5z+fkeRQ8ZjUfu97/2CjPJaUmxKvf753UB+aW6CcVG89q7V7a7W7qiWg/fzCTJ1UGFpZDKA3QhQAIKp9+79m6+aHNkiSPvabwMKif7hmkc6dHVhvq7zOW8y0psWp//nXFs0vzPSHqCc3HtJD75QHtE9x2LX5jvNlsx2r4hYQiBAFAIhqy4tztWpOgSqbO/yFRd/zbQNz/V/W6yvnTtct507zt185O1+fr52qxvYuSVJWSs8TeosnZ8sY71Vqmjv1wrZKpSfFE6AwKIQoAEBUy05x6LefXBhw7PLfrtE7+71Tez97aaeWTsmWI86meYWZykpx6OurZga91iULCnXJgkJJ0mPvHtQL2yo1n3VTGCRCFABgxHngulP1t7UHdNez2yVJV9y3VpJ03fIifffDc0K6xsrZ+frzpxcrxWEPWz8xuhGiAAAjTmpCnK5ZNkkfHGnSwfp2vXvAW5H9j2/u167KlmO8u0eX26O6VqckyWaMslMcstuMMpLj9T8XztLYDDY8xsAIUQCAESnZEadfXLlAkvSr1bv0kxd3SpLe2F0zLNdPT4zTDWdM0ZS8/k8AAhIhCgAwCty0olhLpuSo0+U+duNeKps69MSGQ9pV2aKq5k5dsmC8nthwSJL00Dvleuidcj36uWU6tSg7HN3GCEeIAgCMeHab0eLJgws6ly6coMt/u0ZVzZ266ORxWl6cq3+sK9M636bNV/xuTb8in/X17bpv11r/65SEOH191cygdaswehGiAAAxzeX2aPOhRknewpvZKQ5durBQF9/7pjaVN8hjSW/tqe3/xrrAY26Ppc+dNVWSNK8wQ4nxLFgf7QhRAICYVlbXpi63RxOzk5Wd4tBrO6vV2unSp5ZN0oHpeQFtx6QnqCgnRZs2bdKUGbP1wZFm/e61Pero8ujl7VV6eXuVJGlmQZoe/dwypSV6a1RtKm/Q4Yb2oJ+fmezQsqnH3s4G0YcQBQCIaVPyUrXle+frSGOHJOkH/9mmXVXBn/C79rQifXzJJHUdtCslNUG/WP1e0HbbK5p1pLHDH6L+/NZ+Pe5ba9XXoklZeuzzpw3DN8GJRogCAMS8xHi7JuemSJJOn5arqQM8kTd7bLr/96zkeK2aUxBwfm9Ni3ZWtigvLUG7K1tUlJMiR5xNJxVmqM0ZfNH71DEpw/QtcKIRogAA6CXUYp3FY9L6VVL/78c2aWdli6qbO3XT39/TR04ep59fuUDXLZ+s65ZPDnqdLYca9dyWI5o/IZO6VCMMIQoAgGHy0VMKdbihw1+r6l8bD6uiqWPA9jkpCUpLjNPD68r136tm6KaS4hPVVQwDQhQAAMNk6ZQcLZ2Soyc3HtItD2+UJK3dW3fU98wqSJMktXW6VbrDuzC9eEyqCrOSJUkVjR3aXtEUcAzRIawhyhizStIvJNkl/cGyrLv6nJ8o6c+SMn1tvmFZ1jPh7BMAAOF20fxxmpqXqqaOrgHbfOUfG1XZ1KmmDpck6Z5XdvvP3fHh2brWN/33+q5q3fbY+0qMt2njd86jdEIUCVuIMsbYJd0raaWkg5LWGWOesixrW69m/yPpEcuyfmOMmS3pGUlF4eoTAAAngjFGc8dnDHi+0+VWTYt3zz67zSgjKd5/LsVh19jMnrVRWckO33s8shkTph5jMMI5ErVY0m7LsvZKkjHmYUkXS+odoixJ3Y86ZEg6HMb+AAAQNRLibGpzulVW1xZwvLG9Sx1dbr21x7uuqra1U5I0KTtZjjjbCe8nBhbOEDVeUnmv1wclLenT5g5JLxhjbp
aUIuncMPYHAICokBBnV+mtJdpb0+o/1uX26JP3vyNJ/vVUveWmJpyw/iE0xrKs8FzYmEslrbIs63rf609KWmJZ1hd7tfmqrw8/McYsk3S/pLmWZXn6XOtGSTdKUn5+/sKHH344LH2WpJaWFqWmsvdRNOBeRAfuQ/TgXkSPcN2Lx3Y6tbshsJ7U9rqeP4nTs4KPRCXYjS6f4dCEtNgbqToR/12sWLHiXcuyFvU9Hs6RqEOSJvR6Xeg71ttnJK2SJMuy1hhjEiXlSqrq3ciyrPsk3SdJixYtskpKSsLUZam0tFThvD5Cx72IDtyH6MG9iB7huhfBLrnohy/610/trPf0b+CTmJqor5w7fcDzk3JTND5z9NWhiuR/F+EMUeskTTPGTJY3PF0p6eo+bcoknSPpT8aYWZISJVWHsU8AAIwo31g1U89uqdB5c/JVlNO/uvm3/rVFu6tatG5/va7+w9sDXsdht2n9t89VemL8gG1wfMIWoizLchljvijpeXnLFzxgWdZWY8z3Ja23LOspSV+T9HtjzFfkXWR+rRWu+UUAAEagSxdN0KWLJgx4/v99bJ5+uXqXnK7go1SWLK3dWyen26OP/votZSUHhqhFRdn6+qqZw9rnWBHWOlG+mk/P9Dn2nV6/b5O0PJx9AABgNFs4KUt//vTiAc9blqXT7npZRxo7tDvIxsrr9tfrK+dO58m/QaBiOQAAUepgfZve3F2jtMR4TcwOrFYeb7dphq/aueQtjdC73lQ3Y4yeveUM7awMDFD1bU599q/vKj0xjgA1SIQoAACi1Jo9tfr6PzcHPVeYlaQ3vn62JKmpo0urfv6aXvjKmUoLsuYpM9mhxZOzA45tOdQoSepweXTpb96SMdJliybo8qNMHSIQIQoAgCi1dEqOlhfnqKGt//Yx+emJkrzTddfc/47qWp3qdHmU1q9lcGmJcTJGcro8Wn+gXpLU3OEiRB0HQhQAAFFqQnayHrx+6VHb1Ld1aWN5g9IS4pST4gj52pNyUvTabStU0dShdw/U665ntysnNfT3gxAFAMCItr/WW/V8Um6yzHHurTchO1kTspO1x7fgvCB99NWRCidCFAAAI9gBX4jKSnaoorFDBRneaT6ny6Py+sB9+deReKcAACAASURBVBLibCrMSu53jYqmDklSQQZbyxwPQhQAACPYofp2SdLru2r0/f9s1a8/vlCSVNHYoXN+8mq/9t/+r9n6zOmTA45Vdoco3zorhIYQBQDACHbm9Dw9t7VCbZ1ujUnrCUFxdqMpuT0VzmtbnWps71JFY3u/axxp9Iao+9/Yp8c3HFJmUrx+8JG5QUet0IMQBQDACDavMFP/ufmMfsfHZSbp5VtL/K+/9cRmPfh2WdBglJXsXVC+v7ZN+2u9U4Bv7KrRlYsnhqfTowQhCgCAGLBqboFyUxO0cFJWv3M/vnSerlk2SR5L+v5/tmlTeYNyUlkfdSyEKAAAYsAZ0/K0eHK26lqdOtJrSi8tMV6pCXFaMNEbrjq73JKksRmsjzoWQhQAADFiU3mjLv/dmoBjSfF2PXPLGZrsWz/Vvcg8n0Xmx8RmOQAAxIh4u1FBeqL/J85m1N7lVoVvYXlHl1v1bV2Kt5vjKtwZqxiJAgAgRiyYmKW1t5/jf73sztU60tihwixvkc3uUagxaYmy2Y6vcGcsIkQBABCDLMvSqUXZKq9v8xfo7C51UMB6qJAQogAAiEHGGP3yqgUBx/xFNwlRIWFNFAAAkNRrJIpF5SEhRAEAEINqWzp1qKFdLrfHf6x7gTnlDUJDiAIAIAb9/e0yLb/rZd39wk7/McobHB9CFAAAMehQg7fg5njfk3lSz3QeI1GhIUQBABCD/CEqsycwMRJ1fAhRAADEoJ4Q5d2Q2O2xVNXcKYkQFSpCFAAAMcayLB32hahxvpGompZOuT2WclMdcsQRD0LBvxIAADGmrtWpji6P0hPjlJYYL6lnPRSjUKEjRAEAEGN6FpUn+49R3uD4Ub
EcAIAYM21Mmh6/6TS53Jb/WEWjN1gxEhU6QhQAADEmyWHXKROzAo5VNHkXlTMSFTqm8wAAAOUNBoGRKAAAYsxvSveoqrlDn1w6SVPyUiVJR3zTeWw+HDpGogAAiDFPbz6sP765X43tXf5j3QvL91a3qs3pilTXRhRCFAAAMeZQfXehzZ4tXxp8geq7T23Vd57cGpF+jTSEKAAAYkib06X6ti457Dblpib4j3991UzNLEiTJHW5PZHq3ojCmigAAGLIwfqeSuU2m/Efv2rxRNW2dGp7RbPGZiQN9Hb0wkgUAAAxpLyuTZI0ITu537kjFNw8LoQoAABiSHeImniUEPXbV/fopy/uPKH9GokIUQAAxJCM5HidMjFTM8em9zuXn+5dI3WksUN/WbP/xHZsBGJNFAAAMeSSBYW6ZEFh0HM/+shJOmNanm568D2NSUsI2gY9CFEAAECSZLMZJcZ7J6nK6tp03s9elSRNyU3VPVcvUJydCaze+NcAACBGWJalstq2o5YwyE1NkM1IHV0e7axs0c7KFj23tUJ/W3tA9a3OE9jb6MdIFAAAMaK+rUtn/t8rykqO14bvnBe0zbzCTK395jmqb/MW37z8d2vU2N6lO/69TS99UKW/Xb/kRHY5qjESBQBAjOh+Mu9YdaDGpCdqRkGaZhSk6dv/NVsLJ2VJkt7ZX6eP3Pumth1uCntfRwJCFAAAMaLMXyMq9GKaly4s1F0fPUlxNiOny6ON5Q16dWd1uLo4ohCiAACIEWVHqRF1NNPy07T29nN0zswxkqScFMew920kIkQBABAjDtYPXK38WHJTE+T0LUjPo/yBJEIUAAAxo+woW76Eorq5UxIhqhshCgCAGFFe5918eELW4EJUlS9EjUknREmUOAAAIGb8+uOn6EBtmyZkJ+mWhzdoZ2WLJMlIuva0Il1+6oQB39vl9qiu1SmbkXJSCFESI1EAAMSMueMzdOG8sUqIs2tfTas+ONKkD440aduRJj28ruyo761p8Y5CJcXb9eK2CrU5XSeiy1GNEAUAQAz62RUn6+kvna6vrZwu6di1o1o6vKGp1enW5/72nn7yws6w9zHaEaIAAIhBU/NSNWdchv914TFqR03NS9UXVkzVzII0STrq1jGxghAFAEAM63R5lBhvO+Zic5vN6LbzZ+psX62oqqZOrdtfJ8uyTkQ3oxILywEAiGG3nj9DXztvutye0MJQa6d3Wu+5rRXeny+foZkF6eHsYtQiRAEAEOOMMYqzm5DafnzpJLU63XpuS4VaOl3qcsXuSBTTeQAAxCjLso57Om56fpruvmy+EuPtkmK78CYhCgCAGFVe1655d7ygax5457je5/ZYqmv1ljzISY3dffQIUQAAxKjy+jY1d7rUfpw1n+panfJYUlZyvOLtsRslYvebAwAQ48q799I7zm1gugtvxvJUnkSIAgAgZpXXe0NU4XFuSNwdonJTYztE8XQeAAAxqmdD4qMX2uyr2rcR8YayBp31f69IklIccfrxpfM0d3zG0d46qhCiAACIUd0jUROOcyRqfGaS7Daj9i63DtS2+Y+/tquaEAUAAEY//0jUcYaoJVNy9M7t56jZt5/enc9+oOe3Vio7Obae1CNEAQAQgyzL0m3nT1dZXZsK0hOP+/05qQnK8a2J6nR599GLtTVShCgAAGKQMUZXnDpxWK5V2+KUFHs1o3g6DwAADEmsPq1HiAIAIAZtKKvXP9aVaXdV85CuY1mWfySKEAUAAEa9ZzYf0df/uVnPb60c0nWaOlxyuj1KcdiV5LAPU+9GBkIUAAAxaLBP5vXln8qLwerlhCgAAGJQmX/Ll+MrtNlXTXNsroeSCFEAAMQcy7J0oLZVkjQ5N2VI16rxr4eKrSfzJEIUAAAxp6bFqVanWxlJ8cocYoHM2lbvSFQOI1EAAGC02+8bhSrKGdp6KInpPAAAEEPqW51KcdhVNMSpPEmq9k3n5cXgdB4VywEAiDHnzSnQlu+d79+uZS
hqW5jOAwAAMcQYo8T4odd1itVq5RIhCgCAmGNZ1rBdi6fzAABATLAsS6f+aLXO/9lrau10Dfl6sTydx5ooAABiSG2rUzUtnep0uZU8xG1a2p1utTrdcthtSk+MvUgRe98YAIAY1rvIpjFmSNfqXg+VEGfTw+vK/cdPmZilGQVpQ7r2SECIAgAghuyr8W73Miln6OUN2rvckqTmTpe++fhm//G8tASt+9a5Q75+tCNEAQAQQ/wjUcNQaLM4L1W3nT9D5b59+Fqdbv1702G5PcO3cD2aEaIAAIgh+2q8IWo4RqJsNqMvrCj2v95R0ax/bzqs7JTYeFKPp/MAAIghB2q9o0ZFuUMfieqr+0m9WAlRjEQBABBDbjxzinZWNqs4b/gXfte2xlbNKEIUAAAx5MPzx4Xt2rE2EsV0HgAAGBZ1vpGo7JTYKLxJiAIAIEa8e6BeD759QLurmsNy/RpfiLr/9b0688ev6O29tWH5nGhBiAIAIEY8u/mIvvXEFj2/tTIs15+R711n1ep0q6yuTW/srgnL50QLQhQAADFiT3WLJGlq3tDLGwTzqdOKtPE7K/WxUwoljf61UYQoAABixF5fjagpealh+4zMZIc6fJXMR/umxIQoAABiQKfLrfK6NtmMNGkYqpUfTfeeermMRA2eMWaVMWaHMWa3MeYbA7S53BizzRiz1Rjz93D2BwCAWFVW2yaPJRVmJSshzh7Wz+quF5U9yutFha1OlDHGLuleSSslHZS0zhjzlGVZ23q1mSbpm5KWW5ZVb4wZE67+AAAQy/ZUd0/l9ayH+se6MrU7vVNvY9ITdf6cAtltZsif1V3qIGeUlzoIZ7HNxZJ2W5a1V5KMMQ9LuljStl5tbpB0r2VZ9ZJkWVZVGPsDAEDMqm9zKi8tQTecMcV/7O4Xdqq6udP/+h83LtWSKTlD+hyX26P6NqeMkbKS44d0rWgXzhA1XlJ5r9cHJS3p02a6JBlj3pRkl3SHZVnPhbFPAADEpFVzCtTS4dLy4lz/scsXFaqlw6XH3zuk5k6XEuOHPs1X39Yly/IGqDj76F56HeltX+IkTZNUIqlQ0mvGmJMsy2ro3cgYc6OkGyUpPz9fpaWlYetQS0tLWK+P0HEvogP3IXpwL6LHSL0X0ySVlpb5X5+aIHkclh7qckmSDm7foPo9Q5vOK2/2SJKSjOuE/BtF8l6EM0QdkjSh1+tC37HeDkp627KsLkn7jDE75b3H63o3sizrPkn3SdKiRYuskpKScPVZpaWlCuf1ETruRXTgPkQP7kX0GE334nBDu5zPv6ycFIcuXLliyNd7a3eN9ObbmjAmSyUly4ahh0cXyXsRznG2dZKmGWMmG2Mckq6U9FSfNv+SdxRKxphceaf39oaxTwAAoBe3x9KF88bq7JnD82xX99YvuaO8RpQUxpEoy7JcxpgvSnpe3vVOD1iWtdUY831J6y3Lesp37jxjzDZJbkm3WZY1ujfaAQAgikzITta9V58ybNer9dWIGu3VyqUwr4myLOsZSc/0OfadXr9bkr7q+wEAACOcv7zBKK8RJVGxHACAmLajolkVjR3yjmsMXU1Ld4ga/dN5hCgAAGLYTQ++q6V3rta2I03Dcr3aGNnyRSJEAQAQs9weS2V1bZKkybkpx2gdGv+WL4QoAAAwWh2qb1eX21J+eoKSHcOzTLpnTRTTeQAAYJTaV+vdT2+4RqEkqaZ7Oo+F5QAAYLTaV90iafhCVKfLreYOl+w2o/TE0b1vnkSIAgAgZu2vHd71UHW91kPZbEPbPmYkiPTeeQAAIEL21nin84pyUtTY3qX1++sCzmckxWvhpCwZE1ogqvWVN6hp6dRJ331ec8dn6MHrl4zaQEWIAgAgRv38ipO1r6ZFxXlpKqtt02f+vL5fm998/BR96KSxIV1vfGaSclIcqm11qrnTpTV7a/Wrl3drUk6yLpw3VvH20TUBRogCACBGZac4lJ2SLUlKa4sL2D9vQ1m96tu65D6OIp
xZKQ6tvf0ctTndWnbnarU53frZSzslSelJcTp7Zv7wfoEII0QBAAAV5abogWtP9b9e9fPXVN/WpUnZx7deKt5uU0aSTT+/4mSt3Vun57Yc0eHGDg1TQfSoQogCAAD93Hz2NO2qalZRbvKg3n/enAKdN6dA6/bX6XBjx6gsvkmIAgAA/Vw4b6yk0NZCHY2/+GbK6Cu+ObpWeAEAgKhS2+otvpk9CotvEqIAAECATeUN+uua/fpgiJsStzld6ujyyBFnU4rDPjydiyKEKAAAEOClDyr17Se36tnNR4Z0ne66UTkpjpBrTY0khCgAABCgu5L5pJyhVTLvXcF8NCJEAQCAAGW+jYkn5QzuybxuhCgAABBTukeiJg4xRNW29kznjUaEKAAA4NfQ5lRje5eSHXblpQ6tLEFd95N5o7C8gUSIAgAAvRzoHoXKTh7yYnD/SNQoLG8gEaIAAEAvje1dyk5xqGiIi8olqa5ldK+JomI5AADwO3N6nt779krVtHRqR0Vz0Da5qQ7lhDDVN9oXlhOiAABAP6/uqNbXHt0U9JzDbtPLt56lwqyjLzwf7QvLCVEAAKCf9KR4Tc9P7Xd8Z2WLzpk1Rk6X55jXYCQKAADEnJWz87Vydv6QrjGaNx+WWFgOAADCoNPlVkunS3E2o/Sk0TlmQ4gCAAAhqWnp1Gs7q1Ve13bMtt2jUFmjdN88iRAFAABCtG5fna554B1979/bjtm29+bDoxUhCgAAhORgfbskqTAr6ZhtR3uhTYkQBQAAQnSw3juNF0qIqm72bvkyJi0xrH2KJEIUAAAISc9I1LE3Ju4OUbmjeCTqqMvljTHNkqxgpyRZlmWlh6VXAAAg6nSHqIKMntGllk6XOrrc/teJ8XalJsT5Q1Re2ugsbyAdI0RZlpV2ojoCAACi2+EGb4hqau/yH7vzmQ/04Ntl/tdxNqPff2qRalpiPEQZY7KPdt6yrLrh7Q4AAIhWF84bqxe2VQZUIE9JiPM/gdfc6ZLT5dGeqpaekajU0bsm6ljVr96VdzovWIEHS9KUYe8RAACISnd9bJ7u+ljgsdsvmKXbL5glSfrsX9fr+a2VKshIVHWsj0RZljX5RHUEAACMbJVN3uBUkJ7ImqjejDFZkqZJ8o/LWZb1Wjg6BQAARp5fXbVAB+vbNT4rSY3tXbIbyWE3anO6/G2MjJIc9gj2cviEFKKMMddLukVSoaSNkpZKWiPp7PB1DQAAjCQTspM1ITtZX31koyTJbUlz73ghoM3MgjQ99+UzI9G9YRdqnahbJJ0q6YBlWSskLZDUELZeAQCAEcvl9lZHMkZKircH/CTEj45RKCn06bwOy7I6jDEyxiRYlrXdGDMjrD0DAAAj0kXzx+mpTYdVMj1Pf7xucaS7EzahhqiDxphMSf+S9KIxpl7SgfB1CwAAjFQDPZnncntkM0Y2W7CH/keekKbzLMu6xLKsBsuy7pD0bUn3S/pIODsGAABGpoGezHtmS4Wm/c+z+u/HNkWiW8MupBBljFlqjEmTJMuyXpVUKu+6KAAAgAA9hTYDQ1R9q1NujyVH3OjYujfUb/EbSS29Xrf4jgEAAAToDlFv7anVT1/cqXf2eTc4qWt1SpKyk0fHpsShhihjWZZ/I2LLsjw6jhpTAAAgdrg8HknSC9sq9cvVu/SlhzZIkurbfCEqZXSEqFCD0F5jzJfUM/p0k6S94ekSAAAYyW6/YJbmF2aquqVTf1lzQHF270Ly7pGorFESokIdifqcpNMkHZJ0UNISSTeGq1MAAGDkmpKXqpvPmaaLTx4nScr1rY3yT+eNkhAV0kiUZVlVkq4Mc18AAMAo0r02qm+Iyhola6JC3fZlurxTefmWZc01xsyTdJFlWT8Ma+8AAMCI1bfUwZfPna7DDe2akJUcyW4Nm1Cn834v6ZuSuiTJsqz3xcgUAAA4iuoW78hTXqp35GnV3AJ9+vTJykiOj2S3hk2oISrZsqx3+hxzBW0JAACgnpGo7BSHXG
6Pej3oPyqEGqJqjDFTJVmSZIy5VNKRsPUKAACMeDW+7V/u+Pc2FX/rWS29c7We21IR4V4Nn1BLHHxB0n2SZhpjDknaJ+njYesVAAAY8U6bmqNXd1bL7bHk9liqbOrUt57YrC2HGv1tbEa6cN44zShIi2BPByfUp/P2SjrXGJMi7+hVm7xrotiEGAAABHXd8sm6bvlkWZalmd9+Tp0uj2pbnbrnld0B7TaUN+ivn1kSoV4O3lGn84wx6caYbxpj7jHGrJQ3PH1K0m5Jl5+IDgIAgJHNGKNPn14kSZpZkKavrZyur62crksWjI9sx4boWCNRf5VUL2mNpBskfUuSkXSJZVkbw9w3AAAwSuSkeMscLJuao5vPmSZJemrTYT2x4ZAO1bfrx89tV0KcXVcunqD89MRIdjVkxwpRUyzLOkmSjDF/kHcx+UTLsjrC3jMAADBqBNt82ObdDUZ7a1r169I9kqS2Lpe++aFZJ7x/g3GsENXV/YtlWW5jzEECFAAAOF7dmw/33jfv3Fn5+tElc9XQ1qU1e2r1xu6aSHVvUI4VouYbY5p8vxtJSb7XRpJlWVZ6WHsHAABGhS63JbvNKKdXiEqMt+vjSyZJklo7XXpjd43SEkItHBB5R+2pZVn2E9URAAAwet192Xz9+GPz5Bmg4GZLp7eGd8poCVEAAADDxWYzsskEPdcdolJHUIgKtWI5AABA2LR0eENUWiIhCgAAQJJ3vdOyO1fr8t+uGbBNz0jUyNmceOTEPQAAMCLVtHTqSGOH7LbgU3lS7zVRI2c5NiNRAAAgrKqbvRsR56YmDNimO0QxnQcAAOBT0xJCiPKtibLbbGp3uuX2BH+KL5qMnLgHAABGpOoWb6HNvLRjj0StuLtUkjQuI1EvfPWsqH5aj5EoAAAQVt3TeXmpjgHbnD+nQInxNiXGe6PJ4cYOHapvPyH9GyxCFAAACCv/dN5RRqJ+dsXJ2v6DD2n7Dz6k8ZlJkqRkR3QvMo/eMTIAADAqlEzPU2pCnOYVZobUvs3pndojRAEAgJh23pwCnTenIOT2rU63JOlXL+9WvN2oICNJ155WdNQSCZFAiAIAAFHDsiwlxNnkdHn0p7f2+48vmJipUyZmRa5jQRCiAABAWD2z+YhyUhxaPDlbxhx9NMkYoz9dt1jvHqiTJP35rQM61NAelSUPCFEAACBsWjpduunB95QQZ9P2H6wK6T0LJ2Vp4STvqNOTGw/rUEO7EuOib30UT+cBAICwqWzqkCQVZCQecxQqmI4u7/qo7tIH0ST6egQAAEaNykZviMpPSxzU+zu6PJKkBEaiAABALKnwjUTlZwwuRHW6GIkCAAAxqLLJW2izIH3gQptH09k9EhXPSBQAAIgh3Wui8tMHOZ3HSBQAAIhFVc2DD1Fuj6UutyVjJIc9+iILJQ4AAEDY/OqqU3THhzuVknD8kaP7ybyEONugnuwLN0IUAAAIG7vNaMwgp/I6Xd71UIlRuB5KYjoPAABEKX+NqCgsbyARogAAQJjUtTp18T1v6KuPbBzU+6O50KbEdB4AAAiTww3t2nSw0V8w83hFc6FNiZEoAAAQJpVDLLQZzeUNJEIUAAAIk9FcaFNiOg8AAIRJ95YvG8sb9Ne1B/TJpZMkSTUtnfp/z24P+p4PnVSgs2fmS+o9EhWDIcoYs0rSLyTZJf3Bsqy7Bmj3MUmPSTrVsqz14ewTAAA4MWy+0k47K1u0dk+tP0S1drr06LsHg77n3QP1/hDV6X86LzonzsIWoowxdkn3Slop6aCkdcaYpyzL2tanXZqkWyS9Ha6+AACAE+/GM6docm6KOrs8KsxK8h/PTnHoxx+bF9D2YEO7frl6l2aPS/cf64jh6bzFknZblrVXkowxD0u6WNK2Pu1+IOn/SbotjH0BAAAnWLIjThefPL7f8bTEeF1+6oR+x7+6cnrA644oH4kKZ6/GSyrv9fqg75ifMeYUSRMsy3o6jP0AAAAjULRXLI/YwnJjjE3STy
VdG0LbGyXdKEn5+fkqLS0NW79aWlrCen2EjnsRHbgP0YN7ET24F8Ovw2XJbUlJcZLNt0/e1n1OSVJ1xSGVltYEfV8k70U4Q9QhSb3H6gp9x7qlSZorqdS3qWCBpKeMMRf1XVxuWdZ9ku6TpEWLFlklJSVh63RpaanCeX2EjnsRHbgP0YN7ET24F8PvD6/v1Q+f/kDXnlakOy6aI0na5Nol7dipaVOKVFIyI+j7Inkvwjmdt07SNGPMZGOMQ9KVkp7qPmlZVqNlWbmWZRVZllUkaa2kfgEKAACMfk3tXZKkjKR4/7HuEgcJsbYmyrIsl6QvSnpe0geSHrEsa6sx5vvGmIvC9bkAAGDkaepwSZLSe4eorhiuE2VZ1jOSnulz7DsDtC0JZ18AAED0agwyEtW9sDxaSxxE5/gYAACIKd3TeemJPeM7sVziAAAAICRBR6K6orvEASEKAABEXFOHbySqV4hqdXrXSSU7ojNEsQExAACIuDs+PEfVLZ0B28O0Ob3TeUmEKAAAgOBOK87td6zdF6JSHNEZV5jOAwAAUaktyqfzCFEAACCimju69NMXdujBtw8EHI/26TxCFAAAiKiq5k798uXd+v1rewOOtzGdBwAAMLBgW75IPWuiGIkCAAAIortGVO/yBl1uj5xuj2wmBvfOAwAACEWwENU9lZfsiJMxJiL9OhZCFAAAiKieLV96QlS7P0RF51SeRIgCAAARVt/mDVHZKb1HoqK7vIFEiAIAABFmtxnlpjqUm5rgP9ZT3iA6n8yTqFgOAAAi7AsrivWFFcUBx3rKGzASBQAAELLu6bxoLW8gEaIAAEAUYmE5AADAMZz701d12p2rVdHY4T/WGuXVyiXWRAEAgAg73NCuNqdbqYk9saSd6TwAAICBdXS51eZ0y2G3BSwib2M6DwAAYGD1bU5JUlZKfEBl8t4Vy6MVIQoAAERMXasvRCU7Ao5TbBMAAOAoukNUdkrfEMV0HgAAwID8I1F9QlQ7FcsBAAAGNmtsur6+aqYm5yYHHG/p9E7npSZE70gUIQoAAETM9Pw0Tc9P63e8J0TF9zsXLZjOAwAAUafVF6JSGIkCAADor3RHldqdbi2enK2c1AT/8WZfiEpLjN6owkgUAACImHte3q3PP/iedlW1BBzvGYkiRAEAAPRT09IpScpLSwg43tLRvSaKEAUAANBPdXP/EOXxWCNiA2JCFAAAiIjWTpdanW454mxK6zXi1OqrVp7isMtmMwO9PeIIUQAAICL8U3mpCQH75vnLG0TxonKJp/MAAECEHGs9VHOHS7c8vMF/PDc1QbeeN0NJUbIVDCEKAABERF1rl6T+ISrZN7XX5nTryY2HA84tL87R2TPzT0wHj4EQBQAAImLl7Hzt/OGH1OFyBxwfn5mkx286TWW1bf5jv311j7ZXNJ/oLh4VIQoAAESMI84mR1z/JdqnTMzSKROz/K//sa5ckpQQFx1TeRILywEAwAjQ6RutSggSuCIlenoCAABiyu1PbNbHfvOWNpTVH7Ntp8sjSUqMj56RKKbzAABARGw93KRN5Q3yWMdu2x2iGIkCAAAxr8ZXrXxMn6fzgumZzouekShCFAAAOOEsy/Jv+ZKbGkKI6vKNRMVHT3SJnp4AAICY0dDWJafbo7TEuJCKZzKdBwAAIKmiqUOSVJCeGFJ7pvMAAAAkVTT6QlTGsUOUZVn+kahgNaUihafzAADACZefnqhrTyvSlLwUVTV3+PfL68sRZ9OYtERZlhRnkw7Utgacd4XyaF+YEKIAAMAJN3tcuu64aI4k6Rv/fF8P+yqS9zWvMEMPXr9EkmS3GZ39k1cDzk/JsOncs8Pb14EQogAAQETlpiZocm5K0HPjMpLU5faONnW6+o867Wv0hLVvR0OIAgAAEXXr+TN06/kzBjxvWZYuXVioTeUN/mNdbo/217YpcpN5hCgAABDljDG6+7L5AceONLRr2V0vR6hHXtGzxB0AACBE3UU3I1l7kxAFAA
BGnHi7N8LYTeT6QIgCAAAYBEIUAAAYcZy+4puuyD2cR4gCAAAjjz9ERfDxPEIUAADAIBCiMHr+gAAAGHZJREFUAAAABoEQBQAAMAiEKAAAgEEgRAEAAAwCIQoAAIw4xldkM4K1NglRAABg5ElJ8G7/m2CPXB8IUQAAAINAiAIAABgEQhQAABhxWjtdkqROd+T6QIgCAAAjjuXb7iWCu74QogAAAAaDEAUAADAIhCgAAIBBIEQBAAAMAiEKAABgEAhRAABgxHHEeSNMXAT3fSFEAQCAEccfoiKYZAhRAAAAg0CIAgAAI06X2yNJckew2iYhCgAAjDidXd4Q5fufiCBEAQAADAIhCgAAYBAIUQAAAINAiAIAABgEQhQAAMAgEKIAAAAGgRAFAABGnGSHXZKUQMVyAACA0Nls3k3zDHvnAQAAjCyEKAAAMOK0O92SJCcVywEAAELn9ng3zfOwdx4AAMDIQogCAAAYBEIUAADAIIQ1RBljVhljdhhjdhtjvhHk/FeNMduMMe8bY1YbYyaFsz8AAADDJWwhyhhjl3SvpA9Jmi3pKmPM7D7NNkhaZFnWPEmPSfpxuPoDAAAwnMI5ErVY0m7LsvZaluWU9LCki3s3sCzrFcuy2nwv10oqDGN/AADAKGH3Fdu0jdJim+Mllfd6fdB3bCCfkfRsGPsDAABGiSTfti+OCK7ujovcR/cwxnxC0iJJZw1w/kZJN0r6/+3dfZTU1Z3n8feXBhoVIwQVDZ0IMcAgQrrlSUOMoCgYIzoKo47O6JhdYxJx1XUNmlVZRzcYiboas4yjDGaSyagYVzy6A8NK+/wQVIzyoKgQbeL6QEJrg9A03Pmjyj4t8lD87Oqupt6vczhd9atf3bpV39PNp+/v9r307t2b2traovWloaGhqO2rcNaiNFiH0mEtSoe1aH8fN32yQFRqt1oUM0StBr7c4n5V/tinRMQ44MfAUSmljdtqKKV0O3A7wPDhw9OYMWNavbOfqK2tpZjtq3DWojRYh9JhLUqHtWh/9esbYcG/k1K0Wy2KOQj2O6B/RPSLiK7A6cDclidERA3wD8DElNJ7ReyLJEnajazPb/uycXfc9iWl1ARcAMwDlgH3pJSWRMQ1ETExf9oNQHfg3ohYHBFzt9OcJElSSSnqnKiU0sPAw1sdu6rF7XHFfH1JkqRiccVySZKkDAxRkiRJGRiiJEmSMjBESZIkZWCIkiRJHU5ll1yE6dKOScYQJUmSOpwuFbkIU7Gb7p0nSZK02zJESZKkDqexKbdUedPuuGK5JElSsTSHqLSTE4vIECVJkpSBIUqSJCkDQ5QkSVIGhihJkqQMDFGSJEkZGKIkSVKHE/lFNttxrU1DlCRJ6nj2quwMQGVF+/XBECVJkpSBIUqSJCkDQ5QkSepw1m1sAmDj5vbrgyFKkiR1OCm/3Us77vpiiJIkScrCECVJkpSBIUqSJCkDQ5QkSVIGhihJkqQMDFGSJKnD6do5F2E6t+O+L4YoSZLU4TSHqHZMMoYoSZKkDAxRkiSpw9m0eQsAm9txtU1DlCRJ6nA2bsqFqPyXdmGIkiRJysAQJUmSlIEhSpIkKQNDlCRJUgaGKEmSpAwMUZIkSRkYoiRJUoezZ9cKACpdsVySJKlwnTrlNs0L986TJEnqWAxRkiSpw/m4cTMAja5YLkmSVLjNW3Kb5m1x7zxJkqSOxRAlSZKUgSFKkiQpA0OUJElSBoYoSZKkDAxRkiSpw6nIL7bZycU2JUmSCrdHftuXrm77IkmS1LEYoiRJUoezJb/KZnKxTUmSpMKtz2/7stFtXyRJkjoWQ5QkSVIGhihJkqQMDFGSJEkZGKIkSZIyMERJkiRlYIiSJEkdTmWXXITp4orlkiRJhetSkYswFe6dJ0mS1LEYoiRJUofT2JRbqrzJFcslSZIK1xyi3DtPkiSpYzFESZIkZWCIkiRJysAQJUmSlIEhSpIkKQNDlCRJ6nAiv8hmO661aYiSJEkdz16VnQ
GorGi/PhiiJEmSMjBESZIkZWCIkiRJHc66jU0AbNzcfn0wREmSpA4n5bd7acddXwxRkiRJWRiiJEmSMjBESZIkZWCIkiRJysAQJUmSlIEhSpIkdThdO+ciTOd23PfFECVJkjqc5hDVjknGECVJkpRB5/bugCRJ5WDTpk3U1dWxYcOG9u7KbmHLlsQ/TjyQAJYtW9YqbXbr1o2qqiq6dOlS0PmGKEmS2kBdXR177703ffv2JaIdJ/LsJhqbtrD5/38IwKCqHp+7vZQSa9asoa6ujn79+hX0HC/nSZLUBjZs2ECvXr0MUCUqIujVq9cujRQaoiRJaiMGqNK2q/UxREmSVAbWrFlDdXU11dXVHHDAAfTp06f5fmNj4w6fu2jRIi688MJder2+ffsyZMgQqqurGTJkCA888MDn6f5n/P01/4O7Zt4KwFVXXcWCBQtatf1COCdKkqQy0KtXLxYvXgzAtGnT6N69O5deemnz401NTXTuvO1YMHz4cIYPH77Lr7lw4UL23XdfXn31VY477jhOOumkbJ3fiWuuuaYo7e6MI1GSJJWpc845h/PPP59Ro0Zx2WWX8dxzz3HEEUdQU1PDN77xDV599VUAamtr+c53vgPkAti5557LmDFj+OpXv8ott9yy09f58MMP6dmzZ/P9k08+mWHDhjF48GBuv/12ADZv3sw555zDoYceypAhQ7jpppsAeOONN5gwYQLDhg3jyCOPZPny5dt8H3PmzAFyI2BXX301hx12GEOGDGk+f926dZx77rmMHDmSmpqaVhkZcyRKkqQ21nfqQ0Vpd9X0E3b5OXV1dTz11FNUVFTw4Ycf8vjjj9O5c2cWLFjAFVdcwX333feZ5yxfvpyFCxfy0UcfMXDgQL7//e9vc1mAsWPHklLizTff5J577mk+PmvWLL74xS/y8ccfM2LECE499VRWrVrF6tWreeWVVwBYu3YtAOeddx4zZ86kf//+PPvss/zgBz/gkUce2eF72nfffXnhhRf4xS9+wYwZM7jjjju47rrrOProo5k1axZr165l5MiRjBs3jr322muXP7NPGKIkSSpjkydPpqKiAoD6+nrOPvtsVqxYQUSwadOmbT7nhBNOoLKyksrKSvbff3/effddqqqqPnPeJ5fz3njjDY455hjGjBlD9+7dueWWW7j//vsBePvtt1mxYgUDBw7kzTffZMqUKZxwwgkcd9xxNDQ08NRTTzF58uTmNjdu3AjAJ1PAt3VJ7ZRTTgFg2LBh/Pa3vwVg/vz5zJ07lxkzZgC5v5Z86623GDRo0C5/Zp8oaoiKiAnA/wIqgDtSStO3erwS+CUwDFgDnJZSWlXMPkmS1N6yjBgVS8uRmCuvvJKxY8dy//33s2rVKsaMGbPN51RWVjbfrqiooKmpaYevcfDBB9O7d2+WLl3K+vXrWbBgAU8//TR77rknY8aMYcOGDfTs2ZOXXnqJefPmMXPmTO655x5uvvlmevTo0TyX61Niq6/b6F/LvqWUuO+++xg4cOAO+7orijYnKiIqgNuA44FDgDMi4pCtTvsu8OeU0teAm4Dri9UfSZK0Y/X19fTp0weA2bNnt1q77733HitXruSggw6ivr6enj17sueee7J8+XKeeeYZAD744AO2bNnCqaeeyrXXXssLL7zAF77wBfr168e9994L5ILQSy+9lKkP48eP59ZbbyWlBMCLL774ud9XMSeWjwReTym9mVJqBP4V2Hpa/knAXfnbc4BjwkU0JElqF5dddhmXX345NTU1Ox1dKsTYsWOprq5m7NixTJ8+nd69ezNhwgSampoYNGgQU6dO5fDDDwdg9erVjBkzhurqas466yx+8pOfAPDrX/+aO++8k69//esMHjy4eUJ4Pgs1f92ZK6+8kk2bNjF06FAGDx7MlVde+bnfX6RCX31XG46YBExIKf2n/P2/AUallC5occ4r+XPq8vffyJ/zwfbaHT58eFq0aFFR+gy5v0DY3vCl2pa1KA3WoX
RYi9KRpRbLli37XPNv9GmNTVtYnt/2ZWgrbPvyiW3VKSKeTyl9Zo2HDjGxPCLOA84D6N27N7W1tUV7rYaGhqK2r8JZi9JgHUqHtSgdWWqxzz778NFHHxWnQ2Voc34QqBO06ue6YcOGgmtbzBC1Gvhyi/tV+WPbOqcuIjoD+5CbYP4pKaXbgdshNxJVzN/E/E2vdFiL0mAdSoe1KB1ZR6L23nvv4nSoDKWUGLRXYl1DQ6t+rt26daOmpqagc4s5J+p3QP+I6BcRXYHTgblbnTMXODt/exLwSCrW9UVJkrTbiAi6VHSiolP7TaUu2khUSqkpIi4A5pFb4mBWSmlJRFwDLEopzQXuBP45Il4H/kQuaEmSJJW8os6JSik9DDy81bGrWtzeAEze+nmSJEmlzr3zJEmSMjBESZJUBtasWUN1dTXV1dUccMAB9OnTp/l+Y2PjTp9fW1vLU089tc3HZs+ezX777Ud1dTWDBw9m0qRJrF+/vlX73717dwD++Mc/MmnSpFZtOytDlCRJZaBXr14sXryYxYsXc/7553PxxRc33+/atetOn7+jEAVw2mmnsXjxYpYsWULXrl25++67W7P7zb70pS8xZ86corS9qwxRkiSVqeeff56jjjqKYcOGMX78eN555x0AbrnlFg455BCGDh3K6aefzqpVq5g5cyY33XQT1dXVPP7449tts6mpiXXr1tGzZ08AHnzwQUaNGkVNTQ3jxo3j3XffBeDRRx9tHgmrqalpXuvphhtuYMSIEQwdOpSrr776M+2vWrWKQw89FMiNgJ155plMmDCB/v37c9lllzWfN3/+fI444ggOO+wwJk+eTENDQ+t8aC10iMU2JUna3fSd+tB2H/uffzmEvx71FQD+5dm3uOL+l7d7btbNjFNKTJkyhQceeID99tuPu+++mx//+MfMmjWL6dOns3LlSiorK1m7di09evTg/PPPp3v37lx66aXbbO/uu+/miSee4J133mHAgAGceOKJAHzzm9/kmWeeISK44447+OlPf8rPfvYzZsyYwW233cbo0aNpaGigW7duzJ8/nxUrVvDcc8+RUmLixIk89thjfOtb39ru+3j55ZdZvHgxlZWVDBw4kClTprDHHntw7bXXsmDBAvbaay+uv/56brzxRq666qrttpOFIUqSpDK0ceNGXnnlFY499lgANm/ezIEHHgjA0KFDOfPMMzn55JM5+eSTC2rvtNNO4+c//zkpJX74wx9yww03MHXqVOrq6jjttNN45513aGxspF+/fgCMHj2aSy65hDPPPJNTTjmFqqoq5s+fz/z585sXu2xoaGDFihU7DFFHHXUU++yzDwCHHHIIf/jDH1i7di1Lly5l9OjRADQ2NnLEEUdk+6B2wBAlSVI7KHQE6a9HfaV5VKo1pZQYPHgwTz/99Gcee+ihh3jsscd48MEHue6663j55e2PhG0tIjjxxBO59dZbmTp1KlOmTOGSSy5h4sSJ1NbWMm3aNACmTp3KCSecwMMPP8zo0aOZN28eKSUuv/xyvve97xX8ei3nc1VUVNDU1ERKiWOPPZbf/OY3BbeThXOiJEkqQ5WVlbz//vvNIWrTpk0sWbKELVu28PbbbzN27Fiuv/566uvrachvrVLoHnVPPPEEBx98MAD19fX06dMHgLvuuqv5nDfeeIMhQ4bwox/9iBEjRrB8+XLGjx/PrFmzmucvrV69mvfee2+X39vhhx/Ok08+yeuvvw7AunXreO2113a5nZ1xJEqSpDLUqVMn5syZw4UXXkh9fT1NTU1cdNFFDBgwgLPOOov6+npSSlx44YX06NGDE088kUmTJvHAAw9w6623cuSRR36qvU/mRG3ZsoWqqipmz54NwLRp05g8eTI9e/bk6KOPZuXKlQDcfPPNLFy4kE6dOjF48GCOP/54KisrWbZsWfOlt+7du/OrX/2K/ffff5fe23777cfs2bM544wz2LhxIwDXXnstAwYM+Jyf2qdFR9uqbvjw4W
nRokVFa98NPkuHtSgN1qF0WIvSkXUD4kGDBhWnQ2Xso48+atUNiLdVp4h4PqU0fOtzvZwnSZKUgSFKkiQpA0OUJElSBoYoSZLaSEebh1xudrU+hihJktpAt27dWLNmjUGqRKWUWLNmDd26dSv4OS5xIElSG6iqqqKuro7333+/vbuyW9mwYcMuBZ8d6datG1VVVQWfb4iSJKkNdOnSpXnLE7We2tra5m1i2pqX8yRJkjIwREmSJGVgiJIkScqgw237EhHvA38o4kvsC3xQxPZVOGtRGqxD6bAWpcNalI62qMVBKaX9tj7Y4UJUsUXEom3tj6O2Zy1Kg3UoHdaidFiL0tGetfByniRJUgaGKEmSpAwMUZ91e3t3QM2sRWmwDqXDWpQOa1E62q0WzomSJEnKwJEoSZKkDMo2REXEhIh4NSJej4ip23i8MiLuzj/+bET0bfte7v4KqMMlEbE0In4fEf8vIg5qj36Wg53VosV5p0ZEigj/MqlICqlFRPxV/ntjSUT8S1v3sVwU8DPqKxGxMCJezP+c+nZ79HN3FxGzIuK9iHhlO49HRNySr9PvI+KwtuhXWYaoiKgAbgOOBw4BzoiIQ7Y67bvAn1NKXwNuAq5v217u/gqsw4vA8JTSUGAO8NO27WV5KLAWRMTewH8Bnm3bHpaPQmoREf2By4HRKaXBwEVt3tEyUOD3xX8H7kkp1QCnA79o216WjdnAhB08fjzQP//vPOB/t0GfyjNEASOB11NKb6aUGoF/BU7a6pyTgLvyt+cAx0REtGEfy8FO65BSWphSWp+/+wxQ+Pba2hWFfE8A/D25Xyg2tGXnykwhtfjPwG0ppT8DpJTea+M+lotCapGAL+Rv7wP8sQ37VzZSSo8Bf9rBKScBv0w5zwA9IuLAYverXENUH+DtFvfr8se2eU5KqQmoB3q1Se/KRyF1aOm7wP8tao/K105rkR8e/3JK6aG27FgZKuT7YgAwICKejIhnImJHv6Eru0JqMQ04KyLqgIeBKW3TNW1lV/8/aRWdi/0CUmuIiLOA4cBR7d2XchQRnYAbgXPauSvK6UzussUYcqOzj0XEkJTS2nbtVXk6A5idUvpZRBwB/HNEHJpS2tLeHVPxletI1Grgyy3uV+WPbfOciOhMbph2TZv0rnwUUgciYhzwY2BiSmljG/Wt3OysFnsDhwK1EbEKOByY6+Tyoijk+6IOmJtS2pRSWgm8Ri5UqXUVUovvAvcApJSeBrqR28tNbaug/09aW7mGqN8B/SOiX0R0JTcZcO5W58wFzs7fngQ8klxUq7XttA4RUQP8A7kA5byP4tlhLVJK9SmlfVNKfVNKfcnNT5uYUlrUPt3drRXy8+n/kBuFIiL2JXd578227GSZKKQWbwHHAETEIHIh6v027aUgV5e/zf+V3uFAfUrpnWK/aFlezkspNUXEBcA8oAKYlVJaEhHXAItSSnOBO8kNy75ObjLb6e3X491TgXW4AegO3Juf1/9WSmliu3V6N1VgLdQGCqzFPOC4iFgKbAb+W0rJkfJWVmAt/ivwjxFxMblJ5uf4C3fri4jfkPvFYd/8/LOrgS4AKaWZ5OajfRt4HVgP/F2b9MtaS5Ik7bpyvZwnSZL0uRiiJEmSMjBESZIkZWCIkiRJysAQJUmSlIEhSlLJiYjNEbE4Il6JiHsjYs9WaPOa/MKt23v8/Ij428/7OpLKh0scSCo5EdGQUuqev/1r4PmU0o0tHu+c39NSktqNI1GSSt3jwNciYkxEPB4Rc4GlEVERETdExO8i4vcR8b1PnhARP4qIlyPipYiYnj82OyIm5W9Pj4il+efNyB+bFhGX5m9X5zf2/X1E3B8RPfPHayPi+oh4LiJei4gj2/rDkFQ6ynLFckkdQ37fyuOBf8sfOgw4NKW0MiLOI7e1w4iIqASejIj5wF8AJwGjUkrrI+KLW7XZC/hL4C9SSikiemzjpX8JTEkpPZpfnfpq4KL8Y51TSiMj4tv549u9RC
hp9+ZIlKRStEdELAYWkdub7M788efyG+4CHEdur6zFwLNAL3Kb8I4D/imltB4gpfSnrdquBzYAd0bEKeS2iGgWEfsAPVJKj+YP3QV8q8Upv81/fR7o+3nepKSOzZEoSaXo45RSdcsD+b0T17U8RG60aN5W543fUcP5/dBGkts0dhJwAXD0LvRtY/7rZvwZKpU1R6IkdVTzgO9HRBeAiBgQEXsB/w783Sd/0beNy3ndgX1SSg8DFwNfb/l4Sqke+HOL+U5/AzyKJG3F36IkdVR3kLuc9kLkhqneB05OKf1bRFQDiyKikdzu7le0eN7ewAMR0Y3caNYl22j7bGBmPoi9SRvtCC+pY3GJA0mSpAy8nCdJkpSBIUqSJCkDQ5QkSVIGhihJkqQMDFGSJEkZGKIkSZIyMERJkiRlYIiSJEnK4D8ACeHZRzHKI4YAAAAASUVORK5CYII=\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"plot_prc(\"Train Baseline\", train_labels, train_predictions_baseline, color=colors[0])\n",
"plot_prc(\"Test Baseline\", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')\n",
"plt.legend(loc='lower right');"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gpdsFyp64DhY"
},
"source": [
"It looks like the precision is relatively high, but the recall and the area under the ROC curve (AUC) aren't as high as you might like. Classifiers often struggle to maximize precision and recall at the same time, and this is especially true on imbalanced datasets. Consider the costs of the different types of errors in the context of the problem you care about. In this example, a false negative (a missed fraudulent transaction) may have a financial cost, while a false positive (a legitimate transaction incorrectly flagged as fraudulent) may decrease user happiness."
]
},
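{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the cost trade-off concrete, the sketch below scores thresholded predictions with a per-error cost. The toy labels and scores, and the costs `COST_FN = 100` and `COST_FP = 1`, are hypothetical placeholders chosen for illustration rather than values from this dataset. In practice you would substitute your own predictions (for example `test_predictions_baseline`) and your own estimates of what each type of error costs.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical per-error costs: missing a fraud is assumed to be\n",
"# 100x more expensive than flagging a legitimate transaction.\n",
"COST_FN = 100.0\n",
"COST_FP = 1.0\n",
"\n",
"def total_cost(labels, scores, threshold):\n",
"    # Total cost of the predictions made at `threshold` under the assumed costs.\n",
"    preds = scores >= threshold\n",
"    fn = np.sum((labels == 1) & ~preds)\n",
"    fp = np.sum((labels == 0) & preds)\n",
"    return COST_FN * fn + COST_FP * fp\n",
"\n",
"# Toy labels and model scores (illustrative only).\n",
"labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])\n",
"scores = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.45, 0.90])\n",
"\n",
"for t in [0.25, 0.5, 0.75]:\n",
"    print(f'threshold={t:.2f} cost={total_cost(labels, scores, t):.0f}')\n",
"```\n",
"\n",
"With these made-up costs the lowest threshold is cheapest (a cost of 5, versus 102 at 0.5 and 100 at 0.75), because a single missed fraud outweighs several false alarms. Different costs would favor a different threshold."
]
},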
{
"cell_type": "markdown",
"metadata": {
"id": "cveQoiMyGQCo"
},
"source": [
"## Class weights"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ePGp6GUE1WfH"
},
"source": [
"### Calculate class weights\n",
"\n",
"The goal is to identify fraudulent transactions, but you don't have very many of those positive samples to work with, so you want the classifier to heavily weight the few examples that are available. You can do this by passing Keras a weight for each class through the `class_weight` parameter of `Model.fit`. This causes the model to \"pay more attention\" to examples from the under-represented class."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qjGWErngGny7",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "d6e2e36b-e6ee-4972-9363-d31af7c60573"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Weight for class 0: 0.50\n",
"Weight for class 1: 289.44\n"
]
}
],
"source": [
"# Scaling by total/2 helps keep the loss at a similar magnitude.\n",
"# The sum of the weights of all examples stays the same (equal to `total`).\n",
"weight_for_0 = (1 / neg) * (total / 2.0)\n",
"weight_for_1 = (1 / pos) * (total / 2.0)\n",
"\n",
"class_weight = {0: weight_for_0, 1: weight_for_1}\n",
"\n",
"print('Weight for class 0: {:.2f}'.format(weight_for_0))\n",
"print('Weight for class 1: {:.2f}'.format(weight_for_1))"
]
},
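{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check of the formula above, the sketch below recomputes the weights from raw class counts. It assumes the class counts of the full dataset loaded earlier in the tutorial (`neg = 284315`, `pos = 492`). These counts reproduce the printed weights, and the sketch confirms that the weighted example count still sums to `total`.\n",
"\n",
"```python\n",
"# Assumed class counts for the full dataset (negatives, positives).\n",
"neg, pos = 284315, 492\n",
"total = neg + pos\n",
"\n",
"# Each class gets total / (2 * class_count), so both classes carry\n",
"# the same total weight, total / 2.\n",
"weight_for_0 = total / (2.0 * neg)\n",
"weight_for_1 = total / (2.0 * pos)\n",
"\n",
"print(f'Weight for class 0: {weight_for_0:.2f}')\n",
"print(f'Weight for class 1: {weight_for_1:.2f}')\n",
"\n",
"# Summing the per-example weights recovers the original example count.\n",
"print(neg * weight_for_0 + pos * weight_for_1, total)\n",
"```"
]
},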
{
"cell_type": "markdown",
"metadata": {
"id": "Mk1OOE2ZSHzy"
},
"source": [
"### Train a model with class weights\n",
"\n",
"Now try re-training and evaluating the model with class weights to see how that affects the predictions.\n",
"\n",
"Note: Using `class_weight` changes the range of the loss. Depending on the optimizer, this may affect the stability of training: optimizers whose step size depends on the magnitude of the gradient, like `tf.keras.optimizers.SGD`, may fail. The optimizer used here, `tf.keras.optimizers.Adam`, is largely unaffected by the scaling change. Also note that because of the weighting, the total losses are not comparable between the two models."
]
},
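{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conceptually, `class_weight` multiplies each example's loss by the weight of its class. The NumPy sketch below illustrates this for binary cross-entropy on a toy batch. It assumes the batch loss is the weighted sum divided by the batch size, which is our understanding of the default `sum_over_batch_size` reduction in Keras, so treat the exact normalization as an assumption. The toy labels and predictions are made up.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def binary_crossentropy(y_true, y_pred, eps=1e-7):\n",
"    # Per-example binary cross-entropy, clipped to avoid log(0).\n",
"    y_pred = np.clip(y_pred, eps, 1 - eps)\n",
"    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))\n",
"\n",
"# Weights as printed above (rounded).\n",
"toy_class_weight = {0: 0.50, 1: 289.44}\n",
"\n",
"# Toy batch: three legitimate transactions and one fraud.\n",
"y_true = np.array([0, 0, 0, 1])\n",
"y_pred = np.array([0.01, 0.02, 0.05, 0.30])\n",
"\n",
"per_example = binary_crossentropy(y_true, y_pred)\n",
"weights = np.array([toy_class_weight[int(y)] for y in y_true])\n",
"\n",
"unweighted_loss = per_example.mean()\n",
"# Assumed reduction: weighted sum divided by the number of examples.\n",
"weighted_loss = (weights * per_example).sum() / len(y_true)\n",
"\n",
"print(f'unweighted: {unweighted_loss:.4f}')\n",
"print(f'weighted:   {weighted_loss:.4f}')\n",
"```\n",
"\n",
"Here the single fraud example dominates the weighted loss. This is why the loss range changes and why the two models' losses should not be compared directly."
]
},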
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UJ589fn8ST3x",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a1e65a44-e8dc-4867-9747-fb9ca94c37fa"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1/100\n",
"90/90 [==============================] - 3s 12ms/step - loss: 1.9099 - tp: 161.0000 - fp: 633.0000 - tn: 238177.0000 - fn: 267.0000 - accuracy: 0.9962 - precision: 0.2028 - recall: 0.3762 - auc: 0.8187 - prc: 0.1817 - val_loss: 0.0118 - val_tp: 41.0000 - val_fp: 37.0000 - val_tn: 45468.0000 - val_fn: 23.0000 - val_accuracy: 0.9987 - val_precision: 0.5256 - val_recall: 0.6406 - val_auc: 0.9546 - val_prc: 0.4972\n",
"Epoch 2/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.7187 - tp: 212.0000 - fp: 1554.0000 - tn: 180392.0000 - fn: 118.0000 - accuracy: 0.9908 - precision: 0.1200 - recall: 0.6424 - auc: 0.9072 - prc: 0.3170 - val_loss: 0.0191 - val_tp: 55.0000 - val_fp: 82.0000 - val_tn: 45423.0000 - val_fn: 9.0000 - val_accuracy: 0.9980 - val_precision: 0.4015 - val_recall: 0.8594 - val_auc: 0.9555 - val_prc: 0.6340\n",
"Epoch 3/100\n",
"90/90 [==============================] - 0s 3ms/step - loss: 0.4208 - tp: 252.0000 - fp: 2475.0000 - tn: 179471.0000 - fn: 78.0000 - accuracy: 0.9860 - precision: 0.0924 - recall: 0.7636 - auc: 0.9538 - prc: 0.3165 - val_loss: 0.0284 - val_tp: 56.0000 - val_fp: 292.0000 - val_tn: 45213.0000 - val_fn: 8.0000 - val_accuracy: 0.9934 - val_precision: 0.1609 - val_recall: 0.8750 - val_auc: 0.9604 - val_prc: 0.6295\n",
"Epoch 4/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.3554 - tp: 261.0000 - fp: 3416.0000 - tn: 178530.0000 - fn: 69.0000 - accuracy: 0.9809 - precision: 0.0710 - recall: 0.7909 - auc: 0.9613 - prc: 0.2963 - val_loss: 0.0399 - val_tp: 56.0000 - val_fp: 491.0000 - val_tn: 45014.0000 - val_fn: 8.0000 - val_accuracy: 0.9890 - val_precision: 0.1024 - val_recall: 0.8750 - val_auc: 0.9644 - val_prc: 0.6223\n",
"Epoch 5/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.3585 - tp: 267.0000 - fp: 4435.0000 - tn: 177511.0000 - fn: 63.0000 - accuracy: 0.9753 - precision: 0.0568 - recall: 0.8091 - auc: 0.9551 - prc: 0.2496 - val_loss: 0.0508 - val_tp: 56.0000 - val_fp: 640.0000 - val_tn: 44865.0000 - val_fn: 8.0000 - val_accuracy: 0.9858 - val_precision: 0.0805 - val_recall: 0.8750 - val_auc: 0.9642 - val_prc: 0.5909\n",
"Epoch 6/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.3524 - tp: 276.0000 - fp: 5129.0000 - tn: 176817.0000 - fn: 54.0000 - accuracy: 0.9716 - precision: 0.0511 - recall: 0.8364 - auc: 0.9485 - prc: 0.2396 - val_loss: 0.0602 - val_tp: 56.0000 - val_fp: 769.0000 - val_tn: 44736.0000 - val_fn: 8.0000 - val_accuracy: 0.9829 - val_precision: 0.0679 - val_recall: 0.8750 - val_auc: 0.9636 - val_prc: 0.5765\n",
"Epoch 7/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.2563 - tp: 288.0000 - fp: 5541.0000 - tn: 176405.0000 - fn: 42.0000 - accuracy: 0.9694 - precision: 0.0494 - recall: 0.8727 - auc: 0.9687 - prc: 0.2447 - val_loss: 0.0654 - val_tp: 56.0000 - val_fp: 816.0000 - val_tn: 44689.0000 - val_fn: 8.0000 - val_accuracy: 0.9819 - val_precision: 0.0642 - val_recall: 0.8750 - val_auc: 0.9639 - val_prc: 0.5735\n",
"Epoch 8/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.2904 - tp: 287.0000 - fp: 5962.0000 - tn: 175984.0000 - fn: 43.0000 - accuracy: 0.9671 - precision: 0.0459 - recall: 0.8697 - auc: 0.9553 - prc: 0.2233 - val_loss: 0.0698 - val_tp: 56.0000 - val_fp: 845.0000 - val_tn: 44660.0000 - val_fn: 8.0000 - val_accuracy: 0.9813 - val_precision: 0.0622 - val_recall: 0.8750 - val_auc: 0.9668 - val_prc: 0.5538\n",
"Epoch 9/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.2646 - tp: 289.0000 - fp: 6549.0000 - tn: 175397.0000 - fn: 41.0000 - accuracy: 0.9638 - precision: 0.0423 - recall: 0.8758 - auc: 0.9626 - prc: 0.2052 - val_loss: 0.0805 - val_tp: 56.0000 - val_fp: 985.0000 - val_tn: 44520.0000 - val_fn: 8.0000 - val_accuracy: 0.9782 - val_precision: 0.0538 - val_recall: 0.8750 - val_auc: 0.9665 - val_prc: 0.5125\n",
"Epoch 10/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.2696 - tp: 287.0000 - fp: 6687.0000 - tn: 175259.0000 - fn: 43.0000 - accuracy: 0.9631 - precision: 0.0412 - recall: 0.8697 - auc: 0.9648 - prc: 0.2014 - val_loss: 0.0834 - val_tp: 58.0000 - val_fp: 1006.0000 - val_tn: 44499.0000 - val_fn: 6.0000 - val_accuracy: 0.9778 - val_precision: 0.0545 - val_recall: 0.9062 - val_auc: 0.9676 - val_prc: 0.5071\n",
"Epoch 11/100\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.2782 - tp: 286.0000 - fp: 6986.0000 - tn: 174960.0000 - fn: 44.0000 - accuracy: 0.9614 - precision: 0.0393 - recall: 0.8667 - auc: 0.9617 - prc: 0.1935 - val_loss: 0.0853 - val_tp: 58.0000 - val_fp: 1015.0000 - val_tn: 44490.0000 - val_fn: 6.0000 - val_accuracy: 0.9776 - val_precision: 0.0541 - val_recall: 0.9062 - val_auc: 0.9671 - val_prc: 0.5073\n",
"Epoch 12/100\n",
"82/90 [==========================>...] - ETA: 0s - loss: 0.2518 - tp: 275.0000 - fp: 6622.0000 - tn: 161004.0000 - fn: 35.0000 - accuracy: 0.9604 - precision: 0.0399 - recall: 0.8871 - auc: 0.9664 - prc: 0.1989Restoring model weights from the end of the best epoch: 2.\n",
"90/90 [==============================] - 0s 4ms/step - loss: 0.2601 - tp: 292.0000 - fp: 7205.0000 - tn: 174741.0000 - fn: 38.0000 - accuracy: 0.9603 - precision: 0.0389 - recall: 0.8848 - auc: 0.9631 - prc: 0.1924 - val_loss: 0.0842 - val_tp: 58.0000 - val_fp: 988.0000 - val_tn: 44517.0000 - val_fn: 6.0000 - val_accuracy: 0.9782 - val_precision: 0.0554 - val_recall: 0.9062 - val_auc: 0.9676 - val_prc: 0.5183\n",
"Epoch 12: early stopping\n"
]
}
],
"source": [
"weighted_model = make_model()\n",
"weighted_model.load_weights(initial_weights)\n",
"\n",
"weighted_history = weighted_model.fit(\n",
" train_features,\n",
" train_labels,\n",
" batch_size=BATCH_SIZE,\n",
" epochs=EPOCHS,\n",
" callbacks=[early_stopping],\n",
" validation_data=(val_features, val_labels),\n",
" # The class weights go here\n",
" class_weight=class_weight) "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "R0ynYRO0G3Lx"
},
"source": [
"### Check training history"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BBe9FMO5ucTC",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 610
},
"outputId": "e36725bb-f0ec-45d9-e52a-a8c8c47c69ad"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x720 with 4 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"plot_metrics(weighted_history)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "REy6WClTZIwQ"
},
"source": [
"### Evaluate metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nifqscPGw-5w",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "63f36338-8e78-4687-ea8c-d41f3ebcbc6b"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"90/90 [==============================] - 0s 954us/step\n",
"28/28 [==============================] - 0s 999us/step\n"
]
}
],
"source": [
"train_predictions_weighted = weighted_model.predict(train_features, batch_size=BATCH_SIZE)\n",
"test_predictions_weighted = weighted_model.predict(test_features, batch_size=BATCH_SIZE)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "owKL2vdMBJr6",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 636
},
"outputId": "c5bc6b0e-e4aa-422d-c6fe-a9e2447337f3"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"loss : 0.019699709489941597\n",
"tp : 87.0\n",
"fp : 119.0\n",
"tn : 56745.0\n",
"fn : 11.0\n",
"accuracy : 0.9977177977561951\n",
"precision : 0.4223301112651825\n",
"recall : 0.8877550959587097\n",
"auc : 0.9575821757316589\n",
"prc : 0.6527646780014038\n",
"\n",
"Legitimate Transactions Detected (True Negatives): 56745\n",
"Legitimate Transactions Incorrectly Detected (False Positives): 119\n",
"Fraudulent Transactions Missed (False Negatives): 11\n",
"Fraudulent Transactions Detected (True Positives): 87\n",
"Total Fraudulent Transactions: 98\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 360x360 with 2 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"weighted_results = weighted_model.evaluate(test_features, test_labels,\n",
" batch_size=BATCH_SIZE, verbose=0)\n",
"for name, value in zip(weighted_model.metrics_names, weighted_results):\n",
" print(name, ': ', value)\n",
"print()\n",
"\n",
"plot_cm(test_labels, test_predictions_weighted)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PTh1rtDn8r4-"
},
"source": [
"Here you can see that with class weights, the accuracy and precision are lower because there are more false positives, but the recall and AUC are higher because the model also finds more true positives. Despite its lower accuracy, this model identifies more fraudulent transactions. Of course, both types of error have a cost: you wouldn't want to bug users by flagging too many legitimate transactions as fraudulent, either. Carefully consider the trade-offs between these different types of errors for your application."
]
},
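{
"cell_type": "markdown",
"metadata": {},
"source": [
"The printed metrics follow directly from the four confusion-matrix counts. The sketch below recomputes accuracy, precision, and recall from the test-set counts printed above (tp=87, fp=119, tn=56745, fn=11). It shows how a model can reach about 99.8% accuracy while fewer than half of its fraud alerts are correct.\n",
"\n",
"```python\n",
"# Test-set confusion-matrix counts for the weighted model (printed above).\n",
"tp, fp, tn, fn = 87, 119, 56745, 11\n",
"\n",
"accuracy = (tp + tn) / (tp + fp + tn + fn)\n",
"precision = tp / (tp + fp)  # share of flagged transactions that are fraud\n",
"recall = tp / (tp + fn)     # share of fraudulent transactions that are caught\n",
"\n",
"print(f'accuracy:  {accuracy:.4f}')\n",
"print(f'precision: {precision:.4f}')\n",
"print(f'recall:    {recall:.4f}')\n",
"```\n",
"\n",
"These values agree with the `accuracy`, `precision`, and `recall` reported by `evaluate` above."
]
},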
{
"cell_type": "markdown",
"metadata": {
"id": "hXDAwyr0HYdX"
},
"source": [
"### Plot the ROC"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3hzScIVZS1Xm",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 606
},
"outputId": "4b28710a-7518-44be-b646-6a01380dd47b"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x720 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"plot_roc(\"Train Baseline\", train_labels, train_predictions_baseline, color=colors[0])\n",
"plot_roc(\"Test Baseline\", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')\n",
"\n",
"plot_roc(\"Train Weighted\", train_labels, train_predictions_weighted, color=colors[1])\n",
"plot_roc(\"Test Weighted\", test_labels, test_predictions_weighted, color=colors[1], linestyle='--')\n",
"\n",
"\n",
"plt.legend(loc='lower right');"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_0krS8g1OTbD"
},
"source": [
"### Plot the AUPRC"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7jHnmVebOWOC",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 606
},
"outputId": "e945381c-9e35-4b41-e952-8a936098b1f6"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x720 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"plot_prc(\"Train Baseline\", train_labels, train_predictions_baseline, color=colors[0])\n",
"plot_prc(\"Test Baseline\", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')\n",
"\n",
"plot_prc(\"Train Weighted\", train_labels, train_predictions_weighted, color=colors[1])\n",
"plot_prc(\"Test Weighted\", test_labels, test_predictions_weighted, color=colors[1], linestyle='--')\n",
"\n",
"\n",
"plt.legend(loc='lower right');"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5ysRtr6xHnXP"
},
"source": [
"## Oversampling"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "18VUHNc-UF5w"
},
"source": [
"### Oversample the minority class\n",
"\n",
"A related approach is to resample the dataset by oversampling the minority class: positive examples are repeated until both classes are equally represented."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "sHirNp6u7OWp"
},
"outputs": [],
"source": [
"pos_features = train_features[bool_train_labels]\n",
"neg_features = train_features[~bool_train_labels]\n",
"\n",
"pos_labels = train_labels[bool_train_labels]\n",
"neg_labels = train_labels[~bool_train_labels]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WgBVbX7P7QrL"
},
"source": [
"#### Using NumPy\n",
"\n",
"You can balance the dataset manually by choosing the right number of random indices from the positive examples, sampling with replacement:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BUzGjSkwqT88",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "39b397f1-c8aa-4435-83f7-b02c38bb15fe"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(181946, 29)"
]
},
"metadata": {},
"execution_count": 43
}
],
"source": [
"ids = np.arange(len(pos_features))\n",
"choices = np.random.choice(ids, len(neg_features))\n",
"\n",
"res_pos_features = pos_features[choices]\n",
"res_pos_labels = pos_labels[choices]\n",
"\n",
"res_pos_features.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7ie_FFet6cep",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "0bbd51e5-f0b8-48eb-8b10-6d9804107996"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(363892, 29)"
]
},
"metadata": {},
"execution_count": 44
}
],
"source": [
"resampled_features = np.concatenate([res_pos_features, neg_features], axis=0)\n",
"resampled_labels = np.concatenate([res_pos_labels, neg_labels], axis=0)\n",
"\n",
"order = np.arange(len(resampled_labels))\n",
"np.random.shuffle(order)\n",
"resampled_features = resampled_features[order]\n",
"resampled_labels = resampled_labels[order]\n",
"\n",
"resampled_features.shape"
]
},
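{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can try the same NumPy recipe on a small synthetic dataset to confirm that the result is balanced. The toy arrays and the seeded `np.random.default_rng` generator below are illustrative choices and not part of the original tutorial. Like `np.random.choice` above, `rng.choice` samples with replacement by default, so some positive examples appear more than once.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(seed=0)\n",
"\n",
"# Toy imbalanced data: 95 negatives and 5 positives, 3 features each.\n",
"features = rng.normal(size=(100, 3))\n",
"labels = np.array([0] * 95 + [1] * 5)\n",
"bool_labels = labels != 0\n",
"\n",
"pos_features, neg_features = features[bool_labels], features[~bool_labels]\n",
"pos_labels, neg_labels = labels[bool_labels], labels[~bool_labels]\n",
"\n",
"# Draw as many positive indices (with replacement) as there are negatives.\n",
"choices = rng.choice(len(pos_features), size=len(neg_features))\n",
"res_pos_features = pos_features[choices]\n",
"res_pos_labels = pos_labels[choices]\n",
"\n",
"resampled_features = np.concatenate([res_pos_features, neg_features], axis=0)\n",
"resampled_labels = np.concatenate([res_pos_labels, neg_labels], axis=0)\n",
"\n",
"# Shuffle features and labels with the same permutation.\n",
"order = rng.permutation(len(resampled_labels))\n",
"resampled_features = resampled_features[order]\n",
"resampled_labels = resampled_labels[order]\n",
"\n",
"print(resampled_features.shape, resampled_labels.mean())\n",
"```"
]
},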
{
"cell_type": "markdown",
"metadata": {
"id": "IYfJe2Kc-FAz"
},
"source": [
"#### Using `tf.data`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "usyixaST8v5P"
},
"source": [
"If you're using `tf.data`, the easiest way to produce balanced examples is to start with a `positive` and a `negative` dataset and merge them. See [the tf.data guide](../../guide/data.ipynb) for more examples."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yF4OZ-rI6xb6"
},
"outputs": [],
"source": [
"BUFFER_SIZE = 100000\n",
"\n",
"def make_ds(features, labels):\n",
" ds = tf.data.Dataset.from_tensor_slices((features, labels))#.cache()\n",
" ds = ds.shuffle(BUFFER_SIZE).repeat()\n",
" return ds\n",
"\n",
"pos_ds = make_ds(pos_features, pos_labels)\n",
"neg_ds = make_ds(neg_features, neg_labels)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RNQUx-OA-oJc"
},
"source": [
"Each dataset provides `(feature, label)` pairs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "llXc9rNH7Fbz",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "8e8060f1-2a62-435a-b351-618bfb197048"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Features:\n",
" [-5. 5. -5. 5. -5. 1.93373045\n",
" -5. -5. -5. -5. 3.15412161 -5.\n",
" -1.5283744 0.12136576 -3.38718508 -5. -5. -5.\n",
" -1.5258247 5. -5. 5. 4.44827287 0.84823809\n",
" -3.07621607 -0.95345978 -5. -1.01224308 -1.45701771]\n",
"\n",
"Label: 1\n"
]
}
],
"source": [
"for features, label in pos_ds.take(1):\n",
" print(\"Features:\\n\", features.numpy())\n",
" print()\n",
" print(\"Label: \", label.numpy())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sLEfjZO0-vbN"
},
"source": [
"Merge the two together using `tf.data.Dataset.sample_from_datasets`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "e7w9UQPT9wzE"
},
"outputs": [],
"source": [
"resampled_ds = tf.data.Dataset.sample_from_datasets([pos_ds, neg_ds], weights=[0.5, 0.5])\n",
"resampled_ds = resampled_ds.batch(BATCH_SIZE).prefetch(2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "EWXARdTdAuQK",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "57a91a39-6efb-4613-810a-ac1f0c6c9e10"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"0.49169921875\n"
]
}
],
"source": [
"for features, label in resampled_ds.take(1):\n",
" print(label.numpy().mean())"
]
},
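{
"cell_type": "markdown",
"metadata": {},
"source": [
"A batch drawn with `weights=[0.5, 0.5]` contains a random number of positives, so its mean label is only approximately 0.5. Treating each element's draw as an independent coin flip, the number of positives in a batch is binomial. The sketch below assumes `BATCH_SIZE = 2048`, the value defined earlier in the tutorial (consistent with 0.49169921875 = 1007/2048). It checks that the observed mean is within about one standard deviation of 0.5.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"BATCH_SIZE = 2048  # assumed, as defined earlier in the tutorial\n",
"p = 0.5            # sampling weight of the positive dataset\n",
"\n",
"# Standard deviation of the positive fraction in one batch (binomial model).\n",
"std = math.sqrt(p * (1 - p) / BATCH_SIZE)\n",
"\n",
"observed = 0.49169921875  # mean label printed above\n",
"z = (observed - p) / std\n",
"\n",
"print(f'std of batch fraction: {std:.4f}')\n",
"print(f'z-score of observed batch: {z:.2f}')\n",
"```"
]
},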
{
"cell_type": "markdown",
"metadata": {
"id": "irgqf3YxAyN0"
},
"source": [
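"Because `resampled_ds` repeats indefinitely, you need to tell `Model.fit` how many steps make up one epoch.\n",
"\n",
"The definition of \"epoch\" in this case is less clear. Say it's the number of batches required to see each negative example once. Since roughly half of each batch is negative, that is `2 * neg / BATCH_SIZE` batches:"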
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xH-7K46AAxpq",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "8842a3fb-4a7d-41a0-b13a-bc1c36ffc163"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"278.0"
]
},
"metadata": {},
"execution_count": 49
}
],
"source": [
"resampled_steps_per_epoch = np.ceil(2.0*neg/BATCH_SIZE)\n",
"resampled_steps_per_epoch"
]
},
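{
"cell_type": "markdown",
"metadata": {},
"source": [
"The printed value can be reproduced by hand. The sketch assumes `BATCH_SIZE = 2048` and that `neg` is the negative count of the full dataset (284,315 in this tutorial's data). Under those assumptions, `2 * neg / BATCH_SIZE` is about 277.7, which rounds up to 278. The training split alone has 181,946 negatives (the length of `res_pos_features` printed above), so one such epoch shows each training negative about 1.5 times on average. Using `len(neg_features)` instead would give 178 steps.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"BATCH_SIZE = 2048   # assumed, as defined earlier in the tutorial\n",
"neg_full = 284315   # assumed negatives in the full dataset (`neg`)\n",
"neg_train = 181946  # training negatives, len(neg_features) above\n",
"\n",
"# Half of each balanced batch is negative, so seeing N negatives\n",
"# takes about N / (BATCH_SIZE / 2) = 2 * N / BATCH_SIZE batches.\n",
"steps_full = math.ceil(2.0 * neg_full / BATCH_SIZE)\n",
"steps_train = math.ceil(2.0 * neg_train / BATCH_SIZE)\n",
"\n",
"# Average number of times each training negative is seen per epoch.\n",
"passes = steps_full * BATCH_SIZE / 2 / neg_train\n",
"\n",
"print(steps_full, steps_train, round(passes, 2))\n",
"```"
]
},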
{
"cell_type": "markdown",
"metadata": {
"id": "XZ1BvEpcBVHP"
},
"source": [
"### Train on the oversampled data\n",
"\n",
"Now try training the model with the resampled dataset instead of using class weights to see how the two methods compare.\n",
"\n",
"Note: Because the data was balanced by replicating the positive examples, the total dataset size is larger, and each epoch runs for more training steps."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "soRQ89JYqd6b",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "6f1fefa5-7e31-4987-ce19-94a4d585e249"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1/100\n",
"278/278 [==============================] - 18s 58ms/step - loss: 0.4877 - tp: 250573.0000 - fp: 100739.0000 - tn: 240932.0000 - fn: 34062.0000 - accuracy: 0.7848 - precision: 0.7132 - recall: 0.8803 - auc: 0.9083 - prc: 0.9243 - val_loss: 0.3054 - val_tp: 56.0000 - val_fp: 2631.0000 - val_tn: 42874.0000 - val_fn: 8.0000 - val_accuracy: 0.9421 - val_precision: 0.0208 - val_recall: 0.8750 - val_auc: 0.9636 - val_prc: 0.7582\n",
"Epoch 2/100\n",
"278/278 [==============================] - 16s 58ms/step - loss: 0.2275 - tp: 259894.0000 - fp: 22063.0000 - tn: 262829.0000 - fn: 24558.0000 - accuracy: 0.9181 - precision: 0.9218 - recall: 0.9137 - auc: 0.9688 - prc: 0.9761 - val_loss: 0.1678 - val_tp: 57.0000 - val_fp: 1190.0000 - val_tn: 44315.0000 - val_fn: 7.0000 - val_accuracy: 0.9737 - val_precision: 0.0457 - val_recall: 0.8906 - val_auc: 0.9760 - val_prc: 0.7727\n",
"Epoch 3/100\n",
"278/278 [==============================] - 16s 58ms/step - loss: 0.1802 - tp: 261140.0000 - fp: 12903.0000 - tn: 271842.0000 - fn: 23459.0000 - accuracy: 0.9361 - precision: 0.9529 - recall: 0.9176 - auc: 0.9796 - prc: 0.9837 - val_loss: 0.1198 - val_tp: 58.0000 - val_fp: 977.0000 - val_tn: 44528.0000 - val_fn: 6.0000 - val_accuracy: 0.9784 - val_precision: 0.0560 - val_recall: 0.9062 - val_auc: 0.9804 - val_prc: 0.7648\n",
"Epoch 4/100\n",
"278/278 [==============================] - 17s 62ms/step - loss: 0.1553 - tp: 262766.0000 - fp: 10062.0000 - tn: 274653.0000 - fn: 21863.0000 - accuracy: 0.9439 - precision: 0.9631 - recall: 0.9232 - auc: 0.9848 - prc: 0.9874 - val_loss: 0.0975 - val_tp: 58.0000 - val_fp: 946.0000 - val_tn: 44559.0000 - val_fn: 6.0000 - val_accuracy: 0.9791 - val_precision: 0.0578 - val_recall: 0.9062 - val_auc: 0.9800 - val_prc: 0.7437\n",
"Epoch 5/100\n",
"278/278 [==============================] - 16s 58ms/step - loss: 0.1396 - tp: 264369.0000 - fp: 9105.0000 - tn: 275445.0000 - fn: 20425.0000 - accuracy: 0.9481 - precision: 0.9667 - recall: 0.9283 - auc: 0.9880 - prc: 0.9896 - val_loss: 0.0824 - val_tp: 58.0000 - val_fp: 811.0000 - val_tn: 44694.0000 - val_fn: 6.0000 - val_accuracy: 0.9821 - val_precision: 0.0667 - val_recall: 0.9062 - val_auc: 0.9824 - val_prc: 0.7337\n",
"Epoch 6/100\n",
"278/278 [==============================] - 16s 58ms/step - loss: 0.1289 - tp: 265424.0000 - fp: 8267.0000 - tn: 276412.0000 - fn: 19241.0000 - accuracy: 0.9517 - precision: 0.9698 - recall: 0.9324 - auc: 0.9899 - prc: 0.9910 - val_loss: 0.0741 - val_tp: 58.0000 - val_fp: 757.0000 - val_tn: 44748.0000 - val_fn: 6.0000 - val_accuracy: 0.9833 - val_precision: 0.0712 - val_recall: 0.9062 - val_auc: 0.9830 - val_prc: 0.7015\n",
"Epoch 7/100\n",
"278/278 [==============================] - 17s 60ms/step - loss: 0.1217 - tp: 266675.0000 - fp: 7973.0000 - tn: 276277.0000 - fn: 18419.0000 - accuracy: 0.9536 - precision: 0.9710 - recall: 0.9354 - auc: 0.9911 - prc: 0.9919 - val_loss: 0.0695 - val_tp: 58.0000 - val_fp: 771.0000 - val_tn: 44734.0000 - val_fn: 6.0000 - val_accuracy: 0.9829 - val_precision: 0.0700 - val_recall: 0.9062 - val_auc: 0.9825 - val_prc: 0.6844\n",
"Epoch 8/100\n",
"278/278 [==============================] - 16s 59ms/step - loss: 0.1158 - tp: 266707.0000 - fp: 7594.0000 - tn: 277546.0000 - fn: 17497.0000 - accuracy: 0.9559 - precision: 0.9723 - recall: 0.9384 - auc: 0.9921 - prc: 0.9926 - val_loss: 0.0643 - val_tp: 58.0000 - val_fp: 719.0000 - val_tn: 44786.0000 - val_fn: 6.0000 - val_accuracy: 0.9841 - val_precision: 0.0746 - val_recall: 0.9062 - val_auc: 0.9820 - val_prc: 0.6849\n",
"Epoch 9/100\n",
"278/278 [==============================] - 16s 57ms/step - loss: 0.1109 - tp: 267739.0000 - fp: 7329.0000 - tn: 277205.0000 - fn: 17071.0000 - accuracy: 0.9571 - precision: 0.9734 - recall: 0.9401 - auc: 0.9929 - prc: 0.9933 - val_loss: 0.0610 - val_tp: 59.0000 - val_fp: 731.0000 - val_tn: 44774.0000 - val_fn: 5.0000 - val_accuracy: 0.9838 - val_precision: 0.0747 - val_recall: 0.9219 - val_auc: 0.9775 - val_prc: 0.6758\n",
"Epoch 10/100\n",
"278/278 [==============================] - 15s 56ms/step - loss: 0.1072 - tp: 267584.0000 - fp: 7277.0000 - tn: 277699.0000 - fn: 16784.0000 - accuracy: 0.9577 - precision: 0.9735 - recall: 0.9410 - auc: 0.9934 - prc: 0.9935 - val_loss: 0.0573 - val_tp: 59.0000 - val_fp: 682.0000 - val_tn: 44823.0000 - val_fn: 5.0000 - val_accuracy: 0.9849 - val_precision: 0.0796 - val_recall: 0.9219 - val_auc: 0.9778 - val_prc: 0.6765\n",
"Epoch 11/100\n",
"278/278 [==============================] - 16s 56ms/step - loss: 0.1028 - tp: 268128.0000 - fp: 7172.0000 - tn: 277865.0000 - fn: 16179.0000 - accuracy: 0.9590 - precision: 0.9739 - recall: 0.9431 - auc: 0.9941 - prc: 0.9942 - val_loss: 0.0545 - val_tp: 59.0000 - val_fp: 673.0000 - val_tn: 44832.0000 - val_fn: 5.0000 - val_accuracy: 0.9851 - val_precision: 0.0806 - val_recall: 0.9219 - val_auc: 0.9784 - val_prc: 0.6766\n",
"Epoch 12/100\n",
"278/278 [==============================] - ETA: 0s - loss: 0.0986 - tp: 269097.0000 - fp: 6990.0000 - tn: 277546.0000 - fn: 15711.0000 - accuracy: 0.9601 - precision: 0.9747 - recall: 0.9448 - auc: 0.9946 - prc: 0.9946Restoring model weights from the end of the best epoch: 2.\n",
"278/278 [==============================] - 16s 58ms/step - loss: 0.0986 - tp: 269097.0000 - fp: 6990.0000 - tn: 277546.0000 - fn: 15711.0000 - accuracy: 0.9601 - precision: 0.9747 - recall: 0.9448 - auc: 0.9946 - prc: 0.9946 - val_loss: 0.0504 - val_tp: 58.0000 - val_fp: 589.0000 - val_tn: 44916.0000 - val_fn: 6.0000 - val_accuracy: 0.9869 - val_precision: 0.0896 - val_recall: 0.9062 - val_auc: 0.9789 - val_prc: 0.6771\n",
"Epoch 12: early stopping\n"
]
}
],
"source": [
"resampled_model = make_model()\n",
"resampled_model.load_weights(initial_weights)\n",
"\n",
"# Reset the bias to zero, since this dataset is balanced.\n",
"output_layer = resampled_model.layers[-1] \n",
"output_layer.bias.assign([0])\n",
"\n",
"val_ds = tf.data.Dataset.from_tensor_slices((val_features, val_labels)).cache()\n",
"val_ds = val_ds.batch(BATCH_SIZE).prefetch(2) \n",
"\n",
"resampled_history = resampled_model.fit(\n",
" resampled_ds,\n",
" epochs=EPOCHS,\n",
" steps_per_epoch=resampled_steps_per_epoch,\n",
" callbacks=[early_stopping],\n",
" validation_data=val_ds)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "avALvzUp3T_c"
},
"source": [
"If the training process were considering the whole dataset on each gradient update, this oversampling would be basically identical to the class weighting.\n",
"\n",
"But when training the model batch-wise, as you did here, the oversampled data provides a smoother gradient signal: Instead of each positive example being shown in one batch with a large weight, they're shown in many different batches each time with a small weight. \n",
"\n",
"This smoother gradient signal makes it easier to train the model."
]
},
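{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the full-batch claim concrete, here is a minimal NumPy sketch on a hypothetical toy dataset with fixed predicted probabilities. Repeating each positive example `k` times gives exactly the same mean binary cross-entropy as weighting each positive by `k` and normalizing by the total weight. Because the two losses agree for any predictions, their gradients agree too. Averaging the weighted loss over the number of examples, rather than over the total weight, only rescales it by a constant for a fixed dataset. The names `k`, `labels`, `probs` and `bce` are illustrative and are not part of the tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical toy dataset: 8 negatives, 2 positives, and fixed model outputs.\n",
"rng = np.random.default_rng(0)\n",
"labels = np.array([0] * 8 + [1] * 2)\n",
"probs = rng.uniform(0.05, 0.95, size=labels.size)\n",
"\n",
"def bce(y, p):\n",
"    # Per-example binary cross-entropy.\n",
"    return -(y * np.log(p) + (1 - y) * np.log(1 - p))\n",
"\n",
"k = 4  # weight per positive example, or number of copies when oversampling\n",
"\n",
"# Class weighting: positives count k times, normalized by the total weight.\n",
"weights = np.where(labels == 1, k, 1)\n",
"weighted_loss = np.sum(weights * bce(labels, probs)) / np.sum(weights)\n",
"\n",
"# Oversampling: literally repeat every positive example k times.\n",
"oversampled_labels = np.repeat(labels, weights)\n",
"oversampled_probs = np.repeat(probs, weights)\n",
"oversampled_loss = np.mean(bce(oversampled_labels, oversampled_probs))\n",
"\n",
"print(weighted_loss, oversampled_loss)\n",
"print(np.isclose(weighted_loss, oversampled_loss))"
]
},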
{
"cell_type": "markdown",
"metadata": {
"id": "klHZ0HV76VC5"
},
"source": [
"### Check training history\n",
"\n",
"Note that the distributions of metrics will be different here, because the training data has a totally different distribution from the validation and test data. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YoUGfr1vuivl",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 610
},
"outputId": "2c4f2b84-a1f8-49dc-b3f6-a4d1e955dfd6"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x720 with 4 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"plot_metrics(resampled_history)"
]
},
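{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick way to see why the training and validation curves differ is to hold a classifier's recall (true positive rate) and false positive rate fixed and change only the fraction of positive examples. The sketch below is plain arithmetic on the validation counts logged in the last epoch of the first resampled run (`val_tp`, `val_fn`, `val_fp`, `val_tn`). At the validation set's prevalence of roughly 0.14% it reproduces the logged `val_precision` of about 0.09. The same rates applied to 50/50 data, like the resampled training set, would give a precision of roughly 0.99."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Validation counts from the last epoch of the first resampled run.\n",
"tp, fn, fp, tn = 58, 6, 589, 44916\n",
"\n",
"tpr = tp / (tp + fn)  # recall (true positive rate)\n",
"fpr = fp / (fp + tn)  # false positive rate\n",
"\n",
"def precision_at(prevalence, tpr, fpr):\n",
"    # Expected precision of a classifier with fixed TPR and FPR on data\n",
"    # where a fraction `prevalence` of the examples is positive.\n",
"    return tpr * prevalence / (tpr * prevalence + fpr * (1 - prevalence))\n",
"\n",
"val_prevalence = (tp + fn) / (tp + fn + fp + tn)\n",
"print(f'validation prevalence: {val_prevalence:.4%}')\n",
"print(f'precision at validation prevalence: {precision_at(val_prevalence, tpr, fpr):.4f}')\n",
"print(f'precision at 50% prevalence: {precision_at(0.5, tpr, fpr):.4f}')"
]
},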
{
"cell_type": "markdown",
"metadata": {
"id": "1PuH3A2vnwrh"
},
"source": [
"### Re-train\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KFLxRL8eoDE5"
},
"source": [
"Because training is easier on the balanced data, the above training procedure may overfit quickly. \n",
"\n",
"So break up the epochs to give the `tf.keras.callbacks.EarlyStopping` finer control over when to stop training."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "e_yn9I26qAHU",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "2ad10c68-4d73-4209-8647-91dc7e43f022"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1/1000\n",
"20/20 [==============================] - 3s 95ms/step - loss: 1.2206 - tp: 12496.0000 - fp: 14495.0000 - tn: 51590.0000 - fn: 7948.0000 - accuracy: 0.7406 - precision: 0.4630 - recall: 0.6112 - auc: 0.8062 - prc: 0.6422 - val_loss: 1.0122 - val_tp: 61.0000 - val_fp: 34841.0000 - val_tn: 10664.0000 - val_fn: 3.0000 - val_accuracy: 0.2354 - val_precision: 0.0017 - val_recall: 0.9531 - val_auc: 0.8489 - val_prc: 0.0334\n",
"Epoch 2/1000\n",
"20/20 [==============================] - 1s 63ms/step - loss: 0.8277 - tp: 15989.0000 - fp: 13466.0000 - tn: 7202.0000 - fn: 4303.0000 - accuracy: 0.5662 - precision: 0.5428 - recall: 0.7879 - auc: 0.7348 - prc: 0.8128 - val_loss: 0.9382 - val_tp: 62.0000 - val_fp: 32218.0000 - val_tn: 13287.0000 - val_fn: 2.0000 - val_accuracy: 0.2929 - val_precision: 0.0019 - val_recall: 0.9688 - val_auc: 0.9321 - val_prc: 0.2633\n",
"Epoch 3/1000\n",
"20/20 [==============================] - 1s 61ms/step - loss: 0.6473 - tp: 17710.0000 - fp: 12206.0000 - tn: 8309.0000 - fn: 2735.0000 - accuracy: 0.6352 - precision: 0.5920 - recall: 0.8662 - auc: 0.8327 - prc: 0.8829 - val_loss: 0.8464 - val_tp: 62.0000 - val_fp: 27746.0000 - val_tn: 17759.0000 - val_fn: 2.0000 - val_accuracy: 0.3911 - val_precision: 0.0022 - val_recall: 0.9688 - val_auc: 0.9385 - val_prc: 0.4402\n",
"Epoch 4/1000\n",
"20/20 [==============================] - 1s 61ms/step - loss: 0.5544 - tp: 18412.0000 - fp: 10958.0000 - tn: 9563.0000 - fn: 2027.0000 - accuracy: 0.6830 - precision: 0.6269 - recall: 0.9008 - auc: 0.8811 - prc: 0.9170 - val_loss: 0.7529 - val_tp: 61.0000 - val_fp: 21846.0000 - val_tn: 23659.0000 - val_fn: 3.0000 - val_accuracy: 0.5205 - val_precision: 0.0028 - val_recall: 0.9531 - val_auc: 0.9415 - val_prc: 0.5780\n",
"Epoch 5/1000\n",
"20/20 [==============================] - 1s 66ms/step - loss: 0.4857 - tp: 18868.0000 - fp: 9194.0000 - tn: 11226.0000 - fn: 1672.0000 - accuracy: 0.7347 - precision: 0.6724 - recall: 0.9186 - auc: 0.9101 - prc: 0.9363 - val_loss: 0.6678 - val_tp: 60.0000 - val_fp: 15709.0000 - val_tn: 29796.0000 - val_fn: 4.0000 - val_accuracy: 0.6552 - val_precision: 0.0038 - val_recall: 0.9375 - val_auc: 0.9446 - val_prc: 0.6466\n",
"Epoch 6/1000\n",
"20/20 [==============================] - 1s 63ms/step - loss: 0.4444 - tp: 18864.0000 - fp: 7907.0000 - tn: 12480.0000 - fn: 1709.0000 - accuracy: 0.7652 - precision: 0.7046 - recall: 0.9169 - auc: 0.9193 - prc: 0.9428 - val_loss: 0.5958 - val_tp: 58.0000 - val_fp: 11120.0000 - val_tn: 34385.0000 - val_fn: 6.0000 - val_accuracy: 0.7558 - val_precision: 0.0052 - val_recall: 0.9062 - val_auc: 0.9477 - val_prc: 0.6954\n",
"Epoch 7/1000\n",
"20/20 [==============================] - 1s 61ms/step - loss: 0.4031 - tp: 18870.0000 - fp: 6564.0000 - tn: 13901.0000 - fn: 1625.0000 - accuracy: 0.8001 - precision: 0.7419 - recall: 0.9207 - auc: 0.9313 - prc: 0.9509 - val_loss: 0.5337 - val_tp: 58.0000 - val_fp: 8170.0000 - val_tn: 37335.0000 - val_fn: 6.0000 - val_accuracy: 0.8206 - val_precision: 0.0070 - val_recall: 0.9062 - val_auc: 0.9501 - val_prc: 0.7155\n",
"Epoch 8/1000\n",
"20/20 [==============================] - 1s 61ms/step - loss: 0.3746 - tp: 18931.0000 - fp: 5558.0000 - tn: 14768.0000 - fn: 1703.0000 - accuracy: 0.8227 - precision: 0.7730 - recall: 0.9175 - auc: 0.9354 - prc: 0.9546 - val_loss: 0.4823 - val_tp: 57.0000 - val_fp: 6281.0000 - val_tn: 39224.0000 - val_fn: 7.0000 - val_accuracy: 0.8620 - val_precision: 0.0090 - val_recall: 0.8906 - val_auc: 0.9526 - val_prc: 0.7334\n",
"Epoch 9/1000\n",
"20/20 [==============================] - 1s 66ms/step - loss: 0.3537 - tp: 18794.0000 - fp: 4814.0000 - tn: 15606.0000 - fn: 1746.0000 - accuracy: 0.8398 - precision: 0.7961 - recall: 0.9150 - auc: 0.9395 - prc: 0.9575 - val_loss: 0.4398 - val_tp: 56.0000 - val_fp: 5080.0000 - val_tn: 40425.0000 - val_fn: 8.0000 - val_accuracy: 0.8883 - val_precision: 0.0109 - val_recall: 0.8750 - val_auc: 0.9556 - val_prc: 0.7405\n",
"Epoch 10/1000\n",
"20/20 [==============================] - 1s 56ms/step - loss: 0.3313 - tp: 18621.0000 - fp: 4201.0000 - tn: 16408.0000 - fn: 1730.0000 - accuracy: 0.8552 - precision: 0.8159 - recall: 0.9150 - auc: 0.9458 - prc: 0.9606 - val_loss: 0.4038 - val_tp: 56.0000 - val_fp: 4309.0000 - val_tn: 41196.0000 - val_fn: 8.0000 - val_accuracy: 0.9053 - val_precision: 0.0128 - val_recall: 0.8750 - val_auc: 0.9575 - val_prc: 0.7422\n",
"Epoch 11/1000\n",
"20/20 [==============================] - 1s 69ms/step - loss: 0.3138 - tp: 18697.0000 - fp: 3637.0000 - tn: 16917.0000 - fn: 1709.0000 - accuracy: 0.8695 - precision: 0.8372 - recall: 0.9163 - auc: 0.9503 - prc: 0.9637 - val_loss: 0.3724 - val_tp: 56.0000 - val_fp: 3653.0000 - val_tn: 41852.0000 - val_fn: 8.0000 - val_accuracy: 0.9197 - val_precision: 0.0151 - val_recall: 0.8750 - val_auc: 0.9590 - val_prc: 0.7549\n",
"Epoch 12/1000\n",
"20/20 [==============================] - 1s 63ms/step - loss: 0.3006 - tp: 18641.0000 - fp: 3245.0000 - tn: 17335.0000 - fn: 1739.0000 - accuracy: 0.8783 - precision: 0.8517 - recall: 0.9147 - auc: 0.9523 - prc: 0.9650 - val_loss: 0.3453 - val_tp: 56.0000 - val_fp: 3160.0000 - val_tn: 42345.0000 - val_fn: 8.0000 - val_accuracy: 0.9305 - val_precision: 0.0174 - val_recall: 0.8750 - val_auc: 0.9606 - val_prc: 0.7683\n",
"Epoch 13/1000\n",
"20/20 [==============================] - 1s 65ms/step - loss: 0.2857 - tp: 18754.0000 - fp: 2880.0000 - tn: 17461.0000 - fn: 1865.0000 - accuracy: 0.8842 - precision: 0.8669 - recall: 0.9095 - auc: 0.9538 - prc: 0.9668 - val_loss: 0.3227 - val_tp: 56.0000 - val_fp: 2838.0000 - val_tn: 42667.0000 - val_fn: 8.0000 - val_accuracy: 0.9375 - val_precision: 0.0194 - val_recall: 0.8750 - val_auc: 0.9620 - val_prc: 0.7580\n",
"Epoch 14/1000\n",
"20/20 [==============================] - 1s 58ms/step - loss: 0.2797 - tp: 18542.0000 - fp: 2622.0000 - tn: 17914.0000 - fn: 1882.0000 - accuracy: 0.8900 - precision: 0.8761 - recall: 0.9079 - auc: 0.9553 - prc: 0.9669 - val_loss: 0.3035 - val_tp: 56.0000 - val_fp: 2587.0000 - val_tn: 42918.0000 - val_fn: 8.0000 - val_accuracy: 0.9431 - val_precision: 0.0212 - val_recall: 0.8750 - val_auc: 0.9638 - val_prc: 0.7585\n",
"Epoch 15/1000\n",
"20/20 [==============================] - 1s 58ms/step - loss: 0.2647 - tp: 18770.0000 - fp: 2401.0000 - tn: 18028.0000 - fn: 1761.0000 - accuracy: 0.8984 - precision: 0.8866 - recall: 0.9142 - auc: 0.9605 - prc: 0.9707 - val_loss: 0.2854 - val_tp: 56.0000 - val_fp: 2329.0000 - val_tn: 43176.0000 - val_fn: 8.0000 - val_accuracy: 0.9487 - val_precision: 0.0235 - val_recall: 0.8750 - val_auc: 0.9652 - val_prc: 0.7585\n",
"Epoch 16/1000\n",
"20/20 [==============================] - 1s 60ms/step - loss: 0.2584 - tp: 18723.0000 - fp: 2221.0000 - tn: 18192.0000 - fn: 1824.0000 - accuracy: 0.9012 - precision: 0.8940 - recall: 0.9112 - auc: 0.9612 - prc: 0.9712 - val_loss: 0.2702 - val_tp: 56.0000 - val_fp: 2111.0000 - val_tn: 43394.0000 - val_fn: 8.0000 - val_accuracy: 0.9535 - val_precision: 0.0258 - val_recall: 0.8750 - val_auc: 0.9669 - val_prc: 0.7711\n",
"Epoch 17/1000\n",
"20/20 [==============================] - 1s 58ms/step - loss: 0.2492 - tp: 18733.0000 - fp: 1949.0000 - tn: 18428.0000 - fn: 1850.0000 - accuracy: 0.9073 - precision: 0.9058 - recall: 0.9101 - auc: 0.9630 - prc: 0.9723 - val_loss: 0.2575 - val_tp: 56.0000 - val_fp: 1990.0000 - val_tn: 43515.0000 - val_fn: 8.0000 - val_accuracy: 0.9562 - val_precision: 0.0274 - val_recall: 0.8750 - val_auc: 0.9682 - val_prc: 0.7728\n",
"Epoch 18/1000\n",
"20/20 [==============================] - 1s 60ms/step - loss: 0.2431 - tp: 18968.0000 - fp: 1825.0000 - tn: 18409.0000 - fn: 1758.0000 - accuracy: 0.9125 - precision: 0.9122 - recall: 0.9152 - auc: 0.9648 - prc: 0.9740 - val_loss: 0.2456 - val_tp: 56.0000 - val_fp: 1874.0000 - val_tn: 43631.0000 - val_fn: 8.0000 - val_accuracy: 0.9587 - val_precision: 0.0290 - val_recall: 0.8750 - val_auc: 0.9690 - val_prc: 0.7719\n",
"Epoch 19/1000\n",
"20/20 [==============================] - 1s 64ms/step - loss: 0.2368 - tp: 18651.0000 - fp: 1792.0000 - tn: 18716.0000 - fn: 1801.0000 - accuracy: 0.9123 - precision: 0.9123 - recall: 0.9119 - auc: 0.9664 - prc: 0.9746 - val_loss: 0.2339 - val_tp: 56.0000 - val_fp: 1743.0000 - val_tn: 43762.0000 - val_fn: 8.0000 - val_accuracy: 0.9616 - val_precision: 0.0311 - val_recall: 0.8750 - val_auc: 0.9700 - val_prc: 0.7718\n",
"Epoch 20/1000\n",
"20/20 [==============================] - 1s 61ms/step - loss: 0.2317 - tp: 18580.0000 - fp: 1689.0000 - tn: 18941.0000 - fn: 1750.0000 - accuracy: 0.9160 - precision: 0.9167 - recall: 0.9139 - auc: 0.9672 - prc: 0.9751 - val_loss: 0.2230 - val_tp: 56.0000 - val_fp: 1624.0000 - val_tn: 43881.0000 - val_fn: 8.0000 - val_accuracy: 0.9642 - val_precision: 0.0333 - val_recall: 0.8750 - val_auc: 0.9708 - val_prc: 0.7726\n",
"Epoch 21/1000\n",
"20/20 [==============================] - 1s 67ms/step - loss: 0.2215 - tp: 18809.0000 - fp: 1441.0000 - tn: 19000.0000 - fn: 1710.0000 - accuracy: 0.9231 - precision: 0.9288 - recall: 0.9167 - auc: 0.9706 - prc: 0.9774 - val_loss: 0.2137 - val_tp: 56.0000 - val_fp: 1533.0000 - val_tn: 43972.0000 - val_fn: 8.0000 - val_accuracy: 0.9662 - val_precision: 0.0352 - val_recall: 0.8750 - val_auc: 0.9718 - val_prc: 0.7728\n",
"Epoch 22/1000\n",
"20/20 [==============================] - 1s 62ms/step - loss: 0.2174 - tp: 18765.0000 - fp: 1434.0000 - tn: 19070.0000 - fn: 1691.0000 - accuracy: 0.9237 - precision: 0.9290 - recall: 0.9173 - auc: 0.9721 - prc: 0.9782 - val_loss: 0.2047 - val_tp: 56.0000 - val_fp: 1453.0000 - val_tn: 44052.0000 - val_fn: 8.0000 - val_accuracy: 0.9679 - val_precision: 0.0371 - val_recall: 0.8750 - val_auc: 0.9729 - val_prc: 0.7720\n",
"Epoch 23/1000\n",
"20/20 [==============================] - 1s 58ms/step - loss: 0.2177 - tp: 18658.0000 - fp: 1361.0000 - tn: 19162.0000 - fn: 1779.0000 - accuracy: 0.9233 - precision: 0.9320 - recall: 0.9130 - auc: 0.9708 - prc: 0.9774 - val_loss: 0.1963 - val_tp: 56.0000 - val_fp: 1385.0000 - val_tn: 44120.0000 - val_fn: 8.0000 - val_accuracy: 0.9694 - val_precision: 0.0389 - val_recall: 0.8750 - val_auc: 0.9735 - val_prc: 0.7718\n",
"Epoch 24/1000\n",
"20/20 [==============================] - 1s 71ms/step - loss: 0.2111 - tp: 18755.0000 - fp: 1289.0000 - tn: 19190.0000 - fn: 1726.0000 - accuracy: 0.9264 - precision: 0.9357 - recall: 0.9157 - auc: 0.9725 - prc: 0.9787 - val_loss: 0.1894 - val_tp: 56.0000 - val_fp: 1338.0000 - val_tn: 44167.0000 - val_fn: 8.0000 - val_accuracy: 0.9705 - val_precision: 0.0402 - val_recall: 0.8750 - val_auc: 0.9742 - val_prc: 0.7719\n",
"Epoch 25/1000\n",
"20/20 [==============================] - 1s 58ms/step - loss: 0.2078 - tp: 18632.0000 - fp: 1243.0000 - tn: 19357.0000 - fn: 1728.0000 - accuracy: 0.9275 - precision: 0.9375 - recall: 0.9151 - auc: 0.9734 - prc: 0.9790 - val_loss: 0.1828 - val_tp: 56.0000 - val_fp: 1288.0000 - val_tn: 44217.0000 - val_fn: 8.0000 - val_accuracy: 0.9716 - val_precision: 0.0417 - val_recall: 0.8750 - val_auc: 0.9749 - val_prc: 0.7719\n",
"Epoch 26/1000\n",
"20/20 [==============================] - 1s 62ms/step - loss: 0.2066 - tp: 18646.0000 - fp: 1218.0000 - tn: 19331.0000 - fn: 1765.0000 - accuracy: 0.9272 - precision: 0.9387 - recall: 0.9135 - auc: 0.9734 - prc: 0.9792 - val_loss: 0.1763 - val_tp: 57.0000 - val_fp: 1236.0000 - val_tn: 44269.0000 - val_fn: 7.0000 - val_accuracy: 0.9727 - val_precision: 0.0441 - val_recall: 0.8906 - val_auc: 0.9756 - val_prc: 0.7728\n",
"Epoch 27/1000\n",
"20/20 [==============================] - 1s 65ms/step - loss: 0.2019 - tp: 18634.0000 - fp: 1206.0000 - tn: 19405.0000 - fn: 1715.0000 - accuracy: 0.9287 - precision: 0.9392 - recall: 0.9157 - auc: 0.9748 - prc: 0.9802 - val_loss: 0.1705 - val_tp: 57.0000 - val_fp: 1196.0000 - val_tn: 44309.0000 - val_fn: 7.0000 - val_accuracy: 0.9736 - val_precision: 0.0455 - val_recall: 0.8906 - val_auc: 0.9761 - val_prc: 0.7727\n",
"Epoch 28/1000\n",
"20/20 [==============================] - 1s 64ms/step - loss: 0.1991 - tp: 18543.0000 - fp: 1122.0000 - tn: 19530.0000 - fn: 1765.0000 - accuracy: 0.9295 - precision: 0.9429 - recall: 0.9131 - auc: 0.9751 - prc: 0.9805 - val_loss: 0.1655 - val_tp: 57.0000 - val_fp: 1176.0000 - val_tn: 44329.0000 - val_fn: 7.0000 - val_accuracy: 0.9740 - val_precision: 0.0462 - val_recall: 0.8906 - val_auc: 0.9766 - val_prc: 0.7729\n",
"Epoch 29/1000\n",
"20/20 [==============================] - 1s 60ms/step - loss: 0.1923 - tp: 18703.0000 - fp: 996.0000 - tn: 19518.0000 - fn: 1743.0000 - accuracy: 0.9331 - precision: 0.9494 - recall: 0.9148 - auc: 0.9770 - prc: 0.9818 - val_loss: 0.1607 - val_tp: 57.0000 - val_fp: 1152.0000 - val_tn: 44353.0000 - val_fn: 7.0000 - val_accuracy: 0.9746 - val_precision: 0.0471 - val_recall: 0.8906 - val_auc: 0.9771 - val_prc: 0.7728\n",
"Epoch 30/1000\n",
"20/20 [==============================] - 1s 58ms/step - loss: 0.1929 - tp: 18448.0000 - fp: 1077.0000 - tn: 19699.0000 - fn: 1736.0000 - accuracy: 0.9313 - precision: 0.9448 - recall: 0.9140 - auc: 0.9765 - prc: 0.9811 - val_loss: 0.1558 - val_tp: 57.0000 - val_fp: 1119.0000 - val_tn: 44386.0000 - val_fn: 7.0000 - val_accuracy: 0.9753 - val_precision: 0.0485 - val_recall: 0.8906 - val_auc: 0.9775 - val_prc: 0.7728\n",
"Epoch 31/1000\n",
"20/20 [==============================] - 1s 56ms/step - loss: 0.1877 - tp: 18924.0000 - fp: 1000.0000 - tn: 19264.0000 - fn: 1772.0000 - accuracy: 0.9323 - precision: 0.9498 - recall: 0.9144 - auc: 0.9777 - prc: 0.9826 - val_loss: 0.1525 - val_tp: 58.0000 - val_fp: 1119.0000 - val_tn: 44386.0000 - val_fn: 6.0000 - val_accuracy: 0.9753 - val_precision: 0.0493 - val_recall: 0.9062 - val_auc: 0.9782 - val_prc: 0.7743\n",
"Epoch 32/1000\n",
"20/20 [==============================] - 1s 68ms/step - loss: 0.1858 - tp: 18856.0000 - fp: 1030.0000 - tn: 19365.0000 - fn: 1709.0000 - accuracy: 0.9331 - precision: 0.9482 - recall: 0.9169 - auc: 0.9783 - prc: 0.9827 - val_loss: 0.1486 - val_tp: 58.0000 - val_fp: 1096.0000 - val_tn: 44409.0000 - val_fn: 6.0000 - val_accuracy: 0.9758 - val_precision: 0.0503 - val_recall: 0.9062 - val_auc: 0.9785 - val_prc: 0.7760\n",
"Epoch 33/1000\n",
"20/20 [==============================] - 1s 65ms/step - loss: 0.1829 - tp: 18624.0000 - fp: 941.0000 - tn: 19714.0000 - fn: 1681.0000 - accuracy: 0.9360 - precision: 0.9519 - recall: 0.9172 - auc: 0.9793 - prc: 0.9832 - val_loss: 0.1442 - val_tp: 58.0000 - val_fp: 1073.0000 - val_tn: 44432.0000 - val_fn: 6.0000 - val_accuracy: 0.9763 - val_precision: 0.0513 - val_recall: 0.9062 - val_auc: 0.9786 - val_prc: 0.7770\n",
"Epoch 34/1000\n",
"20/20 [==============================] - 1s 72ms/step - loss: 0.1827 - tp: 18705.0000 - fp: 931.0000 - tn: 19628.0000 - fn: 1696.0000 - accuracy: 0.9359 - precision: 0.9526 - recall: 0.9169 - auc: 0.9789 - prc: 0.9831 - val_loss: 0.1410 - val_tp: 58.0000 - val_fp: 1068.0000 - val_tn: 44437.0000 - val_fn: 6.0000 - val_accuracy: 0.9764 - val_precision: 0.0515 - val_recall: 0.9062 - val_auc: 0.9792 - val_prc: 0.7768\n",
"Epoch 35/1000\n",
"20/20 [==============================] - 1s 61ms/step - loss: 0.1802 - tp: 18719.0000 - fp: 962.0000 - tn: 19621.0000 - fn: 1658.0000 - accuracy: 0.9360 - precision: 0.9511 - recall: 0.9186 - auc: 0.9795 - prc: 0.9835 - val_loss: 0.1372 - val_tp: 58.0000 - val_fp: 1043.0000 - val_tn: 44462.0000 - val_fn: 6.0000 - val_accuracy: 0.9770 - val_precision: 0.0527 - val_recall: 0.9062 - val_auc: 0.9793 - val_prc: 0.7771\n",
"Epoch 36/1000\n",
"20/20 [==============================] - 1s 59ms/step - loss: 0.1802 - tp: 18922.0000 - fp: 890.0000 - tn: 19473.0000 - fn: 1675.0000 - accuracy: 0.9374 - precision: 0.9551 - recall: 0.9187 - auc: 0.9796 - prc: 0.9837 - val_loss: 0.1341 - val_tp: 58.0000 - val_fp: 1026.0000 - val_tn: 44479.0000 - val_fn: 6.0000 - val_accuracy: 0.9774 - val_precision: 0.0535 - val_recall: 0.9062 - val_auc: 0.9798 - val_prc: 0.7773\n",
"Epoch 37/1000\n",
"20/20 [==============================] - 1s 65ms/step - loss: 0.1738 - tp: 18896.0000 - fp: 853.0000 - tn: 19536.0000 - fn: 1675.0000 - accuracy: 0.9383 - precision: 0.9568 - recall: 0.9186 - auc: 0.9807 - prc: 0.9847 - val_loss: 0.1313 - val_tp: 58.0000 - val_fp: 1021.0000 - val_tn: 44484.0000 - val_fn: 6.0000 - val_accuracy: 0.9775 - val_precision: 0.0538 - val_recall: 0.9062 - val_auc: 0.9795 - val_prc: 0.7771\n",
"Epoch 38/1000\n",
"20/20 [==============================] - 1s 64ms/step - loss: 0.1726 - tp: 18924.0000 - fp: 833.0000 - tn: 19530.0000 - fn: 1673.0000 - accuracy: 0.9388 - precision: 0.9578 - recall: 0.9188 - auc: 0.9814 - prc: 0.9850 - val_loss: 0.1295 - val_tp: 58.0000 - val_fp: 1040.0000 - val_tn: 44465.0000 - val_fn: 6.0000 - val_accuracy: 0.9770 - val_precision: 0.0528 - val_recall: 0.9062 - val_auc: 0.9798 - val_prc: 0.7640\n",
"Epoch 39/1000\n",
"20/20 [==============================] - 1s 61ms/step - loss: 0.1722 - tp: 18804.0000 - fp: 850.0000 - tn: 19648.0000 - fn: 1658.0000 - accuracy: 0.9388 - precision: 0.9568 - recall: 0.9190 - auc: 0.9816 - prc: 0.9852 - val_loss: 0.1266 - val_tp: 58.0000 - val_fp: 1024.0000 - val_tn: 44481.0000 - val_fn: 6.0000 - val_accuracy: 0.9774 - val_precision: 0.0536 - val_recall: 0.9062 - val_auc: 0.9803 - val_prc: 0.7642\n",
"Epoch 40/1000\n",
"20/20 [==============================] - 1s 61ms/step - loss: 0.1697 - tp: 18798.0000 - fp: 833.0000 - tn: 19691.0000 - fn: 1638.0000 - accuracy: 0.9397 - precision: 0.9576 - recall: 0.9198 - auc: 0.9817 - prc: 0.9852 - val_loss: 0.1242 - val_tp: 58.0000 - val_fp: 1021.0000 - val_tn: 44484.0000 - val_fn: 6.0000 - val_accuracy: 0.9775 - val_precision: 0.0538 - val_recall: 0.9062 - val_auc: 0.9803 - val_prc: 0.7644\n",
"Epoch 41/1000\n",
"20/20 [==============================] - 1s 58ms/step - loss: 0.1679 - tp: 18937.0000 - fp: 815.0000 - tn: 19597.0000 - fn: 1611.0000 - accuracy: 0.9408 - precision: 0.9587 - recall: 0.9216 - auc: 0.9824 - prc: 0.9857 - val_loss: 0.1213 - val_tp: 58.0000 - val_fp: 1004.0000 - val_tn: 44501.0000 - val_fn: 6.0000 - val_accuracy: 0.9778 - val_precision: 0.0546 - val_recall: 0.9062 - val_auc: 0.9808 - val_prc: 0.7645\n",
"Epoch 42/1000\n",
"20/20 [==============================] - 1s 60ms/step - loss: 0.1657 - tp: 18922.0000 - fp: 796.0000 - tn: 19619.0000 - fn: 1623.0000 - accuracy: 0.9409 - precision: 0.9596 - recall: 0.9210 - auc: 0.9827 - prc: 0.9862 - val_loss: 0.1192 - val_tp: 58.0000 - val_fp: 1003.0000 - val_tn: 44502.0000 - val_fn: 6.0000 - val_accuracy: 0.9779 - val_precision: 0.0547 - val_recall: 0.9062 - val_auc: 0.9807 - val_prc: 0.7521\n",
"Epoch 43/1000\n",
"20/20 [==============================] - 1s 66ms/step - loss: 0.1666 - tp: 18906.0000 - fp: 781.0000 - tn: 19658.0000 - fn: 1615.0000 - accuracy: 0.9415 - precision: 0.9603 - recall: 0.9213 - auc: 0.9824 - prc: 0.9856 - val_loss: 0.1171 - val_tp: 58.0000 - val_fp: 996.0000 - val_tn: 44509.0000 - val_fn: 6.0000 - val_accuracy: 0.9780 - val_precision: 0.0550 - val_recall: 0.9062 - val_auc: 0.9810 - val_prc: 0.7643\n",
"Epoch 44/1000\n",
"20/20 [==============================] - 1s 73ms/step - loss: 0.1629 - tp: 18916.0000 - fp: 790.0000 - tn: 19645.0000 - fn: 1609.0000 - accuracy: 0.9414 - precision: 0.9599 - recall: 0.9216 - auc: 0.9835 - prc: 0.9864 - val_loss: 0.1148 - val_tp: 58.0000 - val_fp: 987.0000 - val_tn: 44518.0000 - val_fn: 6.0000 - val_accuracy: 0.9782 - val_precision: 0.0555 - val_recall: 0.9062 - val_auc: 0.9809 - val_prc: 0.7644\n",
"Epoch 45/1000\n",
"20/20 [==============================] - 1s 60ms/step - loss: 0.1612 - tp: 18975.0000 - fp: 757.0000 - tn: 19646.0000 - fn: 1582.0000 - accuracy: 0.9429 - precision: 0.9616 - recall: 0.9230 - auc: 0.9833 - prc: 0.9866 - val_loss: 0.1130 - val_tp: 58.0000 - val_fp: 975.0000 - val_tn: 44530.0000 - val_fn: 6.0000 - val_accuracy: 0.9785 - val_precision: 0.0561 - val_recall: 0.9062 - val_auc: 0.9811 - val_prc: 0.7645\n",
"Epoch 46/1000\n",
"20/20 [==============================] - ETA: 0s - loss: 0.1580 - tp: 18963.0000 - fp: 767.0000 - tn: 19662.0000 - fn: 1568.0000 - accuracy: 0.9430 - precision: 0.9611 - recall: 0.9236 - auc: 0.9845 - prc: 0.9873Restoring model weights from the end of the best epoch: 36.\n",
"20/20 [==============================] - 1s 67ms/step - loss: 0.1580 - tp: 18963.0000 - fp: 767.0000 - tn: 19662.0000 - fn: 1568.0000 - accuracy: 0.9430 - precision: 0.9611 - recall: 0.9236 - auc: 0.9845 - prc: 0.9873 - val_loss: 0.1113 - val_tp: 58.0000 - val_fp: 975.0000 - val_tn: 44530.0000 - val_fn: 6.0000 - val_accuracy: 0.9785 - val_precision: 0.0561 - val_recall: 0.9062 - val_auc: 0.9814 - val_prc: 0.7645\n",
"Epoch 46: early stopping\n"
]
}
],
"source": [
"resampled_model = make_model()\n",
"resampled_model.load_weights(initial_weights)\n",
"\n",
"# Reset the bias to zero, since this dataset is balanced.\n",
"output_layer = resampled_model.layers[-1] \n",
"output_layer.bias.assign([0])\n",
"\n",
"resampled_history = resampled_model.fit(\n",
" resampled_ds,\n",
" # These are not real epochs\n",
" steps_per_epoch=20,\n",
" epochs=10*EPOCHS,\n",
" callbacks=[early_stopping],\n",
" validation_data=(val_ds))"
]
},
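{
"cell_type": "markdown",
"metadata": {},
"source": [
"To put numbers on the finer-control claim, the sketch below does the bookkeeping with figures read off the two training logs. The batch size of 2048 and the patience of 10 are inferred from the logs, so treat them as assumptions if you changed those settings. The batch size follows from the 40960 examples counted in each 20-step epoch. The patience follows from each run stopping 10 epochs after its best epoch. With full 278-step epochs, EarlyStopping has to run thousands of steps past the best weights before it can stop; with 20-step epochs, it only needs a couple of hundred."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Bookkeeping for the shortened epochs, using numbers read off the logs above.\n",
"full_epoch_steps = 278   # '278/278' in the first resampled run\n",
"short_epoch_steps = 20   # steps_per_epoch in the re-training cell\n",
"batch_size = 2048        # inferred: later 20-step epochs log 40960 examples\n",
"patience = 10            # inferred: best epoch 2 -> stopped at 12; 36 -> 46\n",
"\n",
"print(f'examples per short epoch: {short_epoch_steps * batch_size}')\n",
"print(f'short epoch / full epoch: {short_epoch_steps / full_epoch_steps:.1%}')\n",
"\n",
"# How many steps EarlyStopping must wait past the best epoch before stopping.\n",
"for name, steps in [('full epochs', full_epoch_steps), ('short epochs', short_epoch_steps)]:\n",
"    print(f'{name}: {patience * steps} steps of patience')\n",
"\n",
"# Where the restored weights came from, in units of full epochs.\n",
"for name, best_epoch, steps in [('first run', 2, full_epoch_steps), ('re-training', 36, short_epoch_steps)]:\n",
"    print(f'{name}: best weights after {best_epoch * steps} steps '\n",
"          f'({best_epoch * steps / full_epoch_steps:.2f} full epochs)')"
]
},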
{
"cell_type": "markdown",
"metadata": {
"id": "UuJYKv0gpBK1"
},
"source": [
"### Re-check training history"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "FMycrpJwn39w",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 610
},
"outputId": "6500e9ac-f815-439c-f9e0-d688e37dfa23"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x720 with 4 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"plot_metrics(resampled_history)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bUuE5HOWZiwP"
},
"source": [
"### Evaluate metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "C0fmHSgXxFdW",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "ad47f88f-d530-4899-e66e-0de876dd5154"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"90/90 [==============================] - 0s 901us/step\n",
"28/28 [==============================] - 0s 1ms/step\n"
]
}
],
"source": [
"train_predictions_resampled = resampled_model.predict(train_features, batch_size=BATCH_SIZE)\n",
"test_predictions_resampled = resampled_model.predict(test_features, batch_size=BATCH_SIZE)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "FO0mMOYUDWFk",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 636
},
"outputId": "ccfd144f-9601-4f5c-bc3a-0578024b88c6"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"loss : 0.13555769622325897\n",
"tp : 89.0\n",
"fp : 1355.0\n",
"tn : 55509.0\n",
"fn : 9.0\n",
"accuracy : 0.9760541915893555\n",
"precision : 0.06163435056805611\n",
"recall : 0.9081632494926453\n",
"auc : 0.9771085381507874\n",
"prc : 0.732808530330658\n",
"\n",
"Legitimate Transactions Detected (True Negatives): 55509\n",
"Legitimate Transactions Incorrectly Detected (False Positives): 1355\n",
"Fraudulent Transactions Missed (False Negatives): 9\n",
"Fraudulent Transactions Detected (True Positives): 89\n",
"Total Fraudulent Transactions: 98\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 360x360 with 2 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"resampled_results = resampled_model.evaluate(test_features, test_labels,\n",
" batch_size=BATCH_SIZE, verbose=0)\n",
"for name, value in zip(resampled_model.metrics_names, resampled_results):\n",
" print(name, ': ', value)\n",
"print()\n",
"\n",
"plot_cm(test_labels, test_predictions_resampled)"
]
},
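{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check on the numbers above, precision, recall and accuracy can be recomputed by hand from the four test-set counts. This is plain arithmetic on the printed values. It shows that the high accuracy comes mostly from the 55509 true negatives, and that roughly 15 legitimate transactions are flagged for every fraudulent one caught."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test-set confusion-matrix counts printed by the cell above.\n",
"tp, fp, tn, fn = 89, 1355, 55509, 9\n",
"\n",
"precision = tp / (tp + fp)                  # share of flagged transactions that are fraud\n",
"recall = tp / (tp + fn)                     # share of fraud that gets flagged\n",
"accuracy = (tp + tn) / (tp + fp + tn + fn)  # dominated by the many true negatives\n",
"\n",
"print(f'precision: {precision:.4f}')\n",
"print(f'recall:    {recall:.4f}')\n",
"print(f'accuracy:  {accuracy:.4f}')\n",
"print(f'false positives per true positive: {fp / tp:.1f}')"
]
},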
{
"cell_type": "markdown",
"metadata": {
"id": "_xYozM1IIITq"
},
"source": [
"### Plot the ROC"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fye_CiuYrZ1U",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 606
},
"outputId": "2ac4a19f-b25b-4909-b51b-942358c50238"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x720 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"plot_roc(\"Train Baseline\", train_labels, train_predictions_baseline, color=colors[0])\n",
"plot_roc(\"Test Baseline\", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')\n",
"\n",
"plot_roc(\"Train Weighted\", train_labels, train_predictions_weighted, color=colors[1])\n",
"plot_roc(\"Test Weighted\", test_labels, test_predictions_weighted, color=colors[1], linestyle='--')\n",
"\n",
"plot_roc(\"Train Resampled\", train_labels, train_predictions_resampled, color=colors[2])\n",
"plot_roc(\"Test Resampled\", test_labels, test_predictions_resampled, color=colors[2], linestyle='--')\n",
"plt.legend(loc='lower right');"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vayGnv0VOe_v"
},
"source": [
"### Plot the AUPRC\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "wgWXQ8aeOhCZ",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 606
},
"outputId": "5a5cc0bd-08b0-4b30-f929-3db7c436a3ec"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x720 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"plot_prc(\"Train Baseline\", train_labels, train_predictions_baseline, color=colors[0])\n",
"plot_prc(\"Test Baseline\", test_labels, test_predictions_baseline, color=colors[0], linestyle='--')\n",
"\n",
"plot_prc(\"Train Weighted\", train_labels, train_predictions_weighted, color=colors[1])\n",
"plot_prc(\"Test Weighted\", test_labels, test_predictions_weighted, color=colors[1], linestyle='--')\n",
"\n",
"plot_prc(\"Train Resampled\", train_labels, train_predictions_resampled, color=colors[2])\n",
"plot_prc(\"Test Resampled\", test_labels, test_predictions_resampled, color=colors[2], linestyle='--')\n",
"plt.legend(loc='lower right');"
]
},
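{
"cell_type": "markdown",
"metadata": {},
"source": [
"The evaluation above reports a test ROC AUC of about 0.977 but an AUPRC of only about 0.733. One reason the two curves tell different stories is that the ROC curve depends only on the score distributions within each class, while precision, and therefore AUPRC, also depends on how rare the positive class is. The sketch below illustrates this with synthetic, hypothetical scores, drawing positives from N(2, 1) and negatives from N(0, 1). It uses scikit-learn's `roc_auc_score` and `average_precision_score`. Average precision is a common summary of the precision-recall curve, but it is not necessarily identical to the `prc` metric Keras computes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics import average_precision_score, roc_auc_score\n",
"\n",
"rng = np.random.default_rng(0)\n",
"\n",
"def synthetic_scores(n_pos, n_neg):\n",
"    # Hypothetical classifier scores: the per-class distributions never change,\n",
"    # only the number of examples drawn from each class.\n",
"    scores = np.concatenate([rng.normal(2.0, 1.0, n_pos), rng.normal(0.0, 1.0, n_neg)])\n",
"    labels = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])\n",
"    return labels, scores\n",
"\n",
"for n_pos, n_neg in [(10000, 10000), (100, 50000)]:\n",
"    labels, scores = synthetic_scores(n_pos, n_neg)\n",
"    print(f'{n_pos} positives / {n_neg} negatives: '\n",
"          f'ROC AUC = {roc_auc_score(labels, scores):.3f}, '\n",
"          f'average precision = {average_precision_score(labels, scores):.3f}')"
]
},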
{
"cell_type": "markdown",
"metadata": {
"id": "3o3f0ywl8uqW"
},
"source": [
"## Applying this tutorial to your problem\n",
"\n",
"Imbalanced data classification is an inherently difficult task since there are so few samples to learn from. You should always start with the data first and do your best to collect as many samples as possible and give substantial thought to what features may be relevant so the model can get the most out of your minority class. At some point your model may struggle to improve and yield the results you want, so it is important to keep in mind the context of your problem and the trade offs between different types of errors."
]
}
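,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One concrete way to weigh those trade-offs is to put a hypothetical cost on each kind of error and pick the decision threshold that minimizes the total. The sketch below does this on synthetic stand-in data. With your own model, you would use validation labels and predicted probabilities (for example `resampled_model.predict(val_features).ravel()`) so that the test set stays untouched. The costs and the `pick_threshold` helper are illustrative assumptions, not part of the tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def pick_threshold(labels, scores, cost_fp, cost_fn):\n",
"    # Hypothetical helper: scan thresholds and return the one that minimizes\n",
"    # cost_fp * (false positives) + cost_fn * (false negatives).\n",
"    thresholds = np.linspace(0.0, 1.0, 101)\n",
"    costs = []\n",
"    for t in thresholds:\n",
"        flagged = scores >= t\n",
"        fp = np.sum(flagged & (labels == 0))\n",
"        fn = np.sum(~flagged & (labels == 1))\n",
"        costs.append(cost_fp * fp + cost_fn * fn)\n",
"    best = int(np.argmin(costs))\n",
"    return thresholds[best], costs[best]\n",
"\n",
"# Synthetic stand-in data: about 0.2% positives with noisy probability-like scores.\n",
"rng = np.random.default_rng(0)\n",
"labels = (rng.uniform(size=50000) < 0.002).astype(int)\n",
"scores = np.where(labels == 1, rng.normal(0.7, 0.2, labels.size), rng.normal(0.1, 0.1, labels.size))\n",
"scores = np.clip(scores, 0.0, 1.0)\n",
"\n",
"# A higher cost for missed positives pushes the chosen threshold down.\n",
"for cost_fp, cost_fn in [(1, 1), (1, 100)]:\n",
"    threshold, cost = pick_threshold(labels, scores, cost_fp, cost_fn)\n",
"    print(f'cost_fp={cost_fp}, cost_fn={cost_fn}: threshold={threshold:.2f}, total cost={cost}')"
]
}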
],
"metadata": {
"colab": {
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}