@lewiuberg
Created September 14, 2020 13:13
nuc_machine_learning/Session 6/tutorial 6.ipynb
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Activity sheet 6\n# Classification - Logistic Regression\n**Objective**\\\nThe aim of this activity sheet is to load a customer dataset belonging to a telecommunication company and develop a classifier from it. The overarching objective of the model is to predict customer churn. This kind of insight would enable the company to take proactive steps to retain its existing customers. In a broader sense, this activity sheet is an application of machine learning in business analytics. The classification technique employed in this sheet is logistic regression.\n\n**Dataset description**\\\nAs explained in the previous section, you will be using a telecom company's data to predict customer churn. In this dataset, each row represents one customer. The general hypothesis in this business is that it is cheaper to retain customers than to acquire new ones. Therefore, the focus is on predicting which customers are more likely to stay with the company and which are likely to churn.\n\nThe dataset includes information about:\n* Customers who left within the last month – the column is called `churn`.\n* Services that each customer has signed up for.\n* Customer account information.\n* Demographic info about customers."
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Data pre-processing\nLoad the telecommunication churn data from the CSV file. Select the feature set for the modelling; you do not necessarily need all the given variables. However, the output of interest (dependent variable) must be `churn`. Use scikit-learn to split the dataset into training and test sets."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport numpy as np\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import StandardScaler\nfrom scipy import stats\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.metrics import jaccard_score, confusion_matrix, plot_confusion_matrix, accuracy_score, classification_report, log_loss\n\nfrom eval_tools import cm_plot\n\n%matplotlib inline",
"execution_count": 1,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df = pd.read_csv(\"dataset6.csv\")",
"execution_count": 2,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.head()",
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 3,
"data": {
"text/plain": " tenure age address income ed employ equip callcard wireless \\\n0 11.0 33.0 7.0 136.0 5.0 5.0 0.0 1.0 1.0 \n1 33.0 33.0 12.0 33.0 2.0 0.0 0.0 0.0 0.0 \n2 23.0 30.0 9.0 30.0 1.0 2.0 0.0 0.0 0.0 \n3 38.0 35.0 5.0 76.0 2.0 10.0 1.0 1.0 1.0 \n4 7.0 35.0 14.0 80.0 2.0 15.0 0.0 1.0 0.0 \n\n longmon ... pager internet callwait confer ebill loglong logtoll \\\n0 4.40 ... 1.0 0.0 1.0 1.0 0.0 1.482 3.033 \n1 9.45 ... 0.0 0.0 0.0 0.0 0.0 2.246 3.240 \n2 6.30 ... 0.0 0.0 0.0 1.0 0.0 1.841 3.240 \n3 6.05 ... 1.0 1.0 1.0 1.0 1.0 1.800 3.807 \n4 7.10 ... 0.0 0.0 1.0 1.0 0.0 1.960 3.091 \n\n lninc custcat churn \n0 4.913 4.0 1.0 \n1 3.497 1.0 1.0 \n2 3.401 3.0 0.0 \n3 4.331 4.0 0.0 \n4 4.382 3.0 0.0 \n\n[5 rows x 28 columns]",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>tenure</th>\n <th>age</th>\n <th>address</th>\n <th>income</th>\n <th>ed</th>\n <th>employ</th>\n <th>equip</th>\n <th>callcard</th>\n <th>wireless</th>\n <th>longmon</th>\n <th>...</th>\n <th>pager</th>\n <th>internet</th>\n <th>callwait</th>\n <th>confer</th>\n <th>ebill</th>\n <th>loglong</th>\n <th>logtoll</th>\n <th>lninc</th>\n <th>custcat</th>\n <th>churn</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>11.0</td>\n <td>33.0</td>\n <td>7.0</td>\n <td>136.0</td>\n <td>5.0</td>\n <td>5.0</td>\n <td>0.0</td>\n <td>1.0</td>\n <td>1.0</td>\n <td>4.40</td>\n <td>...</td>\n <td>1.0</td>\n <td>0.0</td>\n <td>1.0</td>\n <td>1.0</td>\n <td>0.0</td>\n <td>1.482</td>\n <td>3.033</td>\n <td>4.913</td>\n <td>4.0</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>33.0</td>\n <td>33.0</td>\n <td>12.0</td>\n <td>33.0</td>\n <td>2.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>9.45</td>\n <td>...</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>2.246</td>\n <td>3.240</td>\n <td>3.497</td>\n <td>1.0</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>23.0</td>\n <td>30.0</td>\n <td>9.0</td>\n <td>30.0</td>\n <td>1.0</td>\n <td>2.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>6.30</td>\n <td>...</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>1.0</td>\n <td>0.0</td>\n <td>1.841</td>\n <td>3.240</td>\n <td>3.401</td>\n <td>3.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>38.0</td>\n <td>35.0</td>\n <td>5.0</td>\n <td>76.0</td>\n <td>2.0</td>\n <td>10.0</td>\n <td>1.0</td>\n <td>1.0</td>\n <td>1.0</td>\n <td>6.05</td>\n <td>...</td>\n <td>1.0</td>\n 
<td>1.0</td>\n <td>1.0</td>\n <td>1.0</td>\n <td>1.0</td>\n <td>1.800</td>\n <td>3.807</td>\n <td>4.331</td>\n <td>4.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>7.0</td>\n <td>35.0</td>\n <td>14.0</td>\n <td>80.0</td>\n <td>2.0</td>\n <td>15.0</td>\n <td>0.0</td>\n <td>1.0</td>\n <td>0.0</td>\n <td>7.10</td>\n <td>...</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>1.0</td>\n <td>1.0</td>\n <td>0.0</td>\n <td>1.960</td>\n <td>3.091</td>\n <td>4.382</td>\n <td>3.0</td>\n <td>0.0</td>\n </tr>\n </tbody>\n</table>\n<p>5 rows × 28 columns</p>\n</div>"
},
"metadata": {}
}
]
},
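{
"metadata": {},
"cell_type": "markdown",
"source": "The train-test split requested under *Data pre-processing* could be sketched as follows. This is only an illustration, assuming the `df` loaded above; the feature subset, `test_size`, `random_state` and all `*_sketch` names are hypothetical choices, not the selection made in this notebook."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Sketch only: the feature names, test_size and random_state are illustrative assumptions.\nsketch_features = [\"tenure\", \"employ\", \"callcard\", \"loglong\"]\nX_sketch = df[sketch_features].to_numpy()\ny_sketch = df[\"churn\"].astype(int).to_numpy()\n# Stratify on the label so both parts keep a similar churn proportion.\nX_train_sketch, X_test_sketch, y_train_sketch, y_test_sketch = train_test_split(\n    X_sketch, y_sketch, test_size=0.2, stratify=y_sketch, random_state=0)\nprint(X_train_sketch.shape, X_test_sketch.shape)",
"execution_count": null,
"outputs": []
},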
{
"metadata": {},
"cell_type": "markdown",
"source": "**Logistic Regression Assumptions**\n* Binary logistic regression requires the dependent variable to be binary.\n* For a binary regression, the factor level 1 of the dependent variable should represent the desired outcome.\n* Only the meaningful variables should be included.\n* The independent variables should be independent of each other. That is, the model should have little or no multicollinearity.\n* The independent variables are linearly related to the log odds.\n* Logistic regression requires quite large sample sizes.\n\n[Source](https://towardsdatascience.com/building-a-logistic-regression-in-python-step-by-step-becd4d56c9c8)"
},
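{
"metadata": {},
"cell_type": "markdown",
"source": "The first assumption listed above can be checked directly. The cell below is a minimal sketch, assuming the `df` loaded earlier and that `churn` is coded as 0/1; it only prints the distinct label values and the class proportions, and the variable name `churn_values` is introduced here for illustration."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Sketch: confirm that the dependent variable is binary and inspect the class balance.\nchurn_values = sorted(df[\"churn\"].unique())\nprint(\"Distinct churn values:\", churn_values)\n# Share of each class; a strong imbalance would matter when reading accuracy later.\nprint(df[\"churn\"].value_counts(normalize=True))",
"execution_count": null,
"outputs": []
},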
{
"metadata": {},
"cell_type": "markdown",
"source": "Looking for missing values."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.info()",
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 200 entries, 0 to 199\nData columns (total 28 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 tenure 200 non-null float64\n 1 age 200 non-null float64\n 2 address 200 non-null float64\n 3 income 200 non-null float64\n 4 ed 200 non-null float64\n 5 employ 200 non-null float64\n 6 equip 200 non-null float64\n 7 callcard 200 non-null float64\n 8 wireless 200 non-null float64\n 9 longmon 200 non-null float64\n 10 tollmon 200 non-null float64\n 11 equipmon 200 non-null float64\n 12 cardmon 200 non-null float64\n 13 wiremon 200 non-null float64\n 14 longten 200 non-null float64\n 15 tollten 200 non-null float64\n 16 cardten 200 non-null float64\n 17 voice 200 non-null float64\n 18 pager 200 non-null float64\n 19 internet 200 non-null float64\n 20 callwait 200 non-null float64\n 21 confer 200 non-null float64\n 22 ebill 200 non-null float64\n 23 loglong 200 non-null float64\n 24 logtoll 200 non-null float64\n 25 lninc 200 non-null float64\n 26 custcat 200 non-null float64\n 27 churn 200 non-null float64\ndtypes: float64(28)\nmemory usage: 43.9 KB\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Looking for variables that are strongly correlated with \"churn\", in order to filter out unnecessary variables. Ideally, the correlation should be above 0.5 for a strong positive relationship or below -0.5 for a strong negative relationship. However, none of the calculated values fell within these limits, so looser thresholds (0.3 and -0.3) were used to have something to work with."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Setting the upper (ge) and lower (le) correlation limits.\nge = 0.3 #0.25\nle = -0.3 #-0.3\n# Calculating the correlation between all features (correlation matrix).\ncorr = df.corr()\n# Selecting the rows whose correlation with churn is >= ge or <= le.\nchurn_corr_mat = corr[corr[\"churn\"].ge(ge) | corr[\"churn\"].le(le)]\n# Getting the list of selected features, sorted by their correlation with churn.\nchurn_corr_lst = churn_corr_mat[\"churn\"].sort_values(\n ascending=False).index.tolist()\n# Defining a sliced dataframe from the sorted list of features.\nchurn_corr_df = df[churn_corr_lst]\n# Extracting the churn column of the correlation matrix of the\n# selected features, sorted in descending order.\nchurn_corr_lst_df = churn_corr_df.corr()[[\"churn\"]].sort_values(by=[\n \"churn\"], ascending=False)",
"execution_count": 5,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "plt.figure(figsize=(3, 5))\n#sns.set(font_scale=1.2)\n\nax = sns.heatmap(\n churn_corr_lst_df,\n vmin=-1,\n cmap=\"coolwarm_r\",\n annot=True,\n annot_kws={\"size\": 12})\nax.set_yticklabels(\n ax.get_yticklabels(),\n fontsize=12,\n horizontalalignment=\"right\");",
"execution_count": 6,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 216x360 with 2 Axes>",
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Looking for multicollinearity."
},
{
"metadata": {
"trusted": true,
"scrolled": false
},
"cell_type": "code",
"source": "sns.pairplot(churn_corr_df, hue=\"churn\", diag_kind=\"hist\", diag_kws = {\"alpha\":1, \"bins\":15}, markers=[\"s\", \"D\"], corner=True);",
"execution_count": 7,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 771.875x720 with 14 Axes>",
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "plt.figure(figsize=(8, 8))\n\nax = sns.heatmap(\n churn_corr_df.corr(),\n vmin=-1, vmax=1, center=0,\n cmap=sns.diverging_palette(20, 220, n=200),\n square=True,\n annot=True,\n linewidths=0.5,\n cbar_kws={\"shrink\": .82},\n fmt=\".2f\",\n annot_kws={\"size\": 10}\n)\n\nax.set_xticklabels(\n ax.get_xticklabels(),\n rotation=45,\n horizontalalignment='right'\n);",
"execution_count": 8,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 576x576 with 2 Axes>",
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "As a multicollinearity diagnostic, the pairwise correlation coefficients between the predictors are considered, with a correlation cutoff of 0.8. \\\n[Source](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4888898/)"
},
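{
"metadata": {},
"cell_type": "markdown",
"source": "The cutoff above can also be checked programmatically rather than read off the heatmap. The cell below is a minimal sketch, assuming the `churn_corr_df` defined earlier; the names `cutoff`, `abs_corr`, `upper` and `high_pairs` are introduced here for illustration and are not part of the original analysis."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Sketch: list predictor pairs whose absolute pairwise correlation exceeds the cutoff.\ncutoff = 0.8  # cutoff taken from the text above\npredictors = churn_corr_df.drop(columns=[\"churn\"])\nabs_corr = predictors.corr().abs()\n# Keep only the upper triangle (without the diagonal) so each pair appears once.\nupper = abs_corr.where(np.triu(np.ones(abs_corr.shape, dtype=bool), k=1))\nhigh_pairs = upper.stack()\nhigh_pairs = high_pairs[high_pairs > cutoff]\nprint(high_pairs if not high_pairs.empty else \"No predictor pair exceeds the cutoff.\")",
"execution_count": null,
"outputs": []
},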
{
"metadata": {
"trusted": true,
"editable": false,
"deletable": false,
"run_control": {
"frozen": true
}
},
"cell_type": "code",
"source": "# THERE ARE TOO FEW OBSERVATIONS IN THE DATA SET\nchurn_corr_df = churn_corr_df.drop([\"equipmon\", \"loglong\"], axis=1)",
"execution_count": null,
"outputs": []
},
{
"metadata": {
"scrolled": true,
"trusted": true
},
"cell_type": "code",
"source": "churn_corr_df.info()",
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 200 entries, 0 to 199\nData columns (total 5 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 churn 200 non-null float64\n 1 callcard 200 non-null float64\n 2 loglong 200 non-null float64\n 3 employ 200 non-null float64\n 4 tenure 200 non-null float64\ndtypes: float64(5)\nmemory usage: 7.9 KB\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "churn_corr_df.head()",
"execution_count": 10,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 10,
"data": {
"text/plain": " churn callcard loglong employ tenure\n0 1.0 1.0 1.482 5.0 11.0\n1 1.0 0.0 2.246 0.0 33.0\n2 0.0 0.0 1.841 2.0 23.0\n3 0.0 1.0 1.800 10.0 38.0\n4 0.0 1.0 1.960 15.0 7.0",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>churn</th>\n <th>callcard</th>\n <th>loglong</th>\n <th>employ</th>\n <th>tenure</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1.0</td>\n <td>1.0</td>\n <td>1.482</td>\n <td>5.0</td>\n <td>11.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1.0</td>\n <td>0.0</td>\n <td>2.246</td>\n <td>0.0</td>\n <td>33.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>0.0</td>\n <td>0.0</td>\n <td>1.841</td>\n <td>2.0</td>\n <td>23.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>0.0</td>\n <td>1.0</td>\n <td>1.800</td>\n <td>10.0</td>\n <td>38.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>0.0</td>\n <td>1.0</td>\n <td>1.960</td>\n <td>15.0</td>\n <td>7.0</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Looking for outliers."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "churn_corr_df[[\"employ\", \"tenure\"]].hist(bins=15, figsize=(8,6));",
"execution_count": 11,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 576x432 with 2 Axes>",
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAeAAAAF1CAYAAAAwfzllAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAedElEQVR4nO3dfZBsdX3n8fdHHiJBIiA64UkvWVksNzeiToiWbu1ElCCouFvEwBLDTUhukk0qWnWzCklWE2OtpLaISTQbcqMEogQ0GIUIqxJkSk2MBBC8PKgQvC73CtwYFB00D6Pf/aPPsM3cmTs9j7+enveramrO+Z1fn/6e7j79mXNOz69TVUiSpLX1hNYFSJK0ERnAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1YABvcEk2Jakk+7euRZI2EgNYkoZMkp1JXtq6Dq0uA1iSBIBnwtaWATzEkhyV5ANJ/jHJl5L8Stf+m0n+Isl7k3wzyY4k/z7JBUn2JLk/ySl965lM8rYkNyX5RpKrkxy+j/u8JsnDSe5N8nNd+/cn+VaSp/T1fV5X2wGr/VhIG0WS9wBPB/4qyVSSNyR5QZK/TfL1JLcnmejrP5nkt5P8Tfd+8LEkR3TLJpLsmrX+x46uu/eSq7r3km8AW5I8Ocm7kzyQZHeStybZb80egA3EAB5SSZ4A/BVwO3A0cDLw+iQ/1nV5JfAe4DDgs8BH6T2fRwNvAf541ip/CvgZ4EhgGviDee76SmAXcBRwJvA/k7ykqh4EJoHX9PV9LXBlVf3bkjdU0uNU1WuB/wu8sqqeBFwOXAu8FTgc+FXgA0me2nez/wr8NPA04MCuz6DOAK4CDu3u61J67xHPBJ4LnAL87JI3SPMygIfXDwNPraq3VNW/VtV9wJ8AZ3XLP1lVH62qaeAvgKcCF3ZheCWwKcmhfet7T1XdUVWPAv8DeM3sv2qTHAu8CHhjVf1zVd0GvIteeANcBvxk13c/4Gx6fwRIWj0/CVxXVddV1Xer6nrgZuC0vj5/WlVfrKpvA+8HTlzE+j9dVR+qqu8C39et9/VV9WhV7QHezv9/39EK8nz/8HoGcFSSr/e17Qd8Evgy8FBf+7eBr1bVd/rmAZ4EzNz+/r7+XwYOAI6YdZ9HAQ9X1Tdn9R3vpq8GLk5yHHAC8EhV3bS4zZK0SM8AfjzJK/vaDgBu7Jt/sG/6W/T2/UH1vzc8o1v3A0lm2p4wq49WiAE8vO4HvlRVx89ekOQ3l7C+Y/umnw78G/DVWe1fAQ5PckhfCD8d2A1QVf+c5P30/iJ/Fh79Squl/2vq7qd3BuvnlrCeR4HvnZnpzlw9dVaf2ff1L8AR3dk1rSJPQQ+vm4BvJnljkoOS7JfkB5P88BLX95NJnp3ke+ldI76q74gZgKq6H/hb4G1Jnpjkh4DzgPf2dfszYAvwKgxgabU8BPxAN/1e4JVJfqx7H3hi9+GqYwZYzxeBJyY5vfuw5G8A3zNf56p6APgYcFGS70vyhCT/Lsl/Wub2aA4G8JDqwvEV9K7lfIne0eq7gCcvcZXvoffhigeBJwK/Mk+/s4FN9I6GPwi8uar+uq+uvwG+C9xaVV9eYi2S9u1twG90l6B+gt4HpX4N+Ed6R6n/nQHev6vqEeC/0Xvv2E3viHjXPm/U+8zHgcBdwNfofUDryKVshPYtVbVwL61rSSaB91bVu1ZofR8H/nyl1idJG5HXgLUo3Snw59H7i1yStESegtbAklwG/DW9f1H45kL9JUnz8xS0JEkNeAQsSVIDBrAkSQ2s6YewjjjiiNq0adM++zz66KMcfPDBa1PQEljf8lgf3HLLLV+tqtmDIYyc+fb3YX8NDMrtGC7Duh373N+ras1+nv/859dCbrzxxgX7tGR9y2N9VcDNtYb7Xauf+fb3YX8NDMrtGC7Duh372t89BS1JUgMGsCRJDRjAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1YABLktSAAS
xJUgMGsCRJDRjAkiQ1YABLktSAASxJUgNr+nWEg9ix+xG2nH/tQH13Xnj6KlcjaZhsGvC9AXx/0PDzCFiSpAYMYEmSGjCAJUlqwACWJKkBA1iSpAYMYEmSGjCAJUlqwACWJKkBA1iSpAYMYEmSGjCAJUlqwACWJKkBA1iSpAYMYEmSGjCAJUlqwACWJKmB/VsXIEnSsNh0/rUD99154enLuq+Bj4CT7Jfks0k+3M0fl+QzSe5N8r4kBy6rEkmSNpDFnIJ+HXB33/zvAG+vqmcCXwPOW8nCJEkaZQMFcJJjgNOBd3XzAV4CXNV1uQx49SrUJ0nSSBr0GvDvAW8ADunmnwJ8vaqmu/ldwNErW5qklZTkEuAVwJ6q+sGu7X3ACV2XQ+nt1yfOcdudwDeB7wDTVTW+BiVLI23BAE4ys8PekmRisXeQZCuwFWBsbIzJycl99h87CLZtnt5nnxkLrWs1TE1NNbnfQVnf8gx7fct0KfBO4M9mGqrqJ2amk1wEPLKP2/9oVX111aqTNphBjoBfBLwqyWnAE4HvA34fODTJ/t1R8DHA7rluXFXbge0A4+PjNTExsc87e8flV3PRjsEOzHees+91rYbJyUkW2oaWrG95hr2+5aiqTyTZNNey7rLSa+hdWpK0Bha8BlxVF1TVMVW1CTgL+HhVnQPcCJzZdTsXuHrVqpS02v4j8FBV3TPP8gI+luSW7qyWpGVazv8BvxG4Mslbgc8C716ZkiQ1cDZwxT6Wv7iqdid5GnB9ks9X1SdmdxrkktNyTvMPenkKVv8S1ahcrnA7Hm8tX2OLCuCqmgQmu+n7gJOWde+SmkuyP/BfgOfP16eqdne/9yT5IL19f68AHuSS03JO829ZzCAJq3yJalQuV7gdj7eWrzGHopT0UuDzVbVrroVJDk5yyMw0cApwxxrWJ40kA1jaIJJcAXwaOCHJriQzg+ecxazTz0mOSnJdNzsGfCrJ7cBNwLVV9ZG1qlsaVY4FLW0QVXX2PO1b5mj7CnBaN30f8JxVLU7agDwCliSpAQNYkqQGDGBJkhowgCVJasAAliSpAQNYkqQGDGBJkhowgCVJasAAliSpAQNYkqQGDGBJkhowgCVJasAAliSpAQNYkqQGDGBJkhowgCVJasAAliSpAQNYkqQGDGBJkhrYv3UBkjQMNp1/7aL6X3rqwau6/p0Xnr6o/hvFfI/jts3TbJlj2TA/jh4BS5LUwLo+AvYvSknSeuURsCRJDRjAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1YABLG0SSS5LsSXJHX9tvJtmd5Lbu57R5bntqki8kuTfJ+WtXtTS6DGBp47gUOHWO9rdX1Yndz3WzFybZD/hD4OXAs4Gzkzx7VSuVNgADWNogquoTwMNLuOlJwL1VdV9V/StwJXDGihYnbUAGsKRfTvK57hT1YXMsPxq4v29+V9cmaRnW9dcRSlq2PwJ+G6ju90XAzyx1ZUm2AlsBxsbGmJyc3KvP1NTUnO2D2LZ5euC+i72PxawbFr8di13/Uh+jxVrO89HCfI/j2EFzL1vN18FyHzcDWNrAquqhmekkfwJ8eI5uu4Fj++aP6drmWt92YDvA+Ph4TUxM7NVncnKSudoHsWUR3wG+85zF3cdi1g1w6akHL2o7Frv+xda/VMt5PlqY73Hctnmai3bsHWmr+TpY7nPkKWhpA0tyZN/sfwbumKPb3wPHJzkuyYHAWcA1a1GfNMo8ApY2iCRXABPAEUl2AW8GJpKcSO8U9E7g57u+RwHvqqrTqmo6yS8DHwX2Ay6pqjvXfguk0WIASxtEVZ09R/O75+n7FeC0vvnrgL3+RUnS0nkKWpKkBgxgSZIaMIAlSWrAAJYkqQEDWJKkBgxgSZIaMIAlSWrAAJYkqQEDWJKkBgxgSZIaMIAlSWrAsaAljaRNi/z6P62cxTz2Oy88fRUrGW4eAUuS1IABLElSAwawJE
kNGMCSJDWwoT6EtdgPZWzkDwdIklaXR8CSJDVgAEuS1IABLElSAwsGcJInJrkpye1J7kzyW137cUk+k+TeJO9LcuDqlytJ0mgY5Aj4X4CXVNVzgBOBU5O8APgd4O1V9Uzga8B5q1alJEkjZsEArp6pbvaA7qeAlwBXde2XAa9ejQIlSRpFA/0bUpL9gFuAZwJ/CPwD8PWqmu667AKOnue2W4GtAGNjY0xOTu7zvsYOgm2bp/fZZ63MVevU1NSC29CS9S3PsNcnaXQMFMBV9R3gxCSHAh8EnjXoHVTVdmA7wPj4eE1MTOyz/zsuv5qLdgzHvyfvPGdir7bJyUkW2oaWrG95hr0+SaNjUZ+CrqqvAzcCLwQOTTKTlMcAu1e2NEmSRtcgn4J+anfkS5KDgJcBd9ML4jO7bucCV69SjZIkjZxBzvUeCVzWXQd+AvD+qvpwkruAK5O8Ffgs8O5VrFPSCFrP39m7Y/cjbFnH9au9BQO4qj4HPHeO9vuAk1ajKEmSRp0jYUmS1IABLElSAwawJEkNGMDSBpHkkiR7ktzR1/a/knw+yeeSfHDmPx7muO3OJDuS3Jbk5jUrWhphBrC0cVwKnDqr7XrgB6vqh4AvAhfs4/Y/WlUnVtX4KtUnbSgGsLRBVNUngIdntX2sb0jZv6M3qI6kNTAcYz5KGgY/A7xvnmUFfCxJAX/cDTG7l0HGfu8fb3tYxn1fitUet36txiRfjfHPF/O4LPa+51v3fM/HSq1/Lst93AxgSST5dWAauHyeLi+uqt1JngZcn+Tz3RH14wwy9nv/eNvreSCLbZunV3Xc+rnGol8NqzH++WKe18Vu53zrnu/5WKn1z2W5z5GnoKUNLskW4BXAOVVVc/Wpqt3d7z30vpDFQXikZTKApQ0syanAG4BXVdW35ulzcJJDZqaBU4A75uoraXAGsLRBJLkC+DRwQpJdSc4D3gkcQu+08m1JLu76HpXkuu6mY8CnktwO3ARcW1UfabAJ0kjxGrC0QVTV2XM0z/klKlX1FeC0bvo+4DmrWJq0IXkELElSAwawJEkNGMCSJDXgNWBJGkKbFvP/qBeevoqVaLV4BCxJUgMGsCRJDRjAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1YABLktSAASxtEEkuSbInyR19bYcnuT7JPd3vw+a57bldn3uSnLt2VUujywCWNo5LgVNntZ0P3FBVxwM3dPOPk+Rw4M3AjwAnAW+eL6glDc4AljaIqvoE8PCs5jOAy7rpy4BXz3HTHwOur6qHq+prwPXsHeSSFmn/1gVIamqsqh7oph8ExuboczRwf9/8rq5tL0m2AlsBxsbGmJyc3KvP1NTUY+3bNk8vsez2xg4anvrfcfnVS77t2EEL337z0U9e1DoX87gstvZtm+dun+/5WKn1z2Wu1/diGMCSAKiqSlLLXMd2YDvA+Ph4TUxM7NVncnKSmfYt51+7nLtratvmaS7asf7fQgfZjp3nTCxqnS2e1xbPx2Ifl9k8BS1tbA8lORKg+71njj67gWP75o/p2iQtgwEsbWzXADOfaj4XmOt83UeBU5Ic1n346pSuTdIyGMDSBpHkCuDTwAlJdiU5D7gQeFmSe4CXdvMkGU/yLoCqehj4beDvu5+3dG2SlmH9X8BYRZvmuI6xbfP0vNc3dl54+mqXJC1ZVZ09z6KT5+h7M/CzffOXAJesUmnShuQRsCRJDRjAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1sGAAJzk2yY1J7kpyZ5LXde2HJ7k+yT3d78NWv1xJkkbDIGNBTwPbqurWJIcAtyS5HtgC3FBVFyY5Hz
gfeOPqlSpJamGucfG1fAseAVfVA1V1azf9TeBu4GjgDOCyrttlwKtXqUZJkkbOor4NKckm4LnAZ4CxqnqgW/QgMDbPbbYCWwHGxsaYnJzc532MHdT7xqFhta/6Ftq2tTA1NTUUdczH+iSpZ+AATvIk4APA66vqG0keW1ZVlaTmul1VbQe2A4yPj9fExMQ+7+cdl1/NRTuG91sSt22enre+nedMrG0xc5icnGShx7gl65OknoE+BZ3kAHrhe3lV/WXX/FCSI7vlRwJ7VqdESZJGzyCfgg7wbuDuqvrdvkXXAOd20+cCV698eZIkjaZBzvW+CHgtsCPJbV3brwEXAu9Pch7wZeA1q1KhJEkjaMEArqpPAZln8ckrW44kSRuDI2FJktSAASxJUgMGsCRJDRjAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1YABLktSAASxJUgMGsCRJDRjAkiQ1YABLktSAASxtcElOSHJb3883krx+Vp+JJI/09XlTo3KlkTHI9wFLGmFV9QXgRIAk+wG7gQ/O0fWTVfWKNSxNGmkeAUvqdzLwD1X15daFSKPOI2BJ/c4Crphn2QuT3A58BfjVqrpzdockW4GtAGNjY0xOTu61kqmpqcfat22eXpGiWxg7aH3XP8PtWLq5Xt+LYQBLAiDJgcCrgAvmWHwr8IyqmkpyGvAh4PjZnapqO7AdYHx8vCYmJvZa0eTkJDPtW86/dmWKb2Db5mku2rH+30LdjqXbec7Esm7vKWhJM14O3FpVD81eUFXfqKqpbvo64IAkR6x1gdIoMYAlzTibeU4/J/n+JOmmT6L33vFPa1ibNHLW/3kHScuW5GDgZcDP97X9AkBVXQycCfxikmng28BZVVUtapVGhQEsiap6FHjKrLaL+6bfCbxzreuSRpmnoCVJasAAliSpAQNYkqQGDGBJkhowgCVJasAAliSpAQNYkqQGDGBJkhowgCVJasAAliSpAQNYkqQGDGBJkhowgCVJasAAliSpAQNYkqQGDGBJkhrYv3UBWh2bzr92Uf13Xnj6KlUiSZqLR8CSJDVgAEuS1IABLElSAwawJEkNGMCSJDVgAEuS1IABLElSAwawJEkNGMCSJDVgAEuS1IABLElSAwawJEkNGMCSJDXgtyE14rcVSdLG5hGwJEkNGMCSSLIzyY4ktyW5eY7lSfIHSe5N8rkkz2tRpzRKPAUtacaPVtVX51n2cuD47udHgD/qfktaIo+AJQ3iDODPqufvgEOTHNm6KGk98whYEkABH0tSwB9X1fZZy48G7u+b39W1PdDfKclWYCvA2NgYk5OTe93R1NTUY+3bNk+vTPUNjB20vuuf4XYs3Vyv78UwgCUBvLiqdid5GnB9ks9X1ScWu5IuuLcDjI+P18TExF59JicnmWnfssj/Bhgm2zZPc9GO9f8W6nYs3c5zJpZ1+/X/qG8Qg/7b0rbN0+v6TU1tVNXu7veeJB8ETgL6A3g3cGzf/DFdm6Ql8hqwtMElOTjJITPTwCnAHbO6XQP8VPdp6BcAj1TVA0haMo+AJY0BH0wCvfeEP6+qjyT5BYCquhi4DjgNuBf4FvDTjWqVRoYBLG1wVXUf8Jw52i/umy7gl9ayLmnULXgKOsklSfYkuaOv7fAk1ye5p/t92OqWKUnSaBnkGvClwKmz2s4Hbqiq44EbunlJkjSgBQO4+1eEh2c1nwFc1k1fBrx6ZcuSJGm0LfVT0GN9n4B8kN6HOCRJ0oCW/SGsqqpu9Jw5DTIyTr9hH5VlX/W94/KrB17Pts0rVdHjLfXxW0ztAJuPfvKi7wMePwrSMBr2+iSNjqUG8ENJjqyqB7rxYPfM13GQkXH6vePyq4d6VJZhHzVmrepb6ggw/aMgDaNhr0/S6FjqKehrgHO76XOBxR0+SZK0wQ3yb0hXAJ8GTkiyK8l5wIXAy5LcA7y0m5ckSQNa8FxlVZ09z6KTV7gWSZI2DMeCliSpAQNYkqQGDGBJkhowgCVJas
AAliSpAQNYkqQGDGBJkhowgCVJasAAliSpAQNYkqQGDGBJkhowgCVJasAAliSpAQNYkqQGDGBJkhowgCVJasAAliSpAQNYkqQGDGBJkhowgCVJasAAliSpAQNYkqQGDGBpg0tybJIbk9yV5M4kr5ujz0SSR5Lc1v28qUWt0ijZv3UBkpqbBrZV1a1JDgFuSXJ9Vd01q98nq+oVDeqTRpJHwNIGV1UPVNWt3fQ3gbuBo9tWJY0+A1jSY5JsAp4LfGaOxS9McnuS/5PkP6xtZdLo8RS0JACSPAn4APD6qvrGrMW3As+oqqkkpwEfAo6fYx1bga0AY2NjTE5O7nU/U1NTj7Vv2zy9chuwxsYOWt/1z3A7lm6u1/diGMCSSHIAvfC9vKr+cvby/kCuquuS/O8kR1TVV2f12w5sBxgfH6+JiYm97mtycpKZ9i3nX7uCW7G2tm2e5qId6/8t1O1Yup3nTCzr9p6Clja4JAHeDdxdVb87T5/v7/qR5CR67x3/tHZVSqNn/f/ZI2m5XgS8FtiR5Lau7deApwNU1cXAmcAvJpkGvg2cVVXVoFZpZBjA0gZXVZ8CskCfdwLvXJuKpI3BANaSbFrEtbudF56+ipVI0vrkNWBJkhowgCVJasAAliSpAQNYkqQGDGBJkhowgCVJasAAliSpAQNYkqQGDGBJkhowgCVJasAAliSpAQNYkqQGDGBJkhrw25C06vq/OWnb5mm2LPBNSn57kqSNwCNgSZIaMIAlSWrAAJYkqQEDWJKkBgxgSZIaMIAlSWrAAJYkqQEDWJKkBgxgSZIaMIAlSWrAAJYkqQEDWJKkBgxgSZIaMIAlSWrAryPU0Nm0wNcVLtdivu5wsbX4VYqSBuURsCRJDRjAkiQ1sKwATnJqki8kuTfJ+StVlKS1tdC+nOR7kryvW/6ZJJsalCmNlCUHcJL9gD8EXg48Gzg7ybNXqjBJa2PAffk84GtV9Uzg7cDvrG2V0uhZzhHwScC9VXVfVf0rcCVwxsqUJWkNDbIvnwFc1k1fBZycJGtYozRylhPARwP3983v6tokrS+D7MuP9amqaeAR4ClrUp00olb935CSbAW2drNTSb6wwE2OAL66ulUt3a9Y37IMQ33Z98nTZdW3wLpnPGOp6x92A+7vzV8DK2EYXssrwe1YuuXu78sJ4N3AsX3zx3Rtj1NV24Htg640yc1VNb6MulaV9S2P9Q2lQfblmT67kuwPPBn4p9krGmR/H5XH2O0YLutxO5ZzCvrvgeOTHJfkQOAs4JqVKUvSGhpkX74GOLebPhP4eFXVGtYojZwlHwFX1XSSXwY+CuwHXFJVd65YZZLWxHz7cpK3ADdX1TXAu4H3JLkXeJheSEtahmVdA66q64DrVqiWGQOfrm7E+pbH+obQXPtyVb2pb/qfgR9fobsblcfY7Rgu62474lkkSZLWnkNRSpLUwNAE8DAOa5nkkiR7ktzR13Z4kuuT3NP9PqxRbccmuTHJXUnuTPK6IavviUluSnJ7V99vde3HdUMZ3tsNbXhgi/r66twvyWeTfHgY6xs1w7ifD2LY97fFGIXXfJJDk1yV5PNJ7k7ywvX4XAxFAA/xsJaXAqfOajsfuKGqjgdu6OZbmAa2VdWzgRcAv9Q9ZsNS378AL6mq5wAnAqcmeQG9IQzf3g1p+DV6Qxy29Drg7r75YatvZAzxfj6IYd/fFmMUXvO/D3ykqp4FPIfe9qy/56Kqmv8ALwQ+2jd/AXBB67q6WjYBd/TNfwE4sps+EvhC6xq7Wq4GXjaM9QHfC9wK/Ai9f5Tff67nvUFdx9DbUV8CfBjIMNU3aj/DvJ8vYVuGdn9boO51/5qn9z/oX6L7DFNf+7p6LqpqOI6AWV/DWo5V1QPd9IPAWMtiALpvpnku8BmGqL7uVNdtwB7geuAfgK9XbyhDaP88/x7wBuC73fxTGK76Rs162s/nNaz724B+j/X/mj8O+EfgT7tT6e9KcjDr77kYmgBel6r3p1
bTj5EneRLwAeD1VfWN/mWt66uq71TVifT+6j4JeFarWmZL8gpgT1Xd0roWrR/DvL8tZIRe8/sDzwP+qKqeCzzKrNPNw/5czBiWAB5oWMsh8VCSIwG633taFZLkAHpvBpdX1V8OW30zqurrwI30Tm8d2g1lCG2f5xcBr0qyk963/7yE3nWlYalvFK2n/Xwv62V/24dRec3vAnZV1We6+avoBfJ6ei6A4Qng9TSsZf+QfOfSuxa05pKE3uhEd1fV7/YtGpb6nprk0G76IHrXy+6mF8Rntq6vqi6oqmOqahO919vHq+qcYalvRK2n/fxxhn1/G8SovOar6kHg/iQndE0nA3exjp6Lx7S+CN13Af004Iv0rhP+eut6upquAB4A/o3eX13n0btmcgNwD/DXwOGNansxvVMsnwNu635OG6L6fgj4bFffHcCbuvYfAG4C7gX+AvieIXieJ4APD2t9o/QzjPv5gHUP9f62hO1Z1695ev9ZcXP3fHwIOGw9PheOhCVJUgPDcgpakqQNxQCWJKkBA1iSpAYMYEmSGjCAJUlqwACWJKkBA1iSpAYMYEmSGvh/llY3q97AT7sAAAAASUVORK5CYII=\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "plt.figure(figsize=(14, 6))\nax = sns.boxplot(\n data=churn_corr_df[[\"employ\", \"tenure\"]], orient=\"h\", whis=1.5, palette=\"Set2\")",
"execution_count": 12,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 1008x432 with 1 Axes>",
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA0cAAAFlCAYAAAAgQZOXAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAATtklEQVR4nO3df7Cld13Y8feXbBIiYUhCwLVB7mLLYNGBACkVwwQVYYA6QIBqrcyKdIZOZazMlDrUGkRWpz902jqMdcoIhZ0qhUICTDtVKD9tphPcCAISUkG5mpQlgSSYILPZJd/+cU9gsxPkbrJ7n7t7X6+ZO/ec5/z63P3ee/a+93nO2THnDAAAYKd7wNIDAAAAbAfiCAAAIHEEAABQiSMAAIBKHAEAAFTiCAAAoKpdSw9wIl144YVzz549S48BAABsU9dee+0X55wPu7fLTqs42rNnTwcOHFh6DAAAYJsaY6x/s8scVgcAAJA4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQ1a6lB+DE2b9/f+vr60uPsWkHDx6savfu3QtPsvOsra21d+/epccAANhWxNFpZH19vU9/5k8684IHLz3Kphy+4/aqvnLLwoPsMIdvuX3pEQAAtiVxdJo584IH99Bn/t2lx9iUL73nmqpTZt7Txd1/7gAA3JPXHAEAACSOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHJ8X+/fvbv3//0mMAsAmeswG4266lBzgdra+vLz0CAJvkORuAu9lzBAAAkDgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgCAY9x666299rWv7bbbblt6FIAttVgcjTHuWOqxAYBv7qqrrur666/vyiuvXHoUgC1lzxEA8HW33nprH/rQh5pz9uEPf9jeI2BH2bWZK40xXlz90+qs6prqp6svV79ZPaf6fPXz1b+tHlm9Ys757jHGS6rLq4dUF1X/Zc75S8fc91jd7tnVrH55zvnWMcb+6so55ztX1/vt6m1zznfdny94Kxw8eLBDhw61b9++LX3c9fX1jvS1LX1MTj1Hbv+r1m9f3/LvT9iu1tfXO/vss5ceY9u46qqrmnNWddddd3XllVf20pe+dOGpALbGt9xzNMb429WPVZfOOS+uvlb9RPWg6v1zzu+pbq9+uXpGGzH02qPu4snVC6vHVX9/jHHJMQ/xguri6vHVD1e/Osb4juoN1UtWMzyk+v7qf9zLfC8bYxwYYxy4+eabN/VFAwD37uqrr+7IkSNVHTlypKuvvnrhiQC2zmb2HD29elL1Bxs7eTqnuqm6s/rd1XU+UR2acx4eY3yi2nPU7d875/xS1Rjjyuqp1YGjLn9q9ZY559eqL4wxPlT9ndWep/84xnhYG3H1jjnnkWOHm3O+vnp91SWXXDI392WfXLt3767qiiuu2NLH3bdvX5+95eCWPiannl0P/rbWLti95d+fsF3Zi3pPl156aR/84Ac7cuRIu3bt6tJLL116JIAts5nXHI3qzXPOi1cfj5lzvqY6PO/e7153VYeq5px3dc/oOjZYjidg9lcvrn6qeuNx3A4AuA8uv/zyVv8Y2gMe8IBe8IIXLDwRwNbZTBy9r3
rRGOPhVWOMC8YYa8fxGM9Y3eac6vnVsfvnf7/6sTHGGau9RJdVH1ld9qbqFVVzzk8dx2MCAPfB+eef39Oe9rTGGF122WWdd955S48EsGW+5WF1c85PjTF+oXrPGOMB1eHq5cfxGB+p3lE9oo03ZDhwzOVXVU+p/qiNvUo/N+c8uHrsL4wxrqveeRyPBwDcD5dffnk33HCDvUbAjrOpd6ubc761eusxm8896vLXHHP9c486e8Oc8/n3cp/nrj7P6p+vPu5hjPFt1aOrt2xmTgDg/jv//PN79atfvfQYAFtu2/4/R2OMH66uq1435/zy0vMAAACnt03tObqv5pxvauN1Q/fltv+rOp7XNgEAANxn23bPEQAAwFYSRwAAAIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAAKratfQAp6O1tbWlRwBgkzxnA3A3cXQS7N27d+kRANgkz9kA3M1hdQAAAIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAFXtWnoATqzDt9zel95zzdJjbMrhW26vOmXmPV0cvuX2umD30mMAAGw74ug0sra2tvQIx+XgnRufd/tFfWtdsPuU+14BANgK4ug0snfv3qVHAACAU5bXHAEAACSOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoKpdSw8AbI39+/e3vr6+9BjwdQcPHqxq9+7dC08Cp5e1tbX27t279BhwShJHsEOsr6/3Z//301107plLjwJVffWOw1XdeddXFp4ETh83rn6ugPtGHMEOctG5Z/Yzj3v40mNAVa/7+E1VvifhBLr75wq4b7zmCAAAIHEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTha1P79+9u/f//SYwAAwElzKv3Ou2vpAXay9fX1pUcAAICT6lT6ndeeIwAAgMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFAdZxyNMc4bY/z0yRoGAABgKce75+i86qTG0Rhj18m8fwAAgHtzvCHyr6u/Ocb4WPXe6qbqR6uzq6vmnL84xthT/c/qf1ffX91YPW/O+dUxxgerV845D4wxLqwOzDn3jDFeUr2gOrc6Y4zxnOp11fdWZ1avmXO+6359pdvQwYMHO3ToUPv27Vt6FHaA9fX1zjxyZOkxADiJvvjVIx1eX/e7BdvK+vp6Z5999tJjbMrx7jl6VfXZOefFbcTRo6
snVxdXTxpjXLa63qOr35hzfk91W/XCTdz3E6sXzTmfVv3L6v1zzidXP1j96hjjQfd2ozHGy8YYB8YYB26++ebj/HIAAAA23J9D2J65+vjo6vy5bUTRn1d/Nuf82Gr7tdWeTdzfe+ectxx1388dY7xydf6B1SOr64690Zzz9dXrqy655JJ53F/Fgnbv3l3VFVdcsfAk7AT79u3rzv/32aXHAOAkuvCcXZ31N9b8bsG2cirtybw/cTSqfzXn/E/32LhxWN2hozZ9rTpndfpI39hb9cBj7u8rx9z3C+ec19+P+QAAADbteA+ru7168Or071UvHWOcWzXGuGiM8fBvcfvPVU9anX7RX3O936t+ZowxVvf9hOOcEwAA4LgcVxzNOb9UXT3G+GT1jOp3qv8zxvhE9fa+EU7fzK9V/2SM8dHqwr/mevvaeCOGj48x/nh1HgAA4KQ57sPq5pz/8JhNv34vV/veo67/a0ed/nT1uKOu9wur7W+q3nTU9b5a/ePjnQ0AAOC+Ot7D6gAAAE5L4ggAACBxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUNWupQfYydbW1pYeAQAATqpT6XdecbSgvXv3Lj0CAACcVKfS77wOqwMAAEgcAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKhq19IDAFvnxjsO97qP37T0GFBtfD9WvifhBLrxjsM9aukh4BQmjmCHWFtbW3oEuIdzDh6s6qzduxeeBE4fj8rzPdwf4gh2iL179y49AgDAtuY1RwAAAIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQFVjzrn0DCfMGOPman3pOVYurL649BBU1mI7sRbbg3XYPqzF9mEttgfrsH2czmuxNud82L1dcFrF0XYyxjgw57xk6TmwFtuJtdgerMP2YS22D2uxPViH7WOnroXD6gAAABJHAAAAlTg6mV6/9AB8nbXYPqzF9mAdtg9rsX1Yi+3BOmwfO3ItvOYIAAAge44AAAAqcXRSjDGeNca4fozxmTHGq5aeZycZY7xxjHHTGOOTR227YIzx3jHGn6w+n7/kjDvBGOM7xxgfGGN8aozxx2OMn11ttxZbbIzxwDHGR8YYf7Rai19abX/UGOOa1fPUW8cYZy09604wxjhjjPHRMcZ/X523DgsYY3xujPGJMcbHxhgHVts8Py1gjHHeGOPtY4xPjzGuG2M8xVpsrTHGY1Y/C3d//OUY4xU7dR3E0Qk2xjij+o3q2dVjqx8fYzx22al2lDdVzzpm26uq9805H129b3Wek+tI9c/mnI+tvq96+ernwFpsvUPVD805H19dXD1rjPF91b+p/v2c829Vt1b/aLkRd5Sfra476rx1WM4PzjkvPuqtij0/LePXq9+dc3539fg2fj6sxRaac16/+lm4uHpS9VfVVe3QdRBHJ96Tq8/MOf90znln9V+r5y08044x5/xwdcsxm59XvXl1+s3V87dypp1ozvn5Oecfrk7f3sZfdhdlLbbc3HDH6uyZq49Z/VD19tV2a7EFxhiPqP5e9Vur8yPrsJ14ftpiY4yHVJdVb6iac94557wta7Gkp1efnXOut0PXQRydeBdVf3HU+RtW21jOt885P786fbD69i
WH2WnGGHuqJ1TXZC0WsTqU62PVTdV7q89Wt805j6yu4nlqa/yH6uequ1bnH5p1WMqs3jPGuHaM8bLVNs9PW+9R1c3Vf14dbvpbY4wHZS2W9A+qt6xO78h1EEfsKHPj7Rm9ReMWGWOcW72jesWc8y+PvsxabJ0559dWh0s8oo2929+97EQ7zxjjR6qb5pzXLj0LVT11zvnENg6Bf/kY47KjL/T8tGV2VU+sfnPO+YTqKx1z6Ja12Dqr1zw+t/pvx162k9ZBHJ14N1bfedT5R6y2sZwvjDG+o2r1+aaF59kRxhhnthFGvz3nvHK12VosaHW4ygeqp1TnjTF2rS7yPHXyXVo9d4zxuTYOt/6hNl5rYR0WMOe8cfX5pjZeW/HkPD8t4YbqhjnnNavzb28jlqzFMp5d/eGc8wur8ztyHcTRifcH1aNX70B0Vhu7J9+98Ew73burn1yd/snqXQvOsiOsXkvxhuq6Oee/O+oia7HFxhgPG2Octzp9TvWMNl4D9oHqRaurWYuTbM75L+acj5hz7mnj74X3zzl/Iuuw5cYYDxpjPPju09Uzq0/m+WnLzTkPVn8xxnjMatPTq09lLZby433jkLraoevgP4E9CcYYz2nj2PIzqjfOOX9l2Yl2jjHGW6ofqC6svlD9YvXO6m3VI6v16kfnnMe+aQMn0BjjqdXvV5/oG6+v+Pk2XndkLbbQGONxbbyQ9ow2/kHsbXPO144xvquNPRgXVB+tXjznPLTcpDvHGOMHqlfOOX/EOmy91Z/5Vauzu6rfmXP+yhjjoXl+2nJjjIvbeJOSs6o/rX6q1XNV1mLLrP6h4M+r75pzfnm1bUf+TIgjAACAHFYHAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAAKr6/5Z5VoyNQsWTAAAAAElFTkSuQmCC\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {
"trusted": true,
"editable": false,
"deletable": false,
"run_control": {
"frozen": true
}
},
"cell_type": "code",
"source": "# THE THERE ARE TOO FEW OBSERVATIONS IN THE DATA SET\n# Dropping one row had a massive effect.\n\nthreshold = 3\nz_scores = stats.zscore(churn_corr_df)\nabs_z_scores = np.abs(z_scores)\nfiltered_entries = (abs_z_scores < threshold).all(axis=1)\nchurn_corr_df = churn_corr_df[filtered_entries]",
"execution_count": null,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "churn_corr_df[[\"employ\", \"tenure\"]].hist(bins=15, figsize=(8,6));",
"execution_count": 13,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 576x432 with 2 Axes>",
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "plt.figure(figsize=(14, 6))\nax = sns.boxplot(\n data=churn_corr_df[[\"employ\", \"tenure\"]], orient=\"h\", whis=1.5, palette=\"Set2\")",
"execution_count": 14,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 1008x432 with 1 Axes>",
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA0cAAAFlCAYAAAAgQZOXAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAATtklEQVR4nO3df7Cld13Y8feXbBIiYUhCwLVB7mLLYNGBACkVwwQVYYA6QIBqrcyKdIZOZazMlDrUGkRWpz902jqMdcoIhZ0qhUICTDtVKD9tphPcCAISUkG5mpQlgSSYILPZJd/+cU9gsxPkbrJ7n7t7X6+ZO/ec5/z63P3ee/a+93nO2THnDAAAYKd7wNIDAAAAbAfiCAAAIHEEAABQiSMAAIBKHAEAAFTiCAAAoKpdSw9wIl144YVzz549S48BAABsU9dee+0X55wPu7fLTqs42rNnTwcOHFh6DAAAYJsaY6x/s8scVgcAAJA4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQ1a6lB+DE2b9/f+vr60uPsWkHDx6savfu3QtPsvOsra21d+/epccAANhWxNFpZH19vU9/5k8684IHLz3Kphy+4/aqvnLLwoPsMIdvuX3pEQAAtiVxdJo584IH99Bn/t2lx9iUL73nmqpTZt7Txd1/7gAA3JPXHAEAACSOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHJ8X+/fvbv3//0mMAsAmeswG4266lBzgdra+vLz0CAJvkORuAu9lzBAAAkDgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgCAY9x666299rWv7bbbblt6FIAttVgcjTHuWOqxAYBv7qqrrur666/vyiuvXHoUgC1lzxEA8HW33nprH/rQh5pz9uEPf9jeI2BH2bWZK40xXlz90+qs6prqp6svV79ZPaf6fPXz1b+tHlm9Ys757jHGS6rLq4dUF1X/Zc75S8fc91jd7tnVrH55zvnWMcb+6so55ztX1/vt6m1zznfdny94Kxw8eLBDhw61b9++LX3c9fX1jvS1LX1MTj1Hbv+r1m9f3/LvT9iu1tfXO/vss5ceY9u46qqrmnNWddddd3XllVf20pe+dOGpALbGt9xzNMb429WPVZfOOS+uvlb9RPWg6v1zzu+pbq9+uXpGGzH02qPu4snVC6vHVX9/jHHJMQ/xguri6vHVD1e/Osb4juoN1UtWMzyk+v7qf9zLfC8bYxwYYxy4+eabN/VFAwD37uqrr+7IkSNVHTlypKuvvnrhiQC2zmb2HD29elL1Bxs7eTqnuqm6s/rd1XU+UR2acx4eY3yi2nPU7d875/xS1Rjjyuqp1YGjLn9q9ZY559eqL4wxPlT9ndWep/84xnhYG3H1jjnnkWOHm3O+vnp91SWXXDI392WfXLt3767qiiuu2NLH3bdvX5+95eCWPiannl0P/rbWLti95d+fsF3Zi3pPl156aR/84Ac7cuRIu3bt6tJLL116JIAts5nXHI3qzXPOi1cfj5lzvqY6PO/e7153VYeq5px3dc/oOjZYjidg9lcvrn6qeuNx3A4AuA8uv/zyVv8Y2gMe8IBe8IIXLDwRwNbZTBy9r3
rRGOPhVWOMC8YYa8fxGM9Y3eac6vnVsfvnf7/6sTHGGau9RJdVH1ld9qbqFVVzzk8dx2MCAPfB+eef39Oe9rTGGF122WWdd955S48EsGW+5WF1c85PjTF+oXrPGOMB1eHq5cfxGB+p3lE9oo03ZDhwzOVXVU+p/qiNvUo/N+c8uHrsL4wxrqveeRyPBwDcD5dffnk33HCDvUbAjrOpd6ubc761eusxm8896vLXHHP9c486e8Oc8/n3cp/nrj7P6p+vPu5hjPFt1aOrt2xmTgDg/jv//PN79atfvfQYAFtu2/4/R2OMH66uq1435/zy0vMAAACnt03tObqv5pxvauN1Q/fltv+rOp7XNgEAANxn23bPEQAAwFYSRwAAAIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAAKratfQAp6O1tbWlRwBgkzxnA3A3cXQS7N27d+kRANgkz9kA3M1hdQAAAIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAFXtWnoATqzDt9zel95zzdJjbMrhW26vOmXmPV0cvuX2umD30mMAAGw74ug0sra2tvQIx+XgnRufd/tFfWtdsPuU+14BANgK4ug0snfv3qVHAACAU5bXHAEAACSOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoKpdSw8AbI39+/e3vr6+9BjwdQcPHqxq9+7dC08Cp5e1tbX27t279BhwShJHsEOsr6/3Z//301107plLjwJVffWOw1XdeddXFp4ETh83rn6ugPtGHMEOctG5Z/Yzj3v40mNAVa/7+E1VvifhBLr75wq4b7zmCAAAIHEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTha1P79+9u/f//SYwAAwElzKv3Ou2vpAXay9fX1pUcAAICT6lT6ndeeIwAAgMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFAdZxyNMc4bY/z0yRoGAABgKce75+i86qTG0Rhj18m8fwAAgHtzvCHyr6u/Ocb4WPXe6qbqR6uzq6vmnL84xthT/c/qf1ffX91YPW/O+dUxxgerV845D4wxLqwOzDn3jDFeUr2gOrc6Y4zxnOp11fdWZ1avmXO+6359pdvQwYMHO3ToUPv27Vt6FHaA9fX1zjxyZOkxADiJvvjVIx1eX/e7BdvK+vp6Z5999tJjbMrx7jl6VfXZOefFbcTRo6
snVxdXTxpjXLa63qOr35hzfk91W/XCTdz3E6sXzTmfVv3L6v1zzidXP1j96hjjQfd2ozHGy8YYB8YYB26++ebj/HIAAAA23J9D2J65+vjo6vy5bUTRn1d/Nuf82Gr7tdWeTdzfe+ectxx1388dY7xydf6B1SOr64690Zzz9dXrqy655JJ53F/Fgnbv3l3VFVdcsfAk7AT79u3rzv/32aXHAOAkuvCcXZ31N9b8bsG2cirtybw/cTSqfzXn/E/32LhxWN2hozZ9rTpndfpI39hb9cBj7u8rx9z3C+ec19+P+QAAADbteA+ru7168Or071UvHWOcWzXGuGiM8fBvcfvPVU9anX7RX3O936t+ZowxVvf9hOOcEwAA4LgcVxzNOb9UXT3G+GT1jOp3qv8zxvhE9fa+EU7fzK9V/2SM8dHqwr/mevvaeCOGj48x/nh1HgAA4KQ57sPq5pz/8JhNv34vV/veo67/a0ed/nT1uKOu9wur7W+q3nTU9b5a/ePjnQ0AAOC+Ot7D6gAAAE5L4ggAACBxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUNWupQfYydbW1pYeAQAATqpT6XdecbSgvXv3Lj0CAACcVKfS77wOqwMAAEgcAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQCWOAAAAKnEEAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAACpxBAAAUIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKhq19IDAFvnxjsO97qP37T0GFBtfD9WvifhBLrxjsM9aukh4BQmjmCHWFtbW3oEuIdzDh6s6qzduxeeBE4fj8rzPdwf4gh2iL179y49AgDAtuY1RwAAAIkjAACAShwBAABU4ggAAKASRwAAAJU4AgAAqMQRAABAJY4AAAAqcQQAAFCJIwAAgEocAQAAVOIIAACgEkcAAACVOAIAAKjEEQAAQFVjzrn0DCfMGOPman3pOVYurL649BBU1mI7sRbbg3XYPqzF9mEttgfrsH2czmuxNud82L1dcFrF0XYyxjgw57xk6TmwFtuJtdgerMP2YS22D2uxPViH7WOnroXD6gAAABJHAAAAlTg6mV6/9AB8nbXYPqzF9mAdtg9rsX1Yi+3BOmwfO3ItvOYIAAAge44AAAAqcXRSjDGeNca4fozxmTHGq5aeZycZY7xxjHHTGOOTR227YIzx3jHGn6w+n7/kjDvBGOM7xxgfGGN8aozxx2OMn11ttxZbbIzxwDHGR8YYf7Rai19abX/UGOOa1fPUW8cYZy09604wxjhjjPHRMcZ/X523DgsYY3xujPGJMcbHxhgHVts8Py1gjHHeGOPtY4xPjzGuG2M8xVpsrTHGY1Y/C3d//OUY4xU7dR3E0Qk2xjij+o3q2dVjqx8fYzx22al2lDdVzzpm26uq9805H129b3Wek+tI9c/mnI+tvq96+ernwFpsvUPVD805H19dXD1rjPF91b+p/v2c829Vt1b/aLkRd5Sfra476rx1WM4PzjkvPuqtij0/LePXq9+dc3539fg2fj6sxRaac16/+lm4uHpS9VfVVe3QdRBHJ96Tq8/MOf90znln9V+r5y08044x5/xwdcsxm59XvXl1+s3V87dypp1ozvn5Oecfrk7f3sZfdhdlLbbc3HDH6uyZq49Z/VD19tV2a7EFxhiPqP5e9Vur8yPrsJ14ftpiY4yHVJdVb6iac94557wta7Gkp1efnXOut0PXQRydeBdVf3HU+RtW21jOt885P786fbD69i
WH2WnGGHuqJ1TXZC0WsTqU62PVTdV7q89Wt805j6yu4nlqa/yH6uequ1bnH5p1WMqs3jPGuHaM8bLVNs9PW+9R1c3Vf14dbvpbY4wHZS2W9A+qt6xO78h1EEfsKHPj7Rm9ReMWGWOcW72jesWc8y+PvsxabJ0559dWh0s8oo2929+97EQ7zxjjR6qb5pzXLj0LVT11zvnENg6Bf/kY47KjL/T8tGV2VU+sfnPO+YTqKx1z6Ja12Dqr1zw+t/pvx162k9ZBHJ14N1bfedT5R6y2sZwvjDG+o2r1+aaF59kRxhhnthFGvz3nvHK12VosaHW4ygeqp1TnjTF2rS7yPHXyXVo9d4zxuTYOt/6hNl5rYR0WMOe8cfX5pjZeW/HkPD8t4YbqhjnnNavzb28jlqzFMp5d/eGc8wur8ztyHcTRifcH1aNX70B0Vhu7J9+98Ew73burn1yd/snqXQvOsiOsXkvxhuq6Oee/O+oia7HFxhgPG2Octzp9TvWMNl4D9oHqRaurWYuTbM75L+acj5hz7mnj74X3zzl/Iuuw5cYYDxpjPPju09Uzq0/m+WnLzTkPVn8xxnjMatPTq09lLZby433jkLraoevgP4E9CcYYz2nj2PIzqjfOOX9l2Yl2jjHGW6ofqC6svlD9YvXO6m3VI6v16kfnnMe+aQMn0BjjqdXvV5/oG6+v+Pk2XndkLbbQGONxbbyQ9ow2/kHsbXPO144xvquNPRgXVB+tXjznPLTcpDvHGOMHqlfOOX/EOmy91Z/5Vauzu6rfmXP+yhjjoXl+2nJjjIvbeJOSs6o/rX6q1XNV1mLLrP6h4M+r75pzfnm1bUf+TIgjAACAHFYHAABQiSMAAIBKHAEAAFTiCAAAoBJHAAAAlTgCAACoxBEAAEAljgAAAKr6/5Z5VoyNQsWTAAAAAElFTkSuQmCC\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "**Selecting features (dependent) and target (independent).**"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "X = churn_corr_df.drop(\"churn\", axis=1)\ny = np.asarray(churn_corr_df.churn)",
"execution_count": 15,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "X.head()",
"execution_count": 16,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 16,
"data": {
"text/plain": " callcard loglong employ tenure\n0 1.0 1.482 5.0 11.0\n1 0.0 2.246 0.0 33.0\n2 0.0 1.841 2.0 23.0\n3 1.0 1.800 10.0 38.0\n4 1.0 1.960 15.0 7.0",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>callcard</th>\n <th>loglong</th>\n <th>employ</th>\n <th>tenure</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1.0</td>\n <td>1.482</td>\n <td>5.0</td>\n <td>11.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>0.0</td>\n <td>2.246</td>\n <td>0.0</td>\n <td>33.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>0.0</td>\n <td>1.841</td>\n <td>2.0</td>\n <td>23.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1.0</td>\n <td>1.800</td>\n <td>10.0</td>\n <td>38.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1.0</td>\n <td>1.960</td>\n <td>15.0</td>\n <td>7.0</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "scaler = StandardScaler()",
"execution_count": 17,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "X = scaler.fit_transform(X)",
"execution_count": 18,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "X[0:5]",
"execution_count": 19,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 19,
"data": {
"text/plain": "array([[ 0.64686916, -0.97509593, -0.58477841, -1.13518441],\n [-1.54590766, 0.07226665, -1.14437497, -0.11604313],\n [-1.54590766, -0.48294519, -0.92053635, -0.57928917],\n [ 0.64686916, -0.53915182, -0.02518185, 0.11557989],\n [ 0.64686916, -0.31980887, 0.53441472, -1.32048283]])"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "y[0:5]",
"execution_count": 20,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 20,
"data": {
"text/plain": "array([1., 1., 0., 0., 0.])"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)",
"execution_count": 21,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print(\"Train set:\", X_train.shape, y_train.shape)\nprint(\"Test set:\", X_test.shape, y_test.shape)",
"execution_count": 22,
"outputs": [
{
"output_type": "stream",
"text": "Train set: (160, 4) (160,)\nTest set: (40, 4) (40,)\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Model development\nDevelop your model using Logistic Regression from Scikit-learn package in Python. This function implements logistic regression and can use different numerical optimisers to find parameters, including ‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’ solvers. Feel free to Google the exhaustive information about the pros and cons of these optimisers. The Logistic Regression feature in Scikit-learn, supports regularisation. The regularisation is a technique used to solve the overfitting problem in machine learning models. The ‘C’ parameter indicates inverse of regularization strength which must be a positive float. Smaller values of ‘C’ specify stronger regularization. For this activity sheet accept ‘C’ as just a parameter, the mathematics behind regularisation is not so important under the realm of this course. Fit your model with train set. Predict outputs from the model using your testing dataset. Use predict_proba function on your testing dataset to return estimates for all classes, ordered by the label of classes. In this dataset you have two classes (0 & 1), so you should expect the outcome of predict_proba ( ) to display the probability for both these classes."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "clf = LogisticRegression(C=0.01, solver='liblinear')\nclf.fit(X_train,y_train)\nclf",
"execution_count": 23,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 23,
"data": {
"text/plain": "LogisticRegression(C=0.01, solver='liblinear')"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "y_pred = clf.predict(X_test)",
"execution_count": 24,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "y_pred_prob = clf.predict_proba(X_test)",
"execution_count": 25,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Model evaluation\nUse Jaccard Index to evaluate the accuracy of your model. Scikit-learn can be used very easily to perform this task. Now use another evaluation matrix, log loss for evaluating the accuracy of the obtained model."
},
{
"metadata": {},
"cell_type": "markdown",
"source": "The higher the Jaccard score higher the accuracy of the classifier.\n\nHere weighted since the test set has a low amount of observations."
},
{
"metadata": {
"trusted": true,
"scrolled": false
},
"cell_type": "code",
"source": "jaccard_score(y_test, y_pred, average=\"weighted\")",
"execution_count": 26,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 26,
"data": {
"text/plain": "0.6921212121212121"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Log loss measures how far the prediction is from the actual label when the output is a probability value, and should be low."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "log_loss(y_test, y_pred_prob)",
"execution_count": 27,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 27,
"data": {
"text/plain": "0.5672894611120263"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "clf.score(X_test, y_test)\n# accuracy_score(y_test, y_pred)",
"execution_count": 28,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 28,
"data": {
"text/plain": "0.8"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print(classification_report(y_test, y_pred))",
"execution_count": 29,
"outputs": [
{
"output_type": "stream",
"text": " precision recall f1-score support\n\n 0.0 0.93 0.81 0.86 31\n 1.0 0.54 0.78 0.64 9\n\n accuracy 0.80 40\n macro avg 0.73 0.79 0.75 40\nweighted avg 0.84 0.80 0.81 40\n\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "cm = confusion_matrix(y_test, y_pred)",
"execution_count": 30,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "cm_plot(cm, legend=True)",
"execution_count": 31,
"outputs": [
{
"output_type": "stream",
"text": "\n # True Prositive\n TP = np.diag(cf_matrix) : 7.00\n\n # False Negative, Type II Error\n FN = cf_matrix.sum(axis=1) - np.diag(cf_matrix) : 2.00\n\n # False Positive, Type I Error\n FP = cf_matrix.sum(axis=0) - np.diag(cf_matrix) : 6.00\n\n # True Negative\n TN = cf_matrix.sum() - (FP + FN + TP) : 25.00\n\n # Sensitivity, hit rate, recall, or true positive rate\n TPR = TP/(TP+FN) : 0.78\n\n # Specificity, true negative rate or negative recall\n TNR = TN/(TN+FP) : 0.81\n\n # Precision or positive predictive value\n PPV = TP/(TP+FP) : 0.54\n\n # Negative predictive value\n NPV = TN/(TN+FN) : 0.93\n\n # Fall out or false positive rate\n FPR = FP/(FP+TN) : 0.19\n\n # False negative rate\n FNR = FN/(TP+FN) : 0.22\n\n # False discovery rate\n FDR = FP/(TP+FP) : 0.46\n\n # F1 score is the harmonic mean of positive predictive value\n # and sensitivity\n F1 = 2 * (PPV * TPR)/(PPV + TPR) : 0.64\n\n # Overall accuracy\n ACC = (TP+TN)/(TP+FP+FN+TN) : 0.80\n \n",
"name": "stdout"
},
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 2 Axes>",
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {
"trusted": true,
"scrolled": false
},
"cell_type": "code",
"source": "# False positive rate\nfrom sklearn.metrics import roc_auc_score\nfrom sklearn.metrics import roc_curve\nlogit_roc_auc = roc_auc_score(y_test, clf.predict(X_test))\nfpr, tpr, thresholds = roc_curve(y_test, clf.predict_proba(X_test)[:,1])\nplt.figure(figsize=(10, 6))\n# Area Under the Curve (AUC)\nplt.plot(fpr, tpr, label=\"Logistic Regression (AUC = %0.2f)\" % logit_roc_auc)\nplt.plot([0, 1], [0, 1],\"r--\")\nplt.xlim([0.0, 1.0])\nplt.ylim([0.0, 1.05])\nplt.xlabel(\"False Positive Rate\")\nplt.ylabel(\"True Positive Rate\")\nplt.title(\"Receiver operating characteristic\")\nplt.legend(loc=\"lower right\")\nplt.savefig(\"Log_ROC\")\nplt.show()",
"execution_count": 32,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 720x432 with 1 Axes>",
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Tuning\nDevelop logistic regression model again, using the same dataset and play with different values for ‘solver’ and ‘regularisation’. Compare the model performance using Jaccard Index/logLoss every time. In this experiment you will be able to get a sense about the best tuning parameters corresponding to this dataset."
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Explained by Isah, and saved for referencing purposes:\n\n> Model parameter = learnable variables of the model that are estimated from the data only. Eg. the weights of link in neural networks, the coefficient of simple linear regression\n\n> Model hyperparameter or tuning parameter = variables of the model that the user has to provide during the modeling process manually. e.g., the k in KNN, etc. Usually, the user has to tune the model (model selection) to select the appropriate values of the parameter"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "def log_reg(X_train, y_train, solver, C=0.01):\n clf = LogisticRegression(C=C, solver=solver)\n clf.fit(X_train,y_train)\n y_pred = clf.predict(X_test)\n y_pred_prob = clf.predict_proba(X_test)\n \n print(\"Accuracy score:\", accuracy_score(y_test, y_pred))\n print(\"Jaccard score:\", jaccard_score(y_test, y_pred, average=\"weighted\"))\n print(\"Log loss:\", log_loss(y_test, y_pred_prob))\n \n return y_test, y_pred, y_pred_prob",
"execution_count": 33,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "solvers = [\"newton-cg\", \"lbfgs\", \"liblinear\", \"sag\", \"saga\"]",
"execution_count": 34,
"outputs": []
},
{
"metadata": {
"trusted": true,
"scrolled": false
},
"cell_type": "code",
"source": "for solver in solvers:\n print()\n print(solver)\n log_reg(X_train, y_train, solver)\n #cm = confusion_matrix(y_test, y_pred)\n #cm_plot(cm)",
"execution_count": 35,
"outputs": [
{
"output_type": "stream",
"text": "\nnewton-cg\nAccuracy score: 0.775\nJaccard score: 0.6006250000000001\nLog loss: 0.47094414419392033\n\nlbfgs\nAccuracy score: 0.775\nJaccard score: 0.6006250000000001\nLog loss: 0.47094454470363123\n\nliblinear\nAccuracy score: 0.8\nJaccard score: 0.6921212121212121\nLog loss: 0.5672894611120263\n\nsag\nAccuracy score: 0.775\nJaccard score: 0.6006250000000001\nLog loss: 0.4709469474796763\n\nsaga\nAccuracy score: 0.775\nJaccard score: 0.6006250000000001\nLog loss: 0.4709455161533668\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "**Example of weighted average**"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "Predicted = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]\nActual = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]",
"execution_count": 36,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "jaccard_score(Actual, Predicted)",
"execution_count": 37,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 37,
"data": {
"text/plain": "0.5"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "jaccard_score(Actual, Predicted, average=\"weighted\")",
"execution_count": 38,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 38,
"data": {
"text/plain": "0.8636363636363636"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# How many 0 was correctly predicted.\njaccard_score(Actual, Predicted, pos_label=0, average=\"binary\")",
"execution_count": 39,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 39,
"data": {
"text/plain": "0.9"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# How many 1 was correctly predicted.\njaccard_score(Actual, Predicted, pos_label=1, average=\"binary\")",
"execution_count": 40,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 40,
"data": {
"text/plain": "0.5"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "**Testing tuple unpacking**"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "display(url := (\"https://www.amazon.com/War-Art-Trough-Creative-battles/dp/561651651651/?keyword=war+of+art\"))\ndisplay(url.split(\"/\")[2:-1])\ndomain, *rest, asin = url.split(\"/\")[2:-1]\ndisplay(domain)\ndisplay(asin)\ndisplay(rest)",
"execution_count": 41,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "'https://www.amazon.com/War-Art-Trough-Creative-battles/dp/561651651651/?keyword=war+of+art'"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": "['www.amazon.com', 'War-Art-Trough-Creative-battles', 'dp', '561651651651']"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": "'www.amazon.com'"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": "'561651651651'"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": "['War-Art-Trough-Creative-battles', 'dp']"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"name": "venv",
"display_name": "venv",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.8.3",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"latex_envs": {
"eqNumInitial": 1,
"eqLabelWithNumbers": true,
"current_citInitial": 1,
"cite_by": "apalike",
"bibliofile": "biblio.bib",
"LaTeX_envs_menu_present": true,
"labels_anchors": false,
"latex_user_defs": false,
"user_envs_cfg": false,
"report_style_numbering": false,
"autoclose": false,
"autocomplete": true,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
}
},
"toc": {
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": true,
"base_numbering": 1,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"window_display": false,
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"library": "var_list.py",
"delete_cmd_prefix": "del ",
"delete_cmd_postfix": "",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"library": "var_list.r",
"delete_cmd_prefix": "rm(",
"delete_cmd_postfix": ") ",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
]
},
"gist": {
"id": "",
"data": {
"description": "nuc_machine_learning/Session 6/tutorial 6.ipynb",
"public": true
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}