@Ayushverma8
Created January 18, 2022 03:18
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predicting Customer Attrition with Survival Analysis\n",
"------------\n",
"#### <b>1. Objective <b>\n",
"* Customer Attrition or Churn refers to a decision made by the customer about ending the business relationship. \n",
"* Customer loyalty and customer churn always add up to 100%. If a firm has a 60% of loyalty rate, then their loss or churn rate of customers is 40%.\n",
"* Churn is undesirable and it is the firm's responsibility to understand why customers are churning and prevent that \n",
"--------\n",
"#### <b>2. Approach <b>\n",
"* Usually when it comes to Predicting customer churn, we look at classification techniques such as Logistic regression but the problem with that approach is that it doesn't take time into consideration\n",
"* So we will be using a tool from an unlikely place - Survival analysis. It was first developed by actuaries & medical professionals to predict survival rates\n",
"* Here we will be defining -\n",
" - Birth event: For eg, a customer subsribing to your product or service\n",
" - Death event: For eg, a customer ending the relationship with the company\n",
" \n",
"* Component that makes SA superior to other models is its ability to deal with \"censorship\" in data\n",
"* Censorship basically refers to losing track of a customer or the customer \"not dying\" before the end of the observation period\n",
"* This data is considered \"censored\" since everyone dies eventually, we are just missing the data\n",
"* Similarly, we would expect to lose all customers eventually. Just because we haven't observed them cancelling their subscription doesn't mean they never will.\n",
"\n",
"We have all come across \"Teleco Customer Churn\" dataset which we usually use to Predict customer churn by binary classification method. <br>\n",
"Now, we will use the same dataset and apply our newly learnt Survival analysis skills.\n",
"\n",
"---------------------"
]
},
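{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea of censoring concrete, here is a minimal sketch in plain Python. The helper `to_survival_record`, the month numbers and the cutoff are all hypothetical and are not part of the Telco dataset; they only illustrate how a customer history could be turned into a (duration, event observed) pair."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def to_survival_record(signup_month, cancel_month, cutoff_month):\n",
"    # Hypothetical helper: returns (duration, event_observed) for one customer.\n",
"    # cancel_month is None when no cancellation has been recorded.\n",
"    if cancel_month is not None and cancel_month <= cutoff_month:\n",
"        return cancel_month - signup_month, True   # churn was observed\n",
"    # No cancellation seen before the cutoff: the customer is right-censored\n",
"    return cutoff_month - signup_month, False\n",
"\n",
"# Made-up customers: (signup month, cancel month or None)\n",
"customers = [(0, 5), (2, None), (10, 30), (12, None)]\n",
"records = [to_survival_record(s, c, cutoff_month=24) for s, c in customers]\n",
"print(records)"
]
},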
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### <b> 3. Data Preparation <b>\n",
"\n",
"<b> 3.1 - Import necessary libs and read data "
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>customerID</th>\n",
" <th>gender</th>\n",
" <th>SeniorCitizen</th>\n",
" <th>Partner</th>\n",
" <th>Dependents</th>\n",
" <th>tenure</th>\n",
" <th>PhoneService</th>\n",
" <th>MultipleLines</th>\n",
" <th>InternetService</th>\n",
" <th>OnlineSecurity</th>\n",
" <th>...</th>\n",
" <th>DeviceProtection</th>\n",
" <th>TechSupport</th>\n",
" <th>StreamingTV</th>\n",
" <th>StreamingMovies</th>\n",
" <th>Contract</th>\n",
" <th>PaperlessBilling</th>\n",
" <th>PaymentMethod</th>\n",
" <th>MonthlyCharges</th>\n",
" <th>TotalCharges</th>\n",
" <th>Churn</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>7590-VHVEG</td>\n",
" <td>Female</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>1</td>\n",
" <td>No</td>\n",
" <td>No phone service</td>\n",
" <td>DSL</td>\n",
" <td>No</td>\n",
" <td>...</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Month-to-month</td>\n",
" <td>Yes</td>\n",
" <td>Electronic check</td>\n",
" <td>29.85</td>\n",
" <td>29.85</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>5575-GNVDE</td>\n",
" <td>Male</td>\n",
" <td>0</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>34</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>DSL</td>\n",
" <td>Yes</td>\n",
" <td>...</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>One year</td>\n",
" <td>No</td>\n",
" <td>Mailed check</td>\n",
" <td>56.95</td>\n",
" <td>1889.5</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3668-QPYBK</td>\n",
" <td>Male</td>\n",
" <td>0</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>2</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>DSL</td>\n",
" <td>Yes</td>\n",
" <td>...</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Month-to-month</td>\n",
" <td>Yes</td>\n",
" <td>Mailed check</td>\n",
" <td>53.85</td>\n",
" <td>108.15</td>\n",
" <td>Yes</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>3 rows × 21 columns</p>\n",
"</div>"
],
"text/plain": [
" customerID gender SeniorCitizen Partner Dependents tenure PhoneService \\\n",
"0 7590-VHVEG Female 0 Yes No 1 No \n",
"1 5575-GNVDE Male 0 No No 34 Yes \n",
"2 3668-QPYBK Male 0 No No 2 Yes \n",
"\n",
" MultipleLines InternetService OnlineSecurity ... DeviceProtection \\\n",
"0 No phone service DSL No ... No \n",
"1 No DSL Yes ... Yes \n",
"2 No DSL Yes ... No \n",
"\n",
" TechSupport StreamingTV StreamingMovies Contract PaperlessBilling \\\n",
"0 No No No Month-to-month Yes \n",
"1 No No No One year No \n",
"2 No No No Month-to-month Yes \n",
"\n",
" PaymentMethod MonthlyCharges TotalCharges Churn \n",
"0 Electronic check 29.85 29.85 No \n",
"1 Mailed check 56.95 1889.5 No \n",
"2 Mailed check 53.85 108.15 Yes \n",
"\n",
"[3 rows x 21 columns]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import lifelines\n",
"\n",
"churn_data = pd.read_csv('../data/Telco-Customer-Attrition-Data.csv')\n",
"churn_data.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>3.2 - Cleaning the dataset <b>\n",
" \n",
"For each customer, we will need to calculate the following\n",
"* Tenure: How long has the customer been with the firm?\n",
"* Churn: Has the customer left the firm?"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>customerID</th>\n",
" <th>gender</th>\n",
" <th>SeniorCitizen</th>\n",
" <th>Partner</th>\n",
" <th>Dependents</th>\n",
" <th>tenure</th>\n",
" <th>PhoneService</th>\n",
" <th>MultipleLines</th>\n",
" <th>InternetService</th>\n",
" <th>OnlineSecurity</th>\n",
" <th>...</th>\n",
" <th>DeviceProtection</th>\n",
" <th>TechSupport</th>\n",
" <th>StreamingTV</th>\n",
" <th>StreamingMovies</th>\n",
" <th>Contract</th>\n",
" <th>PaperlessBilling</th>\n",
" <th>PaymentMethod</th>\n",
" <th>MonthlyCharges</th>\n",
" <th>TotalCharges</th>\n",
" <th>Churn</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>7590-VHVEG</td>\n",
" <td>Female</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>1.0</td>\n",
" <td>No</td>\n",
" <td>No phone service</td>\n",
" <td>DSL</td>\n",
" <td>No</td>\n",
" <td>...</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Month-to-month</td>\n",
" <td>Yes</td>\n",
" <td>Electronic check</td>\n",
" <td>29.85</td>\n",
" <td>29.85</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>5575-GNVDE</td>\n",
" <td>Male</td>\n",
" <td>0</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>34.0</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>DSL</td>\n",
" <td>Yes</td>\n",
" <td>...</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>One year</td>\n",
" <td>No</td>\n",
" <td>Mailed check</td>\n",
" <td>56.95</td>\n",
" <td>1889.5</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3668-QPYBK</td>\n",
" <td>Male</td>\n",
" <td>0</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>2.0</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>DSL</td>\n",
" <td>Yes</td>\n",
" <td>...</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Month-to-month</td>\n",
" <td>Yes</td>\n",
" <td>Mailed check</td>\n",
" <td>53.85</td>\n",
" <td>108.15</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>7795-CFOCW</td>\n",
" <td>Male</td>\n",
" <td>0</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>45.0</td>\n",
" <td>No</td>\n",
" <td>No phone service</td>\n",
" <td>DSL</td>\n",
" <td>Yes</td>\n",
" <td>...</td>\n",
" <td>Yes</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>One year</td>\n",
" <td>No</td>\n",
" <td>Bank transfer (automatic)</td>\n",
" <td>42.30</td>\n",
" <td>1840.75</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>9237-HQITU</td>\n",
" <td>Female</td>\n",
" <td>0</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>2.0</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>Fiber optic</td>\n",
" <td>No</td>\n",
" <td>...</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Month-to-month</td>\n",
" <td>Yes</td>\n",
" <td>Electronic check</td>\n",
" <td>70.70</td>\n",
" <td>151.65</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 21 columns</p>\n",
"</div>"
],
"text/plain": [
" customerID gender SeniorCitizen Partner Dependents tenure PhoneService \\\n",
"0 7590-VHVEG Female 0 Yes No 1.0 No \n",
"1 5575-GNVDE Male 0 No No 34.0 Yes \n",
"2 3668-QPYBK Male 0 No No 2.0 Yes \n",
"3 7795-CFOCW Male 0 No No 45.0 No \n",
"4 9237-HQITU Female 0 No No 2.0 Yes \n",
"\n",
" MultipleLines InternetService OnlineSecurity ... DeviceProtection \\\n",
"0 No phone service DSL No ... No \n",
"1 No DSL Yes ... Yes \n",
"2 No DSL Yes ... No \n",
"3 No phone service DSL Yes ... Yes \n",
"4 No Fiber optic No ... No \n",
"\n",
" TechSupport StreamingTV StreamingMovies Contract PaperlessBilling \\\n",
"0 No No No Month-to-month Yes \n",
"1 No No No One year No \n",
"2 No No No Month-to-month Yes \n",
"3 Yes No No One year No \n",
"4 No No No Month-to-month Yes \n",
"\n",
" PaymentMethod MonthlyCharges TotalCharges Churn \n",
"0 Electronic check 29.85 29.85 False \n",
"1 Mailed check 56.95 1889.5 False \n",
"2 Mailed check 53.85 108.15 True \n",
"3 Bank transfer (automatic) 42.30 1840.75 False \n",
"4 Electronic check 70.70 151.65 True \n",
"\n",
"[5 rows x 21 columns]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Transform tenure and churn features\n",
"churn_data['tenure'] = churn_data['tenure'].astype(float)\n",
"churn_data['Churn'] = churn_data['Churn'] == 'Yes'\n",
"churn_data.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: Many customers in our data have not left yet. Here we deal with censorship as discussed earlier."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---------\n",
"\n",
"#### <b> 4. Exploratory Data Analysis\n",
"\n",
"Let's look at survival rate for the average customer using a Kaplan-Meier survival curve. We will fit a KM survival curve to our dataset & plot our survival curve with confidence interval.\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# fitting kmf to churn data\n",
"t = churn_data['tenure'].values\n",
"churn = churn_data['Churn'].values\n",
"kmf = lifelines.KaplanMeierFitter()\n",
"kmf.fit(t, event_observed=churn, label='Estimate for Average Customer')\n",
"\n",
"# plotting kmf curve\n",
"fig, ax = plt.subplots(figsize=(10,7))\n",
"kmf.plot(ax=ax)\n",
"ax.set_title('Kaplan-Meier Survival Curve - All Customers')\n",
"ax.set_xlabel('Customer Tenure (Months)')\n",
"ax.set_ylabel('Customer Survival Chance (%)')\n",
"plt.show()"
]
},
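{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of what the Kaplan-Meier fit above computes, the sketch below implements the product-limit estimate in plain Python on a made-up sample. At every time where a churn is observed, the running survival probability is multiplied by (1 - events / customers still at risk); censored customers count as at risk until they drop out, but never as events. The function `kaplan_meier` and the toy data are assumptions for illustration only, and the lifelines fitter additionally provides the confidence interval shown in the plot."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def kaplan_meier(durations, observed):\n",
"    # Returns [(time, survival probability)] at each distinct observed event time\n",
"    survival = 1.0\n",
"    curve = []\n",
"    event_times = sorted({t for t, e in zip(durations, observed) if e})\n",
"    for t in event_times:\n",
"        # Everyone whose duration reaches t is still at risk at time t\n",
"        at_risk = sum(1 for d in durations if d >= t)\n",
"        # Only observed churns at exactly t count as events\n",
"        events = sum(1 for d, e in zip(durations, observed) if d == t and e)\n",
"        survival *= 1 - events / at_risk\n",
"        curve.append((t, survival))\n",
"    return curve\n",
"\n",
"# Toy sample: tenure in months, and whether churn was observed (False = censored)\n",
"toy_durations = [2, 3, 3, 5, 8, 8]\n",
"toy_observed = [True, True, False, True, False, False]\n",
"km_curve = kaplan_meier(toy_durations, toy_observed)\n",
"for month, prob in km_curve:\n",
"    print(month, round(prob, 4))"
]
},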
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: Survival curve is cumulative. After 20 months, the chance of a customer not canceling service is just above 80%\n",
"\n",
"Insights:\n",
"* We can see that even after 70 months, the company is able to retain 60% of its customers\n",
"* Churn is relatively low when it comes to Telcos\n",
"\n",
"Now we will examine the affect of different features on the survival rate. <br>\n",
"For this we will use \"Cox Proportional Hazards Model\", we can also think of this as a Survival Regression Model\n",
"\n",
"* Hazards can be something that would increase or decrease the chances of survival. In our case, a hazard can be type of contract. \n",
"* Customer with multi-year contracts would probably cancel less frequently than those with monthly contracts\n",
"* Restriction - Here the model assumes a constant ratio of hazards over time across groups.\n",
"* Lifelines offers a check_assumptions method for CoxPHFitter Object\n",
"\n",
"<b> 4.1 - Data cleaning, encoding categorical variables (k-1 dummies)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"features_to_drop = ['customerID', 'gender', 'PhoneService', 'InternetService']\n",
"\n",
"# engineering numeric columns for Cox Proportional Hazard estimation\n",
"churn_hazard = churn_data.drop(features_to_drop, axis=1).copy()\n",
"\n",
"# convert some stuff to integers\n",
"churn_hazard['TotalCharges'] = pd.to_numeric(churn_hazard['TotalCharges'], errors='coerce')\n",
"churn_hazard['TotalCharges'].fillna(0, inplace=True)\n",
"\n",
"# a lot of variables are encoded as 'Yes' or 'No', lets get these all done at once\n",
"binary_features = ['Partner', 'Dependents', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', \n",
" 'StreamingTV','StreamingMovies', 'PaperlessBilling']\n",
"for feat in binary_features:\n",
" churn_hazard[feat] = churn_hazard[feat] == 'Yes'\n",
" \n",
"# let's one hot encode the remaining categorical features\n",
"ohe_features = ['MultipleLines', 'Contract', 'PaymentMethod']\n",
"churn_hazard = pd.get_dummies(churn_hazard, \n",
" drop_first=True,\n",
" columns=ohe_features)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SeniorCitizen</th>\n",
" <th>Partner</th>\n",
" <th>Dependents</th>\n",
" <th>tenure</th>\n",
" <th>OnlineSecurity</th>\n",
" <th>OnlineBackup</th>\n",
" <th>DeviceProtection</th>\n",
" <th>TechSupport</th>\n",
" <th>StreamingTV</th>\n",
" <th>StreamingMovies</th>\n",
" <th>...</th>\n",
" <th>MonthlyCharges</th>\n",
" <th>TotalCharges</th>\n",
" <th>Churn</th>\n",
" <th>MultipleLines_No phone service</th>\n",
" <th>MultipleLines_Yes</th>\n",
" <th>Contract_One year</th>\n",
" <th>Contract_Two year</th>\n",
" <th>PaymentMethod_Credit card (automatic)</th>\n",
" <th>PaymentMethod_Electronic check</th>\n",
" <th>PaymentMethod_Mailed check</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>1.0</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>29.85</td>\n",
" <td>29.85</td>\n",
" <td>False</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>34.0</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>56.95</td>\n",
" <td>1889.50</td>\n",
" <td>False</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2.0</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>53.85</td>\n",
" <td>108.15</td>\n",
" <td>True</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>45.0</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>42.30</td>\n",
" <td>1840.75</td>\n",
" <td>False</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2.0</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>70.70</td>\n",
" <td>151.65</td>\n",
" <td>True</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 21 columns</p>\n",
"</div>"
],
"text/plain": [
" SeniorCitizen Partner Dependents tenure OnlineSecurity OnlineBackup \\\n",
"0 0 True False 1.0 False True \n",
"1 0 False False 34.0 True False \n",
"2 0 False False 2.0 True True \n",
"3 0 False False 45.0 True False \n",
"4 0 False False 2.0 False False \n",
"\n",
" DeviceProtection TechSupport StreamingTV StreamingMovies \\\n",
"0 False False False False \n",
"1 True False False False \n",
"2 False False False False \n",
"3 True True False False \n",
"4 False False False False \n",
"\n",
" ... MonthlyCharges TotalCharges Churn \\\n",
"0 ... 29.85 29.85 False \n",
"1 ... 56.95 1889.50 False \n",
"2 ... 53.85 108.15 True \n",
"3 ... 42.30 1840.75 False \n",
"4 ... 70.70 151.65 True \n",
"\n",
" MultipleLines_No phone service MultipleLines_Yes Contract_One year \\\n",
"0 1 0 0 \n",
"1 0 0 1 \n",
"2 0 0 0 \n",
"3 1 0 1 \n",
"4 0 0 0 \n",
"\n",
" Contract_Two year PaymentMethod_Credit card (automatic) \\\n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" PaymentMethod_Electronic check PaymentMethod_Mailed check \n",
"0 1 0 \n",
"1 0 1 \n",
"2 0 1 \n",
"3 0 0 \n",
"4 1 0 \n",
"\n",
"[5 rows x 21 columns]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"churn_hazard.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---------\n",
"\n",
"#### <b> 5. Modeling"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <tbody>\n",
" <tr>\n",
" <th>model</th>\n",
" <td>lifelines.CoxPHFitter</td>\n",
" </tr>\n",
" <tr>\n",
" <th>duration col</th>\n",
" <td>'tenure'</td>\n",
" </tr>\n",
" <tr>\n",
" <th>event col</th>\n",
" <td>'Churn'</td>\n",
" </tr>\n",
" <tr>\n",
" <th>baseline estimation</th>\n",
" <td>breslow</td>\n",
" </tr>\n",
" <tr>\n",
" <th>number of observations</th>\n",
" <td>7043</td>\n",
" </tr>\n",
" <tr>\n",
" <th>number of events observed</th>\n",
" <td>1869</td>\n",
" </tr>\n",
" <tr>\n",
" <th>partial log-likelihood</th>\n",
" <td>-12688.70</td>\n",
" </tr>\n",
" <tr>\n",
" <th>time fit was run</th>\n",
" <td>2021-06-15 09:14:33 UTC</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div><table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th style=\"min-width: 12;\"></th>\n",
" <th style=\"min-width: 12;\">coef</th>\n",
" <th style=\"min-width: 12;\">exp(coef)</th>\n",
" <th style=\"min-width: 12;\">se(coef)</th>\n",
" <th style=\"min-width: 12;\">coef lower 95%</th>\n",
" <th style=\"min-width: 12;\">coef upper 95%</th>\n",
" <th style=\"min-width: 12;\">exp(coef) lower 95%</th>\n",
" <th style=\"min-width: 12;\">exp(coef) upper 95%</th>\n",
" <th style=\"min-width: 12;\">z</th>\n",
" <th style=\"min-width: 12;\">p</th>\n",
" <th style=\"min-width: 12;\">-log2(p)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">SeniorCitizen</th>\n",
" <td>0.03</td>\n",
" <td>1.03</td>\n",
" <td>0.06</td>\n",
" <td>-0.08</td>\n",
" <td>0.14</td>\n",
" <td>0.93</td>\n",
" <td>1.16</td>\n",
" <td>0.60</td>\n",
" <td>0.55</td>\n",
" <td>0.87</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">Partner</th>\n",
" <td>-0.19</td>\n",
" <td>0.82</td>\n",
" <td>0.06</td>\n",
" <td>-0.30</td>\n",
" <td>-0.09</td>\n",
" <td>0.74</td>\n",
" <td>0.92</td>\n",
" <td>-3.52</td>\n",
" <td>&lt;0.005</td>\n",
" <td>11.20</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">Dependents</th>\n",
" <td>-0.10</td>\n",
" <td>0.91</td>\n",
" <td>0.07</td>\n",
" <td>-0.23</td>\n",
" <td>0.04</td>\n",
" <td>0.79</td>\n",
" <td>1.04</td>\n",
" <td>-1.39</td>\n",
" <td>0.17</td>\n",
" <td>2.60</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">OnlineSecurity</th>\n",
" <td>-0.38</td>\n",
" <td>0.68</td>\n",
" <td>0.07</td>\n",
" <td>-0.51</td>\n",
" <td>-0.25</td>\n",
" <td>0.60</td>\n",
" <td>0.78</td>\n",
" <td>-5.65</td>\n",
" <td>&lt;0.005</td>\n",
" <td>25.89</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">OnlineBackup</th>\n",
" <td>-0.29</td>\n",
" <td>0.75</td>\n",
" <td>0.06</td>\n",
" <td>-0.40</td>\n",
" <td>-0.18</td>\n",
" <td>0.67</td>\n",
" <td>0.83</td>\n",
" <td>-5.22</td>\n",
" <td>&lt;0.005</td>\n",
" <td>22.41</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">DeviceProtection</th>\n",
" <td>-0.16</td>\n",
" <td>0.85</td>\n",
" <td>0.06</td>\n",
" <td>-0.27</td>\n",
" <td>-0.05</td>\n",
" <td>0.76</td>\n",
" <td>0.95</td>\n",
" <td>-2.85</td>\n",
" <td>&lt;0.005</td>\n",
" <td>7.85</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">TechSupport</th>\n",
" <td>-0.28</td>\n",
" <td>0.76</td>\n",
" <td>0.07</td>\n",
" <td>-0.41</td>\n",
" <td>-0.15</td>\n",
" <td>0.67</td>\n",
" <td>0.86</td>\n",
" <td>-4.19</td>\n",
" <td>&lt;0.005</td>\n",
" <td>15.15</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">StreamingTV</th>\n",
" <td>-0.27</td>\n",
" <td>0.77</td>\n",
" <td>0.06</td>\n",
" <td>-0.38</td>\n",
" <td>-0.15</td>\n",
" <td>0.68</td>\n",
" <td>0.86</td>\n",
" <td>-4.46</td>\n",
" <td>&lt;0.005</td>\n",
" <td>16.86</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">StreamingMovies</th>\n",
" <td>-0.26</td>\n",
" <td>0.77</td>\n",
" <td>0.06</td>\n",
" <td>-0.38</td>\n",
" <td>-0.14</td>\n",
" <td>0.69</td>\n",
" <td>0.87</td>\n",
" <td>-4.36</td>\n",
" <td>&lt;0.005</td>\n",
" <td>16.26</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">PaperlessBilling</th>\n",
" <td>0.16</td>\n",
" <td>1.17</td>\n",
" <td>0.06</td>\n",
" <td>0.05</td>\n",
" <td>0.27</td>\n",
" <td>1.05</td>\n",
" <td>1.31</td>\n",
" <td>2.79</td>\n",
" <td>0.01</td>\n",
" <td>7.57</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">MonthlyCharges</th>\n",
" <td>0.07</td>\n",
" <td>1.07</td>\n",
" <td>0.00</td>\n",
" <td>0.06</td>\n",
" <td>0.07</td>\n",
" <td>1.06</td>\n",
" <td>1.07</td>\n",
" <td>26.59</td>\n",
" <td>&lt;0.005</td>\n",
" <td>515.01</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">TotalCharges</th>\n",
" <td>-0.00</td>\n",
" <td>1.00</td>\n",
" <td>0.00</td>\n",
" <td>-0.00</td>\n",
" <td>-0.00</td>\n",
" <td>1.00</td>\n",
" <td>1.00</td>\n",
" <td>-40.10</td>\n",
" <td>&lt;0.005</td>\n",
" <td>inf</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">MultipleLines_No phone service</th>\n",
" <td>0.64</td>\n",
" <td>1.89</td>\n",
" <td>0.12</td>\n",
" <td>0.41</td>\n",
" <td>0.87</td>\n",
" <td>1.51</td>\n",
" <td>2.38</td>\n",
" <td>5.50</td>\n",
" <td>&lt;0.005</td>\n",
" <td>24.62</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">MultipleLines_Yes</th>\n",
" <td>-0.20</td>\n",
" <td>0.82</td>\n",
" <td>0.05</td>\n",
" <td>-0.30</td>\n",
" <td>-0.09</td>\n",
" <td>0.74</td>\n",
" <td>0.91</td>\n",
" <td>-3.68</td>\n",
" <td>&lt;0.005</td>\n",
" <td>12.09</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">Contract_One year</th>\n",
" <td>-1.40</td>\n",
" <td>0.25</td>\n",
" <td>0.10</td>\n",
" <td>-1.60</td>\n",
" <td>-1.20</td>\n",
" <td>0.20</td>\n",
" <td>0.30</td>\n",
" <td>-13.78</td>\n",
" <td>&lt;0.005</td>\n",
" <td>141.13</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">Contract_Two year</th>\n",
" <td>-4.05</td>\n",
" <td>0.02</td>\n",
" <td>0.20</td>\n",
" <td>-4.43</td>\n",
" <td>-3.66</td>\n",
" <td>0.01</td>\n",
" <td>0.03</td>\n",
" <td>-20.74</td>\n",
" <td>&lt;0.005</td>\n",
" <td>314.88</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">PaymentMethod_Credit card (automatic)</th>\n",
" <td>-0.01</td>\n",
" <td>0.99</td>\n",
" <td>0.09</td>\n",
" <td>-0.18</td>\n",
" <td>0.17</td>\n",
" <td>0.83</td>\n",
" <td>1.19</td>\n",
" <td>-0.06</td>\n",
" <td>0.95</td>\n",
" <td>0.07</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">PaymentMethod_Electronic check</th>\n",
" <td>0.38</td>\n",
" <td>1.46</td>\n",
" <td>0.07</td>\n",
" <td>0.24</td>\n",
" <td>0.52</td>\n",
" <td>1.27</td>\n",
" <td>1.69</td>\n",
" <td>5.20</td>\n",
" <td>&lt;0.005</td>\n",
" <td>22.29</td>\n",
" </tr>\n",
" <tr>\n",
" <th style=\"min-width: 12;\">PaymentMethod_Mailed check</th>\n",
" <td>0.52</td>\n",
" <td>1.68</td>\n",
" <td>0.09</td>\n",
" <td>0.35</td>\n",
" <td>0.69</td>\n",
" <td>1.42</td>\n",
" <td>1.99</td>\n",
" <td>5.96</td>\n",
" <td>&lt;0.005</td>\n",
" <td>28.57</td>\n",
" </tr>\n",
" </tbody>\n",
"</table><br><div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <tbody>\n",
" <tr>\n",
" <th>Concordance</th>\n",
" <td>0.93</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Partial AIC</th>\n",
" <td>25415.41</td>\n",
" </tr>\n",
" <tr>\n",
" <th>log-likelihood ratio test</th>\n",
" <td>5928.67 on 19 df</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-log2(p) of ll-ratio test</th>\n",
" <td>inf</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/latex": [
"\\begin{tabular}{lrrrrrrrrrr}\n",
"\\toprule\n",
"{} & coef & exp(coef) & se(coef) & coef lower 95\\% & coef upper 95\\% & exp(coef) lower 95\\% & exp(coef) upper 95\\% & z & p & -log2(p) \\\\\n",
"covariate & & & & & & & & & & \\\\\n",
"\\midrule\n",
"SeniorCitizen & 0.03 & 1.03 & 0.06 & -0.08 & 0.14 & 0.93 & 1.16 & 0.60 & 5.487665e-01 & 0.87 \\\\\n",
"Partner & -0.19 & 0.82 & 0.06 & -0.30 & -0.09 & 0.74 & 0.92 & -3.52 & 4.241057e-04 & 11.20 \\\\\n",
"Dependents & -0.10 & 0.91 & 0.07 & -0.23 & 0.04 & 0.79 & 1.04 & -1.39 & 1.652144e-01 & 2.60 \\\\\n",
"OnlineSecurity & -0.38 & 0.68 & 0.07 & -0.51 & -0.25 & 0.60 & 0.78 & -5.65 & 1.609159e-08 & 25.89 \\\\\n",
"OnlineBackup & -0.29 & 0.75 & 0.06 & -0.40 & -0.18 & 0.67 & 0.83 & -5.22 & 1.789263e-07 & 22.41 \\\\\n",
"DeviceProtection & -0.16 & 0.85 & 0.06 & -0.27 & -0.05 & 0.76 & 0.95 & -2.85 & 4.346366e-03 & 7.85 \\\\\n",
"TechSupport & -0.28 & 0.76 & 0.07 & -0.41 & -0.15 & 0.67 & 0.86 & -4.19 & 2.758595e-05 & 15.15 \\\\\n",
"StreamingTV & -0.27 & 0.77 & 0.06 & -0.38 & -0.15 & 0.68 & 0.86 & -4.46 & 8.380722e-06 & 16.86 \\\\\n",
"StreamingMovies & -0.26 & 0.77 & 0.06 & -0.38 & -0.14 & 0.69 & 0.87 & -4.36 & 1.272423e-05 & 16.26 \\\\\n",
"PaperlessBilling & 0.16 & 1.17 & 0.06 & 0.05 & 0.27 & 1.05 & 1.31 & 2.79 & 5.245333e-03 & 7.57 \\\\\n",
"MonthlyCharges & 0.07 & 1.07 & 0.00 & 0.06 & 0.07 & 1.06 & 1.07 & 26.59 & 9.259870e-156 & 515.01 \\\\\n",
"TotalCharges & -0.00 & 1.00 & 0.00 & -0.00 & -0.00 & 1.00 & 1.00 & -40.10 & 0.000000e+00 & inf \\\\\n",
"MultipleLines\\_No phone service & 0.64 & 1.89 & 0.12 & 0.41 & 0.87 & 1.51 & 2.38 & 5.50 & 3.877513e-08 & 24.62 \\\\\n",
"MultipleLines\\_Yes & -0.20 & 0.82 & 0.05 & -0.30 & -0.09 & 0.74 & 0.91 & -3.68 & 2.293482e-04 & 12.09 \\\\\n",
"Contract\\_One year & -1.40 & 0.25 & 0.10 & -1.60 & -1.20 & 0.20 & 0.30 & -13.78 & 3.267030e-43 & 141.13 \\\\\n",
"Contract\\_Two year & -4.05 & 0.02 & 0.20 & -4.43 & -3.66 & 0.01 & 0.03 & -20.74 & 1.630391e-95 & 314.88 \\\\\n",
"PaymentMethod\\_Credit card (automatic) & -0.01 & 0.99 & 0.09 & -0.18 & 0.17 & 0.83 & 1.19 & -0.06 & 9.541490e-01 & 0.07 \\\\\n",
"PaymentMethod\\_Electronic check & 0.38 & 1.46 & 0.07 & 0.24 & 0.52 & 1.27 & 1.69 & 5.20 & 1.951765e-07 & 22.29 \\\\\n",
"PaymentMethod\\_Mailed check & 0.52 & 1.68 & 0.09 & 0.35 & 0.69 & 1.42 & 1.99 & 5.96 & 2.502489e-09 & 28.57 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n"
],
"text/plain": [
"<lifelines.CoxPHFitter: fitted with 7043 total observations, 5174 right-censored observations>\n",
" duration col = 'tenure'\n",
" event col = 'Churn'\n",
" baseline estimation = breslow\n",
" number of observations = 7043\n",
"number of events observed = 1869\n",
" partial log-likelihood = -12688.70\n",
" time fit was run = 2021-06-15 09:14:33 UTC\n",
"\n",
"---\n",
" coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%\n",
"covariate \n",
"SeniorCitizen 0.03 1.03 0.06 -0.08 0.14 0.93 1.16\n",
"Partner -0.19 0.82 0.06 -0.30 -0.09 0.74 0.92\n",
"Dependents -0.10 0.91 0.07 -0.23 0.04 0.79 1.04\n",
"OnlineSecurity -0.38 0.68 0.07 -0.51 -0.25 0.60 0.78\n",
"OnlineBackup -0.29 0.75 0.06 -0.40 -0.18 0.67 0.83\n",
"DeviceProtection -0.16 0.85 0.06 -0.27 -0.05 0.76 0.95\n",
"TechSupport -0.28 0.76 0.07 -0.41 -0.15 0.67 0.86\n",
"StreamingTV -0.27 0.77 0.06 -0.38 -0.15 0.68 0.86\n",
"StreamingMovies -0.26 0.77 0.06 -0.38 -0.14 0.69 0.87\n",
"PaperlessBilling 0.16 1.17 0.06 0.05 0.27 1.05 1.31\n",
"MonthlyCharges 0.07 1.07 0.00 0.06 0.07 1.06 1.07\n",
"TotalCharges -0.00 1.00 0.00 -0.00 -0.00 1.00 1.00\n",
"MultipleLines_No phone service 0.64 1.89 0.12 0.41 0.87 1.51 2.38\n",
"MultipleLines_Yes -0.20 0.82 0.05 -0.30 -0.09 0.74 0.91\n",
"Contract_One year -1.40 0.25 0.10 -1.60 -1.20 0.20 0.30\n",
"Contract_Two year -4.05 0.02 0.20 -4.43 -3.66 0.01 0.03\n",
"PaymentMethod_Credit card (automatic) -0.01 0.99 0.09 -0.18 0.17 0.83 1.19\n",
"PaymentMethod_Electronic check 0.38 1.46 0.07 0.24 0.52 1.27 1.69\n",
"PaymentMethod_Mailed check 0.52 1.68 0.09 0.35 0.69 1.42 1.99\n",
"\n",
" z p -log2(p)\n",
"covariate \n",
"SeniorCitizen 0.60 0.55 0.87\n",
"Partner -3.52 <0.005 11.20\n",
"Dependents -1.39 0.17 2.60\n",
"OnlineSecurity -5.65 <0.005 25.89\n",
"OnlineBackup -5.22 <0.005 22.41\n",
"DeviceProtection -2.85 <0.005 7.85\n",
"TechSupport -4.19 <0.005 15.15\n",
"StreamingTV -4.46 <0.005 16.86\n",
"StreamingMovies -4.36 <0.005 16.26\n",
"PaperlessBilling 2.79 0.01 7.57\n",
"MonthlyCharges 26.59 <0.005 515.01\n",
"TotalCharges -40.10 <0.005 inf\n",
"MultipleLines_No phone service 5.50 <0.005 24.62\n",
"MultipleLines_Yes -3.68 <0.005 12.09\n",
"Contract_One year -13.78 <0.005 141.13\n",
"Contract_Two year -20.74 <0.005 314.88\n",
"PaymentMethod_Credit card (automatic) -0.06 0.95 0.07\n",
"PaymentMethod_Electronic check 5.20 <0.005 22.29\n",
"PaymentMethod_Mailed check 5.96 <0.005 28.57\n",
"---\n",
"Concordance = 0.93\n",
"Partial AIC = 25415.41\n",
"log-likelihood ratio test = 5928.67 on 19 df\n",
"-log2(p) of ll-ratio test = inf"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Fitting survival regression model \n",
"cph = lifelines.CoxPHFitter()\n",
"cph.fit(churn_hazard, duration_col='tenure', event_col='Churn', show_progress=False)\n",
"cph.print_summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* In the above regression, the key output is exp(coef). This is interpreted as the scaling of hazard risk for each additional unit of the variable, 1.00 being neutral.\n",
"* For example, the last exp(coefficient), corresponding to PaymentMethod_Mailed check, means a customer that pays by mailing a check is 1.68 times as likely to cancel their service.\n",
"* For the company, exp(coef) below 1.0 is good, meaning a customer less likely to cancel.\n",
"\n",
"To better visualize the above, we can plot the coefficient outputs and their confidence intervals."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# plotting coefficients\n",
"fig_coef, ax_coef = plt.subplots(figsize=(12,7))\n",
"ax_coef.set_title('Survival Regression: Coefficients and Confident Intervals')\n",
"cph.plot(ax=ax_coef);"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"# function for creating Kaplan-Meier curve plots segmented\n",
"# by categorical variables\n",
"def plot_categorical_survival(feature, t='tenure', event='Churn', df=churn_data, ax=None):\n",
" for cat in df[feature].unique():\n",
" idx = df[feature] == cat\n",
" kmf = lifelines.KaplanMeierFitter()\n",
" kmf.fit(df[idx][t], event_observed=df[idx][event], label=cat)\n",
" kmf.plot(ax=ax, label=cat)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig_pmt, ax_pmt = plt.subplots(figsize=(12,7))\n",
"plot_categorical_survival(feature='PaymentMethod', ax=ax_pmt)\n",
"ax_pmt.set_title('Customer Churn by Payment Method')\n",
"ax_pmt.set_xlabel('Customer Tenure (Months)')\n",
"ax_pmt.set_ylabel('Customer Survival Chance (%)')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig_contract, ax_contract = plt.subplots(figsize=(12,7))\n",
"plot_categorical_survival(feature='Contract', ax=ax_contract)\n",
"ax_contract.set_title('Customer Churn by Contract Type')\n",
"ax_contract.set_xlabel('Customer Tenure (Months)')\n",
"ax_contract.set_ylabel('Customer Survival Chance (%)')\n",
"plt.show()\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"\n",
"fig_dep, ax_dep = plt.subplots(figsize=(12,7))\n",
"plot_categorical_survival(feature='Dependents', ax=ax_dep)\n",
"ax_dep.set_title('Customer Churn Dependents vs. No Dependents')\n",
"ax_dep.set_xlabel('Customer Tenure (Months)')\n",
"ax_dep.set_ylabel('Customer Survival Chance (%)')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---------\n",
"\n",
"#### <b> 6. Recommendations\n",
" \n",
"So coming back to our objective, how can we reduce attrition?\n",
"\n",
"Customer Specification: \n",
"* The most important feature, by far, is the presence of a 1 or 2 year contract. Customers are .25 and .02, respectively, times as likely to cancel their service if they are under contract. Cancellation fees are a possible underlying cause. As long as these fees do not prohibit new sales, we would recommend continuing to put them into as many contracts as possible.\n",
"\n",
"Customer Selection:\n",
"* Customers with a partner or dependents are .82 and .91 times as likely to cancel as normal customers. Families and other large households seem to be less likely to change providers. This could be due to higher incomes, less time to consider options, or another combination of factors.\n",
" \n",
"Payment Systems:\n",
"* There is a reason companies now default to opting employees into 401k plans. It takes effort for people to make a change, even if it is beneficial.\n",
"* Make sure your customer's default is an automatic payment made monthly. This requires little effort from the customer to remain subscribed.\n",
"* Conversely, sending a check, in the mail or electronically, is a pain. It requires effort to remain subscribed.\n",
" \n",
"--------------------\n",
"\n",
"<b> Links: <b>\n",
"\n",
"* Portfolio : https://gofornaman.github.io\n",
"* LinkedIn : https://www.linkedin.com/in/naman-doshi/\n",
"\n",
"------------"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}