@imjakedaniels
Last active April 15, 2019 16:22
Churn Analysis Project with Clustering & Decision Trees
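```{r}
# install.packages("tidyverse")  # run once in the console, not on every knit
library(tidyverse)  # also attaches readr (read_csv) and stringr
# file.choose() opens an interactive file picker; a fixed file path makes knitting reproducible
churn <- read_csv(file.choose())
```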
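```{r}
# Keep column 2 and columns 5-21 (Account Length through Churn? in the standard churn.csv layout),
# dropping the identifier columns 1, 3 and 4 (State, Area Code, Phone).
# data.frame() also makes the names syntactic, e.g. `Int'l Plan` becomes Int.l.Plan
ch <- data.frame(churn[, c(2, 5:21)])
```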
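The odd column names used from here on (`Int.l.Plan`, `Churn.`, `CustServ.Calls`) come from `data.frame()`, whose default `check.names = TRUE` passes every name through `make.names()`. That function replaces each character that is not valid in an R name with a dot. The raw headers below are assumed for illustration, since they are inferred from the names the analysis uses and not read from the file:

```{r}
# Raw headers assumed for illustration (inferred from the names used in this analysis)
raw_names <- c("Int'l Plan", "VMail Plan", "CustServ Calls", "Churn?")
# make.names() swaps every character that is not valid in an R name for "."
clean_names <- make.names(raw_names)
clean_names  # "Int.l.Plan" "VMail.Plan" "CustServ.Calls" "Churn."
```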
```{r}
### categorical yes/no columns to logical (str_trim guards against stray spaces)
ch$Intl.Plan  <- str_trim(ch$Int.l.Plan) == "yes"
ch$VMail.Plan <- str_trim(ch$VMail.Plan) == "yes"
ch$Churn      <- str_trim(ch$Churn.) == "True."
# drop the original character columns so the export holds one copy of each attribute
ch$Int.l.Plan <- NULL
ch$Churn.     <- NULL
# combine day, evening and night usage into single local totals
ch$Local.Mins   <- ch$Day.Mins + ch$Eve.Mins + ch$Night.Mins
ch$Local.Charge <- ch$Day.Charge + ch$Eve.Charge + ch$Night.Charge
# remove the per-period minutes now folded into Local.Mins
ch$Day.Mins   <- NULL
ch$Eve.Mins   <- NULL
ch$Night.Mins <- NULL
# export for Weka
# install.packages("RWeka")  # run once; RWeka needs a Java installation
library(RWeka)
write.arff(ch, file = "clusteringresults.arff")
```
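The "remove high-correlation" step addresses redundant call columns. If, as is typical of billing data, each charge is a per-minute rate times the minutes, the paired columns carry the same information. Below is a base-R sketch that lists highly correlated pairs before export. The data frame is a small hypothetical stand-in for `ch`, the per-minute rates are assumed, and the 0.9 cut-off is an arbitrary choice:

```{r}
# Hypothetical stand-in for `ch`: charges are built as rate * minutes
set.seed(42)
toy <- data.frame(
  Day.Mins       = runif(100, 0, 350),
  Eve.Mins       = runif(100, 0, 350),
  CustServ.Calls = rpois(100, 1.5)
)
toy$Day.Charge <- toy$Day.Mins * 0.17   # assumed per-minute rate
toy$Eve.Charge <- toy$Eve.Mins * 0.085  # assumed per-minute rate

# Absolute correlations; blank out the lower triangle so each pair appears once
cm <- abs(cor(toy))
cm[lower.tri(cm, diag = TRUE)] <- NA
idx <- which(cm > 0.9, arr.ind = TRUE)  # NA cells are ignored by which()

high_cor <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
high_cor
```

Keeping one column from each flagged pair (the chunk above keeps the combined totals and drops the per-period minutes) avoids handing Weka two copies of the same signal.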
After identifying the three archetypes with the highest risk of churn, I generated a list of their phone numbers to email to the call centers so they could offer incentives.
```{r}
# Area Code and Phone (columns 3 and 4 of the raw file) were not copied into ch,
# so build the phone numbers from churn; ch keeps the same row order
ch$PhoneNumbers <- paste(churn[[3]], churn[[4]])
# Customer1 - Heavy Mins: high local charge, no voicemail plan
heavy_users <- which(ch$Local.Charge > 71.54 & !ch$VMail.Plan)
Customer1 <- ch$PhoneNumbers[heavy_users]
# Customer2 - Moderate, Low-Contact with Intl Plans (uses the logical Intl.Plan column)
moderate_international_users <- which(ch$Local.Charge <= 71.54 & ch$CustServ.Calls <= 3 &
  ch$Intl.Plan & (ch$Intl.Calls <= 2 | ch$Intl.Mins > 13.1))
Customer2 <- ch$PhoneNumbers[moderate_international_users]
# Customer3 - Light, Frequent-Contact
light_recurring <- which(ch$Local.Charge <= 54.12 & ch$CustServ.Calls > 3)
Customer3 <- ch$PhoneNumbers[light_recurring]
```
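For the hand-off to the call centers, the three vectors can be stacked into one table with an archetype label and written to a single file. This is a minimal sketch. The 555 phone numbers are made-up placeholders standing in for `Customer1` to `Customer3`, and the file name `call_list.csv` is hypothetical:

```{r}
# Hypothetical placeholder numbers standing in for Customer1-Customer3 above
Customer1 <- c("415 555-0101", "408 555-0102")
Customer2 <- c("510 555-0103")
Customer3 <- c("415 555-0104", "415 555-0105")

# One row per phone number, labelled with its archetype
call_list <- rbind(
  data.frame(archetype = "Customer1: heavy, no voicemail",     phone = Customer1),
  data.frame(archetype = "Customer2: moderate, international", phone = Customer2),
  data.frame(archetype = "Customer3: light, frequent contact", phone = Customer3)
)

# Hypothetical file name for the list sent to the call centers
write.csv(call_list, "call_list.csv", row.names = FALSE)
table(call_list$archetype)
```

The three rules above are mutually exclusive. Customer1 requires Local.Charge > 71.54, while the other two require a lower charge and split on CustServ.Calls <= 3 versus > 3, so no number should appear on two lists.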

imjakedaniels commented Jan 27, 2018

Clustering Details
[Figure: clustering details]

Pruned decision tree showing the strongest predictors of churn, with the customer clusters described below:

[Figure: pruned decision tree]

Customer 1 (Red): Heavy users are likely to churn when they don’t have a voicemail plan. Is this a service our customers want but cannot afford to add to their plan? Is there a lack of cross-selling when new customers register? Should we build voicemail into our plans as an included feature to discourage this churn?

Customer 2 & 2A (Orange): Regular users with recurring customer service contact who lean towards the lighter side of overall usage are likely to churn. We can infer that their limited use of the phone has been unsatisfactory, and that this frustration could lead them to cancel in favor of a competitor.

Customer 3 (Green): Regular users with low contact have only a 3% chance of churning. The minor exception among these users, classified as 2B, are the customers with the International Plan. Those who have the plan but make no more than 2 international calls a month, as well as those who have the plan and use it heavily, are the leading drivers of churn among our most secure clientele. Are the heavy international users dropping calls? Do users with 0 international calls have the plan only because it was bundled into a service, leaving them feeling they are paying for something they do not use?
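The project exported its prepared data for Weka. As a hedged R analogue of a pruned classification tree, the sketch below uses `rpart` with cost-complexity pruning. This is a swapped-in technique and not necessarily the pruning Weka applied. The data are hypothetical toy values. Only the column names and the 71.54 threshold are borrowed from the code above, and the churn probabilities are invented to loosely mimic the archetypes:

```{r}
library(rpart)  # recommended package, shipped with standard R installs

# Hypothetical toy data loosely following the archetypes above
set.seed(1)
n <- 500
toy <- data.frame(
  Local.Charge   = runif(n, 20, 100),
  CustServ.Calls = rpois(n, 1.5),
  VMail.Plan     = factor(sample(c("no", "yes"), n, replace = TRUE))
)
p <- ifelse(toy$CustServ.Calls > 3, 0.6,
            ifelse(toy$Local.Charge > 71.54 & toy$VMail.Plan == "no", 0.5, 0.05))
toy$Churn <- factor(ifelse(runif(n) < p, "yes", "no"))

# Grow a deliberately large tree, then prune back to the complexity
# parameter with the lowest cross-validated error (xerror)
fit <- rpart(Churn ~ ., data = toy, method = "class",
             control = rpart.control(cp = 0.001, minsplit = 10))
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

print(pruned)
```

Pruning to the cross-validated optimum removes splits that only fit noise. That is the same purpose the pruned tree serves here: keeping the splits that separate the high-churn groups.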

More examples of the archetypes from the cluster analysis, which was my responsibility in the project.
[Figure: clustering archetypes]

We performed k-means clustering with Euclidean distance. From the clusters, we discovered which attributes to investigate and created customer archetypes. The figure shows two examples of the customer types our decision tree revealed.

These are the Customer 1 archetype (heavy users with no voicemail plan) and the Customer 3 archetype (light users with frequent customer service calls).
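The first code chunk exports the prepared data for Weka. For readers who want to reproduce the clustering idea in R, here is a minimal sketch using `stats::kmeans()`, which partitions rows by minimizing the within-cluster squared Euclidean distance. The toy data, the choice of 3 clusters and the z-score standardization are assumptions for illustration, not the project's settings:

```{r}
# Hypothetical toy data standing in for the exported customer attributes
set.seed(7)
toy <- data.frame(
  Local.Charge   = c(rnorm(50, 80, 5), rnorm(50, 60, 5), rnorm(50, 45, 5)),
  CustServ.Calls = c(rpois(50, 1),     rpois(50, 1),     rpois(50, 5))
)

# Standardize so that no attribute dominates the Euclidean distance
x <- scale(toy)

# nstart = 25 restarts from random centers and keeps the best solution
km <- kmeans(x, centers = 3, nstart = 25)

# Cluster profiles on the original scale, used to describe each archetype
profiles <- aggregate(toy, by = list(cluster = km$cluster), FUN = mean)
profiles
```

Reading the per-cluster means is what turns clusters into archetypes, for example "high Local.Charge" or "many service calls".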

Once these high-churn clusters are identified, we can improve data collection around them to capture more attributes that explain why they churn. We can also adapt our current strategies to better handle sensitive customers, such as those with more than 3 customer service calls.
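One way to quantify how sensitive customers with more than 3 customer service calls are is to compare churn rates across that split. Below is a base-R sketch on a tiny hypothetical data frame. In the analysis this would be `ch`, with the logical `Churn` column created during data preparation:

```{r}
# Hypothetical toy data; in the analysis this would be `ch`
toy <- data.frame(
  CustServ.Calls = c(0, 1, 2, 4, 5, 1, 6, 3),
  Churn          = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
)
# Flag the "sensitive" customers: more than 3 customer service calls
toy$high_contact <- toy$CustServ.Calls > 3
# The mean of a logical column is the share of TRUE values, i.e. the churn rate per group
churn_by_contact <- aggregate(Churn ~ high_contact, data = toy, FUN = mean)
names(churn_by_contact)[2] <- "churn_rate"
churn_by_contact
```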
