@psamim
Last active May 28, 2016 19:27
library("openxlsx")
library("rpart")
library("rpart.plot")
## setwd("/home/samim/workspace/danial")
# Load the data
wb <- read.xlsx("data.xlsx")
# Replace each Monthly.Income range ("low-high") with its midpoint (currently disabled)
## temp <- wb$Monthly.Income
## temp[temp == "1200-180000-800"] <- NA
## temp[temp == "9"] <- 15000
## wb$Monthly.Income <- sapply(strsplit(temp , "-") , function(i) mean(as.numeric(i)))
# Replace each Age range ("low to high", or "over N") with its midpoint (currently disabled)
## temp <- wb$Age
## temp <- gsub( "over" , "" , temp)
## wb$Age <- sapply(strsplit(temp , "to") , function(i) mean(as.numeric(i)))
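# Illustrative sketch, not part of the original analysis: how the disabled
# range-to-midpoint conversions above behave on toy input. The helper
# `range_midpoint` and the `example_*` vectors are hypothetical names and
# made-up values; the real column contents in data.xlsx may differ.
range_midpoint <- function(x, sep = "-") {
  # Split each entry on the separator, then average the numeric parts.
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
}
example_income <- c("1000-2000", "15000", NA)
example_income_mid <- range_midpoint(example_income)  # 1500 15000 NA
example_age <- c("20to30", "over60")
example_age_mid <- range_midpoint(gsub("over", "", example_age), sep = "to")  # 25 60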
# Fit a regression tree (method = "anova"); the tiny cp grows a large tree to prune afterwards
formula <- Influence.in.Buying.Behaviour.Newsletter ~ Age + Monthly.Income
model <- rpart(formula, method = "anova", data = wb, cp = 10^(-6))
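# Prune the tree back to the subtree with exactly 9 splits
cp9 <- which(model$cptable[, "nsplit"] == 9)
stopifnot(length(cp9) == 1)  # the cp table may not contain a row with 9 splits
tree9 <- prune(model, cp = model$cptable[cp9, "CP"])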
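# Illustrative sketch, not the author's method: the cp table does not always
# contain a row with exactly 9 splits, so one option is to fall back to the
# largest subtree with at most n splits. `cp_for_nsplit`, `demo_fit` and
# `demo_tree` are hypothetical names, and the built-in `mtcars` data set is
# used here only so the example runs without data.xlsx.
library("rpart")
cp_for_nsplit <- function(fit, n) {
  tab <- fit$cptable
  # Rows are ordered by increasing nsplit and always include nsplit = 0.
  rows <- which(tab[, "nsplit"] <= n)
  tab[max(rows), "CP"]
}
demo_fit <- rpart(mpg ~ wt + hp, data = mtcars, method = "anova", cp = 10^(-6))
demo_tree <- prune(demo_fit, cp = cp_for_nsplit(demo_fit, 9))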
# Write the pruned tree to a PDF; extra = 100 labels each node with its percentage of observations
pdf("spactree9.pdf")
prp(tree9, extra = 100)
dev.off()