@hopped
Created May 19, 2014 09:40
Identifying risky bank loans using C5.0 with boosting and a cost matrix
# Download data set via:
# http://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29
#
# Note: this example uses the pre-processed version of the data (credit.csv)
# from the book 'Machine Learning with R' by Brett Lantz
library(C50)
library(caret)  # provides confusionMatrix(), used at the end of the script
df <- read.csv("credit.csv", stringsAsFactors=TRUE)
set.seed(12345)
# Shuffle the 1000 rows so the train/test split below is random
df_rand <- df[order(runif(1000)),]
df_train <- df_rand[1:900,]
df_test <- df_rand[901:1000,]
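# Cost matrix: rows are predicted classes, columns are actual classes.
# A false negative (predicted "no", actual "yes") costs 4; a false positive costs 1.
matrix_dimensions <- list(predicted = c("no", "yes"), actual = c("no", "yes"))
error_cost <- matrix(c(0, 1, 4, 0), nrow = 2, dimnames = matrix_dimensions)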
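# Column 17 is the class label (default), so it is excluded from the predictors.
# trials = 10 enables boosting with up to 10 trees.
df_model <- C5.0(df_train[-17], df_train$default, trials = 10, costs = error_cost)
df_pred <- predict(df_model, df_test)
# confusionMatrix() comes from the caret package
confusionMatrix(df_pred, df_test$default)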
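# A minimal sketch (not part of the original gist) of how a cost matrix can be
# used to score a set of predictions. It assumes the layout used above: rows are
# predicted classes, columns are actual classes. total_cost and the toy_* names
# are hypothetical, introduced only for illustration.
total_cost <- function(pred, actual, cost) {
  # Cross-tabulate predicted (rows) against actual (columns),
  # using the level order of the cost matrix
  counts <- table(factor(pred, levels = rownames(cost)),
                  factor(actual, levels = colnames(cost)))
  # Weight each cell count by its cost and add everything up
  sum(counts * cost)
}
# Toy data: one false positive (cost 1) and one false negative (cost 4)
toy_cost <- matrix(c(0, 1, 4, 0), nrow = 2,
                   dimnames = list(c("no", "yes"), c("no", "yes")))
toy_pred <- c("no", "yes", "no", "yes")
toy_actual <- c("no", "no", "yes", "yes")
total_cost(toy_pred, toy_actual, toy_cost)
# The same helper could be applied to the model's output:
# total_cost(df_pred, df_test$default, error_cost)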