@akshayjh
Forked from hopped/ml-with-c50-credits.R
Created February 11, 2017 11:53
Identifying risky bank loans using C5.0 with boosting and cost matrix
# Download the data set from:
# http://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29
#
# Note: the example below uses the pre-processed version of this data from the book 'Machine Learning with R' by Brett Lantz
library(C50)    # C5.0 decision trees
library(caret)  # provides confusionMatrix(), used at the end
df <- read.csv("credit.csv", stringsAsFactors=TRUE)
# Shuffle the 1000 rows reproducibly, then use 900 for training and 100 for testing
set.seed(12345)
df_rand <- df[order(runif(1000)),]
df_train <- df_rand[1:900,]
df_test <- df_rand[901:1000,]
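# Cost matrix: rows are predicted classes, columns are actual classes.
# A missed default (predicted "no", actual "yes") costs 4; a false alarm costs 1.
matrix_dimnames <- list(predicted = c("no", "yes"), actual = c("no", "yes"))
error_cost <- matrix(c(0,1,4,0), nrow=2, dimnames=matrix_dimnames)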
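# Train on every column except column 17 (the target), boosting over 10 trials with the cost matrix
df_model <- C5.0(df_train[-17], df_train$default, trials = 10, costs = error_cost)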
df_pred <- predict(df_model, df_test)
confusionMatrix(df_pred, df_test$default)
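
# A minimal sketch (not part of the original gist) of how the predictions could
# be scored against the same cost matrix. `total_cost` is a hypothetical helper;
# it assumes `pred` and `actual` are factors with identical levels, and that
# `cost` has predicted classes in rows and actual classes in columns.
total_cost <- function(pred, actual, cost) {
  # Cross-tabulate with predictions in rows and true classes in columns
  counts <- table(predicted = pred, actual = actual)
  # Align the cost matrix with the table by name, then weight each cell
  sum(counts * cost[rownames(counts), colnames(counts)])
}

# Toy check with made-up labels: one missed default (cost 4) plus one false alarm (cost 1)
toy_cost <- matrix(c(0, 1, 4, 0), nrow = 2,
                   dimnames = list(c("no", "yes"), c("no", "yes")))
toy_pred <- factor(c("no", "no", "yes", "yes"), levels = c("no", "yes"))
toy_actual <- factor(c("no", "yes", "no", "yes"), levels = c("no", "yes"))
total_cost(toy_pred, toy_actual, toy_cost)  # 5

# Possible use with the objects from this script:
# total_cost(df_pred, df_test$default, error_cost)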