@statcompute
Created November 24, 2018 21:23
Monotonic Binning Based on Isotonic Regression
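isoreg_bin <- function(data, y, x) {
  # Minimum record count (n1) and minimum bads / goods (n2) for a bin to be kept
  n1 <- 50
  n2 <- 10
  # y and x are passed unquoted; recover the column names as strings
  yname <- deparse(substitute(y))
  xname <- deparse(substitute(x))
  df1 <- data[, c(yname, xname)]
  # Keep rows with non-missing x; column 1 is x, column 2 is y
  df2 <- df1[!is.na(df1[, xname]), c(xname, yname)]
  rho <- cor(df2[, 2], df2[, 1], method = "spearman", use = "complete.obs")
  # Flip the sign of y when the correlation is negative so the fit is non-decreasing in x
  reg <- isoreg(df2[, 1], sign(rho) * df2[, 2])
  # Knots of the fitted step function are the candidate cut points
  knts <- knots(as.stepfun(reg))
  # Values below the first knot fall outside the breaks, become NA, and are dropped by split()
  df2$cut <- cut(df2[[xname]], breaks = unique(knts), include.lowest = TRUE)
  # Per-bin record count, bads, goods, and x range
  df3 <- do.call(rbind,
                 lapply(split(df2, df2$cut),
                        function(d) data.frame(n = nrow(d),
                                               b = sum(d[[yname]]),
                                               g = sum(1 - d[[yname]]),
                                               maxx = max(d[[xname]]),
                                               minx = min(d[[xname]]))))
  # Keep only bins with enough records, bads, and goods
  df4 <- df3[which(df3[["n"]] > n1 & df3[["b"]] > n2 & df3[["g"]] > n2), ]
  # smbinning treats 1 as the good outcome, so invert the bad indicator
  df1$good <- 1 - df1[[yname]]
  # Upper bounds of the retained bins, except the last, are the final cut points
  return(smbinning::smbinning.custom(df1, "good", xname, cuts = df4$maxx[-nrow(df4)])$ivtable)
}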
df <- sas7bdat::read.sas7bdat("Downloads/accepts.sas7bdat")
isoreg_bin(df, bad, bureau_score)
@statcompute
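Monotonic binning is widely used in scorecard development for consumer credit risk. The R function isoreg_bin() implements monotonic binning based on isotonic regression: it fits an isotonic regression of the outcome on the predictor, takes the knots of the fitted step function as candidate cut points, drops bins with too few records, bads, or goods, and passes the remaining cut points to smbinning::smbinning.custom() to produce the IV table.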
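The core step can be tried without the SAS file or the smbinning package. The sketch below is only an illustration under stated assumptions: the data are synthetic, the names score, bad, and bad_rate are made up for the example, and x is continuous so there are no ties. It also prepends -Inf to the breaks so the lowest block is kept, which is a choice made here and not something isoreg_bin() does.

```r
# Illustrative only: synthetic data, not accepts.sas7bdat.
set.seed(2018)
n <- 2000
score <- runif(n, 500, 800)                        # hypothetical continuous score (no ties)
bad   <- rbinom(n, 1, plogis(8 - 0.015 * score))   # bad rate falls as score rises

# Orient y so that the isotonic fit is non-decreasing in x.
rho <- cor(score, bad, method = "spearman")
fit <- isoreg(score, sign(rho) * bad)

# Knots of the fitted step function mark the right end of each constant block;
# the last knot is max(score).
cuts <- knots(as.stepfun(fit))

# Assign each record to a bin (a, b]; -Inf is added here so the first block is kept.
bins <- cut(score, breaks = c(-Inf, cuts))

# Bad rate per bin; with no ties in x, each bin is one block of the isotonic fit.
bad_rate <- tapply(bad, bins, mean)
print(data.frame(n = as.vector(table(bins)), bad_rate = as.vector(bad_rate),
                 row.names = levels(bins)))
```

The raw isotonic bins are usually numerous and often tiny, which is why isoreg_bin() then filters them by n1 and n2 before the final binning.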