Skip to content

Instantly share code, notes, and snippets.

@robertzk
Last active August 29, 2015 14:06
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save robertzk/abe160499cb9456e2bce to your computer and use it in GitHub Desktop.
Save robertzk/abe160499cb9456e2bce to your computer and use it in GitHub Desktop.
variable by buckets
function(model, data, factor_variable, top = 4, stat = function(y) median(y, na.rm = T)) {
library(productivus)
if (is(model, 'tundraContainer')) {
data <- model$munge(data)
model <- model$output$model
}
stopifnot(is(model, 'gbm'))
tbl <- names(table(data[[factor_variable]]))
tops <- tbl[seq_len(min(top, length(tbl)))]
~{as.character(data[[factor_variable]])}
data[[factor_variable]][!data[[factor_variable]] %in% tops] <- 'other'
~{as.factor(data[[factor_variable]])}
library(gbm)
vars <- head(summary(model)[, 1], 20, plotit = FALSE)
vars <- intersect(vars, Filter(function(x) is.numeric(data[[x]]), colnames(data)))
df <- data.frame(do.call(cbind, lapply(levels(data[[factor_variable]]), function(x) {
sapply(vars, function(v) stat(data[data[[factor_variable]] == x, v]))
})))
colnames(df) <- levels(data[[factor_variable]])
df$rownames <- row.names(df)
df
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment