Skip to content

Instantly share code, notes, and snippets.

@alexpghayes
Created August 20, 2020 02:49
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save alexpghayes/350b758cb2ec01ec276ed5b12a06c63e to your computer and use it in GitHub Desktop.
Save alexpghayes/350b758cb2ec01ec276ed5b12a06c63e to your computer and use it in GitHub Desktop.
library(glmnet)
library(tidyverse)
# want to fit a model with formula: as.factor(am) ~ mpg + wt * as.factor(gear)
# data prep
nice_data <- mtcars %>%
mutate_at(vars(am, gear), as.factor)
# be careful about intercepts, see the intercept argument to glmnet
formula <- am ~ mpg + wt * gear + 0
# if you need more than a single call to model.matrix() something
# is probably wrong. note that X does not have intercept column
# because of +0 in the formula
X <- model.matrix(formula, nice_data)
mf <- model.frame(formula, nice_data)
y <- model.response(mf)
# repeated cross-validation
repeats <- 10
cv_fits <- vector(mode = "list", length = repeats)
for (r in 1:repeats) {
# cv.glmnet will generate the folds on its own, let it work for you
cv_fits[[r]] <- cv.glmnet(X, y, family = "binomial")
}
str(cv_fits[1])
# average (across repeats) median risk (across folds)
list_of_median_risk_by_lambda <- lapply(cv_fits, function(x) x$cvm)
cvm_matrix <- do.call(rbind, list_of_median_risk_by_lambda)
avg_median_risks <- colMeans(cvm_matrix)
air_quotes_best_lambda_index <- which.min(avg_median_risks)
# final fit
glmnet(X, y, lambda = air_quotes_best_lambda_index, family = "binomial")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment