Check out H2O AutoML: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html

Erin LeDell ledell

@ledell
ledell / covid_meetups_rladies_wimlds.R
Created December 17, 2020 05:24
Count the number of R-Ladies & WiMLDS meetups since COVID quarantine started
# Count the number of covid meetups for R-Ladies and WiMLDS
library(meetupr)
library(tidyverse)
# Look up all R-Ladies & WiMLDS groups by "topic id" & count the events.
# You can find topic ids for associated tags by querying
# [here](https://secure.meetup.com/meetup_api/console/?path=/find/topics).
# The `topic_id` for topic, "R-Ladies", is 1513883.
# The `topic_id` for topic, "WiMLDS", is 1517030.
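Once the events for each group have been collected, the counting step itself is a date filter followed by a per-group tally. A minimal base R sketch of that step follows. The `events` data frame, its column names (`group`, `local_date`), and the cutoff date are all assumptions made up for illustration; they are not taken from the gist or guaranteed to match what meetupr returns.

```r
# Hypothetical event table; column names and values are illustrative
# assumptions, not real meetupr output.
events <- data.frame(
  group = c("rladies-a", "rladies-a", "wimlds-b", "wimlds-b", "wimlds-b"),
  local_date = as.Date(c("2020-02-10", "2020-04-15", "2020-03-20",
                         "2020-06-01", "2020-11-12")),
  stringsAsFactors = FALSE
)

# Assumed quarantine start; adjust to your local lockdown date.
quarantine_start <- as.Date("2020-03-15")

# Count events on or after the cutoff date, per group.
count_since <- function(events, cutoff) {
  recent <- events[events$local_date >= cutoff, , drop = FALSE]
  counts <- table(recent$group)
  data.frame(group = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

covid_counts <- count_since(events, quarantine_start)
total_covid_events <- sum(covid_counts$n)
```

Groups with no events after the cutoff simply do not appear in `covid_counts`, which is usually what you want for a total.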
ledell / covid_meetups.R
Last active December 17, 2020 05:23
Count the number of meetups for a group since COVID quarantine started
# Meetups since quarantine started (feel free to adjust the date to your local lockdown date)
library(meetupr)
library(tidyverse)
meetup_urlname <- "Bay-Area-Women-in-Machine-Learning-and-Data-Science" # insert your meetup urlname here
events <- get_events(urlname = meetup_urlname,
                     event_status = "past")
events %>%
ledell / install_latest_h2o.R
Created June 17, 2020 23:24
Install the latest H2O R package (latest stable version is sometimes ahead of what's on CRAN)
install.packages("h2o", repos="http://h2o-release.s3.amazonaws.com/h2o/latest_stable_R", method="curl")
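To decide whether the latest stable release is actually newer than the CRAN one, the two version strings can be compared with `utils::compareVersion()`. The sketch below uses made-up version strings as placeholders; they are assumptions for illustration, not the current CRAN or stable versions. For an installed package, `as.character(packageVersion("h2o"))` would supply the local version string.

```r
# Placeholder version strings (assumptions, not real current releases).
cran_version   <- "3.30.0.1"
stable_version <- "3.30.0.5"

# compareVersion() returns 1 if the first argument is newer,
# 0 if both are equal, and -1 if the first argument is older.
stable_is_newer <- utils::compareVersion(stable_version, cran_version) > 0

if (stable_is_newer) {
  message("Latest stable (", stable_version, ") is ahead of CRAN (",
          cran_version, ").")
}
```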
ledell / h2oautoml_saveload.R
Last active May 17, 2020 09:46
R function to save and load H2O AutoML projects (models & leaderboards)
library(R.utils)
# Note: For saving H2O AutoML objects, if path is NULL (default),
# then save in pwd with project_name as folder name
# This function (or something similar to it) will be part of H2O soon...
# Written by: https://github.com/tomasfryda
.dump_aml_frames <- function(aml, path) {
frames <- c(attr(aml@leaderboard, "id"), attr(aml@event_log, "id"))
frames <- c(frames,
unlist(sapply(aml@leaderboard$model_id, function(model_id)
ledell / h2oautoml_monotonic_constraints.R
Last active April 28, 2020 23:19
Monotonic constraints in H2O AutoML
# Example of monotonic constraints in H2O AutoML (using h2o v3.30.0.1)
# monotone constraints: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/monotone_constraints.html
# H2O AutoML: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html
library(h2o)
h2o.init()
# Import the prostate dataset
file <- "http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip"
prostate <- h2o.importFile(file)
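A monotone increasing constraint on a feature means the model's prediction should never go down as that feature goes up, with everything else held fixed. The base R sketch below checks that property on a vector of predictions. The helper `is_monotone_increasing()` and the toy `age` and prediction vectors are hypothetical and invented for illustration; they are not H2O output and not part of the gist.

```r
# Return TRUE if `pred` never decreases as `x` increases.
# `tol` absorbs floating-point noise. Ties in `x` are not handled specially.
is_monotone_increasing <- function(x, pred, tol = 1e-12) {
  ord <- order(x)
  all(diff(pred[ord]) >= -tol)
}

# Toy data (assumed values): one feature and two sets of predictions.
age <- c(50, 62, 45, 70, 58)
pred_constrained   <- c(0.30, 0.45, 0.30, 0.60, 0.40)
pred_unconstrained <- c(0.30, 0.45, 0.32, 0.60, 0.40)

constrained_ok   <- is_monotone_increasing(age, pred_constrained)
unconstrained_ok <- is_monotone_increasing(age, pred_unconstrained)
```

A check like this is only meaningful on predictions where the other features are held constant, for example along a partial dependence grid.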
ledell / h2oautoml_get_cv_metrics.R
Created April 9, 2020 22:34
How to get k-fold metrics for all the H2O AutoML models in R
# How to get k-fold metrics for all the H2O AutoML models in R
# Adapted from: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html
library(h2o)
h2o.init()
# Import a sample binary outcome train/test set into H2O
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")
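For intuition about what a k-fold metric is, the sketch below computes AUC separately on each holdout fold and then averages. It is plain base R, not the H2O API: the `auc()` helper, the `cv` data frame, and its columns (`fold`, `label`, `score`) are assumptions invented for illustration.

```r
# Rank-based AUC (Mann-Whitney form); labels are 0/1, ties get average ranks.
auc <- function(labels, scores) {
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy holdout predictions (assumed values): fold id, true label, model score.
cv <- data.frame(
  fold  = c(1, 1, 1, 1, 2, 2, 2, 2),
  label = c(0, 0, 1, 1, 0, 1, 0, 1),
  score = c(0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.60, 0.70)
)

# One AUC per fold, then the mean and standard deviation across folds.
fold_auc <- sapply(split(cv, cv$fold), function(d) auc(d$label, d$score))
cv_summary <- c(mean = mean(fold_auc), sd = sd(fold_auc))
```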
ledell / h2o_cluster_id_stacking.R
Last active May 8, 2020 03:51
Stacked Ensembles with clustered observations (pooled repeated measures data) in H2O
# Example of how to do stacking with clustered (a.k.a. "pooled repeated measures") data:
# Since stacking uses cross-validation, we must ensure that the observations from
# the same clusters are all in the same fold. We borrow the SuperLearner::CVFolds()
# function and use H2O Stacked Ensembles and AutoML to train stacked ensembles.
library(SuperLearner)
library(h2o)
h2o.init()
# Import a sample binary outcome train/test set into H2O
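The key requirement described above is that every observation from a given cluster lands in the same fold. The gist borrows `SuperLearner::CVFolds()` for this; the base R sketch below is a simplified stand-in for the same idea, not that function. The helper name `assign_cluster_folds()` and the toy `subject` vector are hypothetical.

```r
# Assign each cluster to one fold, then map that fold back to its rows.
assign_cluster_folds <- function(cluster_id, nfolds, seed = 1) {
  set.seed(seed)
  clusters <- unique(cluster_id)
  # Shuffle the clusters, then deal them out to folds round-robin.
  shuffled <- clusters[sample.int(length(clusters))]
  fold_of_cluster <- setNames(
    rep_len(seq_len(nfolds), length(shuffled)),
    shuffled
  )
  # Look up each row's fold by its cluster id.
  unname(fold_of_cluster[as.character(cluster_id)])
}

# Toy cluster ids (assumed): repeated measures on six subjects.
subject <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6)
folds <- assign_cluster_folds(subject, nfolds = 3)
```

A vector like `folds` could then be added to the training frame as a column and supplied as the fold column, so that cross-validation follows the cluster boundaries.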
ledell / h2o_automl_mushroom_classification.py
Last active June 17, 2020 12:52
H2O AutoML - Mushroom classification
# My version of the code at this blog post:
# https://towardsdatascience.com/automl-a-tool-to-improve-your-workflow-1a132248371f
import h2o
from h2o.automl import H2OAutoML
h2o.init()
train = h2o.import_file("https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data")
y = "C1" #e = edible, p = poisonous
ledell / kaggledays-sf_h2o_automl_6000.R
Last active March 3, 2022 03:02
KaggleDays SF: H2O AutoML solution
### Kaggle Days SF: Hackathon submission (8th place)
# I used the latest version of H2O (3.24.0.1)
# Latest stable always here: http://h2o-release.s3.amazonaws.com/h2o/latest_stable.html
# H2O 3.24.0.1: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/1/index.html
# If you are a Python user, you can use the demo Python code available on the H2O AutoML User Guide
# instead: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html
# Unfortunately it was a private competition, so the data is not publicly available!
library("ggplot2")
library("scales")
library("devtools")
install_github("rladies/meetupr", ref = "topic_id")
library("meetupr") #Requires topic_id branch...
api_key = "API_KEY" #Use your own meetup.com API key...
meetups <- find_groups(topic_id = 1513883, api_key = api_key) #all groups tagged with "R-Ladies" topic id
meetups <- meetups[-nrow(meetups), ] #remove Joburg-R-Users-Group
tc <- find_groups(text = "r ladies twin cities", api_key = api_key) #add Twin Cities