Last active
May 31, 2017 22:55
Example of ML preprocessing in h2o (RECS dataset)
# Load h2o and start (or connect to) an h2o cluster, which receives and processes the dataset
# No modeling is done in the R session itself - R only keeps a key that references the frame held by the cluster
library(h2o)
h2o.init(nthreads = -1) # nthreads = -1 uses all available cores
# Prepare h2o inputs for modeling
recs.reduced2.h2o <- as.h2o(recs.reduced2) # Coerce DF to an h2o object
set.seed(0) # Seed R's RNG; the split below is also seeded explicitly
# Split h2o data into training, validation, and test frames
# ratios = c(.7, .2) sends ~70% to train and ~20% to valid; the remaining ~10% becomes test
data.split <- h2o.splitFrame(recs.reduced2.h2o, ratios = c(.7, .2), seed = 0) # seed makes the split reproducible
train <- data.split[[1]] # For training
valid <- data.split[[2]] # For validating trained models and comparing different hyperparameter settings
test <- data.split[[3]] # For final evaluation of model performance
y <- "KWH" # Response column
x <- setdiff(colnames(train), y) # Predictors: every column except the response
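# Illustrative aside: a base-R sketch of the same two ideas (a 70/20/10 split and the
# x/y column selection) on a hypothetical toy data frame. toy.df, feature_a and feature_b
# are made up for illustration and are not RECS variables. This is not how h2o.splitFrame
# is implemented; it only mirrors the idea without needing a running cluster.
set.seed(0)
toy.df <- data.frame(KWH = rnorm(1000), feature_a = rnorm(1000), feature_b = rnorm(1000))
# Assign each row to a partition with probability .7/.2/.1, so partition sizes are approximate
toy.part <- cut(runif(nrow(toy.df)), breaks = c(0, .7, .9, 1),
                labels = c("train", "valid", "test"), include.lowest = TRUE)
toy.split <- split(toy.df, toy.part) # Named list of three data frames
sapply(toy.split, nrow) / nrow(toy.df) # Realized fractions, close to .7/.2/.1
toy.x <- setdiff(colnames(toy.df), "KWH") # Every column except the response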