Gist by @zdepablo, forked from multidis/split_strat_scale.r; last active August 29, 2015 14:26.
Stratified sampling with caret: split the data into training and test sets while preserving the class distribution, standardize the features using the training-set parameters, and build stratified folds for cross-validation (CV).
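library(caret)
## example data (assumption: replace with your own feature matrix X and class labels yclass)
X <- as.matrix(iris[, 1:4])
yclass <- iris$Species
set.seed(123)  # the partition and the folds are random; fix the seed for reproducibility
## select training indices preserving class distribution (about 80% of each class)
in.train <- createDataPartition(yclass, p = 0.8, list = FALSE)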
summary(factor(yclass))
ytra <- yclass[in.train]; summary(factor(ytra))
ytst <- yclass[-in.train]; summary(factor(ytst))
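## Sketch (an assumption, not caret's actual implementation) of what a stratified
## split does for class labels: draw about a fraction p of the row indices
## *within each class*, so train and test keep the same class proportions.
## strat_split() is a hypothetical helper and handles factor/character labels only.
strat_split <- function(y, p = 0.8) {
  by_class <- split(seq_along(y), factor(y))   # row indices grouped by class (unused levels dropped)
  picked <- lapply(by_class, function(ii)
    ii[sample.int(length(ii), round(p * length(ii)))])  # indexing avoids sample()'s length-1 pitfall
  sort(unlist(picked, use.names = FALSE))
}
## e.g. idx <- strat_split(yclass, p = 0.8); table(yclass[idx]) / table(yclass)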
## standardize features: scale the test part with the training-set centre and
## scale (column means and SDs), so no test information leaks into preprocessing
Xtra <- scale(X[in.train, ])
Xtest <- scale(X[-in.train, ],
               center = attr(Xtra, "scaled:center"),
               scale = attr(Xtra, "scaled:scale"))
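## Sketch: the call above computes (x - mean_train) / sd_train column by column.
## apply_train_scaling() is a hypothetical helper that reuses the attributes stored
## by scale(); caret's preProcess(method = c("center", "scale")) followed by
## predict() does the same centring and scaling.
apply_train_scaling <- function(Xnew, Xscaled_train) {
  ctr <- attr(Xscaled_train, "scaled:center")   # training column means
  scl <- attr(Xscaled_train, "scaled:scale")    # training column SDs
  centred <- sweep(as.matrix(Xnew), 2, ctr, "-")
  sweep(centred, 2, scl, "/")
}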
## stratified folds for cross-validation: Y is a factor of class labels
Y <- factor(yclass)  # example only; use the labels you actually cross-validate on
table(Y)
foldInds <- createFolds(Y, k=10, list=TRUE, returnTrain=FALSE)
lapply(foldInds, function(ii) table(Y[ii])) ## verify stratification
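## Sketch of stratified fold assignment in base R (an assumption, not
## createFolds()'s exact algorithm): within each class, deal fold labels 1..k
## in turn and shuffle them, so each fold receives floor or ceiling of n_c/k
## rows of a class with n_c rows. strat_folds() is a hypothetical helper that,
## like returnTrain=FALSE, returns the held-out indices of each fold.
strat_folds <- function(y, k = 10) {
  y <- factor(y)
  fold <- integer(length(y))
  for (ii in split(seq_along(y), y)) {
    labs <- rep_len(seq_len(k), length(ii))      # near-equal fold labels for this class
    fold[ii] <- labs[sample.int(length(labs))]   # random order within the class
  }
  split(seq_along(y), factor(fold, levels = seq_len(k)))
}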
## set returnTrain=TRUE if supplying these indices to train() via
## trainControl(index = ...), which expects the training rows of each resample;
## see https://stat.ethz.ch/pipermail/r-help/2011-May/277722.html
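## With returnTrain=FALSE (as above) each fold holds the held-out rows; the
## training rows that trainControl(index = ...) expects are their complements.
## to_train_index() is a hypothetical helper doing that conversion.
to_train_index <- function(heldout, n) {
  lapply(heldout, function(ii) setdiff(seq_len(n), ii))  # keeps the fold names
}
## e.g. ctrl <- trainControl(method = "cv", index = to_train_index(foldInds, length(Y)))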