@zachmayer
Created May 12, 2011 18:21
Kaggle Competition Walkthrough: Fitting a model
####################################
# Training parameters
####################################
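library(caret)  # trainControl(), train(), twoClassSummary()

MyTrainControl <- trainControl(
  method = "repeatedcv",             # repeated k-fold cross-validation
  number = 10,                       # 10 folds
  repeats = 5,                       # repeated 5 times
  returnResamp = "all",              # keep resampled performance for every tuning value
  classProbs = TRUE,                 # class probabilities, needed for the ROC metric
  summaryFunction = twoClassSummary  # reports ROC, Sens and Spec
)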
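The trainControl() call above asks caret for 10-fold cross-validation repeated 5 times. As a rough base-R sketch of what that resampling scheme produces, the hypothetical helper make_repeated_folds() below returns the training-row indices for each of the 50 resamples. It assumes plain random folds; caret's own fold builder stratifies by outcome class, which is probably why the logged training sizes are 225 or 226 rather than always 225.

```r
# Sketch of repeated k-fold cross-validation (base R only).
# Assumption: plain random folds; caret stratifies its folds by class.
make_repeated_folds <- function(n, k = 10, repeats = 5, seed = 42) {
  set.seed(seed)
  folds <- list()
  for (r in seq_len(repeats)) {
    # Each repeat reshuffles the rows into k equal-sized groups.
    assignment <- sample(rep(seq_len(k), length.out = n))
    for (f in seq_len(k)) {
      # Store the training indices, i.e. every row outside fold f.
      key <- sprintf("Fold%02d.Rep%d", f, r)
      folds[[key]] <- which(assignment != f)
    }
  }
  folds
}

folds <- make_repeated_folds(n = 250)
length(folds)                              # 50 resamples (10 folds x 5 repeats)
unique(vapply(folds, length, integer(1)))  # 225 training rows in each
```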
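# FL is the model formula, defined outside this snippet;
# trainset holds the training rows, including the binary Target column.
model <- train(FL, data = trainset, method = "glmnet",
               metric = "ROC",  # pick tuning values by resampled ROC AUC
               tuneGrid = expand.grid(.alpha = c(0, 1),  # 0 = ridge, 1 = lasso
                                      .lambda = seq(0, 0.05, by = 0.01)),
               trControl = MyTrainControl)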
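The tuneGrid above crosses two alpha values with six lambda values, so caret evaluates 12 candidate models. In glmnet, alpha mixes the two penalty types and lambda scales the overall penalty strength. A small sketch of that elastic-net penalty term, with enet_penalty() as a hypothetical helper written only for illustration:

```r
# The tuning grid passed to train(): 2 alpha values x 6 lambda values.
grid <- expand.grid(.alpha = c(0, 1), .lambda = seq(0, 0.05, by = 0.01))
nrow(grid)  # 12 candidate models

# Elastic-net penalty in the form glmnet documents:
#   lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
# alpha = 0 gives a pure ridge penalty, alpha = 1 a pure lasso penalty.
enet_penalty <- function(beta, alpha, lambda) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(1, -2)
enet_penalty(beta, alpha = 0, lambda = 0.1)  # ridge: 0.1 * (5 / 2) = 0.25
enet_penalty(beta, alpha = 1, lambda = 0.1)  # lasso: 0.1 * 3 = 0.3
```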
model
plot(model, metric='ROC')
> model
250 samples
200 predictors
2 classes: 'X0', 'X1'
Pre-processing: None
Resampling: Cross-Validation (10 fold, repeated 1 times)
Summary of sample sizes: 225, 226, 225, 225, 225, 225, ...
Resampling results across tuning parameters:
alpha  lambda  Sens   Spec   ROC    Sens SD  Spec SD  ROC SD
0      0       0.731  0.802  0.827  0.102    0.121    0.0888
0      0.01    0.731  0.802  0.827  0.102    0.121    0.0888
0      0.02    0.764  0.741  0.829  0.117    0.154    0.0863
0      0.03    0.698  0.817  0.827  0.131    0.109    0.0819
0      0.04    0.764  0.718  0.825  0.124    0.161    0.0825
0      0.05    0.681  0.825  0.826  0.155    0.13     0.0793
1      0       0.722  0.688  0.792  0.126    0.201    0.0947
1      0.01    0.673  0.749  0.756  0.112    0.112    0.0691
1      0.02    0.798  0.527  0.729  0.105    0.194    0.0663
1      0.03    0.539  0.748  0.69   0.156    0.189    0.0648
1      0.04    0.84   0.382  0.681  0.114    0.136    0.0616
1      0.05    0.48   0.746  0.662  0.243    0.235    0.0627
ROC was used to select the optimal model using the largest value.
The final values used for the model were alpha = 0 and lambda = 0.02.
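The selection rule reported in the last two lines of the log can be checked by hand. This sketch copies the rounded ROC values from the resampling table above into a data frame (row order as printed) and picks the largest; caret itself works from unrounded resampling estimates, so this only illustrates the rule.

```r
# ROC column of the resampling table, in the printed row order.
results <- data.frame(
  alpha  = rep(c(0, 1), each = 6),
  lambda = rep(seq(0, 0.05, by = 0.01), times = 2),
  ROC    = c(0.827, 0.827, 0.829, 0.827, 0.825, 0.826,
             0.792, 0.756, 0.729, 0.690, 0.681, 0.662)
)

# "ROC was used to select the optimal model using the largest value."
best <- results[which.max(results$ROC), ]
best  # alpha = 0, lambda = 0.02, ROC = 0.829
```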
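library(caTools)  # colAUC()

# Class probabilities for the held-out rows (one column per class: X0, X1)
test <- predict(model, newdata = testset, type = "prob")
# Area under the ROC curve of each probability column against the true labels
colAUC(test, testset$Target)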
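colAUC() reports the area under the ROC curve for each probability column. For intuition, AUC equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one, which can be computed from ranks (the Mann-Whitney statistic). A minimal base-R sketch, with auc_rank() as a hypothetical helper and ties counted as half a win:

```r
# Rank-based (Mann-Whitney) AUC for one vector of scores.
# labels: 1 = positive class, 0 = negative class.
auc_rank <- function(scores, labels) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)  # tied scores receive their average rank
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_rank(c(0.1, 0.4, 0.35, 0.8), c(0, 0, 1, 1))  # 3 of 4 pairs ranked correctly: 0.75
```

Applied to the predictions above, something like auc_rank(test$X1, as.integer(testset$Target == "X1")) should agree with colAUC()'s value for the X1 column, assuming Target uses the levels X0 and X1 shown in the log.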
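####################################
# Setup Multicore
####################################
# Source:
# http://www.r-bloggers.com/feature-selection-using-the-caret-package/
# Run this block before calling train() so the settings are in place during resampling.
# It relies on the multicore package and on the workers / computeFunction / computeArgs
# fields that older caret releases read from the trainControl object.
if (require("multicore", quietly = TRUE, warn.conflicts = FALSE)) {
  MyTrainControl$workers <- multicore:::detectCores()  # one worker per detected core
  MyTrainControl$computeFunction <- mclapply           # fork-based parallel lapply
  MyTrainControl$computeArgs <- list(mc.preschedule = FALSE, mc.set.seed = FALSE)
}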
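The multicore package used above was later incorporated into R's bundled parallel package, which provides detectCores() and mclapply() with the same mc.preschedule and mc.set.seed arguments. Current caret releases parallelise resampling through a registered foreach backend (for example via the doParallel package) together with trainControl(allowParallel = TRUE), rather than through the fields set above. The sketch below only shows the parallel package's fork-based mclapply() on a toy per-fold computation, not caret's internal use of it; the data and fold split are made up for illustration, and on Windows, where forking is unavailable, it falls back to a single core.

```r
library(parallel)  # ships with R; provides detectCores() and mclapply()

# Choose a core count; detectCores() can return NA, and forking is unavailable on Windows.
n_cores <- detectCores()
if (is.na(n_cores) || .Platform$OS.type == "windows") n_cores <- 1L

# Toy data split into 10 folds, standing in for per-resample work.
set.seed(1)
x <- rnorm(250)
folds <- sample(rep(1:10, length.out = length(x)))

# Handle each fold in a forked worker, without prescheduling (as in the block above).
fold_means <- mclapply(1:10, function(k) mean(x[folds == k]),
                       mc.cores = n_cores, mc.preschedule = FALSE)
unlist(fold_means)
```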