@zachmayer
Created May 3, 2011 14:34
Kaggle introduction
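#Install the packages loaded below (one-time setup; including "Suggests" and "Enhances" pulls in many extra packages)
install.packages(c("caret","glmnet","ipred","e1071","caTools","reshape2","plyr"),dependencies=c("Depends", "Imports", "LinkingTo", "Suggests", "Enhances"))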
#Set working directory (adjust to wherever overfitting.csv is saved)
setwd('~/Kaggle/Overfitting')
#Load Required Packages
library('caret')
library('glmnet')
library('ipred')
library('e1071')
library('caTools')
#Load Data
Data <- read.csv("overfitting.csv", header=TRUE)
#Choose Target: recode Target_Practice as a two-level factor, then drop the raw target columns
Data$Target <- as.factor(ifelse(Data$Target_Practice==1,'X1','X0'))
Data$Target_Evaluate <- NULL
Data$Target_Leaderboard <- NULL
Data$Target_Practice <- NULL
#Reorder columns: Target, case_id and train first, then the predictors
xnames <- setdiff(names(Data),c('Target','case_id','train'))
Data <- Data[,c('Target','case_id','train',xnames)]
#Split to train and test
trainset <- Data[Data$train == 1,]
testset <- Data[Data$train == 0,]
#Remove unwanted columns from the training set (testset keeps case_id and train)
trainset$case_id <- NULL
trainset$train <- NULL
#Define Formula
FL <- as.formula(paste("Target ~ ", paste(xnames, collapse= "+")))
#Inspect the data, the split, and the formula
head(Data)
tail(Data)
head(trainset)
head(testset)
print(FL)
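#--- Added sketch (not part of the original script) ---
#A minimal, self-contained check of the steps above on a made-up toy data frame.
#The names toy, toy_xnames, toy_train, toy_test and toy_FL, and the columns var_1 and var_2,
#are hypothetical; the real overfitting.csv is only assumed to share this column layout.
toy <- data.frame(
  case_id = 1:6,
  train = c(1, 1, 1, 1, 0, 0),
  Target_Practice = c(1, 0, 1, 0, 1, 0),
  var_1 = c(0.1, 0.5, 0.9, 0.3, 0.7, 0.2),
  var_2 = c(0.8, 0.4, 0.6, 0.2, 0.1, 0.9)
)
#Same target recoding as above
toy$Target <- as.factor(ifelse(toy$Target_Practice == 1, 'X1', 'X0'))
toy$Target_Practice <- NULL
#Same predictor selection, split, and formula construction as above
toy_xnames <- setdiff(names(toy), c('Target', 'case_id', 'train'))
toy_train <- toy[toy$train == 1, ]
toy_test <- toy[toy$train == 0, ]
toy_FL <- as.formula(paste("Target ~ ", paste(toy_xnames, collapse = "+")))
#Every row should land in exactly one of the two sets
stopifnot(nrow(toy_train) + nrow(toy_test) == nrow(toy))
#The formula should reference the target and exactly the predictor columns
stopifnot(setequal(all.vars(toy_FL), c('Target', toy_xnames)))
print(toy_FL)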