@Zoldin
Last active July 21, 2017 21:02
train_model.R
#!/usr/bin/Rscript
library(Matrix)
library(glmnet)
# Three arguments must be provided: the training file (.txt, Matrix Market format),
# a random seed, and the output file name for the saved RData model
args = commandArgs(trailingOnly=TRUE)
if (length(args) != 3) {
  stop("Three arguments must be supplied: training file (.txt, matrix), seed, and output RData model name.\n", call.=FALSE)
}
# Read the training set (Matrix Market format, read with Matrix::readMM)
trainMM = readMM(args[1])
set.seed(as.numeric(args[2]))
# Convert the sparse matrix to a regular (dense) matrix
trainMM_reg <- as.matrix(trainMM)
t1 = Sys.time()
print("Started to train the model... ")
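# Column 1 holds the label; every remaining column is a feature
feature_cols <- 2:ncol(trainMM_reg)
glmnet_classifier = cv.glmnet(x = trainMM_reg[, feature_cols], y = trainMM_reg[, 1],
                              family = 'binomial',
                              # L1 (lasso) penalty
                              alpha = 1,
                              # score the cross-validation folds by area under the ROC curve
                              type.measure = "auc",
                              # 5-fold cross-validation
                              nfolds = 5,
                              # looser convergence threshold than the default (1e-7): less accurate, faster training
                              thresh = 1e-3,
                              # fewer maximum passes over the data than the default (1e5), again for faster training
                              maxit = 1e3)
print("Model generated...")
print(difftime(Sys.time(), t1, units = 'sec'))
# Predicted probabilities on the training set at lambda.1se (the default for cv.glmnet objects)
preds = predict(glmnet_classifier, trainMM_reg[, feature_cols], type = 'response')[, 1]
print("AUC on the training set... ")
# glmnet:::auc is an unexported internal helper; print() makes the value appear explicitly
print(glmnet:::auc(trainMM_reg[, 1], preds))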
save(glmnet_classifier,file=args[3])
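Because `cv.glmnet` assigns rows to folds at random when no `foldid` is given, the seed argument is what makes the cross-validation split reproducible. The base-R sketch below illustrates that kind of balanced random fold assignment; `make_foldid` is a hypothetical helper, and glmnet's internal code may differ in detail.

```r
# Hypothetical helper (not part of glmnet): assign each of n rows to one of
# nfolds folds at random, with fold sizes differing by at most one.
make_foldid <- function(n, nfolds = 5) {
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(42)
foldid_a <- make_foldid(12, nfolds = 5)
set.seed(42)
foldid_b <- make_foldid(12, nfolds = 5)
identical(foldid_a, foldid_b)  # same seed, same folds: TRUE
table(foldid_a)                # folds 1..5 hold 3, 3, 2, 2, 2 rows
```

A vector built this way could be passed as `cv.glmnet(..., foldid = foldid)` to pin the folds explicitly instead of relying on the global seed.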
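The training-set AUC in the script comes from `glmnet:::auc`, which the `:::` operator reaches as an unexported internal, so it is not a stable API. A minimal base-R sketch of the rank-based (Mann-Whitney) AUC follows, assumed to be the same statistic glmnet reports for unweighted data; `auc_rank` is a hypothetical name.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC, i.e. the probability that a
# randomly chosen positive example is scored higher than a randomly chosen negative.
# A positive/negative tie earns 1/2 because rank() averages tied ranks.
auc_rank <- function(y, prob) {
  stopifnot(length(y) == length(prob), all(y %in% c(0, 1)))
  r  <- rank(prob)        # ranks of the scores, ties averaged
  n1 <- sum(y == 1)       # number of positives
  n0 <- sum(y == 0)       # number of negatives
  if (n1 == 0 || n0 == 0) stop("AUC needs at least one positive and one negative")
  u <- sum(r[y == 1]) - n1 * (n1 + 1) / 2   # Mann-Whitney U statistic for the positives
  u / (n1 * n0)
}

auc_rank(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))  # → 0.75
```

Training-set AUC is optimistic because the model has already seen those rows; the cross-validated AUC for each lambda is stored in `glmnet_classifier$cvm`.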