@thiagomarzagao
Created May 31, 2016 01:08
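library(tm)
setwd('/Users/thiagomarzagao/Dropbox/dataScience/UnB-CIC/aulaText/')

# Read the first 1000 rows; V1 holds the class labels, V2 the text
comprasnet <- read.table('subset.csv',
                         stringsAsFactors = FALSE,
                         sep = ',',
                         nrows = 1000)

# Build a corpus with one document per row of text
corpus <- Corpus(VectorSource(comprasnet$V2))
corpus <- tm_map(corpus, PlainTextDocument)

# Document-term matrix with TF-IDF weights
tfidf <- DocumentTermMatrix(corpus, control = list(weighting = weightTfIdf))
# Use as.matrix() for the conversion: inspect() is a display helper and,
# depending on the tm version, may return only a sample of the matrix
tfidf <- as.data.frame(as.matrix(tfidf))

# Predictor names are the term columns; the labels are added as a factor
xnames <- colnames(tfidf)
tfidf$labels <- as.factor(comprasnet$V1)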
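library(h2o)

# Start (or connect to) a local H2O instance using all available cores
h2o.init(ip = 'localhost',
         port = 54321,
         nthreads = -1,
         max_mem_size = '14G')

# Train a deep learning classifier on the TF-IDF features
# with 10-fold cross-validation
trainedModel <- h2o.deeplearning(x = xnames,
                                 y = 'labels',
                                 training_frame = as.h2o(tfidf),
                                 nfolds = 10,
                                 sparse = TRUE)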