@dselivanov
Created October 22, 2015 07:58
library(devtools)
library(magrittr)
install_github("dselivanov/text2vec")
# attach the freshly installed package so its functions and data are available
library(text2vec)
data("movie_review")
# this works fine:
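dtm <- create_dict_corpus(src = movie_review[['review']][1:100],
                          preprocess_fun = tolower,
                          tokenizer = regexp_tokenizer,
                          batch_size = 100,
                          progress = FALSE) %>%
  # the piped corpus is supplied as get_dtm()'s first argument;
  # `corp` is not defined yet at this point in the script
  get_dtm(type = "dgCMatrix")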
dim(dtm)
# but this crashes:
corp <- create_dict_corpus(src = movie_review[['review']][1:100],
                           preprocess_fun = tolower,
                           tokenizer = regexp_tokenizer,
                           batch_size = 100,
                           progress = FALSE)
str(corp)