@toddleo
Created May 16, 2017 15:06
Generate a term-document matrix with the **tm** package and convert it to a data frame with one row per document.
library(tm)        # text-mining framework; SimpleCorpus requires tm >= 0.7
library(magrittr)  # provides the %>% pipe
docs <- c("The Indians were taking on the Rays on Monday night, alright? The Indians won, 8-7, and Lonnie Chisenhall and Francisco Lindor both hit home runs, alright? Brad Miller got three hits while continuing to use a pink bat, alright?",
          "That's now three alrights, in three consecutive sentences. What does that get us? It gets us movie star/fanny pack enthusiast Matthew McConaughey wearing an alright hat. He was taking in the game while filming a movie in the area this month.")
# To view the corpus, uncomment the following line:
# docs %>% VectorSource %>% VCorpus %>% inspect
# Wrap the character vector in a corpus, then count each term per document
tdm <- docs %>% VectorSource %>% SimpleCorpus %>% TermDocumentMatrix
# To view the term-document matrix, uncomment the following line:
# tdm %>% inspect
# tdm has terms as rows and documents as columns; t() flips it so that
# each row of tdm.df is a document and each column holds one term's counts
tdm.df <- tdm %>% as.matrix %>% t %>% as.data.frame
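A possible variation, sketched under the assumption that a document-by-term data frame with some light preprocessing is what you ultimately want: tm's `DocumentTermMatrix` already puts documents in rows, so the `t()` step is not needed, and the `control` list accepts `termFreq` options such as `tolower`, `removePunctuation` and `stopwords`. The sketch uses `VCorpus` rather than `SimpleCorpus` because `VCorpus` honors the full set of `termFreq` control options; the two shortened sample sentences are for illustration only.

```r
library(tm)
library(magrittr)

# Two shortened sample documents (illustration only)
docs <- c("The Indians were taking on the Rays on Monday night, alright?",
          "That's now three alrights, in three consecutive sentences.")

# VCorpus is used here because it supports every termFreq control option
corpus <- docs %>% VectorSource %>% VCorpus

# DocumentTermMatrix yields documents as rows and terms as columns,
# so no t() is needed before converting to a data frame.
# The control list lowercases, strips punctuation and drops English stopwords.
dtm <- DocumentTermMatrix(corpus, control = list(
  tolower = TRUE,
  removePunctuation = TRUE,
  stopwords = TRUE
))

dtm.df <- dtm %>% as.matrix %>% as.data.frame

# Inspect the counts for a couple of terms
dtm.df[, c("indians", "three")]
```

Which options to enable depends on the analysis: stopword removal and punctuation stripping shrink the vocabulary considerably, but they also discard tokens such as "alright?" in their original form.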