Created May 16, 2017 15:06
Generate a term-document matrix with the **tm** package and convert it to a data frame.
library(tm)        # corpus and term-document matrix tools
library(magrittr)  # provides the %>% pipe
# Two sample documents, one string per document.
docs <- c("The Indians were taking on the Rays on Monday night, alright? The Indians won, 8-7, and Lonnie Chisenhall and Francisco Lindor both hit home runs, alright? Brad Miller got three hits while continuing to use a pink bat, alright?",
          "That's now three alrights, in three consecutive sentences. What does that get us? It gets us movie star/fanny pack enthusiast Matthew McConaughey wearing an alright hat. He was taking in the game while filming a movie in the area this month.")
# To view the corpus, uncomment the following line:
# docs %>% VectorSource %>% VCorpus %>% inspect
# Build the term-document matrix: terms are rows, documents are columns.
tdm <- docs %>% VectorSource %>% SimpleCorpus %>% TermDocumentMatrix
# To view the term-document matrix, uncomment the following line:
# tdm %>% inspect
# Transpose so that each row is a document and each column is a term,
# then convert the dense matrix to a data frame.
tdm.df <- tdm %>% as.matrix %>% t %>% as.data.frame
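
For readers who want to see the shape of the result without installing **tm**, here is a minimal base-R sketch of the same idea. It is an illustration only: the toy documents, the names `toy_docs`, `toy_tdm` and `toy_df`, and the naive whitespace tokenizer are all assumptions made for this example, and tm's own tokenization and filtering rules differ, so the counts it produces on real text will not match `tdm.df`.

```r
# Hypothetical toy documents (not taken from the gist above).
toy_docs <- c(d1 = "alright alright hat", d2 = "hat game")

# Naive tokenizer: lowercase, then split on whitespace.
# This is a simplification; tm applies its own tokenization rules.
tokens <- strsplit(tolower(toy_docs), "\\s+")

# Vocabulary: the sorted unique terms across all documents.
vocab <- sort(unique(unlist(tokens)))

# Count each vocabulary term per document.
# Result: one row per term, one column per document.
toy_tdm <- vapply(
  tokens,
  function(tok) tabulate(match(tok, vocab), nbins = length(vocab)),
  integer(length(vocab))
)
rownames(toy_tdm) <- vocab

# Transpose so that documents are rows and terms are columns,
# mirroring the final step of the gist.
toy_df <- as.data.frame(t(toy_tdm))
print(toy_df)
```

The transpose is the step to keep in mind: a term-document matrix stores terms as rows, while most modelling functions that take a data frame expect one row per observation, here one row per document.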