@s13731105
Last active August 29, 2015 14:04
Load Data
library(tm)
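setwd('C:/test/001')
# Set R's working directory to the folder that holds the corpus directory
# Read every file in the "001" subdirectory of the working directory as English text
a <- Corpus(DirSource("001"), readerControl = list(language = "en"))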
summary(a)
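# Lowercase first so the stopword match below is case-insensitive
a <- tm_map(a, content_transformer(tolower))
# Remove stopwords before punctuation so contractions such as "don't" still match the list
a <- tm_map(a, removeWords, stopwords("english"))
# The stopword file is at C:\Users\[username]\Documents\R\win-library\2.13\tm\stopwords
a <- tm_map(a, removePunctuation)
a <- tm_map(a, removeNumbers)
# stemDocument needs the SnowballC package to be installed
a <- tm_map(a, stemDocument, language = "english")
# Collapse the extra whitespace left behind by the removals above
a <- tm_map(a, stripWhitespace)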
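# Build the document-term matrix (documents as rows, terms as columns)
adtm <- DocumentTermMatrix(a)
# Drop terms that are absent from 95% or more of the documents
adtm <- removeSparseTerms(adtm, 0.95)
inspect(adtm)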
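# A possible next step, sketched under the assumption that adtm still holds
# at least one term after removeSparseTerms(); the threshold 5 and the name
# freq are illustrative choices, not part of the original script.
# List the terms that occur at least 5 times across the corpus
findFreqTerms(adtm, lowfreq = 5)
# Total count per term, most frequent first
freq <- sort(colSums(as.matrix(adtm)), decreasing = TRUE)
head(freq, 10)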