@leobarone
Created May 13, 2016 13:09
Word cloud for "Ponte para o Futuro"
library(tm)
library(SnowballC)
library(wordcloud)
getwd()
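# Extract the text of a PDF as a character vector (one element per line).
# "-layout" is a pdftotext option, so this assumes the xpdf engine is installed.
pdfToText <- function(arquivo) {
  leitor <- readPDF(engine = "xpdf", control = list(text = "-layout"))
  texto <- leitor(elem = list(uri = arquivo), language = "pt", id = "id1")
  as.character(texto)
}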
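# Download in binary mode so the PDF is not corrupted on Windows
download.file("http://pmdb.org.br/wp-content/uploads/2015/10/RELEASE-TEMER_A4-28.10.15-Online.pdf", "~/ponte.pdf", mode = "wb")
# Use the same home-directory paths throughout, independent of the working directory
texto <- pdfToText(path.expand("~/ponte.pdf"))
# Keep the text in its own directory so DirSource reads only this file
dir.create("~/ponte", showWarnings = FALSE)
writeLines(texto, "~/ponte/ponte.txt")
file.remove("~/ponte.pdf")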
ponteCorpus <- VCorpus(DirSource("~/ponte"), readerControl = list(language = "por"))
inspect(ponteCorpus)
# Lowercase and strip punctuation and numbers first, so stopwords match whole words
ponteCorpus <- tm_map(ponteCorpus, content_transformer(tolower))
ponteCorpus <- tm_map(ponteCorpus, removePunctuation)
ponteCorpus <- tm_map(ponteCorpus, removeNumbers)
ponteCorpus <- tm_map(ponteCorpus, removeWords, stopwords("portuguese"))
# Collapse the whitespace left behind by the removals above
ponteCorpus <- tm_map(ponteCorpus, stripWhitespace)
as.character(ponteCorpus[[1]])
wordcloud(ponteCorpus, max.words = 100, random.order = FALSE)