@leobarone
Last active May 13, 2016 13:45
Inauguration Speech of the Interim President
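library(XML)        # htmlParse(), xpathSApply()
library(tm)         # corpus construction and text cleaning
library(SnowballC)  # stemming support for tm (not used below)
library(wordcloud)  # wordcloud()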
# Address of the speech on the Planalto website
url <- "http://www2.planalto.gov.br/acompanhe-o-planalto/discursos/discurso-do-presidente-da-republica-michel-temer-durante-cerimonia-de-posse-dos-novos-ministros-de-estado-palacio-do-planalto"
# Download the page and parse the HTML into a tree
pagina <- xmlRoot(htmlParse(readLines(url)))
# Extract the text of the <div> that holds the speech
texto <- xpathSApply(pagina, "//div[@id = 'parent-fieldname-text']", xmlValue)
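# Create the folder in the home directory, the same path used for writing and reading
dir.create("~/posse_interino", showWarnings = FALSE)
writeLines(texto, "~/posse_interino/posse_interino.txt")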
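# Build a corpus from every file in the folder (here, the single speech)
ponteCorpus <- VCorpus(DirSource("~/posse_interino"), readerControl = list(language = "por"))
inspect(ponteCorpus)
# Lowercase first so the stopword list matches, then remove stopwords,
# punctuation and numbers; collapse the leftover whitespace last
ponteCorpus <- tm_map(ponteCorpus, content_transformer(tolower))
ponteCorpus <- tm_map(ponteCorpus, removeWords, stopwords("portuguese"))
ponteCorpus <- tm_map(ponteCorpus, removePunctuation)
ponteCorpus <- tm_map(ponteCorpus, removeNumbers)
ponteCorpus <- tm_map(ponteCorpus, stripWhitespace)
# Print the cleaned text to check the result
as.character(ponteCorpus[[1]])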
# Plot up to 100 words, placed in decreasing order of frequency
wordcloud(ponteCorpus, max.words = 100, random.order = FALSE)