@arademaker
Created August 13, 2011 12:30
Wordcloud in R
library(RColorBrewer)
library(wordcloud)
pal2 <- brewer.pal(8,"Set2")
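# Read the source text and keep only lines 45 to 3192
# (presumably the body of the book, without surrounding front and back matter)
lines <- readLines("pg32519.txt")
lines <- lines[45:3192]
# Join everything into one string, lowercase it, and split on whitespace and punctuation
text <- paste(lines, collapse = " ")
words <- unlist(strsplit(tolower(text), "\\s+|,|\\(|\\)|:|\\.|!|\\?|;"))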
stopwords <- c("para", "from","que","não","por","mais","com","seus",
"seu","uma","sua", "pelos","assim","estava","então",
"contra","muitos","agora","entre",
"tanto","tambem","havia","haviam","quase","sempre","colombo")
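# Drop the stopwords. Negative indexing with an empty position vector
# (words[-pos] when nothing matched) would return an empty vector, so filter with %in%.
words <- words[!(words %in% stopwords)]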
words <- words[nchar(words) > 3]
twords <- table(words)
# To check whether more stopwords are needed
# tmp <- as.data.frame(twords)
# head(tmp[order(tmp$Freq, decreasing = TRUE), ], 20)
# Keep only the words that occur more than 2 times
twords <- twords[twords > 2]
# the plot
png("cloud.png", width=580, height=580)
wordcloud(names(twords), twords, scale = c(9, .1), min.freq = 2, max.words = Inf,
          random.order = FALSE, rot.per = .3, colors = pal2)
dev.off()
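
# A minimal sketch of the same tokenize / filter / count steps, run on a short
# inline sample so it can be tried without pg32519.txt.
# Assumptions: the sample sentence and the names sample_text, sample_stop,
# sample_words and sample_freq are illustrative only and not part of the
# original script; the stopword filter here uses %in%.
sample_text <- "O navio partiu. O navio voltou, e o almirante ficou com o navio!"
sample_stop <- c("com")
# Same split pattern as above: whitespace and common punctuation
sample_words <- unlist(strsplit(tolower(sample_text), "\\s+|,|\\(|\\)|:|\\.|!|\\?|;"))
# Remove stopwords, then short tokens (this also drops the empty strings
# produced where punctuation is followed by a space)
sample_words <- sample_words[!(sample_words %in% sample_stop)]
sample_words <- sample_words[nchar(sample_words) > 3]
# Frequency table, most frequent word first
sample_freq <- sort(table(sample_words), decreasing = TRUE)
print(sample_freq)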