@jnv
Created November 25, 2013 17:02
N-grams for a word cloud
# Based on https://gist.github.com/josefslerka/4148592
## Load libraries
library(tm)       # provides Corpus() and DirSource(), used below
library(textcat)
library(tau)
library(wordcloud)
## Build the corpus
# For texts in a Windows encoding, use encoding="cp1250"
# "cs" is the ISO 639-1 code for Czech
mujKorpus <- Corpus(DirSource("klaus", encoding="UTF-8"), readerControl = list(language = "cs"))
ngramy <- textcnt(mujKorpus, method = "string", n = 3) # Change n to set the number of words per n-gram
#sort(ngramy, decreasing=TRUE)[1:50]
## ! Convert the textcnt result to a data frame (the key step)
df <- data.frame(word = names(ngramy), freq=unclass(ngramy))
## Create the word cloud
pal2 <- brewer.pal(8,"Dark2")
png("wordcloud_ngram.png", width=1024,height=768)
wordcloud(df$word,df$freq, scale=c(10,.2),min.freq=3,
max.words=150, random.order=FALSE, rot.per=.15, colors=pal2)
dev.off()
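
## Illustration only (not part of the original script): a minimal base-R sketch
## of the idea behind textcnt(..., method = "string", n = 3), assuming it counts
## runs of n consecutive words. count_word_ngrams() is a hypothetical helper, and
## its tokenisation is simplified; tau's own splitting rules may differ in detail.
count_word_ngrams <- function(text, n = 3) {
  # Lower-case the text and split on anything that is not a letter
  words <- strsplit(tolower(text), "[^[:alpha:]]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(integer(0))
  # Paste each window of n consecutive words into one string
  starts <- seq_len(length(words) - n + 1)
  grams <- vapply(starts,
                  function(i) paste(words[i:(i + n - 1)], collapse = " "),
                  character(1))
  # Tabulate the windows and return a named integer vector, like ngramy above
  counts <- table(grams)
  setNames(as.integer(counts), names(counts))
}
example_counts <- count_word_ngrams("to be or not to be or not", n = 3)
# Same conversion as for ngramy: names become the words, values the frequencies
df_example <- data.frame(word = names(example_counts), freq = example_counts,
                         row.names = NULL)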