How to make a simple wordcloud with R
library(dplyr)
library(tidytext)
library(janeaustenr)
library(wordcloud)  # needed for the wordcloud() call at the end
# Using dplyr and janeaustenr, get the contents of 'Sense & Sensibility'
sns <- austen_books() %>%
  filter(book == 'Sense & Sensibility')
head(sns)
# tidytext's unnest_tokens() splits text into tokens (one word per row by default)
# Here we create a new dataframe with a 'word' column derived from the 'text' column of sns
words <- sns %>%
  unnest_tokens(word, text)
head(words)
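To see what unnest_tokens() does in isolation, here is a minimal sketch on a toy one-row data frame (the sentence is invented for illustration). By default it produces one row per word, lowercases each token, and strips punctuation:

```r
library(tibble)
library(dplyr)
library(tidytext)

# Toy data frame standing in for a book's 'text' column
toy <- tibble(text = "It is a Truth universally acknowledged,")

# One row per word; lowercased, punctuation removed
toy %>% unnest_tokens(word, text)
```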
# We can filter out common words (aka "stop words") using a dataframe from tidytext
words <- words %>%
  filter(!(word %in% stop_words$word))
head(words)
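An equivalent, idiomatic alternative to the filter() above is dplyr's anti_join(), which drops every row whose word appears in the stop_words dataframe. A sketch on a toy tokenized column (the words are invented for illustration):

```r
library(tibble)
library(dplyr)
library(tidytext)

# Toy tokenized data frame; 'it', 'is', and 'a' are stop words
words_df <- tibble(word = c("truth", "universally", "acknowledged", "it", "is", "a"))

# Same effect as filter(!(word %in% stop_words$word))
words_df %>% anti_join(stop_words, by = "word")
```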
# Count how many times each word appears, using dplyr
wordFreq <- words %>%
  group_by(word) %>%
  summarize(count = n())
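The group_by()/summarize() pair above can also be collapsed into dplyr's count() helper, which does the same thing in one call. A sketch on a toy word column (the words are invented for illustration):

```r
library(tibble)
library(dplyr)

words_df <- tibble(word = c("elinor", "marianne", "elinor"))

# count(word, name = "count") is shorthand for
# group_by(word) %>% summarize(count = n())
words_df %>% count(word, name = "count")
```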
# Create the wordcloud (here, we only show the top 100 words)
wordcloud(wordFreq$word, wordFreq$count, max.words=100)
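Two extra wordcloud() arguments are worth knowing: random.order = FALSE places the most frequent words in the centre, and colors accepts a palette such as one from RColorBrewer. Word placement is randomized, so set a seed for a reproducible layout. A sketch on a toy frequency table (the words and counts are invented for illustration):

```r
library(wordcloud)
library(RColorBrewer)

# Toy frequency table standing in for wordFreq
toy_freq <- data.frame(word  = c("elinor", "marianne", "sense", "sensibility"),
                       count = c(30, 25, 15, 10))

set.seed(42)  # fix the random word placement
wordcloud(toy_freq$word, toy_freq$count,
          max.words = 100,
          random.order = FALSE,
          colors = brewer.pal(8, "Dark2"))
```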