
@michael-erasmus
Last active July 27, 2017 13:13
This is a quick R script that generates a word cloud from a Slack team export.
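# These packages need to be installed first:
# install.packages(c('jsonlite', 'tm', 'wordcloud'))
library(jsonlite)
library(tm)
library(wordcloud)
# Run from the root of the unzipped export: read every .json file below it
# (one file per channel per day, plus top-level files such as users.json)
files <- list.files('.', pattern = '\\.json$', recursive = TRUE)
# lapply keeps one parsed object per file; sapply may try to simplify the results into a matrix
json <- lapply(files, fromJSON)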
# Keep ordinary messages only: rows with a subtype (channel joins and similar events) are dropped
texts <- lapply(json, function(f){if ('subtype' %in% names(f)) f$text[is.na(f$subtype)] else f$text})
flat <- unlist(texts)
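# Normalise the text before counting words
corpus <- Corpus(VectorSource(flat))
corpus <- tm_map(corpus, stripWhitespace)
# Plain functions such as tolower must be wrapped in content_transformer() on recent tm versions
corpus <- tm_map(corpus, content_transformer(tolower))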
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords('english'))
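# Remove leftover words Slack adds to message text (apparently user-ID fragments from
# @-mentions once digits are stripped). They are specific to this team, so replace them with
# your own; they must be lowercase because the corpus has already been lowercased.
corpus1 <- tm_map(corpus, removeWords, c('uzdp', 'uulqf', 'ukzs', 'ucnj', 'uujx'))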
wordcloud(corpus1, scale=c(2.5,0.5), max.words=1000, random.order=FALSE, rot.per=0.35, use.r.layout=FALSE, colors=brewer.pal(8, "Dark2"))
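The `texts` line keeps only ordinary messages. Below is a minimal base-R sketch of that rule, assuming each day file parses (via jsonlite's fromJSON) into a data frame with a `text` column and, sometimes, a `subtype` column. The data frame and the helper name `keep_messages` are hypothetical, invented for illustration; real exports carry more columns, such as `user` and `ts`.

```r
# Hypothetical day file, shaped like the data frame fromJSON() returns for
# one day of a channel; the values are invented for illustration.
day <- data.frame(
  type    = c("message", "message", "message"),
  subtype = c(NA, "channel_join", NA),
  text    = c("shipping the release today",
              "<@U0ZDP3> has joined the channel",
              "nice work"),
  stringsAsFactors = FALSE
)

# Same rule as the script: if the file has a subtype column, keep only rows
# where subtype is NA (ordinary messages); otherwise keep every text.
keep_messages <- function(f) {
  if ("subtype" %in% names(f)) f$text[is.na(f$subtype)] else f$text
}

msgs <- keep_messages(day)
print(msgs)  # [1] "shipping the release today" "nice work"
```

Files without a `text` column, such as users.json, give `NULL`, which `unlist()` silently drops when the per-file results are flattened.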
@mfornasa

A couple of bug fixes:

  • corpus1 <- tm_map(corpus1, removeWords, c('UZDP', 'UULQF', 'UKZS','UCNJ', 'UUJX','UULQF')) should be corpus1 <- tm_map(corpus, removeWords, c('UZDP', 'UULQF', 'UKZS','UCNJ', 'UUJX','UULQF')) (I think)
  • the tolower line needs to be corpus <- tm_map(corpus, content_transformer(tolower)) on the newest tm versions
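Because the script lowercases the text before removing punctuation and numbers, the Slack-specific words passed to removeWords have to be lowercase too. The sketch below approximates that order of operations in base R: `gsub` stands in for tm's removePunctuation and removeNumbers, and a word-boundary substitution stands in for removeWords. The user ID and the helper `remove_fragments` are hypothetical.

```r
# Invented mention; real Slack user IDs differ per team.
msg <- "thanks <@U0ZDP3> for the fix"

# Approximate the script's order of operations with base R:
step <- tolower(msg)                    # like content_transformer(tolower)
step <- gsub("[[:punct:]]+", "", step)  # stand-in for removePunctuation
step <- gsub("[[:digit:]]+", "", step)  # stand-in for removeNumbers
print(step)                             # [1] "thanks uzdp for the fix"

# Case-sensitive whole-word removal, similar in spirit to tm's removeWords.
remove_fragments <- function(x, words) {
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

upper <- remove_fragments(step, "UZDP")  # no match: text is already lowercase
lower <- remove_fragments(step, "uzdp")  # removes the fragment
```

The uppercase pattern leaves `uzdp` in place, which is why lowercase entries in the removal list are the safer choice.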
