Forked from sckott/beswordles.r
Last active December 10, 2015 13:58
#N.B. On *ubuntu RCurl may not install for you off the bat. If so read: & sudo apt-get install libcurl4-openssl-dev
library(twitteR); library(wordcloud); library(tm); library(stringr);
# Search for #mooc tweets
mooctweets <- searchTwitter("#mooc", n=2000)
length(mooctweets) # ends up with 713 as of 03-Jan-13 at 15:42 London time
# make into a data.frame
mooctweets_df <- twListToDF(mooctweets)
# Words used
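# Strip quotes, @mentions, RT/MT markers, punctuation followed by whitespace, short URLs and #hashtags
cleaned <- sapply(mooctweets_df$text, function(x) str_trim(gsub("\"|@[A-Za-z0-9_]+|(RT)|(MT)|[!:;]\\s+|http[s]?://[A-Za-z0-9]+\\.?[A-Za-z0-9]+/[A-Za-z0-9]+\\.?[A-Za-z0-9]+|#[A-Za-z0-9]+", "", x), "both"), USE.NAMES=FALSE)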
cleaned_coll <- paste(cleaned, collapse=" ")
corpus <- Corpus(VectorSource(cleaned_coll))
# Pass removeWords directly so tm dispatches on the document class
moocCorpus <- tm_map(corpus, removeWords, stopwords())
mooc_ <- TermDocumentMatrix(moocCorpus)
ap.m <- as.matrix(mooc_)
ap.v <- sort(rowSums(ap.m), decreasing=TRUE)
ap.d <- data.frame(word = names(ap.v),freq=ap.v)
pal2 <- brewer.pal(8,"Dark2")
png("~/mooctweets.png", width=800, height=600)
wordcloud(ap.d$word,ap.d$freq, scale=c(8,.2),min.freq=2, max.words=100,
          random.order=FALSE, rot.per=.15, colors=pal2)
dev.off() # close the png device so the file is actually written
# Users
userscorpus <- Corpus(VectorSource(mooctweets_df$screenName))
userscorpus_ <- tm_map(userscorpus, removeWords, stopwords())
mooc_ <- TermDocumentMatrix(userscorpus_)
ap.m <- as.matrix(mooc_)
ap.v <- sort(rowSums(ap.m), decreasing=TRUE)
ap.d <- data.frame(word = names(ap.v),freq=ap.v)
pal2 <- brewer.pal(8,"Dark2")
png("~/mooctweeters.png", width=800,height=600)
wordcloud(ap.d$word,ap.d$freq, scale=c(8,.2),min.freq=2, max.words=100,
          random.order=FALSE, rot.per=.15, colors=pal2)
dev.off() # close the png device so the file is actually written
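
The cleaning step is the easiest part of the script to get subtly wrong, so it can help to try the idea on a couple of strings before running it on real tweets. The sketch below is not the script's exact pattern: it is a simplified variant that assumes PCRE (`perl = TRUE`), matches RT/MT only as whole words, and removes any URL up to the next whitespace. The sample tweets and the names `tweet_noise` and `clean_tweets` are made up for illustration.

```r
# Hypothetical sample tweets, invented for this illustration only
sample_tweets <- c(
  "RT @some_user: Loving this course! #mooc http://t.co/abc123",
  "Is the \"MOOC\" hype justified? #mooc #edtech"
)

# Simplified variant of the script's pattern (an assumption, not the original):
# quotes | @mentions | whole-word RT/MT | [!:;] plus whitespace | URLs | #hashtags
tweet_noise <- "\"|@[A-Za-z0-9_]+|\\b(RT|MT)\\b|[!:;]\\s+|https?://\\S+|#[A-Za-z0-9_]+"

# gsub() is vectorised, so no sapply() is needed; trimws() is base R (>= 3.2)
clean_tweets <- function(x) trimws(gsub(tweet_noise, "", x, perl = TRUE))

cleaned_sample <- clean_tweets(sample_tweets)
print(cleaned_sample)
# [1] "Loving this course"          "Is the MOOC hype justified?"
```

Matching RT and MT as whole words avoids deleting those letters from inside longer upper-case words, which the bare `(RT)|(MT)` alternation would do.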

If you get "Error in object[[i]] : object of type 'closure' is not subsettable", you have requested too many tweets.

1500 is the maximum unless you are specially authenticated by Twitter.
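
One way to avoid the error is to clamp the requested count before calling `searchTwitter`. This is only a sketch: the 1500 ceiling is taken from this comment rather than from Twitter's documentation, and `cap_tweet_count` is a hypothetical helper, not part of twitteR.

```r
# Assumed ceiling, taken from the comment above (not verified against Twitter's docs)
MAX_UNAUTHENTICATED_TWEETS <- 1500

# Hypothetical helper: clamp a requested tweet count to the assumed ceiling
cap_tweet_count <- function(n_requested, limit = MAX_UNAUTHENTICATED_TWEETS) {
  stopifnot(is.numeric(n_requested), length(n_requested) == 1, n_requested > 0)
  as.integer(min(n_requested, limit))
}

n_safe <- cap_tweet_count(2000)
print(n_safe)  # 1500

# Usage with the script (needs twitteR and valid credentials, so left commented out):
# mooctweets <- searchTwitter("#mooc", n = n_safe)
```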