@AmeliaMN
Created October 22, 2012 18:15
R plotting example: Box office performance and tweets
# Data came in a series of directories named for major movies,
# each containing a variable number of files of corresponding tweets.
# Function that determines the number of tweets for a movie, given the name of the directory where the files reside.
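numtweets=function(moviedir){
  files=list.files(moviedir, full.names=TRUE)
  howmany=0
  # seq_along() skips the loop entirely when the directory is empty;
  # 1:length(files) would iterate over 1 and 0 in that case.
  for (i in seq_along(files)){
    # Each line read is counted as one tweet.
    appending=readLines(files[i])
    howmany=howmany+length(appending)
  }
  return(howmany)
}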
# This was a class project, so I entered each movie by hand, as in this example:
x=numeric(0)  # tweet counts, one entry per movie; must exist before x[1] is assigned
Saw3D="/data/movies/Saw_3D/"
x[1]=numtweets(Saw3D)
# For a larger project, it would have made sense to do this automatically.
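# A minimal sketch of what the automatic version might look like. It assumes
# every movie has its own subdirectory directly under a single parent directory.
# The function name alltweets and its argument movieroot are hypothetical,
# introduced here only for illustration.
alltweets=function(movieroot){
  # One entry per movie directory (immediate subdirectories only)
  moviedirs=list.dirs(movieroot, full.names=TRUE, recursive=FALSE)
  # Count the lines in every file of one directory
  countdir=function(moviedir){
    files=list.files(moviedir, full.names=TRUE)
    as.numeric(sum(vapply(files, function(f) length(readLines(f)), integer(1))))
  }
  counts=vapply(moviedirs, countdir, numeric(1))
  # Label each count with its directory name, e.g. "Saw_3D"
  names(counts)=basename(moviedirs)
  counts
}
# Possible usage, assuming the layout above:
#   x=alltweets("/data/movies")
# The counts come back in directory-listing order, so they would need to be
# matched to the rows of the box office data before plotting.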
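# Data modification for plotting
# data1 is assumed to be a data frame, loaded beforehand, with the columns
# Name and Opening.Box.Office.Performance.
# Strip everything except digits and the decimal point, then convert to numbers.
money=as.numeric(gsub('[^0-9.]', '', data1$Opening.Box.Office.Performance))
names=as.character(data1$Name)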
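# A small illustration of the cleaning step above, using a made-up value that
# is not taken from the data. The pattern drops every character that is not a
# digit or a decimal point, so currency symbols and thousands separators vanish.
example_raw="$24,230,123"  # hypothetical input string
example_clean=as.numeric(gsub('[^0-9.]', '', example_raw))
# example_clean now holds the number 24230123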
# Plotting
library(calibrate)  # provides textxy() for labelling the points
plot(money, x, xlim=c(5000000, max(money)+5000000),
     main="Correspondence between tweet counts and box office performance",
     xlab="Box office performance, in dollars", ylab="Number of tweets")
textxy(money, x, names)