@uhlenbrock
Created February 8, 2010 22:30
Retrieve archival tweets as JSON
# TODO: parameterize query
# TODO: paginate google results
require 'rubygems'
require 'mechanize'
require 'json'

tweets = []

a = Mechanize.new do |agent|
  agent.user_agent_alias = 'Mac Safari'
end

a.get('http://google.com/') do |page|
  # Submit the search form. The daterange: operator takes Julian day numbers
  # (2454832.5 to 2454863.5 is 2009-01-01 to 2009-02-01, 00:00 UTC).
  search_result = page.form_with(:name => 'f') do |search|
    search.q = 'daterange:2454832.50000-2454863.50000 site:twitter.com superbowl'
  end.submit

  search_result.links.each do |link|
    # Only follow links to individual tweets.
    next unless link.href =~ %r{/statuses/}

    status_page = a.click(link)
    status = status_page.at('.entry-content')
    author = status_page.at('.tweet-url.screen-name')
    # Skip pages that lack the expected markup instead of raising on nil.
    next unless status && author

    tweets << {
      :status => status.content,
      :author => author.content,
      :link   => link.href
    }
  end

  puts tweets.to_json
end
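
One way to approach the first TODO is to build the search string from ordinary dates instead of hard-coding Julian day numbers. The sketch below is not part of the original gist: `julian_day` and `build_query` are hypothetical helper names, and it assumes the `daterange:` bounds are astronomical Julian days at 00:00 UTC, which is what the hard-coded values in the script correspond to.

```ruby
require 'date'

# Hypothetical helper (not in the original gist): format a Date as its
# astronomical Julian day at 00:00 UTC, with five decimals as in the script.
def julian_day(date)
  format('%.5f', date.ajd.to_f)
end

# Hypothetical helper: assemble the search string from a date range,
# search terms and a site restriction (defaulting to twitter.com).
def build_query(from, to, terms, site = 'twitter.com')
  "daterange:#{julian_day(from)}-#{julian_day(to)} site:#{site} #{terms}"
end

puts build_query(Date.new(2009, 1, 1), Date.new(2009, 2, 1), 'superbowl')
# prints "daterange:2454832.50000-2454863.50000 site:twitter.com superbowl"
```

The result could then be assigned to `search.q` in place of the literal string, leaving the rest of the script unchanged.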