@arjunvenkat
Created December 12, 2012 17:07
Build a scraper using Nokogiri
# open-uri fetches the page over HTTP; Nokogiri parses the returned HTML
require 'nokogiri'
require 'open-uri'
# Putting it all together
# ===============================================
# initialize a url and feed into Nokogiri
url = "http://www.rottentomatoes.com/m/lincoln_2011/"
# URI.open is provided by open-uri (Kernel#open does not fetch URLs on Ruby 3.0+)
doc = Nokogiri::HTML(URI.open(url))
# Creates an array that contains all the rotten/fresh
# reviews on the page along with their respective critics
# drills down to the lowest level that still contains all necessary data
critics = doc.css('div#reviews div.quote_bubble')
reviews_array = []
critics.each do |critic|
  # drills down to pull out a critic's name
  name = critic.css('div.media_block_content div.bold')
  name = name.text.strip
  review = [name] # saves the name into an array called review
  # drills down to the element that may contain a fresh class
  fresh = critic.css('div.quote_contents div.fresh')
  # checks if a fresh class exists
  if fresh.empty?
    review << "rotten"
  else
    review << "fresh"
  end
  reviews_array << review
end
puts reviews_array.inspect
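
Once the scraper has produced its [name, verdict] pairs, one possible next step is to summarize them. The sketch below is not part of the original gist: the sample data is made up, and `verdict_counts` and `fresh_percentage` are hypothetical helper names. It uses plain Ruby and assumes the input has the same shape as `reviews_array` above.

```ruby
# Hypothetical sample data in the same [name, verdict] shape that
# reviews_array has above; these critics are made up for illustration.
sample_reviews = [
  ["Critic A", "fresh"],
  ["Critic B", "rotten"],
  ["Critic C", "fresh"]
]

# Counts how many reviews fall under each verdict.
def verdict_counts(reviews)
  counts = Hash.new(0)
  reviews.each { |_name, verdict| counts[verdict] += 1 }
  counts
end

# Share of reviews marked "fresh", rounded to a whole-number percentage.
# Returns 0 for an empty list to avoid dividing by zero.
def fresh_percentage(reviews)
  return 0 if reviews.empty?
  (verdict_counts(reviews)["fresh"] * 100.0 / reviews.size).round
end

puts verdict_counts(sample_reviews).inspect
puts "#{fresh_percentage(sample_reviews)}% fresh"
```

To use this with the scraper, pass `reviews_array` in place of `sample_reviews`. The result only covers the reviews that the CSS selectors matched on the fetched page.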