@dpick
Created October 20, 2010 04:54
Scraping IMDB user reviews
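require 'rubygems'
require 'open-uri'
require 'hpricot'

# IMDb title ID of the movie to scrape (must be a quoted string).
movie = "tt0379786"

# Fetch six pages of user comments; the start offset advances by 10 per page.
0.upto(5) do |page|
  url = "http://www.imdb.com/title/#{movie}/usercomments?start=#{page * 10}"
  # URI.open needs Ruby 2.5+; on older Rubies use plain `open`.
  response = URI.open(url) { |f| f.read }
  doc = Hpricot(response)

  doc.search("/html/body//p").each do |content|
    html = content.inner_html
    # Skip paragraphs containing links or bold text (presumably not review bodies).
    next if html.include?("<a") || html.include?("<b>")

    puts "-" * 50
    puts html
  end
end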
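
The two decisions in the script above (how the page URL is built and which paragraphs count as review text) can be pulled into small helpers and checked without touching the network. This is only a sketch: `review_url` and `review_paragraph?` are hypothetical names that are not part of the original gist, and the `per_page` default of 10 is assumed from the `page * 10` offset.

```ruby
# Hypothetical helpers, not part of the original gist.

# Build the user-comments URL for a zero-based page number.
# per_page = 10 is an assumption taken from the script's `page * 10`.
def review_url(movie, page, per_page = 10)
  "http://www.imdb.com/title/#{movie}/usercomments?start=#{page * per_page}"
end

# True when a paragraph's inner HTML contains neither a link nor bold
# markup, which is the script's heuristic for "this is review text".
def review_paragraph?(html)
  !html.include?("<a") && !html.include?("<b>")
end

puts review_url("tt0379786", 2)
puts review_paragraph?("Great film, loved it.")
puts review_paragraph?('<a href="/user/1">someone</a>')
```

The heuristic is a plain substring match, so a review that itself contains a link or bold text would be skipped as well.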