@EnriqueCanals
Created October 8, 2012 04:10
require 'rubygems'
require 'open-uri'
require 'hpricot'

# Build the list of page-image URLs: p001.png through p080.png.
def url_list
  page_count = 80
  (1..page_count).map do |i|
    format('https://s3.amazonaws.com/edx-textbooks/guttag_computation/p%03d.png', i)
  end
end

# open-uri RDoc: http://stdlib.rubyonrails.org/libdoc/open-uri/rdoc/index.html
url_list.each do |url|
  response = ''
  content_type = nil
  # URI.open needs Ruby 2.5 or later; older versions call Kernel#open from open-uri.
  URI.open(url, "User-Agent" => "Ruby/#{RUBY_VERSION}",
           "From" => "email@addr.com",
           "Referer" => "https://s3.amazonaws.com/edx-textbooks/guttag_computation/p001.png") { |f|
    puts "Fetched document: #{f.base_uri}"
    puts "\t Content Type: #{f.content_type}\n"
    puts "\t Charset: #{f.charset}\n"
    puts "\t Content-Encoding: #{f.content_encoding}\n"
    puts "\t Last Modified: #{f.last_modified}\n\n"
    # Save the response body and its content type
    content_type = f.content_type
    response = f.read
  }
  # Rdoc: http://code.whytheluckystiff.net/hpricot/
  # Hpricot can only parse HTML; a PNG response is binary, so skip it.
  next unless content_type == 'text/html'
  doc = Hpricot(response)
  # Print the <img> elements found directly under <body>
  puts((doc/"/html/body/img").to_html)
end