@railsfactory
Created September 15, 2009 17:52
require 'rubygems'
require 'eventmachine'
require 'em-http'
require 'pp'
require 'digest/md5'
# Pages to fetch concurrently.
urls = [
  'http://www.yahoo.com/',
  'http://www.railsfactory.com/',
  'http://www.railsbuddies.com/',
  'http://www.techcrunch.com/',
  'http://www.wired.com/'
]
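EventMachine.run {
  # Queue one GET per URL; the MultiRequest callback fires once all have finished.
  multi = EventMachine::MultiRequest.new
  urls.each { |url| multi.add(EventMachine::HttpRequest.new(url).get) }

  # One-shot timer that fires after 1 second; a daemon could queue more pages to fetch here.
  EM::Timer.new(1) do
    puts Time.now
  end

  multi.callback {
    # Make sure the output directory exists before writing into it.
    Dir.mkdir('./data') unless File.directory?('./data')

    multi.responses[:succeeded].each do |h|
      if h.response_header.status == 200
        pp h.response_header
        pp h.uri.to_s
        # pp h.response
        # puts h.methods.sort.join(",")

        # Name the file after the MD5 hex digest of the URL and write the body to it.
        filename = Digest::MD5.hexdigest(h.uri.to_s)
        File.open('./data/' + filename, 'w') { |f| f.write(h.response) }
      else
        # Open question: if this is a redirect, should it be followed?
        # For now the status and headers are only printed.
        pp h.response_header.status
        pp h.response_header
      end
    end

    multi.responses[:failed].each do |h|
      puts "#{h.inspect} failed"
    end

    # EventMachine.stop # left commented out so the reactor keeps running, like a daemon
  }
}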
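
The non-200 branch above asks whether a redirect should be followed. One possible answer, as a minimal sketch rather than anything this gist does: decide from the status code and the Location value whether there is a next URL to fetch, and cap the number of hops. The names `REDIRECT_STATUSES` and `redirect_target`, the list of status codes, and the default limit of 5 hops are assumptions made for illustration. The sketch uses only Ruby's standard `uri` library and leaves out how the Location header is read from the em-http response.

```ruby
require 'uri'

# Status codes treated as redirects in this sketch (an assumption, not taken from the gist).
REDIRECT_STATUSES = [301, 302, 303, 307, 308].freeze

# Hypothetical helper: returns the absolute URL to fetch next, or nil when the
# response is not a redirect, has no Location value, or the hop limit is reached.
def redirect_target(status, location, base_url, hops_so_far = 0, max_hops = 5)
  return nil unless REDIRECT_STATUSES.include?(status)
  return nil if location.nil? || location.strip.empty?
  return nil if hops_so_far >= max_hops

  # Location may be relative, so resolve it against the URL that was requested.
  URI.join(base_url, location.strip).to_s
rescue URI::Error
  nil # malformed Location value: do not follow
end

# Usage: a relative Location on a 301 resolves against the original URL;
# a 200 response yields nil because there is nothing to follow.
puts redirect_target(301, '/news', 'http://www.wired.com/').inspect
puts redirect_target(200, '/news', 'http://www.wired.com/').inspect
```

A non-nil result could be queued as a new request with the hop counter incremented; a nil result would fall through to the existing logging.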