@danielharan
Created October 2, 2008 01:24
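require 'rubygems'
require 'mechanize'
require 'fileutils'

# make sure the output directory exists before any page is written
FileUtils.mkdir_p("pages")
# one postal code per line
postal_codes = File.readlines("postal_codes.txt", chomp: true)
# randomize to make the pattern slightly harder to see in logs
postal_codes = postal_codes.shuffle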
# current Mechanize releases expose the class as Mechanize, without the WWW:: namespace
@agent = Mechanize.new do |a|
  a.user_agent_alias = 'Mac Safari'
  a.max_history = 1
end
@page = @agent.get("http://www.conservative.ca/EN/1051")
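# submit the lookup form for one postal code and save the raw response body
def scrape(postcode)
  lookup = @page.forms.first
  lookup.field_with(name: "postal_code").value = postcode
  search_results = @agent.submit(lookup)
  # write file, named after the postal code with its space removed
  File.open("pages/#{postcode.sub(' ', '')}", "w") do |f|
    f.puts search_results.body
  end
end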
postal_codes.each do |postcode|
  # skip codes that already have a saved page, so an interrupted run can resume
  next if File.exist?("pages/#{postcode.sub(' ', '')}")
  scrape(postcode)
  sleep(1 + (rand(4_000) / 1_000.0)) # pause 1-5 seconds
end
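
The skip-if-already-saved check and the randomized pause in the loop above can be tried without touching the network. The following is only a sketch: `page_path`, `pending_codes`, and `polite_delay` are hypothetical helper names that do not appear in the original script, and the sample postal codes are made up for illustration.

```ruby
require 'tmpdir'

# Hypothetical helper: maps "K1A 0A6" to "<dir>/K1A0A6", the same naming the script uses.
def page_path(postcode, dir = "pages")
  File.join(dir, postcode.sub(' ', ''))
end

# Hypothetical helper: the postal codes that have no saved page yet.
def pending_codes(postal_codes, dir = "pages")
  postal_codes.reject { |code| File.exist?(page_path(code, dir)) }
end

# Hypothetical helper: a delay of at least 1 and just under 5 seconds.
def polite_delay
  1 + (rand(4_000) / 1_000.0)
end

# Simulate one page saved by an earlier run, then list what is left to fetch.
Dir.mktmpdir do |dir|
  File.write(page_path("K1A 0A6", dir), "<html></html>")
  p pending_codes(["K1A 0A6", "H2X 1Y4"], dir) # prints ["H2X 1Y4"]
end
```

Computing the pending list up front, rather than checking inside the loop, also makes it easy to report how many lookups remain before starting a run.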