@elecnix
Forked from danielharan/scrape_conservative_ca
Created October 3, 2008 18:09
require 'rubygems'
require 'mechanize'

# One postal code per line, e.g. "A1A 1A1".
postal_codes = File.read("postal_codes.txt").split("\n")
# randomize to make the pattern slightly harder to see in logs
postal_codes = postal_codes.shuffle

# Newer mechanize releases expose the class as Mechanize rather than WWW::Mechanize.
@agent = Mechanize.new do |a|
  a.user_agent_alias = 'Mac Safari'
  a.max_history = 1
end
# Fetch the lookup page once; its form is reused for every postal code.
@page = @agent.get("http://www.conservative.ca/EN/1051")
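# Submit one postal code through the lookup form and save the result page
# to pages/<postal code without the space>.
def scrape(postcode)
  # The site strips the space in JavaScript: "A1A1A1" is OK, "A1A 1A1" is NOT.
  # Mechanize does not run JavaScript, so strip it here before submitting.
  pc = postcode.sub(' ', '')
  lookup = @page.forms.first
  lookup.field_with(:name => "postal_code").value = pc
  search_results = @agent.submit(lookup)
  puts postcode
  # write file
  File.open("pages/#{pc}", "w") do |f|
    f.puts search_results.body
  end
end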
postal_codes.each do |postcode|
  pc = postcode.sub(' ', '') # saved files are named without the space, e.g. "A1A1A1"
  next if File.exist?("pages/#{pc}") # already scraped on an earlier run
  scrape(postcode)
  sleep(1 + (rand(4_000) / 1_000.0)) # pause 1-5 seconds
end
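
The script removes only the first space from each postal code and assumes every line of postal_codes.txt is well formed. A minimal sketch of a stricter approach follows. The names `POSTAL_CODE_PATTERN`, `normalize_postal_code`, and `jittered_delay` are hypothetical helpers, not part of the original gist, and the pattern checks only the letter-digit shape, not whether the code actually exists.

```ruby
# Hypothetical helpers; not part of the original script.

# Shape of a Canadian postal code once whitespace is removed:
# letter, digit, letter, digit, letter, digit.
POSTAL_CODE_PATTERN = /\A[A-Z]\d[A-Z]\d[A-Z]\d\z/

# Remove all whitespace and upcase the input. Returns nil when the
# result does not have the postal code shape.
def normalize_postal_code(raw)
  code = raw.to_s.gsub(/\s+/, '').upcase
  code =~ POSTAL_CODE_PATTERN ? code : nil
end

# Random pause length in seconds, at least min and below max.
def jittered_delay(min = 1.0, max = 5.0)
  min + rand * (max - min)
end
```

With these in place, the main loop could skip any line for which `normalize_postal_code` returns nil and call `sleep(jittered_delay)` between requests.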