@tjmuehleman
Last active February 25, 2016 14:52
require 'net/http'
require 'rubygems'
require 'json'
require 'nokogiri'
require 'open-uri'
require 'active_record'
require 'pg'
# this is an awesome computer program
# it crawls the GA State House website, pulling all of the member emails into a database
# it was written at night under the influence by a semi-competent programmer. no code reviews required
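The script hands every address to `Receiver.save_receiver`, but the gist never defines `Receiver` and never opens an ActiveRecord connection, so it cannot run on its own. For a dry run without Postgres, a hypothetical in-memory stand-in with the same call shape might look like this. The module body, the lowercasing, and the `all` reader are assumptions, not the author's model.

```ruby
require 'set'

# Hypothetical stand-in for the Receiver model the script calls but never defines.
# Addresses live only in memory; nothing is written to a database.
module Receiver
  @emails = Set.new

  # Same call shape as the script: Receiver.save_receiver(email).
  # Strips and lowercases the address (an assumption), then stores it once.
  # Returns true for a new address, false for blanks and duplicates.
  def self.save_receiver(email)
    normalized = email.to_s.strip.downcase
    return false if normalized.empty?

    @emails.add?(normalized) ? true : false
  end

  # Every address saved so far, in insertion order.
  def self.all
    @emails.to_a
  end
end
```

Loading this instead of a real model lets the crawler run end to end and leaves the collected addresses in `Receiver.all`.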
def get_reps
  # this page lists all of the GA state reps
  url = "http://www.house.ga.gov/Representatives/en-US/HouseMembersList.aspx"
  # URI.open comes from open-uri; plain Kernel#open no longer accepts URLs on Ruby 3.0+
  doc = Nokogiri::HTML(URI.open(url))
  # find the div that holds the member list
  node = doc.xpath("//div[@style='font-size:13px;']")
  # walk every <a> tag inside that div (".//a" is relative to the div; "//a" would search the whole page)
  node.xpath(".//a").each do |a|
    url_frag = a["href"].to_s
    # does the <a> tag go to a member.aspx page? if so, this is a rep's page
    if url_frag.include? "member.aspx"
      # clean up the url for the rep's page a bit
      rep_url = fix_rep_url(url_frag)
      # go to the rep page
      get_rep(rep_url)
    end
  end
end
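The filtering step in `get_reps` can be pulled out into a pure function that is easy to test without hitting the network. The sketch below is hypothetical (`member_hrefs` is not part of the script); it also drops repeated links, on the assumption that the list page may link the same member more than once, and it matches `member.aspx` case-insensitively, which the original check does not.

```ruby
# Hypothetical helper: keep only hrefs that point at a member page, each once.
def member_hrefs(hrefs)
  hrefs
    .map { |h| h.to_s.strip }                          # nil becomes "" and is filtered out below
    .select { |h| h.downcase.include?("member.aspx") } # case-insensitive match (assumption)
    .uniq                                              # fetch each member page only once
end
```

With Nokogiri, the input could come from something like `node.xpath(".//a/@href").map(&:value)`, and each result would then go through `fix_rep_url` and `get_rep` as before.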
# this goes to an actual rep's page
def get_rep(url)
  doc = Nokogiri::HTML(URI.open(url))
  # I know there's a better way to find an <a> tag with a mailto: in it, but honestly this works just fine so who gives a shit
  doc.xpath("//a").each do |a|
    href = a["href"].to_s
    if href.start_with?("mailto:")
      email = href.sub(/\Amailto:/, "")
      # put this into a database
      Receiver.save_receiver(email)
    end
  end
end
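Stripping the `mailto:` prefix by hand keeps anything that follows the address, such as a `?subject=...` query, so the saved value would no longer be a bare email. One alternative is to let Ruby's standard `URI` library parse the href: `URI::MailTo#to` returns only the address part. The helper name `email_from_href` is hypothetical, and returning `nil` for anything that is not a well-formed mailto link is a design assumption.

```ruby
require 'uri'

# Hypothetical helper: return the address from a mailto: href, or nil.
def email_from_href(href)
  uri = URI.parse(href.to_s.strip)
  # only mailto links carry an address; member pages, anchors, etc. are skipped
  return nil unless uri.is_a?(URI::MailTo)

  # URI::MailTo#to is the address part; headers like ?subject= are left out
  addr = uri.to
  addr.empty? ? nil : addr
rescue URI::Error
  # malformed hrefs (bad characters, invalid mailto syntax) are ignored
  nil
end
```

Inside `get_rep`, the loop could call `email = email_from_href(a["href"])` and save only non-nil results. An XPath such as `//a[starts-with(@href, 'mailto:')]` would also narrow the search to mailto links before any Ruby-side filtering.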
# cleans up the url so we can navigate to it
def fix_rep_url(url)
  base_url = "http://www.house.ga.gov/Representatives/en-US/"
  # drop a leading "./" from the relative href (dot escaped, anchored to the start) before appending it
  base_url + url.sub(%r{\A\./}, "")
end
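String surgery only handles hrefs shaped like `./member.aspx?...`. A more general option is `URI.join` from Ruby's standard library, which resolves a relative reference against a base URL the same way a browser does, so `./`, `../`, and already-absolute hrefs all come out right. This is a sketch of that alternative, not the author's code; `resolve_rep_url` is a hypothetical name, and the base URL is the one `fix_rep_url` already uses.

```ruby
require 'uri'

# Hypothetical alternative to fix_rep_url: resolve the href against the
# directory of the member list page instead of editing the string by hand.
def resolve_rep_url(href, base = "http://www.house.ga.gov/Representatives/en-US/")
  URI.join(base, href.strip).to_s
end

puts resolve_rep_url("./member.aspx?Member=5")
# => http://www.house.ga.gov/Representatives/en-US/member.aspx?Member=5
```

The trade-off is that `URI.join` raises `URI::InvalidURIError` on hrefs containing characters such as unescaped spaces, so a crawler using it would need to rescue that error or skip the link.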