@gnarl
Created March 2, 2013 01:49
Scraping Trulia to find a house
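require 'mechanize'
require 'awesome_print'

# Path segments that are concatenated into a Trulia search URL.
BASE   = "http://www.trulia.com"
PREFIX = "/for_sale/"
PRICE  = "215000-262000_price/"
NL     = "7_nl/"

# Each entry is a location segment of the search URL and must end with "/".
CITY_LIST = ["Apex,NC;Cary,NC/", "28846,28847_nh/27612,27613,27617_zip/"]

House = Struct.new(:address, :price, :link, :detail)

# A single Mechanize agent is shared by every request.
@agent = Mechanize.new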
# Build the search URL for one location segment and fetch the results page.
def load_page(city)
  url = BASE + PREFIX + city + PRICE + NL
  ap url
  @agent.get(url)
end
# Extract one House per listing found on a search results page.
def find_houses(page)
  address_sections = page.search("div.property-data-elem")
  ap address_sections.size
  houses = []
  address_sections.each do |sectnode|
    anchor = sectnode.at_css("div.address_section a")
    # Skip listings without an address link instead of raising NoMethodError on nil.
    next unless anchor
    price_node = sectnode.at_css("div.price_section")
    address = anchor.text.strip
    link = BASE + anchor['href']
    price = price_node ? price_node.text.strip : ""
    houses << House.new(address, price, link, "")
  end
  houses
end
####################################################
# Main: fetch each location's results and print address, price, and link.
CITY_LIST.each do |city|
  page = load_page(city)
  houses = find_houses(page)
  houses.each do |house|
    puts "#{house.address} #{house.price}"
    puts "#{house.link}\n\n"
  end
end
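
The script builds its search URL and its listing links by plain string concatenation. The following is a minimal sketch, using only the standard library, of how those two steps could be wrapped in helpers. The names build_search_url, absolute_link, and BASE_URL are hypothetical and not part of the original script, the href "/property/123" is made up for illustration, and the nl parameter simply mirrors the script's NL segment, whose meaning the source does not state.

```ruby
require 'uri'

BASE_URL = "http://www.trulia.com"

# Hypothetical helper: compose a search URL from the same segments the
# script concatenates by hand (location, price range, "nl" value).
def build_search_url(location, min_price, max_price, nl)
  segments = [
    "for_sale",
    location.chomp("/"),               # tolerate a trailing slash, as in CITY_LIST
    "#{min_price}-#{max_price}_price",
    "#{nl}_nl"
  ]
  "#{BASE_URL}/#{segments.join('/')}/"
end

# Hypothetical helper: resolve an href against the site root, so relative
# and already-absolute hrefs are both handled.
def absolute_link(href)
  URI.join(BASE_URL, href).to_s
end

puts build_search_url("Apex,NC;Cary,NC/", 215000, 262000, 7)
puts absolute_link("/property/123")   # made-up path, for illustration only
```

Passing the price bounds as numbers makes it easier to try a different range without editing a constant, and URI.join avoids a doubled host if the site ever returns absolute hrefs.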