@thinkerbot
Created May 30, 2011 17:43
fetch + format craigslist data into csv
# fetch.rb: print the raw HTML of each search results page to stdout
require 'rubygems'
require 'rest_client'

request = 'http://denver.craigslist.org/search/cta?query=accord&srchType=T&minAsk=8000&maxAsk=20000&hasPic=1&s=%s'
page_min, page_max = ARGV

# Three-dot range: page_max itself is not fetched
(page_min.to_i...page_max.to_i).each do |page|
  puts RestClient.get(request % page)
end
# ruby fetch.rb 1 10 | ruby split.rb > accord.csv
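The fetch script iterates with a three-dot range, which excludes its upper bound, so `ruby fetch.rb 1 10` requests the values 1 through 9. The sketch below lists the URLs a given pair of arguments would produce without making any network calls. The constant `REQUEST` and the helper `page_urls` are hypothetical names introduced only for this illustration.

```ruby
# Same URL template as the fetch script; %s is replaced by the page value.
REQUEST = 'http://denver.craigslist.org/search/cta?query=accord&srchType=T&minAsk=8000&maxAsk=20000&hasPic=1&s=%s'

# Hypothetical helper: return the URLs that would be requested for the
# given command-line arguments (strings, as they arrive in ARGV).
def page_urls(page_min, page_max)
  (page_min.to_i...page_max.to_i).map { |page| REQUEST % page }
end

urls = page_urls('1', '10')
puts urls.length
puts urls.first
puts urls.last
```

If the upper bound should be included, a two-dot range (`page_min.to_i..page_max.to_i`) would be the usual change.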
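# split.rb: read the HTML from stdin and print one CSV row per listing
require 'csv'

while line = gets
  # Each listing spans two lines, for example:
  # Sep 2 - <a href="http://denver.craigslist.org/cto/1933637967.html">2003 Honda Accord V6 Fully Loaded -</a>
  # $8500<font size="-1"> (Frederick)</font> <small class="gc"><a href="/cto/">owner</a></small> <span class="p"> pic</span><br class="c">
  if line =~ /<a href="(.*?)">(\d{4})(.*?)-<\/a>/
    url, year, desc = $1, $2, $3
    # The price and seller type are on the line that follows the title
    next unless gets =~ /\$(\d+).*?>(dealer|owner)/
    price, type = $1, $2
    # to_csv quotes fields that contain commas and appends the newline
    print [year, price, type, desc.strip, url].to_csv
  end
end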
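As a worked example, the two sample lines quoted in the comments can be run through the same regular expressions without fetching anything. This is only a sketch: the helper name `parse_listing` is hypothetical, and it uses `String#match` with local captures rather than the `$1`-style globals.

```ruby
require 'csv'

# Hypothetical helper (not part of the original scripts): takes the two HTML
# lines of one listing and returns [year, price, type, description, url],
# or nil when either line does not match.
def parse_listing(title_line, price_line)
  title = title_line.match(/<a href="(.*?)">(\d{4})(.*?)-<\/a>/)
  price = price_line.match(/\$(\d+).*?>(dealer|owner)/)
  return nil unless title && price

  url, year, desc = title.captures
  amount, seller = price.captures
  [year, amount, seller, desc.strip, url]
end

# The sample lines from the comments in the split script.
title_line = 'Sep 2 - <a href="http://denver.craigslist.org/cto/1933637967.html">2003 Honda Accord V6 Fully Loaded -</a>'
price_line = '$8500<font size="-1"> (Frederick)</font> <small class="gc"><a href="/cto/">owner</a></small> <span class="p"> pic</span><br class="c">'

row = parse_listing(title_line, price_line)
puts row.to_csv
```

For the sample listing this yields the fields `2003`, `8500`, `owner`, `Honda Accord V6 Fully Loaded`, and the listing URL, in that order.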