@ian-bartholomew
Last active October 2, 2015 20:19
Site crawler Ruby script using Spidr (https://github.com/postmodern/spidr)
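#!/usr/bin/env ruby
require 'rubygems'
require 'csv'
require 'spidr'

# Crawl the site and record every HTML page's title and URL in a CSV file.
CSV.open('pets_crawl.csv', 'wb') do |csv|
  # Header row
  csv << ['page title', 'page url']

  # Stay on the starting host and skip any link matching /gsi/.
  Spidr.site('http://www.petsmart.com', :ignore_links => [/gsi/]) do |spider|
    spider.every_html_page do |page|
      # Echo progress to the console, then append the row to the CSV.
      puts "#{page.title}, #{page.url}"
      csv << [page.title, page.url]
    end
  end
end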
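
The `:ignore_links => [/gsi/]` option above is what keeps the crawl away from certain URLs. The sketch below illustrates the filtering idea using only plain Ruby, under the assumption that Spidr treats a Regexp in `:ignore_links` as a match anywhere in the link string. The `ignored?` helper and the sample links are hypothetical: they are not part of Spidr and were not taken from a real crawl.

```ruby
# Same pattern list as passed to :ignore_links in the script above.
IGNORE_PATTERNS = [/gsi/]

# Hypothetical helper (not part of Spidr): true when any pattern
# matches somewhere in the link string.
def ignored?(link, patterns = IGNORE_PATTERNS)
  patterns.any? { |pattern| pattern.match?(link) }
end

# Hypothetical sample links, for illustration only.
sample_links = [
  'http://www.petsmart.com/dog/food',
  'http://www.petsmart.com/gsi/webstore/cart',
  'http://www.petsmart.com/cat/toys'
]

# Keep only the links that the crawler would be allowed to follow.
kept = sample_links.reject { |link| ignored?(link) }
puts kept
```

Running a candidate pattern against a handful of known URLs this way is a cheap check before starting a full crawl, since an unanchored pattern such as `/gsi/` also matches any URL that merely contains those three letters.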