@rob-mcgrail
Created December 4, 2012 01:08
$ ruby auditbot.rb http://site.com/whatever
require 'rubygems'
require 'anemone'
require 'redis'
require 'trollop'
require 'highline'

$term = HighLine.new

opts = Trollop::options do
  opt :redis, "Select redis store", :default => 2
  opt :flush, "Start with fresh data", :default => nil
end

Trollop::die "I need a full URL from which to start my crawl - like http://site.com" if ARGV.empty?

# Connect to the local Redis server and switch to the chosen database.
$redis = Redis.new
$redis.select opts[:redis].to_i

site = ARGV[0]
audit_key = 'auditbot:pagez'

# With --flush, drop any URLs recorded by a previous run.
$redis.del audit_key if opts[:flush]

Anemone.crawl(site) do |anemone|
  # Register skip patterns once, before the crawl starts, rather than on every page.
  anemone.skip_links_like(/sections_to_avoid_go_here/)

  anemone.on_every_page do |page|
    puts $term.color(page.url.to_s, :green)
    # Store the URL as a string; the Redis set keeps each URL only once.
    $redis.sadd audit_key, page.url.to_s
  end
end

count = $redis.scard audit_key
puts $term.color("\n#{count} pages", :magenta)
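
The final count depends on set semantics: adding a URL that is already in the set changes nothing, so the count is the number of distinct pages. Below is a minimal sketch of that bookkeeping with no Redis or Anemone required. `audit_urls` is a hypothetical helper, not part of the script above; a Ruby `Set` stands in for the Redis set, and matching skip patterns against the URL path is an assumption about how the crawler filters links.

```ruby
require 'set'
require 'uri'

# Hypothetical stand-in for the crawl bookkeeping: a Set plays the role of
# the Redis set, and skip patterns are tested against each URL's path
# (an assumption made for this sketch).
def audit_urls(urls, skip_patterns = [])
  seen = Set.new
  urls.each do |url|
    path = URI.parse(url).path
    next if skip_patterns.any? { |pattern| path =~ pattern }
    seen.add(url) # like SADD: adding a duplicate is a no-op
  end
  seen
end

pages = audit_urls(
  ['http://site.com/a', 'http://site.com/a', 'http://site.com/private/b'],
  [/private/]
)
puts "#{pages.size} pages" # size plays the role of SCARD
```

The duplicate URL is stored once and the `/private/` URL is skipped, which is why a repeated crawl without `--flush` should not inflate the page count.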