@lucapette
Created December 3, 2011 16:04
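#!/usr/bin/env ruby
# Crawl a site with Anemone and list every 404 with the pages that link to it.
require 'rubygems'
require 'anemone'
require 'optparse'
require 'ostruct'

# Command-line options. The -r/--relative flag name is an assumption; the
# script only shows that options.relative is read further down.
options = OpenStruct.new
options.relative = false

opts = OptionParser.new
opts.on('-r', '--relative', 'Print linking pages as paths, not full URLs') { options.relative = true }
opts.parse!(ARGV) # removes parsed flags, so ARGV[0] is the URL to crawl

Anemone.crawl(ARGV[0], :discard_page_bodies => true) do |anemone|
  anemone.after_crawl do |pages|
    # collect the URL of every page that returned a 404
    not_found = []
    pages.each_value do |page|
      url = page.url.to_s
      not_found << url if page.not_found?
    end
    unless not_found.empty?
      puts "\n404's:"
      # map each missing URL to the URLs of the pages that link to it
      missing_links = pages.urls_linking_to(not_found)
      missing_links.each do |url, links|
        puts URI(url).path.to_s
        links.each do |u|
          u = u.path if options.relative
          puts "  linked from #{u}"
        end
      end
    end
  end
end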
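
The report step can be tried without a crawl. The following is a minimal sketch that assumes a hypothetical `Page` struct and `broken_link_report` helper (neither is part of Anemone) and uses made-up example.com URLs. It only illustrates the grouping the script prints: each 404 URL followed by the pages that link to it.

```ruby
# Hypothetical stand-in for a crawled page: URL, HTTP status code, outgoing links.
Page = Struct.new(:url, :code, :links)

# Returns a hash mapping each 404 URL to the URLs of the pages linking to it.
def broken_link_report(pages)
  not_found = pages.select { |page| page.code == 404 }.map(&:url)
  not_found.each_with_object({}) do |url, report|
    report[url] = pages.select { |page| page.links.include?(url) }.map(&:url)
  end
end

# Made-up sample data: two live pages that both link to one missing page.
pages = [
  Page.new("http://example.com/", 200, ["http://example.com/about", "http://example.com/old"]),
  Page.new("http://example.com/about", 200, ["http://example.com/old"]),
  Page.new("http://example.com/old", 404, [])
]

broken_link_report(pages).each do |url, sources|
  puts url
  sources.each { |source| puts "  linked from #{source}" }
end
```

In the script above, Anemone's page store plays the role of the `pages` array, and `urls_linking_to` does the lookup that `broken_link_report` imitates here.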