@jess
Created December 28, 2011 15:29
# Crawl montlick.com and build an XML sitemap plus a plain-text URL list.
require 'rubygems'
require 'anemone'
require 'builder'

sitemap = ""
xml = Builder::XmlMarkup.new(:target => sitemap, :indent => 2)
sitemap_txt = ""

xml.instruct!
xml.urlset(:xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') {
  Anemone.crawl("http://www.montlick.com/", :discard_page_bodies => true) do |anemone|
    # Skip video, Flash, and blog paths, and any URL ending in .jpg
    # (the dot is escaped so it only matches a literal ".").
    anemone.skip_links_like(/\/videos|\/flash\/|\/montlick-blog|\/videoPlayer|\.jpg$/i)
    anemone.on_every_page do |page|
      # One URL per line in the plain-text list.
      sitemap_txt << "#{page.url}\n"
      xml.url {
        xml.loc(page.url.to_s)
        # lastmod records the crawl time, not the page's real modification time.
        xml.lastmod(Time.now.utc.strftime("%Y-%m-%dT%H:%M:%S+00:00"))
        xml.changefreq('weekly')
      }
    end
  end
}

File.open('sitemap.xml', 'w') do |f|
  f.write sitemap
end
File.open('sitemap.txt', 'w') do |f|
  f.write sitemap_txt
end

# Print the pingsitemap.com URL that submits the new sitemap.
puts "http://www.pingsitemap.com/?action=submit&url=http%3A%2F%2Fwww.montlick.com%2Fsitemap.xml"
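
The submission URL printed on the last line is hard-coded with the sitemap address already percent-encoded. A minimal sketch of building the same string from a plain sitemap URL, assuming only Ruby's standard `uri` library; the helper name `ping_url` is hypothetical and not part of the original script:

```ruby
require 'uri'

# Hypothetical helper: builds the pingsitemap.com submission URL for a
# given sitemap location by percent-encoding it as a query-string value.
def ping_url(sitemap_url)
  "http://www.pingsitemap.com/?action=submit&url=" +
    URI.encode_www_form_component(sitemap_url)
end

puts ping_url("http://www.montlick.com/sitemap.xml")
```

With a helper like this, pointing the script at another site only requires changing the plain sitemap URL rather than re-encoding it by hand.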