@robmiller
Created September 22, 2017 11:04
Fetch a sitemaps.org-compatible sitemap file and print all of its URLs on the command line
#!/usr/bin/env ruby
#
# sitemap
#
# Author: Rob Miller <rob@bigfish.co.uk>
#
# Fetches and parses a sitemaps.org-compatible sitemap file, outputting
# all of the URLs stored in it; useful to start a spidering process, or
# to generate a list of URLs to later redirect.
#
# It works with nested sitemap files as well as ordinary ones.
#
# Usage:
#
#   $ sitemap https://example.com/sitemap.xml
#
# Requirements:
#
# - Ruby >1.9.2
# - The sitemap-parser gem:
#
#   $ gem install sitemap-parser
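
gem "sitemap-parser", "~> 0.4.0"
require "sitemap-parser"

url = ARGV[0]

unless url
  abort "Usage: sitemap URL"
end

begin
  # recurse: true follows nested sitemap index files as well
  sitemap = SitemapParser.new(url, recurse: true)

  # Block variable renamed so it no longer shadows the outer `url`
  sitemap.to_a.each do |page_url|
    puts page_url
  end
# Rescue StandardError rather than Exception so that Ctrl-C (Interrupt)
# and SystemExit are not swallowed
rescue StandardError => e
  abort "Something terrible went wrong: #{e.message}"
end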
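
The header comment says the script works with nested sitemap files. As a rough illustration of what that recursion involves (this is not the sitemap-parser gem's actual implementation), the sketch below walks a sitemaps.org document using only REXML. The method names `sitemap_urls` and `child_locs` are hypothetical, and fetching is delegated to a caller-supplied block so that no particular HTTP client is assumed.

```ruby
require "rexml/document"

# Hypothetical helper: return the stripped text of every <loc> found
# directly inside the root's children named `entry_name`.
def child_locs(root, entry_name)
  root.elements.to_a
      .select { |el| el.name == entry_name }
      .flat_map { |el| el.elements.to_a.select { |c| c.name == "loc" } }
      .map { |loc| loc.text.to_s.strip }
end

# Hypothetical sketch: list the page URLs in a sitemap document.
# A <sitemapindex> points at further sitemaps, so each child is fetched
# through the supplied block and parsed recursively; a <urlset> holds
# the page URLs themselves.
def sitemap_urls(xml, &fetch)
  root = REXML::Document.new(xml).root
  return [] if root.nil?

  case root.name
  when "sitemapindex"
    child_locs(root, "sitemap").flat_map do |child_url|
      sitemap_urls(fetch.call(child_url), &fetch)
    end
  when "urlset"
    child_locs(root, "url")
  else
    []
  end
end
```

A caller would pass the XML of the top-level sitemap plus a block that returns the body for a given URL, for example `sitemap_urls(body) { |u| Net::HTTP.get(URI(u)) }` after requiring `net/http`. The sketch does no cycle detection, error handling, or gzip decoding.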