@floehopper
floehopper / parse_urls_from_sitemap.rb
Created March 4, 2017 12:45
Parse URLs from sitemap.xml
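require 'nokogiri'
require 'uri'

# Assumes a test context (e.g. Rack::Test) where `get` returns the response
# and `sitemap_path` is a route helper.
response = get(sitemap_path)
doc = Nokogiri::XML(response.body)
# sitemap.xml declares a default namespace, so XPath needs an explicit prefix.
namespace = 'http://www.sitemaps.org/schemas/sitemap/0.9'
locs = doc.xpath('//urlset:url//urlset:loc', 'urlset' => namespace)
# Keep only the path component of each URL and drop duplicates.
urls = locs.map { |loc| URI(loc.text.strip).path }.uniq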
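
The snippet above depends on a `get` helper and a `sitemap_path` route, so it cannot be run on its own. To try the same extraction in isolation, here is a minimal sketch that parses an inline sitemap instead. It swaps Nokogiri for REXML (which ships with Ruby) purely so that it runs without installing a gem; the namespace-prefixed XPath works along the same lines. The sample XML, the `SITEMAP_NS` and `SAMPLE_SITEMAP` constants, and the `sitemap_paths` helper are all hypothetical names introduced for illustration.

```ruby
require 'rexml/document'
require 'uri'

# Namespace declared by sitemap.xml files (same value as in the gist).
SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'

# Hypothetical sample document; a real sitemap would come from an HTTP response.
SAMPLE_SITEMAP = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>https://example.com/</loc></url>
    <url><loc>https://example.com/about</loc></url>
    <url><loc>https://example.com/about</loc></url>
  </urlset>
XML

# Returns the unique URL paths listed in a sitemap XML string.
def sitemap_paths(xml)
  doc = REXML::Document.new(xml)
  # Map the prefix `sm` to the sitemap namespace so XPath can match the
  # elements, which live in the document's default namespace.
  locs = REXML::XPath.match(doc, '//sm:url/sm:loc', { 'sm' => SITEMAP_NS })
  locs.map { |loc| URI(loc.text.strip).path }.uniq
end

paths = sitemap_paths(SAMPLE_SITEMAP)
puts paths
```

The duplicate `/about` entry in the sample is there to show that `uniq` collapses repeated paths, just as in the original pipeline.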