@floehopper
Created March 4, 2017 12:45
Parse URLs from sitemap.xml
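require 'nokogiri'
require 'uri'

# `get` and `sitemap_path` are assumed to come from the surrounding context (e.g. a test helper).
response = get(sitemap_path)
doc = Nokogiri::XML(response.body)
# Sitemap elements are namespaced, so the XPath query must bind a prefix to that namespace.
namespace = 'http://www.sitemaps.org/schemas/sitemap/0.9'
locs = doc.xpath('//urlset:url//urlset:loc', 'urlset' => namespace)
# Reduce each absolute URL to its path and drop duplicates.
urls = locs.map(&:text).map { |u| URI(u).path }.uniq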
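
To see what the snippet produces without a running app, here is a self-contained sketch over an inline sample sitemap. It swaps Nokogiri for REXML (shipped with standard Ruby installs) so it runs without installing a gem. The `sitemap_paths` method name and the `SAMPLE_SITEMAP` data are made up for illustration and are not part of the original gist.

```ruby
require 'rexml/document'
require 'uri'

SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'

# Hypothetical sample data standing in for a fetched sitemap.xml body.
SAMPLE_SITEMAP = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>https://example.com/</loc></url>
    <url><loc>https://example.com/about</loc></url>
    <url><loc>https://example.com/about?ref=footer</loc></url>
  </urlset>
XML

# Returns the unique URL paths listed in a sitemap XML string.
def sitemap_paths(xml)
  doc = REXML::Document.new(xml)
  # Bind the 'sm' prefix to the sitemap namespace, as the gist does with 'urlset'.
  locs = REXML::XPath.match(doc, '//sm:url/sm:loc', { 'sm' => SITEMAP_NS })
  # URI#path drops the scheme, host and query string; uniq removes repeats.
  locs.map { |loc| URI(loc.text.strip).path }.uniq
end

p sitemap_paths(SAMPLE_SITEMAP)
```

The third entry differs from the second only by its query string, so after `URI#path` and `uniq` it collapses into the same `/about` path. The `strip` call is an extra guard against whitespace inside `<loc>` elements, which the original snippet does not include.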
@bobf

bobf commented Jun 15, 2023

This saved me some time 6 years later, thanks.

@floehopper
Author

> This saved me some time 6 years later, thanks.

🙌 ❤️

@ingemar

ingemar commented Oct 6, 2023

This saved me some time 7 years later, thanks.

@floehopper
Author

> This saved me some time 7 years later, thanks.

🎉
