@hisnameisjimmy
Last active February 28, 2018 18:03
Extract all URLs from a sitemap to plain text
#!/bin/sh
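# Replace website.com with your own site. The script reads a parent sitemap file and extracts the URLs from every sitemap file it lists.
# Requires curl and sitemap-urls. The sections below are alternatives, so run whichever one you need.
#
# Write each child sitemap's URLs into its own numbered file: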
mkdir -p ~/Desktop/sitemap-output
number=1
for i in $(curl -s https://website.com/sitemap.xml | sitemap-urls);
do curl -s -N "$i" | sitemap-urls > ~/Desktop/sitemap-output/map-$number.txt;
number=$((number+1));
done
# Or write every URL into one large file (overwritten on each run):
mkdir -p ~/Desktop/sitemap-output
for i in $(curl -s https://website.com/sitemap.xml | sitemap-urls);
do curl -s -N "$i" | sitemap-urls;
done > ~/Desktop/sitemap-output/merged.txt
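# Or merge the per-sitemap files written by the first loop.
# (The original looped over bare ls names, which only resolve when run from inside the output folder.)
cat ~/Desktop/sitemap-output/map-*.txt > ~/Desktop/sitemap-output/merged.txt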
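
If sitemap-urls is not installed, the extraction step can be roughly approximated with standard tools. This is a minimal sketch, not a replacement for a real XML parser: the function name extract_locs is hypothetical, it assumes each URL sits in a plain <loc>...</loc> element with no CDATA wrapper or XML namespace prefix, and it relies on grep -o, which GNU and BSD grep provide but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original gist): read sitemap XML on
# stdin and print the text of every <loc> element, one URL per line.
extract_locs() {
  # Join all lines first so a <loc> element split across lines still matches.
  tr -d '\n\r' |
    # Print each <loc>...</loc> element on its own line.
    grep -o '<loc>[^<]*</loc>' |
    # Drop the tags, then trim surrounding whitespace.
    sed -e 's/<loc>//' -e 's/<\/loc>//' \
        -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}
```

It could stand in for sitemap-urls in the loops above, for example: curl -s https://website.com/sitemap.xml | extract_locs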