Use wget to scrape all URLs from a sitemap.xml
Usage: scrape-sitemap.sh http://domain.com/sitemap.xml
#!/bin/sh
SITEMAP=$1
if [ -z "$SITEMAP" ]; then
	echo "Usage: $0 http://domain.com/sitemap.xml"
	exit 1
fi
# Fetch the sitemap quietly to stdout.
XML=$(wget -O - --quiet "$SITEMAP")
# Flatten the XML to one line so <loc> elements that span lines still match,
# then extract each <loc>...</loc> and strip the tags. grep -Eo prints one
# match per line, which is exactly the list format wget's -i expects.
URLS=$(echo "$XML" | tr '\n' ' ' | grep -Eo '<loc>[^<>]*</loc>' | sed -e 's:</*loc>::g')
# Request every URL with a polite, randomized delay; discard the bodies.
echo "$URLS" | wget -O /dev/null -i - --wait=1 --random-wait -nv
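A quick usage example, assuming the script is saved as scrape-sitemap.sh and the usage line's placeholder domain:

chmod +x scrape-sitemap.sh
./scrape-sitemap.sh http://domain.com/sitemap.xml

Each URL listed in the sitemap is fetched exactly once, with at least one second (randomized by --random-wait) between requests.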
Thanks for the script!
Wow, just what I was looking for, thanks!
Is there a vice versa? Can I build a sitemap of a website?
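Not with this script directly, but wget's spider mode can do the reverse: crawl a site without saving pages and log every URL it reaches. A minimal sketch of that approach (example.com is a placeholder; the grep/awk step pulls URLs out of wget's verbose log lines, which start with "--timestamp--", and wrapping the results in <url><loc> tags is left as an exercise):

# Crawl two levels deep without downloading; wget logs to stderr.
wget --spider --recursive --level=2 http://example.com 2>&1 \
  | grep '^--' | awk '{ print $3 }' | sort -u > urls.txt

This only finds URLs reachable by following links from the start page, so pages with no inbound links will be missed.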
Thanks a lot for this bash script! Saved me a lot of head-banging 💯