@mikaelz
Last active April 30, 2023 12:53
#!/bin/bash
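cd /tmp || exit 1
echo "Wget $1"
# Crawl up to 3 levels deep without downloading anything; the log goes to /tmp/sitemap.txt
wget --spider --recursive --level=3 --no-verbose --output-file=/tmp/sitemap.txt "$1"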
echo "Grep URLs"
# Take the first token after "URL:" on each log line, then de-duplicate
awk -F 'URL:' 'NF > 1 { split($2, f, " "); if (f[1] != "") print f[1] }' /tmp/sitemap.txt | sort -u > /tmp/sitemap-urls.txt
header='<?xml version="1.0" encoding="UTF-8"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">'
# Quote the variable so the "?" in "<?xml" is never treated as a glob pattern
printf '%s\n' "$header" > /tmp/sitemap.xml
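while IFS= read -r url; do
    case "$url" in
        http:* | https:*)
            # Escape "&" so URLs with query strings remain valid XML
            loc=$(printf '%s\n' "$url" | sed 's/&/\&amp;/g')
            printf '<url><loc>%s</loc></url>\n' "$loc" >> /tmp/sitemap.xml
            ;;
        *)
            # Skip anything that is not an http(s) URL
            ;;
    esac
done < /tmp/sitemap-urls.txt
echo "</urlset>" >> /tmp/sitemap.xml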