@JorgenEvens
Created August 29, 2013 21:22
Generates a sitemap by crawling all of the available links. Example usages:
./make-sitemap http://my-site.com
./make-sitemap http://my-site.com daily
#!/bin/sh
##################################################
# #
# This script generates a sitemap from an #
# existing website by crawling each page #
# accessible to the outside world. #
# #
# Author: Jorgen Evens <jorgen@evens.eu> #
# License: New BSD License #
# #
##################################################
WEBSITE=$1
REFRESH="monthly"
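if [ -z "$WEBSITE" ]; then
    # Report misuse on stderr and exit with a non-zero status.
    echo "Usage: ./make-sitemap url [refresh]" >&2
    exit 1
fi
# An optional second argument overrides the default change frequency.
if [ -n "$2" ]; then
    REFRESH=$2
fi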
# Write sitemap header
cat << EOF
<?xml version="1.0" encoding="UTF-8"?>
<urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
EOF
# Start crawling
wget --spider -r -nd -nv --level=inf "$WEBSITE" 2>&1 |
while IFS= read -r line
do
    # Filter each line, keeping only lines that carry a URL.
    # A space after "URL:" is tolerated, since wget's exact log format
    # differs between modes and versions.
    # Replace every occurrence of & with &amp;
    line=$(printf '%s\n' "$line" | sed -n 's@.* URL: *\([^ ][^ ]*\) .*@\1@p' | sed 's@&@\&amp;@g')
    if [ -n "$line" ]; then
        cat << EOF
<url>
<loc>$line</loc>
<changefreq>$REFRESH</changefreq>
</url>
EOF
    fi
done
# Write footer
cat << EOF
</urlset>
EOF
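
The URL filter in the crawl loop can be tried without touching the network. The following is a minimal sketch, not part of the original script: the helper names extract_url and xml_escape are hypothetical, and the sample log line is made up to match the shape the script's pattern expects rather than captured from a real wget run. The script itself escapes only &; the sketch also escapes the other four characters that XML treats specially, which is an assumption about what a stricter version might want.

```shell
#!/bin/sh
# Sketch only: helper names and the sample line are hypothetical.

# Print the URL from a log line containing " URL:<url> ", or nothing at all
# when the line carries no URL.
extract_url() {
    printf '%s\n' "$1" | sed -n 's@.* URL: *\([^ ][^ ]*\) .*@\1@p'
}

# Escape the five characters XML treats specially. The & rule must run
# first, otherwise the ampersands produced by later rules get re-escaped.
xml_escape() {
    printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' \
                             -e 's/</\&lt;/g' \
                             -e 's/>/\&gt;/g' \
                             -e 's/"/\&quot;/g' \
                             -e "s/'/\&apos;/g"
}

# Made-up line shaped like the input the script's pattern expects.
sample='2013-08-29 21:22:00 URL:http://my-site.com/?a=1&b=2&c=3 200 OK'

url=$(extract_url "$sample")
xml_escape "$url"
```

Keeping extraction and escaping in separate functions makes each step checkable on its own, for example by feeding extract_url a line without a URL and confirming that it prints nothing.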