@PatelUtkarsh
Created January 11, 2023 10:34
Migrating a site to a new domain? A script to verify that URLs redirect 1:1 on the new site, based on the old sitemap.
function verify_sitemap_urls() {
    if [ -z "$1" ]; then
        echo "Usage: verify_sitemap_urls [file_path/url] (new_domain)"
        echo "If no new domain is passed, each URL from the sitemap is checked as-is; if a domain is passed, it replaces the domain of each sitemap URL and keeps the same path."
        return 1
    fi
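    if [[ "$1" == http://* || "$1" == https://* ]]; then
        TEMPFILE=$(mktemp)
        # Download the sitemap file from the given URL.
        if ! curl -fsSL -o "$TEMPFILE" "$1"; then
            echo "Could not download sitemap: $1"
            rm -f "$TEMPFILE"
            return 1
        fi
    else
        # Resolve the relative path to an absolute path, based on where the function is called.
        TEMPFILE="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
        # Check that the file exists.
        if [ ! -f "$TEMPFILE" ]; then
            echo "File not found: $TEMPFILE"
            return 1
        fi
    fi
    # Parse the sitemap file and extract every <loc> URL.
    # ggrep is GNU grep as installed by Homebrew on macOS; fall back to grep elsewhere (-P requires GNU grep).
    if command -v ggrep >/dev/null 2>&1; then GREP_CMD=ggrep; else GREP_CMD=grep; fi
    # The lazy .*? keeps each match inside one <loc> element, even in a single-line sitemap.
    URLS_IN_SITEMAP=($("$GREP_CMD" -oP '(?<=<loc>).*?(?=</loc>)' "$TEMPFILE"))
    # If a URL was passed, remove the downloaded temp file.
    if [[ "$1" == http://* || "$1" == https://* ]]; then
        rm -f "$TEMPFILE"
    fi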
    # Send a request to each URL and check the HTTP status code.
    for URL_ENTRY in "${URLS_IN_SITEMAP[@]}"
    do
        # If $2 is empty, use the same domain as the sitemap.
        if [ -z "$2" ]; then
            NEW_DOMAIN_URL=$URL_ENTRY
        else
            # Strip the scheme, host, and query string, keeping only the path.
            URL_PATH=$(printf '%s\n' "$URL_ENTRY" | sed -e 's|^[^:]*://[^/]*||;s|[?].*$||')
            NEW_DOMAIN_URL="${2%/}$URL_PATH"
        fi
        # -L follows redirects, so the status reported is that of the final response.
        http_status=$(curl -s -o /dev/null -w "%{http_code}" -L "$NEW_DOMAIN_URL")
        if [ "$http_status" -eq 200 ]
        then
            echo -n "."
            #echo "OK: $NEW_DOMAIN_URL"
        else
            echo -e "\nError: $NEW_DOMAIN_URL returned $http_status"
        fi
    done
    # End the line of progress dots.
    echo
}
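
The function above only checks that each URL ends in a 200 after redirects are followed. A possible extension, sketched here and not part of the original gist, also reports which URL the redirect chain ended on, so it can be compared with the expected new-domain URL. The helper names `map_to_new_domain` and `final_status_and_url` and the `old.example` / `new.example` domains are hypothetical; `%{http_code}` and `%{url_effective}` are curl `--write-out` variables.

```shell
# Hypothetical helper: map a sitemap URL onto a new domain by dropping the
# scheme, host, and query string, then prefixing the new domain.
map_to_new_domain() {
    printf '%s%s\n' "${2%/}" \
        "$(printf '%s\n' "$1" | sed -e 's|^[^:]*://[^/]*||' -e 's|[?].*$||')"
}

# Hypothetical helper: print the final HTTP status and the URL that the
# redirect chain ended on, separated by a space.
final_status_and_url() {
    curl -s -o /dev/null -L -w '%{http_code} %{url_effective}\n' "$1"
}

# Example with placeholder domains (requires network access, so left commented out):
#   expected=$(map_to_new_domain 'https://old.example/blog/post?utm=1' 'https://new.example')
#   final_status_and_url 'https://old.example/blog/post'
# For a 1:1 redirect, the printed URL would equal $expected.
```

Comparing the final URL catches the case where every old URL redirects to the new homepage: each request still returns 200, but the mapping is not 1:1.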