@noirscape
Last active September 3, 2016 17:44
Restructure a httrack website
#!/bin/bash
# Restructurer for httrack mirror.
# Also works on wget mirrors that you want to be organized.
read -rp "Enter path to current mirror: " existingmirror
read -rp "Enter path to output directory: " outdir
read -rp 'Add any flags you want to pass to httrack (usually you will want one of the -N flags). By default, the script adds the "--disable-security-limits" and "-A1000000000000" flags to speed up the download, so do not enter those yourself (the script will fail if you do): ' flags
# Quote the path so directories with spaces work, and stop if it does not exist.
cd "$existingmirror" || exit 1
# If index.html does not exist, then we generate one using tree
if [[ ! -f index.html ]]; then
echo "No index.html found. restructure.sh will generate one using tree."
tree -H . > index.html
fi
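# Serve the mirror on port 9482 rather than the default 8000, which is often already in use.
# SimpleHTTPServer is the Python 2 module; on Python 3 the equivalent is "python3 -m http.server 9482".
python -m SimpleHTTPServer 9482 &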
server_PID=$!
# And now we initiate the mirroring operation.
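# $flags is left unquoted on purpose so that multiple flags are split into separate arguments.
httrack --disable-security-limits -A1000000000000 $flags -O "$outdir" localhost:9482
# Save the exit status right away; the [[ ]] test below would otherwise overwrite $?.
status=$?
if [[ $status != 0 ]]; then
echo "Mirroring failed with exit status $status. Exiting with exit code 1."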
kill -9 $server_PID
exit 1
fi
echo "Mirror was successful. Killing webserver..."
kill -9 $server_PID
echo "done."
echo "Exiting with exit code 0."
exit 0