@dphiffer
Created October 13, 2016 00:24
A script to collect (conserve? steal?) Constant Dullaart's "war" web pages
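#!/bin/sh
# A script to collect (conserve? steal?) Constant Dullaart's "war" web pages
# See: https://twitter.com/constantdull/status/785797564167839744
# Usage: ./war_collector.sh

start_from="war.repair"

# Raw downloads are cached in src/; rewritten pages go in the current directory.
mkdir -p src

# POSIX function syntax: the "function" keyword is a bashism and fails
# under a strict /bin/sh such as dash.
collect_the_art() {
    echo "collecting $1"
    # Download the page unless a cached copy already exists.
    if [ ! -f "src/$1.html" ] ; then
        curl -s -o "src/$1.html" "http://$1/"
    fi
    # Extract the image's base name (without .png) from the <img> tag.
    img_num=$(grep "<img" "src/$1.html" | sed 's/^.*<img src="//' | sed 's/\.png" width="100%" height="100%"><\/img>//')
    if [ ! -f "src/$img_num.png" ] ; then
        curl -s -o "src/$img_num.png" "http://$1/$img_num.png"
    fi
    # Extract the next domain from the meta refresh tag.
    next=$(grep "URL=" "src/$1.html" | sed 's/^.*<meta http-equiv="refresh" content="0;URL=http:\/\///' | sed 's/\/" \/>//')
    # Point the redirect and the image at the local copies.
    sed "s/http:\/\/$next\//$next.html/" "src/$1.html" | sed "s/$img_num\.png/src\/$img_num.png/" > "$1.html"
    # Follow the chain until it loops back to the start (or no next page is found).
    if [ -n "$next" ] && [ "$next" != "$start_from" ] ; then
        collect_the_art "$next"
    fi
}

collect_the_art "$start_from"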
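
To see what the two sed pipelines in the script rely on, here is a minimal sketch that runs them against a made-up page. The sample markup, the host `next.example`, and the image name `42` are assumptions for illustration, modeled only on the patterns the sed expressions match; the real pages may differ.

```shell
#!/bin/sh
# Hypothetical sample page: one meta refresh line and one <img> line,
# shaped the way the script's sed patterns expect (tag ends the line).
sample='<meta http-equiv="refresh" content="0;URL=http://next.example/" />
<img src="42.png" width="100%" height="100%"></img>'

# Strip everything up to the src value, then the trailing .png and attributes.
img_num=$(printf '%s\n' "$sample" | grep "<img" \
    | sed 's/^.*<img src="//' \
    | sed 's/\.png" width="100%" height="100%"><\/img>//')

# Strip everything up to the redirect host, then the trailing '/" />'.
next=$(printf '%s\n' "$sample" | grep "URL=" \
    | sed 's/^.*<meta http-equiv="refresh" content="0;URL=http:\/\///' \
    | sed 's/\/" \/>//')

# Rewrite the redirect and image path to point at local files.
rewritten=$(printf '%s\n' "$sample" \
    | sed "s/http:\/\/$next\//$next.html/" \
    | sed "s/$img_num\.png/src\/$img_num.png/")

echo "image: $img_num"   # prints "image: 42"
echo "next:  $next"      # prints "next:  next.example"
```

Both extractions depend on the exact attribute order and spacing in the markup, so any change to the page template would yield garbage values rather than an error.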
@dphiffer (Author) commented:

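Worth pointing out here that sed is not the best tool for this job: it matches raw text, so it breaks as soon as the markup changes slightly. A better option is pup, a command-line HTML parser that selects elements with CSS selectors.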
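
As a rough sketch of what the pup approach might look like, the helpers below replace the script's sed extraction. This assumes pup is installed and on the PATH; the function names `img_src`, `refresh_content`, and `next_domain` are hypothetical and not part of the original script.

```shell
#!/bin/sh
# Sketch only: assumes pup (https://github.com/ericchiang/pup) is installed.

# Print the src attribute of each <img> in the HTML file given as $1.
img_src() {
    pup 'img attr{src}' < "$1"
}

# Print the content attribute of the meta refresh tag in the HTML file $1.
refresh_content() {
    pup 'meta[http-equiv="refresh"] attr{content}' < "$1"
}

# Hypothetical helper: reduce a value like '0;URL=http://host/' to 'host'
# using POSIX parameter expansion only (pup is not needed for this step).
next_domain() {
    value=${1#*URL=http://}
    printf '%s\n' "${value%/}"
}
```

Inside `collect_the_art`, these could stand in for the sed pipelines, for example `next=$(next_domain "$(refresh_content "src/$1.html")")`. Because pup parses the HTML, attribute order and whitespace in the tags no longer matter.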
