@abargnesi
Last active June 21, 2016 01:08
Bash function that crawls a site and lists the URLs it requests, skipping static assets (CSS, JavaScript, PNG, GIF, and JPG files).
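# Crawl a site with wget in spider mode (pages are checked, not kept) and
# print every requested URL except static assets (CSS, JS, PNG, GIF, JPG).
function crawl_for_urls() {
  if [ $# -ne 1 ]; then
    echo "usage: crawl_for_urls URL" >&2   # usage goes to stderr, not a file named "2"
    return 1                               # return, not exit, so a sourcing shell survives
  fi
  # wget logs to stderr, so merge it into the pipe. Request lines look like
  # "--DATE TIME--  URL", which makes the URL the third whitespace-separated field.
  wget --spider --force-html -r "$1" 2>&1 |
    grep '^--' |
    awk '{ print $3 }' |
    grep -Ev '\.(css|js|png|gif|jpg)$'
}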
crawl_for_urls "http://www.infoq.com"
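To check what the filtering stages of the pipeline keep without crawling a live site, the grep/awk/grep part can be split out and fed a few log lines. This is a minimal sketch: `filter_urls` and `sample_log` are hypothetical names introduced here, and the sample lines are illustrative, assuming the `--DATE TIME--  URL` shape wget typically prints for each request.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the filtering half of crawl_for_urls, split out so it
# can be run on saved or made-up wget output without touching the network.
filter_urls() {
  grep '^--' |                           # keep only wget's request lines
    awk '{ print $3 }' |                 # field 3 of "--DATE TIME--  URL" is the URL
    grep -Ev '\.(css|js|png|gif|jpg)$'   # drop static assets
}

# Illustrative log lines in the format wget is assumed to print per request.
sample_log='--2016-06-21 01:08:00--  http://www.infoq.com/
Spider mode enabled. Check if remote file exists.
--2016-06-21 01:08:01--  http://www.infoq.com/styles/main.css
--2016-06-21 01:08:02--  http://www.infoq.com/news/
--2016-06-21 01:08:03--  http://www.infoq.com/logo.png'

printf '%s\n' "$sample_log" | filter_urls
```

Run as-is, it prints `http://www.infoq.com/` and `http://www.infoq.com/news/`: the status line is dropped by the first `grep`, and the `.css` and `.png` requests are dropped by the last one.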