@mindcont
Forked from azhawkes/spider.sh
Created May 9, 2018 08:03
A really simple wget spider to obtain a list of URLs on a website by crawling n levels deep from a starting page.
#!/bin/bash
# Crawl a site with wget in spider mode and collect every URL it visits.
# The start variable is named START_URL (not HOME) so the script does not
# clobber the $HOME environment variable, which wget reads for .wgetrc.
START_URL="http://www.yourdomain.com/some/page"
DOMAINS="yourdomain.com"   # domains wget may follow (-D)
DEPTH=2                    # recursion depth (-l)
OUTPUT="./urls.csv"

# Each URL wget visits is logged on a line starting "--<timestamp>--  <url>",
# so grep '^--' selects those lines and awk prints the URL (third field).
# The second grep drops static assets; sort -u de-duplicates the result.
wget -r --spider --delete-after --force-html -D "$DOMAINS" -l "$DEPTH" "$START_URL" 2>&1 \
    | grep '^--' | awk '{ print $3 }' \
    | grep -v '\.\(css\|js\|png\|gif\|jpg\)$' \
    | sort -u > "$OUTPUT"
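To try it, save the script as spider.sh (an illustrative filename), make it executable, and run it; the output file holds one URL per line. The sample output below is hypothetical, assuming the placeholder domain above:

chmod +x spider.sh
./spider.sh
head -3 urls.csv
http://www.yourdomain.com/
http://www.yourdomain.com/about
http://www.yourdomain.com/some/page

One note on the flags: without -H, wget never spans to other hosts during a recursive crawl anyway, so -D only becomes meaningful if you also pass -H and want to restrict which foreign domains may be followed.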