Created January 13, 2014 18:00
Really simple wget spider to obtain a list of URLs on a website, by crawling n levels deep from a starting page.
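# DOMAINS: comma-separated domains to follow; DEPTH: how many levels to crawl;
# HOME: starting URL (note: this shadows the shell's $HOME); OUTPUT: file for the URL list.
wget -r --spider --delete-after --force-html -D "$DOMAINS" -l "$DEPTH" "$HOME" 2>&1 \
| grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\)$' | sort | uniq > "$OUTPUT"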
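To see what the filtering stages do without crawling anything, here is a minimal offline sketch that feeds a hypothetical sample of wget's log through the same grep/awk/sort stages. It assumes wget's usual request-line format (`--DATE TIME--  URL`, so the URL is field 3) and GNU grep, since `\|` in a basic regex is a GNU extension. The extension filter is written as `\.\(css\|js\|png\|gif\|jpg\)$`, with the period directly followed by the extension.

```shell
#!/bin/sh
# Hypothetical sample of wget's stderr log: every request begins with a
# "--DATE TIME--  URL" line; the other lines are status messages.
sample_log='Spider mode enabled. Check if remote file exists.
--2014-01-13 18:00:00--  http://example.com/
HTTP request sent, awaiting response... 200 OK
--2014-01-13 18:00:01--  http://example.com/about.html
--2014-01-13 18:00:01--  http://example.com/style.css
--2014-01-13 18:00:02--  http://example.com/about.html'

# Keep only request lines, take field 3 (the URL), drop static assets,
# then sort and de-duplicate.
urls=$(printf '%s\n' "$sample_log" \
  | grep '^--' \
  | awk '{ print $3 }' \
  | grep -v '\.\(css\|js\|png\|gif\|jpg\)$' \
  | sort | uniq)

printf '%s\n' "$urls"
```

This prints the two page URLs; the stylesheet and the repeated request for about.html are removed.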
obriat commented Oct 10, 2017

There is an unwanted space in the grep -v between the period and the extensions.
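A quick offline check of this point, using a few hypothetical URLs (and GNU grep for `\|`): with the space, the pattern only matches a literal `. css`-style ending, so no asset URL is filtered out; without it, the assets are dropped.

```shell
#!/bin/sh
# A few hypothetical crawl results to filter.
urls='http://example.com/
http://example.com/style.css
http://example.com/app.js
http://example.com/logo.png'

# Pattern with the stray space: requires ". css", ". js", etc.,
# so real asset URLs slip through.
with_space=$(printf '%s\n' "$urls" | grep -v '\. \(css\|js\|png\|gif\|jpg\)$')

# Corrected pattern: period immediately followed by the extension.
without_space=$(printf '%s\n' "$urls" | grep -v '\.\(css\|js\|png\|gif\|jpg\)$')

echo "with space:";    printf '%s\n' "$with_space"
echo "without space:"; printf '%s\n' "$without_space"
```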

Ontario7 commented Dec 25, 2017

This grep expression does not match URLs with parameters, like this:

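Assuming "parameters" means a query string (for instance a hypothetical `style.css?ver=2`), the problem is that the original filter anchors the extension at the end of the line. One possible fix, sketched below, switches to an extended regex that allows an optional `?...` suffix after the extension; `-E` alternation is standard POSIX ERE, so this version does not rely on GNU's `\|`.

```shell
#!/bin/sh
# Hypothetical crawl results, two of them with query strings.
urls='http://example.com/
http://example.com/style.css?ver=2
http://example.com/app.js
http://example.com/page.php?id=3'

# The original filter anchors the extension at end of line, so
# "style.css?ver=2" is kept by mistake.
original=$(printf '%s\n' "$urls" | grep -v '\.\(css\|js\|png\|gif\|jpg\)$')

# Extended regex: extension, then an optional "?..." query string, then
# end of line. "\?" is a literal question mark in ERE.
with_query=$(printf '%s\n' "$urls" | grep -v -E '\.(css|js|png|gif|jpg)(\?.*)?$')

echo "original filter:";    printf '%s\n' "$original"
echo "query-aware filter:"; printf '%s\n' "$with_query"
```

The query-aware filter keeps the page URLs (including `page.php?id=3`) and drops both assets, while the original filter lets `style.css?ver=2` through.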
Alternatively, you can use a wget parameter:

