spider.sh by @ColeMundus, forked from azhawkes/spider.sh (last active November 12, 2017)
Really simple wget spider to obtain a list of URLs on a website by crawling n levels deep from a starting page; this version keeps only Tidal album pages.
#!/bin/bash
# Spider a site with wget and collect a deduplicated list of album URLs.
START_URL="http://listen.tidal.com"   # renamed from HOME to avoid clobbering the shell's $HOME
DOMAINS="listen.tidal.com"            # restrict the crawl to this domain
OUTPUT="./urls.csv"
# wget logs each fetch as "--<timestamp>--  <url>", so awk's $3 is the URL;
# the greps drop static assets and keep only /album/ pages.
wget -r --spider --delete-after --force-html -D "$DOMAINS" "$START_URL" 2>&1 \
  | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\)$' \
  | grep '/album/' | sort | uniq > "$OUTPUT"
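
As written, the crawl depth comes from wget's default recursion limit of five levels. To actually choose n, pass wget's -l/--level option; a minimal sketch using the same variables as above, with the depth of 3 picked arbitrarily (-l 0 means unlimited):

# Same pipeline with an explicit crawl depth of 3 instead of wget's default of 5.
wget -r -l 3 --spider --delete-after --force-html -D "$DOMAINS" "$START_URL" 2>&1 \
  | grep '^--' | awk '{ print $3 }' | grep '/album/' | sort | uniq > "$OUTPUT"

The grep '/album/' filter is what makes the output Tidal-specific; swap the pattern for whatever paths matter on the site you are crawling.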