@robmiller
Created May 9, 2018 12:29
Outputs all of the URLs of pages on a given site
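#!/bin/bash
#
# spider
#
# Author: Rob Miller <rob@bigfish.co.uk>
#
# Outputs the URLs of all of the HTML pages on a given domain.
#
# Usage: spider <url>

# Crawl recursively, waiting one second between requests and keeping no files
# on disk. wget writes its log to stderr, so merge it into stdout for filtering.
wget -r -nd --delete-after -w 1 "$1" 2>&1 |
grep -B3 text/html |                        # log entries for HTML responses
grep -B2 '200 OK' | grep -E 'https?://' |   # successful requests, URL lines only
cut -d' ' -f3- |                            # drop the leading timestamp fields
sort | uniq                                 # de-duplicate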
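
To see what the filter chain does without crawling a live site, the sketch below runs the same greps over a small canned log. The function names `extract_html_urls` and `sample_log` are hypothetical, and the log text is a made-up sample written to resemble wget's output on a reused connection; real wget output can differ by version and locale, so treat this as an illustration rather than a specification of the log format.

```shell
#!/bin/bash

# Hypothetical helper: the script's filter chain, reading a wget log on stdin.
extract_html_urls() {
  grep -B3 'text/html' |      # keep each HTML "Length:" line plus 3 lines before it
  grep -B2 '200 OK' |         # keep each 200 response plus 2 lines before it
  grep -E 'https?://' |       # keep only the lines that contain a URL
  cut -d' ' -f3- |            # drop the two timestamp fields
  sort | uniq
}

# Made-up sample log (an assumption, not captured from a real crawl).
sample_log() {
  cat <<'EOF'
--2018-05-09 12:29:01--  https://example.com/about
Reusing existing connection to example.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]

--2018-05-09 12:29:02--  https://example.com/logo.png
Reusing existing connection to example.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 5120 (5.0K) [image/png]

--2018-05-09 12:29:03--  https://example.com/missing
Reusing existing connection to example.com:443.
HTTP request sent, awaiting response... 404 Not Found
EOF
}

sample_log | extract_html_urls
```

On this sample only the `/about` URL should survive: the PNG is dropped by the `text/html` filter, and the 404 never reaches the `200 OK` filter because it has no `text/html` line.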