Created May 9, 2018 12:29
Outputs all of the URLs of pages on a given site
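#!/bin/bash
#
# spider
#
# Author: Rob Miller <rob@bigfish.co.uk>
#
# Outputs the URLs of all HTML pages on a given domain.
#
# Usage: spider <url>

# Crawl recursively (-r), waiting one second between requests (-w 1), without
# recreating the directory tree (-nd) and deleting each file once fetched
# (--delete-after). wget logs to stderr, hence 2>&1. The filters keep
# text/html responses with a 200 OK status, pick out the URL line of each,
# drop the first two space-separated fields (the timestamp), and de-duplicate.
wget -r -nd --delete-after -w 1 "$1" 2>&1 |
    grep -B3 text/html |
    grep -B2 '200 OK' | grep -E 'https?://' |
    cut -d' ' -f3- |
    sort -u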
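
The filter stages only read wget's log text, so they can also be run over a saved log without crawling the site again. The following is a minimal sketch of that idea, not part of the original script: the function name `extract_html_urls`, the file name `crawl.log`, and the `example.com` URL are hypothetical placeholders, and the filters are assumed to behave on a saved log exactly as they do on the live pipe.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): applies the same filter
# stages as the script above to a wget log read from standard input.
extract_html_urls() {
    grep -B3 text/html |
        grep -B2 '200 OK' |
        grep -E 'https?://' |
        cut -d' ' -f3- |
        sort -u
}

# Example (commented out so the sketch makes no network requests):
# save the crawl log once, then filter it offline as often as needed.
#   wget -r -nd --delete-after -w 1 "https://example.com/" > crawl.log 2>&1
#   extract_html_urls < crawl.log
```

Keeping the log around means a change to the filters does not cost another full crawl with its one-second delay per request.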