@robmiller
Last active September 26, 2017 15:42
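Find all unique URLs on a website, sort them, and output them as nginx rewrite rules. Each rule permanently redirects an old path to /; edit the targets to point at the new locations as needed. Useful when replacing an old website, to make sure there are as few 404s as possible.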
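# Change www.example.com to the host of the site you are replacing.
# Crawl the site recursively, discard the downloads, and read wget's log from stderr.
wget -r -nd --delete-after -w 0.5 'http://www.example.com/' 2>&1 |
# Keep log entries for HTML pages that returned 200 OK, then only their URL lines.
grep -B3 text/html |
grep -B2 '200 OK' | grep -E 'https?://' |
# Drop the timestamp fields, leaving the URL, and de-duplicate.
cut -d' ' -f3- |
sort -u |
# Strip scheme and host, escape regex metacharacters such as ".", and emit one rule per path.
ruby -ne 'puts "rewrite ^#{Regexp.escape($_.strip.sub(%r{^https?://[^/]+/}, "/"))}$ / permanent;"'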
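
To check the rule format without crawling anything, the final stage can be tried on a few hand-written URLs. The following is a minimal sketch, not part of the original gist: it swaps the Ruby one-liner for sed so it runs without Ruby, the function name urls_to_rewrites is hypothetical, and it escapes only literal dots rather than every regex metacharacter.

```shell
# Hypothetical helper (not in the original gist): read absolute URLs on stdin,
# write one nginx rewrite rule per unique path on stdout.
urls_to_rewrites() {
  # Drop scheme and host; a bare host with no path becomes the site root "/".
  sed -E -e 's#^https?://[^/]+##' -e 's#^$#/#' |
    sort -u |
    # Escape literal dots only (an assumption of this sketch), then wrap each path in a rule.
    sed -E -e 's#\.#\\.#g' -e 's#^(.*)$#rewrite ^\1$ / permanent;#'
}

# Sample input: the duplicate URL should produce a single rule.
printf '%s\n' \
  'http://www.example.com/about.html' \
  'https://www.example.com/contact/' \
  'http://www.example.com/about.html' |
  urls_to_rewrites
# Prints:
# rewrite ^/about\.html$ / permanent;
# rewrite ^/contact/$ / permanent;
```

Paths containing other regex metacharacters (for example "+" or "(") would need further escaping before the rules are pasted into an nginx server block.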