
@suzannealdrich
Last active December 11, 2023 15:12
wget spider cache warmer
wget --spider -o wget.log -e robots=off -r -l 5 -p -S --header="X-Bypass-Cache: 1" --limit-rate=124k www.example.com
# Options explained
# --spider: Behave like a web crawler; request pages but don't save them
# -o wget.log: Write log output to wget.log instead of the terminal
# -e robots=off: Ignore robots.txt
# -r: Recursive download (follow links)
# -l 5: Depth to search, i.e. 1 means 'crawl the homepage', 2 means 'crawl the homepage and all pages it links to', and so on
# -p: Get all images, CSS, etc. needed to display each HTML page
# -S: Print server response headers (useful for checking cache hit/miss headers)
# --limit-rate=124k: Throttle the transfer rate to make sure we're crawling and not DoS'ing the site
# www.example.com: URL to start crawling
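The one-liner above can be wrapped in a small function so the target site is a parameter and the command can be inspected before it runs. This is a minimal sketch; the `warm_cache` name and the `DRY_RUN` switch are illustrative additions, not part of the original gist.

```shell
# warm_cache: run the gist's wget spider command against a given site.
# Set DRY_RUN=1 to print the command instead of executing it.
warm_cache() {
  local site="$1"
  # Same flags as the one-liner: crawl 5 levels deep, log to wget.log,
  # bypass the cache via a custom header, and throttle to 124 KB/s.
  local cmd=(wget --spider -o wget.log -e robots=off -r -l 5 -p -S
             --header="X-Bypass-Cache: 1" --limit-rate=124k "$site")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example: preview the command without hitting the network
DRY_RUN=1 warm_cache www.example.com
```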

wuxmedia commented Feb 11, 2019

Perfect, thanks!
I added --limit-rate=124k just so the server wouldn't get too hot.

@suzannealdrich (Author)

Thanks @wuxmedia
