@pete-otaqui · Created October 21, 2016
download a website for offline browsing with wget
#!/bin/bash
wget -E -k -r -p -e robots=off https://some-site.com/docs/
#### Note the following arguments (a reusable wrapper follows below):
# -E (--adjust-extension) : append a ".html" suffix to downloaded HTML files
# -k (--convert-links) : rewrite internal links within downloaded files to point at the local copies
# -r (--recursive) : download recursively by following internal links found in pages
# -p (--page-requisites) : download everything a page needs to display, i.e. images, styles, scripts
# -e robots=off : ignore robots.txt (many sites use it to block crawlers, which would otherwise stop the download)
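#### A reusable wrapper, as a minimal sketch: the function name "mirror_site" and the
#### optional target-directory argument are illustrative, not part of the original gist.
#### -P (--directory-prefix) sets where wget saves the tree.
mirror_site() {
  local url="$1"          # page or section to mirror
  local dest="${2:-.}"    # optional target directory, defaults to the current one
  wget -E -k -r -p -e robots=off -P "$dest" "$url"
}
# Usage:
# mirror_site https://some-site.com/docs/ ./docs-mirror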
#### Other useful arguments (combined in the sketch below)
# --no-parent : don't ascend in the path hierarchy (useful for grabbing just a "/docs/" section)
# -A "/index.html,*.svg,*/docs/*" : comma-separated "accept list"; entries may be shell-style patterns
# -R "*.eot,*.woff,/archive" : comma-separated "reject list"; entries may be shell-style patterns
# -H (--span-hosts) : follow links onto other hosts; be careful you don't try to download the entire web
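#### A combined example, as a sketch only: the URL and the accept/reject patterns are
#### illustrative and should be adapted to the site being mirrored.
wget -E -k -r -p -e robots=off \
  --no-parent \
  -A "*.html,*.svg,*.css,*.js" \
  -R "*.eot,*.woff" \
  https://some-site.com/docs/
# Note: -H is deliberately omitted here so the crawl stays on some-site.com,
# and --no-parent keeps it from climbing out of /docs/.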