@pe3
Last active March 9, 2024 17:42
Scrape An Entire Website with wget
This worked very nicely for a single-page site:
```
wget \
--recursive \
--page-requisites \
--convert-links \
[website]
```
wget options
```
wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains website.org \
--no-parent \
www.website.org/tutorials/html/
```
- `--recursive`: follow links and download the entire website.
- `--domains website.org`: don't follow links outside website.org.
- `--no-parent`: don't follow links outside the directory tutorials/html/.
- `--page-requisites`: get all the elements that compose the page (images, CSS and so on).
- `--html-extension`: save files with the .html extension (newer wget versions call this option `--adjust-extension`).
- `--convert-links`: convert links so that they work locally, offline.
- `--restrict-file-names=windows`: modify filenames so that they also work on Windows.
- `--no-clobber`: don't overwrite any existing files (useful when the download is interrupted and resumed).
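To make the relationship between the start URL and the `--domains` value explicit, here is a minimal sketch of a wrapper that only prints the command instead of running it. The function name `mirror_cmd` and the `example.org` URL are hypothetical, and deriving the domain by stripping the scheme, the path, and a leading `www.` is an assumption that will not suit every site.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): print the wget command that
# would mirror the given URL with the options listed above.
mirror_cmd() {
    url=$1
    host=${url#*://}      # drop a leading scheme such as https://
    host=${host%%/*}      # drop everything after the host name
    domain=${host#www.}   # assumption: the site's domain is the host without "www."
    printf '%s\n' "wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains $domain --no-parent $url"
}

# Print the command for review; nothing is downloaded here.
mirror_cmd "https://www.example.org/tutorials/html/"
```

Printing first makes it easy to check that the `--domains` value matches the host in the URL before copying the command into a terminal.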
There is also [node-wget](https://github.com/wuchengwei/node-wget).
kai-r commented Oct 11, 2023

Hi;
What would be the right syntax for using this script?

pe3 (author) commented Oct 11, 2023

This looks like just initial notes on wget options. Not a script.
