# One liner
wget --recursive --page-requisites --adjust-extension --span-hosts --convert-links --restrict-file-names=windows --domains yoursite.com --no-parent yoursite.com
# Explained
# (Note: the inline comments below are for explanation only; a comment after each
# backslash prevents the line continuation, so copy the one-liner above to actually run it.)
wget \
     --recursive \ # Download the whole site.
     --page-requisites \ # Get all assets/elements (CSS/JS/images).
     --adjust-extension \ # Save files with .html on the end.
     --span-hosts \ # Include necessary assets from offsite as well.
     --convert-links \ # Update links to still work in the static version.
     --restrict-file-names=windows \ # Modify filenames to work in Windows as well.
     --domains yoursite.com \ # Do not follow links outside this domain.
     --no-parent \ # Don't follow links outside the directory you pass in.
     yoursite.com/whatever/path # The URL to download.
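As a concrete illustration (example.com and the /docs/ path are placeholders, not part of the original gist), the filled-in command looks like this; wget writes the mirror into a local directory named after the host:

# Usage sketch with placeholder domain and path:
wget --recursive --page-requisites --adjust-extension --span-hosts \
     --convert-links --restrict-file-names=windows \
     --domains example.com --no-parent https://example.com/docs/
# Result: a browsable copy under ./example.com/, plus sibling directories
# for any offsite hosts pulled in by --span-hosts.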
Hello, good afternoon. Please help, I still don't know how to use this to download an entire website.
This is just using wget, so look up how to use wget; there are tons of examples online. Either way, you need to make sure you have wget installed already (on CentOS/RHEL it comes from the system package manager). There are plenty of usage examples for downloading an entire site, with or without converting the links; the key flag is --mirror, which makes (among other things) the download recursive, so the long command above can be shortened. If you still insist on running this as a script, it is a Bash script, so first set it as executable and then run it; if it still won't run, add a Bash shebang as the first line. You also need to edit the script to specify the site you want to download. At that point you are really better off just using wget outright. (A sketch of these steps is shown below.)
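A minimal sketch of the steps described above, using standard commands rather than the exact ones from the comment (site.sh and example.com are placeholders):

# Install wget first (CentOS/RHEL; Debian/Ubuntu would use apt-get instead):
sudo yum install wget

# Download an entire site; --mirror implies recursion (and timestamping),
# which is why the long flag list can be shortened:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com/

# If running the gist as a script instead, make it executable and run it;
# if it still won't start, add #!/bin/bash as the first line of the script:
chmod +x site.sh
./site.sh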
Thanks for the tips. After I download the website, every time I open the file, it links back to its original website. Any idea how to solve this? Thanks!
Maybe you need --convert-links.
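Assuming --convert-links is indeed the missing piece, a sketch of re-running the download with link conversion (plus --adjust-extension, so pages saved without .html still resolve) looks like this; after the transfer finishes, wget rewrites the links in the saved pages to point at the local copies instead of the live site:

wget --recursive --page-requisites --convert-links --adjust-extension --no-parent https://example.com/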
Hi! If I am wrong you can virtually shoot me, but the no-parent option may have been hit by a typo: when I tried it with ----no-parent, wget did not recognize the option, but after some surgery I ended up with --no-parent and it worked. So if I am right, cool; if I am wrong, I am sorry. YS: polly4you
What if the website requires authorization of some sort? How do we specify some cookies to wget?
As quoted from the docs:
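In short, wget has standard options for cookies. A minimal sketch, assuming a cookies file exported from a browser and a placeholder login form (cookies.txt, the form field names, and example.com are illustrative, not from the gist):

# Reuse cookies already exported from a browser (Netscape cookies.txt format):
wget --load-cookies cookies.txt --recursive --page-requisites --convert-links \
     --adjust-extension --no-parent https://example.com/members/

# Or log in first and let wget capture the session cookie itself:
wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data 'user=USERNAME&password=PASSWORD' https://example.com/login
wget --load-cookies cookies.txt --recursive --no-parent https://example.com/members/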
Does anyone know how I'd go about downloading all of the GET requests a site makes? My wget mirror only gets the links found in the site's HTML into my folders, not the requests the site makes at runtime (probably from within the JS ...?). Should I use some kind of proxy and wget the links extracted from that?
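If you do go the proxy route, the second half is straightforward: wget can read a list of captured URLs from a file. A sketch, assuming the URLs have already been extracted into urls.txt by whatever capture tool you use (that step is outside wget itself):

# Fetch every URL listed (one per line) in urls.txt, recreating the host/path layout locally:
wget --input-file urls.txt --page-requisites --adjust-extension --force-directories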
sudo apt-get update
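If this is meant as the Debian/Ubuntu counterpart to the CentOS install note above, the usual follow-up (an assumption on my part) is:

sudo apt-get install wget   # installs wget on Debian/Ubuntu after the package index update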