@azizur
Last active March 26, 2024 18:32
Creating a static copy of a dynamic website

The command line, in short…

wget -k -K -E -r -l 10 -p -N -F --restrict-file-names=windows -nH http://website.com/

…and the options explained

  • -k : convert the links in downloaded documents so that they work for local viewing
  • -K : keep the original version of each file (saved with a .orig suffix) next to the copy whose links wget converted
  • -E : add an .html extension to HTML files that don’t already have an htm(l) extension
  • -r : recursive… of course we want to make a recursive copy
  • -l 10 : the maximum recursion depth (wget’s default is 5). If you have a really big website you may need a higher number, but 10 levels should be enough.
  • -p : download all the files needed to display each page (css, js, images)
  • -N : turn on time-stamping, so a file is only downloaded again when the server copy is newer than the local one
  • -F : when input is read from a file (with -i), force it to be treated as an HTML file. It only matters together with -i, which the command above does not use.
  • -nH : by default, wget puts files in a directory named after the site’s hostname. This disables the creation of those hostname directories and puts everything in the current directory.
  • --restrict-file-names=windows : escape characters that are not valid in Windows file names. May be useful if you want to copy the files to a Windows PC.
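
For readability, the same command can be written with wget's long option names. This is only a sketch: static_copy is a hypothetical helper name, and the DRY_RUN switch is an addition here so the command can be inspected before anything is downloaded.

```shell
# Sketch: the one-liner above, spelled out with wget's long option names.
# static_copy is a hypothetical helper; DRY_RUN=1 prints the command
# instead of running it.
static_copy() {
  url=$1
  set -- wget \
    --convert-links \
    --backup-converted \
    --adjust-extension \
    --recursive \
    --level=10 \
    --page-requisites \
    --timestamping \
    --force-html \
    --restrict-file-names=windows \
    --no-host-directories \
    "$url"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: static_copy http://website.com/
```

Running it with DRY_RUN=1 first is a cheap way to check the options before starting a long recursive download.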

source: http://blog.jphoude.qc.ca/2007/10/16/creating-static-copy-of-a-dynamic-website/

azizur commented Dec 24, 2022

The website you are trying to copy does not use AuthType Basic for authentication, which is why it did not work.

As per the wget documentation:

--user=USER                 set both ftp and http user to USER
--password=PASS             set both ftp and http password to PASS
--ask-password              prompt for passwords
--use-askpass=COMMAND       specify credential handler for requesting
                             username and password.  If no COMMAND is
                             specified the WGET_ASKPASS or the SSH_ASKPASS
                             environment variable is used.

These parameters are for sites that use HTTP authentication such as AuthType Basic.
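
For a site that does use that kind of authentication, something like the following might work. It is a minimal sketch: copy_with_http_auth is a hypothetical helper name, the user name and URL are placeholders, and DRY_RUN is an addition for checking the command first.

```shell
# Sketch: recursive copy of a site protected by HTTP authentication.
# copy_with_http_auth is a hypothetical helper. --ask-password makes wget
# prompt for the password, which keeps it out of the shell history.
copy_with_http_auth() {
  user=$1
  url=$2
  set -- wget \
    --user="$user" \
    --ask-password \
    --recursive \
    --level=10 \
    --page-requisites \
    --convert-links \
    "$url"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: copy_with_http_auth alice http://website.com/private/
```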

If the site uses cookies, perhaps you can use the cookie options.

--load-cookies=FILE         load cookies from FILE before session
--save-cookies=FILE         save cookies to FILE after session
--keep-session-cookies      load and save session (non-permanent) cookies
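
A rough sketch of how these options could fit together for a form-based login: post the form once and save the cookies, then reuse them for the recursive copy. The login URL, the form field names, the cookies.txt file name and the helper names are all assumptions; they depend on the site.

```shell
# Sketch: log in through a form, then copy the site with the saved cookies.
# run_or_print and login_then_copy are hypothetical helpers;
# DRY_RUN=1 prints the commands instead of running them.
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

login_then_copy() {
  login_url=$1   # assumed: URL the login form posts to
  site_url=$2    # start page for the recursive copy
  form_data=$3   # assumed field names, e.g. 'user=alice&password=secret'

  # Step 1: submit the login form and keep the session cookies.
  run_or_print wget --save-cookies=cookies.txt --keep-session-cookies \
    --post-data="$form_data" --delete-after "$login_url" &&
  # Step 2: reuse those cookies while copying the site.
  run_or_print wget --load-cookies=cookies.txt --keep-session-cookies \
    --recursive --level=10 --page-requisites --convert-links "$site_url"
}

# Usage: login_then_copy http://website.com/login http://website.com/ 'user=alice&password=secret'
```

If the login cannot be scripted, another route is to export the cookies from a browser in the Netscape cookies.txt format and pass that file to --load-cookies.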

If the site uses JWT, you can use the header option.

--header=STRING             insert STRING among the headers
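
As a sketch, assuming the site expects the token in an "Authorization: Bearer" header (a common convention, but not guaranteed for every site), the header option could be used like this. copy_with_bearer_token is a hypothetical helper name and the token is a placeholder you would obtain from the site yourself.

```shell
# Sketch: send a JWT with every request of a recursive copy.
# Assumes the site reads the token from an "Authorization: Bearer" header.
# copy_with_bearer_token is a hypothetical helper; DRY_RUN=1 prints the
# command instead of running it.
copy_with_bearer_token() {
  token=$1
  url=$2
  set -- wget \
    --header="Authorization: Bearer $token" \
    --recursive \
    --level=10 \
    --page-requisites \
    --convert-links \
    "$url"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: copy_with_bearer_token "$TOKEN" http://website.com/
```

Tokens usually expire, so a long copy may start failing partway through if the token is short-lived.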

lamff commented Mar 1, 2024

great!

  • --no-check-certificate : may be useful if the site's certificate has expired (it turns off certificate validation)