wget usage manual from the notes

wget: Shortcuts to excellent downloads at your fingertips

Install wget first on your Linux distribution, then proceed to usage.

Download a Single File:

Let’s start with something simple. Copy the URL for a file you’d like to download in your browser.

Now head back to the Terminal and type wget followed by the pasted URL. The file will download, and you’ll see progress in real-time as it does.

Note that the file will download to your Terminal's current working directory, so you may want to change to the target directory first (via cd) or tell wget where to save the file from the same session (as shown below).

To download it elsewhere, you can either use the prefix option (-P) to fetch files into a given directory or the output option (-O) to fetch a single file to a given path, as illustrated:

-O specifies the exact path and file name to save the download to:

wget <url> -O /full/path/to/folder/file.ext

-P sets the directory prefix; wget will save the file into that directory under its original name:

wget <url> -P /full/path/to/directory

Note that wget can also take a list of URLs in a text file and download each one with the options passed to it:

wget -i list.txt 
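For example, list.txt might simply contain one URL per line (these URLs are placeholders):

http://www.url1.com/files/file1.pdf
http://www.url2.com/files/file2.txt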

Notes:

If you want wget to fetch files and save them under the proper file names when the server redirects, read the notes below:

By default, wget writes to a file whose name is the last component of the URL that you pass to it. Many servers redirect URLs like http://www.url1.com/app?q=123&gibb=erish&gar=ble to a different URL with a nice-looking file name like http://download.url1.com/files/something.pdf. You can tell wget to use the name from the redirected URL (i.e. something.pdf) instead of app?q=123&gibb=erish&gar=ble by passing the --trust-server-names option. This isn't the default mode because, if used carelessly, it could lead to overwriting an unpredictable file name in the current directory; but if you trust the server or are working in a directory containing no other precious files, --trust-server-names is usually the right thing to use.

Some servers use a Content-Disposition header instead of redirection to specify a file name. Pass the --content-disposition option to make wget use this file name.

Thus:

wget --content-disposition --trust-server-names -i list_of_urls

If you still aren't getting nice-looking file names, you may want to specify your own. Suppose you have a file containing lines like:

http://www.url1.com/app?q=123&gibb=erish&gar=ble foo.pdf
http://www.url2.com/app?q=111&wha=tcha&mac=allit bar.txt

To make wget download the files to the specified file names, assuming there are no white-space characters in the URL or in the file names:

err=0
while read -r url filename tail; do   # "tail" absorbs any extra fields on the line
  wget -O "$filename" "$url" || err=1
done <list_of_urls_and_file_names

The err variable contains 0 if all downloads succeeded and 1 otherwise. You can return $err if you put this snippet in a function, or exit $err if you put it in a script.
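As a minimal sketch of the function case (download_all is a hypothetical name), the loop can be wrapped so the list file is passed as an argument and the combined status is returned:

download_all() {
  local err=0 url filename tail
  # read "url filename" pairs from the file given as $1
  while read -r url filename tail; do
    wget -O "$filename" "$url" || err=1
  done <"$1"
  return "$err"
}

download_all list_of_urls_and_file_names && echo "all downloads succeeded"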

If you don't want to specify anything other than the URLs, and you can't get nice names from the server, you can guess the file type and attempt to get at least meaningful extensions.

err=0
n=1
while read -r url; do
  if wget -O tmpfile "$url"; then
    ext=data
    # file -b --mime-type prints just the MIME type (e.g. application/pdf)
    case $(file -b --mime-type tmpfile) in
      application/pdf) ext=pdf;;
      image/jpeg) ext=jpg;;
      text/html) ext=html;;
      text/*) ext=txt;;
    esac
    mv tmpfile "$n.$ext"
  else
    err=1
  fi
  n=$((n+1))
done <list_of_urls

Add other types as desired. If your file command doesn't support MIME-type output (the -b --mime-type options used above), leave it out and check what file prints on your system for the file types you're interested in, adjusting the case patterns to match. If you have a file /etc/mime.types on your system, you can read associations of MIME types to extensions from it instead of supplying your own list:

err=0
n=1
while read -r url; do
  if wget -O tmpfile "$url"; then
    mime_type=$(file -b --mime-type tmpfile)
    # look up the first extension listed for this MIME type; fall back to "data"
    ext=$(awk -v mt="$mime_type" '$1 == mt && NF >= 2 { print $2; found=1; exit } END { if (!found) print "data" }' /etc/mime.types)
    mv tmpfile "$n.$ext"
  else
    err=1
  fi
  n=$((n+1))
done <list_of_urls
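For reference, lines in /etc/mime.types pair a MIME type with its file extensions; an illustrative excerpt (exact contents vary by system):

application/pdf                pdf
image/jpeg                     jpeg jpg jpe
text/html                      html htm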

Continue an Incomplete Download:

If, for whatever reason, you stopped a download before it could finish, don’t worry: wget can pick up right where it left off. Just use this command:

wget -c <url>

The key here is -c, which is an “option” in command line parlance. This particular option tells wget that you’d like to continue an existing download; run it from the directory containing the partial file, with the same URL as before.
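For example (the URL is a placeholder), re-running the same command with -c resumes from the partial file left in the current directory:

wget http://example.com/big.iso      # interrupted partway through
wget -c http://example.com/big.iso   # resumes where the partial big.iso left off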

Mirror an Entire Website:

If you want to download an entire website, wget can do the job.

wget -m http://example.com

By default, this will download everything on the site example.com, but you’re probably going to want to use a few more options for a usable mirror.

--convert-links changes links inside each downloaded page so that they point to each other, not to the live web. --page-requisites downloads things like style sheets, so pages will look correct offline. --no-parent stops wget from downloading parent directories, so if you want to download http://example.com/subexample, you won’t end up with the parent page. Combine these options to taste, and you’ll end up with a copy of any website that you can browse on your computer.
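Putting those together, a typical mirror invocation might look like this (the URL is a placeholder):

wget -m --convert-links --page-requisites --no-parent http://example.com/subexample/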

Note: Mirroring an entire website on the modern Internet is going to take up a massive amount of space, so limit this to small sites unless you have near-unlimited storage.

Download an Entire Directory:

If you’re browsing an FTP or web server and find an entire folder you’d like to download, just run:

wget -r http://example.com/folder

The -r in this case tells wget you want a recursive download. You can also include --no-parent if you want to avoid downloading folders and files above the current level.
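Combined, again with a placeholder URL:

wget -r --no-parent http://example.com/folder/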

Extras:

To learn more about what wget can do, type man wget in the terminal and read what comes up. You’ll learn a lot.

Having said that, here are a few other options I think are neat:

If you want your download to run in the background, just include the option -b; wget then logs progress to a file named wget-log in the current directory. If the download fails, -t 10 tells wget to retry up to 10 times; you can use whatever number you like. If you want to manage your bandwidth, the option --limit-rate=200k will cap your download speed at 200 KB/s. Change the number to change the rate.
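These options combine freely. For instance (hypothetical URL), the following starts a rate-limited background download with up to 10 retries, then follows its progress via the wget-log file that -b writes:

wget -b -t 10 --limit-rate=200k http://example.com/big.iso
tail -f wget-log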

Mask the User Agent and Make wget Look Like a Browser Using wget --user-agent:

Some websites refuse downloads after identifying that the user agent is not a browser. You can mask the user agent with the --user-agent option so wget presents itself as a browser, as shown below.

$ wget --user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008092416 Firefox/3.0.3" URL-TO-DOWNLOAD

Test a Download URL Using wget --spider:

When you are going to schedule a download, you should check whether it will succeed at the scheduled time. To do so, copy the command exactly from the schedule, then add the --spider option to check:

$ wget --spider DOWNLOAD-URL

If the given URL is correct, it will say:

$ wget --spider download-url
Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.

This indicates that the download should succeed at the scheduled time. But if you give a wrong URL, you will get the following error:

$ wget --spider download-url
Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response... 404 Not Found
Remote file does not exist -- broken link!!!

You can use the spider option in the following scenarios:

Check a URL before scheduling a download.
Monitor whether a website is available at certain intervals (see the cron sketch below).
Check the pages in your bookmarks and find out which ones still exist.
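As a minimal sketch of the monitoring case (the URL, schedule, and log path are placeholders), a crontab entry could run a spider check every 15 minutes and log failures:

*/15 * * * * wget --spider --quiet http://example.com/ || echo "$(date): example.com unreachable" >> $HOME/site-check.log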

Reject Certain File Types While Downloading Using wget --reject:

If you have found a useful website but don’t want to download its images, you can specify the following:

$ wget --reject=gif WEBSITE-TO-BE-DOWNLOADED
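--reject also accepts a comma-separated list of suffixes or patterns, so several image types can be skipped at once during a recursive download (placeholder URL):

$ wget -r --reject=gif,jpg,png http://example.com/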

Log Messages to a Log File Instead of stderr Using wget -o:

If you want the log to be written to a log file instead of the terminal, do:

$ wget -o download.log DOWNLOAD-URL

Quit Downloading When It Exceeds a Certain Size Using wget -Q:

When you want to stop a download once it exceeds a specified size, say 5 MB, you can use the following wget command line:

$ wget -Q5m -i FILE-WHICH-HAS-URLS

Note: This quota does not apply when you download a single URL; irrespective of the quota size, a single file will be downloaded in full. The quota takes effect only for recursive downloads or downloads from an input file.

Download Only Certain File Types Using wget -r -A:

You can use this in the following situations:

Download all images from a website
Download all videos from a website
Download all PDF files from a website

Run:

$ wget -r -A.pdf http://url-to-webpage-with-pdfs/

FTP Download With wget:

You can use wget to perform FTP downloads as shown below.

(a) Anonymous FTP download using wget:

$ wget ftp-url

(b) FTP download using wget with username and password authentication:

$ wget --ftp-user=USERNAME --ftp-password=PASSWORD DOWNLOAD-URL