@mikaelvesavuori
Created February 22, 2018 20:13
Using wget to scrape a site

You can easily scrape (or download) a site with a CLI tool called wget. It's available for Linux, Mac and Windows.

Installation

I recommend using Homebrew, especially if you're on a Mac, to install it.

brew install wget

Scraping a site

wget https://{SITENAME}.{TLD}, for example wget https://www.google.com

To mirror the main entry page with a recursion depth of 1, including everything you might need to run the site locally:

wget -m https://{SITENAME}.{TLD} -r -l1 --no-parent --page-requisites --convert-links
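If you run this often, the command above can be wrapped in a small shell function. This is only a sketch: `mirror_site` and the `DRY_RUN` variable are made-up names for illustration, not part of wget. The long option names are the spelled-out forms of the short flags (`--mirror` for `-m`, `--level=1` for `-l1`). The separate `-r` is left out because, per the wget manual, `-m` already turns recursion on.

```shell
# Sketch of a reusable wrapper around the mirror command.
# mirror_site and DRY_RUN are hypothetical names, not wget features.
#
#   --mirror            recursion plus timestamping (infinite depth by default)
#   --level=1           limit recursion to one level; given after --mirror so it
#                       replaces the depth that --mirror sets
#   --no-parent         never ascend above the starting directory
#   --page-requisites   also fetch images, CSS and scripts needed to render pages
#   --convert-links     rewrite links in the downloaded files for local viewing
mirror_site() {
  url="${1:-}"
  if [ -z "$url" ]; then
    echo "usage: mirror_site URL" >&2
    return 1
  fi

  # Build the command as the function's positional parameters.
  set -- wget --mirror --level=1 --no-parent --page-requisites --convert-links "$url"

  if [ -n "${DRY_RUN:-}" ]; then
    # Print the command instead of running it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Run `DRY_RUN=1 mirror_site https://{SITENAME}.{TLD}` to print the command without downloading anything, or drop `DRY_RUN=1` to actually fetch the site.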