Skip to content

Instantly share code, notes, and snippets.

View Asparagirl's full-sized avatar

Brooke Schreier Ganz Asparagirl

  • Mill Valley, California
View GitHub Profile
@Asparagirl
Asparagirl / gist:6202872
Last active March 28, 2022 20:28
Want to help Archive Team do a "panic grab" of a website, so that you can later upload it to the Internet Archive for inclusion in its WayBack Machine? Here's the code!

Want to grab a copy of your favorite website, using wget in the command line, and saving it in WARC format? Then this is the gist for you. Read on!

First, copy the following lines into a textfile, and edit them as needed. Then paste them into your command line and hit enter:

export USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
export DOMAIN_NAME_TO_SAVE="www.example.com"
export SPECIFIC_HOSTNAMES_TO_INCLUDE="example1.com,example2.com,images.example2.com"
export FILES_AND_PATHS_TO_EXCLUDE="/path/to/ignore"
export WARC_NAME="example.com-20130810-panicgrab"