@hannahwhy
Created January 14, 2012 09:47
Lamar Smith
#!/usr/bin/env bash
# Recursively mirror texansforlamarsmith.com into a WARC archive using a WARC-enabled wget build.
set -x
DEST=data
NOW=$(date '+%s')  # Unix timestamp, so each run gets its own output file names
WGET_WARC=./wget-warc
USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.1.17) Gecko/20110123 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070225 lolifox/0.32"
BASE="$DEST/texansforlamarsmith-${NOW}"
mkdir -p tmp
mkdir -p "$DEST"
$WGET_WARC \
-U "${USER_AGENT}" \
-e "robots=off" \
-nv \
-r \
-l inf \
-o "$BASE.log" \
--directory-prefix="tmp" \
--warc-file="$BASE" \
--no-remove-listing \
--warc-header="operator: Archive Team" \
--no-timestamping \
--page-requisites --trust-server-names \
"http://www.texansforlamarsmith.com/"