@kputnam
Created June 30, 2016 21:02
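#!/bin/sh
# Crawl a site recursively with wget and record the traffic in a WARC archive.
# Usage: pass either a single URL or --input-file=one-url-per-line.txt;
# all arguments are forwarded to wget through "$@".
# Output: downloaded files go under ./files, WARC records (split at 1G) use the
# "archive" prefix with a CDX index, and skipped URLs are listed in reject.log.
# Notes: --regex-type=pcre requires a wget built with PCRE support, and
# --no-check-certificate disables TLS certificate verification.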
wget \
--tries=5 \
--server-response \
--save-headers \
--default-page=index.html \
--adjust-extension \
--timeout=10 \
--wait=2 \
--ignore-length \
--header 'Accept-Language: en-US,en;q=0.8' \
--user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36 OPR/36.0.2130.65' \
--max-redirect=4 \
--no-check-certificate \
--warc-file=archive \
--warc-cdx \
--warc-max-size=1G \
--recursive \
--level=50 \
--page-requisites \
--directory-prefix=files \
--reject=wmv,pdf,mp4,mp3,mov,avi,mpg,xlsx,xls,docx,doc,pptx,ppt,swf,flv,fla,exe,msi,wav,zip,rss,atom,xml \
--reject=WMV,PDF,MP4,MP3,MOV,AVI,MPG,XLSX,XLS,DOCX,DOC,PPTX,PPT,SWF,FLV,FLA,EXE,MSI,WAV,ZIP,RSS,ATOM,XML \
--regex-type=pcre \
--reject-regex='/(node|video|download|store)|\.(wmv|pdf|mp4|mp3|mov|avi|mpg|xlsx|xls|docx|doc|pptx|ppt|js|swf|flv|fla|exe|msi|wav|zip)' \
--ignore-tags=object,video,embed,audio,iframe \
--protocol-directories \
--inet4-only \
--retry-connrefused \
--rejected-log=reject.log \
"$@"