@hubgit
Last active March 2, 2017 10:33
Fetch all HTML books from Project Gutenberg
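#!/bin/bash
# Harvest endpoint, limited to English-language HTML files.
URL='http://www.gutenberg.org/robot/harvest?filetypes[]=html&langs[]=en'
REFERER='http://www.gutenberg.org/wiki/Gutenberg:Information_About_Robot_Access_to_our_Pages'
USER_AGENT='Gutenberg Importer'
# Mirror recursively, following links to other hosts, pausing 10 seconds between requests.
wget --wait 10 --mirror --span-hosts --referer="$REFERER" --user-agent="$USER_AGENT" "$URL"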
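
The query string in the script selects a file type and a language. As a rough sketch, the same URL could be built from parameters; `harvest_url` is a hypothetical helper, not part of the original script, and whether the server accepts values other than `html` and `en` is an assumption here.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): build the harvest URL
# from a file type and a language code, following the pattern used above.
harvest_url() {
  local filetype="$1" lang="$2"
  printf 'http://www.gutenberg.org/robot/harvest?filetypes[]=%s&langs[]=%s\n' "$filetype" "$lang"
}

# Reproduces the URL hard-coded in the script.
harvest_url html en
```

The result can be passed to `wget` in place of `"$URL"`, for example `wget ... "$(harvest_url html en)"`.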
@srikar2097

This does not work anymore; the request is rejected with 403 Forbidden:

Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2017-03-02 10:33:10 ERROR 403: Forbidden.
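
A script could detect this failure by pulling the status code out of wget's log instead of waiting on a mirror that will never start. The following is only a sketch: `http_status` is a hypothetical helper, and it assumes the log keeps the `awaiting response... NNN` wording shown above.

```shell
#!/bin/bash
# Hypothetical helper: read wget log text on stdin and print the last
# three-digit HTTP status code that follows "awaiting response... ".
http_status() {
  sed -n 's/.*awaiting response\.\.\. \([0-9][0-9][0-9]\).*/\1/p' | tail -n 1
}

# Sample line copied from the log above.
log='HTTP request sent, awaiting response... 403 Forbidden'
status="$(printf '%s\n' "$log" | http_status)"

if [ "$status" = "403" ]; then
  echo "harvest request refused (HTTP 403)"
fi
```

wget writes its log to standard error, so in practice the input would come from something like `wget ... 2>&1 | http_status` or from a file written with `-o`.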
