@tomasonjo
Created August 30, 2019 17:11
# https://www.gutenberg.org/ebooks/95 The Prisoner of Zenda
# Fetch the data
import urllib.request
target_url = 'https://www.gutenberg.org/files/95/95-0.txt'
# Read the whole book and close the HTTP response afterwards
with urllib.request.urlopen(target_url) as data:
    raw_data = data.read().decode('utf8').strip()
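# Preprocess text into chapters
import re
# Replace everything except letters, digits, spaces and hyphens with a space
# (the range is spelled 'A-Za-z' because 'A-z' also covers [ \ ] ^ _ and `),
# then split on the chapter headings and drop the front matter before the first one
chapters = re.sub(r'[^A-Za-z0-9 -]', ' ', raw_data).split('CHAPTER')[1:]
# Cut the Project Gutenberg footer off the last chapter
chapters[-1] = chapters[-1].split('End of the Project Gutenberg EBook')[0]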
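
A minimal offline sketch of the preprocessing step, run on a made-up string instead of the downloaded book so it needs no network access. The `sample` text and the `split_chapters` helper are hypothetical names introduced here for illustration and are not part of the original gist. It also shows how the character range affects cleaning: in ASCII, `A-z` spans the punctuation that sits between `Z` and `a`, while `A-Za-z` covers letters only.

```python
import re

# Hypothetical miniature text standing in for the downloaded book.
sample = (
    "Front matter CHAPTER 1 It was_ a [fine] day. "
    "CHAPTER 2 The end! End of the Project Gutenberg EBook licence"
)

def split_chapters(text):
    """Clean the text and split it into chapters (illustrative helper)."""
    # Keep only letters, digits, spaces and hyphens.
    cleaned = re.sub(r'[^A-Za-z0-9 -]', ' ', text)
    # Everything before the first 'CHAPTER' is front matter.
    parts = cleaned.split('CHAPTER')[1:]
    # Trim the Project Gutenberg footer from the last chapter.
    parts[-1] = parts[-1].split('End of the Project Gutenberg EBook')[0]
    return parts

chapters = split_chapters(sample)

# Compare the two character ranges on a short string.
loose = re.sub(r'[^A-z0-9 -]', ' ', 'a_b[c]')      # '_', '[' and ']' survive
strict = re.sub(r'[^A-Za-z0-9 -]', ' ', 'a_b[c]')  # they become spaces

print(len(chapters), [c.split() for c in chapters])
print(repr(loose), repr(strict))
```

Each entry in `chapters` still begins with the chapter number, so a later step can strip or parse it as needed.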