@MarcScott
Created March 3, 2017 14:13
import requests
import string
import re
from collections import Counter
cnt = Counter()
pp = requests.get('http://www.gutenberg.org/files/1342/1342-0.txt').text
ss = requests.get('http://www.gutenberg.org/cache/epub/161/pg161.txt').text
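def only_letters(book):
    # Lowercase so that "The" and "the" count as the same word
    book = book.lower()
    # Replace runs of non-ASCII characters with a space
    book = re.sub(r'[^\x00-\x7F]+', ' ', book)
    # Replace punctuation with a space; re.escape keeps ], \, ^ and - literal inside the class
    book = re.sub('[' + re.escape(string.punctuation) + ']', ' ', book)
    # Split on whitespace into a list of words
    book = book.split()
    return book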
book_dict = Counter(only_letters(pp))
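
The script stops once the word counts are built. As a rough illustration of how such a Counter might be inspected, here is a minimal offline sketch. It redefines `only_letters` so that it runs on its own, and it uses a short sample sentence in place of the downloaded book. The names `sample`, `word_counts` and `top_three` are hypothetical and not part of the original gist.

```python
import re
import string
from collections import Counter


def only_letters(book):
    """Lowercase the text, replace non-ASCII and punctuation with spaces, split into words."""
    book = book.lower()
    book = re.sub(r'[^\x00-\x7F]+', ' ', book)
    book = re.sub('[' + re.escape(string.punctuation) + ']', ' ', book)
    return book.split()


# Hypothetical stand-in for the downloaded text, so the sketch runs without a network.
sample = ("It is a truth universally acknowledged, that a single man in "
          "possession of a good fortune, must be in want of a wife.")

# Counter maps each word to the number of times it occurs.
word_counts = Counter(only_letters(sample))

# most_common(n) returns the n most frequent words as (word, count) pairs.
top_three = word_counts.most_common(3)
print(top_three)
```

The same `most_common` call should work on `book_dict` once the full text has been downloaded, though the result will then depend on the Project Gutenberg header and licence text included in the file.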