@glickmac
Created December 17, 2019 20:05
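import requests
from bs4 import BeautifulSoup

# Download the plain-text file from Project Gutenberg.
url = 'http://www.gutenberg.org/files/501/501-0.txt'
res = requests.get(url)
res.raise_for_status()
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')
# get_text() returns the decoded text directly; calling str() on the
# find_all() list would give a list repr with escaped newlines and quotes.
text = soup.get_text()
# Flatten line breaks, drop the underscores used for emphasis, and lowercase.
text = text.replace("\n", " ").replace("\r", " ").replace("_", "").lower()
# Keep only the story body between the opening and closing markers.
text = text.split("the first chapter")[1].split("illustration: the end")[0]
with open("../data/Doctor_Dolittle.txt", "w", encoding="utf-8") as f:
    f.write(text)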
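
The chained `split` calls raise a bare `IndexError` if the start marker is missing, and silently keep everything to the end of the file if the end marker is missing. A minimal sketch of a stricter variant follows; the helper name `extract_between` and the sample string are hypothetical and not part of the original gist.

```python
def extract_between(text: str, start: str, end: str) -> str:
    """Return the part of text after the first `start` and before the next `end`.

    Hypothetical helper for illustration; not part of the original gist.
    """
    # str.partition never raises; the middle element is "" when the marker is absent.
    _, found, rest = text.partition(start)
    if not found:
        raise ValueError(f"start marker not found: {start!r}")
    body, found, _ = rest.partition(end)
    if not found:
        raise ValueError(f"end marker not found: {end!r}")
    return body


# Made-up sample text standing in for the downloaded book.
sample = "preface the first chapter once upon a time illustration: the end license"
print(extract_between(sample, "the first chapter", "illustration: the end"))
```

Unlike `split(...)[1]`, `partition` keeps everything after the first occurrence of the start marker even if that marker appears again later, so the two approaches can differ on text where the marker repeats.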