@jamescalam
Last active April 4, 2020 17:19
Code snippet for part of the data cleansing process for the Meditations data import.
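import re
import requests

# Download the plain-text Meditations (George Long translation) from the MIT Classics archive
response = requests.get('http://classics.mit.edu/Antoninus/meditations.mb.txt')
response.raise_for_status()  # fail early on an HTTP error instead of cleaning an error page
data = response.text

# Keep only the text between the translator credit and the closing "THE END" marker.
# Note: replace("-", "") strips every hyphen, including any inside hyphenated words.
data = data.split("Translated by George Long")[1].replace("-", "").split("THE END")[0]
# Remove book heading lines such as "BOOK ONE"
data = re.sub(r"BOOK [A-Z]+\n", "", data)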
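
The cleaning steps above can also be tried offline. The following is a minimal sketch, not part of the original gist: the helper name `clean_meditations` is hypothetical, and the `sample` string is made up to mimic the assumed layout of the source file (a header, the translator credit, dashed separators, book headings, and a closing "THE END" marker).

```python
import re


def clean_meditations(raw: str) -> str:
    """Apply the gist's cleaning steps to already-downloaded text (hypothetical helper)."""
    # Keep the text between the translator credit and the closing "THE END" marker
    body = raw.split("Translated by George Long")[1].split("THE END")[0]
    # Strip all hyphens, as the gist does (this also removes hyphens inside words)
    body = body.replace("-", "")
    # Drop book heading lines such as "BOOK ONE"
    return re.sub(r"BOOK [A-Z]+\n", "", body)


# Made-up sample that mimics the assumed layout of the source file; not the real text
sample = (
    "The Meditations\nBy Marcus Aurelius\n\nTranslated by George Long\n"
    "-----\nBOOK ONE\n\nFirst passage.\n-----\nBOOK TWO\n\nSecond passage.\n"
    "THE END\nfooter text\n"
)

cleaned = clean_meditations(sample)
print(repr(cleaned))
```

Wrapping the steps in a function keeps the download separate from the cleaning, so the cleaning logic can be checked without a network request.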