@OsmanMutlu
Created December 3, 2018 10:44
Small functions
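import re

def url_to_filename(row):  # Takes a pandas row with a `url` field
    url = row.url
    # Flatten the URL into a filesystem-safe name
    url = re.sub(r"://", "__", url)
    url = re.sub(r"/", "_", url)
    # Swap a known trailing extension (if any) for .folia.xml; count=1 keeps
    # Python 3.7+ from also replacing the empty match at the end of the string
    row.url = re.sub(r"\.?(cms|html|ece|ece1)?$", ".folia.xml", url, count=1)
    return row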
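def getwordtext(entity):  # Takes a FoLiA span annotation (e.g. an entity)
    annot = ""
    for word in entity.wrefs():
        annot = annot + word.text()
        # Honor the token's spacing flag so the text is rebuilt faithfully
        if word.space:
            annot = annot + " "
    # Drop the single trailing space left by the last word
    return re.sub(r" $", r"", annot)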
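A quick way to try both helpers without pandas or a FoLiA document is to feed them small stand-in objects. The sketch below is only illustrative: `FakeWord` and `FakeEntity` are hypothetical classes that mimic the `wrefs()`, `text()` and `space` members the gist relies on, `SimpleNamespace` stands in for a pandas row, and the example URL is made up. The functions are repeated so the snippet runs on its own.

```python
import re
from types import SimpleNamespace


def url_to_filename(row):
    # Same logic as the gist; count=1 so the suffix is appended only once.
    url = re.sub(r"://", "__", row.url)
    url = re.sub(r"/", "_", url)
    row.url = re.sub(r"\.?(cms|html|ece|ece1)?$", ".folia.xml", url, count=1)
    return row


def getwordtext(entity):
    annot = ""
    for word in entity.wrefs():
        annot += word.text()
        if word.space:
            annot += " "
    return re.sub(r" $", "", annot)


class FakeWord:
    """Hypothetical stand-in for a FoLiA word (text() method, space flag)."""

    def __init__(self, text, space=True):
        self._text = text
        self.space = space

    def text(self):
        return self._text


class FakeEntity:
    """Hypothetical stand-in for a FoLiA entity exposing wrefs()."""

    def __init__(self, words):
        self._words = words

    def wrefs(self):
        return self._words


# SimpleNamespace plays the role of a pandas row with a `url` attribute.
row = SimpleNamespace(url="https://example.com/news/story.html")
filename = url_to_filename(row).url
print(filename)

# "Delhi" has space=False, so the comma attaches directly to it.
entity = FakeEntity([FakeWord("New"), FakeWord("Delhi", space=False), FakeWord(",")])
span_text = getwordtext(entity)
print(span_text)
```

With a real DataFrame, the first helper would presumably be applied row-wise, for example via `df.apply(url_to_filename, axis=1)`; that call is an assumption about the intended use, not something the gist states.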