@diyclassics
Last active January 27, 2023 21:33
Example of how to fix JSON content in a CLTK Perseus file
import json

from cltkreaders.grc import GreekTesseraeCorpusReader

T = GreekTesseraeCorpusReader()
BOOK = 16
file = f"homer.iliad.part.{BOOK}.tess"
output = dict()


def get_line_number_from_citation(citation):
    # e.g. '<hom. il. 16.840>' -> '840'
    return citation.split(".")[-1].replace(">", "")


# Take the rows of the first document: a mapping of citation -> line text
doc_rows = next(T.doc_rows(file))
for citation, line in doc_rows.items():
    line_number = get_line_number_from_citation(citation)
    output[line_number] = line

# Key the lines by book number, derived from BOOK rather than hardcoded
book_output = {str(BOOK): output}

# ensure_ascii=False writes the Greek unescaped, so open the file as UTF-8
with open(f"iliad-{BOOK}.json", "w", encoding="utf-8") as f:
    json.dump(book_output, f, ensure_ascii=False)
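
For readers without the corpus installed, the following is a minimal sketch of the same transformation run on stand-in data. The `sample_rows` dictionary and its placeholder strings are assumptions made for illustration, not real corpus content. Only the citation format (`<hom. il. 16.840>`) comes from the script itself.

```python
import json


def get_line_number_from_citation(citation):
    # e.g. '<hom. il. 16.840>' -> '840'
    return citation.split(".")[-1].replace(">", "")


# Hypothetical stand-in for the citation -> line mapping read from the corpus;
# the values are placeholders, not actual text from the Iliad.
sample_rows = {
    "<hom. il. 16.1>": "placeholder text for line 1",
    "<hom. il. 16.2>": "placeholder text for line 2",
}

book = 16

# Re-key each line by its line number alone, then nest under the book number.
lines = {
    get_line_number_from_citation(citation): text
    for citation, text in sample_rows.items()
}
book_output = {str(book): lines}

# ensure_ascii=False keeps non-ASCII characters (such as Greek) readable.
serialized = json.dumps(book_output, ensure_ascii=False)
print(serialized)
```

The resulting structure is a two-level mapping, book number to line number to line text, with every key stored as a string.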