### Jupyter notebook for this code available at:
### https://github.com/isaw-ga-3024/isaw-ga-3024.github.io/blob/master/burns-patrick-diyclassics/notebooks/Omeka-XML-Parse.ipynb
omeka = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://dcaa.hosting.nyu.edu/cms/admin/items/show/2">
<dc:title>Changing the Center of Gravity</dc:title>
<dc:subject>digital humanities</dc:subject>
<dc:creator>Terras, Melissa</dc:creator>
<dc:creator>Crane, Gregory</dc:creator>
Patrick J. Burns, PhD
Classical Language Toolkit
Google Summer of Code 2016 Final Report
Here is a summary of the work I completed for the 2016 Google Summer of Code project "CLTK Latin/Greek Backoff Lemmatizer" for the Classical Language Toolkit (cltk.org). The code can be found at https://github.com/diyclassics/cltk/tree/lemmatize/cltk/lemmatize.
- Wrote custom lemmatizers for Latin and Greek as subclasses of NLTK's tag module (http://www.nltk.org/api/nltk.tag.html), including:
  - Default lemmatization, i.e. same lemma returned for every token
  - Identity lemmatization, i.e. original token returned as lemma
  - Model lemmatization, i.e. lemma returned based on dictionary lookup
  - Context lemmatization, i.e. lemma returned based on proximal token/lemma tuples in training data
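The four strategies listed above can be chained so that each lemmatizer hands a token to the next one when it has no answer. The following is a minimal sketch of that backoff idea in plain Python, not the CLTK or NLTK implementation: every class name, the `choose_lemma` method, and the toy lookup tables are assumptions made for illustration.

```python
# Hypothetical sketch of a backoff chain; names and data are illustrative only.

class BackoffLemmatizer:
    """Try choose_lemma(); if it returns None, defer to the backoff lemmatizer."""

    def __init__(self, backoff=None):
        self.backoff = backoff

    def choose_lemma(self, tokens, index, history):
        raise NotImplementedError

    def lemmatize_one(self, tokens, index, history):
        lemma = self.choose_lemma(tokens, index, history)
        if lemma is None and self.backoff is not None:
            lemma = self.backoff.lemmatize_one(tokens, index, history)
        return lemma

    def lemmatize(self, tokens):
        history = []  # lemmas chosen so far, one per token already processed
        for index in range(len(tokens)):
            history.append(self.lemmatize_one(tokens, index, history))
        return list(zip(tokens, history))


class DefaultLemmatizer(BackoffLemmatizer):
    """Return the same lemma for every token."""

    def __init__(self, lemma):
        super().__init__(backoff=None)
        self.lemma = lemma

    def choose_lemma(self, tokens, index, history):
        return self.lemma


class IdentityLemmatizer(BackoffLemmatizer):
    """Return the original token as its lemma."""

    def choose_lemma(self, tokens, index, history):
        return tokens[index]


class ModelLemmatizer(BackoffLemmatizer):
    """Return a lemma from a token-to-lemma dictionary, or None if absent."""

    def __init__(self, model, backoff=None):
        super().__init__(backoff)
        self.model = model

    def choose_lemma(self, tokens, index, history):
        return self.model.get(tokens[index])


class ContextLemmatizer(BackoffLemmatizer):
    """Return a lemma keyed on (previous lemma, current token), or None."""

    def __init__(self, contexts, backoff=None):
        super().__init__(backoff)
        self.contexts = contexts

    def choose_lemma(self, tokens, index, history):
        previous = history[index - 1] if index > 0 else None
        return self.contexts.get((previous, tokens[index]))


# Toy data: "cano" is ambiguous, so the context table overrides the dictionary.
identity = IdentityLemmatizer()
model = ModelLemmatizer({"arma": "arma", "virum": "vir", "cano": "canus"}, backoff=identity)
context = ContextLemmatizer({("vir", "cano"): "cano"}, backoff=model)

result = context.lemmatize(["arma", "virum", "cano", "troiae"])
print(result)
```

In this arrangement the most specific strategy sits at the front of the chain and the identity lemmatizer guarantees that every token receives some lemma; a default lemmatizer could serve the same fallback role.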
# Script for extracting attributes (like word, lemma, form, etc.)
# from the XML files in the Perseus Latin Dependency Treebank 2.1
# available here:
# https://github.com/PerseusDL/treebank_data/tree/master/v2.1/Latin
#
# Returns tuples that can be used for testing the new version of
# the CLTK lemmatizer
#
# Use: from the command line, call script with filename:
# $ python xml_parse_perseus_nlp.py phi0448.phi001.perseus-lat1.tb.xml
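The extraction step described in this header can be sketched with the standard library alone. This is a hedged illustration rather than the script itself: the function name `extract_word_attributes` is hypothetical, the sample XML is hand-written rather than copied from the treebank, and it assumes each token is a `word` element carrying `form` and `lemma` attributes.

```python
import xml.etree.ElementTree as ET


def extract_word_attributes(xml_source, attributes=("form", "lemma")):
    """Return one tuple per <word> element holding the requested attributes.

    Attributes missing from an element come back as None.
    """
    root = ET.fromstring(xml_source)
    return [
        tuple(word.get(name) for name in attributes)
        for word in root.iter("word")
    ]


# Hand-written sample in the assumed layout (not real treebank data).
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="Gallia" lemma="Gallia"/>
    <word id="2" form="est" lemma="sum"/>
    <word id="3" form="omnis" lemma="omnis"/>
  </sentence>
</treebank>"""

pairs = extract_word_attributes(SAMPLE)
print(pairs)
```

Tuples of this shape pair each surface form with its gold lemma, which is what a lemmatizer's output can be compared against during testing.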
from cltk.corpus.latin import latinlibrary
from cltk.tokenize.word import WordTokenizer
# Latin word tokenizer
tokenizer = WordTokenizer('latin')
# Raw text of the Latin Library corpus; preview the first 500 characters
ll_raw = latinlibrary.raw()
print(ll_raw[:500])
# The corpus as word tokens; preview the first 100
ll_words = latinlibrary.words()
print(ll_words[:100])
Zen Pythonis
a T. Peters imprimis Anglice scriptum
redditumque Latine a Patricio Ios. Burns:
– Formosum deformi praefertur.
– Directum obliquo praefertur.
– Simplex multiplici praefertur.
– Multiplex contorto praefertur.
– Planum implicato praefertur.
– Rarum denso praefertur.
#!/usr/local/bin/python
# -*- coding: utf-8 -*-
# alliteration_python_sample.py
"""
Workflow example for
Distant Reading Alliteration in Latin Poetry
Patrick J. Burns
Fordham University, Department of Classics
Word, Space, Time: Digital Perspectives on the Classical World