diyclassics / omeka-xml-parse.py
Last active August 14, 2022 16:23
Parsing Dublin Core XML data exported from Omeka
### Jupyter notebook for this code available at:
### https://github.com/isaw-ga-3024/isaw-ga-3024.github.io/blob/master/burns-patrick-diyclassics/notebooks/Omeka-XML-Parse.ipynb
omeka = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://dcaa.hosting.nyu.edu/cms/admin/items/show/2">
<dc:title>Changing the Center of Gravity</dc:title>
<dc:subject>digital humanities</dc:subject>
<dc:creator>Terras, Melissa</dc:creator>
<dc:creator>Crane, Gregory</dc:creator>
</rdf:Description>
</rdf:RDF>"""  # record truncated in this preview; closed here so the snippet parses
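# A minimal sketch of parsing the record above with the standard library's
# xml.etree.ElementTree (my addition, not part of the gist; the author's full
# version is in the notebook linked above).
import xml.etree.ElementTree as ET

NS = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
      'dc': 'http://purl.org/dc/elements/1.1/'}

root = ET.fromstring(omeka)
for record in root.findall('rdf:Description', NS):
    title = record.findtext('dc:title', namespaces=NS)
    creators = [c.text for c in record.findall('dc:creator', NS)]
    print(title, creators)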
diyclassics / gsoc-summary.txt
Last active August 23, 2016 13:43
GSoC 2016 Summary
Patrick J. Burns, PhD
Classical Language Toolkit
Google Summer of Code 2016 Final Report
Here is a summary of the work I completed for the 2016 Google Summer of Code project "CLTK Latin/Greek Backoff Lemmatizer" for the Classical Language Toolkit (cltk.org). The code can be found at https://github.com/diyclassics/cltk/tree/lemmatize/cltk/lemmatize.
- Wrote custom lemmatizers for Latin and Greek as subclasses of the taggers in NLTK's tag module (http://www.nltk.org/api/nltk.tag.html), including (a minimal sketch of the backoff idea follows this list):
- Default lemmatization, i.e. same lemma returned for every token
- Identity lemmatization, i.e. original token returned as lemma
- Model lemmatization, i.e. lemma returned based on dictionary lookup
- Context lemmatization, i.e. lemma returned based on proximal token/lemma tuples in training data
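A minimal sketch of the backoff idea, written against NLTK's public tag API rather than the CLTK code itself (the class name, lemma dictionary, and 'UNK' default below are illustrative assumptions):

from nltk.tag import DefaultTagger, SequentialBackoffTagger

class DictLemmatizer(SequentialBackoffTagger):
    """Look each token up in a lemma dictionary; defer to the backoff if missing."""
    def __init__(self, model, backoff=None):
        SequentialBackoffTagger.__init__(self, backoff=backoff)
        self.model = model

    def choose_tag(self, tokens, index, history):
        # Returning None makes NLTK fall through to the next lemmatizer in the chain.
        return self.model.get(tokens[index])

# Chain: dictionary lookup first, then a default lemma for anything unknown.
lemmatizer = DictLemmatizer({'arma': 'arma', 'virumque': 'vir'},
                            backoff=DefaultTagger('UNK'))
print(lemmatizer.tag(['arma', 'virumque', 'cano']))
# [('arma', 'arma'), ('virumque', 'vir'), ('cano', 'UNK')]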
diyclassics / xml_parse_perseus_nlp.py
Created July 15, 2016 16:59
Script for extracting attributes from Perseus Latin Treebank XML files
# Script for extracting attributes (like word, lemma, form, etc.)
# from the XML files in the Perseus Latin Dependency Treebank 2.1
# available here:
# https://github.com/PerseusDL/treebank_data/tree/master/v2.1/Latin
#
# Returns tuples that can be used for testing the new version of
# the CLTK lemmatizer
#
# Usage: from the command line, call the script with a treebank filename:
# $ python xml_parse_perseus_nlp.py phi0448.phi001.perseus-lat1.tb.xml
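#
# A minimal sketch of the extraction step described above (not the gist's own
# code): treebank <word/> elements carry the token as @form and its lemma as @lemma.
import sys
import xml.etree.ElementTree as ET

def extract_pairs(path):
    """Return (form, lemma) tuples for every <word> element in a treebank file."""
    root = ET.parse(path).getroot()
    return [(w.get('form'), w.get('lemma')) for w in root.iter('word')]

if __name__ == '__main__':
    for form, lemma in extract_pairs(sys.argv[1]):
        print(form, lemma)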
diyclassics / gist:5f4e7ff7963e255dd44278577ffcbf6e
Last active March 22, 2018 15:31
ll-plaintextcorpus-demo
# Load the Latin Library as a CLTK plaintext corpus and set up a Latin word tokenizer.
from cltk.corpus.latin import latinlibrary
from cltk.tokenize.word import WordTokenizer

tokenizer = WordTokenizer('latin')

# Raw text of the whole corpus; inspect the first 500 characters.
ll_raw = latinlibrary.raw()
print(ll_raw[:500])

# Corpus-reader word tokens; inspect the first 100.
ll_words = latinlibrary.words()
print(ll_words[:100])
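# The preview ends before the tokenizer is used; a likely next step (an
# assumption on my part, not shown above) would be:
ll_tokens = tokenizer.tokenize(ll_raw[:500])
print(ll_tokens[:25])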
diyclassics / zen-pythonis.txt
Created February 18, 2016 19:38
Latin translation of Peters's "The Zen of Python"
Zen Pythonis
a T. Peters imprimis Anglice scriptum
redditumque Latine a Patricio Ios. Burns:
– Formosum deformi praefertur. [Beautiful is better than ugly.]
– Directum obliquo praefertur. [Explicit is better than implicit.]
– Simplex multiplici praefertur. [Simple is better than complex.]
– Multiplex contorto praefertur. [Complex is better than complicated.]
– Planum implicato praefertur. [Flat is better than nested.]
– Rarum denso praefertur. [Sparse is better than dense.]
diyclassics / alliteration_python_sample.py
Created November 16, 2015 12:03 — forked from pbartleby/alliteration_python_sample.py
Workflow example for "Distant Reading Alliteration in Latin Poetry". Presented at Word, Space, Time: Digital Perspectives on the Classical World (Digital Classics Association conference), April 6, 2013, at the University at Buffalo.
#!/usr/local/bin/python
# -*- coding: utf-8 -*-
# alliteration_python_sample.py
"""
Workflow example for
Distant Reading Alliteration in Latin Poetry
Patrick J. Burns
Fordham University, Department of Classics
Word, Space, Time: Digital Perspectives on the Classical World
Digital Classics Association conference, University at Buffalo, April 6, 2013
"""
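# A naive illustration of the kind of check such a workflow automates (a sketch
# under my own assumptions, not the presented code): flag consecutive words that
# share an initial letter.
def alliterative_pairs(line):
    words = [w.strip('.,;:!?').lower() for w in line.split()]
    return [(a, b) for a, b in zip(words, words[1:]) if a and b and a[0] == b[0]]

# Ennius' famously alliterative line:
print(alliterative_pairs("O Tite tute Tati tibi tanta tyranne tulisti"))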