diyclassics / omeka-xml-parse.py
Last active August 14, 2022 16:23
Parsing Dublin Core XML data exported from Omeka
### Jupyter notebook for this code available at:
### https://github.com/isaw-ga-3024/isaw-ga-3024.github.io/blob/master/burns-patrick-diyclassics/notebooks/Omeka-XML-Parse.ipynb
omeka = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://dcaa.hosting.nyu.edu/cms/admin/items/show/2">
<dc:title>Changing the Center of Gravity</dc:title>
<dc:subject>digital humanities</dc:subject>
<dc:creator>Terras, Melissa</dc:creator>
<dc:creator>Crane, Gregory</dc:creator>
</rdf:Description>
</rdf:RDF>"""  # record truncated in this preview; closed here so the snippet parses
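# A minimal sketch of parsing the record above with the standard library's
# xml.etree.ElementTree (my addition, not part of the gist; the author's full
# version is in the notebook linked above).
import xml.etree.ElementTree as ET

NS = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
      'dc': 'http://purl.org/dc/elements/1.1/'}

root = ET.fromstring(omeka)
for record in root.findall('rdf:Description', NS):
    title = record.findtext('dc:title', namespaces=NS)
    creators = [c.text for c in record.findall('dc:creator', NS)]
    print(title, creators)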
diyclassics / gsoc-summary.txt
Last active August 23, 2016 13:43
GSoC 2016 Summary
Patrick J. Burns, PhD
Classical Language Toolkit
Google Summer of Code 2016 Final Report
Here is a summary of the work I completed for the 2016 Google Summer of Code project "CLTK Latin/Greek Backoff Lemmatizer" for the Classical Language Toolkit (cltk.org). The code can be found at https://github.com/diyclassics/cltk/tree/lemmatize/cltk/lemmatize.
- Wrote custom lemmatizers for Latin and Greek as subclasses of the taggers in NLTK's tag module (http://www.nltk.org/api/nltk.tag.html), including (a minimal sketch of the backoff idea follows this list):
- Default lemmatization, i.e. same lemma returned for every token
- Identity lemmatization, i.e. original token returned as lemma
- Model lemmatization, i.e. lemma returned based on dictionary lookup
- Context lemmatization, i.e. lemma returned based on proximal token/lemma tuples in training data
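A minimal sketch of the backoff idea, written against NLTK's public tag API rather than the CLTK code itself (the class name, lemma dictionary, and 'UNK' default below are illustrative assumptions):

from nltk.tag import DefaultTagger, SequentialBackoffTagger

class DictLemmatizer(SequentialBackoffTagger):
    """Look each token up in a lemma dictionary; defer to the backoff if missing."""
    def __init__(self, model, backoff=None):
        SequentialBackoffTagger.__init__(self, backoff=backoff)
        self.model = model

    def choose_tag(self, tokens, index, history):
        # Returning None makes NLTK fall through to the next lemmatizer in the chain.
        return self.model.get(tokens[index])

# Chain: dictionary lookup first, then a default lemma for anything unknown.
lemmatizer = DictLemmatizer({'arma': 'arma', 'virumque': 'vir'},
                            backoff=DefaultTagger('UNK'))
print(lemmatizer.tag(['arma', 'virumque', 'cano']))
# [('arma', 'arma'), ('virumque', 'vir'), ('cano', 'UNK')]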
diyclassics / xml_parse_perseus_nlp.py
Created July 15, 2016 16:59
Script for extracting attributes from Perseus Latin Treebank XML files
# Script for extracting attributes (like word, lemma, form, etc.)
# from the XML files in the Perseus Latin Dependency Treebank 2.1
# available here:
# https://github.com/PerseusDL/treebank_data/tree/master/v2.1/Latin
#
# Returns tuples that can be used for testing the new version of
# the CLTK lemmatizer
#
# Usage: from the command line, call the script with a treebank filename:
# $ python xml_parse_perseus_nlp.py phi0448.phi001.perseus-lat1.tb.xml
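#
# A minimal sketch of the extraction step described above (not the gist's own
# code): treebank <word/> elements carry the token as @form and its lemma as @lemma.
import sys
import xml.etree.ElementTree as ET

def extract_pairs(path):
    """Return (form, lemma) tuples for every <word> element in a treebank file."""
    root = ET.parse(path).getroot()
    return [(w.get('form'), w.get('lemma')) for w in root.iter('word')]

if __name__ == '__main__':
    for form, lemma in extract_pairs(sys.argv[1]):
        print(form, lemma)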
diyclassics / gist:5f4e7ff7963e255dd44278577ffcbf6e
Last active March 22, 2018 15:31
ll-plaintextcorpus-demo
# Load the Latin Library as a CLTK plaintext corpus and set up a Latin word tokenizer.
from cltk.corpus.latin import latinlibrary
from cltk.tokenize.word import WordTokenizer

tokenizer = WordTokenizer('latin')

# Raw text of the whole corpus; inspect the first 500 characters.
ll_raw = latinlibrary.raw()
print(ll_raw[:500])

# Corpus-reader word tokens; inspect the first 100.
ll_words = latinlibrary.words()
print(ll_words[:100])
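# The preview ends before the tokenizer is used; a likely next step (an
# assumption on my part, not shown above) would be:
ll_tokens = tokenizer.tokenize(ll_raw[:500])
print(ll_tokens[:25])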
diyclassics / zen-pythonis.txt
Created February 18, 2016 19:38
Latin translation of Peters's "The Zen of Python"
Zen Pythonis
a T. Peters imprimis Anglice scriptum
redditumque Latine a Patricio Ios. Burns:
– Formosum deformi praefertur. [Beautiful is better than ugly.]
– Directum obliquo praefertur. [Explicit is better than implicit.]
– Simplex multiplici praefertur. [Simple is better than complex.]
– Multiplex contorto praefertur. [Complex is better than complicated.]
– Planum implicato praefertur. [Flat is better than nested.]
– Rarum denso praefertur. [Sparse is better than dense.]
diyclassics / alliteration_python_sample.py
Created November 16, 2015 12:03 — forked from pbartleby/alliteration_python_sample.py
Workflow example for "Distant Reading Alliteration in Latin Poetry". Presented at Word, Space, Time: Digital Perspectives on the Classical World (Digital Classics Association conference), April 6, 2013, at the University at Buffalo.
#!/usr/local/bin/python
# -*- coding: utf-8 -*-
# alliteration_python_sample.py
"""
Workflow example for
Distant Reading Alliteration in Latin Poetry
Patrick J. Burns
Fordham University, Department of Classics
Word, Space, Time: Digital Perspectives on the Classical World
Digital Classics Association conference, University at Buffalo, April 6, 2013
"""
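# A naive illustration of the kind of check such a workflow automates (a sketch
# under my own assumptions, not the presented code): flag consecutive words that
# share an initial letter.
def alliterative_pairs(line):
    words = [w.strip('.,;:!?').lower() for w in line.split()]
    return [(a, b) for a, b in zip(words, words[1:]) if a and b and a[0] == b[0]]

# Ennius' famously alliterative line:
print(alliterative_pairs("O Tite tute Tati tibi tanta tyranne tulisti"))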