Patrick J. Burns diyclassics

## fix-iliad-16-cltk-perseus-json.py
import json
from cltkreaders.grc import GreekTesseraeCorpusReader

T = GreekTesseraeCorpusReader()

BOOK = 16
file = f"homer.iliad.part.{BOOK}.tess"

output = dict()

## upbrew.sh
#!/bin/bash

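# printf interprets \n escapes; bash's builtin echo prints them literally unless given -e.
printf '\nUpdating Homebrew...\n\n'
brew update

printf '\nUpgrading Homebrew...\n\n'
brew upgrade

printf '\nCleaning Homebrew...\n\n'
brew cleanup -s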

## pycollatinus output for 'ne'
```python
from pycollatinus import Lemmatiseur
import pprint
tagger = Lemmatiseur()
```

```python
lemmas = tagger.lemmatise_multiple('ne')
lemmas_set = sorted({lemma['lemma'] for lemma in lemmas[0]})
print(lemmas_set)
```

## updateLastUpdated
/**
 * Written by diyclassics
 *
 * Looks for text in a Google Doc on open in the form "last updated 1/1/2001" and updates
 * with the current date; also adds a menu item for manual update.
 */

function onOpen() {
  var ui = DocumentApp.getUi();
  // Or FormApp or SpreadsheetApp.
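
The replacement step described in the header comment can be sketched outside Apps Script. This is a minimal Python illustration, not the script's own code: the names `LAST_UPDATED` and `update_last_updated` are hypothetical, and the M/D/YYYY output format is assumed from the "1/1/2001" example in the comment.

```python
import re
from datetime import date

# Matches "last updated " followed by an M/D/YYYY date, e.g. "last updated 1/1/2001".
LAST_UPDATED = re.compile(r"(last updated )\d{1,2}/\d{1,2}/\d{4}", re.IGNORECASE)


def update_last_updated(text, today=None):
    """Replace the date after 'last updated' with today's date (M/D/YYYY)."""
    today = today or date.today()
    stamp = f"{today.month}/{today.day}/{today.year}"
    # A function replacement avoids ambiguity between the group reference
    # and the digits of the date that follow it.
    return LAST_UPDATED.sub(lambda m: m.group(1) + stamp, text)
```

Passing `today` explicitly keeps the function easy to check; leaving it out uses the current date.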

## get_perseus_short_defs.gs
function getShortDef(input) {
  var array = [];

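  // Encode the input so non-ASCII (e.g. Greek) or reserved characters survive in the query string.
  var url = "http://www.perseus.tufts.edu/hopper/morph?l=" + encodeURIComponent(input);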
  var page = UrlFetchApp.fetch(url);
  var doc = Xml.parse(page, true);
  var bodyHtml = doc.html.body.toXmlString();
  doc = XmlService.parse(bodyHtml);
  var root = doc.getRootElement();


## extract_greek_text.py
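import re

# Character ranges for use inside a regex character class.
# U+0300-U+036F is Combining Diacritical Marks; Greek and Coptic proper is U+0370-U+03FF.
GREEK = '\u0300-\u03FF'
# Greek Extended (precomposed polytonic characters).
GREEK_EXT = '\u1F00-\u1FFF'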

# Cicero, Att. 1.1.4
# http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.02.0008%3Abook%3D1%3Aletter%3D1%3Asection%3D4

text = """
sane sum perturbatus cum ipsius Satyri familiaritate tum Domiti, in quo uno maxime ambitio nostra nititur. demonstravi haec Caecilio simul et illud ostendi, si ipse unus cum illo uno contenderet, me ei satis facturum fuisse; nunc in causa universorum creditorum, hominum praesertim amplissimorum, qui sine eo quem Caecilius suo nomine perhiberet facile causam communem sustinerent, aequum esse eum et officio meo consulere et tempori. durius accipere hoc mihi visus est quam vellem et quam homines belli solent, et postea prorsus ab instituta nostra paucorum dierum consuetudine longe refugit.
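
One way the two ranges might be combined is as a single regex character class that picks out runs of Greek from otherwise Latin text. The sketch below is an assumption about the intended use, not the original script: `GREEK_RUN`, `extract_greek`, and the sample string are all invented for illustration.

```python
import re

GREEK = '\u0300-\u03FF'
GREEK_EXT = '\u1F00-\u1FFF'

# One or more consecutive characters drawn from either range.
GREEK_RUN = re.compile(f"[{GREEK}{GREEK_EXT}]+")


def extract_greek(text):
    """Return every run of Greek (or combining-mark) characters in text."""
    return GREEK_RUN.findall(text)


# Invented sample: Latin words with a few Greek words mixed in.
sample = "durius accipere μῆνιν ἄειδε hoc mihi visus est"
print(extract_greek(sample))
```

Because the first range starts at U+0300, decomposed accents stay attached to their base letters instead of splitting a word into pieces.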

## get_pleiades_id.py
import json
import urllib.request

def get_pleiades_json(pleiades_id):
    # pleiades_id: STR
    pleiades_url = "https://raw.githubusercontent.com/ryanfb/pleiades-geojson/gh-pages/geojson/%s.geojson" % pleiades_id
    try:
        with urllib.request.urlopen(pleiades_url) as url:
            pleiades_geojson = json.loads(url.read().decode())
        return pleiades_geojson
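
The `try` block above needs an `except` clause to run. A minimal sketch of one possible completion follows. Returning `None` on failure is an assumption, as are the `PLEIADES_URL` constant and the `opener` parameter, which is added only so the function can be exercised without network access.

```python
import json
import urllib.error
import urllib.request

PLEIADES_URL = (
    "https://raw.githubusercontent.com/ryanfb/pleiades-geojson/"
    "gh-pages/geojson/%s.geojson"
)


def get_pleiades_json(pleiades_id, opener=urllib.request.urlopen):
    """Return the parsed GeoJSON for a Pleiades ID, or None if the fetch fails."""
    try:
        with opener(PLEIADES_URL % pleiades_id) as response:
            return json.loads(response.read().decode())
    except (urllib.error.URLError, json.JSONDecodeError):
        # Assumed behaviour: an unknown ID, a network error, or a malformed
        # payload all count as "no result".
        return None
```

A call such as `get_pleiades_json("423025")` would then yield either a dictionary or `None`, so callers can test the result before reading coordinates from it.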

## gist:8caaa77b163ab55c6238d75e45f33281
<symbol id="icon-hcommons" viewBox="0 0 240 240">
<title>hcommons</title>
<g transform="translate(0.000000,240.000000) scale(0.100000,-0.100000)"
fill="#000000" stroke="none">
<path d="M1045 2394 c-85 -14 -235 -57 -312 -90 -361 -154 -608 -451 -705
-845 -31 -126 -33 -381 -4 -504 98 -419 381 -743 767 -882 157 -57 228 -68
414 -68 153 1 181 4 280 29 122 31 296 109 395 176 240 163 420 431 492 733
32 132 32 382 0 514 -38 160 -112 325 -200 448 -120 166 -306 314 -503 398
-151 65 -262 88 -439 92 -85 2 -168 1 -185 -1z m948 -585 l57 -12 0 -114 0
-114 -47 8 c-27 4 -91 8 -144 8 -86 0 -100 -3 -133 -25 -20 -14 -46 -45 -59

## CLTK GSoC Proposal Suggestions
Several prospective CLTK Google Summer of Code applicants have written recently to ask what the proposal should include. While successful project proposals can take many different forms, here is an outline that helps address the questions likely to come up as proposals are reviewed:
- Abstract: It is helpful to distill your proposal into 100-200 words that define the problem, identify your solution, name the datasets necessary to do the work, and report the expected outcome of this project. On this last point, note that since this is a proposal, we do not expect you to report results—but you should have a clear idea of where you expect to be by the end of the summer. We will also need to use abstracts and brief descriptions of your project on the GSoC page if your proposal is selected.
- Proposal: This will be the bulk of your submission. Here you want to expand upon the points mentioned in the abstract, including:
    - Define the problem. Depending on your project, CLTK may be different than other open so

## gist:b24fbd1ad3bbb726387de443fab84956
    def _define_lemmatizer(self):
        backoff0 = None
        backoff1 = IdentityLemmatizer()
        backoff2 = TrainLemmatizer(model=self.LATIN_OLD_MODEL, backoff=backoff1)
        backoff3 = PPLemmatizer(regexps=self.latin_verb_patterns, pps=self.latin_pps, backoff=backoff2)
        backoff4 = UnigramLemmatizer(self.train_sents, backoff=backoff3)
        backoff5 = RegexpLemmatizer(self.latin_misc_patterns, backoff=backoff4)
        backoff6 = TrainLemmatizer(model=self.LATIN_MODEL, backoff=backoff5)
        #backoff7 = BigramPOSLemmatizer(self.pos_train_sents, include=['cum'], backoff=backoff6)
        lemmatizer = backoff6
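
The chain above is built from the last resort upward: each lemmatizer receives the previous one as its `backoff`, and the final assignment selects the head of the chain. The sketch below illustrates that pattern in general terms with toy classes. `ToyBackoff`, `ToyDict`, and `ToyIdentity` are hypothetical stand-ins, not the CLTK classes named above, and the tiny dictionary is invented.

```python
class ToyBackoff:
    """Try this lemmatizer first; if it has no answer, defer to the backoff."""

    def __init__(self, backoff=None):
        self.backoff = backoff

    def choose(self, token):
        # Subclasses return a lemma, or None to pass the token down the chain.
        return None

    def lemmatize(self, token):
        lemma = self.choose(token)
        if lemma is None and self.backoff is not None:
            return self.backoff.lemmatize(token)
        return lemma


class ToyDict(ToyBackoff):
    """Look the token up in a token -> lemma dictionary."""

    def __init__(self, model, backoff=None):
        super().__init__(backoff)
        self.model = model

    def choose(self, token):
        return self.model.get(token)


class ToyIdentity(ToyBackoff):
    """Last resort: return the token itself as its lemma."""

    def choose(self, token):
        return token


# Build from the last resort upward, as in the method above.
backoff1 = ToyIdentity()
lemmatizer = ToyDict({"arma": "arma", "virum": "vir"}, backoff=backoff1)
print([lemmatizer.lemmatize(t) for t in ["arma", "virum", "Troiae"]])
# → ['arma', 'vir', 'Troiae']
```

Ordering matters in this design: a lemmatizer near the head of the chain answers first, so more specific or more trusted models go last in the construction sequence.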