James Tauber jtauber


Differences between Syntax Tree and MorphGNT

Text

  • there are additional sections of text in John, 2 Timothy and James and one extra word in Acts (these could be the result of multiple analyses but I haven't confirmed that yet)

Lemmatization

  • Syntax Tree has θεμέλιον for θεμέλιος
def struct_hash(structure):
    if isinstance(structure, list):
        r = "list:" + repr([struct_hash(item) for item in structure])
    elif isinstance(structure, dict):
        r = "dict:" + repr([(struct_hash(key), struct_hash(value)) for (key, value) in sorted(structure.items())])
    elif isinstance(structure, str):
        r = "unicode:" + repr(unicode(structure))
    elif isinstance(structure, unicode):
        r = "unicode:" + repr(structure)
    elif isinstance(structure, int):
NEWTON is riding his bicycle in Lincolnshire when he comes across
SOCRATES on the road.
SOCRATES: For what purpose do you return to Woolsthorpe my dear sir?
NEWTON: Cambridge has been closed due to the plague.
SOCRATES: By Zeus! Closed?
NEWTON: Indeed.
SOCRATES: And so what are you doing with your time here? Surely not
farming.
jtauber / dep.py
Last active December 15, 2015 07:06
script for finding functional dependencies in MorphGNT columns
#!/usr/bin/env python3
import argparse
import collections
import glob
parser = argparse.ArgumentParser(description="count (and optionally list) the entries where the determinant columns do not functionally determine the dependent columns.")
parser.add_argument("-v", "--verbose", help="output full results", action="store_true")
parser.add_argument("determinant", help="comma-separated list of columns")
parser.add_argument("dependent", help="comma-separated list of columns")
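The check described above can be illustrated with a minimal sketch. It is not the gist's actual implementation (the preview is truncated before the logic appears); the function name `find_violations` and the use of column indexes into row tuples are assumptions made for illustration. A set of determinant columns functionally determines the dependent columns when no determinant value is seen with more than one dependent value.

```python
from collections import defaultdict


def find_violations(rows, determinant, dependent):
    """Return determinant values that map to more than one dependent value.

    `rows` is an iterable of sequences; `determinant` and `dependent` are
    lists of column indexes (hypothetical interface, for illustration only).
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[i] for i in determinant)
        value = tuple(row[i] for i in dependent)
        seen[key].add(value)
    # A functional dependency holds only if every key has exactly one value.
    return {key: values for key, values in seen.items() if len(values) > 1}
```

An empty result means the dependency holds for the given rows; otherwise each key in the result is a counterexample together with the conflicting dependent values.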
jtauber / nfkc2.py
Created December 5, 2015 02:45
NFKC normalisation in python 2 and 3
#!/usr/bin/env python
import sys
import unicodedata
with open(sys.argv[1]) as f:
    for line in f:
        sys.stdout.write(unicodedata.normalize("NFKC", line.decode("utf-8")).encode("utf-8"))
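The preview above relies on `line.decode`, which is available on Python 2 byte strings but not on Python 3 `str`. A rough Python 3 counterpart might look like the sketch below. It assumes UTF-8 input, and the helper names `nfkc` and `nfkc_file` are hypothetical, not taken from the gist.

```python
import unicodedata


def nfkc(text):
    """Return the NFKC-normalised form of a Unicode string."""
    return unicodedata.normalize("NFKC", text)


def nfkc_file(path, out):
    """Write the NFKC-normalised lines of a UTF-8 file to `out`."""
    # Opening in text mode with an explicit encoding yields str lines,
    # so no decode/encode round trip is needed in Python 3.
    with open(path, encoding="utf-8") as f:
        for line in f:
            out.write(nfkc(line))
```

For example, `nfkc_file(sys.argv[1], sys.stdout)` would mirror the command-line behaviour of the original script, provided standard output is configured for UTF-8.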

Rev 7.7
ἐκ φυλῆς Λευὶ δώδεκα χιλιάδες,

Rev 7.5
ἐκ φυλῆς Ἰούδα δώδεκα χιλιάδες ἐσφραγισμένοι,

#!/usr/bin/env python3
from collections import defaultdict
from pysblgnt import morphgnt_rows
count_by_item = defaultdict(int)
total_item_count = 0
for book_num in range(1, 28):
#!/usr/bin/env python3
from collections import defaultdict
from math import log
import sys
depths_by_target = defaultdict(list)
with open(sys.argv[1]) as f:
#!/usr/bin/env python3
import sys
lines = []
parent_by_id = {}
rel_by_id = {}
with open(sys.argv[1]) as f:
    for line in f:
jtauber / gist:3140197
Created July 19, 2012 01:37
Blogging, Twitter and something in between
I don't want to get too deep into the psychology of why I stopped blogging other than to suggest that when you don't blog for a while, it raises the bar of what you break your blogging drought with. There was one time I didn't blog for a couple of months and the next time I blogged, a friend said "I've waited months for a blog post and you post THAT!".
So to get back to putting more content on my site, I need to give myself permission to do shorter, less well-thought-out posts and not feel that every post has to be an epic article. Looking at the taxonomy above, it's clear that in the past I have made blog posts considerably shorter than informational articles.
I think there is value in distinguishing short-form and long-form posts and making enough of a separation that there is less pressure to always do long-form posts. But as well as the dimension of length, I think it also makes a lot of sense to distinguish posts which are ephemeral (or at least quite specific to the time in which they were made) from