James Tauber jtauber

## gist:6283307

Created August 20, 2013 15:50

Differences between Syntax Tree and MorphGNT

Text


There are additional sections of text in John, 2 Timothy and James, and one extra word in Acts. (These could be the result of multiple analyses, but I haven't confirmed that yet.)

Lemmatization


Syntax Tree has the lemma θεμέλιον where MorphGNT has θεμέλιος.
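
Differences of both kinds could be located by aligning the two analyses on their word forms and then comparing lemmas wherever the words line up. The sketch below assumes each source has already been reduced to a list of (word, lemma) pairs for a book; it uses the standard library's `difflib.SequenceMatcher` for the alignment. The function name `compare_analyses` and the sample tokens are hypothetical, not taken from either project.

```python
import difflib
from collections import Counter


def compare_analyses(tokens_a, tokens_b):
    """Compare two analyses of the same text.

    Each argument is a list of (word, lemma) pairs. Returns
    (text_diffs, lemma_diffs): text_diffs lists every stretch of words found
    in only one source, and lemma_diffs counts (lemma_a, lemma_b) pairs for
    aligned words whose lemmas disagree.
    """
    words_a = [word for word, _ in tokens_a]
    words_b = [word for word, _ in tokens_b]
    # autojunk=False stops difflib from ignoring very frequent words.
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    text_diffs = []
    lemma_diffs = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same words in both sources: check whether the lemmas agree.
            for (_, lemma_a), (_, lemma_b) in zip(tokens_a[i1:i2], tokens_b[j1:j2]):
                if lemma_a != lemma_b:
                    lemma_diffs[(lemma_a, lemma_b)] += 1
        else:
            # 'insert', 'delete' or 'replace': text present in only one source.
            text_diffs.append((tag, words_a[i1:i2], words_b[j1:j2]))
    return text_diffs, lemma_diffs


# Hypothetical tokens: the second source has one extra word and a different lemma.
morphgnt = [("θεμέλιον", "θεμέλιος"), ("ἔθηκα", "τίθημι")]
syntax_tree = [("θεμέλιον", "θεμέλιον"), ("ἔθηκα", "τίθημι"), ("καί", "καί")]
text_diffs, lemma_diffs = compare_analyses(morphgnt, syntax_tree)
print(text_diffs)          # → [('insert', [], ['καί'])]
print(dict(lemma_diffs))   # → {('θεμέλιος', 'θεμέλιον'): 1}
```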


## gist:5593409
def struct_hash(structure):
    if isinstance(structure, list):
        r = "list:" + repr([struct_hash(item) for item in structure])
    elif isinstance(structure, dict):
        r = "dict:" + repr([(struct_hash(key), struct_hash(value)) for (key, value) in sorted(structure.items())])
    elif isinstance(structure, str):
        r = "unicode:" + repr(unicode(structure))
    elif isinstance(structure, unicode):
        r = "unicode:" + repr(structure)
    elif isinstance(structure, int):
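
The preview above stops partway through `struct_hash`, so its remaining branches and its return value are not shown. As a hedged illustration of the idea (build a canonical, type-tagged string for a nested structure so that equal structures yield equal keys), here is a Python 3 sketch. The `bool` and `int` branches, the SHA-1 digest at the end, and raising `TypeError` for other types are all assumptions, not the original code.

```python
import hashlib


def struct_hash(structure):
    """Return a hex digest that depends only on the content of a nested
    structure of lists, dicts, strings and integers (assumed scope)."""
    if isinstance(structure, list):
        r = "list:" + repr([struct_hash(item) for item in structure])
    elif isinstance(structure, dict):
        # Sort the hashed pairs rather than the raw items, so dicts whose
        # keys have mixed types still sort, and insertion order is ignored.
        r = "dict:" + repr(sorted(
            (struct_hash(key), struct_hash(value))
            for key, value in structure.items()
        ))
    elif isinstance(structure, str):
        # In Python 3 every str is Unicode, so one branch covers the
        # original's separate str and unicode cases.
        r = "unicode:" + repr(structure)
    elif isinstance(structure, bool):
        # Checked before int because bool is a subclass of int.
        r = "bool:" + repr(structure)
    elif isinstance(structure, int):
        r = "int:" + repr(structure)
    else:
        raise TypeError("unsupported type: %s" % type(structure).__name__)
    return hashlib.sha1(r.encode("utf-8")).hexdigest()


print(struct_hash({"lemma": "λόγος", "count": 3}) ==
      struct_hash({"count": 3, "lemma": "λόγος"}))  # → True
print(struct_hash([1, 2]) == struct_hash([2, 1]))   # → False
```

Dict key order does not affect the result, while list order does, which is what you would want when using the digest as a cache or deduplication key.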

## gist:5395681

NEWTON is riding his bicycle in Lincolnshire when he comes across
SOCRATES on the road.

SOCRATES: For what purpose do you return to Woolsthorpe, my dear sir?
NEWTON: Cambridge has been closed due to the plague.
SOCRATES: By Zeus! Closed?
NEWTON: Indeed.
SOCRATES: And so what are you doing with your time here? Surely not
farming.

## dep.py
#!/usr/bin/env python3

import argparse
import collections
import glob

parser = argparse.ArgumentParser(description="count (and optionally list) the entries where the determinant columns do not functionally determine the dependent columns.")
parser.add_argument("-v", "--verbose", help="output full results", action="store_true")
parser.add_argument("determinant", help="comma-separated list of columns")
parser.add_argument("dependent", help="comma-separated list of columns")
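
The preview ends before the counting itself. A functional-dependency check of this kind can be sketched by grouping rows on the determinant columns and flagging every group whose dependent columns take more than one value. This sketch assumes each row is already a sequence of column values and that columns are given as 0-based indices; the names `fd_violations` and `parse_columns` are hypothetical, and the original script's input handling is not shown in the preview.

```python
import collections


def fd_violations(rows, determinant, dependent):
    """Return {determinant_values: {dependent_values: count}} for every
    determinant value that maps to more than one dependent value, i.e. every
    place where the determinant columns fail to determine the dependent ones."""
    seen = collections.defaultdict(collections.Counter)
    for row in rows:
        key = tuple(row[i] for i in determinant)
        value = tuple(row[i] for i in dependent)
        seen[key][value] += 1
    return {key: dict(values) for key, values in seen.items() if len(values) > 1}


def parse_columns(spec):
    """Turn a comma-separated argument such as "2,3" into [2, 3]."""
    return [int(column) for column in spec.split(",")]


# Hypothetical rows: (form, part of speech, lemma).
rows = [
    ("θεμέλιον", "N-", "θεμέλιος"),
    ("θεμέλιον", "N-", "θεμέλιον"),
    ("λόγος", "N-", "λόγος"),
]
violations = fd_violations(rows, parse_columns("0"), parse_columns("2"))
print(len(violations))  # → 1
```

The count corresponds to the script's default output; printing `violations` itself would be one way to provide the fuller listing that `--verbose` asks for.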

## nfkc2.py
#!/usr/bin/env python

import sys
import unicodedata

with open(sys.argv[1]) as f:
    for line in f:
        sys.stdout.write(unicodedata.normalize("NFKC", line.decode("utf-8")).encode("utf-8"))
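
The script above is Python 2 (it calls `.decode` and `.encode` on byte strings). Under Python 3, a sketch of the same filter can let `open` and `sys.stdout` do the UTF-8 decoding and encoding. `open(..., encoding=...)`, `sys.stdout.reconfigure` (Python 3.7+) and `unicodedata.normalize` are standard library; the helper name `nfkc_lines` is hypothetical.

```python
#!/usr/bin/env python3

import sys
import unicodedata


def nfkc_lines(lines):
    """Yield each line in NFKC (compatibility composed) normal form."""
    for line in lines:
        yield unicodedata.normalize("NFKC", line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Python 3 decodes on read and encodes on write; we only name the encoding.
    sys.stdout.reconfigure(encoding="utf-8")
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.writelines(nfkc_lines(f))
```

For example, NFKC turns the ligature "ﬁ" (U+FB01) into "fi", and Greek alpha with oxia (U+1F71) into alpha with tonos (U+03AC).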

## paragraph_reader.md

Created November 10, 2015 20:50

Rev 7.7

ἐκ φυλῆς Λευὶ δώδεκα χιλιάδες,
Rev 7.5

ἐκ φυλῆς Ἰούδα δώδεκα χιλιάδες ἐσφραγισμένοι,

  
## frequency_order.py
#!/usr/bin/env python3

from collections import defaultdict

from pysblgnt import morphgnt_rows

count_by_item = defaultdict(int)
total_item_count = 0

for book_num in range(1, 28):
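
The preview stops at the loop over the 27 New Testament books (`range(1, 28)` yields 1 to 27). The per-item tally and running total it sets up are enough to list items from most to least frequent along with the cumulative share of the text they cover. This sketch takes any iterable of items instead of calling `pysblgnt.morphgnt_rows`, because the preview does not show which row fields are counted; the `frequency_order` function and the cumulative-coverage column are assumptions.

```python
from collections import defaultdict


def frequency_order(items):
    """Return (item, count, cumulative_fraction) triples, most frequent first.

    cumulative_fraction is the share of all tokens covered by this item
    together with every more frequent one.
    """
    count_by_item = defaultdict(int)
    total_item_count = 0
    for item in items:
        count_by_item[item] += 1
        total_item_count += 1

    # Highest count first; ties broken by the item itself for stable output.
    ordered = sorted(count_by_item.items(), key=lambda pair: (-pair[1], pair[0]))
    result = []
    running = 0
    for item, count in ordered:
        running += count
        result.append((item, count, running / total_item_count))
    return result


# Hypothetical lemmas standing in for one field of the MorphGNT rows.
lemmas = ["ὁ", "καί", "ὁ", "λόγος", "ὁ", "καί"]
for item, count, coverage in frequency_order(lemmas):
    print(item, count, f"{coverage:.2f}")
# ὁ 3 0.50
# καί 2 0.83
# λόγος 1 1.00
```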

## mean_dependency_depth.py
#!/usr/bin/env python3

from collections import defaultdict
from math import log

import sys

depths_by_target = defaultdict(list)

with open(sys.argv[1]) as f:
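
Only the setup is visible here: a `depths_by_target` mapping from some target value to a list of depths. As a sketch of what a mean dependency depth involves, the code below finds each word's depth in a dependency tree by following parent links up to the root, then averages the depths per target. The input shapes are assumptions (a `parent_by_id` dict with `None` marking the root, a `target_by_id` dict giving the value to group by, and depth 0 for the root), as are the function names; the original's use of `log` is not shown, so it is not reproduced.

```python
from collections import defaultdict


def node_depths(parent_by_id):
    """Map each node id to its depth: 0 for a root, 1 for its children, etc.
    Assumes the links form a tree (no cycles)."""
    depth_by_id = {}
    for start in parent_by_id:
        # Climb until we reach a node whose depth is known, or a root.
        path = []
        node = start
        while node not in depth_by_id and parent_by_id[node] is not None:
            path.append(node)
            node = parent_by_id[node]
        if node not in depth_by_id:
            depth_by_id[node] = 0  # node is a root
        d = depth_by_id[node]
        # Assign depths back down the path we climbed.
        for child in reversed(path):
            d += 1
            depth_by_id[child] = d
    return depth_by_id


def mean_depth_by_target(parent_by_id, target_by_id):
    """Average node depth for each target value."""
    depths_by_target = defaultdict(list)
    for node, depth in node_depths(parent_by_id).items():
        depths_by_target[target_by_id[node]].append(depth)
    return {target: sum(ds) / len(ds) for target, ds in depths_by_target.items()}


# Hypothetical tree: word 1 is the root verb; words 2 and 4 hang off it.
parents = {1: None, 2: 1, 3: 2, 4: 1}
pos = {1: "V", 2: "N", 3: "D", 4: "N"}
print(mean_depth_by_target(parents, pos))
```

Walking up iteratively (rather than recursing) and caching depths means each node is visited a bounded number of times, and long chains cannot hit Python's recursion limit.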

## make-paths.py
#!/usr/bin/env python3

import sys

lines = []
parent_by_id = {}
rel_by_id = {}

with open(sys.argv[1]) as f:
    for line in f:
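
The preview reads its input into `lines`, `parent_by_id` and `rel_by_id`, which suggests that each word's path is built by following parent links and collecting relation labels. A sketch under that assumption: for a given id, climb to the root and join the relations from the root down. The `/` separator, `None` as the root's parent, and the name `relation_path` are assumptions.

```python
def relation_path(node_id, parent_by_id, rel_by_id, separator="/"):
    """Return the relation labels from the root down to node_id, joined."""
    rels = []
    while node_id is not None:
        rels.append(rel_by_id[node_id])
        node_id = parent_by_id[node_id]
    # Labels were collected bottom-up, so reverse to read from the root down.
    return separator.join(reversed(rels))


# Hypothetical three-word tree.
parent_by_id = {1: None, 2: 1, 3: 2}
rel_by_id = {1: "root", 2: "obj", 3: "det"}
for node in sorted(parent_by_id):
    print(node, relation_path(node, parent_by_id, rel_by_id))
# 1 root
# 2 root/obj
# 3 root/obj/det
```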

## gist:3140197

I don't want to get too deep into the psychology of why I stopped blogging other than to suggest that when you don't blog for a while, it raises the bar of what you break your blogging drought with. There was one time I didn't blog for a couple of months and the next time I blogged, a friend said "I've waited months for a blog post and you post THAT!".

So to get back to putting more content on my site, I need to give myself permission to do shorter, less well-thought-out posts and not feel that every post has to be an epic article. Looking at the taxonomy above, it's clear that in the past I have made blog posts considerably shorter than informational articles.

I think there is value in distinguishing short-form and long-form posts and making enough of a separation that there is less pressure to always do long-form posts. But as well as the dimension of length, I think it also makes a lot of sense to distinguish posts which are ephemeral (or at least quite specific to the time in which they were made) from