James Tauber jtauber

@jtauber
jtauber / .block
Last active Nov 2, 2018 — forked from Stroked/.block
Cartesian Distortion (Fisheye)
View .block
license: mit
jtauber / gist:ed07e0fd15ecdc5394755d3e0c9304f8
Last active Aug 28, 2018
normalisation code for graves and extra accents
View gist:ed07e0fd15ecdc5394755d3e0c9304f8
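import unicodedata

# Combining accent marks to strip (all other diacritics are kept).
VARIA = "\u0300"
OXIA = "\u0301"
PERISPOMENI = "\u0342"
ACCENTS = [VARIA, OXIA, PERISPOMENI]

def strip_accents(s):
    # Decompose, drop the accent marks, then recompose.
    return unicodedata.normalize("NFKC", "".join(
        c for c in unicodedata.normalize("NFD", s) if c not in ACCENTS
    ))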
jtauber / recipes.py
Last active Aug 3, 2018
[WORK IN PROGRESS] little recipes I use in processing (mostly Greek) texts
View recipes.py
import unicodedata

### strip specific accents
def strip_accents(w):
    # Decompose, drop varia/oxia/perispomeni, then recompose.
    return unicodedata.normalize("NFC", "".join(
        ch
        for ch in unicodedata.normalize("NFD", w)
        if ch not in ["\u0300", "\u0301", "\u0342"]
    ))
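Both `strip_accents` previews follow the same three steps: decompose to NFD, drop the three combining accents, then recompose. The sketch below illustrates that behaviour under stated assumptions: the `form` parameter is a hypothetical addition (one gist recomposes with NFKC, the other with NFC), and the sample words are borrowed from the Revelation lines on this page.

```python
import unicodedata

# Combining marks removed by both gists: varia (grave), oxia (acute), perispomeni.
ACCENTS = {"\u0300", "\u0301", "\u0342"}


def strip_accents(word, form="NFC"):
    """Drop accent marks but keep every other diacritic.

    `form` is a hypothetical parameter, not part of either gist.
    """
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in ACCENTS)
    return unicodedata.normalize(form, kept)


for word in ["φυλῆς", "Λευὶ", "ἐσφραγισμένοι"]:
    print(word, "->", strip_accents(word))
```

Breathing marks survive because the combining psili (U+0313) and dasia (U+0314) are not in `ACCENTS`.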
jtauber / tokenize_01.py
Last active Aug 3, 2018
incremental development of a script to tokenize DigitalNyssa
View tokenize_01.py
# Opens the file with the given filename for reading and puts the resultant
# file object in the variable `f`.
f = open("OCR Output linebreaks removed.txt")
# `f.read()` reads the file and returns a string.
# `.split()` splits that string on whitespace and returns a list of strings.
# `for A in B:` iterates over the list B and runs the indented block with each
# list item in the variable A.
for token in f.read().split():
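The preview stops at the loop header, so the loop body is not shown. As a rough sketch of the same read-and-split step, the example below wraps it in a hypothetical `read_tokens` helper and runs it on a temporary file standing in for the OCR output. The helper name, the `with` block, the explicit UTF-8 encoding, and the sample text are all assumptions for illustration.

```python
import os
import tempfile


def read_tokens(path):
    """Return the whitespace-separated tokens of a UTF-8 text file."""
    # `with` closes the file even if reading fails.
    with open(path, encoding="utf-8") as f:
        return f.read().split()


# Hypothetical sample standing in for the OCR output file.
sample = (
    "ἐκ φυλῆς Λευὶ δώδεκα χιλιάδες,\n"
    "ἐκ φυλῆς Ἰούδα δώδεκα χιλιάδες ἐσφραγισμένοι,\n"
)

with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)

tokens = read_tokens(tmp.name)
os.remove(tmp.name)

for token in tokens:
    print(token)
```

Because `.split()` only splits on whitespace, punctuation stays attached to the preceding word (for example the trailing comma on `χιλιάδες,`).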
jtauber / dep.py
Last active Dec 15, 2015
script for finding functional dependencies in MorphGNT columns
View dep.py
#!/usr/bin/env python3
import argparse
import collections
import glob
parser = argparse.ArgumentParser(description="count (and optionally list) the entries where the determinant columns do not functionally determine the dependent columns.")
parser.add_argument("-v", "--verbose", help="output full results", action="store_true")
parser.add_argument("determinant", help="comma-separated list of columns")
parser.add_argument("dependent", help="comma-separated list of columns")
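Only the argument parsing is visible in the preview. One plausible way to do the check it describes is to group rows by their determinant values and report any group that maps to more than one dependent value. The sketch below does this on toy data; `parse_columns`, `find_violations`, the use of integer column indices, and the rows themselves are assumptions, not taken from the gist.

```python
import collections


def parse_columns(spec):
    """Turn a comma-separated spec such as "0,2" into a list of column indices."""
    return [int(part) for part in spec.split(",")]


def find_violations(rows, determinant, dependent):
    """Return determinant values that occur with more than one dependent value."""
    seen = collections.defaultdict(set)
    for row in rows:
        key = tuple(row[i] for i in determinant)
        value = tuple(row[i] for i in dependent)
        seen[key].add(value)
    return {key: values for key, values in seen.items() if len(values) > 1}


# Toy rows (form, part of speech, lemma); not real MorphGNT data.
rows = [
    ("runs", "V", "run"),
    ("runs", "N", "run"),
    ("ran", "V", "run"),
    ("dogs", "N", "dog"),
]

violations = find_violations(rows, parse_columns("0"), parse_columns("1"))
print(len(violations))
for key, values in violations.items():
    print(key, sorted(values))
```

Here the form does not determine the part of speech (`runs` appears as both `V` and `N`), while the same call with dependent column `"2"` finds no violations.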
jtauber / nfkc2.py
Created Dec 5, 2015
NFKC normalisation in python 2 and 3
View nfkc2.py
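#!/usr/bin/env python
import sys
import unicodedata

# This form runs under Python 2 only: lines are byte strings there, so they are
# decoded before normalising and encoded before writing. In Python 3, lines read
# in text mode are already str and have no .decode() method.
with open(sys.argv[1]) as f:
    for line in f:
        sys.stdout.write(unicodedata.normalize("NFKC", line.decode("utf-8")).encode("utf-8"))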
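For comparison, a Python 3 counterpart might look like the sketch below. It is not the gist's own code: `write_nfkc` is a hypothetical helper, and in-memory streams stand in for the input file and standard output so the example runs on its own.

```python
import io
import unicodedata


def write_nfkc(lines, out):
    """Write each line to `out` after NFKC normalisation."""
    for line in lines:
        out.write(unicodedata.normalize("NFKC", line))


# U+1F71 (alpha with oxia) normalises to U+03AC (alpha with tonos);
# U+FB01 (the "fi" ligature) is a compatibility character and becomes "fi".
source = io.StringIO("\u1f71\n\ufb01\n")
result = io.StringIO()
write_nfkc(source, result)

print(repr(result.getvalue()))
```

With a real file, the same helper could be called as `write_nfkc(f, sys.stdout)` inside `with open(path, encoding="utf-8") as f:`.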
View paragraph_reader.md

Rev 7.7
ἐκ φυλῆς Λευὶ δώδεκα χιλιάδες,

Rev 7.5
ἐκ φυλῆς Ἰούδα δώδεκα χιλιάδες ἐσφραγισμένοι,

View frequency_order.py
#!/usr/bin/env python3
from collections import defaultdict
from pysblgnt import morphgnt_rows
count_by_item = defaultdict(int)
total_item_count = 0
for book_num in range(1, 28):
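The preview ends before the loop body, but the visible names suggest a tally per item plus a running total across the 27 books. The sketch below shows that counting pattern on a toy list. It does not call `pysblgnt`; the `frequency_order` function, the tie-breaking rule, and the sample items are assumptions.

```python
from collections import defaultdict


def frequency_order(items):
    """Count items and return (ranked list of (item, count), total count)."""
    count_by_item = defaultdict(int)
    total_item_count = 0
    for item in items:
        count_by_item[item] += 1
        total_item_count += 1
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(count_by_item.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked, total_item_count


# Toy transliterated items; not real MorphGNT data.
ranked, total = frequency_order(["kai", "ho", "kai", "logos", "ho", "kai"])

for item, count in ranked:
    print(item, count, round(100 * count / total, 1))
```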
View mean_dependency_depth.py
#!/usr/bin/env python3
from collections import defaultdict
from math import log
import sys
depths_by_target = defaultdict(list)
with open(sys.argv[1]) as f: