James Tauber jtauber

@jtauber
jtauber / gist:8b2fa862d43ecedf7536
Last active January 20, 2023 19:40
setting up OS X 10.9 for i386-elf cross-compilation
# frustratingly, we need to have "real" gcc to build the i386-elf binutils/gcc
brew install gcc
# $PROJECT_HOME should be the parent directory under which you'll download
# everything and build the toolchain
cd "$PROJECT_HOME"
mkdir toolchain

Keybase proof

I hereby claim:

  • I am jtauber on github.
  • I am jtauber (https://keybase.io/jtauber) on keybase.
  • I have a public key ASBKXRd38Pg3fZKJJkLQ9TG3sLxxs17UAcX1zhjDiL3cpQo

To claim this, I am signing this object:

jtauber / .block
Last active November 2, 2018 05:16 — forked from Stroked/.block
Cartesian Distortion (Fisheye)
license: mit
jtauber / gist:ed07e0fd15ecdc5394755d3e0c9304f8
Last active August 28, 2018 14:03
normalisation code for graves and extra accents
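import unicodedata

# Combining accents to strip: grave, acute and circumflex.
VARIA = "\u0300"
OXIA = "\u0301"
PERISPOMENI = "\u0342"
ACCENTS = [VARIA, OXIA, PERISPOMENI]

def strip_accents(s):
    # Decompose, drop the listed combining accents, then recompose.
    return unicodedata.normalize("NFKC", "".join(
        c for c in unicodedata.normalize("NFD", s) if c not in ACCENTS
    ))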
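A minimal sketch of how the final normalisation form affects the result. The `form` parameter and the sample strings are assumptions added for illustration and are not part of the gist. NFKC also folds compatibility characters (here the micro sign U+00B5 becomes Greek mu U+03BC), whereas NFC leaves them alone.

```python
import unicodedata

ACCENTS = ["\u0300", "\u0301", "\u0342"]  # varia, oxia, perispomeni


def strip_accents(s, form="NFKC"):
    # `form` is a hypothetical parameter added to compare normalisation forms.
    decomposed = unicodedata.normalize("NFD", s)
    kept = "".join(c for c in decomposed if c not in ACCENTS)
    return unicodedata.normalize(form, kept)


word = "\u03bb\u03cc\u03b3\u03bf\u03c2"  # λόγος, with a precomposed accented omicron
plain = strip_accents(word)  # accent removed: λογος

micro = "\u00b5"  # MICRO SIGN, a compatibility character
as_nfkc = strip_accents(micro, "NFKC")  # folded to U+03BC GREEK SMALL LETTER MU
as_nfc = strip_accents(micro, "NFC")    # left as U+00B5
```

If the input may contain compatibility characters that should be preserved, NFC is the more conservative choice for the final step.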
jtauber / tokenize_01.py
Last active August 3, 2018 20:35
incremental development of a script to tokenize DigitalNyssa
# Opens the file with the given filename for reading and puts the resultant
# file object in the variable `f`.
f = open("OCR Output linebreaks removed.txt")
# `f.read()` reads the file and returns a string.
# `.split()` splits that string on whitespace and returns a list of strings.
# `for A in B:` iterates over the list B and runs the indented block with each
# list item in the variable A.
for token in f.read().split():
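The read-then-split pattern described in the comments above can be tried without the OCR file. This is a minimal sketch, assuming a short in-memory string in place of the file contents; the gist's own loop body is not shown in the preview, so `sample` and `tokens` are illustrative names only.

```python
# Stand-in for the string that f.read() would return.
sample = "A duck made her nest\nunder some   leaves."

tokens = []
# split() with no arguments splits on any run of whitespace,
# including newlines and repeated spaces.
for token in sample.split():
    tokens.append(token)
```

Note that punctuation stays attached to the neighbouring word (`leaves.`), which is typically the next thing a tokenizer has to deal with.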
jtauber / recipes.py
Last active August 3, 2018 15:26
[WORK IN PROGRESS] little recipes I use in processing (mostly Greek) texts
import unicodedata

### strip specific accents
def strip_accents(w):
    return unicodedata.normalize("NFC", "".join(
        ch
        for ch in unicodedata.normalize("NFD", w)
        if ch not in ["\u0300", "\u0301", "\u0342"]
    ))
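A short usage sketch for the recipe above, assuming Python 3 and a sample word chosen for illustration. Only the three listed accents are removed, so a breathing mark survives and recomposes with its vowel under NFC.

```python
import unicodedata


def strip_accents(w):
    return unicodedata.normalize("NFC", "".join(
        ch
        for ch in unicodedata.normalize("NFD", w)
        if ch not in ["\u0300", "\u0301", "\u0342"]
    ))


# ἄνθρωπος: the first letter is alpha with psili (smooth breathing) and oxia.
word = "\u1f04\u03bd\u03b8\u03c1\u03c9\u03c0\u03bf\u03c2"
# ἀνθρωπος: the oxia is gone, the psili stays.
stripped = strip_accents(word)
```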
#!/usr/bin/env python3
from collections import defaultdict
from math import log
from pysblgnt import morphgnt_rows
items_by_target = defaultdict(list)
count_by_item = defaultdict(int)
total_item_count = 0
The Ugly Duckling.
A duck made her nest under some leaves.
She sat on the eggs to keep them warm.
At last the eggs broke, one after the other. Little ducks came out.
Only one egg was left. It was a very large one.
At last it broke, and out came a big, ugly duckling.
"What a big duckling!" said the old duck. "He does not look like us. Can he be a turkey?--We will see. If he does not like the water, he is not a duck."
The next day the mother duck took her ducklings to the pond.
jtauber / gist:6331304
Created August 25, 2013 01:16
Difference between MorphGNT SBLGNT (but with numbers changed to `NU --------` to avoid a major class of systematic differences) and the Asia Bible Society syntax analysis (converted to a MorphGNT-like format for comparison purposes)
Matthew
5997c5997
< 40011026003 N- ----VSM- πατήρ, πατήρ
---
> 40011026003 N- ----NSM- πατήρ, πατήρ
10483c10483
< 40018012006 RI ----DSM- τινι τις
---