Michael Wayne Goodman goodmami

@goodmami
goodmami / regenerator.py
Last active August 29, 2015 14:03
Using the send() function of a Python generator to approximate lookahead and other uses
def regenerator(gen):
    for x in gen:
        regen = (yield x)
        while regen is not None:
            yield None  # send() also yields something, so don't pull from gen again
            regen = (yield regen)

gen = (i for i in range(10))
regen = regenerator(gen)
v = next(regen)  # 0
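
A minimal sketch of how the send() trick might approximate lookahead. The `regenerator` function is repeated from the preview above so the block runs on its own; `peek` is a hypothetical helper that is not part of the gist. It assumes the stream never contains `None`, since `None` is what signals "nothing was pushed back".

```python
def regenerator(gen):
    for x in gen:
        regen = (yield x)
        while regen is not None:
            yield None  # this is the value returned by send()
            regen = (yield regen)


def peek(regen):
    """Hypothetical helper: return the next item without consuming it."""
    value = next(regen)
    regen.send(value)  # push the value back; send() itself returns None
    return value


regen = regenerator(iter(range(3)))
first = peek(regen)   # looks at the first item
items = list(regen)   # the peeked item is still delivered
print(first, items)
```

Calling `peek` on an exhausted generator raises `StopIteration`, so a caller that needs to test for end-of-stream would have to catch that.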
@goodmami
goodmami / README.md
Last active August 29, 2015 14:13
Arc Diagrams with Variably Spaced Nodes
@goodmami
goodmami / make-preference.sh
Last active August 29, 2015 14:16
Make a [incr tsdb()] preference file with a specific result ID.
#!/bin/bash
if [ $# -ne 2 ]; then
    echo 'usage: make-preference.sh PROFILE RESULT-ID'
    exit 1
fi
awk -F@ -v RES="$2" \
    '{ if($2 == RES) { printf("%d@-1@%d\n", $1, $2) } }' \
    < "$1"/result
@goodmami
goodmami / AccumulationDict.py
Last active January 4, 2016 21:18
A dictionary with a user-definable function for handling collisions.
class AccumulationDict(dict):
    def __init__(self, accumulator, *args, **kwargs):
        if not hasattr(accumulator, '__call__'):
            raise TypeError('Accumulator must be a binary function.')
        self.accumulator = accumulator
        self.accumulate(*args, **kwargs)

    def __additem__(self, key, value):
        if key in self:
            self[key] = self.accumulator(self[key], value)
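
The preview above is cut off, so the following is only a sketch of the general idea (merging colliding values with a user-supplied binary function), not the gist's implementation. The class name `CollisionDict` and the choice to have `accumulate` take an iterable of pairs are assumptions made for illustration.

```python
import operator


class CollisionDict(dict):
    """Hypothetical sketch: combine values for repeated keys with `accumulator`."""

    def __init__(self, accumulator, pairs=()):
        if not callable(accumulator):
            raise TypeError('accumulator must be a binary function')
        super().__init__()
        self.accumulator = accumulator
        self.accumulate(pairs)

    def accumulate(self, pairs):
        # Take (key, value) pairs rather than a mapping so that
        # repeated keys survive long enough to be combined.
        for key, value in pairs:
            if key in self:
                value = self.accumulator(self[key], value)
            self[key] = value


counts = CollisionDict(operator.add, [('a', 1), ('b', 2), ('a', 3)])
counts.accumulate([('b', 10)])
print(counts)
```

In this sketch plain item assignment (`counts['a'] = 0`) still overwrites; only `accumulate` applies the collision function.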
@goodmami
goodmami / nltk-bleu.py
Created June 27, 2017 01:03
Simple multi-bleu utility using the NLTK
#!/usr/bin/env python3
# Copyright 2017 Michael Wayne Goodman <goodman.m.w@gmail.com>
# Licensed under the MIT license: https://opensource.org/licenses/MIT
import sys
import os
import gzip
import docopt
@goodmami
goodmami / lark-parsimonious.py
Created August 30, 2018 21:58
Comparing Lark and Parsimonious on JSON parsing
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# usage: python3 lark-parsimonious.py [TESTNUM]
#
# Where TESTNUM is one of:
#
# 1. Parsimonious with the faster grammar (tree-only)
# 2. Parsimonious with the faster grammar (transform data)
# 3. Parsimonious with the slower grammar (tree-only)
@goodmami
goodmami / getargs.bash
Created August 14, 2016 04:50
Processing command-line arguments in Bash
#!/bin/bash
die() { echo "$1"; exit 1; }
usage() {
    cat <<EOF
Usage: getargs [--help] [OPTION...] ARGUMENT...
Example usage of useful conventions for command-line argument parsing.
@goodmami
goodmami / repp.md
Created November 25, 2019 14:35
REPP notes

Regular Expression Preprocessing (REPP)

Specification

Modules

Operators

Every operator must appear as the first character on a line (in column 0).
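
A minimal sketch of checking the column-0 rule when reading a line. The operator characters in `ASSUMED_OPERATORS` and the function name `split_operator` are assumptions for illustration; the authoritative operator set is whatever the REPP specification defines.

```python
# Assumed operator characters, for illustration only.
ASSUMED_OPERATORS = frozenset('!>#:@;<')


def split_operator(line, operators=ASSUMED_OPERATORS):
    """Return (operator, operand) if the line starts with an operator.

    Raises ValueError when the first character (column 0) is not an
    operator, e.g. because the line is indented. Blank lines are left
    for the caller to skip before calling this function.
    """
    if not line or line[0] not in operators:
        raise ValueError('operator must be in column 0: %r' % line)
    return line[0], line[1:]


ok = split_operator('!a\tb')
try:
    split_operator('  !a\tb')
    indented_ok = True
except ValueError:
    indented_ok = False
print(ok, indented_ok)
```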

@goodmami
goodmami / ElementPath-xpath_tokenizer-original.py
Last active September 2, 2021 16:09
ElementPath with default namespace support
def xpath_tokenizer(pattern, namespaces=None):
    for token in xpath_tokenizer_re.findall(pattern):
        tag = token[1]
        if tag and tag[0] != "{" and ":" in tag:
            try:
                prefix, uri = tag.split(":", 1)
                if not namespaces:
                    raise KeyError
                yield token[0], "{%s}%s" % (namespaces[prefix], uri)
            except KeyError:
@goodmami
goodmami / build_lm.sh
Created December 17, 2014 07:43
Ngrams, LMs, and Perplexity in AWK and sed
#!/bin/bash
ngram_count_file="$1"
lm_file="$2"
awk -v f="$ngram_count_file" \
    'function log10(x){ return log(x)/log(10.0) }
     BEGIN {
         while (getline < f) {
             if(/[0-9]+\t[^ ]+$/) { type[1]++; token[1]=token[1]+$1 }