Skip to content

Instantly share code, notes, and snippets.

View kylebgorman's full-sized avatar

Kyle Gorman kylebgorman

View GitHub Profile
@kylebgorman
kylebgorman / Zr.R
Created July 10, 2011 17:38
The Z_r averaging transform in R; very useful for studying the statistical properties of sparse data
# Z_r (or "averaging") transform functions, based on:
#
# Kenneth W. Church and William A. Gale. 1991. A comparison of the enhanced
# Good-Turing and deleted estimation methods for estimating probabilities of
# English bigrams. Computer Speech and Language 5(1):19--54
#
# Kyle Gorman <kgorman@ling.upenn.edu>
#
# Church and Gale do not say what is to be done about points at the edges. I
# have chosen to average them with respect to only the inward facing frequency,
@kylebgorman
kylebgorman / difflib_demo.py
Created July 10, 2011 17:45
Demonstration of using the difflib built-in class in Python to compute approximately Levenshtein-optimal alignments, with examples from my past-tense learning experiments
#!/usr/bin/env python
# difflib_demo.py
# Kyle Gorman <kgorman@ling.upenn.edu>
from difflib import SequenceMatcher
if __name__ == '__main__':
from sys import argv
for file in argv[1:]:
@kylebgorman
kylebgorman / wagnerfischer.py
Created July 14, 2011 04:41
Python implementation of the Wagner & Fischer dynamic programming approach to computing Levenshtein distance, with support for thresholding, arbitrary weights, and traceback to get individual insertion/deletion/substitution counts.
#!/usr/bin/env python
# wagnerfischer.py: Dynamic programming Levensthein distance function
# Kyle Gorman <gormanky@ohsu.edu>
#
# Based on:
#
# Robert A. Wagner and Michael J. Fischer (1974). The string-to-string
# correction problem. Journal of the ACM 21(1):168-173.
#
# The thresholding function was inspired by BSD-licensed code from
@kylebgorman
kylebgorman / probdist.py
Created July 26, 2011 02:24
Classes for building and sampling from probability distributions; Constantine Lignos tells me this is a variation on the "Shannon-Miller-Selfridge" algorithm which does the summing once and uses bisection each time (as opposed to summing every sample). I
#!/usr/bin/env python
# ProbDist.py: Two classes for probability distributions and sampling.
# Kyle Gorman <kgorman@ling.upenn.edu>
from math import fsum
from bisect import bisect
from random import random
from collections import defaultdict
class MLProbDist(object):
@kylebgorman
kylebgorman / KMP.py
Created September 20, 2011 00:30
The Knuth-Morris-Pratt algorithm in Python
#!/usr/bin/env python
# Knuth-Morris-Pratt demonstration
# Kyle Gorman <kgorman@ling.upenn.edu>
#
# A naive Python implementation of a function that returns the (first) index of
# a sequence in a supersequence is the following:
def subsequence(needle, haystack):
"""
Naive subsequence indexer; None if not found
@kylebgorman
kylebgorman / A.TextGrid
Created November 13, 2011 19:13
Methods for scoring forced alignments...currently in development
File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0
xmax = 3
tiers? <exists>
size = 1
item []:
item [1]:
class = "IntervalTier"
@kylebgorman
kylebgorman / tolerance.c
Last active November 27, 2023 19:22
Just for fun: a C-based calculator for Yang's (2005; "On Productivity", Language Variation Yearbook) Tolerance function
/*
* Tolerance Principle calculator, based on:
*
* C. Yang (2005). On productivity. Language Variation Yearbook 5:333-370.
*
* Definition:
*
* The number of data points consistent with a rule R is given by N, and the
* number of exceptions to it by m. By Tolerance, R is productive iff:
*
@kylebgorman
kylebgorman / point_bisect.py
Created December 6, 2011 01:48
I constantly use these Python patterns for searching sorted iterables of continuous points
#!/usr/bin/env python
# point_bisect.py
# Kyle Gorman
#
# I continually use these two patterns in Python for iterables that contain
# continuous values, sorted. Here they are in their full glory.
from bisect import bisect_left
@kylebgorman
kylebgorman / stirling.c
Created February 18, 2012 02:02
stirling.c: first order Stirling factorial approximation calculator
/* Copyright (c) 2012 Kyle Gorman
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to
* deal in the Software without restriction, including without limitation the
* rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
* sell copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
@kylebgorman
kylebgorman / gk.c
Created February 18, 2012 02:05
gk.c: Goodman-Kruskal gamma calculator in C
/* Copyright (c) 2012 Kyle Gorman
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to
* deal in the Software without restriction, including without limitation the
* rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
* sell copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in