Kyle Gorman kylebgorman
kylebgorman / Zr.R
Created July 10, 2011 17:38
The Z_r averaging transform in R; very useful for studying the statistical properties of sparse data
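For readers who want to see the arithmetic, here is a minimal Python sketch of the averaging transform as it is usually stated: for each observed frequency r with count N_r, and with q and t the nearest observed frequencies below and above r, Z_r = N_r / (0.5 * (t - q)). The function name `z_r` and the edge handling are assumptions made for illustration. The file's header comment says Church and Gale leave the edges unspecified; this sketch divides an edge count by its distance to the single inward neighbour, which is only one possible reading and may not match what Zr.R does.

```python
def z_r(r, n_r):
    """Return the Z_r values for sorted, distinct frequencies r with counts n_r.

    Interior points use Z_r = N_r / (0.5 * (t - q)), where q and t are the
    neighbouring observed frequencies on either side of r.
    """
    if len(r) != len(n_r):
        raise ValueError("r and n_r must have the same length")
    if len(r) < 2:
        raise ValueError("need at least two observed frequencies")
    last = len(r) - 1
    z = []
    for i, (freq, count) in enumerate(zip(r, n_r)):
        if i == 0:
            # ASSUMPTION: the lowest point only looks inward (upward).
            width = r[1] - freq
        elif i == last:
            # ASSUMPTION: the highest point only looks inward (downward).
            width = freq - r[i - 1]
        else:
            width = 0.5 * (r[i + 1] - r[i - 1])
        z.append(count / width)
    return z


if __name__ == "__main__":
    # Toy frequencies of frequencies, invented for this example.
    for freq, value in zip([1, 2, 3, 5, 10], z_r([1, 2, 3, 5, 10], [10, 6, 4, 2, 1])):
        print(freq, value)
```

The point of the transform is that sparse high frequencies, where most N_r are zero, get their counts spread over the gap to their neighbours, so the values are smoother on a log-log plot.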
# Z_r (or "averaging") transform functions, based on:
#
# Kenneth W. Church and William A. Gale. 1991. A comparison of the enhanced
# Good-Turing and deleted estimation methods for estimating probabilities of
# English bigrams. Computer Speech and Language 5(1):19--54
#
# Kyle Gorman <kgorman@ling.upenn.edu>
#
# Church and Gale do not say what is to be done about points at the edges. I
# have chosen to average them with respect to only the inward facing frequency,
kylebgorman / difflib_demo.py
Created July 10, 2011 17:45
Demonstration of using the SequenceMatcher class from Python's built-in difflib module to compute approximately Levenshtein-optimal alignments, with examples from my past-tense learning experiments
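As a rough illustration of the idea, the sketch below turns the opcodes reported by `difflib.SequenceMatcher` into a list of aligned chunks. The helper name `align` and the verb pairs are made up for this example and are not taken from the gist. Note that `SequenceMatcher` does not promise a minimal edit sequence, which is why the alignment is only approximately Levenshtein-optimal.

```python
from difflib import SequenceMatcher


def align(source, target):
    """Return (tag, source_chunk, target_chunk) triples aligning two strings.

    The tags are the ones difflib reports: 'equal', 'replace', 'insert'
    and 'delete'.
    """
    # autojunk only matters for long sequences; it is disabled for clarity.
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return [(tag, source[i1:i2], target[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


if __name__ == "__main__":
    # Hypothetical present/past pairs, chosen only for illustration.
    for present, past in [("walk", "walked"), ("sing", "sang")]:
        print(present, past, align(present, past))
```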
#!/usr/bin/env python
# difflib_demo.py
# Kyle Gorman <kgorman@ling.upenn.edu>
from difflib import SequenceMatcher
if __name__ == '__main__':
    from sys import argv
    for file in argv[1:]:
kylebgorman / probdist.py
Created July 26, 2011 02:24
Classes for building and sampling from probability distributions; Constantine Lignos tells me this is a variation on the "Shannon-Miller-Selfridge" algorithm that does the summing once and uses bisection for each sample (as opposed to summing on every sample). I
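The "sum once, bisect per sample" idea described above can be sketched as follows. This is an illustration under assumptions, not the gist's code: the class name `CumulativeSampler` and its methods are hypothetical, and the counts are assumed to be positive numbers.

```python
from bisect import bisect
from itertools import accumulate
from random import random


class CumulativeSampler:
    """Sample outcomes in proportion to their (positive) counts."""

    def __init__(self, counts):
        if not counts:
            raise ValueError("need at least one outcome")
        self.outcomes = list(counts)
        # The running totals are computed once, at construction time.
        self.cumulative = list(accumulate(counts[o] for o in self.outcomes))
        self.total = self.cumulative[-1]

    def pick(self, u):
        """Map a number u in [0, 1) to an outcome by binary search."""
        i = bisect(self.cumulative, u * self.total)
        # Guard against u * total rounding up to total in floating point.
        return self.outcomes[min(i, len(self.outcomes) - 1)]

    def sample(self):
        """Draw one outcome at random; costs O(log n) per draw."""
        return self.pick(random())


if __name__ == "__main__":
    sampler = CumulativeSampler({"a": 1, "b": 3})
    print([sampler.sample() for _ in range(10)])
```

Splitting `pick` from `sample` keeps the bisection step deterministic, which makes it easy to check by hand: with counts 1 and 3, any u below 0.25 selects the first outcome.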
#!/usr/bin/env python
# ProbDist.py: Two classes for probability distributions and sampling.
# Kyle Gorman <kgorman@ling.upenn.edu>
from math import fsum
from bisect import bisect
from random import random
from collections import defaultdict
class MLProbDist(object):
kylebgorman / A.TextGrid
Created November 13, 2011 19:13
Methods for scoring forced alignments...currently in development
File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0
xmax = 3
tiers? <exists>
size = 1
item []:
item [1]:
class = "IntervalTier"
kylebgorman / point_bisect.py
Created December 6, 2011 01:48
I constantly use these Python patterns for searching sorted iterables of continuous points
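Two searches that often come up with sorted continuous points are "which point is closest to x" and "which is the last point at or before x". The sketch below shows one way to write both with the standard `bisect` module. The function names are hypothetical, and these may not be the same two patterns the gist contains.

```python
from bisect import bisect_left, bisect_right


def nearest_index(points, x):
    """Return the index of the point closest to x (ties go to the lower index)."""
    if not points:
        raise ValueError("points must be non-empty")
    i = bisect_left(points, x)
    if i == 0:
        return 0
    if i == len(points):
        return len(points) - 1
    # x lies between points[i - 1] and points[i]; pick the closer one.
    return i - 1 if x - points[i - 1] <= points[i] - x else i


def floor_index(points, x):
    """Return the index of the rightmost point that is <= x."""
    i = bisect_right(points, x) - 1
    if i < 0:
        raise ValueError("x is smaller than every point")
    return i


if __name__ == "__main__":
    times = [0.0, 1.0, 2.5, 4.0]  # e.g. boundary times in seconds
    print(nearest_index(times, 1.6), floor_index(times, 3.9))
```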
#!/usr/bin/env python
# point_bisect.py
# Kyle Gorman
#
# I continually use these two patterns in Python for iterables that contain
# continuous values, sorted. Here they are in their full glory.
from bisect import bisect_left
kylebgorman / gk.c
Created February 18, 2012 02:05
gk.c: Goodman-Kruskal gamma calculator in C
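For reference, Goodman and Kruskal's gamma is (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs of observations and tied pairs are ignored. The following is a small quadratic-time Python sketch of that definition. It is written for illustration (the function name `gk_gamma` is an assumption) and says nothing about how gk.c itself is organized.

```python
from itertools import combinations


def gk_gamma(x, y):
    """Goodman-Kruskal gamma for two equal-length sequences of rankable values."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # Pairs tied on x or on y contribute to neither count.
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


if __name__ == "__main__":
    print(gk_gamma([1, 2, 3], [1, 3, 2]))
```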
/* Copyright (c) 2012 Kyle Gorman
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to
* deal in the Software without restriction, including without limitation the
* rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
* sell copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
kylebgorman / TIMIT+.py
Created February 18, 2012 02:08
TIMIT+: Make TIMIT bearable (see top for instructions)
#!/usr/bin/env python
#
# TIMIT+.py: make TIMIT bearable to use
# Kyle Gorman <kgorman@ling.upenn.edu>
#
# To use this:
# 1. place in the same directory as a copy of TIMIT
# 2. install SoX and textgrid.py
# 3. run ./TIMIT+.py
#
kylebgorman / stirling.c
Created February 18, 2012 02:02
stirling.c: first order Stirling factorial approximation calculator
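Assuming "first order" refers to the leading term of Stirling's series, the approximation is n! ≈ sqrt(2πn) (n/e)^n. A short Python sketch of that formula follows, together with a log-space variant that avoids overflow for large n. The function names are hypothetical and this is not a port of stirling.c.

```python
from math import e, log, pi, sqrt


def stirling(n):
    """Leading-term Stirling approximation to n! for n > 0."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sqrt(2 * pi * n) * (n / e) ** n


def log_stirling(n):
    """Natural log of the same approximation, safe for large n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 0.5 * log(2 * pi * n) + n * (log(n) - 1)


if __name__ == "__main__":
    from math import factorial
    for n in (1, 5, 10, 20):
        print(n, stirling(n), factorial(n))
```

The leading term always underestimates n!, and the relative error shrinks roughly like 1/(12n).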
/* Copyright (c) 2012 Kyle Gorman
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to
* deal in the Software without restriction, including without limitation the
* rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
* sell copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
kylebgorman / Getopt.java
Created July 12, 2012 17:46
BSD-licensed Getopt for Java
/**
* Copyright (C) 2012 Kyle Gorman
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
kylebgorman / lda-match.R
Created September 8, 2013 00:36
lda-match.R: perform group matching via backward selection using a heuristic based on Fisher's linear discriminant
#!/usr/bin/env Rscript
# lda-match.R: Perform group matching via backward selection using a heuristic based on Fisher's
# linear discriminant
# Kyle Gorman <gormanky@ohsu.edu>
require(MASS)
lda.match <- function(x, grouping, term.fnc=univariate.all) {
    # Create a matched group via backward selection using a heuristic
    # based on Fisher's linear discriminant. Observations are removed