Kyle Gorman kylebgorman

@kylebgorman
kylebgorman / log_odds.pyx
Last active February 6, 2024 19:49
Log-odds calculations
"""Log-odds computations."""
from libc.math cimport log, sqrt
from libc.stdint cimport int64_t
ctypedef int64_t int64
kylebgorman / wagnerfischer.py
Created July 14, 2011 04:41
Python implementation of the Wagner & Fischer dynamic programming approach to computing Levenshtein distance, with support for thresholding, arbitrary weights, and traceback to get individual insertion/deletion/substitution counts.
#!/usr/bin/env python
# wagnerfischer.py: Dynamic programming Levenshtein distance function
# Kyle Gorman <gormanky@ohsu.edu>
#
# Based on:
#
# Robert A. Wagner and Michael J. Fischer (1974). The string-to-string
# correction problem. Journal of the ACM 21(1):168-173.
#
# The thresholding function was inspired by BSD-licensed code from
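
The preview above is cut off before any of the implementation appears, so here is a minimal sketch of the Wagner & Fischer dynamic programming recurrence that the description refers to, with per-operation weights and a traceback that tallies operations along one optimal path. It is not the gist's code: the function name `wagner_fischer`, its parameter names, and the shape of the returned counts are assumptions made for illustration, thresholding is left out, and integer costs are assumed so that the equality checks in the traceback are exact.

```python
def wagner_fischer(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Returns (distance, counts) for turning source into target.

    Hypothetical helper for illustration; assumes integer costs.
    """
    n, m = len(source), len(target)
    # table[i][j] is the cheapest cost of turning source[:i] into target[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * del_cost
    for j in range(1, m + 1):
        table[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            differ = source[i - 1] != target[j - 1]
            table[i][j] = min(
                table[i - 1][j] + del_cost,
                table[i][j - 1] + ins_cost,
                table[i - 1][j - 1] + (sub_cost if differ else 0),
            )
    # Traceback: walk from the bottom-right cell to the origin, preferring
    # the diagonal move, and count the operations on that single path.
    counts = {"insert": 0, "delete": 0, "substitute": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            differ = source[i - 1] != target[j - 1]
            step = sub_cost if differ else 0
            if table[i][j] == table[i - 1][j - 1] + step:
                if differ:
                    counts["substitute"] += 1
                i -= 1
                j -= 1
                continue
        if i > 0 and table[i][j] == table[i - 1][j] + del_cost:
            counts["delete"] += 1
            i -= 1
        else:
            counts["insert"] += 1
            j -= 1
    return table[n][m], counts


distance, counts = wagner_fischer("kitten", "sitting")
print(distance, counts)
```

When several optimal paths exist, the counts depend on the tie-breaking order in the traceback; this sketch prefers the diagonal, then deletion, then insertion.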
kylebgorman / tolerance.c
Last active November 27, 2023 19:22
Just for fun: a C-based calculator for Yang's (2005; "On Productivity", Language Variation Yearbook) Tolerance function
/*
* Tolerance Principle calculator, based on:
*
* C. Yang (2005). On productivity. Language Variation Yearbook 5:333-370.
*
* Definition:
*
* The number of data points consistent with a rule R is given by N, and the
* number of exceptions to it by m. By Tolerance, R is productive iff:
*
kylebgorman / .vimrc
Last active October 23, 2023 21:57
my .vimrc (pretty basic)
syn on
set hlsearch
set ruler
" tab stuff
set expandtab
set tabstop=4
" scrolling
set scrolloff=5
" backspace over everything
set backspace=indent,eol,start
kylebgorman / function_words.py
Created June 22, 2018 18:57
Function words
"""English function words.
Sets of English function words, based on
E.O. Selkirk. 1984. Phonology and syntax: The relationship between
sound and structure. Cambridge: MIT Press. (p. 352f.)
The categories are of my own creation.
"""
kylebgorman / LING78100-lecture02.ipynb
Created September 18, 2019 14:49
LING78100 Lecture 2
kylebgorman / lnre.py
Last active June 18, 2023 05:39
LNRE calculator
#!/usr/bin/env python
"""LNRE calculator.
This script computes a number of statistics characterizing LNRE data:
* N: corpus size
* V: vocabulary size
* V(1): the number of _hapax legomena_ (symbols occurring once)
* V(2): the number of _dis legomena_ (symbols occurring twice)
* V/N: vocabulary growth rate
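
The statistics listed so far can be sketched with a frequency count followed by a frequency-of-frequencies count. This is a rough illustration rather than the script's own code: the function name `lnre_statistics`, the whitespace-tokenized input, and the dictionary keys are assumptions, and the preview is truncated, so any further statistics the script reports are not covered.

```python
from collections import Counter


def lnre_statistics(tokens):
    """Computes N, V, V(1), V(2), and V/N for an iterable of symbols.

    Hypothetical helper for illustration only.
    """
    # Maps each symbol to the number of times it occurs.
    freqs = Counter(tokens)
    # Maps each frequency to the number of symbols with that frequency.
    spectrum = Counter(freqs.values())
    n = sum(freqs.values())
    v = len(freqs)
    return {
        "N": n,
        "V": v,
        "V(1)": spectrum[1],
        "V(2)": spectrum[2],
        "V/N": v / n if n else 0.0,
    }


stats = lnre_statistics("a b b c c c d".split())
print(stats)
```

Building the frequency spectrum once keeps V(1) and V(2) as simple lookups, and the same table would serve for V(k) at any other k.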
kylebgorman / byte.sym
Created July 10, 2019 12:43
OpenFst byte symbol table
<epsilon> 0
<SOH> 1
<STX> 2
<ETX> 3
<EOT> 4
<ENQ> 5
<ACK> 6
<BEL> 7
<BS> 8
<HT> 9
kylebgorman / word_tokenize.py
Last active June 18, 2023 05:35
Applies NLTK PTB tokenizer to input text
#!/usr/bin/env python
import fileinput
import nltk
if __name__ == "__main__":
    for line in fileinput.input():
        print(" ".join(nltk.word_tokenize(line)))
kylebgorman / wagnerfischerpp.py
Last active March 3, 2023 17:27
Wagner-Fischer Levenshtein distance, now with a means to generate all possible optimal alignments.
# Copyright (c) 2013-2022 Kyle Gorman
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
#