Kyle Gorman kylebgorman

@kylebgorman
kylebgorman / log_odds.pyx
Last active February 6, 2024 19:49
Log-odds calculations
"""Log-odds computations."""
from libc.math cimport log, sqrt
from libc.stdint cimport int64_t
ctypedef int64_t int64
@kylebgorman
kylebgorman / tolerance.c
Last active November 27, 2023 19:22
Just for fun: a C-based calculator for Yang's (2005; "On Productivity", Linguistic Variation Yearbook) Tolerance function
/*
* Tolerance Principle calculator, based on:
*
* C. Yang (2005). On productivity. Linguistic Variation Yearbook 5:333-370.
*
* Definition:
*
* The number of data points consistent with a rule R is given by N, and the
* number of exceptions to it by m. By Tolerance, R is productive iff:
*
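The preview above cuts off before the productivity condition itself. As a rough illustration only, the Python sketch below assumes the commonly cited form of the threshold, N / ln N, with a rule counted as productive when m falls below it. The function names `tolerance_threshold` and `is_productive` are hypothetical and do not come from tolerance.c, and the strict inequality is an assumption.

```python
import math


def tolerance_threshold(n: int) -> float:
    """Return the assumed tolerance threshold N / ln N.

    Assumption: this is the commonly cited form of the threshold; the
    truncated comment above does not show the author's exact definition.
    """
    if n < 2:
        # ln(1) == 0, so the threshold is undefined below N = 2.
        raise ValueError("n must be at least 2")
    return n / math.log(n)


def is_productive(n: int, m: int) -> bool:
    """Return True if m exceptions are tolerated by a rule covering n items.

    Assumption: strict inequality (m < N / ln N).
    """
    return m < tolerance_threshold(n)


if __name__ == "__main__":
    # A rule covering 100 items with 15 exceptions.
    print(tolerance_threshold(100), is_productive(100, 15))
```

For N = 100 the assumed threshold is a little under 22, so 21 exceptions would be tolerated and 22 would not.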
@kylebgorman
kylebgorman / .vimrc
Last active October 23, 2023 21:57
my .vimrc (pretty basic)
syn on
set hlsearch
set ruler
" tab stuff
set expandtab
set tabstop=4
" scrolling
set scrolloff=5
" backspace over everything
set backspace=indent,eol,start
@kylebgorman
kylebgorman / function_words.py
Created June 22, 2018 18:57
Function words
"""English function words.
Sets of English function words, based on
E.O. Selkirk. 1984. Phonology and syntax: The relationship between
sound and structure. Cambridge: MIT Press. (p. 352f.)
The categories are of my own creation.
"""
@kylebgorman
kylebgorman / lnre.py
Last active June 18, 2023 05:39
LNRE calculator
#!/usr/bin/env python
"""LNRE calculator.
This script computes a number of statistics characterizing LNRE data:
* N: corpus size
* V: vocabulary size
* V(1): the number of _hapax legomena_ (symbols occurring once)
* V(2): the number of _dis legomena_ (symbols occurring twice)
* V/N: vocabulary growth rate
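The statistics listed so far can be sketched in a few lines of Python. This is a minimal illustration, not the code in lnre.py: the function name `lnre_stats` and the dictionary keys are hypothetical, and the input is assumed to be an already tokenized sequence of symbols.

```python
from collections import Counter


def lnre_stats(tokens):
    """Compute N, V, V(1), V(2) and V/N for a sequence of symbols.

    Hypothetical helper for illustration; not taken from lnre.py.
    """
    freqs = Counter(tokens)            # symbol -> number of occurrences
    spectrum = Counter(freqs.values())  # frequency -> number of symbols
    n = sum(freqs.values())            # corpus size
    v = len(freqs)                     # vocabulary size
    return {
        "N": n,
        "V": v,
        "V(1)": spectrum[1],           # hapax legomena
        "V(2)": spectrum[2],           # dis legomena
        "V/N": v / n if n else 0.0,
    }


if __name__ == "__main__":
    print(lnre_stats("a b b c c c d".split()))
```

Building the frequency spectrum as a second `Counter` keeps V(1) and V(2) to simple lookups, and a missing frequency class returns 0 rather than raising.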
@kylebgorman
kylebgorman / byte.sym
Created July 10, 2019 12:43
OpenFst byte symbol table
<epsilon> 0
<SOH> 1
<STX> 2
<ETX> 3
<EOT> 4
<ENQ> 5
<ACK> 6
<BEL> 7
<BS> 8
<HT> 9
@kylebgorman
kylebgorman / word_tokenize.py
Last active June 18, 2023 05:35
Applies NLTK PTB tokenizer to input text
#!/usr/bin/env python
import fileinput
import nltk
if __name__ == "__main__":
    for line in fileinput.input():
        print(" ".join(nltk.word_tokenize(line)))
@kylebgorman
kylebgorman / wagnerfischerpp.py
Last active March 3, 2023 17:27
Wagner-Fischer Levenshtein distance, now with a means to generate all possible optimal alignments.
# Copyright (c) 2013-2022 Kyle Gorman
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
#
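For readers unfamiliar with the algorithm named in the description, here is a minimal sketch of the Wagner-Fischer dynamic program in Python. It computes only the Levenshtein distance with unit costs, and it does not reproduce the gist's alignment generation. The function name `levenshtein` is hypothetical and is not taken from wagnerfischerpp.py.

```python
def levenshtein(source, target):
    """Return the unit-cost edit distance between two sequences.

    Hypothetical helper for illustration; keeps only two rows of the
    Wagner-Fischer table, so optimal alignments cannot be recovered.
    """
    # Row 0: turning the empty prefix into target[:j] takes j insertions.
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, 1):
        # Column 0: turning source[:i] into the empty prefix takes i deletions.
        curr = [i]
        for j, t in enumerate(target, 1):
            curr.append(
                min(
                    prev[j] + 1,             # deletion
                    curr[j - 1] + 1,         # insertion
                    prev[j - 1] + (s != t),  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Recovering alignments, as the gist description mentions, would require keeping the full table (or backpointers) rather than two rows.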
@kylebgorman
kylebgorman / WALS131A.R
Last active January 22, 2023 16:27
Tests the hypothesis that vigesimal (base-20) number systems are more common at tropical latitudes
#!/usr/bin/env Rscript
# WALS131A.R
# Kyle Gorman <kylebgorman@gmail.com>
#
# Tests the hypothesis that vigesimal (base-20) number systems are more common
# at tropical latitudes. Thanks to Richard Sproat for suggesting this
# hypothesis.
#
# The data is read directly from WALS (#131A):
#
@kylebgorman
kylebgorman / autoloess.R
Last active November 28, 2022 16:06
autoloess.R: set the "span" (smoothing) hyperparameter for a LOESS curve so as to minimize AIC_c (includes a cute demonstration)
# autoloess.R: compute loess metaparameters automatically
# Kyle Gorman <gormanky@ohsu.edu>
aicc.loess <- function(fit) {
    # compute AIC_C for a LOESS fit, from:
    #
    # Hurvich, C.M., Simonoff, J.S., and Tsai, C. L. 1998. Smoothing
    # parameter selection in nonparametric regression using an improved
    # Akaike Information Criterion. Journal of the Royal Statistical
    # Society B 60: 271–293.
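The preview ends before the computation. As a hedged illustration, the Python sketch below evaluates one widely used form of the corrected criterion from the cited paper, log(sigma^2) + 1 + 2(tr(H) + 1) / (n - tr(H) - 2), where tr(H) is the trace of the smoother matrix. The function name `aicc`, its arguments, and the choice of dividing the residual sum of squares by n are assumptions; the R function above may differ in these details.

```python
import math


def aicc(residuals, trace_hat):
    """Return AIC_c for a smoother, given residuals and tr(H).

    Hypothetical helper for illustration. Assumption: sigma^2 is the
    residual sum of squares divided by n.
    """
    n = len(residuals)
    denom = n - trace_hat - 2
    if denom <= 0:
        raise ValueError("need n > trace_hat + 2")
    sigma2 = sum(r * r for r in residuals) / n
    if sigma2 <= 0:
        raise ValueError("residual variance must be positive")
    return math.log(sigma2) + 1 + 2 * (trace_hat + 1) / denom


if __name__ == "__main__":
    # Six residuals of magnitude 1 and an effective dimension of 2.
    print(aicc([1, -1, 1, -1, 1, -1], 2))
```

Selecting a span then amounts to refitting over a range of candidate spans and keeping the one with the smallest value of this criterion.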