Kyle Gorman kylebgorman

@kylebgorman
kylebgorman / lnre.py
Last active June 18, 2023 05:39
LNRE calculator
#!/usr/bin/env python
"""LNRE calculator.
This script computes a number of statistics characterizing LNRE data:
* N: corpus size
* V: vocabulary size
* V(1): the number of _hapax legomena_ (symbols occurring once)
* V(2): the number of _dis legomena_ (symbols occurring twice)
* V/N: vocabulary growth rate
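The preview above is cut off before any code, so the following is only a minimal sketch of how the listed statistics could be computed from a token sequence. It is not the gist's own implementation; the function name `lnre_stats` and the dictionary keys are assumptions made for illustration.

```python
from collections import Counter


def lnre_stats(tokens):
    """Computes the statistics listed above from an iterable of symbols.

    Hypothetical helper: the name and return format are illustrative only.
    """
    freqs = Counter(tokens)            # symbol -> frequency
    n = sum(freqs.values())            # N: corpus size
    v = len(freqs)                     # V: vocabulary size
    spectrum = Counter(freqs.values()) # frequency -> number of symbols with it
    return {
        "N": n,
        "V": v,
        "V(1)": spectrum[1],           # hapax legomena
        "V(2)": spectrum[2],           # dis legomena
        "V/N": v / n if n else 0.0,    # ratio listed above as V/N
    }


if __name__ == "__main__":
    print(lnre_stats("a b a c b a d".split()))
```

Counting the frequencies a second time yields the frequency spectrum, so V(1) and V(2) are simple lookups rather than separate passes over the data.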
kylebgorman / byte.sym
Created July 10, 2019 12:43
OpenFst byte symbol table
<epsilon> 0
<SOH> 1
<STX> 2
<ETX> 3
<EOT> 4
<ENQ> 5
<ACK> 6
<BEL> 7
<BS> 8
<HT> 9
kylebgorman / casefold.py
Created July 10, 2019 12:15
Applies Unicode case folding to input data
#!/usr/bin/env python
import fileinput
if __name__ == "__main__":
    for line in fileinput.input():
        print(line.rstrip().casefold())
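As a small illustration of why case folding is used here rather than plain lowercasing, the sketch below compares the two built-in string methods. The helper name `same_ignoring_case` is an assumption for illustration and does not appear in the gist.

```python
def same_ignoring_case(a, b):
    """Caseless comparison via Unicode case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # German sharp s: lower() keeps it, casefold() expands it to "ss".
    print("Straße".lower())
    print("Straße".casefold())
    print(same_ignoring_case("Straße", "STRASSE"))
```

Both methods belong to the built-in `str` type, so no third-party package is needed for this step.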
kylebgorman / word_tokenize.py
Last active June 18, 2023 05:35
Applies NLTK PTB tokenizer to input text
#!/usr/bin/env python
import fileinput
import nltk
if __name__ == "__main__":
    for line in fileinput.input():
        print(" ".join(nltk.word_tokenize(line)))
kylebgorman / covfefe.py
Created June 8, 2019 19:11
Which English word is most similar to "covfefe"?
#!/usr/bin/env python
# What's the nearest word (in Levenshtein distance) to "covfefe"?
import string
# Available from: https://github.com/kylebgorman/EditTransducer
import edit_transducer
# You probably have this file if you're on Linux or Mac OS X.
with open("/usr/share/dict/words") as source:
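The preview stops here, and the gist itself relies on the `edit_transducer` package. As a rough, dependency-free alternative, the same question can be sketched with the classic dynamic-programming formulation of Levenshtein distance. This swaps in a different technique from the one the gist uses; the function names and the tiny word list are assumptions for illustration.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]                  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(
                min(
                    prev[j] + 1,               # delete ca
                    curr[j - 1] + 1,           # insert cb
                    prev[j - 1] + (ca != cb),  # substitute or match
                )
            )
        prev = curr
    return prev[-1]


def nearest(query, words):
    """Returns the word in `words` with the smallest distance to `query`."""
    return min(words, key=lambda word: levenshtein(query, word))


if __name__ == "__main__":
    # Hypothetical word list; the gist reads /usr/share/dict/words instead.
    print(nearest("covfefe", ["coffee", "cover", "coverage"]))
```

This scans every candidate word, so it costs time proportional to the dictionary size times the product of the string lengths; a transducer-based search like the gist's can avoid much of that work.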
kylebgorman / fix.sh
Created May 6, 2019 18:14
Update shared library caches
# On Linux:
sudo ldconfig
# On Mac OS X:
sudo update_dyld_shared_cache
kylebgorman / torch_cuda.py
Last active October 8, 2019 15:58
Checks that PyTorch can reach CUDA
#!/usr/bin/env python
"""Checks that PyTorch can reach CUDA."""
import sys
import torch
if __name__ == "__main__":
    if not torch.cuda.is_available():
kylebgorman / log_odds.pyx
Last active February 6, 2024 19:49
Log-odds calculations
"""Log-odds computations."""
from libc.math cimport log, sqrt
from libc.stdint cimport int64_t
ctypedef int64_t int64
kylebgorman / function_words.py
Created June 22, 2018 18:57
Function words
"""English function words.
Sets of English function words, based on
E.O. Selkirk. 1984. Phonology and syntax: The relationship between
sound and structure. Cambridge: MIT Press. (p. 352f.)
The categories are of my own creation.
"""
kylebgorman / z408.py
Last active May 25, 2021 14:46
Zodiac cipher 408: freestanding Python 3 script for converting the plaintext and ciphertext to OpenFst assets
#!/usr/bin/env python
#
# Constructs resources for Zodiac cipher 408:
#
# * Plaintext and ciphertext FARs
# * Unweighted "key" FSTs and "channel" (hypothesis space) FSTs
# * A textual symbol table for plaintext and ciphertext
#
# Requires: Pynini and OpenFst with the FAR extension.