nsaphra /
Last active August 29, 2015 14:00
Generate GIZA++ input files from segmented parallel text files, with option to add onto previous input files.
import argparse
from collections import defaultdict
parser = argparse.ArgumentParser(description='Generate GIZA++ input files from '
'segmented parallel text files.')
parser.add_argument('-s', '--src_in', help='Source input file')
parser.add_argument('-t', '--tgt_in', help='Target input file')
parser.add_argument('-p', '--prev_out', default=None, help='Previous output files prefix')
parser.add_argument('-o', '--out', help='Prefix for output files')
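As a rough illustration of what a script like this might produce, the sketch below builds vocabulary entries and sentence-pair records in the layout GIZA++ is commonly described as reading: vocabulary lines of `id token count` with ids starting at 2, and sentence records of three lines each (a pair count, the source ids, the target ids). The helper names `build_vocab`, `vcb_lines`, and `to_snt` are hypothetical and not taken from the gist, and the format details are assumptions worth checking against your GIZA++ build.

```python
from collections import Counter


def build_vocab(sentences, first_id=2):
    """Map each token to (id, count). Ids start at first_id (assumption:
    lower ids are reserved by GIZA++)."""
    counts = Counter(tok for sent in sentences for tok in sent.split())
    vocab = {}
    for tok in counts:  # Counter keeps first-seen order
        vocab[tok] = (first_id + len(vocab), counts[tok])
    return vocab


def vcb_lines(vocab):
    """Render vocabulary entries as 'id token count' lines."""
    return ["%d %s %d" % (idx, tok, cnt) for tok, (idx, cnt) in vocab.items()]


def to_snt(src_sents, tgt_sents, src_vocab, tgt_vocab):
    """Render each sentence pair as three lines: count, source ids, target ids."""
    lines = []
    for src, tgt in zip(src_sents, tgt_sents):
        lines.append("1")  # each pair is counted once
        lines.append(" ".join(str(src_vocab[w][0]) for w in src.split()))
        lines.append(" ".join(str(tgt_vocab[w][0]) for w in tgt.split()))
    return lines
```

Extending a previous run (the `--prev_out` option) would presumably mean loading the earlier vocabulary first so that existing ids stay stable; that step is not shown here.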
nsaphra / Find.jl
Created October 14, 2014 21:37
Filesystem find one-liner
find(path::String, exec, filterfcn) = [name => exec(name) for name in filter(filterfcn, readdir(path))]
nsaphra / SparsePy.jl
Created December 16, 2014 23:01
module SparsePy
# TODO this is only for CSC sparse matrix PyObjects and julia matrices.
# Add other types when julia releases them?
using PyCall
export jlmat2pymat, pymat2jlmat
@pyimport scipy.sparse as pysparse
nsaphra /
Created February 17, 2015 17:29
Concatenate all the files in a directory, recursively, and print their contents.
from collections import defaultdict
import json
import os
import argparse
import gzip
import sys
import codecs
from time import asctime
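The core of a recursive concatenation can be sketched with `os.walk`. This is a minimal version under stated assumptions: it reads plain UTF-8 text only, does not handle gzip or JSON input (which the gist's imports suggest it may), and the names `iter_files` and `concatenate` are hypothetical.

```python
import os


def iter_files(root):
    """Yield the path of every file under root, in a deterministic order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place fixes the order of descent
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def concatenate(root, encoding="utf-8"):
    """Return the contents of all files under root joined together."""
    parts = []
    for path in iter_files(root):
        with open(path, encoding=encoding) as handle:
            parts.append(handle.read())
    return "".join(parts)
```

Calling `print(concatenate("some_dir"), end="")` would then print everything under `some_dir`, with files in a directory emitted before files in its subdirectories.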
nsaphra / LispParser.jl
Last active March 2, 2016 14:49
Simple lisp parser for RC pair programming interview.
type SyntaxNode
# TODO No error handling when going up a level with undefined parent.
SyntaxNode() = (
x = new();
x.label = "";
x.children = [];
nsaphra /
Last active November 24, 2017 16:24
Activate a conda jupyter notebook in tmux, for use on a server with timeouts after each notebook start.
if [ "$TERM" != "screen" ]
if type tmux >/dev/null 2>&1
tmux att || tmux \
new -s tensorflow -n shell \; \
neww -n notebook "source activate tensorflow; cd Documents/dynamic_curriculum; jupyter notebook" \; \
neww -n dir "cd Documents/dynamic_curriculum"
nsaphra /
Created March 6, 2017 19:14
Recurse Center interview code
class NoughtsAndCrosses:
    EMPTY = " "
    STALEMATE = "Nobody"

    def __init__(self):
        self.board = [[self.EMPTY] * 3, [self.EMPTY] * 3, [self.EMPTY] * 3]
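The preview stops at the constructor, so the rest of the class is not visible. One way a winner check over such a board could look is sketched below as a standalone function; `winner` is a hypothetical name, and returning `STALEMATE` for a full board with no winner is an assumption based on the constant's name.

```python
EMPTY = " "
STALEMATE = "Nobody"


def winner(board):
    """Return the winning mark, STALEMATE for a full board with no winner,
    or None while the game is still open."""
    lines = [list(row) for row in board]                          # rows
    lines += [[board[r][c] for r in range(3)] for c in range(3)]  # columns
    lines.append([board[i][i] for i in range(3)])                 # main diagonal
    lines.append([board[i][2 - i] for i in range(3)])             # anti-diagonal
    for line in lines:
        if line[0] != EMPTY and line.count(line[0]) == 3:
            return line[0]
    if all(cell != EMPTY for row in board for cell in row):
        return STALEMATE
    return None
```

Note that the constructor builds three separate row lists. Writing `[[EMPTY] * 3] * 3` instead would make all three rows the same list object, so a move in one row would show up in every row.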
nsaphra /
Created April 19, 2017 15:50
discrete log uniform power distribution
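import numpy as np
from scipy import stats

def zipf(size, exponent):
    # Ranks start at 1 so that rank ** exponent is never zero.
    ranks = np.arange(1, size + 1, dtype='float')
    pmf = 1.0 / ranks ** exponent
    pmf /= pmf.sum()
    # rv_discrete takes values=(support, probabilities); support is 0..size-1.
    return stats.rv_discrete(values=(np.arange(size), pmf))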
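The same idea can be sketched without NumPy or SciPy, which makes the arithmetic explicit: weight each rank by `1 / rank ** exponent`, normalize so the weights sum to 1, and sample indices with those probabilities. This is a standard-library stand-in rather than the gist's method; `zipf_pmf` and `sample_zipf` are hypothetical names, and ranks are assumed to start at 1 while the sampled values run from 0 to `size - 1`.

```python
import random


def zipf_pmf(size, exponent):
    """Probabilities proportional to 1 / rank ** exponent for ranks 1..size."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_zipf(size, exponent, n, seed=None):
    """Draw n values from 0..size-1, where value k has the weight of rank k + 1."""
    rng = random.Random(seed)
    return rng.choices(range(size), weights=zipf_pmf(size, exponent), k=n)
```

For example, `zipf_pmf(4, 1.0)` weights the ranks as 1, 1/2, 1/3, 1/4, which sum to 25/12, so the first value has probability 12/25 = 0.48.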

Keybase proof

I hereby claim:

  • I am nsaphra on github.
  • I am nsaphra on keybase.
  • I have a public key ASCpyzsqtJYqR6IjSCnoPwSjrInpOg35MPypGR9l_pvTcQo

To claim this, I am signing this object:

nsaphra /
Created July 9, 2018 14:59
For a corpus where one file contains tokens and a separate file contains the corresponding POS tags, shuffle the two files together so that the tokens stay aligned with their tags.
# -*- coding: utf-8 -*-
import os
from random import shuffle
import argparse
parser = argparse.ArgumentParser(description='shuffle a corpus such that the tags and the original tokenized text still align')
parser.add_argument('--unshuffled_dir', type=str)
parser.add_argument('--shuffled_dir', type=str)
parser.add_argument('--tag_suffix', type=str, default='.tag')
args = parser.parse_args()
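The preview ends after argument parsing, so the shuffling step itself is not shown. A minimal sketch of one way to keep two line-aligned files in step is to shuffle a single list of indices and apply it to both. `shuffle_aligned` is a hypothetical helper, not the gist's code, and it assumes line i of the token file corresponds to line i of the tag file.

```python
import random


def shuffle_aligned(tokens, tags, seed=None):
    """Return both line lists reordered by the same random permutation."""
    if len(tokens) != len(tags):
        raise ValueError("token and tag files must have the same number of lines")
    order = list(range(len(tokens)))
    random.Random(seed).shuffle(order)  # one permutation, applied to both lists
    return [tokens[i] for i in order], [tags[i] for i in order]
```

Shuffling each list separately with `random.shuffle` would break the alignment, since the two calls would produce different permutations unless the generator were reseeded identically each time.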