Brendan O'Connor (brendano)

julia> include("gotree.jl")
Array{CountTrie,1}
accept rate = 850654/850654 = 1.000
elapsed time: 33.986743839 seconds (5921922372 bytes allocated)
.ITER 1
accept rate = 809118/850654 = 0.951
elapsed time: 36.449559574 seconds (6175698392 bytes allocated)
.ITER 2
accept rate = 796254/850654 = 0.936
elapsed time: 30.21721326 seconds (6166426280 bytes allocated)
# http://mikelove.wordpress.com/2013/11/07/empirical-bayes/
# Stein's estimation rule and its competitors - an empirical Bayes approach
# B. Efron, C. Morris, Journal of the American Statistical Association, 1973
n <- 1000
sigma.means <- 5
means <- rnorm(n, 0, sigma.means)
# sigma.y <- 5
library(manipulate)
manipulate({
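  # (The gist preview is truncated above; the rest of this block is a hedged
  #  sketch, not the original code: assume sigma.y is a slider-controlled
  #  observation-noise level, and shrink the raw estimates toward 0,
  #  James-Stein style.)
  y <- rnorm(n, means, sigma.y)                  # noisy observations of the means
  b <- 1 - (n - 2) * sigma.y^2 / sum(y^2)        # James-Stein shrinkage factor
  plot(means, y, pch = 20, col = "grey")         # raw (MLE) estimates
  points(means, b * y, pch = 20, col = "red")    # shrunken estimates
  abline(0, 1)
}, sigma.y = slider(1, 10, initial = 5))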
package nlp;
import java.io.IOException;
import java.io.StringReader;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.trees.LabeledScoredTreeFactory;
import edu.stanford.nlp.trees.PennTreeReader;

In Thomas Bass's The Predictors, there is a scene where they are talking to a potential investor who kept wanting to talk about their earlier complexity theory work:

"Marrin wanted chaos and fractals, and we were offering engineering and statistics."

I remember reading that and thinking: wait, isn't that why I'm reading this book, and why the book is supposed to be interesting? I stopped reading it at some point after that.

Notes on edu.stanford.nlp.trees.Tree span/index conventions:
tree.setSpans(); // these spans are 0-indexed, inclusive-inclusive (read back with getSpan())
tree.indexSpans(); // saves to a different place (the node CoreLabels); 0-indexed, inclusive-exclusive
tree.indexLeaves(); // these are 1-indexed (!!); the Stanford coref code uses them heavily
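A minimal sketch of the three conventions side by side, assuming a recent Stanford CoreNLP jar on the classpath (the class name and the toy tree are illustrative, not from the original gist):

package nlp;

import java.io.StringReader;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.trees.LabeledScoredTreeFactory;
import edu.stanford.nlp.trees.PennTreeReader;
import edu.stanford.nlp.trees.Tree;

public class SpanConventions {
    public static void main(String[] args) throws Exception {
        Tree tree = new PennTreeReader(
                new StringReader("(S (NP (DT the) (NN cat)) (VP (VBZ sits)))"),
                new LabeledScoredTreeFactory()).readTree();
        tree.setSpans();    // 0-indexed, inclusive-inclusive; read back with getSpan()
        tree.indexSpans();  // 0-indexed, inclusive-exclusive, stored on the CoreLabels
        tree.indexLeaves(); // 1-indexed leaf numbering
        for (Tree node : tree) {
            CoreLabel label = (CoreLabel) node.label();
            System.out.printf("%-4s span=%s begin=%s end=%s leafIndex=%s%n",
                    node.value(), node.getSpan(),
                    label.get(CoreAnnotations.BeginIndexAnnotation.class),
                    label.get(CoreAnnotations.EndIndexAnnotation.class),
                    label.get(CoreAnnotations.IndexAnnotation.class));
        }
    }
}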
===
[Update July 25... and after https://gist.github.com/leondz/6082658 ]
OK, never mind the questions about cross-validation versus a smaller eval split and all that.
We evaluated our tagger (current release, version 0.3.2),
trained and evaluated on the same splits as the GATE tagger
(from http://gate.ac.uk/wiki/twitter-postagger.html, specifically twitie-tagger.zip),
and it gets 90.4% accuracy (significantly different from the GATE results).
brendano / morpha.py
Python wrapper for morpha (English lemmatizer)
"""
Wrapper around morpha from
http://www.informatics.sussex.ac.uk/research/groups/nlp/carroll/morph.html
Vaguely follows edu.stanford.nlp.Morphology except we implement with a pipe.
hacky. Would be nice to use cython/swig/ctypes to directly embed morpha.yy.c
as a python extension.
TODO compare linguistic quality to lemmatizer in python's "pattern" package
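For reference, a minimal sketch of the pipe approach the docstring describes, assuming a morpha binary on $PATH that reads whitespace-separated tokens and emits one lemma per token (the real gist's flags and I/O conventions may differ):

import subprocess

def lemmatize(tokens):
    # One-shot pipe through morpha; the original presumably keeps a
    # longer-lived process, but subprocess.run is enough to show the idea.
    proc = subprocess.run(["morpha"],
                          input=" ".join(tokens) + "\n",
                          capture_output=True, text=True, check=True)
    return proc.stdout.split()

print(lemmatize(["the", "cats", "were", "running"]))  # roughly: the cat be run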
mcmc convergence diagnostics
https://github.com/brendano/conplot
~/myutil % grep totalLL log|awk '{print $2}' | conplot
[conplot ASCII trace of totalLL: the chain climbs from about -2.93e+06 and levels off near -2.87e+06]
---------- Forwarded message ----------
From: Daniel Bauer <@cs.columbia.edu>
Date: Thu, Feb 21, 2013 at 6:19 PM
Subject: Mon 2/25 - Brendan O'Connor
To: nlp-announce
Cc: Brendan O'Connor <brenocon@cmu.edu>
Dear all,
--- CountryInfo.AllNations.txt 2013-01-23 12:27:37.000000000 -0500
+++ CountryInfo.CleanedNations.txt 2013-01-23 12:27:37.000000000 -0500
@@ -4458,10 +4458,6 @@
[USAVIR] VIRGIN_ISLANDS_OF_THE_U.S._
[USAVIR] VIRGIN_ISLANDS_OF_THE_UNITED_STATES_
[USAVIR] VIRGIN_ISLANDS_OF_THE_US_
-[USA] ALABAMA_
-[USA] ALASKA_
-[USA] ALOHA_STATE_
-[USA] AMERICALAND_