Brendan O'Connor (brendano)

julia> include("gotree.jl")
Array{CountTrie,1}
accept rate = 850654/850654 = 1.000
elapsed time: 33.986743839 seconds (5921922372 bytes allocated)
.ITER 1
accept rate = 809118/850654 = 0.951
elapsed time: 36.449559574 seconds (6175698392 bytes allocated)
.ITER 2
accept rate = 796254/850654 = 0.936
elapsed time: 30.21721326 seconds (6166426280 bytes allocated)
# http://mikelove.wordpress.com/2013/11/07/empirical-bayes/
# Stein's estimation rule and its competitors - an empirical Bayes approach
# B. Efron, C. Morris, Journal of the American Statistical Association, 1973
n <- 1000
sigma.means <- 5
means <- rnorm(n, 0, sigma.means)
# sigma.y <- 5
library(manipulate)
manipulate({
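  # (The gist preview is truncated above; the rest of this block is a hedged
  #  sketch, not the original code: assume sigma.y is a slider-controlled
  #  observation-noise level, and shrink the raw estimates toward 0,
  #  James-Stein style.)
  y <- rnorm(n, means, sigma.y)                  # noisy observations of the means
  b <- 1 - (n - 2) * sigma.y^2 / sum(y^2)        # James-Stein shrinkage factor
  plot(means, y, pch = 20, col = "grey")         # raw (MLE) estimates
  points(means, b * y, pch = 20, col = "red")    # shrunken estimates
  abline(0, 1)
}, sigma.y = slider(1, 10, initial = 5))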
package nlp;
import java.io.IOException;
import java.io.StringReader;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.trees.LabeledScoredTreeFactory;
import edu.stanford.nlp.trees.PennTreeReader;

In Thomas Bass's The Predictors, there is a scene where they are talking to a potential investor who kept wanting to talk about their earlier complexity theory work:

"Marrin wanted chaos and fractals, and we were offering engineering and statistics."

I remember reading that and thinking: wait, isn't that why I'm reading this book, and why the book is supposed to be interesting? I stopped reading it at some point after that.

Notes on edu.stanford.nlp.trees.Tree span/index conventions:
tree.setSpans(); // these spans are 0-indexed, inclusive-inclusive (read back with getSpan())
tree.indexSpans(); // saves to a different place (the node CoreLabels); 0-indexed, inclusive-exclusive
tree.indexLeaves(); // these are 1-indexed (!!); the Stanford coref code uses them heavily
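A minimal sketch of the three conventions side by side, assuming a recent Stanford CoreNLP jar on the classpath (the class name and the toy tree are illustrative, not from the original gist):

package nlp;

import java.io.StringReader;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.trees.LabeledScoredTreeFactory;
import edu.stanford.nlp.trees.PennTreeReader;
import edu.stanford.nlp.trees.Tree;

public class SpanConventions {
    public static void main(String[] args) throws Exception {
        Tree tree = new PennTreeReader(
                new StringReader("(S (NP (DT the) (NN cat)) (VP (VBZ sits)))"),
                new LabeledScoredTreeFactory()).readTree();
        tree.setSpans();    // 0-indexed, inclusive-inclusive; read back with getSpan()
        tree.indexSpans();  // 0-indexed, inclusive-exclusive, stored on the CoreLabels
        tree.indexLeaves(); // 1-indexed leaf numbering
        for (Tree node : tree) {
            CoreLabel label = (CoreLabel) node.label();
            System.out.printf("%-4s span=%s begin=%s end=%s leafIndex=%s%n",
                    node.value(), node.getSpan(),
                    label.get(CoreAnnotations.BeginIndexAnnotation.class),
                    label.get(CoreAnnotations.EndIndexAnnotation.class),
                    label.get(CoreAnnotations.IndexAnnotation.class));
        }
    }
}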
===
[Update July 25... and after https://gist.github.com/leondz/6082658 ]
OK, never mind the questions about cross-validation versus a smaller eval split and all that.
We evaluated our tagger (current release, version 0.3.2),
trained and evaluated on the same splits as the GATE tagger
(from http://gate.ac.uk/wiki/twitter-postagger.html, specifically twitie-tagger.zip),
and it gets 90.4% accuracy (significantly different from the GATE results).
brendano / morpha.py
Python wrapper for morpha (English lemmatizer)
"""
Wrapper around morpha from
http://www.informatics.sussex.ac.uk/research/groups/nlp/carroll/morph.html
Vaguely follows edu.stanford.nlp.Morphology except we implement with a pipe.
hacky. Would be nice to use cython/swig/ctypes to directly embed morpha.yy.c
as a python extension.
TODO compare linguistic quality to lemmatizer in python's "pattern" package
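For reference, a minimal sketch of the pipe approach the docstring describes, assuming a morpha binary on $PATH that reads whitespace-separated tokens and emits one lemma per token (the real gist's flags and I/O conventions may differ):

import subprocess

def lemmatize(tokens):
    # One-shot pipe through morpha; the original presumably keeps a
    # longer-lived process, but subprocess.run is enough to show the idea.
    proc = subprocess.run(["morpha"],
                          input=" ".join(tokens) + "\n",
                          capture_output=True, text=True, check=True)
    return proc.stdout.split()

print(lemmatize(["the", "cats", "were", "running"]))  # roughly: the cat be run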
mcmc convergence diagnostics
https://github.com/brendano/conplot
~/myutil % grep totalLL log|awk '{print $2}' | conplot
[conplot ASCII trace of totalLL: the chain climbs from about -2.93e+06 and levels off near -2.87e+06]
---------- Forwarded message ----------
From: Daniel Bauer <@cs.columbia.edu>
Date: Thu, Feb 21, 2013 at 6:19 PM
Subject: Mon 2/25 - Brendan O'Connor
To: nlp-announce
Cc: Brendan O'Connor <brenocon@cmu.edu>
Dear all,
--- CountryInfo.AllNations.txt 2013-01-23 12:27:37.000000000 -0500
+++ CountryInfo.CleanedNations.txt 2013-01-23 12:27:37.000000000 -0500
@@ -4458,10 +4458,6 @@
[USAVIR] VIRGIN_ISLANDS_OF_THE_U.S._
[USAVIR] VIRGIN_ISLANDS_OF_THE_UNITED_STATES_
[USAVIR] VIRGIN_ISLANDS_OF_THE_US_
-[USA] ALABAMA_
-[USA] ALASKA_
-[USA] ALOHA_STATE_
-[USA] AMERICALAND_