Brendan O'Connor brendano

@brendano
brendano / .RData
Last active June 27, 2019 17:00
google ngram books plot
brendano / example.ipynb
Last active April 30, 2019 01:56
l1 generative implementation with liblbfgs (owl-qn)
brendano / corenlp_client_example.py
Last active July 15, 2018 05:38
example of using corenlp server from python
"""
example of using corenlp server from python
This code requires server to already be running: https://stanfordnlp.github.io/CoreNLP/corenlp-server.html
To start server:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
To call it, e.g.:
curl --data "The man wanted to go to work." 'http://localhost:9000/?properties={%22annotators%22%3A%22tokenize%2Cssplit%2Cpos%2Cdepparse%22%2C%22outputFormat%22%3A%22conllu%22}'
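The preview stops before the Python client code itself. A minimal sketch of the same request as the curl call above, using only the standard library, might look like the following. The function names `build_corenlp_url` and `annotate` are made up for illustration and are not taken from the gist; the host and port are assumed to match the server command shown above.

```python
import json
import urllib.parse
import urllib.request


def build_corenlp_url(annotators, output_format, host="localhost", port=9000):
    """Build the server URL with a URL-encoded JSON 'properties' parameter.

    Hypothetical helper: it mirrors the query string used in the curl example.
    """
    props = {"annotators": ",".join(annotators), "outputFormat": output_format}
    query = urllib.parse.urlencode(
        {"properties": json.dumps(props, separators=(",", ":"))}
    )
    return "http://%s:%d/?%s" % (host, port, query)


def annotate(text, url, timeout=15.0):
    """POST raw text to an already-running server and return the response body.

    Hypothetical helper: passing data= makes urllib send a POST, like curl --data.
    """
    request = urllib.request.Request(url, data=text.encode("utf-8"))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, assuming the server was started as described above:
#   url = build_corenlp_url(["tokenize", "ssplit", "pos", "depparse"], "conllu")
#   print(annotate("The man wanted to go to work.", url))
```

Keeping URL construction separate from the network call makes the encoding easy to check without a running server.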
http://brenocon.com/confsize,,,"Number of paper submissions and acceptances for various conferences over time. Trying to only select full-length or ""main"" research papers, though others are sometimes included.
The first several columns are the main data. Sources on the right. Sometimes I tried to put in original source data in different columns. Sometimes data contradicts",,,,,,,,,,,,,,,,,,,,,,,,,,
area,conference,year,submit,accept,accept rates - some are messy or partial from copy-and-paste,joint,attendance,source/notes,other notes,other notes,other notes,other notes,other notes,other notes,,,,,,,,,,,,,,,
,,,,,,,,,https://dl.acm.org/citation.cfm?id=2766462&picked=source&preflayout=tabs,https://dl.acm.org/citation.cfm?id=312624,the ACM table:,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,Year,Submitted,Accepted,Rate,,,,,,,,,,,,,,,
webir,sigir,1999,135,33,,,,https://dl.acm.org/citation.cfm?id=312624,,22nd annual,SIGIR '99,135,33,24%,,,,,,,,,,,,,,,
webir,sigir,2000,??,??,,,,https://dl.acm.org/citation.cfm?id=345508&picked
brendano / FastRandom.java
Last active February 10, 2018 15:41
An RNG class that's faster and hopefully better than java.util.Random.
package util;
import java.io.Serializable;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.*;
/**
* An RNG class that's faster and hopefully better than java.util.Random.
*
brendano / e8.reg
Created February 8, 2011 20:56
syntactic rules from Abney's CASS finite-state parser
phrase h; ## immediate head
phrase s; ## subject
phrase o; ## object
phrase f; ## function word
phrase k; ## `kind' -- potential partitive complement
### Level 0: tags;
### Level 1: date cdqlx cdx doll ci-st mx;
brendano / gplots.py
Last active November 14, 2017 22:12
gplots.py
"""
Make plots with the "Google Image Charts" API.
The functions here return a URL. That URL will then throw a PNG back at you.
So embed them in <img src=...> or whatever.
This uses the "Google Image Charts API", formerly known as the "Google Charts API",
which is officially deprecated and has been since 2012, I guess:
https://en.wikipedia.org/wiki/Google_Chart_API
https://developers.google.com/chart/image/
So look out I guess. But it is a MUCH better quick-and-simple API than their
brendano / gist:963c826e7109a5e50d54
Created July 3, 2014 16:50
papers that do NLP-like stuff with source code
NLP and source code papers, very scattered and partial listing
(collected by Nathan Schneider and Brendan O'Connor)
ICML 2014
Maddison and Tarlow
Structured Generative Models of Natural Source Code
http://jmlr.org/proceedings/papers/v32/maddison14.pdf
ACL 2013
r"""
stdin: IDs of tweets to get (whitespace or line separated)
stdout: the tweets as two-column TSV: ID \t TweetJSON
This retrieves tweets using the API.
If there was an error when retrieving a message (most prominently, if the
message is now deleted), the error information is saved as JSON. Therefore
there should be exactly as many output lines as there are input IDs.
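The input/output contract described above can be sketched roughly as follows. This is not the gist's code: `tweets_to_tsv` and its `fetch` argument are hypothetical names, the real API call is left to the caller, and the `{"error": ...}` shape is an assumption, since the preview only says that error information is saved as JSON.

```python
import json


def tweets_to_tsv(ids_text, fetch):
    """Return one 'ID<TAB>JSON' line per whitespace-separated tweet ID.

    `fetch` is a caller-supplied function (hypothetical) that takes an ID
    string and returns a JSON-serializable object, or raises on failure.
    """
    lines = []
    for tweet_id in ids_text.split():
        try:
            payload = fetch(tweet_id)
        except Exception as exc:
            # Assumed error format; a failed lookup still yields an output
            # line, so output lines match input IDs one-to-one.
            payload = {"error": str(exc)}
        # json.dumps escapes tabs and newlines, so each record stays on one
        # line with exactly two tab-separated columns.
        lines.append("%s\t%s" % (tweet_id, json.dumps(payload)))
    return lines


# Usage, with some real lookup function in place of `my_fetch`:
#   import sys
#   for line in tweets_to_tsv(sys.stdin.read(), my_fetch):
#       print(line)
```

Catching the error per ID, rather than around the whole loop, is what keeps the line count equal to the ID count.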