Brendan O'Connor brendano

@brendano
brendano / .RData
Last active Jun 27, 2019
google ngram books plot
brendano / example.ipynb
Last active Apr 30, 2019
l1 generative implementation with liblbfgs (owl-qn)
View example.ipynb
View bla.csv
http://brenocon.com/confsize,,,"Number of paper submissions and acceptances for various conferences over time. Trying to only select full-length or ""main"" research papers, though others are sometimes included.
The first several columns are the main data. Sources on the right. Sometimes I tried to put in original source data in different columns. Sometimes data contradicts",,,,,,,,,,,,,,,,,,,,,,,,,,
area,conference,year,submit,accept,accept rates - some are messy or partial from copy-and-paste,joint,attendance,source/notes,other notes,other notes,other notes,other notes,other notes,other notes,,,,,,,,,,,,,,,
,,,,,,,,,https://dl.acm.org/citation.cfm?id=2766462&picked=source&preflayout=tabs,https://dl.acm.org/citation.cfm?id=312624,the ACM table:,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,Year,Submitted,Accepted,Rate,,,,,,,,,,,,,,,
webir,sigir,1999,135,33,,,,https://dl.acm.org/citation.cfm?id=312624,,22nd annual,SIGIR '99,135,33,24%,,,,,,,,,,,,,,,
webir,sigir,2000,??,??,,,,https://dl.acm.org/citation.cfm?id=345508&picked
brendano / log_logistic.py.md
Last active Apr 6, 2018
numerically stable implementation of the log-logistic function
View log_logistic.py.md

Binary case

This is just the middle section of Bob Carpenter's note on evaluating log-loss via the binary logistic function: https://lingpipe-blog.com/2012/02/16/howprevent-overflow-underflow-logistic-regression/

The logp function calculates the negative cross-entropy:

    dotproduct( [y, 1-y],  [logP(y=1), logP(y=0)] )

where the input s is the beta'x log-odds scalar value. The trick is to make this numerically stable for any choice of s and y.
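
As an illustration of the stability trick, here is a minimal sketch in Python. It is not necessarily the code from the gist or from Carpenter's note; the helper name `log_sigmoid` is an assumption introduced here. The idea is to exponentiate only non-positive numbers, so `exp` can underflow to zero but never overflow.

```python
import math

def log_sigmoid(s):
    """Return log(1 / (1 + exp(-s))) without overflowing for large |s|.

    Hypothetical helper for this sketch. Both branches call exp() on a
    non-positive argument only.
    """
    if s >= 0:
        return -math.log1p(math.exp(-s))
    return s - math.log1p(math.exp(s))

def logp(s, y):
    """Negative cross-entropy: y * logP(y=1) + (1 - y) * logP(y=0).

    s is the log-odds scalar (beta'x); y is the label, 0 or 1 (or a
    probability in between). logP(y=0) equals log_sigmoid(-s).
    """
    return y * log_sigmoid(s) + (1 - y) * log_sigmoid(-s)
```

With this formulation, `logp(1000.0, 0)` evaluates to `-1000.0`, whereas a naive `math.log(1 - 1 / (1 + math.exp(-1000.0)))` fails because the inner probability rounds to exactly 1.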

brendano / corenlp_client_example.py
Last active Jul 15, 2018
example of using corenlp server from python
View corenlp_client_example.py
"""
example of using corenlp server from python
This code requires the server to already be running: https://stanfordnlp.github.io/CoreNLP/corenlp-server.html
To start server:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
To call it, e.g.:
curl --data "The man wanted to go to work." 'http://localhost:9000/?properties={%22annotators%22%3A%22tokenize%2Cssplit%2Cpos%2Cdepparse%22%2C%22outputFormat%22%3A%22conllu%22}'
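
The curl call above could be reproduced from Python roughly as follows. This is a sketch using only the standard library, not the gist's own client code (the preview is cut off before it). The function names `build_url` and `annotate` are assumptions made up for this example, and the host and port are taken from the server command above.

```python
import json
import urllib.request
from urllib.parse import urlencode

def build_url(host="http://localhost:9000",
              annotators=("tokenize", "ssplit", "pos", "depparse"),
              output_format="conllu"):
    """Build the request URL with the JSON 'properties' query parameter."""
    props = {"annotators": ",".join(annotators), "outputFormat": output_format}
    # Compact separators keep the JSON free of spaces, as in the curl example.
    query = urlencode({"properties": json.dumps(props, separators=(",", ":"))})
    return host + "/?" + query

def annotate(text, timeout=15):
    """POST raw text to an already-running server and return the response body."""
    request = urllib.request.Request(build_url(), data=text.encode("utf-8"))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `annotate("The man wanted to go to work.")` only works while the server is running. `build_url()` can be checked on its own, since it just percent-encodes the same properties as the curl command (including the braces, which curl leaves unescaped).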
View race vs age, non-whites as stacked bar, ACS 2016 PUMS data.pdf
View gplots.py
"""
Make plots with the "Google Image Charts" API.
The functions here return a URL. That URL will then throw a PNG back at you.
So embed them in <img src=...> or whatever.
This uses the "Google Image Charts API," formerly known as "Google Charts API",
which is officially deprecated and has been since 2012, I guess:
https://en.wikipedia.org/wiki/Google_Chart_API
https://developers.google.com/chart/image/
So look out I guess. But it is a MUCH better quick-and-simple API than their
View get_tweets_by_id.py
r"""
stdin: IDs of tweets to get (whitespace or line separated)
stdout: the tweets as two-column TSV: ID \t TweetJSON
This retrieves tweets using the API.
If there was an error when retrieving a message (most prominently, if the
message is now deleted), the error information is saved as JSON. Therefore
there should be exactly as many output lines as there are input IDs.
View emoji.py
# -*- encoding: utf-8 -*-
# actually that encoding line is NOT important codewise. only for doc purposes.
"""
Detect emoji or other emoji-like things in Python.
The regular expressions here can be used to either identify emoji or to remove it.
The comments are written from the perspective of removing it.
The regexes get some stuff besides emoji.
by Brendan O'Connor (http://brenocon.com) 2016-10-20
originally written as part of https://arxiv.org/abs/1608.08868
View lda.pyx
# by brendan o'connor (http://brenocon.com) written in early 2012
# parallelized collapsed gibbs sampling for LDA with threads in cython
# need to delete these lines to get the cython instructions to work...
#cython: boundscheck=False, cdivision=True
# vim:sts=4:sw=4
import numpy as np
cimport numpy as np
cimport cython
cimport openmp