Jimmy Lin lintool

@lintool
lintool / emnlp2019_all_authors.txt
Created September 3, 2019 19:15
Tally of *all authors* from all (long, short, demo) accepted papers at EMNLP 2019 https://www.emnlp-ijcnlp2019.org/program/accepted/
Ting Liu, 10
Lidong Bing, 8
Zhiyuan Liu, 8
Shuming Shi, 8
Luke Zettlemoyer, 8
Dongyan Zhao, 8
Kai-Wei Chang, 7
Iryna Gurevych, 7
Jimmy Lin, 7
Xiang Ren, 7
lintool / emnlp2019_first_authors.txt
Created September 3, 2019 19:13
Tally of *first authors* from all (long, short, demo) accepted papers at EMNLP 2019 https://www.emnlp-ijcnlp2019.org/program/accepted/
Dongyeop Kang, 3
Eric Wallace, 3
Chuhan Wu, 3
Jingjing Xu, 3
Deng Cai, 2
Zhangming Chan, 2
Mingda Chen, 2
Yiming Cui, 2
Jesse Dodge, 2
Zi-Yi Dou, 2
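The gists above show only the resulting tallies, not how they were produced. As a rough illustration, the following is a minimal sketch of one way such counts could be computed, assuming the accepted-papers page has already been parsed into one author list per paper. The class name `AuthorTally`, its methods, and the input format are hypothetical, and the alphabetical tie-break is an arbitrary choice rather than the ordering used in the gists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not taken from the original gists.
public class AuthorTally {

    // Counts how many papers each author appears on.
    // Each inner list holds one paper's authors in byline order.
    // With firstAuthorsOnly set, only the first name on each paper is counted.
    public static Map<String, Integer> tally(List<List<String>> papers, boolean firstAuthorsOnly) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> authors : papers) {
            if (authors.isEmpty()) {
                continue; // skip papers with no parsed authors
            }
            List<String> counted = firstAuthorsOnly ? authors.subList(0, 1) : authors;
            for (String author : counted) {
                counts.merge(author.trim(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Renders the counts as "Name, count" lines, highest count first.
    // Ties are broken by full name (an assumption made for determinism).
    public static List<String> format(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            lines.add(entry.getKey() + ", " + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder data, not real conference papers.
        List<List<String>> papers = Arrays.asList(
                Arrays.asList("Author A", "Author B"),
                Arrays.asList("Author A", "Author C"),
                Arrays.asList("Author B"));
        format(tally(papers, false)).forEach(System.out::println); // all authors
        format(tally(papers, true)).forEach(System.out::println);  // first authors only
    }
}
```

The sketch treats names as exact strings, so spelling variants of the same person would be counted separately; any name normalization would have to happen before the tally.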
lintool / sigir2019-all-authors.txt
Created April 22, 2019 18:29
Tally of (all authors, first authors only) of all (long, short, DC) accepted papers at SIGIR 2019 http://sigir.org/sigir2019/program/accepted/
Yiqun Liu, 6
Shaoping Ma, 6
Maarten de Rijke, 6
Min Zhang, 6
Yongfeng Zhang, 6
Liqiang Nie, 5
Jiaxin Mao, 4
Ji-Rong Wen, 4
Qi Zhang, 4
Michael Bendersky, 3
lintool / gist:6633667ed8501dd00136bf9684ab4546
Created April 2, 2019 16:02
Tally of *first authors* from all (long, short, industry, etc.) accepted papers at NAACL 2019 https://naacl2019.org/program/accepted/
Alan Akbik, 2
Hadi Amiri, 2
Manuel Ciosici, 2
Marco Damonte, 2
Nelson F. Liu, 2
Muhammad Tasnim Mohiuddin, 2
Mohammad Taher Pilehvar, 2
Alexey Romanov, 2
Cory Shain, 2
Ehsan Shareghi, 2
lintool / gist:ef00c7e5154a524694f46c5f9e32122b
Created April 2, 2019 15:59
Tally of authors of all (long, short, industry, etc.) accepted papers at NAACL 2019 https://naacl2019.org/program/accepted/
Graham Neubig, 7
Ryan Cotterell, 6
William Yang Wang, 6
Jonathan Berant, 5
Pushpak Bhattacharyya, 4
Jianfeng Gao, 4
Yoav Goldberg, 4
Iryna Gurevych, 4
Eduard Hovy, 4
Heng Ji, 4
lintool / GraphJet-vs-Cassovary.md
Created October 28, 2016 19:53
GraphJet vs. Cassovary

GraphJet PageRank on Cassovary graph:

Performance counter stats for 'nohup mvn exec:java -pl graphjet-demo -Dexec.mainClass=com.twitter.graphjet.demo.PageRankCassovaryDemo -Dexec.args=-inputDir='soc-LiveJournal1' -inputFilePrefix='soc-LiveJournal1'':

     647397.221854      task-clock (msec)         #    1.079 CPUs utilized          
            51,017      context-switches          #    0.079 K/sec                  
               928      cpu-migrations            #    0.001 K/sec                  
           120,450      page-faults               #    0.186 K/sec                  
 1,664,268,387,487      cycles                    #    2.571 GHz                      (40.02%)
lintool / gist:10925877
Last active August 29, 2015 13:59
In defense of Google Flu

In Defense of Google Flu

April 16, 2014

Disclaimer: I have not worked on Google Flu and have no inside knowledge about the project.

tl;dr Saying that Google Flu doesn't work is a bit like pulling up a spam classifier trained on data from 2009, applying it to spam today, seeing that the results are pretty shitty, and then concluding... well, Bayesian classification doesn't work.

Recent reports have concluded that Google Flu "doesn't work". See for example an article by Lazer et al. and a piece by Steven Salzberg. This is cited as an example of "big data hubris".

lintool / StreamOfColumns.java
Created May 19, 2011 20:24
Stream of columns from a directory of tab-delimited flat files in HDFS
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.Iterator;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;