Jimmy Lin lintool

@lintool
lintool / emnlp2019_all_authors.txt
Created Sep 3, 2019
Tally of *all authors* from all (long, short, demo) accepted papers at EMNLP 2019 https://www.emnlp-ijcnlp2019.org/program/accepted/
Ting Liu, 10
Lidong Bing, 8
Zhiyuan Liu, 8
Shuming Shi, 8
Luke Zettlemoyer, 8
Dongyan Zhao, 8
Kai-Wei Chang, 7
Iryna Gurevych, 7
Jimmy Lin, 7
Xiang Ren, 7
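
A tally in this `Name, count` format could be produced along the following lines. This is only a minimal sketch, not the script behind the gist: it assumes the accepted-papers page has already been parsed into one author list per paper, and the class name `AuthorTally`, the methods `tally` and `format`, and the tie-breaking rule (alphabetical by full name) are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AuthorTally {
    // Counts the number of papers per author. When firstAuthorOnly is true,
    // only the first name on each paper is counted.
    static Map<String, Integer> tally(List<List<String>> papers, boolean firstAuthorOnly) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> authors : papers) {
            if (authors.isEmpty()) {
                continue; // skip papers with no parsed authors
            }
            List<String> counted = firstAuthorOnly ? authors.subList(0, 1) : authors;
            for (String name : counted) {
                counts.merge(name.trim(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Renders "Name, count" lines, highest count first.
    // Ties are broken alphabetically by full name (an assumption).
    static List<String> format(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + ", " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder names, not real data from any conference.
        List<List<String>> papers = Arrays.asList(
                Arrays.asList("Ada Example", "Bo Sample"),
                Arrays.asList("Bo Sample", "Cy Placeholder"));
        format(tally(papers, false)).forEach(System.out::println);
    }
}
```

Passing `true` as the second argument to `tally` gives the first-author variant of the same count.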
lintool / emnlp2019_first_authors.txt
Created Sep 3, 2019
Tally of *first authors* from all (long, short, demo) accepted papers at EMNLP 2019 https://www.emnlp-ijcnlp2019.org/program/accepted/
Dongyeop Kang, 3
Eric Wallace, 3
Chuhan Wu, 3
Jingjing Xu, 3
Deng Cai, 2
Zhangming Chan, 2
Mingda Chen, 2
Yiming Cui, 2
Jesse Dodge, 2
Zi-Yi Dou, 2
lintool / sigir2019-all-authors.txt
Created Apr 22, 2019
Tally of (all authors, first authors only) of all (long, short, DC) accepted papers at SIGIR 2019 http://sigir.org/sigir2019/program/accepted/
Yiqun Liu, 6
Shaoping Ma, 6
Maarten de Rijke, 6
Min Zhang, 6
Yongfeng Zhang, 6
Liqiang Nie, 5
Jiaxin Mao, 4
Ji-Rong Wen, 4
Qi Zhang, 4
Michael Bendersky, 3
lintool / gist:6633667ed8501dd00136bf9684ab4546
Created Apr 2, 2019
Tally of *first authors* from all (long, short, industry, etc.) accepted papers at NAACL 2019 https://naacl2019.org/program/accepted/
Alan Akbik, 2
Hadi Amiri, 2
Manuel Ciosici, 2
Marco Damonte, 2
Nelson F. Liu, 2
Muhammad Tasnim Mohiuddin, 2
Mohammad Taher Pilehvar, 2
Alexey Romanov, 2
Cory Shain, 2
Ehsan Shareghi, 2
lintool / gist:ef00c7e5154a524694f46c5f9e32122b
Created Apr 2, 2019
Tally of authors of all (long, short, industry, etc.) accepted papers at NAACL 2019 https://naacl2019.org/program/accepted/
Graham Neubig, 7
Ryan Cotterell, 6
William Yang Wang, 6
Jonathan Berant, 5
Pushpak Bhattacharyya, 4
Jianfeng Gao, 4
Yoav Goldberg, 4
Iryna Gurevych, 4
Eduard Hovy, 4
Heng Ji, 4
View GraphJet-vs-Cassovary.md

GraphJet PageRank on Cassovary graph:

Performance counter stats for 'nohup mvn exec:java -pl graphjet-demo -Dexec.mainClass=com.twitter.graphjet.demo.PageRankCassovaryDemo -Dexec.args=-inputDir='soc-LiveJournal1' -inputFilePrefix='soc-LiveJournal1'':

     647397.221854      task-clock (msec)         #    1.079 CPUs utilized          
            51,017      context-switches          #    0.079 K/sec                  
               928      cpu-migrations            #    0.001 K/sec                  
           120,450      page-faults               #    0.186 K/sec                  
 1,664,268,387,487      cycles                    #    2.571 GHz                      (40.02%)
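
The figures in the right-hand column above appear to be rates relative to the `task-clock` value: each event count divided by the task-clock time in seconds. The sketch below recomputes three of them from the numbers shown as a sanity check. The class name `PerfRates` and the method `perSecond` are hypothetical, and the interpretation of the columns is an assumption based on the output itself.

```java
final class PerfRates {
    // Events per second of task-clock time; taskClockMsec is in milliseconds.
    static double perSecond(double count, double taskClockMsec) {
        return count / (taskClockMsec / 1000.0);
    }

    public static void main(String[] args) {
        double taskClockMsec = 647397.221854; // task-clock from the output above

        // cycles per second, expressed in GHz
        System.out.printf("cycles:           %.3f GHz%n",
                perSecond(1_664_268_387_487.0, taskClockMsec) / 1e9);
        // context switches per second, expressed in K/sec
        System.out.printf("context-switches: %.3f K/sec%n",
                perSecond(51_017, taskClockMsec) / 1e3);
        // page faults per second, expressed in K/sec
        System.out.printf("page-faults:      %.3f K/sec%n",
                perSecond(120_450, taskClockMsec) / 1e3);
    }
}
```

Under the same reading, "1.079 CPUs utilized" would be task-clock time divided by wall-clock time, which suggests a run of roughly 600 seconds. The elapsed time itself is not shown in the excerpt.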
lintool / gist:10925877
Last active Aug 29, 2015
In defense of Google Flu

In Defense of Google Flu

April 16, 2014

Disclaimer: I have not worked on Google Flu and have no inside knowledge about the project.

tl;dr Saying that Google Flu doesn't work is a bit like pulling up a spam classifier trained on data from 2009, applying it to spam today, seeing that the results are pretty shitty, and then concluding... well, Bayesian classification doesn't work.

Recent reports have concluded that Google Flu "doesn't work". See for example an article by Lazer et al. and a piece by Steven Salzberg. This is cited as an example of "big data hubris".

lintool / StreamOfColumns.java
Created May 19, 2011
Stream of columns from a directory of tab-delimited flat files in HDFS
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.Iterator;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;