dmarx/Arxiv Archive.md

## Arxiv Archive.md

      
To Read:


Publication Date
Article
Notes


2016
End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures
Cited in multi-task sciERC (2018, below)


2018-10-11
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding


Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction
Probably a lot of useful citations in here, not sure we need the coreference stuff.
* SciERC datasets: http://nlp.cs.washington.edu/sciIE/
* Code: https://bitbucket.org/luanyi/scierc/src/master/
* Pretrained (best) models: NER, Coref, Relation


2017-08-08
Structural patterns of information cascades and their implications for dynamics and semantics


2014-03-18
Can Cascades be Predicted?
Lada Adamic


Scrutinizing Gartner's hype cycle approach


variational deep embedding


stick breaking variational autoencoders


2015
Adding Gradient Noise Improves Learning for Very Deep Networks


2012
A Neural Autoregressive Topic Model


2016-01-01
A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data


2010
Online Multiscale Dynamic Topic Models


Topic Model Zoo
gist w lots of topic modeling articles


2017
A Correlated Topic Model Using Word Embeddings


2016
Nonparametric Spherical Topic Modeling with Word Embeddings


2015
Gaussian LDA for Topic Models with Word Embeddings
https://github.com/rajarshd/Gaussian_LDA


2015
Knowledge Transfer Using Latent Variable Models
loooong. dissertation. looks like it contains some cool ideas though.


2016
TopicSketch: Real-Time Bursty Topic Detection from Twitter
Tensor decomposition for bursty topic detection


2009
Detecting Topic Evolution in Scientific Literature: How Can Citations Help?
dynamic topic model augmented by a citation graph


2011
TextFlow: Towards Better Understanding of Evolving Topics in Text
Visualization of topic model evolution


2012-06-13
Continuous Time Dynamic Topic Models
Blei cDTM


2018-05-01
Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time
Simultaneously model trends in topics and keywords


2018-09-14
Google’s Next Generation Music Recognition
Sound search with embeddings for music recognition


2018-09-12
An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation


Large-Scale Validation and Analysis of Interleaved Search Evaluation
"interleaving" for eliciting reliable preferences from implicit feedback (e.g. click data)


Online Structured Prediction via Coactive Learning


The K-armed dueling bandits problem
Recommendation as a policy learning task


2014
DeepWalk: Online Learning of Social Representations
DeepWalk


2017
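A Simple but Tough-to-Beat Baseline for Sentence Embeddings
Weighted average of pre-trained word vectors (smooth inverse frequency weighting, similar in spirit to TF-IDF), followed by removing the first principal component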
http://www.offconvex.org/2016/02/14/word-embeddings-2/


2018-02-17
Interpretable VAEs for nonlinear group factor analysis
oi-VAE


2017-07-28
Dynamics of homelessness in urban America


2017-06-12
Attention Is All You Need
transformer networks: seq2seq learning without an RNN


2018-07-24
Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors


2004-04-06
Finding scientific topics
Gibbs sampler for LDA


2017-04-10
Multi-Agent Diverse Generative Adversarial Networks
MAD-GAN


2002
BLEU: a Method for Automatic Evaluation of Machine Translation
BiLingual Evaluation Understudy, for scoring generated text. Quicker overviews:
 * https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
* https://en.wikipedia.org/wiki/BLEU


2014-12-14
Sequence to Sequence Learning with Neural Networks


2016-09-26
Discriminative Embeddings of Latent Variable Models for Structured Data
https://github.com/Hanjun-Dai/pytorch_structure2vec


2015-07-08
COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution


2016-09-13
Deep Coevolutionary Network: Embedding User and Item Features for Recommendation


2016
Coevolutionary Latent Feature Processes for Continuous-Time User-Item Interactions
Man... these GA Tech CS guys are reading my mind


2015-07-10
Hawkes Processes
Background for "Time-Sensitive Recommendation From Recurrent User Activities"


2015 NIPS
Time-Sensitive Recommendation From Recurrent User Activities
This looks super useful for my reddit evolution project.
https://papers.nips.cc/author/nan-du-5356


2015-11-20
On the energy landscape of deep networks
* https://www.youtube.com/watch?v=NFeZ6MggJjw
* http://vision.ucla.edu/~pratikac/
* https://scholar.google.com/citations?user=c_z5hWEAAAAJ&hl=en


2018-01-12
Improved asynchronous parallel optimization analysis for stochastic incremental methods


2016
Recurrent Marked Temporal Point Processes: Embedding Event History to Vector


Modeling Popularity in Asynchronous Social Media Streams with Recurrent Neural Networks
More RNN point-process stuff


2017
Wasserstein Learning of Deep Generative Point Process Models


2017
Modeling the Intensity Function of Point Process via Recurrent Neural Networks


2016-09-08
Distill: Attention and Augmented Recurrent Neural Networks


2017-09-07
Inducing Semantic Micro-Clusters from Deep Multi-View Representations of Novels


2017-11-29
A Multi-Horizon Quantile Recurrent Forecaster
conf


2017-03-24
Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks


2017-05-24
Modeling The Intensity Function Of Point Process Via Recurrent Neural Networks
https://github.com/xiaoshuai09/Recurrent-Point-Process


2017
Poincaré Embeddings for Learning Hierarchical Representations
Current research from the scikit-tensor guy (multiplex network community detection), currently at facebook AI


2008
Danger: High Power! – Exploring the Statistical Properties of a Test for Random Forest Variable Importance
Highlights issues with RF permutation VI (OOB mean decrease accuracy)


2015-10-22
A computationally fast variable importance test for random forests for high-dimensional data
Vita method, as cited in "Evaluation of variable selection methods for random forests and omics data sets" (below). Asserts that the classical Gini index and mean decrease in accuracy exhibit unfavorable statistical properties. Proposed algorithm: use 2-fold CV to approximate null distributions for VIs, then score VIs by computing p-values against the VI null CDF. Sounds like a cool idea, but I suspect it only works in the presence of lots of variables. If I understand correctly, it would be impossible to achieve p<.05 with fewer than 10 variables.


2017-10-16
Evaluation of variable selection methods for random forests and omics data sets
Comparison of Variable Importance algs, concludes Boruta and Vita (above) are current SOTA


2016-04-22
Entity Embeddings of Categorical Variables


2017-11-27
Distilling a Neural Network Into a Soft Decision Tree


2015-03-04
The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets


2005-12-19
An introduction to ROC analysis
Probably doesn't have anything I don't already know, but looks like a good review of some fundamentals and might be a good resource to point "uninitiated" people to


2017-06-19
Enriching Word Vectors with Subword Information


2002-12-08
Laplacian Eigenmaps for Dimensionality Reduction and Data Representation
Referenced in Changpinyo 2016 ("Synthesized Classifiers for ZSL", below). Produces embeddings that preserve the structure of the proximity graph. Sounds similar to Isomap. From these slides, it's basically PCA using a different similarity kernel: PCA is an eigendecomposition of the covariance matrix, while LEM is an eigendecomposition of the graph Laplacian of a proximity graph.


2017-08-20
Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning


2017-01-11
An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild


2016-05-27
Synthesized Classifiers for Zero-Shot Learning
Training an image classifier to recognize classes with no representation in the training data by augmenting model training with word vectors for classes and text descriptions (Wikipedia), and then "aligning" the semantic space with the image model space


2017
Attention is all you need
Transformer blocks


2017
Deep Image Prior
/r/machinelearning best paper 2017


2017-10-27
Progressive Growing of GANs for Improved Quality, Stability, and Variation
* Arxiv
* Paper Video
* 1hr of generated faces
* Github
* pre-trained


2015-02-11
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift


1999
On the Efficiency of Nearest Neighbor Searching with Data Clustered in Lower Dimensions
KD-Tree (efficient nearest neighbor data structure)


2013
Generalized Degrees of Freedom and Adaptive Model Selection in Linear Mixed-Effects Models
Derivation of GDF for LME


2004
The Estimation of Prediction Error: Covariance Penalties and Cross-Validation
[Efron] Relationship between non-parametric model evaluation (e.g. CV, boot) and penalty criteria (e.g. AIC, Cp)


1998
On Measuring and Correcting the Effects of Data Mining and Model Selection
Generalized Degrees of Freedom (GDF)


2013-01-23
Loopy Belief Propagation for Approximate Inference: An Empirical Study
Kevin Murphy, LBP


1971
On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities
VC Dimension


1984
A Theory of the Learnable
PAC learning


2012
Estimating Uncertainty for Massive Data Streams
Introduces (?) Poisson Bootstrap (sec 3) - http://www.unofficialgoogledatascience.com/2015/08/an-introduction-to-poisson-bootstrap26.html


2012-06-28
A Scalable Bootstrap for Massive Data
Bag of Little Bootstraps (BLB)


2016
“Why Should I Trust You?” Explaining the Predictions of Any Classifier
LIME: Local Interpretable Model-Agnostic Explanations


2016-05-30
Forest Floor Visualizations of Random Forests
http://forestfloor.dk/


2013-12-04
Interpreting random forest classification models using a feature contribution method
Pretty sure this is where "feature contributions" in treeinterpreter came from: 
* https://github.com/andosa/treeinterpreter 
* http://blog.datadive.net/interpreting-random-forests/


2015-02-26
Human-level control through deep reinforcement learning
Deep Q-learning (reinforcement learning)


2014-06-10
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization


2002
Evolving Neural Networks through Augmenting Topologies
NEAT


2005-10-05
Predictive Learning via Rule Ensembles
RuleFit, J. Friedman
https://www.r-bloggers.com/rulefit-when-disassembled-trees-meet-lasso/


2008
One-class Classification by Combining Density and Class Probability Estimation
Stuart found this, looks like it's equivalent to CADE but published six years earlier


2017-06-26
Do GANs actually learn the distribution? An empirical study


2017-03-17
Deciding How to Decide: Dynamic Routing in Artificial Neural Networks
https://www.youtube.com/watch?v=NHQsDaycwyQ
https://www.dropbox.com/s/svh5610fpfh7np1/drann-poster.pdf?dl=0


2017-04-13
On the Effects of Batch and Weight Normalization in Generative Adversarial Networks
https://github.com/stormraiser/GAN-weight-norm


2017-06-05
Language Generation with Recurrent Generative Adversarial Networks without Pre-training
https://github.com/amirbar/rnn.wgan/


2017-06-05
A simple neural network module for relational reasoning
Relation networks (deepmind)


2014
Linguistic Regularities in Sparse and Explicit Word Representations
Discusses how the softmax loss directly encourages the linear arithmetic properties of word2vec


2014
Neural Word Embedding as Implicit Matrix Factorization


2015-06-22
Skip-Thought Vectors
https://github.com/ryankiros/skip-thoughts


2016-12-10
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
https://github.com/hanzhanggit/StackGAN


2017-05-19
The Kernel Mixture Network: A Nonparametric Method for Conditional Density Estimation of Continuous Random Variables


2015-11-23
Multi-Scale Context Aggregation by Dilated Convolutions


2017-05-18
Building effective deep neural network architectures one feature at a time
"...starting from just a single feature per layer to attain effective representational capacities needed for a specific task by greedily adding feature by feature."


2017-05-16
The power of deeper networks for expressing natural functions
Theoretical analysis demonstrating that deep networks need significantly fewer neurons than shallow networks for the same approximation power


2017-04-07
Learning to Generate Reviews and Discovering Sentiment
* Unsupervised sentiment analysis with char-RNN 
 * https://blog.openai.com/unsupervised-sentiment-neuron/


2016-06-14
Learning to learn by gradient descent by gradient descent
* Learning optimization algorithms tailored to the problem class to supplant traditional approaches (e.g. SGD)


2010
Factorization Machines
* Collaborative Filtering


2017
Why Momentum Really Works


2017-01-26
Wasserstein GAN
WGAN


1997
Long Short-Term Memory


2016-02-12
Learning Step Size Controllers for Robust Neural Network Training


2016-11-05
Neural Architecture Search with Reinforcement Learning


2015-05-03
Highway Networks


2015-12-10
Deep Residual Learning for Image Recognition
ResNet
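
A rough sketch for the Laplacian Eigenmaps entry in the table above, to make the "eigendecomposition of the graph Laplacian of a proximity graph" note concrete. The function name `laplacian_eigenmaps`, the binary k-nearest-neighbor graph, and the default parameters are my own assumptions for illustration (the paper also discusses heat-kernel weights); this is not the authors' reference implementation.

```python
import numpy as np

def laplacian_eigenmaps(X, n_neighbors=5, n_components=2):
    """Embed rows of X using the graph Laplacian of a kNN proximity graph."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Binary kNN adjacency; column 0 of the sort is the point itself, so skip it.
    idx = np.argsort(sq_dists, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize: keep an edge if either endpoint chose it
    deg = W.sum(axis=1)
    L = np.diag(deg) - W  # unnormalized graph Laplacian
    # Solve L v = lambda D v through the symmetric normalized Laplacian.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Drop the trivial eigenvector (eigenvalue ~0) and map back: v = D^{-1/2} u.
    embedding = d_inv_sqrt[:, None] * vecs[:, 1:n_components + 1]
    return embedding, vals
```

The contrast with PCA is in which matrix gets decomposed: PCA keeps the top eigenvectors of the covariance matrix, while this keeps the bottom non-trivial eigenvectors of the Laplacian. If the kNN graph is disconnected there will be more than one near-zero eigenvalue, and this sketch does not handle that case.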


Recently read (or just want to keep handy):


Publication Date
Article
Notes


2008
Stacked Graphs – Geometry & Aesthetics
streamgraph
 * http://leebyron.com/streamgraph/ 
 * https://matplotlib.org/gallery/lines_bars_and_markers/stackplot_demo.html 
 * https://bl.ocks.org/mbostock/4060954


2014-04-05
My solution for the Galaxy Zoo challenge
Data augmentation


2006
Reducing the Dimensionality of Data with Neural Networks
Hinton, encoder-decoder autoencoder architecture for dimensionality reduction. Motivates reconstruction loss in CycleGAN (not cited in DiscoGAN paper, but same difference)


2017-05-31
BEGAN: Boundary Equilibrium Generative Adversarial Networks
https://github.com/carpedm20/BEGAN-tensorflow


2017-03-15
Learning to Discover Cross-Domain Relations with Generative Adversarial Networks
DiscoGAN


2016-06-10
Improved Techniques for Training GANs
https://github.com/openai/improved-gan


2015-11-19
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
DCGAN
https://github.com/soumith/dcgan.torch


2014-06-10
Generative Adversarial Networks
GAN
https://github.com/goodfeli/adversarial


2003
Latent Dirichlet Allocation
LDA


2001-10
Random Forests


2009-03-02
Extracting the multiscale backbone of complex weighted networks


2015-08-27
Understanding LSTM Networks
LSTM


2017-03-30
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
CycleGAN. Basically equivalent to DiscoGAN, with added weighting term on reconstruction loss (which they up-weight) 
 * https://hardikbansal.github.io/CycleGANBlog/


2017-05-02
Visual Attribute Transfer through Deep Image Analogy
Incredible style transfer results using nearest neighbor fields (NNF) to construct correspondences between the VGG16 feature maps of two images. Want to better understand how they perform the warp operation and image reconstruction. I get the gist...


2015-06-17
Inceptionism: Going Deeper into Neural Networks
https://github.com/google/deepdream


2016-12-16
GAN Hacks


2017-04-13
Stochastic Gradient Descent as Approximate Bayesian Inference
Blei


2017-05-22
pix2code: Generating Code from a Graphical User Interface Screenshot
https://uizard.io/research#pix2code


2016-10-07
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Technique for visualizing the regions of an image responsible for its output classification


2014
Classifier-Adjusted Density Estimation for Anomaly Detection and One-Class Classification
CADE
 * http://people.cs.umass.edu/~lfriedl/pubs/SDM2014-supp.pdf 
 * http://darrkj.github.io/blog/2014/may102014/
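
A minimal sketch of the "weighting term on reconstruction loss" mentioned in the CycleGAN note above. The function names, the use of plain NumPy arrays in place of real generator outputs, and the scalar `adv_loss` argument are all hypothetical; the L1 form and the default `lam=10.0` are what I recall from the CycleGAN paper, so treat both as assumptions to check against it.

```python
import numpy as np

def cycle_consistency_loss(x, x_rec, y, y_rec):
    """Mean L1 reconstruction error summed over both translation directions.

    x_rec stands for F(G(x)) and y_rec for G(F(y)); here they are just arrays.
    """
    return np.mean(np.abs(x - x_rec)) + np.mean(np.abs(y - y_rec))

def generator_objective(adv_loss, x, x_rec, y, y_rec, lam=10.0):
    """Adversarial loss plus the up-weighted reconstruction (cycle) term."""
    return adv_loss + lam * cycle_consistency_loss(x, x_rec, y, y_rec)
```

Setting `lam=1.0` gives an unweighted sum of the adversarial and reconstruction terms, which is the sense in which the note calls the two models basically equivalent.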