
@dmarx
Last active April 18, 2019 23:03
Machine learning articles I want to read or have read, mostly arxiv.org articles discussing recent advancements in deep learning.

To Read:

Publication Date Article Notes
2016 End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures Cited in the multi-task SciERC paper (2018, below)
2018-10-11 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2018 Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction Probably a lot of useful citations in here; not sure we need the coreference stuff.
* SciERC datasets: http://nlp.cs.washington.edu/sciIE/
* Code: https://bitbucket.org/luanyi/scierc/src/master/
* Pretrained (best) models: NER, Coref, Relation
2017-08-08 Structural patterns of information cascades and their implications for dynamics and semantics
2014-03-18 Can Cascades be Predicted? Lada Adamic
Scrutinizing Gartner's hype cycle approach
Variational Deep Embedding
Stick-Breaking Variational Autoencoders
2015 Adding Gradient Noise Improves Learning for Very Deep Networks
2012 A Neural Autoregressive Topic Model
2016-01-01 A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data
2010 Online Multiscale Dynamic Topic Models
Topic Model Zoo Gist with lots of topic modeling articles
2017 A Correlated Topic Model Using Word Embeddings
2016 Nonparametric Spherical Topic Modeling with Word Embeddings
2015 Gaussian LDA for Topic Models with Word Embeddings https://github.com/rajarshd/Gaussian_LDA
2015 Knowledge Transfer Using Latent Variable Models Loooong dissertation. Looks like it contains some cool ideas though.
2016 TopicSketch: Real-Time Bursty Topic Detection from Twitter Tensor decomposition for bursty topic detection
2009 Detecting Topic Evolution in Scientific Literature: How Can Citations Help? Dynamic topic model augmented by a citation graph
2011 TextFlow: Towards Better Understanding of Evolving Topics in Text Visualization of topic model evolution
2012-06-13 Continuous Time Dynamic Topic Models Blei cDTM
2018-05-01 Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time Simultaneously model trends in topics and keywords
2018-09-14 Google’s Next Generation Music Recognition Sound search with embeddings for music recognition
2018-09-12 An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation
Large-Scale Validation and Analysis of Interleaved Search Evaluation "interleaving" for eliciting reliable preferences from implicit feedback (e.g. click data)
Online Structured Prediction via Coactive Learning
The K-armed dueling bandits problem Recommendation as a policy learning task
2014 DeepWalk: Online Learning of Social Representations DeepWalk
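2017 A Simple but Tough-to-Beat Baseline for Sentence Embeddings Smooth inverse frequency (SIF) weighting of pre-trained word vectors (similar in spirit to TF-IDF weighting), followed by removing the projection onto the first principal component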
http://www.offconvex.org/2016/02/14/word-embeddings-2/
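A rough NumPy sketch of the SIF recipe, assuming a dict of word vectors and a dict of unigram probabilities (the toy inputs in any usage are made up). The smoothing constant `a=1e-3` is an assumed default; the paper reports a range of values working well, so tune it.

```python
import numpy as np

def sif_embeddings(sentences, word_vecs, word_prob, a=1e-3):
    """Smooth-inverse-frequency sentence embeddings (sketch).

    sentences: list of token lists; word_vecs: dict token -> vector;
    word_prob: dict token -> unigram probability p(w).
    """
    dim = len(next(iter(word_vecs.values())))
    emb = np.zeros((len(sentences), dim))
    for i, sent in enumerate(sentences):
        tokens = [w for w in sent if w in word_vecs]
        if not tokens:
            continue  # leave an all-zero row for sentences with no known words
        # Weight each word by a / (a + p(w)): frequent words count for less.
        weights = np.array([a / (a + word_prob[w]) for w in tokens])
        vecs = np.array([word_vecs[w] for w in tokens])
        emb[i] = weights @ vecs / len(tokens)
    # Remove each row's projection onto the first singular vector (the "common direction").
    u = np.linalg.svd(emb, full_matrices=False)[2][0]
    return emb - np.outer(emb @ u, u)
```

The common-direction removal here uses an SVD of the un-centered embedding matrix, which is my reading of the method; a centered PCA would be a small variation.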
2018-02-17 Interpretable VAEs for nonlinear group factor analysis oi-VAE
2017-07-28 Dynamics of homelessness in urban America
2017-06-12 Attention Is All You Need transformer networks: seq2seq learning without an RNN
2018-07-24 Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors
2004-04-06 Finding scientific topics Gibbs sampler for LDA
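A minimal collapsed Gibbs sampler in the spirit of this paper, assuming symmetric priors `alpha` and `beta` (the defaults below are arbitrary, not the paper's settings) and documents given as lists of integer word ids. Each token's topic is resampled from its full conditional after removing that token from the counts.

```python
import numpy as np

def lda_gibbs(docs, n_topics, n_vocab, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (sketch). docs: list of lists of word ids."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))  # topic counts per document
    nkw = np.zeros((n_topics, n_vocab))    # word counts per topic
    nk = np.zeros(n_topics)                # total tokens per topic
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]  # random initial topics
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the current token out of the counts.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = k | rest), up to a normalizing constant.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    # Point estimates from the final sample.
    phi = (nkw + beta) / (nk[:, None] + n_vocab * beta)  # topic-word distributions
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)  # doc-topic
    return phi, theta
```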
2017-04-10 Multi-Agent Diverse Generative Adversarial Networks MAD-GAN
2002 BLEU: a Method for Automatic Evaluation of Machine Translation BiLingual Evaluation Understudy, for scoring generated text. Quicker overviews:
* https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
* https://en.wikipedia.org/wiki/BLEU
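A plain-Python sketch of unsmoothed, sentence-level BLEU with uniform weights over 1- to 4-grams. Note the paper defines BLEU at the corpus level (clipped counts pooled over all sentences before the geometric mean), so treat this as the single-sentence special case.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU for one candidate: clipped n-gram precisions + brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # Clip each candidate n-gram count by its max count in any single reference.
        max_ref = Counter()
        for ref in references:
            for g, c in ngrams(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        if clipped == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)
```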
2014-12-14 Sequence to Sequence Learning with Neural Networks
2016-09-26 Discriminative Embeddings of Latent Variable Models for Structured Data https://github.com/Hanjun-Dai/pytorch_structure2vec
2015-07-08 COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution
2016-09-13 Deep Coevolutionary Network: Embedding User and Item Features for Recommendation
2016 Coevolutionary Latent Feature Processes for Continuous-Time User-Item Interactions Man... these GA Tech CS guys are reading my mind
2015-07-10 Hawkes Processes Background for "Time-Sensitive Recommendation From Recurrent User Activities"
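For reference, a sketch of the conditional intensity and log-likelihood of a univariate Hawkes process with an exponential kernel. The parameterization `mu + alpha * exp(-beta * (t - t_i))` is one common choice (an assumption here; some papers scale the kernel by `beta`), and with it the process is stationary only when `alpha / beta < 1`.

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))."""
    past = events[events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

def hawkes_loglik(events, T, mu, alpha, beta):
    """Log-likelihood of sorted event times on [0, T]: sum log lambda(t_i) - integral of lambda."""
    events = np.asarray(events, dtype=float)
    log_term = sum(np.log(hawkes_intensity(t, events, mu, alpha, beta)) for t in events)
    # Compensator (integral of the intensity over [0, T]) in closed form.
    compensator = mu * T + (alpha / beta) * np.sum(1 - np.exp(-beta * (T - events)))
    return log_term - compensator
```

With `alpha=0` this reduces to the homogeneous Poisson log-likelihood `n * log(mu) - mu * T`, which is a quick sanity check.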
2015 NIPS Time-Sensitive Recommendation From Recurrent User Activities This looks super useful for my Reddit evolution project.
https://papers.nips.cc/author/nan-du-5356
2015-11-20 On the energy landscape of deep networks * https://www.youtube.com/watch?v=NFeZ6MggJjw
* http://vision.ucla.edu/~pratikac/
* https://scholar.google.com/citations?user=c_z5hWEAAAAJ&hl=en
2018-01-12 Improved asynchronous parallel optimization analysis for stochastic incremental methods
2016 Recurrent Marked Temporal Point Processes: Embedding Event History to Vector
Modeling Popularity in Asynchronous Social Media Streams with Recurrent Neural Networks More RNN point-process stuff
2017 Wasserstein Learning of Deep Generative Point Process Models
2017 Modeling the Intensity Function of Point Process via Recurrent Neural Networks
2016-09-08 Distill: Attention and Augmented Recurrent Neural Networks
2017-09-07 Inducing Semantic Micro-Clusters from Deep Multi-View Representations of Novels
2017-11-29 A Multi-Horizon Quantile Recurrent Forecaster conf
2017-03-24 Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks
2017-05-24 Modeling The Intensity Function Of Point Process Via Recurrent Neural Networks https://github.com/xiaoshuai09/Recurrent-Point-Process
2017 Poincaré Embeddings for Learning Hierarchical Representations Current research from the scikit-tensor guy (multiplex network community detection), currently at Facebook AI Research
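The distance on the Poincaré ball that the paper optimizes over, as a small NumPy function. Distances blow up near the boundary of the unit ball, which is what gives room to embed tree-like hierarchies. The `eps` guard is my addition for points numerically on the boundary.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2))) for |u|, |v| < 1."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / max(denom, eps))
```

For example, the distance from the origin to a point at Euclidean radius 0.5 is ln 3, versus about 2.94 at radius 0.9.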
2008 Danger: High Power! – Exploring the Statistical Properties of a Test for Random Forest Variable Importance Highlights issues with RF permutation VI (OOB mean decrease accuracy)
2015-10-22 A computationally fast variable importance test for random forests for high-dimensional data Vita method, as cited in "Evaluation of variable selection methods for random forests and omics data sets" (below). Asserts that the classical Gini index and mean decrease accuracy exhibit unfavorable statistical properties. Proposed algorithm: use 2-fold CV to approximate null distributions for VIs, then score VIs by computing p-values against the VI null CDF. Sounds like a cool idea, but I suspect it only works in the presence of lots of variables. If I understand correctly, it would be impossible to achieve p<.05 with fewer than 10 variables.
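A NumPy sketch of the p-value step only, assuming the hold-out (2-fold) importance scores are already computed. As I understand Vita, the null distribution is built from the negative scores, the zero scores, and the mirrored (sign-flipped) negative scores; that construction is my reading, so check it against the paper. Note the p-values move in steps of 1/(null size), so their resolution depends on how many non-positive importances there are.

```python
import numpy as np

def vita_pvalues(importances):
    """p-value of each importance score against an empirical null built from
    the non-positive scores plus the mirrored negative scores (sketch)."""
    vi = np.asarray(importances, dtype=float)
    neg = vi[vi < 0]
    null = np.concatenate([neg, vi[vi == 0], -neg])
    if null.size == 0:
        raise ValueError("no non-positive importances to build a null distribution")
    # p = 1 - F0(vi), where F0 is the empirical CDF of the null scores.
    return np.array([np.mean(null > x) for x in vi])
```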
2017-10-16 Evaluation of variable selection methods for random forests and omics data sets Comparison of variable importance algorithms; concludes Boruta and Vita (above) are the current SOTA
2016-04-22 Entity Embeddings of Categorical Variables
2017-11-27 Distilling a Neural Network Into a Soft Decision Tree
2015-03-04 The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets
2005-12-19 An introduction to ROC analysis Probably doesn't have anything I don't already know, but looks like a good review of some fundamentals and might be a good resource to point "uninitiated" people to
2017-06-19 Enriching Word Vectors with Subword Information
2002-12-08 Laplacian Eigenmaps for Dimensionality Reduction and Data Representation Referenced in Changpinyo 2016 ("Synthesized Classifiers for ZSL", below). Produces embeddings that preserve the structure of the proximity graph. Sounds similar to Isomap. ...from these slides it's basically PCA using a different similarity kernel: PCA is an eigendecomposition of the covariance matrix, while LEM is an eigendecomposition of the graph Laplacian of a proximity graph (keeping the eigenvectors with the smallest non-zero eigenvalues rather than the largest).
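A NumPy sketch of Laplacian eigenmaps, assuming a symmetrized k-nearest-neighbor graph with heat-kernel weights (`k` and `t` are arbitrary choices here). It solves the generalized problem L y = λ D y through the symmetric normalized Laplacian and keeps the smallest non-trivial eigenvectors. Assumes distinct points and a connected graph.

```python
import numpy as np

def laplacian_eigenmaps(X, n_components=2, k=5, t=1.0):
    """Embed rows of X using the bottom non-trivial solutions of L y = lambda D y (sketch)."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    W = np.zeros((n, n))
    nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]  # k nearest neighbors, skipping self
    for i in range(n):
        W[i, nbrs[i]] = np.exp(-d2[i, nbrs[i]] / t)  # heat-kernel edge weights
    W = np.maximum(W, W.T)                     # symmetrize the kNN graph
    deg = W.sum(axis=1)
    L = np.diag(deg) - W                       # unnormalized graph Laplacian
    # Rewrite the generalized problem as a symmetric one: D^-1/2 L D^-1/2.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)            # eigenvalues come back in ascending order
    Y = d_inv_sqrt[:, None] * vecs             # map back to solutions of L y = lambda D y
    return Y[:, 1:n_components + 1]            # drop the trivial constant eigenvector
```

Unlike PCA, the useful directions here are at the bottom of the spectrum: small eigenvalues correspond to embeddings that vary slowly across graph edges.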
2017-08-20 Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning
2017-01-11 An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild
2016-05-27 Synthesized Classifiers for Zero-Shot Learning Training an image classifier to recognize classes with no representation in the training data by augmenting model training with word vectors for classes and text descriptions (Wikipedia), and then "aligning" the semantic space with the image model space
2017 Attention Is All You Need Transformer blocks
2017 Deep Image Prior /r/machinelearning best paper 2017
2017-10-27 Progressive Growing of GANs for Improved Quality, Stability, and Variation * Arxiv
* Paper Video
* 1hr of generated faces
* Github
* pre-trained
2015-02-11 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
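The training-time transform from the paper's Algorithm 1, as a NumPy sketch for a `(batch, features)` array: normalize each feature with the mini-batch mean and (biased) variance, then apply the learned scale `gamma` and shift `beta`. The running statistics used at inference time are left out.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm over axis 0 of a (batch, features) array."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)                    # biased (1/m) mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, ~unit variance per feature
    return gamma * x_hat + beta            # learned per-feature scale and shift
```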
1999 On the Efficiency of Nearest Neighbor Searching with Data Clustered in Lower Dimensions KD-Tree (efficient nearest neighbor data structure)
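A textbook median-split kd-tree with branch-and-bound nearest-neighbor search, in plain Python. This is the generic structure the note refers to, not the specific splitting rules the paper analyzes.

```python
import math

def build_kdtree(points, depth=0):
    """Build a kd-tree by splitting at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return the stored point closest to `query`, pruning subtrees that cannot win."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    if best is None or math.dist(query, point) < math.dist(query, best):
        best = point
    diff = query[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the current best.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best
```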
2013 Generalized Degrees of Freedom and Adaptive Model Selection in Linear Mixed-Effects Models Derivation of GDF for LME
2004 The Estimation of Prediction Error: Covariance Penalties and Cross-Validation [Efron] Relationship between non-parametric model evaluation (e.g. CV, boot) and penalty criteria (e.g. AIC, Cp)
1998 On Measuring and Correcting the Effects of Data Mining and Model Selection Generalized Degrees of Freedom (GDF)
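A Monte Carlo sketch of GDF: perturb the response with small Gaussian noise, refit, and sum over observations the slope of each fitted value against its own perturbation. `fit_predict` is a hypothetical callback that reruns the whole modeling procedure and returns in-sample fitted values; `n_perturb` and `sigma` are arbitrary defaults. For ordinary least squares the estimate should land near the number of coefficients, which makes a handy sanity check.

```python
import numpy as np

def gdf(fit_predict, X, y, n_perturb=50, sigma=0.1, seed=0):
    """Estimate generalized degrees of freedom: sum_i d(yhat_i) / d(y_i) (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    deltas = rng.normal(scale=sigma, size=(n_perturb, n))
    fits = np.array([fit_predict(X, y + d) for d in deltas])  # refit on each perturbed y
    total = 0.0
    for i in range(n):
        # Least-squares slope of fitted value i against the perturbation of y_i.
        dc = deltas[:, i] - deltas[:, i].mean()
        fc = fits[:, i] - fits[:, i].mean()
        total += dc @ fc / (dc @ dc)
    return total
```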
2013-01-23 Loopy Belief Propagation for Approximate Inference: An Empirical Study Kevin Murphy, LBP
1971 On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities VC Dimension
1984 A Theory of the Learnable PAC learning
2012 Estimating Uncertainty for Massive Data Streams Introduces (?) Poisson Bootstrap (sec 3) - http://www.unofficialgoogledatascience.com/2015/08/an-introduction-to-poisson-bootstrap26.html
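A NumPy sketch of the Poisson bootstrap for a mean: instead of resampling n rows with replacement (multinomial counts), each row independently gets a Poisson(1) weight in each replicate. Because the weights don't depend on n, they can be drawn row by row while streaming, which is the appeal for massive data. Replicates whose weights are all zero are dropped.

```python
import numpy as np

def poisson_bootstrap_means(x, n_boot=1000, seed=0):
    """Bootstrap replicates of the mean using independent Poisson(1) row weights."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    w = rng.poisson(1.0, size=(n_boot, x.size))  # one weight per row per replicate
    totals = w.sum(axis=1)
    keep = totals > 0                             # drop degenerate all-zero replicates
    return (w[keep] @ x) / totals[keep]           # weighted mean per replicate
```

The spread of the replicates (e.g. `reps.std()`) then serves as the standard-error estimate.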
2012-06-28 A Scalable Bootstrap for Massive Data Bag of Little Bootstraps (BLB)
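A NumPy sketch of the Bag of Little Bootstraps, estimating the standard error of a mean. It draws `s` subsets of size b = n^gamma and, within each subset, simulates full-size (n) resamples as multinomial counts over the b points, so no resample ever materializes n rows. `gamma`, `s` and `r` are arbitrary defaults, and the mean stands in for whatever estimator you care about.

```python
import numpy as np

def blb_stderr_of_mean(x, gamma=0.7, s=10, r=100, seed=0):
    """Bag of Little Bootstraps estimate of the standard error of the mean (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    b = int(n ** gamma)                         # little-subset size b = n^gamma
    estimates = []
    for _ in range(s):
        subset = rng.choice(x, size=b, replace=False)
        # r resamples of size n, each stored as counts over the b subset points.
        counts = rng.multinomial(n, np.full(b, 1.0 / b), size=r)
        means = counts @ subset / n
        estimates.append(means.std(ddof=1))    # quality assessment on this subset
    return float(np.mean(estimates))            # average the per-subset assessments
```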
2016 “Why Should I Trust You?” Explaining the Predictions of Any Classifier LIME: Local Interpretable Model-agnostic Explanations
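A stripped-down, tabular take on the LIME idea in NumPy: sample perturbations around the instance, weight them by proximity, and fit a weighted linear surrogate to the black-box outputs. The Gaussian perturbations, the kernel, and the lack of feature selection and interpretable (e.g. binary) representations are my simplifications; this is not the `lime` package's API.

```python
import numpy as np

def lime_explain(predict_fn, x0, n_samples=2000, scale=1.0, kernel_width=0.75, seed=0):
    """Local linear surrogate around x0 (sketch): returns per-feature local weights."""
    rng = np.random.default_rng(seed)
    d = x0.size
    Z = x0 + rng.normal(scale=scale, size=(n_samples, d))  # perturbations around x0
    y = predict_fn(Z)                                       # black-box predictions
    dist = np.linalg.norm((Z - x0) / scale, axis=1)
    w = np.exp(-dist ** 2 / kernel_width ** 2)              # proximity kernel weights
    A = np.hstack([np.ones((n_samples, 1)), Z - x0])        # intercept + local offsets
    sw = np.sqrt(w)                                          # weighted least squares via row scaling
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]
```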
2016-05-30 Forest Floor Visualizations of Random Forests http://forestfloor.dk/
2013-12-04 Interpreting random forest classification models using a feature contribution method Pretty sure this is where "feature contributions" in treeinterpreter came from:
* https://github.com/andosa/treeinterpreter
* http://blog.datadive.net/interpreting-random-forests/
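The feature-contribution decomposition for a single regression tree, sketched on a hypothetical dict-based tree in which every node stores the mean training target (`value`) of the samples reaching it. Along the decision path, each change in node mean is credited to the feature that was split on, so prediction = root mean (bias) + sum of contributions. For a forest, average the bias and contributions across trees.

```python
def predict_with_contributions(node, x):
    """Return (prediction, bias, {feature: contribution}) for one tree (sketch).

    Internal node: {"feature", "threshold", "value", "left", "right"};
    leaf: {"value"}. "value" is the mean training target at that node.
    """
    bias = node["value"]  # root mean: the prediction before any split
    contributions = {}
    while "feature" in node:
        f = node["feature"]
        child = node["left"] if x[f] <= node["threshold"] else node["right"]
        # Credit the change in node mean to the feature used for this split.
        contributions[f] = contributions.get(f, 0.0) + child["value"] - node["value"]
        node = child
    return node["value"], bias, contributions
```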
2015-02-26 Human-level control through deep reinforcement learning Deep Q-learning (reinforcement learning)
2014-06-10 Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
2002 Evolving Neural Networks through Augmenting Topologies NEAT
2005-10-05 Predictive Learning via Rule Ensembles RuleFit, J. Friedman
https://www.r-bloggers.com/rulefit-when-disassembled-trees-meet-lasso/
2008 One-class Classification by Combining Density and Class Probability Estimation Stuart found this; looks like it's equivalent to CADE (below) but published six years earlier
2017-06-26 Do GANs actually learn the distribution? An empirical study
2017-03-17 Deciding How to Decide: Dynamic Routing in Artificial Neural Networks https://www.youtube.com/watch?v=NHQsDaycwyQ
https://www.dropbox.com/s/svh5610fpfh7np1/drann-poster.pdf?dl=0
2017-04-13 On the Effects of Batch and Weight Normalization in Generative Adversarial Networks https://github.com/stormraiser/GAN-weight-norm
2017-06-05 Language Generation with Recurrent Generative Adversarial Networks without Pre-training https://github.com/amirbar/rnn.wgan/
2017-06-05 A simple neural network module for relational reasoning Relation networks (deepmind)
2014 Linguistic Regularities in Sparse and Explicit Word Representations Discusses how the softmax loss directly encourages the linear arithmetic properties of word2vec
2014 Neural Word Embedding as Implicit Matrix Factorization
2015-06-22 Skip-Thought Vectors https://github.com/ryankiros/skip-thoughts
2016-12-10 StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks https://github.com/hanzhanggit/StackGAN
2017-05-19 The Kernel Mixture Network: A Nonparametric Method for Conditional Density Estimation of Continuous Random Variables
2015-11-23 Multi-Scale Context Aggregation by Dilated Convolutions
2017-05-18 Building effective deep neural network architectures one feature at a time "...starting from just a single feature per layer to attain effective representational capacities needed for a specific task by greedily adding feature by feature."
2017-05-16 The power of deeper networks for expressing natural functions Theoretical analysis demonstrating that deep networks need significantly fewer neurons than shallow networks for the same approximation power
2017-04-07 Learning to Generate Reviews and Discovering Sentiment * Unsupervised sentiment analysis with char-RNN
* https://blog.openai.com/unsupervised-sentiment-neuron/
2016-06-14 Learning to learn by gradient descent by gradient descent * Learning optimization algorithms tailored to a problem class to supplant traditional approaches (e.g. SGD)
2010 Factorization Machines * Collaborative Filtering
2017 Why Momentum Really Works
2017-01-26 Wasserstein GAN WGAN
1997 Long Short-Term Memory
2016-02-12 Learning Step Size Controllers for Robust Neural Network Training
2016-11-05 Neural Architecture Search with Reinforcement Learning
2015-05-03 Highway Networks
2015-12-10 Deep Residual Learning for Image Recognition ResNet

Recently read (or just want to keep handy):

Publication Date Article Notes
2008 Stacked Graphs – Geometry & Aesthetics streamgraph
* http://leebyron.com/streamgraph/
* https://matplotlib.org/gallery/lines_bars_and_markers/stackplot_demo.html
* https://bl.ocks.org/mbostock/4060954
2014-04-05 My solution for the Galaxy Zoo challenge Data augmentation
2006 Reducing the Dimensionality of Data with Neural Networks Hinton, encoder-decoder autoencoder architecture for dimensionality reduction. Motivates reconstruction loss in CycleGAN (not cited in DiscoGAN paper, but same difference)
2017-05-31 BEGAN: Boundary Equilibrium Generative Adversarial Networks https://github.com/carpedm20/BEGAN-tensorflow
2017-03-15 Learning to Discover Cross-Domain Relations with Generative Adversarial Networks DiscoGAN
2016-06-10 Improved Techniques for Training GANs https://github.com/openai/improved-gan
2015-11-19 Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks DCGAN
https://github.com/soumith/dcgan.torch
2014-06-10 Generative Adversarial Networks GAN
https://github.com/goodfeli/adversarial
2003 Latent Dirichlet Allocation LDA
2001-10 Random Forests
2009-03-02 Extracting the multiscale backbone of complex weighted networks
2015-08-27 Understanding LSTM Networks LSTM
2017-03-30 Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks CycleGAN. Basically equivalent to DiscoGAN, with an added weighting term on the reconstruction loss (which they up-weight)
* https://hardikbansal.github.io/CycleGANBlog/
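The cycle-consistency (reconstruction) term, sketched in NumPy with stand-in mapping functions. `G` (X to Y) and `F` (Y to X) are plain callables here (hypothetical toy maps, not networks), and `lam=10` is the weight I recall the paper using for this term, so double-check it. The adversarial losses are omitted.

```python
import numpy as np

def cycle_consistency_loss(G, F, x, y, lam=10.0):
    """lam * (mean |F(G(x)) - x| + mean |G(F(y)) - y|), an L1 reconstruction loss."""
    forward = np.mean(np.abs(F(G(x)) - x))   # X -> Y -> X round trip
    backward = np.mean(np.abs(G(F(y)) - y))  # Y -> X -> Y round trip
    return lam * (forward + backward)
```

A pair of exactly inverse maps gives zero loss; any round-trip drift is penalized, scaled by `lam`.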
2017-05-02 Visual Attribute Transfer through Deep Image Analogy Incredible style transfer results using nearest neighbor fields (NNF) to construct correspondences between the VGG16 feature maps of two images. Want to better understand how they perform the warp operation and image reconstruction. I get the gist...
2015-06-17 Inceptionism: Going Deeper into Neural Networks https://github.com/google/deepdream
2016-12-16 GAN Hacks
2017-04-13 Stochastic Gradient Descent as Approximate Bayesian Inference Blei
2017-05-22 pix2code: Generating Code from a Graphical User Interface Screenshot https://uizard.io/research#pix2code
2016-10-07 Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Technique for visualizing the regions of an image responsible for its output classification
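The core Grad-CAM computation in NumPy, assuming the target layer's feature maps A^k and the gradients of the class score y^c with respect to them have already been pulled out of your framework of choice (not shown). Channel weights are the spatially averaged gradients; the map is a ReLU of the weighted sum of feature maps. Upsampling to image resolution is left out.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM map from (K, H, W) feature maps and d y^c / d A^k of the same shape."""
    alpha = gradients.mean(axis=(1, 2))  # one importance weight per channel
    cam = np.maximum(np.tensordot(alpha, activations, axes=1), 0.0)  # ReLU(sum_k alpha_k A^k)
    return cam / cam.max() if cam.max() > 0 else cam  # scale to [0, 1] for display
```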
2014 Classifier-Adjusted Density Estimation for Anomaly Detection and One-Class Classification CADE
* http://people.cs.umass.edu/~lfriedl/pubs/SDM2014-supp.pdf
* http://darrkj.github.io/blog/2014/may102014/
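A NumPy sketch of classifier-adjusted density estimation: sample artificial points uniformly over the data's bounding box, train a classifier to separate real from artificial points, and turn its probability into a density via the known uniform density. Since P(real | x) / P(artificial | x) = (n_r f(x)) / (n_u u(x)), we get f(x) = (n_u / n_r) * p / (1 - p) * u(x). The k-nearest-neighbor vote classifier (with Laplace smoothing) is a stand-in I chose for brevity; any probabilistic classifier fits the recipe. Low scores flag likely anomalies.

```python
import numpy as np

def cade_scores(X_train, X_query, n_uniform=None, k=15, seed=0):
    """Classifier-adjusted density estimates at X_query (sketch, kNN classifier)."""
    rng = np.random.default_rng(seed)
    n_r = len(X_train)
    n_u = n_uniform or n_r
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    U = rng.uniform(lo, hi, size=(n_u, X_train.shape[1]))  # artificial background points
    pts = np.vstack([X_train, U])
    labels = np.r_[np.ones(n_r), np.zeros(n_u)]            # 1 = real, 0 = artificial
    d2 = ((X_query[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    # Laplace-smoothed vote fraction keeps p strictly inside (0, 1).
    p = (labels[nn].sum(axis=1) + 1) / (k + 2)
    u_density = 1.0 / np.prod(hi - lo)                      # uniform density on the box
    return (n_u / n_r) * p / (1 - p) * u_density
```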