2016 |
End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures |
Cited in multi-task SciERC (2018, below) |
2018-10-11 |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
|
|
Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction |
Probably a lot of useful citations in here, not sure we need the coreference stuff. * SciERC datasets: http://nlp.cs.washington.edu/sciIE/ * Code: https://bitbucket.org/luanyi/scierc/src/master/ * Pretrained (best) models: NER, Coref, Relation |
2017-08-08 |
Structural patterns of information cascades and their implications for dynamics and semantics |
|
2014-03-18 |
Can Cascades be Predicted? |
Lada Adamic |
|
Scrutinizing Gartner's hype cycle approach |
|
|
Variational Deep Embedding |
|
|
Stick-Breaking Variational Autoencoders |
|
2015 |
Adding Gradient Noise Improves Learning for Very Deep Networks |
|
2012 |
A Neural Autoregressive Topic Model |
|
2016-01-01 |
A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data |
|
2010 |
Online Multiscale Dynamic Topic Models |
|
|
Topic Model Zoo |
Gist with lots of topic modeling articles |
2017 |
A Correlated Topic Model Using Word Embeddings |
|
2016 |
Nonparametric Spherical Topic Modeling with Word Embeddings |
|
2015 |
Gaussian LDA for Topic Models with Word Embeddings |
https://github.com/rajarshd/Gaussian_LDA |
2015 |
Knowledge Transfer Using Latent Variable Models |
Long dissertation. Looks like it contains some cool ideas though. |
2016 |
TopicSketch: Real-Time Bursty Topic Detection from Twitter |
Tensor decomposition for bursty topic detection |
2009 |
Detecting Topic Evolution in Scientific Literature: How Can Citations Help? |
dynamic topic model augmented by a citation graph |
2011 |
TextFlow: Towards Better Understanding of Evolving Topics in Text |
Visualization of topic model evolution |
2012-06-13 |
Continuous Time Dynamic Topic Models |
Blei cDTM |
2018-05-01 |
Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time |
Simultaneously model trends in topics and keywords |
2018-09-14 |
Google’s Next Generation Music Recognition |
Sound search with embeddings for music recognition |
2018-09-12 |
An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation |
|
|
Large-Scale Validation and Analysis of Interleaved Search Evaluation |
"interleaving" for eliciting reliable preferences from implicit feedback (e.g. click data) |
|
Online Structured Prediction via Coactive Learning |
|
|
The K-armed dueling bandits problem |
Recommendation as a policy learning task |
2014 |
DeepWalk: Online Learning of Social Representations |
DeepWalk |
2017 |
A Simple but Tough-to-Beat Baseline for Sentence Embeddings |
TF-IDF-like weighting (smooth inverse frequency) of pre-trained word vectors, followed by removing the first principal component http://www.offconvex.org/2016/02/14/word-embeddings-2/ |
2018-02-17 |
Interpretable VAEs for nonlinear group factor analysis |
oi-VAE |
2017-07-28 |
Dynamics of homelessness in urban America |
|
2017-06-12 |
Attention Is All You Need |
transformer networks: seq2seq learning without an RNN |
2018-07-24 |
Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors |
|
2004-04-06 |
Finding scientific topics |
Gibbs sampler for LDA |
2017-04-10 |
Multi-Agent Diverse Generative Adversarial Networks |
MAD-GAN |
2002 |
BLEU: a Method for Automatic Evaluation of Machine Translation |
BiLingual Evaluation Understudy, for scoring generated text. Quicker overviews: * https://machinelearningmastery.com/calculate-bleu-score-for-text-python/ * https://en.wikipedia.org/wiki/BLEU |
2014-12-14 |
Sequence to Sequence Learning with Neural Networks |
|
2016-09-26 |
Discriminative Embeddings of Latent Variable Models for Structured Data |
https://github.com/Hanjun-Dai/pytorch_structure2vec |
2015-07-08 |
COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution |
|
2016-09-13 |
Deep Coevolutionary Network: Embedding User and Item Features for Recommendation |
|
2016 |
Coevolutionary Latent Feature Processes for Continuous-Time User-Item Interactions |
Man... these GA Tech CS guys are reading my mind |
2015-07-10 |
Hawkes Processes |
Background for "Time-Sensitive Recommendation From Recurrent User Activities" |
2015 NIPS |
Time-Sensitive Recommendation From Recurrent User Activities |
This looks super useful for my reddit evolution project. https://papers.nips.cc/author/nan-du-5356 |
2015-11-20 |
On the energy landscape of deep networks |
* https://www.youtube.com/watch?v=NFeZ6MggJjw * http://vision.ucla.edu/~pratikac/ * https://scholar.google.com/citations?user=c_z5hWEAAAAJ&hl=en |
2018-01-12 |
Improved asynchronous parallel optimization analysis for stochastic incremental methods |
|
2016 |
Recurrent Marked Temporal Point Processes: Embedding Event History to Vector |
|
|
Modeling Popularity in Asynchronous Social Media Streams with Recurrent Neural Networks |
More RNN point-process stuff |
2017 |
Wasserstein Learning of Deep Generative Point Process Models |
|
2017 |
Modeling the Intensity Function of Point Process via Recurrent Neural Networks |
|
2016-09-08 |
Distill: Attention and Augmented Recurrent Neural Networks |
|
2017-09-07 |
Inducing Semantic Micro-Clusters from Deep Multi-View Representations of Novels |
|
2017-11-29 |
A Multi-Horizon Quantile Recurrent Forecaster |
conf |
2017-03-24 |
Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks |
|
2017-05-24 |
Modeling The Intensity Function Of Point Process Via Recurrent Neural Networks |
https://github.com/xiaoshuai09/Recurrent-Point-Process |
2017 |
Poincaré Embeddings for Learning Hierarchical Representations |
Recent research from the scikit-tensor guy (multiplex network community detection), currently at Facebook AI |
2008 |
Danger: High Power! – Exploring the Statistical Properties of a Test for Random Forest Variable Importance |
Highlights issues with RF permutation VI (OOB mean decrease accuracy) |
2015-10-22 |
A computationally fast variable importance test for random forests for high-dimensional data |
Vita method, as cited in "Evaluation of variable selection methods for random forests and omics data sets" (below). Asserts that the classical Gini index and mean decrease accuracy exhibit unfavorable statistical properties. Proposed algorithm: use 2-fold CV to approximate null distributions for VIs, then score VIs by computing p-values against the VI null CDF. Sounds like a cool idea, but I suspect it only works in the presence of lots of variables. If I understand correctly, it would be impossible to achieve p<.05 with fewer than 10 variables. |
2017-10-16 |
Evaluation of variable selection methods for random forests and omics data sets |
Comparison of Variable Importance algs, concludes Boruta and Vita (above) are current SOTA |
2016-04-22 |
Entity Embeddings of Categorical Variables |
|
2017-11-27 |
Distilling a Neural Network Into a Soft Decision Tree |
|
2015-03-04 |
The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets |
|
2005-12-19 |
An introduction to ROC analysis |
Probably doesn't have anything I don't already know, but looks like a good review of some fundamentals and might be a good resource to point "uninitiated" people to. |
2017-06-19 |
Enriching Word Vectors with Subword Information |
|
2002-12-08 |
Laplacian Eigenmaps for Dimensionality Reduction and Data Representation |
Referenced in Changpinyo 2016 ("Synthesized Classifiers for ZSL", below). Produces embeddings that preserve the structure of the proximity graph. Sounds similar to Isomap. From these slides, it's basically PCA using a different similarity kernel: PCA is an eigendecomposition of the covariance matrix, LEM is an eigendecomposition of the graph Laplacian of a proximity graph. |
2017-08-20 |
Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning |
|
2017-01-11 |
An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild |
|
2016-05-27 |
Synthesized Classifiers for Zero-Shot Learning |
Training an image classifier to recognize classes with no representation in the training data by augmenting model training with word vectors for classes and text descriptions (Wikipedia), and then "aligning" the semantic space with the image model space |
2017 |
Attention Is All You Need |
Transformer blocks |
2017 |
Deep Image Prior |
/r/machinelearning best paper 2017 |
2017-10-27 |
Progressive Growing of GANs for Improved Quality, Stability, and Variation |
* arXiv * Paper Video * 1hr of generated faces * GitHub * pre-trained |
2015-02-11 |
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift |
|
1999 |
On the Efficiency of Nearest Neighbor Searching with Data Clustered in Lower Dimensions |
KD-Tree (efficient nearest neighbor data structure) |
2013 |
Generalized Degrees of Freedom and Adaptive Model Selection in Linear Mixed-Effects Models |
Derivation of GDF for LME |
2004 |
The Estimation of Prediction Error: Covariance Penalties and Cross-Validation |
[Efron] Relationship between non-parametric model evaluation (e.g. CV, boot) and penalty criteria (e.g. AIC, Cp) |
1998 |
On Measuring and Correcting the Effects of Data Mining and Model Selection |
Generalized Degrees of Freedom (GDF) |
2013-01-23 |
Loopy Belief Propagation for Approximate Inference: An Empirical Study |
Kevin Murphy, LBP |
1971 |
On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities |
VC Dimension |
1984 |
A Theory of the Learnable |
PAC learning |
2012 |
Estimating Uncertainty for Massive Data Streams |
Introduces (?) Poisson Bootstrap (sec 3) - http://www.unofficialgoogledatascience.com/2015/08/an-introduction-to-poisson-bootstrap26.html |
2012-06-28 |
A Scalable Bootstrap for Massive Data |
Bag of Little Bootstraps (BLB) |
2016 |
“Why Should I Trust You?” Explaining the Predictions of Any Classifier |
LIME: Local Interpretable Model-agnostic Explanations |
2016-05-30 |
Forest Floor Visualizations of Random Forests |
http://forestfloor.dk/ |
2013-12-04 |
Interpreting random forest classification models using a feature contribution method |
Pretty sure this is where "feature contributions" in treeinterpreter came from: * https://github.com/andosa/treeinterpreter * http://blog.datadive.net/interpreting-random-forests/ |
2015-02-26 |
Human-level control through deep reinforcement learning |
Deep Q-learning (reinforcement learning) |
2014-06-10 |
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization |
|
2002 |
Evolving Neural Networks through Augmenting Topologies |
NEAT |
2005-10-05 |
Predictive Learning via Rule Ensembles |
RuleFit, J. Friedman https://www.r-bloggers.com/rulefit-when-disassembled-trees-meet-lasso/ |
2008 |
One-class Classification by Combining Density and Class Probability Estimation |
Stuart found this, looks like it's equivalent to CADE but published six years earlier |
2017-06-26 |
Do GANs actually learn the distribution? An empirical study |
|
2017-03-17 |
Deciding How to Decide: Dynamic Routing in Artificial Neural Networks |
https://www.youtube.com/watch?v=NHQsDaycwyQ https://www.dropbox.com/s/svh5610fpfh7np1/drann-poster.pdf?dl=0 |
2017-04-13 |
On the Effects of Batch and Weight Normalization in Generative Adversarial Networks |
https://github.com/stormraiser/GAN-weight-norm |
2017-06-05 |
Language Generation with Recurrent Generative Adversarial Networks without Pre-training |
https://github.com/amirbar/rnn.wgan/ |
2017-06-05 |
A simple neural network module for relational reasoning |
Relation networks (deepmind) |
2014 |
Linguistic Regularities in Sparse and Explicit Word Representations |
Discusses how the softmax loss directly encourages the linear arithmetic properties of word2vec |
2014 |
Neural Word Embedding as Implicit Matrix Factorization |
|
2015-06-22 |
Skip-Thought Vectors |
https://github.com/ryankiros/skip-thoughts |
2016-12-10 |
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks |
https://github.com/hanzhanggit/StackGAN |
2017-05-19 |
The Kernel Mixture Network: A Nonparametric Method for Conditional Density Estimation of Continuous Random Variables |
|
2015-11-23 |
Multi-Scale Context Aggregation by Dilated Convolutions |
|
2017-05-18 |
Building effective deep neural network architectures one feature at a time |
"...starting from just a single feature per layer to attain effective representational capacities needed for a specific task by greedily adding feature by feature." |
2017-05-16 |
The power of deeper networks for expressing natural functions |
Theoretical analysis demonstrating that deep networks need significantly fewer neurons than shallow networks for the same approximation power |
2017-04-07 |
Learning to Generate Reviews and Discovering Sentiment |
* Unsupervised sentiment analysis with char-RNN * https://blog.openai.com/unsupervised-sentiment-neuron/ |
2016-06-14 |
Learning to learn by gradient descent by gradient descent |
* Learning optimization algorithms catered to the problem-class to supplant traditional approaches (e.g. SGD) |
2010 |
Factorization Machines |
* Collaborative Filtering |
2017 |
Why Momentum Really Works |
|
2017-01-26 |
Wasserstein GAN |
WGAN |
1997 |
Long Short-Term Memory |
|
2016-02-12 |
Learning Step Size Controllers for Robust Neural Network Training |
|
2016-11-05 |
Neural Architecture Search with Reinforcement Learning |
|
2015-05-03 |
Highway Networks |
|
2015-12-10 |
Deep Residual Learning for Image Recognition |
ResNet |
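
Sketch for the Laplacian Eigenmaps note above ("PCA is eigendecomp of covariance, LEM is eigendecomp of graph laplacian of a proximity graph"). This is only a rough illustration, not the paper's reference implementation. Assumptions: a symmetrized kNN graph with 0/1 edge weights (the paper also allows heat-kernel weights), a connected graph, and dense NumPy arrays. The names `pca_embed` and `laplacian_eigenmap` and the toy circle data are made up for this example.

```python
import numpy as np

def pca_embed(X, dim):
    """PCA: project onto the top eigenvectors of the feature covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :dim]               # largest-variance directions first
    return Xc @ top

def laplacian_eigenmap(X, dim, k=5):
    """Laplacian Eigenmaps on a symmetrized kNN graph with 0/1 weights."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]       # k nearest neighbours per row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)                     # make the graph undirected
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)
    # Solutions of L v = lambda D v are v = D^{-1/2} u; skip the trivial
    # constant solution (eigenvalue 0, assuming the graph is connected).
    emb = d_inv_sqrt[:, None] * vecs[:, 1:dim + 1]
    return emb, vals

# Toy data: a slightly noisy circle embedded in 3-D.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 60, endpoint=False)
X = np.c_[np.cos(t), np.sin(t), 0.01 * rng.normal(size=60)]

pca_coords = pca_embed(X, 2)
lem_coords, lem_eigvals = laplacian_eigenmap(X, 2, k=5)
print(pca_coords.shape, lem_coords.shape, lem_eigvals[:3])
```

Both functions end in the same step (`np.linalg.eigh` on a symmetric matrix). The difference is which matrix gets decomposed and which end of the spectrum is kept: PCA keeps the largest eigenvalues of the covariance, LEM keeps the smallest nonzero eigenvalues of the graph Laplacian.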