ICLR 2020 Favorites - WIP
Calendar: https://iclr.cc/virtual/calendar.html#tab-calendar
Paper search: https://iclr.cc/virtual/papers.html?filter=keywords

Papers:
1. Title:
     Tree-Structured Attention with Hierarchical Accumulation
   Authority:
     Richard Socher
   Url:
     https://iclr.cc/virtual/poster_HJxK5pEYvr.html
   Reference:
     - Pay less attention with lightweight and dynamic convolutions
   Code:
     sad :(
   Perception:
     Combines transformers and tree-LSTMs to incorporate the hierarchical structure of
     language while keeping the cost the same as for transformers.
2. Title:
     Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
   Authority:
     Sanjeev Arora, Ruslan Salakhutdinov
   Url:
     https://iclr.cc/virtual/poster_rkl8sJBYvH.html
   Reference:
     - None
   Code:
     https://github.com/LeoYu/neural-tangent-kernel-UCI
   Perception:
     The authors show that on small datasets (hundreds to a few thousand examples), their
     Neural Tangent Kernel (NTK) method outperforms random forests and finite neural
     networks too, so it can be used out of the box; see the sketch below.
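   Sketch:
     A minimal, hedged sketch of the core idea, not the paper's full convolutional NTK:
     kernel ridge regression with the closed-form NTK of an infinitely wide two-layer
     ReLU network (correct up to constant factors). The synthetic data and the ridge
     parameter are placeholders of mine.

       import numpy as np
       from sklearn.kernel_ridge import KernelRidge

       def two_layer_relu_ntk(X1, X2):
           # Inputs are assumed unit-norm, so u[i, j] = cosine between x1_i and x2_j.
           u = np.clip(X1 @ X2.T, -1.0, 1.0)
           k0 = (np.pi - np.arccos(u)) / np.pi                              # arc-cosine kernel, degree 0
           k1 = (np.sqrt(1.0 - u**2) + u * (np.pi - np.arccos(u))) / np.pi  # degree 1
           return u * k0 + k1                                               # two-layer ReLU NTK

       rng = np.random.default_rng(0)
       X_train, y_train = rng.normal(size=(200, 10)), rng.normal(size=200)  # stand-in data
       X_test = rng.normal(size=(50, 10))
       X_train /= np.linalg.norm(X_train, axis=1, keepdims=True)
       X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)

       model = KernelRidge(alpha=1e-3, kernel="precomputed")
       model.fit(two_layer_relu_ntk(X_train, X_train), y_train)
       preds = model.predict(two_layer_relu_ntk(X_test, X_train))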
3. Title:
     Progressive Learning and Disentanglement of Hierarchical Representations
   Authority:
     None
   Url:
     https://iclr.cc/virtual/poster_SJxpsxrYPS.html
   Reference:
     - VAE: Kingma and Welling 2013
     - MMD: Gretton 2007
     - Generative moment matching networks, 2015
     - Training generative neural networks via MMD optimization
   Code:
     https://github.com/Zhiyuan1991/proVLAE (TF, :/)
   Perception:
     Disentangled representations would be a great thing for creative applications and
     generative modeling. The authors improve upon prior work in the area of variational
     autoencoders.
4. Title:
     Self-labelling via simultaneous clustering and representation learning
   Authority:
     VGG at Oxford
   Url:
     https://iclr.cc/virtual/poster_Hyx-jyBFPr.html
   Reference:
     - Deep clustering for unsupervised learning of visual features
   Code:
     https://github.com/yukimasano/self-label
   Perception:
     Clustering and representation learning can be done simultaneously and meaningfully;
     the pseudo-labels come from a balanced-assignment problem solved with Sinkhorn-Knopp
     iterations, sketched below.
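   Sketch:
     A hedged numpy sketch of the Sinkhorn-Knopp balancing step as I understand the
     paper: turn the model's softmax predictions into pseudo-labels whose clusters are
     equally sized. Variable names and the iteration count are mine.

       import numpy as np

       def balanced_pseudo_labels(probs, n_iters=50):
           # probs: (N, K) softmax outputs of the current model.
           Q = probs.copy()
           N, K = Q.shape
           for _ in range(n_iters):
               Q /= Q.sum(axis=0, keepdims=True); Q /= K  # every cluster gets mass 1/K
               Q /= Q.sum(axis=1, keepdims=True); Q /= N  # every sample gets mass 1/N
           return Q.argmax(axis=1)  # hard pseudo-labels for the next training round

       probs = np.random.default_rng(0).dirichlet(np.ones(10), size=256)
       labels = balanced_pseudo_labels(probs)  # roughly 256/10 samples per cluster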
5. Title:
     Robust training with ensemble consensus
   Authority:
     KAIST
   Url:
     https://iclr.cc/virtual/poster_ryxOUTVYDH.html
   Reference:
     - None
   Code:
     sad :(
   Perception:
     Uses ensembles to differentiate between clean and noisy labels. Both can produce a
     small loss, but the difference is that clean labels generalize while noisy labels
     are merely memorized, so ensemble members agree on the clean ones; see the sketch
     below.
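   Sketch:
     A loose sketch of the consensus idea; the keep fraction and how the ensemble is
     built (e.g., perturbed copies of one network) are my assumptions, not the paper's
     exact recipe.

       import numpy as np

       def consensus_clean_mask(per_model_losses, keep_frac=0.7):
           # per_model_losses: (M, N) per-sample losses under M ensemble members.
           masks = []
           for losses in per_model_losses:
               cutoff = np.quantile(losses, keep_frac)
               masks.append(losses <= cutoff)      # small-loss samples per member
           return np.logical_and.reduce(masks)     # consensus: small loss under every member

       losses = np.abs(np.random.default_rng(0).normal(size=(3, 100)))  # stand-in losses
       clean = consensus_clean_mask(losses)        # train only on these samples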
6. Title:
     Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well
   Authority:
     Apple
   Url:
     https://iclr.cc/virtual/poster_rygFWAEFwS.html
   Reference:
     - None
   Code:
     sad :(
   Perception:
     Applied-science paper; looks weak for a conference such as ICLR, but the result is
     nevertheless good. A lot of optimization work is happening in this direction, so
     look at that too! A rough sketch of the averaging step is below.
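   Sketch:
     No code was released, so here is a hedged PyTorch sketch of the final step as I
     read the paper: after a shared large-batch phase, workers refine independently with
     small batches and their weights are averaged. `worker_models` is hypothetical.

       import copy
       import torch

       def average_weights(worker_models):
           # Average the parameters of several independently refined workers.
           avg = copy.deepcopy(worker_models[0])
           with torch.no_grad():
               for name, param in avg.named_parameters():
                   stacked = torch.stack([dict(m.named_parameters())[name]
                                          for m in worker_models])
                   param.copy_(stacked.mean(dim=0))
           return avg  # note: BatchNorm running stats would need averaging too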
7. Title:
     Target-Embedding Autoencoders for Supervised Representation Learning
   Authority:
     Cambridge
   Url:
     https://iclr.cc/virtual/poster_BygXFkSYDH.html
   Reference:
     - None
   Code:
     sad :(
   Perception:
     Uses a target-reconstruction loss as a regularizer for neural network predictors;
     see the sketch below.
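   Sketch:
     A hedged PyTorch sketch of the idea as I understand it: autoencode the targets,
     predict the target embedding from the inputs, and add the target reconstruction
     error as a regularizer. Modules, dimensions, and the weight lam are mine.

       import torch
       import torch.nn as nn
       import torch.nn.functional as F

       d_x, d_y, d_z = 16, 32, 8              # stand-in dimensions
       feature_net = nn.Linear(d_x, d_z)      # x -> predicted target embedding
       target_encoder = nn.Linear(d_y, d_z)   # y -> target embedding
       target_decoder = nn.Linear(d_z, d_y)   # embedding -> reconstructed y

       def tea_loss(x, y, lam=0.1):
           z_pred = feature_net(x)             # supervised path
           z_tgt = target_encoder(y)           # autoencoding path
           y_rec = target_decoder(z_tgt)
           return F.mse_loss(z_pred, z_tgt) + lam * F.mse_loss(y_rec, y)

       loss = tea_loss(torch.randn(4, d_x), torch.randn(4, d_y))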
8. Title:
     Ridge Regression: Structure, Cross-Validation, and Sketching
   Authority:
     Stanford, UPenn
   Url:
     https://iclr.cc/virtual/poster_HklRwaEKwB.html
   Reference:
     - None
   Code:
     https://github.com/liusf15/RidgeRegression
   Perception:
     Heavily mathematical; the intro video does not explain it well and is incomplete!
9. Title:
     Encoding word order in complex embeddings
   Authority:
     University of Copenhagen
   Url:
     https://iclr.cc/virtual/poster_Hke-WTVtwr.html
   Reference:
     - None
   Code:
     https://github.com/iclr-complex-order/complex-order (TF, :/)
   Perception:
     Finds an alternative to position encoding (e.g., the PE in BERT) so as to encode
     word order directly in the word embeddings: each embedding becomes a complex-valued
     function of position, sketched below.
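   Sketch:
     A hedged numpy sketch of the core construction as I read the paper: each word's
     embedding is amplitude * exp(i * (freq * pos + phase)), with amplitude, frequency,
     and phase learned per word and per dimension. The random values are placeholders.

       import numpy as np

       def complex_word_embedding(amplitude, freq, phase, pos):
           # amplitude, freq, phase: (d,) learned vectors for one word; pos: int position.
           return amplitude * np.exp(1j * (freq * pos + phase))

       rng = np.random.default_rng(0)
       amp, freq, phase = rng.random(8), rng.random(8), rng.random(8)
       e3 = complex_word_embedding(amp, freq, phase, 3)
       e7 = complex_word_embedding(amp, freq, phase, 7)
       # The same word at two positions differs only by a position-dependent rotation:
       assert np.allclose(e7, e3 * np.exp(1j * freq * 4))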
10. Title:
      An Exponential Learning Rate Schedule for Deep Learning
    Authority:
      Sanjeev Arora
    Url:
      https://iclr.cc/virtual/poster_rJg8TeSFDH.html
    Reference:
      - Fix your classifier: the marginal value of training the last weight layer
    Code:
      sad :(
    Perception:
      Training of batch-normalized networks can be done using SGD with momentum and an
      exponentially increasing learning rate schedule; see the sketch below.
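    Sketch:
      A minimal PyTorch sketch of an exponentially increasing schedule; the model, loss,
      and growth factor 1.0005 are placeholders of mine, not values from the paper.

        import torch

        model = torch.nn.Linear(10, 2)                  # stand-in model
        opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
        sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=1.0005)  # gamma > 1: LR grows

        for step in range(1000):
            opt.zero_grad()
            loss = model(torch.randn(32, 10)).pow(2).mean()  # dummy loss
            loss.backward()
            opt.step()
            sched.step()                                # lr_t = 0.1 * 1.0005 ** t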
11. Title:
      Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks
    Authority:
      Rice and TAMU
    Url:
      https://iclr.cc/virtual/poster_BJxsrgStvr.html
    Reference:
      - Progressive Pruning, Frankle 2019
    Code:
      https://github.com/RICE-EIC/Early-Bird-Tickets
    Perception:
      Applied science: proposes a training scheme with good results and faster training.
      Early-bird tickets are detected by comparing the Hamming distance between pruning
      masks from consecutive epochs; see the sketch below.
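    Sketch:
      A hedged numpy sketch of the detection idea from my reading: derive a channel
      pruning mask each epoch (here from BatchNorm scale factors), and declare an
      early-bird ticket once consecutive masks stop changing. Thresholds are mine.

        import numpy as np

        def channel_mask(bn_gammas, prune_frac=0.5):
            # Keep the channels with the largest absolute BatchNorm scale factors.
            cutoff = np.quantile(np.abs(bn_gammas), prune_frac)
            return np.abs(bn_gammas) > cutoff

        def hamming_distance(mask_a, mask_b):
            return np.mean(mask_a != mask_b)            # normalized to [0, 1]

        g_prev, g_now = np.random.rand(64), np.random.rand(64)  # stand-in gammas
        if hamming_distance(channel_mask(g_prev), channel_mask(g_now)) < 0.01:
            print("early-bird ticket found: stop full training and prune")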
Workshops:
  Url:
    https://iclr.cc/virtual/workshops_8.html
  Site:
    https://sites.google.com/nyu.edu/ml-irl-2020/
  Papers:
    - ATTENTION-BASED PROTOTYPICAL LEARNING (Google)
    - GETTING A CLUE: A METHOD FOR EXPLAINING UNCERTAINTY ESTIMATES (MSR)
    - MACHINE LEARNING FOR DIGITAL TRY-ON (UMaryland)
    - IDENTIFYING INTERPRETABLE WORD VECTOR SUBSPACES WITH PCA (Harvard)
Misc:
  1. [3.5/5] Deep Double Descent: Where Bigger Models and More Data Hurt (OpenAI)
  2. [4.5/5] Automatically Discovering and Learning New Visual Categories with Ranking Statistics (VGG, Oxford)
     https://github.com/k-han/AutoNovel
  3. [5/5] Learning Robust Representations via Multi-View Information Bottleneck (MSR)
     https://github.com/mfederici/Multi-View-Information-Bottleneck
  4. [4/5] Picking Winning Tickets Before Training by Preserving Gradient Flow (UToronto, Vector Institute)
  5. [5/5] Differentiable Reasoning over a Virtual Knowledge Base (CMU)
     https://www.cs.cmu.edu/~bdhingra/pages/drkit.html
     Andoni et al. 2015, approximate nearest neighbors
  6. [4/5] Neural Machine Translation with Universal Visual Representation (NICT, Japan)
     https://github.com/cooelf/UVR-NMT
  7. [4/5] Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel (Cognizant AI)
     https://github.com/leaf-ai/rio-paper
  8. [4/5] The Shape of Data: Intrinsic Distance for Data Distributions (Technion, Israel)
     https://github.com/xgfs/imd
  9. [4/5] Learning from Rules Generalizing Labeled Exemplars (IIT Bombay)
     Interesting use of Snorkel
     https://github.com/awasthiabhijeet/Learning-From-Rules
  10. [4.5/5] On the Relationship between Self-Attention and Convolutional Layers (EPFL)
      https://github.com/epfml/attention-cnn
  11. [5/5] A critical analysis of self-supervision, or what we can learn from a single image (VGG, Oxford)
  12. [4/5] Learning Space Partitions for Nearest Neighbor Search (MSR)
      https://anonymous.4open.science/r/cdd789a8-818c-4675-98fd-39f8da656129/
  13. [3.5/5] Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth (Butterfly Network)
  14. [4/5] The Early Phase of Neural Network Training (MIT)
  15. [4.5/5] VL-BERT: Pre-training of Generic Visual-Linguistic Representations (MSR)
  16. [5/5] Distance-Based Learning from Errors for Confidence Calibration (Google)
      https://drive.google.com/drive/folders/1UThGvkkvFvKX8ogsfwvdA3uY8xzDlIuL
  17. [4/5] Understanding and Improving Information Transfer in Multi-Task Learning (Stanford)
  18. [3.5/5] Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks (KAIST)
  19. [4/5] Rethinking the Hyperparameters for Fine-tuning (AWS)
  20. [4/5] Gradient $\ell_1$ Regularization for Quantization Robustness (Qualcomm AI)
  21. [4/5] Novelty Detection Via Blurring (KAIST)