ICLR 2020 Favorites - WIP
Calendar: https://iclr.cc/virtual/calendar.html#tab-calendar
Paper search: https://iclr.cc/virtual/papers.html?filter=keywords
Papers:
1. Title:
Tree-Structured Attention with Hierarchical Accumulation
Authority:
Richard Socher
Url:
https://iclr.cc/virtual/poster_HJxK5pEYvr.html
Reference:
- Pay less attention with lightweight and dynamic convolutions
Code:
sad :(
Perception:
Combines transformers and tree-LSTMs to incorporate the hierarchical structure of language
while keeping the cost the same as that of standard transformers
2. Title:
Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
Authority:
Sanjeev Arora, Ruslan Salakhutdinov
Url:
https://iclr.cc/virtual/poster_rkl8sJBYvH.html
Reference:
- None
Code:
https://github.com/LeoYu/neural-tangent-kernel-UCI
Perception:
The authors suggest that for small datasets (hundreds to a few thousand examples), their
NTK method outperforms both random forests and neural networks, so it can be used out of
the box; a toy sketch follows below.
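Sketch:
A minimal numpy sketch of the idea, assuming the standard closed-form NTK of a
two-layer ReLU network plugged into plain kernel ridge regression; the paper's
kernels cover deeper architectures, so treat this as background, not their method.

    import numpy as np

    def ntk_two_layer_relu(X1, X2):
        # Closed-form NTK of a two-layer ReLU net with both layers trained.
        n1 = np.linalg.norm(X1, axis=1, keepdims=True)
        n2 = np.linalg.norm(X2, axis=1, keepdims=True)
        u = np.clip((X1 @ X2.T) / (n1 * n2.T), -1.0, 1.0)
        theta = np.arccos(u)
        k0 = (np.pi - theta) / (2 * np.pi)   # E[relu'(w.x) relu'(w.x')]
        k1 = (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
        return (X1 @ X2.T) * k0 + (n1 * n2.T) * k1

    def ntk_predict(X_tr, y_tr, X_te, ridge=1e-4):
        # Kernel (ridge) regression with the NTK as the kernel.
        K = ntk_two_layer_relu(X_tr, X_tr)
        alpha = np.linalg.solve(K + ridge * np.eye(len(K)), y_tr)
        return ntk_two_layer_relu(X_te, X_tr) @ alpha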
3. Title:
Progressive Learning and Disentanglement of Hierarchical Representations
Authority:
None
Url:
https://iclr.cc/virtual/poster_SJxpsxrYPS.html
Reference:
- VAE: Kingma and Welling 2013
- MMD: Gretton 2007
- Generative moment matching networks, 2015
- Training generative neural networks via MMD optimization
Code:
https://github.com/Zhiyuan1991/proVLAE (TF, :/)
Perception:
Disentangled representations would be a great thing for creative applications and generative
modeling. The authors improve upon prior work on variational autoencoders; since MMD features
heavily in the references, a quick sketch of it follows below.
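Sketch:
A quick numpy sketch of the biased RBF-kernel MMD estimator from the Gretton
line of work cited above; the bandwidth here is illustrative, not the authors' choice.

    import numpy as np

    def rbf_mmd2(X, Y, sigma=1.0):
        # Biased MMD^2 estimate between samples X and Y under an RBF kernel.
        def k(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * sigma ** 2))
        return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()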
4. Title:
Self-labelling via simultaneous clustering and representation learning
Authority:
VGG at Oxford
Url:
https://iclr.cc/virtual/poster_Hyx-jyBFPr.html
Reference:
- Deep clustering for unsupervised learning of visual features
Code:
https://github.com/yukimasano/self-label
Perception:
Clustering and representation learning can be done simultaneously and meaningfully; the
label-assignment step is a Sinkhorn-Knopp iteration, sketched below.
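Sketch:
A simplified dense sketch of the Sinkhorn-Knopp step the paper uses to turn
network predictions into balanced pseudo-labels; their implementation is
streamed and more careful numerically, so this is the idea, not their code.

    import numpy as np

    def balanced_pseudo_labels(log_probs, n_iters=50):
        # log_probs: (N, K) model log-probabilities, N examples, K clusters.
        N, K = log_probs.shape
        P = np.exp(log_probs).T               # (K, N) unnormalized plan
        P /= P.sum()
        for _ in range(n_iters):
            P /= P.sum(axis=1, keepdims=True); P /= K  # each cluster: mass 1/K
            P /= P.sum(axis=0, keepdims=True); P /= N  # each example: mass 1/N
        return P.argmax(axis=0)               # pseudo-label per example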
5. Title:
Robust training with ensemble consensus
Authority:
KAIST
Url:
https://iclr.cc/virtual/poster_ryxOUTVYDH.html
Reference:
- None
Code:
sad :(
Perception:
Use ensembles to differentiate between clean and noisy labels: both can produce small
losses, but clean labels generalize while noisy labels are merely memorized. A toy
consensus filter is sketched below.
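Sketch:
An illustrative filter in the spirit of the paper (not their exact variants):
keep an example for training only when every ensemble member agrees its loss is small.

    import numpy as np

    def consensus_keep_mask(losses, keep_frac=0.7):
        # losses: (n_models, n_examples) per-example losses from the ensemble.
        thresh = np.quantile(losses, keep_frac, axis=1, keepdims=True)
        small_loss = losses <= thresh      # small-loss flag per member
        return small_loss.all(axis=0)      # consensus across the ensemble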
6. Title:
Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well
Authority:
Apple
Url:
https://iclr.cc/virtual/poster_rygFWAEFwS.html
Reference:
- None
Code:
sad :(
Perception:
Applied science paper. It looks weak for a conference such as ICLR, but the results are
good, and a lot of optimisation work is happening in this direction; look at that too.
The averaging step is sketched below.
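Sketch:
The core SWAP move, roughly: train replicas in parallel with large batches, then
average their weights and refine briefly. A toy sketch of the averaging step,
assuming models are represented as name -> array dicts:

    import numpy as np

    def average_weights(models):
        # models: list of dicts mapping parameter name -> numpy array.
        return {name: np.mean([m[name] for m in models], axis=0)
                for name in models[0]}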
7. Title:
Target-Embedding Autoencoders for Supervised Representation Learning
Authority:
Cambridge
Url:
https://iclr.cc/virtual/poster_BygXFkSYDH.html
Reference:
- None
Code:
sad :(
Perception:
Uses a target-reconstruction loss as regularization for neural network classifiers; a
sketch of the combined objective is below.
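Sketch:
An illustrative combined objective (the names are mine, not the paper's):
supervised prediction loss plus reconstruction of the target from its own
embedding, which acts as the regularizer.

    import numpy as np

    def target_embedding_loss(y_true, y_pred, y_recon, lam=0.5):
        supervised = np.mean((y_pred - y_true) ** 2)       # prediction term
        reconstruction = np.mean((y_recon - y_true) ** 2)  # target autoencoding
        return supervised + lam * reconstruction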
8. Title:
Ridge Regression: Structure, Cross-Validation, and Sketching
Authority:
Stanford, UPenn
Url:
https://iclr.cc/virtual/poster_HklRwaEKwB.html
Reference:
- None
Code:
https://github.com/liusf15/RidgeRegression
Perception:
Heavily mathematical; the intro video is not great at explaining it and is incomplete. A
background sketch of closed-form leave-one-out CV for ridge is below.
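Sketch:
For orientation, the standard closed-form leave-one-out CV identity for ridge
regression; this is classical background in the area the paper analyzes, not
their sketching estimator.

    import numpy as np

    def ridge_loocv_mse(X, y, lam):
        # Hat matrix H = X (X'X + lam I)^{-1} X'; LOO residual_i = e_i / (1 - H_ii).
        H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
        residuals = (y - H @ y) / (1.0 - np.diag(H))
        return np.mean(residuals ** 2)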
9. Title:
Encoding word order in complex embeddings
Authority:
University of Copenhagen
Url:
https://iclr.cc/virtual/poster_Hke-WTVtwr.html
Reference:
- None
Code:
https://github.com/iclr-complex-order/complex-order (TF, :/)
Perception:
Proposes an alternative to positional encoding (e.g., the PE in BERT) that encodes word
order directly in the word embeddings; the construction is sketched below.
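Sketch:
The core construction as I read it: each word carries a per-dimension amplitude r,
frequency w, and initial phase theta, and its embedding at position pos is the
complex vector r * exp(i(w*pos + theta)), so order is a rotation in each dimension.

    import numpy as np

    def complex_word_embedding(r, w, theta, pos):
        # r, w, theta: (dim,) learned vectors for one word; pos: int position.
        return r * np.exp(1j * (w * pos + theta))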
10. Title:
An Exponential Learning Rate Schedule for Deep Learning
Authority:
Sanjeev Arora
Url:
https://iclr.cc/virtual/poster_rJg8TeSFDH.html
Reference:
- Fix your classifier: the marginal value of training the last weight layer
Code:
sad :(
Perception:
Training can be done using SGD with momentum and an exponentially increasing learning
rate schedule; see the one-line sketch below.
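Sketch:
The schedule itself is one line, lr_t = lr_0 * alpha^t with alpha > 1; the
constants below are illustrative, not the paper's.

    def exponential_lr(step, lr0=0.1, alpha=1.0005):
        # Exponentially increasing learning rate for SGD with momentum.
        return lr0 * alpha ** step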
11. Title:
Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks
Authority:
Rice and TAMU
Url:
https://iclr.cc/virtual/poster_BJxsrgStvr.html
Reference:
- Progressive Pruning, Frankle 2019
Code:
https://github.com/RICE-EIC/Early-Bird-Tickets
Perception:
Applied science; proposes a training scheme with good results and faster training.
Early-bird tickets are detected by comparing the Hamming distance between pruning masks
across epochs, sketched below.
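Sketch:
A toy version of the early-bird signal: magnitude-prune to a mask each epoch and
declare an early-bird ticket once the mask's Hamming distance to recent masks
falls below a tolerance. Details (sparsity, tolerance) are illustrative.

    import numpy as np

    def prune_mask(weights, sparsity=0.5):
        # True = weight survives magnitude pruning at the given sparsity.
        flat = np.abs(weights).ravel()
        return flat >= np.quantile(flat, sparsity)

    def mask_hamming(mask_a, mask_b):
        # Fraction of positions where the two pruning masks disagree.
        return np.mean(mask_a != mask_b)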
Workshops:
Url:
https://iclr.cc/virtual/workshops_8.html
Site:
https://sites.google.com/nyu.edu/ml-irl-2020/
Papers:
- Attention-Based Prototypical Learning (Google)
- Getting a CLUE: A Method for Explaining Uncertainty Estimates (MSR)
- Machine Learning for Digital Try-On (UMaryland)
- Identifying Interpretable Word Vector Subspaces with PCA (Harvard)
Misc:
1. [3.5/5] Deep Double Descent: Where Bigger Models and More Data Hurt (OpenAI)
2. [4.5/5] Automatically Discovering and Learning New Visual Categories with Ranking Statistics (VGG, Oxford)
https://github.com/k-han/AutoNovel
3. [5/5] Learning Robust Representations via Multi-View Information Bottleneck (MSR)
https://github.com/mfederici/Multi-View-Information-Bottleneck
4. [4/5] Picking Winning Tickets Before Training by Preserving Gradient Flow (UToronto, Vector Institute)
5. [5/5] Differentiable Reasoning over a Virtual Knowledge Base (CMU)
https://www.cs.cmu.edu/~bdhingra/pages/drkit.html
Andoni et al. 2015, approximate nearest neighbor search
6. [4/5] Neural Machine Translation with Universal Visual Representation (NICT, Japan)
https://github.com/cooelf/UVR-NMT
7. [4/5] Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel (Cognizant AI)
https://github.com/leaf-ai/rio-paper
8. [4/5] The Shape of Data: Intrinsic Distance for Data Distributions (Technion, Israel)
https://github.com/xgfs/imd
9. [4/5] Learning from Rules Generalizing Labeled Exemplars (IIT, Bombay)
Interesting use of Snorkel
https://github.com/awasthiabhijeet/Learning-From-Rules
10. [4.5/5] On the Relationship between Self-Attention and Convolutional Layers (EPFL)
https://github.com/epfml/attention-cnn
11. [5/5] A critical analysis of self-supervision, or what we can learn from a single image (VGG, Oxford)
12. [4/5] Learning Space Partitions for Nearest Neighbor Search (MSR)
https://anonymous.4open.science/r/cdd789a8-818c-4675-98fd-39f8da656129/
13. [3.5/5] Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth (Butterfly Network)
14. [4/5] The Early Phase of Neural Network Training (MIT)
15. [4.5/5] VL-BERT: Pre-training of Generic Visual-Linguistic Representations (MSR)
16. [5/5] Distance-Based Learning from Errors for Confidence Calibration (Google)
https://drive.google.com/drive/folders/1UThGvkkvFvKX8ogsfwvdA3uY8xzDlIuL
17. [4/5] Understanding and Improving Information Transfer in Multi-Task Learning (Stanford)
18. [3.5/5] Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks (KAIST)
19. [4/5] Rethinking the Hyperparameters for Fine-tuning (AWS)
20. [4/5] Gradient $\ell_1$ Regularization for Quantization Robustness (Qualcomm AI)
21. [4/5] Novelty Detection Via Blurring (KAIST)