ICLR 2020 Favorites - WIP
Calendar: https://iclr.cc/virtual/calendar.html#tab-calendar
Paper search: https://iclr.cc/virtual/papers.html?filter=keywords

Papers:
1. Title:
     Tree-Structured Attention with Hierarchical Accumulation
   Authority:
     Richard Socher
   Url:
     https://iclr.cc/virtual/poster_HJxK5pEYvr.html
   Reference:
     - Pay less attention with lightweight and dynamic convolutions
   Code:
     sad :(
   Perception:
     Combines transformers and tree-LSTMs to incorporate the hierarchical structure of
     language while keeping the cost the same as for transformers.
2. Title:
     Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
   Authority:
     Sanjeev Arora, Ruslan Salakhutdinov
   Url:
     https://iclr.cc/virtual/poster_rkl8sJBYvH.html
   Reference:
     - None
   Code:
     https://github.com/LeoYu/neural-tangent-kernel-UCI
   Perception:
     The authors show that on small datasets (hundreds to a few thousand examples), their
     Neural Tangent Kernel (NTK) method outperforms random forests and finite neural
     networks too, so it can be used out of the box; see the sketch below.
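   Sketch:
     A minimal, hedged sketch of the core idea, not the paper's full convolutional NTK:
     kernel ridge regression with the closed-form NTK of an infinitely wide two-layer
     ReLU network (correct up to constant factors). The synthetic data and the ridge
     parameter are placeholders of mine.

       import numpy as np
       from sklearn.kernel_ridge import KernelRidge

       def two_layer_relu_ntk(X1, X2):
           # Inputs are assumed unit-norm, so u[i, j] = cosine between x1_i and x2_j.
           u = np.clip(X1 @ X2.T, -1.0, 1.0)
           k0 = (np.pi - np.arccos(u)) / np.pi                              # arc-cosine kernel, degree 0
           k1 = (np.sqrt(1.0 - u**2) + u * (np.pi - np.arccos(u))) / np.pi  # degree 1
           return u * k0 + k1                                               # two-layer ReLU NTK

       rng = np.random.default_rng(0)
       X_train, y_train = rng.normal(size=(200, 10)), rng.normal(size=200)  # stand-in data
       X_test = rng.normal(size=(50, 10))
       X_train /= np.linalg.norm(X_train, axis=1, keepdims=True)
       X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)

       model = KernelRidge(alpha=1e-3, kernel="precomputed")
       model.fit(two_layer_relu_ntk(X_train, X_train), y_train)
       preds = model.predict(two_layer_relu_ntk(X_test, X_train))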
3. Title:
     Progressive Learning and Disentanglement of Hierarchical Representations
   Authority:
     None
   Url:
     https://iclr.cc/virtual/poster_SJxpsxrYPS.html
   Reference:
     - VAE: Kingma and Welling 2013
     - MMD: Gretton 2007
     - Generative moment matching networks, 2015
     - Training generative neural networks via MMD optimization
   Code:
     https://github.com/Zhiyuan1991/proVLAE (TF, :/)
   Perception:
     Disentangled representations would be a great thing for creative applications and
     generative modeling. The authors improve upon prior work in the area of variational
     autoencoders.
4. Title:
     Self-labelling via simultaneous clustering and representation learning
   Authority:
     VGG at Oxford
   Url:
     https://iclr.cc/virtual/poster_Hyx-jyBFPr.html
   Reference:
     - Deep clustering for unsupervised learning of visual features
   Code:
     https://github.com/yukimasano/self-label
   Perception:
     Clustering and representation learning can be done simultaneously and meaningfully;
     the pseudo-labels come from a balanced-assignment problem solved with Sinkhorn-Knopp
     iterations, sketched below.
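   Sketch:
     A hedged numpy sketch of the Sinkhorn-Knopp balancing step as I understand the
     paper: turn the model's softmax predictions into pseudo-labels whose clusters are
     equally sized. Variable names and the iteration count are mine.

       import numpy as np

       def balanced_pseudo_labels(probs, n_iters=50):
           # probs: (N, K) softmax outputs of the current model.
           Q = probs.copy()
           N, K = Q.shape
           for _ in range(n_iters):
               Q /= Q.sum(axis=0, keepdims=True); Q /= K  # every cluster gets mass 1/K
               Q /= Q.sum(axis=1, keepdims=True); Q /= N  # every sample gets mass 1/N
           return Q.argmax(axis=1)  # hard pseudo-labels for the next training round

       probs = np.random.default_rng(0).dirichlet(np.ones(10), size=256)
       labels = balanced_pseudo_labels(probs)  # roughly 256/10 samples per cluster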
5. Title:
     Robust training with ensemble consensus
   Authority:
     KAIST
   Url:
     https://iclr.cc/virtual/poster_ryxOUTVYDH.html
   Reference:
     - None
   Code:
     sad :(
   Perception:
     Uses ensembles to differentiate between clean and noisy labels. Both can produce a
     small loss, but the difference is that clean labels generalize while noisy labels
     are merely memorized, so ensemble members agree on the clean ones; see the sketch
     below.
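   Sketch:
     A loose sketch of the consensus idea; the keep fraction and how the ensemble is
     built (e.g., perturbed copies of one network) are my assumptions, not the paper's
     exact recipe.

       import numpy as np

       def consensus_clean_mask(per_model_losses, keep_frac=0.7):
           # per_model_losses: (M, N) per-sample losses under M ensemble members.
           masks = []
           for losses in per_model_losses:
               cutoff = np.quantile(losses, keep_frac)
               masks.append(losses <= cutoff)      # small-loss samples per member
           return np.logical_and.reduce(masks)     # consensus: small loss under every member

       losses = np.abs(np.random.default_rng(0).normal(size=(3, 100)))  # stand-in losses
       clean = consensus_clean_mask(losses)        # train only on these samples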
6. Title:
     Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well
   Authority:
     Apple
   Url:
     https://iclr.cc/virtual/poster_rygFWAEFwS.html
   Reference:
     - None
   Code:
     sad :(
   Perception:
     Applied-science paper; looks weak for a conference such as ICLR, but the result is
     nevertheless good. A lot of optimization work is happening in this direction, so
     look at that too! A rough sketch of the averaging step is below.
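   Sketch:
     No code was released, so here is a hedged PyTorch sketch of the final step as I
     read the paper: after a shared large-batch phase, workers refine independently with
     small batches and their weights are averaged. `worker_models` is hypothetical.

       import copy
       import torch

       def average_weights(worker_models):
           # Average the parameters of several independently refined workers.
           avg = copy.deepcopy(worker_models[0])
           with torch.no_grad():
               for name, param in avg.named_parameters():
                   stacked = torch.stack([dict(m.named_parameters())[name]
                                          for m in worker_models])
                   param.copy_(stacked.mean(dim=0))
           return avg  # note: BatchNorm running stats would need averaging too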
7. Title:
     Target-Embedding Autoencoders for Supervised Representation Learning
   Authority:
     Cambridge
   Url:
     https://iclr.cc/virtual/poster_BygXFkSYDH.html
   Reference:
     - None
   Code:
     sad :(
   Perception:
     Uses a target-reconstruction loss as a regularizer for neural network predictors;
     see the sketch below.
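   Sketch:
     A hedged PyTorch sketch of the idea as I understand it: autoencode the targets,
     predict the target embedding from the inputs, and add the target reconstruction
     error as a regularizer. Modules, dimensions, and the weight lam are mine.

       import torch
       import torch.nn as nn
       import torch.nn.functional as F

       d_x, d_y, d_z = 16, 32, 8              # stand-in dimensions
       feature_net = nn.Linear(d_x, d_z)      # x -> predicted target embedding
       target_encoder = nn.Linear(d_y, d_z)   # y -> target embedding
       target_decoder = nn.Linear(d_z, d_y)   # embedding -> reconstructed y

       def tea_loss(x, y, lam=0.1):
           z_pred = feature_net(x)             # supervised path
           z_tgt = target_encoder(y)           # autoencoding path
           y_rec = target_decoder(z_tgt)
           return F.mse_loss(z_pred, z_tgt) + lam * F.mse_loss(y_rec, y)

       loss = tea_loss(torch.randn(4, d_x), torch.randn(4, d_y))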
8. Title:
     Ridge Regression: Structure, Cross-Validation, and Sketching
   Authority:
     Stanford, UPenn
   Url:
     https://iclr.cc/virtual/poster_HklRwaEKwB.html
   Reference:
     - None
   Code:
     https://github.com/liusf15/RidgeRegression
   Perception:
     Heavily mathematical; the intro video does not explain it well and is incomplete!
9. Title:
     Encoding word order in complex embeddings
   Authority:
     University of Copenhagen
   Url:
     https://iclr.cc/virtual/poster_Hke-WTVtwr.html
   Reference:
     - None
   Code:
     https://github.com/iclr-complex-order/complex-order (TF, :/)
   Perception:
     Finds an alternative to position encoding (e.g., the PE in BERT) so as to encode
     word order directly in the word embeddings: each embedding becomes a complex-valued
     function of position, sketched below.
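   Sketch:
     A hedged numpy sketch of the core construction as I read the paper: each word's
     embedding is amplitude * exp(i * (freq * pos + phase)), with amplitude, frequency,
     and phase learned per word and per dimension. The random values are placeholders.

       import numpy as np

       def complex_word_embedding(amplitude, freq, phase, pos):
           # amplitude, freq, phase: (d,) learned vectors for one word; pos: int position.
           return amplitude * np.exp(1j * (freq * pos + phase))

       rng = np.random.default_rng(0)
       amp, freq, phase = rng.random(8), rng.random(8), rng.random(8)
       e3 = complex_word_embedding(amp, freq, phase, 3)
       e7 = complex_word_embedding(amp, freq, phase, 7)
       # The same word at two positions differs only by a position-dependent rotation:
       assert np.allclose(e7, e3 * np.exp(1j * freq * 4))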
10. Title:
      An Exponential Learning Rate Schedule for Deep Learning
    Authority:
      Sanjeev Arora
    Url:
      https://iclr.cc/virtual/poster_rJg8TeSFDH.html
    Reference:
      - Fix your classifier: the marginal value of training the last weight layer
    Code:
      sad :(
    Perception:
      Training of batch-normalized networks can be done using SGD with momentum and an
      exponentially increasing learning rate schedule; see the sketch below.
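    Sketch:
      A minimal PyTorch sketch of an exponentially increasing schedule; the model, loss,
      and growth factor 1.0005 are placeholders of mine, not values from the paper.

        import torch

        model = torch.nn.Linear(10, 2)                  # stand-in model
        opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
        sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=1.0005)  # gamma > 1: LR grows

        for step in range(1000):
            opt.zero_grad()
            loss = model(torch.randn(32, 10)).pow(2).mean()  # dummy loss
            loss.backward()
            opt.step()
            sched.step()                                # lr_t = 0.1 * 1.0005 ** t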
11. Title:
      Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks
    Authority:
      Rice and TAMU
    Url:
      https://iclr.cc/virtual/poster_BJxsrgStvr.html
    Reference:
      - Progressive Pruning, Frankle 2019
    Code:
      https://github.com/RICE-EIC/Early-Bird-Tickets
    Perception:
      Applied science: proposes a training scheme with good results and faster training.
      Early-bird tickets are detected by comparing the Hamming distance between pruning
      masks from consecutive epochs; see the sketch below.
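    Sketch:
      A hedged numpy sketch of the detection idea from my reading: derive a channel
      pruning mask each epoch (here from BatchNorm scale factors), and declare an
      early-bird ticket once consecutive masks stop changing. Thresholds are mine.

        import numpy as np

        def channel_mask(bn_gammas, prune_frac=0.5):
            # Keep the channels with the largest absolute BatchNorm scale factors.
            cutoff = np.quantile(np.abs(bn_gammas), prune_frac)
            return np.abs(bn_gammas) > cutoff

        def hamming_distance(mask_a, mask_b):
            return np.mean(mask_a != mask_b)            # normalized to [0, 1]

        g_prev, g_now = np.random.rand(64), np.random.rand(64)  # stand-in gammas
        if hamming_distance(channel_mask(g_prev), channel_mask(g_now)) < 0.01:
            print("early-bird ticket found: stop full training and prune")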
Workshops:
  Url:
    https://iclr.cc/virtual/workshops_8.html
  Site:
    https://sites.google.com/nyu.edu/ml-irl-2020/
  Papers:
    - ATTENTION-BASED PROTOTYPICAL LEARNING (Google)
    - GETTING A CLUE: A METHOD FOR EXPLAINING UNCERTAINTY ESTIMATES (MSR)
    - MACHINE LEARNING FOR DIGITAL TRY-ON (UMaryland)
    - IDENTIFYING INTERPRETABLE WORD VECTOR SUBSPACES WITH PCA (Harvard)
Misc:
  1. [3.5/5] Deep Double Descent: Where Bigger Models and More Data Hurt (OpenAI)
  2. [4.5/5] Automatically Discovering and Learning New Visual Categories with Ranking Statistics (VGG, Oxford)
     https://github.com/k-han/AutoNovel
  3. [5/5] Learning Robust Representations via Multi-View Information Bottleneck (MSR)
     https://github.com/mfederici/Multi-View-Information-Bottleneck
  4. [4/5] Picking Winning Tickets Before Training by Preserving Gradient Flow (UToronto, Vector Institute)
  5. [5/5] Differentiable Reasoning over a Virtual Knowledge Base (CMU)
     https://www.cs.cmu.edu/~bdhingra/pages/drkit.html
     Andoni et al. 2015, approximate nearest neighbors
  6. [4/5] Neural Machine Translation with Universal Visual Representation (NICT, Japan)
     https://github.com/cooelf/UVR-NMT
  7. [4/5] Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel (Cognizant AI)
     https://github.com/leaf-ai/rio-paper
  8. [4/5] The Shape of Data: Intrinsic Distance for Data Distributions (Technion, Israel)
     https://github.com/xgfs/imd
  9. [4/5] Learning from Rules Generalizing Labeled Exemplars (IIT Bombay)
     Interesting use of Snorkel
     https://github.com/awasthiabhijeet/Learning-From-Rules
  10. [4.5/5] On the Relationship between Self-Attention and Convolutional Layers (EPFL)
      https://github.com/epfml/attention-cnn
  11. [5/5] A critical analysis of self-supervision, or what we can learn from a single image (VGG, Oxford)
  12. [4/5] Learning Space Partitions for Nearest Neighbor Search (MSR)
      https://anonymous.4open.science/r/cdd789a8-818c-4675-98fd-39f8da656129/
  13. [3.5/5] Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth (Butterfly Network)
  14. [4/5] The Early Phase of Neural Network Training (MIT)
  15. [4.5/5] VL-BERT: Pre-training of Generic Visual-Linguistic Representations (MSR)
  16. [5/5] Distance-Based Learning from Errors for Confidence Calibration (Google)
      https://drive.google.com/drive/folders/1UThGvkkvFvKX8ogsfwvdA3uY8xzDlIuL
  17. [4/5] Understanding and Improving Information Transfer in Multi-Task Learning (Stanford)
  18. [3.5/5] Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks (KAIST)
  19. [4/5] Rethinking the Hyperparameters for Fine-tuning (AWS)
  20. [4/5] Gradient $\ell_1$ Regularization for Quantization Robustness (Qualcomm AI)
  21. [4/5] Novelty Detection Via Blurring (KAIST)