- denoising diffusion probabilistic models (DDPM)
- self-supervised learning (SSL)
- vision transformers (ViT)
- self-attention
- masked language modeling, cf. BERT
- contrastive instance discrimination
- distillation
- ablation study
- batch normalization
- Dosovitskiy, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv:2010.11929 (2020).
- Brock, et al. "High-performance large-scale image recognition without normalization." arXiv:2102.06171 (2021).
- Caron, et al. "Emerging properties in self-supervised vision transformers." arXiv:2104.14294 (2021).
- Dhariwal and Nichol. "Diffusion models beat GANs on image synthesis." arXiv:2105.05233 (2021).
- Karras, et al. "Alias-free generative adversarial networks." arXiv:2106.12423 (2021).