Created March 8, 2023 08:50
Title | Tweets | Citations | Organization | Country | Org Type
Highly accurate protein structure prediction with AlphaFold | | 8783 | DeepMind, Seoul National University | South Korea, UK | industry
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | 383 | 5389 | Microsoft | USA | industry
Learning Transferable Visual Models From Natural Language Supervision | 178 | 3658 | OpenAI | USA | industry
Accurate prediction of protein structures and interactions using a three-track neural network | | 1659 | Harvard University, Lawrence Berkeley National Laboratory, North-West University, Stanford University, UC Berkeley, University of British Columbia, University of Cambridge, University of Graz, University of Texas Southwestern Medical Center, University of Victoria, University of Washington, University of the Free State | Austria, Canada, South Africa, UK, USA |
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions | 69 | 1306 | Inception Institute of AI, Nanjing University, Nanjing University of Science and Technology, SenseTime, University of Hong Kong | China, UAE | academia
Rethinking Semantic Segmentation From a Sequence-to-Sequence Perspective With Transformers | 20 | 1280 | Fudan University, Meta, Tencent, University of Oxford, University of Surrey | China, UK, USA | academia
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? | | 1241 | Black in AI, University of Washington | USA | academia
Masked Autoencoders Are Scalable Vision Learners | 843 | 1234 | Meta | USA | industry
Emerging Properties in Self-Supervised Vision Transformers | 269 | 1219 | INRIA, Meta, Sorbonne University | France, USA | industry
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions | | 1210 | Queensland University of Technology | Australia | academia
nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation | | 1178 | DeepMind, German Cancer Research Center, Heidelberg University Hospital, University of Heidelberg | Germany, UK | academia
Zero-Shot Text-to-Image Generation | 155 | 1177 | OpenAI | USA | industry
TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation | 46 | 998 | East China Normal University, Johns Hopkins University, PAII Inc., Stanford University, University of Electronic Science and Technology of China | China, USA | academia
Barlow Twins: Self-Supervised Learning via Redundancy Reduction | 1076 | 951 | Meta, New York University | USA | industry
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | 13 | 912 | National University of Singapore, YITU Technology | China, Singapore | academia
MLP-Mixer: An all-MLP Architecture for Vision | 671 | 896 | Google | USA | industry
SimCSE: Simple Contrastive Learning of Sentence Embeddings | 85 | 866 | Princeton University, Tsinghua University | China, USA | academia
Coordinate Attention for Efficient Mobile Network Design | 49 | 860 | National University of Singapore | Singapore | academia
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers | 100 | 831 | California Institute of Technology, NVIDIA, Nanjing University, University of Hong Kong | China, USA | academia
BEiT: BERT Pre-Training of Image Transformers | 143 | 785 | Harbin Institute of Technology, Microsoft | China, USA | industry
CvT: Introducing Convolutions to Vision Transformers | | 761 | McGill University, Microsoft | Canada, USA | industry
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | 41 | 759 | Google | USA | industry
Transformers in Vision: A Survey | 158 | 757 | Inception Institute of AI, Mohamed bin Zayed University of Artificial Intelligence, Monash University, University of Central Florida | Australia, UAE, USA | academia
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing | 201 | 737 | Carnegie Mellon University, National University of Singapore | Singapore, USA | academia
EfficientNetV2: Smaller Models and Faster Training | 666 | 730 | Google | USA | industry
Is Space-Time Attention All You Need for Video Understanding? | 84 | 729 | Dartmouth College, Meta | USA | academia, industry
ViViT: A Video Vision Transformer | 66 | 713 | Google | USA | industry
Diffusion Models Beat GANs on Image Synthesis | 566 | 694 | OpenAI | USA | industry
An Empirical Study of Training Self-Supervised Vision Transformers | 76 | 601 | Meta | USA | industry
The Power of Scale for Parameter-Efficient Prompt Tuning | 227 | 594 | Google | USA | industry
SwinIR: Image Restoration Using Swin Transformer | 34 | 578 | ETH Zurich, KU Leuven | Belgium, Switzerland | academia
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | 120 | 576 | Google | USA | industry
Protein complex prediction with AlphaFold-Multimer | | 561 | DeepMind | UK | industry
Bottleneck Transformers for Visual Recognition | 46 | 542 | Google, UC Berkeley | USA | industry
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | 43 | 534 | | |
Alias-Free Generative Adversarial Networks | 77 | 520 | Aalto University, NVIDIA | Finland, USA | industry
Towards Causal Representation Learning | 117 | 504 | CIFAR, ETH Zurich, Google, Max Planck Institute for Intelligent Systems, Mila, University of Montreal | Canada, Germany, Switzerland, USA | academia
Vision Transformers for Dense Prediction | 360 | 486 | Intel | USA | industry
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges | | 480 | DeepMind, Imperial College London, New York University, Qualcomm, Twitter | UK, USA | industry
High-Resolution Image Synthesis with Latent Diffusion Models | 210 | 480 | Ludwig Maximilian University of Munich, Runway, University of Heidelberg | Germany, USA | academia
Segmenter: Transformer for Semantic Segmentation | 59 | 468 | INRIA | France | academia
RepVGG: Making VGG-style ConvNets Great Again | | 467 | Aberystwyth University, Hong Kong University of Science and Technology, Megvii, Tsinghua University | China, UK | industry
Multiscale Vision Transformers | 99 | 452 | Meta, UC Berkeley | USA | industry
CoAtNet: Marrying Convolution and Attention for All Data Sizes | | 442 | Google | USA | industry
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification | 34 | 435 | IBM, MIT | USA | academia, industry
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | | 435 | Kakao Brain, Kakao Enterprise, NAVER | South Korea | industry
Video Swin Transformer | 32 | 415 | Huazhong University of Science and Technology, Microsoft, Tsinghua University, University of Science and Technology of China | China, USA | industry
End-to-End Video Instance Segmentation With Transformers | | 411 | Meituan, University of Adelaide | Australia, China | industry
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery | 602 | 401 | Adobe, Hebrew University of Jerusalem, Tel Aviv University | Israel, USA | academia
Evaluating Large Language Models Trained on Code | 934 | 400 | Anthropic, OpenAI, Zipline | USA | industry
Improved Denoising Diffusion Probabilistic Models | 50 | 397 | OpenAI | USA | industry
VinVL: Revisiting Visual Representations in Vision-Language Models | 3 | 373 | Microsoft, University of Washington | USA | industry
ABCDM: An Attention-based Bidirectional CNN-RNN Deep Model for sentiment analysis | | 361 | Deakin University, Nanyang Technological University, Ngee Ann Polytechnic, University of Shahrekord | Australia, Iran, Singapore | academia
Out-of-Distribution Generalization via Risk Extrapolation (REx) | | 354 | McGill University, Meta, Mila, University of Montreal, University of Toronto, Vector | Canada, USA | academia
UNETR: Transformers for 3D Medical Image Segmentation | 55 | 351 | NVIDIA, Vanderbilt University | USA | industry
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases | | 344 | Meta, École normale supérieure | France, USA | industry
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | 30 | 337 | Salesforce | USA | industry
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models | 1614 | 333 | OpenAI | USA | industry
Perceiver: General Perception with Iterative Attention | | 329 | DeepMind | UK | industry
Scaling Vision Transformers | 237 | 324 | Google | USA | industry
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning | 241 | 314 | INRIA, Meta, New York University | France, USA | academia, industry
Machine learning accelerated computational fluid dynamics | 19 | 312 | Google, Harvard University | USA | industry
“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI | | 310 | Google | USA | industry
Per-Pixel Classification is Not All You Need for Semantic Segmentation | 69 | 309 | Meta, University of Illinois Urbana-Champaign | USA | industry
Finetuned Language Models Are Zero-Shot Learners | 402 | 307 | Google | USA | industry
Multitask Prompted Training Enables Zero-Shot Task Generalization | 640 | 300 | ASUS, BigScience Team, Birla Institute of Technology and Science, Pilani, Booz Allen Hamilton, Brown University, Charles River Analytics, EleutherAI, Hugging Face, Hyperscience, IBM, IMATAG, INRIA, IRISA, Institute for Infocomm Research, King Fahd University of Petroleum and Minerals, NAVER, Nanyang Technological University, New York University, Parity, SAP, SambaNova Systems, Snorkel AI, Stanford University, UC Berkeley, UC San Diego, University of Rome, University of Virginia, VU Amsterdam, Walmart, ZEALS | France, Germany, India, Italy, Japan, Netherlands, Saudi Arabia, Singapore, South Korea, Taiwan, UK, USA | industry
TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up | 77 | 297 | IBM, MIT, UC Santa Barbara, University of Texas at Austin | USA | academia
Scene Text Detection and Recognition: The Deep Learning Era | | 294 | ByteDance, Carnegie Mellon University, Megvii | China, USA | industry
PlenOctrees for Real-time Rendering of Neural Radiance Fields | 71 | 278 | UC Berkeley, University of Southern California | USA | academia
High-Performance Large-Scale Image Recognition Without Normalization | 179 | 275 | DeepMind | UK | industry
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields | 33 | 268 | Google, UC Berkeley | USA | industry
GPT Understands, Too | 87 | 264 | Beijing Academy of Artificial Intelligence, MIT, Recurrent AI, Tsinghua University | China, USA | academia
Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling | | 260 | Microsoft, University of North Carolina at Chapel Hill | USA | industry
SimMIM: A Simple Framework for Masked Image Modeling | 76 | 257 | Microsoft, Tsinghua University, Xi’an Jiaotong University | China, USA | industry
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | 285 | 255 | Columbia University, Cornell University, Google | USA | industry
Restormer: Efficient Transformer for High-Resolution Image Restoration | 52 | 247 | Google, Inception Institute of AI, Mohamed bin Zayed University of Artificial Intelligence, Monash University, UC Merced, Yonsei University | Australia, South Korea, UAE, USA | academia, industry
Understanding adversarial attacks on deep learning based medical image analysis systems | 1 | 246 | Beihang University, Chinese Academy of Sciences, National Institute of Informatics, Shanghai Jiao Tong University, University of Melbourne | Australia, China, Japan | academia
FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search | 1 | 245 | Xiaomi | China | industry
Calibrate Before Use: Improving Few-Shot Performance of Language Models | 90 | 243 | UC Berkeley, UC Irvine, University of Maryland | USA | academia
Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision | 1 | 242 | Microsoft, Peking University | China, USA | industry
IBRNet: Learning Multi-View Image-Based Rendering | 8 | 241 | Cornell University, Google, Princeton University | USA | academia, industry
E(n) Equivariant Graph Neural Networks | 60 | 238 | Bosch, University of Amsterdam | Germany, Netherlands | academia, industry
LoFTR: Detector-Free Local Feature Matching with Transformers | 95 | 238 | SenseTime, Zhejiang University | China | academia
Plant leaf disease classification using EfficientNet deep learning model | | 237 | Iskenderun Technical University, Karabuk University, Kastamonu University | Turkey | academia
How Attentive are Graph Attention Networks? | 56 | 234 | Carnegie Mellon University, Technion | Israel, USA | academia
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking | 15 | 234 | Hefei Comprehensive National Science Center, University of Science and Technology of China | China | academia
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | | 233 | Meta, New York University | USA | academia
Learning to Prompt for Vision-Language Models | 61 | 231 | Nanyang Technological University | Singapore | academia
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | | 231 | Carnegie Mellon University, Google, University of Washington | USA | industry
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | | 229 | DeepMind | UK | industry
How to Train Your Robot with Deep Reinforcement Learning; Lessons We've Learned | 42 | 220 | Google, UC Berkeley, X, The Moonshot Factory | USA | academia, industry
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | 8 | 217 | Google | USA | industry
Model-Contrastive Federated Learning | 1 | 209 | National University of Singapore, UC Berkeley | Singapore, USA | academia
SpeechBrain: A General-Purpose Speech Toolkit | 117 | 208 | Aalto University, Academia Sinica, Avignon Université, HEC Montreal, Indian Institute of Technology Madras, Marche Polytechnic University, McGill University, Mila, NVIDIA, Ohio State University, Samsung, Toulouse Institute of Computer Science Research, Toyota Technological Institute at Chicago, University of Cambridge, University of Edinburgh, University of Montreal, University of Sherbrooke | Canada, Finland, France, India, Italy, South Korea, Taiwan, UK, USA | academia
MagFace: A Universal Representation for Face Recognition and Quality Assessment | 10 | 206 | Aibee | China | industry
Offline Reinforcement Learning as One Big Sequence Modeling Problem | 110 | 200 | UC Berkeley | USA | academia
Unified Pre-training for Program Understanding and Generation | 16 | 200 | Columbia University, UC Los Angeles | USA | academia
Image Super-Resolution via Iterative Refinement | 401 | 198 | Google | USA | industry
FastNeRF: High-Fidelity Neural Rendering at 200FPS | 164 | 194 | Microsoft | USA | industry
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models | 95 | 194 | Technical University of Darmstadt | Germany | academia
Measurement and Fairness | 26 | 191 | Microsoft, University of Michigan | USA | industry
