sergicastellasape/top-cited-2021-papers.tsv

## top-cited-2021-papers.tsv

          
            Title
            Tweets
            Citations
            Organization
            Country
            Org Type

            
              Highly accurate protein structure prediction with AlphaFold
              
              8783
              DeepMind, Seoul National University
              South Korea, UK
              industry

            
              Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
              383
              5389
              Microsoft
              USA
              industry

            
              Learning Transferable Visual Models From Natural Language Supervision
              178
              3658
              OpenAI
              USA
              industry

            
              Accurate prediction of protein structures and interactions using a three-track neural network
              
              1659
              Harvard University, Lawrence Berkeley National Laboratory, North-West University, Stanford University, UC Berkeley, University of British Columbia, University of Cambridge, University of Graz, University of Texas Southwestern Medical Center, University of Victoria, University of Washington, University of the Free State
              Austria, Canada, South Africa, UK, USA

            
              Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
              69
              1306
              Inception Institute of AI, Nanjing University, Nanjing University of Science and Technology, SenseTime, University of Hong Kong
              China, UAE
              academia

            
              Rethinking Semantic Segmentation From a Sequence-to-Sequence Perspective With Transformers
              20
              1280
              Fudan University, Meta, Tencent, University of Oxford, University of Surrey
              China, UK, USA
              academia

            
              On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
              
              1241
              Black in AI, University of Washington
              USA
              academia

            
              Masked Autoencoders Are Scalable Vision Learners
              843
              1234
              Meta
              USA
              industry

            
              Emerging Properties in Self-Supervised Vision Transformers
              269
              1219
              INRIA, Meta, Sorbonne University
              France, USA
              industry

            
              Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
              
              1210
              Queensland University of Technology
              Australia
              academia

            
              nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation
              
              1178
              DeepMind, German Cancer Research Center, Heidelberg University Hospital, University of Heidelberg
              Germany, UK
              academia

            
              Zero-Shot Text-to-Image Generation
              155
              1177
              OpenAI
              USA
              industry

            
              TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation
              46
              998
              East China Normal University, Johns Hopkins University, PAII Inc., Stanford University, University of Electronic Science and Technology of China
              China, USA
              academia

            
              Barlow Twins: Self-Supervised Learning via Redundancy Reduction
              1076
              951
              Meta, New York University
              USA
              industry

            
              Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
              13
              912
              National University of Singapore, YITU Technology
              China, Singapore
              academia

            
              MLP-Mixer: An all-MLP Architecture for Vision
              671
              896
              Google
              USA
              industry

            
              SimCSE: Simple Contrastive Learning of Sentence Embeddings
              85
              866
              Princeton University, Tsinghua University
              China, USA
              academia

            
              Coordinate Attention for Efficient Mobile Network Design
              49
              860
              National University of Singapore
              Singapore
              academia

            
              SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
              100
              831
              California Institute of Technology, NVIDIA, Nanjing University, University of Hong Kong
              China, USA
              academia

            
              BEiT: BERT Pre-Training of Image Transformers
              143
              785
              Harbin Institute of Technology, Microsoft
              China, USA
              industry

            
              CvT: Introducing Convolutions to Vision Transformers
              
              761
              McGill University, Microsoft
              Canada, USA
              industry

            
              Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
              41
              759
              Google
              USA
              industry

            
              Transformers in Vision: A Survey
              158
              757
              Inception Institute of AI, Mohamed bin Zayed University of Artificial Intelligence, Monash University, University of Central Florida
              Australia, UAE, USA
              academia

            
              Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
              201
              737
              Carnegie Mellon University, National University of Singapore
              Singapore, USA
              academia

            
              EfficientNetV2: Smaller Models and Faster Training
              666
              730
              Google
              USA
              industry

            
              Is Space-Time Attention All You Need for Video Understanding?
              84
              729
              Dartmouth College, Meta
              USA
              academia, industry

            
              ViViT: A Video Vision Transformer
              66
              713
              Google
              USA
              industry

            
              Diffusion Models Beat GANs on Image Synthesis
              566
              694
              OpenAI
              USA
              industry

            
              An Empirical Study of Training Self-Supervised Vision Transformers
              76
              601
              Meta
              USA
              industry

            
              The Power of Scale for Parameter-Efficient Prompt Tuning
              227
              594
              Google
              USA
              industry

            
              SwinIR: Image Restoration Using Swin Transformer
              34
              578
              ETH Zurich, KU Leuven
              Belgium, Switzerland
              academia

            
              Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
              120
              576
              Google
              USA
              industry

            
              Protein complex prediction with AlphaFold-Multimer
              
              561
              DeepMind
              UK
              industry

            
              Bottleneck Transformers for Visual Recognition
              46
              542
              Google, UC Berkeley
              USA
              industry

            
              HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
              43
              534

            
              Alias-Free Generative Adversarial Networks
              77
              520
              Aalto University, NVIDIA
              Finland, USA
              industry

            
              Towards Causal Representation Learning
              117
              504
              CIFAR, ETH Zurich, Google, Max Planck Institute for Intelligent Systems, Mila, University of Montreal
              Canada, Germany, Switzerland, USA
              academia

            
              Vision Transformers for Dense Prediction
              360
              486
              Intel
              USA
              industry

            
              Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
              
              480
              DeepMind, Imperial College London, New York University, Qualcomm, Twitter
              UK, USA
              industry

            
              High-Resolution Image Synthesis with Latent Diffusion Models
              210
              480
              Ludwig Maximilian University of Munich, Runway, University of Heidelberg
              Germany, USA
              academia

            
              Segmenter: Transformer for Semantic Segmentation
              59
              468
              INRIA
              France
              academia

            
              RepVGG: Making VGG-style ConvNets Great Again
              
              467
              Aberystwyth University, Hong Kong University of Science and Technology, Megvii, Tsinghua University
              China, UK
              industry

            
              Multiscale Vision Transformers
              99
              452
              Meta, UC Berkeley
              USA
              industry

            
              CoAtNet: Marrying Convolution and Attention for All Data Sizes
              
              442
              Google
              USA
              industry

            
              CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
              34
              435
              IBM, MIT
              USA
              academia, industry

            
              ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
              
              435
              Kakao Brain, Kakao Enterprise, NAVER
              South Korea
              industry

            
              Video Swin Transformer
              32
              415
              Huazhong University of Science and Technology, Microsoft, Tsinghua University, University of Science and Technology of China
              China, USA
              industry

            
              End-to-End Video Instance Segmentation With Transformers
              
              411
              Meituan, University of Adelaide
              Australia, China
              industry

            
              StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
              602
              401
              Adobe, Hebrew University of Jerusalem, Tel Aviv University
              Israel, USA
              academia

            
              Evaluating Large Language Models Trained on Code
              934
              400
              Anthropic, OpenAI, Zipeline
              USA
              industry

            
              Improved Denoising Diffusion Probabilistic Models
              50
              397
              OpenAI
              USA
              industry

            
              VinVL: Revisiting Visual Representations in Vision-Language Models
              3
              373
              Microsoft, University of Washington
              USA
              industry

            
              ABCDM: An Attention-based Bidirectional CNN-RNN Deep Model for sentiment analysis
              
              361
              Deakin University, Nanyang Technological University, Ngee Ann Polytechnic, University of Shahrekord
              Australia, Iran, Singapore
              academia

            
              Out-of-Distribution Generalization via Risk Extrapolation (REx)
              
              354
              McGill University, Meta, Mila, University of Montreal, University of Toronto, Vector
              Canada, USA
              academia

            
              UNETR: Transformers for 3D Medical Image Segmentation
              55
              351
              NVIDIA, Vanderbilt University
              USA
              industry

            
              ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
              
              344
              Meta, École normale supérieure
              France, USA
              industry

            
              Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
              30
              337
              Salesforce
              USA
              industry

            
              GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
              1614
              333
              OpenAI
              USA
              industry

            
              Perceiver: General Perception with Iterative Attention
              
              329
              OpenAI
              USA
              industry

            
              Scaling Vision Transformers
              237
              324
              Google
              USA
              industry

            
              VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
              241
              314
              INRIA, Meta, New York University
              France, USA
              academia, industry

            
              Machine learning accelerated computational fluid dynamics
              19
              312
              Google, Harvard University
              USA
              industry

            
              “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI
              
              310
              Google
              USA
              industry

            
              Per-Pixel Classification is Not All You Need for Semantic Segmentation
              69
              309
              Meta, University of Illinois Urbana-Champaign
              USA
              industry

            
              Finetuned Language Models Are Zero-Shot Learners
              402
              307
              Google
              USA
              industry

            
              Multitask Prompted Training Enables Zero-Shot Task Generalization
              640
              300
              ASUS, BigScience Team, Birla Institute of Technology and Science, Pilani, Booz Allen Hamilton, Brown University, Charles River Analytics, EleutherAI, Hugging Face, Hyperscience, IBM, IMATAG, INRIA, IRISA, Institute for Infocomm Research, King Fahd University of Petroleum and Minerals, NAVER, Nanyang Technological University, New York University, Parity, SAP, SambaNova Systems, Snorkel AI, Stanford University, UC Berkeley, UC San Diego, University of Rome, University of Virginia, VU Amsterdam, Walmart, ZEALS
              France, Germany, India, Italy, Japan, Netherlands, Saudi Arabia, Singapore, South Korea, Taiwan, UK, USA
              industry

            
              TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up
              77
              297
              IBM, MIT, UC Santa Barbara, University of Texas at Austin
              USA
              academia

            
              Scene Text Detection and Recognition: The Deep Learning Era.
              
              294
              ByteDance, Carnegie Mellon University, Megvii
              China, USA
              industry

            
              PlenOctrees for Real-time Rendering of Neural Radiance Fields
              71
              278
              UC Berkeley, University of Southern California
              USA
              academia

            
              High-Performance Large-Scale Image Recognition Without Normalization
              179
              275
              DeepMind
              UK
              industry

            
              Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
              33
              268
              Google, UC Berkeley
              USA
              industry

            
              GPT Understands, Too
              87
              264
              Beijing Academy of Artificial Intelligence, MIT, Recurrent AI, Tsinghua University
              China, USA
              academia

            
              Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
              
              260
              Microsoft, University of North Carolina at Chapel Hill
              USA
              industry

            
              SimMIM: A Simple Framework for Masked Image Modeling
              76
              257
              Microsoft, Tsinghua University, Xi’an Jiaotong University
              China, USA
              industry

            
              VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
              285
              255
              Columbia University, Cornell University, Google
              USA
              industry

            
              Restormer: Efficient Transformer for High-Resolution Image Restoration
              52
              247
              Google, Inception Institute of AI, Mohamed bin Zayed University of Artificial Intelligence, Monash University, UC Merced, Yonsei University
              Australia, South Korea, UAE, USA
              academia, industry

            
              Understanding adversarial attacks on deep learning based medical image analysis systems.
              1
              246
              Beihang University, Chinese Academy of Sciences, National Institute of Informatics, Shanghai Jiao Tong University, University of Melbourne
              Australia, China, Japan
              academia

            
              FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search
              1
              245
              Xiaomi
              China
              industry

            
              Calibrate Before Use: Improving Few-Shot Performance of Language Models
              90
              243
              UC Berkeley, UC Irvine, University of Maryland
              USA
              academia

            
              Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
              1
              242
              Microsoft, Peking University
              China, USA
              industry

            
              IBRNet: Learning Multi-View Image-Based Rendering
              8
              241
              Cornell University, Google, Princeton University
              USA
              academia, industry

            
              E(n) Equivariant Graph Neural Networks
              60
              238
              Bosch, University of Amsterdam
              Germany, Netherlands
              academia, industry

            
              LoFTR: Detector-Free Local Feature Matching with Transformers
              95
              238
              SenseTime, Zhejiang University
              China
              academia

            
              Plant leaf disease classification using EfficientNet deep learning model
              
              237
              Iskenderun Technical University, Karabuk University, Kastamonu University
              Turkey
              academia

            
              How Attentive are Graph Attention Networks?
              56
              234
              Carnegie Mellon University, Technion
              Israel, USA
              academia

            
              Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
              15
              234
              Hefei Comprehensive National Science Center, University of Science and Technology of China
              China
              academia

            
              MDETR - Modulated Detection for End-to-End Multi-Modal Understanding
              
              233
              Meta, New York University
              USA
              academia

            
              Learning to Prompt for Vision-Language Models
              61
              231
              Nanyang Technological University
              Singapore
              academia

            
              SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
              
              231
              Carnegie Mellon University, Google, University of Washington
              USA
              industry

            
              Scaling Language Models: Methods, Analysis & Insights from Training Gopher
              
              229
              DeepMind
              UK
              industry

            
              How to Train Your Robot with Deep Reinforcement Learning; Lessons We've Learned
              42
              220
              Google, UC Berkeley, X, The Moonshot Factory
              USA
              academia, industry

            
              Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
              8
              217
              Google
              USA
              industry

            
              Model-Contrastive Federated Learning
              1
              209
              National University of Singapore, UC Berkeley
              Singapore, USA
              academia

            
              SpeechBrain: A General-Purpose Speech Toolkit
              117
              208
              Aalto University, Academia Sinica, Avignon Université, HEC Montreal, Indian Institute of Technology Madras, Marche Polytechnic University, McGill University, Mila, NVIDIA, Ohio State University, Samsung, Toulouse Institute of Computer Science Research, Toyota Technological Institute at Chicago, University of Cambridge, University of Edinburgh, University of Montreal, University of Sherbrooke
              Canada, Finland, France, India, Italy, South Korea, Taiwan, UK, USA
              academia

            
              MagFace: A Universal Representation for Face Recognition and Quality Assessment
              10
              206
              Aibee
              China
              industry

            
              Offline Reinforcement Learning as One Big Sequence Modeling Problem
              110
              200
              UC Berkeley
              USA
              academia

            
              Unified Pre-training for Program Understanding and Generation
              16
              200
              Columbia University, UC Los Angeles
              USA
              academia

            
              Image Super-Resolution via Iterative Refinement
              401
              198
              Google
              USA
              industry

            
              FastNeRF: High-Fidelity Neural Rendering at 200FPS
              164
              194
              Microsoft
              USA
              industry

            
              BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
              95
              194
              Technical University of Darmstadt
              Germany
              academia

            
              Measurement and Fairness.
              26
              191
              Microsoft, University of Michigan
              USA
              industry
Title	Tweets	Citations	Organization	Country	Org Type
Highly accurate protein structure prediction with AlphaFold		8783	DeepMind, Seoul National University	South Korea, UK	industry
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows	383	5389	Microsoft	USA	industry
Learning Transferable Visual Models From Natural Language Supervision	178	3658	OpenAI	USA	industry
Accurate prediction of protein structures and interactions using a three-track neural network		1659	Harvard University, Lawrence Berkeley National Laboratory, North-West University, Stanford University, UC Berkeley, University of British Columbia, University of Cambridge, University of Graz, University of Texas Southwestern Medical Center, University of Victoria, University of Washington, University of the Free State	Austria, Canada, South Africa, UK, USA
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions	69	1306	Inception Institute of AI, Nanjing University, Nanjing University of Science and Technology, SenseTime, University of Hong Kong	China, UAE	academia
Rethinking Semantic Segmentation From a Sequence-to-Sequence Perspective With Transformers	20	1280	Fudan University, Meta, Tencent, University of Oxford, University of Surrey	China, UK, USA	academia
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?		1241	Black in AI, University of Washington	USA	academia
Masked Autoencoders Are Scalable Vision Learners	843	1234	Meta	USA	industry
Emerging Properties in Self-Supervised Vision Transformers	269	1219	INRIA, Meta, Sorbonne University	France, USA	industry
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions		1210	Queensland University of Technology	Australia	academia
nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation		1178	DeepMind, German Cancer Research Center, Heidelberg University Hospital, University of Heidelberg	Germany, UK	academia
Zero-Shot Text-to-Image Generation	155	1177	OpenAI	USA	industry
TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation	46	998	East China Normal University, Johns Hopkins University, PAII Inc., Stanford University, University of Electronic Science and Technology of China	China, USA	academia
Barlow Twins: Self-Supervised Learning via Redundancy Reduction	1076	951	Meta, New York University	USA	industry
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet	13	912	National University of Singapore, YITU Technology	China, Singapore	academia
MLP-Mixer: An all-MLP Architecture for Vision	671	896	Google	USA	industry
SimCSE: Simple Contrastive Learning of Sentence Embeddings	85	866	Princeton University, Tsinghua University	China, USA	academia
Coordinate Attention for Efficient Mobile Network Design	49	860	National University of Singapore	Singapore	academia
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers	100	831	California Institute of Technology, NVIDIA, Nanjing University, University of Hong Kong	China, USA	academia
BEiT: BERT Pre-Training of Image Transformers	143	785	Harbin Institute of Technology, Microsoft	China, USA	industry
CvT: Introducing Convolutions to Vision Transformers		761	McGill University, Microsoft	Canada, USA	industry
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision	41	759	Google	USA	industry
Transformers in Vision: A Survey	158	757	Inception Institute of AI, Mohamed bin Zayed University of Artificial Intelligence, Monash University, University of Central Florida	Australia, UAE, USA	academia
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing	201	737	Carnegie Mellon University, National University of Singapore	Singapore, USA	academia
EfficientNetV2: Smaller Models and Faster Training	666	730	Google	USA	industry
Is Space-Time Attention All You Need for Video Understanding?	84	729	Dartmouth College, Meta	USA	academia, industry
ViViT: A Video Vision Transformer	66	713	Google	USA	industry
Diffusion Models Beat GANs on Image Synthesis	566	694	OpenAI	USA	industry
An Empirical Study of Training Self-Supervised Vision Transformers	76	601	Meta	USA	industry
The Power of Scale for Parameter-Efficient Prompt Tuning	227	594	Google	USA	industry
SwinIR: Image Restoration Using Swin Transformer	34	578	ETH Zurich, KU Leuven	Belgium, Switzerland	academia
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity	120	576	Google	USA	industry
Protein complex prediction with AlphaFold-Multimer		561	DeepMind	UK	industry
Bottleneck Transformers for Visual Recognition	46	542	Google, UC Berkeley	USA	industry
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units	43	534
Alias-Free Generative Adversarial Networks	77	520	Aalto University, NVIDIA	Finland, USA	industry
Towards Causal Representation Learning	117	504	CIFAR, ETH Zurich, Google, Max Planck Institute for Intelligent Systems, Mila, University of Montreal	Canada, Germany, Switzerland, USA	academia
Vision Transformers for Dense Prediction	360	486	Intel	USA	industry
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges		480	DeepMind, Imperial College London, New York University, Qualcomm, Twitter	UK, USA	industry
High-Resolution Image Synthesis with Latent Diffusion Models	210	480	Ludwig Maximilian University of Munich, Runway, University of Heidelberg	Germany, USA	academia
Segmenter: Transformer for Semantic Segmentation	59	468	INRIA	France	academia
RepVGG: Making VGG-style ConvNets Great Again		467	Aberystwyth University, Hong Kong University of Science and Technology, Megvii, Tsinghua University	China, UK	industry
Multiscale Vision Transformers	99	452	Meta, UC Berkeley	USA	industry
CoAtNet: Marrying Convolution and Attention for All Data Sizes		442	Google	USA	industry
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification	34	435	IBM, MIT	USA	academia, industry
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision		435	Kakao Brain, Kakao Enterprise, NAVER	South Korea	industry
Video Swin Transformer	32	415	Huazhong University of Science and Technology, Microsoft, Tsinghua University, University of Science and Technology of China	China, USA	industry
End-to-End Video Instance Segmentation With Transformers		411	Meituan, University of Adelaide	Australia, China	industry
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery	602	401	Adobe, Hebrew University of Jerusalem, Tel Aviv University	Israel, USA	academia
Evaluating Large Language Models Trained on Code	934	400	Anthropic, OpenAI, Zipeline	USA	industry
Improved Denoising Diffusion Probabilistic Models	50	397	OpenAI	USA	industry
VinVL: Revisiting Visual Representations in Vision-Language Models	3	373	Microsoft, University of Washington	USA	industry
ABCDM: An Attention-based Bidirectional CNN-RNN Deep Model for sentiment analysis		361	Deakin University, Nanyang Technological University, Ngee Ann Polytechnic, University of Shahrekord	Australia, Iran, Singapore	academia
Out-of-Distribution Generalization via Risk Extrapolation (REx)		354	McGill University, Meta, Mila, University of Montreal, University of Toronto, Vector	Canada, USA	academia
UNETR: Transformers for 3D Medical Image Segmentation	55	351	NVIDIA, Vanderbilt University	USA	industry
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases		344	Meta, École normale supérieure	France, USA	industry
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation	30	337	Salesforce	USA	industry
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models	1614	333	OpenAI	USA	industry
Perceiver: General Perception with Iterative Attention		329	DeepMind	UK	industry
Scaling Vision Transformers	237	324	Google	USA	industry
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning	241	314	INRIA, Meta, New York University	France, USA	academia, industry
Machine learning accelerated computational fluid dynamics	19	312	Google, Harvard University	USA	industry
“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI		310	Google	USA	industry
Per-Pixel Classification is Not All You Need for Semantic Segmentation	69	309	Meta, University of Illinois Urbana-Champaign	USA	industry
Finetuned Language Models Are Zero-Shot Learners	402	307	Google	USA	industry
Multitask Prompted Training Enables Zero-Shot Task Generalization	640	300	ASUS, BigScience Team, Birla Institute of Technology and Science, Pilani, Booz Allen Hamilton, Brown University, Charles River Analytics, EleutherAI, Hugging Face, Hyperscience, IBM, IMATAG, INRIA, IRISA, Institute for Infocomm Research, King Fahd University of Petroleum and Minerals, NAVER, Nanyang Technological University, New York University, Parity, SAP, SambaNova Systems, Snorkel AI, Stanford University, UC Berkeley, UC San Diego, University of Rome, University of Virginia, VU Amsterdam, Walmart, ZEALS	France, Germany, India, Italy, Japan, Netherlands, Saudi Arabia, Singapore, South Korea, Taiwan, UK, USA	industry
TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up	77	297	IBM, MIT, UC Santa Barbara, University of Texas at Austin	USA	academia
Scene Text Detection and Recognition: The Deep Learning Era		294	ByteDance, Carnegie Mellon University, Megvii	China, USA	industry
PlenOctrees for Real-time Rendering of Neural Radiance Fields	71	278	UC Berkeley, University of Southern California	USA	academia
High-Performance Large-Scale Image Recognition Without Normalization	179	275	DeepMind	UK	industry
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields	33	268	Google, UC Berkeley	USA	industry
GPT Understands, Too	87	264	Beijing Academy of Artificial Intelligence, MIT, Recurrent AI, Tsinghua University	China, USA	academia
Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling		260	Microsoft, University of North Carolina at Chapel Hill	USA	industry
SimMIM: A Simple Framework for Masked Image Modeling	76	257	Microsoft, Tsinghua University, Xi’an Jiaotong University	China, USA	industry
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text	285	255	Columbia University, Cornell University, Google	USA	industry
Restormer: Efficient Transformer for High-Resolution Image Restoration	52	247	Google, Inception Institute of AI, Mohamed bin Zayed University of Artificial Intelligence, Monash University, UC Merced, Yonsei University	Australia, South Korea, UAE, USA	academia, industry
Understanding adversarial attacks on deep learning based medical image analysis systems	1	246	Beihang University, Chinese Academy of Sciences, National Institute of Informatics, Shanghai Jiao Tong University, University of Melbourne	Australia, China, Japan	academia
FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search	1	245	Xiaomi	China	industry
Calibrate Before Use: Improving Few-Shot Performance of Language Models	90	243	UC Berkeley, UC Irvine, University of Maryland	USA	academia
Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision	1	242	Microsoft, Peking University	China, USA	industry
IBRNet: Learning Multi-View Image-Based Rendering	8	241	Cornell University, Google, Princeton University	USA	academia, industry
E(n) Equivariant Graph Neural Networks	60	238	Bosch, University of Amsterdam	Germany, Netherlands	academia, industry
LoFTR: Detector-Free Local Feature Matching with Transformers	95	238	SenseTime, Zhejiang University	China	academia
Plant leaf disease classification using EfficientNet deep learning model		237	Iskenderun Technical University, Karabuk University, Kastamonu University	Turkey	academia
How Attentive are Graph Attention Networks?	56	234	Carnegie Mellon University, Technion	Israel, USA	academia
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking	15	234	Hefei Comprehensive National Science Center, University of Science and Technology of China	China	academia
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding		233	Meta, New York University	USA	academia
Learning to Prompt for Vision-Language Models	61	231	Nanyang Technological University	Singapore	academia
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision		231	Carnegie Mellon University, Google, University of Washington	USA	industry
Scaling Language Models: Methods, Analysis & Insights from Training Gopher		229	DeepMind	UK	industry
How to Train Your Robot with Deep Reinforcement Learning; Lessons We've Learned	42	220	Google, UC Berkeley, X, The Moonshot Factory	USA	academia, industry
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts	8	217	Google	USA	industry
Model-Contrastive Federated Learning	1	209	National University of Singapore, UC Berkeley	Singapore, USA	academia
SpeechBrain: A General-Purpose Speech Toolkit	117	208	Aalto University, Academia Sinica, Avignon Université, HEC Montreal, Indian Institute of Technology Madras, Marche Polytechnic University, McGill University, Mila, NVIDIA, Ohio State University, Samsung, Toulouse Institute of Computer Science Research, Toyota Technological Institute at Chicago, University of Cambridge, University of Edinburgh, University of Montreal, University of Sherbrooke	Canada, Finland, France, India, Italy, South Korea, Taiwan, UK, USA	academia
MagFace: A Universal Representation for Face Recognition and Quality Assessment	10	206	Aibee	China	industry
Offline Reinforcement Learning as One Big Sequence Modeling Problem	110	200	UC Berkeley	USA	academia
Unified Pre-training for Program Understanding and Generation	16	200	Columbia University, UC Los Angeles	USA	academia
Image Super-Resolution via Iterative Refinement	401	198	Google	USA	industry
FastNeRF: High-Fidelity Neural Rendering at 200FPS	164	194	Microsoft	USA	industry
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models	95	194	Technical University of Darmstadt	Germany	academia
Measurement and Fairness	26	191	Microsoft, University of Michigan	USA	industry