MantejGill/benchmark_datasets.csv

## benchmark_datasets.csv

          
| Dataset | Type | Description | Link |
| --- | --- | --- | --- |
| BenchMD | Medical Modalities | The BenchMD benchmark consists of 19 real-world medical datasets across 7 medical modalities, including X-ray, CT, MRI, ultrasound, fundus, OCT, and pathology | https://www.rajpurkarlab.hms.harvard.edu/benchmd |
| ImageNet | Image Classification | The ImageNet dataset is a large-scale image classification dataset with over 1.2 million images in 1,000 categories | https://www.image-net.org/ |
| COCO | Object Detection | The COCO dataset is a large-scale object detection, segmentation, and captioning dataset with over 330,000 images and 2.5 million object instances labeled across 80 object categories | https://cocodataset.org/#home |
| GLUE | Natural Language Processing | The GLUE benchmark is a collection of nine natural language understanding tasks, including sentiment analysis, question answering, and textual entailment | https://gluebenchmark.com/ |
| Tencent-MVSE | Video Similarity | The Tencent-MVSE dataset is a large-scale benchmark dataset for multi-modal video similarity evaluation, which includes video frames, Chinese title, ASR text, and several semantic tags | https://tencent-mvse.github.io/ |
| MultiBench | Multimodal Learning | The MultiBench dataset is a large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas | https://github.com/pliang279/MultiBench |
| Penn Machine Learning Benchmarks (PMLB) | Supervised Learning | Penn Machine Learning Benchmarks (PMLB) is a large collection of curated benchmark datasets for evaluating and comparing supervised machine learning algorithms. | https://epistasislab.github.io/pmlb/ |
| MNIST | Image Classification | The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. | https://www.tensorflow.org/datasets/catalog/mnist |
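
The rows above can also be read programmatically. The following is a minimal sketch, assuming the underlying file is tab-separated despite its `.csv` extension; several descriptions contain commas, so splitting on commas would break the columns. The helper names `load_benchmarks` and `by_type` are hypothetical, and the inline sample holds two rows copied from the table in place of the real file.

```python
import csv
import io

# Two rows copied from the table above, joined with tab characters.
# Assumption: the real file uses the same tab-separated layout.
SAMPLE = (
    "Dataset\tType\tDescription\tLink\n"
    "GLUE\tNatural Language Processing\t"
    "The GLUE benchmark is a collection of nine natural language understanding "
    "tasks, including sentiment analysis, question answering, and textual "
    "entailment\thttps://gluebenchmark.com/\n"
    "MNIST\tImage Classification\t"
    "The MNIST database (Modified National Institute of Standards and Technology "
    "database) is a large database of handwritten digits that is commonly used "
    "for training various image processing systems.\t"
    "https://www.tensorflow.org/datasets/catalog/mnist\n"
)


def load_benchmarks(text):
    """Parse tab-separated rows into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter="\t")
    return list(reader)


def by_type(rows, type_name):
    """Return the names of all datasets whose Type column equals type_name."""
    return [row["Dataset"] for row in rows if row["Type"] == type_name]


rows = load_benchmarks(SAMPLE)
print(by_type(rows, "Image Classification"))
```

To read the actual file, the same `csv.DictReader` call with `delimiter="\t"` can be applied to a handle from `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the file has been saved locally.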