@MantejGill
Created December 7, 2023 12:51
| Dataset | Type | Description | Link |
| --- | --- | --- | --- |
| BenchMD | Medical Modalities | The BenchMD benchmark consists of 19 real-world medical datasets across 7 medical modalities, including X-ray, CT, MRI, ultrasound, fundus, OCT, and pathology. | https://www.rajpurkarlab.hms.harvard.edu/benchmd |
| ImageNet | Image Classification | ImageNet is a large-scale image classification dataset with over 1.2 million images in 1,000 categories. | https://www.image-net.org/ |
| COCO | Object Detection | COCO is a large-scale object detection, segmentation, and captioning dataset with over 330,000 images and about 1.5 million object instances labeled across 80 object categories. | https://cocodataset.org/#home |
| GLUE | Natural Language Processing | The GLUE benchmark is a collection of nine natural language understanding tasks, including sentiment analysis, question answering, and textual entailment. | https://gluebenchmark.com/ |
| Tencent-MVSE | Video Similarity | Tencent-MVSE is a large-scale benchmark dataset for multi-modal video similarity evaluation. It includes video frames, Chinese titles, ASR text, and several semantic tags. | https://tencent-mvse.github.io/ |
| MultiBench | Multimodal Learning | MultiBench is a large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. | https://github.com/pliang279/MultiBench |
| Penn Machine Learning Benchmarks (PMLB) | Supervised Learning | PMLB is a large collection of curated benchmark datasets for evaluating and comparing supervised machine learning algorithms. | https://epistasislab.github.io/pmlb/ |
| MNIST | Image Classification | The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits commonly used for training image processing systems. | https://www.tensorflow.org/datasets/catalog/mnist |