@jamesqo
Last active March 27, 2024 15:48

NCI-CRDC Datahub

The Cancer Research Data Commons (CRDC) is an initiative by the National Cancer Institute (NCI) that provides access to multiple cancer data sources from the federal government. Sources include the Genomic Data Commons (GDC), Proteomic Data Commons (PDC), and others.

This directory contains NCI-CRDC studies generated using the ISB-CGC portal. Data is pulled from the ISB-CGC BigQuery tables once every 3 months and reflects the latest data available for each study. More details about methods and data transformations can be found in the README files for each individual study.

Program Overview

TCGA

  • Cancer type mapping: Each study corresponds to one TCGA project. The project's suffix is converted to an OncoTree code, which is then used to name the study.
    • Example: For the TCGA project TCGA-LAML, the suffix LAML is converted to the OncoTree code AML. The resulting cBioPortal study is aml_tcga_gdc.
    • Mapping file
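
The naming rule above can be sketched in a few lines of Python. This is only an illustration: the dictionary is a small excerpt assembled from the study list in this README rather than the actual mapping file, the helper name `tcga_study_id` is hypothetical, and the lowercase-plus-suffix convention is inferred from the example study names.

```python
# Excerpt of the TCGA suffix -> OncoTree code mapping, inferred from the
# study list in this README. The real mapping file is the authoritative source.
TCGA_SUFFIX_TO_ONCOTREE = {
    "LAML": "AML",
    "KIRC": "CCRCC",
    "LGG": "DIFG",
    "OV": "HGSOC",
    "BRCA": "BRCA",
}


def tcga_study_id(project_id: str) -> str:
    """Build a cBioPortal study ID from a TCGA project ID (hypothetical helper)."""
    prefix, _, suffix = project_id.partition("-")
    if prefix != "TCGA" or not suffix:
        raise ValueError(f"not a TCGA project ID: {project_id!r}")
    # Look up the OncoTree code for the project suffix, e.g. LAML -> AML.
    oncotree_code = TCGA_SUFFIX_TO_ONCOTREE[suffix]
    # Assumed convention: lowercase OncoTree code plus the "_tcga_gdc" suffix.
    return f"{oncotree_code.lower()}_tcga_gdc"


print(tcga_study_id("TCGA-LAML"))  # aml_tcga_gdc
```

A plain dictionary keeps the lookup explicit, so a suffix that is missing from the mapping fails with a `KeyError` instead of silently producing a misnamed study.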

List of TCGA cBioPortal Studies

  • acc_tcga_gdc: TCGA-ACC, Adrenocortical Carcinoma
  • aml_tcga_gdc: TCGA-LAML, Acute Myeloid Leukemia
  • blca_tcga_gdc: TCGA-BLCA, Bladder Urothelial Carcinoma
  • brca_tcga_gdc: TCGA-BRCA, Breast Invasive Carcinoma
  • ccrcc_tcga_gdc: TCGA-KIRC, Kidney Renal Clear Cell Carcinoma
  • cesc_tcga_gdc: TCGA-CESC, Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma
  • chol_tcga_gdc: TCGA-CHOL, Cholangiocarcinoma
  • chrcc_tcga_gdc: TCGA-KICH, Kidney Chromophobe
  • coad_tcga_gdc: TCGA-COAD, Colon Adenocarcinoma
  • difg_tcga_gdc: TCGA-LGG, Brain Lower Grade Glioma
  • dlbclnos_tcga_gdc: TCGA-DLBC, Lymphoid Neoplasm Diffuse Large B-cell Lymphoma
  • esca_tcga_gdc: TCGA-ESCA, Esophageal Carcinoma
  • gbm_tcga_gdc: TCGA-GBM, Glioblastoma Multiforme
  • hcc_tcga_gdc: TCGA-LIHC, Liver Hepatocellular Carcinoma
  • hgsoc_tcga_gdc: TCGA-OV, Ovarian Serous Cystadenocarcinoma
  • hnsc_tcga_gdc: TCGA-HNSC, Head and Neck Squamous Cell Carcinoma
  • luad_tcga_gdc: TCGA-LUAD, Lung Adenocarcinoma
  • lusc_tcga_gdc: TCGA-LUSC, Lung Squamous Cell Carcinoma
  • mnet_tcga_gdc: TCGA-PCPG, Pheochromocytoma and Paraganglioma
  • nsgct_tcga_gdc: TCGA-TGCT, Testicular Germ Cell Tumors
  • paad_tcga_gdc: TCGA-PAAD, Pancreatic Adenocarcinoma
  • plmeso_tcga_gdc: TCGA-MESO, Mesothelioma
  • prad_tcga_gdc: TCGA-PRAD, Prostate Adenocarcinoma
  • prcc_tcga_gdc: TCGA-KIRP, Kidney Renal Papillary Cell Carcinoma
  • read_tcga_gdc: TCGA-READ, Rectum Adenocarcinoma
  • skcm_tcga_gdc: TCGA-SKCM, Skin Cutaneous Melanoma
  • soft_tissue_tcga_gdc: TCGA-SARC, Sarcoma
  • stad_tcga_gdc: TCGA-STAD, Stomach Adenocarcinoma
  • thpa_tcga_gdc: TCGA-THCA, Thyroid Carcinoma
  • thym_tcga_gdc: TCGA-THYM, Thymoma
  • ucec_tcga_gdc: TCGA-UCEC, Uterine Corpus Endometrial Carcinoma
  • ucs_tcga_gdc: TCGA-UCS, Uterine Carcinosarcoma
  • um_tcga_gdc: TCGA-UVM, Uveal Melanoma

CPTAC

  • Cancer type mapping: CPTAC comprises the CPTAC-2 and CPTAC-3 projects, both of which encompass multiple cancer types. Each study corresponds to the subset of samples in these projects that share a particular OncoTree code. The code is determined from the disease_type and primary_site values in the BigQuery tables.
    • Example: The luad_cptac study is generated from all samples with primary site Bronchus and lung and disease type Adenomas and Adenocarcinomas.
    • Mapping file
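
As a rough sketch of the rule above, the following Python groups sample records into studies by their (primary_site, disease_type) pair. Everything except the single LUAD rule is an assumption made for illustration: the `sample_id` field, the sample records, and the function name `group_cptac_samples` are hypothetical, and the real mapping file contains more rules than the one shown.

```python
from collections import defaultdict

# One rule taken from the luad_cptac example; the real mapping file has more.
CPTAC_RULES = {
    ("Bronchus and lung", "Adenomas and Adenocarcinomas"): "LUAD",
}


def group_cptac_samples(samples):
    """Group sample IDs by study ID; samples matching no rule are skipped."""
    studies = defaultdict(list)
    for sample in samples:
        key = (sample["primary_site"], sample["disease_type"])
        code = CPTAC_RULES.get(key)
        if code is not None:
            # Study name follows the luad_cptac example: lowercase code + "_cptac".
            studies[f"{code.lower()}_cptac"].append(sample["sample_id"])
    return dict(studies)


# Made-up records for illustration only; "sample_id" is a hypothetical field.
samples = [
    {"sample_id": "S1", "primary_site": "Bronchus and lung",
     "disease_type": "Adenomas and Adenocarcinomas"},
    {"sample_id": "S2", "primary_site": "Kidney",
     "disease_type": "Adenomas and Adenocarcinomas"},
]
print(group_cptac_samples(samples))  # {'luad_cptac': ['S1']}
```

Keying on the pair rather than on either field alone matters here, because the same disease type can occur at several primary sites.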

List of CPTAC cBioPortal Studies

TARGET

  • Cancer type mapping: Each study corresponds to one or more TARGET projects. The name of the study is derived from the OncoTree code defined in the mapping file.
    • Example: The bll_target_gdc study has OncoTree code BLL and is sourced from the GDC projects TARGET-ALL-P1 and TARGET-ALL-P2.
    • Mapping file
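
Because several TARGET projects can feed a single study, the mapping is many-to-one. A minimal sketch of that grouping, using only the two projects named in the example above, might look like this. The dictionary stands in for the real mapping file, and the function name `target_studies` is hypothetical.

```python
from collections import defaultdict

# From the bll_target_gdc example: two GDC projects share one OncoTree code.
TARGET_PROJECT_TO_ONCOTREE = {
    "TARGET-ALL-P1": "BLL",
    "TARGET-ALL-P2": "BLL",
}


def target_studies(project_to_code):
    """Invert a project -> OncoTree code mapping into study ID -> projects."""
    studies = defaultdict(list)
    # Sort so the project list for each study has a stable order.
    for project, code in sorted(project_to_code.items()):
        studies[f"{code.lower()}_target_gdc"].append(project)
    return dict(studies)


print(target_studies(TARGET_PROJECT_TO_ONCOTREE))
# {'bll_target_gdc': ['TARGET-ALL-P1', 'TARGET-ALL-P2']}
```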

List of TARGET cBioPortal Studies
