- Study is updated once every 3 months with latest data from ISB-CGC BigQuery tables
- Reference genome used: hg38
- Only tumor sample data is included (no normal samples)
- Patient data: Retrieved from BigQuery table `isb-cgc-bq.TCGA.clinical_gdc_current`
- Sample data: Retrieved from BigQuery table `isb-cgc-bq.TCGA.biospecimen_gdc_current`
- `DFS_STATUS` and `DFS_MONTHS` are unavailable from BigQuery, so they are pulled from existing TCGA studies in datahub instead.
  - First, the corresponding pancan study is checked. If the patient ID is not found there, the value from the legacy TCGA study is used.
- Transformations
  - `AGE` is clipped to the range 18 to 89.
  - `OS_MONTHS` is converted from `demo__days_to_death` when that value is present. If the patient is still alive, it is converted from `diag__days_to_last_follow_up`.
- Remapped columns: TODO
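
The clinical transformations and the DFS fallback described above can be sketched roughly as follows. This is an illustrative sketch, not the pipeline's actual code: the function names are hypothetical, the days-per-month divisor (30.4375) is an assumption because the exact conversion factor is not stated here, and the pancan and legacy studies are modeled as plain dictionaries keyed by patient ID.

```python
from typing import Any, Dict, Optional

# Assumption: average month length; the divisor the pipeline really uses is not documented here.
DAYS_PER_MONTH = 30.4375


def clip_age(age: Optional[float], low: int = 18, high: int = 89) -> Optional[float]:
    """Clamp AGE into [low, high]; missing values stay missing."""
    if age is None:
        return None
    return max(low, min(high, age))


def os_months(days_to_death: Optional[float],
              days_to_last_follow_up: Optional[float]) -> Optional[float]:
    """Prefer demo__days_to_death; otherwise use diag__days_to_last_follow_up."""
    days = days_to_death if days_to_death is not None else days_to_last_follow_up
    if days is None:
        return None
    return round(days / DAYS_PER_MONTH, 2)


def resolve_dfs(patient_id: str,
                pancan: Dict[str, Any],
                legacy: Dict[str, Any]) -> Optional[Any]:
    """Look up a DFS value in the pancan study first, then in the legacy study."""
    if patient_id in pancan:
        return pancan[patient_id]
    return legacy.get(patient_id)
```

Keeping each rule in its own small function makes it easy to check the rules independently, for example that an age of 95 is reported as 89.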
- Retrieved from BigQuery table `isb-cgc-bq.TCGA.copy_number_gene_level_hg38_gdc_current`
- Transformations
  - Ensembl gene IDs are mapped to Entrez IDs using the Genome Nexus hg38 canonical transcript file.
  - If a sample has multiple aliquots, it is "reduced" to a single aliquot chosen to represent the entire sample.
    - This is done by choosing the aliquot ID with the highest sort value (e.g. the highest plate number). This is the same policy GDAC uses to reduce aliquot data in their studies.
  - Copy number values from the BigQuery tables are converted from ASCAT to GISTIC 2.0.
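
The aliquot reduction step can be illustrated with a minimal sketch. It assumes that "highest sort value" means the lexicographically greatest aliquot barcode within a sample, and that the sample barcode is the first 16 characters of the aliquot barcode (standard TCGA barcode layout). The function name `reduce_aliquots` is hypothetical, and the real GDAC policy may apply additional rules that are not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

SAMPLE_BARCODE_LENGTH = 16  # e.g. "TCGA-02-0001-01C"


def reduce_aliquots(aliquot_barcodes: Iterable[str]) -> Dict[str, str]:
    """Return one representative aliquot barcode per sample barcode."""
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for barcode in aliquot_barcodes:
        by_sample[barcode[:SAMPLE_BARCODE_LENGTH]].append(barcode)
    # Assumption: plain string ordering stands in for the "sort value".
    return {sample: max(barcodes) for sample, barcodes in by_sample.items()}


chosen = reduce_aliquots([
    "TCGA-02-0001-01C-01D-0182-01",
    "TCGA-02-0001-01C-01D-0193-01",
    "TCGA-02-0002-01A-01D-0182-01",
])
```

Here the first sample has two aliquots, and the one with plate `0193` is kept because it sorts after plate `0182`.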
- Retrieved from BigQuery table `isb-cgc-bq.TCGA.copy_number_segment_masked_hg38_gdc_current`
- Remapped columns:

  | Original | cBioPortal |
  |---|---|
  | sample_barcode | ID |
  | chromosome | chrom |
  | start_pos | loc.start |
  | end_pos | loc.end |
  | num_probes | num.mark |
  | segment_mean | seg.mean |
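
A column remapping like the one in the table above amounts to a key rename on each row. The sketch below is illustrative only: `SEG_COLUMN_MAP` and `remap_row` are hypothetical names, and rows are modeled as plain dictionaries rather than whatever structure the pipeline actually uses.

```python
from typing import Any, Dict

# Mapping taken from the table above: BigQuery column -> cBioPortal column.
SEG_COLUMN_MAP = {
    "sample_barcode": "ID",
    "chromosome": "chrom",
    "start_pos": "loc.start",
    "end_pos": "loc.end",
    "num_probes": "num.mark",
    "segment_mean": "seg.mean",
}


def remap_row(row: Dict[str, Any],
              column_map: Dict[str, str] = SEG_COLUMN_MAP) -> Dict[str, Any]:
    """Rename mapped columns; columns not in the map keep their original name."""
    return {column_map.get(name, name): value for name, value in row.items()}
```

The same helper could be reused for other data types by passing a different mapping as `column_map`.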
- Retrieved from BigQuery table `isb-cgc-bq.TCGA.masked_somatic_mutation_hg38_gdc_current`
- Remapped columns:

  | Original | cBioPortal |
  |---|---|
  | sample_barcode_tumor | Tumor_Sample_Barcode |
  | sample_barcode_normal | Matched_Norm_Sample_Barcode |