
The following fields were ADDED:
EUR
ECOG_KPS
HISTORY_OF_D_MMR
EAS
AFR
NAM
FRACTION_GENOME_ALTERED
SAS
NUM_NONREF_ASJ_MARKERS
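A list like the one above usually comes from comparing the columns of an old and a new clinical file. The Python sketch below shows one way to do that comparison. It assumes both headers are available as lists of strings. The function name `diff_fields` and the sample headers are hypothetical and are not taken from the gist.

```python
def diff_fields(old_fields, new_fields):
    """Return (added, removed) field names, keeping each list in its original column order."""
    old_set, new_set = set(old_fields), set(new_fields)
    added = [f for f in new_fields if f not in old_set]
    removed = [f for f in old_fields if f not in new_set]
    return added, removed


# Hypothetical headers; the real clinical file headers are not shown in the gist.
old_header = ["SAMPLE_ID", "PATIENT_ID", "CANCER_TYPE"]
new_header = ["SAMPLE_ID", "PATIENT_ID", "CANCER_TYPE", "EUR", "FRACTION_GENOME_ALTERED"]

added, removed = diff_fields(old_header, new_header)
print("The following fields were ADDED:")
for name in added:
    print(name)
```

The list comprehensions keep fields in column order. A plain set difference would return them in arbitrary order.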
@jamesqo
jamesqo / json_snippet.json
Created May 15, 2024 20:13
P-0005078-T03-IM7
"meta-data" : {
"alys2sample_id" : 98172,
"cbx_patient_id" : 5042,
"cbx_sample_id" : 85960,
"date_tumor_sequencing" : "Wed, 11 Nov 2020 14:20:50 GMT",
"dmp_alys_task_id" : 7293,
"dmp_alys_task_name" : "IMPACTv7-CLIN-20200586",
"dmp_patient_id" : "P-0005078",
"dmp_sample_id" : "P-0005078-T03-IM7",
"dmp_sample_so_id" : 88202,
#genome_nexus_version: 1.0.2
#isoform: mskcc
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Consequence Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer t_ref_count t_alt_count n_ref_count n_alt_count HGVSc HGVSp HGVSp_Short Transcript_ID RefSeq Protein_position Codons Exon_Number genomic_location_explanation Annotation_Status
ISG15 9636 GRCh37 1 948846 948847 + 5_prime_UTR_variant 5'UTR INS - A rs3841266 647V_URINARY_TRACT ENST00000379389.4:c.-107dup p.*36* ENST00000379389 NM_005101.3 1/2 SUCCESS
ISG15 9636 GRCh37 1 948846 948847 + 5_prime_UTR_
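The MAF previews open with `#key: value` comment lines (`#genome_nexus_version`, `#isoform`). A tab-separated header row and data rows follow them. The gist's rendering collapses the tabs into spaces. The sketch below is a minimal reader built only on the standard library. It assumes that comment lines do not appear inside the data, and the name `read_maf` is hypothetical.

```python
import csv
import io


def read_maf(handle):
    """Split '#key: value' comment lines into a metadata dict and parse the rest as tab-separated rows."""
    metadata = {}
    data_lines = []
    for line in handle:
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")
            metadata[key.strip()] = value.strip()
        else:
            data_lines.append(line)
    # The first non-comment line is treated as the column header.
    reader = csv.DictReader(data_lines, delimiter="\t")
    return metadata, list(reader)


# A trimmed, hypothetical sample modelled on the preview above.
sample = (
    "#genome_nexus_version: 1.0.2\n"
    "#isoform: mskcc\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tAnnotation_Status\n"
    "ISG15\t1\t948846\tSUCCESS\n"
)
meta, rows = read_maf(io.StringIO(sample))
print(meta, rows[0]["Hugo_Symbol"], rows[0]["Annotation_Status"])
```

When reading a real file, open it with `open(path, newline="")` and pass the file object to `read_maf`.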
[2024-04-29 13:20:12,394] {setup_studies.py:94} INFO - Sample filtering is ON
[2024-04-29 13:20:12,395] {setup_studies.py:96} WARNING - --datatypes or --limit-query-size was provided, resulting clinical sample file may be inaccurate
[2024-04-29 13:20:13,543] {retry.py:351} DEBUG - Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
[2024-04-29 13:20:13,547] {requests.py:192} DEBUG - Making request: POST https://oauth2.googleapis.com/token
[2024-04-29 13:20:13,548] {connectionpool.py:1014} DEBUG - Starting new HTTPS connection (1): oauth2.googleapis.com:443
[2024-04-29 13:20:13,906] {connectionpool.py:473} DEBUG - https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
[2024-04-29 13:20:13,907] {connectionpool.py:1014} DEBUG - Starting new HTTPS connection (1): bigquery.googleapis.com:443
[2024-04-29 13:20:14,850] {connectionpool.py:473} DEBUG - https://bigquery.googleapis.com:443 "POST /upload/bigquery/v2/projects/isb-cgc-cbioportal/jobs?uploadType=multi

NCI-CRDC Datahub

The Cancer Research Data Commons (CRDC) is an initiative by the National Cancer Institute (NCI) that provides access to multiple cancer data sources from the federal government. Sources include the Genomic Data Commons (GDC), Proteomic Data Commons (PDC), and others.

This directory contains NCI-CRDC studies generated using the ISB-CGC portal. Data is pulled from the ISB-CGC BigQuery tables once every 3 months and reflects the latest data available for each study. More details about methods and data transformations can be found in the README files for each individual study.

Program Overview

TCGA

#genome_nexus_version: 1.0.2
#isoform: mskcc
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer t_ref_count t_alt_count n_ref_count n_alt_count Annotation_Status
1 948846 948847 - g.chr1:948846_948847insA 647V_URINARY_TRACT FAILED
1 948846 948847 - g.chr1:948846_948847insA A204_SOFT_TISSUE FAILED
1 948846 948847 - g.chr1:948846_948847insA BICR31_UPPER_AERODIGESTIVE_TRACT FAILED
1 948846 948847 - g.chr1:948846_948847insA CAMA1_BREA
Chromosome Start_Position End_Position Reference_Allele Tumor_Seq_Allele2 Tumor_Sample_Barcode
1 948846 948847 - g.chr1:948846_948847insA 647V_URINARY_TRACT
1 948846 948847 - g.chr1:948846_948847insA A204_SOFT_TISSUE
1 948846 948847 - g.chr1:948846_948847insA BICR31_UPPER_AERODIGESTIVE_TRACT
1 948846 948847 - g.chr1:948846_948847insA CAMA1_BREAST
1 948846 948847 - g.chr1:948846_948847insA CMK_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE
1 948846 948847 - g.chr1:948846_948847insA DAOY_CENTRAL_NERVOUS_SYSTEM
1 948846 948847 - g.chr1:948846_948847insA G361_SKIN
1 948846 948847 - g.chr1:948846_948847insA HCC44_LUNG
1 948846 948847 - g.chr1:948846_948847insA HS840T_UPPER_AERODIGESTIVE_TRACT
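In the six-column preview, `Tumor_Seq_Allele2` holds HGVS genomic notation (`g.chr1:948846_948847insA`) rather than the inserted bases. The same string appears in the annotated rows marked `FAILED`. One plausible reading is that the annotator expects plain allele strings. That reading is an assumption, not something the gist confirms.

The sketch below flags such values and recovers the inserted bases from a simple insertion. The regex rules and both function names are hypothetical. This is not part of the pipeline that produced these files, and whether re-annotation then succeeds is not verified here.

```python
import re

# Assumed rule: a MAF allele is a run of bases, or "-" for the empty side of an indel.
ALLELE_RE = re.compile(r"^(?:[ACGTN]+|-)$")

# Hypothetical pattern covering only simple HGVSg insertions such as "g.chr1:948846_948847insA".
HGVSG_INS_RE = re.compile(r"^g\.(?:chr)?(\w+):(\d+)_(\d+)ins([ACGTN]+)$")


def is_valid_allele(value):
    """True if value looks like a plain allele string rather than HGVS notation."""
    return bool(ALLELE_RE.match(value.upper()))


def allele_from_hgvsg_insertion(value):
    """Return the inserted bases from a simple HGVSg insertion, or None if the pattern does not match."""
    match = HGVSG_INS_RE.match(value)
    return match.group(4) if match else None


# The first row mirrors the preview above; the second is a hypothetical well-formed row.
rows = [
    {"Tumor_Sample_Barcode": "647V_URINARY_TRACT", "Reference_Allele": "-",
     "Tumor_Seq_Allele2": "g.chr1:948846_948847insA"},
    {"Tumor_Sample_Barcode": "EXAMPLE_SAMPLE", "Reference_Allele": "-",
     "Tumor_Seq_Allele2": "A"},
]

bad = [r["Tumor_Sample_Barcode"] for r in rows
       if not is_valid_allele(r["Tumor_Seq_Allele2"])]
for r in rows:
    if r["Tumor_Sample_Barcode"] in bad:
        print(r["Tumor_Sample_Barcode"], "->", allele_from_hgvsg_insertion(r["Tumor_Seq_Allele2"]))
```

The preview rows already use the flanking-base coordinates (948846 to 948847) that MAF uses for insertions. Under that assumption, only the allele column would need rewriting.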
(Pdb) [2024-01-24 16:06:57,882] {writer.py:259} DEBUG - creating new intake connection to unix:///opt/datadog/apm/inject/run/apm.socket with timeout 2
[2024-01-24 16:06:57,882] {writer.py:263} DEBUG - Sending request: PUT v0.4/traces {'Datadog-Meta-Lang': 'python', 'Datadog-Meta-Lang-Version': '3.7.16', 'Datadog-Meta-Lang-Interpreter': 'CPython', 'Datadog-Meta-Tracer-Version': '1.20.5', 'Datadog-Client-Computed-Top-Level': 'yes', 'Content-Type': 'application/msgpack', 'X-Datadog-Trace-Count': '3'}
[2024-01-24 16:06:57,883] {writer.py:271} DEBUG - Got response: 200 OK
[2024-01-24 16:06:57,883] {writer.py:277} DEBUG - sent 2.19KB in 0.00126s to unix:///opt/datadog/apm/inject/run/apm.socket/v0.4/traces
[2024-01-24 16:06:58,671] {runtime_metrics.py:162} DEBUG - Updating constant tags ['lang:python', 'lang_interpreter:CPython', 'lang_version:3.7.16', 'tracer_version:1.20.5', 'service:node', '_dd.injection.mode:host']
[2024-01-24 16:06:58,672] {runtime_metrics.py:150} DEBUG - Writing metric runtime.python.gc.count.
[{
"Acquisition_Method_Type": "Other Acquisition Method"
}, {
"Acquisition_Method_Type": "Blood draw"
}, {
"Acquisition_Method_Type": "Surgical Resection"
}, {
"Acquisition_Method_Type": "Biopsy"
}, {
"Acquisition_Method_Type": "Punch Biopsy"
// Sample ID: P-0100984-T01-IM7
{'aa_change': 'p.R112Pfs*8', 'alt_allele': 'CG', 'cDNA_change': 'c.334dupC', 'chromosome': '9', 'clinical-signed-out': '1', 'comments': None, 'confidence_class': 'AUTO_OK', 'confidence_cv_id': 3, 'cosmic_id': '', 'd_tumor_ad': None, 'd_tumor_dp': None, 'd_tumor_rd': None, 'd_tumor_vfreq': None, 'dbSNP_id': '', 'dmp_sample_mrev_id': 179229, 'dmp_sample_so_id': 178935, 'dmp_variant_id': 530685, 'exon_num': 'exon2', 'gene_id': 'CDKN2A', 'is_hotspot': 0, 'is_reported': 1, 'level': 'LEVEL_4', 'mafreq_1000g': '', 'mrev_comments': '', 'mrev_status_cv_id': 3, 'mrev_status_name': 'MANUL_REVIEW_COMPLETED', 'normal_ad': 0, 'normal_dp': 454, 'normal_vfreq': 0.0, 'occurance_in_normal': '0;0', 'occurance_in_pop': None, 'oncogenic': 'Likely Oncogenic', 'oncokb_interpretation': 'The CDKN2A gene encodes two proteins, p16INK4A and p14ARF, that regulate the cell growth and survival. CDKN2A is altered by mutation and/or deletion in a broad range of solid and hematologic cancers. The CDKN2A R112Pfs