@jamesqo
Created June 4, 2024 18:34
The following fields were ADDED:
EUR
ECOG_KPS
HISTORY_OF_D_MMR
EAS
AFR
NAM
FRACTION_GENOME_ALTERED
SAS
NUM_NONREF_ASJ_MARKERS
ASJ
The following fields were REMOVED:
EURP
PATIENT_ID
EASP
SEX
NAMP
AFRP
SAMPLE_ID
SASP
CYCLE_THRESHOLD
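The ADDED and REMOVED lists are what a set difference over field names between the old and new data dictionary would produce. Below is a minimal sketch of that comparison. It assumes each version is available as a collection of field names. `diff_field_names` is a hypothetical helper, not the tool that produced this report. It also sorts its output, whereas the lists above are unsorted.

```python
def diff_field_names(old_fields, new_fields):
    """Return (added, removed) field names between two dictionary versions."""
    old_names, new_names = set(old_fields), set(new_fields)
    # Present only in the new version -> ADDED; only in the old -> REMOVED
    added = sorted(new_names - old_names)
    removed = sorted(old_names - new_names)
    return added, removed
```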
The following fields were CHANGED:
CURRENT_AGE_DEID
DESCRIPTIONS: Value changed from:
'---DESCRIPTION: Current age (limit 89). Derived from Birthdate (RMS) ---MISSING DATA: Contact RMS Database Maintainers ---SOURCE:CDM Generated (Via IDB/RMS dates)'
to
' ---DESCRIPTION: Current age (limit 89). Derived from Birthdate (Revenue Management System (RMS)) ---MISSING DATA: Contact Revenue Management System (RMS) Database Maintainers ---SOURCE:Derived from IDB/Revenue Management System (RMS) dates'
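CURRENT_AGE_DEID is described as the current age derived from the birthdate, with a limit of 89. The sketch below assumes the limit is applied by clamping older ages to 89; the description does not say how the cap is represented. `current_age_deid` and its arguments are hypothetical.

```python
from datetime import date

AGE_LIMIT = 89  # cap stated in the CURRENT_AGE_DEID description


def current_age_deid(birthdate, today=None):
    """Whole years elapsed since birthdate, clamped to AGE_LIMIT (assumed)."""
    today = today or date.today()
    # Subtract one year if this year's birthday has not happened yet
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    age = today.year - birthdate.year - (0 if had_birthday else 1)
    return min(age, AGE_LIMIT)
```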
PRIOR_MED_TO_MSK
DESCRIPTIONS: Value changed from:
'---DESCRIPTION: Binary indication if patient recieve anti-cancer medication prior to MSK ---MISSING DATA: Initial consult (IC) notes need to be (1) within a list of acceptable note types, (2) be within 90 days of first MSK dates if patients have multiple primaries. Note types consist of medical oncologists, radiation oncologists, surgery, inpatient services and others, as they are most likely to contain information about external treatments prior to MSK. ---SOURCE:CDM Generated (NLP)'
to
' ---DESCRIPTION: Binary indication if patient recieve anti-cancer medication prior to MSK ---MISSING DATA: Initial consult (IC) notes need to be (1) within a list of acceptable note types, (2) be within 90 days of first MSK dates if patients have multiple primaries. Note types consist of medical oncologists, radiation oncologists, surgery, inpatient services and others, as they are most likely to contain information about external treatments prior to MSK. ---SOURCE:NLP generated from medical oncology notes'
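The MISSING DATA text for PRIOR_MED_TO_MSK lists two conditions an initial consult note must meet. Below is a minimal sketch of that filter. The contents of `ACCEPTABLE_NOTE_TYPES` are placeholders that paraphrase the note types named above; the real list is not given. The sketch also assumes the 90-day window applies on either side of the first MSK date. All names are hypothetical.

```python
from datetime import date

# Placeholder labels paraphrasing the note types named in the description;
# the real list of acceptable note types is not given in this report.
ACCEPTABLE_NOTE_TYPES = {"medical oncology", "radiation oncology", "surgery", "inpatient"}
WINDOW_DAYS = 90  # window stated in the PRIOR_MED_TO_MSK description


def is_eligible_note(note_type: str, note_date: date, first_msk_date: date,
                     multiple_primaries: bool) -> bool:
    """Check an initial consult note against the two stated conditions."""
    # Condition (1): the note type must be on the accepted list
    if note_type not in ACCEPTABLE_NOTE_TYPES:
        return False
    # Condition (2): only enforced for patients with multiple primaries;
    # assumed here to mean within 90 days on either side of the first MSK date
    if multiple_primaries:
        return abs((note_date - first_msk_date).days) <= WINDOW_DAYS
    return True
```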
SMOKING_PREDICTIONS_3_CLASSES
DESCRIPTIONS: Value changed from:
"---DESCRIPTION: Inferred smoking history. Classes: Current/Former, Never, Unknown ---MISSING DATA: The patient might not have had a 'Smoking Status' section in their Clindoc note. Potential factor for older notes, if the formatting of the section has changed; majority are from 2015-->present. ---SOURCE:CDM Generated (NLP)"
to
" ---DESCRIPTION: Inferred smoking history. Classes: Current/Former, Never, Unknown ---MISSING DATA: The patient might not have had a 'Smoking Status' section in their Clindoc note. Potential factor for older notes, if the formatting of the section has changed; majority are from 2015-->present. ---SOURCE:NLP generated from medical oncology notes"
ADMIXTURE_LABEL
DESCRIPTIONS: Value changed from:
"---DESCRIPTION: This is similar to the previous column, except here we don't differentiate between non-Ashkenazi Jewish and Ashkenazi Jewish Europeans. ---MISSING DATA: Ancestry pipeline not run on patient. ---SOURCE:Kanika Arora (Berger Lab)"
to
'Continental-level genetic ancestry label for the patient. The patient is assigned European (EUR), African (AFR), East Asian (EAS), South Asian (SAS) or Native American (NAM) ancestry label if the inferred contribution of that population to their ancestry is at least 80%, otherwise they are labeled as admixed/other (ADM)'
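The new ADMIXTURE_LABEL description states a threshold rule. A continental label is assigned when that population's inferred contribution is at least 80%; otherwise the patient is labeled ADM. The sketch below assumes the contributions are available as fractions between 0 and 1, keyed by population code. Whether the newly added EUR, AFR, EAS, SAS and NAM fields hold these fractions is not stated here.

```python
CONTINENTAL = ("EUR", "AFR", "EAS", "SAS", "NAM")
THRESHOLD = 0.80  # "at least 80%" per the ADMIXTURE_LABEL description


def admixture_label(fractions):
    """Return the population whose inferred fraction is >= 80%, else 'ADM'."""
    for pop in CONTINENTAL:
        if fractions.get(pop, 0.0) >= THRESHOLD:
            return pop
    return "ADM"  # no single population reaches the threshold
```

If the fractions sum to at most 1, no more than one population can reach 0.8, so the loop order does not affect the result.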
ANCESTRY_LABEL
DESCRIPTIONS: Value changed from:
'---DESCRIPTION: This is the main column with assigned ancestry label for the patient. The values in this column can be EUR, ASJ, AFR, EAS, SAS, NAM or ADM which stand for European (excluding Ashkenazi Jewish), Ashkenazi Jewish European, African, East Asian, South Asian, Native American and Admixed/Other ---MISSING DATA: Ancestry pipeline not run on patient. ---SOURCE:Kanika Arora (Berger Lab)'
to
'Final genetic ancestry label assigned to the patient based on ANCESTRY_CONTINENTAL_LABEL and ANCESTRY_ASJ. EUR = European (excluding Ashkenazi Jewish), ASJ = Ashkenazi Jewish European, AFR = African, EAS = East Asian, SAS = South Asian, NAM = Native American, ADM = Admixed/Other'
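The new ANCESTRY_LABEL description says the final label is based on ANCESTRY_CONTINENTAL_LABEL and ANCESTRY_ASJ, but it does not spell out the combination rule. ASJ is defined as Ashkenazi Jewish European. One plausible reading, used in the sketch below, is that the ASJ flag only refines a EUR label. This rule is an assumption, as is encoding ANCESTRY_ASJ as a boolean.

```python
def ancestry_label(continental_label: str, is_asj: bool) -> str:
    """Hypothetical rule: the ASJ flag only refines a European (EUR) label."""
    # ASJ is defined as Ashkenazi Jewish European, so it is applied to EUR only;
    # every other continental label (including ADM) passes through unchanged.
    if continental_label == "EUR" and is_asj:
        return "ASJ"
    return continental_label
```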
HISTORY_OF_PDL1
DESCRIPTIONS: Value changed from:
'---DESCRIPTION: History if patient ever had a pathology specimen annotated as PD-L1 positive ---MISSING DATA: Patient not tested for PD-L1 ---SOURCE:CDM Generated (NLP)'
to
' ---DESCRIPTION: History if patient ever had a pathology specimen annotated as PD-L1 positive ---MISSING DATA: Patient not tested for PD-L1 ---SOURCE:NLP generated from pathology reports'
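HISTORY_OF_PDL1 is a patient-level flag. It is positive if any pathology specimen was ever annotated PD-L1 positive, and missing if the patient was never tested. The sketch below assumes per-specimen results arrive as the strings `"positive"` and `"negative"`, which is a hypothetical encoding.

```python
from typing import Optional, Sequence


def history_of_pdl1(specimen_results: Sequence[str]) -> Optional[bool]:
    """Patient-level PD-L1 history from per-specimen results."""
    if not specimen_results:
        return None  # never tested -> missing, per the MISSING DATA text
    # Positive if any specimen was ever annotated PD-L1 positive
    return any(result == "positive" for result in specimen_results)
```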
INTRA_ABDOMINAL
DISPLAY_NAME: Value changed from:
'Intra Abdominal'
to
'Tumor Site: Intra-Abdominal (NLP)'
DESCRIPTIONS: Value changed from:
'Intra Abdominal'
to
'---DESCRIPTION: History of Intra-Abdominal as a tumor site as indicated in radiology report impression ---MISSING DATA: No CT/PET/MRI radiology report available for patient. Impression section missing from report. ---SOURCE:NLP generated from radiology reports'
GLEASON_HIGHEST_REPORTED
DESCRIPTIONS: Value changed from:
'---DESCRIPTION: Patient level summary of the highest reported Gleason score from pathology reports. NLP used to derive ---MISSING DATA: No reported Gleason Score for patient. Rule-based NLP did not sense the word "gleason" any pathology reports. ---SOURCE:CDM Generated (NLP)'
to
' ---DESCRIPTION: Patient level summary of the highest reported Gleason score from pathology reports. NLP used to derive ---MISSING DATA: No reported Gleason Score for patient. Rule-based NLP did not sense the word "gleason" any pathology reports. ---SOURCE:NLP generated from pathology reports'
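GLEASON_HIGHEST_REPORTED records, per patient, the highest Gleason score found in pathology reports. It is missing when no report mentions one. The sketch below uses a simple regular expression as a stand-in for the rule-based NLP, whose actual patterns are not described here. It only recognizes the `primary + secondary` form, and `gleason_highest_reported` is a hypothetical name.

```python
import re

# Illustrative pattern only: "gleason", up to 20 non-digits, then "<d> + <d>"
GLEASON_RE = re.compile(r"gleason\D{0,20}?(\d)\s*\+\s*(\d)", re.IGNORECASE)


def gleason_highest_reported(reports):
    """Highest Gleason sum across a patient's reports, or None if none found."""
    scores = []
    for text in reports:
        for primary, secondary in GLEASON_RE.findall(text):
            scores.append(int(primary) + int(secondary))
    return max(scores, default=None)
```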
STAGE_HIGHEST_RECORDED
DESCRIPTIONS: Value changed from:
'---DESCRIPTION: Highest recorded stage in tumor registry (ICD-O). Data has been aggregated from the CDM derived stage information, which bins stage in "stage 1-3" and "stage 4". ---MISSING DATA: Stage data not entered by admins in tumor registry ---SOURCE:CDM Generated (Via IDB/RMS dates)'
to
' ---DESCRIPTION: Highest recorded stage in tumor registry (ICD-O). Data has been aggregated from the CDM derived stage information, which bins stage in "stage 1-3" and "stage 4". ---MISSING DATA: Stage data not entered by admins in tumor registry ---SOURCE:Derived from IDB/Revenue Management System (RMS) dates'
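STAGE_HIGHEST_RECORDED keeps the highest stage among registry entries, using the two bins named in its description. The sketch below assumes each entry is already one of the bin labels `"stage 1-3"` or `"stage 4"`; any other value is ignored. The function name is hypothetical.

```python
# Ordering of the two bins named in the STAGE_HIGHEST_RECORDED description
STAGE_ORDER = {"stage 1-3": 0, "stage 4": 1}


def stage_highest_recorded(recorded_stages):
    """Highest binned stage among registry entries; None if nothing was entered."""
    known = [s for s in recorded_stages if s in STAGE_ORDER]
    if not known:
        return None  # stage data not entered in the tumor registry
    return max(known, key=STAGE_ORDER.__getitem__)
```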
ADRENAL_GLANDS
DESCRIPTIONS: Value changed from:
'---DESCRIPTION: History of Adrenal Gland as a tumor site as indicated in radiology report impression ---MISSING DATA: No CT/PET/MRI radiology report available for patient. Impression section missing from report. ---SOURCE:CDM Generated (NLP)'
to
' ---DESCRIPTION: History of Adrenal Gland as a tumor site as indicated in radiology report impression ---MISSING DATA: No CT/PET/MRI radiology report available for patient. Impression section missing from report. ---SOURCE:NLP generated from radiology reports'
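INTRA_ABDOMINAL and ADRENAL_GLANDS follow the same pattern. Each is a history flag derived from radiology report impressions, and it is missing when no CT/PET/MRI report or no impression section is available. The sketch below uses keyword matching as a stand-in for the NLP model, since the real method is not described. The search terms and the function name are hypothetical.

```python
def tumor_site_history(impressions, site_terms):
    """
    impressions: impression-section texts from CT/PET/MRI reports,
    with None (or "") for a report whose impression section is missing.
    site_terms: lower-case keywords for the site, e.g. ["adrenal"].
    Returns True/False, or None when no usable impression exists.
    """
    usable = [text for text in impressions if text]
    if not usable:
        return None  # no report, or impression section missing -> missing data
    return any(term in text.lower() for text in usable for term in site_terms)
```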
MSI_SCORE
DESCRIPTIONS: Value changed from:
'Microsatellite Instability (MSI) score. Source: MPath '
to
'Microsatellite Instability (MSI) score. Source: MPath\r\n'
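Two kinds of change recur in this report. The first is whitespace-only: MSI_SCORE differs only in a trailing space versus a `\r\n` line ending, and many new values gained a leading space. The second is a change confined to one section of the `---DESCRIPTION: ... ---MISSING DATA: ... ---SOURCE:...` layout. In PRIOR_MED_TO_MSK, SMOKING_PREDICTIONS_3_CLASSES, HISTORY_OF_PDL1, GLEASON_HIGHEST_REPORTED, STAGE_HIGHEST_RECORDED and ADRENAL_GLANDS, for example, only the SOURCE section changed. The sketch below separates these cases. All helper names are hypothetical, and the regular expression assumes section labels are upper-case words.

```python
import re

# Each section starts with '---', an upper-case label and a colon
SECTION_RE = re.compile(r"---([A-Z ]+):(.*?)(?=---[A-Z ]+:|$)", re.S)


def parse_description(text):
    """Split a '---DESCRIPTION: ... ---MISSING DATA: ... ---SOURCE:...' string."""
    return {label.strip(): body.strip() for label, body in SECTION_RE.findall(text)}


def changed_sections(old, new):
    """Names of the sections whose (stripped) text differs between two values."""
    a, b = parse_description(old), parse_description(new)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def is_whitespace_only_change(old, new):
    """True when two values differ only in leading/trailing whitespace."""
    return old != new and old.strip() == new.strip()
```

Applied to the MSI_SCORE values above, `is_whitespace_only_change` returns `True`. Applied to an old and new HISTORY_OF_PDL1 description, `changed_sections` returns only `SOURCE`.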