- Understand the different types of clinical data available from GDC
  - Expected timeframe: 1-2 weeks
  - Deliverable: Google Doc / Markdown doc explaining each type
- Decide which of these clinical data fields we'd like to fetch
  - Expected timeframe: 1 week
  - Deliverable: updated Google Doc / Markdown doc
- Understand the genomic data present in GDC and decide which data types to fetch
  - Expected timeframe: 2 weeks
  - Deliverable: Google Doc / Markdown doc
- Get familiar with CDA
  - Expected timeframe: 2 weeks
  - Deliverable: a small app or Jupyter notebook that demonstrates some of the library's functionality
- Create the cohort builder library
  - Expected timeframe: 2-4 weeks
- Understand the cBioPortal input file format
  - Expected timeframe: 1-2 weeks
- Get familiar, on the side, with other topics that will be useful down the line, e.g. Apache Airflow and molecular biology basics
  - Expected timeframe: ongoing
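
As a starting point for the cBioPortal input format item above, here is a minimal sketch of writing a patient-level clinical data file. It assumes the tab-separated layout described in the cBioPortal file format documentation: four `#`-prefixed header rows (display names, descriptions, datatypes, priorities) followed by a row of attribute IDs and then the data rows. The function name, the attribute list, and the patient records are hypothetical placeholders, not real GDC data, and the accompanying meta file is not shown.

```python
import csv
import io

# Hypothetical attribute definitions; PATIENT_ID is the required first column.
EXAMPLE_ATTRIBUTES = [
    {"id": "PATIENT_ID", "display_name": "Patient Identifier",
     "description": "Identifier of the patient", "datatype": "STRING", "priority": "1"},
    {"id": "AGE", "display_name": "Age at Diagnosis",
     "description": "Age in years at diagnosis", "datatype": "NUMBER", "priority": "1"},
]

# Made-up records for illustration only.
EXAMPLE_ROWS = [
    {"PATIENT_ID": "P-0001", "AGE": 61},
    {"PATIENT_ID": "P-0002", "AGE": 47},
]


def write_clinical_patient_file(attributes, rows, out):
    """Write header rows, the attribute-ID row, and data rows as TSV to `out`."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")

    # The four metadata rows are marked by a '#' in front of the first cell.
    for key in ("display_name", "description", "datatype", "priority"):
        values = [attr[key] for attr in attributes]
        writer.writerow(["#" + values[0]] + values[1:])

    # Fifth row: the attribute IDs that act as column names.
    ids = [attr["id"] for attr in attributes]
    writer.writerow(ids)

    # Data rows; a missing value is written as an empty cell.
    for row in rows:
        writer.writerow([row.get(attr_id, "") for attr_id in ids])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_clinical_patient_file(EXAMPLE_ATTRIBUTES, EXAMPLE_ROWS, buffer)
    print(buffer.getvalue(), end="")
```

Keeping the attribute definitions in one list means the choice of clinical fields made earlier in this plan can feed both the fetch step and the export step without repeating column names.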