@sbamin
Last active April 6, 2020 20:30
How to fetch archival TCGA data from Firehose using the Firehose API

Firehose API

Fetch archival TCGA data from Broad Firehose using the Firehose API: http://firebrowse.org/api-docs/#!/Archives/StandardData

Standard Data

Parse JSON

mkdir -p lowpass

## requires the http CLI (HTTPie), https://httpie.org
http -b -o firehose_api_get_lowpass_pancancer.json "http://firebrowse.org/api/v1/Archives/StandardData?format=json&date=2016_01_28&data_type=LowPass&page=1&page_size=2500&sort_by=cohort"
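
The query string in the request above carries the run date, the data type, and the paging and sorting options. As a minimal sketch, the helper below assembles the same URL from a date and a data type so the request can be repeated for other values. The function name firehose_standarddata_url is hypothetical and not part of the Firehose API; paging and sorting are kept exactly as in the command above.

```shell
# Hypothetical helper: build the StandardData request URL used above.
#   $1 = run date, e.g. 2016_01_28
#   $2 = data type, e.g. LowPass
firehose_standarddata_url() {
  printf 'http://firebrowse.org/api/v1/Archives/StandardData?format=json&date=%s&data_type=%s&page=1&page_size=2500&sort_by=cohort\n' "$1" "$2"
}

# Usage (left commented out because it needs network access and HTTPie):
#   http -b -o firehose_api_get_lowpass_pancancer.json "$(firehose_standarddata_url 2016_01_28 LowPass)"
```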

## requires the jq binary, https://stedolan.github.io/jq/download/
cat firehose_api_get_lowpass_pancancer.json | jq '.StandardData[] .urls[0]'

mkdir -p cnsegs

echo "LowPass Stats"
cat firehose_api_get_lowpass_pancancer.json | jq -r '.StandardData[] | "\(.cohort)\t\(.active)\t\(.data_type)\t\(.level)\t\(.platform)\t\(.protocol)\t\(.tool)\t\(.urls[0])"' | tee -a stats_firehose_api_get_lowpass_pancancer.tsv

Download Data

cat firehose_api_get_lowpass_pancancer.json | jq -r '.StandardData[] .urls[0]' | parallel -j4 wget --directory-prefix=cnsegs {}

Analyses: LowPass GISTIC2

  • Get analysis-specific nozzle reports
http -b -o firehose_api_get_lowpass_gistic2_pancancer.json "http://firebrowse.org/api/v1/Analyses/Reports?format=json&date=2016_01_28&name=CopyNumberLowPass_Gistic2&type=CopyNumber&page=1&page_size=2500&sort_by=date"

## list GISTIC2 report URLs (results tarballs are downloaded into gistic2/ further below)
mkdir -p gistic2
cat firehose_api_get_lowpass_gistic2_pancancer.json | jq -r '.Reports[] .report_uri' | tee -a firehose_api_get_lowpass_gistic2_pancancer_reports.tsv
  • Open one or two of the report URLs and copy the URL under Methods and Data > Download Results > Analysis results, for example:
http://gdac.broadinstitute.org/runs/analyses__2016_01_28/data/LUAD-TP/20160128/gdac.broadinstitute.org_LUAD-TP.CopyNumberLowPass_Gistic2.Level_4.2016012800.0.0.tar.gz
http://gdac.broadinstitute.org/runs/analyses__2016_01_28/data/SKCM-TM/20160128/gdac.broadinstitute.org_SKCM-TM.CopyNumberLowPass_Gistic2.Level_4.2016012800.0.0.tar.gz
  • Use string pattern replacement to turn each report URL saved in firehose_api_get_lowpass_gistic2_pancancer_reports.tsv into the matching results tarball URL, save the output as firehose_api_get_lowpass_gistic2_pancancer_results.tsv, and download those tarballs.
cat firehose_api_get_lowpass_gistic2_pancancer_results.tsv | parallel -j4 wget -P gistic2 {}
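
The pattern replacement that produces firehose_api_get_lowpass_gistic2_pancancer_results.tsv is not spelled out above. The following is a minimal sketch of one way to do it with awk. It assumes, without confirmation from the API docs, that each report URL has the form http://gdac.broadinstitute.org/runs/analyses__YYYY_MM_DD/reports/cancer/COHORT/ANALYSIS/nozzle.html; check a few lines of the reports file before relying on it. The target layout is taken from the two example tarball URLs listed earlier, and the function name report_to_tarball is hypothetical.

```shell
# Hypothetical helper: read report URLs on stdin, write tarball URLs on stdout.
# ASSUMPTION: input lines look like
#   http://HOST/runs/analyses__YYYY_MM_DD/reports/cancer/COHORT/ANALYSIS/nozzle.html
# Lines that do not match this layout are skipped.
report_to_tarball() {
  awk -F/ '
    NF == 10 && $4 == "runs" && $6 == "reports" {
      run = $5                          # e.g. analyses__2016_01_28
      date = run
      sub(/^analyses__/, "", date)      # 2016_01_28
      gsub(/_/, "", date)               # 20160128
      cohort = $8                       # e.g. LUAD-TP
      analysis = $9                     # e.g. CopyNumberLowPass_Gistic2
      printf "%s//%s/runs/%s/data/%s/%s/gdac.broadinstitute.org_%s.%s.Level_4.%s00.0.0.tar.gz\n",
             $1, $3, run, cohort, date, cohort, analysis, date
    }'
}

# Usage (left commented out because it needs the reports file from the earlier step):
#   report_to_tarball < firehose_api_get_lowpass_gistic2_pancancer_reports.tsv \
#     > firehose_api_get_lowpass_gistic2_pancancer_results.tsv
```

Compare one or two generated URLs against the links on the report pages before starting the download, since the Level_4 and 00.0.0 parts of the file name are copied from the two examples and may differ for other analyses.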

END
