Skip to content

Instantly share code, notes, and snippets.

@smrgit
Last active October 26, 2017 02:23
Show Gist options
  • Save smrgit/f55e69ad6e1284ed4af76a933916cf83 to your computer and use it in GitHub Desktop.
Save smrgit/f55e69ad6e1284ed4af76a933916cf83 to your computer and use it in GitHub Desktop.
Identify GDC data available in ISB-CGC GCS buckets
WITH
t1 AS (
SELECT
DISTINCT file_gdc_id
FROM
`isb-cgc.GDC_metadata.rel8_GDCfileID_to_GCSurl` ),
j1 AS (
SELECT
a.dbName,
a.file_id,
a.acl,
a.cases__project__program__name AS program_name,
a.data_category,
a.data_type,
a.data_format,
a.experimental_strategy
FROM
`isb-cgc.GDC_metadata.rel8_fileData_legacy` a
JOIN
t1 b
ON
a.file_id=b.file_gdc_id ),
j2 AS (
SELECT
a.dbName,
a.file_id,
a.acl,
a.cases__project__program__name AS program_name,
a.data_category,
a.data_type,
a.data_format,
a.experimental_strategy
FROM
`isb-cgc.GDC_metadata.rel8_fileData_current` a
JOIN
t1 b
ON
a.file_id=b.file_gdc_id ),
u1 AS (
SELECT
*
FROM
j1
UNION ALL
SELECT
*
FROM
j2 )
SELECT
dbName,
acl,
program_name,
data_category,
data_format,
experimental_strategy,
COUNT(*) AS n
FROM
u1
GROUP BY
1,
2,
3,
4,
5,
6
ORDER BY
n DESC
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment