# paste this first part into the BigQuery "Query Editor" window
SELECT
  gene,
  chr,
  CORR(avgCNsegMean, avglogExp) AS corr,
  COUNT(*) AS n
FROM (
  SELECT
    annotCN.gene AS gene,
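The outer query above computes, for each (gene, chr) group, the Pearson correlation between average copy-number segment mean and average log expression, plus the number of rows. The following is a minimal local sketch of that GROUP BY / CORR / COUNT(*) pattern, assuming rows arrive as plain tuples with no NULLs. The function names and sample rows are hypothetical and are not part of the original query.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (None if undefined)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined when either variable is constant
    return sxy / sqrt(sxx * syy)


def corr_by_gene(rows):
    """rows: iterable of (gene, chr, avgCNsegMean, avglogExp) tuples.

    Returns {(gene, chr): (corr, n)}, analogous to
    SELECT gene, chr, CORR(...), COUNT(*) ... GROUP BY gene, chr.
    """
    groups = defaultdict(lambda: ([], []))
    for gene, chrom, cn, exp in rows:
        xs, ys = groups[(gene, chrom)]
        xs.append(cn)
        ys.append(exp)
    return {key: (pearson(xs, ys), len(xs)) for key, (xs, ys) in groups.items()}


# Hypothetical example rows: GENE_A rises with copy number, GENE_B falls.
rows = [
    ("GENE_A", "chr1", 0.1, 5.0),
    ("GENE_A", "chr1", 0.4, 6.0),
    ("GENE_A", "chr1", 0.7, 7.0),
    ("GENE_B", "chr2", 0.2, 3.0),
    ("GENE_B", "chr2", 0.5, 2.0),
    ("GENE_B", "chr2", 0.8, 1.0),
]
result = corr_by_gene(rows)
```

Keeping `n` next to `corr`, as the query does, makes it easy to discard correlations computed from very few samples.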

/*
Copyright 2016, Institute for Systems Biology
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0

CREATE TEMPORARY FUNCTION
  -- In this function, we're going to be working on arrays of values.
  -- We're also going to define a set of functions 'inside' the kMeans.
  -- *heavily borrowing from https://github.com/NathanEpstein/clusters* --
  kMeans(x ARRAY<FLOAT64>,    -- ESR1 gene expression
         y ARRAY<FLOAT64>,    -- EGFR gene expression
         iterations FLOAT64,  -- the number of iterations
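The kMeans function above clusters samples in the two-dimensional (ESR1, EGFR) expression space for a fixed number of iterations. The following is a minimal sketch of plain Lloyd's k-means on two parallel arrays. It is not the NathanEpstein/clusters code or the body of the BigQuery UDF. The cluster count `k` and the "first k points" initialization are assumptions, because the preview is cut off after `iterations`.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(x, y, iterations, k):
    """Lloyd's k-means on 2-D points given as parallel lists x and y.

    Assumption: centroids start at the first k points (deterministic, naive).
    Returns (labels, centroids).
    """
    points = list(zip(x, y))
    centroids = points[:k]
    labels = [0] * len(points)
    for _ in range(int(iterations)):  # iterations arrives as FLOAT64 in the UDF
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean_x = sum(px for px, _ in members) / len(members)
                mean_y = sum(py for _, py in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        centroids = new_centroids
    return labels, centroids


# Hypothetical expression values forming two well-separated groups.
labels, centroids = kmeans(
    [1.0, 1.2, 0.8, 8.0, 8.2, 7.8],
    [1.0, 0.9, 1.1, 8.0, 8.1, 7.9],
    iterations=10,
    k=2,
)
```

A fixed iteration count, as in the UDF signature, keeps the run time predictable. A convergence check, stopping when labels no longer change, is a common alternative.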

WITH
  hg38_d1 AS (
    -- we start with a table at the aliquot level, in case there are multiple aliquots
    -- for a single sample; the SUM() is to sum the isoforms since we're working
    -- with the Isoform_Expression tables
    SELECT
      sample_barcode,
      aliquot_barcode,
      mirna_id,
      mirna_accession,
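As the comment explains, the first CTE keeps one row per aliquot and miRNA, and SUM() collapses the several isoform rows of a miRNA into one total. The following is a minimal sketch of that aggregation. The name of the summed column is cut off in the preview, so `value` is a hypothetical stand-in, as are the barcodes and miRNA identifiers.

```python
from collections import defaultdict


def sum_isoforms(rows):
    """rows: (sample_barcode, aliquot_barcode, mirna_id, mirna_accession, value).

    Sums isoform-level values per (sample, aliquot, mirna_id, mirna_accession),
    mirroring SUM(...) ... GROUP BY those four columns.
    """
    totals = defaultdict(float)
    for sample, aliquot, mirna_id, accession, value in rows:
        totals[(sample, aliquot, mirna_id, accession)] += value
    return dict(totals)


# Hypothetical rows: two isoforms of one miRNA in aliquot A1, one in aliquot A2.
isoform_rows = [
    ("S1", "S1-A1", "mir-X", "MI-X", 1.5),
    ("S1", "S1-A1", "mir-X", "MI-X", 2.5),
    ("S1", "S1-A2", "mir-X", "MI-X", 3.0),
]
per_aliquot = sum_isoforms(isoform_rows)
```

Because the aliquots stay separate at this stage, any later step can decide how to combine multiple aliquots from the same sample.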

WITH
  -- first we get the 77 samples that passed the QC tests
  qcSet AS (
    SELECT
      TCGA_case_ID AS case_barcode
    FROM
      `isb-cgc.hg19_data_previews.TCGA_Breast_SuppTable01`
    WHERE
      QC_Status="pass" ),
  --
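The qcSet CTE keeps only the case IDs whose `QC_Status` is `"pass"`. The following is a minimal sketch of that filter. The second function is an assumption: it shows how such a set is typically used to restrict another table (a semi-join), because the preview ends before qcSet is used. All rows are hypothetical.

```python
def qc_passed_cases(supp_rows):
    """Case IDs from supplementary-table rows whose QC_Status is "pass"."""
    return {row["TCGA_case_ID"] for row in supp_rows if row["QC_Status"] == "pass"}


def keep_qc_cases(data_rows, qc_set):
    """Hypothetical downstream step: keep rows whose case_barcode passed QC."""
    return [row for row in data_rows if row["case_barcode"] in qc_set]


supp = [
    {"TCGA_case_ID": "CASE-1", "QC_Status": "pass"},
    {"TCGA_case_ID": "CASE-2", "QC_Status": "fail"},
]
data = [{"case_barcode": "CASE-1", "v": 1}, {"case_barcode": "CASE-2", "v": 2}]
filtered = keep_qc_cases(data, qc_passed_cases(supp))
```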

WITH
  hg38_d1 AS (
    -- we start with a table at the aliquot level, in case there are multiple aliquots
    -- for a single sample.
    SELECT
      sample_barcode,
      aliquot_barcode,
      mirna_id,
      reads_per_million_miRNA_mapped AS RPM
    FROM
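This query reads a precomputed `reads_per_million_miRNA_mapped` column rather than raw counts. The following is a minimal sketch of what such a value usually represents: each miRNA's read count divided by the total miRNA-mapped reads in that aliquot, times one million. Treat this definition as an assumption and check the pipeline documentation for the exact denominator. The input counts are hypothetical.

```python
def reads_per_million(read_counts):
    """read_counts: {mirna_id: read_count} for a single aliquot.

    Assumed definition: RPM = count / total miRNA-mapped reads * 1,000,000.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {mirna: 0.0 for mirna in read_counts}  # avoid division by zero
    return {mirna: count / total * 1_000_000 for mirna, count in read_counts.items()}


rpm = reads_per_million({"mir-X": 250, "mir-Y": 750})
```

Normalizing this way makes values comparable across aliquots sequenced to different depths, which matters when several aliquots of one sample are combined.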

WITH
  --
  -- we start by translating the correlations that we got to ranks,
  -- based on sorting the genes on corrByGene "DESC";
  -- this will result in the highest positive correlation getting
  -- rank #1, and so on.
  -- We also lightly filter the genes by excluding any with near-zero
  -- or negative correlation coefficients, and the result is a list
  -- of approximately 9000 genes with symbol, correlation, and rank
  geneScoresT AS (
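The comment above describes three steps: drop genes with near-zero or negative correlation, sort the remaining genes by correlation in descending order, and number them from 1. The following is a minimal sketch of those steps. The cutoff `min_corr=0.1` is a hypothetical value, because the preview does not show the actual threshold. The gene scores are also made up.

```python
def rank_genes(corr_by_gene, min_corr=0.1):
    """corr_by_gene: {gene_symbol: correlation}.

    Keeps genes with correlation > min_corr (hypothetical cutoff), sorts them
    by correlation descending, and returns (gene, corr, rank) with rank 1 first.
    """
    kept = [(gene, c) for gene, c in corr_by_gene.items() if c is not None and c > min_corr]
    kept.sort(key=lambda gc: gc[1], reverse=True)
    return [(gene, c, rank) for rank, (gene, c) in enumerate(kept, start=1)]


ranked = rank_genes({"GENE_A": 0.9, "GENE_B": -0.2, "GENE_C": 0.05, "GENE_D": 0.5})
```

Python's sort is stable even with `reverse=True`, so tied correlations keep their input order and receive consecutive ranks. A SQL version could instead use RANK() to give ties the same rank.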

WITH
  aList AS (
    SELECT
      aliquot_barcode AS abarcode
    FROM
      `isb-cgc-04-0010.draft_new_data.bcgsc_hg38_isoforms`
    GROUP BY
      abarcode ),
  gdcData AS (
    SELECT