Skip to content

Instantly share code, notes, and snippets.

@liangyy
liangyy / get_ci_from_pvalue.R
Last active February 6, 2023 06:40
Confidence interval of p-value
# confidence interval of p-value
# ref: https://gist.github.com/hakyim/38431b74c6c0bf90c12f
## copied from ref:
## the jth order statistic from a
## uniform(0,1) sample
## has a beta(j,n-j+1) distribution
## (Casella & Berger, 2002,
## 2nd edition, pg 230, Duxbury)
## this portion was posted by anonymous on
## http://gettinggeneticsdone.blogspot.com/2009/11/qq-plots-of-p-values-in-r-using-ggplot2.html
@liangyy
liangyy / minimal_example_08092021.ipynb
Created August 10, 2021 05:19
Minimal example for transethnic_prs
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
qqplot_by_group <- function(pval, group, ...) {
n <- length(pval)
pexp <- rank(pval) / n
df <- data.frame(p.val = pval, grp = group) %>% group_by(grp) %>% mutate(p.exp = rank(p.val) / (n() + 1)) %>% ungroup()
p <- ggplot(df) +
geom_point(aes(x = -log10(p.exp), y = -log10(p.val), color = grp), ...) +
geom_hline(yintercept = -log10(0.05 / n)) +
geom_abline(slope = 1, intercept = 0, linetype = 2)
return(p)
}
@liangyy
liangyy / fast_linear_regression
Last active February 12, 2023 23:02
Fast linear regression function in R
# simple and fast linear regression solver
# y ~ x + covariate
# input:
# y: n x 1 (vector)
# x: n x k (matrix)
# covariate: n x m (optional; matrix)
#
# output:
# coefficient of x[i] in y ~ x[i] + covariate
# and corresponding p-value (from t value)
load_mixqtl_data_bundle = function(data_dir = 'sample_data/', gene_id = 'ENSG00000198951.6', cis_window = 10000){
if(!dir.exists(data_dir)) {
message('data_dir: ', data_dir, ' does not exist! Exit!')
quit()
}
haplotype1 = 'geuvadis_22_hap1.txt.gz'
haplotype2 = 'geuvadis_22_hap2.txt.gz'
total_count = 'expression_count.txt.gz'
allele_specific_count_hap1 = 'geuvadis.asc.h1.tsv.gz'
allele_specific_count_hap2 = 'geuvadis.asc.h2.tsv.gz'
@liangyy
liangyy / fast_gwas.R
Created April 7, 2020 16:14
Fast linear regression (single predictor with intercept) for GWAS
run_gwas = function(X, y) {
a_y = sd(y)
b_x = apply(X, 2, sd)
y_tilde = y / a_y
# message(1)
x_tilde = sweep(X, 2, FUN = '/', b_x)
# message(2)
xtx = colSums(x_tilde ^ 2)
xbar = colMeans(x_tilde)
ybar = mean(y_tilde)
@liangyy
liangyy / fopenmp_workaround.sh
Created March 27, 2020 04:33
R installation issue with `gcc -fopenmp`
# with clang binary downloaded from: http://r.research.att.com/libs/clang-4.0.0-darwin15.6-Release.tar.gz
env CC=/path/to/local/clang4/bin/clang pip install rpy2
@liangyy
liangyy / load_gtex_color_board.R
Last active March 23, 2020 19:56
Get GTEx V8 color board
load_gtex_color_board = function() {
gtex_color = read.csv('https://bitbucket.org/yanyul/rotation-at-imlab/raw/b49d35b44c0ab936496733268b8578db77eb7f73/data/gtex_tissue_colors.csv', stringsAsFactors = FALSE)
gtex_color$tissue_site_detail_id[gtex_color$tissue_site_detail_id == 'Cells_Transformed_fibroblasts'] = 'Cells_Cultured_fibroblasts'
color_board = paste0('#', gtex_color$tissue_color_hex)
names(color_board) = gtex_color$tissue_site_detail_id
return(color_board)
}
@liangyy
liangyy / download.sh
Created February 28, 2020 20:33
Download and clean up dependent data for lab 7 (HG471 @ UChicago)
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.indels.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.indels.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz.md5
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/hapmap_3.3.b37.vcf.gz
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/hapmap_3.3.b37.v
@liangyy
liangyy / vcf2predixcan_format.sh
Created February 25, 2020 16:50
Floating around script to convert VCF to PrediXcan format (for the old repo)
bcftools query -f query -f chr%CHROM\\t%ID\\t%POS\\t%REF\\t%ALT\\t%QUAL[\\t%DS]\\n test3_vcf.vcf.gz | gzip > test3_to_predixcan_format.gz
bcftools query -l test3_vcf.vcf.gz | tr '_' '\t' > test3_to_predixcan_format.sample