Fan Wang wangfan860

  • University of Chicago
@wangfan860
wangfan860 / predicted_vs_real_expression.py
Last active November 22, 2019 16:48
Compares real TCGA-BRCA expression with predicted gene expression. The raw Pearson correlation is low even in normal tissue samples. The last part applies quantile normalization and inverse quantile normalization to the real expression.
from gtfparse import read_gtf
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import gzip
import subprocess
import scipy.stats as stats
import argparse
import os
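
The description above mentions quantile normalization and inverse quantile normalization of the real expression before computing Pearson correlation. The gist preview stops at the imports, so the following is only a minimal sketch of those two steps, not the author's code. It assumes a genes-by-samples matrix, and it assumes that "inverse quantile normalization" means a rank-based inverse normal transform; the function names and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats


def quantile_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Give every column (sample) the same distribution of values."""
    # Reference distribution: mean of the sorted values across samples.
    reference = np.sort(df.to_numpy(), axis=0).mean(axis=1)
    # Zero-based rank of each value within its column (ties broken by position).
    ranks = df.rank(method="first", axis=0).to_numpy().astype(int) - 1
    return pd.DataFrame(reference[ranks], index=df.index, columns=df.columns)


def inverse_normal_transform(values: pd.Series) -> pd.Series:
    """Map the ranks of a vector onto standard normal quantiles."""
    ranks = values.rank(method="average")
    return pd.Series(stats.norm.ppf(ranks / (len(values) + 1)), index=values.index)


# Hypothetical toy data: 4 genes (rows) by 3 samples (columns).
real = pd.DataFrame(
    {"s1": [5.0, 2.0, 3.0, 4.0], "s2": [4.0, 1.0, 4.5, 2.0], "s3": [3.0, 4.0, 6.0, 8.0]},
    index=["g1", "g2", "g3", "g4"],
)
qn = quantile_normalize(real)

# Per-gene comparison against a hypothetical predicted expression vector.
predicted_g1 = pd.Series([0.9, 0.4, 0.1], index=real.columns)
r, p_value = stats.pearsonr(inverse_normal_transform(qn.loc["g1"]), predicted_g1)
```

Normalizing across samples first and transforming each gene afterwards keeps the correlation from being driven by a few extreme expression values, which is one plausible reason for adding these steps when the raw Pearson correlation is low.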
wangfan860 / maftools.r
Last active May 13, 2019 15:55
Make maftools plots for canine and human data
if (!require("BiocManager"))
  install.packages("BiocManager")
BiocManager::install("maftools")
##download rngtools from https://cran.r-project.org/src/contrib/Archive/rngtools/
install.packages("~/Desktop/nature_commu/rngtools_1.3.1.tar.gz", repos = NULL, type = "source")
library(maftools)
uvm = read.maf(maf = '.maf')
skcm = read.maf(maf = '.maf')
# download CanFam3.1 from ftp://ftp.ensembl.org/pub/release-96/gtf/canis_familiaris/
wangfan860 / oncoprint_plot.r
Created May 7, 2019 17:03
Simple function to plot an oncoprint for a few genes
# This function sorts the matrix for better visualization of mutual exclusivity across genes
memoSort <- function(M) {
  geneOrder <- sort(rowSums(M), decreasing=TRUE, index.return=TRUE)$ix;
  scoreCol <- function(x) {
    score <- 0;
    for(i in 1:length(x)) {
      if(x[i]) {
        score <- score + 2^(length(x)-i);
      }
    }
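
The R preview above is cut off partway through `scoreCol`. As a rough illustration of the sorting idea it describes, here is a Python sketch (not the author's code). It assumes a binary genes-by-samples matrix, and it assumes the missing part of the function orders samples by descending score; the name `memo_sort` and the toy matrix are hypothetical.

```python
import numpy as np


def memo_sort(matrix) -> np.ndarray:
    """Reorder a binary genes-by-samples matrix to expose mutual exclusivity."""
    m = np.asarray(matrix, dtype=bool)
    # Genes altered in the most samples go first (stable sort keeps ties in input order).
    gene_order = np.argsort(-m.sum(axis=1), kind="stable")
    ordered = m[gene_order]
    # Weight 2^(n_genes - i) for the i-th gene (1-based), as in the R scoring loop.
    weights = 2.0 ** np.arange(ordered.shape[0] - 1, -1, -1)
    scores = ordered.astype(float).T @ weights
    # Assumed step: samples with the highest score go first.
    sample_order = np.argsort(-scores, kind="stable")
    return ordered[:, sample_order]


# Hypothetical toy matrix: 3 genes (rows) by 4 samples (columns).
M = np.array([
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
sorted_M = memo_sort(M).astype(int)
```

Because each gene's weight is larger than the sum of all weights below it, samples altered in the most frequently mutated gene always sort ahead of the rest, which produces the staircase pattern typical of oncoprints.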
wangfan860 / haplotype_summary.py
Last active March 14, 2019 17:31
haplotype_summary
#be sure to have scikit-allel and pandas installed (glob is part of the Python standard library)
import allel
import pandas as pd
import glob
#change this line to match the path for vcf.gz files
all_files = glob.glob("*.vcf.gz")
#very first vcf for merging
file1 = allel.read_vcf(all_files[0])