Skip to content

Instantly share code, notes, and snippets.

View ericminikel's full-sized avatar

Eric Minikel ericminikel

View GitHub Profile
import os
import sys
working_dir = '/data/HD/analysis/038ea/analysis/1/fastqc/'
fastqc_summary = open(working_dir+'fastqc_summary.txt',mode='wb')
fastqc_details = open(working_dir+'fastqc_details.txt',mode='wb')
for root, dirs, files in os.walk(working_dir): # walk a directory containing FastQC output for multiple samples
for name in files:
drop table if exists ea1.fastqc_summary;
create table ea1.fastqc_summary (
fileid varchar,
module varchar,
status varchar
);
drop table if exists ea1.fastqc_details;
create table ea1.fastqc_details (
id serial primary key,
@ericminikel
ericminikel / findbadbams.bash
Last active December 28, 2015 11:09
A bash + R script to plot the file size of BAMs vs. that of corresponding FASTQs to identify outliers where a BAM appears to have gotten truncated.
cd /my/working/dir/
du -sb *.bam > bam.filesize # two-column list of BAMs and their size in bytes
cat fastq.12col | awk '{print "du -sb "$1}' | bash > fastq.filesize # two-column list of FASTQs and their size in bytes
# switch to R
R
fastq = read.table('fastq.filesize',header=FALSE)
bam = read.table('bam.filesize',header=FALSE)
bam$name = substr(bam$V2,1,21) # parse a unique ID for the BAMs from the path
fastq$name = substr(fastq$V2,71,91) # same for FASTQs
bam$bamsize=bam$V1
@ericminikel
ericminikel / human-bodymap2.0-fpkms.bash
Created November 18, 2013 18:56
Bash script to calculate FPKMs from the Human BodyMap 2.0 mRNA-seq data.
# Eric Minikel
# CureFFI.org
# 2013-07-04
# Bash script to generate FPKMs from Ensembl Human Bodymap 2.0 RNA-seq data
# Announcement of Human BodyMap 2.0 data availability: http://www.ensembl.info/blog/2011/05/24/human-bodymap-2-0-data-from-illumina/
# BAM list: ftp://ftp.ensembl.org/pub/release-70/bam/homo_sapiens/genebuild/
# STEP 1: DOWNLOAD THE BAMS FROM ENSEMBL
@ericminikel
ericminikel / antiprion-vs-all.r
Created November 18, 2013 19:48
Plotting antiprion compounds in the chemical space of drugs and libraries. Introduced in blog post: http://www.cureffi.org/2013/11/06/plotting-antiprion-compounds-in-the-chemical-space-of-drugs-and-libraries/
# Eric Minikel
# CureFFI.org
# 2013-11-06
# Anti-prion compounds plotted in the space of drugs and chemical libraries
setwd('c:/sci/037cinf/analysis/1')
options(stringsAsFactors=FALSE)
library(stringr)
library(rcdk) # cheminformatics
library(sp) # for point-in-polygon operations
@ericminikel
ericminikel / prpsc-diffeq-derivation
Last active December 28, 2015 22:39
Derivation of exponential decay equations for mRNA, PrPC and PrPSc.
mRNA
====
R(0) = R0
R(infty) = R1
R(t) = R1 + (R0 - R1)exp(-lambda*t)
PrPC
====
@ericminikel
ericminikel / degradation-model.r
Last active December 30, 2015 01:19
Comparison of simulated vs. analytical solutions for degradation of PrP in cell culture
# Eric Minikel
# CureFFI.org
# 2013-12-03
# Comparison of numerically simulated and analytically derived PrP degradation models
# half lives from literature
mrna_thalf = 7 # Pfeiffer 1993 http://www.ncbi.nlm.nih.gov/pubmed/8095862
prpc_thalf = 5 # Borcheldt 1990 http://www.ncbi.nlm.nih.gov/pubmed/1968466/
prpsc_thalf = 30 # Peretz 2001 http://www.ncbi.nlm.nih.gov/pubmed/11507642
division_time = 24 # Ghaemmaghami 2007 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2084281/
@ericminikel
ericminikel / max-inhibition.r
Created December 2, 2013 20:03
What percent of maximum inhibition will be achieved after 1, 5, or 6 days for assay readouts if we inhibit PrP transcription?
########### what percent of possible degradation is achieved by 1 day, 5 days and 6 days?
maxint = .Machine$integer.max # longest possible time in future to model
# transcription inhibition
r = l0 * 0.50 #
l = l0 * 1.00 #
a = m0 * 1.00
m = m0 * 1.00
b = v0 * 1.00
@ericminikel
ericminikel / prpc-downreg.r
Last active December 31, 2015 18:49
Simulation of effects of PrPC downregulation on kinetics of prion infection in vivo
# simulate PrPC downregulation
kdeg = log(2)/1.5 # 36 half life for PrPSc
kconv = 1.7 # a conversion rate that works out well for wt mice
PrPSc0 = 10^1.5 # initial titer as of 4 dpi
symptomatic_threshold = 10^9 # titer when mice become asymptomatic
downreg_threshold = sqrt(PrPSc0 * symptomatic_threshold) # geometric mean for 1/2way through disease course
plot(NA,NA,xlim=range(times),ylim=c(1,10^10),log="y", # empty plot over 300 days and 10 log10s
xlab='days post-inoculation', ylab='prion titer', main=paste("kconversion = ",kconv,sep=''))
@ericminikel
ericminikel / rcdk-sar-tutorial.r
Last active May 2, 2020 11:49
Introductory tutorial on how to use rcdk to conduct an SAR
require(rcdk) # chemical informatics functionality in R
require(gap) # for qq plots later
options(stringsAsFactors=FALSE)
setwd('c:/sci/037cinf/analysis/sar/')
# plot molecules in R plot window instead of separate Java window
rcdkplot = function(molecule,width=500,height=500,marg=0,main='') {
par(mar=c(marg,marg,marg,marg)) # set margins to zero since this isn't a real plot
temp1 = view.image.2d(molecule,width,height) # get Java representation into an image matrix. set number of pixels you want horiz and vertical
plot(NA,NA,xlim=c(1,10),ylim=c(1,10),xaxt='n',yaxt='n',xlab='',ylab='',main=main) # create an empty plot