Skip to content

Instantly share code, notes, and snippets.

View seandavi's full-sized avatar

Sean Davis seandavi

View GitHub Profile
@seandavi
seandavi / testingHaarSegOnSNPData.R
Created April 13, 2010 20:44
Testing HaarSeg segmentation algorithm on SNP data
plot(abs(baf(eset)[tmp,223]-baf(eset)[tmp,1]),pch='.')
plot(abs(baf(eset)[tmp,223]-baf(eset)[tmp,1]),pch='.',xlim=c(0,50000))
lines(x$Segmented,col='red')
plot(abs(baf(eset)[tmp,223]-baf(eset)[tmp,1]),pch='.',xlim=c(25000,50000))
lines(x$Segmented,col='red')
plot(abs(baf(eset)[tmp,223]-baf(eset)[tmp,1]),pch='.',xlim=c(27000,38000))
lines(x$Segmented,col='red')
x = haarSeg(abs(baf(eset)[tmp,223]-baf(eset)[tmp,1]),breaksFdrQ=0.000001,haarStartLevel=3,haarEndLevel=6)
lines(x$Segmented,col='green')
x = haarSeg(abs(baf(eset)[tmp,223]-baf(eset)[tmp,1]),breaksFdrQ=0.000001,haarStartLevel=4,haarEndLevel=6)
@seandavi
seandavi / varFuncTranscript.py
Created November 4, 2010 13:34
Find functional location of a variant in a transcript database
#!/usr/bin/env python
from operator import itemgetter
class Transcript:
def __init__(self,chromosome,name,exons,cds,strand):
self.chromosome=chromosome
self.name=name
self.strand=strand
self.exons=exons
self.cds=cds
@seandavi
seandavi / phat.R
Created December 8, 2010 13:08
Calculate genotype probabilities
phat <-
function(x,n,error=0.01,hetprob=0.001) {
#doesn't work as expected yet
prv=hetprob
pvv=(1-hetprob)/2
phet = pbinom(x,n,0.5)*prv
phomref=pbinom(x,n,error)*(1-prv-pvv)
phomvar=pbinom(x,n,1-error)*pvv
return(c(phomvar=phomvar,phet=phet,phomref=phomref))}
@seandavi
seandavi / gist:735654
Created December 10, 2010 02:08
Playing with awk and pileup
cat ${i} | awk '/[RKYW]/ { if ( ( $8 > 9 ) && ( $5 > 30 ) ) print $1, $2 }' > ${i}.goodhetsites
@seandavi
seandavi / makeCuffDiffMatrix.R
Created December 17, 2010 11:59
Make a combined data matrix from cuffdiff output for gene-level data
makeCuffDiffMatrix <- function(directory='.') {
m = read.delim(file.path(directory,'genes.fpkm_tracking'),header=TRUE)
d = read.delim(file.path(directory,'gene_exp.diff'),header=TRUE)
x = seq(1,nrow(d),nrow(m))
dt1 = m
for(i in 1:length(x)) {
dt = d[x[i]:(x[i]+nrow(m)-1),]
dt$fdr=p.adjust(dt$p_value,method='fdr')
colnames(dt)=paste(dt[1,4],dt[1,5],colnames(dt),sep="_")
dt1 = cbind(dt1,dt[,-c(1:5,7,8)])
@seandavi
seandavi / symd_tool.xml
Created February 17, 2011 05:10
Python wrapper around SymD executable and xml tool description for running behind galaxy
<tool id="symd_tool" name="Perform SymD calculation on a PDB file">
<description>Use this tool to run symd on a PDB file</description>
<command interpreter="python">symd_wrapper.py $input $trfm_file $info_file ${input.name} </command>
<inputs>
<param format="txt" name="input" type="data" label="PDB file"/>
</inputs>
<outputs>
<data format="txt" name="trfm_file" />
<data format="txt" name="info_file" />
</outputs>
@seandavi
seandavi / javedgene.py
Created March 28, 2011 22:58
Use this file as a sketch for finding the genomic location of a base relative to an transcript, exon (or intron), and offset
#!/usr/bin/env python
import csv
import itertools
import argparse
# File "refGene.txt" downloaded from UCSC and gunzipped
# must be in this directory
# Works only on "+" strand genes
# Minus strand genes left as an exercise
@seandavi
seandavi / somaticScan.sh
Created June 13, 2011 13:17
Call varScan using named pipes (fifos)
#!/bin/bash
#
# Author: Sean Davis <seandavi@gmail.com>
#
# Uses named pipes (FIFO) to reduce storage needs
# to call varscan somatic
# Some details may need to be edited to meet particular needs
# call with the following parameters
# 1) The FASTA file containing the reference genome
# 2) The Normal bam file
@seandavi
seandavi / liftoverData.py
Created August 29, 2011 13:12
liftover Agilent CGH FE files to another genome build. Relies on UCSC liftover executable.
import sys
import os
import subprocess
import argparse
import tempfile
import csv
#liftoverChain = sys.argv[2]
#fname = sys.argv[1]
@seandavi
seandavi / countJuctions.py
Created September 12, 2011 19:50
Given an bam file, find all potential introns, optionally within a given region.
#!/usr/bin/env python
import pysam
import argparse
import collections
parser = argparse.ArgumentParser()
parser.add_argument('bamfile',
help="bamfile for analysis")
parser.add_argument('-r','--region',
help="Limit to region chrN:XXX-YYY")