Skip to content

Instantly share code, notes, and snippets.

View arq5x's full-sized avatar

Aaron Quinlan arq5x

View GitHub Profile
@arq5x
arq5x / performance-comparo.sh
Created August 12, 2011 20:09
pybedtools manuscript
export PBTHOME=/home/arq5x/cphg-home/projects/pybedtools
shuffle 1094PC0012.bed > a.shuf.full.bed
shuffle 1094PC0012.bed > b.shuf.full.bed
head -1000 a.shuf.full.bed > a.shuf.1e3.bed; head -1000 b.shuf.full.bed > b.shuf.1e3.bed
head -10000 a.shuf.full.bed > a.shuf.1e4.bed; head -10000 b.shuf.full.bed > b.shuf.1e4.bed
head -100000 a.shuf.full.bed > a.shuf.1e5.bed; head -100000 b.shuf.full.bed > b.shuf.1e5.bed
head -1000000 a.shuf.full.bed > a.shuf.1e6.bed; head -1000000 b.shuf.full.bed > b.shuf.1e6.bed
head -10000000 a.shuf.full.bed > a.shuf.1e7.bed; head -10000000 b.shuf.full.bed > b.shuf.1e7.bed
head -100000000 a.shuf.full.bed > a.shuf.1e8.bed; head -100000000 b.shuf.full.bed > b.shuf.1e8.bed
@arq5x
arq5x / test.py
Created September 6, 2011 01:11
test
from pybedtools import BedTool
snps = BedTool('../test/data/snps.bed.gz')
genes = BedTool('../test/data/hg19.gff')
intergenic_snps = (snps - genes)
nearby = genes.closest(intergenic_snps,
d=True,
stream=True)
for gene in nearby:
if int(gene[-1]) < 5000:
@arq5x
arq5x / get_bam_allele.py
Created October 21, 2011 14:47
Using pybedtools to report a BAM read's allele when it overlaps a SNP
#!/usr/bin/env python
"""
get_bam_allele.py
Usage: python get_bam_allele.py [BAM] [BED]
"""
from pybedtools import BedTool
import sys
@arq5x
arq5x / python-prototype.py
Created October 30, 2011 17:06
Protoype for detecting overlaps (partial and complete) among N BED/GFF/VCF files
#!/usr/bin/env python
from collections import namedtuple, defaultdict
from pybedtools import BedTool
import argparse
Point = namedtuple('Point', ['id', 'pos', 'type'])
Interval = namedtuple('Interval', ['chrom', 'start', 'end'])
@arq5x
arq5x / aldskfnaslkvc.sh
Created February 1, 2012 17:54
working with ira
nldksvnasl
dfbv
dfgb
zdf
gfg
ncg
hn
fghn
@arq5x
arq5x / setup.py
Created March 14, 2012 10:57
pop packaging
import ez_setup
ez_setup.use_setuptools()
import glob
import os
import sys
from setuptools import setup
from distutils.extension import Extension
if 'setuptools.extension' in sys.modules:
@arq5x
arq5x / ka-ching-steps.sh
Created March 21, 2012 00:48
ka-ching your git commit
# Note, these instructions assume OS X.
# Alternatives to the afplay command likely exist on
# other POSIX systems. Google is your friend.
# NOTE: Obviously, you can put the MP3 file where ever you so desire.
# 1. Make a new directory called "ka-ching" under /usr/share/
sudo mkdir /usr/share/ka-ching
@arq5x
arq5x / overlap.sh
Created April 3, 2012 21:47
aparna
read 11960......................................................................12030
exon 12010..............................<snip>12057
overlap (21bp) ********************
@arq5x
arq5x / diff-file.sh
Created April 13, 2012 18:17
VCF->TPED-PLINK-BED
# Create SNPs-only subset of the 1000G callset VCF
$ curl -s http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20120316_phase1_integrated_release_version2/ALL.chr22.phase1_release_v2.20101123.snps_indels_svs.vcf.gz | \
zcat | \
awk '$0 ~ /^#/ || $0 ~ "VT=SNP"' | \
gzip > ALL.chr22.phase1_release_v2.20101123.snps_indels_svs.snpsonly.vcf.gz
# Use VCFTools to convert to TPED
$ vcftools
VCFtools (v0.1.8)
@arq5x
arq5x / basic-exome-outline.sh
Created June 15, 2012 13:35
Exome pipeline for Charles
##########################################
# Step 0. setup a list of sample names.
# Assume that each of your gzipped
# FASTQ files is named as follows:
# sample1.1.fq.gz
# sample1.2.fq.gz
# sample2.1.fq.gz
# sample2.2.fq.gz
# ...
# sampleN.1.fq.gz