Skip to content

Instantly share code, notes, and snippets.

View adamewing's full-sized avatar
🗿

adamewing

🗿
  • Mater Research Institute, University of Queensland
  • Woolloongabba, QLD
View GitHub Profile
@adamewing
adamewing / gist:3416649
Created August 21, 2012 15:36
count reads in indexed .bam file
samtools idxstats my.bam | awk '{reads += ($3+$4)} END {print reads}'
@adamewing
adamewing / fix_mapsplice_bam.py
Created March 24, 2013 00:31
Attempts to fix various validation issues that exist in BAM files generated by mapsplice.
#!/usr/bin/env python
''' Script for correcting broken mapsplice BAMs using pysam
Adam Ewing (ewingad@soe.ucsc.edu)
'''
import sys
import pysam
import os
import subprocess
@adamewing
adamewing / ascp_get.sh
Last active December 21, 2015 02:29
SRA download script
#!/bin/sh
if [ $# -ne 1 ]
then
cat << EOF
usage: $0 <list of SRA location, name (whitespace seperated)>
For example, make a file called sra.txt and add this to it:
#!/usr/bin/env python
import vcf
import sys
import traceback
import gzip
'''
validatevcf.py
Contact: Adam Ewing (ewingad@soe.ucsc.edu)
@adamewing
adamewing / guess_gender.py
Created November 22, 2013 18:40
Need to know if your BAM file is male or female?
#!/usr/bin/env python
import pysam
import sys
import numpy as np
from itertools import izip
from random import uniform
def get_depth(bam, chrom, pos):
@adamewing
adamewing / fixRGpairs.py
Last active August 29, 2015 13:56
Python/pysam script to correct cases where read groups on paired end reads are mismatched
#!/usr/bin/env python
import pysam
import sys
from re import sub
def getRG(tags):
''' fetch RG, tags is from a pysam.AlignedRead.tags, returns RG name '''
for tag, val in tags:
if tag == 'RG':
@adamewing
adamewing / remove_svtype.py
Created April 20, 2014 04:36
Make INDELs parse as INDELs via PyVCF by removing SVTYPE field
#!/usr/bin/env python
import vcf
import sys
'''
PyVCF will parse INDELs as SVs if they have SVTYPE in the INFO field.
This provides a quick-fix.
contact: adam.ewing@mater.uq.edu.au
#!/usr/bin/env python
'''
Join and re-annotate repeatmasker entities.
Expects input to be from UCSC public mysql server repeatmasker tables, using a command similar to the following:
echo "select genoName, genoStart, genoEnd, repStart, repEnd, strand, repName from rmsk where repClass = 'Other'" | mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -N | sort -k1,1 -k2,2n
requires: pysam (for Fastafile), biopython, exonerate (https://www.ebi.ac.uk/~guy/exonerate/)
@adamewing
adamewing / wg_cnprofile.py
Last active December 12, 2017 03:28
WGS CNV profiles from paired samples
#!/usr/bin/env python
'''
Simple copy number profiling script for tumour / normal pairs.
Adam Ewing
adam.ewing@mater.uq.edu.au
'''
#!/usr/bin/env python
from collections import defaultdict as dd
from bx.intervals.intersection import Intersecter, Interval
def interval_forest(bed_file):
''' build dictionary of interval trees '''
forest = dd(Intersecter)
with open(bed_file, 'r') as bed: