Vince Buffalo vsbuffalo

vsbuffalo / ten-commandments.md
Created August 10, 2012 06:02
The Ten Commandments of Scientific Coding

  1. Thou shalt use version control.

  2. Thou shalt comment thy code.

  3. Thou shalt use existing libraries whenever possible.

  4. Thou shalt try to unit test.

vsbuffalo / hmmerfix.py
Created December 5, 2012 06:50
A function that writes a parser
# hmmerfix.py - fix terrible output format (fixed width) of HMMER
# We make a simple assumption about the data: any non-delimiter spaces
# are in the last column. Under this assumption, we build a regular
# expression programmatically with strict types. Strict typing and the
# number of grouped elements ensure some safety.
import re
import sys
from collections import OrderedDict
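The header comment describes building a regular expression programmatically from typed columns, with any embedded spaces confined to the last column. A minimal sketch of that idea follows. It is not the gist's own code: the three-column `FIELDS` schema, `build_parser`, and `parse_line` are hypothetical names chosen for illustration, and the columns are not HMMER's actual layout.

```python
import re
from collections import OrderedDict

# Hypothetical schema: column name -> (regex fragment, converter).
# Only the final column is allowed to contain spaces.
FIELDS = OrderedDict([
    ("target", (r"\S+", str)),
    ("score", (r"-?\d+(?:\.\d+)?", float)),
    ("description", (r".*", str)),
])

def build_parser(fields):
    """Return a function that parses one whitespace-delimited line."""
    groups = ["(%s)" % pattern for pattern, _ in fields.values()]
    regex = re.compile(r"^\s*" + r"\s+".join(groups) + r"\s*$")

    def parse(line):
        match = regex.match(line)
        if match is None:
            # A type or column-count mismatch fails loudly instead of
            # silently producing a misaligned record.
            raise ValueError("line does not fit schema: %r" % line)
        return OrderedDict(
            (name, conv(value))
            for (name, (_, conv)), value in zip(fields.items(), match.groups())
        )

    return parse

parse_line = build_parser(FIELDS)
```

Because each group carries its own type pattern, a line whose second column is not numeric is rejected rather than shifted into the wrong field.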
vsbuffalo / mapqcov.c
Created December 7, 2012 20:53
Get coverage of high-quality mapping reads only.
/*
(c) Vince Buffalo, 2012; License: GPL.
I compile this on my system (OS X with samtools installed via Homebrew) with:
clang -g -lz -L/usr/local/Cellar/samtools/0.1.18/lib/ -lbam -I/usr/local/Cellar/samtools/0.1.18/include/bam/ -o mapqcov mapqcov.c
But you'll likely need to compile differently for your libraries. This code works and is consistent with the slower results from mpileup. This is just a standalone beta; it will likely be incorporated into something else.
*/
#include <stdio.h>
vsbuffalo / gist:5536020
Created May 7, 2013 20:52
Instantaneous mapping rate
tail -f your_mapping.sam | bioawk -csam 'BEGIN{total=0;mapped=0} {total = total + 1; mapped = mapped + !and($flag, 4); if (total % 100 == 0) { printf "\tmapping rate: %f\r",mapped/total }}'
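The one-liner counts a read as mapped when bit 0x4 ("segment unmapped") of the SAM FLAG field is clear. A rough Python equivalent of that counting logic is sketched below. The function name `mapping_rate` is hypothetical, and skipping `@` header lines is an assumption of this sketch rather than something the one-liner states.

```python
def mapping_rate(sam_lines):
    """Return the fraction of SAM records whose 0x4 (unmapped) bit is clear."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # assumption: ignore SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:  # not a usable alignment record
            continue
        flag = int(fields[1])  # FLAG is the second SAM column
        total += 1
        if not flag & 4:
            mapped += 1
    return mapped / total if total else 0.0
```

Passing an open file handle (or any iterable of lines) gives the overall rate; the running display in the one-liner would need the same counters printed periodically.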
"""
Generate an IUPAC nucleotide table. The order is a modification of Heng
Li's. I have added '-' to mean gap. X means non-IUPAC character and
can be a useful warning.
"""
import sys
rev_iupac = "XACMGRSVTWYHKDBN-"
rev_iupac_alt = "xacmgrsvtwyhkdbn."
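In `rev_iupac`, each character's position works as a 4-bit mask with A=1, C=2, G=4, T=8 (for example, index 5 = A|G = `R`), and index 16 holds the gap. The sketch below shows one way a forward lookup table could be derived from the two strings. The `build_table` name and the choice to send unlisted characters to code 0 (`X`) are assumptions, not necessarily what the rest of the script does.

```python
rev_iupac = "XACMGRSVTWYHKDBN-"
rev_iupac_alt = "xacmgrsvtwyhkdbn."

def build_table(primary=rev_iupac, alternate=rev_iupac_alt):
    """Map each byte value (0-255) to its code; unlisted bytes map to 0 ('X')."""
    table = [0] * 256
    for code, (upper, alt) in enumerate(zip(primary, alternate)):
        table[ord(upper)] = code
        table[ord(alt)] = code
    return table

table = build_table()
```

With this layout, ambiguity codes compose by bitwise OR: `table[ord("A")] | table[ord("G")]` equals `table[ord("R")]`.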
vsbuffalo / trim.sh
Created August 8, 2013 05:17
generic, slightly insane paired end quality trimming script
#!/bin/bash
# trim.sh - generic, slightly insane paired end quality trimming script
# Vince Buffalo <vsbuffaloAAAAAA@gmail.com> (sans poly-A)
set -e
set -u
## pre-config
ADAPTERS=illumina_adapters.fa
SAMPLE_NAME=some_sample_name
IN1=in1.fastq
vsbuffalo / .tmux
Created August 19, 2013 01:27
My tmux configuration
# use GNU screen's C-a binding, since it's programmed in my brain
set-option -g prefix C-a
unbind C-b
# use GNU screen's C-a C-a for last window
bind-key C-a last-window
# use 1-based indexing, since 1 is close
set -g base-index 1
import numpy as np
from itertools import combinations
from collections import Counter
import datetime as dt
np.random.seed(0)
def repeat_mutation_sim(G, N, L, mu=3e-8):
    """
    Generate N repeats of length L mutating at rate
vsbuffalo / naive_nshared.py
Created September 5, 2013 23:33
Calculate number of minor alleles (not in consensus sequence).
import sys
from readfq import readfq
from itertools import combinations
from datetime import datetime
def num_shared(seq_a, seq_b, consensus_seq):
    """
    Given two alignment sequences in multiple alignment FASTA format,
    calculate the number of shared SNPs (for minor alleles only, not
    in consensus).
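The docstring describes counting positions where two aligned sequences carry the same non-consensus base. A minimal sketch of that count is given here, since the preview cuts off before the function body. The name `count_shared_minor` and the decision to skip any column containing a gap character are assumptions of this sketch.

```python
def count_shared_minor(seq_a, seq_b, consensus_seq, gap="-"):
    """Count aligned columns where seq_a and seq_b agree but differ from the consensus."""
    if not (len(seq_a) == len(seq_b) == len(consensus_seq)):
        raise ValueError("aligned sequences must have equal length")
    shared = 0
    for a, b, c in zip(seq_a, seq_b, consensus_seq):
        if gap in (a, b, c):  # assumption: gapped columns are not SNPs
            continue
        if a == b and a != c:
            shared += 1
    return shared
```

Running this over every pair from `itertools.combinations` of the alignment would give a pairwise matrix of shared minor alleles.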
vsbuffalo / entropy_vince.py
Created September 26, 2013 17:57
Vince's version of entropy in Python
"""
entropy.py
Calculate entropy of a given list.
"""
from math import log, log10
from collections import Counter
import pdb
def entropy(x, logfun=lambda x: log(x, 2)):
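The preview ends at the function signature. One plausible body, sketched here under the assumption that the function computes Shannon entropy from the empirical frequencies of the list's elements, could look like the following. The name `entropy_sketch` is hypothetical; it keeps the base-2 default of the signature above.

```python
from collections import Counter
from math import log

def entropy_sketch(x, logfun=lambda p: log(p, 2)):
    """Shannon entropy of the empirical distribution of items in x."""
    counts = Counter(x)
    total = float(sum(counts.values()))
    # H = -sum(p * log(p)) over observed categories only,
    # so p is never zero inside logfun.
    return -sum((n / total) * logfun(n / total) for n in counts.values())
```

Passing `logfun=log10` or `logfun=log` changes the unit from bits to bans or nats without touching the counting logic.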