Skip to content

Instantly share code, notes, and snippets.

View standage's full-sized avatar

Daniel Standage standage

View GitHub Profile
@standage
standage / asmbleval.pl
Last active February 15, 2020 17:46
Calculate summary statistics for a genome/transcriptome assembly.
#!/usr/bin/env perl
use strict;
$/ = ">";
<STDIN>; # Discard "junk", if any, at beginning of the file.
my @sequencelengths;
my $gccontent = 0;
my $atcontent = 0;
my $combinedlength = 0;
@standage
standage / fq-strip-contam.sh
Last active November 26, 2019 17:55
Procedure for removing contaminants from paired-end sequence data. The bwa-mem algorithm is used to map reads against a database of contaminants, a small Perl one-liner is used to filter out reads that map to the contaminants, the SAM data is converted to BAM format, which is then fed (via process substitution) to Tophat's bam2fastx to convert b…
# -q: output in Fastq format
# -Q: ignore BAM quality flags
# -P: paired-end data
bam2fastx -qQP -o clean.fq <(bwa mem contaminants.fasta reads.1.fq reads.2.fq | \
perl -ne '@v = split(/\t/); print if(m/^@/ or ($v[1] & 4 and $v[1] & 8))' | \
samtools view -bhS -)
ProbandSex Chrom MotherDist FatherDist ProbandDist
Male X mu mu/2 mu/2
Male Y ??? mu/2 mu/2
Female X mu mu/2 mu
@standage
standage / inheritance-scenarios.tsv
Created October 25, 2018 16:53
Inheritance scenarios
Proband Mother Father ValidAutosomal ValidSonChrX ValidSonChrY ValidDaughterChrX
0/0 0/0 0/0 + + + +
0/0 0/0 0/1 + + + +
0/0 0/0 1/1 - - - -
0/0 0/1 0/0 + + - +
0/0 0/1 0/1 + + - +
0/0 0/1 1/1 - - - -
0/0 1/1 0/0 - - - -
0/0 1/1 0/1 - - - -
0/0 1/1 1/1 - - - -
alias list='ls -lhp'
alias vsc='open -a /Applications/Visual\ Studio\ Code.app/'
alias msu='ssh hpcc.msu.edu'
alias cab='ssh cabernet.genomecenter.ucdavis.edu'
alias rum='ssh rum.genomecenter.ucdavis.edu'
alias gremlin='ssh gremlin2.soic.indiana.edu'
export CLICOLOR=1
export HISTSIZE=1000000
export HISTFILESIZE=1000000
export EDITOR=nano
@standage
standage / sample.py
Created July 31, 2018 20:14
Simple kevlar utility for reservoir sampling of partitioned reads. The sampling function might be of general interest.
#!/usr/bin/env python
import argparse
import kevlar
import random
import sys
def resv_samp(objstream, n=100, stopafter=None, filterfunc=None):
sample = list()
for counter, obj in enumerate(objstream):
if filterfunc:
INDIVIDUALS = ('proband', 'mother', 'father')
KSIZES = ('27', '31', '35', '39', '43', '47', '51')
rule all:
input:
expand('{coverage}x_k{ksize}_kevlar_calls_like.vcf',
coverage=('30'), ksize=KSIZES)
rule novel_reads:
@standage
standage / fasta-validate.pl
Last active June 8, 2018 15:15
NCBI provides specifications for several Fasta defline identifier formats/conventions at ftp://ftp.ncbi.nih.gov/blast/documents/formatdb.html. This script will automatically detect the convention used sequence-by-sequence, and convert all deflines to the requested format.
#!/usr/bin/env perl
# Copyright (c) 2012-2013, Daniel S. Standage <daniel.standage@gmail.com>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
@standage
standage / like.py
Last active May 11, 2018 18:33
Computing variant likelihoods
#!/usr/bin/env python
import argparse
from collections import defaultdict
import khmer
from math import log
import re
import scipy.stats
import sys
@standage
standage / filter.cpp
Last active May 9, 2018 22:15
A filter class implementing a Bloom filter and Count-min sketch. Compile with `g++ -Wall -Werror -O3 -o test --std=c++11 testfilter.cpp filter.cpp`.
#include <iostream>
#include <assert.h>
#include <cmath>
#include "filter.hpp"
template<typename ElementType, typename CounterType, size_t maxcount>
filter<ElementType, CounterType, maxcount>::filter(std::vector<size_t> array_sizes)
: _cells_occupied(array_sizes.size(), 0),
_arrays(array_sizes.size(), std::vector<CounterType>())
{