Eric T. Dawson edawson

edawson / import_vcf_into_df.R
Created June 18, 2017 17:55 — forked from sephraim/import_vcf_into_df.R
Import VCF file into data frame in R
# Import VCF (read.vcfR comes from the vcfR package)
library(vcfR)
my.vcf <- read.vcfR('my.vcf.gz')
# Combine CHROM thru FILTER cols + INFO cols
my.vcf.df <- cbind(as.data.frame(getFIX(my.vcf)), INFO2df(my.vcf))
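The same idea, fixed columns plus one column per INFO key, can be sketched outside R. The Python below is a minimal illustration using only the standard library; it is not how vcfR works internally, and the helper names `parse_info` and `vcf_row` are hypothetical.

```python
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"]


def parse_info(info):
    """Split a VCF INFO string into a dict; flag keys without '=' map to True."""
    fields = {}
    if info == ".":  # '.' marks an empty INFO field
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def vcf_row(line):
    """Turn one tab-separated VCF record into a flat dict (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    row = dict(zip(FIXED_COLUMNS, cols[:7]))
    row.update(parse_info(cols[7]))  # INFO is the eighth column
    return row


print(vcf_row("chr1\t100\trs1\tA\tG\t50\tPASS\tDP=10;AF=0.5;DB"))
```

Values stay as strings here; a real loader would also read the `##INFO` header lines to decide each key's type.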
edawson /
Created January 30, 2018 18:23 — forked from nicktoumpelis/
Cleans and resets a git repo and its submodules
git clean -xfd
git submodule foreach --recursive git clean -xfd
git reset --hard
git submodule foreach --recursive git reset --hard
git submodule update --init --recursive
edawson /
Created July 20, 2018 10:33
Convert the default UCSC table browser format to BRASS-compatible centro/telomere dictionary file (python3).
import sys
import gzip
from collections import defaultdict
if __name__ == "__main__":
    d_telo = defaultdict(list)
    d_centro = {}
    with gzip.open(sys.argv[1], "rt") as ifi:
        for line in ifi:
edawson /
Created December 11, 2018 01:24
Convert a Picard interval list file to a bgzipped, tabix-indexed BED file.
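# Picard interval lists are 1-based and closed; BED is 0-based and half-open, so the start shifts by one.
grep -v "^@" "$1" | awk 'BEGIN{OFS="\t"} {print $1, $2-1, $3, $5}' > "$(dirname "$1")/$(basename "$1" .interval_list).bed" && \
bgzip "$(dirname "$1")/$(basename "$1" .interval_list).bed" && \
tabix -p bed "$(dirname "$1")/$(basename "$1" .interval_list).bed.gz"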
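The coordinate conventions are the subtle part of this conversion: a Picard interval list is 1-based with closed ends, while BED is 0-based and half-open, so only the start coordinate changes. A minimal Python sketch of that step follows; the function name `interval_list_to_bed` and the example records are made up for illustration.

```python
def interval_list_to_bed(lines):
    """Yield BED rows (chrom, start, end, name) from Picard interval-list lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Header lines in an interval list start with '@' (SAM-style header).
        if not line or line.startswith("@"):
            continue
        chrom, start, end, _strand, name = line.split("\t")[:5]
        # 1-based closed -> 0-based half-open: subtract one from the start only.
        yield (chrom, int(start) - 1, int(end), name)


example = [
    "@HD\tVN:1.6",
    "@SQ\tSN:chr1\tLN:248956422",
    "chr1\t100\t200\t+\ttarget_1",
]
for row in interval_list_to_bed(example):
    print("\t".join(map(str, row)))
```

Compression and indexing are left to `bgzip` and `tabix`, as in the shell version.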
edawson /
Last active February 5, 2019 14:07
Split a FASTQ (or pair) into 100K read splits using GNU split and pigz. Modified from an original script by @ekg.
ddir=$(dirname "$first_reads")
obase_first=$(basename "$first_reads" .fastq.gz)
obase_second=$(basename "$second_reads" .fastq.gz)
if [ -n "${first_reads}" ] && [ -e "${first_reads}" ]
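The chunking logic behind a 100K-read split can be sketched in Python. This assumes the common four-line FASTQ record layout, so 100K reads correspond to 400,000 lines; it is an illustration only, not the GNU split and pigz pipeline the script uses, and `split_fastq` is a hypothetical name.

```python
from itertools import islice

LINES_PER_READ = 4  # header, sequence, '+' separator, qualities


def split_fastq(lines, reads_per_chunk=100_000):
    """Yield successive lists of lines, each holding at most reads_per_chunk reads."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, LINES_PER_READ * reads_per_chunk))
        if not chunk:
            return
        yield chunk


# Three toy reads split two at a time give chunks of 8 and 4 lines.
toy = []
for i in range(3):
    toy += [f"@read{i}", "ACGT", "+", "IIII"]
print([len(chunk) for chunk in split_fastq(toy, reads_per_chunk=2)])
```

For paired files, running the same function over both mates with the same `reads_per_chunk` keeps the pairs aligned chunk by chunk.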
edawson / wdl_idioms.wdl
Last active March 3, 2019 17:16
An example WDL file which documents some idioms of the language
## Tasks are upper camel-cased
task CheckSex{
File sampleBAM
File sampleIndex
## Optional parameters receive a '?' after the type
Int? diskGB
## select_first can be used to set default values
diskGB = select_first([diskGB, 100])
edawson /
Last active March 5, 2019 20:42 — forked from arq5x/
Natural sort a VCF
chmod a+x trio.trim.vep.vcf.gz
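The sorting script itself is not shown in this preview. As a rough illustration of what a natural sort of VCF records involves, here is a Python sketch that orders records by chromosome, comparing digit runs numerically so that chr2 precedes chr10, and then by position. The function names are hypothetical and the approach is an assumption, not the forked gist's method.

```python
import re


def natural_key(record_line):
    """Sort key: chromosome split into numeric and text chunks, then position."""
    chrom, pos = record_line.split("\t", 2)[:2]
    chunks = [
        # Tag each chunk so numbers and text never get compared directly.
        (0, int(c), "") if c.isdigit() else (1, 0, c)
        for c in re.split(r"(\d+)", chrom)
        if c
    ]
    return (chunks, int(pos))


def sort_vcf_lines(lines):
    """Keep header lines (starting with '#') first, then naturally sorted records."""
    header = [l for l in lines if l.startswith("#")]
    records = [l for l in lines if l and not l.startswith("#")]
    return header + sorted(records, key=natural_key)


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID",
    "chr10\t5\t.",
    "chr2\t50\t.",
    "chrX\t1\t.",
    "chr2\t7\t.",
]
print("\n".join(sort_vcf_lines(vcf)))
```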

Bedtools Cheatsheet


Tool      Description
flank     Create new intervals from the flanks of existing intervals.
slop      Adjust the size of intervals.
shift     Adjust the position of intervals.
subtract  Remove intervals based on overlaps between two files.
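To make the four operations concrete, here is a simplified Python model of each on a single 0-based, half-open interval. It is a sketch of the ideas only: clamping to the chromosome length stands in for the genome file bedtools requires, `subtract` models removing just the overlapped portions, and strand handling and other options are omitted.

```python
def flank(iv, size, chrom_len):
    """Return the regions of `size` bases on either side of the interval."""
    start, end = iv
    left = (max(0, start - size), start)
    right = (end, min(chrom_len, end + size))
    return [f for f in (left, right) if f[0] < f[1]]  # drop empty flanks


def slop(iv, size, chrom_len):
    """Grow the interval by `size` bases in both directions."""
    start, end = iv
    return (max(0, start - size), min(chrom_len, end + size))


def shift(iv, offset, chrom_len):
    """Move the interval by `offset` bases, clamped to the chromosome."""
    def clamp(x):
        return min(max(0, x), chrom_len)
    start, end = iv
    return (clamp(start + offset), clamp(end + offset))


def subtract(iv, others):
    """Remove from `iv` every portion overlapped by an interval in `others`."""
    pieces = []
    cursor, end = iv
    for o_start, o_end in sorted(others):
        if o_end <= cursor or o_start >= end:
            continue  # no overlap with what remains
        if o_start > cursor:
            pieces.append((cursor, o_start))
        cursor = max(cursor, o_end)
        if cursor >= end:
            break
    if cursor < end:
        pieces.append((cursor, end))
    return pieces


print(flank((100, 200), 10, 1000))
print(slop((100, 200), 10, 1000))
print(shift((100, 200), 50, 1000))
print(subtract((100, 200), [(120, 150), (190, 250)]))
```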
edawson /
Created May 17, 2019 14:19 — forked from andy-thomason/
Genomics: a programmer's introduction

Genomics - A programmer's guide.

Andy Thomason is a Senior Programmer at Genomics PLC. He has been writing graphics systems, games and compilers since the '70s and specialises in code performance.

edawson / readBam.C
Created March 19, 2020 03:02 — forked from PoisonAlien/readBam.C
reading bam files in C using htslib
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <htslib/sam.h>
int main(int argc, char *argv[]){
    samFile *fp_in = hts_open(argv[1], "r"); // open BAM file
    bam_hdr_t *bamHdr = sam_hdr_read(fp_in); // read header
    bam1_t *aln = bam_init1(); // initialize an alignment