Eric T. Dawson edawson

@edawson
edawson / import_vcf_into_df.R
Created June 18, 2017 17:55 — forked from sephraim/import_vcf_into_df.R
Import VCF file into data frame in R
library(vcfR)
# Import VCF
my.vcf <- read.vcfR('my.vcf.gz')
# Combine CHROM thru FILTER cols + INFO cols
my.vcf.df <- cbind(as.data.frame(getFIX(my.vcf)), INFO2df(my.vcf))
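For readers without R, the same idea (fixed columns CHROM through FILTER plus one column per INFO key) can be sketched in plain Python. This is an illustrative sketch, not a port of vcfR: the names `parse_info`, `vcf_rows` and `read_vcf` are hypothetical, and storing INFO flags as the string "TRUE" is an assumption made here.

```python
import gzip
from typing import Dict, Iterable, Iterator, List

# The eight fixed VCF columns, in the order the VCF specification defines them.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info: str) -> Dict[str, str]:
    """Split an INFO string such as 'DP=10;DB' into a dict (flags map to 'TRUE')."""
    fields: Dict[str, str] = {}
    if info in ("", "."):
        return fields
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else "TRUE"  # assumption: flags become 'TRUE'
    return fields


def vcf_rows(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per variant: CHROM through FILTER plus the expanded INFO keys."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, the header line, and blank lines
        values = line.rstrip("\n").split("\t")
        row = dict(zip(FIXED_COLUMNS[:7], values[:7]))
        row.update(parse_info(values[7] if len(values) > 7 else "."))
        yield row


def read_vcf(path: str) -> List[Dict[str, str]]:
    """Hypothetical helper: read a gzip-compressed VCF into a list of row dicts."""
    with gzip.open(path, "rt") as handle:
        return list(vcf_rows(handle))
```

Calling `read_vcf('my.vcf.gz')` would return a list of dicts that can be handed to any table library; all values are kept as strings.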
edawson / repo-rinse.sh
Created January 30, 2018 18:23 — forked from nicktoumpelis/repo-rinse.sh
Cleans and resets a git repo and its submodules
git clean -xfd
git submodule foreach --recursive git clean -xfd
git reset --hard
git submodule foreach --recursive git reset --hard
git submodule update --init --recursive
edawson / processCentroAndTelo.py
Created July 20, 2018 10:33
Convert the default UCSC table browser format to BRASS-compatible centro/telomere dictionary file (python3).
import sys
import gzip
from collections import defaultdict
if __name__ == "__main__":
    d_telo = defaultdict(list)
    d_centro = {}
    with gzip.open(sys.argv[1], "rt") as ifi:
        for line in ifi:
edawson / fastq_splitter.sh
Last active February 5, 2019 14:07
Split a FASTQ (or pair) into 1M-read splits (4,000,000 lines each) using GNU split and pigz. Modified from an original script by @ekg.
first_reads=$1
second_reads=$2
ddir=$(dirname "$first_reads")
obase_first=$(basename "$first_reads" .fastq.gz)
obase_second=$(basename "$second_reads" .fastq.gz)
# Lines per split: a FASTQ record is 4 lines, so this is 1,000,000 reads.
splitsz=4000000
if [ -n "${first_reads}" ] && [ -e "${first_reads}" ]
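The `splitsz` value is a line count, not a read count: at four lines per FASTQ record, 4000000 lines is 1,000,000 reads. A minimal Python sketch of the same chunking logic follows. It assumes strict four-line records (no wrapped sequence lines), and `lines_per_split` and `split_fastq` are hypothetical names, not part of the script above.

```python
from itertools import islice
from typing import Iterable, Iterator, List

LINES_PER_READ = 4  # header, sequence, '+', qualities


def lines_per_split(reads_per_split: int) -> int:
    """Convert a number of reads into the line count a line-based splitter needs."""
    return reads_per_split * LINES_PER_READ


def split_fastq(lines: Iterable[str], reads_per_split: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of FASTQ lines, each holding at most reads_per_split reads."""
    iterator = iter(lines)
    size = lines_per_split(reads_per_split)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

For paired files, running the same split size over both mates keeps read pairs in matching chunks, provided the two files list reads in the same order.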
edawson / picard_intervals_to_bed.sh
Created December 11, 2018 01:24
Convert a Picard interval list file to a bgzip-compressed, tabix-indexed BED file.
# Picard interval lists are 1-based and inclusive; BED starts are 0-based, so subtract 1 from the start.
bed="$(dirname "$1")/$(basename "$1" .interval_list).bed"
grep -v "^@" "$1" | awk 'BEGIN{OFS="\t"} {print $1, $2-1, $3, $5}' > "$bed" && \
bgzip "$bed" && \
tabix -p bed "$bed.gz"
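The column mapping can also be sketched in Python. Picard interval lists use 1-based, inclusive coordinates while BED uses 0-based, half-open ones, so this sketch subtracts one from the start and leaves the end unchanged. The function name `interval_list_to_bed` is hypothetical, and compression and indexing are left to bgzip and tabix.

```python
from typing import Iterable, Iterator


def interval_list_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Convert Picard interval_list lines to 4-column BED lines (chrom, start, end, name)."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip the SAM-style header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Column 4 is the strand; column 5 is the interval name.
        name = fields[4] if len(fields) > 4 else "."
        # 1-based inclusive start becomes 0-based; the end is already correct for half-open.
        yield f"{chrom}\t{start - 1}\t{end}\t{name}"
```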
edawson / example.sh
Last active March 5, 2019 20:42 — forked from arq5x/example.sh
Natural sort a VCF
chmod a+x vcfsort.sh
vcfsort.sh trio.trim.vep.vcf.gz
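The contents of `vcfsort.sh` are not shown in this listing, so the following is only a sketch of what "natural sort" means for VCF records: chromosome names are compared by their numeric parts (so chr2 comes before chr10), then records are ordered by position. The names `natural_key` and `sort_vcf_lines` are hypothetical.

```python
import re
from typing import List, Tuple, Union


def natural_key(text: str) -> List[Tuple[int, Union[int, str]]]:
    """Build a sort key in which digit runs compare as numbers rather than as text."""
    key: List[Tuple[int, Union[int, str]]] = []
    # re.split with a capture group alternates non-digit and digit runs;
    # the digit runs always sit at odd indices.
    for index, part in enumerate(re.split(r"(\d+)", text)):
        if not part:
            continue
        key.append((0, int(part)) if index % 2 else (1, part))
    return key


def sort_vcf_lines(lines: List[str]) -> List[str]:
    """Return header lines unchanged, followed by records sorted by chromosome and position."""
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if line.strip() and not line.startswith("#")]

    def record_key(line: str):
        chrom, pos = line.split("\t", 2)[:2]
        return (natural_key(chrom), int(pos))

    return header + sorted(records, key=record_key)
```

A plain lexical sort would place chr10 before chr2; the tagged tuples avoid that without ever comparing an integer to a string.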
edawson / wdl_idioms.wdl
Last active March 3, 2019 17:16
An example WDL file which documents some idioms of the language
## Tasks are upper camel-cased
task CheckSex{
    File sampleBAM
    File sampleIndex
    ## Optional parameters receive a '?' after the type
    Int? diskGB
    ## select_first can be used to set default values;
    ## the result needs its own typed declaration
    Int diskSizeGB = select_first([diskGB, 100])
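For readers more used to Python, the default-value idiom can be mimicked as below. This is an analogy, not WDL: it assumes an undefined optional is represented by `None`, and it raises an error when nothing is defined, mirroring the fact that WDL's `select_first` fails at runtime in that case.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_first(values: Sequence[Optional[T]]) -> T:
    """Return the first value that is not None; fail if every value is None."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("select_first: no defined value in the list")


# An unset optional falls through to the default of 100.
disk_gb: Optional[int] = None
disk_size_gb = select_first([disk_gb, 100])
```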

Bedtools Cheatsheet

General:

Tool      Description
flank     Create new intervals from the flanks of existing intervals.
slop      Adjust the size of intervals.
shift     Adjust the position of intervals.
subtract  Remove intervals based on overlaps between two files.
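The four operations in the table can be illustrated on single intervals in Python. This is a simplified sketch, not bedtools itself: it uses 0-based, half-open `(start, end)` tuples, ignores strand, and clips only at zero because no genome file with chromosome sizes is modelled. All function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end)


def slop(interval: Interval, size: int) -> Interval:
    """Grow an interval by `size` bases on both sides."""
    start, end = interval
    return (max(0, start - size), end + size)


def shift(interval: Interval, offset: int) -> Interval:
    """Move an interval by `offset` bases without changing its length."""
    start, end = interval
    return (start + offset, end + offset)


def flank(interval: Interval, size: int) -> List[Interval]:
    """Return the regions of `size` bases immediately left and right of an interval."""
    start, end = interval
    return [(max(0, start - size), start), (end, end + size)]


def subtract(interval: Interval, others: List[Interval]) -> List[Interval]:
    """Remove from `interval` every portion that overlaps an interval in `others`."""
    pieces = [interval]
    for other_start, other_end in others:
        remaining: List[Interval] = []
        for start, end in pieces:
            if other_end <= start or other_start >= end:
                remaining.append((start, end))  # no overlap, keep the piece whole
                continue
            if start < other_start:
                remaining.append((start, other_start))  # part left of the overlap
            if other_end < end:
                remaining.append((other_end, end))  # part right of the overlap
        pieces = remaining
    return pieces
```

Note that `slop` keeps the original interval and enlarges it, while `flank` returns only the newly created neighbouring regions.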
edawson / Genomics_A_Programmers_Guide.md
Created May 17, 2019 14:19 — forked from andy-thomason/Genomics_A_Programmers_Guide.md
Genomics: a programmer's introduction

Genomics - A programmer's guide.

Andy Thomason is a Senior Programmer at Genomics PLC. He has been writing graphics systems, games and compilers since the '70s and specialises in code performance.

https://www.genomicsplc.com

edawson / readBam.C
Created March 19, 2020 03:02 — forked from PoisonAlien/readBam.C
Reading BAM files in C using htslib
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <htslib/sam.h>
int main(int argc, char *argv[]){
    samFile *fp_in = hts_open(argv[1], "r"); // open BAM file
    bam_hdr_t *bamHdr = sam_hdr_read(fp_in); // read header
    bam1_t *aln = bam_init1(); // initialize an alignment