start new:
tmux
start new with session name:
tmux new -s myname
Andy Thomason is a Senior Programmer at Genomics PLC. He has been writing graphics systems, games and compilers since the '70s and specialises in code performance.
References:
Steps:
https://basespace.illumina.com/sample/9804795/files/tree/NA12878-L1_S1_L001_R1_001.fastq.gz?id=515013503
The "id" is the unique file identifier.
wget -O filename 'https://api.basespace.illumina.com/v1pre3/files/{id}/content?access_token={token}'
where {token} is from step 1 and {id} is from step 2.
I'm fleshing out some of these ideas here: https://github.com/lynaghk/todoFRP/tree/master/todo/angular-cljs
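The BaseSpace download steps above (read the "id" from the file URL, then substitute it and the token into the API URL) can be sketched in Python. This is a minimal illustration, not an official client: the function names `file_id_from_url` and `content_url` and the placeholder token `YOUR_TOKEN` are hypothetical, and the URL template is simply the one from the wget command.

```python
from urllib.parse import urlparse, parse_qs

# URL template copied from the wget command in the steps above.
API_TEMPLATE = (
    "https://api.basespace.illumina.com/v1pre3/files/{id}/content"
    "?access_token={token}"
)


def file_id_from_url(file_url):
    """Return the value of the 'id' query parameter of a BaseSpace file URL."""
    query = parse_qs(urlparse(file_url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError("no 'id' query parameter in URL: " + file_url)
    return ids[0]


def content_url(file_url, token):
    """Build the download URL by filling {id} and {token} into the template."""
    return API_TEMPLATE.format(id=file_id_from_url(file_url), token=token)


if __name__ == "__main__":
    url = (
        "https://basespace.illumina.com/sample/9804795/files/tree/"
        "NA12878-L1_S1_L001_R1_001.fastq.gz?id=515013503"
    )
    # YOUR_TOKEN is a placeholder; the real token comes from step 1.
    print(content_url(url, "YOUR_TOKEN"))
```

The printed URL can then be passed to `wget -O filename '...'` exactly as in the steps.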
This is a small experiment on the alignment of ~50bp INDELs. The query sequences are shown in 0.01.fq below, where seq_ori is a 204bp sequence extracted from the human reference genome, seq_del54 contains a 54bp deletion in the middle, seq_del84 contains an 84bp deletion in a 120bp read, and seq_ins40 contains a 40bp insertion in a 140bp read. These four short sequences were mapped to the human reference genome with Bowtie2, BWA-MEM, LAST, Novoalign, SNAP and Stampy with default settings. Non-default scoring functions were also tested for Bowtie2 (--rdg 5,1 --rfg 5,1), BWA-MEM (-A2 -E1) and LAST (-r2 -q4). The output from the various mappers/settings can be found in this gist. The following table gives my summary:
Mapper | Setting | -84bp | -54bp | +40bp |
---|---|---|---|---|
BBMAP | default | Yes | Yes | Yes |
Bowtie2 | default | No | No | No |
Bowtie2 | --rdg 5,1 --rfg 5,1 | as insertion | as insertion | Yes |
BWA-MEM | default | as split | Yes | Yes |
BWA-MEM | -A2 -E1 | Yes | Yes | Yes |
LAST | default | as split | as split |
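To see why the gap-scoring settings matter for long deletions, here is a rough arithmetic sketch. It assumes Bowtie2's documented affine gap penalty (open + N × extend, with defaults 5,3) and its documented default minimum score in end-to-end mode (L,-0.6,-0.6, i.e. -0.6 - 0.6 × read length). The 150bp length for seq_del54 is inferred as 204 - 54 and is an assumption, and the function names are hypothetical. This is only arithmetic on the scoring function, not a prediction of what any mapper reports.

```python
def gap_penalty(gap_open, gap_extend, length):
    """Affine gap penalty: open cost plus a per-base extension cost."""
    return gap_open + gap_extend * length


def min_score_end_to_end(read_length, const=-0.6, coeff=-0.6):
    """Assumed default minimum alignment score, linear in read length."""
    return const + coeff * read_length


def gap_alone_fits(gap_open, gap_extend, gap_length, read_length):
    """True if a single gap, with no mismatches, stays above the minimum score."""
    score = -gap_penalty(gap_open, gap_extend, gap_length)
    return score >= min_score_end_to_end(read_length)


if __name__ == "__main__":
    # (read name, assumed read length, deletion length)
    cases = [("seq_del54", 150, 54), ("seq_del84", 120, 84)]
    # (label, gap open, gap extend)
    settings = [("default 5,3", 5, 3), ("relaxed 5,1", 5, 1)]
    for name, read_len, del_len in cases:
        for label, g_open, g_ext in settings:
            print(
                name,
                label,
                "penalty:", gap_penalty(g_open, g_ext, del_len),
                "min score:", min_score_end_to_end(read_len),
                "fits:", gap_alone_fits(g_open, g_ext, del_len, read_len),
            )
```

Under these assumptions, lowering the extension penalty from 3 to 1 shrinks the cost of a 54bp gap from 167 to 59, which is the kind of difference the non-default settings in the table are probing.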
These notes build from several excellent sources:
and assume you're working with GATK 2.2-16. These notes also assume
(ns pom2proj
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]
            [clojure.java.io :as io]
            [clojure.data.zip.xml :as zx])
  (:use [clojure.pprint :only [pprint]]))
(defn- text-attrs
  [loc ks]
  (map (fn [k]
############################################################
# Novoalign
############################################################
export GENOME=/home/arq5x/cphg-home/shared/genomes/hg19/bwa/gatk/hg19_gatk.fa.novo.k14.s1.idx
export IRCHOME=/net/midtier18/vol79/cphg-quinlan2/projects/irradiated-clones
export STEPNAME=ircnovo
export QSUB="qsub -W group_list=cphg_arq5x -q arq5xlab -V -l select=1:mem=32000m:ncpus=16 -N $STEPNAME -m bea -M arq5x@virginia.edu";
echo "cd $IRCHOME; novoalign -d $GENOME -o SAM $'@RG\tID:parental\tSM:parental' -r Random \
-f fastq/CgmW_AGTCAA_L001_R1.fastq.gz fastq/CgmW_AGTCAA_L001_R2.fastq.gz \
#!/bin/bash
# trim.sh - generic, slightly insane paired end quality trimming script
# Vince Buffalo <vsbuffaloAAAAAA@gmail.com> (sans poly-A)
set -e
set -u
## pre-config
ADAPTERS=illumina_adapters.fa
SAMPLE_NAME=some_sample_name
IN1=in1.fastq