Skip to content

Instantly share code, notes, and snippets.

@jrherr
Forked from davfre/bamfilter_oneliners.md
Last active September 9, 2015 15:19
Show Gist options
  • Save jrherr/a9e97c8f1008f2ce0bc3 to your computer and use it in GitHub Desktop.
Save jrherr/a9e97c8f1008f2ce0bc3 to your computer and use it in GitHub Desktop.

SAM and BAM filtering one-liners

@author: David Fredman, david.fredmanAAAAAA@gmail.com (sans poly-A tail)
@dependencies: http://sourceforge.net/projects/bamtools/ and http://samtools.sourceforge.net/

Please comment or extend with additional/faster/better solutions.

BWA mapping (using piping for minimal disk I/O)

bwa aln -t 8 targetGenome.fa reads.fastq | bwa samse targetGenome.fa - reads.fastq\
| samtools view -bt targetGenome.fa - | samtools sort - reads.bwa.targetGenome

samtools index reads.bwa.targetGenome.bam

Count number of reads/mapped locations in a bam file:

samtools view -c aligned_reads.bam

Count with flagstat for additional information:

samtools flagstat reads.bam

Count mapped locations only (some reads to multiple locations)

samtools view -F 0x04 -c aligned_reads

Count number of mapped reads (not mapped locations) for left and right mate in read pairs

samtools view -F 0x40 aligned_reads.bam | cut -f1 | sort | uniq | wc -l
samtools view -f 0x40 -F 0x4 aligned_reads.bam | cut -f1 | sort | uniq | wc -l #left mate
samtools view -f 0x80 -F 0x4 aligned_reads.bam | cut -f1 | sort | uniq  | wc -l #right mate

Remove unmapped reads, keep the mapped reads:

samtools view -F 0x04 -b in.bam > out.aligned.bam

Count UNmapped reads:

samtools view -f4 -c aligned_reads.bam

Require minimum mapping quality (to retain reliably mapped reads):

samtools view -q 30 -b reads.bam > reads.q30.bam
samtools view -q 30 -c reads.bam #to count

Require match to be on the sense strand of the reference (samtools flag)

samtools view -F 16

Require match to be on antisense strand (samtools flag)

samtools view -f 16

Require at least N matches at the start of the read:

$N=6
samtools view aligned_reads.bam \
| perl -lane 'next unless $F[5] =~ /^(\d+)M/;print if $1 >= $N;'

Filter by number of mismatches in BWA generated output, use BWA-specific flag:

Tag Meaning
NM     Edit distance
MD     Mismatching positions/bases
AS     Alignment score
BC     Barcode sequence
X0     Number of best hits
X1     Number of suboptimal hits found by BWA
XN     Number of ambiguous bases in the reference
XM     Number of mismatches in the alignment
XO     Number of gap opens
XG     Number of gap extentions
XT     Type: Unique/Repeat/N/Mate-sw
XA     Alternative hits; format: (chr,pos,CIGAR,NM;)*
XS     Suboptimal alignment score
XF     Support from forward/reverse alignment
XE     Number of supporting seeds

To keep only reads that map without any mismatches:

bamtools filter -tag XM:0 -in reads.bam -out reads.noMismatch.bam

Retain only uniquely mapping reads:

Here we need to filter by the BWA XT flag value U for unique (or R for repeat if you are curious). I did not find a simple way to do this with samtools or bamtools, so grep to the resque:

samtools view reads.bam | grep 'XT:A:U' | samtools view -bS -T referenceSequence.fa - > reads.uniqueMap.bam
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment