Create a gist now

Instantly share code, notes, and snippets.

Embed
What would you like to do?
SAM and BAM filtering oneliners

SAM and BAM filtering one-liners

@author: David Fredman, david.fredmanAAAAAA@gmail.com (sans poly-A tail)
@dependencies: http://sourceforge.net/projects/bamtools/ and http://samtools.sourceforge.net/

Please extend with additional/faster/better solutions via a pull request!

BWA mapping (using piping for minimal disk I/O)

bwa aln -t 8 targetGenome.fa reads.fastq | bwa samse targetGenome.fa - reads.fastq\
| samtools view -bt targetGenome.fa - | samtools sort - reads.bwa.targetGenome

samtools index reads.bwa.targetGenome.bam

Count number of records (unmapped reads + each aligned location per mapped read) in a bam file:

samtools view -c filename.bam

Count with flagstat for additional information:

samtools flagstat filename.bam

Count the number of alignments (reads mapping to multiple locations counted multiple times)

samtools view -F 0x04 -c filename.bam

Count number of mapped reads (not mapped locations) for left and right mate in read pairs

samtools view -F 0x40 filename.bam | cut -f1 | sort | uniq | wc -l
samtools view -f 0x40 -F 0x4 filename.bam | cut -f1 | sort | uniq | wc -l #left mate
samtools view -f 0x80 -F 0x4 filename.bam | cut -f1 | sort | uniq  | wc -l #right mate

Remove unmapped reads, keep the mapped reads:

samtools view -F 0x04 -b in.bam > out.aligned.bam

Count UNmapped reads:

samtools view -f4 -c in.bam

Require minimum mapping quality (to retain reliably mapped reads):

samtools view -q 30 -b in.bam > aligned_reads.q30.bam
samtools view -q 30 -c in.bam #to count alignments with score >30

Require match to be on the sense strand of the reference (samtools flag)

samtools view -F 16

Require match to be on antisense strand (samtools flag)

samtools view -f 16

Require at least N matches at the start of the read:

$N=6
samtools view in.bam \
| perl -lane 'next unless $F[5] =~ /^(\d+)M/;print if $1 >= $N;'

Filter by number of mismatches in BWA generated output, use BWA-specific flag:

Tag Meaning
NM     Edit distance
MD     Mismatching positions/bases
AS     Alignment score
BC     Barcode sequence
X0     Number of best hits
X1     Number of suboptimal hits found by BWA
XN     Number of ambiguous bases in the reference
XM     Number of mismatches in the alignment
XO     Number of gap opens
XG     Number of gap extentions
XT     Type: Unique/Repeat/N/Mate-sw
XA     Alternative hits; format: (chr,pos,CIGAR,NM;)*
XS     Suboptimal alignment score
XF     Support from forward/reverse alignment
XE     Number of supporting seeds

To keep only reads that map without any mismatches:

bamtools filter -tag XM:0 -in reads.bam -out reads.noMismatch.bam

Retain only uniquely mapping reads (reads with a single unambigous mapping location):

If BWA was used it is possible to use the BWA XT flag value U for unique (analogously, R is for repeat). I did not find a simple way to do this with samtools or bamtools, so grep to the rescue:

samtools view reads.bam | grep 'XT:A:U' | samtools view -bS -T referenceSequence.fa - > reads.uniqueMap.bam

However, the concept of "uniquely mapping" is not the cleanest idea - in most scenarios any given read could be placed elsewhere although it may be a lower scoring alignment. Thus, you could instead filter based on mapping quality, to retain the "reliably mapped" reads. Different mappers have different scoring models. As a rule of thumb, min values of 5 or 10 will work well. If you used bowtie/bowtie2, try:

samtools view -b -q 10 foo.bam > foo.filtered.bam
@bakerwm

This comment has been minimized.

Show comment
Hide comment
@bakerwm

bakerwm May 1, 2015

What is the definition of uniquely reads in your last section?

bakerwm commented May 1, 2015

What is the definition of uniquely reads in your last section?

@darencard

This comment has been minimized.

Show comment
Hide comment
@darencard

darencard Sep 22, 2015

Probably reads that map only once ("uniquely") in your reference.

Probably reads that map only once ("uniquely") in your reference.

@davfre

This comment has been minimized.

Show comment
Hide comment
@davfre

davfre Jan 15, 2016

Yes! "uniquely mapping reads" = "one unambiguous mapping location". Thanks for the input, I clarified the section.
[Pardon the very late reply - I was not receiving notifications from github and only saw this now]

Owner

davfre commented Jan 15, 2016

Yes! "uniquely mapping reads" = "one unambiguous mapping location". Thanks for the input, I clarified the section.
[Pardon the very late reply - I was not receiving notifications from github and only saw this now]

@nalini1507

This comment has been minimized.

Show comment
Hide comment
@nalini1507

nalini1507 Feb 28, 2016

Hi! Could you complete your answer on " if bowtie was used"..
I basically want to extract uniquely mapped reads from a bam/sam file that has been obtained using bowtie2. Please help! Thanks!

Hi! Could you complete your answer on " if bowtie was used"..
I basically want to extract uniquely mapped reads from a bam/sam file that has been obtained using bowtie2. Please help! Thanks!

@davfre

This comment has been minimized.

Show comment
Hide comment
@davfre

davfre Apr 26, 2016

Answer updated to state that filtering on mapping quality than "uniqueness" is a concept that translates better across tools, and probably what you should do when using bowtie2 if you wish to remove reads with ambiguous mapping locations.

Owner

davfre commented Apr 26, 2016

Answer updated to state that filtering on mapping quality than "uniqueness" is a concept that translates better across tools, and probably what you should do when using bowtie2 if you wish to remove reads with ambiguous mapping locations.

@WinterLi1993

This comment has been minimized.

Show comment
Hide comment
@WinterLi1993

WinterLi1993 Jun 23, 2016

HI, I wanna keep uniquely mapped reads from a bam file that has been obtained using BWA version 0.7.10 mem . Please help ! thanks !

HI, I wanna keep uniquely mapped reads from a bam file that has been obtained using BWA version 0.7.10 mem . Please help ! thanks !

@cnoutsos

This comment has been minimized.

Show comment
Hide comment
@cnoutsos

cnoutsos Jul 8, 2016

Hi,
I have a bam file generated by bowtie (-m 100 and -k 100), I want to keep reads that mapped in a location at least 100 times. Is there a way to do that? thanks!

cnoutsos commented Jul 8, 2016

Hi,
I have a bam file generated by bowtie (-m 100 and -k 100), I want to keep reads that mapped in a location at least 100 times. Is there a way to do that? thanks!

@jmf11

This comment has been minimized.

Show comment
Hide comment
@jmf11

jmf11 Sep 2, 2016

Since primary alignments only appear once and all mapped reads must have a primary alignment, can't you replace this:

samtools view -f 0x40 -F 0x4 filename.bam | cut -f1 | sort | uniq | wc -l #left mate

with:

samtools view -c -f 0x40 -F 0x904 filename.bam

This will exclude unmapped (0x4), secondary (0x100), and supplementary (0x800) alignments.

Am I right?

jmf11 commented Sep 2, 2016

Since primary alignments only appear once and all mapped reads must have a primary alignment, can't you replace this:

samtools view -f 0x40 -F 0x4 filename.bam | cut -f1 | sort | uniq | wc -l #left mate

with:

samtools view -c -f 0x40 -F 0x904 filename.bam

This will exclude unmapped (0x4), secondary (0x100), and supplementary (0x800) alignments.

Am I right?

@davfre

This comment has been minimized.

Show comment
Hide comment
@davfre

davfre Nov 11, 2016

@jmf11: that looks very sensible! I just haven't seen these comments.. if you'd like, please make a pull request with the fix!

Owner

davfre commented Nov 11, 2016

@jmf11: that looks very sensible! I just haven't seen these comments.. if you'd like, please make a pull request with the fix!

@davfre

This comment has been minimized.

Show comment
Hide comment
@davfre

davfre Nov 11, 2016

@cnoutsos: did you find out how to? feel free to insert a line on your solution :)

Owner

davfre commented Nov 11, 2016

@cnoutsos: did you find out how to? feel free to insert a line on your solution :)

@eldariont

This comment has been minimized.

Show comment
Hide comment
@eldariont

eldariont Mar 16, 2017

I'm unsure whether this really computes the number of mapped reads:
samtools view -F 0x40 filename.bam | cut -f1 | sort | uniq | wc -l

Shouldn't it be:
samtools view -F 0x4 filename.bam | cut -f1 | sort | uniq | wc -l

To my knowledge, the 0x4 flag marks unmapped segments :-)

I'm unsure whether this really computes the number of mapped reads:
samtools view -F 0x40 filename.bam | cut -f1 | sort | uniq | wc -l

Shouldn't it be:
samtools view -F 0x4 filename.bam | cut -f1 | sort | uniq | wc -l

To my knowledge, the 0x4 flag marks unmapped segments :-)

@Leelouh

This comment has been minimized.

Show comment
Hide comment
@Leelouh

Leelouh May 8, 2017

Hi, I want to remove all the "second alignment". Do you know how I can do that?

Leelouh commented May 8, 2017

Hi, I want to remove all the "second alignment". Do you know how I can do that?

@otradnaya

This comment has been minimized.

Show comment
Hide comment
@otradnaya

otradnaya Nov 1, 2017

To remove secondary alignments, you can use
samtools view -h -F 0x900 filename.bam
Explanation: remove secondary (0x100) and supplementary (0x800) alignments

To remove secondary alignments, you can use
samtools view -h -F 0x900 filename.bam
Explanation: remove secondary (0x100) and supplementary (0x800) alignments

@kruacademic

This comment has been minimized.

Show comment
Hide comment
@kruacademic

kruacademic Apr 11, 2018

hi i want to retrieve only 22 match sequence read from the bam file.can you suggest in detail?

hi i want to retrieve only 22 match sequence read from the bam file.can you suggest in detail?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment