Last active November 26, 2019 17:55
Procedure for removing contaminants from paired-end sequence data. The bwa-mem algorithm maps reads against a database of contaminants; a small Perl one-liner filters out reads that map to the contaminants; the SAM data is converted to BAM format, which is then fed (via process substitution) to TopHat's bam2fastx to convert b…
# -q: output in Fastq format
# -Q: ignore BAM quality flags
# -P: paired-end data
bam2fastx -qQP -o clean.fq <(bwa mem contaminants.fasta reads.1.fq reads.2.fq | \
    perl -ne '@v = split(/\t/); print if(m/^@/ or ($v[1] & 4 and $v[1] & 8))' | \
    samtools view -bhS -)
Great, one more question. I am assuming you use the filtered SAM output to create the BAM file, not the original SAM output from bwa mem.