@jrherr
Forked from standage/fq-strip-contam.sh
Last active September 9, 2015 15:28
Procedure for removing contaminants from paired-end sequence data. The bwa-mem algorithm maps reads against a database of contaminants; a small Perl one-liner filters out reads that map to the contaminants; the SAM data is then converted to BAM format and fed (via process substitution) to Tophat's bam2fastx to convert b…
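# Prerequisite: contaminants.fasta must already be indexed (bwa index contaminants.fasta).
# bam2fastx options:
# -q: output in Fastq format
# -Q: ignore the BAM QC-fail flag
# -P: paired-end data
# The Perl filter keeps SAM header lines plus records whose FLAG has both
# 0x4 (read unmapped) and 0x8 (mate unmapped) set, i.e. pairs with no contaminant hit.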
bam2fastx -qQP -o clean.fq <(bwa mem contaminants.fasta reads.1.fq reads.2.fq | \
perl -ne '@v = split(/\t/); print if(m/^@/ or ($v[1] & 4 and $v[1] & 8))' | \
samtools view -bhS -)