Skip to content

Instantly share code, notes, and snippets.

@bsmith89
Last active March 22, 2017 15:51
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bsmith89/02a6ee811b191829f87556b21b2b5482 to your computer and use it in GitHub Desktop.
Save bsmith89/02a6ee811b191829f87556b21b2b5482 to your computer and use it in GitHub Desktop.
A boiled-down version of the commands I use to remove mouse sequence from my Illumina HiSeq reads.
#!/usr/bin/env bash
# Remove pairs which hit the mouse reference.
#
# I haven't tested this script, but you should be able to use the commands in it
# to do your contaminant filtering.
# It would probably be best to feed in fully trimmed reads. Otherwise you
# might want to use "local" mapping.
# See: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#end-to-end-alignment-versus-local-alignment
# Adapter trim and quality trim your HiSeq reads
# Download your reference genome (in this case it's compressed using gzip)
wget -P mouse.fn.gz ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/vertebrate_mammalian/Mus_musculus/latest_assembly_versions/GCA_001632575.1_C3H_HeJ_v1/GCA_001632575.1_C3H_HeJ_v1_genomic.fna.gz
# Unzip the reference. My reference is in FASTA format
gunzip -c mouse.fn.gz > mouse.fn
# I use the ".fn" extension to mean nucleotide FASTA
bowtie2-build mouse.fn mouse
# This produces mouse.1.bt2 mouse.2.bt2 mouse.3.bt2 mouse.4.bt2 mouse.rev.1.bt2 mouse.rev.2.bt2
# Assuming your adapter-trimmed, quality-trimmed sequence data is in a compressed FASTQ format
# (one per end): reads.r1.trim.fastq.gz reads.r2.trim.fastq.gz
# Map each of the paired-end reads (in fastq format) to the mouse reference
bowtie2 -x mouse -q -1 reads.r1.trim.fastq.gz -2 reads.r2.trim.fastq.gz > reads.trim.sam
# Use the samtools tool called "fastq" which converts SAM files to FASTQ format.
# Only output reads if they match the flag "12" which means that
# neither the focal read, nor its paired end mapped to the reference.
# See "FLAGS" on http://www.htslib.org/doc/samtools.html
samtools fastq reads.trim.sam -f12 -n -1 reads.trim.r1.fastq -2 reads.trim.r2.fastq
# To avoid producing what is probably a VERY large intermediate SAM file (and, conveniently,
# compressing the output files), you could combine the previous two commands into:
#
# bowtie2 -x mouse -q -1 reads.r1.trim.fastq.gz -2 reads.r2.trim.fastq.gz \
# | samtools fastq - -f12 -n \
# -1 >(gzip -c > seq/split/$*.r1.trim.filt.fastq.gz) \
# -2 >(gzip -c > seq/split/$*.r2.trim.filt.fastq.gz)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment