@brentp
Created September 15, 2011 15:35
solid-trimmer help

BWA, BFAST, Mosaik, Bowtie and LifeScope do not all take colorspace reads in the same format. There is no tool available to trim colorspace reads and output a format that is compatible with all of those aligners. This is an attempt to fill that gap.

Compatibility

It can trim reads and write output for:
  • BWA (-p ends in .fq or .fastq and --encode is specified)
  • BFAST/Mosaik (-p ends in .fq or .fastq and --encode is NOT specified)
  • Bowtie/LifeScope (-p does NOT end in .fq or .fastq and --encode is NOT specified)
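The three combinations above can be read as a small decision rule. The following is a minimal sketch of that rule, not the script's actual code: the function name output_target is hypothetical, the .gz suffixes are taken from the -p help text, and raising an error for --encode with a non-FASTQ prefix is an assumption, since that combination is not documented here.

```python
FASTQ_SUFFIXES = (".fq", ".fastq", ".fq.gz", ".fastq.gz")


def output_target(prefix, encode=False):
    """Return the aligner family the output would suit (hypothetical helper).

    Mirrors the documented combinations of -p and --encode only.
    """
    is_fastq = prefix.endswith(FASTQ_SUFFIXES)
    if is_fastq:
        # A single FASTQ file: doubly encoded for BWA, plain for BFAST/Mosaik.
        return "BWA" if encode else "BFAST/Mosaik"
    if encode:
        # Assumption: this combination is not described in the help text.
        raise ValueError("--encode is only documented for FASTQ output")
    # Separate .csfasta and .qual files.
    return "Bowtie/LifeScope"


if __name__ == "__main__":
    for prefix, encode in [("example_ma.fastq", True),
                           ("example_ma.fq", False),
                           ("example_ma", False)]:
        print(prefix, encode, "->", output_target(prefix, encode))
```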

Examples

This example will create the files example_ma_F3.csfasta and example_F3_QV.qual. It trims reads by computing a moving average of the quality values over a window of 7 bases and, starting from the left end, chopping each read as soon as that average drops below 12. It will truncate reads with more than 3 '.'s. It will then discard any reads with a length less than 25:

$ solid-trimmer.py -c $CSFASTA \
             -q  $QUAL \
             -p example_ma \
             --max-ns 3 \
             --moving-average 7:12 \
             --min-read-length 25

If -p were example_ma.fastq, a single .fastq file would be created for BFAST/Mosaik. Adding --encode would make that file compatible with BWA.
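For readers unfamiliar with double encoding, here is a minimal sketch of the convention as used by BWA's own solid2fastq.pl converter: the primer base and the first color are dropped, and the remaining colors 0, 1, 2, 3 and '.' are written as A, C, G, T and N. Whether solid-trimmer.py performs exactly these steps is an assumption; the function name double_encode is hypothetical.

```python
# Colors 0-3 become A, C, G, T; a missing color call '.' becomes N.
COLOR_TO_LETTER = str.maketrans("0123.", "ACGTN")


def double_encode(csread):
    """Double-encode one colorspace read such as 'T0123.10'.

    Assumption: the first character is the primer base and the second is
    the first color; both are dropped before translating.
    """
    return csread[2:].translate(COLOR_TO_LETTER)


if __name__ == "__main__":
    print(double_encode("T0123.10"))
```

If the script follows the same convention, the quality string would need to lose its first value as well, so that sequence and qualities stay the same length.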

If --min-read-length is not specified, all reads are kept, no matter how short. This makes it simpler to post-process paired-end reads for joint filtering.
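A joint filter of that kind could look like the sketch below. It assumes both mate files were trimmed without --min-read-length, so they still hold the same reads in the same order, and that each read is available as a (name, colors) tuple with the primer base already removed. The function name filter_pairs and the read names are hypothetical.

```python
def filter_pairs(mates1, mates2, min_len):
    """Keep only pairs in which BOTH trimmed mates have at least min_len colors.

    mates1 and mates2 are equal-length sequences of (name, colors) tuples
    in the same order (an assumption of this sketch).
    """
    if len(mates1) != len(mates2):
        raise ValueError("mate lists must contain the same number of reads")
    kept = []
    for (name1, seq1), (name2, seq2) in zip(mates1, mates2):
        if len(seq1) >= min_len and len(seq2) >= min_len:
            kept.append(((name1, seq1), (name2, seq2)))
    return kept


if __name__ == "__main__":
    first = [("r1_F3", "0" * 30), ("r2_F3", "0" * 10)]
    second = [("r1_R3", "1" * 28), ("r2_R3", "1" * 40)]
    for pair in filter_pairs(first, second, 25):
        print(pair[0][0], pair[1][0])
```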

optional arguments:

-h, --help show this help message and exit

inputs/outputs:

-c C                  csfasta file
-q Q                  qual file
-p PREFIX             prefix of the output files (does not include the
                      '_F3'). If this ends with .fastq[.gz] or .fq[.gz],
                      the output is a single fastq file rather than new
                      .csfasta and qual files
--encode              output doubly encoded FASTQ sequences, e.g. for use
                      in BWA. default is False, for use in Mosaik, BFAST

trimming:

options for trimming:

--min-qual MINQ       bases with quality below this value will be trimmed
                      from the end
--max-ns MAXN         reads with more than this number of '.'s are chopped
--moving-average MA   compute a moving average of window-size `window` on
                      the quals and chop as soon as the moving average
                      drops below `min`. Specified as window:min, e.g.
                      7:12. The window must be odd
--q-trim QTRIM        BWA's -q parameter for quality trimming. default: -1,
                      which means no trimming
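
The --moving-average option can be illustrated with a short sketch. This is not the script's code: it assumes the window is centred on each base (which would explain why it must be odd), that the window is simply shortened at the two ends of the read, and that the read is chopped at the first position whose average falls below `min`. The function name moving_average_trim is hypothetical.

```python
def moving_average_trim(quals, window, min_avg):
    """Return how many leading bases to keep from a list of quality values.

    Scans from the left end and stops at the first position whose centred
    moving average is below min_avg. Edge handling is an assumption.
    """
    if window % 2 == 0:
        raise ValueError("the window must be odd")
    half = window // 2
    for i in range(len(quals)):
        lo = max(0, i - half)
        hi = min(len(quals), i + half + 1)
        if sum(quals[lo:hi]) / (hi - lo) < min_avg:
            return i
    return len(quals)


if __name__ == "__main__":
    quals = [30] * 10 + [5] * 10
    keep = moving_average_trim(quals, 7, 12)  # same values as 7:12
    print(keep, quals[:keep])
```

Because the average is taken over a window, a few low-quality bases next to high-quality ones survive the cut; a smaller window chops closer to the first bad base.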

filtering:

By default no filtering is done

--min-read-length MIN_LEN
                      reads shorter than this after trimming are not
                      printed. default: 0
