BWA, BFAST/Mosaik, and Bowtie/LifeScope each take colorspace reads in a different format, and no available tool trims colorspace reads and writes output that is compatible with all of those aligners. solid-trimmer.py is an attempt to fill that gap.
It can trim reads and write output for:
- BWA (-p ends in .fq or .fastq and --encode is specified)
- BFAST/Mosaik (-p ends in .fq or .fastq and --encode is NOT specified)
- Bowtie/LifeScope (-p does NOT end in .fq or .fastq and --encode is NOT specified)
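The three cases above amount to a small decision table. The sketch below restates it in Python; `output_target` is a hypothetical helper written for illustration, not a function from solid-trimmer.py. The list does not cover `--encode` combined with a non-FASTQ prefix, so the sketch simply rejects that combination, which is an assumption.

```python
def output_target(prefix, encode=False):
    """Return the aligner family implied by -p and --encode.

    Hypothetical helper that mirrors the three cases listed above.
    """
    # The option help also allows a trailing .gz on the FASTQ suffixes.
    fastq_suffixes = (".fq", ".fastq", ".fq.gz", ".fastq.gz")
    if prefix.endswith(fastq_suffixes):
        return "BWA" if encode else "BFAST/Mosaik"
    if encode:
        # Assumption: this combination is not described, so reject it here.
        raise ValueError("--encode without a FASTQ prefix is not covered")
    return "Bowtie/LifeScope"


print(output_target("example_ma"))                     # Bowtie/LifeScope
print(output_target("example_ma.fastq"))               # BFAST/Mosaik
print(output_target("example_ma.fastq", encode=True))  # BWA
```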
This example creates the files example_ma_F3.csfasta and example_ma_F3_QV.qual. It trims reads by computing a moving average of the qualities over a window of 7 bases and, starting from the left end, keeping bases until that average drops below 12. It truncates reads with more than 3 '.'s, then discards any read shorter than 25:
$ solid-trimmer.py -c $CSFASTA \
-q $QUAL \
-p example_ma \
--max-ns 3 \
--moving-average 7:12 \
--min-read-length 25
If -p were example_ma.fastq, a single .fastq file would be created for BFAST/Mosaik. Adding --encode would make that file compatible with BWA.
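For readers unfamiliar with double encoding, the sketch below shows the conventional mapping, in which colors 0-3 are rewritten as the pseudo-bases A, C, G, T and '.' becomes N. This is the translation used by BWA's solid2fastq.pl; how solid-trimmer.py treats the primer base and its adjacent color is not stated here, so the hypothetical `double_encode` function takes a string of colors only.

```python
# Conventional double-encoding table: colors become pseudo-bases, '.' becomes N.
DOUBLE_ENCODE = str.maketrans("0123.", "ACGTN")


def double_encode(colors):
    """Translate a string of colors (no primer base) into pseudo-bases."""
    return colors.translate(DOUBLE_ENCODE)


print(double_encode("0123.10"))  # ACGTNCA
```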
If --min-read-length is not specified, all reads are kept, no matter how short. This makes it simpler to post-process paired-end reads for joint filtering.
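One possible form of that joint filtering is sketched below: each mate is trimmed separately without a length cutoff, and a pair is kept only when both mates are long enough. The function name, the dictionary layout (read name mapped to trimmed sequence), and the use of plain string length are all assumptions made for illustration; none of this is part of solid-trimmer.py.

```python
def joint_filter(mate1, mate2, min_len):
    """Keep only pairs in which both trimmed mates reach min_len.

    mate1 and mate2 map read name -> trimmed sequence (hypothetical layout).
    """
    kept = {}
    for name, seq1 in mate1.items():
        seq2 = mate2.get(name)
        # Drop the pair if the mate is missing or either read is too short.
        if seq2 is not None and len(seq1) >= min_len and len(seq2) >= min_len:
            kept[name] = (seq1, seq2)
    return kept


first = {"r1": "T0123", "r2": "T01"}
second = {"r1": "G3210", "r2": "G32100"}
print(joint_filter(first, second, 4))  # {'r1': ('T0123', 'G3210')}
```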
optional arguments:
  -h, --help   show this help message and exit

inputs/outputs:
  -c C         csfasta file
  -q Q         qual file
  -p PREFIX    prefix of the output files (does not include the
               '_F3'). if this endswith .fastq[.gz] .fq[.gz] the
               output is a single fastq file rather than new
               .csfasta, qual files
  --encode     output doubly encoded FASTQ sequences e.g. for use in
               BWA. default is False, for use in Mosaik, BFAST
trimming:
  options for trimming:
  --min-qual MINQ       bases with quality below this value will be
                        trimmed from the end
  --max-ns MAXN         reads with more than this number of '.'s are
                        chopped
  --moving-average MA   creating a moving average of window-size
                        `window` on the quals chop as soon as the mov.
                        avg. drops below `min` specified as: window:min
                        e.g.: 7:12. The window must be odd
  --q-trim QTRIM        BWA's -q parameter for quality trimming
                        default: -1 means no trimming

filtering:
  By default no filtering is done
  --min-read-length MIN_LEN
                        reads shorter than this after trimming are not
                        printed. default: 0
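The --moving-average option can be pictured with the minimal sketch below. It assumes a window centred on each base that is clipped at the read ends, which is one reasonable reading of the help text rather than a description of the actual implementation; `moving_average_trim` is a hypothetical name.

```python
def moving_average_trim(quals, window=7, min_avg=12):
    """Return how many leading bases to keep.

    Scans from the left and stops at the first position whose
    windowed mean quality falls below min_avg.
    """
    if window % 2 != 1:
        raise ValueError("window must be odd")
    half = window // 2
    for i in range(len(quals)):
        # Assumption: the window is centred on i and clipped at the read ends.
        chunk = quals[max(0, i - half):min(len(quals), i + half + 1)]
        if sum(chunk) / len(chunk) < min_avg:
            return i
    return len(quals)


quals = [30] * 6 + [5] * 6
print(moving_average_trim(quals, window=3, min_avg=12))  # 7
```

With a window of 3, the first low-quality base is still kept because its window mean is (30 + 5 + 5) / 3, which is above 12; the read is cut at the next position, where all three values in the window are 5.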