This document summarizes some of the distinguishing features of tools used to analyze RNA-Seq data.
Aligners:
Name | Type | Algorithm | Links |
---|---|---|---|
bowtie2 | aligner | seed and extend | Manual, Download |
subread | aligner | seed and vote | Manual, Download |
TopHat2 | aligner | seed and extend | Manual, Download |
Transcript quantifiers:
Name | Type | Algorithm | Links |
---|---|---|---|
cufflinks | quantifier | ? | Manual, Download |
featureCounts | counter | overlap | Manual, Download |
RSEM | quantifier | expectation maximization | Manual, Download |
For bowtie2, a read's mapping quality score Q depends on the probability P that the read is not mapped to its true point of origin, with $Q = -10 \log_{10} P$. A mapping quality of 10 or less therefore indicates at least a 1 in 10 chance that the read truly originated elsewhere.
- The alignment score for a paired-end alignment is the sum of the alignment scores of the two mates.
- Additional alignments are flagged as secondary (256) in column 2 (FLAG) of the SAM file.
- Commands:
  - Count alignments, excluding secondary ones: `samtools view -c -F 256 file.bam`
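
The relationship between a Phred-scaled mapping quality and the probability of a wrong placement, and the test for the secondary-alignment flag, can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the code works on plain integers rather than on a real SAM/BAM parser.

```python
import math

SECONDARY_FLAG = 0x100  # 256, the "secondary alignment" bit in SAM column 2


def mapq_to_error_probability(q):
    """Probability that the read is mapped to the wrong place, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_probability_to_mapq(p):
    """Phred-scaled mapping quality for a probability p of a wrong placement."""
    return -10 * math.log10(p)


def is_secondary(flag):
    """True when the secondary bit (256) is set in a SAM FLAG value."""
    return bool(flag & SECONDARY_FLAG)
```

Filtering records with `not is_secondary(flag)` mirrors what `-F 256` does in `samtools view`: records with the 256 bit set are dropped and everything else is kept.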
For subread, a read's mapping quality score (MQS) is a function of its sequencing base-call qualities and its alignment, defined in terms of:
- $L$: read length
- $p_i$: base-call P(incorrect) for the ith base
- $b_m$: set of matched positions
- $b_{mm}$: set of mismatched positions
- Base-call P(incorrect) depends on the base quality $Q_i$ reported by the sequencer, following the standard Phred scale: $p_i = 10^{-Q_i/10}$.
- Read bases found to be insertions are treated as matched bases in the MQS calculation.
- The MQS is normalized by read length and lies in the range [0, 60).
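
As a minimal sketch of how base-call error probabilities are obtained from sequencer-reported qualities, the snippet below decodes a FASTQ quality string. It assumes the common Phred+33 ASCII encoding; the function names are hypothetical, and the MQS formula itself is not reproduced here.

```python
def phred_to_error_probability(q):
    """P(incorrect) for a base with Phred quality q, i.e. 10^(-q/10)."""
    return 10 ** (-q / 10)


def fastq_quality_to_probabilities(quality_string, offset=33):
    """Decode a FASTQ quality string into per-base error probabilities.

    Assumes Phred+33 encoding by default (an assumption; older Illumina
    pipelines used an offset of 64).
    """
    return [phred_to_error_probability(ord(ch) - offset) for ch in quality_string]
```

For example, `fastq_quality_to_probabilities("II+!")` yields one probability per base, with `I` (Q40) the most reliable and `!` (Q0) the least.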
TopHat2 does not report probability-based mapping quality scores.
Instead, its MAPQ column encodes the number of different alignments found for each read.
MAPQ | Alignments |
---|---|
50 | 1 |
3 | 2 |
2 | 3 |
1 | 4-9 |
0 | >=10 |
- Commands:
  - Count unique alignments: `samtools view -c -q 50 file.bam`
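
The MAPQ table above can be written as a small lookup. This is a sketch with hypothetical names; it simply transcribes the table and the `-q 50` threshold, and is not taken from the TopHat2 source code.

```python
# MAPQ value -> number of alignments, transcribed from the table above.
TOPHAT2_MAPQ_TO_ALIGNMENTS = {
    50: "1",
    3: "2",
    2: "3",
    1: "4-9",
    0: ">=10",
}


def alignments_for_mapq(mapq):
    """Number of alignments implied by a TopHat2 MAPQ, or None if not in the table."""
    return TOPHAT2_MAPQ_TO_ALIGNMENTS.get(mapq)


def is_uniquely_aligned(mapq):
    """Same threshold as `samtools view -q 50`: keep records with MAPQ >= 50."""
    return mapq >= 50
```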
Count values span several orders of magnitude.
- After logarithmic transformation, low abundance fragments will tend to show large standard deviations across samples.
- With untransformed data, high abundance fragments will tend to show large standard deviations across samples.
This heteroscedasticity (dependence of the variance on the mean abundance) skews the analysis.
The variance-stabilizing transformation was introduced by Anders and Huber (2010) and is implemented in the DESeq2 package (Love et al., 2014). After this transformation, the standard deviations show less dependence on the fragment abundance.
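
The mean-variance dependence described above can be illustrated with a small simulation. This is a sketch under strong assumptions, not the DESeq2 method: counts are simulated as pure Poisson noise (real RNA-Seq counts are typically overdispersed), and a square-root transform stands in as a simple variance stabilizer for Poisson data. All function names are hypothetical.

```python
import math
import random
import statistics


def sample_poisson(mean, rng):
    """Draw one Poisson count: Knuth's method for small means,
    a rounded normal approximation for large ones."""
    if mean < 30:
        limit = math.exp(-mean)
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1
    return max(0, round(rng.gauss(mean, math.sqrt(mean))))


def spread_by_scale(mean, n_samples=2000, seed=0):
    """Standard deviation of simulated counts on three scales."""
    rng = random.Random(seed)
    counts = [sample_poisson(mean, rng) for _ in range(n_samples)]
    return {
        "raw": statistics.stdev(counts),
        "log2": statistics.stdev([math.log2(c + 1) for c in counts]),
        "sqrt": statistics.stdev([math.sqrt(c) for c in counts]),
    }


low = spread_by_scale(5)      # low-abundance fragment
high = spread_by_scale(5000)  # high-abundance fragment
for scale in ("raw", "log2", "sqrt"):
    print(f"{scale:>4}: low={low[scale]:.3f}  high={high[scale]:.3f}")
```

On the raw scale the high-abundance fragment has the larger standard deviation, on the log scale the low-abundance fragment does, and on the stabilized scale the two are of similar size. The DESeq2 transformation pursues the same goal with a fitted dispersion-mean relationship rather than a fixed square root.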