slowkow/rnaseq-notes.md

## rnaseq-notes.md

      
RNA-Seq Notes

This document summarizes some of the distinguishing features of tools
used to analyze RNA-Seq data.

Aligners:

| Name | Type | Algorithm | Links |
|---|---|---|---|
| bowtie2 | aligner | seed and extend | Manual, Download |
| subread | aligner | seed and vote | Manual, Download |
| TopHat2 | aligner | seed and extend | Manual, Download |

Transcript quantifiers:

| Name | Type | Algorithm | Links |
|---|---|---|---|
| cufflinks | quantifier | ? | Manual, Download |
| featureCounts | counter | overlap | Manual, Download |
| RSEM | quantifier | expectation maximization | Manual, Download |

Aligners

bowtie2

Map quality score

A read's map quality score Q depends on the probability P that the read is not mapped to the true point of origin. A mapping quality of 10 or less indicates at least a 1 in 10 chance that the read truly originated elsewhere.
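
A minimal sketch of this relationship, assuming the standard Phred scaling that the SAM format uses for MAPQ, Q = -10 log10(P). The function names are illustrative and are not part of bowtie2; the sketch only shows how to read a reported score, not how the aligner arrives at it.

```python
import math


def mapq_to_error_prob(q: float) -> float:
    """Probability that the read is mapped to the wrong place, given Phred-scaled Q."""
    return 10 ** (-q / 10)


def error_prob_to_mapq(p: float) -> float:
    """Phred-scaled mapping quality for an error probability p, with 0 < p <= 1."""
    return -10 * math.log10(p)


# Lower scores mean a higher chance that the read originated elsewhere.
for q in (0, 10, 20, 30, 40):
    print(f"MAPQ {q:>2} -> P(wrong position) = {mapq_to_error_prob(q):.4f}")
```

Under this scaling, every score at or below 10 corresponds to P of at least 0.1, which is the "1 in 10" statement above.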


The alignment score for a paired-end alignment is the sum of the two mate scores.


Additional alignments are flagged as secondary (256) in column 2 (FLAG) of the SAM record.


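Commands:
  # Count primary alignment records, i.e. those without the secondary flag (256).
  # Unmapped reads are still counted; use -F 260 to exclude them as well.
  samtools view -c -F 256 file.bam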
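
The flag test behind that filter can be sketched in Python. This is an illustration of the bitwise check on the FLAG column, not a replacement for samtools; the SAM records below are made up, and the helper names are hypothetical.

```python
SECONDARY = 0x100  # 256: secondary alignment
UNMAPPED = 0x4     # 4: read is unmapped


def is_primary_mapped(flag: int) -> bool:
    """True for a mapped record that is not flagged as secondary."""
    return not (flag & SECONDARY) and not (flag & UNMAPPED)


def count_primary_mapped(sam_lines) -> int:
    """Count mapped, non-secondary records in an iterable of SAM text lines."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:       # skip blank or malformed lines
            continue
        if is_primary_mapped(int(fields[1])):
            count += 1
    return count


# Hypothetical records: r1 has a primary and a secondary alignment,
# r2 is unmapped, r3 maps once to the reverse strand (flag 16).
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t42\t50M\t*\t0\t0\t*\t*",
    "r1\t256\tchr2\t500\t1\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t16\tchr1\t900\t42\t50M\t*\t0\t0\t*\t*",
]
print(count_primary_mapped(example))
```

Excluding both bits corresponds to `-F 260` (256 + 4) on the samtools command line.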


Subread

Map quality score

A read's map quality score (MQS) is a function of its base-call qualities and its alignment, expressed in terms of the following quantities:

L      read length
p_i    base-call P(incorrect) for the ith base
b_m    set of match positions
b_{mm} set of mismatched positions


Base-call P(incorrect) depends on the base quality Q reported by the sequencer. On the standard Phred scale, p_i = 10^{-Q_i/10}.
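
A short sketch of how the per-base values p_i could be obtained from a FASTQ quality string, assuming Phred-scaled qualities with an ASCII offset of 33 (older data may use offset 64). The helper names and the example string are hypothetical; this does not reproduce Subread's own code.

```python
def phred_to_error_prob(q: int) -> float:
    """P(incorrect base call) for a Phred quality q."""
    return 10 ** (-q / 10)


def base_error_probs(quality_string: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into one error probability per base."""
    return [phred_to_error_prob(ord(ch) - offset) for ch in quality_string]


# 'I' encodes Q40, '5' encodes Q20, '+' encodes Q10 and '!' encodes Q0
# when the offset is 33.
quals = "I5+!"
for ch, p in zip(quals, base_error_probs(quals)):
    print(f"{ch!r}: Q={ord(ch) - 33:>2}  p={p:.4f}")
```

The list returned by `base_error_probs` has one entry per read base, so its length is the read length L.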


Read bases found to be insertions are treated as matched bases in the MQS calculation.


The MQS is normalized by read length and lies in the range [0, 60).


TopHat2

Map quality score

TopHat2 does not report probabilistic mapping quality scores.
Instead, the MAPQ column encodes the number of different alignments found for each read.


| MAPQ | Alignments |
|---|---|
| 50 | 1 |
| 3 | 2 |
| 2 | 3 |
| 1 | 4-9 |
| 0 | >=10 |


Source


Commands:
  # Count unique alignments:
  samtools view -c -q 50 file.bam
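
The MAPQ encoding listed for TopHat2 can be written as a pair of lookups. This is a sketch that only restates the values given in these notes; the function names are hypothetical and the code is not taken from TopHat2.

```python
# MAPQ value -> number of alignments, as listed in these notes.
MAPQ_TO_ALIGNMENTS = {50: "1", 3: "2", 2: "3", 1: "4-9", 0: ">=10"}


def alignments_for_mapq(mapq: int) -> str:
    """Describe how many alignments a TopHat2 MAPQ value stands for."""
    if mapq not in MAPQ_TO_ALIGNMENTS:
        raise ValueError(f"MAPQ {mapq} is not in the documented TopHat2 encoding")
    return MAPQ_TO_ALIGNMENTS[mapq]


def mapq_for_alignments(n_alignments: int) -> int:
    """Return the MAPQ value that encodes a given number of alignments."""
    if n_alignments < 1:
        raise ValueError("a mapped read has at least one alignment")
    if n_alignments == 1:
        return 50
    if n_alignments == 2:
        return 3
    if n_alignments == 3:
        return 2
    if n_alignments <= 9:
        return 1
    return 0


print(alignments_for_mapq(50))
print(mapq_for_alignments(7))
```

Because uniquely aligned reads are the only ones with MAPQ 50, filtering with `-q 50` keeps exactly those reads.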


Variance-stabilizing transformation

Count values span several orders of magnitude.

After logarithmic transformation, low abundance fragments will tend to show large standard deviations across samples.
With untransformed data, high abundance fragments will tend to show large standard deviations across samples.

This heteroscedasticity, the dependence of the variance on the mean, skews analyses that assume comparable variance across fragments.
The variance-stabilizing transformation was introduced by Anders and Huber (2010), and implemented in the DESeq2 package (Love et al., 2014). After this transformation, the standard deviations show less dependence on the fragment abundance.
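
The two trends can be illustrated numerically. The sketch below assumes a negative binomial mean-variance relationship, Var = mu + alpha * mu^2, with a made-up dispersion alpha = 0.1, and uses the delta-method approximation sd(log X) ~ sd(X) / mu, which is rough at very low counts. It does not implement the DESeq2 transformation itself; in DESeq2 that is provided by the R functions `vst()` and `varianceStabilizingTransformation()`.

```python
import math


def nb_sd(mu: float, alpha: float) -> float:
    """Standard deviation of a negative binomial count with mean mu and dispersion alpha."""
    return math.sqrt(mu + alpha * mu ** 2)


def approx_log_sd(mu: float, alpha: float) -> float:
    """Delta-method approximation of the standard deviation of log(count)."""
    return nb_sd(mu, alpha) / mu


ALPHA = 0.1                       # hypothetical dispersion, for illustration only
MEANS = [1, 10, 100, 1000, 10000]  # mean counts spanning several orders of magnitude

raw_sds = [nb_sd(m, ALPHA) for m in MEANS]
log_sds = [approx_log_sd(m, ALPHA) for m in MEANS]

for m, raw, logged in zip(MEANS, raw_sds, log_sds):
    print(f"mean={m:>6}  sd(raw)={raw:10.2f}  sd(log)~{logged:.3f}")
```

On the raw scale the standard deviation grows with the mean, while on the log scale it is largest for the lowest means and flattens towards sqrt(alpha) at high abundance.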