@slowkow
Last active August 29, 2015 14:06
RNA-seq notes

RNA-Seq Notes

This document summarizes some of the distinguishing features of tools used to analyze RNA-Seq data.

Aligners:

Name      Type     Algorithm        Links
bowtie2   aligner  seed and extend  Manual, Download
subread   aligner  seed and vote    Manual, Download
TopHat2   aligner  seed and extend  Manual, Download

Transcript quantifiers:

Name           Type        Algorithm                 Links
cufflinks      quantifier  ?                         Manual, Download
featureCounts  counter     overlap                   Manual, Download
RSEM           quantifier  expectation maximization  Manual, Download

Aligners

bowtie2

Map quality score

A read's map quality score Q depends on the probability P that the read is not mapped to the true point of origin. A mapping quality of 10 or less indicates at least a 1 in 10 chance that the read truly originated elsewhere.

Q = -10 \log_{10} P

P = 10^{-Q/10}
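
As a minimal sketch of the relationship between Q and P, the following Python snippet converts between the two using the Phred-scaled convention that the SAM format uses for MAPQ. The function names are illustrative and not part of bowtie2; how bowtie2 itself arrives at Q for a given alignment is not shown here.

```python
import math


def mapq_to_error_prob(q):
    """Probability that the read is mapped to the wrong place, given MAPQ q."""
    return 10 ** (-q / 10)


def error_prob_to_mapq(p):
    """Phred-scaled MAPQ for a mis-mapping probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (0, 10, 20, 30, 40):
        print(q, mapq_to_error_prob(q))
```

Under this convention, Q = 10 corresponds to P = 0.1, which is the 1 in 10 chance mentioned above.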

  • The alignment score for a paired-end alignment is the sum of the two mate scores.

  • Additional alignments are flagged as secondary (256) in the FLAG field (column 2) of the SAM record.

  • Commands:

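      # Count primary records by excluding secondary alignments (flag 256).
      # Unmapped reads are still counted; use -F 260 to exclude them as well.
      samtools view -c -F 256 file.bam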
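
The points above can also be sketched in Python on plain integers, without reading a BAM file. This is an illustrative sketch: the function names and the example flag and score values are made up, and only the flag bit 256 (0x100, secondary alignment) comes from the SAM format.

```python
SECONDARY = 0x100  # SAM FLAG bit 256: secondary alignment


def is_secondary(flag):
    """True if the SAM FLAG value has the secondary-alignment bit set."""
    return bool(flag & SECONDARY)


def count_primary(flags):
    """Count records whose FLAG lacks bit 256, like `samtools view -c -F 256`."""
    return sum(1 for flag in flags if not is_secondary(flag))


def pair_score(score_mate1, score_mate2):
    """Alignment score of a paired-end alignment: the sum of the mate scores."""
    return score_mate1 + score_mate2


if __name__ == "__main__":
    example_flags = [99, 147, 355, 403, 4]  # hypothetical FLAG values
    print(count_primary(example_flags))     # 355 and 403 carry bit 256
    print(pair_score(-5, -12))              # hypothetical mate scores
```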
    

Subread

Map quality score

A read's map quality score Q is a function of its sequencing base-call qualities and alignment:

subread mapq equation

L      read length
p_i    base-call P(incorrect) for the ith base
b_m    set of match positions
b_{mm} set of mismatched positions
  • Base-call P(incorrect) depends on the base quality Q_i reported by the sequencer: p_i = 10^{-Q_i/10}

  • Read bases found to be insertions are treated as matched bases in the MQS calculation.

  • The MQS is read-length normalized in the range [0, 60).
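
The dependence of p_i on the reported base quality can be sketched in Python as follows. The sketch assumes the Phred+33 ASCII encoding commonly used in FASTQ files (an assumption, since the encoding is not stated here), and the function name is illustrative. It does not reproduce the Subread MQS calculation itself.

```python
def base_error_probs(quality_string, offset=33):
    """Convert a FASTQ quality string to per-base P(incorrect) values.

    Each character encodes a Phred quality Q = ord(char) - offset,
    and the base-call error probability is p = 10 ** (-Q / 10).
    """
    probs = []
    for char in quality_string:
        q = ord(char) - offset
        if q < 0:
            raise ValueError("quality character below the assumed offset")
        probs.append(10 ** (-q / 10))
    return probs


if __name__ == "__main__":
    # '!' is Q0, '+' is Q10, '5' is Q20, 'I' is Q40 under Phred+33.
    print(base_error_probs("!+5I"))
```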

TopHat2

Map quality score

TopHat2 does not report probabilistic mapping quality scores.

Instead, the MAPQ column encodes the number of different alignments found for each read:

MAPQ  Alignments
50    1
3     2
2     3
1     4-9
0     >=10

Source

  • Commands:

      # Count unique alignments (MAPQ >= 50, i.e. reads with a single alignment):
      samtools view -c -q 50 file.bam
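
The MAPQ table above can be expressed as a small lookup in Python. This is an illustrative sketch: the dictionary simply restates the table, and the function names and example MAPQ values are hypothetical.

```python
# MAPQ value -> number of alignments, as listed in the table above.
TOPHAT2_MAPQ_TO_ALIGNMENTS = {
    50: "1",
    3: "2",
    2: "3",
    1: "4-9",
    0: ">=10",
}


def describe_mapq(mapq):
    """Return the number of alignments that a TopHat2 MAPQ value stands for."""
    return TOPHAT2_MAPQ_TO_ALIGNMENTS.get(mapq, "unknown")


def count_unique(mapqs, threshold=50):
    """Count reads with MAPQ >= threshold, like `samtools view -c -q 50`."""
    return sum(1 for q in mapqs if q >= threshold)


if __name__ == "__main__":
    example_mapqs = [50, 50, 3, 1, 0, 50]  # hypothetical MAPQ column
    print(count_unique(example_mapqs))
    print([describe_mapq(q) for q in example_mapqs])
```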
    

Variance-stabilizing transformation

Count values span several orders of magnitude.

  • After logarithmic transformation, low abundance fragments will tend to show large standard deviations across samples.
  • With untransformed data, high abundance fragments will tend to show large standard deviations across samples.

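This heteroscedasticity, where the standard deviation depends on abundance, skews downstream analysis because the fragments with the largest variance dominate.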

The variance-stabilizing transformation was introduced by Anders and Huber (2010) and is implemented in the DESeq2 package (Love et al., 2014). After this transformation, the standard deviations show less dependence on fragment abundance.
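
To make the idea concrete, here is a minimal Python sketch under a simplifying assumption that is not taken from DESeq2: counts follow a negative binomial distribution with variance mu + alpha * mu^2 and a single constant dispersion alpha. In that special case, a variance-stabilizing transformation has the closed form (2 / sqrt(alpha)) * asinh(sqrt(alpha * x)). DESeq2 instead estimates a dispersion-mean trend from the data, so this illustrates the principle rather than the package's implementation. The value of ALPHA and all function names are hypothetical. The standard deviation after each transformation is approximated with the delta method, sd(f(X)) ≈ |f'(mu)| * sd(X).

```python
import math

ALPHA = 0.05  # assumed constant dispersion (hypothetical value)


def nb_sd(mu, alpha=ALPHA):
    """Standard deviation of a negative binomial count with mean mu."""
    return math.sqrt(mu + alpha * mu ** 2)


def identity(x):
    """Untransformed counts."""
    return x


def log2p1(x):
    """Logarithmic transformation with a pseudocount of 1."""
    return math.log2(x + 1)


def vst(x, alpha=ALPHA):
    """Closed-form variance-stabilizing transform for var = mu + alpha * mu^2."""
    return (2.0 / math.sqrt(alpha)) * math.asinh(math.sqrt(alpha * x))


def approx_sd_after(transform, mu, alpha=ALPHA, rel_step=1e-4):
    """Delta-method sd of transform(X): |f'(mu)| * sd(X), numerical derivative."""
    h = rel_step * mu
    slope = (transform(mu + h) - transform(mu - h)) / (2 * h)
    return abs(slope) * nb_sd(mu, alpha)


if __name__ == "__main__":
    print("mean  raw_sd  log2_sd  vst_sd")
    for mu in (1, 10, 100, 1000, 10000):
        row = [approx_sd_after(f, mu) for f in (identity, log2p1, vst)]
        print(mu, *(round(value, 3) for value in row))
```

In this sketch the raw standard deviation grows with the mean, the log-transformed standard deviation is largest at low means, and the standard deviation after the variance-stabilizing transformation stays close to 1 across the whole range.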
