The negative binomial (NB) distribution is the canonical statistical model for integer gene expression counts in single-cell RNA-seq (scRNA-seq). It’s favored over the Poisson distribution because real scRNA-seq data are overdispersed: their variance greatly exceeds their mean, owing to both biological and technical factors (cell heterogeneity, bursty transcription, variable sequencing depth, dropouts, etc.).
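To make the overdispersion claim concrete, here is a minimal sketch on simulated data (not real scRNA-seq counts). It assumes cell heterogeneity can be mimicked by giving each cell its own Poisson rate drawn from a Gamma distribution; the helper `variance_to_mean_ratio`, the mean of 5, and the shape of 2 are illustrative choices, not values from any dataset.

```python
import numpy as np


def variance_to_mean_ratio(counts):
    """Return the sample variance divided by the sample mean."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


rng = np.random.default_rng(0)
n_cells = 100_000
mean_expr = 5.0  # illustrative mean count per cell (assumption)
shape = 2.0      # illustrative Gamma shape (assumption)

# Homogeneous cells: every cell shares one rate, so counts are Poisson.
poisson_counts = rng.poisson(mean_expr, size=n_cells)

# Heterogeneous cells: each cell gets its own rate, drawn from a Gamma
# distribution with mean `mean_expr`; counts are Poisson given that rate.
rates = rng.gamma(shape, scale=mean_expr / shape, size=n_cells)
mixed_counts = rng.poisson(rates)

poisson_ratio = variance_to_mean_ratio(poisson_counts)
mixed_ratio = variance_to_mean_ratio(mixed_counts)
print(f"Poisson variance/mean: {poisson_ratio:.2f}")
print(f"Heterogeneous variance/mean: {mixed_ratio:.2f}")
```

With these settings the first ratio should sit close to 1, while the second should be well above 1 (1 + 5/2 = 3.5 in expectation), which is the kind of excess variance a Poisson model cannot absorb.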
The Poisson’s simple assumption (variance equals mean) fails in this context. In contrast, the NB augments the Poisson distribution with a dispersion parameter (
From a generative perspective, the NB arises as a Poisson–Gamma mixture: each cell/gene’s unknown expression rate