@Kaixhin
Last active November 15, 2018 11:25
Probability notation


Note: Great refresher/glossary on probability/statistics and related topics here

| Notation | Definition |
| --- | --- |
| X | Random variable |
| P(X) | Probability distribution over random variable X |
| X ~ P(X) | Random variable X follows (~) the probability distribution P(X) * |
| x ~ P(X) | Value x sampled (~) from the probability distribution P(X) via a generative process |
| p(X) | Density function of the probability distribution P(X); a scalar function over the measure space of X |
| p(X = x) (shorthand p(x)) | Density function evaluated at the value x |

\* Note that P(X) is used as an arbitrary placeholder for a defined probability distribution, such as N(0, 1), and this seemingly recursive statement is only used to define syntax.
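The notation above can be made concrete in code. A minimal sketch, using only the Python standard library and taking N(0, 1) as the illustrative choice for P(X):

```python
import math
import random

random.seed(0)

# Let P(X) = N(0, 1), the standard normal distribution (an illustrative choice)
x = random.gauss(0.0, 1.0)  # x ~ P(X): a value sampled from the distribution

# p(X = x), shorthand p(x): the density function evaluated at the sampled value
p_x = math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)
```

Note the type distinction the table draws: `x` is a number, `p_x` is the value of a scalar function at `x`, and P(X) itself is the abstract distribution both refer to.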


Many academic papers use the terms "variables", "distributions", "densities", and even "models" interchangeably. This is not wrong per se, since X, P(X), and p(X) all imply each other via a one-to-one correspondence. However, mixing these words is confusing because their types are different: it doesn't make sense to sample a function, nor does it make sense to integrate a distribution.

Random events are events that have a chance of happening. Random variables are functions that map a set of possible outcomes to (almost always) a real number. Random variables belong to a space, e.g. 𝒳, such that sampled values are integers, real numbers, etc.
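The "random variable as function" view can be sketched directly. A hypothetical example (not from the gist): two fair coin flips, with X defined as the number of heads:

```python
from collections import Counter

# Sample space of outcomes for two fair coin flips
outcomes = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]

def X(omega):
    """A random variable is just a function from outcomes to real numbers."""
    return sum(1 for side in omega if side == "H")

# Push the uniform measure on outcomes forward to a distribution over values of X
counts = Counter(X(omega) for omega in outcomes)
P_X = {value: count / len(outcomes) for value, count in counts.items()}
print(P_X)  # {2: 0.25, 1: 0.5, 0: 0.25}
```

Here the outcomes are tuples of strings, but the values of X are numbers, which is what lets us talk about distributions, expectations, and densities over them.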

Systems can be modelled as collections of random variables, including observed variables (e.g. X) and, usually, hidden/latent variables (e.g. Z). Joint probability distributions are defined over multiple variables, e.g. P(X, Z). Conditional probabilities express how some variables, e.g. Z, may determine others, e.g. X, denoted P(X|Z). The joint distribution can be constructed from the conditional via the product rule P(X, Z) = P(X|Z)·P(Z). Additionally, a distribution over a subset of variables can be extracted via marginalisation, e.g. P(Z) = Σₓ P(X, Z). Finally, a joint distribution can be factorised via the chain/product rule P(X, Y, Z) = P(X|Y, Z)·P(Y|Z)·P(Z).
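The product rule and marginalisation can be checked numerically on a toy discrete model. A sketch with assumed values (Z and X binary; the probabilities are made up for illustration):

```python
# Toy discrete model (values assumed for illustration): Z ∈ {0, 1}, X ∈ {0, 1}
P_Z = [0.6, 0.4]             # P(Z)
P_X_given_Z = [[0.9, 0.1],   # P(X|Z=0)
               [0.2, 0.8]]   # P(X|Z=1)

# Product rule: P(X, Z) = P(X|Z) · P(Z)
P_XZ = [[P_X_given_Z[z][x] * P_Z[z] for x in range(2)] for z in range(2)]

# Marginalisation (summing over Z here, by symmetry with the text): P(X) = Σ_z P(X, Z)
P_X = [sum(P_XZ[z][x] for z in range(2)) for x in range(2)]

assert abs(sum(P_X) - 1.0) < 1e-9  # a valid distribution sums to 1
```

The same two operations generalise to any number of variables, which is all the chain-rule factorisation is: repeated applications of the product rule.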

Probabilistic models can be drawn as graphs, with variables as vertices and dependencies as edges. For example, a directed edge from Z to X expresses the conditional probability P(X|Z).

Bayes' Theorem

P(Z|X) = P(X|Z) · P(Z) / P(X)

| Notation | Term |
| --- | --- |
| P(Z\|X) | Posterior |
| P(X\|Z) | Likelihood |
| P(Z) | Prior |
| P(X) | Evidence |

The evidence is typically treated as just a normalisation constant, as the desired result is merely the proportional relationship P(Z|X) ∝ P(X|Z) · P(Z).
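The theorem can be traced numerically with assumed toy values (binary Z and X; the probabilities are illustrative, not from the gist):

```python
# Toy values (assumed for illustration): Z ∈ {0, 1}, X ∈ {0, 1}
prior = [0.6, 0.4]            # P(Z)
likelihood = [[0.9, 0.1],     # P(X|Z=0)
              [0.2, 0.8]]     # P(X|Z=1)

x = 1  # the observed value of X

# Bayes' theorem: P(Z|X=x) = P(X=x|Z) · P(Z) / P(X=x)
unnormalised = [likelihood[z][x] * prior[z] for z in range(2)]
evidence = sum(unnormalised)                      # P(X=x), the normalisation constant
posterior = [u / evidence for u in unnormalised]  # P(Z|X=x)
```

Dividing by the evidence is exactly the normalisation step: the unnormalised values already carry the proportional relationship P(Z|X) ∝ P(X|Z) · P(Z).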


Mainly adapted from A Beginner's Guide to Variational Methods: Mean-Field Approximation
