@Kaixhin
Last active November 15, 2018 11:25
Probability notation


Note: Great refresher/glossary on probability/statistics and related topics here

| Notation | Definition |
| --- | --- |
| X | Random variable |
| P(X) | Probability distribution over random variable X |
| X ~ P(X) | Random variable X follows (~) the probability distribution P(X) * |
| x ~ P(X) | Value x sampled (~) from the probability distribution P(X) via a generative process |
| p(X) | Density function of the probability distribution P(X); a scalar function over the measure space of X |
| p(X = x) (shorthand p(x)) | Density function evaluated at the value x |

\* Note that P(X) is used as an arbitrary placeholder for a defined probability distribution, such as N(0, 1), and this seemingly recursive statement is only used to define syntax.
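The notation above can be made concrete in code. A minimal sketch, using only the Python standard library and taking N(0, 1) as the illustrative choice for P(X):

```python
import math
import random

random.seed(0)

# Let P(X) = N(0, 1), the standard normal distribution (an illustrative choice)
x = random.gauss(0.0, 1.0)  # x ~ P(X): a value sampled from the distribution

# p(X = x), shorthand p(x): the density function evaluated at the sampled value
p_x = math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)
```

Note the type distinction the table draws: `x` is a number, `p_x` is the value of a scalar function at `x`, and P(X) itself is the abstract distribution both refer to.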


Many academic papers use the terms "variables", "distributions", "densities", and even "models" interchangeably. This is not wrong per se, since X, P(X), and p(X) all imply each other via a one-to-one correspondence. However, mixing these words is confusing because their types are different: it doesn't make sense to sample a function, nor does it make sense to integrate a distribution.

Random events are events that have a chance of happening. Random variables are functions that map a set of possible outcomes to (almost always) a real number. Random variables belong to a space, e.g. 𝒳, such that sampled values are integers, real numbers, etc.
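The "random variable as function" view can be sketched directly. A hypothetical example (not from the gist): two fair coin flips, with X defined as the number of heads:

```python
from collections import Counter

# Sample space of outcomes for two fair coin flips
outcomes = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]

def X(omega):
    """A random variable is just a function from outcomes to real numbers."""
    return sum(1 for side in omega if side == "H")

# Push the uniform measure on outcomes forward to a distribution over values of X
counts = Counter(X(omega) for omega in outcomes)
P_X = {value: count / len(outcomes) for value, count in counts.items()}
print(P_X)  # {2: 0.25, 1: 0.5, 0: 0.25}
```

Here the outcomes are tuples of strings, but the values of X are numbers, which is what lets us talk about distributions, expectations, and densities over them.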

Systems can be modelled as collections of random variables, including observed variables (e.g. X) and, usually, hidden/latent variables (e.g. Z). Joint probability distributions are defined over multiple variables, e.g. P(X, Z). Conditional probabilities express how some variables, e.g. Z, may determine others, e.g. X, denoted P(X|Z). The joint distribution can be constructed from the conditional via the product rule P(X, Z) = P(X|Z)·P(Z). Additionally, a distribution over a subset of variables can be extracted via marginalisation, e.g. P(Z) = Σₓ P(X, Z). Finally, a joint distribution can be factorised via the chain/product rule P(X, Y, Z) = P(X|Y, Z)·P(Y|Z)·P(Z).
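The product rule and marginalisation can be checked numerically on a toy discrete model. A sketch with assumed values (Z and X binary; the probabilities are made up for illustration):

```python
# Toy discrete model (values assumed for illustration): Z ∈ {0, 1}, X ∈ {0, 1}
P_Z = [0.6, 0.4]             # P(Z)
P_X_given_Z = [[0.9, 0.1],   # P(X|Z=0)
               [0.2, 0.8]]   # P(X|Z=1)

# Product rule: P(X, Z) = P(X|Z) · P(Z)
P_XZ = [[P_X_given_Z[z][x] * P_Z[z] for x in range(2)] for z in range(2)]

# Marginalisation (summing over Z here, by symmetry with the text): P(X) = Σ_z P(X, Z)
P_X = [sum(P_XZ[z][x] for z in range(2)) for x in range(2)]

assert abs(sum(P_X) - 1.0) < 1e-9  # a valid distribution sums to 1
```

The same two operations generalise to any number of variables, which is all the chain-rule factorisation is: repeated applications of the product rule.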

Probabilistic models can be drawn as graphs, with variables as vertices and dependencies as edges. For example, a directed edge from Z to X expresses the conditional probability P(X|Z).

Bayes' Theorem

P(Z|X) = P(X|Z) · P(Z) / P(X)

| Notation | Term |
| --- | --- |
| P(Z\|X) | Posterior |
| P(X\|Z) | Likelihood |
| P(Z) | Prior |
| P(X) | Evidence |

The evidence is typically treated as just a normalisation constant, as the desired result is merely the proportional relationship P(Z|X) ∝ P(X|Z) · P(Z).
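The theorem can be traced numerically with assumed toy values (binary Z and X; the probabilities are illustrative, not from the gist):

```python
# Toy values (assumed for illustration): Z ∈ {0, 1}, X ∈ {0, 1}
prior = [0.6, 0.4]            # P(Z)
likelihood = [[0.9, 0.1],     # P(X|Z=0)
              [0.2, 0.8]]     # P(X|Z=1)

x = 1  # the observed value of X

# Bayes' theorem: P(Z|X=x) = P(X=x|Z) · P(Z) / P(X=x)
unnormalised = [likelihood[z][x] * prior[z] for z in range(2)]
evidence = sum(unnormalised)                      # P(X=x), the normalisation constant
posterior = [u / evidence for u in unnormalised]  # P(Z|X=x)
```

Dividing by the evidence is exactly the normalisation step: the unnormalised values already carry the proportional relationship P(Z|X) ∝ P(X|Z) · P(Z).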


Mainly adapted from A Beginner's Guide to Variational Methods: Mean-Field Approximation
