Some quick thoughts on random variables

Random Variables

In probability we often see statements like

$$ P(X \le 4) = \frac{1}{3} $$

where $P$ is a function and $X$ is a random variable. I remember being totally confused by this as a novice: why isn't $X$ being used on the right-hand side? In what sense is $X$ "random"? This is a quick description of what random variables are, with a hint of why they're useful.

A probability space consists of

  1. A set of outcomes AKA "things that can happen" AKA the "sample space", usually denoted $\Omega$.
  2. A set of events AKA "things whose probability we can measure" AKA the "event space", usually denoted $\mathcal{F}$. This is a $\sigma$-algebra on $\Omega$, although this condition is only really important if the sample space is uncountable. If $\Omega$ is countable, we can just use $2^\Omega$ (the power set) as our event space.
  3. A function that assigns probabilities to events AKA a probability measure, denoted $P$. That is, $P : \mathcal{F} \to [0, 1]$ with some special conditions (measure properties):
    • $P(\emptyset) = 0$.
    • For any event $A \in \mathcal{F}$, $P(A^c) = 1 - P(A)$.
    • For a countable family $\mathcal{A}$ of pairwise-disjoint events, $$P\left(\bigcup_{A \in \mathcal{A}} A\right) = \sum_{A \in \mathcal{A}} P(A)$$
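
To make this concrete, here's a minimal Haskell sketch of a finite probability space (the names `Space`, `prob`, and `die` are mine, not standard library functions). With a finite $\Omega$ we can take $\mathcal{F} = 2^\Omega$, so an event is just a predicate on outcomes, and the measure sums point masses:

```haskell
-- A finite probability space: each outcome paired with its point mass.
type Space a = [(a, Double)]

-- The measure P: sum the masses of the outcomes belonging to the event.
prob :: Space a -> (a -> Bool) -> Double
prob space event = sum [m | (o, m) <- space, event o]

-- Example: a fair six-sided die.
die :: Space Int
die = [(i, 1 / 6) | i <- [1 .. 6]]
```

For instance, `prob die (<= 4)` is the probability that the die shows at most 4, namely `4/6`.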

Now the elements of the sample space $\Omega$ can be anything, which means we need to treat $\Omega$ as a "plain old set". As an example, consider a sample space intended to represent the outcomes of drawing a single card from a standard 52-card deck. We might represent it as $$\Omega = \{ AC, 2C, 3C, \ldots, KC, AD, 2D, \ldots, AH, 2H, \ldots, AS, 2S, \ldots \}$$ Here $2C$ represents the 2 of clubs, $KC$ the king of clubs, etc. The point is that this is "just" a set: we don't know how to add 2 cards, multiply them, compose them, etc.
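
To underline the "plain old set" point, here's how we might spell this $\Omega$ out as a Haskell data type (a sketch; the names are mine):

```haskell
data Suit = Clubs | Diamonds | Hearts | Spades
  deriving (Eq, Enum, Bounded, Show)

data Rank = Ace | Two | Three | Four | Five | Six | Seven
          | Eight | Nine | Ten | Jack | Queen | King
  deriving (Eq, Enum, Bounded, Show)

-- Ω: all 52 cards. We can compare cards for equality and enumerate them,
-- but there's no sensible way to add or multiply two cards.
deck :: [(Rank, Suit)]
deck = [(r, s) | r <- [minBound .. maxBound], s <- [minBound .. maxBound]]
```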

In many cases, what we really want to know are things like: "What's the probability of winning \$100 in a game in which I win \$100 for rolling snake eyes (double 1s) and lose \$10 for rolling anything else?". Or, more realistically, "If I play this game a lot, will I win in the long run?". These statements involve outcomes ("rolling snake eyes"), but the outcomes have some additional meaning attached ("I win \$100"). This is exactly what random variables do: random variables allow us to attach meaning to outcomes.

To see how this works, let's look at the dice game. The sample space ("things that can happen") is the set of all possible pairs of dice throws, which we might represent as

$$\{ (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), \ldots, (6, 5), (6, 6) \}$$

For reasons we'll go into elsewhere, we can basically forget about $\mathcal{F}$ and just define the probability of each outcome. In this case let's suppose the dice are fair, so that each of these outcomes is equally likely. That is, $P(\{ (i, j) \}) = \frac{1}{36}$ for every pair $(i, j)$.
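
In code, the uniform measure on these 36 outcomes might look like this (a standalone variant of the earlier sketch; `rolls` and this `prob` are again my own names):

```haskell
-- Ω: all 36 ordered pairs of die faces.
rolls :: [(Int, Int)]
rolls = [(i, j) | i <- [1 .. 6], j <- [1 .. 6]]

-- The uniform measure: each singleton {(i, j)} has mass 1/36, so an
-- event's probability is (number of outcomes in it) / 36.
prob :: ((Int, Int) -> Bool) -> Double
prob event = fromIntegral (length (filter event rolls)) / 36
```

For example, `prob (== (1, 1))` is `1/36`, and `prob (\(i, j) -> i + j == 7)` is `6/36`.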

Now we're ready to "attach" meaning to the outcomes via a random variable, which we'll call $W$ (for "winnings"). $W$ needs to basically say, "Rolling $(1, 1)$ means 100 dollars, and rolling any other outcome $(i, j)$ means -10 dollars". This is a task perfectly suited to...functions. That is, $W$ is just a function that maps outcomes to values in some "meaningful" set, usually $\mathbb{R}$. Symbolically, $W : \Omega \to \mathbb{R}$. We require that $W$ have some additional properties (namely, measurability), but we don't need to worry about those here.

In our case, we can define $W$ by (switching to Haskell since GitHub MD doesn't seem to handle \cases):

```haskell
-- Haskell function names must start lowercase, so W becomes w here.
w :: (Int, Int) -> Int
w (1, 1) = 100
w _      = -10
```
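
A quick sanity check in GHCi, with the definition above loaded:

```
ghci> w (1, 1)
100
ghci> w (3, 4)
-10
```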

It might help to think of $W$ as a table in which outcomes are "tagged" with their meaning:

| outcome | meaning |
| ------- | ------- |
| (1, 1)  | 100     |
| (1, 2)  | -10     |
| (1, 3)  | -10     |
| ...     | ...     |

Now to discover the "meaning" of an outcome we just apply $W$ to it: $W((1, 1)) = 100$, $W((3, 4)) = -10$, etc.

We're now in a position to answer some of our original questions. The expressions $W = 100$, $W \le 50$, etc. are all sets; in particular, they're all events, i.e. things whose probability we can compute. This is just "syntactic sugar", defined like this: $$W = v \equiv \{ \omega \in \Omega \mid W(\omega) = v \}$$ That is, $W = v$ is the set of all outcomes $\omega$ for which $W(\omega) = v$. Likewise $$W \le v \equiv \{ \omega \in \Omega \mid W(\omega) \le v \}$$
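
In our finite setting this sugar is easy to make literal. Continuing the Haskell sketch (reusing `rolls` and `w` from above; `eventEq` and `eventLe` are my own names):

```haskell
-- The event "W = v": every outcome ω with w ω == v.
eventEq :: Int -> [(Int, Int)]
eventEq v = [o | o <- rolls, w o == v]

-- The event "W <= v", built the same way.
eventLe :: Int -> [(Int, Int)]
eventLe v = [o | o <- rolls, w o <= v]
```

Here `eventEq 100` is `[(1, 1)]`, and `eventLe 0` is the list of the other 35 outcomes.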

Because these $W \textrm{ op } v$ expressions identify events, we can also use set operations on them. For instance, $(W = 100) \cup (W = 50)$ is the union of two events.

So the probability of winning \$100 is just $P(W = 100) = P(\{ (1, 1) \}) = \frac{1}{36}$. The "expected value of $W$" (more on this later), $E[W]$[^1], is $$100 \cdot P(W = 100) + (-10) \cdot P(W = -10) = \frac{100}{36} - \frac{350}{36} = -\frac{250}{36} \approx -6.94$$ We probably shouldn't play this game.
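
As a last piece of the Haskell sketch, the expected value is just the probability-weighted sum of $W$'s values over all outcomes. Note that the measure is an explicit argument here, which is exactly the point of the footnote below (`expectation` and `fair` are my own names; `rolls` and `w` are from above):

```haskell
-- E[W] under a measure p: the sum of w(ω) · p({ω}) over all outcomes ω.
expectation :: ((Int, Int) -> Double) -> ((Int, Int) -> Int) -> Double
expectation p w' = sum [fromIntegral (w' o) * p o | o <- rolls]

-- The fair measure from before: every singleton has mass 1/36.
fair :: (Int, Int) -> Double
fair _ = 1 / 36
```

`expectation fair w` works out to $(100 - 350)/36 \approx -6.94$, matching the calculation above.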

[^1]: This turns out to be a minor detail, but note that $E[W]$ implies a probability measure $P$. That is, $W$ doesn't contain information about probabilities: remember, it's just a table of "tags" for outcomes. It might make sense to use the same $W$ with different probability measures (maybe fair dice and loaded dice), but in that case $E[W]$ will depend on which measure we use. This will always be clear from the context, but I think it's worth noting since the expression $E[W]$ might give the impression that $W$ "knows" something about probabilities.
