
Suppose you have $n$ random variables, $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_n$, each observed $m$ times, so that $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{im}]$.

Now, between any two random variables $\mathbf{x}_i, \mathbf{x}_j$, the covariance is defined as:

$$ \sigma_{\mathbf{x}_i, \mathbf{x}_j} = E[(\mathbf{x}_i - \mu_{\mathbf{x}_i})(\mathbf{x}_j - \mu_{\mathbf{x}_j})] $$

(where $\mu_{\mathbf{x}_i}$ is the mean of $\mathbf{x}_i$).
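
As a quick numerical sanity check of this definition, here is a minimal sketch assuming NumPy (the sample size and variable names are illustrative); the covariance computed straight from the formula, with the expectation estimated as a sample mean, should match `np.cov`:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=100), rng.normal(size=100)

# Covariance straight from the definition: E[(x - mu_x)(y - mu_y)],
# estimating the expectation by the sample mean.
cov_def = np.mean((x - x.mean()) * (y - y.mean()))

# np.cov with bias=True uses the same 1/N normalisation.
cov_np = np.cov(x, y, bias=True)[0, 1]

assert np.isclose(cov_def, cov_np)
```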

Important assumption: For the sake of simplicity, assume that all data is centered, as @njoshi pointed out. In other words, $\mu_{\mathbf{x}_i} = 0 \forall i \in [1, n]$.

So the expression for covariance changes to the following:

$$ \sigma_{\mathbf{x}_i, \mathbf{x}_j} = E[\mathbf{x}_i\mathbf{x}_j ] $$

I.e., the covariance of two centered RVs is simply the expectation of their product.
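
The same sketch, continued: once the variables are centered, the covariance is just the mean of the elementwise product (again assuming NumPy, with hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=100), rng.normal(size=100)

# Center both variables so their sample means are (numerically) zero.
xc, yc = x - x.mean(), y - y.mean()

# For centered RVs, the covariance reduces to E[x * y]:
# the mean of the elementwise product.
assert np.isclose(np.mean(xc * yc), np.cov(x, y, bias=True)[0, 1])
```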

Key point: There is a crucial relationship between the expectation of the product of two RVs and the inner product of the RVs, viewed as vectors of their $m$ observations. Notice that the RHS of the above expression can be expanded as:

$$ E[\mathbf{x}_i\mathbf{x}_j] = E[x_{i1} \times x_{j1},\ x_{i2} \times x_{j2},\ x_{i3} \times x_{j3},\ \ldots,\ x_{im} \times x_{jm}] $$

$$ \therefore E[\mathbf{x}_i\mathbf{x}_j] = \frac{\sum_{k=1}^{m} x_{ik} \times x_{jk}}{m} $$

Note that the numerator of the RHS is just the inner product of $\mathbf{x}_i$ and $\mathbf{x}_j$!

Thus, the inner product of $\mathbf{x}_i$ and $\mathbf{x}_j$ can be expressed as:

$$ \overline{\mathbf{x}_i\mathbf{x}_j} = m \times \sigma_{\mathbf{x}_i, \mathbf{x}_j} $$

(where $\overline{\mathbf{x}_i\mathbf{x}_j}$ denotes the inner product of $\mathbf{x}_i$ and $\mathbf{x}_j$).

(Hold that thought, we'll come back to it.)
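
Before moving on, here is a small numerical check of that relationship - a sketch assuming NumPy, where `m`, `x` and `y` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 500                               # number of observations per RV
x, y = rng.normal(size=m), rng.normal(size=m)
xc, yc = x - x.mean(), y - y.mean()   # center the data

# Inner product of the centered RVs vs m times their covariance.
inner = np.dot(xc, yc)
cov = np.cov(xc, yc, bias=True)[0, 1]

assert np.isclose(inner, m * cov)
```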


Now, consider a matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$ such that each of its columns is a random variable $\mathbf{x}_i$, where $i \in [1, n]$, and each of its $m$ rows is one observation (any tabular dataset is of this form - each feature can be interpreted as an RV).

$$ \mathbf{X} = \begin{bmatrix} x_{11} & x_{21} & x_{31} & \ldots & x_{n1} \\ x_{12} & x_{22} & x_{32} & \ldots & x_{n2} \\ x_{13} & x_{23} & x_{33} & \ldots & x_{n3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{1m} & x_{2m} & x_{3m} & \ldots & x_{nm} \end{bmatrix} $$

(Note that each RV $\mathbf{x}_i$ appears above as the column $[x_{i1},\ x_{i2},\ x_{i3},\ \ldots,\ x_{im}]^T$.)

So now, if we evaluate $\mathbf{X}^T\mathbf{X}$, it comes out to:

$$ \mathbf{X}^T\mathbf{X} = \begin{bmatrix} \overline{\mathbf{x}_1\mathbf{x}_1} & \overline{\mathbf{x}_1\mathbf{x}_2} & \overline{\mathbf{x}_1\mathbf{x}_3} & \ldots & \overline{\mathbf{x}_1\mathbf{x}_n} \\ \overline{\mathbf{x}_2\mathbf{x}_1} & \overline{\mathbf{x}_2\mathbf{x}_2} & \overline{\mathbf{x}_2\mathbf{x}_3} & \ldots & \overline{\mathbf{x}_2\mathbf{x}_n} \\ \overline{\mathbf{x}_3\mathbf{x}_1} & \overline{\mathbf{x}_3\mathbf{x}_2} & \overline{\mathbf{x}_3\mathbf{x}_3} & \ldots & \overline{\mathbf{x}_3\mathbf{x}_n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \overline{\mathbf{x}_n\mathbf{x}_1} & \overline{\mathbf{x}_n\mathbf{x}_2} & \overline{\mathbf{x}_n\mathbf{x}_3} & \ldots & \overline{\mathbf{x}_n\mathbf{x}_n} \end{bmatrix} $$

where $\overline{\mathbf{x}_i\mathbf{x}_j}$ represents the inner product of $\mathbf{x}_i$ and $\mathbf{x}_j$.
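
A short sketch (assuming NumPy; the shapes are arbitrary) confirming that the entries of $\mathbf{X}^T\mathbf{X}$ are exactly the pairwise inner products of the columns:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 4
X = rng.normal(size=(m, n))   # each of the n columns is one RV with m samples

G = X.T @ X                   # n x n matrix of pairwise column inner products

for i in range(n):
    for j in range(n):
        assert np.isclose(G[i, j], np.dot(X[:, i], X[:, j]))
```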

Now, from the result in the previous section, we know that the inner product of two RVs is $m$ times their covariance (provided the data is centered, i.e. the means are zero). So the matrix above can be written as:

$$ \mathbf{X}^T\mathbf{X} = \begin{bmatrix} m\sigma_{\mathbf{x}_1,\mathbf{x}_1} & m\sigma_{\mathbf{x}_1,\mathbf{x}_2} & m\sigma_{\mathbf{x}_1,\mathbf{x}_3} & \ldots & m\sigma_{\mathbf{x}_1,\mathbf{x}_n} \\ m\sigma_{\mathbf{x}_2,\mathbf{x}_1} & m\sigma_{\mathbf{x}_2,\mathbf{x}_2} & m\sigma_{\mathbf{x}_2,\mathbf{x}_3} & \ldots & m\sigma_{\mathbf{x}_2,\mathbf{x}_n} \\ m\sigma_{\mathbf{x}_3,\mathbf{x}_1} & m\sigma_{\mathbf{x}_3,\mathbf{x}_2} & m\sigma_{\mathbf{x}_3,\mathbf{x}_3} & \ldots & m\sigma_{\mathbf{x}_3,\mathbf{x}_n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ m\sigma_{\mathbf{x}_n,\mathbf{x}_1} & m\sigma_{\mathbf{x}_n,\mathbf{x}_2} & m\sigma_{\mathbf{x}_n,\mathbf{x}_3} & \ldots & m\sigma_{\mathbf{x}_n,\mathbf{x}_n} \end{bmatrix} $$

Dividing both sides by $m$, we get:

$$ \frac{1}{m}\mathbf{X}^T\mathbf{X} = \begin{bmatrix} \sigma_{\mathbf{x}_1,\mathbf{x}_1} & \sigma_{\mathbf{x}_1,\mathbf{x}_2} & \sigma_{\mathbf{x}_1,\mathbf{x}_3} & \ldots & \sigma_{\mathbf{x}_1,\mathbf{x}_n} \\ \sigma_{\mathbf{x}_2,\mathbf{x}_1} & \sigma_{\mathbf{x}_2,\mathbf{x}_2} & \sigma_{\mathbf{x}_2,\mathbf{x}_3} & \ldots & \sigma_{\mathbf{x}_2,\mathbf{x}_n} \\ \sigma_{\mathbf{x}_3,\mathbf{x}_1} & \sigma_{\mathbf{x}_3,\mathbf{x}_2} & \sigma_{\mathbf{x}_3,\mathbf{x}_3} & \ldots & \sigma_{\mathbf{x}_3,\mathbf{x}_n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{\mathbf{x}_n,\mathbf{x}_1} & \sigma_{\mathbf{x}_n,\mathbf{x}_2} & \sigma_{\mathbf{x}_n,\mathbf{x}_3} & \ldots & \sigma_{\mathbf{x}_n,\mathbf{x}_n} \end{bmatrix} $$

The RHS is exactly the covariance matrix of the $n$ RVs.

$QED.$
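
The whole result can be verified numerically in a few lines. A sketch assuming NumPy: `np.cov` with `rowvar=False` treats columns as RVs, and `bias=True` gives the same $1/m$ normalisation used above:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 5
X = rng.normal(size=(m, n))
Xc = X - X.mean(axis=0)       # center each column (each RV)

# (1/m) * Xc^T Xc should equal the (biased) covariance matrix of the columns.
cov_from_product = (Xc.T @ Xc) / m
cov_np = np.cov(X, rowvar=False, bias=True)

assert np.allclose(cov_from_product, cov_np)
```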


The covariance matrix is a compact, convenient way to represent the covariances of all pairs of random variables. A few observations:

  1. Covariance is symmetric, i.e. $\sigma_{\mathbf{x}_i, \mathbf{x}_j} = \sigma_{\mathbf{x}_j, \mathbf{x}_i}$. Therefore the matrix is also symmetric.
  2. The diagonal represents variances, since the covariance of an RV with itself is just the variance of that RV.
  3. Among $n$ RVs, the number of distinct pairs is $\binom{n}{2}$ - exactly half the number of off-diagonal entries in an $n \times n$ matrix (the other half carries the same information, since the matrix is symmetric).
  4. The covariance matrix is symmetric and positive semi-definite, so a lot of convenient properties apply to it - it is because of these that we can, among other things, perform PCA and SVD (see the sketch after this list).
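
A short sketch of these properties, again assuming NumPy (all names are illustrative; the PCA step shown is the standard eigendecomposition formulation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
Xc = X - X.mean(axis=0)
C = (Xc.T @ Xc) / X.shape[0]          # covariance matrix, as derived above

# 1. Symmetry.
assert np.allclose(C, C.T)

# 2. The diagonal holds the (biased) variance of each column.
assert np.allclose(np.diag(C), Xc.var(axis=0))

# 4. Positive semi-definiteness: all eigenvalues are >= 0 (up to round-off).
eigvals, eigvecs = np.linalg.eigh(C)
assert np.all(eigvals >= -1e-12)

# PCA: project the centered data onto the eigenvectors of C,
# ordered by decreasing eigenvalue.
order = np.argsort(eigvals)[::-1]
scores = Xc @ eigvecs[:, order]
```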