Skip to content

Instantly share code, notes, and snippets.

@hturner
Last active June 28, 2023 08:13
Show Gist options
  • Save hturner/2c4699d8a43dd251a3a29d3ffdd0c4b2 to your computer and use it in GitHub Desktop.
Save hturner/2c4699d8a43dd251a3a29d3ffdd0c4b2 to your computer and use it in GitHub Desktop.

Log-likelihood Computation for (Weighted) Linear Regression in R (stats::logLik.lm)

The likelihood for a linear model is given by:

$$L = \prod_{i = 1}^N \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left( \frac{-(y_i - \mu)^2}{2\sigma^2} \right) $$

To derive the computation of the log-likelihood used in stats::logLik.lm we first need to incorporate weights.

From ?lm

weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances)

Let

$$w_i = \frac{\sigma^2}{\sigma_i^2} \Rightarrow \sigma_i^2 = \frac{\sigma^2}{w_i}$$

Then the likelihood for a weighted linear regression is

$$ \begin{aligned} L &= \prod_{i = 1}^N \frac{\sqrt{w_i}}{\sqrt{2\pi\sigma^2}} \exp \left( \frac{-w_i(y_i - \mu)^2}{2\sigma^2} \right) \\ \Rightarrow \log(L) &= \sum_{i = 1}^N \frac{1}{2}\log(w_i) - \frac{N}{2}\left(\log(2\pi) + \log(\sigma^2)\right) - \frac{\sum\limits_{i=1}^{n} w_i(y_i - \mu)^2}{2\sigma^2} \end{aligned} $$

To compute the log-likelihood for a fitted model, we need to evaluate it at the maximum likelihood estimates for $\mu$ and $\sigma^2$:

$$ \begin{align} \hat{\mu} &= \hat{\boldsymbol{\beta}}\boldsymbol{X} & \hat{\sigma^2} &= \frac{1}{N}\sum\limits_{i=1}^{n} w_i(y_i - \hat{\mu})^2 \end{align} $$

Note that this is the maximium likelihood estimate for $\sigma^2$, not the unbiased estimator for the residual variance, which has $N - p$ in the denominator.

Given the residuals $r_i = y_i - \hat{\boldsymbol{\beta}}\boldsymbol{X}$ we have

$$ \begin{aligned} \log(L)\rvert_{\hat{\mu}, \hat{\sigma}} & = \sum_{i = 1}^N \frac{1}{2}\log(w_i) - \frac{N}{2}\left(\log(2\pi) + \log(\sum\limits_{i=1}^{n} w_ir_i^2) - \log(N)\right) - \frac{N\hat{\sigma^2}}{2\hat{\sigma}^2} \\ & = \frac{1}{2} \left( \sum_{i = 1}^N \log(w_i) - N \left(\log(2\pi) + \log(\sum\limits_{i=1}^{n} w_ir_i^2) - \log(N) + 1 \right) \right) \end{aligned} $$

This matches the code to compute the log-likelihood in https://svn.r-project.org/R/trunk/src/library/stats/R/logLik.R

val <- .5* (sum(log(w)) - N * (log(2 * pi) + 1 - log(N) +
                                   log(sum(w*res^2))))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment