hturner/logLik.lm_computation.md

## logLik.lm_computation.md

      
    Raw
  

              logLik.lm_computation.md
            
          
    Log-likelihood Computation for (Weighted) Linear Regression in R (stats::logLik.lm)

The likelihood for a linear model is given by:
$$L = \prod_{i = 1}^N \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left( \frac{-(y_i - \mu)^2}{2\sigma^2} \right) $$
To derive the computation of the log-likelihood used in stats::logLik.lm we first need to incorporate weights.
From ?lm

weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances)

Let
$$w_i = \frac{\sigma^2}{\sigma_i^2} \Rightarrow \sigma_i^2 = \frac{\sigma^2}{w_i}$$
Then the likelihood for a weighted linear regression is
$$
\begin{aligned}
L &= \prod_{i = 1}^N \frac{\sqrt{w_i}}{\sqrt{2\pi\sigma^2}} \exp \left( \frac{-w_i(y_i - \mu)^2}{2\sigma^2} \right) \\
\Rightarrow \log(L) &= \sum_{i = 1}^N  \frac{1}{2}\log(w_i) - \frac{N}{2}\left(\log(2\pi) + \log(\sigma^2)\right) -  \frac{\sum\limits_{i=1}^{n} w_i(y_i - \mu)^2}{2\sigma^2}
\end{aligned}
$$
To compute the log-likelihood for a fitted model, we need to evaluate it at the maximum likelihood estimates for $\mu$ and $\sigma^2$:
$$
\begin{align}
\hat{\mu} &= \hat{\boldsymbol{\beta}}\boldsymbol{X}         &  \hat{\sigma^2} &= \frac{1}{N}\sum\limits_{i=1}^{n} w_i(y_i - \hat{\mu})^2
\end{align}
$$
Note that this is the maximium likelihood estimate for $\sigma^2$, not the unbiased estimator for the residual variance, which has $N - p$ in the denominator.
Given the residuals $r_i = y_i - \hat{\boldsymbol{\beta}}\boldsymbol{X}$ we have
$$
\begin{aligned}
\log(L)\rvert_{\hat{\mu}, \hat{\sigma}} & = \sum_{i = 1}^N  \frac{1}{2}\log(w_i) - \frac{N}{2}\left(\log(2\pi) + \log(\sum\limits_{i=1}^{n} w_ir_i^2) - \log(N)\right) -  \frac{N\hat{\sigma^2}}{2\hat{\sigma}^2} \\
& = \frac{1}{2} \left( \sum_{i = 1}^N \log(w_i) - N \left(\log(2\pi) + \log(\sum\limits_{i=1}^{n} w_ir_i^2) - \log(N) + 1 \right) \right)
\end{aligned}
$$
This matches the code to compute the log-likelihood in https://svn.r-project.org/R/trunk/src/library/stats/R/logLik.R
val <- .5* (sum(log(w)) - N * (log(2 * pi) + 1 - log(N) +
                                   log(sum(w*res^2))))