@petermchale
Last active December 2, 2023 15:46
A derivation of the bias-variance decomposition of test error in machine learning.
@petermchale
Author

petermchale commented Apr 5, 2018

@bradyneal, I understand what @sdangi meant. My issue is that I don't see why the suggested change is correct. I could be wrong, though. Happy to look over the algebra if you or @sdangi are interested in providing it. Best, Peter

@bradyneal

[Image: bias variance peter]

@petermchale
Author

@bradyneal @sdangi: you are correct! Thanks! I've updated the Jupyter notebook accordingly.

@rafgonsi

In the section 'Reducible and irreducible error', why is $E_\epsilon[2\epsilon (f - h)]$ equal to 0?

I agree that $E_\epsilon[\epsilon f] = 0$, but why is $E_\epsilon[\epsilon h] = 0$? The hypothesis $h$ is learned from some training set $(X, Y)$, and $Y$ depends on $\epsilon$, so the parameters of the learned model do too. Thus one cannot write $E_\epsilon[\epsilon h] = h\, E_\epsilon[\epsilon]$.

Could you explain this?

@petermchale
Author

@rafgonsi : ... In performing the triple integral over $X$, $\cal{D}$ and $\epsilon$, I fix two variables ($X$ and $\cal{D}$) and vary the third ($\epsilon$). Since $X$ and $\cal{D}$ are fixed, so are $f$ and $h$, which may therefore be "pulled out" of the innermost integral over $\epsilon$.
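
Spelling that out as an equation (this is just the argument above, written with the gist's assumption that the test-point noise $\epsilon$ has zero mean and is independent of $\cal{D}$): holding $X$ and $\cal{D}$ fixed,

$$
E_\epsilon\left[\, 2\epsilon\,(f - h) \,\right] = 2\,(f - h)\, E_\epsilon[\epsilon] = 0 ,
$$

because $f$ depends only on $X$, $h$ depends only on $X$ and $\cal{D}$, and $E_\epsilon[\epsilon] = 0$.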

@petermchale
Author

An entirely analogous result to the one outlined in this gist is obtained when one computes the error of an estimator of a parameter: the mean squared error of any estimator is equal to its variance plus the square of its bias. See Section 7.7 at https://www.sciencedirect.com/science/article/pii/B9780123948113500071
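
A quick numerical check of that identity (my own sketch, not taken from the reference; the Gaussian parameters, the deliberately biased maximum-likelihood variance estimator, the sample size, and the trial count are all arbitrary illustrative choices):

```python
# Empirically verify MSE = variance + bias^2 for a deliberately biased
# estimator: the maximum-likelihood variance estimator (divides by n).
import numpy as np

rng = np.random.default_rng(0)

true_mean, true_var = 0.0, 4.0   # parameters of the sampled Gaussian
n, trials = 10, 200_000          # sample size per trial, number of trials

samples = rng.normal(true_mean, np.sqrt(true_var), size=(trials, n))
estimates = samples.var(axis=1, ddof=0)       # biased (MLE) variance estimates

mse = np.mean((estimates - true_var) ** 2)    # mean squared error of the estimator
bias = np.mean(estimates) - true_var          # E[estimate] - true value
variance = np.var(estimates)                  # spread of the estimates themselves

print(f"MSE               : {mse:.4f}")
print(f"variance + bias^2 : {variance + bias ** 2:.4f}")   # matches the MSE
```

The two printed numbers agree (exactly, in fact, since both are computed from the same set of estimates).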

@petermchale
Author

petermchale commented Dec 2, 2023

In active machine learning, we assume that the learner is unbiased and focus on algorithms that minimize the learner's variance, as shown in Cohn et al. (1996): https://arxiv.org/abs/cs/9603104 (Eq. 4 is difficult to interpret precisely, though, in the absence of further reading).
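
To make the variance-minimization idea concrete, here is a minimal sketch for ordinary least squares (this illustrates the general flavor only, not the specific criterion of Eq. 4 in Cohn et al.; the labelled data and candidate grid below are made up). The learner's prediction variance at an input $x$ is $\sigma^2\, x^T (X^T X)^{-1} x$, so one simple query rule is to label the candidate where that variance is currently largest:

```python
# Variance-driven query selection for ordinary least squares:
# query the candidate input where the current model's prediction variance,
# sigma^2 * x^T (X^T X)^{-1} x, is largest.
import numpy as np

rng = np.random.default_rng(1)

# design matrix of the points labelled so far (intercept column + one feature)
X = np.column_stack([np.ones(5), rng.uniform(-1.0, 1.0, 5)])
XtX_inv = np.linalg.inv(X.T @ X)

# unlabelled candidate inputs we could ask an oracle to label
candidates = np.linspace(-2.0, 2.0, 9)
C = np.column_stack([np.ones_like(candidates), candidates])

# prediction variance at each candidate, up to the constant factor sigma^2
pred_var = np.einsum('ij,jk,ik->i', C, XtX_inv, C)

print("query next:", candidates[np.argmax(pred_var)])
```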
