@petermchale
Last active December 2, 2023 15:46
A derivation of the bias-variance decomposition of test error in machine learning.
@sdangi

sdangi commented Dec 14, 2017

Thanks for the helpful gist Peter! One potential correction -- instead of:
Adding and subtracting $$E_{\cal D} \left[ h^2 \right]$$
I believe it should be:
Adding and subtracting $$E_{\cal D} \left[ h \right]^2$$

@petermchale

@sdangi, I don't see why.

@bradyneal

Nice gist! Regarding the term that should be added and subtracted, I think you accidentally put the square inside the expectation rather than outside it. Moving it outside is necessary for the "rearranging terms, we may write the right-hand side above as" step to carry through.
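
To spell out the algebra (using, if I've read the gist correctly, $h = h(x; {\cal D})$ for the hypothesis and $f = f(x)$ for the noiseless target, both evaluated at a fixed test point $x$):

$$E_{\cal D} \left[ (h - f)^2 \right] = E_{\cal D} \left[ h^2 \right] - 2 f \, E_{\cal D} \left[ h \right] + f^2$$

Adding and subtracting $E_{\cal D} \left[ h \right]^2$ and regrouping gives

$$E_{\cal D} \left[ (h - f)^2 \right] = \left( E_{\cal D} \left[ h^2 \right] - E_{\cal D} \left[ h \right]^2 \right) + \left( E_{\cal D} \left[ h \right] - f \right)^2 = \text{variance} + \text{bias}^2$$

whereas adding and subtracting $E_{\cal D} \left[ h^2 \right]$ would simply cancel and leave the right-hand side unchanged.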

@petermchale

petermchale commented Apr 5, 2018

@bradyneal, I understand what @sdangi meant. My issue is that I don't see why the suggested change is correct. I could be wrong, though. Happy to look over the algebra if you or @sdangi are interested in providing it. Best, Peter

@bradyneal

[Attached image: "bias variance peter"]

@petermchale

@bradyneal @sdangi : you are correct! Thanks! I've updated the jupyter notebook accordingly.

@rafgonsi

In the section 'Reducible and irreducible error', why is $E_{\epsilon}\left[ 2\epsilon (f - h) \right]$ equal to 0?

I agree that $E_{\epsilon}\left[ \epsilon f \right] = 0$, but why is $E_{\epsilon}\left[ \epsilon h \right] = 0$? The hypothesis $h$ is learned from a training set $(X, Y)$, and $Y$ depends on $\epsilon$, so the parameters of the learned model do too. Thus one cannot write $E_{\epsilon}\left[ \epsilon h \right] = h \, E_{\epsilon}\left[ \epsilon \right]$.

Could you explain this?

@petermchale

@rafgonsi : ... In performing the triple integral over $X$, ${\cal D}$ and $\epsilon$, I fix two variables ($X$ and ${\cal D}$) and vary the third ($\epsilon$). Since $X$ and ${\cal D}$ are fixed, so are $f$ and $h$, which may therefore be "pulled out" of the innermost integral over $\epsilon$.
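
In symbols (assuming, as in the gist's setup, that the test-point noise $\epsilon$ has zero mean and is drawn independently of $X$ and ${\cal D}$):

$$E_{X, {\cal D}, \epsilon} \left[ 2 \epsilon (f - h) \right] = E_{X} \left[ E_{\cal D} \left[ 2 (f - h) \, E_{\epsilon} \left[ \epsilon \right] \right] \right] = 0$$

The training noise is baked into ${\cal D}$, so $h$ does depend on noise, but only on the training noise, not on the test-point $\epsilon$ over which the innermost expectation is taken.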

@petermchale

An entirely analogous result to the one outlined in this gist is obtained when one computes the error of an estimator of a parameter: the mean squared error of any estimator equals its variance plus the square of its bias. See Section 7.7 at https://www.sciencedirect.com/science/article/pii/B9780123948113500071
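
As a quick numerical sanity check (my own sketch, not code from the gist or the reference), the decomposition can be verified by Monte Carlo for a deliberately biased estimator of a normal mean, here a shrunken sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 2.0              # true parameter value
n, trials = 20, 100_000  # sample size per trial, number of trials

# A deliberately biased estimator: shrink the sample mean toward zero.
samples = rng.normal(loc=theta, scale=1.0, size=(trials, n))
theta_hat = 0.9 * samples.mean(axis=1)

mse = np.mean((theta_hat - theta) ** 2)
variance = np.var(theta_hat)
bias_squared = (np.mean(theta_hat) - theta) ** 2

print(f"MSE               = {mse:.5f}")
print(f"variance + bias^2 = {variance + bias_squared:.5f}")
# The two numbers agree up to Monte Carlo error,
# illustrating MSE = variance + bias^2.
```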

@petermchale

petermchale commented Dec 2, 2023

In active machine learning, we assume that the learner is unbiased and focus on algorithms that minimize the learner's variance, as shown in Cohn et al. (1996): https://arxiv.org/abs/cs/9603104 (Eq. 4 is difficult to interpret precisely, though, in the absence of further reading).
