Neural networks are capable of interpolating (fitting the training set perfectly) and driving test error lower.
Can the AdaBoost algorithm also do this?
Why or why not?
See Boosting by Schapire and Freund (2012), which examines this phenomenon; in particular, compare AdaBoost over stumps versus AdaBoost over C4.5 trees.
Page 16: “In chapter 5, we present a theoretical explanation of how, why, and when AdaBoost works and in particular why it often does not overfit.”
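As an aside, here is a minimal sketch (my own, not from the book) of how one might reproduce the stumps-versus-trees comparison, assuming scikit-learn; an entropy-criterion decision tree stands in for C4.5, which scikit-learn does not implement, and staged_predict is used to track train/test error round by round:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for whatever benchmark is actually of interest.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth, name in [(1, "stumps"), (None, "C4.5-like trees")]:
    base = DecisionTreeClassifier(max_depth=depth, criterion="entropy")
    # `estimator=` is the keyword in recent scikit-learn; older releases use `base_estimator=`.
    clf = AdaBoostClassifier(estimator=base, n_estimators=500, random_state=0)
    clf.fit(X_tr, y_tr)
    # staged_predict yields predictions after each boosting round, so we can watch
    # whether training error reaches zero and how test error behaves as rounds grow.
    train_err = [np.mean(p != y_tr) for p in clf.staged_predict(X_tr)]
    test_err = [np.mean(p != y_te) for p in clf.staged_predict(X_te)]
    print(f"{name}: final train error {train_err[-1]:.3f}, final test error {test_err[-1]:.3f}")
```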
Framed in terms of capacity, for each additional boosting round, AdaBoost adds a weight parameter and whatever parameters are needed for the base classifier.
For example, with a stump as the base (weak) learner, each round of boosting adds 3 degrees of freedom: one for the index of the input feature to split on, one for the threshold, and one for the AdaBoost weight (alpha).
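To make that count concrete, a rough back-of-the-envelope tally (my own, not from the book) of the total capacity after T rounds over stumps:

```latex
\[
  \underbrace{3T}_{\text{params after } T \text{ rounds}}
  \;=\;
  \underbrace{T}_{\text{split features}}
  + \underbrace{T}_{\text{thresholds}}
  + \underbrace{T}_{\text{weights } \alpha_t},
  \qquad
  \text{so } T \approx \tfrac{n}{3} \text{ rounds would match } n \text{ training points in parameter count.}
\]
```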
With this in mind, if capacity were the only (or primary) factor in moving a model into the interpolation regime, shouldn't we expect boosting over stumps to succeed as well (i.e., to show a double descent curve)?