
@xpe xpe/question.md
Created Oct 27, 2019

Machine Learning Theory: Double Descent Curve

Neural networks are capable of interpolating (fitting the training set perfectly) while still driving test error lower.

Can the AdaBoost algorithm also do this?

Why or why not?

xpe commented Oct 27, 2019

Framed in terms of capacity: for each additional boosting round, AdaBoost adds one weight parameter plus whatever parameters the base classifier needs.

For example, with a stump as the base (weak) learner, each round of boosting adds three degrees of freedom: one for the index of the input feature to split on, one for the threshold, and one for the AdaBoost weight (alpha).

With this in mind, if capacity is the only (or primary) factor in moving into the interpolation regime, shouldn’t we expect boosting over stumps to succeed (i.e., show a double descent curve)?
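
A minimal sketch of the experiment this suggests: boost over stumps and track train/test error as the number of rounds grows. This is my own illustration (not code from the gist); the dataset, split, and round counts are arbitrary choices, and it assumes scikit-learn’s `AdaBoostClassifier`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data; any binary classification set would do.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for n_rounds in [10, 50, 200, 1000, 5000]:
    # Each round fits one stump (feature index + threshold) and one vote
    # weight alpha, i.e. roughly 3 added degrees of freedom per round.
    clf = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # stump base learner
        # note: older scikit-learn versions call this argument base_estimator
        n_estimators=n_rounds,
        random_state=0,
    )
    clf.fit(X_tr, y_tr)
    train_err = 1 - clf.score(X_tr, y_tr)
    test_err = 1 - clf.score(X_te, y_te)
    print(f"rounds={n_rounds:5d}  train error={train_err:.3f}  test error={test_err:.3f}")
```

The question above is then whether the train error reaches zero (interpolation) and, if so, whether the test error keeps falling or turns back up as rounds keep adding capacity.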

xpe commented Oct 27, 2019

See Boosting by Schapire and Freund (2012), where they notice this phenomenon; compare AdaBoost over stumps versus AdaBoost over C4.5 trees.

xpe commented Oct 27, 2019

Page 16: “In chapter 5, we present a theoretical explanation of how, why, and when AdaBoost works and in particular why it often does not overfit.”
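
That explanation is given in terms of the voting margins of the ensemble. As an aside, here is a minimal sketch of inspecting those margins, y·f(x) normalized by the sum of the alphas; it is my own illustration (not code from the book or the gist), assumes scikit-learn’s `AdaBoostClassifier`, and assumes the discrete (SAMME) variant, which is the default in recent scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative data and model; 500 rounds of boosting over stumps.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=500,
    random_state=0,
).fit(X, y)

# Per-round vote weights alpha_t (trim padding if boosting stopped early).
alphas = clf.estimator_weights_[: len(clf.estimators_)]

# Map labels and per-stump predictions to {-1, +1}.
y_signed = np.where(y == clf.classes_[1], 1, -1)
h_signed = np.array(
    [np.where(est.predict(X) == clf.classes_[1], 1, -1) for est in clf.estimators_]
)

# Normalized ensemble output f(x) in [-1, 1]; margin > 0 means the weighted
# vote is correct, and larger margins mean a larger share of the vote agrees
# with the label.
f = (alphas[:, None] * h_signed).sum(axis=0) / alphas.sum()
margins = y_signed * f
print("min margin:", margins.min(), " median margin:", np.median(margins))
```

The margin-based account is that AdaBoost can keep pushing these margins up even after the training error hits zero, which is one candidate explanation for test error continuing to fall past interpolation.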
