@xpe
Created October 27, 2019 19:54
Machine Learning Theory: Double Descent Curve

Neural networks are capable of interpolating the training data (fitting the training set perfectly) while still driving test error lower.

Can the AdaBoost algorithm also do this?

Why or why not?
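For concreteness, here is a minimal sketch of the kind of capacity sweep behind the question. It assumes NumPy; the random-ReLU-feature model, the toy sine data, and the feature counts are purely illustrative, not taken from any particular paper. The idea is to push the number of features past the number of training points and watch train/test error on either side of the interpolation threshold.

```python
# Sketch of a capacity-sweep experiment of the kind that can exhibit
# double descent. Assumption: random ReLU features with the minimum-norm
# least-squares fit (np.linalg.lstsq); everything here is illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: y = sin(3x) + noise.
n_train, n_test = 40, 500
x_train = rng.uniform(-1, 1, size=(n_train, 1))
x_test = np.linspace(-1, 1, n_test).reshape(-1, 1)
y_train = np.sin(3 * x_train).ravel() + 0.1 * rng.standard_normal(n_train)
y_test = np.sin(3 * x_test).ravel()

def relu_features(x, w, b):
    """Random ReLU features; capacity grows with the number of columns."""
    return np.maximum(x @ w + b, 0.0)

for n_features in [5, 10, 20, 40, 80, 160, 320]:
    w = rng.standard_normal((1, n_features))
    b = rng.standard_normal(n_features)
    phi_train = relu_features(x_train, w, b)
    phi_test = relu_features(x_test, w, b)
    # Minimum-norm least squares: interpolates once n_features >= n_train.
    coef, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    train_mse = np.mean((phi_train @ coef - y_train) ** 2)
    test_mse = np.mean((phi_test @ coef - y_test) ** 2)
    print(f"{n_features:4d} features  train MSE {train_mse:.4f}  test MSE {test_mse:.4f}")
```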


xpe commented Oct 27, 2019

Framed in terms of capacity, for each additional boosting round, AdaBoost adds a weight parameter and whatever parameters are needed for the base classifier.

For example, with a stump as the base (weak) learner, each round of boosting adds 3 degrees of freedom: one for the index of the input to split on, one for the threshold, and one for the AdaBoost weight (alpha).

With this in mind, if capacity is the only (or primary) factor in moving into the interpolation regime, shouldn’t we expect boosting over stumps to succeed (i.e. show a double descent curve)?
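A small sketch of this parameter-counting argument, assuming scikit-learn (the dataset and number of rounds are arbitrary): the default base estimator of `AdaBoostClassifier` is a depth-1 decision tree, so each fitted round exposes exactly the split feature, the split threshold, and the per-round weight.

```python
# Sketch of the 3-degrees-of-freedom-per-round count for AdaBoost over
# stumps. Assumes scikit-learn; the toy dataset is illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

n_rounds = 50
# The default base estimator is a depth-1 decision tree (a stump).
model = AdaBoostClassifier(n_estimators=n_rounds, random_state=0).fit(X, y)

# Depending on the scikit-learn version (SAMME vs. SAMME.R), the stored
# estimator_weights_ are the classic AdaBoost alphas or all ones.
for t, (stump, alpha) in enumerate(zip(model.estimators_,
                                       model.estimator_weights_)):
    feature = stump.tree_.feature[0]      # index of the input the stump splits on
    threshold = stump.tree_.threshold[0]  # split threshold
    if t < 3:  # show the first few rounds
        print(f"round {t}: feature={feature}, threshold={threshold:.3f}, alpha={alpha:.3f}")

# Capacity grows linearly with the number of boosting rounds.
print("approx. degrees of freedom:", 3 * len(model.estimators_))
```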


xpe commented Oct 27, 2019

See Boosting: Foundations and Algorithms by Schapire and Freund (2012), where they notice this phenomenon; compare AdaBoost over stumps versus AdaBoost over C4.5 trees.
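A rough sketch of that stump-versus-tree comparison, again assuming scikit-learn. Note that scikit-learn does not implement C4.5; a deeper CART tree (here arbitrarily capped at depth 5) is used only as a stand-in for the stronger base learner, and the dataset is illustrative.

```python
# Sketch: AdaBoost over stumps vs. AdaBoost over deeper trees.
# Assumptions: scikit-learn; a depth-5 CART tree as a rough C4.5 stand-in.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

bases = {
    "stumps": DecisionTreeClassifier(max_depth=1),
    "depth-5 trees (C4.5 stand-in)": DecisionTreeClassifier(max_depth=5),
}

for name, base in bases.items():
    # In scikit-learn >= 1.2 the parameter is `estimator`
    # (older versions call it `base_estimator`).
    model = AdaBoostClassifier(estimator=base, n_estimators=200,
                               random_state=0).fit(X_tr, y_tr)
    # staged_score reports accuracy after each boosting round.
    train_curve = list(model.staged_score(X_tr, y_tr))
    test_curve = list(model.staged_score(X_te, y_te))
    print(f"{name}: final train acc {train_curve[-1]:.3f}, "
          f"final test acc {test_curve[-1]:.3f}")
```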


xpe commented Oct 27, 2019

Page 16: “In chapter 5, we present a theoretical explanation of how, why, and when AdaBoost works and in particular why it often does not overfit.”
