Neural networks are capable of interpolating (fitting the training set perfectly) and driving test error lower.
Can the AdaBoost algorithm also do this?
Why or why not?
See Boosting by Schapire and Freund (2012), which examines this phenomenon; in particular, compare AdaBoost over stumps versus AdaBoost over C4.5 trees.
Page 16: “In chapter 5, we present a theoretical explanation of how, why, and when AdaBoost works and in particular why it often does not overfit.”
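As an aside, here is a minimal sketch (my own, not from the book) of how one might reproduce the stumps-versus-trees comparison, assuming scikit-learn; an entropy-criterion decision tree stands in for C4.5, which scikit-learn does not implement, and staged_predict is used to track train/test error round by round:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for whatever benchmark is actually of interest.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth, name in [(1, "stumps"), (None, "C4.5-like trees")]:
    base = DecisionTreeClassifier(max_depth=depth, criterion="entropy")
    # `estimator=` is the keyword in recent scikit-learn; older releases use `base_estimator=`.
    clf = AdaBoostClassifier(estimator=base, n_estimators=500, random_state=0)
    clf.fit(X_tr, y_tr)
    # staged_predict yields predictions after each boosting round, so we can watch
    # whether training error reaches zero and how test error behaves as rounds grow.
    train_err = [np.mean(p != y_tr) for p in clf.staged_predict(X_tr)]
    test_err = [np.mean(p != y_te) for p in clf.staged_predict(X_te)]
    print(f"{name}: final train error {train_err[-1]:.3f}, final test error {test_err[-1]:.3f}")
```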
Framed in terms of capacity, for each additional boosting round, AdaBoost adds a weight parameter and whatever parameters are needed for the base classifier.
For example, with a stump as the base (weak) learner, each round of boosting adds 3 degrees of freedom: one for the index of the input feature to split on, one for the threshold, and one for the AdaBoost weight (alpha).
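To make that count concrete, a rough back-of-the-envelope tally (my own, not from the book) of the total capacity after T rounds over stumps:

```latex
\[
  \underbrace{3T}_{\text{params after } T \text{ rounds}}
  \;=\;
  \underbrace{T}_{\text{split features}}
  + \underbrace{T}_{\text{thresholds}}
  + \underbrace{T}_{\text{weights } \alpha_t},
  \qquad
  \text{so } T \approx \tfrac{n}{3} \text{ rounds would match } n \text{ training points in parameter count.}
\]
```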
With this in mind, if capacity were the only (or primary) factor in moving a model into the interpolation regime, shouldn't we expect boosting over stumps to succeed as well (i.e., to show a double descent curve)?