Neural networks are capable of interpolating (fitting the training set perfectly) while still driving test error lower.
Can the AdaBoost algorithm also do this?
Why or why not?
Framed in terms of capacity, for each additional boosting round, AdaBoost adds a weight parameter and whatever parameters are needed for the base classifier.
For example, with a decision stump as the base (weak) learner, each round of boosting adds 3 degrees of freedom: one for the index of the input feature to split on, one for the split threshold, and one for the AdaBoost weight (alpha).
With this in mind, if capacity is the only (or primary) factor in moving into the interpolation regime, shouldn’t we expect boosting over stumps to succeed as well (i.e., to show a double descent curve)?
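A minimal sketch of the experiment I have in mind (assuming scikit-learn >= 1.2 for the `estimator` argument; the synthetic dataset and round count are arbitrary choices, not from any particular paper): fit AdaBoost over depth-1 trees for many rounds and use `staged_predict` to track train/test error after every round, to see whether the training error ever reaches zero and how the test error behaves as capacity grows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem with some label noise (flip_y),
# so that interpolating the training set is not trivially easy.
X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

n_rounds = 2000  # each round adds roughly 3 degrees of freedom per the counting above
clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # stump base learner
    n_estimators=n_rounds,
    random_state=0,
).fit(X_tr, y_tr)

# staged_predict yields the ensemble's predictions after each boosting round,
# so one fit gives the whole error-vs-capacity curve.
train_err = [np.mean(p != y_tr) for p in clf.staged_predict(X_tr)]
test_err = [np.mean(p != y_te) for p in clf.staged_predict(X_te)]

# Does the train error ever hit zero (interpolation), and does the test error
# keep falling past that point (a double-descent-like curve)?
print("min train error:", min(train_err), "at round", int(np.argmin(train_err)) + 1)
print("min test error:", min(test_err), "at round", int(np.argmin(test_err)) + 1)
```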