Gradient Boosting
Gradient Boosting — Fit each new tree to the residual errors of the last.
Gradient boosting builds an ensemble by having each new tree predict the residual errors left over by the current ensemble. Sum the trees, each scaled by a small learning rate, and the errors shrink round after round.
Legend: data, ensemble fit, residuals.
Drag any point, or click empty space to add one, and the model re-fits. Each round fits a small tree to the leftover residuals (red arrows) and adds a shrunken step. The fit tightens every round. A high learning rate with many rounds drives train error to zero while test error climbs.
The idea in plain words
Gradient boosting builds its ensemble by having each new tree predict the residual errors left over by the current model. Add that tree with a small learning rate, and the leftover error shrinks. Repeat, and the fit tightens round after round.
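The red arrows are the residuals that each new small tree chases. Where AdaBoost reweights the training points, gradient boosting fits the residuals directly, which for squared-error loss is gradient descent in function space. Too high a learning rate combined with too many rounds overfits: training error keeps falling while test error starts to climb.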
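The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code behind the interactive demo: inputs are one-dimensional, each weak learner is a single-split regression stump, the loss is squared error, and the names `fit_stump`, `predict_stump`, `boost`, and `predict` are hypothetical helpers invented for this example.

```python
import math
import random


def fit_stump(xs, residuals):
    """Least-squares stump on 1-D inputs: (threshold, left_value, right_value)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    mean_all = sum(residuals) / n
    best_sse = sum((r - mean_all) ** 2 for r in residuals)
    best = (math.inf, mean_all, mean_all)  # fallback: no split, predict the mean
    for k in range(1, n):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # no threshold separates equal x values
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        left_mean = sum(left) / k
        right_mean = sum(right) / (n - k)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if sse < best_sse:
            best_sse = sse
            best = ((lo + hi) / 2, left_mean, right_mean)
    return best


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Return the initial constant, the fitted stumps, and train MSE per round."""
    f0 = sum(ys) / len(ys)  # start from the mean of the targets
    preds = [f0] * len(xs)
    stumps, history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # what is left to explain
        stump = fit_stump(xs, residuals)
        # Add a shrunken copy of the new stump to the running prediction.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
        stumps.append(stump)
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return f0, stumps, history


def predict(f0, stumps, learning_rate, x):
    """Ensemble prediction: initial constant plus the shrunken sum of stumps."""
    return f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = random.Random(0)
xs = [i / 10 for i in range(40)]
ys = [math.sin(x) + rng.gauss(0, 0.2) for x in xs]

f0, stumps, history = boost(xs, ys, n_rounds=50, learning_rate=0.1)
print(f"train MSE after round 1: {history[0]:.4f}, after round 50: {history[-1]:.4f}")
```

With squared-error loss and leaf values set to the mean residual, the training error in `history` cannot increase from one round to the next for any learning rate between 0 and 1, which is the "fit tightens every round" behavior described above.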
Now, the math
Each stage adds a shrunken tree fit to the current residuals:
$$F_m(x) = F_{m-1}(x) + \eta \, h_m(x)$$
- $h_m$: the m-th tree, fit to the residuals of the current ensemble $F_{m-1}$.
- $\eta$: the learning rate (shrinkage); small steps generalize better.
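To see what a shrunken step does to a single residual, here is a small worked example. It assumes an idealized tree that predicts the current residual exactly, which real trees only approximate; the function name `residual_trace` and the numbers are made up for illustration.

```python
def residual_trace(y, f0, eta, rounds):
    """Residual after each round when the tree predicts the residual exactly."""
    f = f0
    trace = []
    for _ in range(rounds):
        h = y - f          # idealized tree output: the current residual
        f = f + eta * h    # shrunken step toward the target
        trace.append(y - f)
    return trace


# Target 10, initial prediction 4, learning rate 0.5.
trace = residual_trace(y=10.0, f0=4.0, eta=0.5, rounds=3)
print(trace)  # [3.0, 1.5, 0.75]
```

Under this idealization the residual is multiplied by (1 − η) every round: a small learning rate removes only a sliver of the error per step, so it needs more rounds to reach the same fit.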
The derivation
For squared-error loss ½(y − F(x))², the negative gradient with respect to the prediction F(x) at each point is exactly the residual y − F(x), so fitting a tree to the residuals is a gradient-descent step in function space. Shrinkage (small η) trades more rounds for better generalization; a large η with many rounds memorizes the training noise.
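Written out step by step, the argument looks roughly like this. The symbols $L$, $r_i$, and $h_m$ are labels chosen here for illustration, and the factor of ½ in the loss is a convention: without it the negative gradient is twice the residual, a constant that can be absorbed into the learning rate.

$$
\begin{aligned}
L\bigl(y_i, F(x_i)\bigr) &= \tfrac{1}{2}\bigl(y_i - F(x_i)\bigr)^2 \\
-\frac{\partial L}{\partial F(x_i)}\bigg|_{F = F_{m-1}} &= y_i - F_{m-1}(x_i) = r_i \\
h_m &\approx \text{tree fit to the pairs } (x_i, r_i) \\
F_m(x) &= F_{m-1}(x) + \eta \, h_m(x)
\end{aligned}
$$

The last line has the same shape as an ordinary gradient-descent update, except that the step is taken on the function's predictions and the tree stands in for the negative gradient.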
Now Break It
Try this: Set a high learning rate and run many rounds. The ensemble overshoots and overfits the training residuals.
Control: Learning rate slider (set high)
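The same experiment can be tried offline by tracking train and test error per round for a low and a high learning rate. The sketch below assumes one-dimensional inputs, single-split stumps, squared-error loss, and a synthetic noisy sine dataset; `fit_stump`, `error_curves`, and `sample` are hypothetical helpers, and the sample sizes and rates are arbitrary choices rather than the demo's settings.

```python
import math
import random


def fit_stump(xs, rs):
    """Best single-split least-squares stump: (threshold, left_value, right_value)."""
    pairs = sorted(zip(xs, rs))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best_score = total ** 2 / n  # score of the no-split (constant) fit
    best = (math.inf, total / n, total / n)
    left_sum = 0.0
    for k in range(1, n):
        left_sum += pairs[k - 1][1]
        lo, hi = pairs[k - 1][0], pairs[k][0]
        if lo == hi:
            continue  # no threshold separates equal x values
        right_sum = total - left_sum
        # Maximising this score is equivalent to minimising the split's squared error.
        score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if score > best_score:
            best_score = score
            best = ((lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def error_curves(train, test, learning_rate, n_rounds):
    """Train and test MSE after every boosting round."""
    (x_tr, y_tr), (x_te, y_te) = train, test
    f0 = sum(y_tr) / len(y_tr)
    p_tr, p_te = [f0] * len(x_tr), [f0] * len(x_te)
    train_err, test_err = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(y_tr, p_tr)]
        t, left, right = fit_stump(x_tr, residuals)
        p_tr = [p + learning_rate * (left if x <= t else right)
                for p, x in zip(p_tr, x_tr)]
        p_te = [p + learning_rate * (left if x <= t else right)
                for p, x in zip(p_te, x_te)]
        train_err.append(mse(y_tr, p_tr))
        test_err.append(mse(y_te, p_te))
    return train_err, test_err


def sample(rng, n):
    """Noisy sine data (an assumption for illustration)."""
    xs = [rng.uniform(0, 6) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


rng = random.Random(0)
train, test = sample(rng, 30), sample(rng, 200)
curves = {eta: error_curves(train, test, eta, 200) for eta in (0.1, 1.0)}

for eta, (train_err, test_err) in curves.items():
    best_round = min(range(len(test_err)), key=test_err.__getitem__) + 1
    print(f"eta={eta}: final train MSE {train_err[-1]:.4f}, "
          f"final test MSE {test_err[-1]:.4f}, lowest test MSE at round {best_round}")
```

Training error falls for both settings. Whether and when the test curve turns upward depends on the noise level and the sample, so comparing the final test error with the lowest one reached along the way is a simple check for overfitting.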