AdaBoost
AdaBoost — Chain weak learners, each fixing the last one’s mistakes.
Boosting builds an ensemble sequentially. Each new weak learner focuses on the examples the previous ones got wrong, re-weighting the hard cases until the combined model is strong.
- Class 0
- Class 1
- Point weight
Drag any point, or click empty space to drop a new one, and AdaBoost re-runs from scratch. Each round grows the weights on misclassified points (bigger circles), and the next stump focuses on them. With label noise, AdaBoost obsesses over impossible points and overfits.
The idea in plain words
Boosting builds an ensemble sequentially rather than in parallel. AdaBoost starts with equal weights on every point, fits a weak learner (a one-split stump), then increases the weight of the points it got wrong so the next stump focuses on them.
Watch the hard points swell round by round and the combined boundary bend to catch them. But with label noise, AdaBoost keeps piling weight onto mislabeled points that no sensible boundary can get right, so later stumps contort the boundary around them and the ensemble overfits.
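To make the "fit a stump, then re-weight" step concrete, here is a minimal sketch of fitting a one-split stump to weighted 2-D points. It is an illustration, not the code behind the demo: the names `fit_stump` and `stump_predict`, the +1/-1 label encoding, and the midpoint thresholds are all assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def stump_predict(x: Point, feature: int, thr: float, polarity: int) -> int:
    """Predict `polarity` when x[feature] > thr, and `-polarity` otherwise."""
    return polarity if x[feature] > thr else -polarity

def fit_stump(X: List[Point], y: List[int], w: List[float]):
    """Return (weighted_error, feature, threshold, polarity) of the best stump.

    Labels are +1 / -1 (hypothetical encoding for Class 1 / Class 0).
    The weighted error is the total weight of the points the stump gets wrong.
    """
    best = None
    for feature in (0, 1):
        values = sorted({x[feature] for x in X})
        # Candidates: one threshold below every point, plus midpoints between neighbours.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for thr in thresholds:
            for polarity in (1, -1):
                err = sum(
                    wi for xi, yi, wi in zip(X, y, w)
                    if stump_predict(xi, feature, thr, polarity) != yi
                )
                if best is None or err < best[0]:
                    best = (err, feature, thr, polarity)
    return best

# Toy data: the last point carries a label that a single split cannot reconcile.
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 2.0), (4.0, 1.0), (5.0, 2.0)]
y = [-1, -1, 1, 1, -1]

uniform = [0.2] * 5
first = fit_stump(X, y, uniform)           # misses only the last point

# Pretend the previous round pushed half of the total weight onto that point.
focused = [0.125, 0.125, 0.125, 0.125, 0.5]
second = fit_stump(X, y, focused)          # now prefers a split that gets it right

print("uniform weights:", first)
print("focused weights:", second)
```

With uniform weights the best split sacrifices the last point; once that point holds half the weight, the cheapest stump is one that classifies it correctly, which is the "focus on the mistakes" behaviour described above.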
Now, the math
Each stump’s vote weight α depends on its weighted error ε:
α = ½ ln((1 − ε) / ε)
- ε: the stump’s weighted error rate this round.
- α: its vote weight, larger when the stump is more accurate.
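A quick numeric check can help build intuition for how the vote weight responds to the error. The sketch below assumes the standard AdaBoost formula α = ½ ln((1 − ε) / ε); the function name `vote_weight` is made up for this example.

```python
import math

def vote_weight(eps: float) -> float:
    """Vote weight alpha = 0.5 * ln((1 - eps) / eps), defined for 0 < eps < 1."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must be strictly between 0 and 1")
    return 0.5 * math.log((1.0 - eps) / eps)

# Lower weighted error gives a larger vote; a coin-flip stump gets no vote at all.
for eps in (0.1, 0.3, 0.5):
    print(f"eps={eps:.1f}  alpha={vote_weight(eps):.3f}")
```

Under this formula a stump with ε = 0.5 contributes nothing, and a stump worse than chance (ε > 0.5) would receive a negative weight, which amounts to flipping its vote.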
After each round, misclassified points have their weights scaled up by a factor of e^α and correct ones scaled down, then renormalized — so the next stump is trained on a distribution that emphasizes the current mistakes. The final classifier is the α-weighted vote of all stumps.
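The re-weighting step and the final vote can be sketched in a few lines. This assumes the common convention that correct points are scaled by e^(−α) (the text above only says they are scaled down) and that α = ½ ln((1 − ε) / ε); `reweight` and `weighted_vote` are illustrative names, not part of any library.

```python
import math
from typing import List

def reweight(w: List[float], correct: List[bool], alpha: float) -> List[float]:
    """Scale misclassified weights by e^alpha, correct ones by e^-alpha, then renormalize."""
    scaled = [wi * math.exp(-alpha if ok else alpha) for wi, ok in zip(w, correct)]
    total = sum(scaled)
    return [s / total for s in scaled]

def weighted_vote(alphas: List[float], stump_preds: List[int]) -> int:
    """Final label for one point: sign of the alpha-weighted sum of +1/-1 stump votes."""
    score = sum(a * p for a, p in zip(alphas, stump_preds))
    return 1 if score >= 0 else -1

# Five equally weighted points; the stump gets only the last one wrong.
w = [0.2] * 5
correct = [True, True, True, True, False]
eps = sum(wi for wi, ok in zip(w, correct) if not ok)   # weighted error = 0.2
alpha = 0.5 * math.log((1 - eps) / eps)                 # ln 2, about 0.693
new_w = reweight(w, correct, alpha)
print([round(x, 3) for x in new_w])   # [0.125, 0.125, 0.125, 0.125, 0.5]

# One accurate stump (alpha 0.693) can outvote two weaker ones (alpha 0.3 each).
print(weighted_vote([0.693, 0.3, 0.3], [1, -1, -1]))    # 1
```

After renormalizing, the single misclassified point holds half of the total weight, so the stump that was just fitted has a weighted error of exactly 0.5 on the new distribution and the next stump is forced to look elsewhere.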
Now Break It
Try this: Run too many rounds on noisy data and the ensemble starts fitting the noise; boosting can overfit outliers.
Control: Number of rounds slider (set high on noisy data)
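The same experiment can be tried outside the demo. The sketch below is a toy 1-D AdaBoost written for this illustration only (the helper names, the +1/-1 labels, the midpoint thresholds, and the e^(±α) update are assumptions, not the page's implementation). One label is flipped on purpose, and the loop reports what the ensemble predicts at that point as rounds are added.

```python
import math

def stump_predict(x, thr, polarity):
    """Predict `polarity` when x > thr, `-polarity` otherwise."""
    return polarity if x > thr else -polarity

def fit_stump(xs, ys, w):
    """Best 1-D stump as (weighted_error, threshold, polarity) under weights w."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for thr in thresholds:
        for polarity in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if stump_predict(x, thr, polarity) != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Return the list of (alpha, threshold, polarity) stumps and the final weights."""
    n = len(xs)
    w = [1.0 / n] * n                       # start with equal weights
    ensemble = []
    for _ in range(rounds):
        eps, thr, pol = fit_stump(xs, ys, w)
        eps = min(max(eps, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, thr, pol))
        # y * prediction is +1 when correct, -1 when wrong.
        w = [wi * math.exp(-alpha * y * stump_predict(x, thr, pol))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble, w

def predict(ensemble, x):
    score = sum(a * stump_predict(x, thr, pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1

xs = list(range(10))
ys = [-1 if x < 5 else 1 for x in xs]
ys[2] = 1                                   # label noise: x = 2 "should" be -1

for rounds in (1, 2, 3):
    ensemble, w = adaboost(xs, ys, rounds)
    print(f"rounds={rounds}  prediction at x=2: {predict(ensemble, 2):+d}  "
          f"weight on noisy point: {w[2]:.3f}")
```

On this toy set the first stump leaves the flipped point misclassified and its weight jumps from 0.1 to 0.5; by the third round the vote at x = 2 flips to +1, agreeing with the noisy label rather than with its neighbours.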