AdaBoost

Ensembles · Advanced · ~8 min

AdaBoost — Chain weak learners, each fixing the last one’s mistakes.

Boosting builds an ensemble sequentially. Each new weak learner focuses on the examples the previous ones got wrong, re-weighting the hard cases until the combined model is strong.

[Interactive figure: 2D points labeled Class 0 and Class 1; circle size shows each point's weight; runs up to 15 boosting iterations.]

Drag any point, or click empty space to drop a new one, and AdaBoost re-runs from scratch. Each round grows the weights on misclassified points (bigger circles), and the next stump focuses on them. With label noise, AdaBoost obsesses over impossible points and overfits.

The idea in plain words

Boosting builds an ensemble sequentially rather than in parallel. AdaBoost starts with equal weights on every point, fits a weak learner (a one-split stump), then increases the weight of the points it got wrong so the next stump focuses on them.

Watch the hard points swell round by round and the combined boundary bend to catch them. With label noise, though, AdaBoost keeps piling weight onto mislabeled points that no sensible boundary can classify correctly, and it overfits.
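
To make the "fit a stump on weighted points" step concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the code behind the visualization: labels are encoded as -1/+1 rather than Class 0/Class 1, and the names `fit_stump` and `stump_predict` are hypothetical. The stump tries every feature, every midpoint threshold, and both orientations, and keeps the split with the lowest weighted error.

```python
def stump_predict(stump, row):
    """Predict -1 or +1 for one point using a one-split stump."""
    feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else -polarity


def fit_stump(X, y, w):
    """Return (stump, weighted_error) for the best one-split stump.

    X: list of feature lists, y: labels in {-1, +1},
    w: non-negative point weights that sum to 1.
    """
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        # Candidates: one threshold below every point, then midpoints
        # between consecutive distinct values.
        thresholds = [values[0] - 1.0] + [
            (a + b) / 2 for a, b in zip(values, values[1:])
        ]
        for threshold in thresholds:
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                # Weighted error: total weight of the misclassified points.
                err = sum(
                    wi for row, yi, wi in zip(X, y, w)
                    if stump_predict(stump, row) != yi
                )
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


# Toy 1D data (an assumption for illustration): +1 at the ends, -1 in the middle.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1, -1, -1, 1]

print(fit_stump(X, y, [0.25, 0.25, 0.25, 0.25]))  # equal weights
print(fit_stump(X, y, [0.1, 0.1, 0.1, 0.7]))      # last point weighted up
```

With equal weights the chosen stump misclassifies the point at x = 4. Once that point carries most of the weight, the best split moves to classify it correctly, which is the sense in which the next stump "focuses" on earlier mistakes.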

Now, the math

Each stump’s vote weight α depends on its weighted error ε:

$$\alpha_t = \tfrac{1}{2}\ln\frac{1 - \varepsilon_t}{\varepsilon_t}$$

$\varepsilon_t$: the stump’s weighted error rate this round.
$\alpha_t$: its vote weight — larger when the stump is more accurate.
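
A few numeric values help build intuition for this formula. The short sketch below evaluates it directly; the function name `vote_weight` is a hypothetical helper, not part of any library.

```python
import math


def vote_weight(eps):
    """alpha = 0.5 * ln((1 - eps) / eps), defined for 0 < eps < 1."""
    return 0.5 * math.log((1 - eps) / eps)


for eps in (0.1, 0.3, 0.5, 0.7):
    print(eps, round(vote_weight(eps), 3))
# 0.1 1.099
# 0.3 0.424
# 0.5 0.0
# 0.7 -0.424
```

A stump at chance level (ε = 0.5) gets no vote, and one that is worse than chance gets a negative vote, which effectively flips its predictions. The formula is undefined at ε = 0 and ε = 1, so practical code usually has to guard those cases.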

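After each round, the weights of misclassified points are multiplied by $e^{\alpha_t}$ and the weights of correctly classified points by $e^{-\alpha_t}$ (for a stump better than chance, $\alpha_t > 0$, so mistakes grow and correct points shrink). All weights are then renormalized to sum to 1, so the next stump is trained on a distribution that emphasizes the current mistakes. The final classifier is the α-weighted vote of all stumps.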
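
The whole loop can be sketched in a few dozen lines of plain Python. This is a minimal illustration under stated assumptions, not the visualization's actual code: labels are -1/+1 instead of Class 0/Class 1, the helper names (`fit_stump`, `stump_predict`, `adaboost`, `predict`) are hypothetical, and the clamp on ε is an added guard so a perfect stump does not cause a division by zero.

```python
import math


def stump_predict(stump, row):
    feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else -polarity


def fit_stump(X, y, w):
    """Exhaustively pick the one-split stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        thresholds = [values[0] - 1.0] + [
            (a + b) / 2 for a, b in zip(values, values[1:])
        ]
        for threshold in thresholds:
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(
                    wi for row, yi, wi in zip(X, y, w)
                    if stump_predict(stump, row) != yi
                )
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds):
    """Return (ensemble, final_weights); ensemble is a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n  # start with equal weights
    ensemble = []
    for _ in range(rounds):
        stump, eps = fit_stump(X, y, w)
        # Assumption: clamp eps away from 0 and 1 so alpha stays finite.
        eps = min(max(eps, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        # y * h(x) is +1 when correct and -1 when wrong, so this multiplies
        # mistakes by e^alpha and correct points by e^-alpha.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(stump, row))
            for row, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]  # renormalize to sum to 1
    return ensemble, w


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy 1D data (an assumption for illustration): no single stump can fit it.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1, -1, -1, 1]

for rounds in (1, 2, 3):
    ensemble, w = adaboost(X, y, rounds)
    print(rounds, [predict(ensemble, row) for row in X], [round(wi, 3) for wi in w])
```

On this toy data a single stump cannot separate the classes, but after three rounds the weighted vote labels all four points correctly. The printed weights show the point missed in the first round jumping to half of the total weight before the second stump is fit.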

Now Break It

Try this: Run too many rounds on noisy data and the ensemble starts fitting the noise; boosting can overfit outliers.

Control: Number of rounds slider (set high on noisy data)

Last updated July 3, 2026.