Bagging
Bagging — Average many models trained on bootstrap samples.
One deep decision tree overfits. But train many trees, each on a different random resample of the data, and average their votes — the noise cancels out. That’s bagging: bootstrap aggregating.
No ensemble benefit! With a single estimator you get all the variance of one overfit tree.
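[Interactive figure: decision boundaries of the individual trees and of the bagged ensemble. Legend: Class 0, Class 1.]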
Each tree overfits its own bootstrap resample of the data (jagged boundary). Averaging dozens of them cancels the noise into one smooth, confident boundary — variance reduction you can watch. Drag any point, or click empty space to drop a new one, and the ensemble retrains live.
The idea in plain words
One deep decision tree overfits — its boundary is jagged and unstable. Bagging (bootstrap aggregating) trains many trees, each on a different random resample of the data drawn with replacement, then averages their votes. The noise cancels out.
Drag the number of trees up and the averaged boundary resolves from noisy to smooth. It’s pure variance reduction: averaging many high-variance, low-bias models keeps the low bias while shrinking the variance.
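The procedure described above can be sketched in plain Python. This is a minimal illustration, not the code behind the interactive demo: the tiny tree learner, the function names (`build_tree`, `fit_bagging`, `predict_bagging`), the two-blob toy data, and the choice of 25 trees are all assumptions made for the example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels):
    """Grow an unpruned tree: keep splitting until every leaf is pure."""
    if len(set(labels)) == 1:
        return labels[0]                          # pure leaf
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2                     # candidate threshold
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:                              # identical points, mixed labels
        return Counter(labels).most_common(1)[0][0]
    _, f, t = best
    l_idx = [i for i, r in enumerate(rows) if r[f] <= t]
    r_idx = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in l_idx], [labels[i] for i in l_idx]),
            build_tree([rows[i] for i in r_idx], [labels[i] for i in r_idx]))

def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_bagging(rows, labels, n_trees, rng):
    """Fit each tree on its own bootstrap resample (drawn with replacement)."""
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx]))
    return trees

def predict_bagging(trees, x):
    """Average the trees' votes: the fraction of trees predicting class 1."""
    votes = [predict_tree(tree, x) for tree in trees]
    return sum(votes) / len(votes)

# Toy data (an assumption for this sketch): two overlapping Gaussian blobs.
rng = random.Random(0)
rows, labels = [], []
for _ in range(60):
    y = rng.randrange(2)
    rows.append((rng.gauss(2.0 * y, 1.0), rng.gauss(2.0 * y, 1.0)))
    labels.append(y)

trees = fit_bagging(rows, labels, n_trees=25, rng=rng)
p = predict_bagging(trees, (1.0, 1.0))
print(f"fraction of trees voting class 1 at (1, 1): {p:.2f}")
```

A single tree grown this way labels every one of its own training points correctly, which is the overfitting the text describes. The bagged output is a vote fraction between 0 and 1 rather than a hard label, so points near the class overlap get intermediate values.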
Now, the math
The ensemble prediction averages B trees, each fit on a bootstrap sample:
$$\hat{f}_{\text{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x)$$
- $B$: the number of trees (bootstrap replicates).
- $\hat{f}_b$: the b-th tree, fit on a resample drawn with replacement.
The derivation
If each tree has variance σ² and the trees were independent, averaging B of them would cut the variance to σ²/B. Real trees are correlated, because their bootstrap samples are all drawn from the same training set and overlap heavily, so the gain is smaller. Narrowing that gap is exactly what random forests do by decorrelating the trees.
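To put numbers on the correlated case, here is a small simulation sketch. It adds an assumption the text does not state: every pair of trees has the same correlation ρ. Under that assumption the variance of the average is ρσ² + (1 − ρ)σ²/B, which reduces to σ²/B when ρ = 0 and stays at σ² when ρ = 1. The "trees" here are just correlated Gaussian draws, and the function names are hypothetical.

```python
import math
import random
import statistics

def theoretical_variance(sigma2, rho, n_models):
    """Variance of the mean of n_models estimators with pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_models

def simulated_variance(sigma2, rho, n_models, n_trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity using correlated Gaussian draws."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    means = []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, 1.0)          # component common to every estimator
        draws = [
            sigma * (math.sqrt(rho) * shared
                     + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
            for _ in range(n_models)
        ]
        means.append(sum(draws) / n_models)
    return statistics.pvariance(means)

for rho in (0.0, 0.5, 1.0):
    theory = theoretical_variance(1.0, rho, 10)
    estimate = simulated_variance(1.0, rho, 10)
    print(f"rho={rho}: theory={theory:.3f}, simulated={estimate:.3f}")
```

As B grows, the second term vanishes and the variance approaches ρσ², so under this assumption adding trees alone cannot push the variance below that level; lowering ρ can.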
Now Break It
Try this: With too few trees the ensemble is still noisy; bagging identical models adds nothing.
Control: Number of estimators slider (set to 1)
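The same failure modes can be reproduced offline with a rough sketch. Everything here is an assumption made for illustration: a 1-nearest-neighbour regressor stands in for a deep tree as the high-variance learner (chosen only for brevity), the data are a noisy line, and `ensemble_variance` is a hypothetical helper that measures how much the ensemble's prediction at one point changes across fresh training sets.

```python
import random
import statistics

def fit_1nn(xs, ys):
    """Return a 1-nearest-neighbour predictor (stand-in high-variance learner)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def ensemble_variance(n_models, resample, n_repeats=500, seed=0):
    """Variance of the ensemble prediction at x = 0.5 across fresh training sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_repeats):
        xs = [rng.random() for _ in range(30)]
        ys = [x + rng.gauss(0.0, 0.5) for x in xs]       # noisy line y = x
        members = []
        for _ in range(n_models):
            if resample:
                # Bootstrap: draw 30 indices with replacement.
                idx = [rng.randrange(30) for _ in range(30)]
                members.append(fit_1nn([xs[i] for i in idx], [ys[i] for i in idx]))
            else:
                # No resampling: every member is an identical copy.
                members.append(fit_1nn(xs, ys))
        preds.append(statistics.fmean([m(0.5) for m in members]))
    return statistics.pvariance(preds)

single = ensemble_variance(1, resample=True)     # slider set to 1
copies = ensemble_variance(25, resample=False)   # 25 identical models
bagged = ensemble_variance(25, resample=True)    # 25 bootstrap models
print(f"single: {single:.3f}  identical copies: {copies:.3f}  bagged: {bagged:.3f}")
```

Averaging identical copies returns the same prediction as one copy, so its variance matches that of a single model fit on the full data. Only the ensemble whose members were fit on different bootstrap resamples should show a clearly smaller variance.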