Random Forest

Ensembles · Intermediate · ~7 min

Random Forest — Bag decision trees with random feature subsets.

A random forest takes bagging one step further: each tree not only sees a different sample of the data but also a random subset of features at each split. This decorrelates the trees and makes the forest even more robust.

[Interactive demo: scatter plot of Class 0 and Class 1 points, with controls for Number of trees (shown at 20), Features per split, Dataset, and Add points as.]

Drag the number of trees from 1 to 60 and watch the boundary smooth from noisy to confident. When every feature is considered at each split, the trees become near-identical and the ensemble stops helping. Drag any point, or click empty space to drop a new one, and the forest retrains live.

The idea in plain words

A random forest takes bagging one step further: each tree not only sees a different resample of the data but also considers only a random subset of features at each split. This decorrelates the trees, so averaging them helps far more.
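The two ingredients described above, bootstrap resampling and per-split feature subsetting, can be sketched by hand with scikit-learn decision trees. This is a minimal illustration rather than the demo's actual implementation: the synthetic dataset, the tree count of 25, and the `"sqrt"` feature rule are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for the demo's dataset (an assumption).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=0)
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

rng = np.random.default_rng(0)
n_trees = 25  # illustrative choice
trees = []
for _ in range(n_trees):
    # Bagging: draw a bootstrap resample (with replacement) of the training rows.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Feature subsetting: max_features="sqrt" makes the tree consider only a
    # random subset of the features at each split.
    tree = DecisionTreeClassifier(
        max_features="sqrt",
        random_state=int(rng.integers(1_000_000)),
    )
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Average the per-tree class probabilities, then pick the most likely class.
avg_proba = np.mean([t.predict_proba(X_test) for t in trees], axis=0)
forest_pred = avg_proba.argmax(axis=1)

forest_acc = float((forest_pred == y_test).mean())
single_acc = float(np.mean([t.score(X_test, y_test) for t in trees]))
print(f"mean single-tree accuracy: {single_acc:.3f}")
print(f"forest accuracy:           {forest_acc:.3f}")
```

Comparing the two printed numbers gives a rough sense of how much the averaging step contributes on this particular dataset.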

Out-of-bag error — scoring each point using only the trees that didn’t train on it — gives a free validation estimate. If you let every split see all features, the trees become near-identical and the ensemble stops improving.
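As a rough illustration of the out-of-bag estimate, scikit-learn's `RandomForestClassifier` exposes it through `oob_score=True` and the fitted attribute `oob_score_`. The sketch below compares a feature-subset forest with one that sees all features at every split; the synthetic dataset and the choice of 200 trees are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, chosen only for illustration.
X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=0)

oob_scores = {}
for max_features in ("sqrt", None):  # None means every feature at every split
    forest = RandomForestClassifier(
        n_estimators=200,
        max_features=max_features,
        bootstrap=True,   # out-of-bag points only exist with bootstrap resampling
        oob_score=True,   # score each point with the trees that did not train on it
        random_state=0,
    )
    forest.fit(X, y)
    oob_scores[max_features] = forest.oob_score_
    print(f"max_features={max_features!r}: OOB accuracy = {forest.oob_score_:.3f}")
```

No separate validation split is needed here, which is what makes the estimate "free". How large the gap between the two settings is depends on the data.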

Now, the math

Averaging correlated trees only reduces variance so far:

$$\text{Var} = \rho\,\sigma^2 + \frac{1-\rho}{B}\sigma^2$$

$\rho$: the pairwise correlation between trees' predictions; feature subsampling lowers it.
$\sigma^2$: the variance of a single tree's prediction.
$B$: the number of trees in the forest.
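The formula is the variance of an average of equally correlated quantities. A short derivation, writing $T_b$ for the prediction of tree $b$ at a fixed input (this symbol is notation assumed here) and assuming that each of the $B$ trees has variance $\sigma^2$ and every pair of trees has correlation $\rho$:

$$
\begin{aligned}
\text{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b\right)
&= \frac{1}{B^2}\left[\sum_{b=1}^{B}\text{Var}(T_b) + \sum_{b \ne b'}\text{Cov}(T_b, T_{b'})\right] \\
&= \frac{1}{B^2}\left[B\,\sigma^2 + B(B-1)\,\rho\,\sigma^2\right] \\
&= \frac{\sigma^2}{B} + \frac{B-1}{B}\,\rho\,\sigma^2 \\
&= \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
\end{aligned}
$$

The second line uses the fact that there are $B$ variance terms and $B(B-1)$ ordered pairs, each with covariance $\rho\,\sigma^2$.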

As B → ∞ the second term vanishes but the first, ρσ², remains, so once the forest is large the only way to keep reducing variance is to lower ρ. Restricting each split to a random feature subset does exactly that, at the cost of a little more bias per tree. Using all features at every split pushes ρ toward 1 (the forest reduces to plain bagging) and erases most of the benefit.
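The variance floor can be checked numerically. The sketch below evaluates the formula and compares it with a Monte Carlo estimate, using equally correlated Gaussian draws as stand-ins for tree predictions. That stand-in is an assumption made for illustration (real trees are not Gaussian), and the function names are hypothetical.

```python
import numpy as np

def variance_of_average(rho, sigma2, n_trees):
    """Closed-form variance of the mean of n_trees equally correlated predictions."""
    return rho * sigma2 + (1.0 - rho) / n_trees * sigma2

def simulated_variance(rho, sigma2, n_trees, n_draws=20_000, seed=0):
    """Monte Carlo check: sample equally correlated Gaussians and average them."""
    rng = np.random.default_rng(seed)
    # A shared component induces pairwise correlation rho between the "trees".
    shared = rng.normal(size=(n_draws, 1))
    private = rng.normal(size=(n_draws, n_trees))
    preds = np.sqrt(sigma2) * (np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * private)
    return float(preds.mean(axis=1).var())

sigma2 = 1.0
for rho in (0.1, 0.5, 0.9):
    for n_trees in (1, 10, 100):
        exact = variance_of_average(rho, sigma2, n_trees)
        approx = simulated_variance(rho, sigma2, n_trees)
        print(f"rho={rho:.1f}  B={n_trees:>3}  formula={exact:.3f}  simulated={approx:.3f}")
```

For each value of `rho`, the formula column approaches `rho * sigma2` as `B` grows, which is the floor that adding more trees cannot push through.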

Now Break It

Try this: Set features per split to all. Every tree becomes nearly identical, which largely defeats the ensemble.

Control: Features-per-split slider (set to all)
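The same experiment can be approximated outside the demo by measuring how similar the trees' predictions are. The sketch below fits two scikit-learn forests, one with a single candidate feature per split and one with all features, and reports the mean pairwise correlation between per-tree predictions on held-out points. The dataset, the 50-tree forests, and the helper name `mean_tree_correlation` are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, chosen only for illustration.
X, y = make_classification(n_samples=600, n_features=10, n_informative=4, random_state=0)
X_train, y_train, X_test = X[:400], y[:400], X[400:]

def mean_tree_correlation(max_features):
    """Average pairwise correlation between individual trees' predictions."""
    forest = RandomForestClassifier(
        n_estimators=50, max_features=max_features, random_state=0
    ).fit(X_train, y_train)
    # Rows: trees. Columns: each tree's predicted probability of class 1.
    per_tree = np.array([t.predict_proba(X_test)[:, 1] for t in forest.estimators_])
    corr = np.corrcoef(per_tree)
    # Drop the diagonal (each tree's correlation with itself).
    off_diag = corr[~np.eye(len(corr), dtype=bool)]
    return float(off_diag.mean())

rho_one = mean_tree_correlation(1)     # one random candidate feature per split
rho_all = mean_tree_correlation(None)  # every feature at every split
print(f"mean tree correlation, 1 feature per split: {rho_one:.3f}")
print(f"mean tree correlation, all features:        {rho_all:.3f}")
```

One would expect the all-features forest to show the higher correlation, since its trees differ only through bootstrap resampling.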

Last updated July 3, 2026.