Multilayer Perceptron

Neural Networks · Advanced · ~9 min

Stack neurons into layers to learn nonlinear boundaries.

Stack neurons into layers and a network can carve curved, nonlinear decision boundaries that a single neuron cannot. The multilayer perceptron (MLP) is the workhorse feedforward network, the architecture that “deep learning” scaled up.

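[Interactive demo: decision-boundary plot and a loss vs iteration chart (iterations 0 to 30). Controls: hidden neurons per layer (shown at 6), activation, dataset, and the class assigned to added points. Click a first-layer neuron to highlight the line it learned.]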

Each hidden neuron learns its own line; together the lines combine into a curved boundary. With too few neurons, the boundary can’t bend enough to solve the spiral or circle. Drag any point, or click empty space to drop a new one, and the network retrains from scratch.

The idea in plain words

A single perceptron can only draw one straight line. In an MLP, each hidden neuron learns its own line, and the next layer combines those lines into a curved boundary that no single perceptron could produce. That is enough to solve circles, XOR, and even spirals.

Watch the boundary reshape as the network trains, and click a first-layer neuron to see the line it learned. With too few neurons, the boundary can’t bend enough to separate a hard dataset, and the loss curve stalls at a high value.

Now, the math

An MLP composes layers of nonlinear transformations:

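$$\hat{y} = f\!\left(W^{(2)} f\!\left(W^{(1)} x + b^{(1)}\right) + b^{(2)}\right)$$

$x$: the input vector; $\hat{y}$: the network's output.
$f$: the nonlinear activation, applied elementwise. It is what makes stacking meaningful: without it, two stacked layers collapse into a single linear map.
$W^{(1)}, W^{(2)}$: the weight matrices of the hidden and output layers.
$b^{(1)}, b^{(2)}$: the bias vectors of the hidden and output layers.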
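To make the formula concrete, here is a minimal sketch of the forward pass in plain Python. The helper names (`relu`, `affine`, `mlp_forward`) and the weights are illustrative assumptions, not taken from the visualization. The weights were picked by hand rather than learned, so that each of the two hidden neurons defines one line and their combination reproduces XOR, which no single line can separate.

```python
def relu(z):
    """Elementwise ReLU activation: max(0, v) for every entry."""
    return [max(0.0, v) for v in z]


def affine(W, x, b):
    """Compute W x + b for a weight matrix W (list of rows) and bias vector b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp_forward(x, W1, b1, W2, b2, f=relu):
    """y_hat = f(W2 f(W1 x + b1) + b2), the two-layer formula above."""
    hidden = f(affine(W1, x, b1))
    return f(affine(W2, hidden, b2))


# Hand-picked (not trained) weights. Hidden neuron 1 switches on along the
# line x1 + x2 = 0, hidden neuron 2 along the line x1 + x2 = 1.
W1 = [[1.0, 1.0], [1.0, 1.0]]
b1 = [0.0, -1.0]
# The output layer subtracts twice the second neuron from the first.
W2 = [[1.0, -2.0]]
b2 = [0.0]

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_outputs = [mlp_forward([a, b], W1, b1, W2, b2)[0] for a, b in inputs]
print(xor_outputs)  # [0.0, 1.0, 1.0, 0.0]
```

Replacing `relu` with the identity function would make the whole network a single linear map of `x`, which is why the activation is the essential ingredient.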


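The universal approximation theorem says that a single hidden layer with enough neurons and a suitable nonlinear activation can approximate any continuous function on a bounded input region to any desired accuracy. The catch is that “enough” can be huge, and the theorem does not say how to find the weights. Extra depth lets the network build features hierarchically, so it can often solve hard shapes like the spiral with far fewer neurons per layer. The weights are learned by gradient descent, with the gradients computed by backpropagation.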
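As a rough illustration of training by backpropagation, the sketch below runs full-batch gradient descent on a small MLP in plain Python. Everything specific here is an assumption made for the example and is not taken from the visualization: the XOR dataset, the tanh hidden layer, the sigmoid output with cross-entropy loss, and the hyperparameters (`hidden`, `lr`, `epochs`, `seed`).

```python
import math
import random

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def train_mlp(data, hidden=4, lr=0.2, epochs=3000, seed=0):
    """Train a 2-input MLP (tanh hidden layer, sigmoid output) by gradient descent.

    Returns the mean cross-entropy loss per epoch and a predict function.
    """
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j])
             for j in range(hidden)]
        z = sum(W2[j] * h[j] for j in range(hidden)) + b2
        return h, 1.0 / (1.0 + math.exp(-z))

    n = len(data)
    losses = []
    for _ in range(epochs):
        gW1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for x, y in data:
            h, p = forward(x)
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            dz = p - y  # d(loss)/d(output pre-activation) for sigmoid + cross-entropy
            gb2 += dz
            for j in range(hidden):
                gW2[j] += dz * h[j]
                dh = dz * W2[j] * (1.0 - h[j] ** 2)  # chain rule back through tanh
                gW1[j][0] += dh * x[0]
                gW1[j][1] += dh * x[1]
                gb1[j] += dh
        losses.append(loss / n)
        # Gradient descent step using the gradient averaged over the batch.
        for j in range(hidden):
            W1[j][0] -= lr * gW1[j][0] / n
            W1[j][1] -= lr * gW1[j][1] / n
            b1[j] -= lr * gb1[j] / n
            W2[j] -= lr * gW2[j] / n
        b2 -= lr * gb2 / n

    return losses, lambda x: forward(x)[1]


losses, predict = train_mlp(XOR)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The `losses` list plays the role of the loss vs iteration curve in the demo. How far it drops depends on the random initialization and the hyperparameters, so treat any particular run as one sample rather than a guarantee.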

Now Break It

Try this: With too few hidden units, the boundary can’t bend enough to separate a spiral; with too many, the network overfits the noise.

Control: Hidden units slider (set to 1)
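The “too few hidden units” failure can be checked without any training. The sketch below is an assumption-laden stand-in for the demo: it uses XOR instead of the spiral, a tanh hidden layer, and a crude random search over weights (the function names, weight range, and trial count are all made up for illustration). With one hidden unit, the output depends on the input only through a single line, so at most 3 of the 4 XOR points can ever be classified correctly, whatever the weights.

```python
import random
import math

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def accuracy(params, hidden):
    """Fraction of XOR points a tanh-hidden MLP classifies correctly."""
    W1, b1, W2, b2 = params
    correct = 0
    for (x1, x2), y in XOR:
        h = [math.tanh(W1[j][0] * x1 + W1[j][1] * x2 + b1[j])
             for j in range(hidden)]
        z = sum(W2[j] * h[j] for j in range(hidden)) + b2
        # A sigmoid output exceeds 0.5 exactly when z > 0, so threshold z directly.
        correct += int((z > 0) == (y == 1))
    return correct / len(XOR)


def best_random_accuracy(hidden, trials=5000, seed=0):
    """Best accuracy over random weight draws (a crude search, not training)."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        params = (
            [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(hidden)],
            [rng.uniform(-5, 5) for _ in range(hidden)],
            [rng.uniform(-5, 5) for _ in range(hidden)],
            rng.uniform(-5, 5),
        )
        best = max(best, accuracy(params, hidden))
    return best


best_one = best_random_accuracy(hidden=1)    # can never exceed 0.75
best_three = best_random_accuracy(hidden=3)  # enough capacity to fit XOR
print(best_one, best_three)
```

The cap of 0.75 for one hidden unit holds for every weight setting, not just the sampled ones. With three hidden units a perfect fit exists, and a random search of this size would typically find one, though that part is not guaranteed.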


Last updated July 3, 2026.