Forward Propagation

Neural Networks · Advanced · ~7 min

Forward Propagation — Push inputs through the layers to compute a prediction.

[Interactive figure: input space (drag the point), with a readout of the max activation per layer]

Drag the input point (or use the x₁ / x₂ sliders) and watch the activations ripple forward.

[Controls: iteration step (0 / 4), input x₁ = 0.8, input x₂ = -0.5, weight-init scale = 1.0×]

Step the activation wavefront layer by layer. Poor weight initialization makes activations explode toward the saturation limits or vanish toward zero as they propagate forward.

The idea in plain words

Forward propagation is how a network makes a prediction: feed the inputs into the first layer, pass its outputs to the next layer, and repeat until the final layer produces an answer. Each step is just a matrix multiply, a bias add, and an activation function, applied layer by layer.
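The loop described above can be sketched in plain Python. This is a minimal illustration, not the visualization's own code: it assumes dense layers given as hypothetical `(W, b)` pairs and `tanh` as the activation, and it uses plain lists instead of a numerical library so it runs with the standard library alone.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (W, b): one weight row and one bias per output neuron


def dense(W: Matrix, b: Vector, a_prev: Vector) -> Vector:
    """Affine step of one layer: z = W @ a_prev + b."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a_prev)) + b_i
            for row, b_i in zip(W, b)]


def forward(x: Vector, layers: Sequence[Layer],
            f: Callable[[float], float] = math.tanh) -> Vector:
    """Push x through each layer in order; the last layer's output is the prediction."""
    a = list(x)                      # a^(0) is the input itself
    for W, b in layers:
        z = dense(W, b, a)           # matrix multiply plus bias
        a = [f(z_i) for z_i in z]    # activation, applied elementwise
    return a


# Hypothetical 2 -> 2 -> 1 network; the weights are illustrative only.
layers = [
    ([[1.0, 2.0], [-1.0, 0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
print(forward([0.8, -0.5], layers))
```

Real networks often use a different activation (or none) on the output layer; this sketch applies the same `f` everywhere for brevity.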

Step the wavefront and watch each neuron light up with its value. Poorly scaled weights make those values grow toward the activation's saturation limits or shrink toward zero as they propagate, which is why initialization schemes matter.

Now, the math

Each layer transforms the previous layer’s activations:

$$a^{(l)} = f\!\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)$$

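$W^{(l)}$: the weight matrix of layer $l$.
$b^{(l)}$: the bias vector of layer $l$.
$a^{(l-1)}$: activations arriving from the previous layer, with $a^{(0)} = x$, the input.
$f$: the activation function, applied elementwise.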
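As a hand-worked instance of the formula, assume hypothetical weights for a two-unit first layer, $f = \tanh$, and the input $x = (0.8, -0.5)$ used in the visualization:

$$
W^{(1)} = \begin{pmatrix} 1 & 2 \\ -1 & 0.5 \end{pmatrix},\qquad
b^{(1)} = \begin{pmatrix} 0 \\ 0.1 \end{pmatrix},\qquad
a^{(0)} = x = \begin{pmatrix} 0.8 \\ -0.5 \end{pmatrix}
$$

$$
W^{(1)} a^{(0)} + b^{(1)} = \begin{pmatrix} 0.8 - 1.0 + 0 \\ -0.8 - 0.25 + 0.1 \end{pmatrix} = \begin{pmatrix} -0.2 \\ -0.95 \end{pmatrix},
\qquad
a^{(1)} = \tanh\begin{pmatrix} -0.2 \\ -0.95 \end{pmatrix} \approx \begin{pmatrix} -0.197 \\ -0.740 \end{pmatrix}
$$

Repeating the same step with $W^{(2)}, b^{(2)}$ and so on carries the computation through to the output layer.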

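If the weights are too large, repeated multiplication amplifies the activations layer after layer until they saturate; if they are too small, the activations decay toward zero. Initialization schemes such as Xavier (Glorot) and He scale the weight variance inversely with the layer width: Xavier uses $\operatorname{Var}(w) = 2/(n_{\text{in}} + n_{\text{out}})$, and He, designed for ReLU, uses $\operatorname{Var}(w) = 2/n_{\text{in}}$. Either way, the activation variance stays roughly constant with depth.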
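Why the layer width appears in these formulas can be sketched with the standard variance argument. It rests on simplifying assumptions: zero biases, and weights drawn independently with zero mean and independent of the incoming activations. Write $z^{(l)}$ for the pre-activation inside $f$ and $n_l$ for the width of layer $l$:

$$
z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = f\!\left(z^{(l)}\right)
$$

$$
\operatorname{Var}\!\left(z_i^{(l)}\right)
= \sum_{j=1}^{n_{l-1}} \operatorname{Var}\!\left(W_{ij}^{(l)} a_j^{(l-1)}\right)
= n_{l-1}\,\operatorname{Var}\!\left(W^{(l)}\right)\,\mathbb{E}\!\left[\left(a^{(l-1)}\right)^2\right]
$$

If $f$ is roughly linear near zero (as $\tanh$ is), then $\mathbb{E}[(a^{(l-1)})^2] \approx \operatorname{Var}(z^{(l-1)})$, and unrolling over $L$ layers gives

$$
\operatorname{Var}\!\left(z^{(L)}\right) \approx \operatorname{Var}\!\left(z^{(1)}\right) \prod_{l=2}^{L} n_{l-1}\,\operatorname{Var}\!\left(W^{(l)}\right)
$$

Each factor above 1 compounds into explosion (and, for a bounded $f$, saturation); each factor below 1 compounds into vanishing. Setting the factor to 1 gives $\operatorname{Var}(W^{(l)}) = 1/n_{l-1}$. Xavier averages fan-in and fan-out, $2/(n_{l-1} + n_l)$, to keep the backward pass stable as well. Because ReLU zeroes half of a symmetric zero-mean input, $\mathbb{E}[a^2] = \operatorname{Var}(z)/2$, which is why He initialization doubles the variance to $2/n_{l-1}$.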

Now Break It

Try this: set the weight-init scale very high or very low and step the wavefront. Poor initialization makes the activations explode toward saturation or vanish toward zero as they propagate forward.

Control: Weight init scale slider (set very high or low)
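The break-it experiment can also be approximated offline. The sketch below is a hypothetical stand-in for the visualization, not its source. It assumes a stack of `tanh` layers (ten layers of 64 units, an arbitrary choice), zero biases, and Gaussian weights with standard deviation `scale / sqrt(fan_in)`, so `scale = 1.0` here means the variance-preserving 1/fan-in setting (LeCun-style; equal to Xavier when fan-in equals fan-out). For each layer it records the largest absolute activation, mirroring the max-activation-per-layer readout.

```python
import math
import random
from typing import List


def max_activation_per_layer(x: List[float], widths: List[int],
                             scale: float, seed: int = 0) -> List[float]:
    """Forward-propagate x through a random tanh network; return max |a| per layer.

    Assumption: weights ~ N(0, (scale / sqrt(fan_in))^2) and biases are zero,
    so scale multiplies a 1/fan-in variance-preserving initialization.
    """
    rng = random.Random(seed)           # fixed seed for reproducible runs
    a = list(x)
    maxima: List[float] = []
    for width in widths:
        std = scale / math.sqrt(len(a))  # fan-in of this layer is len(a)
        # Each output unit gets a fresh row of random weights (one W row per unit).
        a = [math.tanh(sum(rng.gauss(0.0, std) * a_j for a_j in a))
             for _ in range(width)]
        maxima.append(max(abs(v) for v in a))
    return maxima


if __name__ == "__main__":
    x = [0.8, -0.5]                  # input used in the visualization
    widths = [64] * 10               # hypothetical: ten hidden layers of 64 units
    for scale in (0.1, 1.0, 5.0):
        maxima = max_activation_per_layer(x, widths, scale)
        print(f"scale {scale:>4}: " + " ".join(f"{m:.2f}" for m in maxima))
```

With a small scale the per-layer maxima shrink by roughly the scale factor at every layer; a large scale pins them near 1, the `tanh` saturation limit; at scale 1.0 they decay only slowly.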

Last updated July 3, 2026.