
Forward Propagation

Neural Networks · Advanced · ~7 min

Push inputs through the layers to compute a prediction.


[Interactive figure: maximum activation per layer, and the input space with a draggable input point (x₁, x₂); the activation wavefront can be stepped forward layer by layer.]

The idea in plain words

Forward propagation is how a network makes a prediction: feed the inputs into the first layer, pass that layer's outputs to the next, and repeat until the final layer produces an answer. Each step is just a matrix multiplication followed by an activation function, applied layer by layer.
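A minimal pure-Python sketch of that loop, assuming a fully connected network stored as a list of `(W, b, f)` triples. The helper names `dense` and `forward`, the 2 → 2 → 1 layer sizes, and the hand-picked weights are all hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from the widget.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(W, a, b, f):
    """One fully connected layer: f applied elementwise to W @ a + b."""
    return [f(sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i)
            for row, b_i in zip(W, b)]

def forward(layers, x):
    """Feed x into the first layer, then pass each layer's output to the next."""
    a = x
    for W, b, f in layers:
        a = dense(W, a, b, f)
    return a  # the final layer's output is the prediction

# Hypothetical 2 -> 2 -> 1 network with hand-picked weights.
layers = [
    ([[1.0, -1.0], [0.5, 2.0]], [0.0, 0.1], relu),  # hidden layer
    ([[2.0, 1.0]], [-0.5], sigmoid),               # output layer
]

print(forward(layers, [0.8, -0.5]))  # approximately [0.891]
```

Here the hidden layer produces `[1.3, 0.0]` (ReLU zeroes the second unit's pre-activation of -0.5), and the output unit computes sigmoid(2 · 1.3 + 1 · 0 - 0.5) = sigmoid(2.1).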

Step the wavefront and watch each neuron light up with its value. Poor weight initialization makes those values explode toward the activation function's saturation limits or vanish toward zero as they propagate, which is why initialization schemes matter.
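A rough pure-Python simulation of this effect, in the spirit of the widget's "max activation per layer" readout. The network shape (10 tanh layers of width 32), the zero biases, the Gaussian weights with standard deviation `scale / sqrt(width)`, and the function name `max_activation_per_layer` are all assumptions for illustration; the page's widget may be built differently.

```python
import math
import random

def max_activation_per_layer(scale, depth=10, width=32, seed=0):
    """Push one random input through a deep tanh MLP (hypothetical setup)
    and record the largest |activation| in each layer.

    Weights ~ N(0, (scale / sqrt(width))^2): scale = 1.0 roughly preserves
    activation variance while tanh stays in its near-linear region,
    scale << 1 shrinks activations toward zero, and scale >> 1 pushes
    tanh into saturation at +/-1. Biases are zero to isolate the weights.
    """
    rng = random.Random(seed)
    a = [rng.uniform(-1.0, 1.0) for _ in range(width)]  # random input vector
    std = scale / math.sqrt(width)
    maxima = []
    for _ in range(depth):
        W = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        a = [math.tanh(sum(w * x for w, x in zip(row, a))) for row in W]
        maxima.append(max(abs(v) for v in a))
    return maxima

for scale in (0.1, 1.0, 5.0):
    print(scale, [round(m, 3) for m in max_activation_per_layer(scale)])
```

With `scale = 0.1` the maxima shrink by roughly a factor of ten per layer, while with `scale = 5.0` nearly every layer has units pinned close to ±1, the saturation limits of tanh.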

Now, the math

Each layer transforms the previous layer’s activations:

$$a^{(l)} = f\!\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)$$

$W^{(l)}$: the weight matrix of layer $l$.
$a^{(l-1)}$: activations arriving from the previous layer.
$b^{(l)}$: the bias vector of layer $l$.
$f$: the activation function, applied elementwise.
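A worked instance of the layer equation with hypothetical numbers: a layer with two inputs and two units, using ReLU as $f$. Note how ReLU zeroes the negative pre-activation.

```latex
W^{(1)} = \begin{pmatrix} 0.5 & -1 \\ 1 & 0.5 \end{pmatrix},\quad
a^{(0)} = \begin{pmatrix} 1 \\ 2 \end{pmatrix},\quad
b^{(1)} = \begin{pmatrix} 0.5 \\ -1 \end{pmatrix}

W^{(1)} a^{(0)} + b^{(1)}
  = \begin{pmatrix} 0.5 \cdot 1 + (-1) \cdot 2 \\ 1 \cdot 1 + 0.5 \cdot 2 \end{pmatrix}
  + \begin{pmatrix} 0.5 \\ -1 \end{pmatrix}
  = \begin{pmatrix} -1.5 \\ 2 \end{pmatrix} + \begin{pmatrix} 0.5 \\ -1 \end{pmatrix}
  = \begin{pmatrix} -1 \\ 1 \end{pmatrix}

a^{(1)} = \operatorname{ReLU}\!\begin{pmatrix} -1 \\ 1 \end{pmatrix}
        = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
```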

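If the weights are too large, repeated multiplication amplifies the activations layer after layer until they saturate (or, for an unbounded activation such as ReLU, grow without limit); if they are too small, the activations decay toward zero. Initialization schemes such as Xavier (Glorot) and He set the variance of the initial weights inversely proportional to the layer's width (its fan-in, or for Xavier the average of fan-in and fan-out), so activation variance stays roughly constant with depth.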
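The variance argument behind this can be written out under the usual simplifying assumptions: zero biases, constant width $n$, and weights drawn i.i.d. with mean zero and variance $\sigma_W^2$, independent of the incoming activations. The constant $c$ captures how the activation function changes the second moment; $c \approx 1$ holds only while tanh stays near the origin, and $c = \tfrac12$ for ReLU assumes symmetric, zero-mean pre-activations.

```latex
\begin{aligned}
z_i^{(l)} &= \sum_{j=1}^{n} W_{ij}^{(l)}\, a_j^{(l-1)} \\
\operatorname{Var}\big(z_i^{(l)}\big)
  &= \sum_{j=1}^{n} \mathbb{E}\big[(W_{ij}^{(l)})^2\big]\,\mathbb{E}\big[(a_j^{(l-1)})^2\big]
   = n\,\sigma_W^2\,\mathbb{E}\big[(a^{(l-1)})^2\big] \\
\mathbb{E}\big[(a^{(l-1)})^2\big] &= c\,\operatorname{Var}\big(z^{(l-1)}\big),
  \qquad c \approx 1 \ (\text{tanh near } 0), \quad c = \tfrac{1}{2} \ (\text{ReLU}) \\
\Rightarrow\ \operatorname{Var}\big(z^{(l)}\big)
  &= c\,n\,\sigma_W^2\,\operatorname{Var}\big(z^{(l-1)}\big)
   = \big(c\,n\,\sigma_W^2\big)^{l-1}\,\operatorname{Var}\big(z^{(1)}\big)
\end{aligned}
```

The variance is multiplied by the same factor $c\,n\,\sigma_W^2$ at every layer: above 1 it explodes, below 1 it vanishes. Setting the factor to 1 gives $\sigma_W^2 = 1/n$ in the near-linear tanh regime (Xavier balances this forward condition against the backward one, giving $\sigma_W^2 = 2/(n_\text{in} + n_\text{out})$) and $\sigma_W^2 = 2/n$ for ReLU, which is He initialization.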

Now Break It

Try this: set the weight init scale very high or very low and watch the activations explode or vanish as they propagate forward.

Control: Weight init scale slider (set very high or low)
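The same experiment can be approximated outside the page. This sketch treats the slider as a hypothetical `multiplier` on the He standard deviation in a deep ReLU network. The depth (20), width (64), zero biases, and the names `he_std` and `rms_per_layer` are assumptions. Because ReLU is positively homogeneous (ReLU(m·z) = m·ReLU(z) for m > 0), with a fixed seed the activations at layer l are exactly multiplier^l times those obtained at multiplier 1.0.

```python
import math
import random

def he_std(fan_in):
    """He initialization for ReLU layers: weight std = sqrt(2 / fan_in)."""
    return math.sqrt(2.0 / fan_in)

def rms_per_layer(multiplier, depth=20, width=64, seed=1):
    """Root-mean-square ReLU activation in each layer when the He std is
    multiplied by `multiplier` (a stand-in for the init-scale slider)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]  # random input vector
    std = multiplier * he_std(width)
    rms = []
    for _ in range(depth):
        W = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        # Biases are zero; ReLU is applied to each unit's pre-activation.
        a = [max(0.0, sum(w * x for w, x in zip(row, a))) for row in W]
        rms.append(math.sqrt(sum(v * v for v in a) / width))
    return rms

for m in (0.5, 1.0, 2.0):
    r = rms_per_layer(m)
    print(f"multiplier {m}: layer 1 rms {r[0]:.3g}, layer {len(r)} rms {r[-1]:.3g}")
```

With multiplier 1.0 the RMS stays on the order of 1 (finite width adds some random drift), while 0.5 and 2.0 give exactly 2^-20 and 2^20 times that value by layer 20. Unlike tanh, ReLU has no upper saturation limit, so oversized weights make activations explode rather than saturate.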
