ML Visualization

Gradient Descent

Foundations · Intermediate · ~8 min

Gradient descent is an iterative optimization algorithm that minimizes a loss function by repeatedly stepping in the direction of its negative gradient. The learning rate controls the step size and determines whether the iterates converge or diverge.

Imagine you’re blindfolded on a hilly landscape and you want to find the lowest valley. Gradient descent is the strategy: feel which way is downhill, take a step that direction, repeat.

Figure: loss surface with the start point, descent path, and minimum marked, alongside a plot of loss vs. iteration. Learning rate shown: 0.10; iteration 0 of 67; maximum loss on the axis: 6.260.

Drag the surface to orbit; drag on the contour map to set the start point.

The idea in plain words

Gradient descent finds the bottom of a valley by feeling which way is downhill and taking a step that direction, over and over. The learning rate is the step size. Nudge it up and the path descends faster; push it to the top and each step overshoots, bouncing to ever-larger loss until it flies off to infinity.

The valley is defined by a loss function, and this is how models like linear regression are fit when no closed-form solution is available or practical.
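To make the linear regression case concrete, here is a minimal sketch in plain Python. The data points, the learning rate of 0.05, the step count, and the function name `fit_line` are all assumptions chosen for illustration; they do not come from the visualization.

```python
# Fit y ≈ w*x + b by gradient descent on mean squared error.
# Data, learning rate, and step count are illustrative assumptions.

def fit_line(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line at every data point.
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of L(w, b) = (1/n) * sum(err^2).
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errs, xs))
        grad_b = (2.0 / n) * sum(errs)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # points lying exactly on y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

With these settings the fitted slope and intercept should land very close to 2 and 1. A noticeably larger learning rate on the same data would make the loop blow up rather than settle.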

Now, the math

Each parameter θ updates by stepping against the gradient:

$$\theta \leftarrow \theta - \eta\,\nabla L(\theta)$$

$\theta$: a model parameter being tuned.
$\eta$: the learning rate (the step size).
$\nabla L(\theta)$: the gradient, which points in the uphill direction of the loss.

On an elongated bowl, curvature differs by direction, and the steepest direction has the largest curvature. Convergence there requires the learning rate to stay below roughly twice the inverse of that curvature; above it, each step overshoots the minimum and lands farther from it than where it started, so the loss diverges, which is exactly what the slider lets you trigger.
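The threshold can be sketched on a one-dimensional quadratic. The symbol $\lambda$ (the curvature) and the iteration index $t$ are introduced here for illustration and are not part of the notation above.

```latex
\begin{aligned}
L(\theta) &= \tfrac{1}{2}\lambda\theta^2,\ \lambda > 0
  &&\Rightarrow\quad \nabla L(\theta) = \lambda\theta \\
\theta_{t+1} &= \theta_t - \eta\lambda\theta_t = (1 - \eta\lambda)\,\theta_t
  &&\Rightarrow\quad \theta_t = (1 - \eta\lambda)^t\,\theta_0 \\
|1 - \eta\lambda| &< 1
  &&\iff\quad 0 < \eta < \frac{2}{\lambda}
\end{aligned}
```

On a quadratic bowl with several directions, each principal direction follows this recursion with its own curvature, so the largest curvature sets the binding limit on $\eta$. On a non-quadratic loss the bound holds only approximately, near the minimum.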

Now Break It

Try this: Crank the learning rate high — the path oscillates wildly and diverges to infinity.

Control: Learning rate slider (set to maximum)
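The same experiment can be approximated offline. The sketch below runs gradient descent on an assumed quadratic bowl with curvatures 1 and 5 (not the visualization's actual surface), once with a step size just below 2 divided by the largest curvature and once just above it. The names `CURVATURES` and `run` are hypothetical.

```python
# Step sizes on either side of 2 / (largest curvature).
# Curvatures, start point, and step count are illustrative assumptions.

CURVATURES = (1.0, 5.0)

def loss(theta):
    return 0.5 * sum(c * t * t for c, t in zip(CURVATURES, theta))

def run(eta, steps=30, start=(2.0, 1.0)):
    theta = start
    losses = [loss(theta)]
    for _ in range(steps):
        # The bowl's gradient is curvature * coordinate in each direction.
        theta = tuple(t - eta * c * t for c, t in zip(CURVATURES, theta))
        losses.append(loss(theta))
    return losses

threshold = 2.0 / max(CURVATURES)     # 0.4 for this bowl
stable = run(eta=0.9 * threshold)     # just below the threshold
unstable = run(eta=1.1 * threshold)   # just above the threshold

print(f"threshold {threshold:.2f}")
print(f"below: {stable[0]:.3f} -> {stable[-1]:.3g}")
print(f"above: {unstable[0]:.3f} -> {unstable[-1]:.3g}")
```

Below the threshold the final loss ends up far smaller than the starting loss; above it, the high-curvature coordinate flips sign and grows on every step, so the loss eventually climbs well past its starting value.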
