Gradient Descent
Gradient descent is an iterative optimization algorithm that minimizes a loss function by repeatedly stepping in the direction of its negative gradient. The learning rate controls the step size and determines whether the iteration converges or diverges.
Imagine you’re blindfolded on a hilly landscape and you want to find the lowest valley. Gradient descent is the strategy: feel which way is downhill, take a step in that direction, repeat.
Figure: interactive loss surface (3D view and contour map) marking the start point, the descent path, and the minimum.
The idea in plain words
Gradient descent finds the bottom of a valley by feeling which way is downhill and taking a step in that direction, over and over. The learning rate is the step size. Nudge it up and the path descends faster, up to a point; push it too high and each step overshoots the bottom, landing farther up the opposite wall than where it started, so the loss grows with every step until the path flies off to infinity.
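A minimal sketch of that behaviour, assuming the one-dimensional bowl L(x) = x² (gradient 2x) in place of the page's surface. The function `steps_to_converge`, the start point, and the tolerance are hypothetical choices for illustration.

```python
def steps_to_converge(lr, x0=5.0, tol=1e-6, max_steps=1000):
    """Run gradient descent on L(x) = x**2 starting at x0.

    Returns the number of steps until |x| < tol, or None if the
    iterate blows up (or never reaches tol within max_steps)."""
    x = x0
    for step in range(max_steps):
        if abs(x) < tol:
            return step
        if abs(x) > 1e12:  # treat runaway growth as divergence
            return None
        x = x - lr * 2 * x  # the gradient of x**2 is 2x
    return None


print([steps_to_converge(lr) for lr in (0.05, 0.25, 0.9, 1.1)])  # → [147, 23, 70, None]
```

Raising the rate from 0.05 to 0.25 cuts the work sharply, but at 0.9 each step already overshoots to the other side of the minimum and convergence slows again; at 1.1 the overshoot grows every step. For this bowl the curvature is 2, so the breaking point is 2/2 = 1.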
The valley is defined by a loss function, and this is exactly how models like linear regression can be fit when solving for the parameters in closed form is impractical.
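Linear regression does have a closed-form solution (the normal equations), but it can also be fit by gradient descent. A minimal sketch, assuming mean squared error as the loss and a tiny noiseless dataset on the line y = 2x + 1; `fit_line`, the learning rate, and the step count are illustrative choices, not a prescribed recipe.

```python
def fit_line(xs, ys, lr=0.05, steps=5000):
    """Fit y ≈ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current fit
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of L(w, b) = (1/n) * sum(err**2)
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # noiseless points on y = 2x + 1
w, b = fit_line(xs, ys)
print(w, b)  # close to 2 and 1
```

For this dataset the largest curvature of the loss is about 13.4, so a learning rate above roughly 0.15 would make the fit diverge; 0.05 sits safely below that.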
Now, the math
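Each parameter $\theta$ updates by stepping against the gradient of the loss $L$:

$$\theta \leftarrow \theta - \eta \, \nabla_\theta L(\theta)$$

- $\theta$: a model parameter being tuned.
- $\eta$: the learning rate, i.e. the step size.
- $\nabla_\theta L(\theta)$: the gradient, which points in the uphill direction of the loss.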
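The update rule translates almost line for line into code. A sketch in plain Python, assuming a hypothetical elongated bowl whose second axis is ten times steeper than the first; the bowl, start point, learning rate, and the names `gradient_descent` and `bowl_grad` are illustrative.

```python
def gradient_descent(grad, theta, lr, steps):
    """Apply theta <- theta - lr * grad(theta) `steps` times.

    `theta` is a list of floats and `grad` maps such a list to the
    gradient (a list of the same length). Returns every iterate."""
    path = [list(theta)]
    for _ in range(steps):
        g = grad(theta)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
        path.append(theta)
    return path


def bowl_grad(theta):
    # Hypothetical bowl L(θ) = 0.5 * (θ1**2 + 10 * θ2**2);
    # its gradient is (θ1, 10 * θ2), so the second axis is 10x steeper.
    return [theta[0], 10.0 * theta[1]]


path = gradient_descent(bowl_grad, [4.0, 1.0], lr=0.15, steps=100)
print(path[1], path[-1])
```

With this rate the steep coordinate flips sign every step (1, then -0.5, then 0.25, ...), the zig-zag seen on an elongated bowl, while both coordinates still shrink toward the minimum at the origin.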
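On an elongated bowl the curvature differs by direction: it is largest across the narrow axis of the valley, where the walls are steepest. Along a direction with curvature $\lambda$, each step with learning rate $\eta$ scales the remaining distance to the minimum by $(1 - \eta\lambda)$, so convergence requires $|1 - \eta\lambda| < 1$ in every direction, i.e. a learning rate below $2/\lambda_{\max}$, twice the inverse of the largest curvature (exactly for a quadratic bowl, approximately near the minimum of other smooth losses). Above that threshold each step overshoots the minimum by more than the previous distance, so the path zig-zags outward and the loss diverges, which is exactly what the slider lets you trigger.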
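For a quadratic bowl the threshold can be derived exactly. The sketch below assumes coordinates aligned with the bowl's principal axes, the minimum at the origin, and curvatures $\lambda_i > 0$ (any convex quadratic can be shifted and rotated into this form); $\eta$ is the learning rate and $k$ counts steps.

$$
\begin{aligned}
L(\theta) &= \tfrac{1}{2}\sum_i \lambda_i \theta_i^2,
  \qquad \frac{\partial L}{\partial \theta_i} = \lambda_i \theta_i \\
\theta_i^{(k+1)} &= \theta_i^{(k)} - \eta \lambda_i \theta_i^{(k)} = (1 - \eta\lambda_i)\,\theta_i^{(k)} \\
\theta_i^{(k)} &= (1 - \eta\lambda_i)^k \,\theta_i^{(0)} \\
\theta_i^{(k)} \to 0 \ \text{for every start}
  &\iff |1 - \eta\lambda_i| < 1
  \iff 0 < \eta < \frac{2}{\lambda_i}
\end{aligned}
$$

Requiring this along every axis gives $0 < \eta < 2/\lambda_{\max}$. When $1/\lambda_i < \eta < 2/\lambda_i$ the factor is negative, so that coordinate flips sign each step (the zig-zag) while still shrinking; past $2/\lambda_i$ it grows geometrically.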
Now break it
Try this: crank the learning rate above the stability threshold; the path zig-zags across the valley with growing amplitude and the loss diverges to infinity.
Control: Learning rate slider (set to maximum)
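The same experiment can be run without the slider. A sketch assuming a hypothetical bowl with curvature 1 along one axis and 10 along the other, so the threshold $2/\lambda_{\max}$ is 2/10 = 0.2; the names `CURVATURES` and `final_loss` and the start point are illustrative.

```python
CURVATURES = (1.0, 10.0)  # hypothetical bowl: shallow axis, steep axis


def final_loss(lr, steps=200, start=(4.0, 1.0)):
    """Run gradient descent on L(θ) = 0.5 * sum(c * t**2) and
    return the loss after `steps` updates."""
    theta = list(start)
    for _ in range(steps):
        # The gradient component along each axis is c * t
        theta = [t - lr * c * t for t, c in zip(theta, CURVATURES)]
    return 0.5 * sum(c * t * t for c, t in zip(CURVATURES, theta))


threshold = 2 / max(CURVATURES)  # 2 / λ_max
for lr in (0.19, 0.21):  # just below and just above the threshold
    print(f"lr={lr}: loss after 200 steps = {final_loss(lr):.3g}")
```

At 0.21 the shallow axis would still converge (its factor is 0.79); the blow-up comes entirely from the steep axis, whose factor 1 - 0.21 × 10 = -1.1 exceeds 1 in magnitude.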