Ridge Regression (L2)
Shrink coefficients toward zero to reduce variance.
Ridge regression adds a penalty for large coefficients. It gently shrinks every weight toward zero, trading a little bias for a big drop in variance — taming wild overfit models.
The idea in plain words
Ridge regression puts a price on large coefficients. Instead of minimizing only the squared error, it minimizes the squared error plus λ times the sum of the squared weights. The fit trades a little bias for a large drop in variance, taming the wild swings a high-degree fit is prone to.
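Turn the penalty λ up and the coefficients shrink smoothly toward zero (the overall length of the weight vector falls monotonically, though an individual coefficient can briefly grow or change sign), yet for any finite λ they generally never reach zero exactly. Geometrically, the round L2 constraint region has no corners for a coefficient to snap to. This is the crucial difference from lasso: the L1 constraint is a diamond whose corners sit on the axes, so lasso can set coefficients exactly to zero.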
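A minimal NumPy sketch of this shrinkage, assuming a model with no intercept and made-up synthetic data. The helper `ridge_fit` is hypothetical (not the page's own code); it solves the regularized normal equations (X⊤X + λI)w = X⊤y. Sweeping λ shows the length of the weight vector falling while no coefficient lands exactly on zero.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Hypothetical helper: closed-form ridge weights for a no-intercept model."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)  # X^T X + lambda * I
    return np.linalg.solve(A, X.T @ y)       # solve instead of forming the inverse

# Made-up data: 50 samples, 6 features (x1..x6), fixed seed for repeatability.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
true_w = np.array([3.0, -2.0, 0.5, 0.0, 1.5, -1.0])
y = X @ true_w + rng.normal(scale=0.5, size=50)

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
for lam in lambdas:
    w = ridge_fit(X, y, lam)
    # ||w|| shrinks as lambda grows; the smallest |w_j| stays above zero.
    print(f"lambda={lam:>7}: ||w||={np.linalg.norm(w):.3f}, "
          f"smallest |w_j|={np.abs(w).min():.2e}")
```

Using `np.linalg.solve` rather than an explicit matrix inverse is the usual, numerically safer choice for the same formula.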
Now, the math
Ridge minimizes squared error plus an L2 penalty on the weights:
$$\hat{w} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2$$
This still has a closed form, just a nudged normal equation:
$$\hat{w} = (X^\top X + \lambda I)^{-1} X^\top y$$
- λ: the regularization strength, i.e. how hard large weights are penalized.
- ‖w‖₂²: the L2 penalty, the squared length of the weight vector.
- λI: the ridge added to the diagonal of X⊤X, which makes the matrix invertible and better conditioned for any λ > 0.
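The closed form follows from setting the gradient of the ridge objective to zero; a standard sketch of that derivation, in the same notation as the formulas above:

```latex
\begin{aligned}
J(w) &= \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2 \\
\nabla_w J(w) &= -2X^\top (y - Xw) + 2\lambda w = 0 \\
\Rightarrow\quad (X^\top X + \lambda I)\, w &= X^\top y \\
\Rightarrow\quad \hat{w} &= (X^\top X + \lambda I)^{-1} X^\top y
\end{aligned}
```

For λ > 0 the matrix X⊤X + λI is positive definite, so the inverse exists even when X⊤X itself is singular.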
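Adding λI to X⊤X shifts every eigenvalue up by λ. Along a principal direction of the data with eigenvalue d², the least-squares coefficient is scaled by d²/(d² + λ), so directions of low data variance (small d², which cause instability) are damped most. As λ → ∞ the weights collapse toward the all-zero vector and, provided the intercept is left unpenalized as is standard, the model predicts the mean of y: the failure you can drive with the slider.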
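A small NumPy check of these claims, on made-up data with two nearly collinear columns so that X⊤X is ill-conditioned (no intercept assumed). It confirms that adding λI shifts each eigenvalue by exactly λ, that the condition number drops, and that the ridge solution equals the SVD form V·diag(d/(d² + λ))·U⊤y, whose per-direction shrinkage factors are d²/(d² + λ).

```python
import numpy as np

# Made-up data: x2 is almost a copy of x1, so X^T X is nearly singular.
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x3 + 0.1 * rng.normal(size=n)
lam = 1.0

gram = X.T @ X
eig_plain = np.linalg.eigvalsh(gram)                     # ascending order
eig_ridge = np.linalg.eigvalsh(gram + lam * np.eye(3))   # same order, shifted
print("eigenvalue shift:", eig_ridge - eig_plain)        # each entry ~= lam
print("condition number:", np.linalg.cond(gram), "->",
      np.linalg.cond(gram + lam * np.eye(3)))

# Ridge via SVD: X = U diag(d) V^T  =>  w = V diag(d / (d^2 + lam)) U^T y.
U, d, Vt = np.linalg.svd(X, full_matrices=False)         # d sorted descending
w_svd = Vt.T @ ((d / (d**2 + lam)) * (U.T @ y))
w_direct = np.linalg.solve(gram + lam * np.eye(3), X.T @ y)

# Shrinkage relative to least squares along each direction; smallest d is damped most.
print("shrinkage factors d^2/(d^2+lam):", d**2 / (d**2 + lam))
```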
Now Break It
Try this: an enormous λ crushes every coefficient to near zero, so the model becomes a flat line that ignores the data.
Control: Lambda slider (set to maximum)
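The same breakdown can be reproduced outside the slider with a minimal NumPy sketch, assuming the intercept is not penalized (implemented here by centering X and y before solving). The helper `ridge_with_intercept` and the data are hypothetical. With a very large λ the slope is close to zero and every prediction sits at the mean of y.

```python
import numpy as np

def ridge_with_intercept(X, y, lam):
    """Hypothetical helper: ridge with an unpenalized intercept via centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean                  # center so the intercept drops out
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w                          # recover the intercept afterwards
    return w, b

# Made-up 1-D data scattered around the line y = 2x + 5.
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(40, 1))
y = 2.0 * X[:, 0] + 5.0 + rng.normal(scale=1.0, size=40)

for lam in [0.0, 1e2, 1e8]:
    w, b = ridge_with_intercept(X, y, lam)
    preds = X @ w + b
    spread = preds.max() - preds.min()               # ~0 means a flat line
    print(f"lambda={lam:.0e}: slope={w[0]:.4f}, prediction spread={spread:.4f}")
print("mean of y:", y.mean())
```

If the intercept were penalized along with the slope, a huge λ would instead pull predictions toward zero rather than toward the mean of y.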