Skip to content
ML Visualization

Lasso Regression (L1)

RegressionIntermediate~7 min

Lasso Regression (L1)Drive some coefficients exactly to zero for feature selection.

Lasso uses a different penalty than ridge — one that pushes weak coefficients all the way to exactly zero. That means it doesn’t just shrink features, it deletes them, doing automatic feature selection.

Coefficient path — drag to set λ
  • x1
  • x2
  • x3
  • x4
  • x5
  • x6
L1 constraint (diamond)
Penalty
0.150
Non-zero features4 / 6

Diamond corners lie on the axes — the loss contour meets a corner and a coefficient snaps to exactly zero.

The idea in plain words

Lasso penalizes the absolute value of the weights instead of their squares. That one change lets it drive weak coefficients all the way to exactly zero — so it doesn’t just shrink features like ridge, it deletes them, performing automatic feature selection.

The reason is geometric: the L1 constraint region is a diamond with sharp corners on the axes. The elliptical loss contours almost always first touch that region at a corner, where one coordinate is zero. Toggle between ridge and lasso to see the circle-vs-diamond difference directly.

Now, the math

Lasso swaps the squared penalty for an absolute-value (L1) penalty:

J=1ni(yiy^i)2+λjwjJ = \tfrac{1}{n}\sum_i (y_i - \hat{y}_i)^2 + \lambda \sum_j |w_j|

There’s no closed form; coordinate descent applies a soft-threshold to each weight:

wj=sign(zj)max(zjλ, 0)w_j = \operatorname{sign}(z_j)\,\max(|z_j| - \lambda,\ 0)
jwj\sum_j |w_j|
the L1 penalty — the diamond-shaped constraint that yields sparsity.
zjz_j
the least-squares update for weight j before penalizing.
λ\lambda
the threshold — any weight with |z| below it is set to exactly zero.
Show the derivation

Soft-thresholding is the exact solution of the one-coordinate lasso problem: whenever a weight’s correlation with the residual falls below λ, the subgradient of |w| pins it to zero. Cycling this over all coordinates converges to the global optimum. With correlated features, though, lasso keeps one and zeroes the others somewhat arbitrarily — which motivates elastic net.

Now Break It

Try this: With correlated features, lasso arbitrarily picks one and zeroes the rest, which can be unstable.

Control: Lambda slider with correlated features enabled

Last updated .