ML Visualization

Elastic Net

Regression · Advanced · ~6 min

Blend L1 and L2 penalties to get the best of both.

Elastic net mixes ridge and lasso in one penalty. You get lasso’s feature selection plus ridge’s stability with correlated features — controlled by a single mixing dial.

[Interactive figure: coefficient paths for features x1 to x6 (drag to set λ), shown beside the constraint region, a rounded diamond. Slider readouts: 0.50 and 0.200. Non-zero features: 5 / 6.]

Slide ρ from 0 to 1 and watch the constraint morph from a circle (ridge) to a diamond (lasso).

The idea in plain words

Elastic net simply adds both penalties at once: a slice of lasso’s L1 for sparsity and a slice of ridge’s L2 for stability. One mixing dial ρ slides between them, and you can watch the constraint region morph from a circle to a diamond as it moves.

The payoff shows up with correlated features. Given a group of them, pure lasso tends to keep one, more or less arbitrarily, and drop the rest. The L2 part encourages correlated features to share the weight, so the selection stays stable when the data shifts slightly.

Now, the math

Elastic net is a convex blend of the L1 and L2 penalties:

$$J = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2 + \lambda\Bigl(\rho \sum_j |w_j| + (1-\rho)\sum_j w_j^2\Bigr)$$

$\rho$: the L1 ratio. 1 is pure lasso, 0 is pure ridge, anything in between is a blend.
$\lambda$: the overall strength of the combined penalty.
$1-\rho$: the share given to the stabilizing L2 term.
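To make the formula concrete, here is a minimal sketch that evaluates J for a given weight vector. The function name `elastic_net_objective`, the tiny dataset, and the omission of an intercept are all assumptions made for illustration; they are not part of the page's widget.

```python
def elastic_net_objective(X, y, w, lam, rho):
    """Mean squared error plus the blended penalty lam * (rho * L1 + (1 - rho) * L2)."""
    n = len(y)
    # Predictions y_hat_i = sum_j x_ij * w_j (no intercept, to keep the sketch short).
    y_hat = [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]
    mse = sum((y_i - yh_i) ** 2 for y_i, yh_i in zip(y, y_hat)) / n
    l1 = sum(abs(w_j) for w_j in w)   # lasso part: sum of |w_j|
    l2 = sum(w_j ** 2 for w_j in w)   # ridge part: sum of w_j^2
    return mse + lam * (rho * l1 + (1.0 - rho) * l2)


# Hypothetical toy data: two samples, two features.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
w = [2.0, 0.5]

for rho in (0.0, 0.5, 1.0):
    print(rho, round(elastic_net_objective(X, y, w, lam=0.2, rho=rho), 3))
```

Only the penalty changes across the three calls: the data-fit term is identical, and ρ decides how much of λ is spent on the L1 sum versus the L2 sum.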

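The constraint region ρ‖w‖₁ + (1−ρ)‖w‖₂² ≤ c interpolates between the diamond (ρ = 1) and the circle (ρ = 0): its corners stay sharp enough to zero out irrelevant features, while its sides round out enough to spread weight across correlated ones. Coordinate descent solves it with the same soft-threshold step as lasso, followed by an extra ridge shrinkage. For standardized features and the objective J above, the update for one coefficient is w_j ← S(z_j, λρ/2) / (1 + λ(1−ρ)), where z_j is the average product of feature j with the residual computed without feature j, and S(z, t) = sign(z)·max(|z| − t, 0) is the soft-threshold.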
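The coordinate-descent update described above can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the page's own solver: the names `soft_threshold` and `elastic_net_cd` are hypothetical, there is no intercept, and the threshold λρ/2 follows from differentiating the objective J as written (with its 1/n scaling and no factor of 1/2). The toy data duplicates one standardized feature to mimic perfectly correlated inputs.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(X, y, lam, rho, n_sweeps=200):
    """Cyclic coordinate descent for J = MSE + lam * (rho * L1 + (1 - rho) * L2)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - X w, and w starts at zero
    # (1/n) * sum_i x_ij^2 per feature; this equals 1 for standardized columns.
    scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if scale[j] == 0.0:
                continue  # an all-zero column carries no information
            # Average product of feature j with the residual that excludes feature j.
            z = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(z, lam * rho / 2.0) / (scale[j] + lam * (1.0 - rho))
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Hypothetical data: two identical standardized features, y = 2 * x.
x = [-1.0, 1.0, -1.0, 1.0]
X = [[xi, xi] for xi in x]
y = [2.0 * xi for xi in x]

w_lasso = elastic_net_cd(X, y, lam=0.5, rho=1.0)  # pure L1
w_blend = elastic_net_cd(X, y, lam=0.5, rho=0.5)  # half L1, half L2
print("lasso:", w_lasso)
print("blend:", [round(v, 4) for v in w_blend])
```

In this toy setup the pure-lasso run puts all the weight on whichever copy is visited first, while the blended run splits the weight evenly between the two copies. That is the stability effect the L2 term is meant to provide.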

Now Break It

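Try this: pick a mixing ratio that does not suit the data. With ρ too close to 1, the model over-sparsifies and drops features that carry signal; with ρ at or near 0, it behaves like ridge and zeroes out few or no features.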

Control: Mixing ratio slider
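The same experiment can be approximated offline. The sketch below assumes the simplest possible setting, standardized and mutually uncorrelated features, where each coefficient has the closed form S(z_j, λρ/2) / (1 + λ(1−ρ)) for the objective J above. The values in `z` are made-up feature/target correlations, and `closed_form_weights` and `count_nonzero` are hypothetical helper names.

```python
def closed_form_weights(z, lam, rho):
    """Elastic-net weights when features are standardized and uncorrelated."""
    threshold = lam * rho / 2.0          # L1 part: sets small coefficients to zero
    shrink = 1.0 + lam * (1.0 - rho)     # L2 part: scales the survivors down
    weights = []
    for z_j in z:
        magnitude = max(abs(z_j) - threshold, 0.0)
        weights.append((magnitude if z_j >= 0 else -magnitude) / shrink)
    return weights


def count_nonzero(weights):
    return sum(1 for w_j in weights if w_j != 0.0)


# Hypothetical per-feature correlations with the target, for six features.
z = [1.2, -0.8, 0.3, 0.15, -0.05, 0.02]
lam = 1.0

for rho in (0.0, 0.5, 1.0):
    w = closed_form_weights(z, lam, rho)
    print(f"rho={rho}: {count_nonzero(w)} / {len(z)} non-zero")
# rho=0.0: 6 / 6 non-zero
# rho=0.5: 3 / 6 non-zero
# rho=1.0: 2 / 6 non-zero
```

At ρ = 0 nothing is ever set to zero, so there is no selection at all; at ρ = 1 the weakly correlated features are all dropped, including ones that might carry real signal.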
