Elastic Net
Elastic Net — Blend L1 and L2 penalties to get the best of both.
Elastic net mixes ridge and lasso in one penalty. You get lasso’s feature selection plus ridge’s stability with correlated features — controlled by a single mixing dial.
[Interactive figure with features x1 to x6.] Slide ρ from 0 to 1 and watch the constraint morph from a circle (ridge) to a diamond (lasso).
The idea in plain words
Elastic net simply adds both penalties at once: a slice of lasso’s L1 for sparsity and a slice of ridge’s L2 for stability. One mixing dial ρ slides between them, and you can watch the constraint region morph from a circle to a diamond as it moves.
The payoff shows up with correlated features. Pure lasso tends to keep one feature from a correlated group, more or less arbitrarily, and drop the rest. The L2 part encourages correlated features to share the weight, so selection stays stable: the best of both penalties from a single knob.
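To make the weight-sharing argument concrete, here is a small sketch that scores two ways of splitting the same total weight across two identical features. The feature setup and the function name `elastic_net_penalty` are illustrative assumptions, not part of the original page; the overall strength λ is left out because it scales both candidates equally.

```python
def elastic_net_penalty(w, rho):
    """Return rho * L1 norm + (1 - rho) * squared L2 norm of the weights w."""
    l1 = sum(abs(v) for v in w)
    l2_squared = sum(v * v for v in w)
    return rho * l1 + (1.0 - rho) * l2_squared


# Assumption: x1 and x2 are identical, so any split with w1 + w2 = 2
# produces exactly the same predictions. Only the penalty can tell them apart.
splits = {
    "all on x1": [2.0, 0.0],
    "shared": [1.0, 1.0],
}

for rho in (1.0, 0.5):
    for name, w in splits.items():
        print(f"rho={rho}: {name:10s} penalty={elastic_net_penalty(w, rho):.2f}")
```

At ρ = 1 the two splits tie, which is why pure lasso has no reason to prefer one over the other. As soon as ρ drops below 1, the squared L2 term makes the shared split strictly cheaper.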
Now, the math
Elastic net is a convex blend of the L1 and L2 penalties, scaled by an overall strength:
penalty(w) = λ [ ρ‖w‖₁ + (1−ρ)‖w‖² ]
- ρ: the L1 ratio. 1 is pure lasso, 0 is pure ridge, anything in between is a blend.
- λ: the overall strength of the combined penalty.
- 1−ρ: the share given to the stabilizing L2 term.
The derivation
The constraint region ρ‖w‖₁ + (1−ρ)‖w‖² = c interpolates between the diamond (ρ = 1) and the circle (ρ = 0): its corners stay sharp enough to zero out irrelevant features while its sides round out enough to spread weight across correlated ones. Coordinate descent solves it one weight at a time with the same soft-threshold step as lasso, then divides the result by an extra ridge shrinkage factor, 1 + λ(1−ρ) for standardized features (the exact constants depend on how the loss is scaled).
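The single-coordinate update can be sketched as follows. This assumes a mean-squared-error loss (1/n)‖y − Xw‖², features scaled so each column has mean square 1, and z defined as the average product of one feature with the partial residual. Under those assumptions the soft threshold is λρ/2; a different loss scaling changes the constants but not the shape of the update. The function names are illustrative.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def coordinate_update(z, lam, rho):
    """Elastic net update for one weight.

    The numerator is the lasso soft-threshold step (L1 part);
    the denominator is the extra ridge shrinkage (L2 part).
    """
    return soft_threshold(z, lam * rho / 2.0) / (1.0 + lam * (1.0 - rho))


# The same weak signal z = 0.3 under three mixing ratios, with lam = 1.
for rho in (0.0, 0.5, 1.0):
    print(f"rho={rho}: w_j = {coordinate_update(0.3, 1.0, rho):.4f}")
```

At ρ = 1 the denominator is 1 and the update reduces to the plain lasso step; at ρ = 0 the threshold vanishes and only the ridge shrinkage remains.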
Now Break It
Try this: pick the wrong mixing ratio for the data. Too close to pure lasso (ρ near 1) over-sparsifies; too close to pure ridge (ρ near 0) fails to select at all.
Control: Mixing ratio slider
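Without the slider, the same experiment can be approximated in a few lines. This is a rough sketch, not the page's own implementation: it assumes a mean-squared-error loss, fits no intercept, and uses a hypothetical four-point toy dataset in which x2 is an exact copy of x1, x3 is a weak but real signal, and x4 carries only a trace of noise.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_elastic_net(X, y, lam, rho, sweeps=200):
    """Cyclic coordinate descent for the assumed objective
    (1/n) * ||y - Xw||^2 + lam * (rho * ||w||_1 + (1 - rho) * ||w||^2).
    X is a list of rows; the data are assumed centered (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    # Mean square of each column (equals 1 for standardized features).
    scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # equals y - Xw, since w starts at zero
    for _ in range(sweeps):
        for j in range(p):
            # Average product of feature j with the partial residual
            # (the residual with feature j's own contribution added back).
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + scale[j] * w[j]
            new_wj = soft_threshold(z, lam * rho / 2.0) / (scale[j] + lam * (1.0 - rho))
            delta = new_wj - w[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            w[j] = new_wj
    return w


def count_selected(w, tol=1e-8):
    """Number of weights that are nonzero up to a small tolerance."""
    return sum(1 for v in w if abs(v) > tol)


# Hypothetical toy data: four samples, four centered +/-1 features.
x1 = [1.0, 1.0, -1.0, -1.0]
x2 = list(x1)                      # exact duplicate of x1
x3 = [1.0, -1.0, 1.0, -1.0]        # weak real signal
x4 = [1.0, -1.0, -1.0, 1.0]        # noise direction
X = [list(row) for row in zip(x1, x2, x3, x4)]
y = [2.0 * a + 0.3 * c + 0.02 * d for a, c, d in zip(x1, x3, x4)]

for rho in (1.0, 0.25, 0.0):
    w = fit_elastic_net(X, y, lam=0.8, rho=rho)
    print(f"rho={rho}: kept {count_selected(w)} of 4,", [round(v, 3) for v in w])
```

Worked by hand under these assumptions, ρ = 1 should keep only x1, dropping both the duplicate x2 and the weak but real x3. ρ = 0 should keep all four features, noise included. An intermediate ρ = 0.25 should keep x1 and x2 with equal weights plus x3, and zero out x4.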