Naive Bayes
Naive Bayes — Classify using Bayes’ rule and a strong independence assumption.
Naive Bayes flips the question around with Bayes’ rule: instead of asking “what class is this?” it asks “which class most likely produced these features?” The “naive” part assumes the features are independent of one another once the class is known.
[Interactive figure: Class 0 and Class 1 points, each overlaid with its Naive Bayes Gaussian 2σ ellipse.]
The ellipses stay axis-aligned no matter how the data tilts — that is the “naive” independence assumption. Raise ρ and watch the error climb. Drag any point, or click empty space to drop a new one, and the Gaussians refit live.
The idea in plain words
Naive Bayes flips classification around with Bayes’ rule: instead of “what class is this point?” it asks “which class most likely produced these features?” The “naive” part assumes the features are independent within each class, so when every feature is modeled as a Gaussian, each class becomes an axis-aligned Gaussian blob.
That assumption is a shortcut. Correlate the features and the true clouds tilt, but Naive Bayes stubbornly keeps its ellipses square to the axes — and starts misclassifying exactly where the tilt matters most.
Now, the math
Bayes’ rule with the independence assumption factorizes the likelihood:
$$P(y \mid x_1, \dots, x_d) \propto P(y) \prod_{i=1}^{d} P(x_i \mid y)$$
- $P(y)$: the class prior, how common the class is.
- $P(x_i \mid y)$: each feature’s likelihood, modeled as a 1-D Gaussian.
- $\prod_i$: the naive step, multiply as if features were independent.
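The three ingredients above (a prior, a 1-D Gaussian per feature, and a product over features) can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind the demo: the function names, the toy data, and the small variance floor are all assumptions made for the example. It sums log-probabilities instead of multiplying probabilities, which is the same rule but avoids numerical underflow.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior plus a per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Maximum-likelihood variance per feature; the floor (an assumption
        # of this sketch) avoids dividing by zero on constant features.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_joint(model, x):
    """log P(y) + sum_i log P(x_i | y) for every class y."""
    return {
        label: math.log(prior)
        + sum(log_gaussian(xi, m, v) for xi, m, v in zip(x, means, variances))
        for label, (prior, means, variances) in model.items()
    }

def predict(model, x):
    """Return the class with the highest unnormalized log posterior."""
    scores = log_joint(model, x)
    return max(scores, key=scores.get)

def predict_proba(model, x):
    """Normalize the log scores into posterior probabilities."""
    scores = log_joint(model, x)
    top = max(scores.values())
    unnormalized = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}

# Hypothetical toy data: two small, well-separated clusters.
X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (3.0, 0.5), (3.2, 0.7), (2.8, 0.3)]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict(model, (1.1, 1.9)), predict_proba(model, (1.1, 1.9)))
```

Note that the fit never looks at how two features vary together: only per-feature means and variances are stored, which is exactly the independence assumption in code.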
Multiplying per-feature Gaussians is equivalent to a single Gaussian with a diagonal covariance, that is, an ellipse whose axes are parallel to the coordinate axes. Real correlated data has off-diagonal covariance (a tilted ellipse), which the model cannot represent. Correlated features are then counted as if they were separate pieces of evidence, so the posterior probabilities are distorted, often toward overconfident values.
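The equivalence can be written out in one line. The notation here is an assumption introduced for the sketch: $\mu_{i,y}$ and $\sigma_{i,y}^2$ are the mean and variance of feature $i$ in class $y$, $d$ is the number of features, and $\Sigma_y$ is the covariance matrix the product implies.

```latex
\prod_{i=1}^{d} \frac{1}{\sqrt{2\pi\sigma_{i,y}^{2}}}
  \exp\!\left(-\frac{(x_i-\mu_{i,y})^{2}}{2\sigma_{i,y}^{2}}\right)
= \frac{1}{(2\pi)^{d/2}\,\lvert\Sigma_y\rvert^{1/2}}
  \exp\!\left(-\tfrac{1}{2}(x-\mu_y)^{\top}\Sigma_y^{-1}(x-\mu_y)\right),
\qquad
\Sigma_y = \operatorname{diag}\!\left(\sigma_{1,y}^{2},\dots,\sigma_{d,y}^{2}\right)
```

The two sides match because $\lvert\Sigma_y\rvert = \prod_i \sigma_{i,y}^{2}$ and, for a diagonal matrix, the quadratic form reduces to $\sum_i (x_i-\mu_{i,y})^{2}/\sigma_{i,y}^{2}$. Every off-diagonal entry of $\Sigma_y$ is zero by construction, so no choice of parameters can tilt the ellipse.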
Now Break It
Try this: raise the correlation between the features. Strong correlation breaks the independence assumption and skews the predicted probabilities.
Control: slider for the correlation ρ between the features
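The same experiment can be approximated offline. The sketch below is not the demo's code: the class means `(0, 0)` and `(2, 1)`, the unit variances, the sample size, and the function names are all assumptions chosen for illustration. It draws two classes that share a correlation ρ, fits an axis-aligned Gaussian to each, and reports the test error as ρ grows.

```python
import math
import random

def sample_class(rng, mean, rho, n):
    """Draw n points from a 2-D Gaussian with unit variances and correlation rho."""
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = mean[0] + z1
        x2 = mean[1] + rho * z1 + math.sqrt(1 - rho ** 2) * z2
        points.append((x1, x2))
    return points

def fit_axis_aligned(points):
    """Per-feature mean and variance: the only statistics Gaussian Naive Bayes keeps."""
    n = len(points)
    means = [sum(col) / n for col in zip(*points)]
    variances = [
        sum((v - m) ** 2 for v in col) / n
        for col, m in zip(zip(*points), means)
    ]
    return means, variances

def log_likelihood(x, params):
    """Sum of per-feature 1-D Gaussian log densities (the naive product, in logs)."""
    means, variances = params
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )

def nb_error(rho, n=5000, seed=0):
    """Test error of Gaussian Naive Bayes on two equally likely classes."""
    rng = random.Random(seed)
    mean0, mean1 = (0.0, 0.0), (2.0, 1.0)  # assumed class means
    params0 = fit_axis_aligned(sample_class(rng, mean0, rho, n))
    params1 = fit_axis_aligned(sample_class(rng, mean1, rho, n))
    test0 = sample_class(rng, mean0, rho, n)
    test1 = sample_class(rng, mean1, rho, n)
    # Equal priors cancel, so comparing likelihoods is enough.
    wrong = sum(log_likelihood(x, params1) > log_likelihood(x, params0) for x in test0)
    wrong += sum(log_likelihood(x, params0) >= log_likelihood(x, params1) for x in test1)
    return wrong / (2 * n)

for rho in (0.0, 0.5, 0.9):
    print(f"rho={rho:.1f}  error={nb_error(rho):.3f}")
```

With these assumed means the error should rise from roughly 13% at ρ = 0 to roughly 20% at ρ = 0.9, even though a model that could represent the tilt would do better at high ρ than at ρ = 0. Other mean placements behave differently, so treat the numbers as a property of this particular setup.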