ML Visualization

Naive Bayes

Classification · Intermediate · ~7 min

Classify using Bayes’ rule and a strong independence assumption.

Naive Bayes flips the question around with Bayes’ rule: instead of asking “what class is this?” it asks “which class most likely produced these features?” The “naive” part assumes the features are independent of one another once the class is known.

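[Interactive plot: points for Class 0 and Class 1, with the fitted Naive Bayes Gaussians drawn as 2σ ellipses. A slider sets ρ (shown at 0.10), an “Add points as” control picks the class for new points, and a readout reports the misclassified share (10%).]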

The ellipses stay axis-aligned no matter how the data tilts — that is the “naive” independence assumption. Raise ρ and watch the error climb. Drag any point, or click empty space to drop a new one, and the Gaussians refit live.

The idea in plain words

Naive Bayes flips classification around with Bayes’ rule: instead of “what class is this point?” it asks “which class most likely produced these features?” The “naive” part assumes the features are independent within each class, so with Gaussian likelihoods each class becomes an axis-aligned Gaussian blob.

That assumption is a shortcut. Correlate the features and the true clouds tilt, but Naive Bayes stubbornly keeps its ellipses square to the axes — and starts misclassifying exactly where the tilt matters most.

Now, the math

Bayes’ rule with the independence assumption factorizes the likelihood:

$$P(y \mid x) \propto P(y) \prod_i P(x_i \mid y)$$

  • $P(y)$ is the class prior: how common the class is.
  • $P(x_i \mid y)$ is each feature’s likelihood, modeled as a 1-D Gaussian.
  • $\prod_i$ is the naive step: multiply as if features were independent.
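
The factorized rule can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind the visualization: the function names, the small variance floor, and the toy data are all assumptions made for the example. It works in log space, so the product over features becomes a sum.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small floor keeps variances strictly positive (an assumption of this sketch).
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def log_posterior_unnormalized(X, priors, means, variances):
    """log P(y) + sum_i log N(x_i | mean, var), shape (n_samples, n_classes)."""
    diff = X[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None])
    # Summing per-feature log-likelihoods is the naive product in log space.
    return np.log(priors)[None, :] + log_lik.sum(axis=2)

def predict(X, model):
    """Pick the class with the largest unnormalized log posterior."""
    classes, priors, means, variances = model
    scores = log_posterior_unnormalized(X, priors, means, variances)
    return classes[np.argmax(scores, axis=1)]

# Toy data, made up for illustration: two well-separated clusters.
X_train = np.array([[-2.0, -1.5], [-1.0, -2.5], [-1.5, -1.0],
                    [2.0, 1.5], [1.0, 2.5], [1.5, 1.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
model = fit_gaussian_nb(X_train, y_train)
print(predict(np.array([[-1.0, -1.0], [1.2, 2.0]]), model))
```

Because only a mean and a variance per feature are stored, the fitted model has no way to record how two features move together.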

Multiplying per-feature Gaussians is equivalent to a single Gaussian with a diagonal covariance — an ellipse whose axes are parallel to the coordinate axes. Real correlated data has off-diagonal covariance (a tilted ellipse), which the model cannot represent, so its posterior is skewed.
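
Written out, the equivalence looks like this. The symbols $\mu_{y,i}$ and $\sigma_{y,i}^2$ (the mean and variance of feature $i$ in class $y$), $d$ (the number of features), and $\Sigma_y$ are notation introduced here for the sketch; they do not appear in the formula above.

```latex
\begin{aligned}
\prod_{i=1}^{d} P(x_i \mid y)
  &= \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}}
     \exp\!\left(-\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2}\right) \\
  &= \frac{1}{(2\pi)^{d/2}\,\lvert\Sigma_y\rvert^{1/2}}
     \exp\!\left(-\tfrac{1}{2}(x - \mu_y)^\top \Sigma_y^{-1} (x - \mu_y)\right),
\qquad \Sigma_y = \operatorname{diag}\!\left(\sigma_{y,1}^2, \dots, \sigma_{y,d}^2\right).
\end{aligned}
```

The second line follows because a diagonal $\Sigma_y$ has determinant $\prod_i \sigma_{y,i}^2$ and its quadratic form reduces to $\sum_i (x_i - \mu_{y,i})^2 / \sigma_{y,i}^2$. Every off-diagonal entry of $\Sigma_y$ is zero by construction, which is why no tilt can appear.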

Now Break It

Try this: Raise the correlation between the two features. Strongly correlated features break the independence assumption and skew the predicted class probabilities.

Control: Slider for the correlation ρ between features
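
The same experiment can be approximated offline. The sketch below is one possible setup, not the one the widget uses: the class means, sample size, seed, and function names are all assumptions. It fits one Gaussian per class twice, once with the off-diagonal covariance zeroed (the naive model) and once with the full covariance, and compares training error as ρ grows.

```python
import numpy as np

def sample_classes(rho, n=2000, seed=0):
    """Draw two Gaussian classes sharing the covariance [[1, rho], [rho, 1]].

    The class means (-1, 0) and (1, 0) are an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X0 = rng.multivariate_normal([-1.0, 0.0], cov, size=n)
    X1 = rng.multivariate_normal([1.0, 0.0], cov, size=n)
    return np.vstack([X0, X1]), np.repeat([0, 1], n)

def gaussian_log_density(X, mean, cov):
    """Log density of a multivariate Gaussian evaluated at each row of X."""
    diff = X - mean
    quad = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (X.shape[1] * np.log(2 * np.pi) + logdet + quad)

def error_rate(X, y, naive):
    """Fit one Gaussian per class and return the training error.

    naive=True zeroes the off-diagonal covariance (the Naive Bayes model);
    naive=False keeps the full covariance for comparison.
    """
    scores = []
    for c in (0, 1):
        Xc = X[y == c]
        cov = np.cov(Xc, rowvar=False)
        if naive:
            cov = np.diag(np.diag(cov))
        prior = np.mean(y == c)
        scores.append(np.log(prior) + gaussian_log_density(X, Xc.mean(axis=0), cov))
    pred = np.argmax(np.column_stack(scores), axis=1)
    return np.mean(pred != y)

for rho in (0.0, 0.5, 0.9):
    X, y = sample_classes(rho)
    print(f"rho={rho:.1f}  naive={error_rate(X, y, True):.3f}  "
          f"full={error_rate(X, y, False):.3f}")
```

At ρ = 0 the two models should score about the same, and the full-covariance model should pull ahead as ρ rises, since it can use the tilt that the naive model ignores. How the naive error itself moves depends on where the class means sit relative to the tilt, so a different layout may behave differently from this one.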
