K-Nearest Neighbors
K-nearest neighbors (KNN) is a supervised learning algorithm that classifies a point by a majority vote of its k closest labeled examples under a distance metric. It has no training phase: it simply stores the data and computes distances at prediction time.
Want to classify something? Just look at the closest examples you’ve already seen and go with the majority. That’s KNN — no training needed, just memory and a sense of distance.
Figure legend: Class A, Class B, and the ringed query point.
Drag the ringed query point around the space. Push k to 1 (memorizes noise) or to the maximum (always the majority class).
The idea in plain words
KNN doesn’t train — it memorizes the data. To classify a new point, it looks at the k closest labeled examples and takes a majority vote. Drag the query point around and watch its predicted class flip as its neighborhood changes.
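The vote described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `knn_predict`, the toy data, and the tie-breaking behavior are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest stored examples.

    `train` is a list of (point, label) pairs; points are equal-length tuples.
    """
    # "Memory": nothing is fitted, the stored examples are just sorted by
    # Euclidean distance to the query at prediction time.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k closest examples.
    votes = Counter(label for _, label in by_distance[:k])
    # On a tied vote, most_common returns the label seen first, i.e. the one
    # belonging to the closer neighbor. That tie rule is an arbitrary choice.
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two small clusters.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

print(knn_predict(train, (2.0, 2.0), k=3))  # A
print(knn_predict(train, (6.0, 7.0), k=3))  # B
```

Moving the query tuple from one cluster toward the other plays the same role as dragging the query point: the prediction flips once most of the k nearest neighbors belong to the other class.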
With k = 1 the boundary bends around every noisy point (overfitting); with k as large as the dataset it always returns the global majority (underfitting). It’s a useful contrast to a fitted model like linear regression.
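To make the two extremes concrete, here is a small sketch on made-up data containing one deliberately mislabeled point. The data set, the query points, and the helper `knn_predict` are hypothetical and exist only for this illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    # Majority vote among the k stored examples closest to the query.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical data: four "A" points, two genuine "B" points, and one
# mislabeled "B" point (noise) sitting inside the A cluster.
train = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"), ((2.0, 2.0), "A"),
    ((1.5, 1.5), "B"),  # noisy label
    ((6.0, 6.0), "B"), ((7.0, 7.0), "B"),
]

near_noise = (1.4, 1.4)   # inside the A cluster, right next to the noisy point
in_b_region = (6.5, 6.5)  # between the two genuine B points

for k in (1, 3, len(train)):
    print(k, knn_predict(train, near_noise, k), knn_predict(train, in_b_region, k))
# 1 B B
# 3 A B
# 7 A A
```

With k = 1 the noisy point wins inside the A cluster; with k equal to the data set size every query gets "A", the global majority, even in the middle of the B region. An intermediate k gets both queries right on this toy data.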
Now, the math
Neighbors are ranked by Euclidean distance:
$$d(p, q) = \sqrt{\sum_{j} (p_j - q_j)^2}$$
- $p, q$: two points being compared.
- $p_j$: the j-th feature (coordinate) of point p.
- $k$: how many nearest neighbors vote.
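As a rough sketch, the Euclidean distance can be computed by summing squared differences over each feature and taking the square root. The function name `euclidean_distance` and the name `q` for the second point are assumptions for this example; Python's built-in `math.dist` performs the same calculation.

```python
import math

def euclidean_distance(p, q):
    """Square root of the summed squared differences over each feature j."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of features")
    return math.sqrt(sum((p_j - q_j) ** 2 for p_j, q_j in zip(p, q)))

print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```

Because every feature contributes its raw squared difference, a feature measured on a much larger scale than the others will dominate the ranking of neighbors.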
k controls the bias–variance balance: small k gives a flexible, high-variance boundary that chases noise; large k averages over a wide neighborhood, raising bias until the model ignores local structure entirely.
Now Break It
Try this: k=1 memorizes every noisy point; k=N always predicts the majority class regardless of position.
Control: k slider (set to 1, then to maximum)