t-SNE
t-SNE: embed high-dimensional data in 2D while preserving local neighborhoods.
t-SNE squeezes high-dimensional data down to 2D for visualization by keeping nearby points nearby. It reveals clusters beautifully — but its knobs are easy to misread.
Figure: animated t-SNE embedding (legend: Cluster 0, Cluster 1, Cluster 2).
Watch separated clusters emerge from a random blob. Low perplexity shatters real clusters into fake sub-blobs, and note that inter-cluster distances in t-SNE aren't meaningful.
The idea in plain words
t-SNE squeezes high-dimensional data down to 2-D for visualization by keeping nearby points nearby. Watch separated clusters emerge from a random blob, frame by frame — the archetypal “impressive” ML animation, computed live.
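It reveals clusters beautifully, but its knobs mislead. The perplexity sets the effective neighborhood size (roughly how many neighbors each point pays attention to), and neither the cluster sizes nor the distances between clusters in the map are meaningful. Unlike PCA, t-SNE is nonlinear and non-deterministic: it starts from a random layout, so different runs can produce different maps.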
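To make "perplexity sets the effective neighborhood size" concrete, here is a minimal NumPy sketch of the usual calibration step: for one point, a binary search over the Gaussian's precision (beta = 1/(2 sigma^2)) until the row's perplexity, the exponential of its entropy, matches the target. The function names, tolerance, and random test data are illustrative assumptions, not this page's own code.

```python
import numpy as np

def row_affinities(sq_dists, target_perplexity, tol=1e-5, max_iter=100):
    """Gaussian p_{j|i} for one point i, with sigma_i tuned to a target perplexity.

    sq_dists holds squared distances from point i to every *other* point.
    We search over beta = 1 / (2 * sigma_i**2): larger beta = narrower Gaussian.
    """
    target_entropy = np.log(target_perplexity)  # perplexity = exp(entropy in nats)
    lo, hi, beta = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        # Subtracting the minimum distance avoids underflow; it cancels on normalizing.
        p = np.exp(-beta * (sq_dists - sq_dists.min()))
        p /= p.sum()
        entropy = -np.sum(p * np.log(np.maximum(p, 1e-300)))
        if abs(entropy - target_entropy) < tol:
            break
        if entropy > target_entropy:  # too many effective neighbours: narrow the Gaussian
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
        else:                         # too few effective neighbours: widen the Gaussian
            hi = beta
            beta = (beta + lo) / 2.0
    return p, beta


def effective_neighbours(p):
    """Perplexity of a distribution: exp(entropy in nats) = 2**(entropy in bits)."""
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))


# Usage: distances from point 0 to 199 random points in 10-D.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
sq = np.sum((X[1:] - X[0]) ** 2, axis=1)
for perp in (2.0, 30.0):
    p, beta = row_affinities(sq, perp)
    print(f"target {perp:>4}: effective neighbours {effective_neighbours(p):.2f}, beta {beta:.3f}")
```

At perplexity 2 the search settles on a larger beta (a narrower Gaussian) than at perplexity 30, so the point effectively attends to about two neighbors instead of about thirty. That tiny neighborhood is what lets low perplexity split a real cluster into fragments.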
Now, the math
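t-SNE minimizes the Kullback-Leibler (KL) divergence between neighbor distributions $P$ (high-D) and $Q$ (2-D):
$$C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \ne j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
- $p_{ij}$: high-dimensional neighbor probability (Gaussian, with its width set by perplexity).
- $q_{ij}$: low-dimensional neighbor probability (heavy-tailed Student-t).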
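For reference, these are the standard definitions from the original t-SNE formulation (van der Maaten and Hinton, 2008), with $x_i$ the high-dimensional points, $y_i$ their 2-D positions, $n$ the number of points, $p_{ij}$ and $q_{ij}$ the high- and low-dimensional neighbor probabilities, and $\sigma_i$ a per-point Gaussian width chosen so that the perplexity of row $i$ equals the user's setting. The page describes its own build as simplified, so its exact choices may differ.

```latex
p_{j\mid i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k \ne i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
\mathrm{Perp}(P_i) = 2^{H(P_i)}, \quad
H(P_i) = -\sum_{j} p_{j\mid i} \log_2 p_{j\mid i}

p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n},
\qquad
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \ne l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
```

The Student-t kernel in $q_{ij}$ decays like $1/d^2$ rather than exponentially, which is the heavy tail that gives moderately distant points room to spread out.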
The heavy-tailed Student-t in 2-D lets moderately distant points spread out, which avoids the "crowding problem": a 2-D map has too little room to keep every moderate high-dimensional distance. Gradient descent on the KL divergence pulls together points that are neighbors in high-D and pushes the rest apart. Too-low perplexity focuses on tiny neighborhoods and shatters real clusters into fake blobs. (This build is a simplified, teaching-scale t-SNE.)
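The gradient of the KL cost with respect to the 2-D position $y_i$ is $4 \sum_j (p_{ij} - q_{ij})(1 + \lVert y_i - y_j \rVert^2)^{-1}(y_i - y_j)$, where $p_{ij}$ and $q_{ij}$ are the high- and low-dimensional neighbor probabilities: pairs with $p_{ij} > q_{ij}$ attract and pairs with $p_{ij} < q_{ij}$ repel. Below is a minimal NumPy sketch of that loop, assuming plain gradient descent with no early exaggeration, momentum, or Barnes-Hut approximation. The function names, the learning rate of n/4, and the three-cluster test data are hypothetical choices for illustration, not this page's implementation.

```python
import numpy as np

def conditional_row(sq_dists, target_perplexity, tol=1e-5, max_iter=100):
    """p_{j|i} for one point: binary-search beta = 1/(2 sigma_i^2) to hit the perplexity."""
    target_entropy = np.log(target_perplexity)  # perplexity = exp(entropy in nats)
    lo, hi, beta = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        p = np.exp(-beta * (sq_dists - sq_dists.min()))
        p /= p.sum()
        entropy = -np.sum(p * np.log(np.maximum(p, 1e-300)))
        if abs(entropy - target_entropy) < tol:
            break
        if entropy > target_entropy:
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
        else:
            hi = beta
            beta = (beta + lo) / 2.0
    return p


def joint_p(X, perplexity):
    """Symmetrised affinities p_ij = (p_{j|i} + p_{i|j}) / (2n); entries sum to 1."""
    n = X.shape[0]
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i
        P[i, others] = conditional_row(D[i, others], perplexity)
    return (P + P.T) / (2.0 * n)


def tsne(X, perplexity=10.0, n_iter=300, learning_rate=None, seed=0):
    """Minimal t-SNE: plain gradient descent on KL(P || Q), 2-D output."""
    n = X.shape[0]
    P = joint_p(X, perplexity)
    mask = P > 0
    rng = np.random.default_rng(seed)
    Y = 1e-4 * rng.normal(size=(n, 2))  # random start; another seed gives another layout
    lr = n / 4.0 if learning_rate is None else learning_rate  # assumed heuristic
    kl_history = []
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]           # y_i - y_j, shape (n, n, 2)
        W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))  # Student-t kernel (1 + d^2)^-1
        np.fill_diagonal(W, 0.0)
        Q = W / W.sum()
        kl_history.append(float(np.sum(P[mask] * np.log(P[mask] / Q[mask]))))
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)
        grad = 4.0 * np.einsum("ij,ijk->ik", (P - Q) * W, diff)
        Y -= lr * grad
    return Y, kl_history


# Usage: three well-separated Gaussian clusters in 5-D, 20 points each.
rng = np.random.default_rng(1)
centers = 12.0 * np.eye(3, 5)
X = np.vstack([c + rng.normal(size=(20, 5)) for c in centers])
labels = np.repeat([0, 1, 2], 20)
Y, kl = tsne(X, perplexity=10.0)
print(f"KL: {kl[0]:.3f} -> {kl[-1]:.3f}")
```

Production implementations add early exaggeration, momentum with adaptive gains, and Barnes-Hut or FFT approximations of the repulsive term; they are left out here so that each line maps onto the gradient formula. Rerunning with a different `seed` is an easy way to see the non-determinism mentioned above.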
Now Break It
Try this: the wrong perplexity fractures real clusters or invents fake ones, and the distances between clusters are meaningless.
Control: Perplexity slider (set very low)