ML Visualization

t-SNE

Unsupervised & Dim. Reduction · Advanced · ~9 min

t-SNE: Embed high-dimensional data in 2D, preserving local neighborhoods.

t-SNE squeezes high-dimensional data down to 2D for visualization by keeping nearby points nearby. It reveals clusters beautifully — but its knobs are easy to misread.

[Interactive figure: animated t-SNE embedding of three groups (Cluster 0, Cluster 1, Cluster 2), shown at iteration 0 of 38.]

Watch separated clusters emerge from a random blob. Low perplexity shatters real clusters into fake sub-blobs — and note that inter-cluster distances in t-SNE aren’t meaningful.

The idea in plain words

The goal is a 2-D picture in which points that were close in the original high-dimensional space stay close. The animation starts from a random blob and lets separated clusters emerge frame by frame: the archetypal “impressive” ML animation, computed live.

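It reveals clusters beautifully, but its knobs mislead: perplexity sets the effective neighborhood size (roughly how many neighbors each point pays attention to), and neither the sizes of clusters nor the distances between them are meaningful. Unlike PCA, it is nonlinear, and because it starts from a random layout it can give a different picture on each run unless the random seed is fixed.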
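To make "effective neighborhood size" concrete, here is a minimal sketch of how perplexity is usually defined: 2 raised to the Shannon entropy (in bits) of one point's neighbor probabilities. The function name and the two toy distributions are illustrative assumptions, not taken from this page's build.

```python
import math

def perplexity(probs):
    """Return 2 ** H(probs), where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

# A point that spreads its attention evenly over 5 neighbors
# behaves as if it has 5 neighbors.
uniform_5 = [1 / 5] * 5

# A point that puts almost all its weight on one neighbor
# behaves as if it has barely more than 1.
peaked = [0.96, 0.01, 0.01, 0.01, 0.01]

print("uniform over 5 neighbors:", perplexity(uniform_5))
print("peaked on one neighbor:  ", perplexity(peaked))
```

Setting the perplexity knob asks every point to have about that many effective neighbors, which is why a very low value makes each point see only a tiny neighborhood.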

Now, the math

t-SNE minimizes the KL divergence between neighbor distributions P (high-D) and Q (2-D):

$$\mathrm{KL}(P\,\|\,Q) = \sum_{i \ne j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

  • $p_{ij}$: high-dimensional neighbor probability (Gaussian, set by perplexity).
  • $q_{ij}$: low-dimensional neighbor probability (heavy-tailed Student-t).
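The pieces of this objective can be sketched in plain Python. The sketch below follows the standard t-SNE formulation (a per-point Gaussian width found by bisection so that each row hits the target perplexity, then symmetrization), which may differ from the simplified build on this page. All function names and the six toy points are assumptions made for illustration.

```python
import math

def squared_distances(X):
    """Pairwise squared Euclidean distances as a list of rows."""
    return [[sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in X] for xi in X]

def perplexity(probs):
    """2 ** (Shannon entropy in bits); zero entries contribute nothing."""
    return 2 ** -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_row(d_row, i, beta):
    """Gaussian p_{j|i} with precision beta = 1 / (2 * sigma_i ** 2)."""
    d_min = min(d for j, d in enumerate(d_row) if j != i)  # shift for stability
    w = [0.0 if j == i else math.exp(-beta * (d - d_min)) for j, d in enumerate(d_row)]
    total = sum(w)
    return [v / total for v in w]

def calibrated_row(d_row, i, target, iterations=100):
    """Search for the beta whose row perplexity equals `target`."""
    lo, hi, beta = 0.0, math.inf, 1.0
    for _ in range(iterations):
        if perplexity(conditional_row(d_row, i, beta)) > target:
            lo = beta  # too spread out: narrow the Gaussian
            beta = beta * 2 if hi == math.inf else (lo + hi) / 2
        else:
            hi = beta  # too peaked: widen the Gaussian
            beta = (lo + hi) / 2
    return conditional_row(d_row, i, beta)

def joint_p(X, target_perplexity):
    """Symmetric high-dimensional probabilities p_ij that sum to 1."""
    D = squared_distances(X)
    n = len(X)
    rows = [calibrated_row(D[i], i, target_perplexity) for i in range(n)]
    return [[(rows[i][j] + rows[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

def joint_q(Y):
    """Student-t (one degree of freedom) probabilities q_ij that sum to 1."""
    D = squared_distances(Y)
    n = len(Y)
    W = [[0.0 if i == j else 1.0 / (1.0 + D[i][j]) for j in range(n)] for i in range(n)]
    Z = sum(map(sum, W))
    return [[w / Z for w in row] for row in W]

def kl_divergence(P, Q):
    """Sum over i != j of p_ij * log(p_ij / q_ij); p_ij == 0 terms are skipped."""
    return sum(p * math.log(p / q)
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row) if p > 0)

# Toy data (assumed): two tight groups of three points in 3-D.
X = [(0.0, 0.0, 0.0), (0.5, 0.1, 0.0), (0.2, 0.6, 0.1),
     (5.0, 5.0, 5.0), (5.4, 5.1, 4.9), (5.1, 5.6, 5.2)]
P = joint_p(X, target_perplexity=2.0)
Y = [x[:2] for x in X]  # a candidate 2-D layout: just drop the third coordinate
print("KL(P || Q) for this layout:", kl_divergence(P, joint_q(Y)))
```

The bisection is on the precision `beta` rather than on sigma directly because perplexity falls steadily as `beta` grows, which makes the search simple.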

The heavy-tailed Student-t in 2-D lets moderately distant points spread out, avoiding crowding. Gradient descent on the KL pulls together points that are neighbors in high-D and pushes apart the rest. Too-low perplexity focuses on tiny neighborhoods and shatters real clusters into fake blobs. (This build is a simplified, teaching-scale t-SNE.)
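The pull and push described here fall out of the KL gradient, which in the standard formulation is 4 Σ_j (p_ij − q_ij)(y_i − y_j) / (1 + ‖y_i − y_j‖²) for point i. Below is a minimal sketch of plain gradient descent on it, with no momentum or early exaggeration. The hand-written 4-point matrix `P`, the function names, the learning rate, and the step count are all assumptions chosen for illustration, not this page's implementation.

```python
import math
import random

def student_t(Y):
    """Return (Q, W): normalized q_ij and raw weights w_ij = 1 / (1 + dist^2)."""
    n = len(Y)
    W = [[0.0 if i == j else
          1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
          for j in range(n)] for i in range(n)]
    Z = sum(map(sum, W))
    return [[w / Z for w in row] for row in W], W

def kl(P, Q):
    """Sum over i != j of p_ij * log(p_ij / q_ij)."""
    return sum(p * math.log(p / q)
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row) if p > 0)

def gradient(P, Y):
    """Gradient of KL(P || Q) with respect to every 2-D point in Y."""
    Q, W = student_t(Y)
    n = len(Y)
    grad = []
    for i in range(n):
        g = [0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            # p > q: descent pulls i toward j; p < q: descent pushes them apart.
            coeff = 4.0 * (P[i][j] - Q[i][j]) * W[i][j]
            g[0] += coeff * (Y[i][0] - Y[j][0])
            g[1] += coeff * (Y[i][1] - Y[j][1])
        grad.append(g)
    return grad

def descend(P, Y, learning_rate=1.0, steps=500):
    """Plain gradient descent: move each point against its gradient."""
    for _ in range(steps):
        G = gradient(P, Y)
        Y = [[y[0] - learning_rate * g[0], y[1] - learning_rate * g[1]]
             for y, g in zip(Y, G)]
    return Y

# Assumed toy P: points 0 and 1 are neighbors, points 2 and 3 are neighbors.
near, far = 0.2, 0.025
P = [[0.0, near, far, far],
     [near, 0.0, far, far],
     [far, far, 0.0, near],
     [far, far, near, 0.0]]  # symmetric, entries sum to 1

rng = random.Random(0)
Y0 = [[rng.gauss(0, 0.01), rng.gauss(0, 0.01)] for _ in range(4)]  # random blob
Y1 = descend(P, Y0)

print("KL before:", kl(P, student_t(Y0)[0]))
print("KL after: ", kl(P, student_t(Y1)[0]))
```

Starting from the tiny random blob, the two neighbor pairs should draw together while the pairs drift away from each other, which is the same behavior the animation shows at larger scale.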

Now Break It

Try this: Wrong perplexity fractures real clusters or invents fake ones; distances between clusters are meaningless.

Control: Perplexity slider (set very low)
