Principal Component Analysis

Unsupervised & Dim. Reduction · Intermediate · ~8 min

Principal Component Analysis: project data onto the directions of greatest variance.

PCA finds the directions in which your data varies most and projects onto them, compressing many correlated features into a few meaningful axes while keeping as much of the variance as possible.

Figure (interactive): the data, a draggable projection axis, the first PC, and the residuals, alongside an explained-variance meter for PC1 and PC2 and a dataset selector. Drag to rotate the projection axis: the variance meter peaks, and the axis turns amber, exactly at the first principal component.

The idea in plain words

PCA finds the directions along which your data varies the most and projects onto them. Rotate the projection axis by hand and watch the “variance captured” meter: it peaks exactly when the axis lines up with the first principal component, so you discover PCA instead of being told about it.
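The axis-rotation experiment can be mimicked in a few lines of Python: sweep a unit axis through angles in [0, π), measure the variance of the centered data projected onto it, and keep the angle where that variance peaks. This is a minimal sketch; the ten `points` are a hypothetical toy dataset (not the one in the visualization), and the brute-force sweep stands in for dragging the axis.

```python
import math

# Hypothetical toy dataset (not from the visualization), elongated roughly along y = x.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]


def center(pts):
    """Subtract the mean so that variance is measured around the centroid."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    return [(x - mx, y - my) for x, y in pts]


def projected_variance(centered, theta):
    """Variance of centered points projected onto the unit axis at angle theta."""
    ux, uy = math.cos(theta), math.sin(theta)
    return sum((x * ux + y * uy) ** 2 for x, y in centered) / len(centered)


def best_axis_by_sweep(pts, steps=3600):
    """'Rotate the axis by hand': try evenly spaced angles in [0, pi), keep the best.

    Half a turn is enough, because theta and theta + pi describe the same axis.
    """
    centered = center(pts)
    angles = [math.pi * k / steps for k in range(steps)]
    return max(angles, key=lambda t: projected_variance(centered, t))


theta_best = best_axis_by_sweep(points)
centered = center(points)
print(f"best axis: {math.degrees(theta_best):.1f} degrees, "
      f"variance captured: {projected_variance(centered, theta_best):.4f}")
```

The sweep is only as precise as its step (here 0.05 degrees); the eigendecomposition of the covariance matrix gives the same axis exactly, with no search.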

It’s a linear method, though. Hand it a curved manifold such as a Swiss roll and it can only flatten the data by projecting it; it can’t unroll the sheet. That limitation is what motivates nonlinear methods such as t-SNE and UMAP.

Now, the math

The principal components are the eigenvectors of the covariance matrix:

$$\Sigma\, v_k = \lambda_k\, v_k,\qquad \Sigma = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^\top$$

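$v_k$: the k-th principal component (a unit-length direction), ordered so that $\lambda_1 \ge \lambda_2 \ge \dots$.
$\lambda_k$: its eigenvalue, the variance of the data projected onto $v_k$.
$\Sigma$: the covariance matrix of the data points $x_1, \dots, x_n$, whose mean is $\mu$.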
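For 2-D data the eigen-equation has a closed-form solution, which makes the definitions easy to check numerically. The sketch below builds $\Sigma$ with the same $1/n$ convention, computes both eigenpairs of the symmetric 2×2 matrix, verifies $\Sigma v_k = \lambda_k v_k$, and reports the explained variance ratio $\lambda_1/(\lambda_1+\lambda_2)$. The helper names and the toy `points` are hypothetical; with more than two features one would normally call a library eigensolver instead of a hand-written formula.

```python
import math

# Hypothetical toy dataset (not from the visualization).
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]


def covariance_2d(pts):
    """Sigma = (1/n) * sum_i (x_i - mu)(x_i - mu)^T, returned as (a, b, c) for [[a, b], [b, c]]."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    a = sum((x - mx) ** 2 for x, _ in pts) / n
    b = sum((x - mx) * (y - my) for x, y in pts) / n
    c = sum((y - my) ** 2 for _, y in pts) / n
    return a, b, c


def eig_sym_2x2(a, b, c):
    """Eigenpairs of the symmetric matrix [[a, b], [b, c]], largest eigenvalue first."""
    mid = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    theta = 0.5 * math.atan2(2 * b, a - c)      # angle of the top eigenvector
    v1 = (math.cos(theta), math.sin(theta))     # unit vector along PC1
    v2 = (-math.sin(theta), math.cos(theta))    # unit vector along PC2, orthogonal to v1
    return (mid + radius, v1), (mid - radius, v2)


def mat_vec(sigma, v):
    """Multiply [[a, b], [b, c]] by the vector v."""
    a, b, c = sigma
    return (a * v[0] + b * v[1], b * v[0] + c * v[1])


sigma = covariance_2d(points)
(lam1, v1), (lam2, v2) = eig_sym_2x2(*sigma)

# Check Sigma v_k = lambda_k v_k for both components.
for lam, v in ((lam1, v1), (lam2, v2)):
    sv = mat_vec(sigma, v)
    err = max(abs(sv[0] - lam * v[0]), abs(sv[1] - lam * v[1]))
    print(f"lambda = {lam:.4f}, max residual = {err:.2e}")

print(f"explained variance ratio of PC1: {lam1 / (lam1 + lam2):.1%}")
```

The eigenvalues always sum to the trace $a + c$, the total variance, which is why the ratio above is the share of variance PC1 captures.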


The direction of maximum variance is the top eigenvector $v_1$ of the covariance matrix, and its explained variance ratio is its eigenvalue over the total variance, $\lambda_1 / \sum_j \lambda_j$. Projecting onto the first few components keeps the most variance (equivalently, gives the smallest squared reconstruction error) of any linear projection onto that many dimensions, but only along straight axes, so curved structure is lost.
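Why the top eigenvector maximizes variance follows from a short Lagrange-multiplier argument, sketched here with the symbols defined above (one standard route to the result):

```latex
\begin{aligned}
&\operatorname{Var}\!\big(v^\top x\big) = \frac{1}{n}\sum_{i=1}^{n} \big(v^\top (x_i - \mu)\big)^2 = v^\top \Sigma\, v \\
&\max_{v}\; v^\top \Sigma\, v \quad \text{subject to} \quad v^\top v = 1 \\
&\mathcal{L}(v, \lambda) = v^\top \Sigma\, v - \lambda\,(v^\top v - 1) \\
&\nabla_v \mathcal{L} = 2\Sigma v - 2\lambda v = 0 \;\Longrightarrow\; \Sigma v = \lambda v \\
&v^\top \Sigma\, v = \lambda\, v^\top v = \lambda \;\Longrightarrow\; \text{the maximum is } \lambda_1, \text{ attained at } v = v_1 \\
&\text{explained variance ratio of } v_k = \frac{\lambda_k}{\sum_j \lambda_j} = \frac{\lambda_k}{\operatorname{tr}\Sigma}
\end{aligned}
```

Each later component repeats the argument with the extra constraint that it be orthogonal to the earlier ones, which yields $v_2, v_3, \dots$ in order of decreasing $\lambda_k$.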

Now Break It

Try this: drop to too few components and the structure is lost; the reconstruction becomes a blur.

Control: Components-to-keep slider (set to 1)
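Setting the slider to 1 can be reproduced numerically: encode each point by its single coordinate along PC1, decode it back to 2-D, and measure what was lost. With the $1/n$ covariance convention, the mean squared residual equals the dropped eigenvalue $\lambda_2$, so the variance you discard is exactly the blur you see. A minimal Python sketch; the dataset and function names are hypothetical.

```python
import math

# Hypothetical toy dataset (not from the visualization).
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]


def fit_pca_2d(pts):
    """Return the mean, the unit PC1 direction, and both eigenvalues (1/n covariance)."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    a = sum((x - mx) ** 2 for x, _ in pts) / n
    b = sum((x - mx) * (y - my) for x, y in pts) / n
    c = sum((y - my) ** 2 for _, y in pts) / n
    theta = 0.5 * math.atan2(2 * b, a - c)       # angle of the top eigenvector
    radius = math.hypot((a - c) / 2, b)
    lam1, lam2 = (a + c) / 2 + radius, (a + c) / 2 - radius
    return (mx, my), (math.cos(theta), math.sin(theta)), lam1, lam2


def reconstruct_with_one_component(pts):
    """Keep only each point's PC1 coordinate, then map it back to 2-D."""
    (mx, my), (ux, uy), _, _ = fit_pca_2d(pts)
    recon = []
    for x, y in pts:
        z = (x - mx) * ux + (y - my) * uy         # 1-D code: position along PC1
        recon.append((mx + z * ux, my + z * uy))   # decoded point lies on the PC1 line
    return recon


recon = reconstruct_with_one_component(points)
mse = sum((x - rx) ** 2 + (y - ry) ** 2
          for (x, y), (rx, ry) in zip(points, recon)) / len(points)
_, _, lam1, lam2 = fit_pca_2d(points)
print(f"mean squared residual = {mse:.4f}, dropped eigenvalue lambda_2 = {lam2:.4f}")
```

Every reconstructed point falls on a single line through the mean, so all spread perpendicular to PC1 is gone. With more features, each extra component you keep adds back its eigenvalue's worth of variance.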


Last updated July 3, 2026.