
Softmax & Multiclass

Classification · Intermediate · ~6 min

Softmax converts a vector of raw class scores (logits) into probabilities that sum to 1 by exponentiating and normalizing. A temperature parameter sharpens it toward a hard argmax or flattens it toward uniform.

Three or more classes carve the space into colored regions. A bar panel shows raw scores becoming probabilities — and a temperature dial slides softmax from a confident winner-take-all to a flat shrug.

Figure: three classes (Class 0, Class 1, Class 2) divide the plane into colored regions; a bar panel shows the softmax probabilities at the query point.
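A minimal sketch of how the colored regions can arise, assuming (hypothetically) that each class score is a linear function of a 2-D point. The weights `W` and biases `b` are made up for illustration and are not the visualization's model. Each point gets the class with the highest softmax probability, which for any T > 0 is also the class with the highest raw score, so the temperature dial changes confidence but not where the regions meet.

```python
import math

# Hypothetical linear scorer: one (weight vector, bias) per class.
# These numbers are illustrative only.
W = [(1.0, 0.0), (-0.5, 0.9), (-0.5, -0.9)]
b = [0.0, 0.0, 0.0]


def logits(x, y):
    """Raw class scores z_k = w_k . (x, y) + b_k."""
    return [wx * x + wy * y + bk for (wx, wy), bk in zip(W, b)]


def softmax(z, T=1.0):
    """Temperature softmax, shifted by max(z) for numerical stability."""
    m = max(z)
    exps = [math.exp((zi - m) / T) for zi in z]
    s = sum(exps)
    return [e / s for e in exps]


def region(x, y, T=1.0):
    """Class whose region contains (x, y): the most probable class."""
    p = softmax(logits(x, y), T)
    return max(range(len(p)), key=p.__getitem__)


# Coarse text rendering of the three regions on a small grid.
for y in (1.0, 0.5, 0.0, -0.5, -1.0):
    print("".join(str(region(x, y)) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)))
```

Changing `T` in `region` leaves the printed map unchanged, because dividing every score by the same positive T never reorders them.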

The idea in plain words

Softmax turns a handful of raw class scores into probabilities that sum to one by exponentiating and normalizing. It’s the multiclass generalization of the sigmoid, and it is the standard output layer of most neural-network classifiers.
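A minimal sketch of the exponentiate-and-normalize step in plain Python (the function names are our own). Subtracting the largest score first does not change the result, because the common factor cancels in the ratio, but it keeps `math.exp` from overflowing on large logits. The last lines check the sigmoid claim: with two classes, softmax gives class 1 the probability sigmoid(z1 - z0).

```python
import math


def softmax(z):
    """Exponentiate and normalize a list of logits into probabilities."""
    m = max(z)  # shift by the max: same result, no overflow
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


probs = softmax([2.0, 1.0, 0.0])
print([round(p, 3) for p in probs])  # → [0.665, 0.245, 0.09]

# Without the shift, math.exp(1000.0) would raise OverflowError.
print(softmax([1000.0, 999.0]))

# Two classes: softmax reduces to a sigmoid of the score difference.
z0, z1 = 0.3, 1.7
print(softmax([z0, z1])[1], sigmoid(z1 - z0))
```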

A temperature dial controls how peaked it is. Near zero it becomes a hard argmax: one class takes everything, so when two scores are nearly tied, a tiny change can flip the winner. Turn it up and the probabilities flatten toward a uniform shrug.
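A small sketch of the temperature dial in plain Python; `softmax_t` and the logits are illustrative choices, not the visualization's code. For fixed logits, lowering T pushes the top probability toward 1, and raising T pulls all three probabilities toward 1/3.

```python
import math


def softmax_t(z, T):
    """Softmax of logits z at temperature T > 0 (max-shifted for stability)."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    m = max(z)
    exps = [math.exp((zi - m) / T) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
for T in (0.1, 0.5, 1.0, 5.0, 100.0):
    p = softmax_t(logits, T)
    print(f"T={T:>5}: " + "  ".join(f"{pi:.3f}" for pi in p))
```

The top probability falls steadily as T grows, from nearly 1 at T = 0.1 to roughly 1/3 at T = 100.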

Now, the math

Softmax with temperature T:

$$\text{softmax}(z_i) = \frac{e^{z_i/T}}{\sum_j e^{z_j/T}}$$
$z_i$: the raw score (logit) for class i; the sum in the denominator runs over all classes j.
$T > 0$: temperature. Low values sharpen the output toward argmax; high values flatten it toward uniform.
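A worked instance of the formula, using illustrative logits $z = (2, 1, 0)$ chosen only to make the arithmetic easy. At $T = 1$:

$$\frac{(e^{2},\, e^{1},\, e^{0})}{e^{2} + e^{1} + e^{0}} \approx \frac{(7.389,\, 2.718,\, 1)}{11.107} \approx (0.665,\, 0.245,\, 0.090)$$

Halving the temperature to $T = 0.5$ doubles every scaled logit to $(4, 2, 0)$ and sharpens the output:

$$\frac{(e^{4},\, e^{2},\, e^{0})}{e^{4} + e^{2} + e^{0}} \approx \frac{(54.598,\, 7.389,\, 1)}{62.987} \approx (0.867,\, 0.117,\, 0.016)$$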

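Dividing logits by T before exponentiating rescales the gaps between them, and the gaps are all that matter: $p_i / p_j = e^{(z_i - z_j)/T}$, so adding the same constant to every logit changes nothing. As T → 0 the largest logit dominates completely (probability 1, assuming the maximum is unique; tied maxima share it equally). As T → ∞ every scaled gap shrinks to 0, and with K classes each probability approaches 1/K. The same knob is used to calibrate a model's confidence (temperature scaling) and to soften the teacher's targets in knowledge distillation.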
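The rescaling argument can be checked numerically. The sketch below (helper name and logits are our own) confirms that the ratio of two probabilities equals e^((z_i - z_j)/T), that shifting every logit by a constant changes nothing, and how the limits behave, including the tie case at small T.

```python
import math


def softmax_t(z, T):
    """Temperature softmax, max-shifted so small T does not overflow."""
    m = max(z)
    exps = [math.exp((zi - m) / T) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


z = [2.0, 1.0, 0.0]
T = 0.7
p = softmax_t(z, T)

# Only the gaps matter: p_i / p_j == exp((z_i - z_j) / T).
print(p[0] / p[1], math.exp((z[0] - z[1]) / T))

# Shifting every logit by the same constant leaves the output unchanged.
print(softmax_t([zi + 5.0 for zi in z], T), p)

# T -> 0: a unique max takes almost everything; tied maxima split it.
print(softmax_t(z, 0.01))
print(softmax_t([2.0, 2.0, 0.0], 0.01))

# T -> infinity: every probability approaches 1 / K (here K = 3).
print(softmax_t(z, 1e6))
```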

Now Break It

Try this: set the temperature near zero. The classifier becomes brittle: when two classes are nearly tied, a tiny change flips the winner and its probability jumps.

Control: Temperature slider (set very low)
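A sketch of the experiment, assuming two classes whose logits are nearly tied (the numbers are made up). A nudge of 0.01 per logit swaps the leader at every temperature, since argmax ignores T, but the probabilities react very differently: at T = 0.01 the probability of class 0 swings from about 0.73 to about 0.27, while at T = 1 it drops only from about 0.424 to about 0.420.

```python
import math


def softmax_t(z, T):
    """Temperature softmax, max-shifted for numerical stability."""
    m = max(z)
    exps = [math.exp((zi - m) / T) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


before = [1.00, 0.99, 0.0]  # class 0 narrowly ahead of class 1
after = [0.99, 1.00, 0.0]   # a tiny nudge puts class 1 ahead

for T in (0.01, 1.0):
    p_before = softmax_t(before, T)
    p_after = softmax_t(after, T)
    print(f"T={T}: P(class 0) {p_before[0]:.3f} -> {p_after[0]:.3f}")
```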
