Softmax & Multiclass
Softmax converts a vector of raw class scores (logits) into probabilities that sum to 1 by exponentiating and normalizing. A temperature parameter sharpens it toward a hard argmax or flattens it toward uniform.
Figure (interactive): three or more classes (legend: Class 0, Class 1, Class 2) carve the input space into colored regions. A bar panel shows raw scores becoming probabilities, and a temperature dial slides softmax from a confident winner-take-all to a flat shrug.
The idea in plain words
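Softmax turns a handful of raw class scores into probabilities that sum to one by exponentiating each score and dividing by the total. It's the multiclass generalization of the sigmoid: with two classes it reduces to a sigmoid of the difference between their scores. It is the standard output layer of neural network classifiers that pick one class out of several.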
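A minimal sketch of exponentiate-and-normalize in plain Python. The names `softmax` and `sigmoid` are illustrative helpers, not a library API. Subtracting the largest score first is a standard numerical-stability trick that the text does not mention. The last check tests the sigmoid claim: with two classes whose scores are x and 0, the first class gets exactly sigmoid(x).

```python
import math

def softmax(logits):
    """Map raw class scores to probabilities that sum to 1."""
    # Subtracting the max logit cancels in the ratio, so the result is unchanged,
    # but it keeps math.exp from overflowing on large scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

probs = softmax([2.0, 1.0, 0.0])
print([round(p, 3) for p in probs])  # [0.665, 0.245, 0.09]
print(round(sum(probs), 10))         # 1.0

# Two classes with logits (x, 0): softmax of the first class equals sigmoid(x).
x = 1.5
print(math.isclose(softmax([x, 0.0])[0], sigmoid(x)))  # True
```

Without the max shift, a score like 1000 would make `math.exp` raise an `OverflowError`. With the shift, `softmax([1000.0, 1000.0])` simply returns `[0.5, 0.5]`.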
A temperature dial controls how peaked it is. Near zero it approaches a hard argmax, where one class takes nearly everything. A tiny nudge that reorders the top two scores then swings almost all of the probability from one class to the other. Turn it up and the probabilities flatten toward a uniform shrug.
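The dial can be sketched by dividing the logits by a temperature before the usual softmax. Here `softmax_t` is an illustrative name. The logits (2, 1, 0) and the temperatures 0.1, 1 and 10 are arbitrary choices for the demonstration.

```python
import math

def softmax_t(logits, temperature):
    """Softmax of logits divided by a positive temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # stability shift; cancels in the ratio
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
for t in (0.1, 1.0, 10.0):
    probs = softmax_t(logits, t)
    print(t, [round(p, 3) for p in probs])

# 0.1 [1.0, 0.0, 0.0]          near-argmax: the top class takes (almost) everything
# 1.0 [0.665, 0.245, 0.09]     ordinary softmax
# 10.0 [0.367, 0.332, 0.301]   close to uniform (1/3 each)
```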
Now, the math
Softmax with temperature T:
$$p_i = \frac{e^{z_i/T}}{\sum_{j=1}^{K} e^{z_j/T}}$$
- $z_i$: the raw score (logit) for class $i$, out of $K$ classes.
- $T > 0$: temperature. Low sharpens toward argmax, high flattens toward uniform.
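Here is a worked example with three arbitrary logits, $z = (2, 1, 0)$. It writes $z_i$ for the logit of class $i$ and uses $K = 3$ classes, with values rounded to three decimals. Halving the temperature pushes the top class from about 0.665 to 0.867. Doubling it pulls the distribution toward uniform.

```latex
p_i = \frac{e^{z_i/T}}{\sum_{j=1}^{K} e^{z_j/T}}, \qquad z = (2,\ 1,\ 0),\ K = 3

T = 1:\quad p = \frac{(e^{2},\ e^{1},\ e^{0})}{e^{2} + e^{1} + e^{0}}
  = \frac{(7.389,\ 2.718,\ 1)}{11.107} \approx (0.665,\ 0.245,\ 0.090)

T = 0.5:\quad p = \frac{(e^{4},\ e^{2},\ e^{0})}{e^{4} + e^{2} + e^{0}}
  = \frac{(54.598,\ 7.389,\ 1)}{62.987} \approx (0.867,\ 0.117,\ 0.016)

T = 2:\quad p = \frac{(e^{1},\ e^{0.5},\ e^{0})}{e^{1} + e^{0.5} + e^{0}}
  = \frac{(2.718,\ 1.649,\ 1)}{5.367} \approx (0.506,\ 0.307,\ 0.186)
```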
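Dividing the logits by T before exponentiating rescales the gaps between them. The ratio of any two class probabilities depends only on the difference of their logits divided by T, because the normalizing sum cancels. As T → 0⁺ the largest logit dominates completely: its probability goes to 1, or is split equally among tied maxima. As T → ∞ all scaled logits approach 0, and every class gets the same probability, 1 divided by the number of classes. The same knob is used to calibrate a trained model's confidence (temperature scaling) and to soften the teacher's targets in knowledge distillation.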
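Both limits follow from the ratio of two probabilities, in which the normalizing sum cancels. The derivation writes $z_i$ for the logit of class $i$ and $K$ for the number of classes. It assumes $T > 0$ and a single largest logit, labeled $z_1$ for convenience.

```latex
\frac{p_i}{p_j} = \frac{e^{z_i/T}}{e^{z_j/T}} = e^{(z_i - z_j)/T}

T \to 0^{+}:\quad p_1 = \frac{1}{1 + \sum_{j \neq 1} e^{-(z_1 - z_j)/T}}
  \to \frac{1}{1 + 0} = 1, \qquad p_j \to 0 \ (j \neq 1)

T \to \infty:\quad \frac{p_i}{p_j} = e^{(z_i - z_j)/T} \to e^{0} = 1
  \;\Rightarrow\; p_i \to \frac{1}{K} \ \text{for every } i
```

In the first limit each exponent $-(z_1 - z_j)/T$ runs to $-\infty$ because $z_1 - z_j > 0$. In the second, every ratio tends to 1, and probabilities that sum to 1 must then share it equally.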
Now Break It
Try this: Temperature near zero makes the classifier brittle. A tiny change in the scores can swing nearly all the probability from one class to another.
Control: Temperature slider (set very low)
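Here is a small sketch of the experiment with arbitrary logits: two classes nearly tied, and a nudge of 0.02 to one of them. At T = 1 the probabilities barely move. At T = 0.001 the winner flips and class 0's probability collapses from about 1 to about 0. `softmax_t` is an illustrative helper, not a library function.

```python
import math

def softmax_t(logits, temperature):
    """Softmax of logits divided by a positive temperature (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Classes 0 and 1 are nearly tied; the nudge moves class 1's logit by 0.02.
before = [1.00, 0.99, 0.0]
after = [1.00, 1.01, 0.0]

for t in (1.0, 0.001):
    p_before = softmax_t(before, t)
    p_after = softmax_t(after, t)
    print(f"T={t}: P(class 0) {p_before[0]:.2f} -> {p_after[0]:.2f}")

# T=1.0: P(class 0) 0.42 -> 0.42
# T=0.001: P(class 0) 1.00 -> 0.00
```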