
Derivative of Softmax

#219 · Calculus · Medium

Solve on deep-ml.com

Problem

Compute the derivative of the softmax function with respect to its input logits. Given a vector of n logits z, return the n x n Jacobian matrix of the softmax output S, where entry [i][j] is dS_i/dz_j.
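As a concrete target, here is a small hand-worked case (the input is our own choice, not part of the original problem). With two equal logits, both probabilities are 1/2, so each diagonal entry is 1/2 * (1 - 1/2) = 1/4 and each off-diagonal entry is -1/2 * 1/2 = -1/4:

```latex
z = (0, 0) \;\Rightarrow\; S = \left(\tfrac{1}{2}, \tfrac{1}{2}\right), \qquad
J = \begin{pmatrix} S_1(1 - S_1) & -S_1 S_2 \\ -S_2 S_1 & S_2(1 - S_2) \end{pmatrix}
  = \begin{pmatrix} \tfrac{1}{4} & -\tfrac{1}{4} \\ -\tfrac{1}{4} & \tfrac{1}{4} \end{pmatrix}
```

So for logits [0.0, 0.0] the expected result is [[0.25, -0.25], [-0.25, 0.25]].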

Solution

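import math

def softmax_derivative(logits: list[float]) -> list[list[float]]:
    # Compute softmax. Subtracting the max logit keeps math.exp from
    # overflowing and leaves the probabilities unchanged.
    max_l = max(logits)
    exps = [math.exp(x - max_l) for x in logits]
    total = sum(exps)
    s = [e / total for e in exps]

    # Build the n x n Jacobian, where jacobian[i][j] = dS_i/dz_j.
    n = len(s)
    jacobian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: S_i * (1 - S_i)
                jacobian[i][j] = s[i] * (1 - s[i])
            else:
                # Off-diagonal: -S_i * S_j
                jacobian[i][j] = -s[i] * s[j]
    return jacobian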
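One way to gain confidence in an analytic Jacobian is to compare it against a central finite-difference estimate. The sketch below is self-contained, so it restates softmax with the same max-subtraction trick; the helper names softmax, analytic_jacobian and numeric_jacobian, the test logits, and the step size h = 1e-6 are our own illustrative choices, not part of the original solution.

```python
import math

def softmax(z: list[float]) -> list[float]:
    # Max-subtraction for numerical stability, as in the solution above.
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def analytic_jacobian(z: list[float]) -> list[list[float]]:
    # J[i][j] = S_i * (delta_ij - S_j)
    s = softmax(z)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def numeric_jacobian(z: list[float], h: float = 1e-6) -> list[list[float]]:
    # Column j holds (S(z + h*e_j) - S(z - h*e_j)) / (2h).
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        zp = list(z)
        zp[j] += h
        zm = list(z)
        zm[j] -= h
        sp, sm = softmax(zp), softmax(zm)
        for i in range(n):
            jac[i][j] = (sp[i] - sm[i]) / (2 * h)
    return jac

z = [1.0, 2.0, 0.5]
a = analytic_jacobian(z)
d = numeric_jacobian(z)
max_err = max(abs(a[i][j] - d[i][j]) for i in range(3) for j in range(3))
print(max_err < 1e-6)  # True
```

The same harness can check softmax_derivative directly by substituting it for analytic_jacobian.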

Explanation

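  1. First compute the softmax probabilities S from the logits z, subtracting the largest logit before exponentiating. Softmax is unchanged when the same constant is added to every logit, so this avoids overflow in math.exp without changing the result.
  2. Differentiating S_i = exp(z_i) / sum_k exp(z_k) with the quotient rule gives dS_i/dz_j = S_i * (delta_ij - S_j), where delta_ij is 1 if i == j and 0 otherwise. This splits into two cases:
     - Diagonal entries (i = j): dS_i/dz_i = S_i * (1 - S_i)
     - Off-diagonal entries (i != j): dS_i/dz_j = -S_i * S_j
  3. In matrix form this is J = diag(S) - S S^T, where S S^T is the outer product of the probability vector with itself.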
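The matrix form diag(S) - S S^T can be written entrywise with a Kronecker delta. This plain-Python sketch (the helper names softmax and softmax_jacobian and the sample logits are our own, hypothetical) builds J that way and checks two properties implied by the formula: J is symmetric, and every row sums to zero because the probabilities always sum to 1.

```python
import math

def softmax(z: list[float]) -> list[float]:
    m = max(z)  # shift by the max logit for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z: list[float]) -> list[list[float]]:
    """J = diag(S) - S S^T, written entrywise as S_i * (delta_ij - S_j)."""
    s = softmax(z)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

J = softmax_jacobian([2.0, -1.0, 0.3, 0.0])
n = len(J)
# Symmetric: off the diagonal, J[i][j] and J[j][i] both equal -S_i * S_j.
symmetric = all(abs(J[i][j] - J[j][i]) < 1e-12 for i in range(n) for j in range(n))
# Row i sums to S_i * (1 - sum_j S_j), which is zero since the S_j sum to 1.
row_sums = [sum(row) for row in J]
print(symmetric, max(abs(r) for r in row_sums) < 1e-12)  # True True
```

If NumPy is available, the same matrix for a 1-D array s is np.diag(s) - np.outer(s, s).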

Complexity

  • Time: O(n^2) where n is the number of classes
  • Space: O(n^2) for the Jacobian matrix
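The O(n^2) cost comes from materializing the full matrix. Assuming the Jacobian is only needed to backpropagate an upstream gradient g, as during training, the product can be formed directly: since J is symmetric, (J^T g)_j = sum_i S_i * (delta_ij - S_j) * g_i = S_j * (g_j - g . S), which takes O(n) time and space. A minimal sketch, with hypothetical function names, comparing it against the explicit matrix:

```python
import math

def softmax(z: list[float]) -> list[float]:
    m = max(z)  # shift by the max logit for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_vjp(z: list[float], g: list[float]) -> list[float]:
    """Return J^T g without building the n x n Jacobian: S_j * (g_j - g.S)."""
    s = softmax(z)
    dot = sum(gi * si for gi, si in zip(g, s))  # single O(n) pass
    return [sj * (gj - dot) for sj, gj in zip(s, g)]

def full_jacobian(z: list[float]) -> list[list[float]]:
    # O(n^2) reference: J[i][j] = S_i * (delta_ij - S_j)
    s = softmax(z)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

z = [0.5, -1.2, 3.0]
g = [1.0, 0.0, -2.0]
fast = softmax_vjp(z, g)
J = full_jacobian(z)
slow = [sum(J[i][j] * g[i] for i in range(3)) for j in range(3)]
print(max(abs(a - b) for a, b in zip(fast, slow)) < 1e-12)  # True
```

The full Jacobian is still what this problem asks for; the shortcut only applies when its product with a vector is all that is needed.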