Derivative of Cross-Entropy Loss w.r.t. Logits

#220 · Calculus · Medium

Problem

Compute the derivative of the cross-entropy loss with respect to the logits when the output layer is a softmax. Given a list of logits and a one-hot encoded true label of the same length, return the gradient as a list with one entry per class.

Solution

import math

def cross_entropy_loss_derivative(logits: list[float], targets: list[float]) -> list[float]:
    """Return dL/dz for softmax followed by cross-entropy loss."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")

    # Softmax, shifted by the max logit so math.exp cannot overflow
    max_l = max(logits)
    exps = [math.exp(x - max_l) for x in logits]
    total = sum(exps)
    softmax_probs = [e / total for e in exps]

    # Gradient of cross-entropy w.r.t. logits = softmax - targets
    return [p - y for p, y in zip(softmax_probs, targets)]
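
One way to sanity-check the solution above is to compare softmax - targets against a central finite-difference estimate of the loss gradient. The sketch below is illustrative only: the helper names (softmax, cross_entropy, numerical_grad), the step size eps, and the sample logits are assumptions, not part of the original solution.

```python
import math

def softmax(logits):
    # Shift by the max logit for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, targets):
    # L = -sum(y_i * log(S_i)); terms with y_i == 0 contribute nothing
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(targets, probs) if y > 0)

def numerical_grad(logits, targets, eps=1e-6):
    # Central difference per coordinate: (L(z + eps*e_j) - L(z - eps*e_j)) / (2*eps)
    grad = []
    for j in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[j] += eps
        down[j] -= eps
        grad.append((cross_entropy(up, targets) - cross_entropy(down, targets)) / (2 * eps))
    return grad

logits = [1.0, 2.0, 3.0]   # hypothetical sample input
targets = [0.0, 0.0, 1.0]  # one-hot label for class 2

analytic = [p - y for p, y in zip(softmax(logits), targets)]
numeric = numerical_grad(logits, targets)
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))

print(analytic)  # roughly [0.0900, 0.2447, -0.3348]
print(max_gap)   # small; limited by floating-point rounding
```

The gradient entries sum to zero because both the softmax vector and the one-hot target sum to 1.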

Explanation

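The cross-entropy loss is L = -sum(y_i * log(S_i)), where S_i = exp(z_i) / sum_k exp(z_k) is the softmax output for logit z_i.
Applying the chain rule with the softmax derivative dS_i/dz_j = S_i * (delta_ij - S_j) gives dL/dz_j = S_j * sum(y_i) - y_j. Because the entries of a one-hot target sum to 1, this simplifies to dL/dz = S - y, where S is the softmax vector and y is the one-hot target.
This formula is used throughout deep learning because the gradient comes directly from the softmax output, without forming the full n x n softmax Jacobian.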
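
To make the Jacobian remark concrete, the sketch below computes the same gradient the long way, by building the n x n softmax Jacobian and applying the chain rule, and then compares it with the S - y shortcut. The function names grad_via_jacobian and grad_shortcut and the sample inputs are hypothetical, and the long route assumes no softmax probability underflows to zero.

```python
import math

def softmax(logits):
    # Shift by the max logit for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_via_jacobian(logits, targets):
    s = softmax(logits)
    n = len(s)
    # Loss gradient w.r.t. the softmax outputs: dL/dS_i = -y_i / S_i
    dL_dS = [-y / p for y, p in zip(targets, s)]
    # Softmax Jacobian: dS_i/dz_j = S_i * (delta_ij - S_j), an n x n matrix
    jac = [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
           for i in range(n)]
    # Chain rule: dL/dz_j = sum_i dL/dS_i * dS_i/dz_j
    return [sum(dL_dS[i] * jac[i][j] for i in range(n)) for j in range(n)]

def grad_shortcut(logits, targets):
    # Closed form: dL/dz = S - y
    return [p - y for p, y in zip(softmax(logits), targets)]

logits = [0.5, -1.0, 2.0]   # hypothetical sample input
targets = [0.0, 1.0, 0.0]   # one-hot label for class 1

long_way = grad_via_jacobian(logits, targets)
short_way = grad_shortcut(logits, targets)
max_diff = max(abs(a - b) for a, b in zip(long_way, short_way))
print(max_diff)  # agreement up to floating-point rounding
```

The long route needs O(n^2) time and memory for the Jacobian, while the shortcut stays at O(n).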

Complexity

Time: O(n) where n is the number of classes
Space: O(n) for the gradient vector
