Compute the derivative of cross-entropy loss with respect to logits when using softmax output. Given logits and a one-hot encoded true label, return the gradient.
import math
def cross_entropy_loss_derivative(logits: list[float], targets: list[float]) -> list[float]:
    # Compute softmax, subtracting the max logit for numerical stability
    max_l = max(logits)
    exps = [math.exp(x - max_l) for x in logits]
    total = sum(exps)
    softmax_probs = [e / total for e in exps]
    # Gradient of cross-entropy w.r.t. logits = softmax - targets
    grad = [softmax_probs[i] - targets[i] for i in range(len(logits))]
    return grad

L = -sum(y_i * log(S_i)), where S_i is the softmax output for class i.
dL/dz = S - y, where S is the softmax vector and y is the one-hot target.
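One way to sanity-check the S - y result is to compare it against a central finite-difference estimate of the loss gradient. The sketch below is illustrative only: the helper names (softmax, cross_entropy, numerical_gradient, analytic_gradient), the step size eps, and the sample logits are assumptions that are not part of the original solution.

```python
import math


def softmax(logits):
    # Numerically stable softmax: shift by the max logit before exponentiating.
    max_l = max(logits)
    exps = [math.exp(x - max_l) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, targets):
    # L = -sum(y_i * log(S_i)); terms with y_i == 0 contribute nothing and are skipped.
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(targets, probs) if y > 0)


def numerical_gradient(logits, targets, eps=1e-6):
    # Central difference: (L(z + eps * e_i) - L(z - eps * e_i)) / (2 * eps) for each i.
    grad = []
    for i in range(len(logits)):
        plus = list(logits)
        minus = list(logits)
        plus[i] += eps
        minus[i] -= eps
        diff = cross_entropy(plus, targets) - cross_entropy(minus, targets)
        grad.append(diff / (2 * eps))
    return grad


def analytic_gradient(logits, targets):
    # Closed form: dL/dz = S - y.
    return [p - y for p, y in zip(softmax(logits), targets)]


# Hypothetical sample input: three classes, true class at index 1.
logits = [2.0, 1.0, 0.1]
targets = [0.0, 1.0, 0.0]

analytic = analytic_gradient(logits, targets)
numeric = numerical_gradient(logits, targets)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print("analytic:", analytic)
print("numeric: ", numeric)
print("max abs difference:", max_diff)
```

The two gradients should agree to within the finite-difference error. The entry for the true class is negative (raising that logit lowers the loss), and the entries sum to approximately zero because the softmax probabilities and the one-hot target each sum to 1.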