
Entropy & Cross-Entropy

#205 · Information Theory · Medium

Solve on deep-ml.com

Problem

Compute the entropy H(P) and the cross-entropy H(P, Q) of discrete probability distributions. Entropy measures the average information content of P. Cross-entropy measures the average number of bits needed to encode data drawn from P using a code optimized for Q. The unit is bits with log base 2 and nats with the natural log, which is what the solution below uses.
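Whether the result is in bits or nats depends only on the log base. The following is a minimal pure-Python sketch, where the `base` parameter is an illustrative addition and not part of the problem's required signature. It shows that a fair coin carries 1 bit, or equivalently ln 2 ≈ 0.693 nats:

```python
import math

def entropy(p, base=math.e):
    """Shannon entropy of an already-normalized discrete distribution p.

    Terms with p(x) == 0 are skipped (0 * log 0 is taken as 0).
    base=math.e gives nats; base=2 gives bits.
    """
    return -sum(px * math.log(px, base) for px in p if px > 0)

fair_coin = [0.5, 0.5]
h_bits = entropy(fair_coin, base=2)  # about 1.0 bit
h_nats = entropy(fair_coin)          # about 0.693 nats (ln 2)

# Changing units only rescales the value: H_bits = H_nats / ln 2.
print(h_bits, h_nats, h_nats / math.log(2))
```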

Solution

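import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy H(P) in nats."""
    p = np.array(p, dtype=float)
    p = p / p.sum()  # normalize, so raw counts are accepted as well
    mask = p > 0     # skip zero-probability terms: 0 * log(0) is taken as 0
    return float(-np.sum(p[mask] * np.log(p[mask])))

def cross_entropy(p: np.ndarray, q: np.ndarray) -> float:
    """Cross-entropy H(P, Q) in nats."""
    p = np.array(p, dtype=float)
    q = np.array(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    # Clip q to avoid log(0). If q(x) = 0 where p(x) > 0, the true value is
    # infinite; clipping returns a large but finite value instead.
    q = np.clip(q, 1e-12, 1.0)
    return float(-np.sum(p * np.log(q)))

def kl_from_entropy(p: np.ndarray, q: np.ndarray) -> float:
    """KL(P || Q) via the identity KL(P || Q) = H(P, Q) - H(P)."""
    return cross_entropy(p, q) - entropy(p)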
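Clipping q trades mathematical exactness for a finite result. For comparison, here is a minimal pure-Python sketch of both behaviors. The helper names `cross_entropy_exact` and `cross_entropy_clipped` are hypothetical, and both assume p and q are already-normalized lists of equal length:

```python
import math

def cross_entropy_exact(p, q):
    """H(P, Q) in nats with no clipping.

    Returns math.inf when q(x) == 0 for some x with p(x) > 0,
    which is the mathematically correct value in that case.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # 0 * log q(x) contributes nothing, even if q(x) == 0
        if qx == 0:
            return math.inf  # P puts mass where Q puts none
        total -= px * math.log(qx)
    return total

def cross_entropy_clipped(p, q, eps=1e-12):
    """Same quantity, but each q(x) is clipped to [eps, 1] before the log."""
    return -sum(px * math.log(min(max(qx, eps), 1.0)) for px, qx in zip(p, q))

p = [0.5, 0.5]
q = [1.0, 0.0]
print(cross_entropy_exact(p, q))    # inf
print(cross_entropy_clipped(p, q))  # about 13.82, i.e. 0.5 * -ln(1e-12)
```

Returning infinity is exact but can propagate through a training loss. Clipping keeps everything finite at the cost of silently capping the penalty for assigning zero probability to an event that occurs.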

Explanation

  1. Entropy: H(P) = -sum(P(x) * log(P(x))) over all x with P(x) > 0 (terms with P(x) = 0 contribute 0). It measures the intrinsic uncertainty of P; with the natural log, the result is in nats.
  2. Cross-entropy: H(P, Q) = -sum(P(x) * log(Q(x))). It measures the expected number of nats needed to encode samples from P using a code optimized for Q.
  3. Cross-entropy is never less than entropy: H(P, Q) >= H(P), with equality if and only if P = Q (Gibbs' inequality).
  4. The difference H(P, Q) - H(P) equals the KL divergence KL(P || Q): the expected number of extra nats paid for encoding P with Q's code instead of P's own. This is what kl_from_entropy computes.
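  5. Q is clipped to [1e-12, 1] before taking the log to avoid log(0). If Q(x) = 0 for some x with P(x) > 0, the true cross-entropy is infinite. The clipped version instead adds about 27.6 * P(x) nats for each such x, since -ln(1e-12) ≈ 27.6.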
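Properties 3 and 4 can be checked numerically. This sketch uses plain Python floats rather than NumPy and assumes q(x) > 0 wherever p(x) > 0. The distributions p and q are arbitrary values chosen for illustration:

```python
import math

def entropy(p):
    """H(P) in nats; zero-probability terms are skipped."""
    return -sum(px * math.log(px) for px in p if px > 0)

def cross_entropy(p, q):
    """H(P, Q) in nats; assumes q(x) > 0 wherever p(x) > 0."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

def kl_divergence(p, q):
    """KL(P || Q) computed directly as sum of p(x) * log(p(x) / q(x))."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

h_p = entropy(p)
h_pq = cross_entropy(p, q)
kl = kl_divergence(p, q)

print(h_pq >= h_p)                             # True (Gibbs' inequality)
print(math.isclose(h_pq - h_p, kl))            # True: H(P, Q) - H(P) = KL(P || Q)
print(math.isclose(cross_entropy(p, p), h_p))  # True: equality when Q = P
```

The direct formula and the identity agree to floating-point precision, which is why kl_from_entropy can reuse the two existing functions instead of implementing KL separately.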

Complexity

  • Time: O(n) where n is the number of elements in the distribution
  • Space: O(n) for intermediate arrays