Entropy & Cross-Entropy

#205 · Information Theory · Medium

Problem

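Compute the entropy H(P) and the cross-entropy H(P, Q) of discrete probability distributions. Entropy measures the average information content of a distribution; cross-entropy measures the average code length needed to encode data from P using a code optimized for Q. The solution uses the natural logarithm, so results are in nats rather than bits (divide by ln 2 to convert to bits).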

Solution

import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy H(P) in nats."""
    p = np.array(p, dtype=float)
    p = p / p.sum()  # normalize so the weights sum to 1
    mask = p > 0     # skip zeros: 0 * log(0) is treated as 0
    return float(-np.sum(p[mask] * np.log(p[mask])))

def cross_entropy(p: np.ndarray, q: np.ndarray) -> float:
    """Cross-entropy H(P, Q) in nats."""
    p = np.array(p, dtype=float)
    q = np.array(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    # Clip q to avoid log(0)
    q = np.clip(q, 1e-12, 1.0)
    return float(-np.sum(p * np.log(q)))

def kl_from_entropy(p: np.ndarray, q: np.ndarray) -> float:
    """KL divergence KL(P || Q) = H(P, Q) - H(P), in nats."""
    return cross_entropy(p, q) - entropy(p)
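
The functions above use the natural logarithm, so their results are in nats. If bits are wanted instead, one option is a base-2 variant. The following is a minimal sketch only; `entropy_bits` is a hypothetical helper name and is not part of the original solution.

```python
import numpy as np

def entropy_bits(p) -> float:
    """Hypothetical helper: Shannon entropy in bits (log base 2)."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()          # normalize so the weights sum to 1
    mask = p > 0             # skip zeros: 0 * log(0) is treated as 0
    return float(-np.sum(p[mask] * np.log2(p[mask])))

fair_coin_bits = entropy_bits([0.5, 0.5])      # a fair coin carries 1 bit
uniform4_bits = entropy_bits([1, 1, 1, 1])     # four equally likely outcomes: 2 bits

# Converting between units: nats = bits * ln(2)
fair_coin_nats = fair_coin_bits * np.log(2)
```

Multiplying a value in bits by ln 2 gives the same quantity in nats, so either convention can be recovered from the other.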

Explanation

Entropy: H(P) = -sum(P(x) * log(P(x))) for all x where P(x) > 0. Measures the intrinsic uncertainty of distribution P.
Cross-Entropy: H(P, Q) = -sum(P(x) * log(Q(x))). Measures the expected number of nats needed to encode samples from P using a code optimized for Q.
Cross-entropy is never smaller than entropy: H(P, Q) >= H(P), with equality if and only if P = Q.
The difference H(P, Q) - H(P) equals the KL divergence KL(P || Q), which measures the inefficiency of using Q to represent P.
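Q is clipped to [1e-12, 1] to avoid log(0). When Q(x) = 0 but P(x) > 0, the true cross-entropy is infinite; clipping returns a large finite value instead.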
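
The relations above can be checked numerically. The sketch below repeats compact versions of the solution's functions so that it runs on its own; the distributions `p` and `q` are arbitrary values chosen for illustration.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    mask = p > 0
    return float(-np.sum(p[mask] * np.log(p[mask])))

def cross_entropy(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = np.clip(q / q.sum(), 1e-12, 1.0)   # same clipping as the solution
    return float(-np.sum(p * np.log(q)))

# Illustrative distributions (assumed values, not from the problem statement)
p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])

h_p = entropy(p)              # ln 2, about 0.693 nats
h_pq = cross_entropy(p, q)    # about 1.204 nats
kl = h_pq - h_p               # about 0.511 nats

# KL divergence computed directly from its definition, for comparison
kl_direct = float(np.sum(p * np.log(p / q)))

# Equality case: encoding P with its own distribution costs exactly H(P)
h_pp = cross_entropy(p, p)

# Clipping case: Q gives zero probability to an outcome P can produce.
# The exact value is infinite; clipping yields 0.5 * -ln(1e-12), about 13.8.
h_clipped = cross_entropy([0.5, 0.5], [1.0, 0.0])
```

Here `kl` and `kl_direct` agree to floating-point precision, and `h_clipped` shows that the clipped result is a finite stand-in for an infinite cross-entropy rather than an exact value.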

Complexity

Time: O(n) where n is the number of elements in the distribution
Space: O(n) for intermediate arrays
