Jensen-Shannon Divergence

#203 · Information Theory · Medium

Problem

Compute the Jensen-Shannon Divergence (JSD) between two probability distributions. JSD is a symmetric and bounded measure based on KL divergence: JSD(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M) where M = 0.5 * (P + Q).

Solution

import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    p = np.array(p, dtype=float)
    q = np.array(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jensen_shannon_divergence(p: np.ndarray, q: np.ndarray) -> float:
    p = np.array(p, dtype=float)
    q = np.array(q, dtype=float)
    # Normalize
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def jensen_shannon_distance(p: np.ndarray, q: np.ndarray) -> float:
    return float(np.sqrt(jensen_shannon_divergence(p, q)))

Explanation

KL Divergence: KL(P || Q) = sum(P(x) * log(P(x) / Q(x))). Only computed where P(x) > 0 to avoid log(0).
Mixture distribution: M = 0.5 * (P + Q). This ensures M(x) > 0 wherever P(x) > 0 or Q(x) > 0, so the KL terms are always finite.
JSD: Average the two KL divergences from P and Q to M. This makes it symmetric: JSD(P||Q) = JSD(Q||P).
JSD is bounded in [0, ln(2)] when using natural log, or [0, 1] when using log base 2.
The square root of JSD is a proper metric (Jensen-Shannon distance).

Complexity

Time: O(n) where n is the number of elements in the distributions
Space: O(n) for the mixture distribution M

← #202 #204 →