Bhattacharyya Distance Between Two Distributions

#120 · Statistics · Easy

Problem

Compute the Bhattacharyya distance between two probability distributions. This distance quantifies how much two discrete or continuous distributions overlap: 0 indicates identical distributions, and larger values indicate less overlap.
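
For reference, the quantity can be written out for both cases the statement mentions. The symbols below (BC, D_B, p_i, q_i) are a conventional notation and are not taken from the problem page; the solution that follows handles only the discrete case.

```latex
% Discrete case: p and q are probability vectors over the same n bins
BC(p, q) = \sum_{i=1}^{n} \sqrt{p_i \, q_i},
\qquad
D_B(p, q) = -\ln BC(p, q)

% Continuous case: p and q are densities on the same domain
BC(p, q) = \int \sqrt{p(x) \, q(x)} \, dx,
\qquad
D_B(p, q) = -\ln BC(p, q)
```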

Solution

import math

def bhattacharyya_distance(p: list[float], q: list[float]) -> float:
    # Bhattacharyya coefficient
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

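    # Clamp to 1.0: floating-point rounding can push the sum slightly above 1,
    # which would make the distance negative
    bc = min(bc, 1.0)

    # No overlap (bc == 0): log(0) is undefined, so the distance is infinite
    if bc <= 0:
        return float('inf')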

    distance = -math.log(bc)
    return round(distance, 4)

Explanation

  1. Bhattacharyya coefficient (BC): BC(p, q) = sum(sqrt(p_i * q_i)) for discrete distributions. This measures the overlap between two distributions, ranging from 0 (no overlap) to 1 (identical).
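  2. Bhattacharyya distance: DB = -ln(BC). The negative log maps the coefficient from (0, 1] onto [0, ∞), so more overlap gives a smaller distance; when BC = 0 the distance is infinite.
  3. Properties: DB >= 0, DB = 0 iff p = q, and DB is symmetric in p and q. It does not satisfy the triangle inequality, so it is not a metric in the strict sense. It is related to the Hellinger distance by H^2 = 1 - BC, and H itself is a true metric.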
  4. The distributions p and q must be valid probability distributions (non-negative, sum to 1).
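
The four points above can be traced on a small example. This is a minimal sketch, not part of the original solution: the helper name `bhattacharyya_coefficient`, the input checks, and the sample vectors `p` and `q` are all assumptions made for illustration.

```python
import math


def bhattacharyya_coefficient(p: list[float], q: list[float]) -> float:
    """Overlap between two discrete distributions (hypothetical helper)."""
    # Point 4: both inputs must be valid distributions over the same bins.
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    for dist in (p, q):
        if any(x < 0 for x in dist):
            raise ValueError("probabilities must be non-negative")
        if not math.isclose(sum(dist), 1.0, abs_tol=1e-9):
            raise ValueError("probabilities must sum to 1")

    # Point 1: sum of sqrt(p_i * q_i), clamped against rounding above 1.
    return min(sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)), 1.0)


# Illustrative inputs: q is p reversed, so the two overlap only partially.
p = [0.1, 0.2, 0.3, 0.4]
q = [0.4, 0.3, 0.2, 0.1]

bc = bhattacharyya_coefficient(p, q)  # approximately 0.8899
db = -math.log(bc)                    # point 2: approximately 0.1166
h_squared = 1.0 - bc                  # point 3: approximately 0.1101

print(f"BC = {bc:.4f}, DB = {db:.4f}, H^2 = {h_squared:.4f}")
```

Raising `ValueError` on bad input is one possible design choice. The solution above instead lets `zip` silently truncate to the shorter list, so validating up front is only worthwhile if the caller cannot guarantee well-formed inputs.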

Complexity

  • Time: O(n) where n is the number of bins/categories
  • Space: O(1)