← back

Hypergeometric Distribution PMF

#245 · Probability · Medium

⊣ Solve on deep-ml.com

Problem

Compute the probability mass function (PMF) of the hypergeometric distribution. Given a population of N items with K successes, compute the probability of getting exactly k successes in a sample of n drawn without replacement.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
def hypergeometric_pmf(N: int, K: int, n: int, k: int) -> float:
    """
    N: population size
    K: number of success states in population
    n: number of draws
    k: number of observed successes
    """
    def log_comb(a, b):
        if b < 0 or b > a:
            return float('-inf')
        if b == 0 or b == a:
            return 0.0
        b = min(b, a - b)
        result = 0.0
        for i in range(b):
            result += log(a - i) - log(i + 1)
        return result

    from math import log, exp

    # P(X=k) = C(K,k) * C(N-K, n-k) / C(N, n)
    if k < max(0, n - (N - K)) or k > min(K, n):
        return 0.0

    log_pmf = log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)
    return round(exp(log_pmf), 6)

Explanation

  1. The hypergeometric PMF is: P(X=k) = C(K,k) * C(N-K, n-k) / C(N,n).
  2. C(K,k) counts ways to choose k successes from K success items.
  3. C(N-K, n-k) counts ways to choose the remaining draws from failure items.
  4. C(N,n) counts total ways to draw n items from N.
  5. Log-space computation avoids overflow with large factorials.
  6. Valid range check: k must be between max(0, n-(N-K)) and min(K, n).

Complexity

  • Time: O(min(k, K-k) + min(n-k, N-K-n+k) + min(n, N-n))
  • Space: O(1)