← back

Maximum A Posteriori (MAP) Estimation for Bernoulli Parameter

#338 · Machine Learning · Medium

⊣ Solve on deep-ml.com

Problem

Implement Maximum A Posteriori (MAP) estimation for the parameter p of a Bernoulli distribution, using a Beta prior. Given observed binary data and Beta prior parameters (alpha, beta), compute the MAP estimate.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
from typing import List, Dict

def bernoulli_map(
    data: List[int],
    alpha_prior: float = 1.0,
    beta_prior: float = 1.0
) -> Dict[str, float]:
    n = len(data)
    successes = sum(data)
    failures = n - successes

    # Posterior is Beta(alpha_prior + successes, beta_prior + failures)
    alpha_post = alpha_prior + successes
    beta_post = beta_prior + failures

    # MAP estimate for Beta distribution
    # Mode of Beta(a, b) = (a - 1) / (a + b - 2) when a > 1 and b > 1
    if alpha_post > 1 and beta_post > 1:
        p_map = (alpha_post - 1) / (alpha_post + beta_post - 2)
    elif alpha_post <= 1 and beta_post <= 1:
        # Bimodal, return 0.5 as convention
        p_map = 0.5
    elif alpha_post <= 1:
        p_map = 0.0
    else:
        p_map = 1.0

    # MLE for comparison
    p_mle = successes / n if n > 0 else 0.5

    return {
        "p_map": round(p_map, 4),
        "p_mle": round(p_mle, 4),
        "alpha_posterior": round(alpha_post, 4),
        "beta_posterior": round(beta_post, 4),
        "n_observations": n,
        "n_successes": successes
    }

Explanation

  1. The Beta distribution is the conjugate prior for the Bernoulli likelihood. With prior Beta(alpha, beta) and observing k successes in n trials, the posterior is Beta(alpha + k, beta + n - k).
  2. The MAP estimate is the mode of the posterior distribution: (alpha_post - 1) / (alpha_post + beta_post - 2), valid when both posterior parameters exceed 1.
  3. With a uniform prior (alpha=1, beta=1), the MAP estimate equals the MLE (k/n).
  4. The Beta prior acts as adding pseudo-observations, pulling the estimate toward the prior belief.

Complexity

  • Time: O(n) to count successes
  • Space: O(1)