Maximum A Posteriori (MAP) Estimation for Bernoulli Parameter

#338 · Machine Learning · Medium

Problem

Implement Maximum A Posteriori (MAP) estimation for the parameter p of a Bernoulli distribution, using a Beta prior. Given observed binary data and Beta prior parameters (alpha, beta), compute the MAP estimate.
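
A small hand-worked instance may help fix ideas. The data and prior below are illustrative values chosen here, not taken from the problem statement. The calculation assumes the standard Beta-Bernoulli conjugate update and the interior-mode formula for the Beta distribution.

```latex
\begin{align*}
\text{data} &= [1, 1, 0, 1], \quad k = 3,\ n = 4, \quad \text{prior } \mathrm{Beta}(2, 2) \\
\text{posterior} &= \mathrm{Beta}(2 + 3,\ 2 + 1) = \mathrm{Beta}(5, 3) \\
\hat{p}_{\mathrm{MAP}} &= \frac{5 - 1}{5 + 3 - 2} = \frac{4}{6} \approx 0.6667 \\
\hat{p}_{\mathrm{MLE}} &= \frac{3}{4} = 0.75
\end{align*}
```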

Solution

from typing import List, Dict

def bernoulli_map(
    data: List[int],
    alpha_prior: float = 1.0,
    beta_prior: float = 1.0
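) -> Dict[str, float]:
    """Return the MAP and MLE estimates of p for 0/1 data under a Beta prior."""
    # Reject inputs for which the counts or the posterior would be meaningless
    if alpha_prior <= 0 or beta_prior <= 0:
        raise ValueError("alpha_prior and beta_prior must be positive")
    if any(x not in (0, 1) for x in data):
        raise ValueError("data must contain only 0s and 1s")

    n = len(data)
    successes = sum(data)
    failures = n - successes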

    # Posterior is Beta(alpha_prior + successes, beta_prior + failures)
    alpha_post = alpha_prior + successes
    beta_post = beta_prior + failures

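    # MAP estimate = mode of the posterior Beta(a, b)
    # Interior mode: (a - 1) / (a + b - 2) when a > 1 and b > 1
    if alpha_post > 1 and beta_post > 1:
        p_map = (alpha_post - 1) / (alpha_post + beta_post - 2)
    elif alpha_post <= 1 and beta_post <= 1:
        # Flat (a = b = 1) or unbounded at an endpoint (a < 1 or b < 1):
        # no interior mode, so return 0.5 by convention
        p_map = 0.5
    elif alpha_post <= 1:
        # Density is decreasing on (0, 1), so the mode is at 0
        p_map = 0.0
    else:
        # Density is increasing on (0, 1), so the mode is at 1
        p_map = 1.0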

    # MLE for comparison
    p_mle = successes / n if n > 0 else 0.5

    return {
        "p_map": round(p_map, 4),
        "p_mle": round(p_mle, 4),
        "alpha_posterior": round(alpha_post, 4),
        "beta_posterior": round(beta_post, 4),
        "n_observations": n,
        "n_successes": successes
    }

Explanation

The Beta distribution is the conjugate prior for the Bernoulli likelihood. With prior Beta(alpha, beta) and observing k successes in n trials, the posterior is Beta(alpha + k, beta + n - k).
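The MAP estimate is the mode of the posterior: (alpha_post - 1) / (alpha_post + beta_post - 2), valid when both posterior parameters exceed 1. Otherwise the density has no interior maximum, and the code falls back to 0, 1, or the 0.5 convention.
With a uniform prior (alpha=1, beta=1) and n > 0, the MAP estimate equals the MLE (k/n).
The Beta prior acts like alpha - 1 pseudo-successes and beta - 1 pseudo-failures added to the data: in the interior case the MAP is (k + alpha - 1) / (n + alpha + beta - 2). This pulls the estimate toward the prior belief, with less effect as n grows.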
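
The two claims about the uniform prior and pseudo-observations can be checked numerically. The following is a minimal sketch, not the reference solution: `beta_mode` and `map_estimate` are hypothetical helpers introduced here, the counts are illustrative, and only the interior case (both posterior parameters above 1) is handled. Exact fractions are used so the equalities hold without floating-point tolerance.

```python
from fractions import Fraction

def beta_mode(a, b):
    """Mode of Beta(a, b); this sketch handles only the interior case."""
    if not (a > 1 and b > 1):
        raise ValueError("interior mode requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)

def map_estimate(k, n, alpha, beta):
    """MAP of p after k successes in n trials under a Beta(alpha, beta) prior."""
    return beta_mode(alpha + k, beta + (n - k))

k, n = 3, 10  # illustrative counts, not from the problem statement

# Uniform prior Beta(1, 1): the MAP coincides with the MLE k / n.
uniform_map = map_estimate(k, n, Fraction(1), Fraction(1))
mle = Fraction(k, n)

# Beta(5, 5) prior: same as an MLE on data padded with alpha - 1 = 4
# pseudo-successes and beta - 1 = 4 pseudo-failures.
alpha, beta = Fraction(5), Fraction(5)
informed_map = map_estimate(k, n, alpha, beta)
padded_mle = (k + alpha - 1) / (n + alpha + beta - 2)

print(uniform_map == mle)          # True
print(informed_map == padded_mle)  # True
print(informed_map)                # 7/18
```

With these counts the Beta(5, 5) prior moves the estimate from 3/10 to 7/18, that is, toward the prior's centre at 0.5.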

Complexity

Time: O(n) to count successes
Space: O(1) auxiliary, since only the counts are stored
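
The constant-space bound relies on keeping only running counts. A minimal sketch of that idea for data arriving as an iterable follows. `streaming_map` is a hypothetical helper, not part of the solution above, and it assumes the posterior mode is interior, raising an error otherwise.

```python
from typing import Iterable

def streaming_map(
    stream: Iterable[int],
    alpha_prior: float = 1.0,
    beta_prior: float = 1.0,
) -> float:
    """Single-pass MAP estimate that stores only two counters (hypothetical helper)."""
    successes = 0
    failures = 0
    for x in stream:  # O(n) time: each observation is visited once
        if x == 1:
            successes += 1
        elif x == 0:
            failures += 1
        else:
            raise ValueError(f"expected 0 or 1, got {x!r}")

    alpha_post = alpha_prior + successes
    beta_post = beta_prior + failures
    # Assumption: this sketch covers only the interior mode (both parameters > 1).
    if not (alpha_post > 1 and beta_post > 1):
        raise ValueError("posterior mode is on the boundary or not unique")
    return (alpha_post - 1) / (alpha_post + beta_post - 2)

# A generator is consumed lazily, so the observations are never held in a list.
observations = (1 if i % 4 == 0 else 0 for i in range(1000))
estimate = streaming_map(observations, alpha_prior=2.0, beta_prior=2.0)
print(round(estimate, 4))
```

Here 250 of the 1000 generated values are 1, so the posterior is Beta(252, 752) and the estimate is 251/1002.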
