Maximum Likelihood Estimation for Gaussian Distribution

#337 · Statistics · Medium

Problem

Implement Maximum Likelihood Estimation (MLE) for the parameters of a Gaussian (normal) distribution. Given observed data, estimate the mean and variance that maximize the likelihood of observing the data.

Solution

import math
from typing import List, Dict

def gaussian_mle(data: List[float]) -> Dict[str, float]:
    """Return the maximum likelihood estimates of a Gaussian's parameters."""
    n = len(data)
    if n == 0:
        raise ValueError("Data cannot be empty")

    # MLE for mean: sample mean
    mean = sum(data) / n

    # MLE for variance: average squared deviation (divide by n, no Bessel correction)
    variance = sum((x - mean) ** 2 for x in data) / n

    # Standard deviation is the square root of the MLE variance
    std_dev = math.sqrt(variance)

    return {
        "mean": round(mean, 4),
        "variance": round(variance, 4),
        "std_dev": round(std_dev, 4)
    }

def gaussian_log_likelihood(
    data: List[float],
    mean: float,
    variance: float
) -> float:
    """Return the log-likelihood of data under N(mean, variance)."""
    if variance <= 0:
        raise ValueError("Variance must be positive")

    n = len(data)
    # Normalization term: -n/2 * log(2 * pi * variance)
    ll = -n / 2 * math.log(2 * math.pi * variance)
    # Data-fit term: minus the sum of squared deviations over (2 * variance)
    ll -= sum((x - mean) ** 2 for x in data) / (2 * variance)
    return round(ll, 4)
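As a quick sanity check, the sketch below recomputes the closed-form estimates and the log-likelihood without rounding, then confirms that nudging either parameter on a small grid never increases the log-likelihood. The helper names `log_likelihood` and `mle_is_local_max`, the step size of 0.1, and the simulated N(5, 2²) sample are illustrative assumptions, not part of the original solution.

```python
import math
import random
from typing import List


def log_likelihood(data: List[float], mean: float, variance: float) -> float:
    # Unrounded Gaussian log-likelihood (same formula as the solution)
    n = len(data)
    ss = sum((x - mean) ** 2 for x in data)
    return -n / 2 * math.log(2 * math.pi * variance) - ss / (2 * variance)


def mle_is_local_max(data: List[float], step: float = 0.1) -> bool:
    # Hypothetical helper: closed-form MLE (sample mean, divide-by-n variance)
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    best = log_likelihood(data, mu, var)
    # Perturb both parameters on a 3x3 grid; no neighbor should beat the MLE
    for dm in (-step, 0.0, step):
        for dv in (-step, 0.0, step):
            if var + dv <= 0:
                continue  # variance must stay positive
            if log_likelihood(data, mu + dm, var + dv) > best:
                return False
    return True


random.seed(0)
sample = [random.gauss(5.0, 2.0) for _ in range(200)]
print(mle_is_local_max(sample))  # True
```

A grid check only probes nearby points, so it illustrates the maximization rather than proving it.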

Explanation

The MLE for the mean of a Gaussian is the sample mean: $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
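The MLE for the variance is the average squared deviation from the sample mean: $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2$. Note that the divisor is n, not n-1: this estimator is biased, with $E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2$, so it slightly underestimates the true variance. Dividing by n-1 instead (Bessel's correction) gives the unbiased sample variance, but that estimator is not the MLE.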
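To see the bias concretely, this sketch draws many small samples from a known Gaussian and averages two estimators, using the standard library's `statistics.pvariance` (divides by n, matching the MLE) and `statistics.variance` (divides by n - 1). The sample size n = 5, σ = 2, the trial count, the seed, and the helper name `average_variance_estimates` are arbitrary choices for illustration. With σ² = 4 and n = 5, theory predicts an average MLE of (4/5)·4 = 3.2, while the Bessel-corrected average should land close to 4.

```python
import random
import statistics


def average_variance_estimates(n: int, sigma: float, trials: int, seed: int = 0):
    # Draw many samples of size n from N(0, sigma^2) and average both estimators
    rng = random.Random(seed)
    mle_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mle_total += statistics.pvariance(sample)      # divides by n (the MLE)
        unbiased_total += statistics.variance(sample)  # divides by n - 1 (Bessel)
    return mle_total / trials, unbiased_total / trials


mle_avg, unbiased_avg = average_variance_estimates(n=5, sigma=2.0, trials=20000)
print(round(mle_avg, 2), round(unbiased_avg, 2))
```

The gap shrinks as n grows, since (n-1)/n approaches 1: the MLE is biased but consistent.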
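The log-likelihood function is $\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$. Setting its partial derivatives with respect to $\mu$ and $\sigma^2$ to zero yields exactly the estimates above, and this stationary point is the maximum of the function.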
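For reference, the first-order conditions can be worked out by differentiating $\ell$ with respect to each parameter (treating $\sigma^2$ as a single variable), setting the results to zero, and substituting $\hat{\mu}$ into the second condition:

```latex
\begin{aligned}
\frac{\partial \ell}{\partial \mu} &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
  \;\Rightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i \\
\frac{\partial \ell}{\partial \sigma^2} &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0
  \;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2
\end{aligned}
```

This requires $\hat{\sigma}^2 > 0$, i.e. the data are not all identical; if they are, the likelihood grows without bound as $\sigma^2 \to 0$.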

Complexity

Time: O(n) where n is the number of data points
Space: O(1) extra space
