Confidence Interval for Population Mean

#212 · Statistics · Medium

Problem

Compute a confidence interval for a population mean, given sample data and a confidence level. Support both cases: a z-interval when the population standard deviation is known, and a t-interval when it is unknown.

Solution

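import math
from typing import Optional

def confidence_interval(data: list, confidence: float = 0.95,
                        population_std: Optional[float] = None) -> dict:
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one value")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")

    mean = sum(data) / n
    alpha = 1 - confidence

    if population_std is not None:
        # Z-interval: the population standard deviation is known
        z = _z_critical(1 - alpha / 2)
        margin = z * population_std / math.sqrt(n)
        method = "z-interval"
    else:
        # T-interval: estimate the standard deviation from the sample
        if n < 2:
            raise ValueError("t-interval requires at least two values")
        # Bessel-corrected (n - 1) sample variance
        sample_var = sum((x - mean) ** 2 for x in data) / (n - 1)
        sample_std = math.sqrt(sample_var)
        t = _t_critical(1 - alpha / 2, n - 1)
        margin = t * sample_std / math.sqrt(n)
        method = "t-interval"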

    return {
        "mean": round(mean, 6),
        "lower": round(mean - margin, 6),
        "upper": round(mean + margin, 6),
        "margin_of_error": round(margin, 6),
        "method": method,
    }

def _z_critical(p):
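    """Approximate the inverse standard normal CDF (quantile function)."""
    # Rational approximation (Abramowitz & Stegun 26.2.23), absolute error < 4.5e-4
    if p <= 0 or p >= 1:
        # The quantile is unbounded at 0 and 1, so reject instead of returning 0.0
        raise ValueError("p must be strictly between 0 and 1")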
    if p < 0.5:
        return -_z_critical(1 - p)
    t = math.sqrt(-2 * math.log(1 - p))
    c0, c1, c2 = 2.515517, 0.802853, 0.010328
    d1, d2, d3 = 1.432788, 0.189269, 0.001308
    return t - (c0 + c1 * t + c2 * t ** 2) / (1 + d1 * t + d2 * t ** 2 + d3 * t ** 3)

def _t_critical(p, df):
    """Approximate t critical value using normal approximation + correction."""
    z = _z_critical(p)
    # Cornish-Fisher expansion for t-distribution
    g1 = (z ** 3 + z) / 4
    g2 = (5 * z ** 5 + 16 * z ** 3 + 3 * z) / 96
    g3 = (3 * z ** 7 + 19 * z ** 5 + 17 * z ** 3 - 15 * z) / 384
    return z + g1 / df + g2 / df ** 2 + g3 / df ** 3
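
As a rough sanity check on the Cornish-Fisher correction, the sketch below recomputes the same three correction terms and compares the result with a few two-sided 95% values from a standard t table. The helper name `t_critical_cf` and the `TABLE_975` dictionary are illustrative and not part of the solution; the normal quantile is taken from the standard library's `statistics.NormalDist` instead of the rational approximation, so only the error of the expansion itself shows up.

```python
from statistics import NormalDist

def t_critical_cf(p: float, df: int) -> float:
    """Cornish-Fisher approximation of the t quantile (same terms as _t_critical)."""
    z = NormalDist().inv_cdf(p)  # normal quantile from the standard library
    g1 = (z ** 3 + z) / 4
    g2 = (5 * z ** 5 + 16 * z ** 3 + 3 * z) / 96
    g3 = (3 * z ** 7 + 19 * z ** 5 + 17 * z ** 3 - 15 * z) / 384
    return z + g1 / df + g2 / df ** 2 + g3 / df ** 3

# Two-sided 95% critical values from a standard t table, rounded to 3 decimals.
TABLE_975 = {2: 4.303, 5: 2.571, 10: 2.228, 30: 2.042}

for df, table_value in TABLE_975.items():
    approx = t_critical_cf(0.975, df)
    diff = approx - table_value
    print(f"df={df:>2}  approx={approx:.4f}  table={table_value:.3f}  diff={diff:+.4f}")
```

The gap shrinks quickly as df grows; for very small samples the expansion falls short of the table value, so an exact t quantile would be the safer choice there.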

Explanation

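Compute the sample mean and the sample standard deviation s (using the n - 1 denominator).
Z-interval (known population std): CI = mean +/- z * sigma / sqrt(n), where z is the 1 - alpha/2 quantile of the standard normal distribution and alpha = 1 - confidence.
T-interval (unknown population std): CI = mean +/- t * s / sqrt(n), where t is the 1 - alpha/2 quantile of the t-distribution with n - 1 degrees of freedom.
The z critical value comes from a rational approximation (Abramowitz & Stegun) with absolute error below 4.5e-4. The t critical value adds a Cornish-Fisher correction in powers of 1/df to the z value; this is accurate for moderate and large n but underestimates t when df is very small (for df = 2 at 95% confidence it gives about 4.17 against the exact 4.30).
The function returns the mean, the lower and upper bounds, the margin of error, and the method used; numeric values are rounded to six decimal places.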
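
The two formulas above can be worked through by hand on a small sample. The following is a minimal sketch, not the solution itself: the data, the "known" sigma of 1.2, and the variable names are all made up for illustration. The z value comes from the standard library's `statistics.NormalDist`, and the t value is the table value for df = 4.

```python
import math
from statistics import NormalDist, mean, stdev

data = [2.0, 4.0, 4.0, 5.0, 5.0]   # hypothetical sample
n = len(data)
x_bar = mean(data)                 # sample mean
s = stdev(data)                    # sample std dev, n - 1 denominator

# z-interval: assumes the population std is known (sigma = 1.2 is made up).
sigma = 1.2
z = NormalDist().inv_cdf(0.975)    # 1 - alpha/2 quantile for 95% confidence
z_margin = z * sigma / math.sqrt(n)

# t-interval: sigma unknown, so use s and the t critical value for df = n - 1 = 4.
t = 2.776                          # two-sided 95% value for df = 4, from a t table
t_margin = t * s / math.sqrt(n)

print(f"z-interval: {x_bar - z_margin:.4f} to {x_bar + z_margin:.4f}")
print(f"t-interval: {x_bar - t_margin:.4f} to {x_bar + t_margin:.4f}")
```

With only five observations the t-interval is noticeably wider, which reflects the extra uncertainty from estimating the standard deviation from the sample.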

Complexity

Time: O(n) to compute sample statistics
Space: O(1) beyond the input data
