Sampling Distribution of the Mean

#181 · Probability · Easy

Problem

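Demonstrate the sampling distribution of the mean. Given a population, simulate drawing many samples of size n and show that the sample means have a smaller variance than the population (approximately the population variance divided by n) and that their distribution approaches a normal distribution, as the Central Limit Theorem (CLT) predicts.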

Solution

import numpy as np

def sampling_distribution_of_mean(population: np.ndarray, sample_size: int,
                                  n_samples: int = 10000) -> dict:
    """Simulate the sampling distribution of the mean for a given population."""
    # Draw n_samples samples with replacement and record the mean of each one.
    sample_means = np.array([
        np.mean(np.random.choice(population, size=sample_size, replace=True))
        for _ in range(n_samples)
    ])

    # Population parameters; np.var defaults to ddof=0, the population variance.
    pop_mean = np.mean(population)
    pop_var = np.var(population)

    return {
        "population_mean": float(pop_mean),
        "population_variance": float(pop_var),
        "mean_of_sample_means": float(np.mean(sample_means)),
        "variance_of_sample_means": float(np.var(sample_means)),
        # Variance that theory predicts for the sample means: sigma^2 / n.
        "theoretical_variance": float(pop_var / sample_size),
        "sample_means": sample_means,
    }
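As a rough usage sketch, the snippet below compares the empirical and theoretical variance of the sample means for a few sample sizes. The helper `draw_sample_means`, the exponential population, and the seeds are illustrative assumptions rather than part of the solution; the helper is a compact stand-in for the sampling step so that the snippet runs on its own.

```python
import numpy as np

def draw_sample_means(population, sample_size, n_samples, rng):
    """Hypothetical stand-in for the solution's sampling step."""
    return np.array([
        rng.choice(population, size=sample_size, replace=True).mean()
        for _ in range(n_samples)
    ])

# Assumed example population: right-skewed, clearly non-normal.
rng = np.random.default_rng(42)
population = rng.exponential(scale=2.0, size=100_000)
pop_var = float(np.var(population))

results = {}
for n in (5, 30, 100):
    means = draw_sample_means(population, n, 10_000, rng)
    empirical = float(np.var(means))
    theoretical = pop_var / n
    results[n] = (empirical, theoretical)
    print(f"n={n:>3}  empirical={empirical:.4f}  theoretical={theoretical:.4f}")
```

The two columns should agree closely, and both should shrink roughly in proportion to 1/n as the sample size grows.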

Explanation

Draw n_samples random samples of size sample_size from the population (with replacement).
Compute the mean of each sample.
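The mean of the sample means should be approximately equal to the population mean, because the sample mean is an unbiased estimator of it.
The variance of the sample means should be approximately population_variance / sample_size, because sampling with replacement makes the draws within each sample independent.
By the Central Limit Theorem, the distribution of sample means approaches a normal distribution as sample_size grows, whatever the shape of the population (provided its variance is finite); strongly skewed populations need larger samples before the approximation is good.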
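One way to see the approach to normality without plotting is to track the skewness of the sample means, which is 0 for a normal distribution. For independent draws, the skewness of the sample mean equals the population skewness divided by the square root of the sample size, so it should decay as the sample size grows. The sketch below assumes an exponential population (skewness about 2); the `skewness` helper and the chosen sample sizes are illustrative, not part of the original solution.

```python
import numpy as np

def skewness(x):
    """Third standardized moment; 0 for a symmetric distribution."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.mean(centered ** 3) / np.std(x) ** 3)

rng = np.random.default_rng(0)
# Assumed example population: exponential, strongly right-skewed.
population = rng.exponential(scale=1.0, size=100_000)

skew_by_n = {}
for n in (2, 10, 50, 200):
    # Index-based sampling with replacement: one row per sample.
    idx = rng.integers(0, population.size, size=(10_000, n))
    means = population[idx].mean(axis=1)
    skew_by_n[n] = skewness(means)
    print(f"n={n:>3}  skewness of sample means = {skew_by_n[n]:.3f}")
```

Skewness is only one symptom of non-normality; a histogram or a Q-Q plot of `sample_means` gives a fuller picture.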

Complexity

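Time: O(n_samples * sample_size) for the sampling, plus O(N) to compute the population mean and variance, where N is the population size
Space: O(n_samples) for storing sample means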
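The loop can be swapped for a single vectorized draw, which trades memory for speed: the time stays O(n_samples * sample_size), but the intermediate matrix of draws takes O(n_samples * sample_size) space instead of O(n_samples). The following is a minimal sketch of that alternative; `sample_means_vectorized` and the fair-die population are assumed names and data, not the original solution.

```python
import numpy as np

def sample_means_vectorized(population, sample_size, n_samples=10_000, seed=None):
    """Hypothetical vectorized variant: draw every sample in one call."""
    population = np.asarray(population, dtype=float)
    rng = np.random.default_rng(seed)
    # Random indices with replacement, shape (n_samples, sample_size).
    idx = rng.integers(0, population.size, size=(n_samples, sample_size))
    # Row-wise means give one sample mean per simulated sample.
    return population[idx].mean(axis=1)

# Assumed example population: the faces of a fair six-sided die.
die = np.arange(1, 7)
means = sample_means_vectorized(die, sample_size=10, seed=1)
print(means.mean(), means.var(), np.var(die) / 10)
```

For very large n_samples * sample_size, drawing in chunks would keep the memory bounded while retaining most of the speed-up.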
