Sampling Distribution of the Mean

#181 · Probability · Easy

Problem

Demonstrate the sampling distribution of the mean. Given a population, repeatedly draw samples of size n and show that the sample means have a smaller variance than the population (approximately the population variance divided by n) and that their distribution approaches normality as n grows, as the Central Limit Theorem (CLT) predicts.

Solution

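import numpy as np

def sampling_distribution_of_mean(population: np.ndarray, sample_size: int,
                                  n_samples: int = 10000) -> dict:
    """Simulate the sampling distribution of the mean for a given population."""
    # Draw n_samples samples of size sample_size (with replacement)
    # and record the mean of each one.
    sample_means = np.array([
        np.mean(np.random.choice(population, size=sample_size, replace=True))
        for _ in range(n_samples)
    ])

    # Population parameters; np.var defaults to ddof=0 (population variance).
    pop_mean = np.mean(population)
    pop_var = np.var(population)

    return {
        "population_mean": float(pop_mean),
        "population_variance": float(pop_var),
        # Empirical moments of the simulated sample means.
        "mean_of_sample_means": float(np.mean(sample_means)),
        "variance_of_sample_means": float(np.var(sample_means)),
        # Variance predicted by theory: sigma^2 / n.
        "theoretical_variance": float(pop_var / sample_size),
        "sample_means": sample_means,
    }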

Explanation

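  1. Draw n_samples random samples, each of size sample_size, from the population. Sampling is done with replacement, so the draws within a sample are independent and identically distributed.
  2. Compute the mean of each sample.
  3. The mean of the sample means should be approximately equal to the population mean, because the sample mean is an unbiased estimator.
  4. The variance of the sample means should be approximately population_variance / sample_size. Here population_variance is computed with np.var, which divides by N (ddof=0) by default.
  5. By the Central Limit Theorem, the distribution of sample means approaches a normal distribution as sample_size grows, whatever the shape of the population. The more skewed the population, the larger sample_size must be before the approximation is good.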
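
The steps above can be checked numerically. The sketch below is illustrative and not part of the original solution: it assumes a strongly skewed population (exponential draws), a seeded NumPy Generator for reproducibility, and a hypothetical helper named skewness. It reports how the variance of the sample means tracks population_variance / n and how the skewness shrinks toward 0 (the value for a normal distribution) as n grows.

```python
import numpy as np

# Seeded generator so the run is reproducible (an assumption of this sketch).
rng = np.random.default_rng(0)

# Hypothetical skewed population: 100,000 draws from an exponential distribution.
population = rng.exponential(scale=2.0, size=100_000)


def skewness(x: np.ndarray) -> float:
    """Third standardized moment; 0 for a symmetric distribution."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))


results = {}
for n in (2, 30, 200):
    # Draw 10,000 samples of size n with replacement and average each row.
    idx = rng.integers(0, population.size, size=(10_000, n))
    means = population[idx].mean(axis=1)
    results[n] = {
        "mean": float(means.mean()),
        "variance": float(means.var()),
        "theoretical_variance": float(population.var() / n),
        "skewness": skewness(means),
    }
    print(n, results[n])
```

For each n, the empirical variance should sit close to the theoretical one, and the skewness should fall as n increases, which is the CLT effect described in step 5.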

Complexity

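  • Time: O(n_samples * sample_size) for the simulation, plus O(len(population)) to compute the population mean and variance
  • Space: O(n_samples) for storing sample means, plus O(sample_size) temporary storage for each sample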
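
The time bound is the same if all samples are drawn in one call instead of a Python loop, but the space cost rises to O(n_samples * sample_size) because the full sample matrix is held in memory. A minimal sketch of that trade-off follows; the function name sample_means_vectorized and the seed parameter are assumptions for illustration, not part of the original solution.

```python
import numpy as np


def sample_means_vectorized(population, sample_size: int,
                            n_samples: int = 10_000, seed=None) -> np.ndarray:
    """Return n_samples sample means, drawing every sample in a single call."""
    rng = np.random.default_rng(seed)
    population = np.asarray(population, dtype=float)
    # One (n_samples, sample_size) matrix of draws with replacement.
    samples = rng.choice(population, size=(n_samples, sample_size), replace=True)
    # Averaging across each row gives one mean per sample.
    return samples.mean(axis=1)
```

This variant is usually faster in practice because the loop runs inside NumPy, at the price of the larger intermediate array.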