Central Limit Theorem Simulation

#182 · Probability · Medium

Problem

Implement a Central Limit Theorem simulation. Demonstrate that the mean (or the suitably standardized sum) of independent, identically distributed random variables from any distribution with finite variance approaches a normal distribution as the number of variables increases.
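For reference, the classical (Lindeberg–Lévy) form of the theorem can be written as follows. The symbols are assumptions introduced here, not part of the problem statement: X_1, ..., X_n are i.i.d. with mean \mu and finite variance \sigma^2 > 0, and \bar{X}_n is their sample mean.

```latex
\[
  \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
  \qquad
  \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}
  \;\xrightarrow{\;d\;}\; \mathcal{N}(0, 1)
  \quad \text{as } n \to \infty .
\]
```

Informally, for large n the sample mean is approximately normal with mean \mu and standard deviation \sigma / \sqrt{n}, which is what the simulation checks.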

Solution

import numpy as np

def clt_simulation(distribution: str = "uniform", n_samples: int = 1000,
                   sample_sizes: list = None, params: dict = None) -> dict:
    """Simulate the sampling distribution of the mean for several sample sizes.

    Returns {n: {"mean", "std", "skewness", "kurtosis"}}, where "kurtosis"
    is excess kurtosis (0 for a normal distribution).
    """
    if sample_sizes is None:
        sample_sizes = [1, 2, 5, 10, 30, 100]
    if params is None:
        params = {}

    def draw(size):
        # Draw `size` i.i.d. values from the requested distribution.
        if distribution == "uniform":
            return np.random.uniform(params.get("low", 0), params.get("high", 1), size)
        elif distribution == "exponential":
            return np.random.exponential(params.get("scale", 1.0), size)
        elif distribution == "bernoulli":
            return np.random.binomial(1, params.get("p", 0.5), size)
        else:
            # Unrecognized names fall back to Uniform(0, 1).
            return np.random.uniform(0, 1, size)

    results = {}
    for n in sample_sizes:
        # n_samples independent sample means, each computed over n draws.
        means = np.array([np.mean(draw(n)) for _ in range(n_samples)])
        mu, sigma = np.mean(means), np.std(means)
        # Standardize once; the small epsilon guards against a zero std.
        z = (means - mu) / (sigma + 1e-10)
        results[n] = {
            "mean": float(mu),
            "std": float(sigma),
            "skewness": float(np.mean(z ** 3)),
            "kurtosis": float(np.mean(z ** 4) - 3),  # excess kurtosis
        }
    return results

Explanation

  1. For each sample size n, repeatedly draw n values from the chosen distribution and compute their mean.
  2. Collect many such sample means to form the sampling distribution.
  3. Compute statistics: mean, standard deviation, skewness, and excess kurtosis.
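  4. As n increases, skewness and excess kurtosis should both approach 0, indicating convergence to normality. Because n_samples is finite, the estimates fluctuate around 0 rather than reaching it exactly.
  5. The standard deviation of the sample means should shrink as sigma / sqrt(n), where sigma is the standard deviation of the source distribution, consistent with the CLT's normal approximation.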
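The predictions in steps 4 and 5 can be checked numerically. The sketch below is a standalone illustration, not the solution itself: the helper name `sample_mean_stats`, the fixed seed, and the choice of Exponential(1) as the source distribution are assumptions made for the example. For Exponential(1), sigma = 1 and the population skewness is 2, so the sample mean of n draws should have standard deviation near 1 / sqrt(n) and skewness near 2 / sqrt(n).

```python
import numpy as np

def sample_mean_stats(n: int, n_samples: int = 20000, seed: int = 0) -> dict:
    """Empirical std and skewness of the mean of n Exponential(1) draws.

    Hypothetical helper for illustration; not part of the original solution.
    """
    rng = np.random.default_rng(seed)
    # One row per repetition, n draws per row; the row means are the sample means.
    means = rng.exponential(scale=1.0, size=(n_samples, n)).mean(axis=1)
    z = (means - means.mean()) / means.std()
    return {"std": float(means.std()), "skewness": float(np.mean(z ** 3))}

for n in (1, 4, 16, 64):
    stats = sample_mean_stats(n)
    # Columns: n, empirical std, 1/sqrt(n), empirical skewness, 2/sqrt(n).
    print(n,
          round(stats["std"], 3), round(1 / np.sqrt(n), 3),
          round(stats["skewness"], 3), round(2 / np.sqrt(n), 3))
```

The empirical columns should track the theoretical ones up to sampling noise. The exact digits depend on the seed and on n_samples.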

Complexity

  • Time: O(n_samples * sum(sample_sizes)), since each sample size n draws n_samples * n values
  • Space: O(n_samples + max(sample_sizes)), for the stored sample means plus one temporary draw of n values
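As a rough sanity check on the running time, the number of random values generated can be counted directly: each sample size n contributes n_samples * n draws, so the total is n_samples * sum(sample_sizes). The helper name `total_draws` below is hypothetical and used only for this arithmetic.

```python
def total_draws(n_samples: int, sample_sizes: list[int]) -> int:
    """Count the random values generated across all sample sizes."""
    return n_samples * sum(sample_sizes)

sizes = [1, 2, 5, 10, 30, 100]    # the solution's default sample sizes
print(total_draws(1000, sizes))   # 148000 draws in total
print(1000 * max(sizes))          # 100000, the largest single term
```

With the defaults the largest sample size dominates, but for a long list of similar sizes the sum, not the maximum, determines the cost.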