Calculate Covariance Matrix

#10 · Statistics · Easy

Solve on deep-ml.com

Problem

Calculate the covariance matrix for a given dataset where each column represents a feature and each row is an observation.
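For reference, the quantity the solution below computes can be written as follows. The notation is introduced here for illustration: the dataset has n observations (rows) and f features (columns), x_{ri} is the value of feature i in observation r, \bar{x}_i is the mean of feature i, and C is the f × f covariance matrix.

```latex
C_{ij} = \frac{1}{n-1} \sum_{r=1}^{n} \left(x_{ri} - \bar{x}_i\right)\left(x_{rj} - \bar{x}_j\right),
\qquad
\bar{x}_i = \frac{1}{n} \sum_{r=1}^{n} x_{ri},
\qquad i, j = 1, \dots, f
```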

Solution

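def calculate_covariance_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """Return the sample covariance matrix of `vectors`.

    Each inner list is one observation (row); each position in it is a feature (column).
    """
    n = len(vectors)
    if n < 2:
        # The sample covariance divides by n - 1, so at least two observations are needed.
        raise ValueError("at least two observations are required")
    num_features = len(vectors[0])
    # Mean of each feature (column)
    means = [sum(vectors[r][c] for r in range(n)) / n for c in range(num_features)]
    # cov[i][j] = sum over observations of (x_ri - mean_i) * (x_rj - mean_j), divided by n - 1
    cov = []
    for i in range(num_features):
        row = []
        for j in range(num_features):
            covariance = sum(
                (vectors[r][i] - means[i]) * (vectors[r][j] - means[j])
                for r in range(n)
            ) / (n - 1)
            row.append(covariance)
        cov.append(row)
    return cov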

Explanation

  1. Compute the mean of each feature (column).
  2. For each pair of features (i, j), compute the covariance: the sum of (x_i - mean_i)(x_j - mean_j) over all observations, divided by n - 1 rather than n. Dividing by n - 1 is Bessel's correction, which makes the result the unbiased sample covariance.
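  3. The result is a symmetric num_features x num_features matrix (cov[i][j] == cov[j][i]) whose diagonal entries are the sample variances of the individual features.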
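The same steps can also be expressed in matrix form: if Xc is the data with each column's mean subtracted, the covariance matrix is Xcᵀ·Xc / (n - 1), so entry (i, j) is the dot product of centered columns i and j divided by n - 1. A minimal pure-Python sketch of that formulation follows; covariance_via_centering is a hypothetical name used only for illustration.

```python
def covariance_via_centering(rows: list[list[float]]) -> list[list[float]]:
    """Sample covariance as (Xc^T Xc) / (n - 1), where Xc is the column-centered data."""
    n = len(rows)
    if n < 2:
        raise ValueError("at least two observations are required")
    num_features = len(rows[0])
    # Step 1: mean of each feature (column).
    means = [sum(row[c] for row in rows) / n for c in range(num_features)]
    # Step 2: subtract the column means so every feature has mean zero.
    centered = [[row[c] - means[c] for c in range(num_features)] for row in rows]
    # Step 3: C[i][j] = dot(centered column i, centered column j) / (n - 1).
    columns = list(zip(*centered))
    return [
        [sum(a * b for a, b in zip(col_i, col_j)) / (n - 1) for col_j in columns]
        for col_i in columns
    ]


print(covariance_via_centering([[2.0, 0.0], [4.0, 2.0], [6.0, 1.0]]))  # [[4.0, 1.0], [1.0, 1.0]]
```

Centering once up front avoids recomputing x - mean inside the innermost loop; the arithmetic is otherwise the same as in step 2.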

Complexity

  • Time: O(n * f^2) where n is the number of observations and f is the number of features
  • Space: O(f^2) for the covariance matrix
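Because cov[i][j] equals cov[j][i], only the f(f + 1)/2 entries on or above the diagonal need to be computed; the rest can be mirrored. This roughly halves the work in the double loop but leaves the asymptotic bound at O(n * f^2). A sketch of that variant, under the hypothetical name covariance_upper_triangle:

```python
def covariance_upper_triangle(rows: list[list[float]]) -> list[list[float]]:
    """Sample covariance matrix computing only entries with j >= i, then mirroring them."""
    n = len(rows)
    if n < 2:
        raise ValueError("at least two observations are required")
    f = len(rows[0])
    means = [sum(row[c] for row in rows) / n for c in range(f)]
    cov = [[0.0] * f for _ in range(f)]
    for i in range(f):
        for j in range(i, f):  # upper triangle, including the diagonal
            value = sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
            cov[i][j] = value
            cov[j][i] = value  # mirror into the lower triangle
    return cov


data = [[1.0, 2.0, 0.0], [2.0, 4.0, 1.0], [3.0, 6.0, 5.0]]
print(covariance_upper_triangle(data))  # [[1.0, 2.0, 2.5], [2.0, 4.0, 5.0], [2.5, 5.0, 7.0]]
```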