Calculate Explained Variance Ratio for PCA

#350 · Machine Learning · Medium

Problem

Calculate the explained variance ratio for PCA. Given data, compute the principal components and determine what fraction of the total variance each component explains.
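Written out in symbols, the quantity asked for might look like the following. The notation here is an assumption introduced for illustration, not taken from the problem statement: X_c is the mean-centered data with n rows, C is its sample covariance matrix, and λ_i, v_i are the eigenvalues and eigenvectors of C.

```latex
\[
C = \frac{1}{n-1} X_c^{\top} X_c, \qquad C v_i = \lambda_i v_i, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0
\]
\[
\text{ratio}_i = \frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j} = \frac{\lambda_i}{\operatorname{tr}(C)}
\]
```

The second equality holds because the eigenvalues of a symmetric matrix sum to its trace, and the trace of C is the sum of the per-feature variances, i.e. the total variance.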

Solution

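import numpy as np
from typing import Any, Dict, Optional

def pca_explained_variance(
    X: np.ndarray,
    n_components: Optional[int] = None
) -> Dict[str, Any]:
    """Return explained-variance statistics for a PCA of X (rows are samples)."""
    n, p = X.shape
    if n < 2:
        # The (n - 1) denominator below would otherwise divide by zero
        raise ValueError("need at least 2 samples to estimate variance")
    if n_components is None:
        n_components = min(n, p)
    # There are only p eigenvalues, so never report more components than that
    n_components = min(n_components, p)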

    # Center the data
    mean = X.mean(axis=0)
    X_centered = X - mean

    # Covariance matrix
    cov = (X_centered.T @ X_centered) / (n - 1)

    # Eigendecomposition
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Sort descending
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Total variance
    total_variance = np.sum(eigenvalues)

    # Explained variance ratio
    explained_variance = eigenvalues[:n_components]
    explained_ratio = explained_variance / total_variance
    cumulative_ratio = np.cumsum(explained_ratio)

    # Project data
    components = eigenvectors[:, :n_components]
    X_transformed = X_centered @ components

    return {
        "explained_variance": explained_variance.tolist(),
        "explained_variance_ratio": [round(float(r), 4) for r in explained_ratio],
        "cumulative_variance_ratio": [round(float(r), 4) for r in cumulative_ratio],
        "total_variance": round(float(total_variance), 4),
        "n_components": n_components,
        "transformed_shape": list(X_transformed.shape)
    }

Explanation

  1. Center the data by subtracting the mean of each feature.
  2. Compute the sample covariance matrix (dividing by n - 1) and its eigendecomposition. Each eigenvalue is the variance of the data along the corresponding principal component (eigenvector).
  3. Sort the eigenvalues in descending order, since np.linalg.eigh returns them in ascending order. The explained variance ratio for component i is eigenvalue_i / sum(all eigenvalues); the denominator equals the total variance of the data.
  4. The cumulative explained variance ratio gives the fraction of total variance captured by the first k components, which is useful for choosing the number of components.
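The steps above can be checked by hand on a tiny dataset. The sketch below is a minimal illustration, not the reference solution: the helper names `explained_variance_ratio` and `components_for_threshold` are hypothetical, and the 0.95 threshold is just an assumed, commonly used cutoff. The four points are already centered, with variance 8/3 along the first axis and 2/3 along the second, so the ratios should come out as 0.8 and 0.2.

```python
import numpy as np

def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Hypothetical helper: fraction of variance explained by each component."""
    X_centered = X - X.mean(axis=0)                       # step 1: center
    cov = X_centered.T @ X_centered / (X.shape[0] - 1)    # step 2: sample covariance
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]           # ascending -> descending
    return eigenvalues / eigenvalues.sum()                # step 3: normalize

def components_for_threshold(ratios: np.ndarray, threshold: float = 0.95) -> int:
    """Hypothetical helper: smallest k whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(ratios)                        # step 4: cumulative ratio
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratios))                            # guard against round-off

X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]])
ratios = explained_variance_ratio(X)
print(np.round(ratios, 4))                     # [0.8 0.2]
print(components_for_threshold(ratios, 0.95))  # 2
print(components_for_threshold(ratios, 0.75))  # 1
```

Here one component already captures 80% of the variance, so a 0.75 threshold keeps one component while a 0.95 threshold needs both.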

Complexity

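  • Time: O(n * p^2 + p^3), where n is the number of samples and p the number of features: O(n * p^2) to form the covariance matrix and O(p^3) for its eigendecomposition
  • Space: O(p^2) for the covariance matrix and eigenvectors, plus O(n * p) for the centered copy of the data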
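When p is much larger than n, the p^3 term dominates. One possible alternative, which is a different technique from the solution above, is to take the singular value decomposition of the centered data instead of forming the covariance matrix; this costs roughly O(n * p * min(n, p)). The sketch below assumes only the ratios are needed, and the function name `explained_variance_ratio_svd` is hypothetical. It relies on the fact that the squared singular values of the centered data, divided by n - 1, equal the nonzero eigenvalues of the sample covariance matrix.

```python
import numpy as np

def explained_variance_ratio_svd(X: np.ndarray) -> np.ndarray:
    """Hypothetical helper: explained variance ratios via SVD of the centered data."""
    X_centered = X - X.mean(axis=0)
    # Singular values only; NumPy returns them in descending order
    s = np.linalg.svd(X_centered, full_matrices=False, compute_uv=False)
    explained_variance = s**2 / (X.shape[0] - 1)
    return explained_variance / explained_variance.sum()

# Compare against the covariance/eigenvalue route on random data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
X_demo_centered = X_demo - X_demo.mean(axis=0)
cov_demo = X_demo_centered.T @ X_demo_centered / (X_demo.shape[0] - 1)
eig_ratios = np.linalg.eigvalsh(cov_demo)[::-1]
eig_ratios = eig_ratios / eig_ratios.sum()
print(np.allclose(explained_variance_ratio_svd(X_demo), eig_ratios))
```

Note that this returns min(n, p) ratios, so for n < p it omits the trailing components whose variance is zero.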