Reconstruction Error from PCA

#353 · Machine Learning · Medium

Problem

Given a data matrix X (rows are samples, columns are features) and a number of principal components k, compute the reconstruction error (mean squared error) obtained by projecting X onto its top-k principal components and reconstructing it from that projection.

Solution

import numpy as np

def pca_reconstruction_error(X: np.ndarray, k: int) -> float:
    # Center the data
    mean = X.mean(axis=0)
    X_centered = X - mean

    # Compute the thin SVD; the rows of Vt are the principal directions,
    # ordered by decreasing singular value in S
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

    # Project onto top-k components and reconstruct
    V_k = Vt[:k]
    X_projected = X_centered @ V_k.T
    X_reconstructed = X_projected @ V_k

    # Compute mean squared error
    error = np.mean((X_centered - X_reconstructed) ** 2)
    return float(error)
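
As a quick sanity check, the function can be run on a tiny dataset whose answer is easy to work out by hand. The sketch below uses a hypothetical four-point dataset (not part of the original problem) and repeats a condensed copy of the solution so that it runs on its own.

```python
import numpy as np

def pca_reconstruction_error(X: np.ndarray, k: int) -> float:
    # Condensed copy of the solution above so this snippet is self-contained.
    X_centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    V_k = Vt[:k]
    X_reconstructed = X_centered @ V_k.T @ V_k
    return float(np.mean((X_centered - X_reconstructed) ** 2))

# Hypothetical toy data: already centered, with most variance along the first axis.
X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]])

# Errors for k = 0, 1, 2 components.
errors = [pca_reconstruction_error(X, k) for k in range(3)]
print(errors)  # approximately [1.25, 0.25, 0.0]
```

With k = 1 only the second feature is lost, so the squared residuals sum to 2 over 8 entries, giving 0.25. With k = 2 the reconstruction is exact up to floating-point rounding.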

Explanation

  1. Center the data by subtracting the mean of each feature.
  2. Perform SVD on the centered data to get principal components (rows of Vt).
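  3. Project the data onto the top-k components to get k-dimensional coordinates, then reconstruct by mapping those coordinates back into the original feature space.
  4. The reconstruction error is the mean squared difference between the centered data and its reconstruction, averaged over all n * d entries (n samples, d features). Adding the mean back to both sides leaves the difference unchanged, so this is also the error relative to the original X. Equivalently, the error equals sum(S[k:] ** 2) / (n * d), because the residual consists exactly of the components along the discarded singular directions.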
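
The singular-value shortcut in step 4 can be checked numerically. The following is a minimal sketch on hypothetical random data (the shape, seed, and value of k are arbitrary choices, not from the original problem) that computes the error both ways and compares them.

```python
import numpy as np

# Hypothetical random data, used only to check the identity numerically.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
n, d = X.shape
k = 2

X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Explicit route: project onto the top-k directions and reconstruct.
V_k = Vt[:k]
X_reconstructed = X_centered @ V_k.T @ V_k
mse_explicit = np.mean((X_centered - X_reconstructed) ** 2)

# Shortcut: squared discarded singular values, averaged over all n * d entries.
mse_shortcut = np.sum(S[k:] ** 2) / (n * d)

print(np.isclose(mse_explicit, mse_shortcut))
```

The two values agree up to floating-point rounding because the squared Frobenius norm of the residual is the sum of the squared singular values that were dropped.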

Complexity

  • Time: O(n * d * min(n, d)), dominated by the SVD
  • Space: O(n * d) for the centered data and reconstruction
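
Because the SVD dominates the cost, a single decomposition can in principle be reused to obtain the error for every k at once. The sketch below relies on the singular-value identity from the explanation; the helper name `pca_errors_for_all_k` is hypothetical and is not part of the original solution.

```python
import numpy as np

def pca_errors_for_all_k(X: np.ndarray) -> np.ndarray:
    # Hypothetical helper: entry k is the reconstruction MSE using k components.
    n, d = X.shape
    X_centered = X - X.mean(axis=0)
    # Only the singular values are needed, so U and Vt are not computed.
    S = np.linalg.svd(X_centered, compute_uv=False)
    energy = S ** 2
    # tail[k] = sum(energy[k:]); the appended 0 covers k = len(S).
    tail = np.concatenate([np.cumsum(energy[::-1])[::-1], [0.0]])
    return tail / (n * d)

# Hypothetical toy data: already centered, with most variance along the first axis.
X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]])
all_errors = pca_errors_for_all_k(X)
print(all_errors)  # approximately [1.25, 0.25, 0.0]
```

This avoids forming the n-by-d reconstruction entirely, which may matter when only the error, and not the reconstructed data, is needed.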