← back

Principal Component Analysis (PCA) Implementation

#19 · Machine Learning · Medium

⊣ Solve on deep-ml.com

Problem

Implement Principal Component Analysis (PCA) from scratch. Given a dataset, reduce its dimensionality to a specified number of principal components.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
import numpy as np

def pca(data: list[list[float]], num_components: int) -> list[list[float]]:
    X = np.array(data, dtype=float)
    n = X.shape[0]

    # Center the data (subtract mean of each feature)
    mean = X.mean(axis=0)
    X_centered = X - mean

    # Compute covariance matrix
    cov_matrix = (X_centered.T @ X_centered) / (n - 1)

    # Compute eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

    # Sort by eigenvalue descending
    idx = np.argsort(eigenvalues)[::-1]
    eigenvectors = eigenvectors[:, idx]

    # Select top num_components eigenvectors
    W = eigenvectors[:, :num_components]

    # Project data
    projected = X_centered @ W

    return np.round(projected, 4).tolist()

Explanation

  1. Center the data by subtracting the mean of each feature.
  2. Compute the covariance matrix of the centered data.
  3. Find the eigenvalues and eigenvectors of the covariance matrix.
  4. Sort eigenvectors by eigenvalue in descending order and select the top num_components.
  5. Project the centered data onto the selected eigenvectors to get the lower-dimensional representation.

Complexity

  • Time: O(n * f^2 + f^3) where n is samples and f is features
  • Space: O(f^2) for the covariance matrix