Implement LLE (Locally Linear Embedding)

#351 · Machine Learning · Hard

Problem

Implement Locally Linear Embedding (LLE), a nonlinear dimensionality reduction technique. Given a dataset of high-dimensional points, a number of neighbors k, and a target dimension d, reduce the data to d dimensions while preserving local neighborhood structure.

Solution

import numpy as np

def lle(X: np.ndarray, k: int, d: int) -> np.ndarray:
    n = X.shape[0]
    # Step 1: Find k nearest neighbors for each point
    from scipy.spatial.distance import cdist
    dists = cdist(X, X)
    neighbors = np.zeros((n, k), dtype=int)
    for i in range(n):
        sorted_idx = np.argsort(dists[i])
        neighbors[i] = sorted_idx[1:k+1]

    # Step 2: Compute reconstruction weights
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]
        C = Z @ Z.T
        C += np.eye(k) * 1e-3 * np.trace(C)
        w = np.linalg.solve(C, np.ones(k))
        w /= w.sum()
        W[i, neighbors[i]] = w

    # Step 3: Compute embedding via eigenvectors of (I - W)^T (I - W)
    M = (np.eye(n) - W)
    M = M.T @ M
    eigenvalues, eigenvectors = np.linalg.eigh(M)
    # Skip the smallest eigenvalue (near zero), take next d
    Y = eigenvectors[:, 1:d+1]
    return Y

Explanation

For each point, find its k nearest neighbors using Euclidean distance.
Compute reconstruction weights that best linearly reconstruct each point from its neighbors by solving a constrained least-squares problem. Weights for each point sum to 1.
Find the low-dimensional embedding that minimizes the reconstruction error with the same weights. This amounts to finding the bottom eigenvectors (excluding the trivial zero eigenvalue) of (I - W)^T (I - W).

Complexity

Time: O(n^2 D) for distance computation + O(n k^3) for weight solving + O(n^3) for eigen-decomposition
Space: O(n^2) for the weight and cost matrices

← #350 #352 →