← back

Gaussian Process for Regression

#186 · Machine Learning · Hard

⊣ Solve on deep-ml.com

Problem

Implement Gaussian Process Regression from scratch. Given training inputs X, training targets y, and test inputs X*, predict the posterior mean and variance at the test points using a squared-exponential (RBF) kernel.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, sigma_f=1.0):
    dist_sq = np.sum(X1**2, axis=1, keepdims=True) + \
              np.sum(X2**2, axis=1) - 2 * X1 @ X2.T
    return sigma_f**2 * np.exp(-0.5 * dist_sq / length_scale**2)

def gp_regression(X_train, y_train, X_test,
                  length_scale=1.0, sigma_f=1.0, sigma_y=1e-8):
    X_train = np.array(X_train, dtype=float)
    y_train = np.array(y_train, dtype=float).reshape(-1, 1)
    X_test = np.array(X_test, dtype=float)
    if X_train.ndim == 1:
        X_train = X_train.reshape(-1, 1)
    if X_test.ndim == 1:
        X_test = X_test.reshape(-1, 1)

    K = rbf_kernel(X_train, X_train, length_scale, sigma_f)
    K += sigma_y**2 * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale, sigma_f)
    K_ss = rbf_kernel(X_test, X_test, length_scale, sigma_f)

    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = (K_s.T @ alpha).flatten()

    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    var = np.diag(cov)

    return mu, var

Explanation

  1. RBF Kernel: Compute the squared-exponential kernel matrix between two sets of points using k(x, x') = sigma_f^2 * exp(-||x - x'||^2 / (2 * l^2)).
  2. Training covariance: Build K (train-train), K_s (train-test), and K_ss (test-test) kernel matrices.
  3. Cholesky decomposition: Factor K = L L^T for numerical stability and efficient solving.
  4. Posterior mean: mu = K_s^T @ K^{-1} @ y, computed via Cholesky solves.
  5. Posterior variance: var = diag(K_ss - K_s^T @ K^{-1} @ K_s).

Complexity

  • Time: O(n^3) for Cholesky decomposition of the n x n training covariance matrix
  • Space: O(n^2) to store the kernel matrices