Implement Score Matching for Score-Based Diffusion

#404 · Deep Learning · Medium

Problem

Implement the score matching objective for score-based diffusion models. Instead of predicting noise, score-based models learn the score function (gradient of the log probability). Implement denoising score matching where the model predicts the score at various noise levels.

Solution

import numpy as np

def denoising_score_matching_loss(
    model_fn,
    x0: np.ndarray,
    sigmas: np.ndarray
) -> float:
    batch_size = x0.shape[0]

    # Sample random noise levels
    sigma_idx = np.random.randint(0, len(sigmas), size=batch_size)
    sigma = sigmas[sigma_idx]

    # Reshape sigma for broadcasting
    sigma_shaped = sigma.copy()
    while sigma_shaped.ndim < x0.ndim:
        sigma_shaped = sigma_shaped[..., np.newaxis]

    # Add noise
    noise = np.random.randn(*x0.shape)
    x_noisy = x0 + sigma_shaped * noise

    # True score: -noise / sigma (gradient of log p(x_noisy | x0))
    target_score = -noise / sigma_shaped

    # Model predicts score
    predicted_score = model_fn(x_noisy, sigma)

    # Weighted MSE loss: sigma^2 * ||predicted - target||^2
    weights = sigma_shaped ** 2
    loss = np.mean(weights * (predicted_score - target_score) ** 2)
    return float(loss)


def langevin_sampling(score_fn, sigmas: np.ndarray, x_init: np.ndarray, steps_per_sigma: int = 100, step_size: float = 0.01) -> np.ndarray:
    x = x_init.copy()
    for sigma in reversed(sigmas):
        alpha = step_size * (sigma / sigmas[-1]) ** 2
        for _ in range(steps_per_sigma):
            score = score_fn(x, sigma)
            noise = np.random.randn(*x.shape)
            x = x + 0.5 * alpha * score + np.sqrt(alpha) * noise
    return x

Explanation

Score function: the score s(x) = grad_x log p(x) points toward higher-density regions.
Denoising score matching: add noise at level sigma, then the optimal score is -(x_noisy - x0) / sigma^2 = -noise / sigma.
Weighting: multiply loss by sigma^2 so each noise level contributes equally (without weighting, high noise levels dominate).
Langevin sampling: generate samples by iterating x <- x + alpha/2 * score(x) + sqrt(alpha) * noise, starting from high noise and annealing to low noise.

Complexity

Time: O(B d) per training step, O(S L * d) for sampling (S noise levels, L steps each)
Space: O(B * d) for the noisy samples and scores

← #403 #405 →