Numerical Gradient Checking

#313 · Machine Learning · Medium

Problem

Implement numerical gradient checking to verify the correctness of an analytically computed gradient. Compare the analytical gradient with a numerical approximation and report whether they match within a tolerance.

Solution

import numpy as np

def numerical_gradient(f, params: np.ndarray, h: float = 1e-7) -> np.ndarray:
    grad = np.zeros_like(params, dtype=float)
    for i in range(len(params)):
        params_plus = params.copy()
        params_minus = params.copy()
        params_plus[i] += h
        params_minus[i] -= h
        grad[i] = (f(params_plus) - f(params_minus)) / (2 * h)
    return grad

def gradient_check(f, analytical_grad_fn, params: np.ndarray,
                   h: float = 1e-7, tolerance: float = 1e-5) -> dict:
    analytical_grad = analytical_grad_fn(params)
    numerical_grad = numerical_gradient(f, params, h)

    diff = np.abs(analytical_grad - numerical_grad)
    relative_error = np.linalg.norm(diff) / (
        np.linalg.norm(analytical_grad) + np.linalg.norm(numerical_grad) + 1e-8
    )

    passed = relative_error < tolerance

    return {
        "analytical_gradient": analytical_grad,
        "numerical_gradient": numerical_grad,
        "absolute_difference": diff,
        "relative_error": float(relative_error),
        "passed": bool(passed),
    }

Explanation

Numerical gradient uses central differences: (f(x + h) - f(x - h)) / (2h) for each parameter.
Relative error normalizes the difference by the combined magnitudes of both gradients, avoiding issues when gradients are very small or very large.
If relative error < tolerance (typically 1e-5 to 1e-7), the analytical gradient is likely correct.
This is a crucial debugging tool for neural network implementations. It is too slow for production use but invaluable during development.

Complexity

Time: O(n) function evaluations where n is the number of parameters
Space: O(n) for gradient storage

← #312 #314 →