← back

Numerical Gradient Checking

#313 · Machine Learning · Medium

⊣ Solve on deep-ml.com

Problem

Implement numerical gradient checking to verify the correctness of an analytically computed gradient. Compare the analytical gradient with a numerical approximation and report whether they match within a tolerance.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
import numpy as np

def numerical_gradient(f, params: np.ndarray, h: float = 1e-7) -> np.ndarray:
    grad = np.zeros_like(params, dtype=float)
    for i in range(len(params)):
        params_plus = params.copy()
        params_minus = params.copy()
        params_plus[i] += h
        params_minus[i] -= h
        grad[i] = (f(params_plus) - f(params_minus)) / (2 * h)
    return grad

def gradient_check(f, analytical_grad_fn, params: np.ndarray,
                   h: float = 1e-7, tolerance: float = 1e-5) -> dict:
    analytical_grad = analytical_grad_fn(params)
    numerical_grad = numerical_gradient(f, params, h)

    diff = np.abs(analytical_grad - numerical_grad)
    relative_error = np.linalg.norm(diff) / (
        np.linalg.norm(analytical_grad) + np.linalg.norm(numerical_grad) + 1e-8
    )

    passed = relative_error < tolerance

    return {
        "analytical_gradient": analytical_grad,
        "numerical_gradient": numerical_grad,
        "absolute_difference": diff,
        "relative_error": float(relative_error),
        "passed": bool(passed),
    }

Explanation

  1. Numerical gradient uses central differences: (f(x + h) - f(x - h)) / (2h) for each parameter.
  2. Relative error normalizes the difference by the combined magnitudes of both gradients, avoiding issues when gradients are very small or very large.
  3. If relative error < tolerance (typically 1e-5 to 1e-7), the analytical gradient is likely correct.
  4. This is a crucial debugging tool for neural network implementations. It is too slow for production use but invaluable during development.

Complexity

  • Time: O(n) function evaluations where n is the number of parameters
  • Space: O(n) for gradient storage