Implement He Weight Initialization for Neural Networks

#370 · Deep Learning · Medium

Problem

Implement He (Kaiming) weight initialization for neural networks with ReLU activations. ReLU zeros out the negative half of its inputs, so the weights need a larger scale than Xavier initialization provides to keep activation variance stable from layer to layer.

Solution

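import numpy as np

def he_init(fan_in: int, fan_out: int, mode: str = "fan_in") -> np.ndarray:
    """He normal: sample a (fan_in, fan_out) matrix with mean 0 and std sqrt(2 / fan)."""
    if mode == "fan_in":
        std = np.sqrt(2.0 / fan_in)    # scale by the number of inputs
    elif mode == "fan_out":
        std = np.sqrt(2.0 / fan_out)   # scale by the number of outputs
    else:
        raise ValueError(f"Unknown mode: {mode}")
    return np.random.normal(0, std, (fan_in, fan_out))

def he_init_uniform(fan_in: int, fan_out: int, mode: str = "fan_in") -> np.ndarray:
    """He uniform: sample a (fan_in, fan_out) matrix from U[-limit, limit], limit = sqrt(6 / fan)."""
    if mode == "fan_in":
        limit = np.sqrt(6.0 / fan_in)   # variance limit**2 / 3 = 2 / fan_in
    elif mode == "fan_out":
        limit = np.sqrt(6.0 / fan_out)  # variance limit**2 / 3 = 2 / fan_out
    else:
        raise ValueError(f"Unknown mode: {mode}")
    return np.random.uniform(-limit, limit, (fan_in, fan_out))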

Explanation

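  1. He normal draws weights from a zero-mean normal distribution with standard deviation sqrt(2/fan), i.e. variance 2/fan, where fan is either fan_in or fan_out depending on the mode.
  2. He uniform draws from U[-limit, limit] where limit = sqrt(6/fan). A uniform distribution on [-a, a] has variance a^2/3, so this gives the same variance 2/fan as the normal variant.
  3. The factor of 2 compensates for ReLU zeroing out negative pre-activations, which halves the second moment of the activations. Xavier initialization instead uses variance 2/(fan_in + fan_out), which reduces to 1/fan when fan_in = fan_out.
  4. fan_in mode preserves the variance of activations in the forward pass; fan_out mode preserves the variance of gradients in the backward pass.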
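
As an informal check of points 3 and 4, the sketch below (not part of the original solution) pushes standard-normal inputs through a stack of square ReLU layers and reports the mean squared activation at the end. It compares weight variance 2/fan_in with weight variance 1/fan_in. The helper name mean_square_after_layers, the width, depth, batch size, and seed are all illustrative assumptions.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def mean_square_after_layers(variance_numerator: float, width: int = 512,
                             depth: int = 10, batch: int = 256,
                             seed: int = 0) -> float:
    """Hypothetical helper: propagate Gaussian inputs through `depth`
    ReLU layers whose weights have variance variance_numerator / fan_in,
    and return the mean squared activation of the last layer."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, (batch, width))  # inputs with E[x^2] = 1
    for _ in range(depth):
        # Square layers, so fan_in == fan_out == width.
        std = np.sqrt(variance_numerator / width)
        w = rng.normal(0.0, std, (width, width))
        x = relu(x @ w)
    return float(np.mean(x ** 2))

he = mean_square_after_layers(2.0)    # He scaling: variance 2 / fan_in
unit = mean_square_after_layers(1.0)  # no ReLU correction: variance 1 / fan_in

print(f"weight variance 2/fan_in -> mean square {he:.4f}")
print(f"weight variance 1/fan_in -> mean square {unit:.3e}")
```

Under these assumptions the first value should stay on the order of 1, while the second loses roughly a factor of 2 per layer, about 2^-10 over the ten layers. Both values fluctuate with the seed and the layer width, so treat the exact numbers as indicative only.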

Complexity

  • Time: O(fan_in * fan_out) to generate the weight matrix
  • Space: O(fan_in * fan_out) for the weight matrix