Implement He Weight Initialization

#290 · Deep Learning · Easy

Problem

Implement He weight initialization for neural networks. It is designed for layers followed by ReLU activations. It accounts for the fact that ReLU zeros out roughly half of its inputs (the negative pre-activations), which halves the second moment of the signal passed to the next layer.

Solution

import numpy as np

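# Weights have shape (fan_in, fan_out), i.e. they are applied as x @ W.
def he_normal(fan_in: int, fan_out: int) -> np.ndarray:
    # Zero-mean Gaussian with Var[w] = 2 / fan_in
    std = np.sqrt(2.0 / fan_in)
    return np.random.normal(0.0, std, size=(fan_in, fan_out))

def he_uniform(fan_in: int, fan_out: int) -> np.ndarray:
    # U[-limit, limit] has variance limit**2 / 3 = 2 / fan_in, matching he_normal
    limit = np.sqrt(6.0 / fan_in)
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))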
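
A quick sanity check of the solution: both initializers should return a (fan_in, fan_out) matrix whose empirical variance is close to 2 / fan_in. The layer sizes and the fixed seed below are arbitrary choices for illustration.

```python
import numpy as np

def he_normal(fan_in: int, fan_out: int) -> np.ndarray:
    std = np.sqrt(2.0 / fan_in)
    return np.random.normal(0.0, std, size=(fan_in, fan_out))

def he_uniform(fan_in: int, fan_out: int) -> np.ndarray:
    limit = np.sqrt(6.0 / fan_in)
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

np.random.seed(0)  # fixed seed so the check is reproducible
fan_in, fan_out = 512, 256
target_var = 2.0 / fan_in  # variance both schemes aim for

W_n = he_normal(fan_in, fan_out)
W_u = he_uniform(fan_in, fan_out)

print(W_n.shape, W_u.shape)  # (512, 256) (512, 256)
# Both empirical variances should be within a few percent of target_var
print(W_n.var(), W_u.var(), target_var)
```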

Explanation

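He normal samples from N(0, std^2) with std = sqrt(2 / fan_in). The factor of 2 compensates for ReLU: for zero-mean, symmetric pre-activations, ReLU zeros the negative half, which halves the second moment of the signal passed to the next layer. Doubling the weight variance restores it.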
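
The factor of 2 follows from a short forward-pass variance argument. The sketch below assumes bias-free layers y_l = x_l W_l, i.i.d. zero-mean weights independent of the layer inputs, and pre-activations y_{l-1} that are symmetric about zero. Here n_in stands for fan_in.

```latex
\begin{aligned}
\operatorname{Var}[y_l] &= n_{\text{in}}\,\operatorname{Var}[w_l]\,\mathbb{E}[x_l^2],
  \qquad x_l = \max(0,\, y_{l-1}) \\
\mathbb{E}[x_l^2] &= \tfrac{1}{2}\operatorname{Var}[y_{l-1}]
  \qquad \text{(ReLU keeps only the positive half of a symmetric zero-mean } y_{l-1}\text{)} \\
\Rightarrow\ \operatorname{Var}[y_l] &= \tfrac{1}{2}\, n_{\text{in}}\,\operatorname{Var}[w_l]\,\operatorname{Var}[y_{l-1}] \\
\tfrac{1}{2}\, n_{\text{in}}\,\operatorname{Var}[w_l] = 1
  &\;\Rightarrow\; \operatorname{Var}[w_l] = \frac{2}{n_{\text{in}}},
  \qquad \text{std} = \sqrt{2 / n_{\text{in}}}
\end{aligned}
```
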
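He uniform samples from U[-limit, limit] with limit = sqrt(6 / fan_in). Because U[-a, a] has variance a^2 / 3, this gives the same weight variance as He normal: 6 / (3 * fan_in) = 2 / fan_in.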
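Xavier (Glorot) initialization sets the weight variance to 2 / (fan_in + fan_out). He initialization, in its standard fan_in mode, depends only on fan_in, because its derivation preserves the variance of the forward-pass signal through ReLU layers. He et al. also describe a fan_out variant, which instead preserves the variance of gradients in the backward pass.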
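
The effect of the factor of 2 shows up clearly when a signal is pushed through a deep stack of ReLU layers. The sketch below is only an illustration. The helper names `xavier_normal` and `pre_activation_second_moments`, the equal layer widths, zero biases, and standard-normal inputs are all assumptions chosen for this demo. It records the mean squared pre-activation E[y^2] at each layer under He and under Xavier normal initialization.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    # He normal: Var[w] = 2 / fan_in
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def xavier_normal(fan_in, fan_out, rng):
    # Glorot/Xavier normal, for comparison: Var[w] = 2 / (fan_in + fan_out)
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

def pre_activation_second_moments(init, width=256, depth=20, batch=1000, seed=0):
    """Push random inputs through `depth` bias-free ReLU layers of equal width
    and record the mean squared pre-activation E[y^2] at each layer."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch, width))
    moments = []
    for _ in range(depth):
        W = init(width, width, rng)
        y = x @ W                       # pre-activation of this layer
        moments.append(float(np.mean(y ** 2)))
        x = np.maximum(y, 0.0)          # ReLU output feeds the next layer
    return moments

he = pre_activation_second_moments(he_normal)
xa = pre_activation_second_moments(xavier_normal)
print(f"layer 20 E[y^2]: He = {he[-1]:.3f}, Xavier = {xa[-1]:.2e}")
```

With equal widths, Xavier gives Var[w] = 1 / fan_in. The derivation therefore predicts that E[y^2] halves at every ReLU layer and shrinks toward zero over 20 layers. He keeps it roughly at its first-layer value of about 2, up to finite-width sampling noise.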

Complexity

Time: O(fan_in * fan_out)
Space: O(fan_in * fan_out)
