Implement Xavier/Glorot Weight Initialization

#289 · Deep Learning · Medium

Problem

Implement Xavier (Glorot) weight initialization for neural networks. Given the number of input units (fan_in) and output units (fan_out), generate a weight matrix of shape (fan_in, fan_out) sampled from the Xavier uniform or Xavier normal distribution, so that signal variance stays roughly constant across layers.

Solution

import numpy as np

def xavier_uniform(fan_in: int, fan_out: int) -> np.ndarray:
    # U[-limit, limit] has variance limit^2 / 3 = 2 / (fan_in + fan_out).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

def xavier_normal(fan_in: int, fan_out: int) -> np.ndarray:
    # Zero-mean Gaussian with the same variance, 2 / (fan_in + fan_out).
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return np.random.normal(0.0, std, size=(fan_in, fan_out))

Explanation

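Xavier uniform samples weights from U[-limit, limit], where limit = sqrt(6 / (fan_in + fan_out)). A uniform distribution on [-a, a] has variance a^2 / 3, so the weights have variance 2 / (fan_in + fan_out).
Xavier normal samples from N(0, std^2), where std = sqrt(2 / (fan_in + fan_out)), which gives the same variance.
This variance is a compromise between 1 / fan_in, which keeps activation variance constant in the forward pass, and 1 / fan_out, which keeps gradient variance constant in the backward pass. Both approaches keep the signal from exploding or vanishing as it passes through layers, which makes training more stable. They are best suited to layers with tanh or sigmoid activations.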
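
The variance claims above can be checked empirically. The following is a minimal sketch, not part of the reference solution: it assumes an explicit np.random.Generator with an arbitrary seed for reproducibility, and it introduces two hypothetical helpers, final_activation_std and small_normal, purely for illustration. It first compares the sample variance of both variants with 2 / (fan_in + fan_out), then pushes a random batch through ten tanh layers to compare Xavier initialization against a fixed, too-small standard deviation.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    # Same formula as the solution, but drawing from an explicit Generator
    # (an assumption made here for reproducibility).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def xavier_normal(fan_in, fan_out, rng):
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def small_normal(fan_in, fan_out, rng):
    # Hypothetical baseline: fixed std of 0.01, ignoring the layer size.
    return rng.normal(0.0, 0.01, size=(fan_in, fan_out))

def final_activation_std(init, width, depth, rng):
    # Push a standard-normal batch through `depth` tanh layers of equal width
    # and return the standard deviation of the last layer's activations.
    x = rng.normal(0.0, 1.0, size=(1000, width))
    for _ in range(depth):
        x = np.tanh(x @ init(width, width, rng))
    return x.std()

rng = np.random.default_rng(0)  # arbitrary seed

# 1) Both variants should have variance close to 2 / (fan_in + fan_out).
fan_in, fan_out = 512, 256
target_var = 2.0 / (fan_in + fan_out)
var_uniform = xavier_uniform(fan_in, fan_out, rng).var()
var_normal = xavier_normal(fan_in, fan_out, rng).var()
print(f"target {target_var:.6f}  uniform {var_uniform:.6f}  normal {var_normal:.6f}")

# 2) Activation scale after 10 tanh layers, Xavier vs. the small-std baseline.
std_xavier = final_activation_std(xavier_uniform, width=256, depth=10, rng=rng)
std_small = final_activation_std(small_normal, width=256, depth=10, rng=rng)
print(f"after 10 tanh layers: xavier {std_xavier:.4f}  small-std {std_small:.2e}")
```

With the small-std baseline, each layer shrinks the signal by roughly 0.01 * sqrt(256) = 0.16, so the activations collapse toward zero within a few layers, while the Xavier-initialized stack should keep them at a usable scale.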

Complexity

Time: O(fan_in * fan_out)
Space: O(fan_in * fan_out)
