Implement Xavier/Glorot Weight Initialization

#369 · Deep Learning · Easy

Problem

Implement Xavier/Glorot weight initialization for neural network layers. This method draws weights from a zero-mean distribution whose variance is scaled by the number of input neurons (fan_in) and output neurons (fan_out), which keeps the variance of activations and gradients roughly stable across layers.

Solution

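import numpy as np

def xavier_init(fan_in: int, fan_out: int, mode: str = "uniform") -> np.ndarray:
    """Return a (fan_in, fan_out) weight matrix drawn with Xavier/Glorot scaling."""
    if mode == "uniform":
        # U[-limit, limit] has variance limit**2 / 3 = 2 / (fan_in + fan_out)
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return np.random.uniform(-limit, limit, (fan_in, fan_out))
    elif mode == "normal":
        # Zero-mean Gaussian with the same variance, 2 / (fan_in + fan_out)
        std = np.sqrt(2.0 / (fan_in + fan_out))
        return np.random.normal(0, std, (fan_in, fan_out))
    else:
        raise ValueError(f"Unknown mode: {mode}")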
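
The solution draws from NumPy's global random state, so results change from run to run. A minimal sketch of a reproducible variant follows. The name xavier_init_seeded, the seed parameter, and the positivity check are assumptions added for illustration and are not part of the original solution.

```python
import numpy as np

def xavier_init_seeded(fan_in, fan_out, mode="uniform", seed=None):
    """Hypothetical seeded variant of xavier_init (illustration only)."""
    if fan_in <= 0 or fan_out <= 0:
        raise ValueError("fan_in and fan_out must be positive")
    # A local Generator keeps the draw reproducible without touching global state.
    rng = np.random.default_rng(seed)
    if mode == "uniform":
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))
    if mode == "normal":
        std = np.sqrt(2.0 / (fan_in + fan_out))
        return rng.normal(0.0, std, size=(fan_in, fan_out))
    raise ValueError(f"Unknown mode: {mode}")

# Usage: the same seed yields the same matrix.
w = xavier_init_seeded(4, 3, mode="uniform", seed=0)
print(w.shape)  # (4, 3)
```

Passing an explicit seed is convenient in tests; leaving it as None falls back to fresh entropy on each call.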

Explanation

Xavier uniform draws weights from U[-limit, limit] where limit = sqrt(6 / (fan_in + fan_out)).
Xavier normal draws weights from a normal distribution with mean 0 and standard deviation std, i.e. N(0, std^2), where std = sqrt(2 / (fan_in + fan_out)).
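Both modes give the weights the same variance, 2 / (fan_in + fan_out): a uniform distribution on [-limit, limit] has variance limit^2 / 3, which equals 6 / (3 * (fan_in + fan_out)). This scaling keeps the variance of activations (forward pass) and gradients (backward pass) approximately constant across layers, which helps prevent vanishing or exploding values.
Xavier initialization works best with activations such as tanh and sigmoid. The derivation assumes an activation that is approximately linear and symmetric around zero, which holds most closely for tanh.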
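
The claims above can be checked numerically. The sketch below is an illustration under assumed settings (layer width 256, depth 10, a batch of 1000 standard-normal inputs, and a comparison standard deviation of 0.01); the helper names weight_variance and final_activation_std are hypothetical. It compares the sampled weight variance with 2 / (fan_in + fan_out), then pushes random inputs through a stack of tanh layers to compare Xavier scaling against a much smaller fixed scale.

```python
import numpy as np

def weight_variance(fan_in, fan_out, mode, rng):
    """Return (empirical, theoretical) variance of Xavier-initialized weights."""
    target = 2.0 / (fan_in + fan_out)
    if mode == "uniform":
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        w = rng.uniform(-limit, limit, size=(fan_in, fan_out))
    else:  # treated as "normal"
        w = rng.normal(0.0, np.sqrt(target), size=(fan_in, fan_out))
    return float(w.var()), target

def final_activation_std(width, depth, weight_std, rng):
    """Pass random inputs through `depth` tanh layers; return the std of the last activations."""
    x = rng.normal(0.0, 1.0, size=(1000, width))
    for _ in range(depth):
        w = rng.normal(0.0, weight_std, size=(width, width))
        x = np.tanh(x @ w)
    return float(x.std())

rng = np.random.default_rng(0)

# 1) Both modes should land close to the same target variance.
for mode in ("uniform", "normal"):
    empirical, target = weight_variance(512, 256, mode, rng)
    print(f"{mode}: empirical={empirical:.6f} target={target:.6f}")

# 2) Signal propagation: Xavier scale versus a much smaller fixed scale.
width, depth = 256, 10
xavier_std = np.sqrt(2.0 / (width + width))
print("xavier   :", final_activation_std(width, depth, xavier_std, rng))
print("std=0.01 :", final_activation_std(width, depth, 0.01, rng))
```

With the small fixed scale, each layer shrinks the signal by roughly 0.01 * sqrt(256) = 0.16, so the activations collapse toward zero after a few layers, while the Xavier-scaled stack keeps them at a usable magnitude.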

Complexity

Time: O(fan_in * fan_out) to generate the weight matrix
Space: O(fan_in * fan_out) for the weight matrix
