Implement Label Smoothing for Multi-Class Cross-Entropy

#194 · Machine Learning · Medium

Problem

Implement label smoothing for multi-class cross-entropy loss. Instead of using hard one-hot targets, blend each true label with a uniform distribution over all K classes, weighted by a smoothing parameter epsilon.
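As a concrete illustration, here is the blended target for a single sample. The values K = 4 classes, epsilon = 0.1 and true class index 2 are assumptions chosen only for this example. The true class keeps 1 - epsilon + epsilon / K = 0.925, and every other class receives epsilon / K = 0.025:

```python
import numpy as np

K = 4            # number of classes (illustrative value)
epsilon = 0.1    # smoothing parameter (illustrative value)
true_class = 2   # index of the correct class (illustrative value)

# Hard one-hot target for the sample
one_hot = np.zeros(K)
one_hot[true_class] = 1.0

# Blend the hard target with the uniform distribution 1/K over all classes
smoothed = (1.0 - epsilon) * one_hot + epsilon * np.full(K, 1.0 / K)
print(smoothed)
```

The smoothed vector is still a valid probability distribution: its entries sum to 1.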

Solution

import numpy as np

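def label_smoothing_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray,
                                   epsilon: float = 0.1) -> float:
    """Mean cross-entropy between label-smoothed targets and predicted probabilities.

    y_true: class indices of shape (n,) or one-hot targets of shape (n, K).
    y_pred: predicted probabilities of shape (n, K), e.g. softmax outputs.
    epsilon: fraction of probability mass spread uniformly over the K classes.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred, dtype=float)

    if y_true.ndim == 1:
        # Convert class indices to a one-hot matrix of shape (n, K)
        n_classes = y_pred.shape[-1]
        n_samples = len(y_true)
        one_hot = np.zeros((n_samples, n_classes))
        one_hot[np.arange(n_samples), y_true.astype(int)] = 1.0
        y_true = one_hot

    # Blend each target row with the uniform distribution: (1 - eps) * y + eps / K
    n_classes = y_true.shape[-1]
    smoothed = y_true * (1.0 - epsilon) + epsilon / n_classes

    # Clip predictions to avoid log(0)
    y_pred = np.clip(y_pred, 1e-12, 1.0)

    # Per-sample cross-entropy with smoothed labels, averaged over the batch
    loss = -np.sum(smoothed * np.log(y_pred), axis=-1)
    return float(np.mean(loss))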
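The solution above expects y_pred to hold probabilities (for example softmax outputs), and the clip at 1e-12 keeps log(0) out of the computation. If a model produces raw logits instead, one common alternative, sketched here under that assumption, computes log-probabilities with a max-shifted log-sum-exp so no clipping is needed. The function name `label_smoothing_ce_from_logits` is hypothetical and not part of the original solution; it also assumes integer class indices rather than one-hot targets.

```python
import numpy as np

def label_smoothing_ce_from_logits(labels: np.ndarray, logits: np.ndarray,
                                   epsilon: float = 0.1) -> float:
    """Hypothetical variant: label-smoothed cross-entropy from raw logits.

    Assumes `labels` are integer class indices of shape (n,) and
    `logits` has shape (n, K).
    """
    labels = np.asarray(labels, dtype=int)
    logits = np.asarray(logits, dtype=float)
    n, k = logits.shape

    # Stable log-softmax: subtract the row max before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))

    # Smoothed targets: 1 - epsilon + epsilon/K on the true class, epsilon/K elsewhere
    targets = np.full((n, k), epsilon / k)
    targets[np.arange(n), labels] += 1.0 - epsilon

    # Per-sample cross-entropy, averaged over the batch
    return float(np.mean(-np.sum(targets * log_probs, axis=1)))

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
print(label_smoothing_ce_from_logits(labels, logits, epsilon=0.1))
```

With epsilon = 0 this reduces to ordinary softmax cross-entropy, which gives a quick sanity check for either version.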

Explanation

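If y_true contains class indices (a 1-D array), convert it to a one-hot matrix of shape (n, K).
Apply label smoothing: smoothed = (1 - epsilon) * one_hot + epsilon / K, where K is the number of classes.
This moves a fraction epsilon of the probability mass off the true class and spreads it uniformly over all K classes: the true class gets 1 - epsilon + epsilon / K, every other class gets epsilon / K, and each row still sums to 1.
Compute the per-sample cross-entropy against the smoothed targets, -sum(smoothed * log(y_pred)), and average it over the n samples.
Label smoothing acts as a regularizer: because the target for the true class is below 1, the loss no longer rewards pushing the predicted probability all the way to 1, which discourages overconfident predictions.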
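Two properties behind these steps can be checked numerically. Expanding the smoothed target gives loss = (1 - epsilon) * CE_hard + epsilon * CE_uniform, where CE_hard = -log p_true and CE_uniform = -(1/K) * sum_k log p_k. Also, for a single sample the smoothed loss is minimized when the prediction equals the smoothed target itself, so a nearly one-hot prediction scores worse. The sketch below assumes probability inputs and uses illustrative numbers; the helper `smoothed_ce` is hypothetical.

```python
import numpy as np

def smoothed_ce(true_class: int, probs: np.ndarray, epsilon: float) -> float:
    """Hypothetical helper: label-smoothed cross-entropy for one sample."""
    k = probs.shape[0]
    target = np.full(k, epsilon / k)
    target[true_class] += 1.0 - epsilon
    return float(-np.sum(target * np.log(probs)))

epsilon, k, true_class = 0.1, 4, 0
probs = np.array([0.7, 0.1, 0.15, 0.05])  # illustrative prediction

# 1) Decomposition: smoothed loss = (1 - eps) * hard CE + eps * uniform CE
hard_ce = -np.log(probs[true_class])
uniform_ce = -np.mean(np.log(probs))
decomposed = (1.0 - epsilon) * hard_ce + epsilon * uniform_ce
print(smoothed_ce(true_class, probs, epsilon), decomposed)

# 2) Overconfidence is penalized: predicting the smoothed target itself
#    beats a nearly one-hot prediction for the same true class
calibrated = np.full(k, epsilon / k)
calibrated[true_class] += 1.0 - epsilon        # [0.925, 0.025, 0.025, 0.025]
overconfident = np.array([0.997, 0.001, 0.001, 0.001])
print(smoothed_ce(true_class, calibrated, epsilon),
      smoothed_ce(true_class, overconfident, epsilon))
```

The second comparison follows from Gibbs' inequality: cross-entropy against a fixed target is smallest when the prediction matches that target.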

Complexity

Time: O(n * K) where n is the number of samples and K is the number of classes
Space: O(n * K) for the smoothed label matrix
