Implement Variational Autoencoder (VAE) Loss (ELBO)

#393 · Deep Learning · Medium

Problem

Implement the Variational Autoencoder (VAE) loss, which is the negative Evidence Lower Bound (ELBO). It consists of a reconstruction loss (how well the decoder recreates the input) and a KL divergence term (how close the latent distribution is to a standard normal prior).

Solution

import numpy as np

def vae_loss(x: np.ndarray, x_reconstructed: np.ndarray, mu: np.ndarray, log_var: np.ndarray) -> tuple[float, float, float]:
    # x, x_reconstructed: (batch_size, input_dim)
    # mu, log_var: (batch_size, latent_dim)

    # Reconstruction loss: squared error summed over input dimensions,
    # averaged over the batch (a per-sample sum, not a per-element mean)
    recon_loss = np.mean(np.sum((x - x_reconstructed) ** 2, axis=1))

    # KL divergence: -0.5 * sum(1 + log_var - mu^2 - exp(log_var))
    kl_loss = -0.5 * np.mean(np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1))

    total_loss = recon_loss + kl_loss
    return float(total_loss), float(recon_loss), float(kl_loss)


def reparameterize(mu: np.ndarray, log_var: np.ndarray) -> np.ndarray:
    std = np.exp(0.5 * log_var)
    eps = np.random.randn(*mu.shape)
    return mu + std * eps
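
As a quick sanity check, the loss can be evaluated on a one-sample batch whose values are easy to verify by hand. The inputs below are made up for illustration, and `vae_loss` is repeated in condensed form only so the snippet runs on its own.

```python
import numpy as np

def vae_loss(x, x_reconstructed, mu, log_var):
    # Condensed copy of the solution, repeated so this snippet is self-contained
    recon_loss = np.mean(np.sum((x - x_reconstructed) ** 2, axis=1))
    kl_loss = -0.5 * np.mean(np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1))
    return float(recon_loss + kl_loss), float(recon_loss), float(kl_loss)

# Illustrative inputs (assumed, not from the problem statement):
# batch of 1, input_dim = 2, latent_dim = 2
x = np.array([[1.0, 2.0]])
x_reconstructed = np.array([[1.5, 2.0]])
mu = np.array([[1.0, 0.0]])
log_var = np.zeros((1, 2))  # sigma^2 = 1 in both latent dimensions

total, recon, kl = vae_loss(x, x_reconstructed, mu, log_var)
# recon = 0.5^2 + 0^2 = 0.25
# kl    = -0.5 * ((1 + 0 - 1 - 1) + (1 + 0 - 0 - 1)) = 0.5
print(total, recon, kl)  # 0.75 0.25 0.5
```

Only the first latent dimension contributes to the KL term here, because the second already matches the prior (mu = 0, log_var = 0).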

Explanation

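Reconstruction loss: measures how well the decoder output matches the input. The code sums the squared error over input dimensions for each sample, then averages over the batch, so it is a per-sample sum of squared errors rather than a per-element MSE.
KL divergence: measures the divergence between the encoder's learned distribution N(mu, sigma^2) and the standard normal prior N(0, I). For a diagonal Gaussian the closed form is -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2), summed over latent dimensions, with log(sigma^2) = log_var and sigma^2 = exp(log_var).
Reparameterization trick: sample z = mu + sigma * epsilon with epsilon ~ N(0, I) and sigma = exp(0.5 * log_var). The randomness is isolated in epsilon, so gradients can flow through mu and log_var.
Total loss = reconstruction + KL, which is the negative ELBO; minimizing it maximizes the ELBO. The squared-error term corresponds to a fixed-variance Gaussian decoder, up to an additive constant.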
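
The closed-form KL and the reparameterization trick can be checked against each other numerically. The sketch below is an illustration under assumed values: `mu` and `log_var` are hypothetical encoder outputs for a single example, and the sample count and seed are arbitrary. It draws reparameterized samples and compares a Monte Carlo estimate of E_q[log q(z) - log p(z)] with the closed form.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the check is repeatable

# Hypothetical encoder outputs for one example with latent_dim = 3
mu = np.array([0.5, -1.0, 0.0])
log_var = np.array([0.2, -0.5, 0.0])
std = np.exp(0.5 * log_var)

# Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions
kl_closed = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))

# Reparameterized samples: z = mu + sigma * eps, eps ~ N(0, I)
eps = rng.standard_normal((200_000, mu.size))
z = mu + std * eps

# Log densities up to the shared -0.5 * log(2 * pi) term, which cancels
log_q = np.sum(-0.5 * log_var - 0.5 * eps ** 2, axis=1)  # (z - mu) / std == eps
log_p = np.sum(-0.5 * z ** 2, axis=1)
kl_mc = np.mean(log_q - log_p)

print(kl_closed, kl_mc)  # the two values should agree up to sampling noise
```

The agreement tightens as the number of samples grows, which is one reason the closed form is preferred in the loss: it is exact and adds no sampling variance.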

Complexity

Time: O(B * (D + L)), where B is the batch size, D the input dimension, and L the latent dimension
Space: O(B * (D + L)) for the intermediate arrays
