Implement Adam Optimization Algorithm

#49 · Deep Learning · Medium

Problem

Implement the Adam (Adaptive Moment Estimation) optimization algorithm. Adam maintains per-parameter exponential moving averages of the first moment (mean) and second moment (uncentered variance) of the gradients, and applies bias correction to both estimates to offset their zero initialization.

Solution

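import numpy as np

class Adam:
    """Adam optimizer that keeps per-parameter moment estimates between calls."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.lr = lr            # step size
        self.beta1 = beta1      # decay rate for the first moment
        self.beta2 = beta2      # decay rate for the second moment
        self.epsilon = epsilon  # small constant for numerical stability
        self.m = None           # first moment estimate, created on first update
        self.v = None           # second moment estimate, created on first update
        self.t = 0              # number of updates performed so far

    def update(self, weights, gradients):
        # Work on float64 copies so the caller's inputs are not modified.
        weights = np.array(weights, dtype=np.float64)
        gradients = np.array(gradients, dtype=np.float64)

        # Lazily create the moment arrays with the same shape as the weights.
        if self.m is None:
            self.m = np.zeros_like(weights)
            self.v = np.zeros_like(weights)

        self.t += 1

        # Biased exponential moving averages of the gradient and its square.
        self.m = self.beta1 * self.m + (1 - self.beta1) * gradients
        self.v = self.beta2 * self.v + (1 - self.beta2) * (gradients ** 2)

        # Bias correction for the zero initialization of m and v.
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)

        # Per-parameter update scaled by the root of the second moment.
        weights -= self.lr * m_hat / (np.sqrt(v_hat) + self.epsilon)

        return weights.tolist()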
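As a rough usage sketch, the optimizer can be driven in a loop on a toy problem. The quadratic objective, the target values, the learning rate of 0.1, and the 500 iterations below are all illustrative assumptions, not part of the problem statement. The class is repeated in condensed form only so the snippet runs on its own.

```python
import numpy as np

class Adam:
    """Condensed copy of the solution class so this snippet is self-contained."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.lr, self.beta1, self.beta2, self.epsilon = lr, beta1, beta2, epsilon
        self.m = self.v = None
        self.t = 0

    def update(self, weights, gradients):
        w = np.array(weights, dtype=np.float64)
        g = np.array(gradients, dtype=np.float64)
        if self.m is None:
            self.m, self.v = np.zeros_like(w), np.zeros_like(w)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return (w - self.lr * m_hat / (np.sqrt(v_hat) + self.epsilon)).tolist()

# Hypothetical objective: f(w) = sum((w - target)^2), with gradient 2 * (w - target).
target = np.array([3.0, -1.0])

opt = Adam(lr=0.1)       # larger than the default lr so the toy problem converges quickly
w = [0.0, 0.0]
for _ in range(500):
    grad = 2 * (np.array(w) - target)
    w = opt.update(w, grad)   # the same optimizer instance carries m, v, and t

final_w = w
print(final_w)           # expected to land close to the target
```

Note that the moment estimates and the step counter live on the instance, so one Adam object should be reused across iterations for a given set of weights.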

Explanation

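Initialize the first moment m and second moment v to zero arrays with the same shape as the weights on the first call.
Increment the step counter t.
Update the biased first moment estimate: m = beta1 * m + (1 - beta1) * grad.
Update the biased second moment estimate: v = beta2 * v + (1 - beta2) * grad^2, where the square is element-wise.
Compute the bias-corrected estimates: m_hat = m / (1 - beta1^t) and v_hat = v / (1 - beta2^t). Because m and v start at zero, the raw averages are biased toward zero during early steps, and this division compensates for that.
Update weights: w -= lr * m_hat / (sqrt(v_hat) + epsilon), where epsilon guards against division by zero.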
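The effect of bias correction is easiest to see on the very first step. The sketch below works through t = 1 by hand with the default hyperparameters. The gradient values are made up for illustration. With correction, m_hat equals the gradient and v_hat equals its square, so every parameter moves by roughly lr in the direction opposite its gradient, regardless of the gradient's scale.

```python
import numpy as np

lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8

# Illustrative gradients with very different magnitudes (assumed values).
grad = np.array([0.5, -20.0, 1e-3])

# First update from zero-initialized moments (t = 1).
t = 1
m = (1 - beta1) * grad        # beta1 * 0 + (1 - beta1) * grad
v = (1 - beta2) * grad ** 2   # beta2 * 0 + (1 - beta2) * grad^2

# Bias correction undoes the (1 - beta) shrinkage at t = 1.
m_hat = m / (1 - beta1 ** t)  # approximately grad
v_hat = v / (1 - beta2 ** t)  # approximately grad ** 2

corrected_step = lr * m_hat / (np.sqrt(v_hat) + eps)

# For comparison only: the same step computed from the uncorrected moments.
uncorrected_step = lr * m / (np.sqrt(v) + eps)

print(corrected_step)                     # each entry is close to lr * sign(grad)
print(uncorrected_step / corrected_step)  # close to (1 - beta1) / sqrt(1 - beta2)
```

With the defaults, (1 - beta1) / sqrt(1 - beta2) is about 3.16, so skipping the correction would make the first steps noticeably larger than lr.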

Complexity

Time: O(d) per update step, where d is the number of parameters
Space: O(d) for the two moment vectors m and v, plus an O(d) copy of the weights made on each call
