Momentum Optimizer

#146 · Deep Learning · Easy

Problem

Implement the Momentum optimizer. Momentum accelerates SGD by accumulating an exponentially decaying sum of past gradients in a velocity term, which helps it traverse ravines and reduces oscillation.
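
To make "exponentially decaying" concrete, here is the classical momentum update unrolled, assuming the velocity starts at zero. The symbols are notation chosen for this sketch, not taken from the problem statement: \mu is the momentum coefficient, \eta the learning rate, g_k the gradient at step k, and \theta the parameters.

```latex
v_t = \mu\, v_{t-1} - \eta\, g_t, \qquad \theta_t = \theta_{t-1} + v_t, \qquad v_0 = 0
\;\Longrightarrow\;
v_t = -\eta \sum_{k=1}^{t} \mu^{\,t-k}\, g_k
```

A gradient from j steps ago is therefore weighted by \mu^j, so with \mu = 0.9 its influence has dropped to roughly 35% after ten steps.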

Solution

import numpy as np


class MomentumOptimizer:
    """SGD with classical momentum; keeps the velocity between calls."""

    def __init__(self, learning_rate: float = 0.01, momentum: float = 0.9):
        self.lr = learning_rate
        self.momentum = momentum
        self.velocity = None  # allocated lazily, once the parameter shape is known

    def update(self, params: np.ndarray, grads: np.ndarray) -> np.ndarray:
        """Return the updated parameters; the input array is not modified."""
        if self.velocity is None:
            self.velocity = np.zeros_like(params)

        # Decay the previous velocity, then add the current gradient step.
        self.velocity = self.momentum * self.velocity - self.lr * grads
        params = params + self.velocity
        return params


def momentum_update(params: np.ndarray, grads: np.ndarray, velocity: np.ndarray,
                    lr: float = 0.01, momentum: float = 0.9) -> tuple[np.ndarray, np.ndarray]:
    """Stateless variant: the caller passes in and stores the velocity."""
    velocity = momentum * velocity - lr * grads
    params = params + velocity
    return params, velocity

Explanation

  1. Maintain a velocity vector initialized to zero.
  2. Each step, update velocity: v = momentum * v - lr * gradient.
  3. Update parameters: params = params + v.
  4. The momentum term (typically 0.9) keeps the optimizer moving in directions where successive gradients agree, while damping oscillations in directions where the gradient keeps changing sign.
  5. This is analogous to a ball rolling downhill with friction: it builds up speed in consistent directions, and the friction (momentum below 1) keeps that speed bounded.
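
The speed build-up described in the steps above can be checked numerically. The sketch below is illustrative only: `momentum_step` is a hypothetical helper that mirrors the update rule from steps 2 and 3, and the constant gradient is made up. Under a constant gradient g, the velocity should approach -lr * g / (1 - momentum), which is ten times the plain SGD step when momentum is 0.9.

```python
import numpy as np


def momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One step of the rule above (hypothetical helper for this sketch)."""
    velocity = momentum * velocity - lr * grads
    return params + velocity, velocity


lr, momentum = 0.01, 0.9
params = np.array([1.0, -2.0])
velocity = np.zeros_like(params)
constant_grad = np.array([1.0, 1.0])  # assumed constant, purely for illustration

first_step_velocity = None
for step in range(200):
    params, velocity = momentum_step(params, constant_grad, velocity, lr, momentum)
    if step == 0:
        first_step_velocity = velocity.copy()  # equals the plain SGD step, -lr * g

# Limit of the geometric series: -lr * g * (1 + m + m^2 + ...) = -lr * g / (1 - m)
terminal_velocity = -lr * constant_grad / (1.0 - momentum)

print("first step:   ", first_step_velocity)
print("after 200:    ", velocity)
print("predicted max:", terminal_velocity)
```

The first step matches plain SGD; after many steps in the same direction the per-step displacement settles near the predicted limit rather than growing without bound.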

Complexity

  • Time: O(P) per update where P = number of parameters
  • Space: O(P) for the velocity vector
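
The bounds above describe the algorithm, not a particular implementation. As one possible sketch, the hypothetical `momentum_update_inplace` below reuses the existing `params` and `velocity` buffers instead of allocating new arrays on every call. It assumes both are floating-point NumPy arrays of the same shape and that the caller accepts having them mutated. The expression `lr * grads` still creates one temporary of size P, so the cost stays O(P) in both time and space.

```python
import numpy as np


def momentum_update_inplace(params, grads, velocity, lr=0.01, momentum=0.9):
    """In-place momentum step (hypothetical variant): mutates params and velocity."""
    velocity *= momentum       # decay the old velocity in its own buffer
    velocity -= lr * grads     # lr * grads allocates one temporary of size P
    params += velocity         # apply the step without creating a new params array


params = np.array([1.0, 2.0])
velocity = np.zeros_like(params)
grads = np.array([0.5, -0.5])

momentum_update_inplace(params, grads, velocity)
print(params, velocity)
```

Because the arrays are modified in place, any other reference to `params` sees the update as well, which is a behavioural difference from a version that returns a fresh array.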