← back

Exponential Weighted Average of Rewards

#161 · Reinforcement Learning · Easy

⊣ Solve on deep-ml.com

Problem

Implement Exponential Weighted Average (Exponential Moving Average) for tracking rewards in a non-stationary bandit problem. Recent observations are weighted more heavily than older ones.

Solution

1
2
def exponential_weighted_average(old_average: float, new_value: float, alpha: float) -> float:
    return old_average + alpha * (new_value - old_average)

Explanation

  1. The update rule is: V_new = V_old + alpha * (new_value - V_old), equivalently V_new = (1 - alpha) * V_old + alpha * new_value.
  2. The parameter alpha (0 < alpha <= 1) controls how much weight is given to the new observation vs. the history.
  3. Larger alpha means more responsiveness to recent rewards (forgets faster).
  4. This is preferred over the sample average in non-stationary environments where the true reward distribution changes over time.

Complexity

  • Time: O(1)
  • Space: O(1)