Exponential Weighted Average of Rewards

#161 · Reinforcement Learning · Easy

Problem

Implement Exponential Weighted Average (Exponential Moving Average) for tracking rewards in a non-stationary bandit problem. Recent observations are weighted more heavily than older ones.

Solution

1

2

def exponential_weighted_average(old_average: float, new_value: float, alpha: float) -> float:
    return old_average + alpha * (new_value - old_average)

Explanation

The update rule is: V_new = V_old + alpha * (new_value - V_old), equivalently V_new = (1 - alpha) * V_old + alpha * new_value.
The parameter alpha (0 < alpha <= 1) controls how much weight is given to the new observation vs. the history.
Larger alpha means more responsiveness to recent rewards (forgets faster).
This is preferred over the sample average in non-stationary environments where the true reward distribution changes over time.

Complexity

Time: O(1)
Space: O(1)

← #160 #162 →