#161 · Reinforcement Learning · Easy
⊣ Solve on deep-ml.comImplement Exponential Weighted Average (Exponential Moving Average) for tracking rewards in a non-stationary bandit problem. Recent observations are weighted more heavily than older ones.
def exponential_weighted_average(old_average: float, new_value: float, alpha: float) -> float:
return old_average + alpha * (new_value - old_average)V_new = V_old + alpha * (new_value - V_old), equivalently V_new = (1 - alpha) * V_old + alpha * new_value.alpha (0 < alpha <= 1) controls how much weight is given to the new observation vs. the history.