#159 · Reinforcement Learning · Easy
⊣ Solve on deep-ml.comImplement Incremental Mean computation for online reward estimation in a bandit problem. Instead of storing all rewards and recomputing the mean, update the mean incrementally after each new observation.
def incremental_mean(old_mean: float, new_value: float, n: int) -> float:
return old_mean + (new_value - old_mean) / nnew_mean = old_mean + (new_value - old_mean) / n, where n is the count including the new observation.(new_value - old_mean) / n is the correction: if the new value is above the old mean, the mean increases; if below, it decreases.