← back

Incremental Mean for Online Reward Estimation

#159 · Reinforcement Learning · Easy

⊣ Solve on deep-ml.com

Problem

Implement Incremental Mean computation for online reward estimation in a bandit problem. Instead of storing all rewards and recomputing the mean, update the mean incrementally after each new observation.

Solution

1
2
def incremental_mean(old_mean: float, new_value: float, n: int) -> float:
    return old_mean + (new_value - old_mean) / n

Explanation

  1. The formula is: new_mean = old_mean + (new_value - old_mean) / n, where n is the count including the new observation.
  2. This is algebraically equivalent to computing the full mean from scratch, but only requires O(1) time and space per update.
  3. The term (new_value - old_mean) / n is the correction: if the new value is above the old mean, the mean increases; if below, it decreases.
  4. This is the foundation for many online learning algorithms.

Complexity

  • Time: O(1)
  • Space: O(1)