Incremental Mean for Online Reward Estimation

#159 · Reinforcement Learning · Easy

Problem

Implement Incremental Mean computation for online reward estimation in a bandit problem. Instead of storing all rewards and recomputing the mean, update the mean incrementally after each new observation.

Solution

1

2

def incremental_mean(old_mean: float, new_value: float, n: int) -> float:
    return old_mean + (new_value - old_mean) / n

Explanation

The formula is: new_mean = old_mean + (new_value - old_mean) / n, where n is the count including the new observation.
This is algebraically equivalent to computing the full mean from scratch, but only requires O(1) time and space per update.
The term (new_value - old_mean) / n is the correction: if the new value is above the old mean, the mean increases; if below, it decreases.
This is the foundation for many online learning algorithms.

Complexity

Time: O(1)
Space: O(1)

← #158 #160 →