Compute Temporal Difference Error

#257 · Reinforcement Learning · Easy

Problem

Compute the Temporal Difference (TD) error for a single step in reinforcement learning. Given the current state value, reward, next state value, discount factor, and (optionally) whether the episode terminated, compute the TD error.

Solution

TD error: delta = reward + gamma * V(next_state) * (1 - done) - V(current_state), where done is 1 if the episode terminated on this step and 0 otherwise.

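def td_error(
    v_current: float,
    reward: float,
    v_next: float,
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return the one-step TD error, rounded to 6 decimal places."""
    # Bootstrap from V(s') only if the episode continues; a terminal step has no future value.
    target = reward + gamma * v_next * (1.0 - float(done))
    # The TD error is the gap between the target and the current estimate V(s).
    error = target - v_current
    return round(error, 6)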
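
As a quick check of the formula, the sketch below evaluates it on two hypothetical transitions. The function is repeated in compact form only so the block runs on its own, and the input numbers are made up for illustration.

```python
def td_error(v_current, reward, v_next, gamma=0.99, done=False):
    # Same computation as the solution above, repeated so this block runs alone.
    target = reward + gamma * v_next * (1.0 - float(done))
    return round(target - v_current, 6)

# Hypothetical non-terminal step: target = 1.0 + 0.9 * 0.8 = 1.72
delta_continue = td_error(v_current=0.5, reward=1.0, v_next=0.8, gamma=0.9)

# Same step marked terminal: the bootstrap term is dropped, so target = 1.0
delta_terminal = td_error(v_current=0.5, reward=1.0, v_next=0.8, gamma=0.9, done=True)

print(delta_continue)  # approximately 1.22
print(delta_terminal)  # 0.5
```

Setting done=True only changes the target, not the current estimate, so the two errors differ by exactly the discounted next-state value (0.72 here).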

Explanation

The TD target is r + gamma * V(s') if the episode continues, or just r if the episode has ended.
The TD error (delta) is the difference between the target and the current estimate V(s).
A positive TD error means the observed outcome was better than V(s) predicted; a negative one means it was worse.
This error drives the value update V(s) <- V(s) + alpha * delta, where alpha is the learning rate (step size).
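
The update rule above can be sketched as a tabular TD(0) step. This is a minimal illustration, not part of the original solution: the helper name td0_update, the dictionary-based value table, the default of 0.0 for unseen states, and the example states "A" and "B" are all assumptions.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.99, done=False):
    """Apply one TD(0) update to `values` in place and return the TD error."""
    # Assumption: states missing from the table are treated as having value 0.0.
    v_current = values.get(state, 0.0)
    # A terminal transition has no future value to bootstrap from.
    v_next = 0.0 if done else values.get(next_state, 0.0)
    delta = reward + gamma * v_next - v_current
    # Move V(s) a fraction alpha of the way toward the TD target.
    values[state] = v_current + alpha * delta
    return delta

# Hypothetical two-state value table.
values = {"A": 0.0, "B": 1.0}
delta = td0_update(values, "A", reward=0.5, next_state="B", alpha=0.1, gamma=0.9)
# delta = 0.5 + 0.9 * 1.0 - 0.0 = 1.4, so V(A) moves from 0.0 to about 0.14
print(delta, values["A"])
```

Only the visited state's estimate changes; V(B) is read for bootstrapping but left untouched.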

Complexity

Time: O(1)
Space: O(1)
