
Compute Temporal Difference Error

#257 · Reinforcement Learning · Easy

⊣ Solve on deep-ml.com

Problem

Compute the Temporal Difference (TD) error for a single step in reinforcement learning. Given the current state value, reward, next state value, discount factor, and (optionally) whether the episode terminated, compute the TD error.

Solution

TD error = reward + gamma * V(next_state) * (1 - done) - V(current_state), where done is 1 if the episode terminated on this step and 0 otherwise.

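def td_error(
    v_current: float,
    reward: float,
    v_next: float,
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return the one-step TD error: target minus the current estimate V(s)."""
    # TD target: r + gamma * V(s'); the bootstrap term is zeroed when done is True.
    target = reward + gamma * v_next * (1.0 - float(done))
    # TD error (delta): how far the current estimate is from the target.
    error = target - v_current
    # Round to six decimal places.
    return round(error, 6)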
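
As a quick sanity check, the snippet below evaluates the formula on one continuing step and one terminal step. The input values are made up for illustration, and the function is restated in compact form only so the snippet runs on its own.

```python
def td_error(v_current: float, reward: float, v_next: float,
             gamma: float = 0.99, done: bool = False) -> float:
    # Compact restatement of the solution above.
    target = reward + gamma * v_next * (1.0 - float(done))
    return round(target - v_current, 6)


# Continuing step: target = 0.5 + 0.9 * 2.0 = 2.3, so delta = 2.3 - 1.0 = 1.3
delta_continue = td_error(v_current=1.0, reward=0.5, v_next=2.0, gamma=0.9)

# Terminal step: V(s') is ignored, target = 0.5, so delta = 0.5 - 1.0 = -0.5
delta_terminal = td_error(v_current=1.0, reward=0.5, v_next=2.0, gamma=0.9, done=True)

print(delta_continue, delta_terminal)
```

With the same inputs, the done flag alone flips the sign of the error here, because the terminal case drops the gamma * V(s') term from the target.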

Explanation

  1. The TD target is r + gamma * V(s') if the episode continues, or just r if the episode has ended.
  2. The TD error (delta) is the difference between the target and the current estimate V(s).
  3. A positive TD error means the outcome was better than expected; negative means worse.
  4. This error is used to update value estimates: V(s) <- V(s) + alpha * delta, where alpha is the learning rate.
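
The update rule in step 4 can be sketched as a single tabular TD(0) step. This is a minimal illustration, not part of the graded solution: the helper name `td0_update`, the dictionary-based value table, the default of 0.0 for unseen states, and the example numbers are all assumptions.

```python
from typing import Dict


def td0_update(
    values: Dict[str, float],
    state: str,
    reward: float,
    next_state: str,
    alpha: float = 0.1,
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Apply one TD(0) update to values[state] in place and return delta."""
    # Unseen states default to 0.0 (an assumption for this sketch).
    v_current = values.get(state, 0.0)
    # No bootstrapping from the next state once the episode has ended.
    v_next = 0.0 if done else values.get(next_state, 0.0)
    delta = reward + gamma * v_next - v_current
    # V(s) <- V(s) + alpha * delta
    values[state] = v_current + alpha * delta
    return delta


values = {"A": 0.0, "B": 1.0}
delta = td0_update(values, "A", reward=1.0, next_state="B", alpha=0.5, gamma=0.9)
# delta is about 1.9 (1.0 + 0.9 * 1.0 - 0.0); values["A"] moves from 0.0 to about 0.95
print(delta, values["A"])
```

Because alpha is 0.5 in this example, V(A) moves halfway from its old value toward the TD target of 1.9.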

Complexity

  • Time: O(1)
  • Space: O(1)