← back

Calculate the Discounted Return for a Given Trajectory

#167 · Reinforcement Learning · Easy

⊣ Solve on deep-ml.com

Problem

Calculate the Discounted Return for a given trajectory of (state, action, reward) tuples with a discount factor gamma. This is equivalent to computing the return starting from the first time step.

Solution

1
2
3
4
5
6
7
8
9
from typing import List, Tuple

def trajectory_return(trajectory: List[Tuple[any, any, float]],
                      gamma: float) -> float:
    rewards = [r for (_, _, r) in trajectory]
    G = 0.0
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
    return G

Explanation

  1. Extract the reward at each time step from the trajectory tuples.
  2. Compute the return using backward accumulation: starting from the last reward and working backward.
  3. At each step: G_t = r_t + gamma * G_{t+1}.
  4. The final result is G_0, the discounted return from the start of the trajectory.

Complexity

  • Time: O(n) where n is the length of the trajectory
  • Space: O(n) for the extracted rewards list (can be O(1) by indexing directly)