Calculate the Discounted Return for a Given Trajectory

#167 · Reinforcement Learning · Easy

Problem

Calculate the Discounted Return for a given trajectory of (state, action, reward) tuples with a discount factor gamma. This is equivalent to computing the return starting from the first time step.

Solution

from typing import List, Tuple

def trajectory_return(trajectory: List[Tuple[any, any, float]],
                      gamma: float) -> float:
    rewards = [r for (_, _, r) in trajectory]
    G = 0.0
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
    return G

Explanation

Extract the reward at each time step from the trajectory tuples.
Compute the return using backward accumulation: starting from the last reward and working backward.
At each step: G_t = r_t + gamma * G_{t+1}.
The final result is G_0, the discounted return from the start of the trajectory.

Complexity

Time: O(n) where n is the length of the trajectory
Space: O(n) for the extracted rewards list (can be O(1) by indexing directly)

← #166 #168 →