#167 · Reinforcement Learning · Easy
Calculate the discounted return of a trajectory, given as a list of (state, action, reward) tuples, for a discount factor gamma. This is the return measured from the first time step: each reward r_t is weighted by gamma^t, and the weighted rewards are summed.
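As a worked illustration of the definition, take a hypothetical three-step trajectory with rewards 1, 2, 3 and gamma = 0.5. These numbers are made up for the example and are not part of the problem statement. The symbol T for the trajectory length is likewise an assumption.

```latex
G_0 = \sum_{t=0}^{T-1} \gamma^{t} r_t
    = r_0 + \gamma r_1 + \gamma^{2} r_2
    = 1 + 0.5 \cdot 2 + 0.25 \cdot 3
    = 2.75
```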
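from typing import Any, List, Tuple

def trajectory_return(trajectory: List[Tuple[Any, Any, float]],
                      gamma: float) -> float:
    # Only the rewards matter; states and actions do not affect the return.
    rewards = [r for (_, _, r) in trajectory]
    G = 0.0
    # Walk backward so each step applies G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
    return G

The loop applies the recurrence G_t = r_t + gamma * G_{t+1}, starting from G = 0 after the final step.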
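One way to sanity-check the backward recurrence is to compare it with the direct forward sum of gamma^t * r_t on a small input. The sketch below is only an illustration: the helper names `discounted_return_backward` and `discounted_return_direct`, the trajectory, and the value of gamma are all hypothetical and chosen so the arithmetic is easy to verify by hand.

```python
from typing import Any, List, Tuple

Trajectory = List[Tuple[Any, Any, float]]


def discounted_return_backward(trajectory: Trajectory, gamma: float) -> float:
    """Backward recurrence: G_t = r_t + gamma * G_{t+1}, with G = 0 past the end."""
    G = 0.0
    for _, _, r in reversed(trajectory):
        G = r + gamma * G
    return G


def discounted_return_direct(trajectory: Trajectory, gamma: float) -> float:
    """Forward sum: G_0 = sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, (_, _, r) in enumerate(trajectory))


# Hypothetical trajectory; state and action labels are placeholders.
trajectory = [("s0", "a0", 1.0), ("s1", "a1", 2.0), ("s2", "a2", 3.0)]
gamma = 0.5

# Backward pass by hand: 3.0 -> 2.0 + 0.5 * 3.0 = 3.5 -> 1.0 + 0.5 * 3.5 = 2.75
print(discounted_return_backward(trajectory, gamma))  # 2.75
print(discounted_return_direct(trajectory, gamma))    # 2.75
```

Both functions do O(T) work for a trajectory of length T. The backward form avoids computing powers of gamma, and its intermediate values of G are the returns from each later time step.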