Compute the Discounted Return (also called discounted cumulative reward) for a sequence of rewards. Given a list of rewards and a discount factor gamma, compute G = r_0 + gamma * r_1 + gamma^2 * r_2 + ....
from typing import List
def discounted_return(rewards: List[float], gamma: float) -> float:
G = 0.0
for t in reversed(range(len(rewards))):
G = rewards[t] + gamma * G
return GG = r_t + gamma * G, where G accumulates the future discounted return.