Compute Discounted Return

#165 · Reinforcement Learning · Easy

Problem

Compute the Discounted Return (also called discounted cumulative reward) for a sequence of rewards. Given a list of rewards and a discount factor gamma, compute G = r_0 + gamma * r_1 + gamma^2 * r_2 + ....

Solution

1

2

3

4

5

6

7

from typing import List

def discounted_return(rewards: List[float], gamma: float) -> float:
    G = 0.0
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
    return G

Explanation

Process rewards from the last time step to the first (backward).
At each step: G = r_t + gamma * G, where G accumulates the future discounted return.
This backward computation is efficient and avoids computing powers of gamma explicitly.
The discount factor gamma (0 <= gamma <= 1) controls how much future rewards are valued relative to immediate ones.

Complexity

Time: O(n) where n is the number of rewards
Space: O(1)

← #164 #166 →