← back

Compute Discounted Return

#165 · Reinforcement Learning · Easy

⊣ Solve on deep-ml.com

Problem

Compute the Discounted Return (also called discounted cumulative reward) for a sequence of rewards. Given a list of rewards and a discount factor gamma, compute G = r_0 + gamma * r_1 + gamma^2 * r_2 + ....

Solution

1
2
3
4
5
6
7
from typing import List

def discounted_return(rewards: List[float], gamma: float) -> float:
    G = 0.0
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
    return G

Explanation

  1. Process rewards from the last time step to the first (backward).
  2. At each step: G = r_t + gamma * G, where G accumulates the future discounted return.
  3. This backward computation is efficient and avoids computing powers of gamma explicitly.
  4. The discount factor gamma (0 <= gamma <= 1) controls how much future rewards are valued relative to immediate ones.

Complexity

  • Time: O(n) where n is the number of rewards
  • Space: O(1)