← back

Dynamic Programming

Probability / Expected Value DP

DP where states store probabilities instead of counts or values. Transitions multiply by transition probabilities and sum over predecessors. Used for random walks, dice games, and card drawing problems.

Probability and Expected Value DP

A knight is placed on a chessboard and makes kk random moves. Each move is chosen uniformly at random from the 8 standard knight moves. What is the probability that the knight remains on the board after all kk moves? This is LeetCode 688. Knight Probability in Chessboard, and it is the cleanest introduction to probability DP: a family of problems where DP states store probabilities instead of counts or values, and transitions multiply by transition probabilities.

The Core Idea

In a standard counting DP, you might define dp[step][row][col] as the number of ways to reach cell (row, col) after step moves. Probability DP is almost identical, except dp[step][row][col] represents the probability of being at cell (row, col) after step moves, given that the knight started at a specific cell and is still on the board.

The transition is natural. At each step, the knight at any cell moves to one of 8 neighbors with probability 18\frac{1}{8} each. So the probability of being at cell (r, c) after step s is the sum of probabilities from all 8 predecessor cells at step s-1, each divided by 8:

dp[s][r][c] = sum(dp[s-1][pr][pc] / 8) for all valid predecessors (pr, pc)

If a predecessor cell is off the board, it contributes 0, since the knight already left the board from that path. The answer is the sum of dp[k][r][c] over all cells (r, c) on the board.

Walked Example: Knight Probability

Consider a 3×33 \times 3 board (n=3n = 3), the knight starts at (0, 0), and k=1k = 1.

Step 0 (base case): The knight is at (0, 0) with probability 1.

1
2
3
4
dp[0]:
1.000  0.000  0.000
0.000  0.000  0.000
0.000  0.000  0.000

Step 1: From (0, 0), the 8 possible knight moves lead to cells: (1, 2), (2, 1), and six positions that are off the board. Each destination gets probability 18\frac{1}{8}. The six off-board moves represent the knight leaving the board.

1
2
3
4
dp[1]:
0.000  0.000  0.000
0.000  0.000  0.125
0.125  0.000  0.000

The total probability of still being on the board after 1 move is 0.125+0.125=0.250.125 + 0.125 = 0.25. Only 2 out of 8 possible moves kept the knight on a 3×33 \times 3 board.

Code for Knight Probability

For LeetCode 688, the probability DP translates directly into code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
def knightProbability(n: int, k: int, row: int, column: int) -> float:
    moves = [(-2,-1),(-2,1),(-1,-2),(-1,2),
             (1,-2),(1,2),(2,-1),(2,1)]

    # dp[r][c] = probability of being at (r, c) after current step
    dp = [[0.0] * n for _ in range(n)]
    dp[row][column] = 1.0

    for _ in range(k):
        new_dp = [[0.0] * n for _ in range(n)]
        for r in range(n):
            for c in range(n):
                if dp[r][c] > 0:
                    for dr, dc in moves:
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < n and 0 <= nc < n:
                            new_dp[nr][nc] += dp[r][c] / 8.0
        dp = new_dp

    return sum(dp[r][c] for r in range(n) for c in range(n))

Notice the space optimization: we only keep two layers (dp and new_dp) rather than the full 3D table, since each step depends only on the previous step.

Soup Servings: Probability with Memoization

LeetCode 808. Soup Servings presents a different flavor of probability DP. There are two types of soup, A and B, each starting with nn ml. Each turn, one of four operations is chosen with equal probability (14\frac{1}{4}), consuming different amounts of A and B. Find the probability that soup A empties first, plus half the probability that both empty at the same time.

The state is (a, b), the remaining amounts of soup A and B. The four operations reduce (a, b) by (100, 0), (75, 25), (50, 50), or (25, 75). Since all amounts are multiples of 25, divide everything by 25 to shrink the state space. Define dp(a, b) as the desired probability starting from state (a, b):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
from functools import lru_cache

def soupServings(n: int) -> float:
    # For large n, the answer converges to 1.0
    if n >= 4800:
        return 1.0

    m = (n + 24) // 25  # convert to units of 25

    @lru_cache(maxsize=None)
    def dp(a: int, b: int) -> float:
        if a <= 0 and b <= 0:
            return 0.5  # both empty simultaneously
        if a <= 0:
            return 1.0  # A empties first
        if b <= 0:
            return 0.0  # B empties first
        return 0.25 * (dp(a-4, b) + dp(a-3, b-1) +
                       dp(a-2, b-2) + dp(a-1, b-3))

    return dp(m, m)

The critical optimization is the early return for large nn. Because operation 1 consumes 4 units of A and 0 of B while the most B-heavy option only consumes 3 of B, soup A is statistically favored to empty first. For n4800n \geq 4800, the probability is so close to 1.0 that the answer rounds correctly. Without this check, the state space would be too large.

New 21 Game: Probability with Sliding Window

LeetCode 837. New 21 Game asks: you start with 0 points. Each draw adds a random number in [1,maxPts][1, maxPts] uniformly. You stop drawing when you have k\geq k points. What is the probability that your total is n\leq n?

Define dp[x] as the probability of reaching exactly x points at some point during the game. The base case is dp[0] = 1. For x in [1,k+maxPts1][1, k + maxPts - 1], the value dp[x] is the sum of dp[j] / maxPts for all j in [xmaxPts,x1][x - maxPts, x - 1] that satisfy j<kj < k (since you stop drawing at kk or above). This is because you could have been at any such j and drawn the value x - j.

The naive approach would compute this sum in O(maxPts)O(maxPts) per state. A sliding window makes it O(1)O(1). Maintain a running sum windowSum of all dp[j] values where jj is in the active window:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
def new21Game(n: int, k: int, maxPts: int) -> float:
    if k == 0 or n >= k + maxPts - 1:
        return 1.0

    dp = [0.0] * (k + maxPts)
    dp[0] = 1.0
    window_sum = 1.0  # sum of dp[j] for j in the active window

    for x in range(1, k + maxPts):
        dp[x] = window_sum / maxPts

        # Update sliding window
        if x < k:
            window_sum += dp[x]   # x enters the window
        if x >= maxPts:
            window_sum -= dp[x - maxPts]  # x - maxPts exits the window

    return sum(dp[k:n+1])

Walked example, n=6n = 6, k=1k = 1, maxPts=10maxPts = 10:

You draw once (since k=1k = 1, you stop after reaching 1 or more). The draw is uniform in [1,10][1, 10]. You need the result to be 6\leq 6. The probability is 610=0.6\frac{6}{10} = 0.6.

With n=21n = 21, k=17k = 17, maxPts=10maxPts = 10: the sliding window tracks which previous states can transition to the current state, avoiding redundant summation. Each state is computed in constant time by adding the newest contributor and removing the one that just fell out of range.

The Key Insight

Probability DP is structurally identical to counting DP. The difference is purely semantic:

  • In counting DP, dp[state] = number of ways to reach this state. Transitions add contributions from predecessor states.
  • In probability DP, dp[state] = probability of reaching this state. Transitions add contributions from predecessors, each multiplied by the transition probability.

If all transitions from a state are equally likely among mm options, you simply divide by mm. If transitions have different weights, you multiply by the appropriate probability for each edge. The DP recurrence structure (which states feed into which) is exactly the same as it would be for a counting problem. Only the arithmetic changes.

This also extends to expected value. If dp[state] represents the expected value of some quantity (expected cost, expected number of steps), the transition becomes:

E[state] = sum over successors s of: P(transition to s) * (cost of transition + E[s])

This is just the linearity of expectation applied to the DP structure.

Complexity

Knight Probability (688): Time O(kn2)O(k \cdot n^2): for each of kk steps, process all n2n^2 cells. Space O(n2)O(n^2) with the two-layer optimization.

Soup Servings (808): Time O(m2)O(m^2) where m=n/25m = n / 25, capped at around 200200 due to the early return for large nn. Space O(m2)O(m^2) for the memoization table.

New 21 Game (837): Time O(k+maxPts)O(k + maxPts) thanks to the sliding window: each state is computed in O(1)O(1). Space O(k+maxPts)O(k + maxPts) for the DP array.

The common thread: probability DP has the same time and space complexity as the equivalent counting DP. The transition probabilities are just constant-factor multiplications that do not change the asymptotic cost.

Recognizing this pattern

You're probably looking at probability / expected-value DP when:

  • The problem asks for a probability, an expected value, or an expected count, not a yes/no, not a maximum, not a path.
  • A random process drives state transitions: dice, coin flips, uniform draws, random moves, shuffled decks.
  • States transition with equal weights 1/m1/m (often) or known weights summing to 1, and the recurrence is "sum over next states of (prob \cdot child value)."
  • The number of distinct states is small enough to memoize even though the underlying sample space is exponential.
  • The problem accepts an answer within an absolute tolerance like 10510^{-5}, which is the giveaway that floating-point arithmetic is intended.

Common templates:

  • Random walk / probability of position after k steps: DP over (steps remaining, position). Example: Knight Probability in Chessboard.
  • Stopping process with absorbing state: recurse on state until terminal condition, weighting by transition probability. Example: Soup Servings.
  • Probability over a running sum with bounded draws: sliding-window DP over (current score). Example: New 21 Game.
  • Expected value (linearity of expectation): recursion of the form E[s]=P(ss)(cost+E[s])E[s] = \sum P(s \to s') (cost + E[s']). Example: Dice Roll Simulation.
  • Probability after merging / partitioning: count outcomes both numerator and denominator with DP. Example: Domino and Tromino Tiling.