Dynamic Programming
DP where states store probabilities instead of counts or values. Transitions multiply by transition probabilities and sum over predecessors. Used for random walks, dice games, and card drawing problems.
A knight is placed on a chessboard and makes random moves. Each move is chosen uniformly at random from the 8 standard knight moves. What is the probability that the knight remains on the board after all moves? This is LeetCode 688. Knight Probability in Chessboard, and it is the cleanest introduction to probability DP: a family of problems where DP states store probabilities instead of counts or values, and transitions multiply by transition probabilities.
In a standard counting DP, you might define dp[step][row][col] as the number of ways to reach cell (row, col) after step moves. Probability DP is almost identical, except dp[step][row][col] represents the probability of being at cell (row, col) after step moves, given that the knight started at a specific cell and is still on the board.
The transition is natural. At each step, the knight at any cell moves to one of 8 neighbors with probability each. So the probability of being at cell (r, c) after step s is the sum of probabilities from all 8 predecessor cells at step s-1, each divided by 8:
dp[s][r][c] = sum(dp[s-1][pr][pc] / 8) for all valid predecessors (pr, pc)
If a predecessor cell is off the board, it contributes 0, since the knight already left the board from that path. The answer is the sum of dp[k][r][c] over all cells (r, c) on the board.
Consider a board (), the knight starts at (0, 0), and .
Step 0 (base case): The knight is at (0, 0) with probability 1.
dp[0]:
1.000 0.000 0.000
0.000 0.000 0.000
0.000 0.000 0.000Step 1: From (0, 0), the 8 possible knight moves lead to cells: (1, 2), (2, 1), and six positions that are off the board. Each destination gets probability . The six off-board moves represent the knight leaving the board.
dp[1]:
0.000 0.000 0.000
0.000 0.000 0.125
0.125 0.000 0.000The total probability of still being on the board after 1 move is . Only 2 out of 8 possible moves kept the knight on a board.
For LeetCode 688, the probability DP translates directly into code:
def knightProbability(n: int, k: int, row: int, column: int) -> float:
moves = [(-2,-1),(-2,1),(-1,-2),(-1,2),
(1,-2),(1,2),(2,-1),(2,1)]
# dp[r][c] = probability of being at (r, c) after current step
dp = [[0.0] * n for _ in range(n)]
dp[row][column] = 1.0
for _ in range(k):
new_dp = [[0.0] * n for _ in range(n)]
for r in range(n):
for c in range(n):
if dp[r][c] > 0:
for dr, dc in moves:
nr, nc = r + dr, c + dc
if 0 <= nr < n and 0 <= nc < n:
new_dp[nr][nc] += dp[r][c] / 8.0
dp = new_dp
return sum(dp[r][c] for r in range(n) for c in range(n))Notice the space optimization: we only keep two layers (dp and new_dp) rather than the full 3D table, since each step depends only on the previous step.
LeetCode 808. Soup Servings presents a different flavor of probability DP. There are two types of soup, A and B, each starting with ml. Each turn, one of four operations is chosen with equal probability (), consuming different amounts of A and B. Find the probability that soup A empties first, plus half the probability that both empty at the same time.
The state is (a, b), the remaining amounts of soup A and B. The four operations reduce (a, b) by (100, 0), (75, 25), (50, 50), or (25, 75). Since all amounts are multiples of 25, divide everything by 25 to shrink the state space. Define dp(a, b) as the desired probability starting from state (a, b):
from functools import lru_cache
def soupServings(n: int) -> float:
# For large n, the answer converges to 1.0
if n >= 4800:
return 1.0
m = (n + 24) // 25 # convert to units of 25
@lru_cache(maxsize=None)
def dp(a: int, b: int) -> float:
if a <= 0 and b <= 0:
return 0.5 # both empty simultaneously
if a <= 0:
return 1.0 # A empties first
if b <= 0:
return 0.0 # B empties first
return 0.25 * (dp(a-4, b) + dp(a-3, b-1) +
dp(a-2, b-2) + dp(a-1, b-3))
return dp(m, m)The critical optimization is the early return for large . Because operation 1 consumes 4 units of A and 0 of B while the most B-heavy option only consumes 3 of B, soup A is statistically favored to empty first. For , the probability is so close to 1.0 that the answer rounds correctly. Without this check, the state space would be too large.
LeetCode 837. New 21 Game asks: you start with 0 points. Each draw adds a random number in uniformly. You stop drawing when you have points. What is the probability that your total is ?
Define dp[x] as the probability of reaching exactly x points at some point during the game. The base case is dp[0] = 1. For x in , the value dp[x] is the sum of dp[j] / maxPts for all j in that satisfy (since you stop drawing at or above). This is because you could have been at any such j and drawn the value x - j.
The naive approach would compute this sum in per state. A sliding window makes it . Maintain a running sum windowSum of all dp[j] values where is in the active window:
def new21Game(n: int, k: int, maxPts: int) -> float:
if k == 0 or n >= k + maxPts - 1:
return 1.0
dp = [0.0] * (k + maxPts)
dp[0] = 1.0
window_sum = 1.0 # sum of dp[j] for j in the active window
for x in range(1, k + maxPts):
dp[x] = window_sum / maxPts
# Update sliding window
if x < k:
window_sum += dp[x] # x enters the window
if x >= maxPts:
window_sum -= dp[x - maxPts] # x - maxPts exits the window
return sum(dp[k:n+1])Walked example, , , :
You draw once (since , you stop after reaching 1 or more). The draw is uniform in . You need the result to be . The probability is .
With , , : the sliding window tracks which previous states can transition to the current state, avoiding redundant summation. Each state is computed in constant time by adding the newest contributor and removing the one that just fell out of range.
Probability DP is structurally identical to counting DP. The difference is purely semantic:
dp[state] = number of ways to reach this state. Transitions add contributions from predecessor states.dp[state] = probability of reaching this state. Transitions add contributions from predecessors, each multiplied by the transition probability.If all transitions from a state are equally likely among options, you simply divide by . If transitions have different weights, you multiply by the appropriate probability for each edge. The DP recurrence structure (which states feed into which) is exactly the same as it would be for a counting problem. Only the arithmetic changes.
This also extends to expected value. If dp[state] represents the expected value of some quantity (expected cost, expected number of steps), the transition becomes:
E[state] = sum over successors s of: P(transition to s) * (cost of transition + E[s])
This is just the linearity of expectation applied to the DP structure.
Knight Probability (688): Time : for each of steps, process all cells. Space with the two-layer optimization.
Soup Servings (808): Time where , capped at around due to the early return for large . Space for the memoization table.
New 21 Game (837): Time thanks to the sliding window: each state is computed in . Space for the DP array.
The common thread: probability DP has the same time and space complexity as the equivalent counting DP. The transition probabilities are just constant-factor multiplications that do not change the asymptotic cost.
You're probably looking at probability / expected-value DP when:
Common templates:
Practice Problems (9)