Dynamic Programming

Stock Trading / State Machine DP

Model distinct states (holding, sold, cooling down) with transitions between them. Handles unlimited transactions, at-most-k transactions, cooldown periods, and transaction fees.

The stock trading problems on LeetCode form one of the most satisfying progressions in all of dynamic programming. They start simple, with a single transaction where you just track the minimum price, and layer on constraints until you need a full state machine to keep track of what is going on. The beautiful thing is that every variant reduces to the same framework: define your states, write the transitions, and iterate through the prices.

Best time to buy and sell stock I: one transaction

LeetCode 121 gives you an array of prices where prices[i] is the price on day i, and asks you to find the maximum profit from buying on one day and selling on a later day. If no profit is possible, return 0.

The approach is to scan left to right, tracking the minimum price seen so far. At each day, the best you could do by selling today is price - min_price. Update the global best and the running minimum as you go:

1

2

3

4

5

6

7

def maxProfit(prices):
    min_price = prices[0]
    max_profit = 0
    for price in prices[1:]:
        max_profit = max(max_profit, price - min_price)
        min_price = min(min_price, price)
    return max_profit

This is actually Kadane's algorithm in disguise. If you compute the array of daily price changes, the maximum profit from a single transaction is exactly the maximum subarray sum of that difference array. A contiguous run of net-positive changes corresponds to buying at the start and selling at the end.

Best time to buy and sell stock II: unlimited transactions

For LeetCode 122, you can buy and sell as many times as you want (but must sell before buying again). Greedy works perfectly here: take every positive price difference. If tomorrow's price is higher than today's, buy today and sell tomorrow. The sum of all positive consecutive differences gives the maximum profit:

1

2

def maxProfit(prices):
    return sum(max(0, prices[i+1] - prices[i]) for i in range(len(prices) - 1))

This works because with unlimited transactions and no constraints, there is no reason not to capture every upswing. You can equivalently think of it as buying and selling on every consecutive pair where the price goes up. No DP needed: pure greedy.

When greedy breaks: the need for state machines

The greedy approach for unlimited transactions works precisely because there are no constraints between actions. But real interview problems add friction: a cooldown period after selling, a transaction fee, or a cap on the total number of transactions. Once these constraints appear, greedy fails because the optimal action today depends on your state: whether you currently hold a stock, whether you just sold one, and how many transactions you have left.

This is where the state machine DP framework takes over. You define a small set of states that capture everything relevant about your trading history, write transitions between them, and iterate through the price array updating all states simultaneously.

The cooldown variant: three states

LeetCode 309 adds a constraint: after you sell a stock, you must wait one day before buying again. You cannot just greedily take every upswing because selling locks you out of buying the next day.

Model this with three states:

hold: you own a stock (either you bought today or you are carrying one from before).
cash: you do not own a stock and you are free to buy today.
cool: you just sold a stock yesterday, so you must wait before buying.

The transitions on each day are:

hold = max(prev_hold, cash - price): either keep holding, or buy from the cash state.
cash = max(prev_cash, cool): either stay in cash, or transition out of cooldown.
cool = prev_hold + price: sell the stock you are holding.

Notice the direction of the arrows. From cash, you can buy (go to hold) or wait (stay in cash). From hold, you can sell (go to cool) or wait (stay in hold). From cool, you can only wait (go to cash). You can never buy from cool. That is the cooldown constraint.

Walking through an example with cooldown

Consider prices = [1, 2, 3, 0, 2].

Initialize: hold = -inf, cash = 0, cool = 0. We start with no stock and no money spent.

Day 0 (price = 1): Buy from cash. hold = max(-inf, 0 - 1) = -1. cash = max(0, 0) = 0. cool = -inf + 1 = -inf (cannot sell what we do not have).

Day 1 (price = 2): Hold or buy. hold = max(-1, 0 - 2) = -1 (keep holding). cash = max(0, -inf) = 0. cool = -1 + 2 = 1 (sell for profit of 1).

Day 2 (price = 3): hold = max(-1, 0 - 3) = -1. cash = max(0, 1) = 1 (cooldown ends, profit of 1 is now available). cool = -1 + 3 = 2 (sell for profit of 2).

Day 3 (price = 0): hold = max(-1, 1 - 0) = 1 (buy at price 0, using the 1 profit from cash). cash = max(1, 2) = 2 (cooldown from selling at price 3 ends). cool = -1 + 0 = -1.

Day 4 (price = 2): hold = max(1, 2 - 2) = 1. cash = max(2, -1) = 2. cool = 1 + 2 = 3 (sell for profit of 3).

Final answer: max(cash, cool) = max(2, 3) = 3. The optimal strategy is buy at 1, sell at 3 (profit 2), cooldown day 3, buy at 0, sell at 2 (profit 2). Wait, but our answer is 3. Let us trace it: buy at 1, sell at 2 (profit 1), cooldown, buy at 0, sell at 2 (profit 2), total 3. Or: buy at 0 on day 3 using accumulated cash of 1, sell at 2 on day 4 for total of 3. The state machine correctly finds the optimum without us having to reason about which specific trades to make.

The code for cooldown

def maxProfit(prices):
    hold, cash, cool = float('-inf'), 0, 0
    for price in prices:
        prev_hold = hold
        hold = max(hold, cash - price)
        cash = max(cash, cool)
        cool = prev_hold + price
    return max(cash, cool)

Notice prev_hold. This is critical. When computing cool, we need the value of hold from the previous iteration, not the one we just updated on this iteration. If we used the updated hold, we would be buying and selling on the same day. Saving prev_hold before overwriting hold prevents this.

The order of the three assignments also matters. We update hold first (using old cash), then cash (using old cool), then cool (using prev_hold). Each update reads values that have not yet been overwritten this iteration, except for hold which we saved explicitly. Getting the order wrong, or forgetting to save the previous value, is the most common bug in this family of problems.

The fee variant: two states

LeetCode 714 introduces a transaction fee: you pay a fee every time you sell (or buy, it does not matter which side, just be consistent). There is no cooldown, so you only need two states: hold and cash.

1

2

3

4

5

6

7

def maxProfit(prices, fee):
    hold, cash = float('-inf'), 0
    for price in prices:
        prev_hold = hold
        hold = max(hold, cash - price)
        cash = max(cash, prev_hold + price - fee)
    return cash

The only difference from the cooldown version is the - fee on the sell transition and the removal of the cool state. The fee discourages rapid trading: you will not sell and rebuy for a tiny gain if the fee exceeds it. The DP handles this automatically. No greedy reasoning required.

At most k transactions

The hardest variant is LeetCode 188, which caps the total number of complete buy-sell transactions at k. Now you need an extra dimension: how many transactions have been completed.

Define dp[t][0] as the maximum profit with at most t transactions and currently not holding a stock, and dp[t][1] as the same but currently holding a stock. For each price, update every transaction level:

def maxProfit(k, prices):
    if not prices:
        return 0

    # Optimization: if k >= len(prices) // 2, unlimited transactions
    if k >= len(prices) // 2:
        return sum(max(0, prices[i+1] - prices[i]) for i in range(len(prices) - 1))

    # dp[t][0] = cash after at most t transactions
    # dp[t][1] = hold after at most t transactions
    dp = [[0, float('-inf')] for _ in range(k + 1)]

    for price in prices:
        for t in range(k, 0, -1):
            dp[t][0] = max(dp[t][0], dp[t][1] + price)       # sell
            dp[t][1] = max(dp[t][1], dp[t-1][0] - price)     # buy (uses t-1)
        # dp[0] stays: dp[0][0] = 0, dp[0][1] = -inf (can't buy with 0 transactions left)

    return dp[k][0]

The key insight is that buying uses a transaction. When you buy, you move from transaction level t-1 to t. When you sell, you stay at level t. So buying reads from dp[t-1][0] and selling reads from dp[t][1]. The inner loop runs from k down to 1 to avoid using values we just updated (similar to the 0/1 knapsack reverse-order trick).

The $k \geq n/2$ optimization

If k is at least len(prices) // 2, you can never run out of transactions, since the most transactions you could possibly make is one buy-sell pair for every two days. In this case the problem degenerates to the unlimited variant, and you can use the simple greedy. This optimization is essential because without it, k can be up to $10^9$ and the DP array would be too large to allocate.

Why simultaneous updates matter

The most subtle bug in this entire family is reading a value that has already been updated in the current iteration. Consider the cooldown code without saving prev_hold:

1

2

3

4

# WRONG
hold = max(hold, cash - price)
cash = max(cash, cool)
cool = hold + price   # uses UPDATED hold, not previous hold

The problem: cool = hold + price now includes the possibility that we bought on this very day (the new hold might equal cash - price). That means we could be buying and selling on the same day, which violates the rules. Saving prev_hold fixes this:

1

2

3

4

5

# CORRECT
prev_hold = hold
hold = max(hold, cash - price)
cash = max(cash, cool)
cool = prev_hold + price

An alternative is to compute all new values into temporaries and then assign them all at once. Some people prefer this because it eliminates any dependency on update order:

1

2

3

4

5

hold, cash, cool = (
    max(hold, cash - price),
    max(cash, cool),
    hold + price
)

Python's tuple packing evaluates all the right-hand expressions before any assignment, so this is automatically correct.

Complexity

For the cooldown and fee variants, time is $O(n)$ and space is $O(1)$ : one pass through the prices with a constant number of state variables.

For the k-transaction variant, time is $O(n \cdot k)$ and space is $O(k)$ : for each of the n prices, you update k transaction levels, and the DP table has 2k entries.

The $k \geq n/2$ optimization ensures that even when k is astronomically large, you fall back to the $O(n)$ greedy, so the overall complexity is $O(n \cdot \min(k, n))$ .

The progression as a study guide

These problems are best studied in order. Start with the single-transaction problem to build intuition about tracking running minimums. Move to unlimited transactions to see why greedy works when there are no constraints. Then add cooldown or a fee to see how constraints force you from greedy into DP. Finally, tackle k transactions to see how an extra dimension models a resource constraint. Each problem builds on the last, and by the end, the state machine framework feels natural. When you encounter a new trading variant you have never seen before, you will instinctively ask: what are the states, what are the transitions, and what value am I optimizing? That question is always enough to solve it.

Recognizing this pattern

You're probably looking at the stock-trading state-machine DP when:

The problem hands you an array of prices indexed by day and asks for the maximum profit from a sequence of buys and sells.
There's a constraint on the shape of the trading: at most one share at a time, a cooldown day after selling, a transaction fee, or a cap of $k$ transactions.
A single greedy "sum all positive deltas" pass almost works but is broken by one of the above frictions.
The natural way to describe a position is a small set of mutually exclusive labels, such as "holding", "not holding", and "cooling down", with deterministic transitions between them.
The answer is a number, not a list of trades, and you only need to know whether you ended the day with cash in hand.

Common templates:

Single transaction (one buy, one sell): track minimum price seen so far and best profit. Example: Best Time to Buy and Sell Stock.
Unlimited transactions: pure greedy on positive deltas, equivalent to two-state DP that collapses. Example: Best Time to Buy and Sell Stock II.
Cooldown / fee variant: three or two rolling variables with a forced wait or per-sale penalty. Example: Best Time to Buy and Sell Stock with Cooldown.
At most k transactions: 2D DP indexed by transaction count and holding state, with the $k \geq n/2$ greedy fallback. Example: Best Time to Buy and Sell Stock IV.
At most 2 transactions: degenerate case of the $k$ -transaction DP with four rolling variables. Example: Best Time to Buy and Sell Stock III.