← back

Dynamic Programming

Bitmask DP / State Compression

Represent subsets of items as bitmask states. Exponential in the number of items but allows efficient DP over all subsets. Used for traveling salesman variants, assignment problems, and covering problems.

You have n workers and n jobs. Each worker can do any job, but the cost varies: worker i doing job j costs cost[i][j]. Assign each worker to exactly one job, each job to exactly one worker, minimizing total cost. This is the assignment problem, and it is the gateway to understanding bitmask DP.

A greedy approach fails because locally optimal assignments can be globally suboptimal. Standard DP also hits a wall: if you assign workers one by one, the state needs to capture which jobs have already been taken, not just how many. Knowing that "3 jobs are assigned" is useless if you cannot tell which 3. The set of taken jobs is the state, and bitmasks give you a compact way to represent it.

Bitmasks as set representations

An integer's binary representation can encode a subset. If you have n items, an n-bit integer can represent any subset: bit i is 1 if item i is in the set, 0 if it is not. For example, with 4 jobs, the mask 0b1010 (decimal 10) means jobs 1 and 3 are taken, while jobs 0 and 2 are free.

The operations you need map directly to bitwise operators:

  • Check if item i is in the set: mask & (1 << i)
  • Add item i to the set: mask | (1 << i)
  • Remove item i: mask & ~(1 << i)
  • Count items in the set: bin(mask).count('1') (popcount)
  • Check if the set is empty: mask == 0

This is the "state compression" in bitmask DP: a set that might require an array or hash set is compressed into a single integer, making it usable as a DP index.

The state space

With n items, there are 2n2^n possible subsets, so the DP table has 2n2^n entries. Each entry dp[mask] represents the optimal value achievable using the subset of items encoded by mask. For the assignment problem, dp[mask] is the minimum cost of assigning workers to exactly the jobs indicated by mask.

The key insight is that you fill positions in order. If mask has k bits set, that means k jobs have been assigned, which means workers 0 through k-1 have been assigned. The next worker to assign is worker k. This is where popcount comes in: it tells you which "position" you are filling next.

Walking through a small example

Suppose 3 workers and 3 jobs, with this cost matrix:

1
2
3
4
           Job 0  Job 1  Job 2
Worker 0:    9      2      7
Worker 1:    6      4      3
Worker 2:    5      8      1

The DP table has 232^3 = 8 entries, from mask 000 to 111.

Start: dp[000] = 0. No jobs assigned, zero cost.

Assign worker 0 (popcount of mask is 0, so next worker is 0). Try each free job:

  • Job 0: dp[001] = dp[000] + cost[0][0] = 0 + 9 = 9
  • Job 1: dp[010] = dp[000] + cost[0][1] = 0 + 2 = 2
  • Job 2: dp[100] = dp[000] + cost[0][2] = 0 + 7 = 7

Assign worker 1 (popcount is 1). From each single-job mask, try adding a free job:

  • From dp[001] = 9: add job 1 gives dp[011] = 9 + 4 = 13, add job 2 gives dp[101] = 9 + 3 = 12
  • From dp[010] = 2: add job 0 gives dp[011] = min(13, 2 + 6) = 8, add job 2 gives dp[110] = 2 + 3 = 5
  • From dp[100] = 7: add job 0 gives dp[101] = min(12, 7 + 6) = 12, add job 1 gives dp[110] = min(5, 7 + 4) = 5

Assign worker 2 (popcount is 2). From each two-job mask:

  • From dp[011] = 8: add job 2 gives dp[111] = 8 + 1 = 9
  • From dp[101] = 12: add job 1 gives dp[111] = min(9, 12 + 8) = 9
  • From dp[110] = 5: add job 0 gives dp[111] = min(9, 5 + 5) = 9

The answer is dp[111] = 9, corresponding to worker 0 on job 1 (cost 2), worker 1 on job 0 (cost 6), worker 2 on job 2 (cost 1). A pure greedy, assigning each worker their cheapest available job, would have given worker 0 to job 1 (2), worker 1 to job 2 (3), then worker 2 stuck with job 0 (5), for a total of 10. The DP correctly trades a more expensive choice for worker 1 to free up the cheapest cell (worker 2 on job 2) at the end.

The code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
def min_cost_assignment(cost):
    n = len(cost)
    dp = [float('inf')] * (1 << n)
    dp[0] = 0

    for mask in range(1 << n):
        if dp[mask] == float('inf'):
            continue
        j = bin(mask).count('1')  # which worker to assign next
        if j >= n:
            continue
        for i in range(n):
            if mask & (1 << i):
                continue  # job i already taken
            new_mask = mask | (1 << i)
            dp[new_mask] = min(dp[new_mask], dp[mask] + cost[j][i])

    return dp[(1 << n) - 1]

The outer loop iterates over all 2n2^n masks. For each mask, popcount determines which worker is being assigned. The inner loop tries assigning that worker to each free job. The transition sets the job's bit and adds the cost. The answer lives at dp[(1 << n) - 1], the mask where all jobs are taken.

How popcount determines the position

This is the subtlety that makes bitmask DP click. When iterating masks from 0 upward, masks with fewer bits are always processed before masks with more bits, because a smaller number of set bits means a smaller (or equal) integer. So when you reach a mask with k bits set, all masks with fewer than k bits have already been processed. This means you can safely use popcount to determine which "round" of assignment you are in.

For the assignment problem, popcount tells you: "k jobs are taken, so k workers have been assigned, so the next worker is worker k." You do not need to track which workers are assigned because they are assigned in order 0, 1, 2, ..., n-1. Only the jobs vary, and the bitmask tracks exactly that.

Travelling Salesman Problem

The TSP is the most famous bitmask DP problem. Given n cities and distances between them, find the shortest route visiting every city exactly once and returning to the start.

Here the state needs two dimensions: dp[mask][last] = minimum cost to visit exactly the cities in mask, ending at city last. The transition extends the tour by visiting one more unvisited city:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
def tsp(dist, n):
    INF = float('inf')
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1][0] = 0  # start at city 0

    for mask in range(1 << n):
        for last in range(n):
            if dp[mask][last] == INF:
                continue
            if not (mask & (1 << last)):
                continue
            for nxt in range(n):
                if mask & (1 << nxt):
                    continue  # already visited
                new_mask = mask | (1 << nxt)
                dp[new_mask][nxt] = min(dp[new_mask][nxt],
                                        dp[mask][last] + dist[last][nxt])

    full = (1 << n) - 1
    return min(dp[full][last] + dist[last][0] for last in range(n))

The extra dimension last is necessary because the cost of the next step depends on where you currently are, unlike the assignment problem, where worker order is fixed. This bumps the state space to O(2nn)O(2^n \cdot n) and the time complexity to O(2nn2)O(2^n \cdot n^2).

Partition to K Equal Sum Subsets

LeetCode 698. Partition to K Equal Sum Subsets asks: given an array of integers and an integer k, can you partition the array into k non-empty subsets whose sums are all equal?

First, check if the total sum is divisible by k. If not, return False immediately. Let target = total_sum / k. Then use bitmask DP where dp[mask] represents the running sum of elements assigned to the current bucket, after some number of complete buckets have been filled. When a bucket reaches target, it "completes" and resets to 0.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
def canPartitionKSubsets(nums, k):
    total = sum(nums)
    if total % k != 0:
        return False
    target = total // k
    n = len(nums)
    dp = [-1] * (1 << n)
    dp[0] = 0  # no elements used, current bucket sum = 0

    for mask in range(1 << n):
        if dp[mask] == -1:
            continue
        for i in range(n):
            if mask & (1 << i):
                continue  # already used
            if dp[mask] + nums[i] > target:
                continue  # would overflow current bucket
            new_mask = mask | (1 << i)
            dp[new_mask] = (dp[mask] + nums[i]) % target

    return dp[(1 << n) - 1] == 0

The key insight is the % target: when the current bucket sum reaches exactly target, it resets to 0, meaning a new bucket starts. At the end, if all elements are used (mask = all 1s) and the bucket sum is 0, then every bucket was completed exactly. The constraint n <= 16 is the classic signal that bitmask DP is expected.

The critical constraint: n <= 20

Bitmask DP is only feasible when n is small. The state space is 2n2^n (or 2nn2^n \cdot n for TSP), and each state requires iterating over up to n items. Here is the reality check:

  • n = 15: 2152^{15} = 32,768 states. Fast.
  • n = 20: 2202^{20} = 1,048,576 states. Feasible, roughly a million operations times n.
  • n = 23: 2232^{23} = 8,388,608. Borderline, depends on the constant factor.
  • n = 25: 2252^{25} = 33,554,432. Usually too slow, especially with an inner loop.

When you see a problem with n <= 20 (or sometimes n <= 15 or n <= 23) and it involves subsets, permutations, or assignments, bitmask DP is almost certainly the intended approach. The small constraint on n is the biggest signal.

Subset enumeration trick

Sometimes you need to iterate over all submasks of a given mask, that is, all subsets of a subset. The naive approach of checking every mask from 0 to mask is wasteful. Instead, you can enumerate submasks directly:

1
2
3
4
5
# Iterate over all non-empty submasks of mask
sub = mask
while sub > 0:
    # process sub
    sub = (sub - 1) & mask

This works because (sub - 1) & mask drops the lowest set bit of sub (relative to mask) and sets all lower bits that are in mask. It generates every submask in decreasing order and naturally terminates at 0.

The total work across all masks is O(3n)O(3^n), not O(4n)O(4^n) as you might expect. Each element can be in the "outer mask only," "both outer and inner mask," or "neither," giving 3 choices per element. This is useful in problems like partitioning a set into groups with constraints, where dp[mask] represents the answer for the subset mask and transitions try splitting mask into two parts.

Complexity

For the basic assignment problem, time is O(2nn)O(2^n \cdot n): 2n2^n states, each doing O(n)O(n) work. Space is O(2n)O(2^n). For TSP with the extra last dimension, time is O(2nn2)O(2^n \cdot n^2) and space is O(2nn)O(2^n \cdot n). For problems using subset enumeration, time is O(3n)O(3^n).

These are exponential algorithms, which is why the small-n constraint is non-negotiable. But within that constraint, bitmask DP is remarkably efficient. It is the difference between n!n! brute force (which is astronomical even for n=15n = 15) and 2nn2^n \cdot n, which is perfectly tractable. For n=20n = 20, 220202^{20} \cdot 20 is about 20 million, while 20!20! is about 2.4×10182.4 \times 10^{18}. Bitmask DP turns an intractable problem into a fast one, as long as n stays small.

Recognizing this pattern

You're probably looking at bitmask DP when:

  • Constraints look suspicious: n <= 15, n <= 20, occasionally n <= 23. The cap is the giveaway.
  • The natural state is a set of used/visited/assigned items, and you can't compress it further (order matters, or which items are taken matters).
  • The problem involves permutations, assignments, partitions into groups, or visiting all of a set in some order.
  • A factorial brute force (n!) would obviously time out, but 2npoly(n)2^n \cdot \text{poly}(n) fits.

Common templates:

Practice Problems (60)

464 Can I Win
473 Matchsticks to Square
526 Beautiful Arrangement
691 Stickers to Spell Word
698 Partition to K Equal Sum Subsets
847 Shortest Path Visiting All Nodes
943 Find the Shortest Superstring
996 Number of Squareful Arrays
1125 Smallest Sufficient Team
1178 Number of Valid Words for Each Puzzle
1255 Maximum Score Words Formed by Letters
1284 Minimum Number of Flips to Convert Binary Matrix to Zero Matrix
1320 Minimum Distance to Type a Word Using Two Fingers
1349 Maximum Students Taking Exam
1434 Number of Ways to Wear Different Hats to Each Other
1494 Parallel Courses II
1542 Find Longest Awesome Substring
1595 Minimum Cost to Connect Two Groups of Points
1601 Maximum Number of Achievable Transfer Requests
1617 Count Subtrees With Max Distance Between Cities
1655 Distribute Repeating Integers
1659 Maximize Grid Happiness
1681 Minimum Incompatibility
1723 Find Minimum Time to Finish All Jobs
1728 Cat and Mouse II
1787 Make the XOR of All Segments Equal to Zero
1799 Maximize Score After N Operations
1815 Maximum Number of Groups Getting Fresh Donuts
1879 Minimum XOR Sum of Two Arrays
1900 The Earliest and Latest Rounds Where Players Compete
1931 Painting a Grid With Three Different Colors
1947 Maximum Compatibility Score Sum
1986 Minimum Number of Work Sessions to Finish the Tasks
1994 The Number of Good Subsets
2002 Maximum Product of the Length of Two Palindromic Subsequences
2151 Maximum Good People Based on Statements
2172 Maximum AND Sum of Array
2212 Maximum Points in an Archery Competition
2305 Fair Distribution of Cookies
2572 Count the Number of Square-Free Subsets
2741 Special Permutations
2850 Minimum Moves to Spread Stones Over Grid
3003 Maximize the Number of Partitions After Operations
3117 Minimum Sum of Values by Dividing Array
3149 Find the Minimum Cost Array Permutation
3276 Select Cells in Grid With Maximum Score
3283 Maximum Number of Moves to Kill All Pawns
3287 Find the Maximum Sequence Value of Array
3320 Count The Number of Winning Sequences
3376 Minimum Time to Break Locks I
3444 Minimum Increments for Target Multiples in an Array
3530 Maximum Profit from Valid Topological Order in DAG
3533 Concatenated Divisibility
3539 Find Sum of Array Product of Magical Sequences
3566 Partition Array into Two Equal Product Subsets
3568 Minimum Moves to Clean the Classroom
3575 Maximum Good Subtree Score
3594 Minimum Time to Transport All Individuals
3615 Longest Palindromic Path in Graph
3670 Maximum Product of Two Integers With No Common Bits