Dynamic Programming

Bitmask DP / State Compression

Represent subsets of items as bitmask states. Exponential in the number of items but allows efficient DP over all subsets. Used for traveling salesman variants, assignment problems, and covering problems.

You have n workers and n jobs. Each worker can do any job, but the cost varies: worker i doing job j costs cost[i][j]. Assign each worker to exactly one job, each job to exactly one worker, minimizing total cost. This is the assignment problem, and it is the gateway to understanding bitmask DP.

A greedy approach fails because locally optimal assignments can be globally suboptimal. Standard DP also hits a wall: if you assign workers one by one, the state needs to capture which jobs have already been taken, not just how many. Knowing that "3 jobs are assigned" is useless if you cannot tell which 3. The set of taken jobs is the state, and bitmasks give you a compact way to represent it.

Bitmasks as set representations

An integer's binary representation can encode a subset. If you have n items, an n-bit integer can represent any subset: bit i is 1 if item i is in the set, 0 if it is not. For example, with 4 jobs, the mask 0b1010 (decimal 10) means jobs 1 and 3 are taken, while jobs 0 and 2 are free.

The operations you need map directly to bitwise operators:

Check if item i is in the set: mask & (1 << i)
Add item i to the set: mask | (1 << i)
Remove item i: mask & ~(1 << i)
Count items in the set: bin(mask).count('1') (popcount)
Check if the set is empty: mask == 0

This is the "state compression" in bitmask DP: a set that might require an array or hash set is compressed into a single integer, making it usable as a DP index.

The state space

With n items, there are $2^n$ possible subsets, so the DP table has $2^n$ entries. Each entry dp[mask] represents the optimal value achievable using the subset of items encoded by mask. For the assignment problem, dp[mask] is the minimum cost of assigning workers to exactly the jobs indicated by mask.

The key insight is that you fill positions in order. If mask has k bits set, that means k jobs have been assigned, which means workers 0 through k-1 have been assigned. The next worker to assign is worker k. This is where popcount comes in: it tells you which "position" you are filling next.

Walking through a small example

Suppose 3 workers and 3 jobs, with this cost matrix:

1

2

3

4

           Job 0  Job 1  Job 2
Worker 0:    9      2      7
Worker 1:    6      4      3
Worker 2:    5      8      1

The DP table has $2^3$ = 8 entries, from mask 000 to 111.

Start: dp[000] = 0. No jobs assigned, zero cost.

Assign worker 0 (popcount of mask is 0, so next worker is 0). Try each free job:

Job 0: dp[001] = dp[000] + cost[0][0] = 0 + 9 = 9
Job 1: dp[010] = dp[000] + cost[0][1] = 0 + 2 = 2
Job 2: dp[100] = dp[000] + cost[0][2] = 0 + 7 = 7

Assign worker 1 (popcount is 1). From each single-job mask, try adding a free job:

From dp[001] = 9: add job 1 gives dp[011] = 9 + 4 = 13, add job 2 gives dp[101] = 9 + 3 = 12
From dp[010] = 2: add job 0 gives dp[011] = min(13, 2 + 6) = 8, add job 2 gives dp[110] = 2 + 3 = 5
From dp[100] = 7: add job 0 gives dp[101] = min(12, 7 + 6) = 12, add job 1 gives dp[110] = min(5, 7 + 4) = 5

Assign worker 2 (popcount is 2). From each two-job mask:

From dp[011] = 8: add job 2 gives dp[111] = 8 + 1 = 9
From dp[101] = 12: add job 1 gives dp[111] = min(9, 12 + 8) = 9
From dp[110] = 5: add job 0 gives dp[111] = min(9, 5 + 5) = 9

The answer is dp[111] = 9, corresponding to worker 0 on job 1 (cost 2), worker 1 on job 0 (cost 6), worker 2 on job 2 (cost 1). A pure greedy, assigning each worker their cheapest available job, would have given worker 0 to job 1 (2), worker 1 to job 2 (3), then worker 2 stuck with job 0 (5), for a total of 10. The DP correctly trades a more expensive choice for worker 1 to free up the cheapest cell (worker 2 on job 2) at the end.

The code

def min_cost_assignment(cost):
    n = len(cost)
    dp = [float('inf')] * (1 << n)
    dp[0] = 0

    for mask in range(1 << n):
        if dp[mask] == float('inf'):
            continue
        j = bin(mask).count('1')  # which worker to assign next
        if j >= n:
            continue
        for i in range(n):
            if mask & (1 << i):
                continue  # job i already taken
            new_mask = mask | (1 << i)
            dp[new_mask] = min(dp[new_mask], dp[mask] + cost[j][i])

    return dp[(1 << n) - 1]

The outer loop iterates over all $2^n$ masks. For each mask, popcount determines which worker is being assigned. The inner loop tries assigning that worker to each free job. The transition sets the job's bit and adds the cost. The answer lives at dp[(1 << n) - 1], the mask where all jobs are taken.

How popcount determines the position

This is the subtlety that makes bitmask DP click. When iterating masks from 0 upward, masks with fewer bits are always processed before masks with more bits, because a smaller number of set bits means a smaller (or equal) integer. So when you reach a mask with k bits set, all masks with fewer than k bits have already been processed. This means you can safely use popcount to determine which "round" of assignment you are in.

For the assignment problem, popcount tells you: "k jobs are taken, so k workers have been assigned, so the next worker is worker k." You do not need to track which workers are assigned because they are assigned in order 0, 1, 2, ..., n-1. Only the jobs vary, and the bitmask tracks exactly that.

Travelling Salesman Problem

The TSP is the most famous bitmask DP problem. Given n cities and distances between them, find the shortest route visiting every city exactly once and returning to the start.

Here the state needs two dimensions: dp[mask][last] = minimum cost to visit exactly the cities in mask, ending at city last. The transition extends the tour by visiting one more unvisited city:

def tsp(dist, n):
    INF = float('inf')
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1][0] = 0  # start at city 0

    for mask in range(1 << n):
        for last in range(n):
            if dp[mask][last] == INF:
                continue
            if not (mask & (1 << last)):
                continue
            for nxt in range(n):
                if mask & (1 << nxt):
                    continue  # already visited
                new_mask = mask | (1 << nxt)
                dp[new_mask][nxt] = min(dp[new_mask][nxt],
                                        dp[mask][last] + dist[last][nxt])

    full = (1 << n) - 1
    return min(dp[full][last] + dist[last][0] for last in range(n))

The extra dimension last is necessary because the cost of the next step depends on where you currently are, unlike the assignment problem, where worker order is fixed. This bumps the state space to $O(2^n \cdot n)$ and the time complexity to $O(2^n \cdot n^2)$ .

Partition to K Equal Sum Subsets

LeetCode 698. Partition to K Equal Sum Subsets asks: given an array of integers and an integer k, can you partition the array into k non-empty subsets whose sums are all equal?

First, check if the total sum is divisible by k. If not, return False immediately. Let target = total_sum / k. Then use bitmask DP where dp[mask] represents the running sum of elements assigned to the current bucket, after some number of complete buckets have been filled. When a bucket reaches target, it "completes" and resets to 0.

def canPartitionKSubsets(nums, k):
    total = sum(nums)
    if total % k != 0:
        return False
    target = total // k
    n = len(nums)
    dp = [-1] * (1 << n)
    dp[0] = 0  # no elements used, current bucket sum = 0

    for mask in range(1 << n):
        if dp[mask] == -1:
            continue
        for i in range(n):
            if mask & (1 << i):
                continue  # already used
            if dp[mask] + nums[i] > target:
                continue  # would overflow current bucket
            new_mask = mask | (1 << i)
            dp[new_mask] = (dp[mask] + nums[i]) % target

    return dp[(1 << n) - 1] == 0

The key insight is the % target: when the current bucket sum reaches exactly target, it resets to 0, meaning a new bucket starts. At the end, if all elements are used (mask = all 1s) and the bucket sum is 0, then every bucket was completed exactly. The constraint n <= 16 is the classic signal that bitmask DP is expected.

The critical constraint: n <= 20

Bitmask DP is only feasible when n is small. The state space is $2^n$ (or $2^n \cdot n$ for TSP), and each state requires iterating over up to n items. Here is the reality check:

n = 15: $2^{15}$ = 32,768 states. Fast.
n = 20: $2^{20}$ = 1,048,576 states. Feasible, roughly a million operations times n.
n = 23: $2^{23}$ = 8,388,608. Borderline, depends on the constant factor.
n = 25: $2^{25}$ = 33,554,432. Usually too slow, especially with an inner loop.

When you see a problem with n <= 20 (or sometimes n <= 15 or n <= 23) and it involves subsets, permutations, or assignments, bitmask DP is almost certainly the intended approach. The small constraint on n is the biggest signal.

Subset enumeration trick

Sometimes you need to iterate over all submasks of a given mask, that is, all subsets of a subset. The naive approach of checking every mask from 0 to mask is wasteful. Instead, you can enumerate submasks directly:

1

2

3

4

5

# Iterate over all non-empty submasks of mask
sub = mask
while sub > 0:
    # process sub
    sub = (sub - 1) & mask

This works because (sub - 1) & mask drops the lowest set bit of sub (relative to mask) and sets all lower bits that are in mask. It generates every submask in decreasing order and naturally terminates at 0.

The total work across all masks is $O(3^n)$ , not $O(4^n)$ as you might expect. Each element can be in the "outer mask only," "both outer and inner mask," or "neither," giving 3 choices per element. This is useful in problems like partitioning a set into groups with constraints, where dp[mask] represents the answer for the subset mask and transitions try splitting mask into two parts.

Complexity

For the basic assignment problem, time is $O(2^n \cdot n)$ : $2^n$ states, each doing $O(n)$ work. Space is $O(2^n)$ . For TSP with the extra last dimension, time is $O(2^n \cdot n^2)$ and space is $O(2^n \cdot n)$ . For problems using subset enumeration, time is $O(3^n)$ .

These are exponential algorithms, which is why the small-n constraint is non-negotiable. But within that constraint, bitmask DP is remarkably efficient. It is the difference between $n!$ brute force (which is astronomical even for $n = 15$ ) and $2^n \cdot n$ , which is perfectly tractable. For $n = 20$ , $2^{20} \cdot 20$ is about 20 million, while $20!$ is about $2.4 \times 10^{18}$ . Bitmask DP turns an intractable problem into a fast one, as long as n stays small.

Recognizing this pattern

You're probably looking at bitmask DP when:

Constraints look suspicious: n <= 15, n <= 20, occasionally n <= 23. The cap is the giveaway.
The natural state is a set of used/visited/assigned items, and you can't compress it further (order matters, or which items are taken matters).
The problem involves permutations, assignments, partitions into groups, or visiting all of a set in some order.
A factorial brute force (n!) would obviously time out, but $2^n \cdot \text{poly}(n)$ fits.

Common templates:

Assignment DP: dp[mask] is the best cost to assign the items in mask, transition by adding one item. Example: Minimum Cost to Connect Two Groups of Points.
TSP-style dp[mask][last]: track which nodes visited and where you currently are. Example: Find the Shortest Superstring.
Count subsets satisfying a property: sweep all $2^n$ masks, check predicate, accumulate. Example: Maximum Compatibility Score Sum.
Submask enumeration ( $O(3^n)$ ): for each mask, iterate its submasks to split into groups. Example: Partition to K Equal Sum Subsets.
Profile DP on narrow grids: column-by-column DP where the mask encodes the current frontier. Example: Number of Ways to Wear Different Hats to Each Other.