Dynamic Programming
Represent subsets of items as bitmask states. Exponential in the number of items but allows efficient DP over all subsets. Used for traveling salesman variants, assignment problems, and covering problems.
You have n workers and n jobs. Each worker can do any job, but the cost varies: worker i doing job j costs cost[i][j]. Assign each worker to exactly one job, each job to exactly one worker, minimizing total cost. This is the assignment problem, and it is the gateway to understanding bitmask DP.
A greedy approach fails because locally optimal assignments can be globally suboptimal. Standard DP also hits a wall: if you assign workers one by one, the state needs to capture which jobs have already been taken, not just how many. Knowing that "3 jobs are assigned" is useless if you cannot tell which 3. The set of taken jobs is the state, and bitmasks give you a compact way to represent it.
An integer's binary representation can encode a subset. If you have n items, an n-bit integer can represent any subset: bit i is 1 if item i is in the set, 0 if it is not. For example, with 4 jobs, the mask 0b1010 (decimal 10) means jobs 1 and 3 are taken, while jobs 0 and 2 are free.
The operations you need map directly to bitwise operators:
mask & (1 << i)mask | (1 << i)mask & ~(1 << i)bin(mask).count('1') (popcount)mask == 0This is the "state compression" in bitmask DP: a set that might require an array or hash set is compressed into a single integer, making it usable as a DP index.
With n items, there are possible subsets, so the DP table has entries. Each entry dp[mask] represents the optimal value achievable using the subset of items encoded by mask. For the assignment problem, dp[mask] is the minimum cost of assigning workers to exactly the jobs indicated by mask.
The key insight is that you fill positions in order. If mask has k bits set, that means k jobs have been assigned, which means workers 0 through k-1 have been assigned. The next worker to assign is worker k. This is where popcount comes in: it tells you which "position" you are filling next.
Suppose 3 workers and 3 jobs, with this cost matrix:
Job 0 Job 1 Job 2
Worker 0: 9 2 7
Worker 1: 6 4 3
Worker 2: 5 8 1The DP table has = 8 entries, from mask 000 to 111.
Start: dp[000] = 0. No jobs assigned, zero cost.
Assign worker 0 (popcount of mask is 0, so next worker is 0). Try each free job:
dp[001] = dp[000] + cost[0][0] = 0 + 9 = 9dp[010] = dp[000] + cost[0][1] = 0 + 2 = 2dp[100] = dp[000] + cost[0][2] = 0 + 7 = 7Assign worker 1 (popcount is 1). From each single-job mask, try adding a free job:
dp[001] = 9: add job 1 gives dp[011] = 9 + 4 = 13, add job 2 gives dp[101] = 9 + 3 = 12dp[010] = 2: add job 0 gives dp[011] = min(13, 2 + 6) = 8, add job 2 gives dp[110] = 2 + 3 = 5dp[100] = 7: add job 0 gives dp[101] = min(12, 7 + 6) = 12, add job 1 gives dp[110] = min(5, 7 + 4) = 5Assign worker 2 (popcount is 2). From each two-job mask:
dp[011] = 8: add job 2 gives dp[111] = 8 + 1 = 9dp[101] = 12: add job 1 gives dp[111] = min(9, 12 + 8) = 9dp[110] = 5: add job 0 gives dp[111] = min(9, 5 + 5) = 9The answer is dp[111] = 9, corresponding to worker 0 on job 1 (cost 2), worker 1 on job 0 (cost 6), worker 2 on job 2 (cost 1). A pure greedy, assigning each worker their cheapest available job, would have given worker 0 to job 1 (2), worker 1 to job 2 (3), then worker 2 stuck with job 0 (5), for a total of 10. The DP correctly trades a more expensive choice for worker 1 to free up the cheapest cell (worker 2 on job 2) at the end.
def min_cost_assignment(cost):
n = len(cost)
dp = [float('inf')] * (1 << n)
dp[0] = 0
for mask in range(1 << n):
if dp[mask] == float('inf'):
continue
j = bin(mask).count('1') # which worker to assign next
if j >= n:
continue
for i in range(n):
if mask & (1 << i):
continue # job i already taken
new_mask = mask | (1 << i)
dp[new_mask] = min(dp[new_mask], dp[mask] + cost[j][i])
return dp[(1 << n) - 1]The outer loop iterates over all masks. For each mask, popcount determines which worker is being assigned. The inner loop tries assigning that worker to each free job. The transition sets the job's bit and adds the cost. The answer lives at dp[(1 << n) - 1], the mask where all jobs are taken.
This is the subtlety that makes bitmask DP click. When iterating masks from 0 upward, masks with fewer bits are always processed before masks with more bits, because a smaller number of set bits means a smaller (or equal) integer. So when you reach a mask with k bits set, all masks with fewer than k bits have already been processed. This means you can safely use popcount to determine which "round" of assignment you are in.
For the assignment problem, popcount tells you: "k jobs are taken, so k workers have been assigned, so the next worker is worker k." You do not need to track which workers are assigned because they are assigned in order 0, 1, 2, ..., n-1. Only the jobs vary, and the bitmask tracks exactly that.
The TSP is the most famous bitmask DP problem. Given n cities and distances between them, find the shortest route visiting every city exactly once and returning to the start.
Here the state needs two dimensions: dp[mask][last] = minimum cost to visit exactly the cities in mask, ending at city last. The transition extends the tour by visiting one more unvisited city:
def tsp(dist, n):
INF = float('inf')
dp = [[INF] * n for _ in range(1 << n)]
dp[1][0] = 0 # start at city 0
for mask in range(1 << n):
for last in range(n):
if dp[mask][last] == INF:
continue
if not (mask & (1 << last)):
continue
for nxt in range(n):
if mask & (1 << nxt):
continue # already visited
new_mask = mask | (1 << nxt)
dp[new_mask][nxt] = min(dp[new_mask][nxt],
dp[mask][last] + dist[last][nxt])
full = (1 << n) - 1
return min(dp[full][last] + dist[last][0] for last in range(n))The extra dimension last is necessary because the cost of the next step depends on where you currently are, unlike the assignment problem, where worker order is fixed. This bumps the state space to and the time complexity to .
LeetCode 698. Partition to K Equal Sum Subsets asks: given an array of integers and an integer k, can you partition the array into k non-empty subsets whose sums are all equal?
First, check if the total sum is divisible by k. If not, return False immediately. Let target = total_sum / k. Then use bitmask DP where dp[mask] represents the running sum of elements assigned to the current bucket, after some number of complete buckets have been filled. When a bucket reaches target, it "completes" and resets to 0.
def canPartitionKSubsets(nums, k):
total = sum(nums)
if total % k != 0:
return False
target = total // k
n = len(nums)
dp = [-1] * (1 << n)
dp[0] = 0 # no elements used, current bucket sum = 0
for mask in range(1 << n):
if dp[mask] == -1:
continue
for i in range(n):
if mask & (1 << i):
continue # already used
if dp[mask] + nums[i] > target:
continue # would overflow current bucket
new_mask = mask | (1 << i)
dp[new_mask] = (dp[mask] + nums[i]) % target
return dp[(1 << n) - 1] == 0The key insight is the % target: when the current bucket sum reaches exactly target, it resets to 0, meaning a new bucket starts. At the end, if all elements are used (mask = all 1s) and the bucket sum is 0, then every bucket was completed exactly. The constraint n <= 16 is the classic signal that bitmask DP is expected.
Bitmask DP is only feasible when n is small. The state space is (or for TSP), and each state requires iterating over up to n items. Here is the reality check:
When you see a problem with n <= 20 (or sometimes n <= 15 or n <= 23) and it involves subsets, permutations, or assignments, bitmask DP is almost certainly the intended approach. The small constraint on n is the biggest signal.
Sometimes you need to iterate over all submasks of a given mask, that is, all subsets of a subset. The naive approach of checking every mask from 0 to mask is wasteful. Instead, you can enumerate submasks directly:
# Iterate over all non-empty submasks of mask
sub = mask
while sub > 0:
# process sub
sub = (sub - 1) & maskThis works because (sub - 1) & mask drops the lowest set bit of sub (relative to mask) and sets all lower bits that are in mask. It generates every submask in decreasing order and naturally terminates at 0.
The total work across all masks is , not as you might expect. Each element can be in the "outer mask only," "both outer and inner mask," or "neither," giving 3 choices per element. This is useful in problems like partitioning a set into groups with constraints, where dp[mask] represents the answer for the subset mask and transitions try splitting mask into two parts.
For the basic assignment problem, time is : states, each doing work. Space is . For TSP with the extra last dimension, time is and space is . For problems using subset enumeration, time is .
These are exponential algorithms, which is why the small-n constraint is non-negotiable. But within that constraint, bitmask DP is remarkably efficient. It is the difference between brute force (which is astronomical even for ) and , which is perfectly tractable. For , is about 20 million, while is about . Bitmask DP turns an intractable problem into a fast one, as long as n stays small.
You're probably looking at bitmask DP when:
n <= 15, n <= 20, occasionally n <= 23. The cap is the giveaway.n!) would obviously time out, but fits.Common templates:
dp[mask] is the best cost to assign the items in mask, transition by adding one item. Example: Minimum Cost to Connect Two Groups of Points.dp[mask][last]: track which nodes visited and where you currently are. Example: Find the Shortest Superstring.Practice Problems (60)