Heap / Priority Queue
Maintain a min-heap of size k. For each new element, push it and pop the smallest if size exceeds k. The heap contains the k largest elements at the end. For k-th smallest, use a max-heap.
The "top-K" family of problems asks you to efficiently extract the K largest values, the K most frequent elements, or the K closest points from a larger collection. Sorting works, but it costs regardless of K. When K is much smaller than n, a heap-based approach cuts that to , a meaningful win on large inputs.
The canonical version, LeetCode 215, asks for the Kth largest element of an array. The natural instinct is to use a max-heap to track the largest elements: keep pulling off the top and you get the biggest values. But that approach requires storing all elements and then extracting K of them. The smarter move is a size-K min-heap.
Here is the insight: if you maintain a min-heap of exactly K elements, the root is always the smallest of the K largest seen so far, the threshold that a new element must beat to deserve a spot. For every new element you see, compare it against the root. If the new element is larger, evict the root and push the new element in. At the end, the heap contains exactly the K largest elements, and you never needed more than K slots.
import heapq
def kth_largest(nums: list[int], k: int) -> int:
heap = []
for num in nums:
heapq.heappush(heap, num)
if len(heap) > k:
heapq.heappop(heap) # evict the smallest-of-K
return heap[0] # root is the Kth largestWalk through nums = [3, 1, 5, 12, 2, 11] with k = 3:
[3][1, 3][1, 3, 5][3, 5, 12][3, 5, 12] (2 was immediately evicted)[5, 11, 12]The root 5 is the 3rd largest. The heap contains {5, 11, 12}, which are indeed the three largest elements.
Each of the elements is pushed once and popped at most once. Each heap operation on a size-K heap costs . Total: time, space. When K is small (say, K = 10 out of n = 1,000,000), this is substantially faster than sorting.
For LeetCode 347, when the problem asks for the K most frequent values, build a frequency map first and then run the same size-K heap logic over the counts.
from collections import Counter
import heapq
def top_k_frequent(nums: list[int], k: int) -> list[int]:
freq = Counter(nums)
# heap of (count, element); min-heap evicts least frequent
heap = []
for elem, count in freq.items():
heapq.heappush(heap, (count, elem))
if len(heap) > k:
heapq.heappop(heap)
return [elem for count, elem in heap]For nums = [1,1,1,2,2,3] with k = 2, the frequency map is {1:3, 2:2, 3:1}. The heap processes all three entries. After all insertions the heap holds the two entries with the highest counts: (2, 2) and (3, 1), having evicted count 1. The result is elements 1 and 2, which are the two most frequent. Correct.
Python's heapq.nlargest(k, freq, key=freq.get) is a one-liner that does the same thing under the hood and is perfectly acceptable in interviews.
LeetCode 973 shows how the same pattern applies to geometric problems. Compute each point's squared distance (no need for the square root; it preserves ordering) and use a max-heap of size K, or equivalently, a min-heap with negated distances.
import heapq
def k_closest(points: list[list[int]], k: int) -> list[list[int]]:
# max-heap: store (-distance, point) so the farthest sits at root
heap = []
for x, y in points:
dist = x * x + y * y
heapq.heappush(heap, (-dist, [x, y]))
if len(heap) > k:
heapq.heappop(heap) # evict the farthest
return [point for _, point in heap]Here the heap evicts the farthest point seen so far, leaving only the K closest at the end. Negating the distance turns Python's min-heap into a max-heap for this comparison key.
Python's heapq module only provides a min-heap. Two strategies handle max-heap needs:
-x instead of x; the smallest negative corresponds to the largest original value.heapq.nlargest or heapq.nsmallest: these are convenient for one-shot extraction but internally sort, so they are , the same asymptotic complexity as the manual approach.For the Kth largest element specifically, there is also a Quickselect solution that achieves average time, but the heap approach is simpler to implement under interview pressure and has deterministic worst-case behavior.
With LeetCode 378, rows and columns are each sorted, so you can treat the matrix like K sorted lists and merge them with a min-heap. Seed the heap with the first element of every row, then pop the minimum K times. Each pop advances one row's pointer, and you push the next element from that same row.
import heapq
def kth_smallest_matrix(matrix: list[list[int]], k: int) -> int:
n = len(matrix)
# (value, row, col)
heap = [(matrix[r][0], r, 0) for r in range(n)]
heapq.heapify(heap)
for _ in range(k):
val, r, c = heapq.heappop(heap)
if c + 1 < n:
heapq.heappush(heap, (matrix[r][c + 1], r, c + 1))
return valFor a 3x3 matrix the heap starts with the leftmost column. Each pop returns the global minimum of all frontier elements, and after K pops you have the Kth smallest. Time is where n is the number of rows, since the heap never exceeds size n.
LeetCode 373 extends this idea: given two sorted arrays nums1 and nums2, find the K pairs (a, b) with the smallest a + b. Seed a min-heap with every (nums1[i] + nums2[0], i, 0), then each time you pop a pair at index (i, j), push (nums1[i] + nums2[j+1], i, j+1) to explore the next candidate.
import heapq
def k_smallest_pairs(nums1: list[int], nums2: list[int], k: int) -> list[list[int]]:
if not nums1 or not nums2:
return []
heap = [(nums1[i] + nums2[0], i, 0) for i in range(min(k, len(nums1)))]
heapq.heapify(heap)
result = []
while heap and len(result) < k:
total, i, j = heapq.heappop(heap)
result.append([nums1[i], nums2[j]])
if j + 1 < len(nums2):
heapq.heappush(heap, (nums1[i] + nums2[j + 1], i, j + 1))
return resultThe key insight is that if (i, j) is a valid pair, then (i, j+1) is the next cheapest extension along that row. We never need to push (i+1, j) because every row's j=0 entry is already in the initial seed. This keeps the heap at most size K and runs in time.
Some problems combine a selection constraint with an optimization objective. In the IPO problem (LC 502), you have starting capital w, and each project has a minimum capital requirement and a profit. You can complete at most K projects sequentially, and finishing a project adds its profit to your capital.
Sort projects by capital requirement. Use a pointer to sweep through them: as your capital grows, push newly affordable projects onto a max-heap keyed by profit. Each round, pop the most profitable affordable project.
import heapq
def find_maximized_capital(k: int, w: int, profits: list[int], capital: list[int]) -> int:
projects = sorted(zip(capital, profits))
max_heap = []
i = 0
for _ in range(k):
while i < len(projects) and projects[i][0] <= w:
heapq.heappush(max_heap, -projects[i][1])
i += 1
if not max_heap:
break
w += -heapq.heappop(max_heap)
return wThe sorted order ensures you only consider each project once. The max-heap (via negation) always surfaces the best available profit. Total time is for sorting plus for heap operations.
Take LeetCode 871: a car starts with some fuel and needs to reach a target distance. Along the way are gas stations, each at a known position with a known fuel amount. Find the minimum number of stops needed to reach the target.
The greedy insight: drive as far as you can. When you run out of fuel, "retroactively" refuel at the best station you already passed. A max-heap stores the fuel amounts of all passed stations, so each refuel grabs the maximum available.
import heapq
def min_refuel_stops(target: int, start_fuel: int, stations: list[list[int]]) -> int:
max_heap = []
fuel = start_fuel
prev = 0
stops = 0
for pos, gas in stations + [[target, 0]]:
fuel -= (pos - prev)
while fuel < 0 and max_heap:
fuel += -heapq.heappop(max_heap)
stops += 1
if fuel < 0:
return -1
heapq.heappush(max_heap, -gas)
prev = pos
return stopsBy deferring the refueling decision, we always pick the station that gives the most fuel. This greedy choice is provably optimal. The heap holds at most n station entries, so the algorithm runs in time.
LeetCode 1481 adds a twist: given an array of integers and a budget K, remove exactly K elements to minimize the number of distinct values remaining. The strategy: count frequencies, then greedily remove elements starting with the least frequent values first. Removing all copies of a rare value eliminates one distinct value at the cheapest cost.
from collections import Counter
def find_least_num_of_unique_ints(arr: list[int], k: int) -> int:
freq = Counter(arr)
counts = sorted(freq.values())
for i, count in enumerate(counts):
if k >= count:
k -= count
else:
return len(counts) - i
return 0Sorting the frequency counts and removing from the bottom is equivalent to using a min-heap on frequencies. Either way, you spend your removal budget where it eliminates the most distinct values. Time is for sorting the counts.
You're probably looking at the top-K pattern when:
Common templates:
Practice Problems (56)