Heap / Priority Queue

Top-K Elements

Maintain a min-heap of size k. For each new element, push it and pop the smallest if size exceeds k. The heap contains the k largest elements at the end. For k-th smallest, use a max-heap.

Finding Top-K Elements with Heaps

The "top-K" family of problems asks you to efficiently extract the K largest values, the K most frequent elements, or the K closest points from a larger collection. Sorting works, but it costs $O(n \log n)$ regardless of K. When K is much smaller than n, a heap-based approach cuts that to $O(n \log K)$ , a meaningful win on large inputs.

The Counter-Intuitive Min-Heap Trick

The canonical version, LeetCode 215, asks for the Kth largest element of an array. The natural instinct is to use a max-heap to track the largest elements: keep pulling off the top and you get the biggest values. But that approach requires storing all $n$ elements and then extracting K of them. The smarter move is a size-K min-heap.

Here is the insight: if you maintain a min-heap of exactly K elements, the root is always the smallest of the K largest seen so far, the threshold that a new element must beat to deserve a spot. For every new element you see, compare it against the root. If the new element is larger, evict the root and push the new element in. At the end, the heap contains exactly the K largest elements, and you never needed more than K slots.

import heapq

def kth_largest(nums: list[int], k: int) -> int:
    heap = []
    for num in nums:
        heapq.heappush(heap, num)
        if len(heap) > k:
            heapq.heappop(heap)   # evict the smallest-of-K
    return heap[0]               # root is the Kth largest

Walk through nums = [3, 1, 5, 12, 2, 11] with k = 3:

Push 3 → heap: [3]
Push 1 → heap: [1, 3]
Push 5 → heap: [1, 3, 5]
Push 12, size exceeds 3, pop 1 → heap: [3, 5, 12]
Push 2, size exceeds 3, pop 2 → heap: [3, 5, 12] (2 was immediately evicted)
Push 11, size exceeds 3, pop 3 → heap: [5, 11, 12]

The root 5 is the 3rd largest. The heap contains {5, 11, 12}, which are indeed the three largest elements.

Complexity

Each of the $n$ elements is pushed once and popped at most once. Each heap operation on a size-K heap costs $O(log K)$ . Total: $O(n \log K)$ time, $O(K)$ space. When K is small (say, K = 10 out of n = 1,000,000), this is substantially faster than sorting.

Frequency Variant: Top K Frequent Elements

For LeetCode 347, when the problem asks for the K most frequent values, build a frequency map first and then run the same size-K heap logic over the counts.

from collections import Counter
import heapq

def top_k_frequent(nums: list[int], k: int) -> list[int]:
    freq = Counter(nums)
    # heap of (count, element); min-heap evicts least frequent
    heap = []
    for elem, count in freq.items():
        heapq.heappush(heap, (count, elem))
        if len(heap) > k:
            heapq.heappop(heap)
    return [elem for count, elem in heap]

For nums = [1,1,1,2,2,3] with k = 2, the frequency map is {1:3, 2:2, 3:1}. The heap processes all three entries. After all insertions the heap holds the two entries with the highest counts: (2, 2) and (3, 1), having evicted count 1. The result is elements 1 and 2, which are the two most frequent. Correct.

Python's heapq.nlargest(k, freq, key=freq.get) is a one-liner that does the same thing under the hood and is perfectly acceptable in interviews.

K Closest Points to the Origin

LeetCode 973 shows how the same pattern applies to geometric problems. Compute each point's squared distance (no need for the square root; it preserves ordering) and use a max-heap of size K, or equivalently, a min-heap with negated distances.

import heapq

def k_closest(points: list[list[int]], k: int) -> list[list[int]]:
    # max-heap: store (-distance, point) so the farthest sits at root
    heap = []
    for x, y in points:
        dist = x * x + y * y
        heapq.heappush(heap, (-dist, [x, y]))
        if len(heap) > k:
            heapq.heappop(heap)   # evict the farthest
    return [point for _, point in heap]

Here the heap evicts the farthest point seen so far, leaving only the K closest at the end. Negating the distance turns Python's min-heap into a max-heap for this comparison key.

Python's heapq and the Negation Pattern

Python's heapq module only provides a min-heap. Two strategies handle max-heap needs:

Negate values: store -x instead of x; the smallest negative corresponds to the largest original value.
Use heapq.nlargest or heapq.nsmallest: these are convenient for one-shot extraction but internally sort, so they are $O(n \log K)$ , the same asymptotic complexity as the manual approach.

For the Kth largest element specifically, there is also a Quickselect solution that achieves $O(n)$ average time, but the heap approach is simpler to implement under interview pressure and has deterministic $O(n \log K)$ worst-case behavior.

Kth Smallest in a Sorted Matrix

With LeetCode 378, rows and columns are each sorted, so you can treat the matrix like K sorted lists and merge them with a min-heap. Seed the heap with the first element of every row, then pop the minimum K times. Each pop advances one row's pointer, and you push the next element from that same row.

import heapq

def kth_smallest_matrix(matrix: list[list[int]], k: int) -> int:
    n = len(matrix)
    # (value, row, col)
    heap = [(matrix[r][0], r, 0) for r in range(n)]
    heapq.heapify(heap)

    for _ in range(k):
        val, r, c = heapq.heappop(heap)
        if c + 1 < n:
            heapq.heappush(heap, (matrix[r][c + 1], r, c + 1))
    return val

For a 3x3 matrix the heap starts with the leftmost column. Each pop returns the global minimum of all frontier elements, and after K pops you have the Kth smallest. Time is $O(K \log n)$ where n is the number of rows, since the heap never exceeds size n.

Find K Pairs with Smallest Sums

LeetCode 373 extends this idea: given two sorted arrays nums1 and nums2, find the K pairs (a, b) with the smallest a + b. Seed a min-heap with every (nums1[i] + nums2[0], i, 0), then each time you pop a pair at index (i, j), push (nums1[i] + nums2[j+1], i, j+1) to explore the next candidate.

import heapq

def k_smallest_pairs(nums1: list[int], nums2: list[int], k: int) -> list[list[int]]:
    if not nums1 or not nums2:
        return []
    heap = [(nums1[i] + nums2[0], i, 0) for i in range(min(k, len(nums1)))]
    heapq.heapify(heap)

    result = []
    while heap and len(result) < k:
        total, i, j = heapq.heappop(heap)
        result.append([nums1[i], nums2[j]])
        if j + 1 < len(nums2):
            heapq.heappush(heap, (nums1[i] + nums2[j + 1], i, j + 1))
    return result

The key insight is that if (i, j) is a valid pair, then (i, j+1) is the next cheapest extension along that row. We never need to push (i+1, j) because every row's j=0 entry is already in the initial seed. This keeps the heap at most size K and runs in $O(K \log K)$ time.

IPO: Greedy with Two Heaps

Some problems combine a selection constraint with an optimization objective. In the IPO problem (LC 502), you have starting capital w, and each project has a minimum capital requirement and a profit. You can complete at most K projects sequentially, and finishing a project adds its profit to your capital.

Sort projects by capital requirement. Use a pointer to sweep through them: as your capital grows, push newly affordable projects onto a max-heap keyed by profit. Each round, pop the most profitable affordable project.

import heapq

def find_maximized_capital(k: int, w: int, profits: list[int], capital: list[int]) -> int:
    projects = sorted(zip(capital, profits))
    max_heap = []
    i = 0

    for _ in range(k):
        while i < len(projects) and projects[i][0] <= w:
            heapq.heappush(max_heap, -projects[i][1])
            i += 1
        if not max_heap:
            break
        w += -heapq.heappop(max_heap)
    return w

The sorted order ensures you only consider each project once. The max-heap (via negation) always surfaces the best available profit. Total time is $O(n \log n)$ for sorting plus $O(n \log n)$ for heap operations.

Minimum Number of Refueling Stops

Take LeetCode 871: a car starts with some fuel and needs to reach a target distance. Along the way are gas stations, each at a known position with a known fuel amount. Find the minimum number of stops needed to reach the target.

The greedy insight: drive as far as you can. When you run out of fuel, "retroactively" refuel at the best station you already passed. A max-heap stores the fuel amounts of all passed stations, so each refuel grabs the maximum available.

import heapq

def min_refuel_stops(target: int, start_fuel: int, stations: list[list[int]]) -> int:
    max_heap = []
    fuel = start_fuel
    prev = 0
    stops = 0

    for pos, gas in stations + [[target, 0]]:
        fuel -= (pos - prev)
        while fuel < 0 and max_heap:
            fuel += -heapq.heappop(max_heap)
            stops += 1
        if fuel < 0:
            return -1
        heapq.heappush(max_heap, -gas)
        prev = pos
    return stops

By deferring the refueling decision, we always pick the station that gives the most fuel. This greedy choice is provably optimal. The heap holds at most n station entries, so the algorithm runs in $O(n \log n)$ time.

Least Number of Unique Integers after K Removals

LeetCode 1481 adds a twist: given an array of integers and a budget K, remove exactly K elements to minimize the number of distinct values remaining. The strategy: count frequencies, then greedily remove elements starting with the least frequent values first. Removing all copies of a rare value eliminates one distinct value at the cheapest cost.

from collections import Counter

def find_least_num_of_unique_ints(arr: list[int], k: int) -> int:
    freq = Counter(arr)
    counts = sorted(freq.values())

    for i, count in enumerate(counts):
        if k >= count:
            k -= count
        else:
            return len(counts) - i
    return 0

Sorting the frequency counts and removing from the bottom is equivalent to using a min-heap on frequencies. Either way, you spend your removal budget where it eliminates the most distinct values. Time is $O(n \log n)$ for sorting the counts.

Recognizing this pattern

You're probably looking at the top-K pattern when:

The problem says "top K", "Kth largest", "Kth smallest", "K most frequent", or "K closest", and K is a small parameter compared to n.
You only need the K best elements, not a full sort, and K is bounded enough that a size-K heap fits comfortably in memory.
The data may be streamed and you cannot revisit older elements, so you must maintain the top K online.
Average $O(n)$ via quickselect is acceptable when the problem allows in-place partitioning of an array.

Common templates:

Size-K min-heap for K largest: push, pop when size exceeds K, root is the Kth largest. Example: Kth Largest Element in an Array.
Size-K max-heap for K smallest / closest: heap stores negated keys so the worst kept element pops on overflow. Example: K Closest Points to Origin.
Counter + heap for K most frequent: count frequencies first, then run a size-K heap on (freq, value) pairs. Example: Top K Frequent Elements.
Quickselect partition: partition around a pivot until the pivot lands at index K. Example: Kth Largest Element in an Array.

Practice Problems (56)

⊣ 215 Kth Largest Element in an ArrayMedium

⊣ 347 Top K Frequent ElementsMedium

⊣ 373 Find K Pairs with Smallest SumsMedium

⊣ 378 Kth Smallest Element in a Sorted MatrixMedium

⊣ 414 Third Maximum NumberEasy

⊣ 502 IPOHard

⊣ 703 Kth Largest Element in a StreamEasy

⊣ 786 K-th Smallest Prime FractionMedium

⊣ 857 Minimum Cost to Hire K WorkersHard

⊣ 871 Minimum Number of Refueling StopsHard

⊣ 973 K Closest Points to OriginMedium