← back

Heap / Priority Queue

Running Median (Two Heaps)

Maintain a max-heap for the lower half and a min-heap for the upper half. Balance them to differ by at most 1. The median is the top of the larger heap, or the average of both tops.

Maintaining a Running Median with Two Heaps

LeetCode 295 motivates the two-heap approach. Finding the median of a static sorted array is trivial: pick the middle element. The challenge arises when numbers arrive one at a time from a stream and you need to report the current median after each insertion, without re-sorting the whole collection every time.

The key observation is that you do not need the full sorted order. You only need access to the one or two elements closest to the middle. Two heaps give you exactly that.

The Two-Heap Invariant

Split the stream into a lower half and an upper half:

  • lo is a max-heap holding the smaller half of all numbers seen so far. Its root is the largest element in the lower half.
  • hi is a min-heap holding the larger half. Its root is the smallest element in the upper half.

Maintain two invariants:

  1. Every element in lo is less than or equal to every element in hi.
  2. The sizes differ by at most one: len(lo) is either equal to len(hi) or exactly one more.

When these invariants hold, the median is always at the boundary. If the heaps are equal in size, the median is the average of the two roots. If lo is larger by one, the median is lo's root.

Inserting a New Element

A clean two-step approach keeps the invariants intact without requiring a long chain of conditionals.

Step 1: Push the new number onto lo. Since lo is a max-heap, this places the number in the lower partition, but lo's new root might now be larger than hi's root, violating invariant 1.

Step 2: Fix invariant 1 by moving lo's root to hi. This guarantees the cross-partition ordering is correct.

After step 2, hi might be larger than lo, violating invariant 2. If so, move hi's root back to lo.

This unconditional push-then-rebalance pattern handles all cases: numbers smaller than the current median, numbers larger, and numbers that fall exactly at the median.

Walked Example

Stream: [5, 15, 1, 3]

Insert 5:

  • Push 5 to lo → lo: [5], hi: []
  • Move lo's top (5) to hi → lo: [], hi: [5]
  • hi is larger, move hi's top (5) back to lo → lo: [5], hi: []
  • Median: 5.0 (lo is larger by one, return lo's root)

Insert 15:

  • Push 15 to lo → lo: [15, 5] (max-heap, 15 at root), hi: []
  • Move lo's top (15) to hi → lo: [5], hi: [15]
  • Sizes equal, no rebalance needed
  • Median: (5 + 15) / 2 = 10.0

Insert 1:

  • Push 1 to lo → lo: [5, 1], hi: [15]
  • Move lo's top (5) to hi → lo: [1], hi: [5, 15]
  • hi is larger (2 vs 1), move hi's top (5) back to lo → lo: [5, 1], hi: [15]
  • Median: 5.0 (lo is larger by one)

Insert 3:

  • Push 3 to lo → lo: [5, 1, 3], hi: [15]
  • Move lo's top (5) to hi → lo: [3, 1], hi: [5, 15]
  • Sizes equal (2 vs 2), no rebalance needed
  • Median: (3 + 5) / 2 = 4.0

The sorted sequence at this point is [1, 3, 5, 15]. The true median is (3 + 5) / 2 = 4.0. Correct.

The Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
import heapq

class MedianFinder:
    def __init__(self):
        self.lo = []   # max-heap for lower half (negate values)
        self.hi = []   # min-heap for upper half

    def addNum(self, num: int) -> None:
        # Step 1: push to lower half
        heapq.heappush(self.lo, -num)
        # Step 2: balance cross-partition order
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        # Step 3: rebalance sizes (lo may have one more, never hi)
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def findMedian(self) -> float:
        if len(self.lo) > len(self.hi):
            return float(-self.lo[0])
        return (-self.lo[0] + self.hi[0]) / 2.0

Python's Negation Trick

Python's heapq module is a min-heap only. Simulating a max-heap requires negating all stored values: pushing -num means the most negative value (i.e., the largest original value) sits at the root. When you pop, negate again to recover the original value.

This is why every push to lo uses -num and every read from lo's root uses -self.lo[0]. The hi heap stores values as-is since it is already a min-heap.

Complexity

Each insertion performs a constant number of heap pushes and pops on heaps of size O(n)O(n). Each heap operation is O(logn)O(\log n), so insertion is O(logn)O(\log n). Reading the median is O(1)O(1) since it only inspects the roots. Space is O(n)O(n) to store all elements.

Sliding Window Median

LeetCode 480 adds a twist: a harder variant maintains the running median over a fixed-size sliding window: as the window advances, you must remove the element that falls out while adding the new one. Heaps do not support arbitrary removal efficiently, so the standard solution uses lazy deletion.

Keep a dictionary tracking which elements have been "logically deleted," meaning elements that have left the window but are still physically in a heap. When you pop from a heap, check whether the root is in the deleted set; if so, discard it and pop again. This does not require any upfront restructuring; the deleted entries simply get skipped the next time they bubble to the top.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
import heapq
from collections import defaultdict

def medianSlidingWindow(nums: list[int], k: int) -> list[float]:
    lo: list[int] = []  # max-heap (negated values)
    hi: list[int] = []  # min-heap
    deleted: dict[int, int] = defaultdict(int)

    def get_valid_top(heap: list, negate: bool) -> float:
        while heap:
            val = -heap[0] if negate else heap[0]
            if deleted[val] > 0:
                deleted[val] -= 1
                heapq.heappop(heap)
            else:
                break
        return -heap[0] if negate else heap[0]

    # seed first window
    for num in nums[:k]:
        heapq.heappush(lo, -num)
    for _ in range(k // 2):
        heapq.heappush(hi, -heapq.heappop(lo))

    results = []

    def current_median() -> float:
        lo_top = get_valid_top(lo, negate=True)
        if k % 2 == 1:
            return float(lo_top)
        hi_top = get_valid_top(hi, negate=False)
        return (lo_top + hi_top) / 2.0

    results.append(current_median())

    for i in range(k, len(nums)):
        incoming, outgoing = nums[i], nums[i - k]
        deleted[outgoing] += 1
        # add incoming to the correct side and rebalance
        if incoming <= get_valid_top(lo, negate=True):
            heapq.heappush(lo, -incoming)
        else:
            heapq.heappush(hi, incoming)
        # rebalance sizes accounting for lazy deletions
        while len(lo) - len(hi) > (k + 1) // 2:
            heapq.heappush(hi, -heapq.heappop(lo))
        while len(hi) > len(lo):
            heapq.heappush(lo, -heapq.heappop(hi))
        results.append(current_median())

    return results

The lazy deletion approach preserves O(nlogk)O(n \log k) overall complexity. Each element is pushed once, lazily deleted once, and eventually popped once: three heap operations of O(logk)O(\log k) each, over n elements.

Recognizing this pattern

You're probably looking at the two-heap pattern when:

  • The problem asks for the median of a data stream or a running median over a sliding window.
  • You need fast access to the boundary element between the smaller and larger halves of a dynamic set.
  • Insertions arrive online, so sorting the whole array each step would be too slow, but you only need one or two specific order statistics.
  • You need the Kth element of a dynamic multiset where K tracks the running size (always the middle).

Common templates:

  • Streaming median (lo max-heap + hi min-heap): keep |len(lo) - len(hi)| <= 1, median is one or both roots. Example: Find Median from Data Stream.
  • Sliding window median with lazy deletion: defer removal of out-of-window roots until they surface. Example: Sliding Window Median.
  • IPO / scheduling with two priority queues: one heap holds "available", another holds "pending by start condition." Example: IPO.
  • Balance-after-insert template: push to one heap, move its root to the other, rebalance sizes. Example: Finding MK Average.