← back

Binary Search

Median of Two Sorted Arrays

Binary search on the partition point of the smaller array to find the correct split between both arrays. Ensures equal elements on each side for the median. Achieves O(log(min(m,n))).

Median of Two Sorted Arrays

The Problem

You are given two sorted arrays nums1 and nums2 of sizes m and n. Find the median of the combined sorted order. The overall run-time complexity must be O(log(min(m,n)))O(\log(\min(m, n))).

This is LeetCode 4, and the logarithmic requirement is what makes it hard. If the problem only asked for "find the median," you could merge the two arrays in O(m+n)O(m + n) time and pick the middle element. That works, but it does not satisfy the constraint. We need a fundamentally different approach, one built on binary search.

Why Merging Is Too Slow

Merging two sorted arrays is a familiar operation. You maintain two pointers, compare elements, and advance the smaller one. After (m + n) / 2 steps you reach the median. The time is O(m+n)O(m + n) and the space is O(1)O(1) if you only track the last couple of values.

The trouble is that O(m+n)O(m + n) is linear in the total size. When the problem demands O(log(min(m,n)))O(\log(\min(m, n))), it is asking you to ignore most of the elements entirely. You need to jump to the right place in a single binary search, not crawl there one element at a time.

The Partition Idea

Imagine cutting both arrays with a vertical line so that the left side contains exactly half the total elements and the right side contains the other half. If every element on the left is less than or equal to every element on the right, then the median sits right at the cut: it is either the maximum of the left side (when the total count is odd) or the average of the maximum of the left side and the minimum of the right side (when the total count is even).

Concretely, suppose you place the cut so that i elements come from A and j elements come from B, where i + j = (m + n + 1) // 2. The left side is A[0..i-1] and B[0..j-1]. The right side is A[i..m-1] and B[j..n-1].

Because both arrays are individually sorted, the only thing that could go wrong is at the boundary between the arrays. Specifically, you need two "cross conditions" to hold:

  • A[i-1] <= B[j]: the largest element taken from A on the left must not exceed the smallest element taken from B on the right.
  • B[j-1] <= A[i]: the largest element taken from B on the left must not exceed the smallest element taken from A on the right.

If both conditions hold, you have a valid partition and can read off the median.

Binary Search on the Smaller Array

You only have one degree of freedom. Once you choose i (how many elements to take from A), the value j is forced: j = (m + n + 1) // 2 - i. So you binary search on i in the range [0, m].

At each step:

  • If A[i-1] > B[j], then A is contributing too many elements to the left side. Move the cut in A to the left: hi = i - 1.
  • If B[j-1] > A[i], then A is contributing too few elements. Move the cut to the right: lo = i + 1.
  • Otherwise, the partition is correct.

You always binary search on the smaller array. If A is longer than B, swap them. This guarantees that j never goes negative (since i <= m <= n means j >= 0) and that the binary search runs in O(log(min(m,n)))O(\log(\min(m, n))) time.

Handling Edge Partitions with Sentinels

When i = 0, no elements from A are on the left, so there is no A[i-1] to compare. Similarly, when i = m, no elements from A are on the right, so there is no A[i]. The clean way to handle this is with sentinels: treat A[-1] as negative infinity and A[m] as positive infinity. The same goes for B. This way the cross conditions always have well-defined values and edge partitions are handled without special-case logic.

Walkthrough: A = [1, 3], B = [2]

Total elements: 3 (odd), so the median is the single middle element. We need (3 + 1) // 2 = 2 elements on the left.

A is smaller (m = 2, n = 1), so we binary search on A. lo = 0, hi = 2.

Iteration 1: i = (0 + 2) // 2 = 1, j = 2 - 1 = 1. Left side: A[0] = 1, B[0] = 2. Right side: A[1] = 3, (B is exhausted, so B[1] = +inf). Check: A[0] = 1 <= B[1] = +inf, yes. B[0] = 2 <= A[1] = 3, yes. Valid partition. The maximum of the left side is max(1, 2) = 2. Since the total count is odd, the median is 2.

Walkthrough: A = [1, 2], B = [3, 4]

Total elements: 4 (even), so the median is the average of the 2nd and 3rd elements. We need (4 + 1) // 2 = 2 elements on the left.

A and B are equal length (m = 2, n = 2). Binary search on A. lo = 0, hi = 2.

Iteration 1: i = (0 + 2) // 2 = 1, j = 2 - 1 = 1. Left side: A[0] = 1, B[0] = 3. Right side: A[1] = 2, B[1] = 4. Check: A[0] = 1 <= B[1] = 4, yes. B[0] = 3 <= A[1] = 2, no. B is contributing too much on the left, meaning A is contributing too little. Move right: lo = 2.

Iteration 2: i = (2 + 2) // 2 = 2, j = 2 - 2 = 0. Left side: A[0] = 1, A[1] = 2. (B contributes nothing on left, so B[-1] = -inf.) Right side: B[0] = 3, B[1] = 4. (A is exhausted, so A[2] = +inf.) Check: A[1] = 2 <= B[0] = 3, yes. B[-1] = -inf <= A[2] = +inf, yes. Valid partition. Maximum of left: max(2, -inf) = 2. Minimum of right: min(+inf, 3) = 3. The median is (2 + 3) / 2 = 2.5.

The Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
def findMedianSortedArrays(nums1, nums2):
    # Always binary search on the smaller array
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1

    m, n = len(nums1), len(nums2)
    lo, hi = 0, m
    half = (m + n + 1) // 2

    while lo <= hi:
        i = (lo + hi) // 2
        j = half - i

        # Sentinel values for edge partitions
        left_A  = nums1[i - 1] if i > 0 else float('-inf')
        right_A = nums1[i]     if i < m else float('inf')
        left_B  = nums2[j - 1] if j > 0 else float('-inf')
        right_B = nums2[j]     if j < n else float('inf')

        if left_A > right_B:
            # Too many from A on the left, move cut left
            hi = i - 1
        elif left_B > right_A:
            # Too few from A on the left, move cut right
            lo = i + 1
        else:
            # Valid partition
            max_left = max(left_A, left_B)
            if (m + n) % 2 == 1:
                return max_left
            min_right = min(right_A, right_B)
            return (max_left + min_right) / 2

The function starts by ensuring nums1 is the smaller array. It then binary searches over the partition index i in nums1. For each candidate, it computes j so that the left side has exactly half the total elements. The sentinels handle the cases where a partition sits at the very start or very end of an array. Once the cross conditions are satisfied, it reads the median directly from the boundary values.

Why Binary Search on the Smaller Array

If you binary-searched on the larger array instead, the algorithm would still be correct, but it would run in O(log(max(m,n)))O(\log(\max(m, n))) instead of O(log(min(m,n)))O(\log(\min(m, n))). Worse, you could run into cases where j becomes negative because the larger array can contribute more than half the elements by itself. Searching on the smaller array avoids this entirely: since i ranges from 0 to m and m <= n, the derived j = half - i is always between 0 and n.

Variation: Kth Element of Two Sorted Arrays

The median is just the special case where you want the element at position (m + n + 1) // 2. The same partition idea generalizes: to find the kth smallest element across both arrays, set half = k instead of (m + n + 1) // 2. You binary search for a partition with exactly k elements on the left side, check the same cross conditions, and return the maximum of the left side. Everything else, including the sentinel logic, the binary search bounds, and the swap to ensure you search the smaller array, remains identical.

The only subtlety is that i can now range up to min(m, k) instead of m, since you cannot take more than k elements from one array. And the lower bound is max(0, k - n), since if k > n you must take at least k - n elements from A.

Complexity Analysis

Time: O(log(min(m,n)))O(\log(\min(m, n))). The binary search runs on the smaller array, halving the range each step.

Space: O(1)O(1). Only a constant number of variables are used regardless of input size.

Recognizing this pattern

You're probably looking at the median-of-two-sorted-arrays partition when:

  • You have two (or a few) sorted arrays and need an order-statistic (median, kth element) of their combined order in better than O(m+n)O(m + n).
  • The required complexity is O(log(min(m,n)))O(\log(\min(m, n))) or O(log(m+n))O(\log(m + n)), ruling out merging.
  • The defining trick is partitioning both arrays so the left halves combined equal half the total length, then checking the cross-boundary inequalities A[i-1] <= B[j] and B[j-1] <= A[i].
  • You can express the problem as "find a split where everything on the left is <= everything on the right."

Common templates:

  • Median via partition: binary search on the smaller array; balance left-side counts so the boundary elements satisfy the cross-pair inequalities. Example: Median of Two Sorted Arrays.
  • Kth smallest via partition: generalize to any k, not just the middle; same partition logic with a shifted left-count target.
  • Kth smallest by halving: alternative O(logk)O(\log k) recursion that discards k/2k/2 elements from one array at a time. Example: Kth Smallest Element in a Sorted Matrix.
  • Median of a data stream: when the data is dynamic rather than two fixed arrays, use two heaps. Example: Find Median from Data Stream.