← back

Randomized Algorithms

Quickselect

Partition the array around a pivot (like quicksort) but only recurse into the half containing the k-th element. O(n) average time for finding the k-th largest/smallest element.

Quickselect

Suppose you need the kth smallest element in an unsorted array. The naive approach, sorting the whole array and then indexing at k, costs O(nlogn)O(n \log n). But sorting does far more work than necessary: it orders every element, while you only care about one. Quickselect exploits this observation to find the kth smallest in O(n)O(n) average time, throwing away roughly half the problem at each step rather than sorting anything.

The Connection to Quicksort

Quickselect is a direct descendant of quicksort. Both algorithms are built around a partition step: pick a pivot, then rearrange the array so that all elements smaller than the pivot end up to its left, all elements larger end up to its right, and the pivot sits in its final sorted position. Quicksort then recurses on both sides. Quickselect only recurses on the one side that contains the kth position. The other half is discarded entirely.

Three-Way Partition

A three-way partition, splitting into less, equal, and greater buckets, handles duplicates correctly and is the most readable form for interviews. After partitioning:

  • If k < len(less), the answer is in the smaller elements; recurse on less with the same k.
  • If k < len(less) + len(equal), the pivot itself is the kth smallest; return it.
  • Otherwise, the answer is in the larger elements; recurse on greater with k adjusted by the sizes of the other two buckets.

Walking Through an Example

Find the 3rd smallest (k=2, zero-indexed) in [7, 2, 1, 8, 6, 3, 5, 4].

Round 1. Say the random pivot is 4.

  • less = [2, 1, 3] (length 3)
  • equal = [4] (length 1)
  • greater = [7, 8, 6, 5] (length 4)

Since k=2 < len(less)=3, the answer is somewhere in [2, 1, 3]. Recurse with k=2.

Round 2. Array is [2, 1, 3], k=2. Say pivot is 2.

  • less = [1] (length 1)
  • equal = [2] (length 1)
  • greater = [3] (length 1)

Since k=2 >= len(less) + len(equal) = 2, the answer is in greater. Recurse on [3] with k = 2 - 1 - 1 = 0.

Round 3. Array is [3], k=0. Pivot is 3.

  • less = [], equal = [3], greater = []

Since k=0 < len(less) + len(equal) = 1, return 3. Correct: the 3rd smallest in the original array is indeed 3.

The Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
import random

def quickselect(nums: list[int], k: int) -> int:
    """Return the kth smallest element (0-indexed)."""
    if len(nums) == 1:
        return nums[0]

    pivot = random.choice(nums)
    less    = [x for x in nums if x < pivot]
    equal   = [x for x in nums if x == pivot]
    greater = [x for x in nums if x > pivot]

    if k < len(less):
        return quickselect(less, k)
    elif k < len(less) + len(equal):
        return pivot
    else:
        return quickselect(greater, k - len(less) - len(equal))

To find the kth largest instead, pass k = len(nums) - k (converting to a kth-smallest index), or negate all values and find the kth smallest of the negated array.

Why Average O(n)O(n)?

The key insight is that, on average, each partition step reduces the problem to roughly half the remaining elements. The total work across all recursive calls forms a geometric series:

1
n + n/2 + n/4 + n/8 + ... ≈ 2n

So the expected total work is O(n)O(n). More formally, with a random pivot the expected size of the surviving subarray is at most 3n/43n/4, giving a recurrence of T(n)=T(3n/4)+O(n)T(n) = T(3n/4) + O(n), which solves to O(n)O(n).

This is fundamentally different from quicksort, which must recurse on both sides and therefore costs O(nlogn)O(n \log n) on average. Quickselect's one-sided recursion is what gives it the linear bound.

Worst Case and Median-of-Medians

In the worst case, if every pivot chosen turns out to be the minimum or maximum of its subarray, quickselect degrades to O(n2)O(n²). With a random pivot, the probability of this happening consistently is vanishingly small (the probability of always choosing the minimum across n rounds is (1/n)n(1/n)^n), so randomization makes the worst case essentially theoretical.

If you need a guaranteed O(n)O(n) worst case, the median-of-medians algorithm provides it. The idea: divide the array into groups of five, find the median of each group (constant work per group since each group has fixed size), then find the median of those medians recursively. Using this value as the pivot guarantees the surviving subarray is at most 70% of the original size, yielding T(n)=T(n/5)+T(7n/10)+O(n)T(n) = T(n/5) + T(7n/10) + O(n), which solves to O(n)O(n). Median-of-medians is rarely needed in practice, since the constants are high and random pivot is almost always sufficient, but it demonstrates that linear selection is achievable in the worst case.

Kth Largest Element in an Array

LeetCode 215. Kth Largest Element in an Array is the most common interview problem solved by quickselect. Given an array and an integer k, return the kth largest element. Not the kth distinct element, but the kth element in sorted order from the right.

The trick is a one-line conversion: the kth largest is the (n - k)th smallest (zero-indexed). So for nums = [3, 2, 1, 5, 6, 4] and k = 2, we want the 4th smallest (index 4 in a sorted array), which is 5.

1
2
def findKthLargest(nums, k):
    return quickselect(nums, len(nums) - k)

Walking Through [3, 2, 1, 5, 6, 4] with k=2

We need the (6 - 2) = 4th smallest (zero-indexed), which is the value 5.

Round 1. Suppose pivot is 3.

  • less = [2, 1] (length 2)
  • equal = [3] (length 1)
  • greater = [5, 6, 4] (length 3)

Since k=4 >= len(less) + len(equal) = 3, the answer is in greater. Recurse with k = 4 - 2 - 1 = 1.

Round 2. Array is [5, 6, 4], k=1. Suppose pivot is 5.

  • less = [4] (length 1)
  • equal = [5] (length 1)
  • greater = [6] (length 1)

Since k=1 >= len(less) = 1 and k=1 < len(less) + len(equal) = 2, the pivot 5 is the answer. Return 5.

Top K Frequent Elements

LeetCode 347. Top K Frequent Elements asks for the k most frequent elements. You can solve this with quickselect by first counting frequencies with a hashmap, then selecting the kth largest frequency. The idea: create a list of (count, value) pairs and use quickselect to find the element at position n - k, then return all elements from that position onward.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
from collections import Counter
import random

def topKFrequent(nums, k):
    counts = Counter(nums)
    pairs = list(counts.items())  # (value, frequency) pairs

    # Find the k largest pairs by frequency using quickselect.
    # We want the (k-1)th largest as the threshold, i.e., the (len-k)th smallest.
    def select(arr, idx):
        """Return the element at sorted position idx (by frequency, ascending)."""
        pivot = random.choice(arr)[1]
        less    = [x for x in arr if x[1] < pivot]
        equal   = [x for x in arr if x[1] == pivot]
        greater = [x for x in arr if x[1] > pivot]

        if idx < len(less):
            return select(less, idx)
        elif idx < len(less) + len(equal):
            return pivot
        else:
            return select(greater, idx - len(less) - len(equal))

    threshold = select(pairs, len(pairs) - k)
    result = [val for val, cnt in pairs if cnt > threshold]
    # Fill the remaining slots from values at exactly the threshold frequency
    result += [val for val, cnt in pairs if cnt == threshold][: k - len(result)]
    return result

An alternative, and often cleaner, approach for this specific problem is bucket sort by frequency, which also runs in O(n)O(n). But quickselect demonstrates the general principle: any "top K" or "kth element" problem where you don't need full sorted order is a candidate for quickselect.

In-Place Partition: The Lomuto Scheme

The three-bucket version above creates new lists on each recursion, using O(n)O(n) extra space. For interviews that ask for O(1)O(1) auxiliary space, use the Lomuto partition scheme, which rearranges the array in place.

The idea: pick a pivot (say the last element), then sweep through with a write pointer. Every element smaller than the pivot gets swapped to the write pointer's position. After the sweep, swap the pivot into its final position.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
import random

def findKthLargest(nums, k):
    target = len(nums) - k

    def partition(lo, hi):
        # Random pivot, swap to end
        pivot_idx = random.randint(lo, hi)
        nums[pivot_idx], nums[hi] = nums[hi], nums[pivot_idx]
        pivot = nums[hi]
        write = lo
        for i in range(lo, hi):
            if nums[i] <= pivot:
                nums[write], nums[i] = nums[i], nums[write]
                write += 1
        nums[write], nums[hi] = nums[hi], nums[write]
        return write

    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        p = partition(lo, hi)
        if p == target:
            return nums[p]
        elif p < target:
            lo = p + 1
        else:
            hi = p - 1

This version modifies the input array but uses only O(1)O(1) extra space (ignoring the recursion stack, which is replaced by a loop here). The iterative version avoids stack overflow on large inputs.

When to Reach for Quickselect

Use quickselect whenever the problem asks for a single order statistic, such as the kth smallest, kth largest, median, or top-k, and you do not need the rest of the elements sorted. The signal is "kth" or "top k" without a requirement to output them in sorted order. If sorted order matters, a heap of size k (O(nlogk)O(n \log k)) is often simpler and more predictable; if a value distribution is bounded, bucket sort wins (LeetCode 347). Quickselect's sweet spot is the unrestricted, unsorted case where its expected O(n)O(n) beats both alternatives.

Complexity Summary

  • Time (average): O(n)O(n), due to geometric series of shrinking subproblems
  • Time (worst case): O(n2)O(n²), avoided in practice by random pivot selection
  • Space: O(n)O(n) for the three-bucket version; the in-place Lomuto version uses O(1)O(1) auxiliary space

Recognizing this pattern

You're probably looking at quickselect when:

  • You need the kth largest or kth smallest element in expected O(n) time without sorting.
  • The problem explicitly forbids O(n log n) sorting or asks for an in-place solution on the input array.
  • A heap of size k would work but k is close to n, making the O(n log k) heap approach worse than linear selection.
  • You also need the top-k elements as a set (not in order), and quickselect partitions them on one side of the pivot for free.

Common templates: