Randomized Algorithms
Partition the array around a pivot (like quicksort) but only recurse into the half containing the k-th element. O(n) average time for finding the k-th largest/smallest element.
Suppose you need the kth smallest element in an unsorted array. The naive approach, sorting the whole array and then indexing at k, costs . But sorting does far more work than necessary: it orders every element, while you only care about one. Quickselect exploits this observation to find the kth smallest in average time, throwing away roughly half the problem at each step rather than sorting anything.
Quickselect is a direct descendant of quicksort. Both algorithms are built around a partition step: pick a pivot, then rearrange the array so that all elements smaller than the pivot end up to its left, all elements larger end up to its right, and the pivot sits in its final sorted position. Quicksort then recurses on both sides. Quickselect only recurses on the one side that contains the kth position. The other half is discarded entirely.
A three-way partition, splitting into less, equal, and greater buckets, handles duplicates correctly and is the most readable form for interviews. After partitioning:
k < len(less), the answer is in the smaller elements; recurse on less with the same k.k < len(less) + len(equal), the pivot itself is the kth smallest; return it.greater with k adjusted by the sizes of the other two buckets.Find the 3rd smallest (k=2, zero-indexed) in [7, 2, 1, 8, 6, 3, 5, 4].
Round 1. Say the random pivot is 4.
less = [2, 1, 3] (length 3)equal = [4] (length 1)greater = [7, 8, 6, 5] (length 4)Since k=2 < len(less)=3, the answer is somewhere in [2, 1, 3]. Recurse with k=2.
Round 2. Array is [2, 1, 3], k=2. Say pivot is 2.
less = [1] (length 1)equal = [2] (length 1)greater = [3] (length 1)Since k=2 >= len(less) + len(equal) = 2, the answer is in greater. Recurse on [3] with k = 2 - 1 - 1 = 0.
Round 3. Array is [3], k=0. Pivot is 3.
less = [], equal = [3], greater = []Since k=0 < len(less) + len(equal) = 1, return 3. Correct: the 3rd smallest in the original array is indeed 3.
import random
def quickselect(nums: list[int], k: int) -> int:
"""Return the kth smallest element (0-indexed)."""
if len(nums) == 1:
return nums[0]
pivot = random.choice(nums)
less = [x for x in nums if x < pivot]
equal = [x for x in nums if x == pivot]
greater = [x for x in nums if x > pivot]
if k < len(less):
return quickselect(less, k)
elif k < len(less) + len(equal):
return pivot
else:
return quickselect(greater, k - len(less) - len(equal))To find the kth largest instead, pass k = len(nums) - k (converting to a kth-smallest index), or negate all values and find the kth smallest of the negated array.
The key insight is that, on average, each partition step reduces the problem to roughly half the remaining elements. The total work across all recursive calls forms a geometric series:
n + n/2 + n/4 + n/8 + ... ≈ 2nSo the expected total work is . More formally, with a random pivot the expected size of the surviving subarray is at most , giving a recurrence of , which solves to .
This is fundamentally different from quicksort, which must recurse on both sides and therefore costs on average. Quickselect's one-sided recursion is what gives it the linear bound.
In the worst case, if every pivot chosen turns out to be the minimum or maximum of its subarray, quickselect degrades to . With a random pivot, the probability of this happening consistently is vanishingly small (the probability of always choosing the minimum across n rounds is ), so randomization makes the worst case essentially theoretical.
If you need a guaranteed worst case, the median-of-medians algorithm provides it. The idea: divide the array into groups of five, find the median of each group (constant work per group since each group has fixed size), then find the median of those medians recursively. Using this value as the pivot guarantees the surviving subarray is at most 70% of the original size, yielding , which solves to . Median-of-medians is rarely needed in practice, since the constants are high and random pivot is almost always sufficient, but it demonstrates that linear selection is achievable in the worst case.
LeetCode 215. Kth Largest Element in an Array is the most common interview problem solved by quickselect. Given an array and an integer k, return the kth largest element. Not the kth distinct element, but the kth element in sorted order from the right.
The trick is a one-line conversion: the kth largest is the (n - k)th smallest (zero-indexed). So for nums = [3, 2, 1, 5, 6, 4] and k = 2, we want the 4th smallest (index 4 in a sorted array), which is 5.
def findKthLargest(nums, k):
return quickselect(nums, len(nums) - k)We need the (6 - 2) = 4th smallest (zero-indexed), which is the value 5.
Round 1. Suppose pivot is 3.
less = [2, 1] (length 2)equal = [3] (length 1)greater = [5, 6, 4] (length 3)Since k=4 >= len(less) + len(equal) = 3, the answer is in greater. Recurse with k = 4 - 2 - 1 = 1.
Round 2. Array is [5, 6, 4], k=1. Suppose pivot is 5.
less = [4] (length 1)equal = [5] (length 1)greater = [6] (length 1)Since k=1 >= len(less) = 1 and k=1 < len(less) + len(equal) = 2, the pivot 5 is the answer. Return 5.
LeetCode 347. Top K Frequent Elements asks for the k most frequent elements. You can solve this with quickselect by first counting frequencies with a hashmap, then selecting the kth largest frequency. The idea: create a list of (count, value) pairs and use quickselect to find the element at position n - k, then return all elements from that position onward.
from collections import Counter
import random
def topKFrequent(nums, k):
counts = Counter(nums)
pairs = list(counts.items()) # (value, frequency) pairs
# Find the k largest pairs by frequency using quickselect.
# We want the (k-1)th largest as the threshold, i.e., the (len-k)th smallest.
def select(arr, idx):
"""Return the element at sorted position idx (by frequency, ascending)."""
pivot = random.choice(arr)[1]
less = [x for x in arr if x[1] < pivot]
equal = [x for x in arr if x[1] == pivot]
greater = [x for x in arr if x[1] > pivot]
if idx < len(less):
return select(less, idx)
elif idx < len(less) + len(equal):
return pivot
else:
return select(greater, idx - len(less) - len(equal))
threshold = select(pairs, len(pairs) - k)
result = [val for val, cnt in pairs if cnt > threshold]
# Fill the remaining slots from values at exactly the threshold frequency
result += [val for val, cnt in pairs if cnt == threshold][: k - len(result)]
return resultAn alternative, and often cleaner, approach for this specific problem is bucket sort by frequency, which also runs in . But quickselect demonstrates the general principle: any "top K" or "kth element" problem where you don't need full sorted order is a candidate for quickselect.
The three-bucket version above creates new lists on each recursion, using extra space. For interviews that ask for auxiliary space, use the Lomuto partition scheme, which rearranges the array in place.
The idea: pick a pivot (say the last element), then sweep through with a write pointer. Every element smaller than the pivot gets swapped to the write pointer's position. After the sweep, swap the pivot into its final position.
import random
def findKthLargest(nums, k):
target = len(nums) - k
def partition(lo, hi):
# Random pivot, swap to end
pivot_idx = random.randint(lo, hi)
nums[pivot_idx], nums[hi] = nums[hi], nums[pivot_idx]
pivot = nums[hi]
write = lo
for i in range(lo, hi):
if nums[i] <= pivot:
nums[write], nums[i] = nums[i], nums[write]
write += 1
nums[write], nums[hi] = nums[hi], nums[write]
return write
lo, hi = 0, len(nums) - 1
while lo <= hi:
p = partition(lo, hi)
if p == target:
return nums[p]
elif p < target:
lo = p + 1
else:
hi = p - 1This version modifies the input array but uses only extra space (ignoring the recursion stack, which is replaced by a loop here). The iterative version avoids stack overflow on large inputs.
Use quickselect whenever the problem asks for a single order statistic, such as the kth smallest, kth largest, median, or top-k, and you do not need the rest of the elements sorted. The signal is "kth" or "top k" without a requirement to output them in sorted order. If sorted order matters, a heap of size k () is often simpler and more predictable; if a value distribution is bounded, bucket sort wins (LeetCode 347). Quickselect's sweet spot is the unrestricted, unsorted case where its expected beats both alternatives.
You're probably looking at quickselect when:
O(n) time without sorting.O(n log n) sorting or asks for an in-place solution on the input array.k would work but k is close to n, making the O(n log k) heap approach worse than linear selection.k elements as a set (not in order), and quickselect partitions them on one side of the pivot for free.Common templates:
<, =, >. Example: Top K Frequent Elements.k = n // 2 (or two-call median of even-length array). Example: Find Median from Data Stream.