Sorting Techniques
Distribute elements into buckets by value range, or count occurrences directly. Achieves O(n) sorting when values have bounded range. Used for maximum gap, H-index, and frequency-based ordering.
Comparison-based sorting algorithms, including merge sort, quicksort, and heapsort, all share a fundamental lower bound: . No matter how cleverly you compare elements, you cannot sort faster in the general case. But when the values in your input have a bounded range, you can bypass comparisons entirely and sort in time using counting sort or bucket sort.
The motivating problem is LeetCode 164. Maximum Gap: given an unsorted array of non-negative integers, find the maximum difference between successive elements in its sorted form. The constraint is that you must solve it in time and space. Since comparison sorts are off the table, this problem forces you to think about non-comparison-based sorting.
Every comparison-based sort builds a decision tree. Each comparison splits remaining possibilities into two branches. With possible orderings of elements, the tree needs at least levels, which by Stirling's approximation is . No comparison sort can beat this.
However, if you know extra information about your data, for instance, that all values fall within a range , you can avoid comparisons altogether by using the values as indices into an array. This is the core idea behind counting sort and bucket sort.
Counting sort works when values come from a known, finite range . The idea is simple: count how many times each value appears, then reconstruct the sorted array from those counts.
count[element].Input: [4, 2, 2, 8, 3, 3, 1], range
Step 1: Build count array (size 9, indices 0..8)
index: 0 1 2 3 4 5 6 7 8
count: 0 1 2 2 1 0 0 0 1
Step 2: Reconstruct sorted array by reading counts left to right
count[0] = 0 -> (nothing)
count[1] = 1 -> [1]
count[2] = 2 -> [1, 2, 2]
count[3] = 2 -> [1, 2, 2, 3, 3]
count[4] = 1 -> [1, 2, 2, 3, 3, 4]
count[8] = 1 -> [1, 2, 2, 3, 3, 4, 8]
Output: [1, 2, 2, 3, 3, 4, 8]The time complexity is where is the number of elements and is the range. Space is also . When , the whole algorithm runs in linear time.
def counting_sort(nums, max_val):
count = [0] * (max_val + 1)
for num in nums:
count[num] += 1
result = []
for val in range(max_val + 1):
result.extend([val] * count[val])
return resultBucket sort generalizes counting sort. Instead of one slot per value, you distribute elements into a fixed number of buckets that each cover a subrange of values. You then sort each bucket individually (often with insertion sort or another simple method) and concatenate the results.
The key advantage of bucket sort is that you do not need one slot per possible value. You can choose any number of buckets , and as long as the elements are roughly uniformly distributed, each bucket will be small. The expected time is , which becomes when .
LeetCode 164. Maximum Gap is the textbook application of bucket sort. The key insight uses the pigeonhole principle: the maximum gap in the sorted array must be at least as large as the average gap.
Given elements with minimum value and maximum value , there are gaps between successive sorted elements. The total range is , so the average gap is:
$$
The maximum gap must be . If we create buckets of size equal to the average gap, then the maximum gap cannot occur between two elements in the same bucket. It must occur between the maximum of one bucket and the minimum of the next non-empty bucket.
This means we only need to track the min and max of each bucket: no need to sort within buckets at all.
Input: [3, 6, 9, 1]
Step 1: Find range
lo = 1, hi = 9, n = 4
Step 2: Calculate bucket size
bucket_size = max(1, (9 - 1) / (4 - 1)) = max(1, 2) = 2
(We use ceiling or ensure at least 1 to avoid zero-width buckets)
Step 3: Calculate number of buckets
bucket_count = (9 - 1) / 2 + 1 = 5
Bucket ranges: [1,2], [3,4], [5,6], [7,8], [9,10]
Step 4: Place elements into buckets (index = (val - lo) / bucket_size)
1 -> bucket 0 (index = (1-1)/2 = 0)
3 -> bucket 1 (index = (3-1)/2 = 1)
6 -> bucket 2 (index = (6-1)/2 = 2)
9 -> bucket 4 (index = (9-1)/2 = 4)
Bucket 0: min=1, max=1
Bucket 1: min=3, max=3
Bucket 2: min=6, max=6
Bucket 3: (empty)
Bucket 4: min=9, max=9
Step 5: Find maximum gap between consecutive non-empty buckets
Gap between bucket 0 max and bucket 1 min: 3 - 1 = 2
Gap between bucket 1 max and bucket 2 min: 6 - 3 = 3
Gap between bucket 2 max and bucket 4 min: 9 - 6 = 3
Maximum gap = 3def maximum_gap(nums):
if len(nums) < 2:
return 0
lo, hi = min(nums), max(nums)
if lo == hi:
return 0
n = len(nums)
bucket_size = max(1, (hi - lo) // (n - 1))
bucket_count = (hi - lo) // bucket_size + 1
# Each bucket stores [min, max]
buckets = [[float('inf'), float('-inf')] for _ in range(bucket_count)]
for num in nums:
idx = (num - lo) // bucket_size
buckets[idx][0] = min(buckets[idx][0], num)
buckets[idx][1] = max(buckets[idx][1], num)
# Walk through non-empty buckets and find max gap
max_gap = 0
prev_max = lo
for bucket_min, bucket_max in buckets:
if bucket_min == float('inf'):
continue # empty bucket
max_gap = max(max_gap, bucket_min - prev_max)
prev_max = bucket_max
return max_gapThis runs in time and space. We make two passes over the array (to find min/max and to distribute), then one pass over the buckets. The number of buckets is at most , so everything stays linear.
LeetCode 274. H-Index: given an array of citation counts, find the largest such that papers have at least citations. A natural approach sorts the array in , but counting sort does it in .
The insight is that any citation count is equivalent for the purpose of computing h-index, so you can cap all values at . This gives a bounded range , perfect for counting sort. After building the count array, walk from the right (highest citations) and accumulate counts. The first index where the running total index is your h-index.
def h_index(citations):
n = len(citations)
count = [0] * (n + 1)
for c in citations:
count[min(c, n)] += 1
total = 0
for h in range(n, -1, -1):
total += count[h]
if total >= h:
return h
return 0LeetCode 451. Sort Characters By Frequency: return the string sorted by character frequency in descending order. Count character frequencies with a hash map, then use bucket sort: create buckets indexed by frequency (from 1 to ) and place characters into their corresponding frequency bucket. Read buckets from highest to lowest.
def frequency_sort(s):
freq = {}
for ch in s:
freq[ch] = freq.get(ch, 0) + 1
# Bucket sort by frequency
buckets = [[] for _ in range(len(s) + 1)]
for ch, f in freq.items():
buckets[f].append(ch)
result = []
for f in range(len(s), 0, -1):
for ch in buckets[f]:
result.append(ch * f)
return ''.join(result)LeetCode 347. Top K Frequent Elements: return the most frequent elements. A heap-based approach runs in , but bucket sort by frequency achieves .
Count frequencies, then create buckets where bucket holds elements with frequency . Walk from the highest bucket downward, collecting elements until you have of them.
def top_k_frequent(nums, k):
freq = {}
for num in nums:
freq[num] = freq.get(num, 0) + 1
buckets = [[] for _ in range(len(nums) + 1)]
for num, f in freq.items():
buckets[f].append(num)
result = []
for f in range(len(nums), 0, -1):
for num in buckets[f]:
result.append(num)
if len(result) == k:
return result
return result| Algorithm | Time | Space | When to Use |
|---|---|---|---|
| Counting sort | Values in bounded range , | ||
| Bucket sort | expected | Uniform distribution, buckets | |
| Maximum Gap | Only need gap info, not full sort | ||
| Frequency bucket sort | Sorting or selecting by frequency |
The common thread is exploiting structure in your data, whether bounded ranges, known distributions, or frequency-based ordering, to avoid the comparison sort barrier. Whenever a problem demands linear-time sorting or ranking, consider whether counting sort or bucket sort applies.
You're probably looking at bucket or counting sort when:
Common templates:
Practice Problems (10)