Segment Tree / Fenwick Tree
Map large value ranges to compact indices by sorting unique values and assigning ranks. Enables Fenwick trees, segment trees, and sweep lines to operate on values up to 10^9 with only O(n) space.
Imagine you need to count, for each element in an array, how many smaller elements appear to its right. A Fenwick tree (BIT) indexed by value solves this elegantly: process from right to left, query the prefix sum up to value - 1, then insert the current value. The problem? Values can be as large as , and you cannot allocate an array of that size. Yet the input has at most elements, so at most distinct values actually matter. Coordinate compression bridges this gap by mapping the original values down to a dense range where is the number of distinct values, letting you index any array-backed data structure by rank instead of by raw value.
Sort the unique values and assign each one a rank. The relative order of all values is preserved: the smallest gets rank 0, the next smallest gets rank 1, and so on. Any query or update that depends only on relative ordering (comparisons like "less than" or "greater than") produces the same answer whether you use the original values or their ranks.
Given the array:
[100, 3, 50000, 3, 100]Step 1: Extract unique values and sort them:
sorted_unique = [3, 100, 50000]Step 2: Assign ranks:
{3: 0, 100: 1, 50000: 2}Step 3: Replace each original value with its rank:
[1, 0, 2, 0, 1]The compressed array lives in , so a Fenwick tree or segment tree of size 3 suffices, no matter how large the original values were.
There are two common approaches. The first builds an explicit rank map:
def compress(nums):
sorted_unique = sorted(set(nums))
rank = {v: i for i, v in enumerate(sorted_unique)}
return [rank[x] for x in nums], len(sorted_unique)The second uses binary search with bisect_left, which avoids building a dictionary and is useful when you want 0-indexed ranks:
from bisect import bisect_left
def compress(nums):
sorted_unique = sorted(set(nums))
return [bisect_left(sorted_unique, x) for x in nums], len(sorted_unique)Both produce identical results. The dictionary approach is per lookup after sorting. The bisect approach is per lookup but uses less memory. In practice, either works fine for .
When using a Fenwick tree (which is 1-indexed), shift ranks up by 1 so the smallest value maps to index 1 instead of 0:
rank = {v: i + 1 for i, v in enumerate(sorted_unique)}LeetCode 315. Count of Smaller Numbers After Self
Given an array nums, return a list where result[i] is the count of elements to the right of nums[i] that are strictly smaller. The brute-force approach checks every pair; coordinate compression with a Fenwick tree brings this down to .
Strategy: Process the array from right to left. For each element, query the BIT for the prefix sum up to rank - 1 (counting how many smaller elements have already been inserted), then update the BIT at position rank.
from bisect import bisect_left
def countSmaller(nums):
sorted_unique = sorted(set(nums))
rank = {v: bisect_left(sorted_unique, v) + 1 for v in sorted_unique}
k = len(sorted_unique)
tree = [0] * (k + 1)
def update(i):
while i <= k:
tree[i] += 1
i += i & (-i)
def query(i):
s = 0
while i > 0:
s += tree[i]
i -= i & (-i)
return s
result = []
for x in reversed(nums):
r = rank[x]
result.append(query(r - 1))
update(r)
return result[::-1]Walkthrough with nums = [5, 2, 6, 1]:
Sorted unique values: [1, 2, 5, 6] with ranks {1:1, 2:2, 5:3, 6:4}.
Processing right to left:
x=1, rank=1: query(0) = 0, update(1). Result so far: [0]x=6, rank=4: query(3) = 1, update(4). Result so far: [0, 1]x=2, rank=2: query(1) = 1, update(2). Result so far: [0, 1, 1]x=5, rank=3: query(2) = 2, update(3). Result so far: [0, 1, 1, 2]Reverse to get: [2, 1, 1, 0]. Correct.
A reverse pair is a pair (i, j) where i < j and nums[i] > 2 * nums[j]. The tricky part is that the comparison involves 2 * nums[j], not nums[j] directly. During coordinate compression, include both the original values and their doubles in the set of values to compress:
def reversePairs(nums):
all_vals = set()
for x in nums:
all_vals.add(x)
all_vals.add(2 * x)
sorted_unique = sorted(all_vals)
rank = {v: i + 1 for i, v in enumerate(sorted_unique)}
k = len(sorted_unique)
tree = [0] * (k + 1)
def update(i):
while i <= k:
tree[i] += 1
i += i & (-i)
def query(i):
s = 0
while i > 0:
s += tree[i]
i -= i & (-i)
return s
count = 0
for x in reversed(nums):
# Count elements already inserted with rank < rank[x]
# (those are to the right and smaller than x)
# We need elements y where x > 2*y, i.e., 2*y < x
count += query(rank[x] - 1)
# Insert 2*x so future elements query against doubled values
update(rank[2 * x])
return countThe key insight is that we insert 2 * nums[j] into the BIT (not nums[j]), so when a future element nums[i] queries for values less than nums[i], it is effectively checking 2 * nums[j] < nums[i].
Sweep line with large coordinates. Interval scheduling or rectangle union-area problems sometimes involve coordinates up to . Compress all x-coordinates (or y-coordinates) to a dense range, then use a segment tree over the compressed axis. The classic LeetCode 850. Rectangle Area II uses this approach.
Segment tree with huge value ranges. Any problem where you'd build a segment tree indexed by value (not by array position) benefits from compression. Range frequency queries, counting distinct values in ranges, and offline query problems all commonly compress their value domain first.
Discretizing continuous or sparse data. Problems involving intervals on a number line with endpoints up to often compress endpoints into a range of size , enabling efficient processing with arrays or trees.
Sorting the unique values costs . Building the rank map is (or with bisect). Each subsequent BIT or segment tree operation is where is the number of distinct values, so the overall algorithm remains . Space is for the rank map and the data structure.
You're probably looking at coordinate compression when:
nums[i] up to , timestamps up to , coordinates on a near-infinite line) but or so.Common templates:
x and a transformed version (2*x, x + k) in the rank set so cross-comparisons stay consistent. Example: Reverse Pairs.Practice Problems (2)