← back

Segment Tree / Fenwick Tree

Coordinate Compression

Map large value ranges to compact indices by sorting unique values and assigning ranks. Enables Fenwick trees, segment trees, and sweep lines to operate on values up to 10^9 with only O(n) space.

Coordinate Compression

Imagine you need to count, for each element in an array, how many smaller elements appear to its right. A Fenwick tree (BIT) indexed by value solves this elegantly: process from right to left, query the prefix sum up to value - 1, then insert the current value. The problem? Values can be as large as 10910^9, and you cannot allocate an array of that size. Yet the input has at most n105n \leq 10^5 elements, so at most 10510^5 distinct values actually matter. Coordinate compression bridges this gap by mapping the original values down to a dense range [0,k1][0, k-1] where kk is the number of distinct values, letting you index any array-backed data structure by rank instead of by raw value.

The Core Idea

Sort the unique values and assign each one a rank. The relative order of all values is preserved: the smallest gets rank 0, the next smallest gets rank 1, and so on. Any query or update that depends only on relative ordering (comparisons like "less than" or "greater than") produces the same answer whether you use the original values or their ranks.

Walked Example

Given the array:

1
[100, 3, 50000, 3, 100]

Step 1: Extract unique values and sort them:

1
sorted_unique = [3, 100, 50000]

Step 2: Assign ranks:

1
{3: 0, 100: 1, 50000: 2}

Step 3: Replace each original value with its rank:

1
[1, 0, 2, 0, 1]

The compressed array lives in [0,2][0, 2], so a Fenwick tree or segment tree of size 3 suffices, no matter how large the original values were.

Code Pattern

There are two common approaches. The first builds an explicit rank map:

1
2
3
4
def compress(nums):
    sorted_unique = sorted(set(nums))
    rank = {v: i for i, v in enumerate(sorted_unique)}
    return [rank[x] for x in nums], len(sorted_unique)

The second uses binary search with bisect_left, which avoids building a dictionary and is useful when you want 0-indexed ranks:

1
2
3
4
5
from bisect import bisect_left

def compress(nums):
    sorted_unique = sorted(set(nums))
    return [bisect_left(sorted_unique, x) for x in nums], len(sorted_unique)

Both produce identical results. The dictionary approach is O(1)O(1) per lookup after O(klogk)O(k \log k) sorting. The bisect approach is O(logk)O(\log k) per lookup but uses less memory. In practice, either works fine for n105n \leq 10^5.

When using a Fenwick tree (which is 1-indexed), shift ranks up by 1 so the smallest value maps to index 1 instead of 0:

1
rank = {v: i + 1 for i, v in enumerate(sorted_unique)}

Application: Count of Smaller Numbers After Self

LeetCode 315. Count of Smaller Numbers After Self

Given an array nums, return a list where result[i] is the count of elements to the right of nums[i] that are strictly smaller. The brute-force O(n2)O(n^2) approach checks every pair; coordinate compression with a Fenwick tree brings this down to O(nlogn)O(n \log n).

Strategy: Process the array from right to left. For each element, query the BIT for the prefix sum up to rank - 1 (counting how many smaller elements have already been inserted), then update the BIT at position rank.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
from bisect import bisect_left

def countSmaller(nums):
    sorted_unique = sorted(set(nums))
    rank = {v: bisect_left(sorted_unique, v) + 1 for v in sorted_unique}
    k = len(sorted_unique)

    tree = [0] * (k + 1)

    def update(i):
        while i <= k:
            tree[i] += 1
            i += i & (-i)

    def query(i):
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & (-i)
        return s

    result = []
    for x in reversed(nums):
        r = rank[x]
        result.append(query(r - 1))
        update(r)

    return result[::-1]

Walkthrough with nums = [5, 2, 6, 1]:

Sorted unique values: [1, 2, 5, 6] with ranks {1:1, 2:2, 5:3, 6:4}.

Processing right to left:

  • x=1, rank=1: query(0) = 0, update(1). Result so far: [0]
  • x=6, rank=4: query(3) = 1, update(4). Result so far: [0, 1]
  • x=2, rank=2: query(1) = 1, update(2). Result so far: [0, 1, 1]
  • x=5, rank=3: query(2) = 2, update(3). Result so far: [0, 1, 1, 2]

Reverse to get: [2, 1, 1, 0]. Correct.

Application: Reverse Pairs

LeetCode 493. Reverse Pairs

A reverse pair is a pair (i, j) where i < j and nums[i] > 2 * nums[j]. The tricky part is that the comparison involves 2 * nums[j], not nums[j] directly. During coordinate compression, include both the original values and their doubles in the set of values to compress:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
def reversePairs(nums):
    all_vals = set()
    for x in nums:
        all_vals.add(x)
        all_vals.add(2 * x)
    sorted_unique = sorted(all_vals)
    rank = {v: i + 1 for i, v in enumerate(sorted_unique)}
    k = len(sorted_unique)

    tree = [0] * (k + 1)

    def update(i):
        while i <= k:
            tree[i] += 1
            i += i & (-i)

    def query(i):
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & (-i)
        return s

    count = 0
    for x in reversed(nums):
        # Count elements already inserted with rank < rank[x]
        # (those are to the right and smaller than x)
        # We need elements y where x > 2*y, i.e., 2*y < x
        count += query(rank[x] - 1)
        # Insert 2*x so future elements query against doubled values
        update(rank[2 * x])

    return count

The key insight is that we insert 2 * nums[j] into the BIT (not nums[j]), so when a future element nums[i] queries for values less than nums[i], it is effectively checking 2 * nums[j] < nums[i].

Other Applications

Sweep line with large coordinates. Interval scheduling or rectangle union-area problems sometimes involve coordinates up to 10910^9. Compress all x-coordinates (or y-coordinates) to a dense range, then use a segment tree over the compressed axis. The classic LeetCode 850. Rectangle Area II uses this approach.

Segment tree with huge value ranges. Any problem where you'd build a segment tree indexed by value (not by array position) benefits from compression. Range frequency queries, counting distinct values in ranges, and offline query problems all commonly compress their value domain first.

Discretizing continuous or sparse data. Problems involving intervals on a number line with endpoints up to 10910^9 often compress endpoints into a range of size O(n)O(n), enabling efficient processing with arrays or trees.

Complexity

Sorting the unique values costs O(nlogn)O(n \log n). Building the rank map is O(n)O(n) (or O(nlogn)O(n \log n) with bisect). Each subsequent BIT or segment tree operation is O(logk)O(\log k) where knk \leq n is the number of distinct values, so the overall algorithm remains O(nlogn)O(n \log n). Space is O(n)O(n) for the rank map and the data structure.

Recognizing this pattern

You're probably looking at coordinate compression when:

  • The value domain is huge (nums[i] up to 10910^9, timestamps up to 101810^{18}, coordinates on a near-infinite line) but n105n \leq 10^5 or so.
  • You want an array-indexed structure (Fenwick tree, segment tree, counting array, bucket array) keyed by value, not by position.
  • Only relative order of values matters (less-than comparisons, ranks), not the absolute magnitudes.
  • The problem involves counting inversions, smaller-on-the-right, range frequencies, or sweep-line over real-valued intervals.
  • You catch yourself wishing for "an array of size max(nums)" that is obviously too big.

Common templates:

  • Rank-mapped Fenwick tree for inversions: compress values, then BIT-count smaller/greater on the right. Example: Count of Smaller Numbers After Self.
  • Compress paired values: include both x and a transformed version (2*x, x + k) in the rank set so cross-comparisons stay consistent. Example: Reverse Pairs.
  • Sweep line over compressed axis: sort all event endpoints, dedupe to ranks, run a segment tree over the dense axis. Example: Rectangle Area II.
  • Offline range queries on a value domain: sort queries, compress query/array values together, answer with a BIT. Example: Range Sum of Sorted Subarray Sums (when paired with prefix sums) or Count of Range Sum.
  • Discretizing interval endpoints: map sparse 1D coordinates to [0,k)[0, k) for difference-array or segment-tree sweeps. Example: My Calendar III.