← back

Infrastructure

Consistent Hashing

Distribute data across nodes while minimizing redistribution when nodes join or leave. The foundation of distributed caches and databases.

Consistent Hashing

Consistent hashing is one of the most important algorithms in distributed systems. It solves a seemingly simple problem: how do you distribute data across a cluster of servers so that adding or removing a server does not require reshuffling everything? Without consistent hashing, modern systems like Amazon DynamoDB, Apache Cassandra, Akamai's CDN, and Discord's message routing would not work at their current scale.

The Problem with Naive Hashing

The simplest way to distribute keys across N servers is modular hashing:

1
server_index = hash(key) % N

This works until you add or remove a server. When N changes, almost every key maps to a different server.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Example: 4 servers, 8 keys

hash(key) % 4:
  key_0 → server 0    key_4 → server 0
  key_1 → server 1    key_5 → server 1
  key_2 → server 2    key_6 → server 2
  key_3 → server 3    key_7 → server 3

Now add 1 server (N=5), hash(key) % 5:
  key_0 → server 0 ✓  key_4 → server 4 ✗ (moved!)
  key_1 → server 1 ✓  key_5 → server 0 ✗ (moved!)
  key_2 → server 2 ✓  key_6 → server 1 ✗ (moved!)
  key_3 → server 3 ✓  key_7 → server 2 ✗ (moved!)

Result: 50% of keys need to move. At scale, this causes a cache stampede.

In a caching layer with millions of keys, this mass redistribution means millions of simultaneous cache misses, overwhelming the backend database. This is called a cache avalanche.

The Consistent Hashing Algorithm

Consistent hashing maps both servers and keys onto a circular hash space (the "hash ring"), typically ranging from 0 to 2^32 - 1 or 0 to 2^128 - 1.

Step 1: Place Servers on the Ring

Hash each server's identifier (IP address, hostname) to get its position on the ring.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
Hash ring (0 to 360 for simplicity):

                    0°/360°
                      |
                 S_A (30°)
                /          \
              /              \
           /                   S_B (120°)
         |                      |
         |       Hash Ring      |
         |                      |
           \                   S_C (210°)
              \              /
                \          /
                 S_D (300°)
                      |
                    180°

Step 2: Map Keys to Servers

To find which server stores a key, hash the key to get its position on the ring, then walk clockwise until you hit a server.

1
2
3
4
hash("user:42") = 95°  → walk clockwise → hits S_B (120°) → stored on S_B
hash("user:99") = 250° → walk clockwise → hits S_D (300°) → stored on S_D
hash("user:17") = 350° → walk clockwise → hits S_A (30°)  → stored on S_A
hash("user:55") = 150° → walk clockwise → hits S_C (210°) → stored on S_C

Step 3: Adding a Server

When a new server S_E joins at position 170°, only the keys in the arc between S_B (120°) and S_E (170°) need to move -- from S_C to S_E. All other keys stay put.

1
2
3
4
5
6
7
8
9
Before adding S_E:
  Keys in range (120°, 210°] → S_C

After adding S_E at 170°:
  Keys in range (120°, 170°] → S_E (moved from S_C)
  Keys in range (170°, 210°] → S_C (unchanged)

Only keys between 120° and 170° are affected.
On average: K/N keys move (where K = total keys, N = total servers).

Step 4: Removing a Server

When S_B (120°) is removed, its keys move to the next server clockwise (S_E or S_C, depending on the configuration). Again, only the keys that were on S_B are affected.

1
2
3
4
5
Before removing S_B:
  Keys in range (30°, 120°] → S_B

After removing S_B:
  Keys in range (30°, 170°] → S_E (S_B's keys absorbed by next clockwise node)

Virtual Nodes

Basic consistent hashing has a flaw: with a small number of servers, the distribution is uneven. One server might own 60% of the ring while another owns 10%.

Virtual nodes solve this. Each physical server is mapped to multiple positions on the ring. Instead of one hash per server, you create V hashes (e.g., V=150).

1
2
3
Physical server A → Virtual nodes: A_0 (30°), A_1 (130°), A_2 (270°), A_3 (340°)
Physical server B → Virtual nodes: B_0 (80°), B_1 (190°), B_2 (310°), B_3 (390°)
Physical server C → Virtual nodes: C_0 (20°), C_1 (160°), C_2 (230°), C_3 (360°)
1
2
3
4
5
6
7
8
9
Ring with virtual nodes (V=4 per server):

  C_0  A_0  B_0  A_1  C_1  B_1  C_2  A_2  B_2  A_3  C_3
   20   30   80  130  160  190  230  270  310  340  360
   |    |    |    |    |    |    |    |    |    |    |
   ●────●────●────●────●────●────●────●────●────●────●──→

Keys are more evenly distributed because each server
has multiple, spread-out positions on the ring.

Benefits of Virtual Nodes

  1. Better load distribution. With 150+ virtual nodes per server, the standard deviation of load drops to under 5%.
  2. Heterogeneous hardware. A more powerful server can have more virtual nodes, receiving a proportionally larger share of data.
  3. Smoother rebalancing. When a server is added, it takes small slices of data from many servers rather than a large chunk from one.

Cost of Virtual Nodes

More virtual nodes means more entries in the ring metadata and slightly more memory. With 100 servers and 150 virtual nodes each, you have 15,000 ring entries -- trivial for modern systems.

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
import hashlib
import bisect

class ConsistentHashRing:
    def __init__(self, virtual_nodes=150):
        self.virtual_nodes = virtual_nodes
        self.ring = {}          # hash_value → server_name
        self.sorted_keys = []   # sorted list of hash values

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_server(self, server):
        for i in range(self.virtual_nodes):
            virtual_key = f"{server}:vn{i}"
            hash_val = self._hash(virtual_key)
            self.ring[hash_val] = server
            bisect.insort(self.sorted_keys, hash_val)

    def remove_server(self, server):
        for i in range(self.virtual_nodes):
            virtual_key = f"{server}:vn{i}"
            hash_val = self._hash(virtual_key)
            del self.ring[hash_val]
            self.sorted_keys.remove(hash_val)

    def get_server(self, key):
        if not self.ring:
            return None
        hash_val = self._hash(key)
        # Find the first server clockwise from this hash
        idx = bisect.bisect_right(self.sorted_keys, hash_val)
        if idx == len(self.sorted_keys):
            idx = 0  # Wrap around the ring
        return self.ring[self.sorted_keys[idx]]

# Usage
ring = ConsistentHashRing(virtual_nodes=150)
ring.add_server("server_A")
ring.add_server("server_B")
ring.add_server("server_C")

print(ring.get_server("user:42"))   # → "server_B"
print(ring.get_server("user:99"))   # → "server_A"

# Adding a server only moves ~1/N of keys
ring.add_server("server_D")
print(ring.get_server("user:42"))   # Likely still "server_B"

Rebalancing

When a server is added or removed, data must be physically moved. The consistent hashing algorithm tells you which keys need to move, but the actual data transfer requires coordination.

Approach 1: Lazy Rebalancing

When a key is requested and the responsible server does not have it, it fetches the data from the previous owner. Over time, all hot keys migrate naturally. Cold keys may remain on the old server until a background sweep moves them.

Approach 2: Proactive Rebalancing

When the ring changes, a background process identifies all keys that need to move and transfers them in batches. This ensures consistency faster but generates network traffic.

Approach 3: Range Transfer

Instead of moving individual keys, transfer entire key ranges. Since virtual nodes define contiguous arcs on the ring, the data within an arc can be transferred as a single bulk operation.

Real-World Usage

Amazon DynamoDB

DynamoDB uses consistent hashing to distribute data across storage nodes. Virtual nodes ensure even distribution. When capacity is added, new nodes take over ranges from existing nodes with minimal disruption. The paper "Dynamo: Amazon's Highly Available Key-value Store" popularized consistent hashing in production systems.

Akamai CDN

Akamai was one of the first companies to use consistent hashing (1997, invented by Karger et al. at MIT, with Akamai as the motivating application). CDN edge servers use consistent hashing to determine which server caches which content. When a server goes down, its content is automatically served by the next server on the ring, avoiding a thundering herd to the origin.

Apache Cassandra

Cassandra uses consistent hashing to distribute data across a cluster. Each node is assigned token ranges on the ring. Virtual nodes (called vnodes in Cassandra) are enabled by default. The `Murmur3Partitioner` hashes partition keys to positions on the ring.

Discord

Discord uses consistent hashing to route WebSocket connections and messages to the correct guild (server) handler. When they scale up, new nodes take over a portion of guilds with minimal disruption.

Comparison with Other Approaches

ApproachKeys moved on add/removeLoad balanceComplexity
Modular hash (key % N)~100% of keysGoodSimple
Consistent hashing~K/N keysPoor (without vnodes)Moderate
Consistent hashing + vnodes~K/N keysExcellentModerate
Range-based partitioningDepends on splitManual tuningComplex
Hash slots (Redis Cluster)Slot-level migrationGoodModerate

Interview Tips

  • Draw the ring. Always start by drawing the hash ring with servers and keys. It makes the explanation 10x clearer.
  • Explain the "why" first. Start with the modular hashing problem (cache avalanche), then present consistent hashing as the solution.
  • Always mention virtual nodes. Basic consistent hashing is incomplete without them. Interviewers expect you to address the uneven distribution problem.
  • Know the math. When adding a server to an N-server cluster, approximately K/N keys are redistributed (where K is total keys). This is optimal -- you cannot do better.
  • Connect to real systems. Mentioning DynamoDB, Cassandra, or CDNs shows you understand where this algorithm is applied in practice.
  • Discuss the hash function choice. MD5 and SHA-1 produce good distribution but are cryptographic (slower). Murmur3 and xxHash are faster non-cryptographic alternatives used in production systems.
  • Mention bounded load. Google's "Consistent Hashing with Bounded Loads" paper (2017) addresses the problem where even with virtual nodes, some servers can be temporarily overloaded. It introduces a load cap per server, redirecting excess load to the next server on the ring.