Infrastructure
Distribute data across nodes while minimizing redistribution when nodes join or leave. The foundation of distributed caches and databases.
Consistent hashing is one of the most important algorithms in distributed systems. It solves a seemingly simple problem: how do you distribute data across a cluster of servers so that adding or removing a server does not require reshuffling everything? Without consistent hashing, modern systems like Amazon DynamoDB, Apache Cassandra, Akamai's CDN, and Discord's message routing would not work at their current scale.
The simplest way to distribute keys across N servers is modular hashing:
server_index = hash(key) % NThis works until you add or remove a server. When N changes, almost every key maps to a different server.
Example: 4 servers, 8 keys
hash(key) % 4:
key_0 → server 0 key_4 → server 0
key_1 → server 1 key_5 → server 1
key_2 → server 2 key_6 → server 2
key_3 → server 3 key_7 → server 3
Now add 1 server (N=5), hash(key) % 5:
key_0 → server 0 ✓ key_4 → server 4 ✗ (moved!)
key_1 → server 1 ✓ key_5 → server 0 ✗ (moved!)
key_2 → server 2 ✓ key_6 → server 1 ✗ (moved!)
key_3 → server 3 ✓ key_7 → server 2 ✗ (moved!)
Result: 50% of keys need to move. At scale, this causes a cache stampede.In a caching layer with millions of keys, this mass redistribution means millions of simultaneous cache misses, overwhelming the backend database. This is called a cache avalanche.
Consistent hashing maps both servers and keys onto a circular hash space (the "hash ring"), typically ranging from 0 to 2^32 - 1 or 0 to 2^128 - 1.
Hash each server's identifier (IP address, hostname) to get its position on the ring.
Hash ring (0 to 360 for simplicity):
0°/360°
|
S_A (30°)
/ \
/ \
/ S_B (120°)
| |
| Hash Ring |
| |
\ S_C (210°)
\ /
\ /
S_D (300°)
|
180°To find which server stores a key, hash the key to get its position on the ring, then walk clockwise until you hit a server.
hash("user:42") = 95° → walk clockwise → hits S_B (120°) → stored on S_B
hash("user:99") = 250° → walk clockwise → hits S_D (300°) → stored on S_D
hash("user:17") = 350° → walk clockwise → hits S_A (30°) → stored on S_A
hash("user:55") = 150° → walk clockwise → hits S_C (210°) → stored on S_CWhen a new server S_E joins at position 170°, only the keys in the arc between S_B (120°) and S_E (170°) need to move -- from S_C to S_E. All other keys stay put.
Before adding S_E:
Keys in range (120°, 210°] → S_C
After adding S_E at 170°:
Keys in range (120°, 170°] → S_E (moved from S_C)
Keys in range (170°, 210°] → S_C (unchanged)
Only keys between 120° and 170° are affected.
On average: K/N keys move (where K = total keys, N = total servers).When S_B (120°) is removed, its keys move to the next server clockwise (S_E or S_C, depending on the configuration). Again, only the keys that were on S_B are affected.
Before removing S_B:
Keys in range (30°, 120°] → S_B
After removing S_B:
Keys in range (30°, 170°] → S_E (S_B's keys absorbed by next clockwise node)Basic consistent hashing has a flaw: with a small number of servers, the distribution is uneven. One server might own 60% of the ring while another owns 10%.
Virtual nodes solve this. Each physical server is mapped to multiple positions on the ring. Instead of one hash per server, you create V hashes (e.g., V=150).
Physical server A → Virtual nodes: A_0 (30°), A_1 (130°), A_2 (270°), A_3 (340°)
Physical server B → Virtual nodes: B_0 (80°), B_1 (190°), B_2 (310°), B_3 (390°)
Physical server C → Virtual nodes: C_0 (20°), C_1 (160°), C_2 (230°), C_3 (360°)Ring with virtual nodes (V=4 per server):
C_0 A_0 B_0 A_1 C_1 B_1 C_2 A_2 B_2 A_3 C_3
20 30 80 130 160 190 230 270 310 340 360
| | | | | | | | | | |
●────●────●────●────●────●────●────●────●────●────●──→
Keys are more evenly distributed because each server
has multiple, spread-out positions on the ring.More virtual nodes means more entries in the ring metadata and slightly more memory. With 100 servers and 150 virtual nodes each, you have 15,000 ring entries -- trivial for modern systems.
import hashlib
import bisect
class ConsistentHashRing:
def __init__(self, virtual_nodes=150):
self.virtual_nodes = virtual_nodes
self.ring = {} # hash_value → server_name
self.sorted_keys = [] # sorted list of hash values
def _hash(self, key):
return int(hashlib.md5(key.encode()).hexdigest(), 16)
def add_server(self, server):
for i in range(self.virtual_nodes):
virtual_key = f"{server}:vn{i}"
hash_val = self._hash(virtual_key)
self.ring[hash_val] = server
bisect.insort(self.sorted_keys, hash_val)
def remove_server(self, server):
for i in range(self.virtual_nodes):
virtual_key = f"{server}:vn{i}"
hash_val = self._hash(virtual_key)
del self.ring[hash_val]
self.sorted_keys.remove(hash_val)
def get_server(self, key):
if not self.ring:
return None
hash_val = self._hash(key)
# Find the first server clockwise from this hash
idx = bisect.bisect_right(self.sorted_keys, hash_val)
if idx == len(self.sorted_keys):
idx = 0 # Wrap around the ring
return self.ring[self.sorted_keys[idx]]
# Usage
ring = ConsistentHashRing(virtual_nodes=150)
ring.add_server("server_A")
ring.add_server("server_B")
ring.add_server("server_C")
print(ring.get_server("user:42")) # → "server_B"
print(ring.get_server("user:99")) # → "server_A"
# Adding a server only moves ~1/N of keys
ring.add_server("server_D")
print(ring.get_server("user:42")) # Likely still "server_B"When a server is added or removed, data must be physically moved. The consistent hashing algorithm tells you which keys need to move, but the actual data transfer requires coordination.
When a key is requested and the responsible server does not have it, it fetches the data from the previous owner. Over time, all hot keys migrate naturally. Cold keys may remain on the old server until a background sweep moves them.
When the ring changes, a background process identifies all keys that need to move and transfers them in batches. This ensures consistency faster but generates network traffic.
Instead of moving individual keys, transfer entire key ranges. Since virtual nodes define contiguous arcs on the ring, the data within an arc can be transferred as a single bulk operation.
DynamoDB uses consistent hashing to distribute data across storage nodes. Virtual nodes ensure even distribution. When capacity is added, new nodes take over ranges from existing nodes with minimal disruption. The paper "Dynamo: Amazon's Highly Available Key-value Store" popularized consistent hashing in production systems.
Akamai was one of the first companies to use consistent hashing (1997, invented by Karger et al. at MIT, with Akamai as the motivating application). CDN edge servers use consistent hashing to determine which server caches which content. When a server goes down, its content is automatically served by the next server on the ring, avoiding a thundering herd to the origin.
Cassandra uses consistent hashing to distribute data across a cluster. Each node is assigned token ranges on the ring. Virtual nodes (called vnodes in Cassandra) are enabled by default. The `Murmur3Partitioner` hashes partition keys to positions on the ring.
Discord uses consistent hashing to route WebSocket connections and messages to the correct guild (server) handler. When they scale up, new nodes take over a portion of guilds with minimal disruption.
| Approach | Keys moved on add/remove | Load balance | Complexity |
|---|---|---|---|
| Modular hash (key % N) | ~100% of keys | Good | Simple |
| Consistent hashing | ~K/N keys | Poor (without vnodes) | Moderate |
| Consistent hashing + vnodes | ~K/N keys | Excellent | Moderate |
| Range-based partitioning | Depends on split | Manual tuning | Complex |
| Hash slots (Redis Cluster) | Slot-level migration | Good | Moderate |