Classic Designs
Design a real-time messaging system like WhatsApp or Slack. Covers WebSocket connections, message ordering, presence, and offline message delivery.
Designing a real-time chat system like WhatsApp or Slack is a staple of system design interviews. It tests your understanding of WebSockets, message ordering, delivery guarantees, presence detection, and the fan-out problem for group chats. The challenge is not just sending messages -- it is ensuring they arrive in order, exactly once, and even when the recipient is offline.
HTTP is request-response: the client must ask the server for new messages. For real-time chat, you need a persistent, bidirectional connection. WebSockets provide exactly this.
┌────────┐ WebSocket ┌─────────────────┐
│ Client │ ◄══════════════════════► │ WebSocket Gateway │
└────────┘ persistent, full-duplex └────────┬────────┘
│
┌──────┴──────┐
│ Chat Service │
└─────────────┘Each user maintains a WebSocket connection to a gateway server. The gateway maintains an in-memory mapping of `user_id → connection`. When a message arrives for a user, the system looks up which gateway server holds their connection and routes the message there.
Connection Registry (Redis or in-memory):
user_42 → gateway-server-3
user_99 → gateway-server-1
user_17 → gateway-server-3At scale, you need many gateway servers behind a load balancer. Since WebSocket connections are stateful, you cannot use simple round-robin. Options:
1. Alice sends a message to Bob via her WebSocket connection.
2. Gateway receives the message and forwards it to the Chat Service.
3. Chat Service:
a. Generates a message ID and timestamp.
b. Stores the message in the database.
c. Looks up Bob's gateway in the connection registry.
d. If Bob is online: pushes the message to Bob's gateway → Bob's client.
e. If Bob is offline: stores in an offline queue.
4. Bob's client receives the message and sends a delivery acknowledgment.
5. Chat Service updates the message status to "delivered" and notifies Alice.Chat messages are write-heavy (billions per day) and access patterns are sequential (users scroll through recent messages). This favors a wide-column store like Cassandra or HBase.
Table: messages
Partition Key: conversation_id
Clustering Key: message_id (time-ordered)
┌──────────────────┬────────────┬──────────┬─────────────────────┬───────────┐
│ conversation_id │ message_id │ sender │ timestamp │ content │
├──────────────────┼────────────┼──────────┼─────────────────────┼───────────┤
│ conv_alice_bob │ msg_001 │ alice │ 2024-01-15T10:30:00 │ "Hey!" │
│ conv_alice_bob │ msg_002 │ bob │ 2024-01-15T10:30:05 │ "Hi!" │
│ conv_alice_bob │ msg_003 │ alice │ 2024-01-15T10:30:10 │ "How are..│
└──────────────────┴────────────┴──────────┴─────────────────────┴───────────┘Using `conversation_id` as the partition key ensures all messages in a conversation are stored on the same node, making range queries (fetching recent messages) efficient.
Ordering is deceptively hard in distributed systems. Two messages sent at "the same time" from different clients may arrive at the server in different orders. Wall-clock timestamps are unreliable because clocks drift between machines.
A Lamport timestamp is a logical clock that provides a partial ordering of events. Each process maintains a counter. When sending a message, increment the counter and include it. When receiving a message, set your counter to `max(local_counter, received_counter) + 1`.
class LamportClock:
def __init__(self):
self.counter = 0
def send(self):
self.counter += 1
return self.counter
def receive(self, received_timestamp):
self.counter = max(self.counter, received_timestamp) + 1
return self.counterA simpler approach for chat: the server assigns a monotonically increasing sequence number to each message within a conversation. Since all messages in a conversation flow through the chat service, the service can maintain a per-conversation counter. This is the approach WhatsApp uses.
Message from Alice → Server assigns seq=1 for conv_alice_bob
Message from Bob → Server assigns seq=2 for conv_alice_bob
Message from Alice → Server assigns seq=3 for conv_alice_bobThe client displays messages sorted by sequence number, not by local timestamps.
The client sends a periodic heartbeat (e.g., every 30 seconds) to the server. If the server does not receive a heartbeat within the timeout window, it marks the user as offline.
Presence Store (Redis with TTL):
user_42: { status: "online", last_seen: 1705312200, ttl: 60s }
user_99: { status: "online", last_seen: 1705312195, ttl: 60s }When Alice comes online, who needs to know? Only users who have Alice in their contact list AND are currently online. Broadcasting presence to all of Alice's contacts would be wasteful for users with thousands of contacts.
Optimization: Only send presence updates to users who are actively viewing a chat with Alice. The client subscribes to presence for visible conversations, not all contacts.
Group messages require fan-out: one incoming message must be delivered to N group members. There are two approaches:
When a message is sent to a group, the server creates a copy of the message in each member's inbox. This is WhatsApp's approach for groups up to 256 members.
Alice sends "Hello" to Group G (members: Alice, Bob, Carol, Dave)
→ Write to Bob's inbox: { from: "Alice", group: "G", content: "Hello" }
→ Write to Carol's inbox: { from: "Alice", group: "G", content: "Hello" }
→ Write to Dave's inbox: { from: "Alice", group: "G", content: "Hello" }Pros: Read path is simple. Each user reads from their own inbox. Cons: Write amplification. A message to a 256-member group creates 255 writes.
Store the message once in the group's message log. When a member opens the group, they read from the shared log. This is Slack's approach for channels with thousands of members.
Alice sends "Hello" to Channel #general (5000 members)
→ Write once to channel_general message log
When Bob opens #general:
→ Read from channel_general log where seq > Bob's last_read_seqPros: Single write regardless of group size. Cons: Read path is heavier. Notifications and unread counts require additional logic.
Use write-time fan-out for small groups (< 100 members) and read-time fan-out for large channels. This is what most production systems do.
When a user is offline, messages must be queued and delivered when they reconnect.
1. Chat Service detects Bob is offline (not in connection registry).
2. Message is stored in the database (it always is) and Bob's unread counter is incremented.
3. A push notification is sent via APNs/FCM.
4. When Bob reconnects:
a. Client sends its last known message sequence number.
b. Server sends all messages with sequence numbers greater than that.
c. Bob's client acknowledges receipt.This "sync" protocol ensures no messages are lost, even if the push notification fails. The client is the ultimate source of truth for what it has received.
WhatsApp uses the Signal Protocol for end-to-end encryption. The server never sees plaintext messages. Key points for interviews:
This means the server cannot implement server-side search on message content. Clients must handle search locally.
┌──────────────────┐
Clients ──────────> │ Load Balancer │
└────────┬─────────┘
│
┌──────────────────┼──────────────────┐
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ WS Gateway │ │ WS Gateway │ │ WS Gateway │
│ Server 1 │ │ Server 2 │ │ Server 3 │
└─────┬──────┘ └─────┬──────┘ └─────┬──────┘
│ │ │
└─────────────────┼─────────────────┘
│
┌──────┴───────┐
│ Chat Service │
└──────┬───────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ Message DB │ │ Redis │ │Push Notif. │
│ (Cassandra) │ │(Presence + │ │ Service │
│ │ │ Registry) │ │(APNs/FCM) │
└─────────────┘ └────────────┘ └────────────┘