WebSockets and Long Polling

Real-time communication patterns for chat, notifications, and live updates. Compare WebSockets, SSE, and long polling trade-offs.

Most of the web runs on request-response: the client asks, the server answers. But some features need the server to push data to the client — chat messages, live notifications, stock tickers, collaborative editing, real-time dashboards. For these use cases, you need a different communication model. Understanding the spectrum of real-time options and their trade-offs is essential for system design interviews.

The Four Approaches

There are four main techniques for server-to-client communication, each with different characteristics:

Approach          Connection     Direction         Overhead     Complexity
──────────────────────────────────────────────────────────────────────────
Short Polling     Repeated       Client → Server   Very High    Low
Long Polling      Held open      Client → Server   Medium       Medium
SSE               Persistent     Server → Client   Low          Low
WebSockets        Persistent     Bidirectional     Very Low     High

Short Polling

The simplest approach: the client repeatedly asks the server for updates at fixed intervals.

Client                          Server
  │                               │
  │──── GET /messages ──────────►│
  │◄─── 200 [] (no new msgs) ───│
  │                               │
  │  (wait 5 seconds)             │
  │                               │
  │──── GET /messages ──────────►│
  │◄─── 200 [] (no new msgs) ───│
  │                               │
  │  (wait 5 seconds)             │
  │                               │
  │──── GET /messages ──────────►│
  │◄─── 200 [{msg: "hello"}] ───│
  │                               │

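# Client-side short polling
import time
import requests

BASE_URL = "https://example.com"  # Placeholder; requests needs an absolute URL

def poll_for_messages(last_seen_id=0, interval=5):
    while True:
        response = requests.get(
            f"{BASE_URL}/api/messages",
            params={"since": last_seen_id},
            timeout=10,
        )
        response.raise_for_status()
        for msg in response.json():
            print(msg)  # Stand-in for the app's display logic
            last_seen_id = max(last_seen_id, msg["id"])
        time.sleep(interval)  # Fixed interval between polls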

Problems with short polling

Wasted requests: Most polls return empty responses. At 1 million connected users polling every 5 seconds, that is 200,000 requests per second — most returning nothing.
Latency: Updates are delayed by up to the polling interval. A 5-second interval means an average of 2.5 seconds of latency.
Trade-off between latency and load: Shorter intervals reduce latency but increase server load proportionally.

Short polling is acceptable for dashboards that refresh every 30-60 seconds, but not for real-time features.
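
The arithmetic behind these numbers is easy to check. The helper below is an illustrative sketch (`polling_cost` and its parameters are made-up names, not part of any library). It assumes polls are spread evenly over time and that updates arrive at random moments within an interval.

```python
def polling_cost(connected_users: int, interval_seconds: float) -> tuple[float, float]:
    """Return (requests per second, average update delay in seconds)."""
    requests_per_second = connected_users / interval_seconds
    # An update lands at a uniformly random point in the interval,
    # so on average it waits half an interval for the next poll.
    average_delay = interval_seconds / 2
    return requests_per_second, average_delay
```

Calling `polling_cost(1_000_000, 5)` reproduces the 200,000 requests per second and 2.5 seconds of average delay quoted above. Dropping the interval to 1 second cuts the delay to 0.5 seconds but raises the load to 1,000,000 requests per second, which is the latency/load trade-off in concrete terms.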

Long Polling

Long polling improves on short polling by having the server hold the request open until there is new data (or a timeout occurs). The client immediately sends a new request when it receives a response.

Client                          Server
  │                               │
  │──── GET /messages ──────────►│
  │         (server holds         │
  │          request open...      │
  │          waiting for data)    │
  │                               │  ◄── new message arrives
  │◄─── 200 [{msg: "hello"}] ───│
  │                               │
  │──── GET /messages ──────────►│  (immediately reconnect)
  │         (server holds         │
  │          request open again)  │
  │                               │

# Server-side long polling
import asyncio

class LongPollingHandler:
    def __init__(self):
        self.waiters = {}  # user_id -> list of futures

    async def get_messages(self, user_id, timeout=30):
        # Must be called from inside a running event loop
        future = asyncio.get_running_loop().create_future()
        self.waiters.setdefault(user_id, []).append(future)

        try:
            # Wait for new data or timeout
            result = await asyncio.wait_for(future, timeout=timeout)
            return {"messages": result}
        except asyncio.TimeoutError:
            return {"messages": []}  # Return empty, client reconnects
        finally:
            self.waiters[user_id].remove(future)
            if not self.waiters[user_id]:
                del self.waiters[user_id]  # Avoid leaking empty lists

    async def publish_message(self, user_id, message):
        # Note: a message published while no request is waiting is dropped here;
        # a real service would also store it for the next poll.
        for future in self.waiters.get(user_id, []):
            if not future.done():
                future.set_result([message])

Advantages over short polling

Near-instant delivery when data is available
Far fewer empty responses: during quiet periods the server only replies when the hold timeout expires
Works through firewalls and proxies (it is just HTTP)

Drawbacks

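Each waiting client holds a server connection open, so the server needs one held connection (and the memory that goes with it) per connected client.
Every delivered message costs a full request/response cycle: HTTP headers are resent each time, and a new TCP connection may be needed if keep-alive is not in use.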
Hard to handle multiple messages arriving at once — the first message triggers the response, and subsequent messages must wait for the client to reconnect.
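
One way to soften the last drawback is to keep a per-user message log and have each poll ask for everything after a cursor. Messages that arrive between polls are then batched into the next response. The following is a minimal in-memory sketch of that idea. `MessageLog`, `append`, and `wait_for_new` are hypothetical names, and a real service would bound and persist the log.

```python
import asyncio


class MessageLog:
    """Per-user log (hypothetical); each long poll returns everything after a cursor."""

    def __init__(self) -> None:
        self._messages: list[dict] = []
        self._waiters: set = set()  # Futures of requests currently held open

    def _after(self, cursor: int) -> list[dict]:
        return [m for m in self._messages if m["id"] > cursor]

    def append(self, text: str) -> int:
        message_id = len(self._messages) + 1
        self._messages.append({"id": message_id, "text": text})
        for waiter in self._waiters:
            if not waiter.done():
                waiter.set_result(None)  # Wake every held request
        return message_id

    async def wait_for_new(self, cursor: int, timeout: float = 30.0) -> list[dict]:
        """Return all messages with id > cursor, holding the request until some exist."""
        pending = self._after(cursor)
        if pending:
            return pending  # Backlog built up between polls: answer immediately
        waiter = asyncio.get_running_loop().create_future()
        self._waiters.add(waiter)
        try:
            await asyncio.wait_for(waiter, timeout)
        except asyncio.TimeoutError:
            return []  # Client polls again with the same cursor
        finally:
            self._waiters.discard(waiter)
        return self._after(cursor)
```

The client sends the highest `id` it has seen as the cursor on every poll, so nothing is lost if several messages arrive while it is reconnecting.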

Who uses it

Long polling is still used when WebSocket support is not available. Facebook's original chat used long polling. It is also a common fallback mechanism — many WebSocket libraries (like Socket.IO) fall back to long polling when WebSocket connections fail.

Server-Sent Events (SSE)

SSE provides a standardized, unidirectional push channel from server to client over a single HTTP connection. The server sends events as a text stream, and the browser's built-in `EventSource` API handles reconnection automatically.

Client                          Server
  │                               │
  │──── GET /events ────────────►│
  │◄─── HTTP 200                 │
  │◄─── Content-Type: text/      │
  │     event-stream             │
  │                               │
  │◄─── data: {"msg":"hi"}\n\n ──│
  │                               │
  │     (connection stays open)   │
  │                               │
  │◄─── data: {"msg":"hey"}\n\n ─│
  │                               │

# Server-side SSE endpoint
from flask import Flask, Response
import json
import redis

app = Flask(__name__)
redis_client = redis.Redis()  # Assumes a Redis server on localhost:6379

def event_stream(user_id):
    pubsub = redis_client.pubsub()
    pubsub.subscribe(f"events:{user_id}")

    try:
        # Send initial connection event
        yield "event: connected\ndata: {}\n\n"

        for message in pubsub.listen():
            if message["type"] == "message":
                data = json.loads(message["data"])
                # One SSE event: id, event, and data fields, then a blank line
                yield (
                    f"id: {data['id']}\n"
                    f"event: {data['type']}\n"
                    f"data: {json.dumps(data['payload'])}\n\n"
                )
    finally:
        pubsub.close()  # Runs when the generator is closed, e.g. after a disconnect

@app.route("/events/<user_id>")
def stream(user_id):
    # "Connection" is a hop-by-hop header that WSGI apps should not set themselves
    return Response(
        event_stream(user_id),
        mimetype="text/event-stream",
        headers={"Cache-Control": "no-cache"},
    )

Key features

Automatic reconnection: The browser reconnects automatically if the connection drops, with a configurable retry interval.
Event IDs: The server can assign IDs to events. On reconnection, the client sends the last ID it received via the `Last-Event-ID` header, and the server can replay missed events.
Named events: Events can have types (`event: notification\n`), letting the client listen for specific event types.
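
These three features map directly onto fields of the `text/event-stream` wire format. The sketch below shows one way to serialize an event and to pick the events to replay after a reconnect. `format_sse` and `events_to_replay` are illustrative helper names, and the replay logic assumes event IDs are increasing integers.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None,
               retry_ms: Optional[int] = None) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")  # Reconnection delay hint, in milliseconds
    if event_id is not None:
        lines.append(f"id: {event_id}")     # Sent back as Last-Event-ID on reconnect
    if event is not None:
        lines.append(f"event: {event}")     # Named event type
    # A payload containing newlines must be split across several data: lines
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"        # A blank line terminates the event


def events_to_replay(history: list[dict], last_event_id: Optional[str]) -> list[dict]:
    """Return the events a reconnecting client missed (integer IDs assumed)."""
    if last_event_id is None:
        return []  # First connection: nothing to replay
    return [e for e in history if e["id"] > int(last_event_id)]
```

On reconnect the server would read the `Last-Event-ID` request header, pass it to `events_to_replay`, and send the result before resuming the live stream.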

When to use SSE

SSE is ideal when you only need server-to-client push: notifications, live feeds, real-time dashboards, stock tickers. It is simpler than WebSockets, works over standard HTTP (easier to proxy and load balance), and has built-in reconnection.

WebSockets

WebSockets provide full-duplex, bidirectional communication over a single TCP connection. After an initial HTTP handshake that "upgrades" the connection, both client and server can send messages at any time without HTTP overhead.

Client                          Server
  │                               │
  │──── GET /chat                │
  │     Upgrade: websocket ─────►│
  │◄─── 101 Switching Protocols ─│
  │                               │
  │ ════ WebSocket connection ════│
  │                               │
  │──── {"type":"msg","text":"hi"}│
  │                               │
  │◄─── {"type":"msg","text":"hey"}│
  │                               │
  │──── {"type":"typing"} ───────│
  │                               │
  │◄─── {"type":"msg","text":"ok"}│
  │                               │
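
The `101 Switching Protocols` response carries more than a status line. Under RFC 6455, the server proves it understood the upgrade by hashing the client's `Sec-WebSocket-Key` header together with a fixed GUID and returning the result as `Sec-WebSocket-Accept`. The sketch below shows only that step, with illustrative function names. In practice a WebSocket library performs it for you.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(sec_websocket_key: str) -> str:
    """Build a minimal 101 response that completes the upgrade."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {compute_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )
```

With the sample key from the RFC, `compute_accept("dGhlIHNhbXBsZSBub25jZQ==")` returns `"s3pPLMBiTxaQ9kYGzzhZRbK+xOo="`. After this response, both sides stop speaking HTTP and exchange WebSocket frames on the same TCP connection.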

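# WebSocket server with heartbeat
import asyncio
import websockets
import json

connected_clients = {}  # user_id -> websocket

# authenticate(), route_message(), and message_queue are application-defined
# and not shown here.

async def handler(websocket):
    # Older releases of the websockets library also passed a `path` argument
    user_id = authenticate(websocket)
    connected_clients[user_id] = websocket
    # Start the heartbeat before the try block so `finally` can always cancel it
    heartbeat_task = asyncio.create_task(send_heartbeats(websocket))

    try:
        async for raw_message in websocket:
            message = json.loads(raw_message)
            await route_message(user_id, message)
    except websockets.ConnectionClosed:
        pass
    finally:
        heartbeat_task.cancel()
        # Only remove the entry if it still points at this connection
        # (the user may already have reconnected on a new socket)
        if connected_clients.get(user_id) is websocket:
            del connected_clients[user_id]

async def send_heartbeats(websocket, interval=30, timeout=10):
    """Send periodic pings; close the connection if no pong arrives in time."""
    # The websockets library can also do this itself via the ping_interval
    # and ping_timeout arguments to serve().
    while True:
        try:
            pong_waiter = await websocket.ping()
            await asyncio.wait_for(pong_waiter, timeout)
            await asyncio.sleep(interval)
        except asyncio.TimeoutError:
            await websocket.close()
            break
        except websockets.ConnectionClosed:
            break

async def send_to_user(user_id, message):
    ws = connected_clients.get(user_id)
    if ws:
        await ws.send(json.dumps(message))
    else:
        # User is not connected: queue for later or use push notification
        await message_queue.enqueue(user_id, message)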

Connection Management

WebSocket connections are long-lived and stateful, which introduces several challenges:

Heartbeats: Network equipment (firewalls, load balancers, NAT devices) often kills idle connections after 30-60 seconds. Periodic ping/pong frames keep the connection alive and detect dead clients quickly.
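
On the server side, heartbeat bookkeeping comes down to remembering when each connection last answered a ping and evicting the ones that have been silent too long. The sketch below is a library-independent illustration. `HeartbeatTracker` and its methods are hypothetical names, and the "two missed intervals" threshold is an assumption, not a standard.

```python
import time
from typing import Callable


class HeartbeatTracker:
    """Tracks the last pong per connection and reports those that went silent."""

    def __init__(self, interval: float = 30.0, missed_allowed: int = 2,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.deadline = interval * missed_allowed  # Silence tolerated before eviction
        self.clock = clock                         # Injectable so tests can fake time
        self.last_seen: dict[str, float] = {}

    def register(self, conn_id: str) -> None:
        self.last_seen[conn_id] = self.clock()

    def on_pong(self, conn_id: str) -> None:
        if conn_id in self.last_seen:
            self.last_seen[conn_id] = self.clock()

    def reap_dead(self) -> list[str]:
        """Remove and return connections with no pong inside the deadline."""
        now = self.clock()
        dead = [c for c, seen in self.last_seen.items() if now - seen > self.deadline]
        for conn_id in dead:
            del self.last_seen[conn_id]
        return dead
```

A periodic task would call `reap_dead()` and close the returned connections, freeing their memory and any per-user state.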

Reconnection: Clients must handle disconnections gracefully. A good reconnection strategy uses exponential backoff with jitter:

# Client-side reconnection with exponential backoff
import random
import time
import websocket  # Assumes the websocket-client package

class WebSocketClient:
    def __init__(self, url):
        self.url = url
        self.base_delay = 1       # Start with 1 second
        self.max_delay = 30       # Cap at 30 seconds
        self.attempt = 0
        self.ws = None

    def connect(self):
        # Loop instead of recursing so a long outage cannot exhaust the stack
        while True:
            try:
                self.ws = websocket.create_connection(self.url)
                self.attempt = 0      # Reset on success
                return
            except (OSError, websocket.WebSocketException):
                # Delays grow 1s, 2s, 4s, ... up to max_delay
                delay = min(
                    self.base_delay * (2 ** self.attempt),
                    self.max_delay
                )
                self.attempt += 1
                jitter = random.uniform(0, delay * 0.3)
                time.sleep(delay + jitter)

Authentication: The browser WebSocket API does not let you set custom headers (such as `Authorization`) on the handshake request. Common approaches include:

Passing a token as a query parameter: `wss://host/chat?token=abc` (keep such tokens short-lived, since URLs often end up in logs)
Sending an authentication message immediately after connection
Using cookies (set during a prior HTTP request)
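
For the first two approaches, the server needs a token it can verify cheaply during or right after the handshake. The sketch below signs a user ID and expiry with HMAC. It is only an illustration of the idea: `issue_token`, `verify_token`, and `SECRET_KEY` are hypothetical names, and a production system would more likely use an established format such as JWT.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-me"  # Hypothetical; load from configuration in practice


def issue_token(user_id: str, ttl_seconds: int = 60, now: Optional[float] = None) -> str:
    """Create a short-lived token of the form base64url(user_id:expiry).signature."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    payload = base64.urlsafe_b64encode(f"{user_id}:{expires}".encode()).decode()
    signature = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user_id if the token is authentic and unexpired, else None."""
    try:
        payload, signature = token.split(".")
        expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(signature, expected):
            return None  # Tampered or signed with a different key
        user_id, expires = base64.urlsafe_b64decode(payload).decode().rsplit(":", 1)
        if (time.time() if now is None else now) >= int(expires):
            return None  # Expired
        return user_id
    except (ValueError, TypeError):
        return None      # Malformed token
```

The HTTP API issues the token to a logged-in user just before the client opens the socket. The WebSocket handler then calls `verify_token` on the query parameter or on the first message and closes the connection if it returns `None`.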

Scaling WebSockets

Scaling WebSockets is fundamentally harder than scaling stateless HTTP services because each connection is tied to a specific server.

The problem: User A is connected to Server 1. User B is connected to Server 2. When A sends a message to B, Server 1 needs to know that B is on Server 2.

             ┌───────────────┐
             │ Load Balancer │
             │  (sticky/L4)  │
             └───────┬───────┘
                ┌────┴────┐
                │         │
          ┌─────┴──┐  ┌───┴────┐
          │Server 1│  │Server 2│
          │ User A │  │ User B │
          └────┬───┘  └───┬────┘
               │          │
          ┌────┴──────────┴────┐
          │   Redis Pub/Sub    │
          │  (message broker)  │
          └────────────────────┘

Solutions:

Pub/Sub backbone: Use Redis Pub/Sub or a message broker (Kafka, NATS) as a communication layer between WebSocket servers. When Server 1 receives a message for User B, it publishes to a channel that Server 2 subscribes to.

Sticky sessions: Configure the load balancer to route a user's connections to the same server (using cookies or IP hashing). This simplifies local state but complicates failover.

Connection registry: Maintain a centralized mapping of user_id to server in Redis. When routing a message, look up the target server and forward directly.
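
To make the routing concrete, here is a small in-process simulation that combines a pub/sub backbone with a connection registry. It is a sketch under simplifying assumptions: `InMemoryBroker` stands in for Redis Pub/Sub, a plain dict stands in for the registry, and each server's `inboxes` stand in for live sockets. All class and channel names are hypothetical.

```python
from collections import defaultdict
from typing import Callable


class InMemoryBroker:
    """Stand-in for Redis Pub/Sub: hands each published message to subscribers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: dict) -> int:
        for callback in self._subscribers[channel]:
            callback(message)
        return len(self._subscribers[channel])  # Number of receivers


class ChatServer:
    """One WebSocket server process; `inboxes` stands in for its live sockets."""

    def __init__(self, name: str, broker: InMemoryBroker, registry: dict[str, str]) -> None:
        self.name = name
        self.broker = broker
        self.registry = registry  # Shared user_id -> server name mapping
        self.inboxes: dict[str, list[str]] = {}
        broker.subscribe(f"server:{name}", self._deliver)

    def connect(self, user_id: str) -> None:
        self.inboxes[user_id] = []
        self.registry[user_id] = self.name

    def send(self, sender: str, recipient: str, text: str) -> bool:
        target = self.registry.get(recipient)
        if target is None:
            return False  # Recipient offline; queue or push-notify elsewhere
        message = {"to": recipient, "from": sender, "text": text}
        if target == self.name:
            self._deliver(message)  # Same server: no broker hop needed
        else:
            self.broker.publish(f"server:{target}", message)
        return True

    def _deliver(self, message: dict) -> None:
        inbox = self.inboxes.get(message["to"])
        if inbox is not None:
            inbox.append(f'{message["from"]}: {message["text"]}')
```

With User A connected to one `ChatServer` and User B to another, `server1.send("A", "B", "hello")` looks up B's server in the registry and publishes to that server's channel. The message then appears in B's inbox on Server 2.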

Comparison: Which to Choose

┌──────────────────┬───────────┬─────────────┬──────────────┬──────────────┐
│                  │ Short     │ Long        │ SSE          │ WebSockets   │
│                  │ Polling   │ Polling     │              │              │
├──────────────────┼───────────┼─────────────┼──────────────┼──────────────┤
│ Direction        │ Client→   │ Client→     │ Server→      │ Bidirectional│
│ Latency          │ High      │ Low-Medium  │ Low          │ Very Low     │
│ Server load      │ Very High │ Medium      │ Low          │ Low          │
│ Complexity       │ Very Low  │ Medium      │ Low          │ High         │
│ Browser support  │ Universal │ Universal   │ Modern       │ Modern       │
│ Proxy friendly   │ Yes       │ Mostly      │ Yes (HTTP)   │ Sometimes    │
│ Auto reconnect   │ N/A       │ Manual      │ Built-in     │ Manual       │
│ Scalability      │ Easy      │ Medium      │ Medium       │ Hard         │
│ Binary data      │ Yes (req) │ Yes (req)   │ No (text)    │ Yes          │
│ HTTP/2 support   │ Yes       │ Yes         │ Yes          │ Separate     │
└──────────────────┴───────────┴─────────────┴──────────────┴──────────────┘

Decision framework

Dashboard refreshing every 30s: Short polling. Simple, effective, low frequency.
Notifications, live feeds: SSE. Server-to-client only, built-in reconnection, simple.
Chat, collaborative editing: WebSockets. Bidirectional, low latency, frequent messages.
Fallback when WebSockets blocked: Long polling. Works everywhere, reasonable latency.
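
The framework can be written down as a small function, which is a handy way to check that you can state the rules without ambiguity. This is only a sketch of the four rules above: `choose_transport` and its parameters are made-up names, and the 30-second threshold is taken from the dashboard example rather than from any hard rule.

```python
def choose_transport(needs_client_to_server: bool,
                     update_interval_seconds: float,
                     websockets_available: bool = True) -> str:
    """Pick a real-time transport using the decision framework above."""
    if needs_client_to_server:
        # Chat, collaborative editing: bidirectional and frequent
        return "websockets" if websockets_available else "long polling"
    if update_interval_seconds >= 30:
        # Slow dashboards: the simplest option is good enough
        return "short polling"
    # Notifications, live feeds: server-to-client push only
    return "sse"
```

For example, `choose_transport(False, 30)` returns `"short polling"`, `choose_transport(False, 1)` returns `"sse"`, and `choose_transport(True, 0.1, websockets_available=False)` returns `"long polling"`.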

Real-World Examples

Slack: Uses WebSockets for real-time messaging. Falls back to long polling in restricted network environments. The connection is multiplexed — all channels share a single WebSocket connection per client.

Twitter/X: Uses a combination of SSE for the streaming API and long polling for some features. The firehose API streams tweets in real time over persistent HTTP connections.

Google Docs: Synchronizes collaborative edits with Operational Transformation (OT) over a persistent real-time channel. Edits are sent as small operations, and the server transforms and broadcasts them to all connected editors.

Stock trading platforms: WebSockets for price feeds and order updates. The low latency of WebSockets is critical when prices change multiple times per second.

Interview Tips

Start with requirements. "Does the client need to send data to the server, or just receive? How frequently do updates occur?" This determines whether you need WebSockets or SSE.

Address scaling explicitly. "With 1 million concurrent WebSocket connections, we need roughly 50-100 WebSocket servers (10,000-20,000 connections per server), with Redis Pub/Sub as the message backbone."

Mention heartbeats and reconnection. This shows operational awareness. "We send a ping every 30 seconds to detect dead connections and free resources. Clients reconnect with exponential backoff."

Discuss fallback strategies. "We primarily use WebSockets but fall back to long polling for clients behind corporate proxies that strip the Upgrade header."

Consider the message delivery guarantee. "When a user reconnects, we need to deliver messages they missed. We assign sequence numbers to messages and let the client request messages since their last received sequence number."

Think about connection state on deploy. "When we deploy new code, we need to gracefully drain WebSocket connections. We send a reconnect signal, and clients establish new connections to updated servers."
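
The delivery-guarantee tip is worth being able to sketch on a whiteboard. Below is a minimal illustration of sequence-numbered replay. `ReplayBuffer` is a hypothetical in-memory class, and the fixed capacity and the "return None to force a full resync" convention are assumptions for the example.

```python
from collections import deque
from typing import Optional


class ReplayBuffer:
    """Keeps the most recent messages so reconnecting clients can catch up."""

    def __init__(self, capacity: int = 1000) -> None:
        self._messages: deque = deque(maxlen=capacity)  # Oldest entries fall off
        self._next_seq = 1

    def append(self, payload: str) -> int:
        seq = self._next_seq
        self._next_seq += 1
        self._messages.append((seq, payload))
        return seq

    def since(self, last_seq: int) -> Optional[list[tuple[int, str]]]:
        """Return messages after last_seq, or None if some were already evicted."""
        oldest = self._messages[0][0] if self._messages else self._next_seq
        if last_seq + 1 < oldest:
            return None  # Gap: the client must do a full resync instead
        return [(seq, payload) for seq, payload in self._messages if seq > last_seq]
```

On reconnect the client sends the last sequence number it processed. The server replies with `since(last_seq)`, or tells the client to reload its state from the database when the buffer no longer covers the gap.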