Distributed Caching with Redis

Using Redis as a distributed cache, session store, and rate limiter. Covers eviction policies, persistence options, and cluster mode.

Redis is an in-memory data store that has become the de facto standard for caching, session management, rate limiting, and real-time features in modern systems. It is fast (sub-millisecond latency for most operations), versatile (supports rich data structures beyond simple key-value), and battle-tested (used by Twitter, GitHub, Pinterest, Snapchat, and many other tech companies).

Understanding Redis deeply is valuable for system design interviews because it appears in almost every design — and interviewers often probe beyond surface-level "just use Redis" answers.

Core Data Structures

Redis is more than a key-value store. Its power comes from its data structures, each optimized for specific use cases.

Strings

The simplest type. A key maps to a string value (up to 512MB). Supports atomic increment/decrement, making it useful for counters.

# Basic key-value
redis.set("user:123:name", "Alice")
redis.get("user:123:name")  # "Alice"

# Atomic counter
redis.incr("page:views:homepage")  # Returns 1, 2, 3, ...
redis.incrby("page:views:homepage", 10)  # Increment by 10

# Set with expiration
redis.setex("session:abc123", 3600, session_data)  # Expires in 1 hour

Use cases: Caching serialized objects, counters, rate limiting tokens, distributed locks.

Hashes

A key maps to a dictionary of field-value pairs. More memory-efficient than storing each field as a separate key.

# Store a user object
redis.hset("user:123", mapping={
    "name": "Alice",
    "email": "alice@example.com",
    "plan": "premium"
})
redis.hget("user:123", "name")    # "Alice"
redis.hgetall("user:123")          # {"name": "Alice", "email": ..., "plan": ...}
redis.hincrby("user:123", "login_count", 1)  # Atomic field increment

Use cases: Storing objects with multiple fields, user profiles, session data, configuration.

Lists

Ordered sequences of strings. Efficient push/pop at both ends (O(1)). Supports blocking pop operations.

redis.lpush("queue:emails", email_data)  # Push to left (head)
redis.rpop("queue:emails")               # Pop from right (tail) — FIFO queue
redis.brpop("queue:emails", timeout=30)  # Blocking pop — waits for data
redis.lrange("queue:emails", 0, 9)       # Get first 10 elements

Use cases: Simple message queues, activity feeds (most recent N items), task queues.

Sets

Unordered collections of unique strings. Supports set operations (union, intersection, difference).

redis.sadd("user:123:tags", "python", "redis", "docker")
redis.sismember("user:123:tags", "redis")  # True
redis.sinter("user:123:tags", "user:456:tags")  # Common tags
redis.scard("user:123:tags")  # Count: 3

Use cases: Tag systems, unique visitor tracking, mutual friends, online user presence.

Sorted Sets (ZSets)

Like sets, but each member has a floating-point score. Members are ordered by score. This is one of Redis's most powerful and distinctive data structures.

# Leaderboard
redis.zadd("leaderboard:game1", {"alice": 2500, "bob": 1800, "charlie": 3100})
redis.zrevrange("leaderboard:game1", 0, 9, withscores=True)  # Top 10
redis.zrank("leaderboard:game1", "alice")   # Rank (0-indexed, ascending)
redis.zrevrank("leaderboard:game1", "alice") # Rank (descending)
redis.zincrby("leaderboard:game1", 100, "alice")  # Alice scores 100 points

# Time-based: sliding window rate limiter
redis.zadd("ratelimit:user:123", {request_id: current_timestamp})
redis.zrangebyscore("ratelimit:user:123", min_ts, max_ts)  # Requests in window

Use cases: Leaderboards, priority queues, sliding window rate limiting, time-series with score as timestamp, range queries on scores.

Eviction Policies

When Redis reaches its configured memory limit (`maxmemory`), it must decide which keys to remove to make room for new writes. The `maxmemory-policy` setting controls this.

LRU (Least Recently Used)

Evicts the key that has not been accessed for the longest time.

allkeys-lru: Evict any key using LRU. The most common policy for caches.
volatile-lru: Only evict keys that have a TTL set. Keys without a TTL are never evicted.

LFU (Least Frequently Used)

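Evicts the key that has been accessed least frequently. Better than LRU when a burst of one-off reads (for example, a batch scan) would otherwise push out keys that are read steadily. Under LRU the recently touched cold keys look fresh, while LFU keeps the keys with the higher access counts.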

allkeys-lfu: Evict any key using LFU.
volatile-lfu: Only evict keys with a TTL.

Other Policies

volatile-ttl: Evict keys with the shortest remaining TTL first.
allkeys-random / volatile-random: Evict random keys.
noeviction: Return errors when memory is full. Use this when Redis is a primary data store, not a cache.

Recommendation: Use `allkeys-lru` for caching workloads. It is the safest default. Use `allkeys-lfu` if you have a mix of frequently and infrequently accessed keys and want to keep the hot ones.
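
The difference between the two policy families is easiest to see on a small workload. The sketch below is a simplified simulation with exact LRU and exact LFU bookkeeping; Redis itself approximates both (its LFU uses a probabilistic counter with decay), and `simulate_lru`, `simulate_lfu`, and the workload are hypothetical names chosen for illustration.

```python
from collections import Counter, OrderedDict


def simulate_lru(accesses, capacity):
    """Return the keys left in an exact-LRU cache after the accesses."""
    cache = OrderedDict()
    for key in accesses:
        if key in cache:
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used key
            cache[key] = True
    return set(cache)


def simulate_lfu(accesses, capacity):
    """Return the keys left in a simple exact-LFU cache after the accesses."""
    counts = Counter()
    for key in accesses:
        if key not in counts and len(counts) >= capacity:
            victim = min(counts, key=counts.get)  # lowest access count
            del counts[victim]
        counts[key] += 1
    return set(counts)


# Two hot keys read repeatedly, then a one-off scan over three cold keys.
workload = ["hot1", "hot2"] * 5 + ["cold1", "cold2", "cold3"]
print(sorted(simulate_lru(workload, capacity=3)))
print(sorted(simulate_lfu(workload, capacity=3)))
```

With a capacity of 3, the LRU simulation ends up holding only the three cold keys, while the LFU simulation keeps both hot keys and only the most recent cold key.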

Implementation detail: Redis does not maintain a true LRU list (that would be too expensive). Instead, it samples a configurable number of keys (`maxmemory-samples`, default 5) and evicts the least recently used among the sample. Increasing the sample size improves accuracy at the cost of CPU.
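
The sampling idea can be sketched in a few lines. This is a simplified illustration rather than Redis's actual implementation (which also keeps a pool of eviction candidates between rounds); `evict_sampled_lru` and the `last_access` mapping are hypothetical names.

```python
import random


def evict_sampled_lru(last_access, samples=5, rng=random):
    """Evict one key: sample up to `samples` keys, drop the least recently used.

    `last_access` maps key -> last access time (larger means more recent).
    Returns the evicted key.
    """
    candidates = rng.sample(list(last_access), min(samples, len(last_access)))
    victim = min(candidates, key=last_access.__getitem__)
    del last_access[victim]
    return victim


# 100 keys, where k0 is the oldest and k99 the most recently used.
last_access = {f"k{i}": i for i in range(100)}
evicted = evict_sampled_lru(last_access, samples=5)
print(evicted, "was the oldest key among the 5 sampled")
```

A larger `samples` value makes it more likely that the truly oldest key is among the candidates, which is the accuracy versus CPU trade-off described above.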

Persistence Options

Redis is an in-memory store, but it offers two mechanisms to persist data to disk.

RDB (Redis Database Backup)

Point-in-time snapshots of the dataset, saved to a binary file at configured intervals.

# redis.conf (comments must be on their own lines; inline comments are not supported)
# Save if at least 1 key changed in 900 seconds
save 900 1
# Save if at least 10 keys changed in 300 seconds
save 300 10
# Save if at least 10000 keys changed in 60 seconds
save 60 10000

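Pros: Compact file, fast restart (just load the file), and low impact on normal operation because the snapshot is written by a forked child process using copy-on-write. The fork itself can pause the server briefly on large datasets.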

Cons: Data loss up to the interval between snapshots. If Redis crashes 5 minutes after the last snapshot, those 5 minutes of writes are lost.
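
One way to read the `save` rules: a snapshot is triggered as soon as any single rule has both its time and its change threshold met. The sketch below models that check under simplified assumptions; `should_snapshot` and `RULES` are hypothetical names, not part of Redis.

```python
def should_snapshot(seconds_since_last_save, changes_since_last_save, rules):
    """Return True if any (seconds, changes) rule is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in rules
    )


# The three rules from the configuration shown earlier: (seconds, changes)
RULES = [(900, 1), (300, 10), (60, 10000)]

print(should_snapshot(120, 5, RULES))      # too few changes for any rule yet
print(should_snapshot(60, 10000, RULES))   # the write-heavy rule fires
```

The first call shows where the data-loss window comes from: five writes made two minutes after the last snapshot are not yet covered by any rule, so a crash at that point would lose them.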

AOF (Append-Only File)

Logs every write operation to a file. On restart, Redis replays the log to reconstruct the dataset.

# redis.conf
appendonly yes
# Fsync every second (good balance)
appendfsync everysec
# appendfsync always  # Fsync every write (safest, slowest)
# appendfsync no      # Let the OS decide (fastest, least safe)

Pros: Roughly one second of writes lost in the worst case (with `everysec`). More durable than RDB.

Cons: Larger files, slower restarts (must replay entire log). AOF rewrite (compaction) runs periodically to keep the file manageable.
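
Replay and rewrite can be illustrated with a toy append-only log. This is a conceptual sketch, not Redis's file format or rewrite algorithm; the tuple-based log and the `replay` and `rewrite` functions are assumptions made for the example.

```python
def replay(log):
    """Rebuild the dataset by applying each logged write in order."""
    data = {}
    for op, key, *args in log:
        if op == "SET":
            data[key] = args[0]
        elif op == "INCR":
            data[key] = int(data.get(key, 0)) + 1
        elif op == "DEL":
            data.pop(key, None)
    return data


def rewrite(log):
    """Compact the log: one SET per surviving key yields the same dataset."""
    return [("SET", key, value) for key, value in replay(log).items()]


log = [
    ("SET", "user:123:name", "Alice"),
    ("INCR", "page:views:homepage"),
    ("INCR", "page:views:homepage"),
    ("SET", "tmp", "x"),
    ("DEL", "tmp"),
]
compacted = rewrite(log)
print(len(log), "entries before rewrite,", len(compacted), "after")
```

Here five logged writes compact to two entries, and replaying either log produces the same dataset, which is why a rewrite shortens restart time without losing data.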

Recommended Setup

Use both RDB and AOF together. RDB provides fast restarts and compact backups. AOF provides minimal data loss. On restart, Redis loads the AOF when AOF is enabled, since it is the more complete record, and otherwise loads the RDB snapshot.

For pure caching (where data loss is acceptable), you can disable persistence entirely for maximum performance.

Redis Cluster

Redis Cluster provides horizontal scaling by automatically splitting data across multiple Redis nodes.

Architecture

Data is divided into 16,384 hash slots.
Each master node owns a subset of slots.

[Master A: slots 0-5460]     <-> [Replica A']
[Master B: slots 5461-10922] <-> [Replica B']
[Master C: slots 10923-16383] <-> [Replica C']

Key assignment: slot = CRC16(key) % 16384

How it works:

The client hashes the key to determine the slot
The client sends the command to the node owning that slot
If the client contacts the wrong node, it receives a `MOVED` redirect and retries

Pros: Linear scalability — add more masters for more capacity and throughput. Automatic failover — if a master dies, its replica is promoted.

Cons: Multi-key operations (MGET, transactions, Lua scripts) only work if all keys hash to the same slot. To force keys to the same slot, use hash tags: `{user:123}:profile` and `{user:123}:settings` both hash on `user:123`.
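
The slot calculation and the hash tag rule can be sketched as follows. This assumes the CRC16 variant is CRC-16/XMODEM, which Python's `binascii.crc_hqx` computes when started from 0; `hash_slot`, `owner`, and `SLOT_OWNERS` are hypothetical names, and the slot ranges are copied from the three-master layout shown earlier.

```python
import binascii

NUM_SLOTS = 16384

# (first slot, last slot, node) for the three-master example layout
SLOT_OWNERS = [
    (0, 5460, "Master A"),
    (5461, 10922, "Master B"),
    (10923, 16383, "Master C"),
]


def hash_slot(key: str) -> int:
    """Compute the cluster hash slot for a key, honoring hash tags."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:            # non-empty tag between the braces
            data = data[start + 1:end]
    # crc_hqx with initial value 0 is CRC-16/XMODEM (polynomial 0x1021)
    return binascii.crc_hqx(data, 0) % NUM_SLOTS


def owner(key: str) -> str:
    """Return the node that owns the key's slot in the example layout."""
    slot = hash_slot(key)
    for first, last, node in SLOT_OWNERS:
        if first <= slot <= last:
            return node
    raise ValueError(f"slot {slot} is not assigned")


for key in ("{user:123}:profile", "{user:123}:settings", "user:456"):
    print(key, "-> slot", hash_slot(key), "on", owner(key))
```

Both `{user:123}` keys print the same slot and node because only the text inside the braces is hashed, so a multi-key command across them is allowed.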

Scaling Redis Cluster

Adding a node involves:

Add the new empty node to the cluster
Migrate hash slots from existing nodes to the new node
The migration is online — reads and writes continue during migration

Removing a node is the reverse: migrate its slots to other nodes, then remove it.
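
To get a feel for how many slots move when a node joins, the sketch below computes an even rebalancing plan. It is a back-of-the-envelope model only; `plan_rebalance` is a hypothetical helper, and real tooling also decides which specific slots to move and migrates the keys inside them.

```python
def plan_rebalance(slot_counts, new_node):
    """Return {donor: slots_to_move} so the new node gets an even share."""
    node_count = len(slot_counts) + 1
    target = sum(slot_counts.values()) // node_count  # share for the new node
    plan = {}
    remaining = target
    for donor, owned in slot_counts.items():
        give = min(remaining, max(0, owned - target))
        if give:
            plan[donor] = give
        remaining -= give
    return plan


# Slot counts from the three-master example (0-5460, 5461-10922, 10923-16383)
current = {"Master A": 5461, "Master B": 5462, "Master C": 5461}
plan = plan_rebalance(current, "Master D")
print(plan)
```

With three evenly loaded masters, each donor gives up roughly a quarter of its slots and the new node ends with 4096 of the 16,384 slots.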

Redis Sentinel

Sentinel provides high availability for non-clustered Redis deployments (a single master with replicas).

[Sentinel 1] [Sentinel 2] [Sentinel 3]
      \           |           /
       \          |          /
        v         v         v
      [Master] --> [Replica 1]
                --> [Replica 2]

What Sentinel does:

Monitoring: Continuously checks that master and replicas are reachable
Automatic failover: If the master is unreachable, Sentinels vote to promote a replica
Configuration provider: Clients ask Sentinel for the current master address

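Run at least 3 Sentinel instances, ideally an odd number on separate machines, so that a majority can still agree on a failover when one Sentinel is down or partitioned away. Requiring a majority reduces the risk of split-brain scenarios, where two nodes both act as master.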
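
The voting arithmetic can be sketched as a small function. This is a simplified model: it assumes the Sentinels that see the master as down are also the ones available to authorize the failover, and `can_start_failover` and its parameters are hypothetical names.

```python
def can_start_failover(total_sentinels, agree_master_down, quorum):
    """True when enough Sentinels agree and a majority can authorize failover."""
    majority = total_sentinels // 2 + 1
    # The configured quorum decides when the master is considered down.
    master_considered_down = agree_master_down >= quorum
    # A majority of all Sentinels must still authorize the failover itself.
    can_authorize = agree_master_down >= majority
    return master_considered_down and can_authorize


print(can_start_failover(total_sentinels=3, agree_master_down=2, quorum=2))
print(can_start_failover(total_sentinels=5, agree_master_down=2, quorum=2))
```

The second call returns False: two of five Sentinels meet the quorum of 2, but they are not a majority, so no failover is authorized. This is why a minority partition cannot promote its own master.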

Sentinel vs Cluster: Sentinel is for high availability of a single dataset. Cluster is for horizontal scaling across multiple nodes. If your data fits on one machine but you need failover, use Sentinel. If you need more capacity, use Cluster.

Common Use Cases in System Design

Session Storage

Store user sessions in Redis instead of the application server. This enables stateless application servers (any server can handle any request).

# On login
redis.setex(f"session:{session_id}", 3600, json.dumps(user_data))

# On each request
session = redis.get(f"session:{session_id}")
if session is None:
    redirect_to_login()

Leaderboards

Sorted sets make leaderboards trivial and efficient.

# Add/update score
redis.zadd("leaderboard:weekly", {user_id: score})

# Get top 100
top_100 = redis.zrevrange("leaderboard:weekly", 0, 99, withscores=True)

# Get user's rank
rank = redis.zrevrank("leaderboard:weekly", user_id)

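Adding a score (`ZADD`) and looking up a rank (`ZREVRANK`) are O(log N), and fetching the top M entries (`ZREVRANGE`) is O(log N + M), so all three stay fast even with millions of users.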

Rate Limiting

Redis is ideal for distributed rate limiting because it is fast and supports atomic operations. See the Rate Limiting article for detailed algorithms.

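# Simple fixed-window rate limiter
key = f"ratelimit:{user_id}:{current_minute}"
# Run INCR and EXPIRE in one MULTI/EXEC transaction so the key always gets a TTL
pipe = redis.pipeline()
pipe.incr(key)
pipe.expire(key, 60)
count, _ = pipe.execute()
if count > MAX_REQUESTS_PER_MINUTE:
    reject_request()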
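
A fixed window lets a client send a full quota at the end of one minute and another at the start of the next. The sorted-set sliding window mentioned earlier avoids this. The sketch below shows the logic in memory, with a sorted list standing in for the sorted set; `SlidingWindowLimiter` is a hypothetical class, and in Redis the three steps would need to run atomically (for example in a transaction or Lua script).

```python
import bisect
from collections import defaultdict


class SlidingWindowLimiter:
    """In-memory sketch of a sliding-window log rate limiter."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = defaultdict(list)  # user_id -> sorted timestamps

    def allow(self, user_id, now):
        stamps = self._timestamps[user_id]
        # Drop entries that fell out of the window (like ZREMRANGEBYSCORE)
        cutoff = now - self.window_seconds
        del stamps[:bisect.bisect_right(stamps, cutoff)]
        if len(stamps) >= self.max_requests:  # like ZCARD
            return False
        bisect.insort(stamps, now)            # like ZADD with score = now
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
for t in (0, 10, 20, 30, 60, 61):
    print(t, limiter.allow("user:123", now=t))
```

In this run the requests at 0, 10, and 20 seconds are allowed, the one at 30 is rejected, the one at 60 is allowed because the first request has left the window, and the one at 61 is rejected again.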

Distributed Locks

Use `SET NX EX` for simple distributed locks (the Redlock algorithm extends this for multi-node safety).

# Acquire lock
acquired = redis.set("lock:resource", lock_id, nx=True, ex=10)
if not acquired:
    raise LockError("Resource is locked")

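# Release lock only if we still own it. Check and delete run in one Lua script
# so they are atomic; a separate GET then DEL could delete another client's lock.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""
redis.eval(RELEASE_SCRIPT, 1, "lock:resource", lock_id)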

Pub/Sub for Real-Time Features

Redis Pub/Sub enables simple real-time message broadcasting.

# Publisher
redis.publish("chat:room:42", json.dumps(message))

# Subscriber
pubsub = redis.pubsub()
pubsub.subscribe("chat:room:42")
for message in pubsub.listen():
    handle_message(message)

Caveat: Redis Pub/Sub is fire-and-forget — if a subscriber is disconnected, it misses messages. For durable messaging, use Redis Streams or a proper message queue.

Performance Considerations

Single-threaded model: Redis processes commands sequentially on a single thread. This means no locking overhead and atomic operations by default, but a single slow command (e.g., `KEYS *` on a large dataset) blocks everything. Never use `KEYS` in production — use `SCAN` instead.

Pipelining: Batch multiple commands into a single network round trip. This can improve throughput by 5-10x for bulk operations.
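
A rough latency model shows where the gain comes from: without pipelining every command pays a full network round trip, while a pipeline pays one round trip per batch. The function and the numbers below are illustrative assumptions, not measurements.

```python
def total_time_us(num_commands, rtt_us, server_us_per_command, batch_size=1):
    """Estimate total time: one round trip per batch plus server work per command."""
    batches = -(-num_commands // batch_size)  # ceiling division
    return batches * rtt_us + num_commands * server_us_per_command


# Assumed: 1 ms network round trip, 0.1 ms of server work per command
unpipelined = total_time_us(1000, rtt_us=1000, server_us_per_command=100)
pipelined = total_time_us(1000, rtt_us=1000, server_us_per_command=100, batch_size=100)
print(unpipelined, pipelined, unpipelined / pipelined)
```

Under these assumptions, 1,000 commands take 1,100,000 microseconds one at a time and 110,000 microseconds in batches of 100, a 10x difference. The actual gain depends on network latency and command cost.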

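Memory optimization: Redis stores everything in memory, so memory efficiency matters. Use hashes for small objects: by default, Redis keeps hashes with up to 512 fields and values of at most 64 bytes in a compact encoding (tunable via `hash-max-listpack-entries` and `hash-max-listpack-value`, named `hash-max-ziplist-*` before Redis 7). Set appropriate maxmemory and eviction policies.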

Interview Tips

Do not just say "use Redis." Specify what data structure you are using and why. "We will use a Redis sorted set for the leaderboard because it gives us O(log N) rank lookups and range queries" is much better than "we will cache it in Redis."

Know the eviction policies. If an interviewer asks what happens when Redis runs out of memory, explain LRU/LFU eviction. If your data must not be evicted, discuss either sizing Redis appropriately or using the noeviction policy.

Discuss persistence tradeoffs. For caching, persistence may be unnecessary. For session storage or rate limiting state, AOF with everysec is a good default. Be able to explain why.

Mention Cluster for scale. If your system needs more than ~25GB of cache or more than ~100K operations per second, a single Redis instance is not enough. Mention Redis Cluster and the hash slot mechanism.

Address single points of failure. A single Redis instance is a SPOF. In production, you need either Sentinel (for failover) or Cluster (for both scaling and failover). Always mention this in your design.

Know the limitations: Redis is not a replacement for a database. It should not be your only copy of important data (unless you are comfortable with the persistence tradeoffs). Always have a source of truth that is more durable.