Caching
Using Redis as a distributed cache, session store, and rate limiter. Covers eviction policies, persistence options, and cluster mode.
Redis is an in-memory data store that has become the de facto standard for caching, session management, rate limiting, and real-time features in modern systems. It is fast (sub-millisecond latency for most operations), versatile (supports rich data structures beyond simple key-value), and battle-tested (used by Twitter, GitHub, Pinterest, Snapchat, and many other companies).
Understanding Redis deeply is valuable for system design interviews because it appears in almost every design — and interviewers often probe beyond surface-level "just use Redis" answers.
Redis is more than a key-value store. Its power comes from its data structures, each optimized for specific use cases.
Strings are the simplest type. A key maps to a string value (up to 512 MB). Atomic increment and decrement make strings useful for counters.
# Basic key-value
redis.set("user:123:name", "Alice")
redis.get("user:123:name") # "Alice"
# Atomic counter
redis.incr("page:views:homepage") # Returns 1, 2, 3, ...
redis.incrby("page:views:homepage", 10) # Increment by 10
# Set with expiration
redis.setex("session:abc123", 3600, session_data) # Expires in 1 hour
Use cases: Caching serialized objects, counters, rate limiting tokens, distributed locks.
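To make the "caching serialized objects" use case concrete, here is a minimal cache-aside sketch. It assumes a redis-py-style client that exposes `get` and `setex`; the helper name `get_or_load` and the `loader` callback are hypothetical names introduced for this example, not part of any library.

```python
import json
from typing import Any, Callable


def get_or_load(client: Any, key: str, loader: Callable[[], dict],
                ttl_seconds: int = 3600) -> dict:
    """Cache-aside read: try the cache first, fall back to the loader on a miss.

    `client` is assumed to be a redis-py-style client with `get` and `setex`.
    """
    cached = client.get(key)
    if cached is not None:
        # Cache hit: deserialize the stored JSON (str or bytes both work).
        return json.loads(cached)

    # Cache miss: read from the source of truth, then populate the cache with a TTL.
    value = loader()
    client.setex(key, ttl_seconds, json.dumps(value))
    return value
```

A call might look like `get_or_load(client, "user:123", lambda: fetch_user(123))`, where `fetch_user` stands in for whatever database query the application already has. The TTL bounds how long a stale copy can be served.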
A hash maps a key to a dictionary of field-value pairs. For small objects this is usually more memory-efficient than storing each field as a separate key.
# Store a user object
redis.hset("user:123", mapping={
    "name": "Alice",
    "email": "alice@example.com",
    "plan": "premium"
})
redis.hget("user:123", "name") # "Alice"
redis.hgetall("user:123") # {"name": "Alice", "email": ..., "plan": ...}
redis.hincrby("user:123", "login_count", 1) # Atomic field increment
Use cases: Storing objects with multiple fields, user profiles, session data, configuration.
Lists are ordered sequences of strings with efficient O(1) push and pop at both ends. They also support blocking pop operations.
redis.lpush("queue:emails", email_data) # Push to left (head)
redis.rpop("queue:emails") # Pop from right (tail) — FIFO queue
redis.brpop("queue:emails", timeout=30) # Blocking pop — waits for data
redis.lrange("queue:emails", 0, 9) # Get first 10 elements
Use cases: Simple message queues, activity feeds (most recent N items), task queues.
Sets are unordered collections of unique strings. They support set operations (union, intersection, difference).
redis.sadd("user:123:tags", "python", "redis", "docker")
redis.sismember("user:123:tags", "redis") # True
redis.sinter("user:123:tags", "user:456:tags") # Common tags
redis.scard("user:123:tags") # Count: 3
Use cases: Tag systems, unique visitor tracking, mutual friends, online user presence.
Sorted sets are like sets, but each member has a floating-point score, and members are kept ordered by score. This is one of Redis's most powerful and distinctive data structures.
# Leaderboard
redis.zadd("leaderboard:game1", {"alice": 2500, "bob": 1800, "charlie": 3100})
redis.zrevrange("leaderboard:game1", 0, 9, withscores=True) # Top 10
redis.zrank("leaderboard:game1", "alice") # Rank (0-indexed, ascending)
redis.zrevrank("leaderboard:game1", "alice") # Rank (descending)
redis.zincrby("leaderboard:game1", 100, "alice") # Alice scores 100 points
# Time-based: sliding window rate limiter
redis.zadd("ratelimit:user:123", {request_id: current_timestamp})
redis.zrangebyscore("ratelimit:user:123", min_ts, max_ts) # Requests in window
Use cases: Leaderboards, priority queues, sliding window rate limiting, time-series with score as timestamp, range queries on scores.
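The sliding-window fragment above can be fleshed out into a complete check. The following is a sketch, not a definitive implementation: it assumes a redis-py-style client whose `pipeline()` queues `zremrangebyscore`, `zadd`, `zcard`, and `expire`; the function name `allow_request` and the key format are illustrative assumptions.

```python
import time
import uuid
from typing import Any, Optional


def allow_request(client: Any, user_id: str, limit: int, window_seconds: int,
                  now: Optional[float] = None) -> bool:
    """Sliding-window log limiter backed by a sorted set (score = timestamp).

    `client` is assumed to be a redis-py-style client; the key format is illustrative.
    """
    now = time.time() if now is None else now
    key = f"ratelimit:user:{user_id}"
    # Unique member so two requests with the same timestamp do not overwrite each other.
    member = f"{now}:{uuid.uuid4().hex}"

    pipe = client.pipeline()  # queued commands are sent together as one MULTI/EXEC
    pipe.zremrangebyscore(key, 0, now - window_seconds)  # drop requests older than the window
    pipe.zadd(key, {member: now})                        # record this request
    pipe.zcard(key)                                      # count requests left in the window
    pipe.expire(key, window_seconds)                     # let idle keys expire on their own
    _, _, count, _ = pipe.execute()
    return count <= limit
```

Note one design choice in this sketch: rejected requests are still recorded, so a client that keeps hammering the endpoint stays limited. If rejected requests should not count, the member would have to be removed again after a failed check.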
When Redis reaches its configured memory limit (`maxmemory`), it must decide which keys to remove. The `maxmemory-policy` setting controls this.
LRU (least recently used): Evicts the key that has not been accessed for the longest time.
LFU (least frequently used): Evicts the key that has been accessed least frequently. Better than LRU when some keys are accessed in bursts but not continuously.
Recommendation: Use `allkeys-lru` for caching workloads. It is the safest default. Use `allkeys-lfu` if you have a mix of frequently and infrequently accessed keys and want to keep the hot ones.
Implementation detail: Redis does not maintain a true LRU list (that would be too expensive). Instead, it samples a configurable number of keys (`maxmemory-samples`, default 5) and evicts the least recently used among the sample. Increasing the sample size improves accuracy at the cost of CPU.
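The sampling idea can be illustrated with a small toy model. This is not Redis's actual implementation (which is written in C and has further refinements); it only shows the core trade-off, using a plain dictionary of last-access timestamps. The function name and the `rng` parameter are assumptions made for the example.

```python
import random
from typing import Dict, Optional


def evict_sampled_lru(last_access: Dict[str, float], samples: int = 5,
                      rng: Optional[random.Random] = None) -> str:
    """Evict one key using approximated LRU: sample a few keys, drop the stalest.

    `last_access` maps key -> last access timestamp; `samples` plays the role
    of `maxmemory-samples`.
    """
    rng = rng or random.Random()
    candidates = rng.sample(list(last_access), min(samples, len(last_access)))
    # Among the sampled keys only, pick the one with the oldest access time.
    victim = min(candidates, key=lambda k: last_access[k])
    del last_access[victim]
    return victim
```

With a small `samples` value the evicted key is only probably one of the oldest; as `samples` approaches the number of keys, the result converges on true LRU, at the cost of inspecting more keys per eviction.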
Redis is an in-memory store, but it offers two mechanisms to persist data to disk.
RDB persistence takes point-in-time snapshots of the dataset and saves them to a binary file at configured intervals.
# redis.conf (comments must sit on their own lines; trailing comments are not supported)
# Save if at least 1 key changed in 900 seconds
save 900 1
# Save if at least 10 keys changed in 300 seconds
save 300 10
# Save if at least 10000 keys changed in 60 seconds
save 60 10000
Pros: Compact file, fast restart (just load the file), and low overhead during normal operation (the snapshot is written by a forked child process using copy-on-write).
Cons: Data loss up to the interval between snapshots. If Redis crashes 5 minutes after the last snapshot, those 5 minutes of writes are lost.
AOF (append-only file) persistence logs every write operation to a file. On restart, Redis replays the log to reconstruct the dataset.
# redis.conf
appendonly yes
# Fsync every second (good balance)
appendfsync everysec
# Alternative: fsync on every write (safest, slowest)
# appendfsync always
# Alternative: let the OS decide when to flush (fastest, least safe)
# appendfsync no
Pros: Typically at most about 1 second of data loss (with `everysec`). More durable than RDB.
Cons: Larger files, slower restarts (must replay entire log). AOF rewrite (compaction) runs periodically to keep the file manageable.
Use both RDB and AOF together. RDB provides fast restarts and compact backups. AOF provides minimal data loss. When AOF is enabled, Redis loads the AOF on restart because it is the more complete record; otherwise it loads the RDB snapshot.
For pure caching (where data loss is acceptable), you can disable persistence entirely for maximum performance.
Redis Cluster provides horizontal scaling by automatically splitting data across multiple Redis nodes.
Data is divided into 16,384 hash slots.
Each master node owns a subset of slots.
[Master A: slots 0-5460] <-> [Replica A']
[Master B: slots 5461-10922] <-> [Replica B']
[Master C: slots 10923-16383] <-> [Replica C']
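Key assignment: slot = CRC16(key) % 16384
How it works: if a client sends a command to a node that does not own the key's slot, the node replies with a `MOVED` redirect and the client retries against the node that does own it.
Pros: Near-linear scalability: add more masters for more capacity and throughput. Automatic failover: if a master dies, its replica is promoted.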
Cons: Multi-key operations (MGET, transactions, Lua scripts) only work if all keys hash to the same slot. To force keys to the same slot, use hash tags: `{user:123}:profile` and `{user:123}:settings` both hash on `user:123`.
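The slot formula and the hash tag rule can be checked with a few lines of code. The sketch below follows the rule as described in the Redis Cluster specification (CRC16 in its XMODEM variant, and only the text between the first `{` and the next `}` is hashed when that text is non-empty); the function names are illustrative.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (the XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the cluster hash slot (0..16383) for a key, honoring hash tags."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # non-empty tag between the first "{" and the next "}"
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384


# Both keys hash on "user:123", so they land in the same slot
# and can be used together in MGET, transactions, or Lua scripts.
assert key_slot("{user:123}:profile") == key_slot("{user:123}:settings")
```

On a real cluster, `CLUSTER KEYSLOT <key>` reports the slot the server computes, which is a convenient way to cross-check a key naming scheme before relying on it.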
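Adding a node involves starting an empty node, joining it to the cluster, and then migrating a share of the hash slots (and the keys in them) from existing masters to it.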
Removing a node is the reverse: migrate its slots to other nodes, then remove it.
Sentinel provides high availability for non-clustered Redis deployments (a single master with replicas).
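[Sentinel 1]  [Sentinel 2]  [Sentinel 3]
      \             |             /
       \            |            /
        v           v           v
      [Master] --> [Replica 1]
               --> [Replica 2]
What Sentinel does: it monitors the master and its replicas, promotes a replica when the master is judged to be down, and tells clients the address of the current master.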
You need at least 3 Sentinel instances (an odd number is recommended) so that a majority can still agree on a failover when one Sentinel is unreachable. Requiring a majority reduces the risk of split-brain scenarios.
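The arithmetic behind "at least 3, odd number" is easy to work through. The helper names below are hypothetical; the sketch only computes how many Sentinels form a strict majority and how many can be lost while a majority remains.

```python
def failover_majority(sentinels: int) -> int:
    """Smallest number of Sentinels that forms a strict majority."""
    return sentinels // 2 + 1


def tolerated_failures(sentinels: int) -> int:
    """How many Sentinels can be down while a majority is still reachable."""
    return sentinels - failover_majority(sentinels)


for n in (2, 3, 4, 5):
    print(f"{n} sentinels: majority={failover_majority(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Two Sentinels tolerate no failures, three tolerate one, four still tolerate only one, and five tolerate two. An even count adds a process to run without adding fault tolerance, which is why odd counts are the usual recommendation.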
Sentinel vs Cluster: Sentinel is for high availability of a single dataset. Cluster is for horizontal scaling across multiple nodes. If your data fits on one machine but you need failover, use Sentinel. If you need more capacity, use Cluster.
Store user sessions in Redis instead of the application server. This enables stateless application servers (any server can handle any request).
# On login
redis.setex(f"session:{session_id}", 3600, json.dumps(user_data))
# On each request
session = redis.get(f"session:{session_id}")
if session is None:
    redirect_to_login()
Sorted sets make leaderboards trivial and efficient.
# Add/update score
redis.zadd("leaderboard:weekly", {user_id: score})
# Get top 100
top_100 = redis.zrevrange("leaderboard:weekly", 0, 99, withscores=True)
# Get user's rank
rank = redis.zrevrank("leaderboard:weekly", user_id)
ZADD and ZREVRANK are O(log N), and reading the top M entries with ZREVRANGE is O(log N + M), so these operations stay fast even with millions of users.
Redis is ideal for distributed rate limiting because it is fast and supports atomic operations. See the Rate Limiting article for detailed algorithms.
# Simple fixed-window rate limiter
key = f"ratelimit:{user_id}:{current_minute}"
# Run INCR and EXPIRE in one MULTI/EXEC transaction so the key is never left without a TTL
pipe = redis.pipeline()
pipe.incr(key)
pipe.expire(key, 60)
count, _ = pipe.execute()
if count > MAX_REQUESTS_PER_MINUTE:
    reject_request()
Use `SET NX EX` for simple distributed locks (the Redlock algorithm extends this for multi-node safety).
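# Acquire lock (lock_id is a unique token identifying this holder)
acquired = redis.set("lock:resource", lock_id, nx=True, ex=10)
if not acquired:
    raise LockError("Resource is locked")
# Release lock only if we still own it; a Lua script makes the check-and-delete atomic,
# so we cannot delete a lock that expired and was re-acquired by another client in between
release_script = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
end
return 0
"""
redis.eval(release_script, 1, "lock:resource", lock_id)
Redis Pub/Sub enables simple real-time message broadcasting.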
# Publisher
redis.publish("chat:room:42", json.dumps(message))
# Subscriber
pubsub = redis.pubsub()
pubsub.subscribe("chat:room:42")
for message in pubsub.listen():
    handle_message(message)
Caveat: Redis Pub/Sub is fire-and-forget: if a subscriber is disconnected, it misses messages. For durable messaging, use Redis Streams or a proper message queue.
Single-threaded model: Redis executes commands sequentially on a single thread (Redis 6 and later can use additional threads for network I/O, but command execution stays single-threaded). This means no locking overhead and atomic operations by default, but a single slow command (e.g., `KEYS *` on a large dataset) blocks everything. Never use `KEYS` in production; use `SCAN` instead.
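As an illustration of replacing `KEYS` with `SCAN`, the sketch below walks the keyspace incrementally and deletes matches in bounded batches. It assumes a redis-py-style client with `scan_iter(match=..., count=...)` and `delete(*names)`; the helper names are made up for this example.

```python
from typing import Any, Iterator, List


def iter_key_batches(client: Any, pattern: str,
                     batch_size: int = 500) -> Iterator[List[Any]]:
    """Yield matching keys in batches using SCAN, so no single call blocks for long."""
    batch: List[Any] = []
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def delete_matching(client: Any, pattern: str, batch_size: int = 500) -> int:
    """Delete all keys matching `pattern`, one bounded batch at a time."""
    deleted = 0
    for batch in iter_key_batches(client, pattern, batch_size):
        deleted += client.delete(*batch)
    return deleted
```

Each `SCAN` call does a small, bounded amount of work, so other clients' commands are interleaved between calls. The trade-off is that the scan is not a snapshot: keys created or removed while it runs may or may not be seen.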
Pipelining: Batch multiple commands into a single network round trip. This can improve throughput by 5-10x for bulk operations.
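A minimal sketch of pipelining, assuming a redis-py-style client whose `pipeline(transaction=False)` returns an object that queues commands until `execute()` is called. The function name `bulk_cache_set` is hypothetical.

```python
from typing import Any, Dict, List


def bulk_cache_set(client: Any, items: Dict[str, str], ttl_seconds: int) -> List[Any]:
    """Write many keys in one network round trip using a non-transactional pipeline."""
    pipe = client.pipeline(transaction=False)  # batching only, no MULTI/EXEC wrapper
    for key, value in items.items():
        pipe.set(key, value, ex=ttl_seconds)   # queued locally; nothing is sent yet
    return pipe.execute()                      # send everything, collect replies in order
```

Without the pipeline, each `SET` would pay a full network round trip; with it, the per-command cost is dominated by server-side execution. For very large batches it is usually worth splitting the work into chunks so that replies do not pile up in memory.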
Memory optimization: Redis stores everything in memory, so memory efficiency matters. Use hashes for small objects: Redis keeps hashes below configurable size thresholds (`hash-max-listpack-entries` and `hash-max-listpack-value`, named `hash-max-ziplist-*` before Redis 7) in a compact encoding. Set an appropriate `maxmemory` and eviction policy.
Do not just say "use Redis." Specify what data structure you are using and why. "We will use a Redis sorted set for the leaderboard because it gives us O(log N) rank lookups and range queries" is much better than "we will cache it in Redis."
Know the eviction policies. If an interviewer asks what happens when Redis runs out of memory, explain LRU/LFU eviction. If your data must not be evicted, discuss either sizing Redis appropriately or using the noeviction policy.
Discuss persistence tradeoffs. For caching, persistence may be unnecessary. For session storage or rate limiting state, AOF with everysec is a good default. Be able to explain why.
Mention Cluster for scale. As a rough rule of thumb, if your system needs more than ~25GB of cache or more than ~100K operations per second, a single Redis instance may not be enough. Mention Redis Cluster and the hash slot mechanism.
Address single points of failure. A single Redis instance is a SPOF. In production, you need either Sentinel (for failover) or Cluster (for both scaling and failover). Always mention this in your design.
Know the limitations: Redis is not a replacement for a database. It should not be your only copy of important data (unless you are comfortable with the persistence tradeoffs). Always have a source of truth that is more durable.