Caching
Cache-aside, write-through, write-behind, and read-through patterns. When to cache, what to cache, and how to handle cache invalidation.
Caching is often one of the most impactful performance optimizations in system design. By storing frequently accessed data in a fast storage layer (usually memory), you can often reduce read latency by 10-100x and dramatically decrease load on your database.
But caching introduces a fundamental problem: you now have two copies of the same data, and keeping them in sync is genuinely hard. Most caching bugs come from stale data, and most caching outages come from the cache going down and the database being overwhelmed. Understanding caching strategies means understanding these failure modes.
In the cache-aside pattern, the application manages the cache explicitly. On a read, the application checks the cache first. If the data is not there (a cache miss), it reads from the database, stores the result in the cache, and returns it.
def get_user(user_id):
    # 1. Check cache
    user = cache.get(f"user:{user_id}")
    if user is not None:
        return user  # Cache hit
    # 2. Cache miss: read from database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    # 3. Populate cache for next time
    cache.set(f"user:{user_id}", user, ttl=300)  # 5 min TTL
    return user
Pros:
Cons:
This is the most common pattern. Use it as your default unless you have a specific reason to choose another.
In the read-through pattern, the cache itself is responsible for loading data from the database on a miss. The application only talks to the cache, never directly to the database for reads.
# The application code is simpler:
def get_user(user_id):
    # On a miss, the cache library automatically queries the DB,
    # stores the result, and returns it.
    return cache.get(f"user:{user_id}")
Pros: Simpler application code, because the caching logic is encapsulated in the cache layer.
Cons: The cache must know how to query your database, which couples them. Harder to customize per-query.
Used by: Some CDNs and ORM-level caches (e.g., Hibernate second-level cache).
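To make the read-through idea concrete, here is a minimal in-memory sketch. The `ReadThroughCache` class, its `loader` parameter, and the `FAKE_DB` dictionary are all hypothetical stand-ins invented for illustration; a real read-through layer would also need TTLs and eviction.

```python
from typing import Callable, Dict, Hashable, Optional


class ReadThroughCache:
    """Toy read-through cache: callers only ever call get()."""

    def __init__(self, loader: Callable[[Hashable], Optional[dict]]):
        self._loader = loader          # how the cache reaches the backing store
        self._store: Dict[Hashable, Optional[dict]] = {}
        self.loads = 0                 # number of times the loader was invoked

    def get(self, key):
        if key in self._store:
            return self._store[key]    # hit: the loader is not touched
        self.loads += 1
        value = self._loader(key)      # miss: the cache itself loads the value
        self._store[key] = value
        return value


# Hypothetical backing store standing in for the database.
FAKE_DB = {1: {"id": 1, "name": "Ada"}}
users = ReadThroughCache(loader=FAKE_DB.get)
```

Calling `users.get(1)` twice invokes the loader only once. The coupling mentioned in the cons shows up as the `loader` argument: the cache has to be told how to reach the database.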
In the write-through pattern, every write goes to both the cache and the database, in the same operation. The cache is always up-to-date.
def update_user(user_id, data):
    # Write to database
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    # Immediately update cache
    cache.set(f"user:{user_id}", data, ttl=300)
Pros: Cache is always consistent with the database (for writes that go through this path). Reads after writes always see the latest data.
Cons: Every write has the latency of both the database write and the cache write. Data that is written but never read wastes cache space. Does not help with data written by other services that bypass this code path.
Best combined with cache-aside: Use write-through for your own writes and cache-aside for reads. This gives you read-after-write consistency without caching data that is never read.
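The combination can be sketched with plain dictionaries standing in for the database and the cache. The `UserStore` class and its `db_reads` counter are hypothetical names used only to make the behavior observable; this is an illustration under those assumptions, not a production design.

```python
class UserStore:
    """In-memory stand-ins: `db` plays the database, `cache` plays the cache."""

    def __init__(self):
        self.db = {}
        self.cache = {}
        self.db_reads = 0              # counts reads that reached the database

    def update_user(self, user_id, data):
        # Write-through: database first, then the cache, in the same call.
        self.db[user_id] = dict(data)
        self.cache[user_id] = dict(data)

    def get_user(self, user_id):
        # Cache-aside read: populate the cache only when someone actually reads.
        if user_id in self.cache:
            return self.cache[user_id]
        self.db_reads += 1
        user = self.db.get(user_id)
        if user is not None:
            self.cache[user_id] = dict(user)
        return user
```

A read that follows `update_user` is served from the cache without touching the database, while a row written by some other path stays uncached until its first read.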
In the write-behind (write-back) pattern, the application writes to the cache only. The cache asynchronously flushes writes to the database in the background, often in batches.
def update_user(user_id, data):
    # Write to cache only; returns immediately
    cache.set(f"user:{user_id}", data)
    # Background process flushes to DB every N seconds or on eviction
Pros: Extremely fast writes (memory speed). Batching writes to the database reduces I/O. Good for write-heavy workloads.
Cons: Risk of data loss. If the cache crashes before flushing to the database, recent writes are lost. Complex to implement correctly. Debugging is harder because the database may be behind the cache.
Used by: CPU caches (your L1/L2/L3 caches use write-back), some database engines internally (e.g., InnoDB buffer pool).
Use with caution in application-level design. Unless data loss is acceptable, write-behind is risky.
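The data-loss window is easier to see in a small sketch. The `WriteBehindCache` class below is a hypothetical toy: a dictionary stands in for the database, and `flush()` is called by hand where a real system would run it on a timer or on eviction.

```python
class WriteBehindCache:
    """Toy write-behind cache; flush() would normally run on a timer."""

    def __init__(self, db: dict):
        self._db = db              # stand-in for the database
        self._data = {}            # cached values, served to readers
        self._dirty = set()        # keys written but not yet persisted

    def set(self, key, value):
        self._data[key] = value    # fast path: memory only
        self._dirty.add(key)

    def get(self, key):
        if key in self._data:
            return self._data[key]
        return self._db.get(key)

    def flush(self):
        """Persist all dirty keys in one batch; returns how many were written."""
        batch = {k: self._data[k] for k in self._dirty}
        self._db.update(batch)     # one batched write instead of many small ones
        self._dirty.clear()
        return len(batch)
```

Between `set()` and the next `flush()`, the database does not contain the write. If the process died in that window, the write would be gone, which is exactly the risk described above. Repeated writes to the same key collapse into a single database write per flush.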
Phil Karlton famously said: "There are only two hard things in Computer Science: cache invalidation and naming things." He was right. Cache invalidation is the hardest part of caching.
Set a time-to-live (TTL) on every cache entry. After the TTL expires, the entry is removed and the next read triggers a fresh load from the database.
cache.set("user:123", user_data, ttl=300)  # Expires in 5 minutes
Pros: Simple, guarantees bounded staleness. Even if you forget to invalidate, the data refreshes within TTL seconds.
Cons: Data is stale for up to TTL seconds. Choosing the right TTL is an art — too short means too many cache misses, too long means too much staleness.
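As a rough illustration of how TTL expiry behaves, the sketch below implements a tiny TTL cache with lazy expiration. The `TTLCache` class and its injectable `clock` parameter are assumptions made for this example (the injectable clock only exists so the behavior can be checked without waiting); real caches such as Redis handle expiry for you.

```python
import time


class TTLCache:
    """Minimal TTL cache; expired entries are dropped lazily on read."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}         # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._entries[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]     # expired: behave like a miss
            return None
        return value
```

With `ttl=300`, a read at 299 seconds still returns the cached (possibly stale) value, and a read at 300 seconds is a miss that would trigger a fresh load.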
TTL guidelines:
When data changes in the database, explicitly delete or update the corresponding cache entry.
def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    cache.delete(f"user:{user_id}")  # Invalidate, don't update
Why delete instead of update? Deleting is simpler and avoids race conditions. If two concurrent updates try to set the cache, you might end up with the older value winning. Deleting ensures the next read gets fresh data from the database.
Pros: Data is fresh almost immediately after a change.
Cons: Requires discipline — every code path that modifies data must also invalidate the cache. Miss one path and you have stale data that is invisible and hard to debug.
For larger systems, use a change data capture (CDC) stream to automatically invalidate cache entries when database rows change.
Database -> CDC (Debezium) -> Kafka -> Cache Invalidation Service -> Redis
This decouples the write path from cache invalidation and catches all changes, even those made by other services or direct database modifications.
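The invalidation service at the end of that pipeline can be very small. The sketch below is an assumption-laden illustration: the event shape is a simplified dictionary loosely modeled on a Debezium change envelope (`op`, `before`, `after`, `source.table`), the `user:{id}` key scheme is taken from the earlier examples, and a plain dictionary stands in for Redis. Consuming from Kafka is left out entirely.

```python
from typing import Optional


def cache_key_for(event: dict) -> Optional[str]:
    """Map a simplified change event to the cache key it invalidates."""
    op = event.get("op")                   # "c" create, "u" update, "d" delete
    if op not in ("c", "u", "d"):
        return None                        # e.g. snapshot reads: nothing to do
    # Deletes carry the old row in "before"; creates and updates use "after".
    row = event.get("before") if op == "d" else event.get("after")
    if not row or "id" not in row:
        return None
    if event.get("source", {}).get("table") != "users":
        return None                        # only the users table is cached here
    return f"user:{row['id']}"


def handle_event(event: dict, cache: dict) -> None:
    key = cache_key_for(event)
    if key is not None:
        cache.pop(key, None)               # delete rather than update the entry
```

Invalidating on create as well as on update and delete matters if "not found" results are cached, because a newly inserted row must evict the cached miss.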
When a popular cache entry expires, many concurrent requests simultaneously see a cache miss and all hit the database at once. This is the thundering herd problem, and it can overwhelm the database.
TTL expires on popular key
  -> 1000 requests arrive simultaneously
  -> All see cache miss
  -> All query the database
  -> Database is overwhelmed
Lock/Mutex: Only one request is allowed to rebuild the cache. Others wait for it to finish and then read from the cache.
import time
def get_popular_item(item_id):
    data = cache.get(f"item:{item_id}")
    if data is not None:
        return data
    # Try to acquire a lock (set only if absent; auto-expires after 5 seconds)
    if cache.set(f"lock:item:{item_id}", "1", nx=True, ex=5):
        # I won the lock, so I rebuild the cache
        try:
            data = db.query("SELECT * FROM items WHERE id = %s", item_id)
            cache.set(f"item:{item_id}", data, ttl=300)
        finally:
            # Release the lock even if the rebuild fails
            cache.delete(f"lock:item:{item_id}")
        return data
    else:
        # Someone else is rebuilding: wait briefly and retry
        time.sleep(0.05)
        return get_popular_item(item_id)
Early expiration (stale-while-revalidate): Return the stale value immediately while refreshing the cache in the background. The TTL has two values: a "soft" TTL (after which the value is refreshed asynchronously) and a "hard" TTL (after which it is truly expired).
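One possible shape for the soft/hard TTL idea is sketched below. Everything here is hypothetical: the `SWRCache` class, the `loader` callback that stands in for the database query, and the `clock` and `run_async` parameters, which exist only so the behavior can be exercised deterministically. By default the background refresh runs in a daemon thread.

```python
import threading
import time


class SWRCache:
    """Stale-while-revalidate sketch with a soft and a hard TTL."""

    def __init__(self, loader, soft_ttl, hard_ttl,
                 clock=time.monotonic, run_async=None):
        self._loader = loader
        self._soft, self._hard = soft_ttl, hard_ttl
        self._clock = clock
        # How to run a refresh in the background; defaults to a daemon thread.
        self._run_async = run_async or (
            lambda fn: threading.Thread(target=fn, daemon=True).start()
        )
        self._entries = {}          # key -> (value, stored_at)
        self._refreshing = set()    # keys with a refresh already scheduled
        self._lock = threading.Lock()

    def _refresh(self, key):
        try:
            value = self._loader(key)
            with self._lock:
                self._entries[key] = (value, self._clock())
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            now = self._clock()
            usable = entry is not None and now - entry[1] < self._hard
            needs_refresh = (
                usable
                and now - entry[1] >= self._soft
                and key not in self._refreshing
            )
            if needs_refresh:
                self._refreshing.add(key)
        if usable:
            if needs_refresh:
                self._run_async(lambda: self._refresh(key))
            return entry[0]         # fresh, or stale but within the hard TTL
        self._refresh(key)          # missing or past the hard TTL: load inline
        with self._lock:
            return self._entries[key][0]
```

Between the soft and hard TTL, callers never wait on the database: they get the old value, and at most one refresh per key is scheduled at a time.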
Request coalescing: If multiple identical requests arrive while a cache rebuild is in progress, collapse them into a single database query.
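Request coalescing within a single process can be sketched with a shared future per key. The `SingleFlight` class and its `do` method are hypothetical names chosen for this example; coalescing across multiple processes or machines would need a distributed mechanism such as the lock shown earlier.

```python
import threading
from concurrent.futures import Future


class SingleFlight:
    """Collapse concurrent calls for the same key into one execution."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}       # key -> Future shared by all waiters

    def do(self, key, fn):
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._in_flight[key] = future
        if not leader:
            return future.result()         # follower: wait for the leader
        try:
            future.set_result(fn())        # leader: do the work exactly once
        except Exception as exc:
            future.set_exception(exc)      # followers see the same failure
        finally:
            with self._lock:
                del self._in_flight[key]
        return future.result()
```

A caller would wrap the database query, for example `flights.do("item:42", lambda: load_item(42))`, where `flights` and `load_item` are placeholders. Requests that arrive while the query is running share its result instead of issuing their own.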
When you deploy a new service or add cache nodes, the cache is empty (cold). All requests hit the database until the cache is populated, which can cause a temporary overload.
Strategies:
If a query returns no results (e.g., user not found), you should cache that too. Otherwise, repeated requests for non-existent data always hit the database (a "cache penetration" attack or just an unfortunate access pattern).
def get_user(user_id):
    result = cache.get(f"user:{user_id}")
    if result is not None:
        return result if result != "__NULL__" else None
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    if user is None:
        cache.set(f"user:{user_id}", "__NULL__", ttl=60)  # Short TTL for nulls
        return None
    else:
        cache.set(f"user:{user_id}", user, ttl=300)
        return user
You invalidate a popular key, and instantly thousands of requests try to rebuild it. This is the thundering herd problem triggered by explicit invalidation rather than TTL expiry. The same solutions apply (locking, request coalescing).
A write updates the database but the cache invalidation fails (network issue, bug). The cache now serves stale data indefinitely (or until TTL expires). This is why you should always use TTL as a safety net, even with event-based invalidation.
Caching everything "just in case" wastes memory, increases complexity, and makes debugging harder. Cache only data that is read frequently, expensive to compute, and tolerant of brief staleness.
Production systems often use multiple layers of caching:
Browser Cache (HTTP cache-control headers)
  -> CDN (edge caching, static assets)
  -> Application-level cache (Redis/Memcached, dynamic data)
  -> Database query cache / buffer pool
  -> OS page cache (disk pages in RAM)
Each layer reduces load on the layers below it. When designing a system, consider which layer is most appropriate for each type of data.
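The way an upper layer shields the ones beneath it can be shown with a generic lookup over an ordered list of caches. This is only a sketch: the `layered_get` function is hypothetical, each layer is a plain dictionary, and `origin` stands in for whatever sits at the bottom (for example the database). Real layers such as browsers and CDNs are controlled through HTTP headers rather than application code.

```python
def layered_get(key, layers, origin):
    """Look up `key` in each layer in order; fall back to `origin` on a full miss.

    `layers` is a list of dict-like caches, fastest first.
    """
    for depth, layer in enumerate(layers):
        if key in layer:
            value = layer[key]
            break
    else:
        depth = len(layers)            # every layer missed
        value = origin(key)
    # Backfill every layer above the one that answered.
    for layer in layers[:depth]:
        layer[key] = value
    return value
```

After the first lookup fills both layers, clearing only the top layer results in a hit on the second layer and no call to `origin`.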
Always mention cache invalidation. If you propose caching in an interview without discussing invalidation, the interviewer will ask. Be proactive — say "we will use cache-aside with TTL-based invalidation and event-based invalidation for critical paths."
Justify your TTL choices. Saying "5-minute TTL" is meaningless without context. Explain why 5 minutes is acceptable for this data (e.g., "user profiles change infrequently, so a 5-minute staleness window is fine for the feed, but we use write-through for the user's own profile view").
Know the thundering herd. If your system has popular items (e.g., a viral post, a hot product), you must address the thundering herd problem. This is a common follow-up question.
Cache-aside is the safe default. Unless you have a specific reason to use write-through or write-behind, start with cache-aside. It is the most widely used and easiest to reason about.
Discuss cache failure. What happens if Redis goes down? Your system should degrade gracefully — fall back to the database, possibly with reduced functionality or rate limiting to prevent the database from being overwhelmed.
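A fallback path might look like the sketch below. The `CacheUnavailable` exception and the `get_with_fallback` function are hypothetical; a real client raises its own connection or timeout errors, and those are what you would catch. Rate limiting of the fallback path is not shown.

```python
import logging

logger = logging.getLogger(__name__)


class CacheUnavailable(Exception):
    """Hypothetical error raised by the cache client when the cache is down."""


def get_with_fallback(key, cache_get, cache_set, load_from_db):
    try:
        value = cache_get(key)
        if value is not None:
            return value
    except CacheUnavailable:
        logger.warning("cache read failed for %s; falling back to database", key)
        return load_from_db(key)       # skip the cache write: the cache is down
    value = load_from_db(key)
    try:
        cache_set(key, value)
    except CacheUnavailable:
        logger.warning("cache write failed for %s", key)
    return value
```

The point of the design is that a cache error never turns into a request error. The cost is that every request now reaches the database, which is why the fallback usually needs protection of its own.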
Mention Memcached vs Redis. Memcached is simpler and sometimes faster for pure key-value caching. Redis supports richer data structures (sorted sets, lists, hashes) and persistence. Most teams choose Redis for its versatility.