Caching

Caching Strategies

Cache-aside, write-through, write-behind, and read-through patterns. When to cache, what to cache, and how to handle cache invalidation.

Caching is often the single most impactful performance optimization in system design. By storing frequently accessed data in a fast storage layer (usually memory), you can often cut read latency by one to two orders of magnitude and sharply reduce load on your database.

But caching introduces a fundamental problem: you now have two copies of the same data, and keeping them in sync is genuinely hard. Most caching bugs come from stale data, and most caching outages come from the cache going down and the database being overwhelmed. Understanding caching strategies means understanding these failure modes.

Caching Patterns

Cache-Aside (Lazy Loading)

The application manages the cache explicitly. On a read, the application checks the cache first. If the data is not there (a cache miss), it reads from the database, stores the result in the cache, and returns it.

def get_user(user_id):
    # 1. Check cache
    user = cache.get(f"user:{user_id}")
    if user is not None:
        return user  # Cache hit

    # 2. Cache miss — read from database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # 3. Populate cache for next time
    cache.set(f"user:{user_id}", user, ttl=300)  # 5 min TTL

    return user

Pros:

Only caches data that is actually requested (no wasted memory)
Cache failure is not fatal — the app falls back to the database
Simple to implement and reason about

Cons:

First request for any item is always a cache miss (cold start)
Data can become stale — the database may be updated without the cache knowing
Application code must handle caching logic explicitly

This is the most common pattern. Use it as your default unless you have a specific reason to choose another.

Read-Through

The cache itself is responsible for loading data from the database on a miss. The application only talks to the cache, never directly to the database for reads.

# The application code is simpler. On a miss, the cache library
# queries the DB, stores the result, and returns it.
def get_user(user_id):
    return cache.get(f"user:{user_id}")

Pros: Simpler application code — the caching logic is encapsulated in the cache layer.

Cons: The cache must know how to query your database, which couples them. Harder to customize per-query.

Used by: Some CDNs and ORM-level caches (e.g., Hibernate second-level cache).
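
To make the division of responsibility concrete, here is a minimal sketch of a read-through cache. It assumes an in-process dictionary as the store and a caller-supplied loader function; `ReadThroughCache`, `load_user`, and `FAKE_DB` are hypothetical names for illustration, not the API of any particular cache library.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """Toy read-through cache: on a miss, the cache itself calls the loader."""

    def __init__(self, loader: Callable[[str], Any], ttl: float = 300.0):
        self._loader = loader          # knows how to fetch from the backing store
        self._ttl = ttl
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]            # hit
        value = self._loader(key)      # miss: the cache, not the caller, loads
        self._entries[key] = (time.monotonic() + self._ttl, value)
        return value


# Stand-in for the database, used only for this illustration.
FAKE_DB = {"user:1": {"id": 1, "name": "Ada"}}
load_calls = []


def load_user(key: str) -> Any:
    load_calls.append(key)
    return FAKE_DB.get(key)


user_cache = ReadThroughCache(load_user)
first = user_cache.get("user:1")   # miss, loader runs
second = user_cache.get("user:1")  # hit, loader does not run
```

The loader is where the coupling mentioned above shows up: the cache layer has to be handed something that knows how to reach the database.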

Write-Through

Every write goes to both the cache and the database, in the same operation. The cache is always up-to-date.

def update_user(user_id, data):
    # Write to database
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)

    # Immediately update cache
    cache.set(f"user:{user_id}", data, ttl=300)

Pros: The cache stays consistent with the database for writes that go through this path, provided both writes succeed. Reads after writes see the latest data.

Cons: Every write has the latency of both the database write and the cache write. Data that is written but never read wastes cache space. Does not help with data written by other services that bypass this code path.

Best combined with cache-aside: Use write-through for your own writes and cache-aside for reads. This gives you read-after-write consistency without caching data that is never read.
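
The combination can be sketched as follows. This is only an illustration under simplifying assumptions: plain dictionaries stand in for the database and the cache, there is no TTL, and the `db` and `cache` names are placeholders rather than real clients.

```python
from typing import Dict, Optional

db: Dict[int, dict] = {}      # stand-in for the database
cache: Dict[str, dict] = {}   # stand-in for the cache (no TTL in this sketch)


def update_user(user_id: int, data: dict) -> None:
    """Write path: database first, then cache (write-through)."""
    db[user_id] = data
    cache[f"user:{user_id}"] = data


def get_user(user_id: int) -> Optional[dict]:
    """Read path: cache-aside, populating the cache only on demand."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    user = db.get(user_id)
    if user is not None:
        cache[key] = user
    return user


db[2] = {"name": "written elsewhere"}   # a row this code path never wrote
update_user(1, {"name": "Ada"})
```

After `update_user(1, ...)`, user 1 is readable from the cache immediately, while user 2 enters the cache only when someone first reads it.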

Write-Behind (Write-Back)

The application writes to the cache only. The cache asynchronously flushes writes to the database in the background, often in batches.

def update_user(user_id, data):
    # Write to cache only — returns immediately
    cache.set(f"user:{user_id}", data)
    # Background process flushes to DB every N seconds or on eviction

Pros: Extremely fast writes (memory speed). Batching writes to the database reduces I/O. Good for write-heavy workloads.

Cons: Risk of data loss. If the cache crashes before flushing to the database, recent writes are lost. Complex to implement correctly. Debugging is harder because the database may be behind the cache.

Used by: CPU caches (your L1/L2/L3 caches use write-back), some database engines internally (e.g., InnoDB buffer pool).

Use with caution in application-level design. Unless data loss is acceptable, write-behind is risky.
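
The trade-off is easier to see in a small sketch. The class below is a hypothetical, in-memory illustration: `WriteBehindCache` and its `flush` method are assumed names, a dictionary stands in for the database, and `flush` is called by hand where a real system would run it on a timer or on eviction.

```python
from typing import Any, Dict


class WriteBehindCache:
    """Toy write-behind cache: writes land in memory and are flushed later."""

    def __init__(self, store: Dict[str, Any]):
        self._store = store                 # stand-in for the database
        self._data: Dict[str, Any] = {}     # cached values
        self._dirty: Dict[str, Any] = {}    # writes not yet persisted

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._dirty[key] = value            # a later write to a key replaces the earlier one

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def flush(self) -> int:
        """Persist all pending writes as one batch; returns the batch size."""
        batch, self._dirty = self._dirty, {}
        self._store.update(batch)
        return len(batch)


database: Dict[str, Any] = {}
wb = WriteBehindCache(database)
wb.set("user:1", {"name": "Ada"})
wb.set("user:1", {"name": "Ada L."})   # coalesced with the previous write
wb.set("user:2", {"name": "Grace"})
pending_before_flush = dict(database)  # still empty: a crash here loses every write
flushed = wb.flush()
```

Three writes become a batch of two rows, which is the I/O saving. The empty `pending_before_flush` snapshot is the data-loss window.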

Cache Invalidation

Phil Karlton famously said: "There are only two hard things in Computer Science: cache invalidation and naming things." He was right. Cache invalidation is the hardest part of caching.

TTL-Based Invalidation

Set a time-to-live (TTL) on every cache entry. After the TTL expires, the entry is removed and the next read triggers a fresh load from the database.

cache.set("user:123", user_data, ttl=300)  # Expires in 5 minutes

Pros: Simple, guarantees bounded staleness. Even if you forget to invalidate, the data refreshes within TTL seconds.

Cons: Data is stale for up to TTL seconds. Choosing the right TTL is an art — too short means too many cache misses, too long means too much staleness.

TTL guidelines:

User profile data: 5-15 minutes (changes infrequently)
Product catalog: 1-5 minutes (price changes should propagate quickly)
Session data: 30 minutes to hours (aligned with session duration)
Configuration: 1-5 minutes (changes should propagate relatively quickly)
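
To make the expiry behavior concrete, here is a minimal sketch of a TTL cache. It is an assumption-laden toy rather than how any particular cache server works: `TTLCache` is a hypothetical class, entries are evicted lazily on read, and the clock is injectable so the example can move time forward without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Toy TTL cache with lazy eviction and an injectable clock."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired: evict and report a miss
            return None
        return value


now = [0.0]                                   # fake clock, in seconds
ttl_cache = TTLCache(clock=lambda: now[0])
ttl_cache.set("user:123", {"name": "Ada"}, ttl=300)
now[0] = 299.0
before_expiry = ttl_cache.get("user:123")     # still cached
now[0] = 300.0
after_expiry = ttl_cache.get("user:123")      # expired, so the caller reloads
```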

Event-Based Invalidation

When data changes in the database, explicitly delete or update the corresponding cache entry.

def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    cache.delete(f"user:{user_id}")  # Invalidate, don't update

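Why delete instead of update? Deleting is simpler and sidesteps a common race condition. If two concurrent updates both try to set the cache, their cache writes can land in the opposite order from their database writes, leaving the older value in the cache. Deleting means the next read reloads fresh data from the database. A narrower race remains: a reader that loaded the old row just before the update can repopulate the cache after the delete, which is one more reason to keep a TTL as a safety net.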

Pros: Data is fresh almost immediately after a change.

Cons: Requires discipline — every code path that modifies data must also invalidate the cache. Miss one path and you have stale data that is invisible and hard to debug.

Event Streaming Invalidation

For larger systems, use a change data capture (CDC) stream to automatically invalidate cache entries when database rows change.

Database -> CDC (Debezium) -> Kafka -> Cache Invalidation Service -> Redis

This decouples the write path from cache invalidation and catches all changes, even those made by other services or direct database modifications.
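
The invalidation service at the end of that pipeline can be sketched roughly as below. The event shape is a deliberately simplified assumption (dictionaries with hypothetical `table`, `op`, and `id` fields), not the actual Debezium envelope, and a plain dictionary stands in for Redis. `KEY_PREFIX_BY_TABLE`, `cache_key_for`, and `apply_invalidations` are illustrative names.

```python
from typing import Dict, Iterable, Optional

# Which tables are cached, and under which key prefix (hypothetical mapping).
KEY_PREFIX_BY_TABLE = {"users": "user", "items": "item"}


def cache_key_for(event: dict) -> Optional[str]:
    """Map a change event to the cache key it invalidates, or None if uncached."""
    prefix = KEY_PREFIX_BY_TABLE.get(event["table"])
    if prefix is None:
        return None
    return f"{prefix}:{event['id']}"


def apply_invalidations(events: Iterable[dict], cache: Dict[str, object]) -> int:
    """Delete the cache entry for each change event; returns how many keys were removed."""
    removed = 0
    for event in events:
        key = cache_key_for(event)
        if key is not None and cache.pop(key, None) is not None:
            removed += 1
    return removed


cache = {"user:1": {"name": "Ada"}, "item:7": {"price": 10}}
events = [
    {"table": "users", "op": "update", "id": 1},
    {"table": "orders", "op": "insert", "id": 99},   # table is not cached, ignored
    {"table": "users", "op": "delete", "id": 42},    # nothing cached for this row
]
removed = apply_invalidations(events, cache)
```

Because the consumer reacts to row changes rather than to application calls, it does not matter which service or script performed the write.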

The Thundering Herd Problem

When a popular cache entry expires, many concurrent requests simultaneously see a cache miss and all hit the database at once. This can overwhelm the database.

TTL expires on popular key
  -> 1000 requests arrive simultaneously
  -> All see cache miss
  -> All query the database
  -> Database is overwhelmed

Solutions

Lock/Mutex: Only one request is allowed to rebuild the cache. Others wait for it to finish and then read from the cache.

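import time

def get_popular_item(item_id, retries=20):
    key = f"item:{item_id}"
    for _ in range(retries):
        data = cache.get(key)
        if data is not None:
            return data

        # Try to acquire a short-lived lock (set-if-absent, expires after 5 s)
        if cache.set(f"lock:{key}", "1", nx=True, ex=5):
            try:
                # I won the lock, so I rebuild the cache
                data = db.query("SELECT * FROM items WHERE id = %s", item_id)
                cache.set(key, data, ttl=300)
                return data
            finally:
                # Release the lock even if the rebuild fails
                cache.delete(f"lock:{key}")

        # Someone else is rebuilding: wait briefly, then check the cache again
        time.sleep(0.05)

    # Gave up waiting (bounded retries instead of unbounded recursion)
    return db.query("SELECT * FROM items WHERE id = %s", item_id)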

Early expiration (stale-while-revalidate): Return the stale value immediately while refreshing the cache in the background. The TTL has two values: a "soft" TTL (after which the value is refreshed asynchronously) and a "hard" TTL (after which it is truly expired).
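
The soft/hard TTL idea can be sketched as below. This is a minimal in-process illustration, not a production design: `SWRCache` and `wait_for_refreshes` are hypothetical names, the refresh runs on a plain background thread, and the clock is injectable so the example can advance time without sleeping.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple


class SWRCache:
    """Toy stale-while-revalidate cache with a soft and a hard TTL."""

    def __init__(self, loader: Callable[[str], Any], soft_ttl: float,
                 hard_ttl: float, clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._soft_ttl = soft_ttl
        self._hard_ttl = hard_ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}   # key -> (stored_at, value)
        self._lock = threading.Lock()
        self._refreshing: Dict[str, threading.Thread] = {}

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        now = self._clock()
        if entry is None or now - entry[0] >= self._hard_ttl:
            return self._load(key)              # missing or truly expired: load inline
        stored_at, value = entry
        if now - stored_at >= self._soft_ttl:
            self._refresh_in_background(key)    # stale but usable: refresh asynchronously
        return value

    def _load(self, key: str) -> Any:
        value = self._loader(key)
        self._entries[key] = (self._clock(), value)
        return value

    def _refresh_in_background(self, key: str) -> None:
        with self._lock:
            if key in self._refreshing:
                return                          # a refresh is already running
            thread = threading.Thread(target=self._refresh, args=(key,), daemon=True)
            self._refreshing[key] = thread
            thread.start()

    def _refresh(self, key: str) -> None:
        try:
            self._load(key)
        finally:
            with self._lock:
                self._refreshing.pop(key, None)

    def wait_for_refreshes(self) -> None:
        """Block until in-flight refreshes finish (handy for tests and shutdown)."""
        with self._lock:
            threads = list(self._refreshing.values())
        for thread in threads:
            thread.join()


now = [0.0]                                   # fake clock, in seconds
versions = iter(["v1", "v2", "v3"])           # each load returns the next version
swr = SWRCache(lambda key: next(versions), soft_ttl=60, hard_ttl=300,
               clock=lambda: now[0])
a = swr.get("item:1")        # cold: loads synchronously
now[0] = 100.0
b = swr.get("item:1")        # past the soft TTL: stale value returned, refresh started
swr.wait_for_refreshes()
c = swr.get("item:1")        # the background refresh has replaced the value
now[0] = 500.0
d = swr.get("item:1")        # past the hard TTL: loads synchronously again
```

Only the caller that crosses the hard TTL pays the database latency. Everyone between the soft and hard TTL gets an immediate, slightly stale answer.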

Request coalescing: If multiple identical requests arrive while a cache rebuild is in progress, collapse them into a single database query.
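
Within a single process, request coalescing is often implemented with a "single-flight" helper. The sketch below is one possible shape under stated assumptions: `SingleFlight` and `slow_query` are hypothetical names, coalescing is per process only, and the 0.3-second sleep merely simulates a slow database query so that the concurrent callers overlap.

```python
import threading
import time
from concurrent.futures import Future
from typing import Any, Callable, Dict, List


class SingleFlight:
    """Collapse concurrent calls for the same key into one call to `fn`."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[str, Future] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._in_flight[key] = future
        if leader:
            try:
                future.set_result(fn())          # only the leader runs the query
            except Exception as exc:
                future.set_exception(exc)        # followers see the same error
            finally:
                with self._lock:
                    del self._in_flight[key]
        return future.result()                   # followers block here until done


calls: List[int] = []


def slow_query() -> dict:
    calls.append(1)
    time.sleep(0.3)                              # simulated database latency
    return {"id": 7}


flight = SingleFlight()
results: List[dict] = []


def worker() -> None:
    results.append(flight.do("item:7", slow_query))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight callers receive a result, but the query function runs once. Across multiple application instances, a distributed lock like the one above plays the same role.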

Cache Warming

When you deploy a new service or add cache nodes, the cache is empty (cold). All requests hit the database until the cache is populated, which can cause a temporary overload.

Strategies:

Pre-populate on deploy: Run a script that loads frequently accessed data into the cache before routing traffic
Gradual traffic shifting: Slowly ramp up traffic to new instances so the cache warms naturally
Persistent cache: Use Redis with RDB/AOF persistence so the cache survives restarts
Shadow traffic: Send a copy of production reads to the new cache before it goes live
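
The pre-populate strategy can be sketched as a small script. Everything here is illustrative: `warm_cache` is a hypothetical helper, dictionaries stand in for the database and the cache, and the list of hot keys is assumed to come from somewhere such as access logs.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def warm_cache(keys: Iterable[str],
               load: Callable[[str], Optional[Any]],
               cache: Dict[str, Any]) -> int:
    """Load each missing key into the cache; returns how many entries were added."""
    warmed = 0
    for key in keys:
        if key in cache:
            continue                 # already warm, skip the database read
        value = load(key)
        if value is not None:
            cache[key] = value
            warmed += 1
    return warmed


FAKE_DB = {"item:1": {"price": 5}, "item:2": {"price": 8}}   # stand-in database
hot_keys = ["item:1", "item:2", "item:404"]                  # assumed list of popular keys
cache = {"item:1": {"price": 5}}                             # one entry is already cached
warmed = warm_cache(hot_keys, FAKE_DB.get, cache)
```

In practice the loop would be throttled so that warming itself does not overload the database.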

Common Pitfalls

Caching Null Results

If a query returns no results (e.g., user not found), you should cache that too. Otherwise, repeated requests for non-existent data always hit the database (a "cache penetration" attack or just an unfortunate access pattern).

def get_user(user_id):
    result = cache.get(f"user:{user_id}")
    if result is not None:
        return result if result != "__NULL__" else None

    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    if user is None:
        cache.set(f"user:{user_id}", "__NULL__", ttl=60)  # Short TTL for nulls
        return None
    else:
        cache.set(f"user:{user_id}", user, ttl=300)
        return user

Cache Stampede After Invalidation

You invalidate a popular key, and instantly thousands of requests try to rebuild it. This is the thundering herd problem triggered by explicit invalidation rather than TTL expiry. The same solutions apply (locking, request coalescing).

Inconsistency Between Cache and Database

A write updates the database but the cache invalidation fails (network issue, bug). The cache now serves stale data indefinitely (or until TTL expires). This is why you should always use TTL as a safety net, even with event-based invalidation.

Over-Caching

Caching everything "just in case" wastes memory, increases complexity, and makes debugging harder. Cache only data that is read frequently, expensive to compute, and tolerant of brief staleness.

Cache Hierarchy

Production systems often use multiple layers of caching:

Browser Cache (HTTP cache-control headers)
    -> CDN (edge caching, static assets)
    -> Application-level cache (Redis/Memcached, dynamic data)
    -> Database query cache / buffer pool
    -> OS page cache (disk pages in RAM)

Each layer reduces load on the layers below it. When designing a system, consider which layer is most appropriate for each type of data.
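
The way one layer shields the next can be sketched with a two-tier lookup. This is an illustration only: the in-process `local_cache` tier is an assumption added for the example (it is not one of the layers listed above), and dictionaries stand in for the shared cache and the database.

```python
from typing import Any, Dict, Optional, Tuple

local_cache: Dict[str, Any] = {}     # in-process, fastest, one per instance
shared_cache: Dict[str, Any] = {}    # stand-in for Redis/Memcached
database: Dict[str, Any] = {"item:7": {"price": 10}}


def get(key: str) -> Tuple[Optional[Any], str]:
    """Return (value, layer that served it), filling faster layers on the way back."""
    if key in local_cache:
        return local_cache[key], "local"
    if key in shared_cache:
        local_cache[key] = shared_cache[key]
        return local_cache[key], "shared"
    value = database.get(key)
    if value is not None:
        shared_cache[key] = value
        local_cache[key] = value
    return value, "database"


served_by = [get("item:7")[1] for _ in range(2)]
local_cache.clear()                  # e.g. the process restarted
served_by.append(get("item:7")[1])
```

The database is consulted once. After the simulated restart, the shared cache absorbs the miss instead.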

Interview Tips

Always mention cache invalidation. If you propose caching in an interview without discussing invalidation, the interviewer will ask. Be proactive — say "we will use cache-aside with TTL-based invalidation and event-based invalidation for critical paths."

Justify your TTL choices. Saying "5-minute TTL" is meaningless without context. Explain why 5 minutes is acceptable for this data (e.g., "user profiles change infrequently, so a 5-minute staleness window is fine for the feed, but we use write-through for the user's own profile view").

Know the thundering herd. If your system has popular items (e.g., a viral post, a hot product), you must address the thundering herd problem. This is a common follow-up question.

Cache-aside is the safe default. Unless you have a specific reason to use write-through or write-behind, start with cache-aside. It is the most widely used and easiest to reason about.

Discuss cache failure. What happens if Redis goes down? Your system should degrade gracefully — fall back to the database, possibly with reduced functionality or rate limiting to prevent the database from being overwhelmed.
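
A fallback path might look like the sketch below. `CacheUnavailable` and `FlakyCache` are hypothetical stand-ins; with a real client you would catch that client's own connection error type instead. The sketch covers only the fallback itself and leaves out the rate limiting mentioned above.

```python
from typing import Any, Dict, Optional


class CacheUnavailable(Exception):
    """Hypothetical error standing in for a cache client's connection failure."""


class FlakyCache:
    """In-memory cache that can be switched off to simulate an outage."""

    def __init__(self) -> None:
        self.up = True
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        if not self.up:
            raise CacheUnavailable(key)
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        if not self.up:
            raise CacheUnavailable(key)
        self._data[key] = value


def get_user(user_id: int, cache: FlakyCache, db: Dict[int, dict]) -> Optional[dict]:
    key = f"user:{user_id}"
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        pass                      # treat an outage like a miss
    user = db.get(user_id)        # fall back to the source of truth
    if user is not None:
        try:
            cache.set(key, user)
        except CacheUnavailable:
            pass                  # serving the request matters more than caching it
    return user


db = {1: {"name": "Ada"}}
cache = FlakyCache()
cache.up = False
during_outage = get_user(1, cache, db)    # served from the database
cache.up = True
after_recovery = get_user(1, cache, db)   # miss, then the cache is repopulated
```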

Mention Memcached vs Redis. Memcached is simpler and sometimes faster for pure key-value caching. Redis supports richer data structures (sorted sets, lists, hashes) and persistence. Most teams choose Redis for its versatility.