Fundamentals

CAP Theorem

You can only guarantee two of three: consistency, availability, and partition tolerance. Learn how real systems make this trade-off and what it means for your design.

CAP Theorem

The CAP theorem is one of the most cited — and most misunderstood — concepts in distributed systems. It provides a framework for reasoning about the fundamental trade-offs you face when building systems that span multiple machines connected by a network. In interviews, demonstrating a nuanced understanding of CAP separates candidates who have read a blog post from those who have actually built distributed systems.

Brewer's Theorem

In 2000, Eric Brewer conjectured (and it was later proved by Gilbert and Lynch in 2002) that a distributed data store can provide at most two of the following three guarantees simultaneously:

Consistency (C): Every read receives the most recent write or an error. All nodes see the same data at the same time. This is linearizability — the system behaves as if there is a single copy of the data.

Availability (A): Every request receives a non-error response, without the guarantee that it contains the most recent write. The system always responds, even if the data might be stale.

Partition Tolerance (P): The system continues to operate despite arbitrary message loss or failure of part of the network. Nodes cannot communicate with each other, but they keep running.

The key insight: P is not optional

Here is what many people miss: in a distributed system, network partitions will happen. Cables get cut, switches fail, data centers lose connectivity. Partition tolerance is not a choice — it is a requirement for any system that spans multiple machines.

This means the real choice is between CP (consistency + partition tolerance) and AP (availability + partition tolerance) during a partition. When the network is healthy, you can have all three. The CAP theorem only forces a trade-off when things go wrong.

         Consistency
            ╱╲
           ╱  ╲
          ╱ CP ╲
         ╱      ╲
        ╱────────╲
       ╱   Pick    ╲
      ╱   during    ╲
     ╱   partition   ╲
    ╱                  ╲
   ╱────────────────────╲
  Availability ──── Partition Tolerance
        AP         (always required)

CP Systems: Consistency over Availability

A CP system refuses to respond (or returns an error) if it cannot guarantee the response is up to date. During a partition, the minority side of the partition stops accepting writes to prevent inconsistency.

Real-world examples

HBase: Built on top of HDFS, HBase uses a single active RegionServer per region. If a RegionServer is unreachable, that region becomes unavailable until failover completes. You never get stale data, but you might not get any data.

MongoDB (with majority write concern): When configured with `w: "majority"` and `readConcern: "linearizable"`, MongoDB will not acknowledge writes unless a majority of replica set members confirm them. During a partition, the minority side cannot accept writes.

ZooKeeper: Used for distributed coordination, ZooKeeper prioritizes consistency. If a ZooKeeper node cannot reach a quorum, it stops serving requests. This is exactly what you want for configuration data and distributed locks — stale lock information would be catastrophic.

etcd: The backbone of Kubernetes, etcd uses the Raft consensus protocol and provides strong consistency. If a leader cannot reach a majority of nodes, it steps down rather than risk split-brain.

# CP behavior during a partition
class CPDatabase:
    def write(self, key, value):
        ack_count = self.replicate_to_all_nodes(key, value)
        if ack_count < self.quorum_size():
            raise UnavailableError("Cannot reach quorum — rejecting write")
        return "OK"

    def read(self, key):
        if not self.can_reach_quorum():
            raise UnavailableError("Cannot confirm data is current")
        return self.local_store.get(key)

AP Systems: Availability over Consistency

An AP system always responds to requests, even if some nodes have not received the latest updates. During a partition, both sides continue serving traffic, which means different clients might see different versions of the data.

Real-world examples

Cassandra: Every node can accept reads and writes. If a partition splits the cluster, both sides keep operating. When the partition heals, Cassandra reconciles conflicts using last-write-wins (by timestamp) or custom conflict resolution. You configure the trade-off through consistency levels — `ONE`, `QUORUM`, `ALL` — tuning per query.

Amazon DynamoDB: Designed for extreme availability, DynamoDB uses consistent hashing with virtual nodes. It accepts writes even when some replicas are unreachable. Conflicts are resolved during reads using vector clocks (in the original Dynamo paper) or last-writer-wins.

CouchDB: Designed for offline-first applications, CouchDB embraces eventual consistency. Each node operates independently, and conflicts are resolved during replication using a deterministic revision tree.

DNS: The Domain Name System is perhaps the most visible AP system. DNS records propagate asynchronously with TTLs. During network issues, DNS servers serve cached (potentially stale) records rather than returning errors. Availability is paramount — a DNS outage breaks the entire internet.

# AP behavior during a partition
class APDatabase:
    def write(self, key, value):
        self.local_store.put(key, value, timestamp=now())
        self.async_replicate(key, value)  # Best effort — may fail
        return "OK"  # Always succeeds locally

    def read(self, key):
        return self.local_store.get(key)  # Returns local copy, may be stale

    def on_partition_heal(self):
        self.anti_entropy_sync()  # Reconcile diverged data

PACELC: The Full Picture

The CAP theorem only describes behavior during partitions, but systems make trade-offs even when things are running normally. Daniel Abadi proposed the PACELC extension:

If there is a Partition (P), choose between Availability (A) and Consistency (C). Else (E), when the system is running normally, choose between Latency (L) and Consistency (C).

This captures the day-to-day trade-off more accurately:

┌───────────────┬──────────────────────┬────────────────────────┐
│ System        │ During Partition     │ Normal Operation       │
│               │ (PAC)                │ (ELC)                  │
├───────────────┼──────────────────────┼────────────────────────┤
│ DynamoDB      │ PA (availability)    │ EL (low latency)       │
│ Cassandra     │ PA (availability)    │ EL (low latency)       │
│ MongoDB       │ PC (consistency)     │ EC (consistency)       │
│ HBase         │ PC (consistency)     │ EC (consistency)       │
│ Cosmos DB     │ PA or PC (tunable)   │ EL or EC (tunable)     │
│ CockroachDB   │ PC (consistency)     │ EC (consistency)       │
│ Spanner       │ PC (consistency)     │ EC (consistency, GPS)  │
└───────────────┴──────────────────────┴────────────────────────┘

Spanner is interesting here — it chooses consistency everywhere (PC/EC) but uses TrueTime (GPS + atomic clocks) to minimize the latency penalty of synchronous replication.

Common Misconceptions

"You pick two out of three"

This framing is misleading. You do not choose CA and ignore P. Network partitions happen whether you want them to or not. The choice is only between C and A during a partition.

"CP means zero availability"

A CP system is not always unavailable. It is fully available when the network is healthy. It only sacrifices availability during partitions, and only for the minority partition (or whichever side cannot reach quorum).

"AP means no consistency"

AP systems are eventually consistent — they converge to a consistent state once the partition heals. Many AP systems offer tunable consistency (like Cassandra's consistency levels) that let you trade latency for stronger guarantees on a per-query basis.

"My system is CA"

If your system runs on a single machine, CAP does not apply — it is not a distributed system. If it runs on multiple machines, partitions are possible, so it cannot be CA. Single-node PostgreSQL is not CA in the CAP sense; it is simply not a distributed system.

Practical Decision Framework

When designing a system, ask yourself:

What happens if users see stale data? If stale data causes financial loss, safety issues, or data corruption, lean CP. If stale data causes a minor inconvenience (like seeing an old profile picture), lean AP.

What is your availability SLA? If you need 99.999% uptime, AP systems are easier to achieve this with because they keep serving during partitions.

Can you use different models for different data? Many real systems use CP for critical data (account balances, inventory counts) and AP for less critical data (user preferences, analytics).

What conflict resolution strategy makes sense? If conflicts are hard to resolve (like bank transfers), prefer CP and prevent conflicts. If conflicts are easy to resolve (last-write-wins for a user profile), AP is fine.

# Hybrid approach: different consistency for different data
class HybridService:
    def update_account_balance(self, user_id, amount):
        # CP: use strong consistency for financial data
        self.db.write(
            key=f"balance:{user_id}",
            value=amount,
            consistency="quorum"  # Wait for majority
        )

    def update_profile_picture(self, user_id, url):
        # AP: use eventual consistency for non-critical data
        self.db.write(
            key=f"avatar:{user_id}",
            value=url,
            consistency="one"  # Write to one node, replicate async
        )

Interview Tips

Start with the trade-off, not the theorem. Say "In a distributed system, when a network partition occurs, we must choose between consistency and availability." This shows understanding beyond rote memorization.

Give concrete examples. "For a banking system, I would choose consistency because showing a wrong balance could allow double-spending. For a social media feed, I would choose availability because a slightly stale feed is better than no feed."

Mention PACELC. This immediately signals deeper knowledge. "Even without partitions, there is a latency-consistency trade-off — this is the PACELC extension."

Discuss tunable consistency. Many modern databases (Cassandra, DynamoDB, Cosmos DB) let you choose consistency per operation. Mentioning this shows practical experience.

Do not oversimplify. Avoid saying "X is a CP database" without qualification. Most systems offer configurable consistency levels, and the choice often depends on how you configure and use them.

Connect to your design. When designing a system in an interview, explicitly state which consistency model you are choosing for each component and why. "The user service uses eventual consistency because profile updates can tolerate a few seconds of staleness, but the payment service uses strong consistency because we cannot risk processing a payment twice."