Event-Driven Architecture

Build loosely coupled systems using events. Covers event sourcing, CQRS, and the difference between event notification and event-carried state transfer.

You are designing a banking system where a single deposit triggers a cascade of actions: updating the account balance, sending a confirmation email, recalculating daily interest, and logging the transaction for compliance. In a traditional request-response model, the deposit service would need to know about all these downstream consumers and call each one synchronously. Event-driven architecture flips this relationship. The deposit service publishes a `DepositCompleted` event, and any interested service subscribes to it. The producer knows nothing about its consumers.
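To make the inversion concrete, here is a minimal in-process sketch. The `EventBus` class, the handler bodies, and the event fields (`event_id`, `account_id`, `amount`) are illustrative assumptions. A real system would publish through a message broker and deliver asynchronously, whereas this toy version calls each handler synchronously in the same process.

```python
from collections import defaultdict
from typing import Any, Callable

Event = dict[str, Any]
Handler = Callable[[Event], None]


class EventBus:
    """Toy in-process publish/subscribe bus (illustration only, not a broker)."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        # The producer only names the event type; it never sees who handles it.
        for handler in self._subscribers[event["type"]]:
            handler(event)


bus = EventBus()
log: list[str] = []

# Each downstream concern registers itself; the deposit service is unchanged.
bus.subscribe("DepositCompleted", lambda e: log.append(f"balance +{e['amount']}"))
bus.subscribe("DepositCompleted", lambda e: log.append(f"email {e['account_id']}"))
bus.subscribe("DepositCompleted", lambda e: log.append("recalculate interest"))
bus.subscribe("DepositCompleted", lambda e: log.append(f"audit {e['event_id']}"))

# The deposit service publishes one fact and is done.
bus.publish({"type": "DepositCompleted", "event_id": "evt-1",
             "account_id": "acct-42", "amount": 100})
print(log)
```

Adding a fifth consumer means one more `subscribe` call. The code that publishes `DepositCompleted` does not change.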

Why Event-Driven Architecture Matters

Tight coupling between services is the silent killer of large distributed systems. When Service A directly calls Service B, Service C, and Service D, A breaks whenever one of their APIs changes, and an outage or slowdown in any of them cascades back into A. Event-driven architecture decouples producers from consumers, enabling teams to develop, deploy, and scale services independently.

Beyond decoupling, events provide a natural audit trail. Every state change is captured as an immutable fact. This is invaluable in domains like finance, healthcare, and e-commerce where you need to answer the question: "How did we get to this state?"

Core Patterns

Event Notification

The simplest pattern. A service publishes a lightweight event that says "something happened" without carrying much data. Consumers receive the notification and query back to the source if they need details.

OrderService --> publishes --> { type: "OrderPlaced", orderId: "abc123" }

InventoryService <-- subscribes <-- reads orderId, calls OrderService API for full details
EmailService    <-- subscribes <-- reads orderId, calls OrderService API for full details

Pros: Events are small, and the source remains the single source of truth. Cons: Consumers must make synchronous callbacks, which reintroduces some coupling and increases latency.
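The callback cost is easy to see in a small sketch. `ORDERS`, `order_service_get_order`, and the two consumer functions are hypothetical stand-ins: the lookup function plays the role of a synchronous HTTP call to OrderService's API, and a counter records how many such calls the consumers trigger.

```python
from typing import Any

# Hypothetical source of truth owned by OrderService.
ORDERS: dict[str, dict[str, Any]] = {
    "abc123": {"items": ["book", "pen"], "email": "alice@example.com"},
}
api_calls = 0


def order_service_get_order(order_id: str) -> dict[str, Any]:
    """Stand-in for a synchronous call to OrderService's API (hypothetical)."""
    global api_calls
    api_calls += 1
    return ORDERS[order_id]


def inventory_on_order_placed(event: dict[str, Any]) -> list[str]:
    # The notification carries only the ID, so the consumer must call back.
    order = order_service_get_order(event["orderId"])
    return order["items"]


def email_on_order_placed(event: dict[str, Any]) -> str:
    order = order_service_get_order(event["orderId"])
    return order["email"]


event = {"type": "OrderPlaced", "orderId": "abc123"}
reserved = inventory_on_order_placed(event)
recipient = email_on_order_placed(event)
print(reserved, recipient, api_calls)
```

Each consumer adds one callback per event. If OrderService is slow or down, every consumer stalls with it, which is the coupling the Cons line refers to.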

Event-Carried State Transfer

The event carries all the data the consumer needs, eliminating the need to call back to the source.

OrderService --> publishes --> {
  type: "OrderPlaced",
  orderId: "abc123",
  items: [...],
  customer: { id: "u1", email: "alice@example.com" },
  total: 59.99
}

Pros: Consumers are fully decoupled. They can maintain their own local copies of data, improving read performance and fault tolerance. Cons: Events are larger. Data can become stale if consumers do not process updates promptly. You now have distributed data that may diverge.
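A sketch of the consumer side, assuming the event shape shown above. The item shape `{"sku": ..., "qty": ...}` and the `customer_replica` dictionary are illustrative assumptions, because the original elides `items`. The consumer reads everything it needs from the event and updates its own local copy of the customer record without calling OrderService.

```python
from typing import Any

# EmailService's own local copy of customer data, built purely from events.
customer_replica: dict[str, dict[str, Any]] = {}


def email_on_order_placed(event: dict[str, Any]) -> str:
    customer = event["customer"]
    # Upsert the local copy; no call back to OrderService is needed.
    customer_replica[customer["id"]] = {"email": customer["email"]}
    return f"Order {event['orderId']} confirmed, total {event['total']:.2f}"


event = {
    "type": "OrderPlaced",
    "orderId": "abc123",
    "items": [{"sku": "book", "qty": 1}],  # hypothetical item shape
    "customer": {"id": "u1", "email": "alice@example.com"},
    "total": 59.99,
}
message = email_on_order_placed(event)
print(message)
print(customer_replica)
```

The replica stays correct only as long as the consumer sees every event that changes it. If a customer later updates their email and EmailService does not subscribe to that change, `customer_replica` silently goes stale.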

Event Sourcing

Instead of storing the current state of an entity, you store the complete sequence of events that led to that state. The current state is derived by replaying events.

# Event log for Account #42, oldest first
events = [
    {"type": "AccountOpened", "balance": 0, "timestamp": "2024-01-01T00:00:00Z"},
    {"type": "Deposited", "amount": 1000, "timestamp": "2024-01-02T10:30:00Z"},
    {"type": "Withdrawn", "amount": 200, "timestamp": "2024-01-03T14:15:00Z"},
    {"type": "Deposited", "amount": 500, "timestamp": "2024-01-04T09:00:00Z"},
]

def replay(events):
    """Derive the current balance by applying every event in order."""
    balance = 0
    for event in events:
        if event["type"] == "AccountOpened":
            balance = event["balance"]  # opening balance recorded in the event
        elif event["type"] == "Deposited":
            balance += event["amount"]
        elif event["type"] == "Withdrawn":
            balance -= event["amount"]
    return balance

# Current state: balance = 0 + 1000 - 200 + 500 = 1300
print(replay(events))  # 1300

Pros: Complete audit trail, ability to rebuild state at any point in time, natural fit for debugging and compliance. Cons: Event schema evolution is hard, replaying large event streams is slow without snapshots, and querying current state requires projection.
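"Rebuild state at any point in time" simply means replaying a prefix of the log. The sketch below reuses the event list from the example above. It assumes the log is ordered by timestamp, and `balance_as_of` is a hypothetical helper name.

```python
from datetime import datetime
from typing import Any

events: list[dict[str, Any]] = [
    {"type": "AccountOpened", "balance": 0, "timestamp": "2024-01-01T00:00:00Z"},
    {"type": "Deposited", "amount": 1000, "timestamp": "2024-01-02T10:30:00Z"},
    {"type": "Withdrawn", "amount": 200, "timestamp": "2024-01-03T14:15:00Z"},
    {"type": "Deposited", "amount": 500, "timestamp": "2024-01-04T09:00:00Z"},
]


def parse_ts(ts: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def balance_as_of(events: list[dict[str, Any]], as_of: str) -> int:
    """Replay only the events recorded at or before `as_of`."""
    cutoff = parse_ts(as_of)
    balance = 0
    for event in events:
        if parse_ts(event["timestamp"]) > cutoff:
            break  # assumes the log is ordered by time, so we can stop early
        if event["type"] == "AccountOpened":
            balance = event["balance"]
        elif event["type"] == "Deposited":
            balance += event["amount"]
        elif event["type"] == "Withdrawn":
            balance -= event["amount"]
    return balance


print(balance_as_of(events, "2024-01-03T23:59:59Z"))  # 800
print(balance_as_of(events, "2024-01-05T00:00:00Z"))  # 1300
```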

CQRS (Command Query Responsibility Segregation)

CQRS separates the write model (commands) from the read model (queries). It is often paired with event sourcing, though it does not require it. In that combination, commands go through the event-sourced write side, which produces events, and those events are projected into optimized read models (often denormalized databases or search indices).

                    ┌───────────────┐
  Commands ───────> │  Write Model  │ ──── Events ────┐
                    │ (Event Store) │                 │
                    └───────────────┘                 ▼
                                              ┌───────────────┐
  Queries ──────────────────────────────────> │  Read Model   │
                                              │ (Projections) │
                                              └───────────────┘

The write model is optimized for consistency and business rule validation. The read model is optimized for query performance. You might have multiple read models: one in PostgreSQL for relational queries, one in Elasticsearch for full-text search, one in Redis for real-time dashboards.
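A compact sketch of the split, with all class and method names assumed for illustration. The write side enforces a business rule (no overdrafts) against the event store and appends events. The read side projects those events into a denormalized transaction history shaped for a single query. Here the projection is called directly; in a real deployment it would consume events asynchronously.

```python
from collections import defaultdict
from typing import Any

Event = dict[str, Any]


class AccountWriteModel:
    """Write side: validates commands against past events, then appends new ones."""

    def __init__(self) -> None:
        self.event_store: list[Event] = []

    def _balance(self, account_id: str) -> int:
        total = 0
        for e in self.event_store:
            if e["account_id"] != account_id:
                continue
            if e["type"] == "Deposited":
                total += e["amount"]
            elif e["type"] == "Withdrawn":
                total -= e["amount"]
        return total

    def deposit(self, account_id: str, amount: int) -> Event:
        event = {"type": "Deposited", "account_id": account_id, "amount": amount}
        self.event_store.append(event)
        return event

    def withdraw(self, account_id: str, amount: int) -> Event:
        # Business rule enforced on the write side only.
        if amount > self._balance(account_id):
            raise ValueError("insufficient funds")
        event = {"type": "Withdrawn", "account_id": account_id, "amount": amount}
        self.event_store.append(event)
        return event


class TransactionHistoryReadModel:
    """Read side: a denormalized projection shaped for one query."""

    def __init__(self) -> None:
        self.history: defaultdict[str, list[str]] = defaultdict(list)

    def project(self, event: Event) -> None:
        sign = "+" if event["type"] == "Deposited" else "-"
        self.history[event["account_id"]].append(f"{sign}{event['amount']}")

    def query(self, account_id: str) -> list[str]:
        return list(self.history[account_id])


write = AccountWriteModel()
read = TransactionHistoryReadModel()

for event in (write.deposit("acct-42", 1000), write.withdraw("acct-42", 200)):
    read.project(event)  # synchronous here; asynchronous in production

try:
    write.withdraw("acct-42", 5000)
except ValueError as err:
    print("rejected:", err)

print(read.query("acct-42"))  # ['+1000', '-200']
```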

Eventual Consistency Challenges

Event-driven systems that propagate changes asynchronously are eventually consistent. When a user places an order, the inventory service might not reflect the change for milliseconds or even seconds. This creates real UX challenges.

The Stale Read Problem

A user places an order and immediately refreshes the page, but the read model has not yet processed the event. The order appears missing. Common mitigations:

  • Read-your-writes consistency: After a write, route the user's subsequent reads to the write model or wait for the projection to catch up.
  • Optimistic UI: The client assumes success and displays the expected state immediately, reconciling later if needed.
  • Polling or WebSocket: The client subscribes to a completion event rather than refreshing.
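The first mitigation, read-your-writes, can be sketched with a position token. The sketch assumes the write model returns the log position of the user's write, and that the client sends it back on its next read. If the projection has processed at least that far, the read model serves the request; otherwise the read falls back to the write model. `Projection`, `WriteModel`, and `read_order` are hypothetical names.

```python
from typing import Any, Optional


class Projection:
    """Read model that remembers the position of the last event it applied."""

    def __init__(self) -> None:
        self.orders: dict[str, dict[str, Any]] = {}
        self.position = 0  # log position of the last applied event

    def apply(self, position: int, event: dict[str, Any]) -> None:
        self.orders[event["orderId"]] = event
        self.position = position


class WriteModel:
    def __init__(self) -> None:
        self.log: list[dict[str, Any]] = []

    def place_order(self, order_id: str) -> int:
        self.log.append({"type": "OrderPlaced", "orderId": order_id})
        return len(self.log)  # position token handed back to the client

    def get_order(self, order_id: str) -> Optional[dict[str, Any]]:
        for event in reversed(self.log):
            if event["orderId"] == order_id:
                return event
        return None


def read_order(order_id: str, min_position: int, read: Projection,
               write: WriteModel) -> Optional[dict[str, Any]]:
    # Use the read model only if it has caught up to the user's own write;
    # otherwise fall back to the slower but authoritative write model.
    if read.position >= min_position:
        return read.orders.get(order_id)
    return write.get_order(order_id)


write, read = WriteModel(), Projection()
token = write.place_order("abc123")

# Projection has not processed the event yet: served by the write model.
print(read_order("abc123", token, read, write))

# After the projection catches up, the read model serves the request.
read.apply(token, write.log[token - 1])
print(read_order("abc123", token, read, write))
```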

Ordering Guarantees

Events may arrive out of order, especially across partitions. If a `Deposited` event arrives before the `AccountOpened` event, the consumer must handle this gracefully. Strategies include:

  • Partitioning events by entity ID so that all events for the same account go through the same partition (preserving order within an entity).
  • Using sequence numbers or vector clocks to detect and reorder events.
  • Designing consumers to be idempotent: processing the same event twice produces the same result.
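The first two strategies can be sketched together. The sketch assumes the producer stamps each event with a per-entity sequence number `seq` starting at 1. `partition_for` uses CRC32 as an illustrative stable hash (real brokers ship their own partitioners), and `ReorderBuffer` is a hypothetical helper that holds early arrivals until the gap before them is filled.

```python
import zlib
from typing import Any


def partition_for(entity_id: str, num_partitions: int) -> int:
    # Stable hash so the same account always maps to the same partition.
    # Python's built-in hash() of str is randomized per process, so avoid it.
    return zlib.crc32(entity_id.encode("utf-8")) % num_partitions


class ReorderBuffer:
    """Releases one entity's events strictly in sequence-number order."""

    def __init__(self) -> None:
        self.next_seq = 1
        self.pending: dict[int, dict[str, Any]] = {}

    def receive(self, event: dict[str, Any]) -> list[dict[str, Any]]:
        seq = event["seq"]
        if seq < self.next_seq or seq in self.pending:
            return []  # already released or already buffered: drop duplicate
        self.pending[seq] = event
        ready = []
        # Release every contiguous event starting at the next expected number.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = ReorderBuffer()
applied: list[str] = []
arrivals = [
    {"seq": 2, "type": "Deposited", "amount": 1000},  # arrives early
    {"seq": 1, "type": "AccountOpened", "balance": 0},
    {"seq": 2, "type": "Deposited", "amount": 1000},  # redelivered duplicate
    {"seq": 3, "type": "Withdrawn", "amount": 200},
]
for event in arrivals:
    applied.extend(e["type"] for e in buf.receive(event))

print(partition_for("acct-42", 8) == partition_for("acct-42", 8))  # True
print(applied)  # ['AccountOpened', 'Deposited', 'Withdrawn']
```

As written, the buffer waits indefinitely for a missing sequence number. A production consumer would need a timeout or a way to re-fetch the gap from the source.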

Duplicate Events

Network failures, retries, and at-least-once delivery guarantees mean consumers must expect duplicates, so every consumer must be idempotent. A common technique is to record the ID of every processed event and skip any ID seen before:

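def handle_deposit(event, accounts, processed_event_ids):
    """Apply a deposit once per unique event_id; accounts maps account ID to balance."""
    if event["event_id"] in processed_event_ids:
        return  # Duplicate delivery: already processed, skip

    accounts[event["account_id"]] += event["amount"]
    # Caveat: in a real store, record the event ID in the same transaction as the
    # balance update. If the process crashes between these two steps, a retry
    # would apply the deposit twice.
    processed_event_ids.add(event["event_id"])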
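Checking a set and then updating the balance leaves a gap. If the process crashes after applying the deposit but before recording its ID, a retry applies it twice. One way to close the gap is to record the event ID and update the balance in a single database transaction. The sketch below does this with Python's built-in `sqlite3` and an in-memory database. The table layout and the `handle_deposit_once` name are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
    INSERT INTO accounts VALUES ('acct-42', 0);
""")


def handle_deposit_once(conn: sqlite3.Connection, event: dict) -> bool:
    """Apply a deposit at most once; returns False for a duplicate delivery."""
    with conn:  # one transaction: commit on success, roll back on exception
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event["event_id"],),
        )
        if cur.rowcount == 0:
            return False  # event ID already recorded: skip
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (event["amount"], event["account_id"]),
        )
        return True


event = {"event_id": "evt-1", "account_id": "acct-42", "amount": 100}
print(handle_deposit_once(conn, event))  # True
print(handle_deposit_once(conn, event))  # False (redelivery)
print(conn.execute("SELECT balance FROM accounts").fetchone()[0])  # 100
```

The primary key on `processed_events.event_id` does the deduplication. Because the dedup record and the balance change commit or roll back together, a crash can never leave one without the other.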

Real-World Examples

Banking

Banks are natural fits for event sourcing. Every transaction is an event. The account balance is a projection. Regulatory requirements demand a complete, immutable audit trail. If a dispute arises, the bank can replay events to reconstruct exactly what happened.

E-Commerce (Amazon-style)

When a customer places an order, the system publishes an `OrderPlaced` event. The inventory service reserves stock. The payment service charges the card. The notification service sends a confirmation email. The analytics service records the purchase. Each service operates independently, and the system degrades gracefully: if the email service is down, the order still processes.
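Graceful degradation usually comes from giving each consumer its own durable queue, so a backlog in one does not block the others. The in-memory sketch below simulates this with one `deque` per subscriber and a hypothetical `available` flag standing in for service health. A real deployment would rely on the broker's per-consumer queues or offsets rather than Python objects.

```python
from collections import deque
from typing import Any, Callable

Event = dict[str, Any]


class Subscriber:
    """Each consumer gets its own queue, so one outage cannot block the others."""

    def __init__(self, name: str, handler: Callable[[Event], None]) -> None:
        self.name = name
        self.handler = handler
        self.queue: deque[Event] = deque()
        self.available = True  # simulated health flag (hypothetical)

    def drain(self) -> None:
        # Process queued events in order; keep them queued while unavailable.
        while self.available and self.queue:
            self.handler(self.queue[0])
            self.queue.popleft()  # remove only after successful handling


done: list[str] = []
subscribers = [
    Subscriber("inventory", lambda e: done.append(f"reserve {e['orderId']}")),
    Subscriber("payment", lambda e: done.append(f"charge {e['orderId']}")),
    Subscriber("email", lambda e: done.append(f"email {e['orderId']}")),
    Subscriber("analytics", lambda e: done.append(f"record {e['orderId']}")),
]
email = subscribers[2]
email.available = False  # the email service is down

order_placed = {"type": "OrderPlaced", "orderId": "abc123"}
for sub in subscribers:
    sub.queue.append(order_placed)  # fan-out: one copy per subscriber
    sub.drain()

print(done)              # reserved, charged, and recorded despite the outage
print(len(email.queue))  # 1: the email event waits in its queue

email.available = True   # service recovers and catches up
email.drain()
print(done[-1])          # email abc123
```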

Real-Time Analytics

Streaming platforms like Netflix publish user interaction events (play, pause, skip, search). These events flow through a streaming pipeline into real-time dashboards and recommendation engines. The event log serves as both the source of truth and the input for machine learning pipelines.

Trade-Offs

| Aspect | Event-Driven | Request-Response |
|---|---|---|
| Coupling | Loose | Tight |
| Consistency | Eventual | Immediate |
| Debugging | Harder (async flows) | Easier (synchronous traces) |
| Scalability | Higher (independent scaling) | Lower (bottleneck on synchronous calls) |
| Complexity | Higher (event schemas, ordering, idempotency) | Lower (straightforward call chains) |
| Audit trail | Natural | Requires explicit logging |

Interview Tips

  • Start with the "why." Before jumping into event-driven patterns, explain what problem you are solving: decoupling, auditability, scalability, or real-time processing.
  • Know when NOT to use it. Simple CRUD applications with a single database do not need event sourcing. The complexity is only justified when you have multiple consumers, need an audit trail, or require independent scaling.
  • Always address eventual consistency. Interviewers will probe whether you understand the trade-offs. Mention read-your-writes consistency, idempotency, and ordering guarantees.
  • Distinguish the patterns. Be precise about the differences between event notification, event-carried state transfer, event sourcing, and CQRS. Mixing them up signals shallow understanding.
  • Mention schema evolution. In long-lived systems, event schemas change. Discuss versioning strategies: adding optional fields, using schema registries (like Confluent Schema Registry), and maintaining backward compatibility.
  • Snapshots for event sourcing. If you mention event sourcing, explain that replaying millions of events is impractical. Periodic snapshots capture the current state, and replay only happens from the last snapshot forward.
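The snapshot idea from the last tip can be sketched in a few lines. It assumes each snapshot stores the folded state together with `version`, the number of events already included. Loading then replays only the events appended after that position. `Snapshot`, `take_snapshot`, and `load_balance` are hypothetical names, and a real system would persist snapshots next to the event store.

```python
from dataclasses import dataclass
from typing import Any

Event = dict[str, Any]


@dataclass
class Snapshot:
    balance: int
    version: int  # number of events already folded into `balance`


def apply(balance: int, event: Event) -> int:
    if event["type"] == "AccountOpened":
        return event["balance"]
    if event["type"] == "Deposited":
        return balance + event["amount"]
    if event["type"] == "Withdrawn":
        return balance - event["amount"]
    return balance  # ignore unknown event types


def take_snapshot(events: list[Event]) -> Snapshot:
    balance = 0
    for event in events:
        balance = apply(balance, event)
    return Snapshot(balance=balance, version=len(events))


def load_balance(events: list[Event], snapshot: Snapshot) -> int:
    # Start from the snapshot and replay only the events appended after it.
    balance = snapshot.balance
    for event in events[snapshot.version:]:
        balance = apply(balance, event)
    return balance


log: list[Event] = [{"type": "AccountOpened", "balance": 0}]
log += [{"type": "Deposited", "amount": 10} for _ in range(100_000)]
snap = take_snapshot(log)  # e.g. taken periodically by a background job
log.append({"type": "Withdrawn", "amount": 500})

print(snap.balance)             # 1000000
print(load_balance(log, snap))  # 999500, replaying 1 event instead of 100,002
```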