Messaging & Async
Build loosely coupled systems using events. Covers event sourcing, CQRS, and the difference between event notification and event-carried state transfer.
You are designing a banking system where a single deposit triggers a cascade of actions: updating the account balance, sending a confirmation email, recalculating daily interest, and logging the transaction for compliance. In a traditional request-response model, the deposit service would need to know about all these downstream consumers and call each one synchronously. Event-driven architecture flips this relationship. The deposit service publishes a `DepositCompleted` event, and any interested service subscribes to it. The producer knows nothing about its consumers.
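The inversion can be sketched with a minimal in-process publish/subscribe bus. `EventBus`, `complete_deposit`, and the event fields are hypothetical names for illustration. A production system would use a message broker, with delivery over the network rather than in-process function calls.

```python
from collections import defaultdict
from typing import Any, Callable

Event = dict[str, Any]
Handler = Callable[[Event], None]


class EventBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        # The publisher only names the event type; it never references consumers.
        for handler in self._subscribers[event["type"]]:
            handler(event)


bus = EventBus()
log: list[str] = []

# Each downstream concern registers itself; the deposit service is unaware of them.
bus.subscribe("DepositCompleted", lambda e: log.append(f"balance +{e['amount']}"))
bus.subscribe("DepositCompleted", lambda e: log.append(f"email for {e['account_id']}"))
bus.subscribe("DepositCompleted", lambda e: log.append("recalculate interest"))
bus.subscribe("DepositCompleted", lambda e: log.append(f"compliance log {e['event_id']}"))


def complete_deposit(account_id: str, amount: int, event_id: str) -> None:
    # Hypothetical deposit service: after recording the deposit, it only publishes.
    bus.publish({"type": "DepositCompleted", "event_id": event_id,
                 "account_id": account_id, "amount": amount})


complete_deposit("acct-42", 100, "evt-1")
```

Adding a fifth consumer means adding one `subscribe` call; `complete_deposit` does not change.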
Tight coupling between services is a common source of fragility in large distributed systems. When Service A directly calls Service B, Service C, and Service D, A depends on each of their interfaces and on their availability. A change to any of them, or an outage in any of them, can cascade into failures in A. Event-driven architecture decouples producers from consumers, enabling teams to develop, deploy, and scale services independently.
Beyond decoupling, events provide a natural audit trail. Every state change is captured as an immutable fact. This is invaluable in domains like finance, healthcare, and e-commerce where you need to answer the question: "How did we get to this state?"
The simplest pattern is event notification. A service publishes a lightweight event that says "something happened" without carrying much data. Consumers receive the notification and query back to the source if they need details.
```
OrderService     --> publishes  --> { type: "OrderPlaced", orderId: "abc123" }
InventoryService <-- subscribes <-- reads orderId, calls OrderService API for full details
EmailService     <-- subscribes <-- reads orderId, calls OrderService API for full details
```

Pros: Events are small, and the source remains the single source of truth. Cons: Consumers must make synchronous callbacks, which reintroduces some coupling and increases latency.
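The notification style can be sketched as follows. `get_order` is a hypothetical stand-in for the OrderService API: it is in-memory here but would be a network call in production. Each consumer receives only the ID and must call back for details. Those callbacks are the source of the extra latency and coupling.

```python
from typing import Any

# Stand-in for the OrderService's own database (hypothetical data).
_ORDERS: dict[str, dict[str, Any]] = {
    "abc123": {"orderId": "abc123", "items": ["sku-1", "sku-2"], "email": "alice@example.com"},
}
api_calls = 0


def get_order(order_id: str) -> dict[str, Any]:
    """Hypothetical OrderService API; in production this is a network round trip."""
    global api_calls
    api_calls += 1
    return _ORDERS[order_id]


def inventory_handler(event: dict[str, Any]) -> list[str]:
    # The notification only says *which* order changed, so fetch the details.
    order = get_order(event["orderId"])
    return order["items"]


def email_handler(event: dict[str, Any]) -> str:
    order = get_order(event["orderId"])
    return order["email"]


notification = {"type": "OrderPlaced", "orderId": "abc123"}
reserved = inventory_handler(notification)
recipient = email_handler(notification)
```

Every additional consumer adds another call to OrderService for each event. If OrderService is unavailable, none of the consumers can finish its work.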
In event-carried state transfer, the event carries all the data the consumer needs, eliminating the need to call back to the source.
```
OrderService --> publishes --> {
  type: "OrderPlaced",
  orderId: "abc123",
  items: [...],
  customer: { id: "u1", email: "alice@example.com" },
  total: 59.99
}
```

Pros: Consumers are fully decoupled. They can maintain their own local copies of data, improving read performance and fault tolerance. Cons: Events are larger. Data can become stale if consumers do not process updates promptly. You now have distributed data that may diverge.
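A consumer of event-carried state transfer, by contrast, can act on the payload alone and keep its own copy of the data it cares about. This sketch assumes a hypothetical `EmailService` class with an in-memory replica. The item fields are made up, because the event above elides them.

```python
from typing import Any


class EmailService:
    """Hypothetical consumer that keeps its own copy of the customer data it needs."""

    def __init__(self) -> None:
        self.customer_emails: dict[str, str] = {}  # local replica, owned by this service
        self.outbox: list[tuple[str, str]] = []

    def on_order_placed(self, event: dict[str, Any]) -> None:
        customer = event["customer"]
        # Update the local replica from the payload; no call back to OrderService.
        self.customer_emails[customer["id"]] = customer["email"]
        self.outbox.append(
            (customer["email"], f"Order {event['orderId']} total {event['total']:.2f}")
        )


svc = EmailService()
svc.on_order_placed({
    "type": "OrderPlaced",
    "orderId": "abc123",
    "items": [{"sku": "sku-1", "qty": 1}],  # illustrative item shape
    "customer": {"id": "u1", "email": "alice@example.com"},
    "total": 59.99,
})
```

The replica in `customer_emails` is what can go stale. It changes only when another event arrives.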
With event sourcing, instead of storing the current state of an entity, you store the complete sequence of events that led to that state. The current state is derived by replaying those events in order.
```python
# Event log for Account #42
events = [
    {"type": "AccountOpened", "balance": 0, "timestamp": "2024-01-01T00:00:00Z"},
    {"type": "Deposited", "amount": 1000, "timestamp": "2024-01-02T10:30:00Z"},
    {"type": "Withdrawn", "amount": 200, "timestamp": "2024-01-03T14:15:00Z"},
    {"type": "Deposited", "amount": 500, "timestamp": "2024-01-04T09:00:00Z"},
]

# Current state: balance = 0 + 1000 - 200 + 500 = 1300

def replay(events):
    """Rebuild the current balance by applying every event in order."""
    balance = 0
    for event in events:
        if event["type"] == "AccountOpened":
            balance = event["balance"]  # start from the opening balance
        elif event["type"] == "Deposited":
            balance += event["amount"]
        elif event["type"] == "Withdrawn":
            balance -= event["amount"]
    return balance

assert replay(events) == 1300
```

Pros: A complete audit trail, the ability to rebuild state as of any point in time, and a natural fit for debugging and compliance. Cons: Event schema evolution is hard, replaying large event streams is slow without snapshots, and querying current state requires building projections.
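The point-in-time rebuild and the snapshot optimization both follow from the same fold over events. This sketch rests on three assumptions. First, the log is ordered by timestamp. Second, all timestamps use one ISO-8601 UTC format, so string comparison matches chronological order. Third, a snapshot is a hypothetical record `{"version": n, "balance": b}` meaning "balance b after the first n events".

```python
from typing import Any

Event = dict[str, Any]


def apply(balance: int, event: Event) -> int:
    """Fold one event into the running balance."""
    if event["type"] == "AccountOpened":
        return event["balance"]
    if event["type"] == "Deposited":
        return balance + event["amount"]
    if event["type"] == "Withdrawn":
        return balance - event["amount"]
    return balance  # ignore event types this projection does not understand


def balance_as_of(events: list[Event], as_of: str) -> int:
    """Rebuild the balance as it stood at time `as_of` (inclusive)."""
    balance = 0
    for event in events:
        if event["timestamp"] > as_of:
            break  # relies on the log being ordered by timestamp
        balance = apply(balance, event)
    return balance


def replay_from_snapshot(snapshot: dict[str, Any], events: list[Event]) -> int:
    """Start from saved state and apply only the events recorded after it."""
    balance = snapshot["balance"]
    for event in events[snapshot["version"]:]:
        balance = apply(balance, event)
    return balance


events = [
    {"type": "AccountOpened", "balance": 0, "timestamp": "2024-01-01T00:00:00Z"},
    {"type": "Deposited", "amount": 1000, "timestamp": "2024-01-02T10:30:00Z"},
    {"type": "Withdrawn", "amount": 200, "timestamp": "2024-01-03T14:15:00Z"},
    {"type": "Deposited", "amount": 500, "timestamp": "2024-01-04T09:00:00Z"},
]
snapshot = {"version": 3, "balance": 800}  # state after the first three events
```

With a snapshot every few thousand events, a rebuild touches only the tail of the log instead of the whole history.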
CQRS (Command Query Responsibility Segregation) separates the write model (commands) from the read model (queries). It is often paired with event sourcing. In that pairing, commands go through the event-sourced write side, which produces events. Those events are projected into optimized read models (often denormalized databases or search indices).
```
                   ┌───────────────┐
Commands ────────> │  Write Model  │ ──── Events ────┐
                   │ (Event Store) │                 │
                   └───────────────┘                 ▼
                                             ┌───────────────┐
Queries ───────────────────────────────────> │  Read Model   │
                                             │ (Projections) │
                                             └───────────────┘
```

The write model is optimized for consistency and business-rule validation. The read model is optimized for query performance. You might have multiple read models: one in PostgreSQL for relational queries, one in Elasticsearch for full-text search, and one in Redis for real-time dashboards.
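The split can be sketched in a few classes. The sketch assumes one write model that enforces a no-overdraft rule before emitting events. Two in-memory projections stand in for separate read stores; no real databases are used. Projection updates run synchronously in the loop for brevity. In a deployed system they would travel through a broker, and that hop is where the eventual consistency comes from.

```python
from collections import defaultdict
from typing import Any

Event = dict[str, Any]


class AccountWriteModel:
    """Write side: validates commands, then appends events to the store."""

    def __init__(self) -> None:
        self.event_store: list[Event] = []
        self._balances: defaultdict[str, int] = defaultdict(int)  # used only for validation

    def deposit(self, account_id: str, amount: int) -> Event:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        return self._append({"type": "Deposited", "account_id": account_id, "amount": amount})

    def withdraw(self, account_id: str, amount: int) -> Event:
        # Business rule enforced on the write side: no overdrafts.
        if amount > self._balances[account_id]:
            raise ValueError("insufficient funds")
        return self._append({"type": "Withdrawn", "account_id": account_id, "amount": amount})

    def _append(self, event: Event) -> Event:
        sign = 1 if event["type"] == "Deposited" else -1
        self._balances[event["account_id"]] += sign * event["amount"]
        self.event_store.append(event)
        return event


class BalanceProjection:
    """Read model shaped for 'what is the balance of account X?'."""

    def __init__(self) -> None:
        self.balances: defaultdict[str, int] = defaultdict(int)

    def handle(self, event: Event) -> None:
        sign = 1 if event["type"] == "Deposited" else -1
        self.balances[event["account_id"]] += sign * event["amount"]


class ActivityProjection:
    """A second read model over the same events: transactions per account."""

    def __init__(self) -> None:
        self.counts: defaultdict[str, int] = defaultdict(int)

    def handle(self, event: Event) -> None:
        self.counts[event["account_id"]] += 1


write_model = AccountWriteModel()
balances, activity = BalanceProjection(), ActivityProjection()
commands = {"deposit": write_model.deposit, "withdraw": write_model.withdraw}

for kind, account_id, amount in [("deposit", "a1", 1000), ("withdraw", "a1", 200),
                                 ("withdraw", "a1", 5000)]:
    try:
        event = commands[kind](account_id, amount)
    except ValueError:
        continue  # a rejected command produces no event
    # Synchronous here for brevity; in a real system this hop is asynchronous.
    for projection in (balances, activity):
        projection.handle(event)
```

The third command is rejected on the write side, so neither read model ever sees it.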
Event-driven systems are inherently eventually consistent. When a user places an order, the inventory service might not reflect the change for milliseconds or even seconds. This creates real UX challenges.
A user places an order and immediately refreshes the page, but the read model has not yet processed the event. The order appears missing. Common mitigations:
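One mitigation for this read-your-writes problem works in two steps. The write side returns the log position of the user's event. The read path then waits, briefly and with a timeout, until the projection has processed at least that position. The sketch assumes a single projection that tracks a `position` counter. `OrderReadModel` and `read_your_write` are hypothetical names, and the in-memory dictionary stands in for a real read database. Everything runs in one thread here, so the first read simply times out.

```python
import time
from typing import Any, Optional

Order = dict[str, Any]


class OrderReadModel:
    """Projection that remembers the log position of the last event it applied."""

    def __init__(self) -> None:
        self.orders: dict[str, Order] = {}
        self.position = 0

    def apply(self, position: int, event: Order) -> None:
        self.orders[event["orderId"]] = event
        self.position = position


def read_your_write(read_model: OrderReadModel, order_id: str, min_position: int,
                    timeout_s: float = 1.0, poll_s: float = 0.01) -> Optional[Order]:
    """Wait (bounded) until the projection has caught up to the caller's own write."""
    deadline = time.monotonic() + timeout_s
    while read_model.position < min_position:
        if time.monotonic() >= deadline:
            return None  # caller can show "processing" rather than "missing"
        time.sleep(poll_s)
    return read_model.orders.get(order_id)


read_model = OrderReadModel()
# Hypothetical: the write side acknowledged the order at log position 1.
before = read_your_write(read_model, "abc123", min_position=1, timeout_s=0.05)
read_model.apply(1, {"type": "OrderPlaced", "orderId": "abc123"})
after = read_your_write(read_model, "abc123", min_position=1)
```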
Events may arrive out of order, especially across partitions. If a `Deposited` event arrives before the `AccountOpened` event, the consumer must handle this gracefully. Strategies include:
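One strategy is for the producer to stamp each event with a per-aggregate sequence number and for the consumer to buffer anything that arrives early. In this sketch the `seq` field and the `OrderedConsumer` class are assumptions. A production consumer would also need a timeout or dead-letter path for gaps that never fill.

```python
from typing import Any

Event = dict[str, Any]


class OrderedConsumer:
    """Applies events for one aggregate strictly in sequence order, buffering gaps."""

    def __init__(self) -> None:
        self.next_seq = 1
        self.pending: dict[int, Event] = {}
        self.applied: list[str] = []

    def receive(self, event: Event) -> None:
        seq = event["seq"]
        if seq < self.next_seq:
            return  # already applied (a duplicate), so ignore it
        self.pending[seq] = event
        # Drain every buffered event that is now contiguous with what was applied.
        while self.next_seq in self.pending:
            ready = self.pending.pop(self.next_seq)
            self.applied.append(ready["type"])
            self.next_seq += 1


consumer = OrderedConsumer()
consumer.receive({"seq": 2, "type": "Deposited"})      # arrives early: buffered
consumer.receive({"seq": 1, "type": "AccountOpened"})  # fills the gap: both apply
consumer.receive({"seq": 3, "type": "Withdrawn"})
```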
Network failures, retries, and at-least-once delivery guarantees mean consumers will see duplicates. Every consumer must be idempotent. Common techniques:
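```python
def handle_deposit(event, processed_events_set):
    # Skip events that were already applied (duplicate delivery).
    if event["event_id"] in processed_events_set:
        return

    # load_account / save_account are assumed persistence helpers, not defined here.
    account = load_account(event["account_id"])
    account.balance += event["amount"]
    save_account(account)

    # Ideally this and save_account commit in one transaction; otherwise a crash
    # between the two lets a redelivery apply the same deposit twice.
    processed_events_set.add(event["event_id"])
```

Banks are natural fits for event sourcing. Every transaction is an event, and the account balance is a projection. Regulatory requirements demand a complete, immutable audit trail. If a dispute arises, the bank can replay events to reconstruct exactly what happened.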
When a customer places an order, the system publishes an `OrderPlaced` event. The inventory service reserves stock. The payment service charges the card. The notification service sends a confirmation email. The analytics service records the purchase. Each service operates independently, and the system degrades gracefully: if the email service is down, the order still processes.
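The graceful-degradation property can be sketched as a fan-out in which each subscriber's failure is isolated and parked for retry. Several names here are hypothetical: `publish`, the subscriber names, and the `retry_queue`. With a real broker, this isolation usually comes from each consumer reading independently rather than from a loop. The sketch treats every subscriber alike. Deciding which failures are tolerable (email) and which are not (payment) is a business decision it does not model.

```python
from typing import Any, Callable

Event = dict[str, Any]
Handler = Callable[[Event], None]


def publish(event: Event, subscribers: dict[str, Handler],
            retry_queue: list[tuple[str, Event]]) -> None:
    """Deliver to every subscriber; one failing consumer does not block the others."""
    for name, handler in subscribers.items():
        try:
            handler(event)
        except Exception:
            # Park the delivery for a later retry instead of failing the order.
            retry_queue.append((name, event))


done: list[str] = []


def send_email(event: Event) -> None:
    raise ConnectionError("SMTP relay unavailable")  # simulated outage


subscribers: dict[str, Handler] = {
    "inventory": lambda e: done.append("stock reserved"),
    "payment": lambda e: done.append("card charged"),
    "notification": send_email,
    "analytics": lambda e: done.append("purchase recorded"),
}
retry_queue: list[tuple[str, Event]] = []
publish({"type": "OrderPlaced", "orderId": "abc123"}, subscribers, retry_queue)
```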
Streaming platforms like Netflix publish user interaction events (play, pause, skip, search). These events flow through a streaming pipeline into real-time dashboards and recommendation engines. The event log serves as both the source of truth and the input for machine learning pipelines.
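One log can serve both purposes because its consumers read independently. Each consumer tracks its own position, so a real-time dashboard and a training job that starts later both see the same events. The sketch below is a toy, in-memory model of that idea, loosely similar to consumer offsets in logs such as Apache Kafka. `EventLog` and its methods are hypothetical, and the sketch makes no claim about any specific company's pipeline.

```python
from collections import Counter
from typing import Any

Event = dict[str, Any]


class EventLog:
    """Append-only log; each named consumer tracks its own read offset."""

    def __init__(self) -> None:
        self._events: list[Event] = []
        self._offsets: dict[str, int] = {}

    def append(self, event: Event) -> None:
        self._events.append(event)

    def poll(self, consumer: str) -> list[Event]:
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:]
        self._offsets[consumer] = len(self._events)  # advance as soon as the batch is returned
        return batch


log = EventLog()
for action in ["play", "pause", "play", "skip", "search"]:
    log.append({"user": "u1", "action": action})

# Real-time dashboard: running counts per action type.
dashboard = Counter(e["action"] for e in log.poll("dashboard"))

log.append({"user": "u2", "action": "play"})

# A training pipeline that starts later still reads the full history from offset 0.
training_rows = [(e["user"], e["action"]) for e in log.poll("ml-training")]
dashboard.update(e["action"] for e in log.poll("dashboard"))  # only the new event
```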
| Aspect | Event-Driven | Request-Response |
|---|---|---|
| Coupling | Loose | Tight |
| Consistency | Eventual | Immediate |
| Debugging | Harder (async flows) | Easier (synchronous traces) |
| Scalability | Higher (independent scaling) | Lower (bottleneck on synchronous calls) |
| Complexity | Higher (event schemas, ordering, idempotency) | Lower (straightforward call chains) |
| Audit trail | Natural | Requires explicit logging |