Design a payment system like Stripe. Covers idempotency, double-entry bookkeeping, payment state machines, webhooks, and reconciliation.
Designing a payment system like Stripe is one of the most demanding system design questions because correctness is paramount. A bug in a social media feed shows the wrong post; a bug in a payment system loses real money. This question tests your understanding of idempotency, state machines, double-entry bookkeeping, webhook reliability, and the unique challenge of building a system where neither "at least once" nor "at most once" is acceptable on its own: you need exactly-once effects, which in practice means at-least-once delivery combined with idempotent processing.
Assumptions:
- 10 million transactions per day (Stripe processes more, but this is a good starting point)
- Peak: 5x average = ~580 transactions/sec
- Average transaction record: 1 KB
- Storage: 10M × 1 KB × 365 days = 3.65 TB/year
- Ledger entries: at least 2 entries per transaction (double-entry); at ~1 KB per entry, that is at least 7.3 TB/year
- Webhook deliveries: 3-4 events per transaction = 30-40M events/day

Idempotency is the most critical concept in payment systems. Network failures are inevitable: a client sends a payment request, the server processes it, but the response is lost. The client retries. Without idempotency, the customer is charged twice.
The client generates a unique idempotency key (typically a UUID) and includes it in every payment request. The server stores the key alongside the result. On retry, the server recognizes the key and returns the stored result without reprocessing.
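As a self-contained illustration of the store-and-replay idea, here is a minimal in-memory sketch. The class `InMemoryIdempotencyStore`, the `run_once` method, and the request fingerprint are assumptions made for this example; the fingerprint is a common safeguard that rejects a key reused with different parameters, and a production system would keep these records in a durable database.

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """Raised when a key is reused with a different request body."""


class InMemoryIdempotencyStore:
    """Hypothetical stand-in for a database table keyed by idempotency key."""

    def __init__(self):
        self._records = {}  # key -> (request fingerprint, stored result)

    @staticmethod
    def fingerprint(params):
        # Canonical JSON so that dict ordering does not change the hash.
        canonical = json.dumps(params, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()

    def run_once(self, key, params, handler):
        fp = self.fingerprint(params)
        if key in self._records:
            stored_fp, stored_result = self._records[key]
            if stored_fp != fp:
                raise IdempotencyConflict(
                    f"key {key!r} reused with different parameters"
                )
            return stored_result  # replay: the handler is not called again
        result = handler(params)
        self._records[key] = (fp, result)
        return result


charges = []


def charge(params):
    # Pretend side effect: each call here would move real money.
    charges.append(params["amount"])
    return {"status": "succeeded", "charge_number": len(charges)}


store = InMemoryIdempotencyStore()
request = {"amount": 1000, "currency": "usd", "customer_id": "cus_1"}
first = store.run_once("key-1", request, charge)
retry = store.run_once("key-1", request, charge)  # same key: replayed
```

The retry returns the stored result and `charge` runs only once. This sketch is single-threaded, so it says nothing about concurrent duplicates, which need a lock or a unique constraint on the key.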
class PaymentService:
    def process_payment(self, idempotency_key, amount, currency, customer_id):
        # 1. Acquire a lock on the idempotency key to prevent concurrent duplicates
        lock = self.lock_manager.acquire(f"idem:{idempotency_key}", ttl=30)
        if not lock:
            raise ConflictError("Request already in progress")
        try:
            # 2. Check, while holding the lock, whether this key has been seen before.
            #    Checking before taking the lock would let a retry slip through
            #    between the check and the first request's save.
            existing = self.db.get_by_idempotency_key(idempotency_key)
            if existing:
                return existing.result  # return cached result, no reprocessing
            # 3. Process the payment
            result = self._charge_payment_method(amount, currency, customer_id)
            # 4. Store the result keyed by idempotency key
            self.db.save_idempotency_record(idempotency_key, result)
            return result
        finally:
            lock.release()

In accounting, every transaction is recorded as at least two entries: a debit and a credit. The sum of all debits must equal the sum of all credits. This invariant makes it easy to detect errors and ensures the system is always balanced.
Payment of $100 from Customer to Merchant:
┌───────────────────┬──────────┬──────────┐
│ Account │ Debit │ Credit │
├───────────────────┼──────────┼──────────┤
│ Customer Wallet │ │ $100.00 │ ← money leaves customer
│ Merchant Balance │ $97.00 │ │ ← merchant receives $97
│ Platform Revenue │ $3.00 │ │ ← platform takes $3 fee
├───────────────────┼──────────┼──────────┤
│ TOTAL │ $100.00 │ $100.00 │ ← balanced!
└───────────────────┴──────────┴──────────┘
Refund of $100:
┌───────────────────┬──────────┬──────────┐
│ Account │ Debit │ Credit │
├───────────────────┼──────────┼──────────┤
│ Customer Wallet │ $100.00 │ │ ← money returns to customer
│ Merchant Balance │ │ $97.00 │ ← merchant gives back $97
│ Platform Revenue │ │ $3.00 │ ← platform gives back fee
├───────────────────┼──────────┼──────────┤
│ TOTAL │ $100.00 │ $100.00 │ ← balanced!
└───────────────────┴──────────┴──────────┘

class Ledger:
    def record_payment(self, tx_id, amount, fee, customer_id, merchant_id):
        entries = [
            LedgerEntry(tx_id, account=f"customer:{customer_id}",
                        credit=amount, debit=0),
            LedgerEntry(tx_id, account=f"merchant:{merchant_id}",
                        credit=0, debit=amount - fee),
            LedgerEntry(tx_id, account="platform:revenue",
                        credit=0, debit=fee),
        ]
        # All entries are written in a single database transaction
        self.db.insert_batch(entries)

    def verify_balance(self):
        total_debits = self.db.sum_all_debits()
        total_credits = self.db.sum_all_credits()
        # Raise explicitly: assert statements are skipped under python -O
        if total_debits != total_credits:
            raise RuntimeError("LEDGER IMBALANCE DETECTED")

Why double-entry? If you just increment and decrement balances, a bug or crash could leave the system in an inconsistent state (money created or destroyed). Double-entry bookkeeping makes imbalances immediately detectable. Run the balance verification as a scheduled job and alert if it ever fails.
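A global sum tells you that the ledger is off but not where. A minimal sketch of a per-transaction check follows; the `(tx_id, debit, credit)` tuple shape and the function name `find_unbalanced` are assumptions for this example, and amounts use `Decimal` to avoid floating-point rounding.

```python
from collections import defaultdict
from decimal import Decimal


def find_unbalanced(entries):
    """Return {tx_id: debits - credits} for transactions that do not net to zero.

    `entries` is an iterable of (tx_id, debit, credit) tuples; this shape is
    an assumption for the sketch, not the LedgerEntry class used elsewhere.
    """
    net = defaultdict(Decimal)  # Decimal() is Decimal("0")
    for tx_id, debit, credit in entries:
        net[tx_id] += Decimal(debit) - Decimal(credit)
    return {tx_id: diff for tx_id, diff in net.items() if diff != 0}


entries = [
    ("tx_1", "0", "100.00"),   # customer wallet
    ("tx_1", "97.00", "0"),    # merchant balance
    ("tx_1", "3.00", "0"),     # platform revenue
    ("tx_2", "0", "50.00"),    # customer wallet
    ("tx_2", "48.50", "0"),    # merchant balance; the fee entry is missing
]
unbalanced = find_unbalanced(entries)
```

Here `tx_1` nets to zero and is omitted, while `tx_2` is reported with the missing amount, which points the investigation at a specific transaction.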
A payment goes through a well-defined lifecycle. Modeling it as an explicit state machine prevents invalid transitions and ensures every state change is auditable.
Payment State Machine:
CREATED → PENDING → AUTHORIZED → CAPTURED → SETTLED
             │          │           │
             ▼          ▼           ▼
          FAILED     VOIDED     REFUNDED
                            (partial or full)
States:
CREATED: Payment intent created, not yet submitted to payment processor.
PENDING: Submitted to payment processor, awaiting response.
AUTHORIZED: Funds reserved on the customer's card (not yet charged).
CAPTURED: Funds actually charged and transferred.
SETTLED: Funds deposited into the merchant's bank account.
FAILED: Payment processor declined the transaction.
VOIDED: Authorization cancelled before capture.
REFUNDED: Captured payment returned to the customer.

Many merchants use a two-phase flow. When a customer places an order, the system authorizes the payment (reserves the funds). Later, when the order ships, the system captures the payment (actually charges the card). This prevents charging for items that are out of stock or cannot be fulfilled.
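The two-phase flow can be sketched with a small model of an authorization hold. The class `CardHold` and its methods are hypothetical names for this example, amounts are integer cents, and capturing less than the authorized amount is shown as one way to handle a partially fulfilled order.

```python
class CardHold:
    """Hypothetical model of an authorization hold; amounts are integer cents."""

    def __init__(self, amount_cents):
        self.authorized = amount_cents
        self.captured = 0
        self.state = "AUTHORIZED"

    def capture(self, amount_cents=None):
        if self.state != "AUTHORIZED":
            raise ValueError(f"cannot capture from state {self.state}")
        amount = self.authorized if amount_cents is None else amount_cents
        if not 0 < amount <= self.authorized:
            raise ValueError("capture must be positive and within the hold")
        self.captured = amount
        self.state = "CAPTURED"
        return amount

    def void(self):
        if self.state != "AUTHORIZED":
            raise ValueError(f"cannot void from state {self.state}")
        self.state = "VOIDED"


order = CardHold(5000)          # customer places a $50.00 order
charged = order.capture(4200)   # one item was out of stock; $42.00 ships

cancelled = CardHold(5000)
cancelled.void()                # nothing shipped, so the hold is released
```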
from datetime import datetime, timezone

class Payment:
    # Terminal states (FAILED, VOIDED, REFUNDED) have no outgoing transitions
    VALID_TRANSITIONS = {
        "CREATED": ["PENDING"],
        "PENDING": ["AUTHORIZED", "FAILED"],
        "AUTHORIZED": ["CAPTURED", "VOIDED"],
        "CAPTURED": ["SETTLED", "REFUNDED"],
        "SETTLED": ["REFUNDED"],
    }

    def transition(self, new_state, metadata=None):
        if new_state not in self.VALID_TRANSITIONS.get(self.state, []):
            raise InvalidTransitionError(
                f"Cannot go from {self.state} to {new_state}"
            )
        old_state = self.state
        self.state = new_state
        self.updated_at = datetime.now(timezone.utc)
        # Log every state change for auditing
        self.audit_log.append({
            "from": old_state,
            "to": new_state,
            "timestamp": self.updated_at,
            "metadata": metadata,
        })
        # Trigger webhooks
        self.webhook_queue.enqueue(PaymentEvent(self.id, new_state))

Merchants need to know when payment states change (e.g., a payment succeeded, a refund was processed). Webhooks are HTTP callbacks: your system POSTs an event to a URL the merchant configured.
Webhooks must be delivered at least once. Network failures, merchant server downtime, and timeouts are common. The system must retry failed deliveries.
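At-least-once delivery means a merchant can receive the same event more than once, so the receiving side has to de-duplicate. A minimal sketch of such a handler follows; the class `WebhookReceiver`, the event fields `id`, `type`, and `payment_id`, and the event name `payment.captured` are all hypothetical.

```python
class WebhookReceiver:
    """Hypothetical merchant-side handler that tolerates duplicate deliveries."""

    def __init__(self):
        self._seen_event_ids = set()  # a real system would use a durable store
        self.fulfilled_orders = []

    def handle(self, event):
        event_id = event["id"]
        if event_id in self._seen_event_ids:
            return 200  # duplicate: acknowledge again, but do no work
        if event["type"] == "payment.captured":
            self.fulfilled_orders.append(event["payment_id"])
        self._seen_event_ids.add(event_id)
        return 200


receiver = WebhookReceiver()
event = {"id": "evt_1", "type": "payment.captured", "payment_id": "pay_42"}
receiver.handle(event)
receiver.handle(event)  # retry of the same delivery
```

The order is fulfilled once even though the event arrives twice, and both deliveries are acknowledged so the sender stops retrying.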
Webhook Delivery Pipeline:
Payment State Change → Webhook Queue (persistent)
                             │
                             ▼
                  Webhook Delivery Worker
                             │
                   ┌─────────┼─────────┐
                   ▼         ▼         ▼
                Success   Timeout  5xx Error
               (200 OK)   (>10s)  (500, 502)
                   │         │         │
                   ▼         ▼         ▼
                 Done      Retry     Retry
                        (exponential backoff)
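The retry branch of the pipeline can be sketched as a lookup of the delay before each attempt. The delays below are copied from the retry schedule given in this section; the function name `next_attempt_at` and its arguments are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone

# Delay before each attempt, measured from the previous attempt.
RETRY_DELAYS = [
    timedelta(0),            # attempt 1: immediately
    timedelta(minutes=5),    # attempt 2
    timedelta(minutes=30),   # attempt 3
    timedelta(hours=2),      # attempt 4
    timedelta(hours=8),      # attempt 5
    timedelta(hours=24),     # attempt 6
]


def next_attempt_at(failed_attempts, last_attempt_at):
    """Return when to try again, or None once every attempt has failed."""
    if failed_attempts >= len(RETRY_DELAYS):
        return None  # give up: mark as failed and alert the merchant
    return last_attempt_at + RETRY_DELAYS[failed_attempts]


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
second_try = next_attempt_at(1, start)   # one failure so far
exhausted = next_attempt_at(6, start)    # six failures: no more retries
```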
Retry schedule:
Attempt 1: immediately
Attempt 2: 5 minutes later
Attempt 3: 30 minutes later
Attempt 4: 2 hours later
Attempt 5: 8 hours later
Attempt 6: 24 hours later
After 6 failures: mark as failed, alert merchant via dashboard

Sign every webhook so the merchant can verify its origin and reject replays:

import hmac
import hashlib
import time

def sign_webhook(payload, secret):
    # The timestamp is part of the signed message so old payloads cannot be replayed
    timestamp = str(int(time.time()))
    message = f"{timestamp}.{payload}"
    signature = hmac.new(
        secret.encode(), message.encode(), hashlib.sha256
    ).hexdigest()
    return timestamp, signature

# Merchant verification:
def verify_webhook(payload, timestamp, signature, secret):
    try:
        age = abs(time.time() - int(timestamp))
    except ValueError:
        return False  # malformed timestamp
    if age > 300:  # 5 minute tolerance
        return False
    expected = hmac.new(
        secret.encode(), f"{timestamp}.{payload}".encode(), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking the signature through timing
    return hmac.compare_digest(signature, expected)

The Payment Card Industry Data Security Standard (PCI DSS) governs how cardholder data (card numbers, CVV, expiration dates) must be handled. PCI compliance is not optional -- the card networks require it, through merchant and acquirer contracts, of any system that stores, processes, or transmits card data.
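Tokenization is the usual way to keep card numbers out of merchant systems: the card is exchanged for an opaque token, and only the token service can map it back. A minimal in-memory sketch follows; `TokenVault`, `merchant_checkout`, and the `tok_` prefix format are hypothetical, and a real vault would encrypt the stored data and run in an isolated, audited environment.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token service; not a secure implementation."""

    def __init__(self):
        self._cards = {}  # token -> card number

    def tokenize(self, card_number):
        token = "tok_" + secrets.token_hex(12)
        self._cards[token] = card_number
        return token

    def detokenize(self, token):
        return self._cards[token]


def merchant_checkout(token, amount_cents):
    # The merchant backend only ever handles the opaque token.
    return {"token": token, "amount": amount_cents}


vault = TokenVault()
token = vault.tokenize("4242424242424242")
charge_request = merchant_checkout(token, 1999)
card_for_network = vault.detokenize(charge_request["token"])
```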
Tokenization Flow:
Customer Browser → Stripe.js (client-side SDK)
        │
        ▼
Stripe Token Service (PCI-compliant)
        │  returns token: "tok_abc123"
        ▼
Merchant Backend (never sees card number)
        │  sends token + amount to Stripe API
        ▼
Stripe Payment Service
        │  detokenizes, charges card via card network
        ▼
Result returned to merchant

Reconciliation is the process of verifying that your internal records match the external records from payment processors and banks. Discrepancies indicate bugs, fraud, or timing issues.
1. Export all transactions from your database for the day.
2. Download settlement reports from each payment processor and card network (Visa, Mastercard, etc.).
3. Match each internal transaction to the corresponding external record.
4. Flag discrepancies:
- Transaction in your system but not in processor report (we think it succeeded, but it did not).
- Transaction in processor report but not in your system (we missed recording it).
- Amount mismatches (currency conversion issues, fee calculation errors).
5. Investigate and resolve each discrepancy.
Common causes of discrepancies:
- Race conditions during payment processing.
- Network timeouts where the payment succeeded but the response was lost.
- Currency conversion rounding differences.
- Chargebacks processed by the bank but not yet reflected in your system.

def reconcile(internal_transactions, processor_report):
    # Index both sides by the processor's reference ID
    internal_map = {tx.processor_ref: tx for tx in internal_transactions}
    external_map = {entry.ref_id: entry for entry in processor_report}
    discrepancies = []
    for ref, tx in internal_map.items():
        if ref not in external_map:
            discrepancies.append(("MISSING_EXTERNAL", ref, tx))
        elif tx.amount != external_map[ref].amount:
            discrepancies.append(("AMOUNT_MISMATCH", ref, tx, external_map[ref]))
    for ref, entry in external_map.items():
        if ref not in internal_map:
            discrepancies.append(("MISSING_INTERNAL", ref, entry))
    return discrepancies

Payment processing involves multiple external systems (card networks, banks, fraud detection), each of which can fail. The system must handle failures gracefully without losing money.
The most dangerous failure mode: you send a charge request to the payment processor, and the connection times out. Did the charge go through? You do not know.
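One common way to resolve an unknown outcome is to ask the processor what happened before trying again, and to reuse the same idempotency key on any retry. The sketch below assumes a hypothetical processor client exposing `charge(key, amount)` and `lookup(key)`; real processor APIs differ, and `FlakyProcessor` is only a test double.

```python
class ProcessorTimeout(Exception):
    pass


def charge_with_recovery(processor, idempotency_key, amount_cents, max_retries=3):
    """Charge once even if a response is lost (sketch, hypothetical client API)."""
    for _ in range(max_retries):
        try:
            return processor.charge(idempotency_key, amount_cents)
        except ProcessorTimeout:
            # Outcome unknown: check for an existing charge before retrying.
            existing = processor.lookup(idempotency_key)
            if existing is not None:
                return existing
    raise ProcessorTimeout("outcome still unknown; keep PENDING and reconcile")


class FlakyProcessor:
    """Test double: the first charge succeeds but its response is lost."""

    def __init__(self):
        self.charges = {}
        self.calls = 0

    def charge(self, key, amount_cents):
        self.calls += 1
        if key not in self.charges:
            self.charges[key] = {"key": key, "amount": amount_cents,
                                 "status": "succeeded"}
        if self.calls == 1:
            raise ProcessorTimeout("response lost")
        return self.charges[key]

    def lookup(self, key):
        return self.charges.get(key)


processor = FlakyProcessor()
result = charge_with_recovery(processor, "key-123", 2500)
```

The caller gets the successful charge without issuing a second one. If the outcome is still unknown after the retries, the payment stays pending for reconciliation and is not assumed to have failed.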
A payment often involves multiple steps (debit customer, credit merchant, record in ledger). If one step fails, the others must be rolled back.
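A minimal sketch of this rollback idea as a saga: each step is paired with a compensation, and a failure undoes the completed steps in reverse order. The function `run_saga` and the lambdas are hypothetical; the step names follow the payment saga outlined in this section, and a real implementation would persist progress so that compensation survives a crash.

```python
def run_saga(steps):
    """Run (action, compensation) pairs; on failure, undo completed steps in reverse."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            raise
        completed.append(compensation)


log = []


def capture_funds():
    raise RuntimeError("capture declined")


steps = [
    (lambda: log.append("authorize"), lambda: log.append("void authorization")),
    (lambda: log.append("record ledger"), lambda: log.append("reverse ledger")),
    (capture_funds, lambda: log.append("refund")),
]

try:
    run_saga(steps)
except RuntimeError:
    pass  # the saga has already compensated steps 2 and 1
```

Because the capture itself failed, its compensation (refund) never runs; only the ledger entries and the authorization are undone.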
Saga Pattern for Payment:
Step 1: Reserve funds (authorize)
    Compensation: Release funds (void authorization)
Step 2: Record ledger entries
    Compensation: Reverse ledger entries
Step 3: Capture funds
    Compensation: Refund
If Step 3 fails:
    Execute compensation for Step 2, then Step 1 (reverse order)

             ┌──────────────┐
Merchants──> │ API Gateway  │
             └──────┬───────┘
                    │
       ┌────────────┼────────────┐
       ▼            ▼            ▼
  ┌──────────┐ ┌──────────┐ ┌──────────┐
  │ Payment  │ │ Webhook  │ │ Ledger   │
  │ Service  │ │ Service  │ │ Service  │
  └────┬─────┘ └────┬─────┘ └────┬─────┘
       │            │            │
       ▼            ▼            ▼
  ┌──────────┐ ┌──────────┐ ┌──────────┐
  │Payment DB│ │ Message  │ │ Ledger DB│
  │(Postgres)│ │  Queue   │ │(Postgres)│
  └────┬─────┘ └──────────┘ └──────────┘
       │
       ▼
┌──────────────────────┐
│  Payment Processor   │
│ (Visa/MC/Bank APIs)  │
└──────────────────────┘