Scalability Patterns

API Gateway

A single entry point for all client requests. Handles authentication, rate limiting, routing, and protocol translation. It is the front door of a microservices architecture.

In a microservices architecture, a client might need data from five different services to render a single page: user profiles from the User Service, order history from the Order Service, recommendations from the ML Service, notifications from the Notification Service, and settings from the Config Service. Without an API gateway, the client would need to know the address of each service, handle authentication with each one, deal with different protocols, and make multiple round trips. An API gateway is the single entry point that sits between clients and your microservices, handling cross-cutting concerns so that individual services do not have to.

What an API Gateway Does

Request Routing

The most fundamental function. The gateway receives a client request and routes it to the appropriate backend service based on the URL path, HTTP method, headers, or other criteria.

Client Request                  Gateway Routes To
────────────────────────────    ──────────────────────
GET  /api/users/42          →   User Service
POST /api/orders             →   Order Service
GET  /api/products/search    →   Search Service
GET  /api/feed               →   Feed Service
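Path-based routing like the table above can be sketched as a longest-prefix match. This is a minimal sketch: the `ROUTES` table and service names are hypothetical (the extra `/api/products` entry exists only to show why the longest prefix must win), and matching on HTTP method or headers is left out. Real gateways such as Kong, NGINX, or Envoy express the same idea in their own configuration formats.

```python
from typing import Optional

# Hypothetical route table: path prefix -> backend service name.
ROUTES = {
    "/api/users": "user-service",
    "/api/orders": "order-service",
    "/api/products/search": "search-service",
    "/api/products": "product-service",
    "/api/feed": "feed-service",
}


def route(path: str) -> Optional[str]:
    """Return the backend for `path` using longest-prefix matching."""
    best = None
    for prefix, service in ROUTES.items():
        # Match whole path segments so "/api/users" does not match "/api/usersettings".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, service)
    return best[1] if best else None
```

Longest-prefix matching lets a specific route (`/api/products/search`) coexist with a broader one (`/api/products`) without depending on the order in which routes were declared.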

Authentication and Authorization

Instead of each microservice implementing its own auth, the gateway validates JWT tokens, API keys, or OAuth tokens at the edge. Services behind the gateway receive pre-authenticated requests with user identity in headers.

Client → [Authorization: Bearer eyJhbGc...] → API Gateway
                                                  │
                                          Validate JWT token
                                          Extract user_id, roles
                                                  │
                                          Forward with headers:
                                          X-User-Id: 42
                                          X-User-Roles: admin,user
                                                  │
                                                  ▼
                                          Backend Service
                                          (trusts gateway headers)
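The validation step in the diagram can be sketched with the standard library for the HS256 (HMAC-SHA256) case. This is a minimal sketch under several assumptions: a shared secret, the user ID in the standard `sub` claim, and roles in a hypothetical `roles` claim. Production gateways usually rely on a vetted JWT library and often use asymmetric keys (RS256/ES256) fetched from the identity provider. Note that the gateway strips any client-supplied `X-User-*` headers; otherwise a client could forge the identity that backends trust.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWTs use unpadded base64url; restore the padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def validate_hs256(token: str, secret: bytes) -> dict:
    """Verify an HS256 JWT and return its claims, or raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        # Never let the token choose its own verification algorithm.
        raise ValueError("unexpected alg")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


def forward_headers(incoming: dict, claims: dict) -> dict:
    """Build headers for the backend call from validated claims."""
    # Drop client-supplied identity headers so they cannot be spoofed.
    headers = {k: v for k, v in incoming.items()
               if k.lower() not in ("x-user-id", "x-user-roles")}
    headers["X-User-Id"] = str(claims["sub"])
    headers["X-User-Roles"] = ",".join(claims.get("roles", []))
    return headers
```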

Rate Limiting

Protect backend services from being overwhelmed. The gateway enforces per-client, per-endpoint, or global rate limits.

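import time

import redis  # redis-py client; assumes a Redis server on localhost:6379

r = redis.Redis()

# Rate limiting strategies (only the per-minute limit is enforced below)
rate_limits = {
    "free_tier": {
        "requests_per_minute": 60,
        "requests_per_day": 10000,
    },
    "premium_tier": {
        "requests_per_minute": 600,
        "requests_per_day": 1000000,
    }
}

# Implementation using a fixed-window counter in Redis
# (fixed windows can admit up to 2x the limit across a window boundary)
def check_rate_limit(client_id, tier):
    window = int(time.time() // 60)    # index of the current one-minute window
    key = f"rate:{client_id}:{window}"
    count = r.incr(key)                 # atomic; creates the key with value 1
    if count == 1:
        r.expire(key, 60)               # start the TTL when the window opens

    limit = rate_limits[tier]["requests_per_minute"]
    if count > limit:
        return 429  # Too Many Requests
    return None  # Allowed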
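The Redis code keys each counter by the current minute, which makes it a fixed window: a client can send a full quota at 0:59 and another at 1:00. A sliding window counter smooths that boundary by weighting the previous window's count by how much of it still overlaps the last 60 seconds. The sketch below is a single-process, in-memory illustration under stated assumptions (an injectable `clock` for testing, no eviction of old windows); a real gateway would keep the two counters in a shared store such as Redis.

```python
import time


class SlidingWindowCounter:
    """Approximate sliding-window limiter.

    Estimated rate = previous window count * (unelapsed fraction) + current count.
    """

    def __init__(self, limit, window=60.0, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.counts = {}  # (client_id, window_index) -> count; never evicted in this sketch

    def allow(self, client_id):
        now = self.clock()
        idx = int(now // self.window)
        elapsed = (now % self.window) / self.window  # fraction of the current window elapsed
        prev = self.counts.get((client_id, idx - 1), 0)
        curr = self.counts.get((client_id, idx), 0)
        if prev * (1 - elapsed) + curr >= self.limit:
            return False
        self.counts[(client_id, idx)] = curr + 1
        return True
```

Halfway through a new window, only half of the previous window's requests still count against the client, so the limit ramps back in gradually instead of resetting all at once.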

Protocol Translation

Clients typically speak HTTP/REST. Internal services might use gRPC, GraphQL, WebSockets, or message queues. The gateway translates between them.

Client (REST/JSON)  →  API Gateway  →  gRPC Service
Client (REST/JSON)  →  API Gateway  →  GraphQL Service
Client (WebSocket)  →  API Gateway  →  Message Queue (Kafka)
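The REST-to-gRPC direction can be sketched as a translation table from HTTP verb and path to an RPC method plus a request builder. Everything here is hypothetical: `UserServiceStub` stands in for a generated gRPC client stub and returns a dict instead of a protobuf message. Tools such as Envoy's gRPC-JSON transcoder filter or grpc-gateway derive this mapping from annotations in the protobuf service definitions rather than from hand-written tables.

```python
import json
import re


class UserServiceStub:
    """Hypothetical stand-in for a generated gRPC client stub."""

    def GetUser(self, request):
        # A real stub would serialize `request` as protobuf and call the remote service.
        return {"id": request["user_id"], "name": "Alice"}


# Hypothetical table: (HTTP verb, path pattern, stub, RPC name, request builder).
TRANSLATIONS = [
    ("GET", re.compile(r"^/api/users/(\d+)$"), UserServiceStub(), "GetUser",
     lambda match, body: {"user_id": int(match.group(1))}),
]


def translate(method, path, body=b""):
    """Turn a REST/JSON call into an RPC call and the RPC result back into JSON."""
    for verb, pattern, stub, rpc_name, build_request in TRANSLATIONS:
        match = pattern.match(path)
        if verb == method and match:
            payload = json.loads(body) if body else None
            response = getattr(stub, rpc_name)(build_request(match, payload))
            return 200, json.dumps(response)
    return 404, json.dumps({"error": "no route"})
```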

Response Aggregation

A single client request might require data from multiple services. The gateway can fan out the request, call multiple services in parallel, merge the responses, and return a single response.

Client: GET /api/dashboard

Gateway:
  ├── GET user-service/profile/42      (parallel)
  ├── GET order-service/recent/42      (parallel)
  ├── GET notification-service/count/42 (parallel)
  └── GET recommendation-service/42     (parallel)

Gateway merges responses:
{
  "profile": { "name": "Alice", ... },
  "recentOrders": [...],
  "unreadNotifications": 5,
  "recommendations": [...]
}

This reduces the number of client-to-server round trips, which is especially important for mobile clients on high-latency networks.
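The fan-out and merge can be sketched with `asyncio.gather`. The `fetch_*` coroutines are hypothetical stand-ins for downstream HTTP calls (a real gateway would use an async HTTP client), and the recommendation call is made to fail on purpose. With `return_exceptions=True`, one failing service yields a `None` section instead of failing the whole dashboard; whether partial responses are acceptable is a product decision.

```python
import asyncio


# Hypothetical downstream calls; each simulates network latency.
async def fetch_profile(user_id):
    await asyncio.sleep(0.01)
    return {"name": "Alice"}


async def fetch_recent_orders(user_id):
    await asyncio.sleep(0.01)
    return [{"id": 1}]


async def fetch_unread_count(user_id):
    await asyncio.sleep(0.01)
    return 5


async def fetch_recommendations(user_id):
    await asyncio.sleep(0.01)
    raise TimeoutError("recommendation-service timed out")  # simulated failure


async def dashboard(user_id, timeout=0.5):
    calls = {
        "profile": fetch_profile(user_id),
        "recentOrders": fetch_recent_orders(user_id),
        "unreadNotifications": fetch_unread_count(user_id),
        "recommendations": fetch_recommendations(user_id),
    }
    # Run all calls in parallel under one overall deadline.
    results = await asyncio.wait_for(
        asyncio.gather(*calls.values(), return_exceptions=True), timeout)
    # Degrade gracefully: a failed section becomes None instead of an error.
    return {key: (None if isinstance(res, Exception) else res)
            for key, res in zip(calls, results)}
```

Because the calls run concurrently, total latency is roughly that of the slowest service rather than the sum of all four.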

Request/Response Transformation

The gateway can transform requests and responses to match what clients and services expect:

Add or remove headers.
Transform request/response bodies (e.g., XML to JSON).
Filter sensitive fields from responses.
Version translation (route /v1/users to the legacy service, /v2/users to the new service).
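Two of these transformations, header rewriting and sensitive-field filtering, can be sketched as pure functions over JSON-like data. The `SENSITIVE_FIELDS` deny-list is a hypothetical example; a real deployment would derive it from the API's schema or data-classification rules.

```python
# Hypothetical deny-list of fields that must never leave the gateway.
SENSITIVE_FIELDS = {"password_hash", "ssn", "internal_notes"}


def redact(data):
    """Recursively drop sensitive keys from a JSON-like structure."""
    if isinstance(data, dict):
        return {k: redact(v) for k, v in data.items() if k not in SENSITIVE_FIELDS}
    if isinstance(data, list):
        return [redact(item) for item in data]
    return data


def transform_headers(headers, add=None, remove=()):
    """Copy headers, dropping `remove` (case-insensitive) and adding `add`."""
    drop = {h.lower() for h in remove}
    out = {k: v for k, v in headers.items() if k.lower() not in drop}
    out.update(add or {})
    return out
```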

Caching

The gateway can cache responses from backend services, reducing load and latency for frequently accessed, slowly changing data.

GET /api/products/popular
  → Cache hit (TTL: 5 minutes) → Return cached response (1ms)
  → Cache miss → Call Product Service → Cache response → Return (50ms)
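The hit/miss flow above can be sketched as a small TTL cache keyed by request path. This is an in-process illustration with an injectable `clock` for testing; it assumes only idempotent GET responses are cached and does not protect against many concurrent misses hitting the backend at once (a cache stampede).

```python
import time


class TTLCache:
    """Response cache keyed by request path; entries expire after `ttl` seconds."""

    def __init__(self, ttl=300.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return (value, was_hit); call `fetch` only on a miss or expired entry."""
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1], True           # cache hit
        value = fetch()                    # cache miss: call the backend
        self.entries[key] = (now + self.ttl, value)
        return value, False
```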

Load Balancing

The gateway distributes requests across multiple instances of a backend service using round-robin, least connections, or weighted algorithms.
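Round-robin and least-connections selection can be sketched in a few lines each. The instance names are hypothetical, and the sketch ignores health checks and weights. Least connections needs the gateway to report when a request finishes (`release`), which is what lets it steer traffic away from instances stuck on slow requests.

```python
import itertools


class RoundRobin:
    """Cycle through instances in a fixed order."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Send each request to the instance with the fewest in-flight requests."""

    def __init__(self, instances):
        self.active = {inst: 0 for inst in instances}

    def acquire(self):
        # min() returns the first minimal key, so ties go to the earliest instance.
        inst = min(self.active, key=self.active.get)
        self.active[inst] += 1
        return inst

    def release(self, inst):
        self.active[inst] -= 1
```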

Observability

Centralized logging, metrics, and distributed tracing. The gateway assigns a trace ID to every request and propagates it through all downstream calls, making it easy to trace a request across services.
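Trace ID assignment and propagation can be sketched as two helpers: one that reuses or mints an ID when a request enters the gateway, and one that copies it onto every downstream call. The `X-Trace-Id` header name is a hypothetical choice for illustration; systems that follow the W3C Trace Context standard use the `traceparent` header instead, and tracing libraries such as OpenTelemetry handle this automatically.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name


def ensure_trace_id(headers):
    """Reuse the caller's trace ID if present, otherwise mint a new one."""
    out = dict(headers)
    if not out.get(TRACE_HEADER):
        out[TRACE_HEADER] = uuid.uuid4().hex
    return out


def downstream_headers(incoming, extra=None):
    """Headers for every downstream call made while serving `incoming`."""
    headers = {TRACE_HEADER: incoming[TRACE_HEADER]}
    headers.update(extra or {})
    return headers
```

Logging the same trace ID at the gateway and in every service is what makes it possible to stitch one request's path back together.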

API Gateway Patterns

Single Gateway

One gateway handles all API traffic. Simple to manage but can become a bottleneck and single point of failure.

Backend for Frontend (BFF)

A separate gateway for each client type (web, mobile, IoT). Each BFF is tailored to its client's needs, aggregating and transforming data differently.

Web Client   → Web BFF Gateway   → Microservices
Mobile App   → Mobile BFF Gateway → Microservices
IoT Devices  → IoT BFF Gateway   → Microservices

The mobile BFF might return smaller payloads and fewer fields. The web BFF might return richer data with more nested objects. This avoids a one-size-fits-all API that satisfies no one.
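The difference between BFFs can be sketched as two shaping functions over the same backend record. The `PRODUCT` record and its field names are hypothetical. The web BFF passes rich nested data through; the mobile BFF sends one thumbnail, a preformatted price, and a review count instead of review bodies.

```python
# Hypothetical product record as returned by a backend service.
PRODUCT = {
    "id": 7,
    "name": "Desk Lamp",
    "price_cents": 2499,
    "description": "Adjustable LED desk lamp with three brightness levels.",
    "images": ["lamp-front.jpg", "lamp-side.jpg", "lamp-top.jpg"],
    "reviews": [{"user": "bob", "stars": 5, "text": "Great lamp."}],
}


def web_bff(product):
    # Web pages have room for full detail, so keep nested data.
    return {**product, "review_count": len(product["reviews"])}


def mobile_bff(product):
    # Mobile gets a compact payload: one thumbnail, no review bodies.
    cents = product["price_cents"]
    return {
        "id": product["id"],
        "name": product["name"],
        "price": f"${cents // 100}.{cents % 100:02d}",
        "thumbnail": product["images"][0],
        "review_count": len(product["reviews"]),
    }
```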

Edge Gateway + Internal Gateway

An edge gateway handles external traffic (authentication, rate limiting, TLS termination). An internal gateway handles service-to-service routing within the cluster. This separation keeps the edge gateway lightweight and the internal gateway focused on routing.

Real-World API Gateways

Kong

Open-source, built on Nginx and OpenResty (Lua). Plugin-based architecture for adding authentication, rate limiting, logging, and more. Runs as a reverse proxy that stores its configuration in PostgreSQL (older versions also supported Cassandra) or runs DB-less from a declarative configuration file.

Strengths: Large plugin ecosystem, high performance, Kubernetes-native with Kong Ingress Controller. Weaknesses: Complex configuration at scale, plugin compatibility across versions.

AWS API Gateway

Fully managed service. Handles REST, HTTP, and WebSocket APIs. Integrates with Lambda, IAM, Cognito, and other AWS services.

Strengths: Zero ops, auto-scaling, native AWS integration, usage plans and API keys. Weaknesses: Cold start latency with Lambda integration, AWS vendor lock-in, cost at high throughput.

AWS API Gateway pricing (approximate):
  REST API: $3.50 per million requests + data transfer
  HTTP API: $1.00 per million requests (simpler, cheaper)

  At 1 billion requests/month (flat rate, before AWS's lower volume-tier rates):
    REST: ~$3,500/month
    HTTP: ~$1,000/month
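The estimates above are a flat per-million multiplication, which can be written out as follows. This uses only the approximate prices quoted in the text and, as an explicit simplification, ignores volume tiers, data transfer, and caching charges.

```python
# Approximate per-million-request prices quoted above (flat rate, no volume tiers).
PRICE_PER_MILLION = {"REST": 3.50, "HTTP": 1.00}


def monthly_request_cost(requests_per_month, api_type):
    """Flat-rate request cost in dollars; excludes data transfer and tier discounts."""
    return requests_per_month / 1_000_000 * PRICE_PER_MILLION[api_type]
```

At 1 billion requests this gives 1,000 × $3.50 = $3,500 for REST and 1,000 × $1.00 = $1,000 for HTTP, so the gap between the two API types grows linearly with traffic.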

Envoy (Service Mesh Sidecar)

Not a traditional API gateway, but increasingly used as one. Envoy is a high-performance proxy that can run at the edge or as a sidecar alongside each service. In a service mesh such as Istio, every service gets an Envoy sidecar, and traffic between services flows through these proxies.

Strengths: Rich observability, advanced traffic management (load balancing, circuit breaking, outlier detection), gRPC-native. Weaknesses: Operational complexity, resource overhead of running a sidecar per service.

NGINX

The most widely deployed reverse proxy. NGINX Plus adds commercial features like active health checks, JWT validation, and a REST API for dynamic configuration.

Traefik

Cloud-native reverse proxy with automatic service discovery. Integrates with Docker, Kubernetes, and Consul. Auto-configures routes based on container labels.

Trade-Offs

Gateway as a Bottleneck

Every request flows through the gateway. If it goes down, everything goes down. Mitigations:

Deploy multiple gateway instances behind a load balancer.
Use health checks and auto-scaling.
Keep the gateway stateless so any instance can handle any request.

Added Latency

The gateway adds a network hop. At the edge, this is typically 1-5ms. Mitigations:

Keep the gateway lightweight. Complex business logic belongs in services, not the gateway.
Deploy gateways close to services (same data center, same Kubernetes cluster).

Tight Coupling Risk

If the gateway contains routing rules for every service, changes to any service require gateway updates. Mitigations:

Use service discovery (Consul, Kubernetes DNS) for dynamic routing.
Let services register their own routes.
Use configuration-as-code and CI/CD for gateway changes.

Over-Aggregation

It is tempting to put complex aggregation logic in the gateway. This turns the gateway into a monolith that knows about every service's data model. Keep aggregation simple in the gateway; use dedicated BFF services for complex orchestration.

Gateway vs Service Mesh

Feature	API Gateway	Service Mesh (Istio/Linkerd)
Position	Edge (north-south traffic)	Internal (east-west traffic)
Auth	Client authentication	Mutual TLS between services
Routing	Path-based, external	Service-to-service, internal
Rate limiting	Per-client	Per-service
Observability	External request tracing	Internal request tracing
Implementation	Centralized proxy	Distributed sidecars

In practice, you often use both: an API gateway at the edge for external clients and a service mesh internally for service-to-service communication.

Interview Tips

Start with the problem. Explain why clients talking directly to microservices is problematic: multiple round trips, duplicated auth logic, protocol mismatch, tight coupling.
List the core responsibilities. Routing, authentication, rate limiting, and protocol translation are the essentials. Mention aggregation and caching as bonuses.
Discuss the BFF pattern. If the system has multiple client types (web, mobile, IoT), propose separate gateways. This shows architectural maturity.
Address the single point of failure. The gateway is critical infrastructure. Explain how you make it highly available (multiple instances, stateless design, health checks).
Keep it thin. Emphasize that the gateway should not contain business logic. It is a routing and cross-cutting concerns layer.
Name real technologies. Mentioning Kong, AWS API Gateway, or Envoy shows you have practical experience or awareness.
Connect to rate limiting algorithms. If the interviewer probes on rate limiting, discuss token bucket, sliding window counter, or leaky bucket algorithms.
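For the token bucket mentioned in the last tip, a minimal single-process sketch follows (the injectable `clock` is an assumption for testability; a distributed gateway would keep the bucket state in a shared store). The bucket holds up to `capacity` tokens and refills at `rate` tokens per second; each request spends one token, so short bursts up to `capacity` are allowed while the long-run rate stays capped at `rate`.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```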