Scalability Patterns
An API gateway is a single entry point for all client requests. It handles authentication, rate limiting, routing, and protocol translation, acting as the front door to your microservices.
In a microservices architecture, a client might need data from five different services to render a single page: user profiles from the User Service, order history from the Order Service, recommendations from the ML Service, notifications from the Notification Service, and settings from the Config Service. Without an API gateway, the client would need to know the address of each service, handle authentication with each one, deal with different protocols, and make multiple round trips. An API gateway is the single entry point that sits between clients and your microservices, handling cross-cutting concerns so that individual services do not have to.
Request routing is the gateway's most fundamental function. The gateway receives a client request and routes it to the appropriate backend service based on the URL path, HTTP method, headers, or other criteria.
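As a rough illustration of path- and method-based routing, the sketch below resolves a request to a backend by longest matching prefix. The `Route` type, the `resolve` function, and the service names are hypothetical and chosen only to mirror the examples in this section; real gateways usually express the same idea in configuration rather than code.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    method: str   # HTTP method to match
    prefix: str   # URL path prefix to match
    service: str  # hypothetical backend service name


# Hypothetical routing table, mirroring the examples in this section.
ROUTES = [
    Route("GET", "/api/users/", "user-service"),
    Route("POST", "/api/orders", "order-service"),
    Route("GET", "/api/products/search", "search-service"),
    Route("GET", "/api/feed", "feed-service"),
]


def resolve(method: str, path: str) -> Optional[str]:
    """Return the backend for a request, or None if no route matches."""
    candidates = [
        route for route in ROUTES
        if route.method == method and path.startswith(route.prefix)
    ]
    if not candidates:
        return None
    # Prefer the most specific (longest) matching prefix.
    return max(candidates, key=lambda route: len(route.prefix)).service
```

A request that matches no route returns `None`, which a gateway would typically turn into a 404 response.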
```
Client Request                  Gateway Routes To
────────────────────────────    ──────────────────────
GET  /api/users/42            → User Service
POST /api/orders              → Order Service
GET  /api/products/search     → Search Service
GET  /api/feed                → Feed Service
```

Instead of each microservice implementing its own auth, the gateway validates JWT tokens, API keys, or OAuth tokens at the edge. Services behind the gateway receive pre-authenticated requests with user identity in headers.
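To make the edge-authentication step concrete, here is a minimal sketch that verifies an HS256-signed JWT using only the Python standard library and returns the identity headers to forward. It assumes a shared secret and claims named `user_id`, `roles`, and `exp`; the function names are hypothetical, and a production gateway would more likely rely on a maintained JWT library and asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    padding = "=" * (-len(segment) % 4)  # JWT segments omit base64 padding
    return base64.urlsafe_b64decode(segment + padding)


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a demo token; in practice the identity provider issues it."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = _b64url_encode(_sign(f"{header}.{payload}", secret))
    return f"{header}.{payload}.{signature}"


def authenticate(authorization: str, secret: bytes) -> Optional[dict]:
    """Return identity headers to forward, or None if the token is rejected."""
    scheme, _, token = authorization.partition(" ")
    if scheme != "Bearer" or token.count(".") != 2:
        return None
    header_b64, payload_b64, sig_b64 = token.split(".")
    try:
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError:  # malformed base64 or JSON
        return None
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(signature, expected):
        return None  # signature mismatch
    if not isinstance(claims, dict) or claims.get("exp", 0) < time.time():
        return None  # missing or expired "exp" claim
    return {
        "X-User-Id": str(claims.get("user_id", "")),
        "X-User-Roles": ",".join(claims.get("roles", [])),
    }
```

Because backends trust these headers, the gateway should also strip any `X-User-Id` or `X-User-Roles` headers that arrive from the client before adding its own.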
```
Client → [Authorization: Bearer eyJhbGc...] → API Gateway
                                                   │
                                           Validate JWT token
                                           Extract user_id, roles
                                                   │
                                           Forward with headers:
                                             X-User-Id: 42
                                             X-User-Roles: admin,user
                                                   │
                                                   ▼
                                            Backend Service
                                        (trusts gateway headers)
```

Rate limiting protects backend services from being overwhelmed. The gateway enforces per-client, per-endpoint, or global rate limits.
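```python
import time

import redis  # redis-py client; assumes a Redis server reachable on localhost

r = redis.Redis()

# Rate limits per pricing tier
rate_limits = {
    "free_tier": {
        "requests_per_minute": 60,
        "requests_per_day": 10000,
    },
    "premium_tier": {
        "requests_per_minute": 600,
        "requests_per_day": 1000000,
    },
}


def current_minute():
    """Index of the current one-minute window since the Unix epoch."""
    return int(time.time() // 60)


# Fixed-window counter in Redis: one key per client per calendar minute.
# (A true sliding window would also weight the previous window's count.)
def check_rate_limit(client_id, tier):
    key = f"rate:{client_id}:{current_minute()}"
    count = r.incr(key)
    r.expire(key, 60)  # let stale window keys expire on their own
    limit = rate_limits[tier]["requests_per_minute"]
    if count > limit:
        return 429  # Too Many Requests
    return None  # Allowed
```

Clients speak HTTP/REST. Internal services might use gRPC, GraphQL, WebSockets, or message queues. The gateway translates between them.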
```
Client (REST/JSON) → API Gateway → gRPC Service
Client (REST/JSON) → API Gateway → GraphQL Service
Client (WebSocket) → API Gateway → Message Queue (Kafka)
```

A single client request might require data from multiple services. The gateway can fan out the request, call multiple services in parallel, merge the responses, and return a single response.
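The fan-out and merge can be sketched with `asyncio.gather`. The four `fetch_*` coroutines below are hypothetical stand-ins for HTTP calls to backend services, with `asyncio.sleep` simulating network latency; the response keys follow the dashboard example in this section.

```python
import asyncio


# Hypothetical stand-ins for HTTP calls to backend services.
async def fetch_profile(user_id: int) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"name": "Alice"}


async def fetch_recent_orders(user_id: int) -> list:
    await asyncio.sleep(0.01)
    return [{"order_id": 1001}]


async def fetch_unread_count(user_id: int) -> int:
    await asyncio.sleep(0.01)
    return 5


async def fetch_recommendations(user_id: int) -> list:
    await asyncio.sleep(0.01)
    return ["item-7", "item-9"]


async def get_dashboard(user_id: int) -> dict:
    # Issue all four calls concurrently; total latency is roughly the
    # slowest call rather than the sum of all calls.
    results = await asyncio.gather(
        fetch_profile(user_id),
        fetch_recent_orders(user_id),
        fetch_unread_count(user_id),
        fetch_recommendations(user_id),
        return_exceptions=True,
    )
    keys = ["profile", "recentOrders", "unreadNotifications", "recommendations"]
    # A failed backend yields None for its field instead of failing the page.
    return {
        key: (None if isinstance(value, Exception) else value)
        for key, value in zip(keys, results)
    }
```

Passing `return_exceptions=True` is one possible design choice: it lets the gateway return a partial dashboard when a single backend is unavailable.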
```
Client: GET /api/dashboard

Gateway:
  ├── GET user-service/profile/42          (parallel)
  ├── GET order-service/recent/42          (parallel)
  ├── GET notification-service/count/42    (parallel)
  └── GET recommendation-service/42        (parallel)

Gateway merges responses:
{
  "profile": { "name": "Alice", ... },
  "recentOrders": [...],
  "unreadNotifications": 5,
  "recommendations": [...]
}
```

This reduces the number of client-to-server round trips, which is especially important for mobile clients on high-latency networks.
The gateway can transform requests and responses to match what clients and services expect.
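As one possible illustration of such transformations, the sketch below renames camelCase client fields to snake_case on the way in and strips internal-only fields on the way out. The field names and the `INTERNAL_FIELDS` set are hypothetical; the actual transformations depend on what each client and service expects.

```python
import re

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")

# Hypothetical fields that must never leave the internal network.
INTERNAL_FIELDS = {"password_hash", "internal_notes"}


def camel_to_snake(name: str) -> str:
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def to_backend_request(client_body: dict) -> dict:
    """Rename top-level camelCase keys to the snake_case a backend expects."""
    return {camel_to_snake(key): value for key, value in client_body.items()}


def to_client_response(backend_body: dict) -> dict:
    """Drop internal-only fields before the response reaches the client."""
    return {
        key: value
        for key, value in backend_body.items()
        if key not in INTERNAL_FIELDS
    }
```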
The gateway can cache responses from backend services, reducing load and latency for frequently accessed, slowly changing data.
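A minimal in-memory sketch of time-to-live caching follows. The `TTLCache` class and its `get_or_fetch` method are hypothetical names; a real gateway would usually use a shared cache and honor HTTP cache headers, which this sketch ignores.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: entry has not expired yet
        value = fetch()      # cache miss: call the backend
        self._entries[key] = (now + self._ttl, value)
        return value
```

Usage would look like `cache.get_or_fetch("/api/products/popular", call_product_service)`, where `call_product_service` is whatever function performs the backend request.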
```
GET /api/products/popular
  → Cache hit (TTL: 5 minutes) → Return cached response (1ms)
  → Cache miss → Call Product Service → Cache response → Return (50ms)
```

The gateway distributes requests across multiple instances of a backend service using round-robin, least connections, or weighted algorithms.
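Two of these strategies can be sketched in a few lines. The class names are hypothetical, and the sketch leaves out health checks, weights, and thread safety, all of which a real load balancer needs.

```python
import itertools
from typing import Dict, Iterable


class RoundRobinBalancer:
    """Hand out instances in a fixed rotation."""

    def __init__(self, instances: Iterable[str]):
        self._cycle = itertools.cycle(list(instances))

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Pick the instance with the fewest in-flight requests."""

    def __init__(self, instances: Iterable[str]):
        self._active: Dict[str, int] = {name: 0 for name in instances}

    def acquire(self) -> str:
        # Ties go to the instance listed first.
        instance = min(self._active, key=self._active.get)
        self._active[instance] += 1
        return instance

    def release(self, instance: str) -> None:
        # Call when the request to this instance completes.
        self._active[instance] -= 1
```

Round-robin suits backends with uniform request cost, while least connections tends to behave better when some requests take much longer than others.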
Because every request passes through it, the gateway is a natural place for centralized logging, metrics, and distributed tracing. It assigns a trace ID to every request and propagates it through all downstream calls, making it easy to trace a request across services.
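Trace ID assignment can be sketched as follows. The `X-Trace-Id` header name is an assumption made for illustration (the W3C Trace Context standard uses a `traceparent` header with a richer format), and the sketch treats header names as case-sensitive for simplicity.

```python
import uuid

# Hypothetical header name; W3C Trace Context uses "traceparent" instead.
TRACE_HEADER = "X-Trace-Id"


def ensure_trace_id(headers: dict) -> dict:
    """Return a copy of the headers that is guaranteed to carry a trace ID."""
    outgoing = dict(headers)
    if TRACE_HEADER not in outgoing:
        # No upstream trace: this gateway starts a new one.
        outgoing[TRACE_HEADER] = uuid.uuid4().hex
    return outgoing
```

Each downstream service would copy the same header onto its own outbound calls and include the value in its log lines, so that logs from different services can be joined on it.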
In the single-gateway pattern, one gateway handles all API traffic. It is simple to manage but can become a bottleneck and a single point of failure.
In the Backend for Frontend (BFF) pattern, a separate gateway serves each client type (web, mobile, IoT). Each BFF is tailored to its client's needs, aggregating and transforming data differently.
```
Web Client  → Web BFF Gateway    → Microservices
Mobile App  → Mobile BFF Gateway → Microservices
IoT Devices → IoT BFF Gateway    → Microservices
```

The mobile BFF might return smaller payloads and fewer fields. The web BFF might return richer data with more nested objects. This avoids a one-size-fits-all API that satisfies no one.
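The difference between BFFs can be as simple as which fields each one returns. The sketch below projects the same backend record differently per client type; the field lists and the `shape_profile` function are hypothetical.

```python
# Hypothetical field lists per client type; real BFFs would define their own.
FIELDS_BY_CLIENT = {
    "mobile": ("id", "name", "avatar_url"),
    "web": ("id", "name", "avatar_url", "bio", "recent_orders"),
}


def shape_profile(client_type: str, profile: dict) -> dict:
    """Keep only the fields that the given client type should receive."""
    allowed = FIELDS_BY_CLIENT[client_type]
    return {field: profile[field] for field in allowed if field in profile}
```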
An edge gateway handles external traffic (authentication, rate limiting, TLS termination). An internal gateway handles service-to-service routing within the cluster. This separation keeps the edge gateway lightweight and the internal gateway focused on routing.
Kong is open-source and built on Nginx and OpenResty (Lua). Its plugin-based architecture supports adding authentication, rate limiting, logging, and more. It runs as a reverse proxy with a PostgreSQL or Cassandra datastore for configuration.
Strengths: Large plugin ecosystem, high performance, Kubernetes-native with Kong Ingress Controller. Weaknesses: Complex configuration at scale, plugin compatibility across versions.
AWS API Gateway is a fully managed service. It handles REST, HTTP, and WebSocket APIs and integrates with Lambda, IAM, Cognito, and other AWS services.
Strengths: Zero ops, auto-scaling, native AWS integration, usage plans and API keys. Weaknesses: Cold start latency with Lambda integration, AWS vendor lock-in, cost at high throughput.
```
AWS API Gateway pricing (approximate):
  REST API: $3.50 per million requests + data transfer
  HTTP API: $1.00 per million requests (simpler, cheaper)

At 1 billion requests/month:
  REST: ~$3,500/month
  HTTP: ~$1,000/month
```

Envoy is not a traditional API gateway, but it is increasingly used as one. It is a high-performance proxy that runs as a sidecar alongside each service. In a service mesh (Istio), every service gets an Envoy sidecar, and traffic between services flows through these proxies.
Strengths: Rich observability, advanced load balancing (circuit breaking, outlier detection), gRPC-native. Weaknesses: Operational complexity, resource overhead of running a sidecar per service.
NGINX is one of the most widely deployed reverse proxies. NGINX Plus adds commercial features like active health checks, JWT validation, and a REST API for dynamic configuration.
Traefik is a cloud-native reverse proxy with automatic service discovery. It integrates with Docker, Kubernetes, and Consul, and auto-configures routes based on container labels.
Every request flows through the gateway. If it goes down, everything goes down. Mitigations:
The gateway adds a network hop. At the edge, this is typically 1-5ms. Mitigations:
If the gateway contains routing rules for every service, changes to any service require gateway updates. Mitigations:
It is tempting to put complex aggregation logic in the gateway. This turns the gateway into a monolith that knows about every service's data model. Keep aggregation simple in the gateway; use dedicated BFF services for complex orchestration.
| Feature | API Gateway | Service Mesh (Istio/Linkerd) |
|---|---|---|
| Position | Edge (north-south traffic) | Internal (east-west traffic) |
| Auth | Client authentication | Mutual TLS between services |
| Routing | Path-based, external | Service-to-service, internal |
| Rate limiting | Per-client | Per-service |
| Observability | External request tracing | Internal request tracing |
| Implementation | Centralized proxy | Distributed sidecars |
In practice, you often use both: an API gateway at the edge for external clients and a service mesh internally for service-to-service communication.