Fundamentals
Understand when to scale up a single machine versus scaling out across many. Covers the trade-offs, bottlenecks, and when each approach makes sense.
Every system eventually hits a wall. Users grow, data accumulates, and the single server that worked fine for your first thousand users starts buckling under load. At that point you face a fundamental decision: do you make your existing machine bigger, or do you add more machines? This is the vertical vs horizontal scaling question, and it comes up in nearly every system design interview.
Vertical scaling means giving your existing server more resources — more CPU cores, more RAM, faster disks, better network cards. You take the same machine and make it beefier.
Vertical scaling is simple. Your application code does not change. You do not need to worry about distributing state across machines, coordinating between nodes, or handling network partitions. A single powerful machine means a single point of coordination: one database, one cache, one application process that can access all the data it needs through local memory.
For many early-stage systems, vertical scaling is the right answer. A modern server with 128 CPU cores, 2 TB of RAM, and NVMe storage can handle a surprising amount of work. Companies like Stack Overflow famously served millions of users from a small number of very powerful machines.
There is a hard ceiling on how big a single machine can get. You cannot buy a server with a million cores. As you approach the high end of available hardware, costs increase super-linearly — a machine with twice the resources often costs more than twice as much.
More critically, a single machine is a single point of failure. If it goes down, everything goes down. You get zero fault tolerance from vertical scaling alone.
Vertical Scaling Curve:
Performance ▲
│ ╭── Hardware ceiling
│ ╱
│ ╱
│ ╱
│ ╱
│╱
└──────────────────► Cost
(super-linear cost growth)Horizontal scaling means adding more machines to your system. Instead of one powerful server, you have ten, a hundred, or a thousand commodity servers working together.
Horizontal scaling has no theoretical ceiling. Need more capacity? Add more nodes. It also gives you fault tolerance — if one machine dies, the others keep serving traffic. Cloud providers make this especially easy: you can spin up new instances in seconds.
Horizontal scaling introduces significant architectural complexity:
The key insight for horizontal scaling is to make your services stateless. A stateless service stores no local data between requests — all state lives in an external store (database, cache, object storage). When a service is stateless, any instance can handle any request, and you can add or remove instances freely.
# Stateful — hard to scale horizontally
class SessionServer:
def __init__(self):
self.sessions = {} # Local state — tied to this machine
def handle_request(self, user_id, request):
session = self.sessions.get(user_id) # Only works if same server
return process(session, request)
# Stateless — easy to scale horizontally
class StatelessServer:
def __init__(self, redis_client):
self.redis = redis_client # External state store
def handle_request(self, user_id, request):
session = self.redis.get(f"session:{user_id}") # Any server can read this
return process(session, request)┌──────────────────────┬────────────────────────┬────────────────────────┐
│ Dimension │ Vertical │ Horizontal │
├──────────────────────┼────────────────────────┼────────────────────────┤
│ Complexity │ Low │ High │
│ Cost curve │ Super-linear │ Near-linear │
│ Upper limit │ Hardware ceiling │ Practically unlimited │
│ Fault tolerance │ None (single machine) │ Built-in redundancy │
│ Downtime to scale │ Often yes (reboot) │ No (add nodes live) │
│ Data consistency │ Simple (one node) │ Requires coordination │
│ Network overhead │ None (local) │ Significant │
│ Best for │ Early stage, databases │ Web servers, stateless │
└──────────────────────┴────────────────────────┴────────────────────────┘Stateless application servers are straightforward to scale horizontally. Databases are not. Your database is inherently stateful, and scaling it out requires one of several strategies:
Read replicas: Replicate data to follower databases that handle read queries. The leader handles all writes. This works when your workload is read-heavy (most web applications).
Sharding: Split your data across multiple databases, each responsible for a subset of the data. This is the path to truly horizontal database scaling, but it introduces complexity around cross-shard queries, rebalancing, and application-level routing.
Vertical scaling first: Many experienced engineers advocate scaling your database vertically as long as possible. A single, powerful database machine is far simpler to operate than a sharded cluster. Amazon RDS instances go up to 128 vCPUs and 1 TB of RAM — that handles a lot of queries.
Typical scaling journey:
1. Single server (app + DB)
2. Separate app and DB servers
3. Add read replicas for the DB
4. Scale app servers horizontally (stateless)
5. Add caching layer (Redis/Memcached)
6. Shard the database (only when necessary)In cloud environments, horizontal scaling can be automated. Auto-scaling groups monitor metrics like CPU utilization, request queue depth, or custom application metrics, and add or remove instances accordingly.
Key considerations for auto-scaling:
# Simplified auto-scaling logic
def evaluate_scaling(metrics, config):
avg_cpu = metrics.get_avg_cpu(window_minutes=5)
current_instances = metrics.get_instance_count()
if avg_cpu > config.scale_up_threshold: # e.g., 70%
desired = min(current_instances + config.scale_up_step,
config.max_instances)
elif avg_cpu < config.scale_down_threshold: # e.g., 30%
desired = max(current_instances - config.scale_down_step,
config.min_instances)
else:
desired = current_instances
return desiredNetflix: Horizontal scaling with thousands of stateless microservices on AWS. Each service auto-scales independently. Data is sharded across Cassandra clusters.
Stack Overflow: Primarily vertical scaling. Serves hundreds of millions of page views with a handful of powerful servers. They chose simplicity over distributed complexity.
Slack: Hybrid approach. Application servers scale horizontally, but they invested heavily in scaling their MySQL databases vertically before eventually sharding.
Start vertical when:
Go horizontal when:
When discussing scaling in an interview, demonstrate that you understand the nuance:
The interviewer wants to see that you can reason about trade-offs, not that you always reach for the most complex solution. Sometimes the best answer is a bigger machine.