Design YouTube or Netflix. Covers video transcoding, adaptive bitrate streaming, CDN delivery, view counting at scale, and recommendations.
Designing YouTube or Netflix is a classic system design question that covers the full spectrum: large file uploads, compute-heavy transcoding, CDN-based delivery, adaptive bitrate streaming, and the challenge of counting views at massive scale. The system has two very different workloads -- a write-heavy upload pipeline and a read-heavy streaming path -- and they must be designed separately.
Assumptions:
- 500 hours of video uploaded per minute
- Average video: 10 minutes, 1 GB raw → 500 MB after compression
- Upload storage: 500 hr/min × 60 min × 24 hr = 720,000 hours/day
  = 720,000 × 6 × 500 MB (6 ten-minute videos per hour) = 2.16 PB/day
  (one compressed copy; storing several renditions adds to this)
- Streaming: 1 billion video views/day
  Average view: 5 minutes at 5 Mbps ≈ 187 MB
  Total bandwidth: 1B × 187 MB = 187 PB/day ≈ 17 Tbps on average
- CDN: Must serve 17 Tbps globally. This requires thousands of edge servers.

The upload pipeline is a multi-stage, asynchronous process. A user uploads a raw video, and the system must transcode it, generate thumbnails, extract metadata, and make it available for streaming.
Upload Pipeline:
User Upload → Upload Service → Object Storage (raw video)
                                     │
                                     ▼
                              ┌──────────────┐
                              │ Message Queue │ (Kafka / SQS)
                              └──────┬───────┘
                                     │
                        ┌────────────┼────────────┐
                        ▼            ▼            ▼
                   ┌──────────┐ ┌──────────┐ ┌──────────┐
                   │Transcoder│ │Thumbnail │ │ Metadata │
                   │ Workers  │ │Generator │ │ Extractor│
                   └────┬─────┘ └────┬─────┘ └────┬─────┘
                        │            │            │
                        ▼            ▼            ▼
                   CDN Origin   Thumbnail    Video DB
                   (transcoded  Storage      (title, tags,
                   segments)                 duration, etc.)

Large video files (multi-GB) must be uploaded in chunks for reliability. The client splits the file into chunks (e.g., 5 MB each) and uploads them sequentially or in parallel. The server tracks which chunks have been received and reassembles them.
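The server-side bookkeeping described above can be sketched roughly as follows. This is a minimal in-memory illustration, not a production design: `UploadSession` and its methods are hypothetical names, chunks are assumed to be numbered from 0, and a real service would write each chunk to object storage rather than hold it in memory.

```python
import math


class UploadSession:
    """Tracks which chunks of one upload have arrived (in-memory sketch)."""

    def __init__(self, upload_id, size, chunk_size=5 * 1024 * 1024):
        self.upload_id = upload_id
        self.size = size
        self.chunk_size = chunk_size
        # The last chunk may be shorter, so round up.
        self.total_chunks = math.ceil(size / chunk_size)
        self.chunks = {}  # chunk_number -> bytes

    def put_chunk(self, chunk_number, data):
        if not 0 <= chunk_number < self.total_chunks:
            raise ValueError(f"chunk {chunk_number} out of range")
        # Re-sending a chunk overwrites it, so client retries are safe.
        self.chunks[chunk_number] = data

    def missing_chunks(self):
        return [n for n in range(self.total_chunks) if n not in self.chunks]

    def assemble(self):
        missing = self.missing_chunks()
        if missing:
            raise RuntimeError(f"cannot complete upload, missing chunks: {missing}")
        return b"".join(self.chunks[n] for n in range(self.total_chunks))


# Chunks may arrive out of order when the client uploads in parallel.
session = UploadSession("abc123", size=12, chunk_size=5)
session.put_chunk(2, b"ld")
session.put_chunk(0, b"hello")
still_missing = session.missing_chunks()
session.put_chunk(1, b" wor")
assembled = session.assemble()
```

Because the session knows exactly which chunk numbers are missing, a client that loses its connection can resume by re-sending only those chunks instead of restarting the whole upload.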
# Upload API
POST /upload/init
  Request:  { filename: "video.mp4", size: 2147483648, content_type: "video/mp4" }
  Response: { upload_id: "abc123", chunk_size: 5242880, total_chunks: 410 }

PUT /upload/{upload_id}/chunk/{chunk_number}
  Request:  [binary chunk data]
  Response: { received: true }

POST /upload/{upload_id}/complete
  Response: { video_id: "v_98765", status: "processing" }

Raw uploaded videos come in many formats (MP4, AVI, MOV) and resolutions. The transcoding pipeline converts each video into multiple formats and resolutions for adaptive streaming.
Input: raw_video.mp4 (2160p, H.264, 2 GB)
Transcoding outputs:
├── 2160p (4K) @ 15 Mbps  → H.264/H.265
├── 1080p      @ 5 Mbps   → H.264
├── 720p       @ 2.5 Mbps → H.264
├── 480p       @ 1 Mbps   → H.264
├── 360p       @ 0.5 Mbps → H.264
└── 240p       @ 0.3 Mbps → H.264
Each output is split into segments (2-10 seconds each) for chunked streaming.

Transcoding is compute-intensive. A single 10-minute video at 6 resolutions can take 30-60 minutes of CPU time. The work is done by a pool of worker machines (often GPU-accelerated) that pull jobs from a message queue.
Parallel transcoding: Each resolution can be transcoded independently and in parallel. Additionally, the video can be split into time segments, and each segment can be transcoded independently. This massively reduces wall-clock transcoding time.
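A rough sketch of how that fan-out might be planned: one independent job per (rendition, time chunk) pair. The names `TranscodeJob`, `plan_jobs`, and `transcode_segment` are hypothetical, the 60-second chunk length is an arbitrary assumption, and the encoder call is replaced by a stub that only returns an output key. A local thread pool stands in for the message queue and worker fleet.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import product

# Rendition ladder from the text: (name, bitrate in kbps).
RENDITIONS = [("2160p", 15000), ("1080p", 5000), ("720p", 2500),
              ("480p", 1000), ("360p", 500), ("240p", 300)]


@dataclass(frozen=True)
class TranscodeJob:
    video_id: str
    rendition: str
    bitrate_kbps: int
    start_sec: int
    end_sec: int


def plan_jobs(video_id, duration_sec, chunk_sec=60):
    """One job per (rendition, time chunk); every job is independent."""
    chunks = [(s, min(s + chunk_sec, duration_sec))
              for s in range(0, duration_sec, chunk_sec)]
    return [TranscodeJob(video_id, name, kbps, start, end)
            for (name, kbps), (start, end) in product(RENDITIONS, chunks)]


def transcode_segment(job):
    """Hypothetical stand-in for the real encoder call; returns the output key."""
    return f"{job.video_id}/{job.rendition}/{job.start_sec:06d}-{job.end_sec:06d}.mp4"


def run_all(jobs, workers=8):
    # In production the jobs would be published to the message queue; a local
    # pool is used here only to show that they can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transcode_segment, jobs))


jobs = plan_jobs("v_98765", duration_sec=600)
outputs = run_all(jobs)
```

For a 10-minute video this produces 60 small jobs instead of one long one, so wall-clock time is bounded by the slowest chunk rather than by the whole video. In practice the chunk boundaries would need to fall on keyframes so the encoded pieces can be stitched back together cleanly.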
Extract frames at regular intervals (every 10 seconds) and at scene transitions. Generate thumbnails at multiple sizes for different contexts (search results, video player, mobile).
# Thumbnail generation (simplified)
# get_video_duration, extract_frame, resize, and upload_to_storage are
# placeholder helpers, not a specific library's API.
def generate_thumbnails(video_id, video_path, interval_sec=10):
    thumbnails = []
    duration = get_video_duration(video_path)
    for timestamp in range(0, int(duration), interval_sec):
        # Decode the frame once, then resize it for each target size
        frame = extract_frame(video_path, timestamp)
        for width, height in [(320, 180), (640, 360), (1280, 720)]:
            thumb = resize(frame, (width, height))
            key = f"{video_id}_{timestamp}_{width}x{height}.jpg"
            url = upload_to_storage(thumb, key)
            thumbnails.append({"timestamp": timestamp, "size": (width, height), "url": url})
    return thumbnails

Modern video streaming does not download the entire file. Instead, it uses adaptive bitrate streaming protocols: HLS (HTTP Live Streaming, Apple) or DASH (Dynamic Adaptive Streaming over HTTP, open standard).
Manifest file (simplified HLS .m3u8):
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
480p/playlist.m3u8
--- 1080p/playlist.m3u8 ---
#EXTM3U
#EXT-X-TARGETDURATION:4
#EXTINF:4.0,
segment_001.ts
#EXTINF:4.0,
segment_002.ts
#EXTINF:4.0,
segment_003.ts

The player continuously measures download throughput. If the user's bandwidth drops (e.g., they switch from Wi-Fi to cellular), the player requests the next segment at a lower quality. If bandwidth improves, it switches back up. Because switches happen at segment boundaries, the user typically sees a brief quality change rather than a buffering spinner.
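The switching decision can be sketched as below, assuming a purely throughput-based policy. `ThroughputEstimator`, `choose_rendition`, the smoothing factor, and the 0.8 safety margin are illustrative choices, not part of HLS or DASH; real players usually also take the current buffer level into account.

```python
# Bitrate ladder in bits per second, ordered highest first.
LADDER = [("1080p", 5_000_000), ("720p", 2_500_000), ("480p", 1_000_000)]


class ThroughputEstimator:
    """Exponentially weighted moving average of measured download speed."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.estimate_bps = None

    def add_sample(self, bytes_downloaded, seconds):
        sample = bytes_downloaded * 8 / seconds
        if self.estimate_bps is None:
            self.estimate_bps = sample
        else:
            self.estimate_bps = (self.alpha * sample
                                 + (1 - self.alpha) * self.estimate_bps)
        return self.estimate_bps


def choose_rendition(estimate_bps, safety=0.8):
    """Highest rendition whose bitrate fits within a margin of the estimate."""
    budget = estimate_bps * safety
    for name, bitrate in LADDER:
        if bitrate <= budget:
            return name
    return LADDER[-1][0]  # nothing fits: fall back to the lowest rung


estimator = ThroughputEstimator()
first = estimator.add_sample(bytes_downloaded=2_500_000, seconds=4)   # 5 Mbps
second = estimator.add_sample(bytes_downloaded=500_000, seconds=4)    # 1 Mbps
next_quality = choose_rendition(second)
```

Smoothing the estimate keeps one slow segment from forcing an immediate drop to the lowest rung, and the safety margin leaves headroom so the chosen rendition can download faster than it plays back.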
Time:  0s     4s     8s     12s    16s    20s    24s
       1080p  1080p  720p   480p   720p   1080p  1080p
                     ↑                    ↑
              Bandwidth drop       Bandwidth recovers

At 17 Tbps of streaming bandwidth, serving videos from a single data center is impossible. A Content Delivery Network (CDN) caches video segments on edge servers close to users worldwide.
User in Tokyo → Tokyo Edge Server (cache hit) → Stream video
                      │ (cache miss)
                      ▼
                Regional PoP (Japan)
                      │ (cache miss)
                      ▼
                Origin Server (US)

Large platforms often use multiple CDN providers (for example Akamai, CloudFront, or an in-house CDN such as Netflix's Open Connect) and route users to the best CDN based on:
Counting views seems simple, but at 1 billion views/day, a naive `UPDATE videos SET views = views + 1` approach creates a massive write bottleneck: every view of a popular video contends for the same database row.
1. User watches a video → View event published to Kafka
2. Kafka topic partitioned by video_id
3. View Counter Service consumes events in batches
4. Aggregates counts over a time window (e.g., 1 minute)
5. Batch updates the database: UPDATE videos SET views = views + 12847 WHERE id = 'v_001'

import json

class ViewCounter:
    def __init__(self, redis_client, kafka_producer):
        self.redis = redis_client
        self.kafka = kafka_producer

    def record_view(self, video_id, user_id, timestamp):
        # Fast path: increment Redis counter for display
        self.redis.incr(f"views:{video_id}")
        # Durable path: publish to Kafka for exact counting
        event = {"video_id": video_id, "user_id": user_id, "ts": timestamp}
        # Serialize first: assuming a confluent-kafka style producer,
        # which accepts str or bytes values rather than dicts
        self.kafka.produce("video-views", key=video_id, value=json.dumps(event))

Not all views are legitimate. Bots, click farms, and auto-refreshing pages inflate view counts. Filter fraudulent views by:
The recommendation system determines what videos appear on the homepage and in the "Up Next" sidebar. This is a complex ML system, but at the architecture level:
User Request → Recommendation Service
                        │
           ┌────────────┼────────────┐
           ▼            ▼            ▼
       Candidate    Ranking      Filtering
       Generation   Model        (age-gate,
       (1000s of    (score each  region block,
       candidates)  candidate)   etc.)
           │            │            │
           └────────────┼────────────┘
                        ▼
              Top N recommendations

Candidate generation uses collaborative filtering ("users who watched X also watched Y") and content-based similarity. Ranking uses a neural network that considers watch history, engagement signals, video freshness, and creator quality. The results are cached per user and refreshed periodically.
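To make the stages concrete, here is a toy sketch on hypothetical data. It chains the stages one after another (generate, filter, rank), which is one common arrangement and an assumption here. The co-watch count used for ranking is a deliberately simple stand-in for the neural ranking model, and `WATCH_HISTORY`, `BLOCKED`, and all function names are invented for illustration.

```python
from collections import Counter

# Hypothetical toy data: watch history per user and blocked videos per region.
WATCH_HISTORY = {
    "alice": ["v1", "v2", "v3"],
    "bob": ["v2", "v3", "v4"],
    "carol": ["v3", "v4", "v5"],
}
BLOCKED = {"DE": {"v4"}}


def generate_candidates(user):
    """Simplest collaborative filtering: count unseen videos watched by
    users who share at least one video with `user`."""
    seen = set(WATCH_HISTORY[user])
    scores = Counter()
    for other, videos in WATCH_HISTORY.items():
        if other == user or not seen & set(videos):
            continue
        for video in videos:
            if video not in seen:
                scores[video] += 1
    return scores


def filter_candidates(scores, region):
    """Drop candidates the user is not allowed to see in this region."""
    blocked = BLOCKED.get(region, set())
    return {v: s for v, s in scores.items() if v not in blocked}


def rank(scores, top_n):
    # Stand-in for the ranking model: sort by co-watch count, then by id.
    return sorted(scores, key=lambda v: (-scores[v], v))[:top_n]


def recommend(user, region, top_n=2):
    return rank(filter_candidates(generate_candidates(user), region), top_n)


us_recs = recommend("alice", "US")
de_recs = recommend("alice", "DE")
```

Keeping candidate generation cheap and broad, and reserving the expensive model for the ranking step, is what lets the pipeline narrow a very large catalog to a handful of results within a request's latency budget.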
              ┌──────────────┐
 Clients ────>│ Load Balancer│
              └──────┬───────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
   ┌──────────┐ ┌──────────┐ ┌──────────┐
   │  Upload  │ │ Streaming│ │   API    │
   │ Service  │ │ Service  │ │ Service  │
   └────┬─────┘ └────┬─────┘ └────┬─────┘
        │            │            │
        ▼            ▼            ▼
   ┌──────────┐ ┌──────────┐ ┌──────────┐
   │  Object  │ │   CDN    │ │ Video DB │
   │ Storage  │ │ (global) │ │ + Redis  │
   │  (raw)   │ │          │ │          │
   └──────────┘ └──────────┘ └──────────┘
        │
        ▼
   ┌──────────┐
   │Transcoder│
   │ Workers  │
   └──────────┘