Video Streaming Platform

Design YouTube or Netflix. Covers video transcoding, adaptive bitrate streaming, CDN delivery, view counting at scale, and recommendations.

Designing a Video Streaming Platform

Designing YouTube or Netflix is a classic system design question that covers the full spectrum: large file uploads, compute-heavy transcoding, CDN-based delivery, adaptive bitrate streaming, and the challenge of counting views at massive scale. The system has two very different workloads -- a write-heavy upload pipeline and a read-heavy streaming path -- and they must be designed separately.

Requirements

Functional Requirements

  • Users can upload videos (up to 10 GB, multiple formats).
  • Users can stream videos with smooth playback and minimal buffering.
  • Adaptive bitrate: video quality adjusts to the viewer's network speed.
  • Thumbnail generation for video previews.
  • View counts, likes, and comments.
  • Search and discovery (recommendation feed).

Non-Functional Requirements

  • Availability: 99.99% uptime. Video playback must not be interrupted.
  • Latency: Video should start playing within 2 seconds (time to first frame).
  • Scale: Support 2 billion monthly active users, 500 hours of video uploaded per minute (YouTube scale).
  • Global delivery: Low-latency streaming for users worldwide.

Capacity Estimation

Assumptions:
  - 500 hours of video uploaded per minute
  - Average video: 10 minutes, 1 GB raw → 500 MB after compression
  - Upload volume: 500 hr/min × 60 min × 24 hr = 720,000 hours/day
    = 720,000 hr × 6 videos/hr = 4.32 million videos/day
  - Upload storage: 4.32M videos × 500 MB = 2.16 PB/day
    (one compressed copy per video; storing all 6 resolutions adds to this)
  - Streaming: 1 billion video views/day
    Average view: 5 minutes at 5 Mbps ≈ 187 MB
    Total bandwidth: 1B × 187 MB = 187 PB/day ≈ 17 Tbps

CDN: Must serve 17 Tbps globally. This requires thousands of edge servers.
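
The arithmetic above can be checked with a few lines of Python. This is a back-of-envelope sketch in decimal units (1 PB = 10^15 bytes); the variable names are illustrative, and the inputs are the assumptions listed above.

```python
# Back-of-envelope capacity check (decimal units: 1 MB = 1e6 bytes, 1 PB = 1e15 bytes).
UPLOAD_HOURS_PER_MIN = 500
VIDEO_MINUTES = 10                 # average video length
COMPRESSED_MB_PER_VIDEO = 500
VIEWS_PER_DAY = 1_000_000_000
VIEW_SECONDS = 5 * 60              # average view: 5 minutes
VIEW_BITRATE_MBPS = 5

hours_per_day = UPLOAD_HOURS_PER_MIN * 60 * 24
videos_per_day = hours_per_day * 60 // VIDEO_MINUTES
storage_pb_per_day = videos_per_day * COMPRESSED_MB_PER_VIDEO * 1e6 / 1e15

mb_per_view = VIEW_SECONDS * VIEW_BITRATE_MBPS / 8   # megabits -> megabytes
egress_pb_per_day = VIEWS_PER_DAY * mb_per_view * 1e6 / 1e15
egress_tbps = egress_pb_per_day * 1e15 * 8 / 86_400 / 1e12

print(f"{hours_per_day:,} hours/day, {videos_per_day:,} videos/day")
print(f"storage: {storage_pb_per_day:.2f} PB/day")
print(f"egress: {egress_pb_per_day:.1f} PB/day, {egress_tbps:.1f} Tbps on average")
```

The Tbps figure is a daily average; peak-hour traffic will be higher, so the CDN has to be provisioned above it.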

Video Upload Pipeline

The upload pipeline is a multi-stage, asynchronous process. A user uploads a raw video, and the system must transcode it, generate thumbnails, extract metadata, and make it available for streaming.

Upload Pipeline:

  User Upload → Upload Service → Object Storage (raw video)
                     │
                     ▼
              ┌──────────────┐
              │ Message Queue │  (Kafka / SQS)
              └──────┬───────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
  ┌──────────┐ ┌──────────┐ ┌──────────┐
  │Transcoder│ │Thumbnail │ │ Metadata │
  │ Workers  │ │Generator │ │ Extractor│
  └────┬─────┘ └────┬─────┘ └────┬─────┘
       │             │             │
       ▼             ▼             ▼
  CDN Origin    Thumbnail     Video DB
  (transcoded   Storage       (title, tags,
   segments)                   duration, etc.)

Chunked Upload

Large video files (multi-GB) must be uploaded in chunks for reliability. The client splits the file into chunks (e.g., 5 MB each) and uploads them sequentially or in parallel. The server tracks which chunks have been received and reassembles them.

# Upload API
POST /upload/init
  Request:  { filename: "video.mp4", size: 2147483648, content_type: "video/mp4" }
  Response: { upload_id: "abc123", chunk_size: 5242880, total_chunks: 410 }

PUT /upload/{upload_id}/chunk/{chunk_number}
  Request: [binary chunk data]
  Response: { received: true }

POST /upload/{upload_id}/complete
  Response: { video_id: "v_98765", status: "processing" }
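
On the server side, the chunk bookkeeping behind this API might look like the following. It is a minimal in-memory sketch, not a definitive implementation: the `UploadSession` class, its method names, and the 0-based chunk numbering are assumptions, and a real service would persist both the chunk data and the received set.

```python
import math


class UploadSession:
    """Tracks which chunks of one upload have arrived (in-memory sketch)."""

    def __init__(self, size, chunk_size=5 * 1024 * 1024):
        self.size = size
        self.chunk_size = chunk_size
        self.total_chunks = math.ceil(size / chunk_size)
        self.received = set()

    def expected_length(self, chunk_number):
        """Every chunk is chunk_size bytes except, possibly, the last one."""
        if chunk_number == self.total_chunks - 1:
            return self.size - chunk_number * self.chunk_size
        return self.chunk_size

    def put_chunk(self, chunk_number, data):
        if not 0 <= chunk_number < self.total_chunks:
            raise ValueError("chunk_number out of range")
        if len(data) != self.expected_length(chunk_number):
            raise ValueError("unexpected chunk length")
        # A real service would write `data` to object storage here.
        self.received.add(chunk_number)  # re-sending a chunk is harmless

    def missing_chunks(self):
        return [n for n in range(self.total_chunks) if n not in self.received]

    def is_complete(self):
        return not self.missing_chunks()


session = UploadSession(size=2_147_483_648)
print(session.total_chunks)          # 410, matching the init response above
print(session.expected_length(409))  # 3145728 bytes in the final chunk
```

Because `put_chunk` is idempotent, a client that loses its connection can ask for `missing_chunks()` and resend only those, in any order.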

Transcoding

Raw uploaded videos come in many formats (MP4, AVI, MOV) and resolutions. The transcoding pipeline converts each video into multiple formats and resolutions for adaptive streaming.

Input: raw_video.mp4 (2160p, H.264, 2 GB)

Transcoding outputs:
  ├── 2160p (4K)  @ 15 Mbps  → H.264/H.265
  ├── 1080p       @ 5 Mbps   → H.264
  ├── 720p        @ 2.5 Mbps → H.264
  ├── 480p        @ 1 Mbps   → H.264
  ├── 360p        @ 0.5 Mbps → H.264
  └── 240p        @ 0.3 Mbps → H.264

Each output is split into segments (2-10 seconds each) for chunked streaming.

Transcoding is compute-intensive. A single 10-minute video at 6 resolutions can take 30-60 minutes of CPU time. The work is done by a pool of worker machines (often with GPU or other hardware acceleration) that pull jobs from a message queue.

Parallel transcoding: Each resolution can be transcoded independently and in parallel. Additionally, the video can be split into time segments, and each segment can be transcoded independently. This massively reduces wall-clock transcoding time.
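
One way to picture this fan-out is a planner that emits one job per (resolution, time chunk) pair. The sketch below is illustrative only: `TranscodeJob`, `plan_jobs`, and the 60-second chunk length are assumptions, and in practice chunk boundaries need to fall on keyframes so that each piece can be decoded on its own.

```python
from dataclasses import dataclass
from itertools import product

# Rendition ladder from the transcoding outputs above: (name, bitrate in kbps).
LADDER = [("2160p", 15000), ("1080p", 5000), ("720p", 2500),
          ("480p", 1000), ("360p", 500), ("240p", 300)]


@dataclass(frozen=True)
class TranscodeJob:
    video_id: str
    rendition: str
    bitrate_kbps: int
    start_sec: float
    end_sec: float


def plan_jobs(video_id, duration_sec, chunk_sec=60.0):
    """Return one independent job per (rendition, time chunk)."""
    starts = []
    t = 0.0
    while t < duration_sec:
        starts.append(t)
        t += chunk_sec
    return [
        TranscodeJob(video_id, name, kbps, start, min(start + chunk_sec, duration_sec))
        for (name, kbps), start in product(LADDER, starts)
    ]


jobs = plan_jobs("v_98765", duration_sec=600)
print(len(jobs))  # 6 renditions x 10 chunks = 60 jobs
```

Each job can be published to the message queue separately, so wall-clock time approaches the time of the slowest single job rather than the sum of all of them.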

Thumbnail Generation

Extract frames at regular intervals (every 10 seconds) and at scene transitions. Generate thumbnails at multiple sizes for different contexts (search results, video player, mobile).

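# Thumbnail generation (simplified).
# get_video_duration, extract_frame, resize, and upload_to_storage are
# placeholder helpers assumed to be provided elsewhere.
THUMBNAIL_SIZES = [(320, 180), (640, 360), (1280, 720)]

def generate_thumbnails(video_id, video_path, interval_sec=10):
    thumbnails = []
    duration = get_video_duration(video_path)
    for timestamp in range(0, int(duration), interval_sec):
        frame = extract_frame(video_path, timestamp)
        for width, height in THUMBNAIL_SIZES:
            thumb = resize(frame, (width, height))
            # Build a storage key such as "v_98765_30_640x360.jpg"
            key = f"{video_id}_{timestamp}_{width}x{height}.jpg"
            url = upload_to_storage(thumb, key)
            thumbnails.append({"timestamp": timestamp, "size": (width, height), "url": url})
    return thumbnails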

Adaptive Bitrate Streaming (HLS/DASH)

Modern video streaming does not download the entire file. Instead, it uses adaptive bitrate streaming protocols: HLS (HTTP Live Streaming, Apple) or DASH (Dynamic Adaptive Streaming over HTTP, open standard).

How It Works

  1. The video is split into small segments (2-10 seconds each) at multiple quality levels.
  2. A manifest file (HLS: .m3u8, DASH: .mpd) lists all available segments and quality levels.
  3. The player downloads the manifest, then requests segments one at a time.
  4. Based on current bandwidth, the player chooses the appropriate quality level for each segment.
Manifest file (simplified HLS .m3u8):

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
480p/playlist.m3u8

--- 1080p/playlist.m3u8 ---
#EXTM3U
#EXT-X-TARGETDURATION:4
#EXTINF:4.0,
segment_001.ts
#EXTINF:4.0,
segment_002.ts
#EXTINF:4.0,
segment_003.ts
#EXT-X-ENDLIST
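
Manifests like these are plain text, so the packaging step can generate them with simple string building. The following is a rough sketch under stated assumptions: the function names and the `segment_NNN.ts` naming are illustrative, and only a handful of standard HLS tags are emitted.

```python
import math


def master_playlist(variants):
    """variants: iterable of (bandwidth_bps, (width, height), playlist_uri)."""
    lines = ["#EXTM3U"]
    # List the highest-bandwidth variant first.
    for bandwidth, (width, height), uri in sorted(variants, reverse=True):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(uri)
    return "\n".join(lines) + "\n"


def media_playlist(segment_durations):
    """segment_durations: segment lengths in seconds, in playback order."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(max(segment_durations))}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for index, duration in enumerate(segment_durations, start=1):
        lines.append(f"#EXTINF:{duration:.1f},")
        lines.append(f"segment_{index:03d}.ts")
    lines.append("#EXT-X-ENDLIST")  # marks the playlist as complete (VOD)
    return "\n".join(lines) + "\n"


print(master_playlist([
    (5_000_000, (1920, 1080), "1080p/playlist.m3u8"),
    (2_500_000, (1280, 720), "720p/playlist.m3u8"),
    (1_000_000, (854, 480), "480p/playlist.m3u8"),
]))
print(media_playlist([4.0, 4.0, 4.0]))
```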

Adaptive Quality Switching

The player continuously measures download speed. If the user's bandwidth drops (e.g., they switch from Wi-Fi to cellular), the player switches to a lower quality for the next segment. If bandwidth improves, it switches back up. Because the switch happens at a segment boundary and the player keeps a few seconds of video buffered, the user typically sees a brief quality change rather than a buffering spinner.

Time:     0s     4s     8s     12s    16s    20s    24s
Quality:  1080p  1080p  720p   480p   720p   1080p  1080p
                        ↑             ↑
                        Bandwidth     Bandwidth
                        drop          recovers
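
A simple throughput-based selection rule reproduces this timeline. The sketch below is one possible heuristic, not what any particular player does: `choose_rendition`, the 0.8 safety factor, and the sample throughput values are assumptions, and real players usually also take the buffer level into account.

```python
# Bitrate ladder from the transcoding section, in bits per second.
LADDER = {
    "2160p": 15_000_000, "1080p": 5_000_000, "720p": 2_500_000,
    "480p": 1_000_000, "360p": 500_000, "240p": 300_000,
}


def choose_rendition(throughput_bps, ladder=LADDER, safety=0.8):
    """Pick the highest bitrate that fits within a fraction of measured throughput."""
    budget = throughput_bps * safety
    affordable = [(bps, name) for name, bps in ladder.items() if bps <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest rendition rather than stalling.
        return min(ladder, key=ladder.get)
    return max(affordable)[1]


# Hypothetical per-segment throughput measurements (bits per second).
measured = [8_000_000, 8_000_000, 3_500_000, 1_500_000,
            3_500_000, 8_000_000, 8_000_000]
print([choose_rendition(t) for t in measured])
# ['1080p', '1080p', '720p', '480p', '720p', '1080p', '1080p']
```

The safety factor leaves headroom so that a small dip in throughput does not immediately drain the buffer.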

CDN Delivery

At 17 Tbps of streaming bandwidth, serving videos from a single data center is impossible. A Content Delivery Network (CDN) caches video segments on edge servers close to users worldwide.

CDN Architecture

User in Tokyo → Tokyo Edge Server (cache hit) → Stream video
                     │ (cache miss)
                     ▼
              Regional PoP (Japan)
                     │ (cache miss)
                     ▼
              Origin Server (US)
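
The lookup chain in the diagram can be sketched as caches that fall back to a parent tier on a miss and fill themselves on the way back. This is a toy model with unbounded in-memory dictionaries and no eviction or expiry; the class names and the second edge location are hypothetical.

```python
class Origin:
    """Authoritative store; stands in for the origin server."""

    def __init__(self, objects):
        self.objects = objects

    def get(self, key):
        return self.objects[key], "origin"


class CacheTier:
    """A cache that asks its parent tier on a miss and keeps a copy."""

    def __init__(self, name, parent):
        self.name = name
        self.parent = parent
        self.store = {}

    def get(self, key):
        if key in self.store:                     # cache hit
            return self.store[key], self.name
        data, served_by = self.parent.get(key)    # cache miss: go up one tier
        self.store[key] = data                    # fill on the way back
        return data, served_by


origin = Origin({"v_001/1080p/segment_001.ts": b"<segment bytes>"})
regional = CacheTier("regional-japan", origin)
tokyo = CacheTier("edge-tokyo", regional)
osaka = CacheTier("edge-osaka", regional)

key = "v_001/1080p/segment_001.ts"
print(tokyo.get(key)[1])  # origin (first request misses everywhere)
print(tokyo.get(key)[1])  # edge-tokyo
print(osaka.get(key)[1])  # regional-japan (filled by the Tokyo request)
```

The regional tier is what keeps a miss at one edge from turning into an origin request when a neighbouring edge has already pulled the same segment.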

Cache Strategy

  • Popular videos (as a rule of thumb, roughly the top 20% that draw about 80% of views) are cached on edge servers globally. These are pushed proactively (push-based CDN).
  • Long-tail videos are fetched on demand. The first viewer in a region triggers a cache pull from the origin. Subsequent viewers are served from the edge cache.
  • Cache by segment: Individual segments are cached independently. The first few segments of a video are cached more aggressively (most users watch the beginning).

Multi-CDN Strategy

Large platforms often use more than one CDN, combining commercial providers (e.g., Akamai, CloudFront) with their own infrastructure (Netflix built Open Connect for this purpose), and route users to the best CDN based on:

  • Geographic proximity.
  • Current load on each CDN.
  • Measured latency and throughput.
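
A routing decision along these lines could be expressed as a weighted score over the three signals. The weights, the 0.9 load cutoff, the field names, and the sample numbers below are all made up for illustration; a real system would tune them from measurements.

```python
from dataclasses import dataclass


@dataclass
class CdnStats:
    name: str
    distance_km: float      # geographic proximity to the user
    load: float             # 0.0 (idle) to 1.0 (saturated)
    latency_ms: float       # measured latency
    throughput_mbps: float  # measured throughput


def score(stats):
    """Lower is better. The weights are arbitrary illustrative values."""
    return (0.01 * stats.distance_km
            + 100 * stats.load
            + stats.latency_ms
            - 0.5 * stats.throughput_mbps)


def pick_cdn(candidates, max_load=0.9):
    # Skip overloaded CDNs unless every candidate is overloaded.
    healthy = [c for c in candidates if c.load < max_load] or candidates
    return min(healthy, key=score)


candidates = [
    CdnStats("cdn-a", distance_km=50, load=0.95, latency_ms=10, throughput_mbps=80),
    CdnStats("cdn-b", distance_km=300, load=0.40, latency_ms=25, throughput_mbps=60),
    CdnStats("cdn-c", distance_km=1200, load=0.20, latency_ms=60, throughput_mbps=40),
]
print(pick_cdn(candidates).name)  # cdn-b: cdn-a is closest but overloaded
```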

View Counting at Scale

Counting views seems simple, but at 1 billion views/day, a naive `UPDATE videos SET views = views + 1` per view turns the row of every popular video into a write hotspot, because each update contends for the same database row.

Approach: Batch Counting with Kafka

1. User watches a video → View event published to Kafka
2. Kafka topic partitioned by video_id
3. View Counter Service consumes events in batches
4. Aggregates counts over a time window (e.g., 1 minute)
5. Batch updates the database: UPDATE videos SET views = views + 12847 WHERE id = 'v_001'
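
Steps 3 to 5 amount to collapsing a window of events into one increment per video. The sketch below shows only that aggregation step, with events as plain dictionaries; the function names are hypothetical, and the `%s` placeholders follow the style of drivers such as psycopg2, which is an assumption about the database client.

```python
from collections import Counter


def aggregate_window(events):
    """Collapse one window of view events into a per-video delta."""
    return Counter(event["video_id"] for event in events)


def to_sql_updates(deltas):
    """One parameterised UPDATE per video instead of one per view."""
    sql = "UPDATE videos SET views = views + %s WHERE id = %s"
    return [(sql, (count, video_id)) for video_id, count in sorted(deltas.items())]


window = [
    {"video_id": "v_001"}, {"video_id": "v_002"},
    {"video_id": "v_001"}, {"video_id": "v_001"},
]
for statement, params in to_sql_updates(aggregate_window(window)):
    print(statement, params)
```

Four events become two statements here; at production volume, thousands of views of a hot video in one window still produce a single row update.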

Near-Real-Time vs Exact Counts

  • Near-real-time counter (Redis): Increment a counter in Redis for display purposes. This is fast but may lose counts if Redis restarts. Used for the view count shown to users (slightly stale is fine).
  • Exact counter (Kafka + DB): The Kafka pipeline provides at-least-once delivery, so an event can be delivered more than once. Deduplicate by tracking processed event IDs to get effectively exactly-once counts for analytics and monetization.
import json
import uuid

class ViewCounter:
    def __init__(self, redis_client, kafka_producer):
        self.redis = redis_client
        self.kafka = kafka_producer

    def record_view(self, video_id, user_id, timestamp):
        # Fast path: increment Redis counter for display
        self.redis.incr(f"views:{video_id}")

        # Durable path: publish to Kafka for exact counting.
        # event_id lets the consumer deduplicate redelivered events.
        event = {"event_id": str(uuid.uuid4()), "video_id": video_id,
                 "user_id": user_id, "ts": timestamp}
        # Keying by video_id keeps each video's events in one partition.
        self.kafka.produce("video-views", key=video_id,
                           value=json.dumps(event).encode("utf-8"))
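
On the consuming side, deduplication by event ID can be sketched as follows. It assumes each event carries a unique `event_id` field (an assumption, not something the pipeline above guarantees) and keeps the seen set in memory; a production consumer would bound that set, for example with a TTL, and persist it together with the counts.

```python
class DedupingCounter:
    """Counts each event at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.seen_event_ids = set()
        self.counts = {}

    def process(self, event):
        """Return True if the event was counted, False if it was a duplicate."""
        event_id = event["event_id"]
        if event_id in self.seen_event_ids:
            return False
        self.seen_event_ids.add(event_id)
        video_id = event["video_id"]
        self.counts[video_id] = self.counts.get(video_id, 0) + 1
        return True


counter = DedupingCounter()
delivered = [
    {"event_id": "e1", "video_id": "v_001"},
    {"event_id": "e2", "video_id": "v_001"},
    {"event_id": "e1", "video_id": "v_001"},  # redelivery after a consumer retry
]
print([counter.process(e) for e in delivered])  # [True, True, False]
print(counter.counts)                           # {'v_001': 2}
```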

Fraud Detection

Not all views are legitimate. Bots, click farms, and auto-refreshing pages inflate view counts. Filter fraudulent views by:

  • Rate limiting views per IP address.
  • Requiring a minimum watch duration (e.g., 30 seconds) before counting a view.
  • Machine learning models that detect suspicious patterns (same IP, sequential videos, zero engagement).
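
The first two rules are simple enough to sketch directly. The thresholds below (30 seconds, 5 views per IP per hour) and the `ViewFilter` class are illustrative assumptions; the ML-based detection is out of scope for this example.

```python
from collections import defaultdict, deque

MIN_WATCH_SEC = 30       # minimum watch time before a view counts
MAX_VIEWS_PER_IP = 5     # counted views allowed per IP per window
WINDOW_SEC = 3600


class ViewFilter:
    def __init__(self):
        # ip -> timestamps of recently counted views (sliding window)
        self.recent = defaultdict(deque)

    def should_count(self, ip, watch_sec, now):
        if watch_sec < MIN_WATCH_SEC:
            return False
        timestamps = self.recent[ip]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= WINDOW_SEC:
            timestamps.popleft()
        if len(timestamps) >= MAX_VIEWS_PER_IP:
            return False
        timestamps.append(now)
        return True


view_filter = ViewFilter()
print(view_filter.should_count("203.0.113.7", watch_sec=10, now=0))  # False: too short
print([view_filter.should_count("203.0.113.7", watch_sec=45, now=t) for t in range(6)])
# [True, True, True, True, True, False]
```

Per-IP limits are a blunt tool, since many legitimate users can share one address behind NAT, which is one reason they are usually combined with the other signals.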

Recommendation Feed

The recommendation system determines what videos appear on the homepage and in the "Up Next" sidebar. This is a complex ML system, but at the architecture level:

User Request → Recommendation Service
                     │
                     ▼
              Candidate Generation   (1000s of candidates)
                     │
                     ▼
              Ranking Model          (score each candidate)
                     │
                     ▼
              Filtering              (age-gate, region block, etc.)
                     │
                     ▼
              Top N recommendations

Candidate generation uses collaborative filtering ("users who watched X also watched Y") and content-based similarity. Ranking uses a neural network that considers watch history, engagement signals, video freshness, and creator quality. The results are cached per user and refreshed periodically.
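
As a toy illustration of the three stages, the sketch below uses a co-watch lookup table for candidate generation and a plain scoring function as a stand-in for the neural ranking model. All names and data are hypothetical.

```python
def recommend(history, co_watch, score, allowed, n=10):
    """Candidate generation -> ranking -> filtering -> top N."""
    # 1. Candidate generation: videos co-watched with anything in the history.
    seen = set(history)
    candidates = {
        candidate
        for video_id in history
        for candidate in co_watch.get(video_id, ())
        if candidate not in seen
    }
    # 2. Ranking: highest score first; ties broken by id for determinism.
    ranked = sorted(candidates, key=lambda v: (-score(v), v))
    # 3. Filtering: drop anything the user is not allowed to see.
    return [v for v in ranked if allowed(v)][:n]


co_watch = {"v1": ["v2", "v3", "v4"], "v2": ["v3", "v5"]}  # "watched X, also watched Y"
scores = {"v3": 0.9, "v4": 0.7, "v5": 0.8}                 # stand-in for model output
blocked = {"v5"}                                           # e.g. region-blocked

print(recommend(["v1", "v2"], co_watch, scores.get, lambda v: v not in blocked, n=2))
# ['v3', 'v4']
```

The funnel shape matters for cost: the cheap lookup narrows the whole catalog to a small candidate set, so the expensive model only has to score those.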

High-Level Architecture

              ┌──────────────┐
 Clients ────>│ Load Balancer│
              └──────┬───────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
  ┌──────────┐ ┌──────────┐ ┌──────────┐
  │ Upload   │ │ Streaming│ │  API     │
  │ Service  │ │ Service  │ │ Service  │
  └────┬─────┘ └────┬─────┘ └────┬─────┘
       │             │             │
       ▼             ▼             ▼
  ┌──────────┐ ┌──────────┐ ┌──────────┐
  │ Object   │ │   CDN    │ │ Video DB │
  │ Storage  │ │ (global) │ │ + Redis  │
  │ (raw)    │ │          │ │          │
  └──────────┘ └──────────┘ └──────────┘
       │
       ▼
  ┌──────────┐
  │Transcoder│
  │ Workers  │
  └──────────┘

Interview Tips

  • Separate the upload pipeline from the streaming path. These are fundamentally different workloads. Upload is write-heavy, async, and compute-intensive. Streaming is read-heavy, latency-sensitive, and bandwidth-intensive.
  • Explain adaptive bitrate streaming. Walk through how HLS/DASH works: manifest file, segmented video, quality switching. This is the core technology that makes modern video streaming work.
  • Discuss CDN economics. Mention that bandwidth is the dominant cost for video platforms. CDN caching, especially for popular content, is what makes the economics viable.
  • Address the transcoding bottleneck. Explain parallel transcoding by resolution and by time segment. Mention that transcoding is often the longest step (hours for long videos), and explain how a job queue absorbs that load.
  • Distinguish approximate vs exact view counting. Redis for display, Kafka pipeline for billing and analytics. This shows you understand the trade-off between speed and accuracy.
  • Mention video start time. The most critical user experience metric is time-to-first-frame. Prefetching the first few segments and caching them aggressively on the CDN minimizes this.