Design YouTube / Netflix
Scale: ~500 hours of video uploaded per minute (YouTube); 200M+ subscribers (Netflix).
Requirements Clarification
Functional:
- Upload video
- Stream video (adaptive bitrate)
- Search videos
- Recommendations (out of scope unless asked)
Non-functional:
- High availability (99.99%)
- Low latency streaming globally
- Eventual consistency for view counts acceptable
- Videos stored forever (YouTube) / for license period (Netflix)
Scale Estimates
Upload: 500 hr/min = 30,000 minutes of footage ingested per minute → at an assumed ~10 min average video length, ~3,000 new videos/min
Storage: 1 hr of video ≈ 1 GB original → transcoded to 5 formats ≈ 5 GB per video-hour
1M videos × 5 GB ≈ 5 PB of storage
Read/write ratio: heavily read-skewed (views outnumber uploads ~1000:1)
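A quick sanity check of the arithmetic (the 10-minute average video length is an assumption, not given in the prompt):

```python
# Back-of-envelope scale check. Average video length and the
# 5 GB-per-hour all-formats footprint are assumptions.
UPLOAD_HOURS_PER_MIN = 500
AVG_VIDEO_MIN = 10                   # assumed average length
GB_PER_HOUR_ALL_FORMATS = 5          # original + transcoded renditions

video_min_per_min = UPLOAD_HOURS_PER_MIN * 60               # 30,000
videos_per_min = video_min_per_min / AVG_VIDEO_MIN          # ~3,000
hours_per_day = UPLOAD_HOURS_PER_MIN * 60 * 24              # 720,000
pb_per_day = hours_per_day * GB_PER_HOUR_ALL_FORMATS / 1e6  # ~3.6

print(f"{videos_per_min:,.0f} videos/min, ~{pb_per_day:.1f} PB/day of new storage")
```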
High-Level Architecture
[Client]
│
├── Upload flow:
│ Client → API Gateway → Upload Service → Raw Storage (S3)
│ → Transcoding Queue (SQS/Kafka)
│ → Transcoding Workers (EC2 fleet)
│ → CDN Storage (multiple resolutions)
│ → Metadata DB (title, tags, uploader)
│
└── Stream flow:
Client → CDN (nearest PoP) → if miss → Origin (S3)
Client requests manifest → gets chunk URLs → fetches chunks
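A common way to implement the first hop of the upload flow above is to have the Upload Service hand the client a presigned S3 URL, so raw bytes go straight to S3 and never pass through the API servers. A minimal boto3 sketch (bucket name and key scheme are illustrative):

```python
import uuid
import boto3

s3 = boto3.client("s3")

def create_upload_url(uploader_id: str) -> dict:
    """Issue a presigned PUT URL so the client uploads the raw video
    directly to S3. Bucket and key layout are assumptions."""
    video_id = uuid.uuid4().hex
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "raw-video-uploads", "Key": f"raw/{video_id}.mp4"},
        ExpiresIn=3600,  # URL valid for one hour
    )
    return {"video_id": video_id, "upload_url": url}
```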
Key Components Deep Dive
Video Transcoding Pipeline
Original video (S3)
→ Message queue (SQS/Kafka) — decouple upload from processing
→ Transcoding workers (EC2/ECS fleet — horizontal scale)
→ Output: 360p, 480p, 720p, 1080p, 4K versions
→ Stored on CDN-origin (S3 + CloudFront / Akamai)
DAG-based processing: thumbnail extraction, audio separation,
captioning (Whisper) can run in parallel
Why queue? Upload is fast; transcoding is slow (minutes). Don't block user.
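A sketch of the worker loop under an SQS assumption (`transcode()` is a hypothetical ffmpeg wrapper). The key detail: delete the message only after success, so a crashed worker's job reappears after the visibility timeout and gets retried:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/transcode-jobs"  # illustrative

def transcode(video_id: str, quality: str) -> None:
    ...  # run ffmpeg, write the rendition to the CDN-origin bucket

def worker_loop() -> None:
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,  # long poll to avoid busy-spinning
        )
        for msg in resp.get("Messages", []):
            job = json.loads(msg["Body"])
            for quality in ("360p", "480p", "720p", "1080p", "2160p"):
                transcode(job["video_id"], quality)
            # Delete only after success: on a crash, the message becomes
            # visible again after the visibility timeout and is retried.
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
```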
Adaptive Bitrate Streaming (ABR)
HLS / MPEG-DASH protocol:
1. Video split into 2-10 sec chunks
2. Manifest file (.m3u8) lists all chunk URLs per quality
3. Player monitors bandwidth → switches quality per chunk
4. Switches happen at chunk boundaries → transitions are smooth, no rebuffering when quality changes
Client algorithm:
- Measures last N chunk download speeds
- If bandwidth drops → request lower quality chunk next
- If bandwidth rises → request higher quality
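A minimal sketch of that heuristic (production players such as dash.js also weigh buffer occupancy; the ladder bitrates and 0.8 safety factor are assumptions):

```python
# Quality ladder: (label, bitrate needed in kbit/s) — illustrative numbers.
LADDER = [("360p", 1_000), ("480p", 2_500), ("720p", 5_000),
          ("1080p", 8_000), ("2160p", 20_000)]
SAFETY = 0.8  # assumed headroom so one slow chunk doesn't stall playback

def pick_quality(recent_kbps: list[float]) -> str:
    """Pick the highest rendition whose bitrate fits a conservative
    estimate of recent throughput (min of the last 3 chunks)."""
    estimate = min(recent_kbps[-3:]) * SAFETY
    chosen = LADDER[0][0]  # fall back to the lowest rung
    for label, need_kbps in LADDER:
        if need_kbps <= estimate:
            chosen = label
    return chosen

print(pick_quality([6_200, 5_900, 7_100]))  # 5900 * 0.8 = 4720 → "480p"
```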
CDN Strategy
Edge PoP (Point of Presence) in 200+ cities:
- Popular videos: proactively pushed to edge caches ahead of demand (long TTL)
- Unpopular/new: pull-through — cached at the PoP on first request, then propagated to nearby PoPs
- Geographic routing: DNS resolves to nearest PoP
Cache key: video_id + quality + chunk_number
TTL: 24-48hr (videos don't change after upload)
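At a PoP, the pull-through path with that key might look like this sketch (an in-memory dict stands in for the edge cache; `fetch_from_origin` is hypothetical):

```python
import time

CACHE_TTL_SECONDS = 24 * 3600  # chunks are immutable, so a long TTL is safe
_edge_cache: dict[str, tuple[float, bytes]] = {}

def fetch_from_origin(key: str) -> bytes:
    ...  # HTTP GET against the S3 origin

def get_chunk(video_id: str, quality: str, chunk_number: int) -> bytes:
    key = f"{video_id}:{quality}:{chunk_number}"
    hit = _edge_cache.get(key)
    if hit and time.time() - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]                      # edge hit: serve locally
    data = fetch_from_origin(key)          # miss: pull through from origin
    _edge_cache[key] = (time.time(), data)
    return data
```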
Metadata Storage
Videos table (write once, read many):
- PostgreSQL: video_id, title, description, uploader_id, status
- Elasticsearch: full-text search on title, tags, transcripts
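A search sketch, assuming the v8 elasticsearch-py client (index name and field boosts are assumptions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # illustrative endpoint

def search_videos(text: str, limit: int = 10) -> list[dict]:
    """Full-text search across title, tags, and transcript,
    boosting title matches highest."""
    resp = es.search(
        index="videos",
        query={"multi_match": {
            "query": text,
            "fields": ["title^3", "tags^2", "transcript"],
        }},
        size=limit,
    )
    return [hit["_source"] for hit in resp["hits"]["hits"]]
```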
View counts:
- Don't write to the DB on every view (hot-row contention, massive write amplification)
- Increment a per-video Redis counter → batch-flush to the DB every minute
- Approximate counts acceptable (YouTube shows "1.2M views")
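A sketch of the counter pattern with redis-py (`apply_increment_to_db` is a hypothetical persistence helper; GETSET-to-zero keeps each read-and-reset atomic per key):

```python
import redis

r = redis.Redis()

def record_view(video_id: str) -> None:
    # O(1) in-memory increment on the hot path; no DB write per view.
    r.incr(f"views:{video_id}")

def flush_views_to_db() -> None:
    """Run every minute from a scheduler: drain each counter and
    apply one aggregated increment per video to the DB."""
    for key in r.scan_iter(match="views:*"):
        count = int(r.getset(key, 0) or 0)  # read and reset atomically
        if count:
            video_id = key.decode().split(":", 1)[1]
            apply_increment_to_db(video_id, count)

def apply_increment_to_db(video_id: str, count: int) -> None:
    ...  # e.g. UPDATE videos SET view_count = view_count + %s WHERE id = %s
```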
Trade-offs
| Decision | Choice | Why |
|---|---|---|
| Storage | S3 + CDN | Cheap, globally distributed, managed |
| Transcoding | Async queue | Decouple, scale workers independently |
| Streaming protocol | HLS/DASH | ABR built-in, universal support |
| View counts | Redis + batch flush | Absorb view-write spikes; one DB write per video per minute |
| Search | Elasticsearch | Full-text, relevance ranking |
Failure Modes
- Transcoding worker crashes → message becomes visible again after the visibility timeout and is retried (so jobs must be idempotent)
- CDN PoP down → DNS failover to next PoP
- Metadata DB primary down → serve reads from a read replica (writes degrade until failover)
Related
- [[Caching & Redis]] — CDN caching, view count pattern
- [[Message Queues & Kafka]] — transcoding queue
- [[AWS/S3]] — storage layer