Design YouTube / Netflix
Scale: ~500 hours of video uploaded per minute (YouTube); 200M+ subscribers (Netflix).
Requirements Clarification
Functional:
- Upload video
- Stream video (adaptive bitrate)
- Search videos
- Recommendations (out of scope unless asked)
Non-functional:
- High availability (99.99%)
- Low latency streaming globally
- Eventual consistency for view counts acceptable
- Videos stored forever (YouTube) / for license period (Netflix)
Scale Estimates
Upload: 500 hr/min = 30,000 minutes of footage ingested per minute → at an assumed ~10 min average video length, ~3,000 new videos/min
Storage: 1 hr of video ≈ 1 GB original → transcoded to 5 formats ≈ 5 GB per video-hour
1M videos × 5 GB ≈ 5 PB of storage
Read/write ratio: heavily read-skewed (views outnumber uploads ~1000:1)
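A quick sanity check of the arithmetic (the 10-minute average video length is an assumption, not given in the prompt):

```python
# Back-of-envelope scale check. Average video length and the
# 5 GB-per-hour all-formats footprint are assumptions.
UPLOAD_HOURS_PER_MIN = 500
AVG_VIDEO_MIN = 10                   # assumed average length
GB_PER_HOUR_ALL_FORMATS = 5          # original + transcoded renditions

video_min_per_min = UPLOAD_HOURS_PER_MIN * 60               # 30,000
videos_per_min = video_min_per_min / AVG_VIDEO_MIN          # ~3,000
hours_per_day = UPLOAD_HOURS_PER_MIN * 60 * 24              # 720,000
pb_per_day = hours_per_day * GB_PER_HOUR_ALL_FORMATS / 1e6  # ~3.6

print(f"{videos_per_min:,.0f} videos/min, ~{pb_per_day:.1f} PB/day of new storage")
```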
High-Level Architecture
[Client]
│
├── Upload flow:
│ Client → API Gateway → Upload Service → Raw Storage (S3)
│ → Transcoding Queue (SQS/Kafka)
│ → Transcoding Workers (EC2 fleet)
│ → CDN Storage (multiple resolutions)
│ → Metadata DB (title, tags, uploader)
│
└── Stream flow:
Client → CDN (nearest PoP) → if miss → Origin (S3)
Client requests manifest → gets chunk URLs → fetches chunks
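A common way to implement the first hop of the upload flow above is to have the Upload Service hand the client a presigned S3 URL, so raw bytes go straight to S3 and never pass through the API servers. A minimal boto3 sketch (bucket name and key scheme are illustrative):

```python
import uuid
import boto3

s3 = boto3.client("s3")

def create_upload_url(uploader_id: str) -> dict:
    """Issue a presigned PUT URL so the client uploads the raw video
    directly to S3. Bucket and key layout are assumptions."""
    video_id = uuid.uuid4().hex
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "raw-video-uploads", "Key": f"raw/{video_id}.mp4"},
        ExpiresIn=3600,  # URL valid for one hour
    )
    return {"video_id": video_id, "upload_url": url}
```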
Key Components Deep Dive
Video Transcoding Pipeline
Original video (S3)
→ Message queue (SQS/Kafka) — decouple upload from processing
→ Transcoding workers (EC2/ECS fleet — horizontal scale)
→ Output: 360p, 480p, 720p, 1080p, 4K versions
→ Stored on CDN-origin (S3 + CloudFront / Akamai)
DAG-based processing: thumbnail extraction, audio separation,
captioning (Whisper) can run in parallel
Why queue? Upload is fast; transcoding is slow (minutes). Don't block user.
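A sketch of the worker loop under an SQS assumption (`transcode()` is a hypothetical ffmpeg wrapper). The key detail: delete the message only after success, so a crashed worker's job reappears after the visibility timeout and gets retried:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/transcode-jobs"  # illustrative

def transcode(video_id: str, quality: str) -> None:
    ...  # run ffmpeg, write the rendition to the CDN-origin bucket

def worker_loop() -> None:
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,  # long poll to avoid busy-spinning
        )
        for msg in resp.get("Messages", []):
            job = json.loads(msg["Body"])
            for quality in ("360p", "480p", "720p", "1080p", "2160p"):
                transcode(job["video_id"], quality)
            # Delete only after success: on a crash, the message becomes
            # visible again after the visibility timeout and is retried.
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
```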
Adaptive Bitrate Streaming (ABR)
HLS / MPEG-DASH protocol:
1. Video split into 2-10 sec chunks
2. Manifest file (.m3u8) lists all chunk URLs per quality
3. Player monitors bandwidth → switches quality per chunk
4. Switches happen at chunk boundaries → transitions are smooth, no rebuffering when quality changes
Client algorithm:
- Measures last N chunk download speeds
- If bandwidth drops → request lower quality chunk next
- If bandwidth rises → request higher quality
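A minimal sketch of that heuristic (production players such as dash.js also weigh buffer occupancy; the ladder bitrates and 0.8 safety factor are assumptions):

```python
# Quality ladder: (label, bitrate needed in kbit/s) — illustrative numbers.
LADDER = [("360p", 1_000), ("480p", 2_500), ("720p", 5_000),
          ("1080p", 8_000), ("2160p", 20_000)]
SAFETY = 0.8  # assumed headroom so one slow chunk doesn't stall playback

def pick_quality(recent_kbps: list[float]) -> str:
    """Pick the highest rendition whose bitrate fits a conservative
    estimate of recent throughput (min of the last 3 chunks)."""
    estimate = min(recent_kbps[-3:]) * SAFETY
    chosen = LADDER[0][0]  # fall back to the lowest rung
    for label, need_kbps in LADDER:
        if need_kbps <= estimate:
            chosen = label
    return chosen

print(pick_quality([6_200, 5_900, 7_100]))  # 5900 * 0.8 = 4720 → "480p"
```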
CDN Strategy
Edge PoP (Point of Presence) in 200+ cities:
- Popular videos: proactively pushed to edge caches ahead of demand (long TTL)
- Unpopular/new: pull-through — cached at the PoP on first request, then propagated to nearby PoPs
- Geographic routing: DNS resolves to nearest PoP
Cache key: video_id + quality + chunk_number
TTL: 24-48hr (videos don't change after upload)
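At a PoP, the pull-through path with that key might look like this sketch (an in-memory dict stands in for the edge cache; `fetch_from_origin` is hypothetical):

```python
import time

CACHE_TTL_SECONDS = 24 * 3600  # chunks are immutable, so a long TTL is safe
_edge_cache: dict[str, tuple[float, bytes]] = {}

def fetch_from_origin(key: str) -> bytes:
    ...  # HTTP GET against the S3 origin

def get_chunk(video_id: str, quality: str, chunk_number: int) -> bytes:
    key = f"{video_id}:{quality}:{chunk_number}"
    hit = _edge_cache.get(key)
    if hit and time.time() - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]                      # edge hit: serve locally
    data = fetch_from_origin(key)          # miss: pull through from origin
    _edge_cache[key] = (time.time(), data)
    return data
```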
Metadata Storage
Videos table (write once, read many):
- PostgreSQL: video_id, title, description, uploader_id, status
- Elasticsearch: full-text search on title, tags, transcripts
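A search sketch, assuming the v8 elasticsearch-py client (index name and field boosts are assumptions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # illustrative endpoint

def search_videos(text: str, limit: int = 10) -> list[dict]:
    """Full-text search across title, tags, and transcript,
    boosting title matches highest."""
    resp = es.search(
        index="videos",
        query={"multi_match": {
            "query": text,
            "fields": ["title^3", "tags^2", "transcript"],
        }},
        size=limit,
    )
    return [hit["_source"] for hit in resp["hits"]["hits"]]
```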
View counts:
- Don't write to the DB on every view (hot-row contention, massive write amplification)
- Increment a per-video Redis counter → batch-flush to the DB every minute
- Approximate counts acceptable (YouTube shows "1.2M views")
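A sketch of the counter pattern with redis-py (`apply_increment_to_db` is a hypothetical persistence helper; GETSET-to-zero keeps each read-and-reset atomic per key):

```python
import redis

r = redis.Redis()

def record_view(video_id: str) -> None:
    # O(1) in-memory increment on the hot path; no DB write per view.
    r.incr(f"views:{video_id}")

def flush_views_to_db() -> None:
    """Run every minute from a scheduler: drain each counter and
    apply one aggregated increment per video to the DB."""
    for key in r.scan_iter(match="views:*"):
        count = int(r.getset(key, 0) or 0)  # read and reset atomically
        if count:
            video_id = key.decode().split(":", 1)[1]
            apply_increment_to_db(video_id, count)

def apply_increment_to_db(video_id: str, count: int) -> None:
    ...  # e.g. UPDATE videos SET view_count = view_count + %s WHERE id = %s
```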
Trade-offs
| Decision | Choice | Why |
|---|---|---|
| Storage | S3 + CDN | Cheap, globally distributed, managed |
| Transcoding | Async queue | Decouple, scale workers independently |
| Streaming protocol | HLS/DASH | ABR built-in, universal support |
| View counts | Redis + batch flush | Absorb view-write spikes; one DB write per video per minute |
| Search | Elasticsearch | Full-text, relevance ranking |
Failure Modes
- Transcoding worker crashes → message becomes visible again after the visibility timeout and is retried (so jobs must be idempotent)
- CDN PoP down → DNS failover to next PoP
- Metadata DB primary down → serve reads from a read replica (writes degrade until failover)
Related
- [[Caching & Redis]] — CDN caching, view count pattern
- [[Message Queues & Kafka]] — transcoding queue
- [[AWS/S3]] — storage layer