Design YouTube / Netflix

Scale: ~500 hours of video uploaded per minute (YouTube); 200M+ daily active users (Netflix).


Requirements Clarification

Functional:

  • Upload video
  • Stream video (adaptive bitrate)
  • Search videos
  • Recommendations (out of scope unless asked)

Non-functional:

  • High availability (99.99%)
  • Low latency streaming globally
  • Eventual consistency for view counts acceptable
  • Videos stored forever (YouTube) / for license period (Netflix)

Scale Estimates

Upload: 500 hr/min = 30,000 minutes of video per minute → at ~10 min average length, ~3,000 videos/min
Storage: 1 hr of video ≈ 1 GB original → transcoded to 5 formats ≈ 5 GB total per hour of content
Per 1M videos: 1M × 5 GB = 5 PB of storage
Read/write ratio: heavily read-skewed (~1000:1)
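
Back-of-envelope check in code (the ~10 min average video length and 1 GB/hr source size are assumed numbers, not given stats):

```python
# Rough scale math; AVG_VIDEO_MIN and GB_PER_HOUR are assumptions for illustration
UPLOAD_HOURS_PER_MIN = 500
AVG_VIDEO_MIN = 10
GB_PER_HOUR = 1
TRANSCODE_FORMATS = 5

videos_per_min = UPLOAD_HOURS_PER_MIN * 60 / AVG_VIDEO_MIN     # ~3,000 videos/min
raw_gb_per_min = UPLOAD_HOURS_PER_MIN * GB_PER_HOUR            # ~500 GB/min of originals
stored_gb_per_min = raw_gb_per_min * TRANSCODE_FORMATS         # ~2.5 TB/min after transcoding
pb_per_million_videos = (1_000_000 * 5) / 1_000_000            # 5M GB = 5 PB per 1M videos
```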

High-Level Architecture

[Client]
    │
    ├── Upload flow:
    │   Client → API Gateway → Upload Service → Raw Storage (S3)
    │                                       → Transcoding Queue (SQS/Kafka)
    │                                       → Transcoding Workers (EC2 fleet)
    │                                       → CDN Storage (multiple resolutions)
    │                                       → Metadata DB (title, tags, uploader)
    │
    └── Stream flow:
        Client → CDN (nearest PoP) → if miss → Origin (S3)
        Client requests manifest → gets chunk URLs → fetches chunks
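
A minimal sketch of the upload flow above (presigned S3 upload + enqueue), assuming boto3; the bucket name and queue URL are hypothetical and the metadata write is omitted:

```python
import json
import uuid
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

RAW_BUCKET = "raw-videos"                           # hypothetical bucket name
TRANSCODE_QUEUE_URL = "https://sqs.example/transcode"  # hypothetical queue URL

def start_upload(title: str, uploader_id: str) -> dict:
    """Issue a presigned URL so the client uploads directly to S3,
    keeping large payloads off the API servers."""
    video_id = str(uuid.uuid4())
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": RAW_BUCKET, "Key": f"{video_id}/original.mp4"},
        ExpiresIn=3600,
    )
    # A metadata row (status=UPLOADING) would be written here; omitted for brevity.
    return {"video_id": video_id, "upload_url": upload_url}

def on_upload_complete(video_id: str) -> None:
    """Triggered by an S3 event or client callback: enqueue the transcode job."""
    sqs.send_message(
        QueueUrl=TRANSCODE_QUEUE_URL,
        MessageBody=json.dumps({"video_id": video_id, "key": f"{video_id}/original.mp4"}),
    )
```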

Key Components Deep Dive

Video Transcoding Pipeline

Original video (S3)
    → Message queue (SQS/Kafka) — decouple upload from processing
    → Transcoding workers (EC2/ECS fleet — horizontal scale)
    → Output: 360p, 480p, 720p, 1080p, 4K versions
    → Stored on CDN-origin (S3 + CloudFront / Akamai)

DAG-based processing: thumbnail extraction, audio separation, 
captioning (Whisper) can run in parallel

Why queue? Upload is fast; transcoding is slow (minutes). Don't block user.
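
Sketch of a transcoding worker under those assumptions (boto3 + ffmpeg; bucket/queue names and ffmpeg flags are illustrative, a real pipeline pins codecs and bitrates):

```python
import json
import subprocess
import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

QUEUE_URL = "https://sqs.example/transcode"   # hypothetical
RAW_BUCKET = "raw-videos"                     # hypothetical
CDN_ORIGIN_BUCKET = "transcoded-videos"       # hypothetical
RENDITIONS = {"360p": "640x360", "480p": "854x480", "720p": "1280x720", "1080p": "1920x1080"}

def worker_loop():
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            job = json.loads(msg["Body"])
            s3.download_file(RAW_BUCKET, job["key"], "/tmp/original.mp4")
            for name, size in RENDITIONS.items():
                out = f"/tmp/{name}.mp4"
                # ffmpeg does the actual transcode; flags kept minimal for the sketch
                subprocess.run(
                    ["ffmpeg", "-y", "-i", "/tmp/original.mp4", "-s", size, out],
                    check=True,
                )
                s3.upload_file(out, CDN_ORIGIN_BUCKET, f"{job['video_id']}/{name}.mp4")
            # Delete only after success; a crash before this point means the message
            # becomes visible again and another worker retries the job.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```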

Adaptive Bitrate Streaming (ABR)

HLS / MPEG-DASH protocol:
1. Video split into 2-10 sec chunks
2. Manifest file (.m3u8 for HLS, .mpd for DASH) lists the chunk URLs for each quality level
3. Player monitors bandwidth → switches quality per chunk
4. Smooth quality transitions, no buffering on quality switch

Client algorithm:
- Measures last N chunk download speeds
- If bandwidth drops → request lower quality chunk next
- If bandwidth rises → request higher quality
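
A toy version of that client heuristic (the bitrate ladder and safety factor are assumed numbers):

```python
from collections import deque

# Renditions and the rough bandwidth (Mbps) each needs; illustrative values
BITRATE_MBPS = {"360p": 1.0, "480p": 2.5, "720p": 5.0, "1080p": 8.0}
SAFETY_FACTOR = 0.8   # only plan for ~80% of measured throughput to leave headroom

class AbrController:
    def __init__(self, window: int = 3):
        self.samples = deque(maxlen=window)   # last N chunk throughput measurements

    def record_chunk(self, bytes_downloaded: int, seconds: float) -> None:
        self.samples.append((bytes_downloaded * 8 / 1e6) / seconds)   # Mbps

    def next_quality(self) -> str:
        if not self.samples:
            return "360p"                                # conservative start
        estimate = min(self.samples) * SAFETY_FACTOR     # pessimistic estimate avoids rebuffering
        fitting = [q for q, mbps in BITRATE_MBPS.items() if mbps <= estimate]
        return fitting[-1] if fitting else "360p"        # highest rendition that fits
```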

CDN Strategy

Edge PoP (Point of Presence) in 200+ cities:
- Popular videos: pre-warmed (pushed) to edge caches with a long TTL
- Unpopular/new videos: cache-aside (cached on first request, then propagated to nearby PoPs)
- Geographic routing: DNS resolves to nearest PoP

Cache key: video_id + quality + chunk_number
TTL: 24-48hr (videos don't change after upload)
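
Minimal sketch of that cache key scheme (the .ts extension assumes HLS transport-stream chunks):

```python
def chunk_cache_key(video_id: str, quality: str, chunk_number: int) -> str:
    """Deterministic key so every PoP caches the same object for the same chunk."""
    return f"{video_id}/{quality}/{chunk_number:06d}.ts"

CHUNK_TTL_SECONDS = 48 * 3600   # chunks are immutable after upload, so a long TTL is safe
```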

Metadata Storage

Videos table (write once, read many):
- PostgreSQL: video_id, title, description, uploader_id, status
- Elasticsearch: full-text search on title, tags, transcripts
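
A hedged example of the search side, assuming the Elasticsearch 8.x Python client and a hypothetical videos index with title/tags/transcript fields:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # assumed local cluster for the sketch

def search_videos(query: str, size: int = 20):
    """Full-text search over title/tags/transcript with simple field boosting."""
    return es.search(
        index="videos",                        # hypothetical index name
        query={
            "multi_match": {
                "query": query,
                "fields": ["title^3", "tags^2", "transcript"],   # boost title matches
            }
        },
        size=size,
    )
```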

View counts:
- Don't write to the DB on every view (per-view writes create a hot spot at this scale)
- Increment in Redis counter → batch flush to DB every minute
- Approximate counts acceptable (YouTube shows "1.2M views")
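
Sketch of the Redis counter + periodic flush (assumes redis-py and a psycopg-style DB connection; key names are illustrative):

```python
import redis

r = redis.Redis()   # assumed local Redis for the sketch

def record_view(video_id: str) -> None:
    # O(1) in-memory increment; no DB write on the hot path
    r.incr(f"views:{video_id}")

def flush_views(db_conn) -> None:
    """Run every ~60s by a worker: move deltas from Redis into the DB.
    Counts are approximate; a small race between scan and reset is acceptable."""
    for key in r.scan_iter("views:*"):
        video_id = key.decode().split(":", 1)[1]
        delta = int(r.getset(key, 0))   # read the delta and reset it atomically
        if delta:
            db_conn.execute(
                "UPDATE videos SET view_count = view_count + %s WHERE video_id = %s",
                (delta, video_id),
            )
```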

Trade-offs

| Decision | Choice | Why |
|---|---|---|
| Storage | S3 + CDN | Cheap, globally distributed, managed |
| Transcoding | Async queue | Decouples upload from processing; workers scale independently |
| Streaming protocol | HLS/DASH | ABR built in, universal player support |
| View counts | Redis + batch flush | Avoids a per-view write hot spot |
| Search | Elasticsearch | Full-text search, relevance ranking |

Failure Modes

  • Transcoding worker crashes → message becomes visible again after the visibility timeout and is retried
  • CDN PoP down → DNS failover to next PoP
  • Metadata DB down → read from read replica

Related

  • [[Caching & Redis]] — CDN caching, view count pattern
  • [[Message Queues & Kafka]] — transcoding queue
  • [[AWS/S3]] — storage layer