Design Twitter Feed (Social Media Feed)

Scale: 300M DAU, 500M tweets/day, 28B feed reads/day. Read-heavy (28B / 500M ≈ 56:1 read:write).


Requirements Clarification

Functional:

  • Post tweet (text, images, links)
  • Follow / unfollow users
  • View home timeline (tweets from followed users, reverse chronological)
  • View user timeline (a user's own tweets)

Non-functional:

  • Timeline load < 200ms
  • Eventual consistency acceptable (slight delay in seeing new tweets)
  • High availability > 99.9%

Core Problem: Fan-out

When user A (10M followers) tweets → how do 10M followers see it?

Two strategies:

Fan-out on Write (Push model)

User tweets → write tweet to DB → immediately push to all followers' feed caches
Timeline read → just read from the Redis cache (fast: a single sorted-set range read)

Pros: reads are instant
Cons: a celebrity with 50M followers means 50M cache writes for a single tweet. Write amplification.

Fan-out on Read (Pull model)

User tweets → write to own tweets table only
Timeline read → fetch IDs of all followed users → fetch their recent tweets → merge sort

Pros: no write amplification
Cons: reads are expensive. Following 2,000 people can mean 2,000 DB lookups per feed load.
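A minimal sketch of the pull path, using plain dicts as stand-ins for the follow graph and the tweets table (all names here are illustrative). The list comprehension is the expensive part: one lookup per followed account.

```python
import heapq

def pull_home_timeline(user_id, following, tweets_by_user, limit=50):
    """Fan-out-on-read: fetch every followed user's recent tweets, then merge.

    `following` maps user_id -> list of followed user_ids; `tweets_by_user`
    maps user_id -> list of (timestamp, tweet_id) sorted newest-first.
    """
    # One "DB lookup" per followed account — this is what makes reads costly.
    feeds = [tweets_by_user.get(followee, []) for followee in following[user_id]]
    # Merge N already-sorted feeds, newest first.
    merged = heapq.merge(*feeds, reverse=True)
    return [tweet_id for _, tweet_id in merged][:limit]
```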

Twitter's actual approach: Hybrid

Regular users (< 1M followers): fan-out on write
Celebrity users (> 1M followers): fan-out on read, injected at read time

Timeline read:
1. Load pre-computed feed from Redis (fan-out on write users)
2. Fetch recent tweets from celebrities you follow (fan-out on read)
3. Merge and sort
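The merge step above can be sketched as a k-way merge of newest-first feeds; `heapq.merge` does this lazily, so only `limit` entries are ever materialized. Input shapes are assumptions matching the Redis sorted-set layout described below.

```python
import heapq
from itertools import islice

def hybrid_timeline(precomputed, celebrity_feeds, limit=50):
    """Steps 1-3 of the read path: merge the pre-built Redis feed with
    celebrity tweets fetched at read time.

    All inputs are lists of (timestamp, tweet_id) sorted newest-first.
    """
    merged = heapq.merge(precomputed, *celebrity_feeds, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```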

High-Level Architecture

[Client]
    │
    ├── Post tweet:
    │   Client → API Gateway → Tweet Service
    │                       → Write to Tweets DB (Cassandra/DynamoDB)
    │                       → Fanout Service → Timeline Cache (Redis)
    │                                       → Notification Service
    │
    └── Read timeline:
        Client → API Gateway → Timeline Service
                            → Redis timeline cache (pre-built feeds)
                            → Merge with celebrity tweets (on read)
                            → Return sorted feed

Data Models

Tweet

tweet_id    (snowflake ID — time-sortable, globally unique)
user_id
content     (280 chars)
media_ids   []
created_at
like_count  (approximate, Redis counter)
retweet_count

Timeline Cache (Redis)

Key: timeline:{user_id}
Value: sorted set of tweet_ids (score = timestamp)
Max size: keep last 800 tweets per user (Twitter's actual limit)

On new tweet from followed user:
  ZADD timeline:{follower_id} {timestamp} {tweet_id}
  ZREMRANGEBYRANK timeline:{follower_id} 0 -801  # keep last 800
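The ZADD + ZREMRANGEBYRANK pattern can be sketched with an in-memory stand-in (a sorted list in place of a real Redis sorted set; the class and method names are illustrative):

```python
import bisect

TIMELINE_CAP = 800  # per-user cap from the notes

class TimelineCache:
    """In-memory stand-in for the Redis sorted-set timeline
    (ZADD followed by ZREMRANGEBYRANK 0 -801)."""

    def __init__(self):
        self._timelines = {}  # user_id -> list of (timestamp, tweet_id), oldest first

    def push(self, follower_id, timestamp, tweet_id):
        tl = self._timelines.setdefault(follower_id, [])
        bisect.insort(tl, (timestamp, tweet_id))   # ZADD: insert keyed by score
        if len(tl) > TIMELINE_CAP:
            del tl[: len(tl) - TIMELINE_CAP]       # ZREMRANGEBYRANK: drop oldest

    def read(self, user_id, count=50):
        tl = self._timelines.get(user_id, [])
        return [tweet_id for _, tweet_id in reversed(tl[-count:])]  # newest first
```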

Follow graph

user_id → [followed_user_ids]
Stored in: graph DB or Redis SET
followers:{user_id} → SET of follower_ids (used for fan-out)
following:{user_id} → SET of following_ids (used for feed reads)

Snowflake ID (Twitter's tweet ID)

64-bit integer:
[1 bit: unused sign] [41 bits: timestamp ms] [10 bits: machine ID] [12 bits: sequence]

→ Time-sortable (higher ID = later tweet, since the timestamp occupies the high bits)
→ No central coordination needed
→ Up to 4096 IDs/ms (~4M IDs/sec) per machine
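The bit layout above can be sketched as a generator; the epoch value here is an arbitrary assumption (real deployments pick their own custom epoch), and the class is illustrative, not Twitter's implementation.

```python
import threading
import time

class Snowflake:
    """Sketch of a snowflake-style ID generator: 41-bit ms timestamp,
    10-bit machine ID, 12-bit per-millisecond sequence."""

    EPOCH_MS = 1_577_836_800_000  # 2020-01-01; arbitrary custom epoch (assumption)

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024          # must fit in 10 bits
        self.machine_id = machine_id
        self.last_ms = -1
        self.seq = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = int(time.time() * 1000) - self.EPOCH_MS
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF  # 12-bit sequence wraps at 4096
                if self.seq == 0:                   # 4096/ms exhausted: spin to next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - self.EPOCH_MS
            else:
                self.seq = 0
            self.last_ms = now
            return (now << 22) | (self.machine_id << 12) | self.seq
```

Because the timestamp sits in the high bits, plain integer comparison of IDs gives chronological order, which is exactly what the sorted-set timeline relies on.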

Key Design Decisions

Storage: Cassandra/DynamoDB for tweets

Write-heavy, time-series data, horizontal scale.
Partition key: user_id. Sort key: tweet_id (time-sortable snowflake).
→ Efficient user_timeline queries: all tweets by user, reverse chronological.

Like counts: Redis + batch flush

Don't write to DB on every like (thundering herd on viral tweets).
INCR tweet:likes:{tweet_id} in Redis.
Flush to DB every 30s.
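The counter-plus-batch-flush pattern can be sketched in memory (a `Counter` standing in for Redis INCR; `flush` is what the assumed 30s background job would run):

```python
from collections import Counter

class LikeCounter:
    """Sketch of Redis INCR + periodic batch flush for like counts."""

    def __init__(self):
        self._pending = Counter()        # stands in for tweet:likes:{tweet_id}

    def like(self, tweet_id):
        self._pending[tweet_id] += 1     # INCR: O(1), absorbs viral spikes

    def flush(self, db):
        # Drain accumulated deltas into the durable store in one batch,
        # instead of one DB write per like.
        for tweet_id, delta in self._pending.items():
            db[tweet_id] = db.get(tweet_id, 0) + delta
        self._pending.clear()
```

The trade-off: counts in the DB lag by up to one flush interval, which matches the eventual-consistency requirement above.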

Media: Separate CDN

Images/videos stored in S3 + served via CDN.
Tweets store only media_id, not a URL (the URL is generated at read time).

Trade-offs

Decision         | Choice                                           | Why
Fan-out          | Hybrid (write for regular, read for celebrities) | Balance write amplification vs read latency
Timeline storage | Redis sorted set                                 | O(log N) insert, O(log N + M) range read
Tweet storage    | Cassandra                                        | High write throughput, time-series pattern
Tweet IDs        | Snowflake                                        | Time-sortable, no central coordinator

Failure Modes

  • Redis cache miss → rebuild from Cassandra (expensive, but rare)
  • Fan-out service lag → slight delay in feed (eventual consistency — acceptable)
  • Cassandra node down → replicas serve reads (RF=3)
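The cache-miss path can be sketched as read-through with rebuild; `db_fetch` and the dict-based cache are illustrative stand-ins for Cassandra and Redis.

```python
def read_with_rebuild(user_id, cache, db_fetch, following, limit=50):
    """On a cache miss (eviction, Redis restart), rebuild the timeline from
    the tweets store — the expensive pull path — and repopulate the cache.

    `db_fetch(user_id)` returns that user's recent (timestamp, tweet_id)
    pairs; `cache` is a dict standing in for Redis.
    """
    feed = cache.get(user_id)
    if feed is None:                      # cache miss: fall back to the store
        feed = sorted(
            (t for followee in following[user_id] for t in db_fetch(followee)),
            reverse=True,
        )[:limit]
        cache[user_id] = feed             # warm the cache for the next read
    return [tweet_id for _, tweet_id in feed]
```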

Related

  • [[Caching & Redis]] — timeline cache, like counts
  • [[Message Queues & Kafka]] — fan-out queue
  • [[Consistent Hashing]] — Cassandra partitioning