Crack System Design Interviews with This One Database Cheat Sheet
Six databases. One decision framework. Zero fumbling when the interviewer asks “which database would you use here?”
Most system design interviews don’t fail on algorithms. They fail on the moment the interviewer points at a box in your architecture diagram and asks: “Why that database?” A vague answer “it scales well” is the fastest way to lose the room.

Ai Generated Image
The good news: there are really only six databases you need to know deeply. And once you internalize when each one shines, the decision becomes almost mechanical.
The Six Databases That Cover 90% of Interview Scenarios
Here’s the landscape, compressed:
┌─────────────┬──────────────┬────────────────┬──────────────────────┐
│ Database │ Type │ Consistency │ Best Use Case │
├─────────────┼──────────────┼────────────────┼──────────────────────┤
│ MySQL │ Relational │ Strong │ Payments, Orders │
│ MongoDB │ Document │ Strong │ JSON App Data │
│ Cassandra │ Wide Column │ Tunable/Even. │ Logs, Metrics │
│ DynamoDB │ Key-Value │ Even./Strong │ AWS Scale Apps │
│ Redis │ In-Memory KV │ Eventual │ Cache, Sessions │
│ Neo4j │ Graph │ Strong │ Social, Reco. Graphs │
└─────────────┴──────────────┴────────────────┴──────────────────────┘
Memorize that table. Then forget it. What you actually need is the reasoning behind each row.
The Quick Memory Rule (Seriously, Tattoo This)
Orders → MySQL
JSON data → MongoDB
Logs → Cassandra
AWS scale → DynamoDB
Speed → Redis
Relations → Neo4j
This isn’t just a mnemonic. Each pairing maps to a specific set of guarantees that make one database the right tool and the others the wrong one.
MySQL When Correctness Is Non-Negotiable
MySQL is a relational database with strong consistency on the primary. Master-replica async replication gives you read scalability, but writes go through one place.
Use it when your data has hard relationships and you cannot afford a dirty read financial transactions, order management, inventory.
┌─────────┐ writes ┌──────────────┐
│ App │ ─────────────→ │ MySQL Primary│
│ Server │ ←───────────── │ │
└─────────┘ reads └──────┬───────┘
│ async replication
┌──────▼───────┐
│ MySQL Replica│ ← read traffic
└──────────────┘
The failure mode to know: a replica can lag. For anything requiring the freshest read (e.g. checking account balance before a debit), always read from primary.
MongoDB When Your Schema Wants to Breathe
MongoDB stores documents (BSON/JSON). Strong consistency on the primary, replica sets for redundancy, and auto-sharding.
The sweet spot: user profiles, product catalogs, CMS content anything where the shape of data varies across records or evolves quickly.
// Flexible document — no schema migration needed when fields are added
{
_id: ObjectId("..."),
userId: "u_8821",
preferences: { theme: "dark", notifications: true },
recentSearches: ["system design", "redis"]
// new fields can be added without ALTER TABLE
}
Don’t use it when you have heavy joins or rigid transactional integrity requirements. That’s MySQL’s job.
Cassandra Built for Write-Heavy, Time-Series Workloads
Cassandra’s write throughput is exceptional because writes are append-only to a commit log. Peer-to-peer replication means no single point of failure. Availability is very high. Sharding is native.
Use it for: application logs, IoT telemetry, metrics pipelines anything where you write constantly and query by time range.
Write path:
App → Coordinator Node → Partition Key Hash
│
┌────────────────┼────────────────┐
▼ ▼ ▼
Node A Node B Node C
(replica 1) (replica 2) (replica 3)
The trade-off: Cassandra’s eventual consistency means you may read stale data. Tunable consistency (QUORUM, ONE, ALL) lets you dial this, but there's no free lunch higher consistency = higher latency.
DynamoDB When You’re on AWS and Need Infinite Scale
DynamoDB is a managed key-value and document store. Multi-AZ replication, fully managed sharding, and extremely high availability make it the go-to for AWS-native apps that need to scale without operational overhead.
The catch: your access patterns must be defined upfront. Bad partition key design = hot partitions = degraded performance.
// Access pattern must match key design
Table: UserSessions
PK: userId → all sessions for a user
SK: sessionId → individual session lookup
// ✓ Good: fetch all sessions for user
query(PK = "u_8821")
// ✗ Expensive: scan by session expiry without GSI
scan(filter: expiresAt < now())
Redis Cache First, Everything Else Second
Redis is in-memory, which means sub-millisecond reads and writes. Leader-follower replication. Cluster mode for sharding.
Use it as a cache layer in front of any database, or for session storage, rate limiting, and leaderboards.
┌─────────┐ ┌───────────┐ ┌──────────┐
│ Client │──────▶│ Redis │──────▶│ MySQL / │
│ │◀──────│ (Cache) │ miss │ Mongo │
└─────────┘ hit └───────────┘ └──────────┘
Cache hit: ~0.5ms
DB read: ~10–50ms
The thing most candidates miss: Redis is eventual consistency. Never use it as a source of truth for anything that can’t tolerate data loss on restart (without persistence configured).
Neo4j When Relationships Are the Data
Neo4j is a graph database. Queries traverse relationships natively, which is what makes it fast for “friends of friends,” recommendation engines, and fraud detection scenarios where a relational join chain would be 6+ levels deep.
(User)-[:FOLLOWS]->(User)-[:LIKES]->(Post)-[:TAGGED]->(Topic)
// Cypher query: recommend posts liked by people I follow
MATCH (me:User {id: "u_1"})-[:FOLLOWS]->(friend)
-[:LIKES]->(post:Post)
WHERE NOT (me)-[:SEEN]->(post)
RETURN post LIMIT 10
The trade-off: write throughput is medium, sharding is limited. Neo4j is not a general-purpose database reach for it specifically when the traversal problem is central to your product.
Putting It Together in a System Design Interview
When the interviewer asks about databases, walk through three questions:
- What is the access pattern? Point lookup vs. range scan vs. graph traversal.
- What consistency do I need? Strong (financial) vs. eventual (social feed).
- What’s the scale? Single-region reads vs. multi-region writes.
Map the answers to the table above. The choice becomes defensible because it’s grounded in trade-offs not preference.
The Real Takeaway
Every database in this list exists because a previous one couldn’t solve a specific problem at scale. MySQL couldn’t handle unstructured data. Relational stores couldn’t handle write-heavy time-series. Caches existed because every database was too slow for hot data.
Knowing why each one was built is more valuable in an interview than memorizing benchmarks. The interviewer isn’t testing recall. They’re testing whether you can reason under constraints.