System Design Notes (Arpit Bhayani)
How to Approach System Design?
- System design (SD) is extremely practical, and there is a structured way to tackle design problems.
- Take Baby Steps, no matter what!
- Understand the problem statement
  - Without a thorough understanding of the problem at hand, we would easily digress.
  - Always ask about the constraints and core features so we stay focused on them.
- Break the problem down into components
  - E.g. Design Facebook: Auth, Notification, Feed (components / features)
  - Do not create components for the sake of it.
  - Create only the components you know are a must.
- Dissect each component
  - E.g. Feed might have a generator, an aggregator, and a web server ![[System_design_notes.excalidraw#^frame=Mvz77Wnc|Feed Generator]]
- For each subcomponent, look into
  - Database and Caching
  - Scaling & Fault Tolerance
  - Async Processing (Delegation)
  - Communication (how the subcomponents talk to each other, e.g. TCP, UDP, HTTP)
- Add more subcomponents if needed
  - Understand the scope
  - Decide how the other components will talk to the new one
  - Decide on the four factors above for the new component ![[System_design_notes.excalidraw#^clippedframe=yByuS1he|Feed Generator Deepdive]]
How do you know that you have built a good system?
- Every system is infinitely buildable, hence knowing when to stop the evolution is important.
- You broke the system into components.
- Each component has a clear set of responsibilities, and they are mutually exclusive.
  - In Feed, the web server -> serves over HTTP.
  - Feed Generator -> pulls data from multiple services and puts it into the DB.
  - Feed Aggregator -> combines candidate items fetched from generators, filters out redundant ones, ranks them, and creates a final consumable feed.
- For each component, you have the basic technical details figured out
  - Database and Caching
  - Scaling and Fault Tolerance
  - Async Processing (Delegation)
  - Communication
- Each component (in isolation) is
  - Scalable - horizontally
  - Fault tolerant - plan for recovery (to a stable state) in case of failure
  - Available - the component functions even when some component fails
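
The Feed Aggregator responsibility described above (combine candidates, drop redundant items, rank, produce the final feed) can be sketched roughly as follows. This is only an illustration: `FeedItem`, `item_id`, `score`, and `aggregate_feed` are hypothetical names, and the notes do not say how ranking is actually done, so a plain numeric score is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedItem:
    item_id: str  # hypothetical unique id, used to detect duplicates
    score: float  # hypothetical ranking score (assumption, not from the notes)


def aggregate_feed(candidate_lists, limit=10):
    """Combine candidates from several generators, drop duplicates, rank, and trim."""
    best_by_id = {}
    for candidates in candidate_lists:
        for item in candidates:
            current = best_by_id.get(item.item_id)
            # Keep only the highest-scoring copy of a duplicated item.
            if current is None or item.score > current.score:
                best_by_id[item.item_id] = item
    ranked = sorted(best_by_id.values(), key=lambda item: item.score, reverse=True)
    return ranked[:limit]


# Two hypothetical generators that both returned post "p2".
friends = [FeedItem("p1", 0.9), FeedItem("p2", 0.4)]
pages = [FeedItem("p2", 0.7), FeedItem("p3", 0.5)]
feed = aggregate_feed([friends, pages])
print([item.item_id for item in feed])
```

Keeping deduplication and ranking inside one function mirrors the idea that the aggregator owns exactly this responsibility and nothing else.
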
Relational Database
- History of Relational Databases
  - Everything revolutionary starts with financial applications.
  - Computers were first used for accounting, which relies on ledgers that store data as rows and columns.
  - Databases were developed to support accounting.
  - Hence their key properties were
    - Data Consistency
    - Data Durability
    - Data Integrity
    - Constraints
    - All in one place
- For these reasons, relational databases provide transactions. Transactions keep our system correct no matter which operations we perform.
- Hence relational databases provide the ACID properties
  - A - Atomicity
  - C - Consistency
  - I - Isolation
  - D - Durability
Atomicity
- All statements within a transaction take effect, or none do.
- E.g. publish a post and increase the total post count
  - Start Transaction
  - Insert into posts
  - Update stats set total_posts = total_posts + 1
  - Commit
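
A minimal sketch of the publish-post example, using Python's built-in `sqlite3` module as a stand-in for any relational database. The table layout and the `fail_midway` flag are assumptions added for illustration; the flag simulates a crash between the two statements so the rollback is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("CREATE TABLE stats (id INTEGER PRIMARY KEY, total_posts INTEGER NOT NULL)")
conn.execute("INSERT INTO stats (id, total_posts) VALUES (1, 0)")
conn.commit()


def publish_post(conn, title, fail_midway=False):
    """Insert the post and bump the counter as one transaction."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO posts (title) VALUES (?)", (title,))
        if fail_midway:
            raise RuntimeError("simulated crash between the two statements")
        conn.execute("UPDATE stats SET total_posts = total_posts + 1 WHERE id = 1")


publish_post(conn, "first post")

try:
    publish_post(conn, "second post", fail_midway=True)
except RuntimeError:
    pass  # the insert above was rolled back together with the transaction

total_posts = conn.execute("SELECT total_posts FROM stats").fetchone()[0]
post_count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(total_posts, post_count)
```

After the failed call, the post table and the counter still agree, which is the whole point of atomicity.
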
Consistency
- Data will never become incorrect, no matter what.
- Constraints, cascades, triggers
- E.g.
  - Foreign key checks do not allow you to delete a parent if a child exists - Foreign key constraints
  - If a parent is deleted, all its children should be deleted - Cascades
  - On update, you want to update some column or call functions - Triggers
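
The three mechanisms above can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `users`, `posts`, and `comments` tables, the `edits` column, and the `count_edits` trigger are all hypothetical, and SQLite is only a stand-in for a server database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),  -- plain FK: blocks deleting the parent
    title TEXT,
    edits INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL REFERENCES posts(id) ON DELETE CASCADE  -- cascade
);
CREATE TRIGGER count_edits AFTER UPDATE OF title ON posts
BEGIN
    UPDATE posts SET edits = edits + 1 WHERE id = NEW.id;  -- trigger
END;
INSERT INTO users VALUES (1, 'asha');
INSERT INTO posts (id, user_id, title) VALUES (1, 1, 'draft');
INSERT INTO comments VALUES (1, 1);
""")

# Foreign key constraint: the parent cannot be deleted while a child exists.
try:
    conn.execute("DELETE FROM users WHERE id = 1")
    parent_delete_blocked = False
except sqlite3.IntegrityError:
    parent_delete_blocked = True

# Trigger: updating the title bumps the edit counter automatically.
conn.execute("UPDATE posts SET title = 'final' WHERE id = 1")
edits = conn.execute("SELECT edits FROM posts WHERE id = 1").fetchone()[0]

# Cascade: deleting the post removes its comments as well.
conn.execute("DELETE FROM posts WHERE id = 1")
remaining_comments = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
print(parent_delete_blocked, edits, remaining_comments)
```
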
Durability
- When a transaction commits, its changes outlive any outage.
Isolation
- When multiple transactions execute in parallel, the isolation level determines how much of one transaction's changes are visible to the others.
- In most cases there is no need to change the isolation level; a MySQL server defaults to Repeatable Read.
- There are 4 standard isolation levels
  - Repeatable Read
    - Consistent reads within the same transaction
    - Even if another transaction commits, this transaction does not see the change if it has already read the value.
  - Read Committed
    - Reads within the same transaction always fetch the latest committed value. If one transaction commits changes, the next read from another transaction sees them immediately.
    - Con: Multiple reads within the same transaction can be inconsistent
  - Read Uncommitted
    - Reads even uncommitted values from other transactions.
    - Con: Dirty reads
  - Serializable
    - Every read is a locking read (this depends on the SQL engine)
    - While one transaction reads, the others have to wait for the first one to complete.
We pick relational databases for relations and ACID.
Database Scaling
- The scaling techniques below are applicable to both relational and non-relational databases.
Vertical Scaling
- In this type of scaling, we increase the capacity of a given node by adding more RAM, CPU, or storage, which scales our database.
- In the cloud, this requires a small downtime to reboot the system.
- This gives the ability to handle scale initially.
- But this approach is bounded by physical hardware limits.
Horizontal Scaling
- Here we add more nodes to our system and distribute the incoming load across them.
- We can add more nodes as the scale increases.
- There are multiple techniques, depending on whether the load is read-heavy or write-heavy.
Read Replicas
- Useful when our database has a much higher read load compared to writes, like 90:10 or 80:20.
- We can move the reads to other databases, called replicas, so our main/master database is free to do writes.
- Here we add more nodes, which are read-only.
- Our API servers should know which DB to connect to in order to get things done ![[System Design/My Notes/System_design_notes.excalidraw.md#^clippedframe=ESgVPZqA|100%]]
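
One way an API server could decide which DB to connect to is a small routing helper. This is a sketch, not a prescribed design: `DatabaseRouter`, the host names, and the round-robin choice of replica are all assumptions, and "primary" here means the main/master database.

```python
import itertools


class DatabaseRouter:
    """Hypothetical helper: writes go to the primary, reads are spread over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary for reads when no replica is configured.
        self._read_targets = itertools.cycle(replicas or [primary])

    def for_write(self):
        return self.primary

    def for_read(self):
        # Round-robin is an assumption; any load-balancing policy could go here.
        return next(self._read_targets)


router = DatabaseRouter("primary-db:3306", ["replica-1:3306", "replica-2:3306"])
write_target = router.for_write()
read_targets = [router.for_read() for _ in range(3)]
print(write_target, read_targets)
```
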
Replication
- Changes on the master database need to be sent to the replicas to maintain consistency.
- There are two modes of replication
  - Synchronous Replication
    - When the API writes to the master DB, the master also writes to the replica DB, and only then does the API get success for the transaction.
    - This provides strong consistency and zero replication lag, but results in slower writes.
  - Asynchronous Replication
    - When the API writes, it writes only to the master DB, and the replica DB eventually syncs with the master at a periodic interval.
    - This is the most used mode in the real world.
    - This provides eventual consistency with some replication lag, but results in faster writes.
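
The difference between the two modes can be shown with a toy in-memory model. Everything here is hypothetical (`Primary`, `Replica`, `ship_pending`); real databases ship logs over the network, but the ordering of "acknowledge" versus "replica has the change" is the same idea.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    def __init__(self, replica, synchronous):
        self.data = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending = deque()  # changes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Sync mode: acknowledge only after the replica has the change.
            self.replica.apply(key, value)
        else:
            # Async mode: acknowledge now, ship the change later.
            self.pending.append((key, value))
        return "ok"

    def ship_pending(self):
        """Stands in for the periodic background sync."""
        while self.pending:
            self.replica.apply(*self.pending.popleft())


sync_replica = Replica()
Primary(sync_replica, synchronous=True).write("post:1", "hello")
sync_read = sync_replica.data.get("post:1")  # visible right after the write

async_replica = Replica()
async_primary = Primary(async_replica, synchronous=False)
async_primary.write("post:1", "hello")
lagging_read = async_replica.data.get("post:1")  # replica has not caught up yet
async_primary.ship_pending()
caught_up_read = async_replica.data.get("post:1")
print(sync_read, lagging_read, caught_up_read)
```

The window between the async write and `ship_pending()` is the replication lag: a read served by the replica in that window returns stale data.
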
Sharding
- This is another type of scaling: when one node cannot handle all the write operations, we split the data into multiple mutually exclusive subsets (shards).
- A write for a particular row or document goes to one particular shard.
- This way we spread our overall database load.
- Each shard is independent of the others; there is no replication between them.
- The API server needs to know which shard to connect to in order to write or read particular data.
- Some databases have a proxy that takes care of routing.
- Each shard can have its own replica.
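
A minimal sketch of how an API server (or a routing proxy) might map a row or document key to exactly one shard. Hash-modulo routing is an assumption chosen for brevity, and the shard names and `shard_for` are hypothetical; the notes do not say which partitioning scheme is used.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def shard_for(key, shards=SHARDS):
    """Map a row/document key to exactly one shard."""
    # Use a stable hash: Python's built-in hash() for strings is randomized
    # per process, so different API servers would disagree on the shard.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Every read and write for the same key lands on the same shard.
first_lookup = shard_for("user:42")
second_lookup = shard_for("user:42")
print(first_lookup, first_lookup == second_lookup)
```

With plain modulo, changing the number of shards remaps most keys, which is one reason real systems often use range-based or consistent-hashing schemes instead.
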