System Design Notes Arpit Bhayani | My Portfolio

How to Approach System Design?

System design is extremely practical, and there is a structured way to tackle it.
Take Baby Steps, no matter what!

Understand the problem statement
- Without a thorough understanding of the problem at hand, we would easily digress.
- Always ask about the constraints and core features so we don't digress from them.
Break the problem down into components
- Eg. Design Facebook - Auth, Notification, Feed - Components / Features
- Do not create components for the sake of it
- Create only the components which you know are a must
Dissect each component
- Eg. Feed might have a generator, an aggregator, and a web server ![[System_design_notes.excalidraw#^frame=Mvz77Wnc|Feed Generator]]
For each subcomponent, look into
1. Database and Caching
2. Scaling & Fault Tolerance
3. Async Processing (Delegation)
4. Communication (how the subcomponents will talk to each other, e.g. TCP, UDP, HTTP)
Add more subcomponents if needed
1. Understand the scope
2. Decide how the other components will talk to this new one
3. Decide on the 4 factors above for this new component ![[System_design_notes.excalidraw#^clippedframe=yByuS1he|Feed Generator Deepdive]]

Every system is infinitely buildable, hence knowing when to stop the evolution is important.

You broke the system into components.
Each component has a clear set of responsibilities and they are mutually exclusive.
1. In Feed, the web server -> serves the feed over HTTP.
2. Feed Generator -> pulls data from multiple services and puts it into the DB.
3. Feed Aggregator -> combines candidate items fetched from the generators, filters out redundant ones, ranks them, and creates a final consumable feed
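The aggregator's responsibility described above (combine, filter redundant items, rank) could be sketched roughly as follows. This is only an illustration: the `Candidate` type, its `item_id` and `score` fields, and the `aggregate` function are hypothetical names, and ranking by a single numeric score is an assumption, not something these notes specify.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    item_id: str   # hypothetical identifier of a feed item
    score: float   # hypothetical relevance score assigned by a generator


def aggregate(candidate_lists: Iterable[Iterable[Candidate]], limit: int = 10) -> List[Candidate]:
    """Combine candidates from several generators, drop duplicates, rank."""
    best = {}
    for candidates in candidate_lists:
        for c in candidates:
            current = best.get(c.item_id)
            # Keep only the highest-scoring copy of a duplicated item.
            if current is None or c.score > current.score:
                best[c.item_id] = c
    # Highest score first; item_id breaks ties so the order is deterministic.
    ranked = sorted(best.values(), key=lambda c: (-c.score, c.item_id))
    return ranked[:limit]
```

Keeping this logic in one place is what lets the generators stay simple: each one only has to emit candidates, without knowing about the others.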
For each component, you have the high-level technical details figured out
1. Database and Caching
2. Scaling and Fault tolerance
3. Async Processing (Delegation)
4. Communication
Each component (in isolation) is
1. Scalable - horizontally
2. Fault tolerant - plan for recovery (to a stable state) in case of failure
3. Available - the component functions even when some other component fails

History of Relational Databases
- Everything revolutionary starts with financial applications.
  - Computers were first used for accounting, which uses ledgers that store data as rows and columns
- Databases were developed to support accounting.
- Hence their key properties were
  1. Data Consistency
  2. Data Durability
  3. Data integrity
  4. Constraints
  5. All in one place
Because of these reasons, relational databases provide transactions. Transactions keep our system correct no matter which operations we perform.
Hence relational databases provide the ACID properties
- A - Atomicity
- C - Consistency
- I - Isolation
- D - Durability

Atomicity: all statements within a transaction take effect, or none do.
Eg. Publish post and increase total post count
- Start Transaction
  - Insert into posts
  - Update stats set total_posts = total_posts + 1
- Commit
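The publish-post example above could be sketched with Python's built-in `sqlite3` module. The table layout, the `setup`, `publish_post`, and `counts` helpers, and the `fail_midway` flag are assumptions made for illustration; the flag only simulates a crash between the two statements to show that the insert is rolled back.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Create a throwaway in-memory database with the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    conn.execute("CREATE TABLE stats (total_posts INTEGER NOT NULL)")
    conn.execute("INSERT INTO stats (total_posts) VALUES (0)")
    conn.commit()
    return conn


def publish_post(conn: sqlite3.Connection, title: str, fail_midway: bool = False) -> None:
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if any statement (or our code) raises.
    with conn:
        conn.execute("INSERT INTO posts (title) VALUES (?)", (title,))
        if fail_midway:
            raise RuntimeError("simulated crash between the two statements")
        conn.execute("UPDATE stats SET total_posts = total_posts + 1")


def counts(conn: sqlite3.Connection) -> tuple:
    """Return (number of rows in posts, value of the counter)."""
    posts = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    total = conn.execute("SELECT total_posts FROM stats").fetchone()[0]
    return posts, total
```

After a failed `publish_post(..., fail_midway=True)`, `counts` returns the same pair as before the call: neither the post nor the counter changed.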

Consistency: data will never become incorrect, no matter what.
This is enforced through constraints, cascades, and triggers.
Eg.
- Foreign key checks do not allow you to delete a parent row if a child row exists - Foreign Key constraints
- If a parent is deleted, all its children should be deleted - Cascades
- On update, you want to update some column or call a function - Triggers
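A minimal sketch of the three mechanisms above, again using `sqlite3`. The `users`, `posts`, and `sessions` tables, the `edits` column, and the `make_db` helper are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other engines behave differently.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific opt-in
    conn.executescript("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        edits INTEGER NOT NULL DEFAULT 0
    );
    -- Plain foreign key: deleting a user who still has posts is rejected.
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id)
    );
    -- Cascade: deleting a user also deletes that user's sessions.
    CREATE TABLE sessions (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE
    );
    -- Trigger: every rename bumps a counter on the same row.
    CREATE TRIGGER count_name_edits AFTER UPDATE OF name ON users
    BEGIN
        UPDATE users SET edits = edits + 1 WHERE id = NEW.id;
    END;
    """)
    return conn
```

With this schema, deleting a user that still has a row in `posts` raises `sqlite3.IntegrityError`, while deleting a user that only has rows in `sessions` succeeds and removes those rows too.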

Isolation: when multiple transactions execute in parallel, the isolation level determines how much of one transaction's changes are visible to the others.
In most cases there is no need to change the isolation level; a MySQL server defaults to Repeatable Read.
There are 4 standard isolation levels

Repeatable Reads
- Consistent reads within the same transaction
- Even if another transaction commits, this transaction does not see the change if it has already read the value
Read Committed
- Every read within a transaction sees the latest committed value. If one transaction commits a change, the next read from another transaction gets it immediately.
- Con: Multiple reads within the same transaction can be inconsistent
Read Uncommitted
- Reads even uncommitted values from other transactions.
- Con: Dirty Reads
Serializable
- Every read is a locking read (this depends on the SQL engine)
- While one transaction reads, the others have to wait for the first one to complete.
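To make the visibility rules above concrete, here is a toy in-memory model. It is not how any real engine is implemented. The `Store` and `Txn` classes are hypothetical. Taking the snapshot on the first read is an assumption that loosely mirrors the "already read" behaviour described above. Serializable is left out because it needs locking.

```python
class Store:
    def __init__(self, initial):
        self.committed = dict(initial)  # values visible after commit
        self.pending = {}               # uncommitted writes of open transactions


class Txn:
    def __init__(self, store, level):
        self.store = store
        self.level = level      # "READ UNCOMMITTED", "READ COMMITTED" or "REPEATABLE READ"
        self.snapshot = None    # frozen view, used by REPEATABLE READ only
        self.writes = {}

    def read(self, key):
        if key in self.writes:  # a transaction always sees its own writes
            return self.writes[key]
        if self.level == "READ UNCOMMITTED" and key in self.store.pending:
            return self.store.pending[key]  # dirty read
        if self.level == "REPEATABLE READ":
            if self.snapshot is None:       # freeze the view on first read
                self.snapshot = dict(self.store.committed)
            return self.snapshot.get(key)
        return self.store.committed.get(key)  # latest committed value

    def write(self, key, value):
        self.writes[key] = value
        self.store.pending[key] = value

    def commit(self):
        self.store.committed.update(self.writes)
        for key in self.writes:
            self.store.pending.pop(key, None)
        self.writes = {}
```

If a writer changes `x` from 1 to 2, a Read Uncommitted reader sees 2 before the commit, a Read Committed reader sees 2 only after it, and a Repeatable Read reader that already read `x` keeps seeing 1.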

We pick relational databases for relations and ACID.

The scaling techniques below are applicable to both relational and non-relational databases.

In vertical scaling, we increase the compute of a given node by adding more RAM, CPU, or storage, which scales our database.
In the cloud this requires a small downtime to reboot the system.
This gives us the ability to handle scale initially,
but it is bounded by physical hardware limits.

In horizontal scaling, we add more nodes to our system and distribute the incoming load across them.
We can add more nodes as the scale increases.
There are multiple techniques, depending on whether the load is read-heavy or write-heavy.

If our database has a much higher read load than write load, like 90:10 or 80:20,
we can move the reads to other databases, called replicas, so our main/master database is free to do writes.
Here we add more nodes which are read-only.
Our API servers should know which DB to connect to in order to get things done ![[System Design/My Notes/System_design_notes.excalidraw.md#^clippedframe=ESgVPZqA|100%]]
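One way the API server could decide which database to talk to is a small router that sends reads to replicas and everything else to the master. This is a sketch under assumptions: the `Router` class is hypothetical, connections are represented by plain strings, and treating any statement that starts with `SELECT` as a read is a deliberately naive rule.

```python
import itertools


class Router:
    """Send reads to replicas (round-robin) and writes to the master."""

    def __init__(self, master, replicas):
        self.master = master
        # cycle() yields replicas forever; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def pick(self, sql: str):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.master
```

For example, with `Router("master", ["replica-1", "replica-2"])`, consecutive `SELECT` statements alternate between the two replicas, while an `INSERT` always goes to `"master"`.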

A change on one database needs to be sent to the replica to maintain consistency.
There are two modes of replication

Sync Replication
- When the API writes to the master DB, the write is also applied to the replica DB, and only then does the API get success for the transaction.
- This provides strong consistency and zero replication lag, but results in slower writes.
Asynchronous Replication
- When the API writes, it writes only to the master DB; the replica DB eventually syncs with the master at periodic intervals.
- This is the most used mode in the real world.
- This provides eventual consistency and faster writes, at the cost of some replication lag.
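The difference between the two modes can be illustrated with a toy model. The `Master` and `Replica` classes, the `mode` argument, and the `flush` method are hypothetical; a real database ships log records over the network rather than calling a method.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Master:
    def __init__(self, replica: Replica, mode: str):
        self.data = {}
        self.replica = replica
        self.mode = mode          # "sync" or "async"
        self.backlog = deque()    # changes not yet shipped (async only)

    def write(self, key, value):
        self.data[key] = value
        if self.mode == "sync":
            # Success is returned only after the replica has the change.
            self.replica.apply(key, value)
        else:
            # Success is returned immediately; the replica catches up later.
            self.backlog.append((key, value))

    def flush(self):
        """Ship the backlog to the replica (the periodic sync step)."""
        while self.backlog:
            self.replica.apply(*self.backlog.popleft())
```

In async mode, a read from the replica between `write` and `flush` returns stale data; that window is the replication lag.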

Sharding is another type of scaling: when one node cannot handle all the write operations, we split the data into multiple mutually exclusive subsets (shards).
A write for a particular row or document goes to one particular shard.
This way we scale our overall database load.
Here each shard is independent of the others; there is no replication between them.
The API server needs to know which shard to connect to in order to write or read particular data.
Some databases have a proxy that takes care of routing.
Each shard can have its own replica.
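The routing decision described above could look like the sketch below. Choosing the shard with a hash of the key modulo the shard count is an assumption, since these notes do not say how keys map to shards. The `shard_for` function and `ShardedStore` class are hypothetical, and each shard is modelled as a plain dict.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index deterministically."""
    # A stable hash is used on purpose: Python's built-in hash() for
    # strings changes between processes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    def __init__(self, num_shards: int):
        # Each dict stands in for an independent database node.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Every key lives in exactly one shard, so reads and writes for that key always hit the same node. With this simple modulo scheme, changing the number of shards remaps most keys.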