A standard architecture that may scale, but not by much, consists of:
- DNS, a third-party service that maps a human-friendly domain (like google.com) to an IP address. When a client visits our website through the domain name, their request first goes to the DNS, which responds with the public IP address of the machine where our code is running (see the lookup sketch after this list).
- The frontend, backend, and database are all hosted on one machine, which serves the client requests and provides the responses. HTTP is used to transfer requests from the client to the server and vice versa.
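A minimal sketch of that first DNS step, using Python's standard socket module (the domain is just an example):

```python
import socket

# Ask DNS to resolve a domain name into the public IP address
# of a machine serving it.
public_ip = socket.gethostbyname("google.com")
print(public_ip)  # e.g. "142.250.186.78" -- varies by region and over time
```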
That’s great and all. This approach can scale to a certain extent, but not very far.
Scaling:
- we can scale vertically or horizontally. To scale vertically means to keep a single server/pod running and, as traffic increases, add more CPU and storage to that instance. That’s quite limiting, as we cannot increase these infinitely. It also makes the instance a Single Point of Failure: with only one instance running, if something happens to it, the whole app goes down. The PRO of this approach is simplicity of configuration and maintenance, but the CON is sacrificing a lot of scalability and availability.
- horizontal scaling, on the other hand, is where we add more servers and distribute the traffic across multiple machines. This way we can have several more lightweight machines running our app. This provides more reliability & availability: if one server goes down, the traffic can be routed to another, available one. The trade-off with horizontal scaling is the complexity of configuration and maintenance.
Horizontal scalability introduces complexities of its own, which come as a by-product of having multiple instances. What if the database becomes too big and we need to scale it too?
- We can replicate it, or we can shard it. Sharding means splitting the database into multiple smaller databases based on a given key. To shard, we can use different algorithms over that key (let’s say userId is the key we shard on). A simple algorithm is modulo sharding (i.e. hash(userId) % 4): whatever the result is, the data goes to that shard (see the modulo sketch after this list).
- Of course the above sharding algorithm isn’t ideal, as we may end up with some DB instances that get far more traffic than others (celebrity posts get more interactions than normal people’s). So we can use consistent hashing to spread the load more evenly across the DBs, and to keep most keys in place when a shard is added or removed (see the ring sketch after this list).
- Splitting the DB into multiple smaller ones has its benefits as well, in that we can scale each of them independently from one another.
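A minimal sketch of the modulo approach, assuming four shards and a string userId (the choice of MD5 as the hash function is an assumption, not prescribed above):

```python
import hashlib

NUM_SHARDS = 4  # assumption: four database shards

def shard_for(user_id: str) -> int:
    # Hash the key first so sequential IDs don't all land on the same shard,
    # then take the result modulo the shard count.
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return h % NUM_SHARDS

print(shard_for("user_123"))  # -> a shard index in 0..3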
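The weakness shows up when the shard count changes: going from % 4 to % 5 re-maps almost every key to a different shard. A minimal consistent-hashing ring is sketched below; the virtual nodes (vnodes) are an assumption added here to smooth the distribution, and names like db0..db3 are placeholders:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map any string to a point on the ring using MD5 (stable across runs,
    # unlike Python's built-in hash()).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # vnodes: virtual nodes per physical shard, to smooth the distribution.
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # Walk clockwise on the ring to the first vnode at or after the key's hash.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._keys)
        return self._ring[idx][1]

ring = ConsistentHashRing(["db0", "db1", "db2", "db3"])
print(ring.shard_for("user:42"))
```

Adding or removing a shard now only re-maps the keys that fall next to its vnodes on the ring, instead of nearly all keys.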
Load Balancers:
- as mentioned above, to achieve horizontal scalability, there needs to be something that routes incoming traffic to the right instance/machine. That’s where the Load Balancer shines, acting as an orchestrator of this traffic.
- Load balancers use different algorithms/techniques to route incoming traffic to the internal servers. One approach is Round-Robin, where the LB forwards each request to the next server in rotation, and so on (see the sketch after this list). Other approaches distribute traffic based on pod/server availability or on custom configuration (e.g. weights or least connections).
- In most use-cases, LBs stand between the clients and the servers, acting as a proxy. The public IP the client gets from the DNS is the LB’s, so requests land on the LB. The LB communicates with the servers only via private IPs, so it does the due diligence of forwarding each request arriving at its public IP to the private IP of the instance it decides to route the request to.
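A minimal round-robin sketch; the private IPs are placeholders:

```python
import itertools

class RoundRobinBalancer:
    def __init__(self, servers):
        # Cycle endlessly through the server list: 0, 1, 2, ..., 0, 1, ...
        self._pool = itertools.cycle(servers)

    def next_server(self) -> str:
        return next(self._pool)

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
for _ in range(5):
    print(lb.next_server())  # 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1, ...
```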
Databases:
We’ve scaled our app with LBs and additional servers, but we also need to scale the DB. To do so independently of the servers, we can extract the DB onto its own instance.
The backend connects to the database and, depending on the type of DB, reads data from it and writes data to it.
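A minimal sketch of a backend doing a WRITE and a READ, using an in-memory SQLite database as a stand-in for the separate DB instance (the table and data are made up for illustration):

```python
import sqlite3

# In-memory database stands in for the DB running on its own instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# WRITE
conn.execute("INSERT INTO users (name) VALUES (?)", ("Alice",))
conn.commit()

# READ
row = conn.execute("SELECT id, name FROM users WHERE name = ?", ("Alice",)).fetchone()
print(row)  # (1, 'Alice')
```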