
URL: https://dev.to/jaspreet_singh_86ae1740ac/how-systems-scale-from-0-to-100-million-users-3nm

HLD Fundamentals #4: How Systems Scale: From 0 to 100 Million Users - DEV Community


One of the most common system design interview questions is:

"How would you scale a web application from 100 users to 100 million users?"

The answer is rarely a single technology. Instead, systems evolve through multiple stages, with each stage solving a specific bottleneck.

This article walks through the typical evolution of a scalable system and explains why, how, and when each component is introduced.


1. Single Server

Why Start Here?

Every application starts simple.

In the beginning:

  • Traffic is low
  • Development speed matters more than scalability
  • Infrastructure costs should be minimal

What Is It?

A single machine handles everything:

  • Frontend
  • Backend
  • Database
Users
 |
 v
Single Server
├── Application
└── Database

How Does It Work?

  1. User sends request.
  2. Application processes request.
  3. Database stores and retrieves data.
  4. Response is returned.

Everything happens on one machine.
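To make the four steps concrete, here is a minimal Python sketch of one process that both handles a request and queries its own database. It assumes an in-memory SQLite database stands in for the co-located database, and `handle_request` is a hypothetical handler, not any particular framework's API.

```python
import sqlite3

# One process hosts both the application logic and the database.
# An in-memory SQLite database stands in for the co-located database (assumption).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

def handle_request(user_id: int) -> dict:
    """Hypothetical handler: process the request, query the local DB, build a response."""
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return {"status": 404}
    return {"status": 200, "name": row[0]}

print(handle_request(1))  # {'status': 200, 'name': 'alice'}
print(handle_request(2))  # {'status': 404}
```

Because the query and the request handling share one machine, any spike in either competes for the same CPU and memory.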


Problem

As traffic grows:

  • CPU becomes overloaded
  • Memory becomes insufficient
  • Database competes with application for resources

A single server becomes a bottleneck.


Interview One-Liner

A single server architecture is simple and cost-effective but becomes a bottleneck as traffic and resource usage increase.


2. Application and Database Separation

Why Do We Need It?

The application and database have different workloads.

Application Server:

  • Uses CPU
  • Handles business logic

Database Server:

  • Uses memory and storage
  • Handles queries

Keeping them together causes resource contention.


How Does It Work?

Move the database to a separate machine.

Users
 |
 v
Application Server
 |
 v
Database Server

Benefits

  • Independent scaling
  • Better resource utilization
  • Improved performance

Example

Suppose an e-commerce website receives thousands of requests.

The application handles:

  • Authentication
  • Order processing
  • API responses

The database handles:

  • Product data
  • Orders
  • User information

Separating them prevents one workload from affecting the other.


Interview One-Liner

Separating the application and database allows each layer to scale independently and removes resource contention.


3. Load Balancer and Multiple Application Servers

Why Do We Need It?

Eventually one application server is not enough.

Problems:

  • High CPU utilization
  • Increased response time
  • Single point of failure

What Is a Load Balancer?

A load balancer distributes incoming traffic across multiple servers.

Instead of sending all traffic to one machine, the load balancer spreads requests across a pool of servers according to a balancing algorithm.


How Does It Work?

        Users
          |
          v
    Load Balancer
       /      \
      v        v
App Server  App Server

Request Flow

  1. User sends request.
  2. Load balancer receives request.
  3. Load balancer selects a healthy server.
  4. Request gets processed.
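The request flow above can be sketched as a round-robin balancer that skips servers marked unhealthy. This is a simplified model under stated assumptions: `RoundRobinBalancer`, `mark_down`, `mark_up`, and the server names are hypothetical, and a production load balancer would update server health from its own periodic health checks rather than being told.

```python
import itertools

class RoundRobinBalancer:
    """Hypothetical load balancer: cycles through servers, skipping unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)          # would be driven by health checks
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        self.healthy.add(server)

    def pick(self):
        # Visit each server at most once per call; skip any marked unhealthy.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([lb.pick() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']
lb.mark_down("app-2")
print([lb.pick() for _ in range(3)])  # ['app-3', 'app-1', 'app-3']
```

Once `app-2` is marked down, requests keep rotating over the remaining servers, which is how a failed server stops receiving traffic.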

Benefits

Scalability

Add more servers when traffic increases.

High Availability

If one server fails, traffic is routed elsewhere.

Better Performance

No single server is overloaded, because traffic is shared across the pool.


Interview Follow-Up

Common Algorithms

  • Round Robin
  • Least Connections
  • IP Hashing
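Two of these algorithms can be sketched in a few lines of Python. The function names, server names, and connection counts are hypothetical. The hash-based version uses SHA-256 from `hashlib` instead of Python's built-in `hash()`, because `hash()` of a string is randomized per process and would send the same client to different servers after a restart.

```python
import hashlib

def least_connections(active: dict[str, int]) -> str:
    """Pick the server currently handling the fewest open connections."""
    return min(active, key=active.get)

def ip_hash(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server deterministically, so the same client
    keeps reaching the same server while the pool is unchanged."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

active = {"app-1": 12, "app-2": 3, "app-3": 7}
print(least_connections(active))  # app-2

servers = ["app-1", "app-2", "app-3"]
print(ip_hash("203.0.113.7", servers) == ip_hash("203.0.113.7", servers))  # True
```

Plain IP hashing remaps most clients when a server is added or removed; consistent hashing is a common way to reduce that churn.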

Interview One-Liner

A load balancer distributes traffic across multiple application servers, improving scalability, performance, and availability.


4. Database Replication

Why Do We Need It?

Even after scaling application servers, the database often becomes the next bottleneck.

Most systems have:

Reads >> Writes

For example:

  • Product browsing
  • Social media feeds
  • News websites

Users perform far more reads than writes.


What Is Database Replication?

Replication creates copies of the database.

          Primary DB
              |
      -----------------
      |               |
      v               v
  Replica DB      Replica DB

How Does It Work?

Primary Database

Handles:

  • INSERT
  • UPDATE
  • DELETE

Replica Databases

Handle:

  • SELECT queries

Every change made on the primary is copied to the replicas, typically asynchronously.
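Routing reads and writes can be sketched as a small router in front of the databases. This is an illustrative assumption, not a specific driver's API: `ReplicatedDatabaseRouter` is hypothetical, it inspects only the first SQL keyword, and it sends anything that is not a `SELECT` to the primary as the safe default.

```python
import itertools

class ReplicatedDatabaseRouter:
    """Hypothetical router: SELECTs rotate across replicas, everything else hits the primary."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._replicas)   # reads are spread over the replicas
        return self.primary               # INSERT / UPDATE / DELETE and anything else

router = ReplicatedDatabaseRouter("primary-db", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM products WHERE id = 42"))            # replica-1
print(router.route("UPDATE inventory SET qty = qty - 1 WHERE id = 42"))  # primary-db
print(router.route("SELECT name FROM products"))                        # replica-2
```

Many real setups also send reads that must see a just-committed write to the primary, because of replication lag.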


Example

Amazon Product Page:

Every user viewing products performs reads.

Only a few users are updating inventory.

Reads are served by replicas.

Writes go to primary.


Benefits

  • Increased read capacity
  • Better performance
  • Improved availability

Challenges

Replication Lag

A replica may lag slightly behind the primary, so a read served by a replica can return stale data.

Reads from replicas are therefore eventually consistent.
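Replication lag can be illustrated with a toy model in which the replica applies the primary's write log only when `sync()` runs. `Primary`, `AsyncReplica`, and the key names are hypothetical; real databases stream their write-ahead or binary log continuously, so the gap is usually small but not guaranteed to be zero.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []              # ordered list of (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class AsyncReplica:
    """Applies the primary's log only when sync() runs, modelling replication lag."""

    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0           # number of log entries already applied

    def sync(self):
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)

    def read(self, key):
        return self.data.get(key)

primary = Primary()
replica = AsyncReplica(primary)
primary.write("stock:42", 10)
replica.sync()
primary.write("stock:42", 9)       # not yet replicated
print(replica.read("stock:42"))    # 10 (stale)
replica.sync()
print(replica.read("stock:42"))    # 9 (caught up)
```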


Interview One-Liner

Database replication scales read traffic by creating read-only replicas while keeping writes on the primary database.


5. Cache Integration

Why Do We Need It?

Even replicated databases can become overloaded.

Many requests ask for the same data repeatedly.

Examples:

  • Trending products
  • User profiles
  • Popular posts

Reading from the database every time is expensive.


What Is a Cache?

A cache is a fast in-memory storage layer.

Popular examples:

  • Redis
  • Memcached

How Does It Work?

User
 |
 v
Application
 |
 v
Cache
 |
 |
 Hit? ---- Yes ----> Return Data
 |
 No
 |
 v
Database

Cache Hit

Data found in cache.

Response is very fast.

Cache Miss

Data not found.

Application fetches from database and stores it in cache.
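The hit/miss logic above is the cache-aside pattern. The sketch below uses a plain dictionary with expiry times as a stand-in for Redis or Memcached (an assumption for self-containment); `CacheAside` and `load_profile` are hypothetical names. The TTL bounds how long a cached value can stay stale after the database changes.

```python
import time

class CacheAside:
    """Minimal cache-aside sketch; a dict with expiry times stands in for Redis/Memcached."""

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self.load_from_db = load_from_db     # hypothetical database loader
        self.ttl = ttl_seconds
        self.store = {}                      # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            self.hits += 1                   # cache hit: serve from memory
            return entry[0]
        self.misses += 1                     # miss or expired: go to the database
        value = self.load_from_db(key)
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value

def load_profile(user_id):
    return {"id": user_id, "name": f"user-{user_id}"}   # pretend database query

cache = CacheAside(load_profile, ttl_seconds=30)
cache.get(7)
cache.get(7)
print(cache.hits, cache.misses)  # 1 1
```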


Example

Instagram Profile Page

Instead of querying the database millions of times, frequently accessed profiles are served directly from Redis.


Benefits

  • Lower database load
  • Faster response times
  • Better user experience

Interview One-Liner

A cache stores frequently accessed data in memory to reduce database load and improve response time.


6. CDN (Content Delivery Network)

Why Do We Need It?

Not all content is dynamic.

Static content includes:

  • Images
  • CSS
  • JavaScript
  • Videos

Serving these files from the application server is inefficient.


What Is a CDN?

A CDN is a geographically distributed network of servers that stores cached copies of static content closer to users.


How Does It Work?

User (India)
 |
 v
Nearest CDN Edge
 |
 v
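Origin Server (contacted only on a cache miss)

The CDN serves content from the edge location nearest to the user. The edge fetches a file from the origin server only when it does not already hold a cached copy.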
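A simplified model of this flow: pick the edge with the lowest latency for the user, serve from that edge's cache, and go to the origin only on a miss. Edge names, latency figures, and the `EdgeServer` class are hypothetical; real CDNs usually steer users to an edge with DNS or anycast routing rather than an application-level lookup like this one.

```python
class EdgeServer:
    def __init__(self, name, origin):
        self.name = name
        self.origin = origin          # callable: path -> bytes
        self.cache = {}

    def serve(self, path):
        if path in self.cache:
            return self.cache[path], "HIT"
        body = self.origin(path)      # only a miss travels to the origin server
        self.cache[path] = body
        return body, "MISS"

def origin(path):
    return f"contents of {path}".encode()

def nearest_edge(latency_ms, edges):
    """Pick the edge with the lowest measured latency for this user."""
    return edges[min(latency_ms, key=latency_ms.get)]

edges = {"mumbai": EdgeServer("mumbai", origin), "frankfurt": EdgeServer("frankfurt", origin)}
user_latency = {"mumbai": 18, "frankfurt": 140}   # hypothetical numbers for a user in India
edge = nearest_edge(user_latency, edges)
print(edge.name, edge.serve("/img/logo.png")[1])  # mumbai MISS
print(edge.name, edge.serve("/img/logo.png")[1])  # mumbai HIT
```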


Example

Netflix

Movie assets are stored across multiple CDN edge locations worldwide.

Users download content from nearby servers rather than a central location.


Benefits

  • Reduced latency
  • Faster content delivery
  • Reduced load on origin servers

Interview One-Liner

A CDN stores static content closer to users globally, reducing latency and improving performance.


7. Data Center Scaling

Why Do We Need It?

What happens if an entire region goes down?

Examples:

  • Power failure
  • Network outage
  • Natural disasters

A single data center becomes a major risk.


How Does It Work?

Deploy infrastructure across multiple regions.

      Global Load Balancer
         /          \
        v            v
  Data Center A   Data Center B

Benefits

High Availability

If one region fails, another takes over.

Disaster Recovery

Business continues operating.

Lower Latency

Users connect to the nearest region.
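Routing with regional failover can be sketched as choosing the lowest-latency region among those currently healthy. Region names and latencies are hypothetical; in practice this decision is typically made by DNS-based or anycast global load balancing driven by health checks.

```python
def pick_region(user_latency_ms: dict[str, float], healthy: set[str]) -> str:
    """Route to the lowest-latency region that is currently healthy."""
    candidates = {r: ms for r, ms in user_latency_ms.items() if r in healthy}
    if not candidates:
        raise RuntimeError("no healthy region")
    return min(candidates, key=candidates.get)

latency = {"dc-a": 20.0, "dc-b": 95.0}
print(pick_region(latency, {"dc-a", "dc-b"}))  # dc-a (nearest)
print(pick_region(latency, {"dc-b"}))          # dc-b (dc-a is down, so traffic fails over)
```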


Interview One-Liner

Multiple data centers improve fault tolerance, disaster recovery, and global availability.


8. Message Queues

Why Do We Need It?

Some tasks do not need immediate execution.

Examples:

  • Sending emails
  • Processing payments
  • Generating reports
  • Image resizing

Doing these tasks synchronously increases response time.


What Is a Message Queue?

A message queue allows services to communicate asynchronously.

Popular technologies:

  • RabbitMQ
  • Apache Kafka
  • Amazon SQS

How Does It Work?

User Request
 |
 v
Application
 |
 v
Message Queue
 |
 v
Worker Service

Flow

  1. Application places a message in the queue.
  2. Application returns a response to the user immediately.
  3. A worker consumes the message and processes the task later.
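The flow can be sketched with Python's standard `queue.Queue` and a background thread. This is an in-process stand-in (an assumption) for a broker such as RabbitMQ, Kafka, or SQS, which would also persist messages and let workers run on separate machines; `place_order` and the job names are hypothetical.

```python
import queue
import threading

tasks = queue.Queue()
completed = []

def place_order(order_id: int) -> str:
    # Only the essential work happens synchronously; the rest is enqueued.
    for job in ("send_email", "generate_invoice", "update_analytics"):
        tasks.put({"job": job, "order_id": order_id})
    return f"order {order_id} created"      # returned before any job runs

def worker():
    while True:
        task = tasks.get()
        if task is None:                    # sentinel: shut down
            tasks.task_done()
            break
        completed.append((task["job"], task["order_id"]))  # stand-in for real work
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()
print(place_order(101))   # order 101 created
tasks.put(None)
tasks.join()              # demo only: wait for the worker to drain the queue
print(completed)
```

`tasks.join()` is there only so the demo can print the results; a real request handler would return as soon as the messages are enqueued.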

Example

Order Placement

Without Queue:

Create Order
Send Email
Generate Invoice
Update Analytics
Return Response

With Queue:

Create Order
Return Response

Background Workers:
- Send Email
- Generate Invoice
- Update Analytics

Benefits

  • Faster responses
  • Better scalability
  • Loose coupling

Interview One-Liner

Message queues enable asynchronous communication and help handle background processing efficiently.


9. Database Sharding

Why Do We Need It?

Eventually one database server cannot store or process all data.

Problems:

  • Massive table sizes
  • Slow queries
  • Storage limitations

Vertical scaling reaches its limit.


What Is Sharding?

Sharding splits data across multiple databases.

Each database stores only a subset of the data.


How Does It Work?

User-Based (Range) Sharding

Shard 1 → User IDs 1 to 1M

Shard 2 → User IDs 1M + 1 to 2M

Shard 3 → User IDs 2M + 1 to 3M

Request Flow

        Application
             |
             v
        Shard Router
             |
     -----------------
     |       |       |
     v       v       v
    DB1     DB2     DB3

The router determines where data belongs.
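A shard router for the range scheme above can be sketched with `bisect` over the range boundaries, with a hash-based variant alongside for comparison. Shard names and boundaries mirror the example; both functions are hypothetical. Range sharding keeps neighboring IDs together (useful for range scans) but can create a hot shard when new users all land in the newest range; hashing spreads load evenly but scatters ranges.

```python
import bisect
import hashlib

RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]   # inclusive upper bound per shard
SHARDS = ["db1", "db2", "db3"]

def shard_by_range(user_id: int) -> str:
    # bisect_left finds the first range whose upper bound is >= user_id.
    idx = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if user_id < 1 or idx == len(SHARDS):
        raise ValueError(f"user_id {user_id} is outside all shard ranges")
    return SHARDS[idx]

def shard_by_hash(user_id: int) -> str:
    # A stable hash spreads sequential IDs evenly across shards.
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

print(shard_by_range(1))          # db1
print(shard_by_range(1_000_000))  # db1
print(shard_by_range(1_000_001))  # db2
print(shard_by_range(2_500_000))  # db3
```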


Example

Instagram

A single database cannot store billions of users efficiently.

Users are distributed across multiple shards.


Benefits

  • Horizontal scaling
  • Increased storage capacity
  • Better write throughput

Challenges

Cross-Shard Queries

Joining data across shards becomes difficult.

Rebalancing

Adding new shards requires moving data.
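How much data moves depends on how keys are mapped to shards. This sketch compares simple `hash % N` placement with consistent hashing (one common technique, not one the article prescribes) when going from 3 to 4 shards; all names are hypothetical. With uniformly distributed hashes, modulo placement keeps a key in place only when `h % 3 == h % 4`, which holds for 3 of every 12 residues, so about 75% of keys move. With the ring, only the keys the new shard takes over move, about 25% on average.

```python
import bisect
import hashlib

def h(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def modulo_shard(key: str, n: int) -> int:
    return h(key) % n

class HashRing:
    """Consistent hashing with virtual nodes: each shard owns many points on a ring."""

    def __init__(self, shards, vnodes=100):
        self.ring = sorted((h(f"{s}#{i}"), s) for s in shards for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def shard_for(self, key: str) -> str:
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect(self.points, h(key)) % len(self.ring)
        return self.ring[idx][1]

keys = [f"user:{i}" for i in range(10_000)]

moved_mod = sum(modulo_shard(k, 3) != modulo_shard(k, 4) for k in keys)
ring3 = HashRing(["db1", "db2", "db3"])
ring4 = HashRing(["db1", "db2", "db3", "db4"])
moved_ring = sum(ring3.shard_for(k) != ring4.shard_for(k) for k in keys)

print(f"modulo: {moved_mod / len(keys):.0%} of keys moved")
print(f"consistent hashing: {moved_ring / len(keys):.0%} of keys moved")
```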


Interview One-Liner

Sharding horizontally partitions data across multiple databases, so storage and write capacity grow by adding shards instead of buying a bigger machine.


Quick Revision

Component → Why Introduced?

Single Server → Simple starting point
App + DB Separation → Independent scaling
Load Balancer → Distribute traffic
Database Replication → Scale reads
Cache → Reduce database load
CDN → Serve static content faster
Multiple Data Centers → High availability
Message Queue → Asynchronous processing
Database Sharding → Scale massive datasets

The Complete Evolution

Single Server
 ↓
Separate App & Database
 ↓
Load Balancer + Multiple App Servers
 ↓
Database Replication
 ↓
Cache Layer
 ↓
CDN
 ↓
Multiple Data Centers
 ↓
Message Queues
 ↓
Database Sharding

Key Takeaways

  • Systems scale by removing bottlenecks one at a time.
  • Load balancers solve application server bottlenecks.
  • Replication solves read bottlenecks.
  • Caching reduces database pressure.
  • CDNs reduce latency for static content.
  • Message queues enable asynchronous processing.
  • Sharding solves large-scale database growth.
  • There is no single scaling solution; each technique addresses a specific problem.

Understanding the why, how, and trade-offs of each stage is exactly what interviewers look for in system design discussions.