One of the most common system design interview questions is:

"How would you scale a web application from 100 users to 100 million users?"

The answer is rarely a single technology. Instead, systems evolve through multiple stages, with each stage solving a specific bottleneck.

This article walks through the typical evolution of a scalable system and explains why, how, and when each component is introduced.

1. Single Server

Why Start Here?

Every application starts simple.

In the beginning:

Traffic is low
Development speed matters more than scalability
Infrastructure costs should be minimal

What Is It?

A single machine handles everything:

Frontend
Backend
Database

Users
 |
 v
Single Server
├── Application
└── Database

How Does It Work?

User sends request.
Application processes request.
Database stores and retrieves data.
Response is returned.

Everything happens on one machine.

Problem

As traffic grows:

CPU becomes overloaded
Memory becomes insufficient
Database competes with application for resources

The single server becomes both a performance bottleneck and a single point of failure.

Interview One-Liner

A single server architecture is simple and cost-effective but becomes a bottleneck as traffic and resource usage increase.

2. Application and Database Separation

Why Do We Need It?

The application and database have different workloads.

Application Server:

Is typically CPU-bound
Handles business logic

Database Server:

Is typically bound by memory and disk I/O
Handles queries

Keeping them together causes resource contention.

How Does It Work?

Move the database to a separate machine.

Users
 |
 v
Application Server
 |
 v
Database Server

Benefits

Independent scaling
Better resource utilization
Improved performance

Example

Suppose an e-commerce website receives thousands of requests.

The application handles:

Authentication
Order processing
API responses

The database handles:

Product data
Orders
User information

Separating them prevents one workload from affecting the other.

Interview One-Liner

Separating the application and database allows each layer to scale independently and removes resource contention.

3. Load Balancer and Multiple Application Servers

Why Do We Need It?

Eventually one application server is not enough.

Problems:

High CPU utilization
Increased response time
Single point of failure

What Is a Load Balancer?

A load balancer distributes incoming traffic across multiple servers.

Instead of sending all traffic to one machine, requests are spread across a pool of servers according to a balancing algorithm.

How Does It Work?

          Users
            |
            v
      Load Balancer
         /     \
        /       \
       v         v
 App Server   App Server

Request Flow

User sends request.
Load balancer receives request.
Load balancer selects a healthy server.
Request gets processed.
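The request flow above can be sketched as a small in-process router. This is only an illustration, not how any particular load balancer is implemented: the `LoadBalancer` class, its `mark_down` method, and the server names are hypothetical, and a real load balancer would update health from periodic health checks rather than manual calls.

```python
class LoadBalancer:
    """Toy router that rotates over servers and skips unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self._next = 0  # position of the next server to try

    def mark_down(self, server):
        # In practice this would be driven by failed health checks.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        # Try each server at most once per request.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = LoadBalancer(["app-1", "app-2", "app-3"])  # hypothetical server names
lb.mark_down("app-2")
print([lb.route() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```

Traffic keeps flowing to the remaining servers while `app-2` is down, which is the availability benefit described in this section.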

Benefits

Scalability

Add more servers when traffic increases.

High Availability

If one server fails, traffic is routed elsewhere.

Better Performance

No single server absorbs all the traffic, so response times stay lower under load.

Interview Follow-Up

Common Algorithms

Round Robin
Least Connections
IP Hashing
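As a rough sketch, the three algorithms can each be written in a few lines. The class and function names here are illustrative assumptions, not the API of any real load balancer, and production implementations add weights, health checks, and concurrency control.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._rotation = cycle(servers)

    def pick(self):
        return next(self._rotation)


class LeastConnections:
    """Pick the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # caller must call release() when done
        return server

    def release(self, server):
        self.active[server] -= 1


def ip_hash(client_ip, servers):
    """Map a client IP to the same server as long as the list is unchanged."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["app-1", "app-2", "app-3"]  # hypothetical server names
rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']
```

Round robin ignores how busy a server is, least connections adapts to uneven request durations, and IP hashing keeps a client on one server, which can help when session state lives on that server.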

Interview One-Liner

A load balancer distributes traffic across multiple application servers, improving scalability, performance, and availability.

4. Database Replication

Why Do We Need It?

Even after scaling application servers, the database often becomes the next bottleneck.

Most systems have:

Reads >> Writes

For example:

Product browsing
Social media feeds
News websites

Users perform far more reads than writes.

What Is Database Replication?

Replication creates copies of the database.

            Primary DB
                |
     ------------------------
     |                      |
     v                      v
 Replica DB             Replica DB

How Does It Work?

Primary Database

Handles:

INSERT
UPDATE
DELETE

Replica Databases

Handle:

SELECT queries

Data is copied from primary to replicas.

Example

Amazon Product Page:

Every user viewing products performs reads.

Only a few users are updating inventory.

Reads are served by replicas.

Writes go to primary.
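One way to picture the read/write split is a tiny router in the application's data layer. This is a minimal sketch under assumptions: `ReplicatedDatabase` and the connection names are hypothetical, and the routing looks only at the first SQL keyword, which a real system would refine (for example, reads inside a write transaction usually need the primary).

```python
import random


class ReplicatedDatabase:
    """Send SELECTs to a replica and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        # Only plain reads go to replicas; unknown statements stay on the
        # primary, which is the safe default.
        if verb == "SELECT" and self.replicas:
            return random.choice(self.replicas)
        return self.primary


db = ReplicatedDatabase("primary", ["replica-1", "replica-2"])
print(db.route("UPDATE inventory SET stock = 4 WHERE id = 42"))  # primary
print(db.route("SELECT * FROM products WHERE id = 42"))  # one of the replicas
```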

Benefits

Increased read capacity
Better performance
Improved availability

Challenges

Replication Lag

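With asynchronous replication, a replica may be slightly behind the primary, so a read issued right after a write can return stale data.

Reads served by replicas are therefore only eventually consistent.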
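Replication lag is easier to see in a toy model. The sketch below is an assumption-heavy simplification: the `Primary` and `Replica` classes are hypothetical, the "replication log" is a Python list, and `catch_up()` stands in for the background process that ships changes to replicas.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def read(self, key):
        return self.data.get(key)

    def catch_up(self):
        # Apply every change the replica has not seen yet, in order.
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)


primary = Primary()
replica = Replica(primary)

primary.write("stock:42", 5)
print(replica.read("stock:42"))  # None: the change has not been replicated yet
replica.catch_up()
print(replica.read("stock:42"))  # 5: the replica has converged
```

A common mitigation is to read from the primary whenever a user must immediately see their own write.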

Interview One-Liner

Database replication scales read traffic by creating read-only replicas while keeping writes on the primary database.

5. Cache Integration

Why Do We Need It?

Even replicated databases can become overloaded.

Many requests ask for the same data repeatedly.

Examples:

Trending products
User profiles
Popular posts

Reading from the database every time is expensive.

What Is a Cache?

A cache is a fast in-memory storage layer.

Popular examples:

Redis
Memcached

How Does It Work?

User
 |
 v
Application
 |
 v
Cache
 |
 |
 Hit? ---- Yes ----> Return Data
 |
 No
 |
 v
Database

Cache Hit

Data found in cache.

Response is very fast.

Cache Miss

Data not found.

Application fetches from database and stores it in cache.
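The hit and miss paths above are often called the cache-aside pattern. Here is a minimal sketch that uses an in-process dictionary in place of Redis or Memcached; the `CacheAside` class, the `load_profile` function, and the 60-second TTL are assumptions made for illustration.

```python
import time


class CacheAside:
    """Check the cache first; on a miss, load from the database and store."""

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self.load_from_db = load_from_db
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]  # cache hit
        value = self.load_from_db(key)  # cache miss: go to the database
        self._entries[key] = (value, time.monotonic() + self.ttl_seconds)
        return value

    def invalidate(self, key):
        # Call this after a write so the next read reloads fresh data.
        self._entries.pop(key, None)


db_calls = []


def load_profile(user_id):
    # Stand-in for a real database query.
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


cache = CacheAside(load_profile)
cache.get(7)
cache.get(7)
print(len(db_calls))  # 1: the second read was served from the cache
```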

Example

Instagram Profile Page

Instead of querying the database millions of times, frequently accessed profiles are served directly from Redis.

Benefits

Lower database load
Faster response times
Better user experience

Interview One-Liner

A cache stores frequently accessed data in memory to reduce database load and improve response time.

6. CDN (Content Delivery Network)

Why Do We Need It?

Not all content is dynamic.

Static content includes:

Images
CSS
JavaScript
Videos

Serving these files from the application server is inefficient.

What Is a CDN?

A CDN is a geographically distributed network of servers that stores cached copies of static content closer to users.

How Does It Work?

User (India)
 |
 v
Nearest CDN Edge
 |
 v
Origin Server

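The CDN serves content from the edge location nearest to the user. If the edge does not have the file yet, it fetches it from the origin server once and caches it for later requests.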
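The effect on the origin can be sketched with two toy classes. Everything here is a hypothetical simplification: `Origin`, `Edge`, the region names, and the file path are made up, and real CDNs also handle expiry, invalidation, and routing users to an edge.

```python
class Origin:
    """The origin server; counts how many requests actually reach it."""

    def __init__(self, files):
        self.files = files
        self.requests = 0

    def fetch(self, path):
        self.requests += 1
        return self.files[path]


class Edge:
    """A CDN edge location with its own local cache."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, path):
        if path not in self.cache:  # edge miss: fetch from the origin once
            self.cache[path] = self.origin.fetch(path)
        return self.cache[path]


origin = Origin({"/logo.png": b"png-bytes"})
edges = {"india": Edge(origin), "europe": Edge(origin)}

for _ in range(1000):
    edges["india"].get("/logo.png")
edges["europe"].get("/logo.png")

print(origin.requests)  # 2: one origin fetch per edge, not one per user
```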

Example

Netflix

Movie assets are stored across multiple CDN edge locations worldwide.

Users download content from nearby servers rather than a central location.

Benefits

Reduced latency
Faster content delivery
Reduced load on origin servers

Interview One-Liner

A CDN stores static content closer to users globally, reducing latency and improving performance.

7. Data Center Scaling

Why Do We Need It?

What happens if an entire region goes down?

Examples:

Power failure
Network outage
Natural disasters

A single data center becomes a major risk.

How Does It Work?

Deploy infrastructure across multiple regions.

      Global Load Balancer
           /        \
          /          \
         v            v
 Data Center A   Data Center B

Benefits

High Availability

If one region fails, another takes over.

Disaster Recovery

Business continues operating.

Lower Latency

Users connect to the nearest region.
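The routing decision made by a global load balancer can be approximated as "the lowest-latency region that is still healthy". The function below is a sketch under that assumption; `pick_region`, the region names, and the latency numbers are all hypothetical, and real systems typically rely on DNS or anycast routing to implement this.

```python
def pick_region(latency_ms, healthy):
    """Return the healthy region with the lowest latency for this user."""
    candidates = {
        region: ms for region, ms in latency_ms.items() if region in healthy
    }
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Measured latency from one user to each region (made-up numbers).
latency_from_user = {"dc-a": 20, "dc-b": 140}

print(pick_region(latency_from_user, healthy={"dc-a", "dc-b"}))  # dc-a
print(pick_region(latency_from_user, healthy={"dc-b"}))          # dc-b
```

The second call shows failover: when `dc-a` is down, the user is sent to `dc-b` at the cost of higher latency.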

Interview One-Liner

Multiple data centers improve fault tolerance, disaster recovery, and global availability.

8. Message Queues

Why Do We Need It?

Some tasks do not need immediate execution.

Examples:

Sending emails
Processing payments
Generating reports
Image resizing

Doing these tasks synchronously increases response time.

What Is a Message Queue?

A message queue allows services to communicate asynchronously.

Popular technologies:

RabbitMQ
Apache Kafka
Amazon SQS

How Does It Work?

User Request
 |
 v
Application
 |
 v
Message Queue
 |
 v
Worker Service

Flow

Application places message in queue.
User receives immediate response.
Worker processes task later.

Example

Order Placement

Without Queue:

Create Order
Send Email
Generate Invoice
Update Analytics
Return Response

With Queue:

Create Order
Return Response

Background Workers:
- Send Email
- Generate Invoice
- Update Analytics
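The order example can be sketched with Python's standard-library `queue.Queue` standing in for RabbitMQ, Kafka, or SQS. This is an in-process illustration only: `place_order`, the task names, and the single worker thread are assumptions, and a real broker would add durability, retries, and delivery across machines.

```python
import queue
import threading

tasks = queue.Queue()
completed = []


def worker():
    """Background worker: pull tasks off the queue and process them."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel value: stop the worker
            tasks.task_done()
            break
        kind, order_id = task
        completed.append((kind, order_id))  # stand-in for the real work
        tasks.task_done()


def place_order(order_id):
    # Creating the order itself would happen here, synchronously.
    for kind in ("send_email", "generate_invoice", "update_analytics"):
        tasks.put((kind, order_id))
    return {"order_id": order_id, "status": "created"}  # returned right away


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = place_order(101)
print(response)

tasks.join()  # demo only: wait so the results can be printed
tasks.put(None)
thread.join()
print(completed)
```

`place_order` returns without waiting for the email, invoice, or analytics work, which is what keeps the user-facing response fast.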

Benefits

Faster responses
Better scalability
Loose coupling

Interview One-Liner

Message queues enable asynchronous communication and help handle background processing efficiently.

9. Database Sharding

Why Do We Need It?

Eventually one database server cannot store or process all data.

Problems:

Massive table sizes
Slow queries
Storage limitations

Vertical scaling reaches its limit.

What Is Sharding?

Sharding splits data across multiple databases.

Each database stores only a subset of the data.

How Does It Work?

User-Based Sharding

Shard 1 → Users 1 - 1,000,000

Shard 2 → Users 1,000,001 - 2,000,000

Shard 3 → Users 2,000,001 - 3,000,000

Request Flow

    Application
         |
         v
    Shard Router
         |
 -----------------
 |       |       |
 v       v       v
DB1     DB2     DB3

The router determines where data belongs.
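For the range-based scheme above, the router reduces to a lookup over the range boundaries. The sketch below assumes each shard's upper bound is inclusive and that user IDs start at 1; `shard_for` and the constant names are hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bound of each shard's user ID range (assumed layout).
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SHARDS = ["DB1", "DB2", "DB3"]


def shard_for(user_id):
    """Return the name of the shard that stores this user."""
    if user_id < 1 or user_id > SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside every shard range")
    # bisect_left finds the first upper bound that is >= user_id.
    return SHARDS[bisect_left(SHARD_UPPER_BOUNDS, user_id)]


print(shard_for(1))          # DB1
print(shard_for(1_000_000))  # DB1
print(shard_for(1_000_001))  # DB2
print(shard_for(2_500_000))  # DB3
```

Range-based routing keeps neighbouring IDs together, but it can create hot shards if new or very active users cluster in one range.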

Example

Instagram

A single database cannot store billions of users efficiently.

Users are distributed across multiple shards.

Benefits

Horizontal scaling
Increased storage capacity
Better write throughput

Challenges

Cross-Shard Queries

Joining data across shards becomes difficult.

Rebalancing

Adding new shards requires moving data.
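How much data has to move depends on the placement scheme. As an illustration, assume users are placed by `user_id % shard_count`, which is a different scheme from the range-based example earlier and is used here only because it makes the cost easy to count. The `moved_fraction` helper is hypothetical.

```python
def moved_fraction(user_ids, old_count, new_count):
    """Fraction of users whose shard changes when the shard count changes."""
    moved = sum(
        1 for uid in user_ids if uid % old_count != uid % new_count
    )
    return moved / len(user_ids)


# Going from 3 to 4 shards with modulo placement.
print(moved_fraction(range(12_000), 3, 4))  # 0.75
```

Three quarters of the users change shards for a single added shard, which is why techniques such as consistent hashing are often used to limit data movement.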

Interview One-Liner

Sharding horizontally partitions data across multiple databases so that storage and write throughput can grow beyond a single server, at the cost of harder cross-shard queries and rebalancing.

Quick Revision

Component	Why Introduced?
Single Server	Simple starting point
App + DB Separation	Independent scaling
Load Balancer	Distribute traffic
Database Replication	Scale reads
Cache	Reduce database load
CDN	Serve static content faster
Multiple Data Centers	High availability
Message Queue	Asynchronous processing
Database Sharding	Scale massive datasets

The Complete Evolution

Single Server
 ↓
Separate App & Database
 ↓
Load Balancer + Multiple App Servers
 ↓
Database Replication
 ↓
Cache Layer
 ↓
CDN
 ↓
Multiple Data Centers
 ↓
Message Queues
 ↓
Database Sharding

Key Takeaways

Systems scale by removing bottlenecks one at a time.
Load balancers solve application server bottlenecks.
Replication solves read bottlenecks.
Caching reduces database pressure.
CDNs reduce latency for static content.
Message queues enable asynchronous processing.
Sharding solves large-scale database growth.
There is no single scaling solution; each technique addresses a specific problem.

Understanding the why, how, and trade-offs of each stage is exactly what interviewers look for in system design discussions.

URL: https://dev.to/jaspreet_singh_86ae1740ac/how-systems-scale-from-0-to-100-million-users-3nm

HLD Fundamentals #4: How Systems Scale: From 0 to 100 Million Users - DEV Community
