One of the most common system design interview questions is:
"How would you scale a web application from 100 users to 100 million users?"
The answer is rarely a single technology. Instead, systems evolve through multiple stages, with each stage solving a specific bottleneck.
This article walks through the typical evolution of a scalable system and explains why, how, and when each component is introduced.
1. Single Server
Why Start Here?
Every application starts simple.
In the beginning:
- Traffic is low
- Development speed matters more than scalability
- Infrastructure costs should be minimal
What Is It?
A single machine handles everything:
- Frontend
- Backend
- Database
Users
|
v
Single Server
├── Application
└── Database
How Does It Work?
- User sends request.
- Application processes request.
- Database stores and retrieves data.
- Response is returned.
Everything happens on one machine.
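As a minimal sketch of this flow, the snippet below keeps the application logic and the database in one process. It assumes a hypothetical `notes` table and a made-up `handle_request` function, with Python's built-in `sqlite3` standing in for whatever database runs on the same machine.

```python
import sqlite3

# The database lives in the same process, so it shares CPU, memory,
# and disk with the application code below.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")


def handle_request(method: str, body: str = "") -> list[str]:
    """Hypothetical handler: process the request, touch the database, respond."""
    if method == "POST":
        db.execute("INSERT INTO notes (body) VALUES (?)", (body,))
        db.commit()
    rows = db.execute("SELECT body FROM notes ORDER BY id").fetchall()
    return [row[0] for row in rows]


handle_request("POST", "hello")
print(handle_request("GET"))  # ['hello']
```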
Problem
As traffic grows:
- CPU becomes overloaded
- Memory becomes insufficient
- Database competes with application for resources
A single server becomes a bottleneck.
Interview One-Liner
A single server architecture is simple and cost-effective but becomes a bottleneck as traffic and resource usage increase.
2. Application and Database Separation
Why Do We Need It?
The application and database have different resource profiles.
Application Server:
- Mostly CPU-bound
- Handles business logic
Database Server:
- Mostly bound by memory and disk I/O
- Handles queries
Keeping them on one machine makes them compete for the same CPU, memory, and disk.
How Does It Work?
Move the database to a separate machine.
Users
|
v
Application Server
|
v
Database Server
Benefits
- Independent scaling
- Better resource utilization
- Improved performance
Example
Suppose an e-commerce website receives thousands of requests.
The application handles:
- Authentication
- Order processing
- API responses
The database handles:
- Product data
- Orders
- User information
Separating them prevents one workload from affecting the other.
Interview One-Liner
Separating the application and database allows each layer to scale independently and removes resource contention.
3. Load Balancer and Multiple Application Servers
Why Do We Need It?
Eventually one application server is not enough.
Problems:
- High CPU utilization
- Increased response time
- Single point of failure
What Is a Load Balancer?
A load balancer distributes incoming traffic across multiple servers.
Instead of sending all traffic to one machine, it spreads requests across a pool of servers according to a balancing algorithm.
How Does It Work?
              Users
                |
                v
          Load Balancer
             /     \
           /         \
         v             v
    App Server     App Server
Request Flow
- User sends request.
- Load balancer receives request.
- Load balancer selects a healthy server.
- Request gets processed.
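The request flow above can be sketched with a small round-robin balancer that skips unhealthy servers. The class name, the server names, and the `mark_down` method are hypothetical; a real load balancer would discover health through periodic checks rather than manual calls.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Illustrative balancer: rotate through servers, skipping unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        # In practice a health check would call this, not the application.
        self.healthy.discard(server)

    def pick(self):
        # Look at each server at most once per call.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")
picks = [lb.pick() for _ in range(4)]
print(picks)  # ['app-1', 'app-3', 'app-1', 'app-3']
```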
Benefits
Scalability
Add more servers when traffic increases.
High Availability
If one server fails, traffic is routed elsewhere.
Better Performance
No single server has to absorb all the traffic.
Interview Follow-Up
Common Algorithms
- Round Robin
- Least Connections
- IP Hashing
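To make two of these algorithms concrete, here is a rough sketch of Least Connections and IP Hashing. The function names and server names are made up for illustration, and hashing with SHA-256 is just one reasonable choice, not what any particular load balancer does.

```python
import hashlib


def least_connections(active: dict[str, int]) -> str:
    """Pick the server that currently has the fewest open connections."""
    return min(active, key=active.get)


def ip_hash(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server so the same client keeps hitting the same one."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["app-1", "app-2", "app-3"]
print(least_connections({"app-1": 12, "app-2": 3, "app-3": 7}))  # app-2
print(ip_hash("203.0.113.9", servers) == ip_hash("203.0.113.9", servers))  # True
```

Least Connections adapts to uneven request durations, while IP Hashing trades some balance for stickiness.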
Interview One-Liner
A load balancer distributes traffic across multiple application servers, improving scalability, performance, and availability.
4. Database Replication
Why Do We Need It?
Even after scaling application servers, the database often becomes the next bottleneck.
Most systems have:
Reads >> Writes
For example:
- Product browsing
- Social media feeds
- News websites
Users perform far more reads than writes.
What Is Database Replication?
Replication creates copies of the database.
                Primary DB
                    |
         ------------------------
         |                      |
         v                      v
    Replica DB             Replica DB
How Does It Work?
Primary Database
Handles:
- INSERT
- UPDATE
- DELETE
Replica Databases
Handle:
- SELECT queries
Changes are copied from the primary to the replicas, usually asynchronously.
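The read/write split can be sketched as a tiny router that inspects the SQL verb. `ReplicatedDatabase` and the connection names are hypothetical, and real drivers or proxies usually handle this routing; the sketch also assumes every statement is non-empty and starts with its verb.

```python
import random


class ReplicatedDatabase:
    """Illustrative router: writes go to the primary, reads to a replica."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self.replicas = list(replicas)

    def route(self, sql: str) -> str:
        # Assumes a non-empty statement whose first word is the SQL verb.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS or not self.replicas:
            return self.primary
        return random.choice(self.replicas)


db = ReplicatedDatabase("primary", ["replica-1", "replica-2"])
print(db.route("UPDATE inventory SET stock = 4 WHERE id = 42"))  # primary
print(db.route("SELECT * FROM products WHERE id = 42"))  # one of the replicas
```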
Example
Amazon Product Page:
Every user viewing products performs reads.
Only a few users are updating inventory.
Reads are served by replicas.
Writes go to primary.
Benefits
- Increased read capacity
- Better performance
- Improved availability
Challenges
Replication Lag
A replica may be slightly behind the primary, so a read that closely follows a write can return stale data.
Reads served by replicas are therefore only eventually consistent.
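A toy simulation shows what lag looks like from the application's point of view. The `Primary` and `Replica` classes and the `stock:42` key are invented for this sketch; real replication ships a change log over the network, but the effect on reads is the same.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # changes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, changes):
        for key, value in changes:
            self.data[key] = value


primary, replica = Primary(), Replica()
primary.write("stock:42", 5)

stale = replica.data.get("stock:42")  # replica has not caught up yet
replica.apply(primary.log)            # replication catches up
primary.log.clear()
fresh = replica.data.get("stock:42")

print(stale, fresh)  # None 5
```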
Interview One-Liner
Database replication scales read traffic by creating read-only replicas while keeping writes on the primary database.
5. Cache Integration
Why Do We Need It?
Even replicated databases can become overloaded.
Many requests ask for the same data repeatedly.
Examples:
- Trending products
- User profiles
- Popular posts
Reading from the database every time is expensive.
What Is a Cache?
A cache is a fast in-memory storage layer.
Popular examples:
- Redis
- Memcached
How Does It Work?
User
|
v
Application
|
v
Cache
|
|
Hit? ---- Yes ----> Return Data
|
No
|
v
Database
Cache Hit
Data found in cache.
Response is very fast.
Cache Miss
Data not found.
Application fetches from database and stores it in cache.
Example
Instagram Profile Page
Instead of querying the database on every view, frequently accessed profiles can be served directly from a cache such as Redis.
Benefits
- Lower database load
- Faster response times
- Better user experience
Interview One-Liner
A cache stores frequently accessed data in memory to reduce database load and improve response time.
6. CDN (Content Delivery Network)
Why Do We Need It?
Not all content is dynamic.
Static content includes:
- Images
- CSS
- JavaScript
- Videos
Serving these files from the application server ties up capacity that dynamic requests need and adds latency for users far from the origin.
What Is a CDN?
A CDN is a geographically distributed network of servers that stores cached copies of static content closer to users.
How Does It Work?
User (India)
|
v
Nearest CDN Edge
|
v
Origin Server
The CDN serves content from the nearest location.
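As a rough model of this flow, each edge below caches files and only contacts the origin on a miss. The `Origin` and `Edge` classes are hypothetical, and picking the "nearest" edge is reduced to a region lookup; real CDNs route users with DNS or anycast.

```python
class Origin:
    def __init__(self, files: dict[str, bytes]):
        self.files = files
        self.requests = 0  # how many times an edge had to come back to us

    def fetch(self, path: str) -> bytes:
        self.requests += 1
        return self.files[path]


class Edge:
    def __init__(self, origin: Origin):
        self.origin = origin
        self.cache: dict[str, bytes] = {}

    def get(self, path: str) -> bytes:
        if path not in self.cache:  # first request in this region
            self.cache[path] = self.origin.fetch(path)
        return self.cache[path]


origin = Origin({"/logo.png": b"png-bytes"})
edges = {"india": Edge(origin), "europe": Edge(origin)}


def serve(region: str, path: str) -> bytes:
    # Simplification: the user's region directly selects the edge.
    return edges[region].get(path)


serve("india", "/logo.png")
serve("india", "/logo.png")
serve("europe", "/logo.png")
print(origin.requests)  # 2
```

Three user requests reach the origin only twice, once per edge, which is where the reduced origin load comes from.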
Example
Netflix
Movie assets are stored across multiple CDN edge locations worldwide.
Users download content from nearby servers rather than a central location.
Benefits
- Reduced latency
- Faster content delivery
- Reduced load on origin servers
Interview One-Liner
A CDN stores static content closer to users globally, reducing latency and improving performance.
7. Data Center Scaling
Why Do We Need It?
What happens if an entire region goes down?
Examples:
- Power failure
- Network outage
- Natural disasters
A single data center (or region) is a single point of failure.
How Does It Work?
Deploy infrastructure across multiple regions.
         Global Load Balancer
                /     \
             /           \
          v                 v
    Data Center A     Data Center B
Benefits
High Availability
If one region fails, another takes over.
Disaster Recovery
Business continues operating.
Lower Latency
Users connect to the nearest region.
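The routing decision a global load balancer makes can be approximated as "nearest healthy region". The function below is a sketch under that assumption; the region names and latency figures are invented, and real systems typically implement this with DNS or anycast routing plus health checks.

```python
def pick_region(latency_ms: dict[str, float], healthy: set[str]) -> str:
    """Choose the lowest-latency region among those that are currently healthy."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Hypothetical latencies measured from one user's location.
latency = {"dc-a": 20.0, "dc-b": 140.0}

print(pick_region(latency, {"dc-a", "dc-b"}))  # dc-a
print(pick_region(latency, {"dc-b"}))          # dc-b (failover when dc-a is down)
```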
Interview One-Liner
Multiple data centers improve fault tolerance, disaster recovery, and global availability.
8. Message Queues
Why Do We Need It?
Some tasks do not need immediate execution.
Examples:
- Sending emails
- Processing payments
- Generating reports
- Image resizing
Doing these tasks synchronously increases response time.
What Is a Message Queue?
A message queue allows services to communicate asynchronously.
Popular technologies:
- RabbitMQ
- Apache Kafka
- Amazon SQS
How Does It Work?
User Request
|
v
Application
|
v
Message Queue
|
v
Worker Service
Flow
- Application places message in queue.
- User receives immediate response.
- Worker processes task later.
Example
Order Placement
Without Queue:
Create Order
Send Email
Generate Invoice
Update Analytics
Return Response
With Queue:
Create Order
Return Response
Background Workers:
- Send Email
- Generate Invoice
- Update Analytics
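The "With Queue" version can be sketched with Python's standard `queue` and `threading` modules standing in for a real broker such as RabbitMQ, Kafka, or SQS. `place_order`, the job names, and the single worker thread are assumptions for illustration only.

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()  # stands in for a real message broker
completed = []                      # records what the worker has processed


def worker():
    while True:
        task = tasks.get()
        if task is None:            # sentinel value: shut the worker down
            tasks.task_done()
            break
        completed.append(task)      # a real worker would send the email, etc.
        tasks.task_done()


def place_order(order_id: int) -> dict:
    # The order itself is created synchronously (omitted here);
    # the slow follow-up work is only enqueued.
    for job in ("send_email", "generate_invoice", "update_analytics"):
        tasks.put((job, order_id))
    return {"order_id": order_id, "status": "created"}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = place_order(101)  # returns without waiting for the worker
tasks.join()                 # only so this demo can inspect the results
tasks.put(None)
thread.join()

print(response)
print(completed)
```

The user-facing call returns as soon as the messages are enqueued; the `tasks.join()` line exists only so the demo can show what the worker processed.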
Benefits
- Faster responses
- Better scalability
- Loose coupling
Interview One-Liner
Message queues enable asynchronous communication and help handle background processing efficiently.
9. Database Sharding
Why Do We Need It?
Eventually one database server cannot store or process all data.
Problems:
- Massive table sizes
- Slow queries
- Storage limitations
Vertical scaling (moving to a bigger machine) eventually hits hardware and cost limits.
What Is Sharding?
Sharding splits data across multiple databases.
Each database stores only a subset of the data.
How Does It Work?
User-Based Sharding
Shard 1 → Users 1 - 1,000,000
Shard 2 → Users 1,000,001 - 2,000,000
Shard 3 → Users 2,000,001 - 3,000,000
Request Flow
        Application
             |
             v
        Shard Router
             |
    -------------------
    |        |        |
    v        v        v
   DB1      DB2      DB3
The router determines where data belongs.
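A range-based router like the one described can be sketched with a sorted list of range boundaries and a binary search. This assumes each shard's upper bound is inclusive (user 1,000,000 lives on the first shard); `shard_for` and the constants are hypothetical names.

```python
from bisect import bisect_left

# Inclusive upper bound of each shard's user-ID range (an assumption of this sketch).
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SHARDS = ["DB1", "DB2", "DB3"]


def shard_for(user_id: int) -> str:
    """Return the name of the shard that owns this user ID."""
    if user_id < 1:
        raise ValueError("user IDs start at 1")
    index = bisect_left(SHARD_UPPER_BOUNDS, user_id)
    if index == len(SHARDS):
        raise ValueError(f"no shard configured for user {user_id}")
    return SHARDS[index]


print(shard_for(1))          # DB1
print(shard_for(1_000_000))  # DB1
print(shard_for(1_000_001))  # DB2
print(shard_for(2_500_000))  # DB3
```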
Example
A single database cannot store billions of users efficiently.
Users are distributed across multiple shards.
Benefits
- Horizontal scaling
- Increased storage capacity
- Better write throughput
Challenges
Cross-Shard Queries
Joining data across shards becomes difficult.
Rebalancing
Adding new shards requires moving data.
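To get a feel for the rebalancing cost, the sketch below counts how many users would change shards when a fourth shard is added. It assumes a simple `user_id % shard_count` placement, which is a different scheme from the ID ranges shown earlier and is used here only because it makes the effect easy to measure.

```python
def moved_fraction(user_ids, old_count: int, new_count: int) -> float:
    """Fraction of users whose shard changes when the shard count changes."""
    moved = sum(1 for uid in user_ids if uid % old_count != uid % new_count)
    return moved / len(user_ids)


# Going from 3 to 4 shards with modulo placement relocates most users.
print(moved_fraction(range(12_000), 3, 4))  # 0.75
```

With this naive placement, three out of four users have to be copied to a different database, which is why resharding is usually planned carefully.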
Interview One-Liner
Sharding horizontally partitions data across multiple databases so that storage and write throughput can grow beyond the limits of a single server.
Quick Revision
| Component | Why Introduced? |
|---|---|
| Single Server | Simple starting point |
| App + DB Separation | Independent scaling |
| Load Balancer | Distribute traffic |
| Database Replication | Scale reads |
| Cache | Reduce database load |
| CDN | Serve static content faster |
| Multiple Data Centers | High availability |
| Message Queue | Asynchronous processing |
| Database Sharding | Scale massive datasets |
The Complete Evolution
Single Server
↓
Separate App & Database
↓
Load Balancer + Multiple App Servers
↓
Database Replication
↓
Cache Layer
↓
CDN
↓
Multiple Data Centers
↓
Message Queues
↓
Database Sharding
Key Takeaways
- Systems scale by removing bottlenecks one at a time.
- Load balancers solve application server bottlenecks.
- Replication solves read bottlenecks.
- Caching reduces database pressure.
- CDNs reduce latency for static content.
- Message queues enable asynchronous processing.
- Sharding solves large-scale database growth.
- There is no single scaling solution; each technique addresses a specific problem.
Understanding the why, how, and trade-offs of each stage is exactly what interviewers look for in system design discussions.
