Scalability
Scaling in system design refers to the ability of a system to handle increased load or demand by growing in capacity. As your application gains users, handles more requests, or processes more data, scaling ensures it continues to perform well and meet user expectations.
Vertical Scaling (Scaling Up)
Vertical Scaling (also called scaling up) means increasing the capacity of a single machine/server to handle more load. Instead of adding more servers (like in horizontal scaling), you upgrade the existing machine with:
- More powerful CPU
- More RAM
- Faster SSD storage
- Better network bandwidth
Vertical scaling is often used in:
- Monolithic applications
- Databases (before sharding/replication)
- Early-stage startups where architecture is still simple
- Systems with tight dependencies or shared state (where horizontal scaling is hard)
Benefits of Vertical Scaling
| Advantage | Explanation |
|---|---|
| ✅ Simpler architecture | No need to manage multiple nodes or distributed systems |
| ✅ No code changes | App continues to run without refactoring |
| ✅ Faster to implement | Just upgrade the hardware or instance type |
| ✅ Useful for databases | Databases benefit from more memory and CPU |
Limitations of Vertical Scaling
| Limitation | Explanation |
|---|---|
| Hardware limit | You can only scale up to the most powerful machine available |
| Downtime possible | Upgrading may require rebooting the server |
| Cost increases steeply | Higher-tier machines cost disproportionately more |
| No fault tolerance | Single point of failure if the machine crashes |
Example of Vertical Scaling
Scenario: You built a Node.js-based blog platform. It runs on a single server (2 vCPU, 2 GB RAM). As traffic increases, your app slows down, especially under heavy request bursts.
Solution: You vertically scale by upgrading to a more powerful instance (8 vCPU, 32 GB RAM).
// server.js
const express = require("express");
const app = express();
app.get("/", (req, res) => {
// Simulate heavy computation
let sum = 0;
for (let i = 0; i < 1e7; i++) sum += i;
res.send("Welcome to my blog!");
});
app.listen(3000, () => console.log("Server started on port 3000"));
On a low-memory, low-CPU server, requests take time and queue up. Users may face timeouts or slow responses.
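To see why requests queue up, you can time the same loop from server.js on its own; the exact number depends on your machine, but while the loop runs, Node's single event loop cannot serve any other request:

```javascript
// Times the compute-heavy loop used in server.js.
// While this synchronous loop runs, the event loop is blocked.
const start = process.hrtime.bigint();

let sum = 0;
for (let i = 0; i < 1e7; i++) sum += i;

const ms = Number(process.hrtime.bigint() - start) / 1e6;
console.log(`Loop blocked the event loop for ${ms.toFixed(1)} ms (sum=${sum})`);
```

A faster CPU shortens this blocked window, which is exactly what vertical scaling buys you here.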
After Vertical Scaling
You upgrade the server (e.g., using AWS EC2):
- From: t3.small (2 vCPU, 2 GB RAM)
- To: m6i.2xlarge (8 vCPU, 32 GB RAM)
This boosts:
- Number of concurrent requests handled
- Speed of compute-heavy endpoints
- RAM available for Node.js heap and cache
No code changes needed.
Vertical Scaling for Databases
A common use case:
- You're using PostgreSQL with high query volume.
- Queries are slow due to lack of memory (no room for indexes/cache).
- You upgrade the DB instance to get more RAM & CPU.
Tools like Amazon RDS, DigitalOcean Managed DB, or Google Cloud SQL allow one-click vertical scaling.
Performance Comparison of Vertical Scaling
| Metric | Before Upgrade | After Upgrade |
|---|---|---|
| Avg. response time | 800ms | 150ms |
| Concurrent users | 100 | 1000+ |
| Memory usage | 95% (swap used) | 50% (no swap) |
Horizontal Scaling (Scaling Out)
Horizontal Scaling (also called scaling out) is the process of adding more machines or nodes to your system to handle increased load. Instead of upgrading a single machine (vertical scaling), you add more instances of your application or database and distribute traffic or data among them behind a load balancer.
It requires a more complex architecture and a stateless application design.
It’s used in:
- Web applications serving high traffic (e.g., Netflix, Facebook)
- Microservices architectures
- Cloud-native systems (Kubernetes, serverless)
- Big data processing systems
Benefits of Horizontal Scaling
| Advantage | Explanation |
|---|---|
| High scalability | Add as many servers as needed to meet demand |
| High availability | No single point of failure—if one server fails, others handle the load |
| Cost efficiency | Use many low-cost servers instead of one expensive one |
| Fault tolerance | Easy to design resilient systems |
| Easy automation | Works well with autoscaling in cloud environments |
Limitations of Horizontal Scaling
| Limitation | Explanation |
|---|---|
| 🚫 More complex system | Requires load balancing, service discovery, etc. |
| 🚫 Stateless requirement | App logic must avoid using local memory for session/state |
| 🚫 Network overhead | Data sharing across nodes adds latency and complexity |
Horizontal Scaling Architecture
+-------------------+
| Load Balancer |
+--------+----------+
|
+------------------+------------------+
| | |
+-----+ +-----+ +-----+
| App | | App | | App |
| #1 | | #2 | | #3 |
+-----+ +-----+ +-----+
Example of Horizontal Scaling
Scenario: You built a Node.js API using Express. As traffic increases, a single instance isn’t enough. You need to scale out.
Step 1: Create a Stateless Node.js App
You deploy multiple Node.js app instances using a load balancer like NGINX or AWS ELB to distribute incoming HTTP traffic.
// server.js
const express = require("express");
const app = express();
app.get("/", (req, res) => {
res.send(`Hello from process ${process.pid}`);
});
app.listen(3000, () => console.log(`Server running on port 3000`));
You can deploy this app on 3 servers and use a load balancer to route traffic across them.
To support horizontal scaling, make sure:
- No local in-memory state
- Sessions (if any) are stored in Redis or DB
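The session requirement above can be sketched as a small store abstraction. SessionStore here is a hypothetical wrapper, with an in-memory Map standing in for a real Redis client (e.g. from the redis npm package), so that any app instance behind the load balancer can look up any session:

```javascript
// Hypothetical session store: in production the Map would be a Redis client,
// so every instance sees the same sessions regardless of which one wrote them.
class SessionStore {
  constructor(backend = new Map()) {
    this.backend = backend; // swap in a Redis client here
  }
  async set(sessionId, data) {
    this.backend.set(sessionId, JSON.stringify(data));
  }
  async get(sessionId) {
    const raw = this.backend.get(sessionId);
    return raw ? JSON.parse(raw) : null;
  }
}

// Usage: one instance writes the session, another instance can read it.
const store = new SessionStore();
(async () => {
  await store.set("abc123", { userId: 42 });
  const session = await store.get("abc123");
  console.log(session); // { userId: 42 }
})();
```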
Step 2: Run Multiple Instances (e.g., Using cluster or Docker)
- Using the cluster module (simulates horizontal scaling on one machine):
// cluster.js
const cluster = require("cluster");
const os = require("os");
const numCPUs = os.cpus().length;
if (cluster.isPrimary) { // "isMaster" in Node.js versions before 16
console.log(`Master ${process.pid} is running`);
for (let i = 0; i < numCPUs; i++) {
cluster.fork(); // Spawn worker
}
} else {
require("./server"); // Worker runs app
}
This runs multiple processes on one machine — like simulating multiple servers.
- Real Horizontal Scaling (Multiple Servers + Load Balancer)
- Deploy your Node.js app on multiple VMs/containers (e.g., app1, app2, app3)
- Use NGINX or cloud load balancer to route traffic across them.
NGINX config (load balancing):
http {
upstream node_backend {
server 192.168.1.10:3000;
server 192.168.1.11:3000;
server 192.168.1.12:3000;
}
server {
listen 80;
location / {
proxy_pass http://node_backend;
}
}
}
Other Components You Might Add
- Session Store: Redis or Memcached (to share sessions across instances)
- Service Discovery: If using microservices (e.g., Consul, Eureka)
- Containerization: Docker, Kubernetes (to manage scaling and orchestration)
- Auto Scaling: AWS Auto Scaling Groups, GCP Instance Groups, or K8s Horizontal Pod Autoscaler
Strategies to Implement Scaling
Stateless Services
- Ensure your application doesn’t store session or state data in memory. Use external tools like Redis or databases for session storage.
- This allows easy replication across servers.
Load Balancing
- Distribute requests across instances.
- Load balancer uses algorithms like Round Robin, Least Connections, or IP Hashing.
# Sample NGINX config
upstream backend {
server app1.example.com;
server app2.example.com;
server app3.example.com;
}
server {
listen 80;
location / {
proxy_pass http://backend;
}
}
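Round Robin, the default algorithm in the NGINX config above, can be sketched in a few lines of JavaScript:

```javascript
// Round Robin: hand out backends in a fixed rotation.
function roundRobin(servers) {
  let next = 0;
  return () => servers[next++ % servers.length];
}

const pick = roundRobin(["app1", "app2", "app3"]);
console.log(pick(), pick(), pick(), pick()); // app1 app2 app3 app1
```

Least Connections and IP Hashing replace the rotation with "fewest active requests" and "hash of the client IP" respectively; the picker interface stays the same.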
Database Scaling
- Read Replicas: Separate read traffic from write.
- Sharding: Partition data across multiple databases.
- Caching: Use Redis or Memcached to cache frequent queries.
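The caching strategy above is usually implemented as cache-aside: check the cache first and only hit the database on a miss. A sketch, with a Map and a hypothetical queryDatabase function standing in for Redis and PostgreSQL:

```javascript
// Cache-aside: the cache is consulted before the database.
const cache = new Map(); // stands in for Redis

// Hypothetical DB call; in a real app this would query PostgreSQL.
async function queryDatabase(productId) {
  return { id: productId, name: `Product ${productId}` };
}

async function getProduct(productId) {
  const key = `product:${productId}`;
  if (cache.has(key)) return cache.get(key); // cache hit: no DB round trip
  const product = await queryDatabase(productId); // cache miss: go to the DB
  cache.set(key, product); // populate for subsequent readers
  return product;
}

(async () => {
  await getProduct(7); // miss: hits the DB
  await getProduct(7); // hit: served from cache
  console.log(`cache size: ${cache.size}`); // cache size: 1
})();
```

In production you would also set a TTL on each cache entry and invalidate it when the product changes.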
Example of Scaling
Scenario:
You’re building a product catalog service. Initially, you have:
- One Node.js server
- One PostgreSQL DB

As traffic grows, product searches slow down.
Solution:
- Scale Node.js horizontally: Use Docker/Kubernetes to spin up multiple Node.js containers.
- Introduce Redis cache: Cache popular search queries.
- Use PostgreSQL read replicas: Direct read-heavy operations (like product listings) to replicas.
- Add a load balancer: AWS Application Load Balancer routes traffic across Node.js containers.
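Directing reads to replicas and writes to the primary can be sketched as a small query router; the connection objects here are hypothetical placeholders for real PostgreSQL clients (e.g. from the pg package):

```javascript
// Hypothetical connections: in production these would be pg clients
// pointed at the primary and at one or more read replicas.
const primary = { name: "primary" };
const replicas = [{ name: "replica-1" }, { name: "replica-2" }];

let next = 0;
function pickConnection(sql) {
  // Writes must go to the primary; reads rotate across replicas.
  const isRead = /^\s*select/i.test(sql);
  return isRead ? replicas[next++ % replicas.length] : primary;
}

console.log(pickConnection("SELECT * FROM products").name); // replica-1
console.log(pickConnection("SELECT * FROM products").name); // replica-2
console.log(pickConnection("INSERT INTO products VALUES (1)").name); // primary
```

Real routers also have to handle replication lag, for example by sending a session's reads to the primary right after that session writes.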