Scalability

Scaling in system design refers to the ability of a system to handle increased load or demand by growing in capacity. As your application gains users, handles more requests, or processes more data, scaling ensures it continues to perform well and meet user expectations.

Vertical Scaling (Scaling Up)

Vertical Scaling (also called scaling up) means increasing the capacity of a single machine/server to handle more load. Instead of adding more servers (like in horizontal scaling), you upgrade the existing machine with:

  • More powerful CPU
  • More RAM
  • Faster SSD storage
  • Better network bandwidth

Vertical scaling is often used in:

  • Monolithic applications
  • Databases (before sharding/replication)
  • Early-stage startups where architecture is still simple
  • Systems with tight dependencies or shared state (where horizontal scaling is hard)

Benefits of Vertical Scaling

Advantage | Explanation
--------- | -----------
✅ Simpler architecture | No need to manage multiple nodes or distributed systems
✅ No code changes | The app continues to run without refactoring
✅ Faster to implement | Just upgrade the hardware or instance type
✅ Useful for databases | Databases benefit from more memory and CPU

Limitations of Vertical Scaling

Limitation | Explanation
---------- | -----------
Hardware limit | You can only scale up to the most powerful machine available
Downtime possible | Upgrading may require rebooting the server
Cost increases steeply | Higher-tier machines cost disproportionately more
No fault tolerance | The machine remains a single point of failure if it crashes

Example of Vertical Scaling

Scenario: You built a Node.js-based blog platform. It runs on a single small server (2 vCPU, 2 GB RAM). As traffic increases, your app slows down, especially under heavy request bursts.

Solution: You vertically scale by upgrading to a more powerful instance (8 vCPU, 32 GB RAM).

// server.js
const express = require("express");
const app = express();

app.get("/", (req, res) => {
  // Simulate heavy CPU-bound work that blocks the event loop
  let sum = 0;
  for (let i = 0; i < 1e7; i++) sum += i;
  res.send("Welcome to my blog!");
});

app.listen(3000, () => console.log("Server started on port 3000"));

On a low-memory, low-CPU server, requests take time and queue up. Users may face timeouts or slow responses.

After Vertical Scaling

You upgrade the server (e.g., using AWS EC2):

  • From: t3.small (2 vCPU, 2 GB RAM)
  • To: m6i.2xlarge (8 vCPU, 32 GB RAM)

This boosts:

  • Number of concurrent requests handled
  • Speed of compute-heavy endpoints
  • RAM available for Node.js heap and cache

No code changes needed.

Vertical Scaling for Databases

A common use case:

  • You're using PostgreSQL with high query volume.
  • Queries are slow due to lack of memory (no room for indexes/cache).
  • You upgrade the DB instance to get more RAM & CPU.

Tools like Amazon RDS, DigitalOcean Managed DB, or Google Cloud SQL allow one-click vertical scaling.

Performance Comparison of Vertical Scaling

Metric | Before Upgrade | After Upgrade
------ | -------------- | -------------
Avg. response time | 800 ms | 150 ms
Concurrent users | 100 | 1000+
Memory usage | 95% (swap used) | 50% (no swap)

Horizontal Scaling (Scaling Out)

Horizontal Scaling (also called scaling out) is the process of adding more machines or nodes to your system to handle increased load. Instead of upgrading a single machine (vertical scaling), you add more instances of your application or database and distribute traffic or data among them behind a load balancer.

Horizontal scaling requires a more complex architecture and a stateless application design.

It’s used in:

  • Web applications serving high traffic (e.g., Netflix, Facebook)
  • Microservices architectures
  • Cloud-native systems (Kubernetes, serverless)
  • Big data processing systems

Benefits of Horizontal Scaling

Advantage | Explanation
--------- | -----------
High scalability | Add as many servers as needed to meet demand
High availability | No single point of failure; if one server fails, others absorb the load
Cost efficiency | Use many low-cost servers instead of one expensive machine
Fault tolerance | Easier to design resilient systems
Easy automation | Works well with autoscaling in cloud environments

Limitations of Horizontal Scaling

Limitation | Explanation
---------- | -----------
🚫 More complex system | Requires load balancing, service discovery, etc.
🚫 Stateless requirement | App logic must avoid keeping session/state in local memory
🚫 Network overhead | Data sharing across nodes adds latency and complexity

Horizontal Scaling Architecture

        +-------------------+
        |   Load Balancer   |
        +---------+---------+
                  |
   +--------------+--------------+
   |              |              |
+-----+        +-----+        +-----+
| App |        | App |        | App |
| #1  |        | #2  |        | #3  |
+-----+        +-----+        +-----+

Example of Horizontal Scaling

Scenario: You built a Node.js API using Express. As traffic increases, a single instance isn’t enough. You need to scale out.

Step 1: Create a Stateless Node.js App

You deploy multiple Node.js app instances using a load balancer like NGINX or AWS ELB to distribute incoming HTTP traffic.

// server.js
const express = require("express");
const app = express();

app.get("/", (req, res) => {
  // process.pid shows which instance served the request
  res.send(`Hello from process ${process.pid}`);
});

app.listen(3000, () => console.log("Server running on port 3000"));

You can deploy this app on 3 servers and use a load balancer to route traffic across them.

To support horizontal scaling, make sure:

  • No local in-memory state
  • Sessions (if any) are stored in Redis or DB
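A sketch of what externalized session storage looks like. Here a plain Map stands in for Redis so the example is self-contained; in production you'd swap it for a real Redis client so every instance reads and writes the same shared store:

```javascript
// session-store.js — keep session state OUT of process memory.
// "store" is a Map standing in for Redis; with a real client these
// operations would be GET/SET (with a TTL) against a shared Redis server.
const store = new Map();

async function saveSession(sessionId, data) {
  store.set(`session:${sessionId}`, JSON.stringify(data)); // redis: SET session:<id> <json> EX 3600
}

async function getSession(sessionId) {
  const raw = store.get(`session:${sessionId}`);           // redis: GET session:<id>
  return raw ? JSON.parse(raw) : null;
}

module.exports = { saveSession, getSession };
```

Because no instance keeps the session in its own memory, the load balancer is free to route each request to any server.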

Step 2: Run Multiple Instances (e.g., Using cluster or Docker)

  1. Using the cluster module (simulates horizontal scaling on one machine):

// cluster.js
const cluster = require("cluster");
const os = require("os");
const numCPUs = os.cpus().length;

if (cluster.isPrimary) { // cluster.isMaster before Node 16
  console.log(`Primary ${process.pid} is running`);
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork(); // Spawn one worker per CPU core
  }
  cluster.on("exit", (worker) => {
    console.log(`Worker ${worker.process.pid} died; starting a replacement`);
    cluster.fork();
  });
} else {
  require("./server"); // Each worker runs the app in its own process
}

This runs multiple processes on one machine — like simulating multiple servers.

  2. Real Horizontal Scaling (Multiple Servers + Load Balancer)
    • Deploy your Node.js app on multiple VMs/containers (e.g., app1, app2, app3)
    • Use NGINX or cloud load balancer to route traffic across them.

NGINX config (load balancing):

http {
  upstream node_backend {
    server 192.168.1.10:3000;
    server 192.168.1.11:3000;
    server 192.168.1.12:3000;
  }

  server {
    listen 80;
    location / {
      proxy_pass http://node_backend;
    }
  }
}

Other Components You Might Add

  • Session Store: Redis or Memcached (to share sessions across instances)
  • Service Discovery: If using microservices (e.g., Consul, Eureka)
  • Containerization: Docker, Kubernetes (to manage scaling and orchestration)
  • Auto Scaling: AWS Auto Scaling Groups, GCP Instance Groups, or K8s Horizontal Pod Autoscaler

Strategies to Implement Scaling

Stateless Services

  • Ensure your application doesn’t store session or state data in memory. Use external tools like Redis or databases for session storage.
  • This allows easy replication across servers.

Load Balancing

  • Distribute requests across instances.
  • The load balancer picks a target using algorithms like Round Robin, Least Connections, or IP Hashing.
# Sample NGINX config (round robin by default; least_conn or ip_hash are alternatives)
upstream backend {
  server app1.example.com;
  server app2.example.com;
  server app3.example.com;
}

server {
  listen 80;
  location / {
    proxy_pass http://backend;
  }
}

Database Scaling

  • Read Replicas: Route read traffic to replicas, keeping writes on the primary.
  • Sharding: Partition data across multiple databases.
  • Caching: Use Redis or Memcached to cache frequent queries.
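The read-replica idea can be sketched as a small query router. The primary and replicas below are stand-ins (real code would hold e.g. pg connection pools), and the SELECT-prefix check is a deliberate simplification:

```javascript
// db-router.js — send writes to the primary, rotate reads across replicas.
function makeRouter(primary, replicas) {
  let next = 0;
  return {
    query(sql, params) {
      const isRead = /^\s*select\b/i.test(sql); // naive read detection
      const target = isRead
        ? replicas[next++ % replicas.length]    // round-robin the replicas
        : primary;                              // all writes hit the primary
      return target.query(sql, params);
    },
  };
}

module.exports = { makeRouter };
```

One caveat: replicas lag behind the primary, so a read issued right after a write may return stale data; sessions that need read-your-writes consistency are often pinned to the primary.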

Example of Scaling

Scenario:

You’re building a product catalog service. Initially, you have:

  • One Node.js server
  • One PostgreSQL DB

As traffic grows, product searches slow down.

Solution:

  1. Scale Node.js horizontally: Use Docker/Kubernetes to spin up multiple Node.js containers.
  2. Introduce Redis Cache: Cache popular search queries.
  3. Use PostgreSQL Read Replicas: Direct read-heavy operations (like product listings) to replicas.
  4. Add Load Balancer: An AWS Application Load Balancer routes traffic across the Node.js containers.
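Step 2 above (the Redis cache) is typically implemented as cache-aside. A sketch where cache is a Map standing in for Redis and searchDb is a hypothetical function that queries PostgreSQL:

```javascript
// cache-aside.js — check the cache first, fall back to the database on a miss.
async function searchProducts(term, cache, searchDb) {
  const key = `search:${term.toLowerCase()}`;
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // cache hit: skip PostgreSQL entirely
  const rows = await searchDb(term); // cache miss: run the real query
  cache.set(key, rows);              // with Redis you'd also set a TTL (EX)
  return rows;
}

module.exports = { searchProducts };
```

The hard part in practice is invalidation: a TTL keeps stale results bounded, and writes to the catalog should delete or refresh the affected keys.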