
Rate Limiting

Rate limiting is a technique used in system design to control the number of requests a user or system can make to a resource within a given time window. It protects services from abuse and overload and ensures fair resource usage.

Why Use Rate Limiting?

  1. Prevent Abuse: Throttle bots, scrapers, or malicious users.
  2. Protect Backend Resources: Databases, APIs, or microservices.
  3. Ensure Fair Usage: Distribute resources fairly among users.
  4. Avoid Cost Spikes: Especially in cloud-based services (e.g., APIs with metered billing).
  5. Maintain System Stability: Prevent cascading failures by not overloading services.

Where Rate Limiting Is Applied

  • API Gateways
  • Load Balancers
  • Microservices
  • Reverse Proxies (e.g., NGINX, Envoy)
  • Databases (at query level)

Rate Limiting Algorithms

| Algorithm | Description | Pros | Cons |
| --- | --- | --- | --- |
| Token Bucket | Tokens are added at a fixed rate; requests consume tokens. | Smooth flow; allows bursts | Slightly complex |
| Leaky Bucket | Requests are added to a queue and processed at a fixed rate. | Smoother output rate | May drop bursts |
| Fixed Window | Count resets every window (e.g., every minute). | Simple | Traffic spikes at window edges |
| Sliding Window Log | Logs timestamps of requests; checks the time window. | Accurate | High memory usage |
| Sliding Window Counter | Counts spread over sub-windows. | Balances accuracy and memory | Slightly complex |
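To make the trade-offs concrete, the simplest of these algorithms, the fixed window counter, can be sketched in a few lines (an illustrative in-memory version, not production code; the function and parameter names are this sketch's own):

```javascript
// Fixed window counter: allow at most `limit` requests per `windowMs`.
// Counts reset at each window boundary, which is why bursts can cluster
// at the edges of adjacent windows (the "Cons" entry above).
function createFixedWindowLimiter(limit, windowMs) {
  const counters = new Map(); // key -> { windowStart, count }
  return function isAllowed(key, now = Date.now()) {
    const windowStart = Math.floor(now / windowMs) * windowMs;
    const entry = counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // New window: reset the count for this key.
      counters.set(key, { windowStart, count: 1 });
      return true;
    }
    if (entry.count < limit) {
      entry.count++;
      return true;
    }
    return false; // limit reached for this window
  };
}
```

A token bucket (used in the worked example below) smooths this out by refilling continuously instead of resetting at window edges.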

Example of Rate Limiting

Scenario: You have a public REST API. To avoid abuse, you allow:

  • 100 requests per user per minute.

Design with Token Bucket (Example):

  1. Each user gets a bucket with capacity = 100 tokens.
  2. 1 token = 1 API request.
  3. Tokens refill at 100/min (≈1.67 tokens/sec).
  4. If the bucket is empty → Reject request with 429 Too Many Requests.

Flow:

User A makes 5 requests -> consumes 5 tokens
Bucket has 95 left
If User A sends 101 requests in a minute -> last one is rejected
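The steps above can be sketched as a minimal in-memory token bucket (illustrative only; a real service would keep buckets in shared storage, as the Node.js example later in this article does with Redis):

```javascript
// Token bucket: capacity 100 tokens, refilled at 100/minute (≈1.67/sec).
class TokenBucket {
  constructor(capacity = 100, refillPerMs = 100 / 60000) {
    this.capacity = capacity;
    this.refillPerMs = refillPerMs;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = Date.now();
  }

  tryConsume(now = Date.now()) {
    // Refill proportionally to elapsed time, capped at capacity.
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1; // 1 token = 1 API request
      return true;      // allow
    }
    return false;       // empty bucket -> caller responds with 429
  }
}
```

With this sketch, 100 calls in the same instant succeed, the 101st is rejected, and a minute later the bucket is full again.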

GitHub API Example

  • Unauthenticated users: 60 requests/hour.
  • Authenticated users: 5000 requests/hour.
  • Clients get headers like:
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4990
X-RateLimit-Reset: 1664579460
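A well-behaved client can use these headers to pace itself. As a hedged sketch (the header names follow GitHub's documented scheme; the helper name and its shape are this sketch's own), the decision "how long should I wait?" can be isolated into a small testable function:

```javascript
// Given rate-limit response headers (lower-cased keys, as Node's http
// module delivers them), return how many ms the client should wait
// before its next request.
function msUntilRetry(headers, nowSec = Math.floor(Date.now() / 1000)) {
  const remaining = parseInt(headers["x-ratelimit-remaining"], 10);
  const reset = parseInt(headers["x-ratelimit-reset"], 10); // Unix seconds
  if (Number.isNaN(remaining) || remaining > 0) return 0; // budget left: go now
  return Math.max(0, (reset - nowSec) * 1000); // wait until the window resets
}
```

For example, with `x-ratelimit-remaining: 0` and a reset timestamp 10 seconds in the future, the function returns 10000 ms.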

Node.js Example of Rate Limiting

Each client (IP) gets:

  • maxTokens = 5
  • refillRate = 1 token every 12 seconds

Tokens are stored per IP in Redis with timestamp and count.

const express = require("express");
const Redis = require("ioredis");

const app = express();
const redis = new Redis(); // Connects to Redis at localhost:6379

const RATE_LIMIT = {
  MAX_TOKENS: 5,
  REFILL_INTERVAL_MS: 60 * 1000, // a full bucket refills over one minute
};

function getKey(ip) {
  return `rate_limit:${ip}`;
}

async function rateLimiter(req, res, next) {
  const ip = req.ip;
  const key = getKey(ip);
  const now = Date.now();

  // Get existing token info from Redis (empty object on first request)
  const data = await redis.hgetall(key);
  let tokens = parseInt(data.tokens ?? RATE_LIMIT.MAX_TOKENS, 10);
  let lastRefill = parseInt(data.lastRefill ?? now, 10);

  // Refill proportionally to the time elapsed since the last refill
  const timeElapsed = now - lastRefill;
  const refillTokens = Math.floor(
    (timeElapsed / RATE_LIMIT.REFILL_INTERVAL_MS) * RATE_LIMIT.MAX_TOKENS
  );

  tokens = Math.min(RATE_LIMIT.MAX_TOKENS, tokens + refillTokens);
  if (refillTokens > 0) {
    lastRefill = now;
  }

  if (tokens > 0) {
    tokens--;
    await redis.hmset(key, { tokens, lastRefill });
    await redis.expire(key, 60); // auto-expire idle buckets after 60 sec
    next();
  } else {
    res.set("Retry-After", "60");
    res.status(429).send("Rate limit exceeded. Try again later.");
  }
}

app.use(rateLimiter);

app.get("/", (req, res) => {
  res.send("Hello! You are within the rate limit.");
});

app.listen(3000, () => {
  console.log("Server running on http://localhost:3000");
});

How It Works

  • First-time users start with 5 tokens.
  • Each request consumes 1 token.
  • Redis keeps track of tokens + last refill time.
  • Tokens are refilled in proportion to the elapsed time, up to a full bucket per minute.
  • If tokens are 0 → respond with HTTP 429.

Headers for Clients

To improve UX, you can add headers like:

res.set('X-RateLimit-Limit', String(RATE_LIMIT.MAX_TOKENS));
res.set('X-RateLimit-Remaining', String(tokens));

Tech Stack for Rate Limiting

  • API Gateway (e.g., Amazon API Gateway, Kong, Apigee): Applies user-level rate limiting.
  • Redis: Stores counters per user (good for distributed systems).
  • NGINX: Can enforce rate limiting via limit_req_zone.
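For reference, the NGINX approach looks like this (a minimal sketch using the standard `ngx_http_limit_req_module` directives; the zone name, rate, and upstream here are illustrative, not prescribed values):

```nginx
# Define a 10 MB shared zone keyed by client IP, allowing 10 requests/second.
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

server {
    location /api/ {
        # Apply the zone; allow short bursts of 20 without delaying them.
        limit_req zone=api_limit burst=20 nodelay;
        limit_req_status 429;  # respond 429 instead of the default 503
        proxy_pass http://backend;
    }
}
```

NGINX's `limit_req` implements the leaky bucket algorithm from the table above, so excess requests are queued (up to `burst`) rather than counted per window.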

Best Practices for Rate Limiting

  • Implement graceful degradation (e.g., retries, backoff).
  • Provide users with rate limit headers (e.g., X-RateLimit-Limit, X-RateLimit-Remaining).
  • Log and monitor rejected requests to detect abuse or misconfigurations.
  • Use exponential backoff in clients to reduce retry pressure.
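The last two practices combine naturally on the client side. A hedged sketch of exponential backoff against a 429 response (the `fetchFn` parameter stands in for any HTTP call returning an object with a `status` field; the delays and retry counts are illustrative):

```javascript
// Retry a request with exponential backoff when the server answers 429.
// fetchFn is any async function returning { status }, e.g. a wrapper
// around fetch() or an SDK call.
async function requestWithBackoff(fetchFn, { maxRetries = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetchFn();
    if (res.status !== 429) return res;
    // Wait base * 2^attempt plus random jitter to avoid thundering herds.
    const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  throw new Error("Rate limit exceeded after retries");
}
```

In production, clients should also honor the server's `Retry-After` header when present instead of relying on the computed delay alone.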

What Happens Without Rate Limiting?

  • Denial-of-Service (DoS) risk.
  • Cost explosion from excessive compute/API calls.
  • Poor performance for legitimate users.
  • Data inconsistency under high concurrent writes.