
Rate Limiting

Rate limiting is a technique used in system design to control the number of requests a user or system can make to a resource within a given time window. It protects services from abuse and overload and ensures fair resource usage.

Why Use Rate Limiting?

  1. Prevent Abuse: Throttle bots, scrapers, or malicious users.
  2. Protect Backend Resources: Databases, APIs, or microservices.
  3. Ensure Fair Usage: Distribute resources fairly among users.
  4. Avoid Cost Spikes: Especially in cloud-based services (e.g., APIs with metered billing).
  5. Maintain System Stability: Prevent cascading failures by not overloading services.

Where Rate Limiting Is Applied

  • API Gateways
  • Load Balancers
  • Microservices
  • Reverse Proxies (e.g., NGINX, Envoy)
  • Databases (at query level)

Rate Limiting Algorithms

| Algorithm | Description | Pros | Cons |
| --- | --- | --- | --- |
| Token Bucket | Tokens are added at a fixed rate; requests consume tokens. | Smooth flow; allows bursts | Slightly complex |
| Leaky Bucket | Requests are added to a queue and processed at a fixed rate. | Smoother output rate | May drop bursts |
| Fixed Window | Count resets every window (e.g., every minute). | Simple | Traffic spikes at window edges |
| Sliding Window Log | Logs timestamps of requests; checks the time window. | Accurate | High memory usage |
| Sliding Window Counter | Counts spread over sub-windows. | Balances accuracy and memory | Slightly complex |
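To make the trade-offs concrete, the simplest of these algorithms, the fixed window counter, can be sketched in a few lines (an illustrative in-memory version, not production code; the function and parameter names are this sketch's own):

```javascript
// Fixed window counter: allow at most `limit` requests per `windowMs`.
// Counts reset at each window boundary, which is why bursts can cluster
// at the edges of adjacent windows (the "Cons" entry above).
function createFixedWindowLimiter(limit, windowMs) {
  const counters = new Map(); // key -> { windowStart, count }
  return function isAllowed(key, now = Date.now()) {
    const windowStart = Math.floor(now / windowMs) * windowMs;
    const entry = counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // New window: reset the count for this key.
      counters.set(key, { windowStart, count: 1 });
      return true;
    }
    if (entry.count < limit) {
      entry.count++;
      return true;
    }
    return false; // limit reached for this window
  };
}
```

A token bucket (used in the worked example below) smooths this out by refilling continuously instead of resetting at window edges.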

Example of Rate Limiting

Scenario: You have a public REST API. To avoid abuse, you allow:

  • 100 requests per user per minute.

Design with Token Bucket (Example):

  1. Each user gets a bucket with capacity = 100 tokens.
  2. 1 token = 1 API request.
  3. Tokens refill at 100/min (≈1.67 tokens/sec).
  4. If the bucket is empty → Reject request with 429 Too Many Requests.

Flow:

User A makes 5 requests -> consumes 5 tokens
Bucket has 95 left
If User A sends 101 requests in a minute -> last one is rejected
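The steps above can be sketched as a minimal in-memory token bucket (illustrative only; a real service would keep buckets in shared storage, as the Node.js example later in this article does with Redis):

```javascript
// Token bucket: capacity 100 tokens, refilled at 100/minute (≈1.67/sec).
class TokenBucket {
  constructor(capacity = 100, refillPerMs = 100 / 60000) {
    this.capacity = capacity;
    this.refillPerMs = refillPerMs;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = Date.now();
  }

  tryConsume(now = Date.now()) {
    // Refill proportionally to elapsed time, capped at capacity.
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1; // 1 token = 1 API request
      return true;      // allow
    }
    return false;       // empty bucket -> caller responds with 429
  }
}
```

With this sketch, 100 calls in the same instant succeed, the 101st is rejected, and a minute later the bucket is full again.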

GitHub API Example

  • Unauthenticated users: 60 requests/hour.
  • Authenticated users: 5000 requests/hour.
  • Clients get headers like:
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4990
X-RateLimit-Reset: 1664579460
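A well-behaved client can use these headers to pace itself. As a hedged sketch (the header names follow GitHub's documented scheme; the helper name and its shape are this sketch's own), the decision "how long should I wait?" can be isolated into a small testable function:

```javascript
// Given rate-limit response headers (lower-cased keys, as Node's http
// module delivers them), return how many ms the client should wait
// before its next request.
function msUntilRetry(headers, nowSec = Math.floor(Date.now() / 1000)) {
  const remaining = parseInt(headers["x-ratelimit-remaining"], 10);
  const reset = parseInt(headers["x-ratelimit-reset"], 10); // Unix seconds
  if (Number.isNaN(remaining) || remaining > 0) return 0; // budget left: go now
  return Math.max(0, (reset - nowSec) * 1000); // wait until the window resets
}
```

For example, with `x-ratelimit-remaining: 0` and a reset timestamp 10 seconds in the future, the function returns 10000 ms.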

Node.js Example of Rate Limiting

Each client (IP) gets:

  • maxTokens = 5
  • refillRate = 1 token every 12 seconds

Tokens are stored per IP in Redis with timestamp and count.

const express = require("express");
const Redis = require("ioredis");

const app = express();
const redis = new Redis(); // Connects to Redis at localhost:6379

const RATE_LIMIT = {
  MAX_TOKENS: 5,
  REFILL_INTERVAL_MS: 60 * 1000, // a full bucket refills over one minute
};

function getKey(ip) {
  return `rate_limit:${ip}`;
}

async function rateLimiter(req, res, next) {
  const ip = req.ip;
  const key = getKey(ip);
  const now = Date.now();

  // Get existing token info from Redis (empty object on first request)
  const data = await redis.hgetall(key);
  let tokens = parseInt(data.tokens ?? RATE_LIMIT.MAX_TOKENS, 10);
  let lastRefill = parseInt(data.lastRefill ?? now, 10);

  // Refill proportionally to the time elapsed since the last refill
  const timeElapsed = now - lastRefill;
  const refillTokens = Math.floor(
    (timeElapsed / RATE_LIMIT.REFILL_INTERVAL_MS) * RATE_LIMIT.MAX_TOKENS
  );

  tokens = Math.min(RATE_LIMIT.MAX_TOKENS, tokens + refillTokens);
  if (refillTokens > 0) {
    lastRefill = now;
  }

  if (tokens > 0) {
    tokens--;
    await redis.hmset(key, { tokens, lastRefill });
    await redis.expire(key, 60); // auto-expire idle buckets after 60 sec
    next();
  } else {
    res.set("Retry-After", "60");
    res.status(429).send("Rate limit exceeded. Try again later.");
  }
}

app.use(rateLimiter);

app.get("/", (req, res) => {
  res.send("Hello! You are within the rate limit.");
});

app.listen(3000, () => {
  console.log("Server running on http://localhost:3000");
});

How It Works

  • First-time users start with 5 tokens.
  • Each request consumes 1 token.
  • Redis keeps track of tokens + last refill time.
  • Tokens are refilled in proportion to the elapsed time, up to a full bucket per minute.
  • If tokens are 0 → respond with HTTP 429.

Headers for Clients

To improve UX, you can add headers like:

res.set('X-RateLimit-Limit', String(RATE_LIMIT.MAX_TOKENS));
res.set('X-RateLimit-Remaining', String(tokens));

Tech Stack for Rate Limiting

  • API Gateway (e.g., Amazon API Gateway, Kong, Apigee): Applies user-level rate limiting.
  • Redis: Stores counters per user (good for distributed systems).
  • NGINX: Can enforce rate limiting via limit_req_zone.
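For reference, the NGINX approach looks like this (a minimal sketch using the standard `ngx_http_limit_req_module` directives; the zone name, rate, and upstream here are illustrative, not prescribed values):

```nginx
# Define a 10 MB shared zone keyed by client IP, allowing 10 requests/second.
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

server {
    location /api/ {
        # Apply the zone; allow short bursts of 20 without delaying them.
        limit_req zone=api_limit burst=20 nodelay;
        limit_req_status 429;  # respond 429 instead of the default 503
        proxy_pass http://backend;
    }
}
```

NGINX's `limit_req` implements the leaky bucket algorithm from the table above, so excess requests are queued (up to `burst`) rather than counted per window.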

Best Practices for Rate Limiting

  • Implement graceful degradation (e.g., retries, backoff).
  • Provide users with rate limit headers (e.g., X-RateLimit-Limit, X-RateLimit-Remaining).
  • Log and monitor rejected requests to detect abuse or misconfigurations.
  • Use exponential backoff in clients to reduce retry pressure.
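The last two practices combine naturally on the client side. A hedged sketch of exponential backoff against a 429 response (the `fetchFn` parameter stands in for any HTTP call returning an object with a `status` field; the delays and retry counts are illustrative):

```javascript
// Retry a request with exponential backoff when the server answers 429.
// fetchFn is any async function returning { status }, e.g. a wrapper
// around fetch() or an SDK call.
async function requestWithBackoff(fetchFn, { maxRetries = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetchFn();
    if (res.status !== 429) return res;
    // Wait base * 2^attempt plus random jitter to avoid thundering herds.
    const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  throw new Error("Rate limit exceeded after retries");
}
```

In production, clients should also honor the server's `Retry-After` header when present instead of relying on the computed delay alone.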

What Happens Without Rate Limiting?

  • Denial-of-Service (DoS) risk.
  • Cost explosion from excessive compute/API calls.
  • Poor performance for legitimate users.
  • Data inconsistency under high concurrent writes.