Rate Limiting
Rate limiting is a technique used in system design to control the number of requests a user or system can make to a resource within a given time window. It protects services from abuse and overload, and it ensures fair resource usage.
Why Use Rate Limiting?
- Prevent Abuse: Throttle bots, scrapers, or malicious users.
- Protect Backend Resources: Databases, APIs, or microservices.
- Ensure Fair Usage: Distribute resources fairly among users.
- Avoid Cost Spikes: Especially in cloud-based services (e.g., APIs with metered billing).
- Maintain System Stability: Prevent cascading failures by not overloading services.
Where Rate Limiting Is Applied
- API Gateways
- Load Balancers
- Microservices
- Reverse Proxies (e.g., NGINX, Envoy)
- Databases (at query level)
Rate Limiting Algorithms
| Algorithm | Description | Pros | Cons |
|---|---|---|---|
| Token Bucket | Tokens are added at fixed rate; requests consume tokens. | Smooth flow; allows bursts | Slightly complex |
| Leaky Bucket | Requests added to queue; processed at fixed rate | Smoother output rate | May drop bursts |
| Fixed Window | Count resets every window (e.g., every minute) | Simple | Traffic spikes at edges |
| Sliding Window Log | Logs timestamps of requests, checks time window | Accurate | High memory usage |
| Sliding Window Counter | Counts spread over sub-windows | Balance between accuracy & memory | Slightly complex |
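To make the table concrete, here is a minimal in-memory sketch of the simplest algorithm above, the fixed window counter. The class and method names (`FixedWindowLimiter`, `allow`) are hypothetical, not from any library:

```javascript
// Minimal in-memory fixed-window counter (illustrative sketch).
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;       // max requests per window
    this.windowMs = windowMs; // window length in milliseconds
    this.windowStart = 0;
    this.count = 0;
  }

  allow(now = Date.now()) {
    // Reset the counter when a new window begins
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count < this.limit) {
      this.count++;
      return true;
    }
    return false;
  }
}
```

Note the weakness listed in the table: a client can send `limit` requests at the end of one window and `limit` more at the start of the next, doubling the effective rate at the boundary. The token bucket and sliding window variants exist to smooth this out.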
Example of Rate Limiting
Scenario: You have a public REST API. To avoid abuse, you allow:
- 100 requests per user per minute.
Design with Token Bucket (Example):
- Each user gets a bucket with capacity = 100 tokens.
- 1 token = 1 API request.
- Tokens refill at 100/min (≈1.67 tokens/sec).
- If the bucket is empty → reject the request with 429 Too Many Requests.
Flow:
User A makes 5 requests -> consumes 5 tokens
Bucket has 95 left
If User A sends 101 requests in a minute -> last one is rejected
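The flow above can be simulated with a small token bucket sketch. The helper `makeBucket` is hypothetical; it refills continuously at 100 tokens per minute (≈1.67 tokens/sec), matching the design:

```javascript
// Illustrative token bucket: capacity 100, refill 100 tokens per minute.
function makeBucket(capacity = 100, refillPerMs = 100 / 60000) {
  let tokens = capacity;
  let last = 0; // timestamp of the previous request, in ms

  return {
    request(nowMs) {
      // Refill proportionally to elapsed time, capped at capacity
      tokens = Math.min(capacity, tokens + (nowMs - last) * refillPerMs);
      last = nowMs;
      if (tokens >= 1) {
        tokens -= 1; // 1 token = 1 API request
        return true;
      }
      return false;
    },
  };
}
```

Sending 101 requests in the same instant consumes all 100 tokens and rejects the last one; waiting 12 seconds refills about 20 tokens.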
GitHub API Example
- Unauthenticated users: 60 requests/hour.
- Authenticated users: 5000 requests/hour.
- Clients get headers like:
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4990
X-RateLimit-Reset: 1664579460
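A well-behaved client can use these headers to decide when to retry. The following sketch (the helper `parseRateLimit` is hypothetical, not part of any GitHub SDK) computes how long to wait from GitHub-style headers:

```javascript
// Sketch: derive a wait time from GitHub-style rate limit headers.
function parseRateLimit(headers, nowMs = Date.now()) {
  const remaining = parseInt(headers["x-ratelimit-remaining"], 10);
  // X-RateLimit-Reset is a Unix timestamp in seconds
  const resetSec = parseInt(headers["x-ratelimit-reset"], 10);
  const waitMs = remaining > 0 ? 0 : Math.max(0, resetSec * 1000 - nowMs);
  return { remaining, waitMs };
}
```

With `X-RateLimit-Remaining: 0` and a reset timestamp 60 seconds in the future, the client would pause for 60 seconds before retrying.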
Node.js Example of Rate Limiting
Each client (IP) gets:
- maxTokens = 5
- refillRate = 1 token every 12 seconds (5 tokens per minute)
Tokens are stored per IP in Redis with a timestamp and count.
const express = require("express");
const Redis = require("ioredis");
const app = express();
const redis = new Redis(); // Connects to Redis at localhost:6379
const RATE_LIMIT = {
MAX_TOKENS: 5,
REFILL_INTERVAL_MS: 60 * 1000,
};
function getKey(ip) {
return `rate_limit:${ip}`;
}
async function rateLimiter(req, res, next) {
const ip = req.ip;
const key = getKey(ip);
const now = Date.now();
// Get existing token info from Redis (note: this read-modify-write is not
// atomic; under heavy concurrency a Lua script would be needed for accuracy)
const data = await redis.hgetall(key);
let tokens = parseInt(data.tokens ?? RATE_LIMIT.MAX_TOKENS, 10);
let lastRefill = parseInt(data.lastRefill ?? now, 10);
// Calculate time since last refill
const timeElapsed = now - lastRefill;
const refillTokens = Math.floor(
(timeElapsed / RATE_LIMIT.REFILL_INTERVAL_MS) * RATE_LIMIT.MAX_TOKENS
);
tokens = Math.min(RATE_LIMIT.MAX_TOKENS, tokens + refillTokens);
lastRefill = refillTokens > 0 ? now : lastRefill;
if (tokens > 0) {
tokens--;
await redis.hmset(key, {
tokens,
lastRefill,
});
await redis.expire(key, 60); // auto-expire in 60 sec
next();
} else {
res.set("Retry-After", 60);
res.status(429).send("Rate limit exceeded. Try again later.");
}
}
app.use(rateLimiter);
app.get("/", (req, res) => {
res.send("Hello! You are within the rate limit.");
});
app.listen(3000, () => {
console.log("Server running on http://localhost:3000");
});
How It Works
- First-time users start with 5 tokens.
- Each request consumes 1 token.
- Redis keeps track of tokens + last refill time.
- Tokens are refilled in proportion to elapsed time (a full bucket per minute).
- If tokens are 0 → respond with HTTP 429.
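The proportional refill step from the middleware above can be extracted into a standalone function for clarity. With maxTokens = 5 and a 60-second interval, it works out to 1 token every 12 seconds:

```javascript
// The refill computation used by the middleware, extracted for clarity.
function refillTokens(tokens, elapsedMs, maxTokens = 5, intervalMs = 60000) {
  // Whole tokens earned since the last refill
  const refill = Math.floor((elapsedMs / intervalMs) * maxTokens);
  // Never exceed the bucket capacity
  return Math.min(maxTokens, tokens + refill);
}
```

Because `Math.floor` only credits whole tokens, 11.9 seconds of elapsed time yields nothing, while 12 seconds yields exactly one token.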
Headers for Clients
To improve UX, you can add headers like:
res.set('X-RateLimit-Limit', 5);
res.set('X-RateLimit-Remaining', tokens);
Tech Stack for Rate Limiting
- API Gateway (e.g., Amazon API Gateway, Kong, Apigee): Applies user-level rate limiting.
- Redis: Stores counters per user (good for distributed systems).
- NGINX: Can enforce rate limiting via limit_req_zone.
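For the NGINX option, a configuration fragment might look like the following. This is an illustrative sketch: the zone name (`per_ip`), sizes, and upstream name (`backend`) are arbitrary placeholders.

```nginx
# ~100 requests/minute per client IP, with a small burst allowance.
http {
    limit_req_zone $binary_remote_addr zone=per_ip:10m rate=100r/m;

    server {
        location /api/ {
            limit_req zone=per_ip burst=20 nodelay;
            limit_req_status 429;   # default is 503
            proxy_pass http://backend;
        }
    }
}
```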
Best Practices for Rate Limiting
- Implement graceful degradation (e.g., retries, backoff).
- Provide users with rate limit headers (e.g., X-RateLimit-Limit, X-RateLimit-Remaining).
- Log and monitor rejected requests to detect abuse or misconfigurations.
- Use exponential backoff in clients to reduce retry pressure.
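The exponential backoff recommendation can be sketched as a small helper on the client side. The function `backoffDelayMs` is hypothetical; it uses "full jitter" (a uniformly random delay up to the exponential cap) to keep retries from arriving in synchronized waves:

```javascript
// Sketch of client-side exponential backoff with full jitter.
function backoffDelayMs(attempt, baseMs = 500, capMs = 30000, rand = Math.random) {
  // Exponential growth: base * 2^attempt, capped at capMs
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  // Full jitter: pick uniformly in [0, exp)
  return Math.floor(rand() * exp);
}
```

A client would wait `backoffDelayMs(attempt)` milliseconds before retry number `attempt`, and honor a `Retry-After` header when the server provides one.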
What Happens Without Rate Limiting?
- Denial-of-Service (DoS) risk.
- Cost explosion from excessive compute/API calls.
- Poor performance for legitimate users.
- Data inconsistency under high concurrent writes.