Performance
In system design, performance refers to how efficiently a system responds to and processes requests under normal and peak conditions. It determines how fast, scalable, and resource-efficient the system is.
A performant system is one that:
- Responds quickly (low latency)
- Handles many requests simultaneously (high throughput)
- Uses resources effectively (CPU, memory, bandwidth)
Performance Metrics
Availability
The percentage of time the system is operational and accessible.
Example: If a system is down for 52.6 minutes in a year, it has 99.99% availability (often called “four nines”).
Explanation: Mission-critical systems (e.g., banking, healthcare) require high availability. This is often achieved using redundancy and failover systems.
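To make the "nines" arithmetic concrete, availability can be computed directly from annual downtime. A minimal sketch (the function name is illustrative):

```python
# Sketch: compute availability as a percentage from annual downtime.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def availability(downtime_minutes: float) -> float:
    """Fraction of the year the system was up, as a percentage."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)

print(round(availability(52.56), 2))  # 99.99 -> "four nines"
```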
Reliability
The ability of a system to function correctly and consistently over time without failure.
- Related metric: Mean Time Between Failures (MTBF)
Example: A payment gateway that crashes once every 30 days is more reliable than one that crashes weekly.
Explanation: Reliability ensures user trust. It is often supported by rigorous testing and graceful error handling.
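MTBF itself is simple to estimate: observed operating time divided by the number of failures in that window. A small sketch using the payment-gateway example above:

```python
# Sketch: estimate Mean Time Between Failures (MTBF) over an observation window.

def mtbf_hours(operating_hours: float, failures: int) -> float:
    """MTBF = total operating time / number of failures."""
    return operating_hours / failures

# One crash in 30 days vs. four crashes in the same period:
print(mtbf_hours(30 * 24, 1))  # 720.0 hours (more reliable)
print(mtbf_hours(30 * 24, 4))  # 180.0 hours
```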
Durability
The ability of a system to retain data over time, even in the face of failures.
- Often relevant for storage systems
Example: Once a transaction is committed in a database, it will not be lost, even if the server crashes. That’s durability.
Explanation: Durability is critical in databases and file systems (e.g., AWS S3 promises 99.999999999% durability).
Performance Metrics Summary
| Metric | Unit | Key Focus | Example Value |
|---|---|---|---|
| Latency | ms | Speed | 150 ms |
| Throughput | req/sec | Capacity | 10,000 RPS |
| Availability | % | Uptime | 99.99% |
| Reliability | MTBF, errors | Stability | 1 crash/month |
| Scalability | N/A | Load handling | Handles 1M users |
| Durability | % | Data persistence | 99.999999999% |
| Consistency | Strong/Eventual | Data freshness | Real-time profile updates |
| Error Rate | % | Quality | 0.5% |
| Load | % CPU, etc | Utilization | 80% CPU |
| Tail Latency | ms | Outlier performance | 1s at 99th percentile |
Other commonly tracked metrics:
- Concurrency: Number of simultaneous users or connections the system supports
- Resource Utilization: Efficiency of CPU, memory, disk, and network usage
- Load Time: Time required to serve and render content to users
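Several of the metrics above (latency, tail latency) are computed from request samples, and the table's "Tail Latency" row hints at why averages are not enough. A sketch with made-up latency samples and a nearest-rank percentile shows a slow outlier that the mean hides:

```python
# Sketch: average vs. 99th-percentile (tail) latency. Sample data is made up.

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the value below which ~p% of samples fall."""
    ordered = sorted(samples)
    k = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[k]

latencies_ms = [120, 130, 125, 140, 135, 128, 122, 131, 127, 950]
print(sum(latencies_ms) / len(latencies_ms))  # mean ~211 ms: outlier diluted
print(percentile(latencies_ms, 99))           # p99 = 950 ms: outlier exposed
```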
Performance-Related Concepts
Scalability
- The ability to maintain or improve performance as load increases.
- Horizontal (adding servers) or vertical (more powerful servers) scaling.
Caching
- Storing frequently accessed data in fast-access memory (e.g., Redis, CDN).
- Reduces load on databases and improves latency.
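As a minimal sketch of the idea (not a Redis client), an in-memory cache with a time-to-live stands in for the fast-access store; `get_video_metadata` and its stubbed database query are hypothetical:

```python
# Sketch: in-memory TTL cache standing in for a store like Redis.
import time

class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]  # cache hit: no backend call needed
        return None          # miss or expired

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=60)

def get_video_metadata(video_id: str) -> dict:
    cached = cache.get(video_id)
    if cached is not None:
        return cached
    value = {"id": video_id, "title": "..."}  # pretend slow DB query here
    cache.set(video_id, value)
    return value
```

Repeated calls for the same `video_id` within the TTL are served from memory, which is exactly the "reduces load on databases" effect described above.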
Load Balancing
- Distributes incoming requests across multiple servers to prevent overload.
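The simplest distribution strategy, round-robin, can be sketched in a few lines; the server names are placeholders:

```python
# Sketch: round-robin request distribution across a fixed server pool.
from itertools import cycle

servers = ["app-1", "app-2", "app-3"]
rotation = cycle(servers)

def route_request() -> str:
    """Pick the next server in round-robin order."""
    return next(rotation)

print([route_request() for _ in range(5)])
# ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```

Real balancers add health checks and weighting, but the core idea is the same: spread requests so no single server absorbs the full load.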
Asynchronous Processing
- Defers non-critical work (like sending emails) to background jobs.
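A minimal sketch of the pattern, using a queue and a background worker thread (the email-sending step is stubbed out; production systems would use a job queue like Celery or a message broker):

```python
# Sketch: defer slow, non-critical work to a background worker via a queue.
import queue
import threading

jobs: queue.Queue = queue.Queue()

def worker():
    while True:
        address = jobs.get()
        if address is None:   # sentinel: stop the worker
            break
        # ... send the email here (slow I/O) ...
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_signup(email: str) -> str:
    jobs.put(email)           # enqueue and return immediately
    return "signup accepted"

print(handle_signup("user@example.com"))  # responds without waiting for the email
```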
Database Optimization
- Using indexing, denormalization, and query tuning for faster data access.
Content Delivery Network (CDN)
- Distributes content closer to users globally to reduce latency.
Compression & Minification
- Reducing the size of payloads (e.g., images, scripts) to speed up responses.
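Compression trades a little CPU for far fewer bytes on the wire. A quick sketch using Python's standard `gzip` module on a repetitive (and therefore highly compressible) payload:

```python
# Sketch: gzip-compress a text payload to shrink the response size.
import gzip

payload = ("<html>" + "<div>repetitive markup</div>" * 200 + "</html>").encode()
compressed = gzip.compress(payload)

print(len(payload), "->", len(compressed), "bytes")
assert len(compressed) < len(payload)
```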
Performance Example
Scenario: Designing a system like Netflix that needs to serve thousands of videos to millions of users without lag.
| Layer | Performance Techniques | Effect |
|---|---|---|
| Frontend | Lazy loading, image compression, minified JS/CSS | Faster page load |
| CDN | Distribute video content via CloudFront or Akamai | Reduce latency, global access |
| Backend | Caching frequently accessed metadata (Redis) | Reduce DB hits, faster APIs |
| Database | Indexing, read replicas, query optimization | Fast data access |
| Load Balancer | Round-robin or IP-hash distribution | Prevent overload |
| Asynchronous Jobs | Transcode video in background | Improve responsiveness |
Example in Action
User Action: A user clicks "Play" on a video.
- Metadata Request:
- Handled by backend API.
- Cache hit returns info instantly (e.g., video title, thumbnail).
- Video Streaming:
- Served from a CDN node closest to the user.
- Reduces buffering (low latency).
- Recommendation Engine:
- Runs asynchronously in the background (no delay in playback).
- Logs & Analytics:
- Collected via message queues (e.g., Kafka), not blocking the main app.
Result: The video starts quickly, the system handles thousands of similar requests per second, and users don’t experience noticeable delays.
Performance Optimization Summary
| Strategy | Benefit |
|---|---|
| Caching | Speeds up repeated reads |
| CDNs | Reduces latency for global users |
| Load Balancing | Prevents bottlenecks |
| Asynchronous Design | Keeps UIs responsive |
| Query Optimization | Improves database response time |
| Resource Monitoring | Detects and fixes slow components |