Performance

In system design, performance refers to how efficiently a system responds to and processes requests under normal and peak conditions. It determines how fast, scalable, and resource-efficient the system is.

A performant system is one that:

  • Responds quickly (low latency)
  • Handles many requests simultaneously (high throughput)
  • Uses resources effectively (CPU, memory, bandwidth)

Performance Metrics

Availability

The percentage of time the system is operational and accessible.

$$\text{Availability} = \frac{\text{Uptime}}{\text{Uptime} + \text{Downtime}} \times 100$$

Example: If a system is down for 52.6 minutes in a year, it has 99.99% availability (often called “four nines”).

Explanation: Mission-critical systems (e.g., banking, healthcare) require high availability. This is often achieved using redundancy and failover systems.
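The formula above can be inverted to see how much downtime a given availability target allows per year. A minimal sketch (function name is illustrative):

```python
# Downtime budget implied by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum yearly downtime permitted by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

print(round(downtime_minutes_per_year(99.99), 1))  # ~52.6 minutes ("four nines")
print(round(downtime_minutes_per_year(99.9), 1))   # ~525.6 minutes ("three nines")
```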

Reliability

The ability of a system to function correctly and consistently over time without failure.

  • Related metric: Mean Time Between Failures (MTBF)

Example: A payment gateway that crashes once every 30 days is more reliable than one that crashes weekly.

Explanation: Reliability ensures user trust. It is often supported by rigorous testing and graceful error handling.
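MTBF is typically computed as total operating time divided by the number of failures in that period. A quick sketch using the payment-gateway example above (numbers are illustrative):

```python
# Mean Time Between Failures: total operating time / number of failures.
def mtbf_hours(total_operating_hours: float, failure_count: int) -> float:
    return total_operating_hours / failure_count

# A gateway that crashed once in 30 days (720 hours) of operation:
print(mtbf_hours(720, 1))   # 720.0 hours

# One that crashes weekly has a much lower MTBF:
print(mtbf_hours(720, 4))   # 180.0 hours
```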

Durability

The ability of a system to retain data over time, even in the face of failures.

  • Often relevant for storage systems

Example: Once a transaction is committed in a database, it will not be lost, even if the server crashes. That’s durability.

Explanation: Durability is critical in databases and file systems (e.g., AWS S3 promises 99.999999999% durability).
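One simplified way to read a durability figure is as an annual per-object survival probability; this is an assumption for illustration only, as real durability models are more nuanced:

```python
# Rough sketch: treating durability as an annual per-object survival probability.
# (Simplification for intuition; actual provider durability models differ.)
def expected_annual_losses(num_objects: int, durability_pct: float) -> float:
    annual_loss_prob = 1 - durability_pct / 100
    return num_objects * annual_loss_prob

# With eleven nines of durability, storing 10 million objects implies an
# expected loss of roughly 0.0001 objects per year:
print(expected_annual_losses(10_000_000, 99.999999999))
```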

Performance Metrics Summary

| Metric | Unit | Key Focus | Example Value |
| --- | --- | --- | --- |
| Latency | ms | Speed | 150 ms |
| Throughput | req/sec | Capacity | 10,000 RPS |
| Availability | % | Uptime | 99.99% |
| Reliability | MTBF, errors | Stability | 1 crash/month |
| Scalability | N/A | Load handling | Handles 1M users |
| Durability | % | Data persistence | 99.999999999% |
| Consistency | Strong/Eventual | Data freshness | Real-time profile updates |
| Error Rate | % | Quality | 0.5% |
| Load | % CPU, etc. | Utilization | 80% CPU |
| Tail Latency | ms | Outlier performance | 1 s at 99th percentile |
Other related metrics:

  • Concurrency: Number of simultaneous users the system supports
  • Resource Utilization: Efficiency of CPU, memory, disk, and network usage
  • Load Time: Time required to serve and render content to users
Performance Optimization Techniques

  1. Scalability

    • The ability to maintain or improve performance as load increases.
    • Horizontal (adding servers) or vertical (more powerful servers) scaling.
  2. Caching

    • Storing frequently accessed data in fast-access memory (e.g., Redis, CDN).
    • Reduces load on databases and improves latency.
  3. Load Balancing

    • Distributes incoming requests across multiple servers to prevent overload.
  4. Asynchronous Processing

    • Defers non-critical work (like sending emails) to background jobs.
  5. Database Optimization

    • Using indexing, denormalization, and query tuning for faster data access.
  6. Content Delivery Network (CDN)

    • Distributes content closer to users globally to reduce latency.
  7. Compression & Minification

    • Reducing the size of payloads (e.g., images, scripts) to speed up responses.
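Caching (technique 2 above) can be sketched as a minimal in-process TTL cache; this is a simplified stand-in for a real cache like Redis, and the metadata lookup is a placeholder:

```python
import time

# Minimal in-process TTL cache: a simplified stand-in for a cache like Redis.
class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=60)

def get_video_metadata(video_id: str) -> dict:
    """Serve from cache when possible; fall back to the (slow) database."""
    cached = cache.get(video_id)
    if cached is not None:
        return cached  # cache hit: no database round trip
    metadata = {"id": video_id, "title": "Example"}  # placeholder for a DB query
    cache.set(video_id, metadata)
    return metadata
```

Repeated calls for the same `video_id` within the TTL window hit the cache instead of the database, which is exactly how caching reduces latency and database load.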

Example of Performance

Scenario: Designing a system like Netflix that needs to serve thousands of videos to millions of users without lag.

| Layer | Performance Techniques | Effect |
| --- | --- | --- |
| Frontend | Lazy loading, image compression, minified JS/CSS | Faster page load |
| CDN | Distribute video content via CloudFront or Akamai | Reduce latency, global access |
| Backend | Caching frequently accessed metadata (Redis) | Reduce DB hits, faster APIs |
| Database | Indexing, read replicas, query optimization | Fast data access |
| Load Balancer | Round-robin or IP-hash distribution | Prevent overload |
| Asynchronous Jobs | Transcode video in background | Improve responsiveness |

Example in Action

User Action: A user clicks "Play" on a video.

  1. Metadata Request:
    • Handled by backend API.
    • Cache hit returns info instantly (e.g., video title, thumbnail).
  2. Video Streaming:
    • Served from a CDN node closest to the user.
    • Reduces buffering (low latency).
  3. Recommendation Engine:
    • Runs asynchronously in the background (no delay in playback).
  4. Logs & Analytics:
    • Collected via message queues (e.g., Kafka), not blocking the main app.

Result: The video starts quickly, the system handles thousands of similar requests per second, and users don’t experience noticeable delays.
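Step 4 above (non-blocking analytics) can be sketched with a background worker and an in-memory queue; this is a simplified stand-in for a message queue like Kafka, and the function names are illustrative:

```python
import queue
import threading

# Analytics events go onto a queue and are processed by a background
# worker, so the playback path never blocks on logging.
events: "queue.Queue" = queue.Queue()

def analytics_worker():
    while True:
        event = events.get()
        if event is None:  # sentinel: shut the worker down
            break
        # In a real system this would publish to Kafka / a data warehouse.
        events.task_done()

worker = threading.Thread(target=analytics_worker, daemon=True)
worker.start()

def play_video(video_id: str) -> str:
    # Enqueue the analytics event and return immediately: playback is
    # not delayed while the event is processed in the background.
    events.put({"action": "play", "video_id": video_id})
    return f"streaming {video_id}"

print(play_video("v123"))  # returns at once; logging happens asynchronously
events.put(None)  # stop the worker
worker.join()
```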

Performance Optimization Summary

| Strategy | Benefit |
| --- | --- |
| Caching | Speeds up repeated reads |
| CDNs | Reduces latency for global users |
| Load Balancing | Prevents bottlenecks |
| Asynchronous Design | Keeps UIs responsive |
| Query Optimization | Improves database response time |
| Resource Monitoring | Detects and fixes slow components |