Performance

In system design, performance refers to how efficiently a system responds to and processes requests under normal and peak conditions. It determines how fast, scalable, and resource-efficient the system is.

A performant system is one that:

  • Responds quickly (low latency)
  • Handles many requests simultaneously (high throughput)
  • Uses resources effectively (CPU, memory, bandwidth)

Performance Metrics

Availability

The percentage of time the system is operational and accessible.

$$\text{Availability} = \frac{\text{Uptime}}{\text{Uptime} + \text{Downtime}} \times 100$$

Example: If a system is down for 52.6 minutes in a year, it has 99.99% availability (often called “four nines”).

Explanation: Mission-critical systems (e.g., banking, healthcare) require high availability. This is often achieved using redundancy and failover systems.
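The formula above can be inverted to see how much downtime a given availability target allows per year. A minimal sketch (function name is illustrative):

```python
# Downtime budget implied by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum yearly downtime permitted by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

print(round(downtime_minutes_per_year(99.99), 1))  # ~52.6 minutes ("four nines")
print(round(downtime_minutes_per_year(99.9), 1))   # ~525.6 minutes ("three nines")
```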

Reliability

The ability of a system to function correctly and consistently over time without failure.

  • Related metric: Mean Time Between Failures (MTBF)

Example: A payment gateway that crashes once every 30 days is more reliable than one that crashes weekly.

Explanation: Reliability ensures user trust. It is often supported by rigorous testing and graceful error handling.
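MTBF is typically computed as total operating time divided by the number of failures in that period. A quick sketch using the payment-gateway example above (numbers are illustrative):

```python
# Mean Time Between Failures: total operating time / number of failures.
def mtbf_hours(total_operating_hours: float, failure_count: int) -> float:
    return total_operating_hours / failure_count

# A gateway that crashed once in 30 days (720 hours) of operation:
print(mtbf_hours(720, 1))   # 720.0 hours

# One that crashes weekly has a much lower MTBF:
print(mtbf_hours(720, 4))   # 180.0 hours
```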

Durability

The ability of a system to retain data over time, even in the face of failures.

  • Often relevant for storage systems

Example: Once a transaction is committed in a database, it will not be lost, even if the server crashes. That’s durability.

Explanation: Durability is critical in databases and file systems (e.g., AWS S3 promises 99.999999999% durability).
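One simplified way to read a durability figure is as an annual per-object survival probability; this is an assumption for illustration only, as real durability models are more nuanced:

```python
# Rough sketch: treating durability as an annual per-object survival probability.
# (Simplification for intuition; actual provider durability models differ.)
def expected_annual_losses(num_objects: int, durability_pct: float) -> float:
    annual_loss_prob = 1 - durability_pct / 100
    return num_objects * annual_loss_prob

# With eleven nines of durability, storing 10 million objects implies an
# expected loss of roughly 0.0001 objects per year:
print(expected_annual_losses(10_000_000, 99.999999999))
```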

Performance Metrics Summary

| Metric | Unit | Key Focus | Example Value |
| --- | --- | --- | --- |
| Latency | ms | Speed | 150 ms |
| Throughput | req/sec | Capacity | 10,000 RPS |
| Availability | % | Uptime | 99.99% |
| Reliability | MTBF, errors | Stability | 1 crash/month |
| Scalability | N/A | Load handling | Handles 1M users |
| Durability | % | Data persistence | 99.999999999% |
| Consistency | Strong/Eventual | Data freshness | Real-time profile updates |
| Error Rate | % | Quality | 0.5% |
| Load | % CPU, etc. | Utilization | 80% CPU |
| Tail Latency | ms | Outlier performance | 1 s at 99th percentile |
Other related metrics:

  • Concurrency: Number of simultaneous users the system supports
  • Resource Utilization: Efficiency of CPU, memory, disk, and network usage
  • Load Time: Time required to serve and render content to users
Performance Optimization Techniques

  1. Scalability

    • The ability to maintain or improve performance as load increases.
    • Horizontal (adding servers) or vertical (more powerful servers) scaling.
  2. Caching

    • Storing frequently accessed data in fast-access memory (e.g., Redis, CDN).
    • Reduces load on databases and improves latency.
  3. Load Balancing

    • Distributes incoming requests across multiple servers to prevent overload.
  4. Asynchronous Processing

    • Defers non-critical work (like sending emails) to background jobs.
  5. Database Optimization

    • Using indexing, denormalization, and query tuning for faster data access.
  6. Content Delivery Network (CDN)

    • Distributes content closer to users globally to reduce latency.
  7. Compression & Minification

    • Reducing the size of payloads (e.g., images, scripts) to speed up responses.
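Caching (technique 2 above) can be sketched as a minimal in-process TTL cache; this is a simplified stand-in for a real cache like Redis, and the metadata lookup is a placeholder:

```python
import time

# Minimal in-process TTL cache: a simplified stand-in for a cache like Redis.
class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=60)

def get_video_metadata(video_id: str) -> dict:
    """Serve from cache when possible; fall back to the (slow) database."""
    cached = cache.get(video_id)
    if cached is not None:
        return cached  # cache hit: no database round trip
    metadata = {"id": video_id, "title": "Example"}  # placeholder for a DB query
    cache.set(video_id, metadata)
    return metadata
```

Repeated calls for the same `video_id` within the TTL window hit the cache instead of the database, which is exactly how caching reduces latency and database load.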

Example of Performance

Scenario: Designing a system like Netflix that needs to serve thousands of videos to millions of users without lag.

| Layer | Performance Techniques | Effect |
| --- | --- | --- |
| Frontend | Lazy loading, image compression, minified JS/CSS | Faster page load |
| CDN | Distribute video content via CloudFront or Akamai | Reduce latency, global access |
| Backend | Caching frequently accessed metadata (Redis) | Reduce DB hits, faster APIs |
| Database | Indexing, read replicas, query optimization | Fast data access |
| Load Balancer | Round-robin or IP-hash distribution | Prevent overload |
| Asynchronous Jobs | Transcode video in background | Improve responsiveness |

Example in Action

User Action: A user clicks "Play" on a video.

  1. Metadata Request:
    • Handled by backend API.
    • Cache hit returns info instantly (e.g., video title, thumbnail).
  2. Video Streaming:
    • Served from a CDN node closest to the user.
    • Reduces buffering (low latency).
  3. Recommendation Engine:
    • Runs asynchronously in the background (no delay in playback).
  4. Logs & Analytics:
    • Collected via message queues (e.g., Kafka), not blocking the main app.

Result: The video starts quickly, the system handles thousands of similar requests per second, and users don’t experience noticeable delays.
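Step 4 above (non-blocking analytics) can be sketched with a background worker and an in-memory queue; this is a simplified stand-in for a message queue like Kafka, and the function names are illustrative:

```python
import queue
import threading

# Analytics events go onto a queue and are processed by a background
# worker, so the playback path never blocks on logging.
events: "queue.Queue" = queue.Queue()

def analytics_worker():
    while True:
        event = events.get()
        if event is None:  # sentinel: shut the worker down
            break
        # In a real system this would publish to Kafka / a data warehouse.
        events.task_done()

worker = threading.Thread(target=analytics_worker, daemon=True)
worker.start()

def play_video(video_id: str) -> str:
    # Enqueue the analytics event and return immediately: playback is
    # not delayed while the event is processed in the background.
    events.put({"action": "play", "video_id": video_id})
    return f"streaming {video_id}"

print(play_video("v123"))  # returns at once; logging happens asynchronously
events.put(None)  # stop the worker
worker.join()
```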

Performance Optimization Summary

| Strategy | Benefit |
| --- | --- |
| Caching | Speeds up repeated reads |
| CDNs | Reduces latency for global users |
| Load Balancing | Prevents bottlenecks |
| Asynchronous Design | Keeps UIs responsive |
| Query Optimization | Improves database response time |
| Resource Monitoring | Detects and fixes slow components |