Memory Access Latency

Memory access latency is the time delay between issuing a memory request (a read or write) and the moment the requested data is available to the processor or other system component. It is measured in nanoseconds (ns) or CPU cycles, and it plays a critical role in performance-sensitive applications such as databases, in-memory systems, and low-latency services.

Memory Hierarchy & Latency

Modern systems use a hierarchical memory model to balance speed, capacity, and cost.

| Memory Type  | Latency (approx.) | Size (typical) | Location         |
|--------------|-------------------|----------------|------------------|
| CPU register | 0.25 ns           | Bytes          | On CPU           |
| L1 cache     | 0.5 – 1 ns        | ~32 KB         | On CPU core      |
| L2 cache     | 3 – 10 ns         | ~256 KB        | On CPU chip      |
| L3 cache     | 10 – 30 ns        | ~8 MB          | Shared on chip   |
| RAM (DRAM)   | 50 – 100 ns       | ~GBs           | On motherboard   |
| SSD storage  | 50 – 150 μs       | ~TBs           | PCIe/SATA device |
| HDD          | 5 – 10 ms         | ~TBs           | External disk    |

As we move down the hierarchy, latency increases and cost per byte decreases.

Why Memory Latency Matters

  1. The CPU is faster than memory → even small delays stall execution while the pipeline waits for data.
  2. Memory-bound workloads → when performance is limited by memory rather than compute, higher latency translates directly into wait time.
  3. Performance bottlenecks → especially in high-throughput systems, where stalls compound across millions of requests.
  4. Cache misses force expensive fetches from main memory → cache-efficient code matters.

Optimization Techniques for Memory Latency

| Strategy                 | Description                                                            |
|--------------------------|------------------------------------------------------------------------|
| Caching                  | Store frequently accessed data in faster memory                        |
| Prefetching              | Predict and load future memory needs ahead of time                     |
| Memory locality          | Improve access patterns (e.g., access arrays sequentially)             |
| Data alignment           | Structure data to fit cache lines better                               |
| Avoiding cache thrashing | Reduce conflicts in cache sets by designing access-friendly structures |
| NUMA awareness           | Place data close to the CPU core using it in NUMA systems              |

Example with In-Memory Database

Scenario: a real-time analytics service keeps its working set in memory (e.g., Redis or Memcached).

  • Accessing hot data in CPU cache: ~1–5 ns (very fast)
  • Accessing cold data in RAM: ~100 ns (roughly 20–100× slower than cache)
  • Accessing persisted data on SSD (fallback): ~100,000 ns = 100 μs (~1000× slower than RAM)