Skip to main content

Storage Consideration

When we talk about storage, we mean how data is physically stored on disk (or SSD, memory). This includes:

  • How tables and indexes are organized on disk.
  • How much space they use.
  • How efficiently data is read/written during queries.

Key Storage Considerations

Data Types and Storage Size

  • Choosing the right data type affects storage.
  • Example:
    • INT (4 bytes) vs BIGINT (8 bytes).
    • If you only need values up to 1M, INT is enough.
  • Storing unnecessarily large types wastes space and slows I/O.

Row Storage Format

  • Fixed-length storage:
    • Faster to access (rows are predictable size).
    • But wastes space if values are small.
    • Example: CHAR(50) always takes 50 bytes.
  • Variable-length storage:
    • Saves space (only stores needed bytes).
    • Slightly slower (extra lookup).
    • Example: VARCHAR(50) only stores actual characters + length info.

Normalization vs Denormalization

  • Normalized tables: reduce redundancy, save storage space, but may require joins (extra reads).
  • Denormalized tables: may increase storage (duplication), but speed up queries.
  • Trade-off depends on workload (OLTP vs OLAP).

Index Storage

  • Every index takes extra disk space.
  • Example:
    • A table of 10M rows, indexed on 3 columns, might double storage needs.
  • Indexes speed up queries but slow down inserts/updates.

Null and Sparse Data

  • Columns with many NULL values should be carefully stored.
  • Some DBMS support sparse columns (store only non-null values).
  • Example: Employee table with 100 optional attributes → better to split into separate tables.

LOB (Large Objects) Storage

  • Images, PDFs, audio, video = BLOBs (Binary Large Objects).
  • Text documents = CLOBs (Character Large Objects).
  • Options for storage:
    1. Inside the database (slower queries, but consistent backups).
    2. Outside the database (filesystem or cloud), with just a reference in DB (faster, smaller DB).

Partitioning & Tablespaces

  • Partitioning (split data into smaller parts) improves manageability + query speed.
  • Tablespaces allow storing different tables/indexes on different physical disks → balance I/O load.

Compression

  • Many DBMS support table and index compression.
  • Reduces disk usage and I/O, but may increase CPU load.
  • Example:
    • Compressing a log table with millions of rows that are rarely updated.

Caching & Buffer Pools

  • Databases use memory caches to reduce disk reads.
  • Storage design should optimize which data fits in cache (e.g., hot vs cold data).