Skip to main content

Partitioning

When working with very large datasets, queries can become slow and storage can be inefficient.

Partitioning is a MySQL technique that splits a large table into smaller, more manageable pieces (called partitions) but still treats them as a single logical table.

  • Partitioning = dividing a table’s rows into separate physical storage chunks.
  • Queries can target only the relevant partition instead of scanning the entire table.
  • MySQL decides automatically which partition to search based on the query condition.

This improves query performance, manageability, and maintenance.

Benefits of Partitioning

  • Faster queries (partition pruning → scans only relevant partitions).
  • Better performance with huge tables (billions of rows).
  • Easier maintenance (you can archive or drop a partition without touching the rest).
  • Efficient storage (old data can be separated).

Limitations of Partitioning

  • Partition key must be part of PRIMARY KEY or UNIQUE KEY.
  • No foreign keys allowed in partitioned tables.
  • Too many partitions can slow down queries.
  • Not all storage engines support it (must be InnoDB in modern MySQL).

Types of Partitioning

  • RANGE → ranges of values (time-based data).
  • LIST → specific values (categories).
  • HASH → evenly distribute with a hash function.
  • KEY → system-managed hash (usually on primary keys).

RANGE Partitioning

Rows are placed into partitions based on ranges of values.

  • Good for time-based data (e.g., monthly logs).
CREATE TABLE sales (
id INT,
sale_date DATE,
amount DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(sale_date)) (
PARTITION p2019 VALUES LESS THAN (2020),
PARTITION p2020 VALUES LESS THAN (2021),
PARTITION p2021 VALUES LESS THAN (2022),
PARTITION pmax VALUES LESS THAN MAXVALUE
);

Data from 2019 goes into p2019, 2020 into p2020, and so on.

Query:

SELECT * FROM sales WHERE sale_date BETWEEN '2020-01-01' AND '2020-12-31';

MySQL only scans p2020, not the whole table.

LIST Partitioning

Rows are partitioned based on a list of specific values.

  • Good for categorical data (e.g., regions, departments).
CREATE TABLE employees (
id INT,
name VARCHAR(50),
region VARCHAR(10)
)
PARTITION BY LIST COLUMNS (region) (
PARTITION p_north VALUES IN ('North'),
PARTITION p_south VALUES IN ('South'),
PARTITION p_east VALUES IN ('East'),
PARTITION p_west VALUES IN ('West')
);

HASH Partitioning

Rows are distributed into partitions based on a hash function.

  • Good for evenly distributing data when values are unpredictable.
CREATE TABLE orders (
order_id INT,
customer_id INT,
amount DECIMAL(10,2)
)
PARTITION BY HASH(customer_id) PARTITIONS 4;

Rows are distributed across 4 partitions using a hash of customer_id. Useful for load balancing queries across partitions.

KEY Partitioning

Similar to HASH, but MySQL chooses the hash function internally.

  • Typically used with primary keys.
CREATE TABLE products (
product_id INT PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10,2)
)
PARTITION BY KEY(product_id) PARTITIONS 3;

MySQL automatically hashes product_id into 3 partitions.

Practical Example of Partitioning

Imagine a website access log table with millions of rows per year:

CREATE TABLE access_logs (
id INT AUTO_INCREMENT PRIMARY KEY,
log_date DATE NOT NULL,
user_id INT,
action VARCHAR(100)
)
PARTITION BY RANGE (YEAR(log_date)*100 + MONTH(log_date)) (
PARTITION p202201 VALUES LESS THAN (202202),
PARTITION p202202 VALUES LESS THAN (202203),
PARTITION p202203 VALUES LESS THAN (202204),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
  • January 2022 logs → p202201
  • February 2022 logs → p202202

Query example:

SELECT * FROM access_logs WHERE log_date BETWEEN '2022-02-01' AND '2022-02-28';

Only scans p202202, making it much faster.