Sharding
Sharding = a form of horizontal partitioning across multiple servers (databases).
- While partitioning usually happens inside a single database server, sharding distributes the data across multiple servers (shards).
Key Difference:
- Partitioning = within one database.
- Sharding = across multiple databases/servers.
Sharding Strategies
Range-based Sharding
- Each shard stores a specific range of values.
- Example:
  - Shard 1 → CustomerID 1–10000
  - Shard 2 → CustomerID 10001–20000
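A minimal sketch of range-based routing, assuming the two CustomerID ranges from the example. The shard names and the `range_shard_for` helper are hypothetical and chosen only for illustration. A binary search over the upper bounds keeps the lookup cheap as more ranges are added.

```python
from bisect import bisect_left

# Inclusive upper bound of each shard's CustomerID range (from the example).
UPPER_BOUNDS = [10_000, 20_000]
# Hypothetical shard names, one per range, in the same order.
SHARD_NAMES = ["shard_1", "shard_2"]


def range_shard_for(customer_id: int) -> str:
    """Return the name of the shard whose range contains customer_id."""
    if customer_id < 1:
        raise ValueError(f"CustomerID {customer_id} is below the first range")
    # Index of the first upper bound that is >= customer_id.
    index = bisect_left(UPPER_BOUNDS, customer_id)
    if index == len(UPPER_BOUNDS):
        raise ValueError(f"no shard covers CustomerID {customer_id}")
    return SHARD_NAMES[index]
```

Range boundaries keep neighbouring IDs together, which suits range scans, but a range that receives most of the new IDs can become a hot spot.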
Hash-based Sharding
- Use a hash function to decide which shard stores the row.
- Example: MOD(CustomerID, 4) → distribute across 4 shards.
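The MOD example can be sketched as follows, assuming integer CustomerIDs and shards numbered 0 to 3. The function names are hypothetical. For string keys the sketch swaps in a SHA-256 digest, because Python's built-in `hash()` is randomized per process for strings and would route the same key differently after a restart.

```python
import hashlib

NUM_SHARDS = 4  # from the example: MOD(CustomerID, 4)


def hash_shard_for(customer_id: int) -> int:
    """Route an integer key with a plain modulo, as in the example."""
    return customer_id % NUM_SHARDS


def hash_shard_for_text(key: str) -> int:
    """Route a string key using a stable digest (an illustrative choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then take the modulo.
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

Sequential IDs spread evenly under this scheme, but rows with adjacent IDs land on different shards, so range queries have to visit every shard.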
Geographic Sharding
- Shard data based on geography.
- Example: EU users stored in European servers, US users in US servers.
Example of Sharding
Imagine an E-commerce platform with millions of users:
- User Table (UserID, Name, Email, Region)
Sharding by region:
- Shard 1 → Users in North America.
- Shard 2 → Users in Europe.
- Shard 3 → Users in Asia.
Benefits:
- Load distributed across servers.
- Queries that filter on the shard key (here, Region) run only on the relevant shard.
- Better scalability and fault isolation.
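The region example can be sketched with in-memory dictionaries standing in for the three database servers. The shard names, the `User` class, and both helper functions are assumptions made for illustration. The sketch shows why the shard key matters: a lookup that knows the region touches one shard, while a lookup without it has to ask each shard in turn.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    user_id: int
    name: str
    email: str
    region: str


# Region-to-shard mapping from the example; the shard names are hypothetical.
REGION_TO_SHARD = {
    "North America": "shard_1",
    "Europe": "shard_2",
    "Asia": "shard_3",
}

# In-memory stand-ins for three separate database servers.
SHARDS = {name: {} for name in REGION_TO_SHARD.values()}


def insert_user(user: User) -> None:
    """Store the user on the shard that owns the user's region."""
    SHARDS[REGION_TO_SHARD[user.region]][user.user_id] = user


def find_by_email(email: str, region: Optional[str] = None):
    """Return (user, number_of_shards_queried)."""
    # With a region, only one shard is a candidate; without it, all are.
    targets = [REGION_TO_SHARD[region]] if region is not None else list(SHARDS)
    queried = 0
    for name in targets:
        queried += 1
        for user in SHARDS[name].values():
            if user.email == email:
                return user, queried
    return None, queried


insert_user(User(1, "Ana", "ana@example.com", "Europe"))
insert_user(User(2, "Ben", "ben@example.com", "Asia"))
```

In a real deployment each entry in `SHARDS` would be a connection to a different server, and the unkeyed lookup would typically be sent to all shards in parallel rather than one after another.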
Partitioning vs Sharding
| Feature | Partitioning | Sharding |
|---|---|---|
| Scope | Within one database server | Across multiple servers/databases |
| Transparency | Database engine handles it | Application or middleware handles it |
| Purpose | Query optimization, maintenance | Scalability, distributing workload |
| Example | Splitting Orders by year | Splitting Users across regions |
Performance Considerations
Advantages
Partitioning:
- Faster queries when the filter matches the partition key (only the relevant partitions are scanned).
- Easier archiving & maintenance.
- Enables parallel query execution.
Sharding:
- Massive scalability (data spread across servers).
- Better availability (a failure in one shard affects only that shard's data, not the whole system).
- Distributes write-heavy workloads.
Disadvantages
Partitioning:
- Still limited by a single server’s hardware.
- Complex queries spanning many partitions may be slower.
Sharding:
- More complex application logic (need to know which shard to query).
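- Rebalancing shards is difficult (adding or removing a shard means moving existing rows between servers).
- Joins across shards are hard (no single server holds both sides, so rows must be combined in the application or middleware).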
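To make the rebalancing cost concrete, here is a small sketch that assumes the simple modulo scheme (key mod shard count) and integer keys 1 to 20000. The `moved_fraction` helper is hypothetical. It counts how many keys would be assigned to a different shard when the shard count grows from 4 to 5.

```python
def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose modulo-assigned shard changes after resizing."""
    moved = sum(1 for key in keys if key % old_shards != key % new_shards)
    return moved / len(keys)


# Growing from 4 to 5 shards relocates 80% of these keys.
fraction = moved_fraction(range(1, 20_001), 4, 5)
print(f"{fraction:.0%} of keys change shard")
```

Schemes such as consistent hashing or a directory (lookup table) of key ranges are commonly used to reduce this movement, at the cost of extra routing logic.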