Skip to main content

Sharding

Sharding = a form of horizontal partitioning across multiple servers (databases).

  • While partitioning usually happens inside a single database server,
  • Sharding distributes the data across multiple servers (shards).

Key Difference:

  • Partitioning = within one database.
  • Sharding = across multiple databases/servers.

Sharding Strategies

Range-based Sharding

  • Each shard stores a specific range of values.
  • Example:
    • Shard 1 → CustomerID 1–10000
    • Shard 2 → CustomerID 10001–20000.

Hash-based Sharding

  • Use a hash function to decide which shard stores the row.
  • Example: MOD(CustomerID, 4) → distribute across 4 shards.

Geographic Sharding

  • Shard data based on geography.
  • Example: EU users stored in European servers, US users in US servers.

Example of Sharding

Imagine an E-commerce platform with millions of users:

  • User Table (UserID, Name, Email, Region)

Sharding by region:

  • Shard 1 → Users in North America.
  • Shard 2 → Users in Europe.
  • Shard 3 → Users in Asia.

Benefits:

  • Load distributed across servers.
  • Queries run only on the relevant shard.
  • Better scalability and fault isolation.

Partitioning vs Sharding

FeaturePartitioningSharding
ScopeWithin one database serverAcross multiple servers/databases
TransparencyDatabase engine handles itApplication or middleware handles it
PurposeQuery optimization, maintenanceScalability, distributing workload
ExampleSplitting Orders by yearSplitting Users across regions

Performance Considerations

Advantages

Partitioning:

  • Faster queries (only relevant partition scanned).
  • Easier archiving & maintenance.
  • Enables parallel query execution. Sharding:
  • Massive scalability (data spread across servers).
  • High availability (failure in one shard doesn’t kill the whole system).
  • Distributes write-heavy workloads.

Disadvantages

Partitioning:

  • Still limited by a single server’s hardware.
  • Complex queries spanning many partitions may be slower. Sharding:
  • More complex application logic (need to know which shard to query).
  • Rebalancing shards is difficult.
  • Joins across shards are hard.