
Managing “Hot Shards”

A hot shard is a partition that receives a disproportionate share of query or write traffic, causing high CPU, memory, or I/O on that node and potentially degrading latency and throughput for the whole cluster.

Summary

A hot shard takes a disproportionate share of query or write traffic, which drives up CPU, memory, or I/O on its node and can degrade latency and throughput for the whole cluster.
Causes: the sharding strategy (popular keys or key ranges landing on one shard, or a frequently queried region in vector space) and skewed data or workload (one tenant or index dominating).
Mitigations: resharding (split and redistribute), replication with load balancing, a better sharding key, and caching.
Pipeline: detect the hot shard from per-shard QPS and latency metrics, then apply a mitigation (add replicas, reshard, or cache).
Practical tip: add replicas for read-hot shards before resharding.

Why hot shards occur

Hot shards often come from the sharding strategy. With ID-range sharding, a popular key range lands on a single shard. With hash sharding, keys are spread evenly, but a single very popular key still maps to one shard. With vector (spatial) sharding, a region of latent space that is queried often (e.g. “trending” or recent content) can make one shard hot. Skewed data or workload (e.g. one tenant or one index dominating) also creates hot shards, whatever the strategy.
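
To make the popular-key case concrete, here is a minimal sketch that routes a skewed workload through hash sharding and counts requests per shard. The shard count, key names, and the `shard_for` routing rule are assumptions chosen for illustration, not taken from any particular system.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed cluster size, for illustration only


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash (hypothetical routing rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def requests_per_shard(keys, num_shards: int = NUM_SHARDS) -> Counter:
    """Count how many requests each shard receives for a list of request keys."""
    counts = Counter({shard: 0 for shard in range(num_shards)})
    for key in keys:
        counts[shard_for(key, num_shards)] += 1
    return counts


# Skewed workload: one popular key gets 7,000 requests,
# and 3,000 other keys get one request each.
workload = ["item-popular"] * 7000 + [f"item-{i}" for i in range(3000)]

counts = requests_per_shard(workload)
hot = shard_for("item-popular")
print(f"shard {hot} serves {counts[hot] / len(workload):.0%} of requests")
for shard, n in sorted(counts.items()):
    print(f"shard {shard}: {n} requests")
```

Every request for the popular key hashes to the same shard, so that shard handles at least 70% of the traffic in this workload, even though the keys themselves are spread evenly.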

Mitigations

There are four main mitigations. (1) Resharding: split the hot shard into more partitions and redistribute the data. (2) Replication: add read replicas for the hot shard and load balance across them. (3) Better sharding key: if possible, choose a key that spreads queries evenly across shards (e.g. tenant ID, or a time bucket combined with another field so that recent traffic does not all land on one shard). (4) Caching: cache frequent query results or popular vectors at the coordinator or in a separate cache layer to reduce hits on the hot shard. Monitoring per-shard QPS and latency helps detect hot shards early.
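
The replication and caching mitigations can be combined on the read path. The sketch below is one possible arrangement, not a reference design: a coordinator checks a small LRU cache first and, on a miss, sends the read to the next replica in round-robin order. The class names, replica names, and the `fetch` callback (a stand-in for a real network call) are all hypothetical.

```python
import itertools
from collections import OrderedDict


class ReplicaSet:
    """Round-robin load balancer over the replicas of one shard."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._cycle = itertools.cycle(self.replicas)

    def next_replica(self):
        return next(self._cycle)


class LRUCache:
    """Small least-recently-used cache for query results."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


class Coordinator:
    """Serves reads from the cache, falling back to a replica on a miss."""

    def __init__(self, replica_set, fetch, cache_capacity=128):
        self.replica_set = replica_set
        self.fetch = fetch  # fetch(replica, key) -> value; assumed backend call
        self.cache = LRUCache(cache_capacity)
        self.hits_per_replica = {r: 0 for r in replica_set.replicas}

    def read(self, key):
        cached = self.cache.get(key)
        if cached is not None:  # note: a stored None is treated as a miss
            return cached
        replica = self.replica_set.next_replica()
        self.hits_per_replica[replica] += 1
        value = self.fetch(replica, key)
        self.cache.put(key, value)
        return value


coordinator = Coordinator(
    ReplicaSet(["replica-a", "replica-b", "replica-c"]),
    fetch=lambda replica, key: f"value-of-{key}",
    cache_capacity=2,
)
for key in ["k1", "k1", "k2", "k3", "k4", "k1"]:
    coordinator.read(key)
print(coordinator.hits_per_replica)
```

In this run the second read of `k1` is answered from the cache, and the remaining misses are spread across the three replicas instead of all landing on one node. This arrangement helps only with reads; a write-hot shard still needs resharding or a better key.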

Pipeline: detect the hot shard from per-shard QPS and latency metrics, then apply a mitigation (add replicas and load balance, reshard, or add caching). Trade-off: replication is faster to apply but mainly helps read-heavy load; resharding is more invasive but can fix the underlying skew. Practical tip: add replicas for read-hot shards before resharding, and use a sharding key that spreads load (e.g. tenant ID) when possible.
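
The detection step can be sketched as a simple check over per-shard metrics. The thresholds here (twice the median QPS, 200 ms p99 latency, 80% reads) are arbitrary assumptions for illustration, and the function names and metric dictionaries are hypothetical; a real deployment would tune them against its own baseline.

```python
from statistics import median


def find_hot_shards(qps_by_shard, p99_ms_by_shard, qps_ratio=2.0, latency_ms=200.0):
    """Return shard IDs whose QPS exceeds qps_ratio times the median QPS,
    or whose p99 latency exceeds latency_ms."""
    baseline = median(qps_by_shard.values())
    hot = []
    for shard, qps in qps_by_shard.items():
        too_busy = baseline > 0 and qps > qps_ratio * baseline
        too_slow = p99_ms_by_shard.get(shard, 0.0) > latency_ms
        if too_busy or too_slow:
            hot.append(shard)
    return sorted(hot)


def suggest_mitigation(read_fraction, read_heavy_threshold=0.8):
    """Replicas first for read-hot shards; otherwise resharding."""
    return "add replicas" if read_fraction >= read_heavy_threshold else "reshard"


# Made-up sample metrics for four shards.
qps = {"s0": 900, "s1": 1000, "s2": 1100, "s3": 4200}
p99 = {"s0": 40.0, "s1": 45.0, "s2": 50.0, "s3": 310.0}

for shard in find_hot_shards(qps, p99):
    print(shard, "->", suggest_mitigation(read_fraction=0.95))
```

Comparing each shard against the median rather than the mean keeps a single very hot shard from inflating the baseline it is measured against.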

Frequently Asked Questions

What is a hot shard?

A partition that receives a disproportionate share of query or write traffic, causing high CPU, memory, or I/O on that node and degrading latency and throughput for the whole cluster. See sharding.

What causes hot shards?

Sharding strategy: popular keys or key ranges (ID/hash sharding), or a hot region in vector space (spatial sharding). Skewed data or workload (e.g. one tenant or index dominating) also creates hot shards.

How do I mitigate a hot shard?

Resharding (split the hot shard and redistribute); add replicas and load balance across them; choose a better sharding key that spreads load; cache frequent queries or popular vectors. Monitor per-shard QPS and latency.

Does spatial sharding avoid hot shards?

Not always. A region of latent space that is queried often (e.g. trending or recent content) can make one shard hot. Mitigations (replication, caching, resharding) still apply. See spatial sharding.