Chapter 6: Partitioning

Glossary to help with terminology:

Word	Means…
Partition / shard / tablet / vnode	Smallest chunk of data the system moves around as a unit.
Hot spot	One partition receives disproportionate traffic.
Scatter / gather	Send same query to all shards, merge results.
Pre-split	Manually create initial empty partitions to avoid first-chunk hot-spot.
Routing tier	Stateless processes that map (key → shard) so clients don’t need cluster map.

Screenshot 2025-05-24 at 13.57.50.png

Why bother to partition?

Goal	Partitioning delivers…	Without it you’d hit…
Horizontal scale (size)	Spread TB-PB of data across many disks.	Single server fills its drive or RAM.
Horizontal scale (throughput)	Let many CPUs run queries in parallel.	One box saturates CPU/IO long before data is “big”.
Failure isolation	A bad node only takes its partitions down.	Whole dataset unavailable if that node were monolithic.

Two primary ways to slice a key-value store

Strategy	How it chooses a partition	When it shines	Hidden pain point
Key-range (a.k.a. ordered or “tablet”)	Each partition owns `min_key ≤ k < max_key`.	Range scans, time-series, “get all posts by user”.	Hot-spot if newest keys bunch together (e.g., timestamp keys).
Hash of key	Take `hash(k)` → map into buckets.	Uniform write load, avoids per-key hot-spot.	Range queries destroyed; must scatter/gather.

Screenshot 2025-05-24 at 13.58.38.png

Screenshot 2025-05-24 at 13.58.24.png

Hybrid (compound key) – Hash first component, keep second component ordered. Cassandra does this so you can grab one user’s posts by time but still spread users across cluster.

Secondary indexes: two very different layouts

Layout	Read path	Write path	Trade-off mantra
Document-partitioned (local index)	Client must query all partitions → scatter/gather + merge.	Only touches the partition that stores the document ⇒ fast writes.	Cheap writes, wide reads.
Term-partitioned (global index)	Term tells you exactly which index partition to hit ⇒ one RPC.	Every term in the doc may hash to different index nodes ⇒ multi-partition distributed write.	Cheap reads, wide writes.

Screenshot 2025-05-24 at 13.58.56.png

Screenshot 2025-05-24 at 13.59.16.png