Glossary to help with terminology:

| Word | Means… |
| --- | --- |
| Partition / shard / tablet / vnode | Smallest chunk of data the system moves around as a unit. |
| Hot spot | One partition receives disproportionate traffic. |
| Scatter / gather | Send same query to all shards, merge results. |
| Pre-split | Manually create initial empty partitions to avoid first-chunk hot-spot. |
| Routing tier | Stateless processes that map (key → shard) so clients don’t need cluster map. |


Why bother to partition?

| Goal | Partitioning delivers… | Without it you’d hit… |
| --- | --- | --- |
| Horizontal scale (size) | Spread TB-PB of data across many disks. | Single server fills its drive or RAM. |
| Horizontal scale (throughput) | Let many CPUs run queries in parallel. | One box saturates CPU/IO long before data is “big”. |
| Failure isolation | A bad node only takes its partitions down. | Whole dataset unavailable if that node were monolithic. |

Two primary ways to slice a key-value store

| Strategy | How it chooses a partition | When it shines | Hidden pain point |
| --- | --- | --- | --- |
| Key-range (a.k.a. ordered or “tablet”) | Each partition owns min_key ≤ k < max_key. | Range scans, time-series, “get all posts by user”. | Hot-spot if newest keys bunch together (e.g., timestamp keys). |
| Hash of key | Take hash(k) → map into buckets. | Uniform write load; avoids hot-spots caused by adjacent keys. | Key ordering is lost, so range queries must scatter/gather. A single very hot key still lands on one partition. |
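
The two strategies above can be sketched roughly as follows. This is a minimal illustration, not how any particular database does it: the split points in `BOUNDARIES`, the bucket count `NUM_BUCKETS`, the use of MD5, and the simple modulo mapping are all assumptions made for the example.

```python
import bisect
import hashlib

# Hypothetical split points: partition 0 owns k < "g", partition 1 owns "g" <= k < "n", etc.
BOUNDARIES = ["g", "n", "t"]  # 3 split points -> 4 range partitions
NUM_BUCKETS = 4               # assumed number of hash buckets


def range_partition(key: str) -> int:
    """Key-range: binary-search the sorted split points."""
    return bisect.bisect_right(BOUNDARIES, key)


def hash_partition(key: str) -> int:
    """Hash: stable digest (not Python's per-process salted hash()) mapped to a bucket."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def range_scan_partitions(lo: str, hi: str) -> list[int]:
    """Partitions a key-range store must touch to serve lo <= k < hi."""
    first = range_partition(lo)
    last = bisect.bisect_left(BOUNDARIES, hi)  # partition of the largest key below hi
    return list(range(first, last + 1))


def hash_scan_partitions() -> list[int]:
    """With hashing, adjacent keys are scattered, so a range scan asks every bucket."""
    return list(range(NUM_BUCKETS))
```

For example, `range_scan_partitions("h", "p")` touches only the two partitions that overlap that key range, while the hashed layout has to scatter the same scan to every bucket and merge the results.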

Hybrid (compound key) – Hash first component, keep second component ordered. Cassandra does this so you can grab one user’s posts by time but still spread users across cluster.
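
A toy in-memory version of the compound-key idea might look like the sketch below. It is not Cassandra's implementation; `CompoundKeyStore`, `partition_for`, `NUM_PARTITIONS`, and the integer timestamps are hypothetical names and choices for illustration only.

```python
import bisect
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed cluster size


def partition_for(user_id: str) -> int:
    """Hash only the first key component, so one user's rows share a partition."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


class CompoundKeyStore:
    def __init__(self) -> None:
        # partition -> user_id -> list of (timestamp, post), kept sorted by timestamp
        self.partitions = [defaultdict(list) for _ in range(NUM_PARTITIONS)]

    def put(self, user_id: str, timestamp: int, post: str) -> None:
        rows = self.partitions[partition_for(user_id)][user_id]
        bisect.insort(rows, (timestamp, post))  # second component stays ordered

    def posts_between(self, user_id: str, start: int, end: int) -> list[str]:
        """Single-partition range scan over start <= timestamp < end."""
        rows = self.partitions[partition_for(user_id)].get(user_id, [])
        lo = bisect.bisect_left(rows, (start,))  # (start,) sorts before any (start, post)
        hi = bisect.bisect_left(rows, (end,))
        return [post for _, post in rows[lo:hi]]


store = CompoundKeyStore()
store.put("alice", 30, "third")
store.put("alice", 10, "first")
store.put("alice", 20, "second")
store.put("bob", 15, "hello")
recent = store.posts_between("alice", 10, 30)  # one partition, rows already in time order
```

The time-range query never leaves the partition chosen by `hash(user_id)`, while different users still spread across partitions.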


Secondary indexes: two very different layouts

| Layout | Read path | Write path | Trade-off mantra |
| --- | --- | --- | --- |
| Document-partitioned (local index) | Client must query all partitions → scatter/gather + merge. | Only touches the partition that stores the document ⇒ fast writes. | Cheap writes, wide reads. |
| Term-partitioned (global index) | Term tells you exactly which index partition to hit ⇒ one RPC. | Every term in the doc may hash to different index nodes ⇒ multi-partition distributed write. | Cheap reads, wide writes. |
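
To make the trade-off concrete, here is a small sketch that counts how many partitions each layout touches on write and on read. The classes `LocalIndex` and `GlobalIndex`, the helper `bucket`, and the partition count are assumptions for illustration; real systems add replication, asynchronous index updates, and so on.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed


def bucket(value: str) -> int:
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


class LocalIndex:
    """Document-partitioned: each partition indexes only its own documents."""

    def __init__(self) -> None:
        self.index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

    def write(self, doc_id: str, terms: list[str]) -> set[int]:
        p = bucket(doc_id)  # the document's home partition
        for term in terms:
            self.index[p][term].add(doc_id)
        return {p}  # partitions touched: always exactly one

    def read(self, term: str) -> tuple[set[str], int]:
        hits: set[str] = set()
        for part in self.index:  # scatter to every partition, then merge
            hits |= part.get(term, set())
        return hits, len(self.index)


class GlobalIndex:
    """Term-partitioned: the term decides which index partition holds its postings."""

    def __init__(self) -> None:
        self.index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

    def write(self, doc_id: str, terms: list[str]) -> set[int]:
        touched = set()
        for term in terms:
            p = bucket(term)  # each term may live on a different partition
            self.index[p][term].add(doc_id)
            touched.add(p)
        return touched

    def read(self, term: str) -> tuple[set[str], int]:
        return set(self.index[bucket(term)].get(term, set())), 1


docs = {"d1": ["red", "honda"], "d2": ["red", "ford"], "d3": ["blue", "honda"]}
local, global_ = LocalIndex(), GlobalIndex()
for doc_id, terms in docs.items():
    local.write(doc_id, terms)
    global_.write(doc_id, terms)
```

Both layouts return the same documents for a term; the second element of each `read` result is the number of partitions queried, which is every partition for the local index and one for the global index.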
