Word | Means… |
---|---|
Partition / shard / tablet / vnode | Smallest chunk of data the system moves around as a unit. |
Hot spot | One partition receives disproportionate traffic. |
Scatter / gather | Send same query to all shards, merge results. |
Pre-split | Manually create initial empty partitions to avoid first-chunk hot-spot. |
Routing tier | Stateless processes that map (key → shard) so clients don’t need cluster map. |
Goal | Partitioning delivers… | Without it you’d hit… |
---|---|---|
Horizontal scale (size) | Spread TB-PB of data across many disks. | Single server fills its drive or RAM. |
Horizontal scale (throughput) | Let many CPUs run queries in parallel. | One box saturates CPU/IO long before data is “big”. |
Failure isolation | A bad node only takes its partitions down. | Whole dataset unavailable if that node were monolithic. |
Strategy | How it chooses a partition | When it shines | Hidden pain point |
---|---|---|---|
Key-range (a.k.a. ordered or “tablet”) | Each partition owns min_key ≤ k < max_key . |
Range scans, time-series, “get all posts by user”. | Hot-spot if newest keys bunch together (e.g., timestamp keys). |
Hash of key | Take hash(k) → map into buckets. |
Uniform write load, avoids per-key hot-spot. | Range queries destroyed; must scatter/gather. |
Hybrid (compound key) – Hash first component, keep second component ordered. Cassandra does this so you can grab one user’s posts by time but still spread users across cluster.
Layout | Read path | Write path | Trade-off mantra |
---|---|---|---|
Document-partitioned (local index) | Client must query all partitions → scatter/gather + merge. | Only touches the partition that stores the document ⇒ fast writes. | Cheap writes, wide reads. |
Term-partitioned (global index) | Term tells you exactly which index partition to hit ⇒ one RPC. | Every term in the doc may hash to different index nodes ⇒ multi-partition distributed write. | Cheap reads, wide writes. |