1 — Data models & schema evolution (Ch 2 + 4)
| Quick cue | 10-second answer | Typical follow-up |
|---|---|---|
| “Relational vs document?” | Relational wins when many-to-many joins, ad-hoc queries and multi-row ACID matter; document wins for aggregate-by-key patterns and when you need schema-on-read flexibility. | “How do you avoid joins on doc DB?” → Denormalize or pre-compute views. |
| “How do you change schemas safely?” | Version the payload (“V1” vs “V2” fields) and migrate in back-fills, not big-bangs; use Avro/Protobuf with a schema registry for forward and backward compatibility. | “How to clean up old code paths?” → Add a TTL to versions and track usage metrics. |
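A minimal sketch of the versioned-payload idea, in plain Python and JSON rather than Avro or Protobuf. The field names (`schema_version`, `name`, `first_name`, `last_name`) and the helper `upgrade_to_v2` are hypothetical, chosen only for illustration.

```python
import json


def upgrade_to_v2(record: dict) -> dict:
    """Return a V2-shaped copy of a V1 (or newer) record. Field names are hypothetical."""
    version = record.get("schema_version", 1)  # assumption: V1 writers never set a version
    if version >= 2:
        return dict(record)  # already V2 or newer: pass through unchanged
    # V1 stored a single "name" field; V2 splits it in two.
    first, _, last = record.get("name", "").partition(" ")
    upgraded = {k: v for k, v in record.items() if k != "name"}
    upgraded.update(schema_version=2, first_name=first, last_name=last)
    return upgraded


def decode(raw: str) -> dict:
    # Backward compatibility: new code reads old data by upgrading it on read.
    # Forward compatibility: unknown fields from newer writers are kept, not dropped.
    return upgrade_to_v2(json.loads(raw))
```

Upgrading on read is what lets the back-fill run gradually: old and new records coexist until the last V1 row has been rewritten.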
2 — Storage engines & indexing (Ch 3)
| Concept | Interview gold-nugget |
|---|---|
| B-Tree vs LSM-Tree | B-Tree = read-heavy, point look-ups, pages updated in place (MySQL, Postgres). LSM-Tree = write-heavy, sequential appends (Cassandra, RocksDB). Quote: “LSM-Trees turn random writes into sequential ones and pay for it in compaction (write amplification); B-Trees pay in page splits and fragmentation (space amplification).” |
| Secondary index on huge table | Explain that every secondary index is itself a key-value store that must be partitioned and replicated just like the primary data: either per partition (local index, scatter-gather reads) or by term (global index, slower writes). Tie to the Chapter 6 hot-spot problem. |
Mini-quiz (ask yourself): Why do LSMs bloom-filter every SSTable? (To skip SSTables that definitely do not hold the key, so look-ups for missing keys dodge disk seeks.)
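The mini-quiz answer can be sketched in a few lines. This is a toy Bloom filter, not how any particular engine implements it; the sizes (`num_bits`, `num_hashes`) and the `lookup` helper, which uses in-memory dicts as stand-ins for on-disk SSTables, are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit: clarity over compactness

    def _positions(self, key: str):
        # Derive several bit positions from one key by salting the hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "maybe present".
        return all(self.bits[pos] for pos in self._positions(key))


def lookup(key: str, sstables):
    """sstables: newest-first list of (BloomFilter, dict) pairs; each dict stands in for a file."""
    for bloom, data in sstables:
        if not bloom.might_contain(key):
            continue  # definitely not in this SSTable: skip the disk read
        if key in data:
            return data[key]
    return None
```

The filter never gives a false negative, so skipping an SSTable is always safe; a false positive only costs one wasted read.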
3 — Replication patterns (Ch 5)
| Model | Pitch line interviewers love |
|---|---|
| Leader–Follower | “Predictable writes, simple reads; but followers lag ⇒ need read-your-own-writes fix.” |
| Multi-Leader | “Great for geo-writes (mobile), but conflict-resolution moves to app layer.” |
| Leaderless / Quorum | “High availability, tunable R/W (R + W > N), but beware stale reads and lost concurrent writes, especially with sloppy quorums or last-write-wins.” |
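Sample white-board twist: “Your leader dies just after ack-ing a client write. What happens?” → Walk through log-based replication, whether the write reached a follower before the ack (with asynchronous replication it may be lost), failover election with a higher term, and fencing tokens that stop the old leader from writing if it comes back.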
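For the fencing-token part of that walk-through, a minimal sketch helps on the white-board. It assumes the leader's term number doubles as the fencing token and that the storage layer checks it on every write; the `FencedStorage` class is hypothetical, not taken from any real system.

```python
class FencedStorage:
    """Storage that rejects writes carrying a stale fencing token (here: the leader term)."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # sender was deposed by a leader with a higher term: reject
        self.highest_token = token
        self.data[key] = value
        return True


storage = FencedStorage()
storage.write(1, "x", "from old leader")   # accepted: term 1 is current
storage.write(2, "x", "from new leader")   # accepted: failover raised the term to 2
storage.write(1, "x", "zombie write")      # rejected: the old leader woke up late
```

The point to make out loud: the old leader cannot be trusted to know it was deposed, so the check has to live in the resource being written to.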
4 — Partitioning/Sharding (Ch 6)
- Key-range vs hash-mod-N. Hash evens load but kills range scans; range eases scans but can hot-spot.
- Consistent hashing beats hash % N because a cluster resize moves only about 1/N of the keys, whereas hash % N reshuffles nearly all of them (about 80 % when going from 4 to 5 nodes).
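- Rebalancing: 3 classic strategies → fixed number of partitions (many more than nodes), dynamic split/merge driven by partition size, and a fixed number of partitions per node (virtual nodes). Decide separately whether rebalancing runs automatically or waits for a manual go-ahead. Have an anecdote ready about a “hot-user” taking down a shard.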
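The resize claim is easy to check empirically. The sketch below compares hash % N with a simple hash ring when a cluster grows from 4 to 5 nodes; the node names, the 100 virtual nodes per node, and the use of MD5 as the hash are all arbitrary choices for illustration.

```python
import bisect
import hashlib


def h(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def mod_n_owner(key: str, num_nodes: int) -> int:
    return h(key) % num_nodes


class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        # Each node owns many points on the ring so load spreads evenly.
        self._ring = sorted((h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._points = [point for point, _ in self._ring]

    def owner(self, key: str) -> str:
        # The first ring point clockwise from the key's hash owns the key (wrap at the end).
        idx = bisect.bisect(self._points, h(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(keys, before, after) -> float:
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
ring4 = HashRing(["n0", "n1", "n2", "n3"])
ring5 = HashRing(["n0", "n1", "n2", "n3", "n4"])
mod_moved = moved_fraction(keys, lambda k: mod_n_owner(k, 4), lambda k: mod_n_owner(k, 5))
ring_moved = moved_fraction(keys, ring4.owner, ring5.owner)
print(f"hash % N moved {mod_moved:.0%} of keys, ring moved {ring_moved:.0%}")
```

Expect the modulo scheme to move roughly four keys in five and the ring roughly one in five, and note that every key the ring moves lands on the new node, so existing nodes never trade data with each other.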
5 — Transactions & isolation (Ch 7)
| Isolation level | One-liner |
|---|---|
| Read-Committed | “No dirty reads, no dirty writes.” |
| Snapshot (often labelled Repeatable Read) | “Reads from a consistent snapshot; still allows write-skew.” |
| Serializable | “Equivalent to some serial order; can do actual serial execution, two-phase locking (2PL), or optimistic SSI.” |
Remember the killer diagram from the book showing two on-call doctors both going off call at the same time under snapshot isolation, leaving nobody on call. Use it to explain write-skew; double-booking a meeting room is the same pattern.
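Write-skew is easier to explain with something runnable. The sketch below is a toy model, not a real database: two doctors each try to go off call, and the invariant is that at least one stays on call. It assumes snapshot isolation only detects conflicts between writes to the same key; the function names are hypothetical.

```python
def go_off_call(snapshot: dict, doctor: str) -> dict:
    """Transaction body: read from a snapshot, return the writes it wants to make."""
    on_call = [name for name, is_on in snapshot.items() if is_on]
    if len(on_call) >= 2:        # invariant check, done against the snapshot
        return {doctor: False}   # the write touches only this doctor's row
    return {}


def run_snapshot_isolation(db: dict, txns) -> dict:
    """Concurrent run: every txn reads the same snapshot; only same-key writes conflict."""
    snapshot = dict(db)
    write_sets = [txn(snapshot) for txn in txns]
    written = set()
    for writes in write_sets:
        if written & set(writes):
            continue  # overlapping write set: the later transaction aborts
        db.update(writes)
        written.update(writes)
    return db


def run_serial(db: dict, txns) -> dict:
    """Serializable run: each txn sees the result of the one before it."""
    for txn in txns:
        db.update(txn(dict(db)))
    return db


txns = [lambda s: go_off_call(s, "alice"), lambda s: go_off_call(s, "bob")]
skewed = run_snapshot_isolation({"alice": True, "bob": True}, txns)
serial = run_serial({"alice": True, "bob": True}, txns)
```

Under the snapshot run both doctors end up off call because their write sets never overlap; under the serial run the second transaction sees only one doctor left and does nothing.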