Chapter 1: Functional / Non-Functional Requirements

1 — Data models & schema evolution (Ch 2 + 4)

Quick cue	10-second answer	Typical follow-up
“Relational vs document?”	Relational wins when many-to-many joins, ad-hoc queries and multi-row ACID matter; document wins for aggregate-by-key patterns and when you need schema-on-read flexibility.	“How do you avoid joins on doc DB?” → Denormalize or pre-compute views.
“How do you change schemas safely?”	Version the payload (“V1 vs V2” fields) and migrate with incremental backfills rather than a big-bang cutover; use Avro or Protobuf with a schema registry for forward and backward compatibility.	“How to clean up old code paths?” → Set an expiry date on each old version and track usage metrics until reads of it drop to zero.
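The versioned-payload idea can be sketched as a reader that accepts both old and new records. This is a minimal illustration using JSON rather than Avro or Protobuf; the field names (`schema_version`, `name`, `first_name`, `last_name`, `nickname`) and the function `read_user` are hypothetical, chosen only for the example.

```python
import json


def read_user(raw: bytes) -> dict:
    """Decode a user record written by any known schema version (hypothetical schema)."""
    doc = json.loads(raw)
    # Assumption: V1 writers never set the version field, so default to 1.
    version = doc.get("schema_version", 1)
    if version == 1:
        # Backward compatibility: V1 stored one "name" field; V2 splits it in two.
        first, _, last = doc.get("name", "").partition(" ")
        doc = {**doc, "first_name": first, "last_name": last}
    # Forward compatibility: unknown fields from newer writers are kept, not rejected.
    return doc


v1_record = json.dumps({"id": 7, "name": "Ada Lovelace"}).encode()
v2_record = json.dumps(
    {"schema_version": 2, "id": 8, "first_name": "Alan",
     "last_name": "Turing", "nickname": "Prof"}
).encode()

print(read_user(v1_record)["first_name"])
print(read_user(v2_record)["nickname"])
```

Because old records are upgraded at read time, a backfill can rewrite them gradually while both writers stay live.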

2 — Storage engines & indexing (Ch 3)

Concept	Interview gold-nugget
B-Tree vs LSM-Tree	B-Tree = read-heavy workloads and point lookups (MySQL, Postgres). LSM-Tree = write-heavy workloads served by sequential appends (Cassandra, RocksDB). Quote: “LSM-Trees pay write amplification during compaction; B-Trees pay with a WAL write plus a page write per update and with space lost to page fragmentation.”
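Secondary index on huge table	Explain that every secondary index is itself a key-value store that must be partitioned and replicated just like the primary: either alongside the rows (local index, scatter/gather reads) or by indexed term (global index, multi-partition writes). Tie to the Chapter 6 hot-spot problem.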

Mini-quiz (ask yourself): Why do LSMs keep a Bloom filter for every SSTable? (So a lookup can skip any SSTable that definitely does not contain the key, which avoids wasted disk reads.)
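The mini-quiz answer can be made concrete with a toy Bloom filter in front of an in-memory stand-in for an SSTable. This is a sketch under simplifying assumptions: the classes `BloomFilter` and `SSTable`, the bit-array size, and the `disk_reads` counter are all hypothetical and not taken from any real storage engine.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """In-memory stand-in for an on-disk sorted table."""

    def __init__(self, rows: dict):
        self.rows = dict(rows)
        self.filter = BloomFilter()
        for key in self.rows:
            self.filter.add(key)
        self.disk_reads = 0  # counts simulated disk accesses

    def get(self, key: str):
        if not self.filter.might_contain(key):
            return None  # definitely absent: skip the disk entirely
        self.disk_reads += 1  # present, or a rare false positive
        return self.rows.get(key)


table = SSTable({"user:1": "a", "user:2": "b"})
print(table.get("user:1"), table.disk_reads)
```

A lookup for a key that was never written almost always returns without incrementing `disk_reads`; the occasional false positive costs one extra read but never a wrong answer.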

3 — Replication patterns (Ch 5)

Model	Pitch line interviewer loves
Leader–Follower	“Predictable writes, simple reads; but followers lag ⇒ need read-your-own-writes fix.”
Multi-Leader	“Great for geo-writes (mobile), but conflict-resolution moves to app layer.”
Leaderless / Quorum	“High availability, tunable R/W, but beware stale reads and concurrent-write conflicts (sloppy quorums, last-write-wins data loss) under network partitions.”
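The "tunable R/W" pitch rests on the quorum overlap condition w + r > n. The brute-force check below is a small sketch that confirms the condition by enumerating every possible write set and read set; the function name `quorums_always_overlap` is made up for this example.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every set of w written replicas shares a node with every set of r read replicas."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


# n=3, w=2, r=2: any read must touch at least one replica that saw the write.
print(quorums_always_overlap(3, 2, 2))
# n=3, w=1, r=1: a read can miss the only replica that holds the latest write.
print(quorums_always_overlap(3, 1, 1))
```

The overlap only guarantees that a read contacts some replica holding the latest acknowledged write; it does not by itself resolve concurrent writes.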

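Sample white-board twist: “Your leader dies just after ack-ing a client write. What happens?” → Walk through log-based replication, point out that under asynchronous replication the acknowledged write is lost if it never reached the follower that gets promoted, then cover failover election with a higher term and fencing tokens to stop the old leader.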
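The fencing-token part of that answer can be sketched as a storage layer that rejects writes carrying an older token than one it has already seen. This is a minimal illustration; `FencedStore` and its method names are hypothetical, and a real system would obtain tokens from a lock or consensus service.

```python
class FencedStore:
    """Storage that refuses writes from a leader with a stale fencing token."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # a newer leader has already written: reject the zombie
        self.highest_token = token
        self.data[key] = value
        return True


store = FencedStore()
print(store.write(1, "x", "from-old-leader"))   # old leader, term 1
print(store.write(2, "x", "from-new-leader"))   # new leader elected with term 2
print(store.write(1, "x", "zombie-write"))      # old leader wakes up and retries
print(store.data["x"])
```

The token here plays the same role as the higher term chosen at election time: it increases monotonically, so the store can order leaders without trusting their clocks.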

4 — Partitioning/Sharding (Ch 6)

Key-range vs hash-mod-N. Hash evens load but kills range scans; range eases scans but can hot-spot.
Consistent hashing beats hash % N because adding or removing a node moves only about 1/N of the keys instead of reshuffling nearly all of them on every cluster resize.
Rebalancing: 3 classic strategies → a fixed number of partitions (many more than nodes), dynamic split/merge by partition size, and a fixed number of partitions per node (virtual nodes). Have an anecdote ready about a “hot-user” taking down a shard.
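The difference between hash % N and consistent hashing is easy to measure. The sketch below grows a cluster from four to five nodes and counts how many keys change owner under each scheme; the node names, key format, vnode count, and the helpers `stable_hash`, `mod_n_owner`, `Ring`, and `moved_fraction` are all assumptions made for this illustration.

```python
import bisect
import hashlib


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")


def mod_n_owner(key: str, nodes: list) -> str:
    return nodes[stable_hash(key) % len(nodes)]


class Ring:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: list, vnodes: int = 100):
        self.points = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.hashes = [point for point, _ in self.points]

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self.hashes, stable_hash(key)) % len(self.points)
        return self.points[idx][1]


def moved_fraction(before, after, keys) -> float:
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
old_nodes = ["n0", "n1", "n2", "n3"]
new_nodes = old_nodes + ["n4"]

mod_moved = moved_fraction(
    lambda k: mod_n_owner(k, old_nodes), lambda k: mod_n_owner(k, new_nodes), keys
)
old_ring, new_ring = Ring(old_nodes), Ring(new_nodes)
ring_moved = moved_fraction(old_ring.owner, new_ring.owner, keys)

print(f"hash % N moved {mod_moved:.0%} of keys; consistent hashing moved {ring_moved:.0%}")
```

With hash % N a key stays put only when its hash gives the same remainder for 4 and 5, so roughly four in five keys move; on the ring, only keys that now fall before one of the new node's points move, and they all move to the new node.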

5 — Transactions & isolation (Ch 7)

Isolation level	One-liner
Read-Committed	“No dirty reads.”
Snapshot (Repeatable Read)	“Reads from a consistent snapshot; still allows write-skew.”
Serializable	“Equivalent to some serial order; implemented via actual serial execution, two-phase locking, or optimistic serializable snapshot isolation (SSI).”

Remember the killer diagram from the book showing two on-call doctors who both take themselves off call at the same time under snapshot isolation; use it to explain write-skew.
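Write-skew can be reproduced in a few lines using the book's on-call doctors scenario: each transaction checks an invariant against its own snapshot, then writes a different row. The sketch below simulates snapshots with plain dictionary copies; the function `go_off_call` and the data layout are hypothetical, and no real database is involved.

```python
# Shared "database": the invariant is that at least one doctor stays on call.
on_call = {"alice": True, "bob": True}


def go_off_call(snapshot: dict, doctor: str) -> dict:
    """Return the rows this transaction wants to write, decided from its snapshot."""
    if sum(snapshot.values()) >= 2:
        return {doctor: False}  # looks safe: someone else is still on call
    return {}  # would break the invariant, so write nothing


# Both transactions start before either commits, so both see two doctors on call.
snapshot_alice = dict(on_call)
snapshot_bob = dict(on_call)

writes_alice = go_off_call(snapshot_alice, "alice")
writes_bob = go_off_call(snapshot_bob, "bob")

# The write sets touch different rows, so a write-write conflict check finds nothing.
no_write_conflict = not (writes_alice.keys() & writes_bob.keys())

on_call.update(writes_alice)
on_call.update(writes_bob)

print(no_write_conflict, sum(on_call.values()))
```

Both commits succeed and nobody is left on call. Serializable isolation, or an explicit lock on the rows the check reads (for example SELECT ... FOR UPDATE), would force one transaction to see the other's write.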