(Enough detail for real work, but still short enough to skim before an interview)


1 Why we replicate at all

  1. High availability – keep serving data while a machine or an entire datacenter is down.
  2. Read scaling – let many machines answer read-only queries so one node is not a bottleneck.
  3. Geo-latency – put a copy close to every user so round-trip time is low.
  4. Disconnected work – keep accepting writes even when the WAN or a rack switch breaks.

Every replication scheme is a compromise between those goals, data-loss risk, and operational complexity.


2 Single-leader replication (the classic)

For each concept: what it really means, then its design consequences.

  Leader / follower roles – Exactly one node accepts all writes; followers pull the leader’s log and replay it. Simple mental model with no conflicts to resolve, but all write throughput is limited by that one leader.

  Synchronous replication – The leader waits for an ACK from at least one follower before confirming the write to the client. No data loss if the leader dies, but a slow follower stalls every write.

  Asynchronous replication – The leader returns as soon as the write is durable locally. Fast and tolerant of slow followers, but the last few acknowledged writes disappear if the leader crashes before they ship.

  Semi-sync (“2-safe”) – The leader waits until one follower has the log in RAM (not necessarily on disk). Typically <50 ms of extra latency, yet RPO ≈ 0 as long as the leader and that follower do not fail together. A good middle ground.

  Catch-up recovery – A lagging follower asks the leader for log entries starting at its last LSN/binlog position. Adding or rebooting a follower requires no cluster downtime.

  Fail-over – (1) Detect leader timeout, (2) elect the best-caught-up follower, (3) fence the old leader (STONITH) and redirect clients. A timeout set too high means a long outage; too low risks split-brain.
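The sync/async/semi-sync trade-off above can be sketched in a few lines. This is a minimal illustration, not any real database's protocol; `Follower`, `send_log`, and the 50 ms timeout are all assumptions made up for the sketch:

```python
import concurrent.futures

local_log = []  # the leader's own durable log (simplified to a list)

def append_to_local_log(entry):
    local_log.append(entry)

class Follower:
    """Hypothetical follower stub that acknowledges a log entry."""
    def send_log(self, entry):
        return True  # pretend the entry reached the follower's RAM

def semi_sync_write(entry, followers, timeout=0.05):
    """Confirm a write once it is local AND one follower has ACKed.

    Semi-sync ("2-safe"): waiting for exactly one ACK means a single
    slow follower cannot stall the write, yet the entry survives a
    lone leader crash. Fully sync would wait for all futures; fully
    async would return right after the local append.
    """
    append_to_local_log(entry)  # durable on the leader first
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(f.send_log, entry) for f in followers]
        done, _ = concurrent.futures.wait(
            futures, timeout=timeout,
            return_when=concurrent.futures.FIRST_COMPLETED)
        if done:
            return "committed"  # at least one follower holds the entry
        return "timeout: degrade to async or abort"
```

Real systems (e.g. MySQL semi-sync) also degrade to async after repeated timeouts rather than blocking writes forever.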

Log formats you should be able to name

  1. Statement-based – ship the SQL text itself; breaks on non-deterministic statements (NOW(), RAND()).
  2. Write-ahead-log (WAL) shipping – ship the exact byte-level log the storage engine writes; very efficient, but couples leader and followers to the same engine version.
  3. Logical (row-based) – ship before/after images of changed rows; decoupled from storage internals and the basis of change-data-capture.
  4. Trigger-based – application-level triggers copy changes into a side table; flexible but slow and error-prone.
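For intuition, a logical (row-based) log entry can be modeled as a before/after image plus a position, and catch-up recovery is then just "replay everything past my last LSN". The field names here (`lsn`, `before`, `after`, the `id` primary key) are hypothetical, a sketch rather than any real binlog format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowChange:
    """One logical (row-based) replication log entry (invented shape)."""
    lsn: int                 # log sequence number: total order of changes
    table: str
    before: Optional[dict]   # None for INSERT
    after: Optional[dict]    # None for DELETE

def apply(row_store, change):
    """Replay one logical entry on a follower, keyed by primary key 'id'."""
    key = (change.after or change.before)["id"]
    if change.after is None:
        row_store.pop(key, None)       # DELETE
    else:
        row_store[key] = change.after  # INSERT or UPDATE

def catch_up(row_store, log, last_lsn):
    """Catch-up recovery: replay every entry newer than our position."""
    for change in log:
        if change.lsn > last_lsn:
            apply(row_store, change)
    return max((c.lsn for c in log), default=last_lsn)
```

Because entries carry row images rather than SQL text, replay is deterministic even for statements like UPDATE ... SET ts = NOW().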

Replication lag introduces three user-visible anomalies

  1. Read-after-write gap – user cannot see what they just posted.

  2. Monotonic-read violation – page refresh shows older data.

  3. Causal flip – answer appears before the question (cross-partition).

    Mitigations: route recent writers’ reads to the leader, use sticky sessions, or attach a “last-seen LSN” token to every request.
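The “last-seen LSN” token mitigation above can be sketched as: the client remembers the LSN of its own last write, and the router only serves its reads from replicas that have replayed at least that far, falling back to the leader otherwise. `Replica`, `applied_lsn`, and `route_read` are illustrative names, not any real library's API:

```python
class Replica:
    def __init__(self, name, applied_lsn):
        self.name = name
        self.applied_lsn = applied_lsn  # highest log position replayed

def route_read(replicas, leader, last_seen_lsn):
    """Serve the read from any replica caught up past the client's
    token; fall back to the leader to guarantee read-after-write."""
    for r in replicas:
        if r.applied_lsn >= last_seen_lsn:
            return r
    return leader

# Usage: a client that just wrote at LSN 42 skips lagging replicas.
leader = Replica("leader", 42)
replicas = [Replica("f1", 17), Replica("f2", 42)]
assert route_read(replicas, leader, 42).name == "f2"      # caught up
assert route_read(replicas, leader, 43).name == "leader"  # nobody is
```

The same token also fixes monotonic reads: carrying the highest LSN the client has ever *seen* (not just written) prevents a refresh from landing on a staler replica.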