Fast-lane overview
Relational tables and document-style JSON live at opposite ends of a spectrum: one is built for many-to-many joins and strong schemas; the other is built for nested trees, locality and flexible schemas. The hard part is knowing when each wins, because the wrong pick explodes complexity (or latency) in production systems — and in interviews.
Big concept | Newbie-friendly mental model |
---|---|
Data-model layering | Reality → objects in code → general-purpose model (tables, docs) → bytes on disk. Each layer hides the ugliness below it. |
Relational model | Think Excel sheets with constraints: every row is a tuple, columns are typed, and joins are the super-power that knit sheets together. Born for business transactions. |
Document model | A Russian-doll JSON blob: store the whole tree together for one-to-many reads, skip the join. Great locality, optional schema. |
Schema-on-write vs schema-on-read | RDBMS enforces structure before insert; docs let anything through and leave validation to readers. Flexibility today, surprise bugs tomorrow. |
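Normalization | “Don’t repeat yourself” for data: store each fact once and reference it by ID, so a rename is one update in one place instead of a hunt through every duplicated string. Fewer copies means cheaper writes and less risk of inconsistency. |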
History lesson | In the 1960s and 70s, IMS (hierarchical) and CODASYL (network) fought the same “tree vs graph” battle; the relational model won by hiding access paths behind a query optimizer. We’re replaying that debate with NoSQL. |
Query languages | MapReduce and hand-written JavaScript pipelines are flexible but verbose; NoSQL vendors tend to reinvent declarative, SQL-like dialects (e.g., the MongoDB aggregation pipeline). |
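
To make the relational and document rows above concrete, here is a minimal sketch that stores the same profile both ways using only the Python standard library. The table names, field names and sample values are made up for illustration, and an in-memory SQLite database plus a list of JSON strings stand in for a real RDBMS and a real document store.

```python
import json
import sqlite3

# --- Relational: normalized tables, schema enforced on write ---
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
    CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        region_id INTEGER NOT NULL REFERENCES regions(id)
    );
    CREATE TABLE positions (
        user_id INTEGER NOT NULL REFERENCES users(id),
        title TEXT NOT NULL
    );
""")
db.execute("INSERT INTO regions VALUES (1, 'Greater Seattle Area')")
db.execute("INSERT INTO users VALUES (251, 'Bill', 1)")
db.executemany(
    "INSERT INTO positions VALUES (?, ?)",
    [(251, "Co-chair"), (251, "Co-founder")],
)

# Schema-on-write: a row that violates NOT NULL never gets in.
try:
    db.execute("INSERT INTO users VALUES (252, NULL, 1)")
    rejected_on_write = False
except sqlite3.IntegrityError:
    rejected_on_write = True

# Normalization: the region name lives in one row, so a rename is one update.
db.execute("UPDATE regions SET name = 'Seattle Metro' WHERE id = 1")

# The join knits the three tables back into profile-shaped rows.
rows = db.execute("""
    SELECT u.name, r.name, p.title
    FROM users u
    JOIN regions r ON r.id = u.region_id
    JOIN positions p ON p.user_id = u.id
    ORDER BY p.title
""").fetchall()

# --- Document: the whole tree in one blob, no join needed ---
doc = {
    "user_id": 251,
    "name": "Bill",
    "region": "Greater Seattle Area",  # duplicated string, not an ID
    "positions": [{"title": "Co-chair"}, {"title": "Co-founder"}],
}
titles = [p["title"] for p in doc["positions"]]

# Schema-on-read: the "store" accepts a document with a different field
# name, and every reader has to cope with it later.
store = [json.dumps(doc), json.dumps({"user_id": 252, "full_name": "Ada"})]

def read_name(raw):
    """Return the 'name' field, or None when the writer used another shape."""
    return json.loads(raw).get("name")

names = [read_name(raw) for raw in store]
```

In this sketch the relational side rejects the bad row at insert time, while the document side accepts the mismatched record and only reveals the problem when `read_name` returns `None`. The document read needs no join, but renaming the region would mean rewriting every document that embeds the string.
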

Interview hook | How to flex Chapter 2 knowledge |
---|---|
“Pick a DB for user profiles.” | If reads fetch the whole profile plus its nested lists, a document beats shredding it across tables; warn that future features (recommendations, cross-links) may require joins, so add versioned IDs now. |
“Our JSON store now needs friend-of-friend queries.” | Explain that document stores handle many-to-many relationships poorly; options: denormalize (duplication risk), app-level joins (extra round trips, so latency), or migrate hot paths to a relational or graph store. |
“Why is schema-on-read dangerous?” | Because every microservice may interpret the same field differently; catching mistakes late shifts failures from write-time rejections to read-time outages in production. |
“MapReduce or SQL for analytics?” | Say: MapReduce is fine for one-off ETL, but declarative SQL on Hadoop or Spark lets the optimizer reorder stages, so queries are shorter to write and often faster to run. |
Lightning definition pair | Foreign key = relational pointer that enables a join; document reference = same idea in a document DB, but resolving it takes a manual follow-up read, and the two reads are not atomic. |
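
The friend-of-friend and document-reference rows above can be sketched as an application-level join. This is only an illustration under stated assumptions: a plain dict stands in for a document collection, and the names `users`, `fetch`, `fetch_count` and `friends_of_friends` are hypothetical, not any real driver's API.

```python
# Stand-in for a document collection keyed by ID (made-up sample data).
users = {
    "u1": {"name": "Ana", "friend_ids": ["u2", "u3"]},
    "u2": {"name": "Ben", "friend_ids": ["u1", "u4"]},
    "u3": {"name": "Cy", "friend_ids": ["u1", "u4", "u5"]},
    "u4": {"name": "Dee", "friend_ids": ["u2", "u3"]},
    "u5": {"name": "Eli", "friend_ids": ["u3"]},
}

# Counts simulated round trips to the store.
fetch_count = {"n": 0}

def fetch(user_id):
    """Simulate one lookup by ID; returns None for a dangling reference."""
    fetch_count["n"] += 1
    return users.get(user_id)

def friends_of_friends(user_id):
    """Resolve document references by hand: one fetch per hop target."""
    me = fetch(user_id)
    if me is None:
        return set()
    direct = set(me["friend_ids"])
    result = set()
    for friend_id in direct:
        friend = fetch(friend_id)
        if friend is None:
            # Nothing enforces the reference, so the app must tolerate gaps.
            continue
        result.update(friend["friend_ids"])
    # Drop the user and their direct friends from the second hop.
    return result - direct - {user_id}
```

Calling `friends_of_friends("u1")` here costs one lookup for the user plus one per direct friend, and another writer could change any of those documents between lookups. A relational join or a graph traversal would express the same question as a single query.
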
v2_email rather than mutating in place.