Style | Trigger | Response-time goal | Example |
---|---|---|---|
Online service | User click / API call | Milliseconds | “Show me my Instagram feed” |
Batch job (offline) | Scheduler or manual start | Minutes – days; focus on throughput | “Re-calculate all product recommendations every night” |
Stream job (near-real-time) | Event arrival | Seconds | “Detect fraud within a minute of a card swipe” |
Do one thing well, chain the tools.
In a terminal you might write:
```shell
cat access.log \
  | awk '{print $7}' \
  | sort | uniq -c | sort -nr | head -5
```
Meaning: read the web-server log, pull out the URL (field 7 in the common log format), sort so duplicates sit next to each other, count them, rank by count, show the top 5.
MapReduce is the same idea stretched across thousands of machines.
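The same top-5-URLs job can be sketched as a toy MapReduce program in Python. This is a single-process simulation (the `mapper`/`shuffle`/`reducer` names and the sample log lines are illustrative, not a real framework's API), but the three phases mirror the pipeline above:

```python
from collections import defaultdict
from heapq import nlargest

def mapper(line):
    # Like awk '{print $7}': emit (url, 1) per log line.
    # In the common log format, the request path is the 7th field (index 6).
    yield line.split()[6], 1

def shuffle(pairs):
    # The framework's shuffle: group all values by key,
    # so each key lands "on one machine" (here, one dict entry).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reducer(key, values):
    # Like uniq -c: collapse a key's values into one count.
    yield key, sum(values)

log = [
    '1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /home HTTP/1.1" 200 123',
    '1.2.3.4 - - [10/Oct/2000:13:55:37 -0700] "GET /about HTTP/1.1" 200 456',
    '5.6.7.8 - - [10/Oct/2000:13:55:38 -0700] "GET /home HTTP/1.1" 200 123',
]
mapped = (pair for line in log for pair in mapper(line))
counts = [kv for key, values in shuffle(mapped) for kv in reducer(key, values)]
top5 = nlargest(5, counts, key=lambda kv: kv[1])   # like sort -nr | head -5
print(top5)
```

On a cluster, the mapper runs on each machine holding a chunk of the log, the shuffle moves records over the network, and the reducers run in parallel, one per key range.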
Why exercises like this matter in interviews
Shows you grasp data-parallel thinking: “Bring related records to the same machine, then process them locally.”
Rule of thumb: Mappers prepare data, shuffle moves data, reducers finish data.
Pattern | When to use | Analogy |
---|---|---|
Reduce-side sort-merge join | Both inputs are big. The framework sorts each input by join key and routes records with the same key to the same reducer. | Classic relational join pushed into the reduce phase. |
Broadcast (replicated) hash join | One input fits in RAM. Copy it to every mapper; hash-lookup as you stream the big side. | “Little black book next to you while scanning a huge ledger.” |
Partitioned hash join | Both inputs big but pre-partitioned the same way. Build hash table per partition. | “Each worker owns a shard of both tables.” |
Interview hint: articulate the trade-off (network vs. memory vs. duplicate work).