Canonical Systems Teardown
Kafka · Spanner · DynamoDB · Cassandra through one repeatable lens
The point of studying canonical systems is not to memorize them — it’s to internalize a repeatable teardown you can apply to anything. Six lenses, applied in the same order, plus one question that finds the soul of the design: what did this system bet on?
The 6-Lens Teardown Template
Point these six lenses at any storage or messaging system, in order, and you reconstruct its entire design — and expose where it will hurt. The first five are the decisions from this course; the sixth is the synthesis that separates a description from an understanding.
Six questions that reconstruct any system's design — and reveal its failure modes. The last lens is the one that matters most: what did it bet on?
| Lens | The question | Where it was covered |
|---|---|---|
| 1 · Data model | What is the fundamental unit, and what access patterns does it privilege? | Module 4 |
| 2 · Partitioning | How is data split across machines, and what happens to a hot key? | Module 4 |
| 3 · Replication | How are copies kept, and what's the durability-vs-latency trade? | Module 3 |
| 4 · Consistency | What does a read guarantee, and is it chosen per operation? | Module 2 |
| 5 · Failure handling | What happens when a node dies, pauses, or partitions? | Modules 1, 7, 8 |
| 6 · The one genius idea | What single bet defines this system — and what did it cost? | All of it |
Run them in order and a system that looked like a black box becomes a sequence of legible trade-offs. The sixth lens is the test of real understanding: anyone can list features; naming the one bet and its price is what a Staff engineer does.
Four teardowns
Each of these systems made a different bet, and each bet is a concept from earlier in this course taken to its logical extreme. Read them side by side and the modules connect into a single map.
Apache Kafka
The log is the universal abstraction. Make it append-only, partitioned, and let consumers track their own position — and fan-out becomes nearly free.
- Data model: An append-only, ordered log per partition. It is not a queue (messages aren't deleted on read) but a durable, replayable sequence. Consumers track an offset into it.
- Partitioning: Topics split into partitions; a record's key hashes to a partition. Ordering is guaranteed within a partition, not across partitions, so the partition key IS your ordering and parallelism boundary. A hot key is a hot partition (Module 4).
- Replication: Per-partition leader-follower with an in-sync replica set (ISR). The ack level is configurable: with acks=all a write is acknowledged once every in-sync replica has it (quorum-ish durability); with acks=1 it is acknowledged once the leader alone has it (the async data-loss trade from Module 3).
- Consistency: Strong order within a partition; at-least-once delivery by default, with an exactly-once effect available via idempotent producers + transactions (the Effectively-Once Triangle, Module 6, productized).
- Failure handling: A failed leader triggers election of a new one from the ISR, coordinated by the controller (KRaft, or ZooKeeper in older clusters; consensus, Module 8). Unclean leader election trades consistency for availability and can lose data.
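
The log model above can be sketched in a few lines. This is a minimal in-memory illustration, not Kafka's implementation: `MiniLog`, `partition_for`, `poll`, and `seek` are hypothetical names, `zlib.crc32` stands in for the murmur2 hash used by Kafka's default partitioner, and replication is left out entirely.

```python
import zlib
from collections import defaultdict


class MiniLog:
    """Toy partitioned, append-only log with consumer-owned offsets."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        # (group, partition) -> next offset to read. A real broker stores
        # committed offsets durably; this dict only models the idea.
        self.offsets = defaultdict(int)

    def partition_for(self, key):
        # Stable hash, so one key always lands on the same partition.
        # That is what preserves per-key order, and what makes a hot key
        # a hot partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def poll(self, group, partition, max_records=10):
        # Reading never deletes anything; it only advances this group's offset.
        start = self.offsets[(group, partition)]
        batch = self.partitions[partition][start:start + max_records]
        self.offsets[(group, partition)] = start + len(batch)
        return batch

    def seek(self, group, partition, offset):
        # Replay is just rewinding the group's own position.
        self.offsets[(group, partition)] = offset


log = MiniLog(num_partitions=3)
for i in range(3):
    log.append("user-42", f"event-{i}")

p = log.partition_for("user-42")
billing = log.poll("billing", p)      # first consumer group reads all three
analytics = log.poll("analytics", p)  # second group reads the same records
log.seek("billing", p, 0)
replayed = log.poll("billing", p)     # replay from the start
```

Adding a second consumer group costs one more entry in `offsets` and nothing on the write path, which is the sense in which fan-out is nearly free.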
Google Spanner
Bound clock uncertainty with hardware (GPS + atomic clocks) and simply WAIT it out — and you can have externally-consistent transactions across the planet.
- Data model: Schematized, semi-relational tables with SQL: a real database, not a key-value store. Rows are organized into hierarchies for locality.
- Partitioning: Data split into ranges (tablets) that split and move based on load and size: range partitioning (Module 4) with automatic rebalancing.
- Replication: Each tablet is a Paxos group replicated synchronously across zones/regions. Writes go through Paxos, so a majority must agree (Module 8). Durable and consistent, at the cost of cross-region write latency.
- Consistency: External consistency (linearizable, Module 2's strongest rung) globally. Achieved with TrueTime: an interval clock with a bounded error, where commits WAIT OUT the uncertainty before becoming visible (Module 1, done right).
- Failure handling: A Paxos group keeps committing as long as a majority of its replicas can reach each other, so it survives minority failures, and during a partition only the majority side makes progress. Leader leases (Module 8) ensure a single writer per group.
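
The commit-wait idea in the consistency lens can be illustrated with a simulated interval clock. This is a sketch under stated assumptions: `SimulatedTrueTime` is a hypothetical stand-in that wraps the local monotonic clock with a fixed error bound `epsilon_s`, whereas the real TrueTime derives its bound from GPS and atomic-clock references. Only the shape of the rule is the point: pick a timestamp at the top of the interval, then wait until that timestamp is certainly in the past.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Hypothetical interval clock: true time is assumed to lie in now()."""

    def __init__(self, epsilon_s):
        self.epsilon_s = epsilon_s  # assumed bound on clock error

    def now(self):
        t = time.monotonic()
        return TTInterval(t - self.epsilon_s, t + self.epsilon_s)

    def after(self, ts):
        # True only once ts has definitely passed on every correct clock.
        return self.now().earliest > ts


def commit(tt):
    # Choose a commit timestamp no earlier than any possible "now".
    commit_ts = tt.now().latest
    # Commit wait: hold the result back until the timestamp is certainly
    # in the past, so any later transaction gets a larger timestamp.
    while not tt.after(commit_ts):
        time.sleep(tt.epsilon_s / 10)
    return commit_ts


tt = SimulatedTrueTime(epsilon_s=0.005)
started = time.monotonic()
ts = commit(tt)
waited = time.monotonic() - started  # roughly 2 * epsilon_s
```

The wait is proportional to the uncertainty bound, which is why shrinking that bound with better clocks translates directly into lower commit latency.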
Amazon DynamoDB
Make every dimension a dial you can turn, so performance is PREDICTABLE at any scale. No surprises is the feature.
- Data model: Key-value / document, addressed by a partition key (+ optional sort key). The data model deliberately privileges point lookups and bounded queries, the access patterns that stay predictable at scale.
- Partitioning: Hash on the partition key, spread across many storage nodes; partitions split automatically as they grow or get hot. Hot keys are the known enemy (Module 4); adaptive capacity and split-for-heat exist precisely to fight them.
- Replication: Synchronous replication across multiple availability zones; a write is durable once a quorum of replicas in the partition's replication group acknowledges (Module 3).
- Consistency: Per-operation, by design (Module 2): eventually-consistent reads by default (cheap, fast), strongly-consistent reads on request (a flag on the read). You choose at the call site.
- Failure handling: Transparent to the caller: AZ failures, node failures, and splits are handled below the API. The original Dynamo used vector clocks (Module 1) for conflict reconciliation; DynamoDB favors predictability and managed conflict handling.
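
The per-operation consistency choice can be modeled with a toy replication group. This is a simulation, not the DynamoDB client or its internals: `ToyPartition`, `put_item`, `get_item`, and `catch_up` are hypothetical, and the sketch assumes a three-replica group in which writes are acknowledged by two replicas and strongly consistent reads go to the leader. In the real API the equivalent switch is the `ConsistentRead` parameter on read operations.

```python
import random


class ToyPartition:
    """Toy 3-replica group: leader, one in-step follower, one lagging follower."""

    def __init__(self, seed=0):
        self.leader = {}
        self.followers = [{}, {}]
        self._pending = []  # writes the lagging follower has not applied yet
        self._rng = random.Random(seed)

    def put_item(self, key, value):
        # Assumed quorum of 2 of 3: leader plus the first follower.
        self.leader[key] = value
        self.followers[0][key] = value
        self._pending.append((key, value))

    def get_item(self, key, consistent_read=False):
        if consistent_read:
            # Strong read: always answered from the leader's state.
            return self.leader.get(key)
        # Eventually consistent read: any replica may answer, so the
        # result can be stale until replication catches up.
        replica = self._rng.choice([self.leader, *self.followers])
        return replica.get(key)

    def catch_up(self):
        # Background replication reaching the lagging follower.
        for key, value in self._pending:
            self.followers[1][key] = value
        self._pending.clear()


part = ToyPartition(seed=1)
part.put_item("cart", "v1")
part.catch_up()
part.put_item("cart", "v2")

strong = part.get_item("cart", consistent_read=True)
maybe_stale = {part.get_item("cart") for _ in range(200)}
```

Here `strong` is always the latest write, while `maybe_stale` can contain both `"v1"` and `"v2"` until `catch_up()` runs. The choice is made per call, which is the lens-4 question answered at the call site.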
Apache Cassandra
No leader, no single point of failure. Any node takes any write, consistency is a per-query dial, and the cluster heals itself — availability above all.
- Data model: Wide-column: a partition key plus clustering columns that keep rows sorted within a partition. The data model is designed around your queries, so you denormalize per access pattern.
- Partitioning: Consistent hashing on a token ring with virtual nodes (Module 4, exactly the reference implementation), so the fleet can grow and shrink with minimal data movement.
- Replication: Leaderless. Each key is replicated to N nodes around the ring; any replica can take a read or write. Multi-datacenter aware.
- Consistency: Tunable per query: ONE, QUORUM, ALL (Module 2's spectrum, exposed as a parameter). W + R > N makes every read set overlap the latest acknowledged write set, which gives you read-your-writes; lower settings give you speed and availability. Conflicts are resolved last-write-wins (Module 1's data-loss trap; beware).
- Failure handling: Designed to never stop taking writes: hinted handoff stores writes for a down node, and read-repair and anti-entropy (Merkle trees) converge replicas later. No election, no failover: there's no leader to lose.
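
The W + R > N rule and read-repair can be checked with a toy replica set. This is a sketch, not Cassandra's coordinator: `ToyQuorumStore`, `Replica`, and the `order` argument (which controls which replicas a read contacts) are hypothetical, and hinted handoff and the token ring are omitted.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    pass


@dataclass
class Replica:
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)


class ToyQuorumStore:
    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]

    def write(self, key, value, ts, w):
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < w:
            raise Unavailable(f"need {w} acks, only {len(live)} replicas up")
        for rep in live:
            current = rep.data.get(key)
            if current is None or ts > current[0]:
                rep.data[key] = (ts, value)  # newest timestamp wins

    def read(self, key, r, order=None):
        order = range(len(self.replicas)) if order is None else order
        contacted = [self.replicas[i] for i in order if self.replicas[i].up][:r]
        if len(contacted) < r:
            raise Unavailable(f"need {r} replies, only {len(contacted)} replicas up")
        versions = [rep.data[key] for rep in contacted if key in rep.data]
        if not versions:
            return None
        newest = max(versions, key=lambda version: version[0])
        for rep in contacted:
            rep.data[key] = newest  # read-repair the replicas we touched
        return newest[1]


store = ToyQuorumStore(n=3)
store.replicas[2].up = False          # one replica misses the write
store.write("k", "v1", ts=1, w=2)     # W=2 still succeeds
store.replicas[2].up = True

stale = store.read("k", r=1, order=[2, 1, 0])     # R=1, W+R = N: can miss it
overlap = store.read("k", r=2, order=[2, 1, 0])   # R=2, W+R > N: must see it
```

With W=2 and R=1 the read can land on the one replica that missed the write. With R=2 at least one contacted replica holds the write, and the read also repairs the replica that did not.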
| Dimension | Kafka | Spanner | DynamoDB | Cassandra |
|---|---|---|---|---|
| Data model | Partitioned append-only log | Relational + SQL | Key-value / document | Wide-column |
| Consistency default | Per-partition order; EOS optional | External (linearizable), global | Eventual; strong on request | Tunable per query (ONE/QUORUM/ALL) |
| Topology | Leader-follower per partition (ISR) | Paxos groups, synchronous | Quorum across AZs, managed | Leaderless ring, no SPOF |
| The defining bet | The log as universal abstraction | TrueTime — bound clock error, wait it out | Predictability as the feature | Availability above all; tunable |
| Choose when | You need a durable, replayable, ordered event stream and cheap fan-out to many consumers — event sourcing, stream processing, the backbone of an event-driven architecture. | You need strong, global, transactional consistency (financial ledgers, inventory) and can pay cross-region write latency and the operational cost. | You need predictable single-digit-ms performance at any scale with point-lookup access patterns and don't want to operate the database. | You need always-on writes, multi-datacenter availability, and can model your data around queries and tolerate (or avoid) LWW conflicts. |
There is no “best” — each is the logical conclusion of a different bet. Kafka bet on the log, Spanner on bounded time, DynamoDB on predictability, Cassandra on availability. The Staff move in an interview (and in a vendor evaluation) is to name the bet and then ask whether your workload wants what it bought — and can afford what it cost. A system is only “wrong” relative to a workload.
Unclean leader election — when availability silently eats your data
With acks=all, a producer’s write is acknowledged only once every in-sync replica has it: the durable choice. But operators control a setting, unclean.leader.election.enable, that decides what happens when every in-sync replica is unavailable. Left disabled, the partition stays offline until an in-sync replica returns. Enabled, an out-of-sync replica can become leader, and the partition comes back without some of the writes that acks=all guaranteed.
Every canonical system encodes the trade-offs from this course as configuration, and the defaults are someone else’s opinion about your workload. Tearing a system down with the six lenses isn’t academic; it’s how you find the flags that decide whether you lose data, and set them on purpose. Read the trade-off before the incident reads it to you.
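
The unclean-election trade can be made concrete with a small simulation. This is a toy model, not broker code: `Partition`, `produce_acks_all`, and `elect_leader` are hypothetical, and the `unclean_enabled` argument plays the role of `unclean.leader.election.enable`. The scenario assumes broker `b3` has already fallen behind and left the ISR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Partition:
    logs: dict              # broker id -> list of records
    isr: set                # brokers currently in sync
    alive: set              # brokers currently reachable
    leader: Optional[str]


def produce_acks_all(p, record):
    # acks=all: acknowledged only after every in-sync replica has appended.
    for broker in p.isr:
        p.logs[broker].append(record)


def elect_leader(p, unclean_enabled):
    candidates = sorted(p.isr & p.alive)
    if candidates:
        p.leader = candidates[0]        # clean election: no acked data lost
    elif unclean_enabled and p.alive:
        p.leader = sorted(p.alive)[0]   # out-of-sync replica takes over
        p.isr = {p.leader}              # its shorter log becomes the truth
    else:
        p.leader = None                 # partition stays offline
    return p.leader


p = Partition(
    logs={"b1": ["a"], "b2": ["a"], "b3": ["a"]},
    isr={"b1", "b2"},
    alive={"b1", "b2", "b3"},
    leader="b1",
)
produce_acks_all(p, "b")
produce_acks_all(p, "c")      # both writes acknowledged to the producer
p.alive = {"b3"}              # every in-sync replica becomes unavailable

offline = elect_leader(p, unclean_enabled=False)   # None: unavailable, data safe
unclean = elect_leader(p, unclean_enabled=True)    # "b3": available, data gone
surviving = p.logs[unclean]                        # ["a"]
```

With the flag off the partition is unavailable but the acknowledged records still exist on `b1` and `b2`. With it on, the partition accepts writes again from a log that never contained `"b"` or `"c"`.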
What this sounds like in an interview
Walk me through how you'd design a system like Kafka — a durable, high-throughput event log.
The interviewer wants to see whether you reach for the defining abstraction (the log) and reason about the trade-offs, or just draw queues and brokers.
I'd have producers send messages to a broker, store them in a queue, and have consumers pull from the queue and acknowledge each message so it gets removed.
An append-only log instead of a queue, partitioned across brokers for throughput, replicated for durability. Consumers track an offset so they can replay, and messages aren't deleted on read — they age out by retention.
The core bet is the log as the abstraction: append-only, partitioned (ordering and parallelism per partition, keyed so related events stay ordered), with consumers owning their offset — that's what makes fan-out cheap, because the broker doesn't track per-consumer state. Replication is leader-follower per partition with an in-sync replica set; acks=all gives durability at the cost of write latency. For exactly-once I'd lean on idempotent producers plus transactional writes — at-least-once delivery plus dedup, since exactly-once delivery isn't real.
Same log-centric design, but I'd build it as a stack of the trade-offs and be explicit about each bet. Partitioning: the partition key is simultaneously the ordering boundary, the parallelism unit, and the hot-key risk, so I'd call out that a skewed key melts a partition (Module 4) and how I'd detect it. Replication: per-partition ISR with acks=all for durability, and I'd flag the unclean-leader-election trade explicitly — it's the Write-Path Trilemma as a config flag, and I will not enable it silently. Consistency: per-partition total order only, never global, so cross-partition ordering is the consumer's problem; exactly-once is the effectively-once triangle (idempotent producer + transactional offset commit + idempotent consumer). Coordination: leader election and membership go through a consensus layer (KRaft) — I wouldn't roll my own. And I'd close by naming the bet: the log plus consumer-owned offsets is what turns one write into N cheap reads, and everything else is in service of keeping that log durable and ordered. The trade is that I've given up global ordering and made the partition key load-bearing for three different things at once.
Reached for the defining abstraction, built it as a stack of named trade-offs (each traceable to a module), surfaced the dangerous config (unclean election) before being asked, and closed by naming the bet and its cost. That's the six-lens teardown run in reverse — design as teardown.
Don't use Kafka as a database or a task queue
Kafka is a durable, ordered log optimized for sequential throughput and replay — not random point lookups (no index by arbitrary key) and not per-message ack/retry/priority semantics. Using it as a primary datastore or a job queue fights its data model. Reach for it when you want an ordered, replayable stream with cheap fan-out.
Don't use Spanner when you don't need global strong consistency
Spanner’s external consistency is expensive — cross-region Paxos writes and the operational/financial cost of the platform. If your workload is single-region, or tolerates per-operation weaker consistency (most do), you’re paying for a guarantee you aren’t using. Match the consistency to the operation (Module 2), not to the most impressive database.
Don't pick Cassandra for strong consistency or ad-hoc queries
Cassandra bets on availability and tunable consistency, with last-write-wins conflict resolution that can silently drop concurrent writes (Module 1). It rewards data modeled tightly around known queries and punishes ad-hoc access and read-modify-write invariants. If you need strong consistency or flexible querying, it’s the wrong bet.
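
A few lines are enough to show how last-write-wins drops a concurrent update. This is a sketch with hypothetical names (`lww_merge`, `read_modify_write`) and made-up timestamps; it models only the resolution rule, where the version with the highest timestamp is kept and the other is discarded.

```python
def lww_merge(a, b):
    """Each version is (timestamp, value); keep the higher timestamp."""
    return a if a[0] >= b[0] else b


def read_modify_write(seen, delta, ts):
    # The client computes its new value from whatever it read earlier.
    return (ts, seen[1] + delta)


stored = (1, 10)  # 10 seats left, written at ts=1

# Two clients read the same version concurrently, then write back.
client_a = read_modify_write(stored, -2, ts=5)  # sells 2 seats
client_b = read_modify_write(stored, -3, ts=6)  # sells 3 seats

final = lww_merge(lww_merge(stored, client_a), client_b)
seats_left = final[1]          # 7
expected_if_serial = 10 - 2 - 3  # 5
```

Both writes were accepted and neither client saw an error, yet client A's sale is missing from the final value. This is the read-modify-write invariant that the model punishes.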
Don't choose any of them by reputation
“Netflix uses Cassandra” and “Google built Spanner” tell you nothing about your workload. Each system is the right answer to the bet it made and the wrong answer to the others. Run the six lenses against your access patterns, consistency needs, and scale — then choose. The teardown is the selection method.
Exercises
Extend 06-idempotency-outbox into a tiny Kafka (a partitioned, append-only log with per-consumer offsets and replay), or extend 04-consistent-hashing with quorum reads/writes and read-repair (from 03-log-replication) into a tiny Cassandra. Document which of the six lenses your version implements and which it punts on, and why.
…ONE for both reads and writes: it’ll be blazing fast and always available.” The data is user account balances. Find the bug in this design before it ships.
- 01 Don’t memorize systems; carry the 6-Lens Teardown Template: data model, partitioning, replication, consistency, failure handling, and the one genius idea. It works on any system, including ones nobody briefed you on.
- 02 The sixth lens is the one that matters: name the bet. Kafka bet on the log, Spanner on bounded time, DynamoDB on predictability, Cassandra on availability. A system is only “right” relative to the workload that wants its bet and can afford its cost.
- 03 Canonical systems encode this course’s trade-offs as configuration (Kafka’s unclean leader election, Cassandra’s consistency level, DynamoDB’s strong-read flag). The defaults are someone else’s opinion about your workload, so read the trade before the incident does.
- 04 Choose by teardown, not reputation. “Company X uses Y” is not a reason. Run the six lenses against your access patterns, consistency needs, and scale.
- 05 Designing a system is the teardown run in reverse: name the abstraction, then make each of the six decisions as a deliberate, defensible trade. That is the skill this entire course was building.