Coordination: Consensus, Leases, Locks, Sagas
When to reach for Raft, when a lock needs a fencing token, when to avoid coordination entirely
Coordination — getting independent nodes to agree — is the most expensive thing a distributed system can do, and most of the time you need far less of it than you reach for. The senior skill is climbing the coordination ladder only as high as correctness demands: preferring no coordination, then a single-writer lease, and paying for full consensus only when nothing cheaper is correct.
The Coordination Cost Ladder
Every coordination problem can be solved at several rungs, and they are not equally priced. The instinct under pressure is to grab the top rung (“we’ll use consensus”) or a familiar one (“a Redis lock”). The discipline is to start at the bottom and climb only when the rung below is provably insufficient.
Climb only as far as correctness demands. The single most common over-engineering in distributed systems is reaching for a higher rung than the problem requires.
The Fencing-Token Rule: making a lock actually safe
When you genuinely need a single-writer lease (the billing job, the partition owner), a lock alone is not enough — the opening scenario proves it. A lease can expire while its holder is paused, and the holder won’t know. The only place that can enforce safety is the resource being protected, and it enforces it with a fencing token.
The Fencing-Token Rule
A distributed lock is safe only if the protected resource rejects writes carrying a stale token. The lock service issues a monotonically increasing token on every ownership change; the resource remembers the highest it has seen and rejects anything lower.
1func (r *Resource) Write(token int64, value string) bool {2 // The resource — not the lock — enforces exclusion. A paused holder3 // that lost its lease comes back with an OLD token; a newer holder has4 // already bumped highestToken, so the zombie write is rejected.5 if r.fencing && token < r.highestToken {6 return false // reject the zombie writer7 }8 if token > r.highestToken {9 r.highestToken = token10 }11 r.data = value12 return true13}The token must be monotonic and must travel with every write. Worker A gets token 1, pauses; worker B gets token 2 and writes (resource now at 2); A wakes, writes with token 1, and the resource rejects it because 1 < 2. The lock service couldn’t stop A — only the resource could, and only because the token told it A was stale. A lock without fencing is mutual exclusion you merely believe you have.
courses/distributed-systems/reference-impl/08-fencing-lock/A lease-based lock service that stamps a monotonic fencing token on every ownership change, and a resource that rejects stale tokens. The demo runs the exact billing-job scenario: with fencing off, A’s zombie write corrupts the resource; with fencing on, it’s rejected. go run ., with tests for both outcomes.
Consensus in one breath: a majority agrees on an ordered log
Raft (and Paxos) solve one problem: get a group of nodes to agree on a single, ordered sequence of decisions, even as some fail. The shape: one elected leader per term accepts writes, appends them to a log, and replicates them; an entry is committed once a majority has it. The majority requirement is the whole trick — two conflicting majorities can’t exist, so you can never have two leaders both committing in the same term, which is what prevents split brain.
You rarely implement this yourself; you use it through etcd, ZooKeeper, or Consul. But knowing the shape tells you the costs you’re buying: a write needs a round trip to a majority (latency), and you lose availability if a majority isn’t reachable (that’s the C in CAP, chosen on purpose). That’s why consensus is the top rung — correct, and expensive enough that you climb to it last.
Sagas: coordination across services without a lock
Some coordination isn’t “one owner” — it’s “a multi-step operation across services that must all happen or all un-happen.” You met the wrong tool for this in Module 5 (2PC, the availability liability). The right tool is usually a saga: break the operation into local transactions, each with a compensating action that undoes it, and run them in sequence (or via events). If step 3 fails, you run the compensations for steps 2 and 1 — refund the charge, release the inventory — rather than holding locks across all of them.
| Dimension | Coordination avoidance | Lease + fencing | Quorum | Consensus (Raft) |
|---|---|---|---|---|
| Correctness guarantee | Safe IF ops are idempotent/commutative | Single writer, safe with fencing | Latest value if W+R>N | Strongest — single agreed log |
| Availability under minority failure | Full — no coordination to lose | High — lease service is the dependency | Available while a quorum is up | Available while a majority is up |
| Latency cost | None | One round trip to the lease service | Round trip to a quorum | Round trip to a majority per write |
| Complexity | Low (design ops to be safe) | Moderate (lease + fencing on resource) | Moderate (conflict resolution) | High (use etcd/ZK; don't roll your own) |
| Choose when | The operation can be made idempotent or commutative so running it twice or out of order is harmless. The default — try this first. | You need exactly one active owner at a time (a job, a partition, a leader) and can fence the resource. The right rung for most locks. | You need the latest value without a single leader and can resolve conflicts (leaderless replication). | You need globally-agreed state — leader election, membership, config — and nothing cheaper is correct. Use a proven implementation. |
Start at coordination avoidance and climb only when forced. For “only one at a time,” lease + fencing is almost always the right answer — and the fencing is not optional. Reach for consensus last, and when you do, use etcd, ZooKeeper, or Consul rather than implementing Raft yourself. The expensive mistake is paying for a high rung (or a lock without fencing) when the problem lived on a lower one.
Why a distributed lock needs a fencing token
If a distributed lock protects something for correctness, it must issue a monotonic fencing token, and the protected resource must reject any write with a token older than the highest it has seen. If your resource can’t check a token (many can’t), then a lease-based lock is an optimization to reduce contention — not a correctness guarantee — and you should make the operation idempotent instead.
What this sounds like in an interview
You have a scheduled job that must run on exactly one instance of a service that's deployed to 20 machines. How do you guarantee it?
The interviewer wants to see whether you reach for a lock and stop, or think about what 'exactly one' really requires.
I'd use a distributed lock — like a row in the database or a Redis key — so only the instance that grabs it runs the job.
A distributed lock with a TTL so it releases if the holder dies, and the holder renews it while running. Whoever holds the lock runs the job; the others skip it.
A lease-based lock (etcd, ZooKeeper, or a DB lease) where the holder is effectively the leader. But a lock with a TTL can't guarantee exclusion if the holder pauses past its lease — so I'd make the job itself idempotent: a fencing token or a 'this run already happened' check keyed on the run ID, so even if two instances briefly both run, the work happens once. Exclusion reduces duplicate work; idempotency guarantees correctness.
I'd first ask whether I need exclusion at all or just idempotency, because the cheapest correct answer is often the latter: make the job idempotent on a run key and let any instance attempt it — running twice is then harmless and I've avoided a coordination dependency entirely. If I do want a single runner for efficiency, a lease via a real consensus-backed store (etcd/ZK) for leader election, with the critical caveat that the lease is a performance optimization, not a safety guarantee — the safety comes from fencing tokens on whatever the job writes, or from the job's idempotency. I'd be explicit that a Redis-style lock without fencing gives me 'usually one runner', which is fine for reducing duplicate work and not fine for, say, double-billing. So the design is: idempotent job (correctness) + lease-based leader election (efficiency) + fencing tokens on the writes if the resource supports them. The trade is accepting that 'exactly once' is enforced at the resource, not the lock.
Asked whether exclusion was even needed, defaulted to idempotency for correctness, treated leader election as an efficiency optimization, and located the real safety guarantee at the resource (fencing) — not the lock. That's someone who has been burned by a lock that didn't lock.
Don't use a distributed lock for correctness without fencing
A lease-and-timeout lock cannot guarantee mutual exclusion against a paused holder. If correctness depends on “exactly one writer,” you need fencing tokens checked by the resource, or an idempotent operation. A fencing-less lock is a contention optimization, not a safety mechanism — using it as the latter is the billing-job double-charge.
Don't reach for consensus when idempotency would do
Standing up a Raft cluster (or adopting one’s operational burden) to coordinate something that could just be made safe to run twice is paying the top rung for a bottom-rung problem. Ask “what breaks if this runs twice?” before you build a coordination system.
Don't use 2PC where a saga fits
Cross-service “all or nothing” rarely needs the blocking atomicity of two-phase commit. A saga with compensating actions keeps each service available and locally transactional, at the cost of a visible intermediate state — almost always the better trade for a business workflow.
Don't roll your own consensus
Raft and Paxos are famously easy to get subtly, catastrophically wrong (membership changes, log truncation, the edge cases around elections). Unless this is literally your product, use etcd, ZooKeeper, or Consul. The interview answer “I’d implement Paxos” is usually the wrong one; “I’d use a proven coordination service” is right.
Exercises
08-fencing-lock, add lease renewal with a safety margin (the holder renews at half the TTL, and stops working if a renewal fails), and make LockService survive a restart by persisting nextToken so fencing tokens never go backward across a crash. Add a test that a token issued after a restart is still greater than any issued before it.1func (w *Worker) processIfLeader(res *Resource) {2 lease, ok := w.lock.Acquire(w.id, 5)3 if !ok {4 return // someone else holds it; skip5 }6 // we hold the lock, so it's safe to write... right?7 doExpensiveWork() // <- can take longer than the 5-tick lease8 res.Write(lease.Token, w.result) // token passed, but is it still valid?9}- 01Coordination is the most expensive thing a distributed system does. Climb the Coordination Cost Ladder only as far as correctness demands: prefer coordination avoidance (idempotency) → lease + fencing → quorum → consensus.
- 02Ask “what breaks if this runs twice or out of order?” first. If the answer is “nothing” (idempotent), you need no lock at all — the bottom rung, and where most locking code shouldn’t exist.
- 03A distributed lock does not guarantee mutual exclusion on its own (a holder can pause past its lease). The Fencing-Token Rule: the lock issues a monotonic token, and the protected resource rejects any write with a stale one. Safety lives at the resource, not the lock (the Redlock lesson).
- 04Consensus (Raft/Paxos) gets a majority to agree on an ordered log; the majority requirement is what prevents split brain. Use it for leader election, membership, and config — and use etcd/ZK/Consul, never your own implementation.
- 05For cross-service “all or nothing,” prefer a saga (local transactions + compensating actions) over 2PC. Orchestrated for visibility, choreographed for decoupling — and every step idempotent, because sagas retry.