Field Manual
Module 8 · Coordination · 65 min

Coordination: Consensus, Leases, Locks, Sagas

When to reach for Raft, when a lock needs a fencing token, when to avoid coordination entirely

Framework: The Coordination Cost Ladder · The Fencing-Token RuleAnchored to: The Redlock debate (Kleppmann vs antirez)

Coordination — getting independent nodes to agree — is the most expensive thing a distributed system can do, and most of the time you need far less of it than you reach for. The senior skill is climbing the coordination ladder only as high as correctness demands: preferring no coordination, then a single-writer lease, and paying for full consensus only when nothing cheaper is correct.

The Coordination Cost Ladder

Every coordination problem can be solved at several rungs, and they are not equally priced. The instinct under pressure is to grab the top rung (“we’ll use consensus”) or a familiar one (“a Redis lock”). The discipline is to start at the bottom and climb only when the rung below is provably insufficient.

The Coordination Cost Ladder
coordination cost
4
Full consensus (Raft / Paxos)most expensive
A majority agrees on a single ordered log; survives a minority failing. Use for the things that must be globally agreed: leader election, cluster membership, the lock service itself. Cost: round trips and a majority must be alive.
3
Quorum read/writeper-op majority
Each operation touches a majority (W + R > N), so reads see the latest write without a single leader (Module 3). Cheaper than full consensus — no global log — but you own conflict resolution.
2
Single-writer lease + fencingone owner at a time
A lease grants exclusive ownership for a bounded time; a fencing token makes it safe against paused holders. Use for distributed locks, partition ownership, leader-follower. The right rung for most 'only one at a time' needs.
1
Coordination avoidancecheapest — prefer this
Don't coordinate at all: make operations idempotent or commutative (idempotency keys, CRDTs — Modules 1 & 6) so order and exclusivity stop mattering. Most 'we need a lock' is really 'we need this to be safe to run twice.'

Climb only as far as correctness demands. The single most common over-engineering in distributed systems is reaching for a higher rung than the problem requires.

The Fencing-Token Rule: making a lock actually safe

When you genuinely need a single-writer lease (the billing job, the partition owner), a lock alone is not enough — the opening scenario proves it. A lease can expire while its holder is paused, and the holder won’t know. The only place that can enforce safety is the resource being protected, and it enforces it with a fencing token.

Framework · Invariant

The Fencing-Token Rule

A distributed lock is safe only if the protected resource rejects writes carrying a stale token. The lock service issues a monotonically increasing token on every ownership change; the resource remembers the highest it has seen and rejects anything lower.

resource.go — the only place safety can be enforced
1func (r *Resource) Write(token int64, value string) bool {
2 // The resource — not the lock — enforces exclusion. A paused holder
3 // that lost its lease comes back with an OLD token; a newer holder has
4 // already bumped highestToken, so the zombie write is rejected.
5 if r.fencing && token < r.highestToken {
6 return false // reject the zombie writer
7 }
8 if token > r.highestToken {
9 r.highestToken = token
10 }
11 r.data = value
12 return true
13}

The token must be monotonic and must travel with every write. Worker A gets token 1, pauses; worker B gets token 2 and writes (resource now at 2); A wakes, writes with token 1, and the resource rejects it because 1 < 2. The lock service couldn’t stop A — only the resource could, and only because the token told it A was stale. A lock without fencing is mutual exclusion you merely believe you have.

Runnable reference implementation
Go
courses/distributed-systems/reference-impl/08-fencing-lock/

A lease-based lock service that stamps a monotonic fencing token on every ownership change, and a resource that rejects stale tokens. The demo runs the exact billing-job scenario: with fencing off, A’s zombie write corrupts the resource; with fencing on, it’s rejected. go run ., with tests for both outcomes.

Mental model
Consensus in one breath: a majority agrees on an ordered log

Raft (and Paxos) solve one problem: get a group of nodes to agree on a single, ordered sequence of decisions, even as some fail. The shape: one elected leader per term accepts writes, appends them to a log, and replicates them; an entry is committed once a majority has it. The majority requirement is the whole trick — two conflicting majorities can’t exist, so you can never have two leaders both committing in the same term, which is what prevents split brain.

You rarely implement this yourself; you use it through etcd, ZooKeeper, or Consul. But knowing the shape tells you the costs you’re buying: a write needs a round trip to a majority (latency), and you lose availability if a majority isn’t reachable (that’s the C in CAP, chosen on purpose). That’s why consensus is the top rung — correct, and expensive enough that you climb to it last.

Use it when: When you need the lock service, leader election, or config that every node must agree on — the things at the top of the ladder.

Sagas: coordination across services without a lock

Some coordination isn’t “one owner” — it’s “a multi-step operation across services that must all happen or all un-happen.” You met the wrong tool for this in Module 5 (2PC, the availability liability). The right tool is usually a saga: break the operation into local transactions, each with a compensating action that undoes it, and run them in sequence (or via events). If step 3 fails, you run the compensations for steps 2 and 1 — refund the charge, release the inventory — rather than holding locks across all of them.

DimensionCoordination avoidanceLease + fencingQuorumConsensus (Raft)
Correctness guaranteeSafe IF ops are idempotent/commutativeSingle writer, safe with fencingLatest value if W+R>NStrongest — single agreed log
Availability under minority failureFull — no coordination to loseHigh — lease service is the dependencyAvailable while a quorum is upAvailable while a majority is up
Latency costNoneOne round trip to the lease serviceRound trip to a quorumRound trip to a majority per write
ComplexityLow (design ops to be safe)Moderate (lease + fencing on resource)Moderate (conflict resolution)High (use etcd/ZK; don't roll your own)
Choose whenThe operation can be made idempotent or commutative so running it twice or out of order is harmless. The default — try this first.You need exactly one active owner at a time (a job, a partition, a leader) and can fence the resource. The right rung for most locks.You need the latest value without a single leader and can resolve conflicts (leaderless replication).You need globally-agreed state — leader election, membership, config — and nothing cheaper is correct. Use a proven implementation.
Verdict

Start at coordination avoidance and climb only when forced. For “only one at a time,” lease + fencing is almost always the right answer — and the fencing is not optional. Reach for consensus last, and when you do, use etcd, ZooKeeper, or Consul rather than implementing Raft yourself. The expensive mistake is paying for a high rung (or a lock without fencing) when the problem lived on a lower one.

How this fails in production · Redis / the Redlock debate

Why a distributed lock needs a fencing token

The setup
Redis’s creator proposed Redlock, an algorithm for distributed locks across multiple Redis nodes, intended to be safe enough for correctness-critical mutual exclusion. It was widely adopted for exactly the kind of “only one worker” job in this module’s opening scenario.
What happened
Martin Kleppmann published a detailed critique. His core point isn’t a bug in Redlock’s implementation — it’s that no lease-and-timeout lock can guarantee mutual exclusion for correctness, because a process can pause (GC, scheduler, a slow syscall) past its lease without knowing it. When it resumes, it acts as though it still holds the lock. He showed the exact failure: client 1 acquires, pauses, the lease expires, client 2 acquires, and client 1 wakes up and writes — two writers, corruption. The defense, he argued, is a fencing token: a monotonic number the lock hands out, which the storage layer uses to reject writes from an expired holder. antirez responded defending Redlock’s guarantees under its assumptions; the debate became the canonical reference on the topic.
The moment it went wrong
The insight that settled it for practitioners: a lock service, by itself, can never enforce safety, because it isn’t in the path of the actual write. Only the resource being written can reject a stale writer, and it can only do so if it’s given a fencing token to compare. Mutual exclusion for correctness is therefore a property of the resource, not the lock.
The transferable lesson

If a distributed lock protects something for correctness, it must issue a monotonic fencing token, and the protected resource must reject any write with a token older than the highest it has seen. If your resource can’t check a token (many can’t), then a lease-based lock is an optimization to reduce contention — not a correctness guarantee — and you should make the operation idempotent instead.

Martin Kleppmann — How to do distributed locking (and the antirez response)

What this sounds like in an interview

Calibration ladder · L3 → L6

You have a scheduled job that must run on exactly one instance of a service that's deployed to 20 machines. How do you guarantee it?

The interviewer wants to see whether you reach for a lock and stop, or think about what 'exactly one' really requires.

L3 · Junior

I'd use a distributed lock — like a row in the database or a Redis key — so only the instance that grabs it runs the job.

Missed: A bare lock with no TTL deadlocks if the holder dies; with a naive TTL it allows two runners on a pause. And it treats the lock as the correctness guarantee, which it isn't.
L4 · Mid

A distributed lock with a TTL so it releases if the holder dies, and the holder renews it while running. Whoever holds the lock runs the job; the others skip it.

Missed: Better, but still assumes the lock provides exclusion. A GC pause past the TTL gives two runners and double execution — the exact scenario this module opens with.
L5 · Senior

A lease-based lock (etcd, ZooKeeper, or a DB lease) where the holder is effectively the leader. But a lock with a TTL can't guarantee exclusion if the holder pauses past its lease — so I'd make the job itself idempotent: a fencing token or a 'this run already happened' check keyed on the run ID, so even if two instances briefly both run, the work happens once. Exclusion reduces duplicate work; idempotency guarantees correctness.

Missed: Strong — separates exclusion (efficiency) from idempotency (correctness). Missing the explicit 'do I need exclusion at all' framing and naming consensus-backed election vs. a fencing-less Redis lock.
L6 · Staff

I'd first ask whether I need exclusion at all or just idempotency, because the cheapest correct answer is often the latter: make the job idempotent on a run key and let any instance attempt it — running twice is then harmless and I've avoided a coordination dependency entirely. If I do want a single runner for efficiency, a lease via a real consensus-backed store (etcd/ZK) for leader election, with the critical caveat that the lease is a performance optimization, not a safety guarantee — the safety comes from fencing tokens on whatever the job writes, or from the job's idempotency. I'd be explicit that a Redis-style lock without fencing gives me 'usually one runner', which is fine for reducing duplicate work and not fine for, say, double-billing. So the design is: idempotent job (correctness) + lease-based leader election (efficiency) + fencing tokens on the writes if the resource supports them. The trade is accepting that 'exactly once' is enforced at the resource, not the lock.

What scored L6

Asked whether exclusion was even needed, defaulted to idempotency for correctness, treated leader election as an efficiency optimization, and located the real safety guarantee at the resource (fencing) — not the lock. That's someone who has been burned by a lock that didn't lock.

When NOT to use this
Don't use a distributed lock for correctness without fencing

A lease-and-timeout lock cannot guarantee mutual exclusion against a paused holder. If correctness depends on “exactly one writer,” you need fencing tokens checked by the resource, or an idempotent operation. A fencing-less lock is a contention optimization, not a safety mechanism — using it as the latter is the billing-job double-charge.

Don't reach for consensus when idempotency would do

Standing up a Raft cluster (or adopting one’s operational burden) to coordinate something that could just be made safe to run twice is paying the top rung for a bottom-rung problem. Ask “what breaks if this runs twice?” before you build a coordination system.

Don't use 2PC where a saga fits

Cross-service “all or nothing” rarely needs the blocking atomicity of two-phase commit. A saga with compensating actions keeps each service available and locally transactional, at the cost of a visible intermediate state — almost always the better trade for a business workflow.

Don't roll your own consensus

Raft and Paxos are famously easy to get subtly, catastrophically wrong (membership changes, log truncation, the edge cases around elections). Unless this is literally your product, use etcd, ZooKeeper, or Consul. The interview answer “I’d implement Paxos” is usually the wrong one; “I’d use a proven coordination service” is right.

Exercises

Exercise · Design scenario
Design leader election + work assignment for a fleet of workers consuming a partitioned queue, where each partition must be processed by exactly one worker at a time, workers can crash, and the fleet scales up and down. Specify: how a worker claims a partition, what happens when a worker holding a partition pauses or dies, how you prevent two workers from processing the same partition, and which rung of the ladder each piece sits on.
Exercise · Implementation task
In 08-fencing-lock, add lease renewal with a safety margin (the holder renews at half the TTL, and stops working if a renewal fails), and make LockService survive a restart by persisting nextToken so fencing tokens never go backward across a crash. Add a test that a token issued after a restart is still greater than any issued before it.
Exercise · Find the race
This worker uses the lock service correctly to decide whether to act — and still corrupts the resource. The lock isn’t the problem; the way it’s used is. Find the gap.
worker.go — shipped, still corrupted data
1func (w *Worker) processIfLeader(res *Resource) {
2 lease, ok := w.lock.Acquire(w.id, 5)
3 if !ok {
4 return // someone else holds it; skip
5 }
6 // we hold the lock, so it's safe to write... right?
7 doExpensiveWork() // <- can take longer than the 5-tick lease
8 res.Write(lease.Token, w.result) // token passed, but is it still valid?
9}
Walk away with this
  • 01Coordination is the most expensive thing a distributed system does. Climb the Coordination Cost Ladder only as far as correctness demands: prefer coordination avoidance (idempotency) → lease + fencing → quorum → consensus.
  • 02Ask “what breaks if this runs twice or out of order?” first. If the answer is “nothing” (idempotent), you need no lock at all — the bottom rung, and where most locking code shouldn’t exist.
  • 03A distributed lock does not guarantee mutual exclusion on its own (a holder can pause past its lease). The Fencing-Token Rule: the lock issues a monotonic token, and the protected resource rejects any write with a stale one. Safety lives at the resource, not the lock (the Redlock lesson).
  • 04Consensus (Raft/Paxos) gets a majority to agree on an ordered log; the majority requirement is what prevents split brain. Use it for leader election, membership, and config — and use etcd/ZK/Consul, never your own implementation.
  • 05For cross-service “all or nothing,” prefer a saga (local transactions + compensating actions) over 2PC. Orchestrated for visibility, choreographed for decoupling — and every step idempotent, because sagas retry.