Field Manual
Module 6 · Communication · 60 min

Idempotency & 'Exactly-Once' That Survives Contact

Idempotency keys, the transactional outbox, and effectively-once delivery

Framework: The Effectively-Once Triangle · Idempotency-Key Lifecycle
Anchored to: Stripe idempotency design + duplicate-charge incidents

Exactly-once delivery is impossible over an unreliable network — the sender can never know its message arrived, so it must either risk losing it (at-most-once) or risk duplicating it (at-least-once). What you can build is exactly-once effect: at-least-once delivery plus deduplication, assembled from three specific parts. This module builds all three and names the trap in between.

Idempotency keys: making a retry a no-op

The fix for the double charge is to make the operation idempotent: running it twice has the same effect as running it once. For naturally idempotent operations (set the address to X, delete order 7) you get this free — repeating them changes nothing. For operations with side effects that aren’t naturally idempotent — charge a card, send an email, increment a balance — you make them idempotent with a client-supplied idempotency key: a unique token the client generates per logical operation and reuses on every retry of it.
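A minimal sketch of the difference between the two kinds of operation, using a hypothetical Account type that is not part of the reference implementation. The loop stands in for an original request plus one retry.

```go
package main

import "fmt"

// Account is a hypothetical record used only for this illustration.
type Account struct {
	Address string
	Balance int
}

// SetAddress is naturally idempotent: applying it again changes nothing.
func (a *Account) SetAddress(addr string) { a.Address = addr }

// Credit is not idempotent: every call accumulates.
func (a *Account) Credit(amount int) { a.Balance += amount }

func main() {
	a := &Account{}
	for i := 0; i < 2; i++ { // the original request plus one retry
		a.SetAddress("1 Main St")
		a.Credit(500)
	}
	fmt.Println(a.Address) // 1 Main St
	fmt.Println(a.Balance) // 1000: the retry was applied a second time
}
```

SetAddress needs no extra machinery. Credit is the kind of operation an idempotency key exists to protect.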

The server records the key the first time it sees it, performs the side effect, and stores the response — all atomically. Every subsequent request with that key returns the stored response without re-running the side effect. The lifecycle of that key is the whole pattern:

Framework · State machine

The Idempotency-Key Lifecycle

Four states, one ironclad rule: claim the key, do the side effect, and store the response inside a single transaction.

| State | How you reach it | What a retried request does |
| --- | --- | --- |
| NEW | Key never seen. Insert (key → IN_PROGRESS) via a UNIQUE constraint; the insert itself is the lock. | N/A: this is the first request; proceed to do the work. |
| IN_PROGRESS | Key inserted, side effect not yet committed (original still running, or crashed mid-flight). | Return 409 / 'retry shortly'. Do NOT run the side effect; the original may still complete. |
| COMPLETED | Side effect committed AND the response persisted in the same transaction. | Return the stored response verbatim. No side effect. This is the idempotent replay. |
| EXPIRED | TTL passed (keys can't live forever). Record garbage-collected. | Treated as NEW, so set the TTL longer than any client will plausibly retry. |
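The table above can be read as a single decision function. The sketch below is one way to write it, assuming a hypothetical Record type and Action names that do not come from the reference implementation. It also assumes an expired record is treated exactly like a missing one, as the EXPIRED row states.

```go
package main

import (
	"fmt"
	"time"
)

type Status int

const (
	InProgress Status = iota
	Completed
)

// Record is a hypothetical row in the idempotency table.
type Record struct {
	Status    Status
	Response  string
	ExpiresAt time.Time
}

type Action string

const (
	Proceed  Action = "proceed"  // NEW or EXPIRED: claim the key and do the work
	Conflict Action = "conflict" // IN_PROGRESS: answer 409, do not run the side effect
	Replay   Action = "replay"   // COMPLETED: return the stored response verbatim
)

// Decide maps the stored record (nil if the key was never seen) to an action.
func Decide(rec *Record, now time.Time) Action {
	if rec == nil || !now.Before(rec.ExpiresAt) {
		return Proceed
	}
	if rec.Status == Completed {
		return Replay
	}
	return Conflict
}

func main() {
	now := time.Now()
	live := now.Add(24 * time.Hour)
	fmt.Println(Decide(nil, now))                                                   // proceed
	fmt.Println(Decide(&Record{Status: InProgress, ExpiresAt: live}, now))          // conflict
	fmt.Println(Decide(&Record{Status: Completed, ExpiresAt: live}, now))           // replay
	fmt.Println(Decide(&Record{Status: Completed, ExpiresAt: now.Add(-time.Minute)}, now)) // proceed
}
```

Passing the clock in as an argument keeps the function pure, which makes each row of the table a one-line test.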

The one rule that makes it correct: the side effect and the response-storage must commit together. If you charge the card and then store the response in a second step, a crash between them leaves a charged card with no stored response — and the retry charges again. Atomicity (Module 5) is load-bearing here.
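To make the crash window concrete, here is a toy sketch that compares a two-step write with a single commit. DB, Tx, ChargeTwoStep, and ChargeAtomic are hypothetical names. The model assumes the side effect is a ledger row in the same database as the idempotency record, and it simulates a crash with an early return.

```go
package main

import "fmt"

// DB is a toy in-memory stand-in for one relational database.
type DB struct {
	Ledger    []string
	Responses map[string]string // idempotency key -> stored response
}

// Tx stages writes; nothing is visible in DB until Commit.
type Tx struct {
	db        *DB
	ledger    []string
	responses map[string]string
}

func (db *DB) Begin() *Tx { return &Tx{db: db, responses: map[string]string{}} }

func (tx *Tx) Commit() {
	tx.db.Ledger = append(tx.db.Ledger, tx.ledger...)
	for k, v := range tx.responses {
		tx.db.Responses[k] = v
	}
}

// ChargeTwoStep commits the charge, then stores the response separately.
// crash=true simulates the process dying between the two commits.
func ChargeTwoStep(db *DB, key string, amount int, crash bool) {
	if _, seen := db.Responses[key]; seen {
		return // replay
	}
	t1 := db.Begin()
	t1.ledger = append(t1.ledger, fmt.Sprintf("charged %d", amount))
	t1.Commit()
	if crash {
		return
	}
	t2 := db.Begin()
	t2.responses[key] = "ok"
	t2.Commit()
}

// ChargeAtomic stages both writes in one transaction.
// A crash before Commit discards both of them.
func ChargeAtomic(db *DB, key string, amount int, crash bool) {
	if _, seen := db.Responses[key]; seen {
		return // replay
	}
	tx := db.Begin()
	tx.ledger = append(tx.ledger, fmt.Sprintf("charged %d", amount))
	tx.responses[key] = "ok"
	if crash {
		return // rolled back: staged writes are dropped
	}
	tx.Commit()
}

func main() {
	a := &DB{Responses: map[string]string{}}
	ChargeTwoStep(a, "k1", 500, true)  // crashes after the charge commits
	ChargeTwoStep(a, "k1", 500, false) // the retry sees no stored response
	fmt.Println(len(a.Ledger))         // 2: double charge

	b := &DB{Responses: map[string]string{}}
	ChargeAtomic(b, "k1", 500, true)
	ChargeAtomic(b, "k1", 500, false)
	fmt.Println(len(b.Ledger)) // 1
}
```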

idempotency.go — claim, act, store: one commit
func (s *Store) Charge(key string, amount int) ChargeResult {
	if rec, ok := s.idem[key]; ok {
		if rec.status == statusCompleted {
			return ChargeResult{Response: rec.response, Replayed: true} // replay
		}
		return ChargeResult{Conflict: true} // original still running -> 409
	}
	// claim the key, do the side effect, enqueue the event, store the
	// response. Modeled single-threaded here, but in production ONE transaction:
	s.idem[key] = &idemRecord{status: statusInProgress}
	s.ledger = append(s.ledger, fmt.Sprintf("charged %d (key=%s)", amount, key))
	s.outbox.Add(fmt.Sprintf("charge.created amount=%d key=%s", amount, key))
	resp := fmt.Sprintf("ok: charged %d", amount)
	s.idem[key].status = statusCompleted
	s.idem[key].response = resp
	return ChargeResult{Response: resp}
}
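The listing above is modeled single-threaded. As a hedged sketch of what "the insert itself is the lock" means under concurrency, the hypothetical KeyTable below uses a mutex to stand in for a UNIQUE constraint: the check and the insert happen as one step, so only one of many simultaneous retries wins the claim.

```go
package main

import (
	"fmt"
	"sync"
)

// KeyTable is a toy stand-in for a table with a UNIQUE constraint on key.
type KeyTable struct {
	mu   sync.Mutex
	rows map[string]string // key -> status
}

func NewKeyTable() *KeyTable { return &KeyTable{rows: map[string]string{}} }

// Claim inserts key as IN_PROGRESS and reports whether this caller won.
// Check and insert happen under one lock, like a single INSERT that either
// succeeds or fails with a unique-violation error.
func (t *KeyTable) Claim(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, exists := t.rows[key]; exists {
		return false
	}
	t.rows[key] = "IN_PROGRESS"
	return true
}

func main() {
	table := NewKeyTable()
	var wg sync.WaitGroup
	var mu sync.Mutex
	charges := 0
	for i := 0; i < 50; i++ { // 50 retries arriving at once
		wg.Add(1)
		go func() {
			defer wg.Done()
			if table.Claim("key-123") {
				mu.Lock()
				charges++ // only the winner runs the side effect
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(charges) // 1
}
```

In a real database the same guarantee would come from the constraint itself, not from application-level locking.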

The Effectively-Once Triangle

Idempotency keys handle the synchronous request path. But the charge probably also emits an event — charge.created — that other services consume (email a receipt, update analytics, credit a referral). Now you have two new ways to leak a duplicate or lose an event entirely, and fixing them requires two more pieces. Together they form a triangle: remove any vertex and duplicates (or losses) leak through.

Framework · Triangle (all three required)

The Effectively-Once Triangle

Exactly-once effect isn't one mechanism — it's three working together. Idempotent producer, transactional outbox, idempotent consumer.

[Figure: the Effectively-Once Triangle. Vertices: Idempotent producer (idempotency key), Transactional outbox (commit event with effect), Idempotent consumer (dedup by message id). Caption: remove one → duplicates leak.]
Producer without an outbox → the event can be lost or doubled by a crash between writing the DB and publishing. Outbox without an idempotent consumer → at-least-once delivery double-applies. Consumer without an idempotent producer → the same logical action enters twice. You need all three.

The transactional outbox solves the dual-write problem: instead of “write the DB, then publish to the broker” (two systems, no atomicity — a crash between them loses or duplicates the event), you write the event to an outbox table in the same database transaction as the side effect. A separate poller publishes outbox rows and marks them sent. The event is published if and only if the side effect committed.
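A minimal in-memory sketch of the outbox and its poller, under stated assumptions: Store, OutboxRow, and Poll are hypothetical names, a slice stands in for the outbox table, and the crashBeforeMark flag simulates the poller dying after it publishes a row but before it marks that row sent.

```go
package main

import "fmt"

// OutboxRow is a hypothetical outbox table row.
type OutboxRow struct {
	ID      int
	Payload string
	Sent    bool
}

// Store models one database: ledger and outbox are written in the same commit.
type Store struct {
	Ledger []string
	Outbox []OutboxRow
}

// Charge appends the side effect and its event together.
func (s *Store) Charge(amount int) {
	s.Ledger = append(s.Ledger, fmt.Sprintf("charged %d", amount))
	s.Outbox = append(s.Outbox, OutboxRow{
		ID:      len(s.Outbox) + 1,
		Payload: fmt.Sprintf("charge.created amount=%d", amount),
	})
}

// Poll publishes every unsent row, then marks it sent. With crashBeforeMark
// set, it stops right after the first publish, leaving that row unsent.
func (s *Store) Poll(publish func(OutboxRow), crashBeforeMark bool) {
	for i := range s.Outbox {
		if s.Outbox[i].Sent {
			continue
		}
		publish(s.Outbox[i])
		if crashBeforeMark {
			return
		}
		s.Outbox[i].Sent = true
	}
}

func main() {
	s := &Store{}
	s.Charge(500)

	var broker []string
	publish := func(r OutboxRow) {
		broker = append(broker, fmt.Sprintf("id=%d %s", r.ID, r.Payload))
	}
	s.Poll(publish, true)  // published, then crashed before marking sent
	s.Poll(publish, false) // restart: row still unsent, published again
	s.Poll(publish, false) // nothing left to send

	fmt.Println(len(broker)) // 2: the broker saw the same event twice
}
```

The event is never lost, because the row exists whenever the charge does. It can be published more than once, which is why the row carries a stable ID.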

The idempotent consumer closes the loop: because the poller delivers at-least-once (it may crash after publishing but before marking sent), consumers dedup by message ID, so a redelivered event has no extra effect.
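A sketch of the consumer side, assuming a hypothetical Message type whose ID is assigned by the producer and stays the same across redeliveries. In a real service the seen-marker and the effect would need to commit in one transaction, for the same reason the idempotency key does.

```go
package main

import "fmt"

// Message is a hypothetical broker message with a stable, producer-assigned ID.
type Message struct {
	ID      string
	Payload string
}

// Consumer remembers which message IDs it has already applied.
type Consumer struct {
	seen    map[string]bool
	Applied []string
}

func NewConsumer() *Consumer { return &Consumer{seen: map[string]bool{}} }

// Handle applies a message once and reports whether it did any work.
// A redelivery is acknowledged and ignored.
func (c *Consumer) Handle(m Message) bool {
	if c.seen[m.ID] {
		return false
	}
	c.seen[m.ID] = true
	c.Applied = append(c.Applied, m.Payload)
	return true
}

func main() {
	c := NewConsumer()
	m := Message{ID: "evt-1", Payload: "send receipt for charge 500"}
	fmt.Println(c.Handle(m))    // true
	fmt.Println(c.Handle(m))    // false: redelivery, no second receipt
	fmt.Println(len(c.Applied)) // 1
}
```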

Runnable reference implementation
Go
courses/distributed-systems/reference-impl/06-idempotency-outbox/

All three vertices, runnable. The demo charges the same idempotency key three times (one real charge, two replays), then crashes mid-publish so the broker receives the event twice, and the idempotent consumer still applies the effect exactly once. Run it with go run .; each vertex has its own tests.

Mental model
Delivery vs. effect: stop trying to deliver once

The most expensive confusion in messaging is treating “exactly once” as a delivery guarantee to configure. It isn’t one. A sender that gets no ack cannot know if the message arrived; it must choose to resend (maybe duplicate) or not (maybe lose). There is no third option at the delivery layer. That’s the Two Generals reality from Module 1 (FLP is the related impossibility result for consensus).

So you stop trying. You choose at-least-once delivery (never lose), accept that duplicates will happen, and make the effect idempotent so duplicates don’t matter. “We handle exactly-once” should always decode to “at-least-once delivery plus idempotent processing.”
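The sender's dilemma can be sketched with a toy lossy link. Network, AtMostOnce, and AtLeastOnce are hypothetical names, and the drop counters are a deterministic stand-in for real packet loss. From the sender's side, a lost message and a lost ack both look like "no answer".

```go
package main

import "fmt"

// Network is a toy lossy link: it drops the first DropMsgs messages
// and the first DropAcks acknowledgements.
type Network struct {
	DropMsgs, DropAcks int
	Delivered          int // times the receiver actually got the message
}

// Send returns true only if an ack made it back to the sender.
func (n *Network) Send() bool {
	if n.DropMsgs > 0 {
		n.DropMsgs--
		return false // message lost
	}
	n.Delivered++
	if n.DropAcks > 0 {
		n.DropAcks--
		return false // message arrived, ack lost: identical from the sender's side
	}
	return true
}

// AtMostOnce sends once and never retries.
func AtMostOnce(n *Network) { n.Send() }

// AtLeastOnce retries until an ack arrives.
func AtLeastOnce(n *Network) {
	for !n.Send() {
	}
}

func main() {
	lost := &Network{DropMsgs: 1}
	AtMostOnce(lost)
	fmt.Println(lost.Delivered) // 0: the message is gone

	dup := &Network{DropAcks: 1}
	AtLeastOnce(dup)
	fmt.Println(dup.Delivered) // 2: delivered twice, so the effect must be idempotent
}
```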

Use it when: Any time someone says 'we need exactly-once.' Redirect the conversation from delivery (impossible) to effect (buildable).
| Dimension | At-most-once | At-least-once | At-least-once + dedup | Broker EOS (Kafka) |
| --- | --- | --- | --- | --- |
| Can lose messages? | Yes (fire and forget) | No | No | No |
| Can duplicate effects? | No | Yes (every duplicate applies) | No (dedup absorbs them) | No (within the Kafka boundary) |
| Implementation cost | Lowest | Low (just retry) | Moderate (keys + outbox + dedup) | High; only within one broker's transactions |
| What it really is | Send once, never retry | Retry until acked | Exactly-once EFFECT | The triangle, productized |
| Choose when | Loss is acceptable and duplicates are not: metrics, telemetry samples, best-effort notifications. | Loss is unacceptable and the consumer is naturally idempotent already (e.g. setting a value), so duplicates are harmless. | Loss is unacceptable AND the effect isn't naturally idempotent (payments, emails, balance changes). The default for business-critical events. | Your producers and consumers all live inside one Kafka cluster and you can adopt its transactional API end to end. |
Verdict

For anything with a side effect that matters, build at-least-once delivery + deduplication — the triangle. Don’t chase “exactly-once delivery” as a config flag; it doesn’t exist, and the time spent looking for it is time not spent making your effects idempotent. Broker EOS is real but bounded to one cluster’s transactions — the moment an effect leaves that boundary (a card charge, an email), you’re back to the triangle.

How this fails in production · Stripe

Retries that double-charge — and the idempotency key that stops them

The setup
Stripe’s API moves money over the public internet, where requests time out, connections drop, and clients legitimately can’t tell a lost response from a failed request. A naive payments API in that environment double-charges on every retried timeout — the exact failure in this module’s opening scenario, multiplied across millions of integrations.
What happened
Rather than hope clients never retry (they must), Stripe made idempotency a first-class part of the API: clients send an Idempotency-Key header, and Stripe guarantees that replaying a request with the same key returns the original response and performs the side effect at most once. The hard part is the implementation they describe — claiming the key, recording request parameters, persisting the response, and handling the in-progress and crashed-midway states correctly, all without a window where a retry slips a second charge through.
The moment it went wrong
The insight worth stealing: they treat duplicate requests as the normal case, not an error. A money API that assumes clients send each request exactly once is wrong on day one. By designing for retries from the start — and storing the response atomically with the charge — they turn an unavoidable property of networks (duplicates) into a non-event.
The transferable lesson

Make every non-idempotent write endpoint accept an idempotency key, and treat the key’s lifecycle as part of the same transaction as the side effect. The duplicate request is not an edge case to log — it is the contract. Build for it, and the double-charge ticket never gets written.

Stripe — Designing robust and predictable APIs with idempotency

What this sounds like in an interview

Calibration ladder · L3 → L6

How do you make a 'create payment' endpoint safe for clients to retry?

The interviewer wants to see if you treat duplicate requests as the normal case and know how to dedup atomically.

L3 · Junior

I'd check if a payment with the same details already exists before creating a new one, and skip it if so.

Missed: 'Check if it exists' is a TOCTOU race: two concurrent first-requests both check, both miss, both charge. This is the find-the-race bug.
L4 · Mid

I'd have the client send an idempotency key, store it, and if I've seen it before, return the previous result instead of charging again.

Missed: Right mechanism, but doesn't make the check-and-store atomic, so it still races, and doesn't address the crash-between-charge-and-store window.
L5 · Senior

Client-generated idempotency key, and the key handling has to be atomic with the side effect. On the first request I insert the key with a UNIQUE constraint — the insert is the lock — do the charge, and store the response, all in one transaction. A retry hits the key: if it's completed I replay the stored response; if it's still in progress I return a 409 so I don't run the charge twice. The subtle part is that storing the response and doing the charge must commit together, or a crash between them re-charges on retry.

Missed: Strong and correct for the sync path. Missing the event/outbox blast radius, the request-fingerprint guard against key reuse, and naming effect-vs-delivery explicitly.
L6 · Staff

Same atomic idempotency-key design, but I'd cover the full blast radius. Beyond the synchronous charge, the endpoint emits events — so I'd use a transactional outbox to publish 'payment.created' in the same commit, avoiding the dual-write problem, and make downstream consumers dedup by event ID, because delivery is at-least-once. I'd bound the idempotency key with a TTL longer than any client retry window and store a hash of the request body with the key, so a client reusing a key with different parameters gets a 422 instead of silently getting the old response. I'd also be explicit that this gives exactly-once effect, not delivery — there's no such thing as exactly-once delivery, and I'd push back if someone specced it. The trade is a bit of storage and a dedup table for making an unavoidable property of networks — duplicates — harmless.

What scored L6

Made the idempotency atomic with the side effect, extended it to the async event path with an outbox + deduping consumers, guarded against key reuse with a body fingerprint, and explicitly reframed 'exactly-once delivery' as effect. That's someone who has built a payments path.

When NOT to use this
Don't add idempotency keys to naturally idempotent operations

A PUT that sets a resource to a fixed value, a delete by ID, a “mark as read” — these are already idempotent: running them twice changes nothing. Bolting an idempotency-key table onto them adds storage, a dedup lookup, and a TTL to manage, for a property you already had. Spend the mechanism on the operations that actually accumulate (charges, increments, sends).

Don't claim or design for exactly-once delivery

It doesn’t exist over an unreliable network. Speccing it sends a team hunting for a config flag that isn’t there, instead of building at-least-once + dedup. If a requirement says “exactly-once,” translate it to “never lose, and make the effect idempotent” before you design.

Don't store idempotency keys forever

An unbounded key table grows without limit and eventually dominates your storage and lookup cost. Set a TTL longer than any plausible client retry window (hours to a day, not years), and accept that a request retried after expiry is treated as new — which is fine, because no real client retries a day later.
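A sketch of a TTL sweep, assuming a hypothetical in-memory KeyStore. The 24-hour TTL in the demo is only an illustration of the "hours to a day" range, and the clock is passed in as an argument so the behavior is deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// Entry is a hypothetical idempotency record with only the fields the sweep needs.
type Entry struct {
	Response  string
	CreatedAt time.Time
}

// KeyStore holds idempotency records and expires them after TTL.
type KeyStore struct {
	TTL     time.Duration
	entries map[string]Entry
}

func NewKeyStore(ttl time.Duration) *KeyStore {
	return &KeyStore{TTL: ttl, entries: map[string]Entry{}}
}

// Put records a completed key at the given time.
func (s *KeyStore) Put(key, response string, now time.Time) {
	s.entries[key] = Entry{Response: response, CreatedAt: now}
}

// Has reports whether the key is still stored.
func (s *KeyStore) Has(key string) bool {
	_, ok := s.entries[key]
	return ok
}

// Sweep deletes every record at least TTL old and returns how many it removed.
func (s *KeyStore) Sweep(now time.Time) int {
	removed := 0
	for key, e := range s.entries {
		if now.Sub(e.CreatedAt) >= s.TTL {
			delete(s.entries, key)
			removed++
		}
	}
	return removed
}

func main() {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	s := NewKeyStore(24 * time.Hour)
	s.Put("key-old", "ok", start)
	s.Put("key-new", "ok", start.Add(23*time.Hour))

	fmt.Println(s.Sweep(start.Add(25 * time.Hour))) // 1
	fmt.Println(s.Has("key-old"))                   // false: a retry now counts as NEW
	fmt.Println(s.Has("key-new"))                   // true
}
```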

Don't dedup on payload instead of an explicit key

Hashing the request body to detect duplicates breaks two ways: two legitimately-distinct operations with identical payloads (two $5 coffees) collapse into one, and a single operation whose payload varies slightly on retry (a new timestamp) looks like two. Use an explicit client-generated key for the logical operation; reserve the body hash for detecting key reuse with different params.
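A sketch of that division of labor, with hypothetical Server and Result types: the explicit key decides whether a request is a duplicate, and a SHA-256 fingerprint of the body only detects a key reused with different parameters. The 422 status mirrors the choice described in the interview section and is an assumption, not a fixed standard.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type stored struct {
	fingerprint string
	response    string
}

// Result is a hypothetical HTTP-style outcome.
type Result struct {
	Status   int
	Response string
}

// Server is a toy single-threaded handler; Charges counts real side effects.
type Server struct {
	keys    map[string]stored
	Charges int
}

func NewServer() *Server { return &Server{keys: map[string]stored{}} }

func fingerprint(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

// Handle dedups on the key and uses the body hash only as a reuse guard.
func (s *Server) Handle(key string, body []byte) Result {
	fp := fingerprint(body)
	if rec, ok := s.keys[key]; ok {
		if rec.fingerprint != fp {
			return Result{Status: 422, Response: "idempotency key reused with different parameters"}
		}
		return Result{Status: 200, Response: rec.response} // replay
	}
	s.Charges++
	resp := "ok: charged"
	s.keys[key] = stored{fingerprint: fp, response: resp}
	return Result{Status: 200, Response: resp}
}

func main() {
	s := NewServer()
	coffee := []byte(`{"amount":500}`)

	s.Handle("key-1", coffee) // first coffee
	s.Handle("key-2", coffee) // second coffee: same payload, distinct operation
	fmt.Println(s.Charges)    // 2

	fmt.Println(s.Handle("key-1", coffee).Status)                   // 200: replay
	fmt.Println(s.Handle("key-1", []byte(`{"amount":900}`)).Status) // 422
	fmt.Println(s.Charges)                                          // still 2
}
```

If a client's retries can legitimately differ in incidental fields such as a timestamp, the fingerprint should cover only the fields that define the operation.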

Exercises

Exercise · Design scenario
Design exactly-once effect for a system that, on a successful order, must (1) charge the card, (2) decrement inventory, (3) send a confirmation email, and (4) credit the referrer. These span four services. Specify how you guarantee each side effect happens once despite retries and at-least-once event delivery, and what the user sees if step 3 (email) is delayed or step 4 (referral) fails. Identify where you’d use an outbox vs. an idempotency key.
Exercise · Implementation task
In 06-idempotency-outbox, wrap Store.Charge in a real net/http handler that reads the Idempotency-Key header, returns 409 while a key is in progress and the stored response on replay, and rejects a reused key whose request body differs (store a body fingerprint alongside the key). Add a TTL sweep that expires old keys.
Exercise · Find the race
This idempotency check shipped to a payments service. It double-charged under exactly the condition it was meant to prevent: two retries arriving at the same instant. Find the window.
charge.ts — shipped, double-charged
async function charge(key: string, amount: number) {
  // check if we've seen this idempotency key
  const existing = await db.query("SELECT response FROM idem WHERE key = $1", [key])
  if (existing) {
    return existing.response // replay
  }
  // not seen -> do the charge
  const resp = await paymentGateway.charge(amount)
  await db.query("INSERT INTO idem (key, response) VALUES ($1, $2)", [key, resp])
  return resp
}
Walk away with this
  • 01 · Your write endpoints will receive duplicate requests: clients must retry on timeout because a network call’s third outcome is “no answer” (Module 1). Design for duplicates as the normal case.
  • 02 · Make non-idempotent writes idempotent with a client-supplied idempotency key, and claim the key atomically before the side effect (UNIQUE-constraint insert as the lock), storing the response in the same transaction.
  • 03 · Exactly-once delivery is impossible; exactly-once effect is the Effectively-Once Triangle: idempotent producer + transactional outbox + idempotent consumer. Remove any vertex and duplicates leak.
  • 04 · The transactional outbox kills the dual-write problem: write the event in the same transaction as the side effect, publish it from there. Delivery stays at-least-once; the deduping consumer makes the effect once.
  • 05 · When anyone says “exactly-once,” translate it to “at-least-once delivery + idempotent processing” and find the dedup. If you can’t find it, it isn’t exactly-once.