Real-time Collaborative Document Editor

Multiple users editing the same document with low latency and consistent convergence.

Scale to anchor on

Tens of millions of concurrent documents, hundreds of editors per active document at peak, sub-100 ms sync between clients.
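Hundreds of editors per document is demanding mainly because of broadcast cost. The back-of-the-envelope sketch below rests on two illustrative assumptions, neither of which comes from this design: every accepted operation is delivered to every other editor, and each editor produces a hypothetical 2 operations per second.

```python
def fanout_msgs_per_sec(editors: int, ops_per_editor_per_sec: float) -> float:
    # Each editor's ops are delivered to the other (editors - 1) subscribers,
    # so total deliveries grow roughly with the square of the editor count.
    return editors * ops_per_editor_per_sec * (editors - 1)


print(fanout_msgs_per_sec(300, 2))   # 300 editors at the assumed rate
print(fanout_msgs_per_sec(1000, 2))  # 1000 editors at the assumed rate
```

Under those assumptions, 300 editors generate 179,400 deliveries per second and 1,000 editors about 2 million. Batching several ops into one message lowers the message count, but the cost still grows quadratically with the number of editors.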

Requirements

Functional

  • Concurrent editing with eventual convergence.
  • Cursor / selection presence visible to other editors.
  • Offline edits sync on reconnect.
  • Version history.

Non-functional

  • Low latency: edits propagate between clients in under 100 ms.
  • Consistent convergence — all clients reach the same state.
  • Bandwidth-efficient on mobile networks.

High-level architecture

Each document is hosted on a single leader node that assigns a total order to incoming operations. Clients send operations (OT) or merge updates (CRDT) to the leader, which broadcasts the ordered sequence to all subscribed clients. Periodic snapshots compact the operation history.
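The leader's ordering loop can be sketched minimally under a character-level OT model, similar to server-centric systems such as Jupiter. In this model, each client sends one operation at a time, tagged with the revision it was based on. The leader transforms that operation against every operation it has accepted since that revision, then appends and broadcasts the result. The op types, the `site` tie-breaker, and the names `DocLeader`, `transform`, and `submit` are hypothetical. The client-side half is omitted: a client must also transform incoming ops against its own unacknowledged op.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    ch: str
    site: int  # hypothetical client id; breaks ties between inserts at the same position


@dataclass(frozen=True)
class Delete:
    pos: int  # deletes the single character at this index


Op = Union[Insert, Delete]


def apply(text: str, op: Op) -> str:
    if isinstance(op, Insert):
        return text[:op.pos] + op.ch + text[op.pos:]
    return text[:op.pos] + text[op.pos + 1:]


def transform(op: Op, prior: Op) -> Optional[Op]:
    """Rewrite `op` so it still means the same thing after `prior` is applied."""
    if isinstance(op, Insert):
        if isinstance(prior, Insert):
            # Earlier position (or lower site id on a tie) goes first.
            goes_after = prior.pos < op.pos or (prior.pos == op.pos and prior.site < op.site)
            return replace(op, pos=op.pos + 1) if goes_after else op
        return replace(op, pos=op.pos - 1) if prior.pos < op.pos else op
    if isinstance(prior, Insert):
        return replace(op, pos=op.pos + 1) if prior.pos <= op.pos else op
    if prior.pos == op.pos:
        return None  # both deleted the same character; drop the duplicate
    return replace(op, pos=op.pos - 1) if prior.pos < op.pos else op


class DocLeader:
    """Single ordering authority for one document (hypothetical sketch)."""

    def __init__(self, text: str = "") -> None:
        self.text = text
        self.history: List[Op] = []  # history[i] produced revision i + 1
        self.subscribers: List[Callable[[int, Op], None]] = []

    def submit(self, base_rev: int, op: Op) -> int:
        """Accept an op generated against `base_rev`; return the resulting revision."""
        current: Optional[Op] = op
        for prior in self.history[base_rev:]:
            current = transform(current, prior)
            if current is None:
                return len(self.history)  # became a no-op; nothing to broadcast
        self.text = apply(self.text, current)
        self.history.append(current)
        rev = len(self.history)
        for deliver in self.subscribers:
            deliver(rev, current)  # pub/sub fan-out of the ordered op
        return rev
```

For example, suppose two clients both at revision 0 of "abc" submit `Insert(1, "X", 1)` and `Delete(2)`. The leader applies the insert first and shifts the delete to position 3, so every client that replays the broadcast sequence ends at "aXb".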

Components

Doc leader
Single source of truth for operation order per document.
Pub/sub fan-out
Distributes ordered operations to subscribed clients.
Snapshot store
Periodic compressed state to bound history size.
Presence service
Ephemeral cursor/selection state with TTL.
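The presence service's TTL behaviour can be sketched as an in-memory map that heartbeats refresh and that is never persisted. This is a minimal sketch that assumes a hypothetical `PresenceStore` with a 30-second default TTL and cursors stored as (anchor, head) offsets. The `now` parameter exists only to make expiry testable.

```python
import time
from typing import Dict, Optional, Tuple


class PresenceStore:
    """Hypothetical in-memory cursor store; entries expire unless refreshed."""

    def __init__(self, ttl_seconds: float = 30.0) -> None:
        self.ttl = ttl_seconds
        # user -> (anchor, head, expires_at); nothing here is written to disk
        self._entries: Dict[str, Tuple[int, int, float]] = {}

    def heartbeat(self, user: str, anchor: int, head: int,
                  now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._entries[user] = (anchor, head, now + self.ttl)

    def active(self, now: Optional[float] = None) -> Dict[str, Tuple[int, int]]:
        now = time.monotonic() if now is None else now
        # Lazily drop expired entries, then report the live cursors.
        self._entries = {u: e for u, e in self._entries.items() if e[2] > now}
        return {u: (a, h) for u, (a, h, _) in self._entries.items()}
```

Because entries simply expire, a client that disconnects without notice disappears from other editors' views after one TTL. Losing the whole store on a restart only means that cursors reappear with the next heartbeat.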

Key decisions

Leader per document.
Single ordering authority prevents convergence bugs and simplifies reasoning.
OT vs CRDT.
OT is simpler to reason about when a central server orders operations; CRDTs shine in peer-to-peer or offline-heavy use cases.
Periodic snapshots.
Without snapshots, the operation log grows without bound, and a late joiner must replay the entire history before it can render the document.
Presence separated from edits.
Presence updates are frequent and disposable, while edits are durable and ordered; mixing them complicates both paths.
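The snapshot decision can be sketched as a log that folds its tail into a snapshot every N operations. A late joiner then replays at most N - 1 ops instead of the full history. This sketch makes several assumptions: insert-only ops stored as (position, text) pairs, a hypothetical `DocLog` class, and a default threshold of 100. If compacted ops also back the version-history feature, a real system would probably archive them rather than discard them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Op = Tuple[int, str]  # hypothetical op format: insert `text` at `pos`


def apply(text: str, op: Op) -> str:
    pos, s = op
    return text[:pos] + s + text[pos:]


@dataclass
class DocLog:
    snapshot_text: str = ""
    snapshot_rev: int = 0
    tail: List[Op] = field(default_factory=list)  # ops after snapshot_rev
    snapshot_every: int = 100  # hypothetical compaction threshold

    @property
    def rev(self) -> int:
        return self.snapshot_rev + len(self.tail)

    def append(self, op: Op) -> None:
        self.tail.append(op)
        if len(self.tail) >= self.snapshot_every:
            self.compact()

    def compact(self) -> None:
        # Fold the tail into a new snapshot and truncate the log.
        text = self.snapshot_text
        for op in self.tail:
            text = apply(text, op)
        self.snapshot_text, self.snapshot_rev, self.tail = text, self.rev, []

    def load_for_late_joiner(self) -> Tuple[str, int]:
        # Cold start: snapshot plus a bounded tail replay.
        text = self.snapshot_text
        for op in self.tail:
            text = apply(text, op)
        return text, self.rev
```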

Pitfalls

  • No single leader: replicas disagree on the order of concurrent operations.
  • Operation log without compaction.
  • Treating presence as durable.
  • Forgetting the offline-then-reconnect scenario.
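One part of the offline-then-reconnect scenario is making the resend safe. A client that lost its acknowledgement will replay ops the leader has already applied. The minimal sketch below assumes hypothetical `Client` and `Leader` classes, per-client sequence numbers, and cumulative acks. Ops are opaque strings here, and the sketch leaves out transforming replayed ops against edits the client missed while offline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Client:
    client_id: str
    next_seq: int = 1
    outbox: List[Tuple[int, str]] = field(default_factory=list)  # unacked (seq, op)

    def edit(self, op: str) -> None:
        # Queue locally whether online or offline; sent as a batch later.
        self.outbox.append((self.next_seq, op))
        self.next_seq += 1

    def on_ack(self, acked_seq: int) -> None:
        self.outbox = [(s, op) for s, op in self.outbox if s > acked_seq]


@dataclass
class Leader:
    applied: List[str] = field(default_factory=list)
    last_seq: Dict[str, int] = field(default_factory=dict)

    def receive(self, client_id: str, batch: List[Tuple[int, str]]) -> int:
        last = self.last_seq.get(client_id, 0)
        for seq, op in batch:
            if seq <= last:
                continue  # duplicate from a resend; already applied
            self.applied.append(op)
            last = seq
        self.last_seq[client_id] = last
        return last  # cumulative ack
```

Tracking only the highest applied sequence number per client is enough because each client resends its outbox in order. Deduplicating by (client id, sequence) makes retries after a reconnect idempotent.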

Follow-up questions

  • How do you handle a 1000-editor document?
  • How do offline edits merge on reconnect?
  • What's the version history model?
  • How does the leader failover?
