Real-time Collaborative Document Editor
Multiple users editing the same document with low latency and consistent convergence.
Scale to anchor on
Tens of millions of concurrent documents, hundreds of editors per active document at peak, sub-100 ms sync between clients.
Requirements
Functional
- Concurrent editing with eventual convergence.
- Cursor / selection presence visible to other editors.
- Offline edits sync on reconnect.
- Version history.
Non-functional
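- Low latency: edits reach other clients in under 100 ms, per the scale target above.
- Consistent convergence: all clients reach the same state once they have received the same set of operations.
- Bandwidth-efficient on mobile networks.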
High-level architecture
Each document is hosted on a leader node that serializes operations. Clients send operations (OT) or merge updates (CRDT) to the leader, which assigns a total order and broadcasts the ordered sequence to all subscribers. Periodic snapshots compact the history so it stays bounded.
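A minimal in-memory sketch of the leader's role, assuming one process per document and callbacks standing in for the pub/sub layer. The names `DocLeader`, `SequencedOp`, `submit`, and `ops_since` are illustrative, not part of any specific system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SequencedOp:
    seq: int          # position in the leader's total order, starting at 1
    client_id: str
    payload: str      # opaque here; an OT operation or a CRDT update in practice


@dataclass
class DocLeader:
    """Hypothetical single ordering authority for one document."""
    doc_id: str
    log: List[SequencedOp] = field(default_factory=list)
    subscribers: Dict[str, Callable[[SequencedOp], None]] = field(default_factory=dict)

    def subscribe(self, client_id: str, on_op: Callable[[SequencedOp], None]) -> None:
        self.subscribers[client_id] = on_op

    def submit(self, client_id: str, payload: str) -> SequencedOp:
        # The leader alone assigns sequence numbers, so every subscriber
        # observes the same order.
        op = SequencedOp(seq=len(self.log) + 1, client_id=client_id, payload=payload)
        self.log.append(op)
        for deliver in self.subscribers.values():
            deliver(op)
        return op

    def ops_since(self, seq: int) -> List[SequencedOp]:
        # Catch-up path for a client that last saw sequence number `seq`.
        return [op for op in self.log if op.seq > seq]
```

In a real deployment the log would be persisted before broadcast and the fan-out would go through the pub/sub component rather than direct callbacks.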
Components
Doc leader
Single source of truth for operation order per document.
Pub/sub fan-out
Distributes ordered operations to subscribed clients.
Snapshot store
Periodic compressed state to bound history size.
Presence service
Ephemeral cursor/selection state with TTL.
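One way to picture TTL-based presence is a heartbeat map whose entries expire unless refreshed. This is a sketch under assumptions: `PresenceStore`, the 10-second default TTL, and the injectable clock are all hypothetical, and a production system would more likely use a cache with native key expiry.

```python
import time
from typing import Callable, Dict, Tuple


class PresenceStore:
    """Hypothetical in-memory presence map; nothing here is persisted."""

    def __init__(self, ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # client_id -> (expiry time, cursor/selection state)
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def heartbeat(self, client_id: str, cursor: dict) -> None:
        # Each cursor update doubles as a liveness signal and resets the TTL.
        self._entries[client_id] = (self._clock() + self._ttl, cursor)

    def active(self) -> Dict[str, dict]:
        # Drop entries whose TTL has elapsed, then report the rest.
        now = self._clock()
        self._entries = {cid: e for cid, e in self._entries.items() if e[0] > now}
        return {cid: cursor for cid, (_, cursor) in self._entries.items()}
```

A client that disconnects without saying goodbye simply stops sending heartbeats, and its cursor disappears after the TTL with no cleanup protocol.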
Key decisions
Leader per document.
A single ordering authority removes disagreement about operation order, a common source of convergence bugs, and simplifies reasoning.
OT vs CRDT.
OT is simpler to reason about with a server in the loop; CRDT shines for peer-to-peer or offline-heavy use cases.
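To make the CRDT side concrete, here is a deliberately simplified insert-only sequence CRDT in which each character carries a sortable position key and merging two replicas is a set union. It is a sketch, not a production design: deletes (tombstones) are omitted, the fractional keys grow without bound, and two concurrent inserts at the same spot share a key and are ordered only by client id. All names are illustrative.

```python
from fractions import Fraction
from typing import FrozenSet, Tuple

# (position key, client id, character); tuples sort by key, then client id.
Element = Tuple[Fraction, str, str]
State = FrozenSet[Element]


def render(state: State) -> str:
    return "".join(char for _, _, char in sorted(state))


def insert(state: State, index: int, client_id: str, char: str) -> State:
    # Pick a key strictly between the neighbours at `index`, using 0 and 1
    # as virtual boundaries of the document.
    ordered = sorted(state)
    left = ordered[index - 1][0] if index > 0 else Fraction(0)
    right = ordered[index][0] if index < len(ordered) else Fraction(1)
    return state | {((left + right) / 2, client_id, char)}


def merge(a: State, b: State) -> State:
    # Set union is commutative, associative, and idempotent, so replicas
    # converge regardless of the order in which they exchange state.
    return a | b


base = insert(insert(frozenset(), 0, "a", "H"), 1, "a", "i")   # "Hi"
replica_a = insert(base, 2, "a", "!")                          # "Hi!"
replica_b = insert(base, 0, "b", ">")                          # ">Hi"
print(render(merge(replica_a, replica_b)))                     # ">Hi!"
```

Because `merge` needs no ordering authority, the same function works peer to peer or after a long offline period, which is the property the comparison above relies on.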
Periodic snapshots.
Without snapshots, the operation log grows without bound, and a late joiner must replay the entire log to cold-start.
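A rough sketch of snapshot-plus-tail storage, assuming insert-only operations and a snapshot taken every fixed number of operations. `DocHistory` and `snapshot_every` are hypothetical names, and discarding the compacted operations is a simplification: a system that offers version history would likely archive them or retain older snapshots.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Op = Tuple[int, int, str]  # (sequence number, position, inserted text)


@dataclass
class DocHistory:
    snapshot_text: str = ""       # document state as of snapshot_seq
    snapshot_seq: int = 0
    tail: List[Op] = field(default_factory=list)  # ops after the snapshot
    snapshot_every: int = 100

    def append(self, pos: int, text: str) -> int:
        seq = self.snapshot_seq + len(self.tail) + 1
        self.tail.append((seq, pos, text))
        if len(self.tail) >= self.snapshot_every:
            self.compact()
        return seq

    def compact(self) -> None:
        # Fold the tail into the snapshot so the replay cost stays bounded.
        self.snapshot_text = self.load()
        if self.tail:
            self.snapshot_seq = self.tail[-1][0]
        self.tail = []

    def load(self) -> str:
        # A late joiner replays at most `snapshot_every - 1` operations.
        text = self.snapshot_text
        for _, pos, inserted in self.tail:
            text = text[:pos] + inserted + text[pos:]
        return text
```

The snapshot interval trades write amplification against cold-start cost: a smaller interval means cheaper loads but more frequent snapshot writes.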
Presence separated from edits.
Presence is ephemeral and updates frequently, while edits must be durable; mixing them complicates both.
Pitfalls
- No single leader: replicas disagree on the order of concurrent operations.
- Operation log without compaction.
- Treating presence as durable.
- Forgetting the offline-then-reconnect scenario.
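The offline-then-reconnect case can be sketched on the OT path as a rebase: the client keeps the operations it made while offline, fetches what the leader sequenced in the meantime, and transforms its pending operations over them before submitting. The sketch below assumes insert-only operations and a client-id tie-break for equal positions; `Insert`, `transform`, and `rebase` are illustrative names, and a real editor would also need to handle deletes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    client_id: str


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it applies to a document that already contains `against`."""
    shifted = against.pos < op.pos or (
        against.pos == op.pos and against.client_id < op.client_id
    )
    return Insert(op.pos + len(against.text), op.text, op.client_id) if shifted else op


def rebase(pending: List[Insert], missed: List[Insert]) -> List[Insert]:
    """Transform offline ops over the server ops sequenced while offline."""
    rebased = []
    for op in pending:
        carried = []
        for server_op in missed:
            # Move the server op past this pending op for the next pending op,
            # and move the pending op past the server op.
            carried.append(transform(server_op, op))
            op = transform(op, server_op)
        rebased.append(op)
        missed = carried
    return rebased


# The client went offline at "abc" and typed "X" at 0, then "Z" at the end.
pending = [Insert(0, "X", "b"), Insert(4, "Z", "b")]
# Meanwhile the leader sequenced an append of "Y" from another client.
missed = [Insert(3, "Y", "a")]

server_doc = apply("abc", missed[0])
for op in rebase(pending, missed):
    server_doc = apply(server_doc, op)
print(server_doc)  # XabcYZ
```

The client needs to remember the last sequence number it saw so the leader can return exactly the operations it missed; if that point has already been compacted into a snapshot, the client may have to reload the snapshot instead.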
Follow-up questions
- How do you handle a 1000-editor document?
- How do offline edits merge on reconnect?
- What's the version history model?
- How does the leader fail over?