Real-time Messaging Delivery
Reliably deliver chat messages in order, online and offline, with read receipts and presence.
Scale to anchor on
Billions of users; p50 delivery < 100 ms when both parties are online; durable offline delivery; billions of messages per day.
Requirements
Functional
- 1:1 and group chat with in-order delivery per conversation.
- Delivery and read receipts.
- Presence (online / typing).
- Offline durability and history sync.
Non-functional
- End-to-end encryption where required.
- Mobile-battery-friendly transport.
- Survives regional failover.
High-level architecture
Each device holds a persistent connection (WebSocket or similar) anchored to a chat gateway. Conversation state lives in a sharded inbox store. A delivery service routes messages and emits receipts. Push notifications wake the client when it is backgrounded or offline.
Components
Chat gateway
Terminates persistent connections; routes each user's traffic to the shard that owns the conversation.
Conversation store
Per-conversation log of messages; sharded by conversation id for ordering.
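A minimal sketch of how a conversation id might be mapped to a shard. The shard count, the function name `shard_for`, and the choice of SHA-256 are illustrative assumptions, not part of the design above; a production system would more likely use consistent hashing or a lookup table so that resharding moves fewer conversations.

```python
import hashlib

NUM_SHARDS = 64  # hypothetical shard count, chosen only for illustration


def shard_for(conversation_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a conversation id to a shard index using a stable hash.

    A stable hash (rather than Python's per-process salted hash()) ensures
    every gateway computes the same shard for the same conversation.
    """
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo shards.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Because every message in a conversation lands on the same shard, that shard can keep a single ordered log for it.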
Delivery service
Tracks which devices have received and read each message.
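One way to picture this bookkeeping is a per-message map from device to receipt state. The sketch below is an in-memory illustration only; `ReceiptTracker`, `ReceiptState`, and the method names are hypothetical, and a real service would persist this state and handle concurrency.

```python
from collections import defaultdict
from enum import IntEnum


class ReceiptState(IntEnum):
    SENT = 0
    DELIVERED = 1
    READ = 2


class ReceiptTracker:
    """Tracks, per message, how far each recipient device has progressed."""

    def __init__(self) -> None:
        # message_id -> {device_id: ReceiptState}
        self._states = defaultdict(dict)

    def register(self, message_id: int, device_ids: list) -> None:
        """Record the devices that should receive this message."""
        for device_id in device_ids:
            self._states[message_id].setdefault(device_id, ReceiptState.SENT)

    def advance(self, message_id: int, device_id: str, state: ReceiptState) -> bool:
        """Move a device forward; ignore duplicate or out-of-order acks.

        Returns True only when the state actually changed, so the caller
        knows whether a receipt should be emitted to the sender.
        """
        current = self._states[message_id].get(device_id, ReceiptState.SENT)
        if state <= current:
            return False
        self._states[message_id][device_id] = state
        return True

    def pending_devices(self, message_id: int) -> list:
        """Devices that have not yet acknowledged delivery."""
        return sorted(
            device_id
            for device_id, state in self._states[message_id].items()
            if state < ReceiptState.DELIVERED
        )
```

Making `advance` monotonic means a late or repeated "delivered" ack cannot undo a "read" state, which matters when acks are retried.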
Presence service
Pub-sub for online status with TTLs to handle disconnects.
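The TTL idea can be sketched as follows: each heartbeat pushes an expiry time forward, and a user whose expiry has passed is treated as offline even if no disconnect event ever arrived. The class name `PresenceTracker`, the 30-second default, and the injectable clock are assumptions made for illustration; the pub-sub fan-out is omitted.

```python
import time


class PresenceTracker:
    """Ephemeral online status that expires unless refreshed by heartbeats."""

    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._expires_at = {}  # user_id -> expiry timestamp

    def heartbeat(self, user_id: str) -> None:
        """Called whenever the gateway hears from the user's connection."""
        self._expires_at[user_id] = self._clock() + self._ttl

    def is_online(self, user_id: str) -> bool:
        """True only if a heartbeat arrived within the last TTL window."""
        expires = self._expires_at.get(user_id)
        if expires is None:
            return False
        if self._clock() >= expires:
            # Lazily drop the stale entry; nothing durable needs cleanup.
            del self._expires_at[user_id]
            return False
        return True
```

A silently dropped connection therefore needs no explicit cleanup: the entry simply ages out.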
Push gateway
APNs / FCM bridge for offline notification.
Key decisions
Shard by conversation, not user.
A group chat has many readers. Sharding by user would fan each write out to every member's shard; sharding by conversation keeps one ordered log per conversation and gives locality.
Server-assigned message IDs for ordering.
Clocks disagree across clients; the server is the single ordering authority per conversation.
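As a rough illustration, the shard that owns a conversation can hand out a monotonically increasing sequence number at append time and ignore client timestamps for ordering. `ConversationLog`, `StoredMessage`, and the field names below are hypothetical; the sketch is single-process and not thread-safe, whereas a real shard would serialize appends and persist the counter.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredMessage:
    conversation_id: str
    seq: int  # server-assigned, strictly increasing per conversation
    sender_id: str
    body: str


class ConversationLog:
    """Per-conversation append-only log with server-assigned ordering."""

    def __init__(self) -> None:
        self._next_seq = defaultdict(lambda: 1)
        self._messages = defaultdict(list)

    def append(self, conversation_id: str, sender_id: str, body: str) -> StoredMessage:
        # The order of arrival at the owning shard defines the order.
        seq = self._next_seq[conversation_id]
        self._next_seq[conversation_id] = seq + 1
        message = StoredMessage(conversation_id, seq, sender_id, body)
        self._messages[conversation_id].append(message)
        return message

    def read_after(self, conversation_id: str, after_seq: int) -> list:
        """Messages a device is missing, given the last seq it has seen."""
        return [m for m in self._messages[conversation_id] if m.seq > after_seq]
```

The same sequence number doubles as a sync cursor: a reconnecting device reports the last `seq` it saw and fetches everything after it.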
At-least-once delivery with client-side dedup.
Exactly-once is fiction at this scale; clients dedup using server-assigned IDs.
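A minimal sketch of client-side dedup, assuming each message carries a conversation id and a server-assigned sequence number (the `ClientInbox` name and the `(conversation_id, seq)` key are illustrative assumptions). A redelivered message is recognized by its key and dropped before it reaches the UI.

```python
class ClientInbox:
    """Drops redelivered messages using their server-assigned identity."""

    def __init__(self) -> None:
        self._seen = set()  # set of (conversation_id, seq) keys
        self.displayed = []  # message bodies shown to the user, in arrival order

    def receive(self, conversation_id: str, seq: int, body: str) -> bool:
        """Process one delivery attempt; return True if it was new."""
        key = (conversation_id, seq)
        if key in self._seen:
            # Duplicate caused by a server retry; safe to ack again and ignore.
            return False
        self._seen.add(key)
        self.displayed.append(body)
        return True
```

For example, if the server retries because an ack was lost, the second `receive("c1", 1, "hi")` returns `False` and the message is shown only once. An unbounded set is used here for simplicity; a real client could instead keep the highest contiguous sequence number per conversation.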
Separate presence from messaging.
Presence is high-churn ephemeral state — keeping it out of the messaging hot path simplifies both.
Pitfalls
- Designing presence as durable state — it must be ephemeral.
- Client-generated message IDs — clock skew breaks ordering.
- Push notification as the primary delivery — battery and reliability disasters.
- Forgetting the multi-device sync story.
Follow-up questions
- How do you handle a flaky mobile connection?
- How do read receipts behave in a 500-member group?
- How do you support history sync after a fresh device install?
- How does this work with end-to-end encryption?