Real-time Messaging Delivery

Reliably deliver chat messages in order, online and offline, with read receipts and presence.

Scale to anchor on

Billions of users; p50 delivery < 100 ms when both parties are online; durable offline delivery; billions of messages/day.

Requirements

Functional

  • 1:1 and group chat with in-order delivery per conversation.
  • Delivery and read receipts.
  • Presence (online / typing).
  • Offline durability and history sync.

Non-functional

  • End-to-end encryption where required.
  • Mobile-battery-friendly transport.
  • Survives regional failover.

High-level architecture

Each device holds a WebSocket or other persistent connection, anchored to a chat gateway. Conversation state lives in a sharded conversation store. A delivery service routes messages and emits receipts. Push notifications wake the client when it is backgrounded or offline.
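
As a rough illustration of this flow, the sketch below persists a message to the shard that owns its conversation, then either hands it to a live connection or falls back to a wake-up push. It is a toy in-memory model, not a real gateway: ChatBackend, shard_for, NUM_SHARDS, and the hash-modulo shard placement are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(conversation_id: str) -> int:
    """Map a conversation to a shard with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


@dataclass
class ChatBackend:
    # shard index -> conversation id -> ordered list of message bodies
    shards: dict = field(default_factory=lambda: {i: {} for i in range(NUM_SHARDS)})
    # device id -> list standing in for a live persistent connection
    connections: dict = field(default_factory=dict)
    # device ids that were sent a wake-up push instead
    pushed: list = field(default_factory=list)

    def send(self, conversation_id: str, body: str, recipient_devices: list) -> None:
        # 1. Persist first so the message survives offline recipients.
        log = self.shards[shard_for(conversation_id)].setdefault(conversation_id, [])
        log.append(body)
        # 2. Deliver over the live connection if there is one, otherwise push.
        for device in recipient_devices:
            if device in self.connections:
                self.connections[device].append((conversation_id, body))
            else:
                self.pushed.append(device)
```

Persisting before delivering is the design choice worth noting here: a device that only received a push can still fetch the message from the conversation log once it reconnects.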

Components

Chat gateway
Terminates persistent connections; routes each user's messages to the shard that owns the conversation.
Conversation store
Per-conversation log of messages; sharded by conversation id for ordering.
Delivery service
Tracks which devices have received and read each message.
Presence service
Pub-sub for online status with TTLs to handle disconnects.
Push gateway
APNs / FCM bridge for offline notification.
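
One way to get TTL-based presence is to store only the time of each user's last heartbeat and treat anything older than the TTL as offline. The sketch below assumes a hypothetical PresenceTracker class, a 30-second TTL, and an injectable clock; it leaves out the pub-sub fan-out entirely.

```python
import time


class PresenceTracker:
    """Ephemeral online status: an entry counts only while its heartbeat is fresh."""

    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._last_seen = {}  # user id -> time of last heartbeat

    def heartbeat(self, user_id: str) -> None:
        self._last_seen[user_id] = self.clock()

    def is_online(self, user_id: str) -> bool:
        seen = self._last_seen.get(user_id)
        if seen is None:
            return False
        if self.clock() - seen > self.ttl_seconds:
            # Lazily drop expired entries; no explicit "disconnect" event is required.
            del self._last_seen[user_id]
            return False
        return True
```

Because status simply expires, a client that drops off the network without closing its connection cleanly still goes offline after one TTL.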

Key decisions

Shard by conversation, not user.
A group message has many readers. Sharding by user turns each send into one write per recipient; sharding by conversation keeps a single ordered log on one shard, which gives ordering and locality.
Server-assigned message IDs for ordering.
Clocks disagree across clients; the server is the single ordering authority per conversation.
At-least-once delivery with client-side dedup.
Exactly-once delivery is not realistic at this scale; the server redelivers until a message is acknowledged, and clients drop duplicates by server-assigned ID.
Separate presence from messaging.
Presence is high-churn ephemeral state — keeping it out of the messaging hot path simplifies both.
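
The ordering and dedup decisions above can be sketched together: the server stamps each message with a per-conversation sequence number, and the client ignores any sequence number it has already seen. ConversationSequencer and ClientInbox are hypothetical names, and a real server would persist the counter on the conversation's shard rather than hold it in memory.

```python
from collections import defaultdict


class ConversationSequencer:
    """Server side: hands out an increasing sequence number per conversation."""

    def __init__(self):
        self._next_seq = defaultdict(int)

    def assign(self, conversation_id: str, body: str) -> dict:
        self._next_seq[conversation_id] += 1
        return {
            "conversation_id": conversation_id,
            "seq": self._next_seq[conversation_id],
            "body": body,
        }


class ClientInbox:
    """Client side: tolerates redelivery by ignoring (conversation, seq) pairs already seen."""

    def __init__(self):
        self._seen = set()
        self._messages = defaultdict(list)

    def receive(self, message: dict) -> bool:
        key = (message["conversation_id"], message["seq"])
        if key in self._seen:
            return False  # duplicate from an at-least-once retry
        self._seen.add(key)
        self._messages[message["conversation_id"]].append(message)
        return True

    def ordered(self, conversation_id: str) -> list:
        # Sort by server sequence number, never by client timestamps.
        msgs = sorted(self._messages[conversation_id], key=lambda m: m["seq"])
        return [m["body"] for m in msgs]
```

In this sketch a message that arrives twice, or out of order, still ends up exactly once and in server order in the client's view.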

Pitfalls

  • Designing presence as durable state: it must be ephemeral.
  • Relying on client-generated message IDs for ordering: clock skew breaks it.
  • Using push notifications as the primary delivery path: a battery and reliability disaster.
  • Forgetting the multi-device sync story.
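
For the multi-device point, one possible approach (an assumption, not something this design prescribes) is a per-device cursor: the highest server sequence number each device has acknowledged in each conversation. The sketch below uses a hypothetical DeviceCursors class and assumes messages carry a "seq" field and that devices acknowledge contiguously.

```python
class DeviceCursors:
    """Tracks, per device, the highest sequence number it has acknowledged."""

    def __init__(self):
        self._acked = {}  # (device id, conversation id) -> highest acknowledged seq

    def ack(self, device_id: str, conversation_id: str, seq: int) -> None:
        key = (device_id, conversation_id)
        # Acks may be repeated or arrive out of order; the cursor only moves forward.
        self._acked[key] = max(self._acked.get(key, 0), seq)

    def missing(self, device_id: str, conversation_id: str, log: list) -> list:
        """Return the messages this device still needs from the ordered log."""
        cursor = self._acked.get((device_id, conversation_id), 0)
        return [m for m in log if m["seq"] > cursor]
```

A device with no cursor yet, such as a fresh install, is simply treated as having acknowledged nothing, so it receives the whole log.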

Follow-up questions

  • How do you handle a flaky mobile connection?
  • How do read receipts behave in a 500-member group?
  • How do you support history sync after a fresh device install?
  • How does this work with end-to-end encryption?
