Storage

Distributed Key-Value Store (Dynamo-style)

Highly available, partition-tolerant KV store with tunable consistency.

Scale to anchor on

Petabytes of data, single-digit-ms read/write at p99, 5-nines availability, multi-region.

Requirements

Functional

Get/put/delete by key.
Tunable read and write quorums.
Cross-region replication.
Conditional writes.

Non-functional

Availability over strict consistency.
Smooth scaling by adding nodes without downtime.
Predictable tail latency.

High-level architecture

Consistent hashing assigns keys to virtual nodes. Each key is replicated to N successor nodes. Reads and writes use quorum (R, W) tunable per request. Anti-entropy (Merkle trees), hinted handoff, and read repair maintain replicas.

Components

Coordinator nodes

Route requests to replica nodes based on the ring.

Storage nodes

Persist data; participate in quorum and replication.

Anti-entropy service

Background reconciliation between replicas.

Hinted handoff store

Holds writes destined for a temporarily-unreachable node.

Key decisions

Consistent hashing with virtual nodes.

Smooths load distribution and limits the keys moved on membership change to ~1/N.

Quorum (R + W > N) for strong reads.

Allows clients to trade availability for consistency per request.

Vector clocks or last-write-wins for conflict resolution.

Concurrent writes happen under partition; the conflict-resolution policy must be explicit.

Hinted handoff for temporary failures.

Allows writes to continue during transient node loss without losing data.

Pitfalls

Hot keys not addressed — consistent hashing doesn't solve skew.
Last-write-wins silently dropping concurrent updates.
Ignoring read-repair load on the read path.
No back-pressure on anti-entropy traffic during incidents.

Follow-up questions

How do you handle a hot key whose throughput exceeds a single node?
What happens when a region partitions?
How does compaction work and what's its impact on tail latency?

Related patterns

consistent-hashing leader-election caching