Storage
Distributed Key-Value Store (Dynamo-style)
Highly available, partition-tolerant KV store with tunable consistency.
Scale to anchor on
Petabytes of data, single-digit-ms read/write at p99, 5-nines availability, multi-region.
Requirements
Functional
- Get/put/delete by key.
- Tunable read and write quorums.
- Cross-region replication.
- Conditional writes.
Non-functional
- Availability over strict consistency.
- Smooth scaling by adding nodes without downtime.
- Predictable tail latency.
High-level architecture
Consistent hashing assigns keys to virtual nodes. Each key is replicated to N successor nodes. Reads and writes use quorum (R, W) tunable per request. Anti-entropy (Merkle trees), hinted handoff, and read repair maintain replicas.
Components
Coordinator nodes
Route requests to replica nodes based on the ring.
Storage nodes
Persist data; participate in quorum and replication.
Anti-entropy service
Background reconciliation between replicas.
Hinted handoff store
Holds writes destined for a temporarily-unreachable node.
Key decisions
Consistent hashing with virtual nodes.
Smooths load distribution and limits the keys moved on membership change to ~1/N.
Quorum (R + W > N) for strong reads.
Allows clients to trade availability for consistency per request.
Vector clocks or last-write-wins for conflict resolution.
Concurrent writes happen under partition; the conflict-resolution policy must be explicit.
Hinted handoff for temporary failures.
Allows writes to continue during transient node loss without losing data.
Pitfalls
- Hot keys not addressed — consistent hashing doesn't solve skew.
- Last-write-wins silently dropping concurrent updates.
- Ignoring read-repair load on the read path.
- No back-pressure on anti-entropy traffic during incidents.
Follow-up questions
- How do you handle a hot key whose throughput exceeds a single node?
- What happens when a region partitions?
- How does compaction work and what's its impact on tail latency?