← all lessons
System Design · Consistency & Replication ·

Quorum Reads & Writes: Tunable Consistency Without a Single Master

Core concept
In a distributed system with N replicas, a quorum requires that reads and writes succeed on a minimum number of nodes before acknowledging the client. The classic rule is W + R > N, which guarantees at least one node overlaps between every write set and every read set — ensuring you always read a value that reflects the latest write. By tuning W and R independently you can trade latency, availability, and consistency against each other. This is the mechanism behind Cassandra's QUORUM, ONE, and ALL consistency levels, and Amazon Dynamo's configurable (N, W, R) knobs.

Diagram

flowchart LR
    C[Client] -->|Write to W=2| A[Replica A]
    C -->|Write to W=2| B[Replica B]
    C -->|Skip| D[Replica C]
    E[Client Read] -->|Read from R=2| A
    E -->|Read from R=2| B
    A -->|Latest version found| E

Concrete real-world example
Cassandra with N=3, W=2, R=2 (classic QUORUM): a write succeeds once 2 of 3 replicas confirm. A subsequent read contacts 2 replicas and takes the value with the highest timestamp. Because W+R=4 > N=3, the read set must include at least one node that saw the write — stale data is mathematically impossible. If you relax to W=1, R=1 (ONE/ONE), you get lowest latency but no overlap guarantee and can easily read stale data.

One trade-off / gotcha
Quorum does not protect you from concurrent conflicting writes by itself. If two clients write simultaneously and both achieve W=2 on overlapping but different replica sets, you get a write conflict — the system must resolve it via last-write-wins (LWW), vector clocks (per-node version counters that detect concurrent writes), or CRDTs (data types that merge conflicting writes automatically). LWW silently discards one write, so "quorum consistency" can still mean data loss in high-concurrency scenarios without an additional conflict resolution strategy.

An interview-style question to ponder
You have a user-profile service on Cassandra (N=3). The team wants the lowest possible read latency for a global leaderboard, and they're willing to tolerate slightly stale reads. What (W, R) values would you choose, and what availability trade-off are you accepting?

Stuck? Show a hint

Write down the rule W + R > N with N = 3, then ask which single variable most directly cuts read latency — and what pushing it to its minimum costs you in the overlap guarantee. The question already grants that some staleness is acceptable, so spend that slack where it buys the most.

Show answer

Choose W=2, R=1: reads hit a single replica for the fastest possible read path, while writes still require a majority.

  • Why R=1 wins here: a read contacts just one replica, so it returns with minimal latency and keeps serving even when other nodes are down — exactly what a read-heavy global leaderboard wants.
  • Why this means "slightly stale": W+R = 2+1 = 3, which only equals N=3 rather than exceeding it, so the write set and read set aren't guaranteed to overlap — the one replica you read might have missed the latest write. The question said that staleness is acceptable.
  • The availability trade you're accepting: W=2 means a write needs two healthy replicas, so writes tolerate one node failure but are slightly less available than reads. Since leaderboard writes are far rarer than reads, you're spending availability where it's cheap — a good asymmetric trade.
  • But why not W=3, R=1? That actually gives guaranteed fresh reads (W+R = 4 > 3), but a write would then need all three replicas, so a single node being down, slow, or restarting blocks every write, and write latency tracks your slowest replica. You'd throw away the fault tolerance that having three replicas was supposed to buy you.
  • Watch out: keep stale windows short with sensible TTL and compaction, and confirm the business genuinely tolerates eventual consistency before shipping this.