← all lessons
System Design · Consistency & Replication ·

Session Tokens & Sticky Routing: Enforcing Read-Your-Writes Across Stateless Servers

---

- Core concept — In a distributed system with many read replicas (copies of the database serving reads), a user who just wrote data might have their next request routed to a replica that hasn't received that write yet — and they'll see stale data. One practical fix is to embed a write token (a small piece of metadata recording what write the client just performed) into the client's session, then use that token at read time to route the request to a replica that is provably caught up. This keeps your servers stateless (they hold no per-user memory between requests) while still giving each user the illusion of seeing their own writes. The key insight is that the client carries the consistency burden, not the servers.

flowchart LR
    C["Client"] -->|"write"| PRI["Primary DB"]
    PRI -->|"returns write token\n(e.g. LSN 4821)"| C
    C -->|"read + token 4821"| R["Read Router"]
    R -->|"pick replica\n≥ LSN 4821"| REP["Replica (LSN 4830)"]
    REP -->|"fresh result"| C

- Concrete real-world example — PostgreSQL (a popular open-source relational database) exposes a value called the LSN (Log Sequence Number — a monotonically increasing ID for every write). When your application writes a row, the primary returns its current LSN. Your load balancer (the machine that routes requests across replicas) can compare that LSN against each replica's reported replication lag. It sends the read only to a replica whose LSN is ≥ the token. YouTube's comment system reportedly used a similar approach: after posting a comment, the client was pinned to a region that was guaranteed to have processed that write.

- One trade-off / gotcha — This scheme silently degrades when replication lag spikes. If all replicas are behind the required LSN — say, during a network partition or a bulk import — the router has no good option: it either blocks (waits for a replica to catch up, hurting latency), falls back to the primary (defeating the purpose of replicas), or routes stale anyway (breaking the guarantee). You must define a max-wait timeout and a fallback policy, and communicate clearly to the team which guarantee actually holds under load.

- An interview-style question to ponder — You're designing a social app where users can change their username. After the change, the user is redirected to their profile page. Occasionally they see their old username for a few seconds. You have one primary and five read replicas with average replication lag of 80ms but spikes to 2 seconds. How would you apply write tokens to fix this, and what happens to your five replicas during a 2-second lag spike?

Stuck? Show a hint

Start by asking: who holds the token, and who checks it? Then think about what the router must do when no replica qualifies — you'll find that choice forces you to explicitly trade off latency against the consistency guarantee.

Show answer

After the username write, embed the primary's LSN in the user's session cookie (a small file stored in the browser) and pass it with the profile-page read request; the router picks any replica at or beyond that LSN.

  • The write returns LSN 9201. The session cookie is updated: { min_lsn: 9201 }. On the next request, the router polls each replica's current LSN — a cheap metadata call — and selects one that has caught up.
  • During a normal 80ms lag, all five replicas catch up well within a single page-load round-trip (~200–300ms), so the user never notices any slowdown.
  • During a 2-second spike, zero replicas qualify. You now choose: wait up to, say, 300ms then fall back to the primary (good UX, costs primary read capacity), or return stale data with a "refreshing…" UI hint (cheap, but visible glitch). Most production systems choose the primary fallback with a short timeout — the primary is already handling the write load, and one extra read per affected user is negligible.
  • But why not just always read from the primary? Because that eliminates horizontal read scalability entirely — the entire point of replicas is to absorb read traffic, and on a busy social app reads outnumber writes 100:1 or more. Falling back to the primary is a safety valve, not the default path.
  • Watch out: replica LSN metadata must itself be fresh — if replicas report their LSN lazily (e.g., every 500ms), your router may route to a replica that thinks it's caught up but isn't.