Write-Behind (Write-Back) Caching: Deferring Persistence for Speed
Core concept
In write-behind caching, writes go to the cache immediately and are acknowledged to the client, while the actual persistence to the backing store (database, disk) happens asynchronously in the background. This contrasts with write-through (where cache and DB are updated synchronously on every write) and write-around (where writes bypass the cache entirely). The cache layer batches or queues dirty writes and flushes them to the DB after a configurable delay or when certain conditions are met (buffer full, time elapsed). The result is dramatically lower write latency and reduced DB write pressure, but at the cost of durability — if the cache node crashes before flushing, those writes are lost.
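The flush conditions described above (buffer full, time elapsed) can be sketched in a few lines of Python. This is a minimal single-threaded illustration, not a production design: the class name `WriteBehindCache`, the `persist_batch` callback, and the `max_dirty`, `max_delay_s`, and `clock` parameters are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, Dict, Optional


class WriteBehindCache:
    """Toy write-behind cache: acknowledge from memory, persist later in batches."""

    def __init__(
        self,
        persist_batch: Callable[[Dict[str, int]], None],  # hypothetical DB writer
        max_dirty: int = 100,        # flush when this many keys are dirty
        max_delay_s: float = 5.0,    # flush when the oldest dirty write is this old
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._persist_batch = persist_batch
        self._max_dirty = max_dirty
        self._max_delay_s = max_delay_s
        self._clock = clock
        self._data: Dict[str, int] = {}
        self._dirty: Dict[str, int] = {}
        self._oldest_dirty_at: Optional[float] = None

    def put(self, key: str, value: int) -> None:
        """Update memory and mark the key dirty; the caller is acknowledged here."""
        self._data[key] = value
        self._dirty[key] = value  # repeated writes to one key coalesce into one entry
        if self._oldest_dirty_at is None:
            self._oldest_dirty_at = self._clock()
        self.maybe_flush()

    def get(self, key: str) -> Optional[int]:
        return self._data.get(key)

    def maybe_flush(self) -> bool:
        """Flush if either condition is met; returns True if a flush happened."""
        if not self._dirty or self._oldest_dirty_at is None:
            return False
        too_many = len(self._dirty) >= self._max_dirty
        too_old = self._clock() - self._oldest_dirty_at >= self._max_delay_s
        if too_many or too_old:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        """Persist all dirty entries as one batch."""
        if not self._dirty:
            return
        self._persist_batch(dict(self._dirty))  # if this raises, entries stay dirty
        self._dirty.clear()
        self._oldest_dirty_at = None
```

In this sketch the time condition is only checked on `put`, so a real deployment would also call `maybe_flush` from a timer or background thread. Everything held in `_dirty` is exactly what a crash would lose.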
Concrete real-world example
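Consider a real-time multiplayer game that updates each player's score every few seconds. With write-through, every score tick becomes a synchronous DB write, which adds up to thousands of writes per second across a large player base. With write-behind, Redis (an in-memory data store) holds the current score, and a background flusher writes the dirty scores to PostgreSQL every 5 seconds or when a match ends. Repeated updates to the same player collapse into one row per flush, so the DB sees far fewer writes (the exact reduction depends on how often each score changes within a flush interval), while players see instant score updates. Linux's page cache works the same way: write() syscalls return once the data is in kernel memory, the kernel writes dirty pages to disk in the background, and fsync() forces the flush immediately.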
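The page-cache behavior mentioned above is easy to see from Python's `os` module, which exposes the same syscalls. The helper name `write_durably` is made up for this illustration; `os.open`, `os.write`, `os.fsync`, and `os.close` are standard library calls.

```python
import os


def write_durably(path: str, payload: bytes) -> None:
    """Write payload to path and wait until the kernel has flushed it to storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # os.write returns once the bytes are copied into kernel buffers;
            # at this point they may exist only in memory (the write-behind state).
            written = os.write(fd, view)
            view = view[written:]
        # os.fsync asks the kernel to flush this file's dirty pages to the
        # storage device and blocks until that is reported complete.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Skipping the `os.fsync` call gives the fast, deferred behavior; keeping it trades latency for durability, which is the same dial a write-behind cache exposes.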
One trade-off / gotcha
The size of the data-loss window trades directly against throughput: the longer you defer flushing, the more writes you batch, and the more you can lose. If the cache node fails between a write and a flush, the data is gone and the client never finds out, because it already received a success response. This makes write-behind inappropriate for financial transactions or any system where loss is unacceptable. To reduce the risk, pair it with a Write-Ahead Log (WAL) in the cache tier or replication of the dirty queue, or state the bounded loss explicitly in your SLA.
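One of the mitigations named above, a write-ahead log in the cache tier, might look roughly like the following. This is a simplified sketch under stated assumptions: the class `WalBackedBuffer`, the JSON-lines record format, and the `checkpoint` method are all invented for illustration, and the code ignores concurrency and log rotation.

```python
import json
import os
from typing import Callable, Dict


class WalBackedBuffer:
    """Dirty-write buffer that logs each write to disk before acknowledging it."""

    def __init__(self, wal_path: str) -> None:
        self._wal_path = wal_path
        # On startup, rebuild any dirty entries that were never checkpointed.
        self.dirty: Dict[str, int] = self._replay()
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self) -> Dict[str, int]:
        dirty: Dict[str, int] = {}
        if os.path.exists(self._wal_path):
            with open(self._wal_path, encoding="utf-8") as f:
                for line in f:
                    try:
                        record = json.loads(line)
                    except json.JSONDecodeError:
                        # Stop at a torn record left by a crash mid-append.
                        # (A real WAL would also truncate the torn tail.)
                        break
                    dirty[record["key"]] = record["value"]
        return dirty

    def put(self, key: str, value: int) -> None:
        self._wal.write(json.dumps({"key": key, "value": value}) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())  # the record is on disk before we acknowledge
        self.dirty[key] = value

    def checkpoint(self, persist_batch: Callable[[Dict[str, int]], None]) -> None:
        """Flush dirty entries to the backing store, then truncate the log."""
        persist_batch(dict(self.dirty))
        self.dirty.clear()
        self._wal.close()
        self._wal = open(self._wal_path, "w", encoding="utf-8")  # "w" truncates

    def close(self) -> None:
        self._wal.close()
```

The `fsync` on every `put` is not free, so this gives back some of the latency win. The bet is that a sequential log append is usually cheaper than a full database write. If a crash lands between `persist_batch` and the truncation, replay re-applies the same key/value records, which is harmless as long as the backing-store writes are idempotent upserts.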
An interview-style question to ponder
You're designing a ride-sharing app where driver location is updated every 2 seconds from millions of drivers. Would you use write-through, write-behind, or write-around for persisting these locations — and what failure scenarios must your choice explicitly handle?
Hint
Don't start from the three cache modes — start from the data itself. Ask how durable a single location update really needs to be, and how quickly the next update (2 seconds later) makes the previous one worthless. The more disposable the data, the more aggressively you can defer persistence for speed.
Answer
Use write-behind: writes hit an in-memory store like Redis and flush to the database asynchronously, because driver location is high-volume, ephemeral, and loss-tolerant.
- Why not write-through: with every driver reporting once every 2 seconds, each million drivers generates roughly 500,000 synchronous DB writes per second, all for data that is already stale within 2 seconds. The database becomes a pointless bottleneck for values nobody needs durably.
- Why not write-around: the matching and ETA reads need the latest location served straight from cache, but write-around skips the cache on writes, so those reads would miss or go stale.
- Failure scenario you must handle: a cache node crash silently loses the last few seconds of positions, so you lean on drivers continuously re-reporting to self-heal, and you partition the keyspace (e.g. by geohash) so one node failure blanks one region, not the whole map.
- But why is losing data acceptable here? Because a dropped 2-second update is harmless the instant the next one arrives — the data is disposable by nature, unlike an order or a payment.
- Watch out: never treat an unflushed in-memory location as a system of record — do not bill, dispatch, or settle disputes off a value that may vanish on crash.
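The partitioning and self-healing ideas in the answer can be sketched as follows. Everything here is an illustrative assumption: a coarse latitude/longitude grid stands in for a real geohash, the shards are plain in-process dicts rather than separate cache nodes, and the names `ShardedLocationCache`, `shard_for`, `crash_shard`, and the 6-second `max_age_s` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Location:
    lat: float
    lon: float
    reported_at: float  # seconds, on whatever clock the caller uses


def grid_cell(lat: float, lon: float, cell_deg: float = 0.5) -> Tuple[int, int]:
    """Coarse grid cell; a simplified stand-in for a geohash prefix."""
    return (int(lat // cell_deg), int(lon // cell_deg))


def shard_for(lat: float, lon: float, num_shards: int) -> int:
    """Map a position to a shard so that one shard covers whole regions."""
    row, col = grid_cell(lat, lon)
    return (row * 31 + col) % num_shards


class ShardedLocationCache:
    def __init__(self, num_shards: int, max_age_s: float = 6.0) -> None:
        self._shards: List[Dict[str, Location]] = [{} for _ in range(num_shards)]
        self._home: Dict[str, int] = {}  # driver_id -> shard holding the latest fix
        self._max_age_s = max_age_s

    def report(self, driver_id: str, lat: float, lon: float, now: float) -> None:
        """Latest report wins; a driver who crosses regions moves shards."""
        shard = shard_for(lat, lon, len(self._shards))
        old = self._home.get(driver_id)
        if old is not None and old != shard:
            self._shards[old].pop(driver_id, None)
        self._shards[shard][driver_id] = Location(lat, lon, now)
        self._home[driver_id] = shard

    def lookup(self, driver_id: str, now: float) -> Optional[Location]:
        """Return the latest fix, or None if it is missing or too old to trust."""
        shard = self._home.get(driver_id)
        if shard is None:
            return None
        loc = self._shards[shard].get(driver_id)
        if loc is None or now - loc.reported_at > self._max_age_s:
            return None
        return loc

    def crash_shard(self, shard: int) -> None:
        """Simulate a node failure: every position held by that shard is lost."""
        self._shards[shard].clear()
```

After `crash_shard`, lookups for drivers in the affected region return `None` until their next `report` arrives, while drivers on other shards are unaffected. Returning `None` for stale or missing fixes, rather than a last-known value, is one way to keep callers from treating the cache as a system of record.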