Backpressure: Slowing Producers Down Before Your System Falls Over
Core concept
When a producer emits data faster than a consumer can process it, intermediate buffers fill up, memory use balloons, and eventually the system crashes or silently drops messages. Backpressure is the discipline of having the consumer signal its capacity limits upstream so the producer slows down before buffers overflow, rather than letting the problem explode at the buffer. It trades peak throughput for stability: the system accepts work more slowly but degrades predictably instead of falling over. Think of it as a valve on a water pipe: the valve doesn't stop the flow, it regulates it.
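A minimal sketch of the idea in Python, assuming a single producer and a single consumer running as threads in one process. The function name `run_pipeline` and its parameters are hypothetical. The standard library's `queue.Queue` with a `maxsize` blocks the producer on `put` whenever the buffer is full, which is about the simplest form a backpressure signal can take.

```python
import queue
import threading
import time


def run_pipeline(n_items=50, capacity=5, consume_delay=0.001):
    """Push n_items through a bounded buffer; return (processed, max_depth)."""
    buf = queue.Queue(maxsize=capacity)  # bounded buffer between the two sides
    processed = []
    max_depth = 0
    done = object()  # sentinel telling the consumer to stop

    def producer():
        for i in range(n_items):
            # put() blocks while the buffer is full, so the fast producer
            # is forced down to the consumer's pace.
            buf.put(i)
        buf.put(done)

    def consumer():
        nonlocal max_depth
        while True:
            max_depth = max(max_depth, buf.qsize())
            item = buf.get()
            if item is done:
                break
            time.sleep(consume_delay)  # simulate slow processing
            processed.append(item)

    threads = [
        threading.Thread(target=producer),
        threading.Thread(target=consumer),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed, max_depth


if __name__ == "__main__":
    items, depth = run_pipeline()
    print(len(items), "items processed; deepest buffer seen:", depth)
```

Blocking is only one possible response to a full buffer. Calling `put` with a timeout, or `put_nowait`, would surface the condition as a `queue.Full` exception instead, which the producer could turn into a drop or an error.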
Diagram
flowchart LR
P["Producer\n(fast)"] -->|"sends requests"| Q["Queue / Buffer"]
Q -->|"feeds work"| C["Consumer\n(slow)"]
C -->|"backpressure signal\n(slow down!)"| P
Q -->|"buffer full"| D["Drop / Block / Error"]
Concrete real-world example
Imagine a payment service that receives 50,000 events per second from a message bus (a system for passing data between services) but can write only 10,000 per second to its database. Without backpressure, the in-memory queue absorbs the difference: 40,000 messages every second. After 10 seconds, 400,000 messages are sitting in RAM. Eventually the service runs out of memory and crashes, losing everything in the queue. With backpressure, the consumer tells the message bus "I can only accept 10,000/sec"; the message bus in turn tells upstream API servers to pause or reject new requests. The queue stays bounded. TCP (the internet's core data-transfer protocol) does the same thing with its "receive window" field: a number in every segment header telling the sender how many more bytes the receiver can currently accept.
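The arithmetic in this example can be checked with a small simulation. This is a sketch under simplifying assumptions: rates are constant, time advances in one-second steps, and the 20,000-message cap is an illustrative number that does not come from the scenario. The `simulate` function is hypothetical.

```python
def simulate(in_rate, out_rate, seconds, capacity=None):
    """Return (queue_depth, pushed_back) after the given number of seconds.

    capacity=None models an unbounded in-memory queue.
    With a capacity, arrivals that do not fit are pushed back upstream
    (paused or rejected) instead of being queued.
    """
    depth = 0
    pushed_back = 0
    for _ in range(seconds):
        if capacity is None:
            accepted = in_rate
        else:
            # Room available this second: free slots plus what gets drained.
            room = capacity - depth + out_rate
            accepted = min(in_rate, max(room, 0))
        pushed_back += in_rate - accepted
        depth = max(depth + accepted - out_rate, 0)
    return depth, pushed_back


if __name__ == "__main__":
    # Unbounded queue: grows by 40,000 per second.
    print(simulate(50_000, 10_000, 10))                   # (400000, 0)
    # Bounded queue: depth is capped, the excess is pushed upstream.
    print(simulate(50_000, 10_000, 10, capacity=20_000))  # (20000, 380000)
```

In both runs the same 500,000 events arrive and 100,000 are written. The difference is where the other 400,000 end up: in RAM, or back with a producer that has been told to wait.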
One trade-off / gotcha
Backpressure propagates the problem upstream; it doesn't solve it. If the root producer can't actually slow down (say, it's an external payment network sending live transactions), then backpressure just shifts where things break: requests pile up or get rejected at the edge instead of internally. You must combine backpressure with load shedding (intentionally dropping low-priority work) or elastic scaling of the consumer. Backpressure alone is not a capacity solution; it's a safety valve.
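One way load shedding might look in code is a bounded buffer that, once full, keeps the more important work and drops the rest. The sketch below is illustrative only: the class name `SheddingBuffer`, the convention that a higher number means higher priority, and the example item names are all assumptions.

```python
import heapq
import itertools


class SheddingBuffer:
    """Bounded buffer that sheds the least important item when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []                # min-heap: least important entry first
        self._seq = itertools.count()  # tie-breaker so items are never compared
        self.shed = 0                  # how many items have been dropped

    def offer(self, priority, item):
        """Try to add an item; return True if it was kept, False if shed."""
        entry = (priority, next(self._seq), item)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        # Buffer is full, so exactly one item is dropped either way.
        self.shed += 1
        if priority > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the least important
            return True
        return False  # the new item is the least important: drop it

    def contents(self):
        """Items currently held, most important first."""
        return [item for _, _, item in sorted(self._heap, reverse=True)]


if __name__ == "__main__":
    buf = SheddingBuffer(capacity=2)
    buf.offer(1, "metrics")
    buf.offer(5, "payment-a")
    buf.offer(5, "payment-b")   # evicts "metrics"
    buf.offer(1, "debug-log")   # rejected
    print(buf.contents(), "shed:", buf.shed)
```

Keeping a `shed` counter matters in practice: silent drops are exactly the failure mode backpressure is meant to avoid, so any intentional drop should at least be observable.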
An interview-style question to ponder
You're designing a log ingestion pipeline. A fleet of 10,000 servers each stream logs at variable rates into a central processing cluster. Sometimes the cluster falls behind. How would you implement backpressure across this system, and what happens to logs during the slowdown period — do you block, drop, or buffer them, and at which layer?
Hint
Start by asking: who in this system can actually slow down, and who can't? That determines where the backpressure signal is useful and where you need a different strategy for the excess.
Answer
The right design applies backpressure between the central cluster and a durable intermediate buffer, not directly to the 10,000 servers.
- Application servers should never be blocked waiting for a log pipeline — they have user traffic to serve. So you insert a durable queue (a persistent disk-backed buffer, like a log-structured message store) between the servers and the processing cluster. Servers fire-and-forget into the queue.
- The processing cluster reads from the queue at its own pace, so backpressure stops at the queue instead of reaching the servers. The queue absorbs the burst on disk rather than in RAM: disk is cheap and survives crashes; RAM is expensive and does not.
- During a slowdown, logs are buffered in the durable queue; they are neither dropped nor allowed to block the servers. Delivery is delayed, not lost. The trade-off is increased latency (logs appear in dashboards later) and the cost of enough disk capacity to hold the burst.
- But why not just drop logs during a slowdown? For debugging and compliance, dropped logs are often unacceptable. The durable queue lets you trade latency for completeness. If your SLA genuinely allows loss (e.g., high-volume metrics with sampling), then dropping at the queue edge while counting the drops is simpler and cheaper.
- Watch out: if the slowdown is chronic (producer rate permanently exceeds consumer rate), the queue grows without bound and you're just delaying the crash — you must scale the consumer or shed load.
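The disk-capacity and chronic-overload points above come down to simple arithmetic, sketched here. The function `plan_burst` and every number in the example (rates, burst length, 200 bytes per log line) are hypothetical and chosen only to illustrate the calculation.

```python
import math


def plan_burst(burst_rate, normal_rate, consume_rate, burst_seconds, avg_bytes):
    """Estimate the backlog a burst leaves in the durable queue.

    Rates are in messages per second. Returns a tuple of
    (peak_backlog_messages, peak_backlog_bytes, drain_seconds).
    drain_seconds is math.inf when the normal rate is not below the
    consume rate, i.e. the overload is chronic and the queue never drains.
    """
    surplus = max(burst_rate - consume_rate, 0)
    backlog = surplus * burst_seconds        # messages queued during the burst
    backlog_bytes = backlog * avg_bytes      # disk needed to hold them
    headroom = consume_rate - normal_rate    # spare capacity after the burst
    if backlog == 0:
        drain_seconds = 0.0
    elif headroom <= 0:
        drain_seconds = math.inf             # chronic: scale out or shed load
    else:
        drain_seconds = backlog / headroom
    return backlog, backlog_bytes, drain_seconds


if __name__ == "__main__":
    # A 10-minute burst at 300k lines/s against a cluster that handles 150k/s.
    print(plan_burst(300_000, 100_000, 150_000, 600, 200))
    # -> (90000000, 18000000000, 1800.0): 18 GB of disk, drained in 30 minutes
    # Same burst, but normal traffic already saturates the cluster.
    print(plan_burst(300_000, 150_000, 150_000, 600, 200))
    # -> (90000000, 18000000000, inf)
```

An infinite drain time is the signal that buffering has stopped helping: the durable queue only converts a burst into latency, and it cannot make up for a consumer that is permanently too slow.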