Why IO Throughput Is Different from CPU Throughput
IO-heavy programs fail in ways that CPU-heavy programs often do not. With CPU work, adding threads or optimizing hot loops can increase throughput until you hit core limits. With IO, the limiting factor is frequently external: disk bandwidth, network RTT, kernel buffers, remote service quotas, or downstream consumers. The result is that “faster” producers can make the whole system slower by creating queues, memory growth, timeouts, and retries.
Two concepts matter most for high-throughput IO pipelines: (1) understanding where bytes are buffered and copied, and (2) applying backpressure so producers do not outrun consumers. Backpressure-aware design is not just about speed; it is a safety property that prevents unbounded memory usage and stabilizes latency.
Key terms used in this chapter
- Producer: reads from a source (socket, file, stdin, message queue) and emits items/bytes.
- Consumer: writes to a sink (socket, file, database, stdout) or performs a slow operation.
- Pipeline stage: a step that transforms or routes data between producer and consumer.
- Backpressure: a mechanism by which slow consumers cause producers to slow down or stop temporarily.
- Bounded queue: a buffer with a maximum size; when full, producers block or shed load.
- Throughput: items/bytes per second.
- Latency: time for an item to traverse the pipeline.
Where Throughput Goes: Buffers, Copies, and System Calls
IO throughput is often dominated by overhead outside your language runtime: system calls, context switches, kernel/user copies, TLS encryption, and wakeups. You can usually improve throughput by reducing the number of syscalls and copies, and by batching work to amortize overhead.
Practical levers that apply across languages
- Batch reads/writes: read larger chunks (e.g., 64 KiB) rather than many tiny reads, and coalesce writes into fewer calls (see the sketch after this list).
- Use buffered IO: language-level buffering reduces syscalls, but make sure pipeline buffers stay bounded.
- Avoid per-item flushing: flushing forces syscalls and can defeat batching.
- Prefer streaming transforms: process chunks incrementally rather than loading entire payloads.
- Limit concurrency to what the sink can handle: more parallelism can increase contention and queueing.
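To make the first two levers concrete, here is a minimal Python sketch of a chunked copy loop. The function name copy_stream and the 64 KiB default are illustrative choices, not requirements; the right chunk size is something to measure.

def copy_stream(src, dst, chunk_size=64 * 1024):
    """Copy between binary file objects in fixed-size chunks."""
    while True:
        chunk = src.read(chunk_size)  # one read call per chunk, not per line or byte
        if not chunk:                 # empty bytes means EOF
            break
        dst.write(chunk)              # buffered writes; no per-chunk flush
    dst.flush()                       # flush once at the end to push remaining bytes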
Backpressure as a First-Class Design Constraint
Backpressure means the pipeline has a feedback path from consumer to producer. Without it, a fast producer can enqueue indefinitely, causing memory growth and eventually GC pressure, swapping, or OOM. With it, the system stabilizes: queues remain bounded, and latency does not explode under load.
Common backpressure mechanisms
- Blocking on bounded queues: producers block when the queue is full. This is simple and effective for in-process pipelines.
- Async await points: in event loops, awaiting a full buffer naturally pauses producers.
- Credits / permits: consumers grant “tokens” to producers; each item consumes a token. This is common in high-performance networking.
- Windowing: allow only N in-flight requests; when responses arrive, open more slots.
- Load shedding: when overloaded, drop or reject work early (e.g., return 429, drop debug logs).
Backpressure is not optional when the source can outpace the sink. If you cannot slow the source (e.g., UDP), you must shed load or increase buffering with explicit limits and loss policies.
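When the source cannot be slowed, load shedding can be as simple as a non-blocking enqueue that drops on overflow. A minimal Python sketch, assuming a drop-newest loss policy; the queue size and the dropped counter are illustrative, and a real system would export the counter as a metric.

import queue

q = queue.Queue(maxsize=1024)  # explicit buffering limit
dropped = 0                    # loss metric: how much load was shed

def offer(item):
    """Enqueue without blocking; shed load when the buffer is full."""
    global dropped
    try:
        q.put_nowait(item)
    except queue.Full:
        dropped += 1           # loss policy: drop the newest item and count it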
Designing a Backpressure-Aware Pipeline: A Step-by-Step Recipe
Step 1: Identify the slowest stage and the “true sink”
The sink is where data ultimately must go: a remote API, a database, disk, or a downstream service. The slowest stage is often the sink, but it can also be a transform (e.g., compression, encryption) or a rate-limited API. Measure or estimate the sink’s sustainable throughput and latency.
Step 2: Choose a unit of flow control
Backpressure can be applied per item, per byte, or per request. For variable-sized payloads, per-byte control is safer. For uniform items, per-item is simpler.
- Per-item: simple, but risky when item sizes vary widely.
- Per-byte: more accurate; requires tracking sizes and buffer occupancy (see the sketch below).
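One way to enforce per-byte control is a byte budget that producers draw down and consumers replenish. A minimal threaded Python sketch; the ByteBudget class and its interface are our illustration, not a standard library API.

import threading

class ByteBudget:
    """Bounded byte budget shared by a producer and a consumer."""
    def __init__(self, max_bytes):
        self._cond = threading.Condition()
        self._available = max_bytes  # assumes no single item exceeds max_bytes

    def acquire(self, nbytes):
        """Producer side: block until nbytes fit within the budget."""
        with self._cond:
            while self._available < nbytes:
                self._cond.wait()    # backpressure: the producer parks here
            self._available -= nbytes

    def release(self, nbytes):
        """Consumer side: return nbytes to the budget once the item is handled."""
        with self._cond:
            self._available += nbytes
            self._cond.notify_all()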
Step 3: Insert bounded buffers between stages
Buffers smooth bursts and allow decoupling, but they must be bounded. Decide buffer sizes based on acceptable memory usage and desired burst tolerance. A useful mental model is: buffer capacity ≈ (burst duration) × (sink throughput). For example, absorbing a 2-second burst against a sink that sustains 50 MiB/s calls for roughly 100 MiB of buffering.
Step 4: Cap concurrency explicitly
For IO to remote services, the most important knob is often the number of in-flight requests. Too low a limit wastes capacity; too high a limit increases tail latency and triggers throttling. Use a semaphore/permit pool to limit concurrency.
Step 5: Define overload behavior
When the system is saturated, decide what happens:
- Block: best for internal batch jobs; can cause upstream timeouts in services.
- Reject early: best for request/response servers; return an error quickly.
- Drop: best for telemetry/logging where loss is acceptable.
Step 6: Make backpressure observable
Backpressure should show up in metrics: queue depth, time spent blocked/awaiting, in-flight counts, and drop/reject counts. This is how you verify that the pipeline is stable under load.
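A minimal Python sketch of a bounded queue instrumented for these signals; the attribute names are illustrative, and in practice each would feed a metrics gauge or counter.

import queue
import time

class ObservedQueue:
    """Bounded queue that records backpressure signals (single-producer sketch)."""
    def __init__(self, maxsize):
        self._q = queue.Queue(maxsize=maxsize)
        self.blocked_seconds = 0.0   # cumulative time the producer spent blocked
        self.puts = 0                # total items enqueued

    def put(self, item):
        start = time.monotonic()
        self._q.put(item)            # blocks while the queue is full
        self.blocked_seconds += time.monotonic() - start
        self.puts += 1

    def get(self):
        return self._q.get()

    def depth(self):
        return self._q.qsize()       # sample periodically as a queue-depth gauge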
Python: Async Streams, Bounded Queues, and Flow Control
In Python, backpressure-aware IO is commonly built with asyncio. The event loop naturally supports backpressure when you await on bounded queues or stream drains.
Pattern 1: Producer/consumer with a bounded asyncio.Queue
import asyncio

SENTINEL = object()

async def producer(reader: asyncio.StreamReader, q: asyncio.Queue, chunk_size=65536):
    while True:
        data = await reader.read(chunk_size)
        if not data:
            break
        await q.put(data)          # backpressure when q is full
    await q.put(SENTINEL)

async def consumer(writer: asyncio.StreamWriter, q: asyncio.Queue):
    while True:
        item = await q.get()
        if item is SENTINEL:
            break
        writer.write(item)
        await writer.drain()       # backpressure from socket buffer

async def pipe(reader, writer):
    q = asyncio.Queue(maxsize=8)   # bounded buffer
    await asyncio.gather(producer(reader, q), consumer(writer, q))

Notes:
- Queue(maxsize=8) bounds memory. Each element is a chunk; total buffered bytes ≈ 8 × chunk_size.
- writer.drain() is essential; it yields control until the transport buffer has room.
- Chunk size is a throughput knob; too small increases overhead, too large increases latency and memory spikes.
Pattern 2: Limiting in-flight requests with a semaphore
import asyncio

async def fetch_one(session, url, sem):
    async with sem:
        async with session.get(url) as resp:
            return await resp.read()

async def fetch_all(session, urls, max_in_flight=100):
    sem = asyncio.Semaphore(max_in_flight)
    tasks = [asyncio.create_task(fetch_one(session, u, sem)) for u in urls]
    return await asyncio.gather(*tasks)

This prevents “task storms” that overwhelm DNS, sockets, or the remote service. If you also need bounded memory, combine this with streaming processing of responses rather than storing all results.
Ruby: Bounded Queues, Fibers, and IO Write Backpressure
Ruby’s IO objects provide buffering, but you still need explicit backpressure between pipeline stages. A straightforward approach uses SizedQueue for bounded buffering. For network IO, write can block; for nonblocking sockets you must handle partial writes.
Pattern 1: Threaded pipeline with SizedQueue
require 'thread'

SENTINEL = Object.new

def producer(io_in, q, chunk_size: 65536)
  while (data = io_in.read(chunk_size))
    q.push(data)              # blocks when full
  end
  q.push(SENTINEL)
end

def consumer(io_out, q)
  loop do
    item = q.pop
    break if item.equal?(SENTINEL)
    io_out.write(item)        # may block; provides natural backpressure to this thread
  end
end

q = SizedQueue.new(8)
t1 = Thread.new { producer(STDIN, q) }
t2 = Thread.new { consumer(STDOUT, q) }
[t1, t2].each(&:join)

Notes:
- SizedQueue is the backpressure boundary; memory is bounded by queue capacity × chunk size.
- For CPU-heavy transforms, add more stages with their own bounded queues, but keep an eye on total buffering across stages.
Pattern 2: Handling partial writes for nonblocking sockets
If you use nonblocking IO, a single write may write fewer bytes than requested. Backpressure becomes explicit: you must retry when the socket is writable.
def write_all(sock, data)
  total = 0
  while total < data.bytesize
    begin
      n = sock.write_nonblock(data.byteslice(total..-1))
      total += n
    rescue IO::WaitWritable
      IO.select(nil, [sock])  # wait until the socket is writable again
      retry
    end
  end
end

This pattern prevents silent truncation and naturally slows the producer when the socket cannot accept more data.
Java: NIO, Reactive Streams, and Explicit Demand
Java offers multiple levels of IO abstraction. For throughput and backpressure, two families are common: (1) NIO channels/selectors for low-level control, and (2) Reactive Streams (Flow API) for structured backpressure across stages.
Pattern 1: Backpressure with Java Flow (Reactive Streams)
The key idea is demand: the subscriber requests N items; the publisher must not emit more than requested. This is backpressure by contract.
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

class BytesSubscriber implements Flow.Subscriber<byte[]> {
    private Flow.Subscription sub;
    private final int batch;

    BytesSubscriber(int batch) { this.batch = batch; }

    public void onSubscribe(Flow.Subscription subscription) {
        this.sub = subscription;
        sub.request(batch);  // initial demand
    }

    public void onNext(byte[] item) {
        // write to sink (could be blocking or async)
        // ...
        sub.request(1);      // request the next item when ready
    }

    public void onError(Throwable t) { t.printStackTrace(); }

    public void onComplete() { }
}

SubmissionPublisher<byte[]> pub = new SubmissionPublisher<>(Runnable::run, 8);
pub.subscribe(new BytesSubscriber(8));

Notes:
- SubmissionPublisher has a bounded buffer capacity (the second constructor argument). When the buffer is full, submit blocks, while offer can time out or drop via a handler.
- Real systems often use libraries (Reactor/RxJava/Akka Streams) that provide richer operators and backpressure-aware IO integrations, but the core concept is the same: demand-driven flow.
Pattern 2: Limiting in-flight async operations with a semaphore
Even without Reactive Streams, you can enforce backpressure by limiting in-flight requests.
import java.util.concurrent.*;

class BoundedClient {
    private final Semaphore sem = new Semaphore(100);
    private final ExecutorService pool = Executors.newFixedThreadPool(32);

    CompletableFuture<byte[]> fetchAsync(String url) {
        sem.acquireUninterruptibly();  // caller blocks here once 100 requests are in flight
        return CompletableFuture.supplyAsync(() -> {
            try {
                // do IO
                return new byte[0];
            } finally {
                sem.release();
            }
        }, pool);
    }
}

Acquiring the permit in the calling thread, before submission, is what bounds the backlog: callers slow down instead of queueing tasks without limit. This prevents unbounded task submission and stabilizes memory use.
C: Kernel Buffers, Nonblocking IO, and Explicit Backpressure
In C, you often build backpressure by controlling read readiness and write readiness. With nonblocking sockets and epoll/kqueue, you only read when you have buffer space and only write when the socket is writable. This is the most explicit form of backpressure: you do not pull data from the kernel if you cannot push it onward.
Pattern: Event loop with bounded ring buffer
The pipeline is: socket read → ring buffer → socket write. The ring buffer is bounded; when it is near full, you stop reading (disable read interest) until you drain.
// Pseudocode sketch (Linux epoll style)
#define CAP (1<<20)  // 1 MiB ring buffer; must be a power of two for the modulo below
static unsigned char buf[CAP];
static size_t head = 0, tail = 0;  // [tail, head) contains data

size_t used(void)       { return (head - tail) % CAP; }
size_t free_space(void) { return CAP - 1 - used(); }

void on_readable(int fd) {
    if (free_space() < 65536) {
        // apply backpressure: stop reading until drained
        disable_epollin(fd);
        return;
    }
    // read into ring buffer (handle wrap)
    // ...
}

void on_writable(int fd) {
    if (used() == 0) {
        disable_epollout(fd);
        enable_epollin(fd);
        return;
    }
    // write from ring buffer (handle partial writes)
    // ...
    if (free_space() >= 65536) enable_epollin(fd);
}

Notes:
- Backpressure is implemented by toggling read interest based on buffer occupancy.
- Partial writes are normal; you must track how many bytes were written and keep the remainder queued.
- This pattern generalizes to multi-stage pipelines by having bounded buffers between stages and only pulling when downstream has capacity.
Choosing Buffer Sizes and Concurrency Limits
There is no universal best buffer size. Use these practical heuristics:
- Chunk size: start with 32–256 KiB for file/network streaming. Smaller chunks reduce latency but increase syscall overhead; larger chunks increase burstiness and memory spikes.
- Queue capacity: start with 4–32 chunks. This allows some decoupling without hiding sustained overload.
- In-flight requests: start with a value that matches the sink’s parallelism (e.g., DB connection pool size, remote service concurrency limits). Increase until throughput stops improving or tail latency worsens.
A useful safety check is to compute worst-case buffered bytes: sum over all bounded queues (capacity × max item size). Ensure this fits comfortably within your memory budget, including overhead.
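A quick worked example of that check, with illustrative stage sizes:

# Worst-case buffered bytes: sum of (queue capacity × max item size) per stage.
stages = [
    ("read -> transform", 16, 64 * 1024),   # capacity 16, items up to 64 KiB
    ("transform -> write", 8, 256 * 1024),  # capacity 8, items up to 256 KiB
]
worst_case = sum(capacity * item_bytes for _, capacity, item_bytes in stages)
print(worst_case)  # 3145728 bytes, i.e. 3 MiB of worst-case queued payload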
Pipeline Patterns That Preserve Throughput Under Load
Fan-out with bounded concurrency
When one input item triggers multiple downstream calls, the risk of amplification is high. Use a semaphore to cap fan-out and a bounded queue to cap pending work. If overload occurs, reject early rather than letting amplification build an unbounded backlog.
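A minimal asyncio sketch of capped fan-out with early rejection; the Overloaded exception and the coarse sem.locked() saturation check are our illustrative choices.

import asyncio

sem = asyncio.Semaphore(50)           # cap on concurrent downstream calls

class Overloaded(Exception):
    """Raised to reject work early instead of growing a backlog."""

async def fan_out(item, downstream_calls):
    if sem.locked():                  # coarse check: no permits free right now
        raise Overloaded("fan-out capacity exhausted")

    async def one(call):
        async with sem:               # every amplified call consumes a permit
            return await call(item)

    return await asyncio.gather(*(one(c) for c in downstream_calls))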
Fan-in with fairness
When multiple producers feed one consumer, a single hot producer can starve others. Use per-producer queues with a global cap, or use a fair queueing strategy (round-robin) so backpressure is applied evenly.
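A minimal sketch of one fair pass over per-producer queues; how the caller waits when a pass finds nothing is left open.

import queue

def fair_drain_once(producer_queues, handle):
    """Round-robin pass: take at most one item from each producer's bounded queue."""
    progressed = False
    for q in producer_queues:
        try:
            item = q.get_nowait()     # at most one item per producer per pass
        except queue.Empty:
            continue
        handle(item)
        progressed = True
    return progressed                 # caller blocks or sleeps briefly when this is False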
Spooling to disk as a pressure relief valve
Sometimes the sink is intermittently slow, but you cannot block the source (e.g., you must accept requests). A bounded in-memory queue plus an on-disk spool can provide durability and smoothing. The critical part is still backpressure: the spool must be bounded by disk quota, and the system must reject or shed load when the spool is full.
Failure Modes and How Backpressure Prevents Them
Unbounded buffering and latency collapse
Without backpressure, queues grow, GC/allocator work increases, and latency becomes dominated by waiting in line rather than processing. Bounded queues force the system to reveal overload immediately: producers block or requests are rejected.
Retry storms
When a downstream service slows, naive clients retry aggressively, multiplying traffic and worsening the slowdown. Backpressure-aware clients limit in-flight requests and apply retry budgets. A practical rule: retries must consume the same permits/credits as original requests, and exponential backoff should be combined with a maximum retry count.
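A minimal asyncio sketch in which retries draw from the same permit pool as first attempts; TransientError and the backoff constants are illustrative.

import asyncio
import random

permits = asyncio.Semaphore(100)      # shared by first attempts AND retries
MAX_RETRIES = 3

class TransientError(Exception):
    """Illustrative: whatever your client raises for retryable failures."""

async def call_with_retry(do_request):
    for attempt in range(MAX_RETRIES + 1):
        async with permits:           # a retry cannot exceed the in-flight cap
            try:
                return await do_request()
            except TransientError:
                if attempt == MAX_RETRIES:
                    raise
        # back off outside the permit so waiting does not hold capacity
        await asyncio.sleep((2 ** attempt) * 0.1 * (1 + random.random()))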
Head-of-line blocking in mixed workloads
If large items and small items share the same queue, large items can delay many small ones. Mitigations include separate queues by class, size-based scheduling, or chunking large items into smaller frames so they interleave fairly.
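Framing is straightforward to sketch; the 32 KiB frame size is illustrative.

def frames(payload, frame_size=32 * 1024):
    """Split a large payload into fixed-size frames so small items can interleave."""
    for offset in range(0, len(payload), frame_size):
        yield payload[offset:offset + frame_size]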
Practical Checklist for Implementing a Backpressure-Aware IO Pipeline
- Define the sink and its sustainable throughput; treat it as the system’s speed limit.
- Pick a flow-control unit (items or bytes) and enforce it consistently.
- Use bounded buffers between stages; compute worst-case buffered memory.
- Limit in-flight IO with semaphores/permits; avoid unbounded task creation.
- Ensure writes handle partial writes and apply drain/await semantics where available.
- Decide overload behavior (block, reject, drop) and implement it explicitly.
- Expose queue depth, blocked time, in-flight counts, and drop/reject counts as metrics.