Web Servers 101: How HTTP, DNS, TLS, and Reverse Proxies Work

Keep-Alive and Connection Reuse: Reducing Latency Across Requests

Chapter 6

Estimated reading time: 13 minutes

What “Keep-Alive” and Connection Reuse Mean

When a browser (or any HTTP client) talks to a server, it needs an underlying transport connection. Creating that connection costs time: there is setup work before the first byte of the HTTP response can arrive. “Keep-Alive” is the general idea of not closing the underlying connection immediately after one request/response pair, so the same connection can be reused for subsequent requests to the same origin. “Connection reuse” is the practical outcome: multiple HTTP requests are sent over an already-established connection, avoiding repeated setup overhead and reducing latency.

In modern web stacks, connection reuse shows up in several forms:

  • HTTP/1.1 persistent connections: multiple requests can be sent sequentially over one TCP connection (one at a time, unless pipelining is used, which is generally avoided).
  • HTTP/2 multiplexing: many concurrent request/response streams share a single TCP connection.
  • HTTP/3 multiplexing: many concurrent streams share a QUIC connection (over UDP), with different performance characteristics but the same high-level goal: reuse a single established session.

Even though the term “Keep-Alive” is often associated with HTTP/1.1 headers, the broader performance concept applies across HTTP versions: keep the session open and reuse it to reduce repeated latency costs.
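To make the practical outcome concrete, here is a minimal client-side sketch using Python's requests library (an assumption of this example; any HTTP client with a connection pool behaves similarly). A Session keeps connections open and reuses them for later requests to the same origin:

# Minimal sketch of connection reuse with Python's requests library
# (assumes `pip install requests`; example.com is a placeholder origin).
import requests

session = requests.Session()  # maintains a pool of persistent connections

# Both requests target the same origin; the second typically reuses the
# TCP/TLS connection opened for the first, skipping the handshakes.
r1 = session.get("https://example.com/app.css")
r2 = session.get("https://example.com/app.js")
print(r1.status_code, r2.status_code)

session.close()  # release pooled connections when done

Calling requests.get() directly, without a Session, creates and discards a connection per call, which is exactly the overhead keep-alive is meant to avoid.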

Why Reuse Reduces Latency

Connection setup is not free

Reusing a connection avoids repeating setup steps that must happen before application data can flow. Depending on your protocol stack, setup can include:

  • Transport setup: establishing the underlying connection state (for TCP this includes a handshake).
  • TLS setup: negotiating encryption parameters and validating certificates (for HTTPS). Even with session resumption, there is still some work.
  • Congestion control warm-up: a new connection typically starts cautiously and ramps up sending rate; a reused connection may already have a larger congestion window, allowing faster delivery of larger responses.

From a user’s perspective, these costs show up as extra delay before the first response bytes arrive, and sometimes slower throughput early in the transfer.
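As a rough, illustrative calculation (the numbers are assumptions, not measurements): on a path with a 50 ms round-trip time, a new HTTPS connection spends about one round trip on the TCP handshake and at least one more on the TLS handshake before the request can even be sent, so roughly 100 ms of setup on top of the ~50 ms the request/response itself takes. A reused connection skips that setup and pays only the final 50 ms.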

Web pages trigger many requests

A single page view often requires fetching multiple resources (HTML, CSS, JS, images, fonts, API calls). If each resource required a fresh connection, the overhead would multiply. Connection reuse amortizes the setup cost across many requests.
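Continuing the rough numbers from above: if setup costs on the order of 100 ms and a page fetches 30 resources from one origin, paying that setup once instead of 30 times avoids roughly 3 seconds of cumulative setup work (parallel connections hide some of it, but at the cost of extra handshakes and extra server state).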

Fewer connections also reduce server and network overhead

Each open connection consumes resources: file descriptors, memory buffers, TLS state, and CPU time. Reuse can reduce the total number of concurrent connections needed to serve the same workload. That can improve stability and reduce tail latency under load.

HTTP/1.1 Keep-Alive in Practice

Default behavior and headers

In HTTP/1.1, persistent connections are the default: a connection stays open unless either side indicates it will close it. The most common signal is the Connection: close header in a response or request. If that header is absent, the client will typically assume the connection can be reused.

Some servers also send a Keep-Alive header (for example, to advertise a timeout or max requests). Note that the Keep-Alive header is not required by the core HTTP/1.1 spec for persistence; it is an optional hint used by some implementations. For example, a request/response exchange over a reusable connection might look like this:

GET /app.css HTTP/1.1
Host: example.com

HTTP/1.1 200 OK
Content-Type: text/css
Content-Length: 1234

...bytes...

If the server instead intends to close the connection after the response, it will typically include:

HTTP/1.1 200 OK
Connection: close
Content-Length: 1234

...bytes...

Important limitation: one request at a time (usually)

With HTTP/1.1 on a single connection, the common pattern is: send request, wait for response, then send the next request. HTTP/1.1 technically supports pipelining (sending multiple requests without waiting), but it is widely disabled in browsers due to head-of-line blocking and intermediary compatibility issues. In practice, browsers open multiple parallel connections per origin to increase concurrency, but still reuse each connection for multiple sequential requests.

HTTP/2 and HTTP/3: Reuse Plus Multiplexing

Multiplexing changes the performance story

HTTP/2 and HTTP/3 are designed to run many requests concurrently over a single connection. This improves performance by reducing the need for multiple parallel connections and by allowing the client to keep one “hot” connection open.

  • HTTP/2: multiple streams share one TCP connection. This reduces connection count and improves reuse, but TCP-level head-of-line blocking can still occur if packets are lost (loss affects the whole connection).
  • HTTP/3: multiple streams share one QUIC connection. QUIC is designed so that loss on one stream does not block others in the same way, improving performance on lossy networks.

For a web server operator, the key operational point is: keep-alive and reuse are even more valuable with HTTP/2/3 because a single long-lived connection can carry a large fraction of a user’s activity to your origin.
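As an illustration of what multiplexed reuse looks like from a client's point of view, here is a small sketch using the Python httpx library (an assumption of this example; it needs the optional HTTP/2 extra, and the /api/a and /api/b paths are placeholders):

# Sketch of concurrent requests multiplexed over one HTTP/2 connection
# using the httpx library (assumes `pip install "httpx[http2]"`).
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient(http2=True) as client:
        # If the server negotiates HTTP/2, both requests travel as separate
        # streams on the same connection instead of opening two connections.
        responses = await asyncio.gather(
            client.get("https://example.com/api/a"),
            client.get("https://example.com/api/b"),
        )
        for r in responses:
            print(r.http_version, r.status_code)

asyncio.run(main())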

Where Keep-Alive Is Configured (Client, Reverse Proxy, Origin)

Connection reuse is a chain: the client connects to something (often a reverse proxy or load balancer), and that component connects to your upstream application server. You can have reuse on the “front side” (client ↔ edge) and on the “back side” (proxy ↔ upstream). Tuning only one side can still leave latency on the other.

  • Client ↔ reverse proxy: controlled by the proxy’s HTTP keep-alive timeouts, max requests, HTTP/2 settings, and whether it closes idle connections.
  • Reverse proxy ↔ upstream: controlled by upstream keep-alive pools, idle timeouts, and whether the upstream supports persistent connections reliably.

A common performance pitfall is having excellent client-side reuse (HTTP/2 to the edge) but poor upstream reuse (the proxy opens a new upstream connection for each request). That can shift latency and CPU cost to your internal network and app servers.

Step-by-Step: Verify Connection Reuse From the Command Line

1) Check whether a server closes connections

You can use curl to see whether the server signals connection closure. Run a request and inspect response headers:

curl -I https://example.com/

If you see Connection: close, the server is telling the client not to reuse the connection. If you do not see it, the connection may be reusable (subject to timeouts and intermediaries).
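If you prefer to script this check, a small Python sketch (using the requests library, an assumption of this example) performs the same inspection as curl -I:

# Check whether the server signals connection closure (assumes requests).
import requests

r = requests.head("https://example.com/", allow_redirects=False)
print(r.headers.get("Connection", "(no Connection header: reuse likely allowed)"))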

2) Force HTTP/1.1 and observe reuse behavior

To test HTTP/1.1 specifically:

curl --http1.1 -v https://example.com/asset1.js https://example.com/asset2.css

In verbose output, look for whether curl reports “Re-using existing connection” for the second request. If it opens a new connection, something is preventing reuse (server closing, proxy behavior, or client settings).

3) Test HTTP/2 multiplexing

To see whether HTTP/2 is negotiated and used:

curl --http2 -v https://example.com/

In verbose output, you should see that HTTP/2 is in use. While curl won’t always make multiplexing obvious with simple commands, confirming HTTP/2 is a prerequisite for multiplexed reuse in browsers.

4) Measure impact with timing breakdown

curl can show a timing breakdown per transfer. To compare a cold connection with a warm one, request the same URL twice in a single invocation so the second transfer can reuse the connection, and compare the two output lines:

curl -o /dev/null -o /dev/null -s -w "connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n" https://example.com/ https://example.com/

For the second transfer, if the connection is reused, the connect and TLS times are typically near zero (no new handshake is performed, though exact reporting depends on the curl version), and total time often improves as well because the connection's congestion window is already warmed up. Note that two separate curl runs will not reuse a connection, since each process opens its own.
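If you would rather measure this programmatically, here is a rough sketch in Python using the requests library (an assumption of this example); absolute numbers depend entirely on your network, so treat the comparison as indicative only:

# Rough comparison of fresh vs. reused connections (assumes requests).
import time
import requests

URL = "https://example.com/"  # placeholder origin

def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

# Two independent calls: each one opens (and tears down) its own connection.
cold = timed(lambda: requests.get(URL)) + timed(lambda: requests.get(URL))

# One session: the second request reuses the pooled connection.
with requests.Session() as s:
    warm = timed(lambda: s.get(URL)) + timed(lambda: s.get(URL))

print(f"fresh connections: {cold:.3f}s  reused connection: {warm:.3f}s")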

Step-by-Step: Tune Keep-Alive on a Reverse Proxy (Nginx Example)

Exact directives vary by server, but the concepts are consistent: allow idle keep-alive long enough to be useful, but not so long that you waste resources; and maintain an upstream keep-alive pool so the proxy can reuse connections to your app.

1) Enable reasonable client-side keep-alive

In Nginx, client keep-alive is controlled by keepalive_timeout (and related settings). Example:

http {
    keepalive_timeout 30s;
    keepalive_requests 1000;
}

  • keepalive_timeout: how long an idle client connection is kept open.
  • keepalive_requests: how many requests a client can send over one connection before Nginx closes it (helps bound resource usage).

Practical guidance: start with 15–60 seconds for typical web traffic. Very short timeouts reduce reuse on high-latency mobile networks; very long timeouts can increase idle connection count.

2) Enable upstream keep-alive (proxy ↔ app)

Without upstream keep-alive, Nginx may open a new connection to the upstream for each request (depending on configuration and upstream behavior). Configure an upstream block with keepalive:

upstream app_upstream {
    server 10.0.0.10:8080;
    server 10.0.0.11:8080;
    keepalive 64;
}

server {
    location / {
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass http://app_upstream;
    }
}

Key points:

  • keepalive 64 creates a pool of idle upstream connections per worker process.
  • proxy_http_version 1.1 is often needed to use persistent connections to upstreams.
  • proxy_set_header Connection "" prevents sending Connection: close or other hop-by-hop values that can disable reuse.

After enabling this, the proxy can reuse upstream connections across many client requests, reducing upstream latency and CPU overhead.

3) Validate with logs/metrics

To confirm improvement, watch:

  • Upstream connection rate: new upstream connections per second should drop.
  • Upstream response time: should improve, especially for small requests.
  • CPU usage: TLS termination at the edge may dominate, but upstream CPU often drops when connection churn is reduced.

Common Pitfalls That Prevent Reuse

Misaligned idle timeouts across layers

Reuse depends on the connection staying open long enough to be used again. If any layer closes idle connections too aggressively, reuse collapses. Typical layers include: browser, corporate proxy, CDN/edge, load balancer, reverse proxy, application server.

Example failure mode: the reverse proxy keeps client connections for 60 seconds, but the upstream application server closes idle connections after 5 seconds. The proxy will frequently attempt reuse and then discover the upstream has closed the connection, causing retries or new connections and adding latency.

Connection: close accidentally injected

Some frameworks or intermediaries add Connection: close under certain conditions (errors, specific routes, misconfigured middleware). If you see sporadic lack of reuse, check whether only certain responses include that header.

Too many parallel connections (HTTP/1.1)

With HTTP/1.1, browsers often open multiple connections per origin to increase concurrency. If your keep-alive timeout is too long and traffic is high, you can end up with many idle connections. If it is too short, you lose reuse. Balance is workload-dependent.

Upstream not safe for keep-alive

Some upstream servers mishandle persistent connections (for example, not properly reading request bodies, leaving bytes in the stream, or closing unexpectedly). Symptoms include intermittent 502/504 errors at the proxy, truncated responses, or “upstream prematurely closed connection” logs. In such cases, fix the upstream HTTP handling rather than disabling keep-alive globally.

How Keep-Alive Interacts With TLS and Session Resumption

With HTTPS, keeping a connection open avoids repeating the TLS handshake entirely. If a new connection is needed, TLS session resumption can reduce handshake cost, but it is still typically slower than reusing an already-established connection. Also, session resumption is not guaranteed: it can be affected by server configuration, load balancers, and certificate/key rotation strategies.

Operationally, this means:

  • Improving keep-alive effectiveness often yields more consistent latency reductions than relying on resumption alone.
  • Long-lived HTTP/2/3 connections can significantly reduce TLS handshake frequency for active users.

Designing Good Keep-Alive Policies

Choose timeouts based on traffic patterns

Keep-alive is most beneficial when a client makes multiple requests separated by short gaps (loading a page, navigating within a site, polling APIs). If your typical inter-request gap is small, a longer idle timeout increases reuse. If clients are mostly one-off requests, long timeouts just accumulate idle connections.
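For example, suppose (as an illustrative assumption) that users click to a new page roughly every 10 to 20 seconds while browsing your site: an idle timeout of 5 seconds means almost every navigation pays a fresh handshake, while 30 to 60 seconds lets most navigations reuse the existing connection, at the cost of holding it open between clicks.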

Practical approach:

  • Start with a moderate idle timeout (for example 30s).
  • Measure idle connection counts and memory usage.
  • Adjust upward if reuse is low and you see frequent reconnects; adjust downward if you see excessive idle connections without performance benefit.

Limit requests per connection when needed

Some environments cap the number of requests per connection to mitigate long-lived connection issues (memory leaks in upstreams, uneven load distribution, or to encourage periodic refresh of state). This is what keepalive_requests (or similar settings) does. Set it high enough to allow meaningful reuse, but not infinite if you have operational reasons to recycle connections.

Plan for load balancing behavior

Connection reuse can interact with load balancing: if a client keeps one connection open for a long time, its traffic may “stick” to a particular edge node or backend (depending on where load balancing happens). With HTTP/2, a single connection can carry many requests, increasing the stickiness effect. This is not inherently bad, but it affects how evenly traffic spreads.

If you rely on even distribution across many backends, ensure your load balancer strategy and upstream keep-alive pool settings match your goals. For example, a reverse proxy with a small upstream keep-alive pool might concentrate traffic on fewer upstream connections; a larger pool can spread reuse across more backends.

Observability: How to Tell Whether Keep-Alive Is Working

Client-side signals

In browser developer tools (Network tab), you can often see whether requests are served over HTTP/2 and whether connections are reused. While the exact UI varies, look for:

  • Protocol column showing h2/h3 vs http/1.1.
  • Timing breakdowns where “Stalled/Connecting” time is low after the first request.
  • Fewer total connections to the same origin during a page load.

Server-side signals

On the server or reverse proxy, useful indicators include:

  • New connection rate vs request rate: if requests are high but new connections are low, reuse is effective.
  • Handshake CPU: TLS handshake rate and CPU time should drop when reuse improves.
  • Upstream connect time: many proxies can log upstream connect time separately; it should decrease with upstream keep-alive.

Tail latency and error rates

Keep-alive tuning can affect tail latency (p95/p99). Too-aggressive timeouts can cause bursts of reconnects, increasing tail latency. Too-long timeouts can increase resource pressure, also increasing tail latency. Watch p95/p99 response times and connection-related errors while tuning.

Practical Scenarios and What to Do

Scenario: API clients making frequent small requests

If you have mobile apps or backend services calling your API repeatedly, keep-alive can dramatically reduce latency and battery/CPU usage. Actions:

  • Ensure the edge supports persistent connections and does not inject Connection: close.
  • Set a keep-alive idle timeout that covers typical request intervals (for example, 30–120s depending on polling patterns).
  • If you control the client, use an HTTP client with connection pooling enabled and a reasonable max idle time (see the sketch below).
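For the last point above, here is a minimal sketch of a pooled client using the Python httpx library (an assumption of this example; the pool sizes, timeouts, and URL are illustrative, not recommendations):

# Pooled API client with bounded idle connections (assumes `pip install httpx`).
import httpx

limits = httpx.Limits(
    max_keepalive_connections=10,  # idle connections kept around for reuse
    keepalive_expiry=30.0,         # seconds an idle connection stays reusable
)

with httpx.Client(limits=limits, timeout=10.0) as client:
    # Repeated calls to the same origin reuse pooled connections instead of
    # paying a new TCP/TLS handshake each time.
    for _ in range(5):
        r = client.get("https://api.example.com/status")
        print(r.status_code)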

Scenario: Reverse proxy to upstream app shows high upstream connection churn

Actions:

  • Enable upstream keep-alive pooling at the proxy.
  • Confirm upstream supports HTTP/1.1 persistent connections correctly.
  • Align upstream idle timeout to be slightly higher than the proxy’s expected reuse window, or ensure the proxy detects and refreshes stale connections safely.

Scenario: Many idle connections causing memory pressure

Actions:

  • Reduce keep-alive timeout gradually and observe impact on reconnect rate and latency.
  • Set sensible caps: max requests per connection and max keep-alive connections (where supported).
  • Prefer HTTP/2/3 for clients when possible, because fewer connections can carry the same concurrency.

Now answer the exercise about the content:

Why does reusing an existing HTTP connection typically reduce latency when loading a page with many resources?

Answer: Keeping the connection open lets multiple requests share an already-established session, avoiding repeated transport and TLS setup and benefiting from a warmed congestion window. This reduces time before response bytes arrive, especially when many resources are fetched.

Next chapter

HTTPS and TLS: Encrypting Traffic and Preventing Tampering
