What “containerized web server” means in practice
A containerized web server is a web-serving process (for example NGINX, Apache httpd, Caddy, Envoy, or an application server like Node.js) packaged with its runtime dependencies into an immutable image. The container image is built once, then run consistently across environments. In web workloads, the container boundary is especially useful because it forces you to be explicit about ports, filesystem layout, configuration injection, and how the process handles signals and connections.
In practice, containerizing a web server is less about “put it in Docker” and more about designing predictable runtime behavior: the server should start quickly, expose a health endpoint, write logs to stdout and stderr, avoid storing mutable state in the image filesystem, and terminate gracefully so that in-flight requests are not dropped. These properties matter whether the container runs alone, behind a reverse proxy, or as part of a service mesh.
Common container patterns for web servers
Single-process container: app server serves HTTP directly
In this pattern, your application process terminates HTTP connections itself (for example, a Go HTTP server, Node.js Express, Python Gunicorn/Uvicorn). It is simple and often sufficient when you rely on an external reverse proxy (Ingress controller, gateway, or sidecar) for TLS, retries, and advanced routing.
- Pros: fewer moving parts, smaller images, simpler debugging.
- Cons: you must implement or configure production-grade behaviors (timeouts, keep-alive, request limits, compression, static assets) in the app stack or elsewhere.
Two-tier container approach: app server + reverse proxy
This approach uses a reverse proxy (often NGINX or Envoy) to sit in front of the app server. In Kubernetes, you typically do this as two containers in the same Pod (a “sidecar” reverse proxy) or as separate Deployments (a dedicated proxy layer). The proxy handles TLS termination (when not done at the edge), buffering, header normalization, request size limits, and caching of static content, while the app focuses on business logic.
- Pros: consistent HTTP behavior across services, good performance for static assets, strong control over timeouts and buffering.
- Cons: more configuration, more resources, and more places where misconfiguration can cause 502/504 errors.
Static web server container: serve built assets
For SPAs and static sites, a common pattern is to build assets in one stage and serve them from a minimal web server image (NGINX, Caddy, or even a tiny HTTP server). The key is to separate build-time dependencies from runtime, producing a small, secure image that contains only the compiled assets and server config.
Reverse proxy patterns for web workloads
Pattern 1: Edge reverse proxy (north-south traffic)
An edge reverse proxy sits at the boundary of your cluster or platform and handles incoming traffic from users or external systems. It typically performs TLS termination, HTTP to HTTPS redirects, host and path routing, and sometimes WAF-like filtering. Even if you later adopt a service mesh, the edge proxy remains the primary entry point for “north-south” traffic.
Operationally, this pattern centralizes certificate management and request routing rules, but it can become a bottleneck for configuration ownership: teams may need a process for safely updating routes without impacting other services.
Pattern 2: Service-level reverse proxy (east-west traffic)
A service-level reverse proxy sits close to the application and handles “east-west” traffic between services. This can be a dedicated proxy Deployment shared by multiple apps, or a per-Pod sidecar. The proxy can enforce consistent timeouts, retries, circuit breaking, and observability, reducing the burden on application code.
When implemented as a sidecar, each Pod gets its own proxy instance. This improves isolation and enables per-workload policy, but increases resource usage and operational complexity. When implemented as a shared proxy layer, you reduce per-Pod overhead but reintroduce a shared dependency that can become a single point of failure if not scaled and managed carefully.
Pattern 3: Backend-for-frontend (BFF) reverse proxy
A BFF is a reverse proxy tailored to a specific client experience (web, mobile, partner API). It aggregates calls to multiple backend services, shapes responses, and may handle authentication flows. In containerized environments, a BFF is often implemented as an application service, but it behaves like a reverse proxy from an architectural standpoint.
For web workloads, BFFs are useful when you need to minimize round trips from browsers, hide internal APIs, or enforce client-specific rate limits and caching strategies.
Pattern 4: Internal routing proxy for multi-tenant or multi-app hosting
Some platforms host many apps behind a single internal routing layer. The proxy routes based on hostnames, paths, or headers to different backends. This is common for internal developer platforms where teams deploy independently but share a common entry domain. The key risk is configuration sprawl; you need guardrails such as validation, staged rollouts, and clear ownership boundaries.
Key reverse proxy responsibilities (and what to configure)
Connection management and timeouts
Reverse proxies protect backends by controlling how long connections and requests can live. You should explicitly configure: client read timeout, client body size limits, upstream connect timeout, upstream read timeout, and idle keep-alive settings. Misaligned timeouts are a frequent cause of intermittent 504s: the proxy gives up before the backend responds, or the backend closes idle connections while the proxy expects them to remain open.
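As a rough sketch, the corresponding NGINX directives look like the following; the values (2s connect, 30s read, 65s keep-alive, 10m bodies) are illustrative defaults for a typical API workload, not recommendations for every service.

# Illustrative timeout and limit settings for a proxied API backend (place in the relevant server/location context)
proxy_connect_timeout 2s;      # time allowed to establish the upstream TCP connection
proxy_read_timeout   30s;      # max time between two successive reads from the upstream
proxy_send_timeout   30s;      # max time between two successive writes to the upstream
client_body_timeout  10s;      # how long to wait for the client to send the request body
client_max_body_size 10m;      # reject larger request bodies early
keepalive_timeout    65s;      # how long idle client connections are kept open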
Header handling and forwarding
Proxies commonly add or forward headers such as X-Forwarded-For, X-Forwarded-Proto, and X-Request-ID. Your application should trust these headers only from known proxies. In containerized deployments, ensure the app uses the forwarded scheme and host when generating redirects and absolute URLs; otherwise, you may see redirect loops or mixed-content issues when TLS is terminated upstream.
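A minimal NGINX sketch of both sides of this contract might look like the following; the 10.0.0.0/8 range stands in for whatever address space your trusted proxies actually use, and the trust side requires the ngx_http_realip_module (built into the official images).

# On the proxy: forward the original client address and scheme
proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Host              $host;

# On an NGINX instance behind another proxy: trust X-Forwarded-For
# only when the connection comes from a known proxy range (assumed 10.0.0.0/8)
set_real_ip_from 10.0.0.0/8;
real_ip_header   X-Forwarded-For;
real_ip_recursive on;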
Request buffering and upload behavior
Buffering can improve performance and protect backends from slow clients, but it can also increase memory usage and latency for streaming uploads. Decide whether you want request buffering enabled for large uploads and whether you need streaming semantics (for example, for WebSockets, SSE, or gRPC). For streaming protocols, ensure the proxy supports them and that buffering is disabled or tuned appropriately.
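For example, in NGINX you might keep buffering on by default but disable it for a streaming or upload path; the /stream/ location and upstream address below are hypothetical.

# Default: buffer requests and responses to shield the app from slow clients
proxy_buffering on;
proxy_request_buffering on;

# Hypothetical streaming endpoint (SSE or long uploads): pass bytes through as they arrive
location /stream/ {
    proxy_pass http://127.0.0.1:3000;
    proxy_buffering off;             # stream the response to the client immediately
    proxy_request_buffering off;     # stream the request body to the upstream
    proxy_read_timeout 1h;           # long-lived connections need a longer read timeout
}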
Compression and caching
Compression (gzip, brotli) is often best handled at the proxy to keep application code simpler. Caching is more nuanced: caching static assets at the proxy is usually safe, but caching dynamic responses requires careful cache keys, headers, and invalidation strategy. For SPAs, a common approach is to cache hashed asset filenames aggressively while serving index.html with short caching to allow rapid updates.
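A sketch of that split for the SPA case, as fragments of an NGINX server block (assuming the build emits hashed filenames under /assets/, as in the example later in this article):

# Hashed assets: cache aggressively, contents never change for a given filename
location /assets/ {
    add_header Cache-Control "public, max-age=31536000, immutable";
}

# index.html: always revalidate so new deployments are picked up quickly
location = /index.html {
    add_header Cache-Control "no-cache";
}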
TLS termination and re-encryption
TLS can be terminated at the edge proxy, at a service-level proxy, or end-to-end. If you terminate TLS at the edge and forward plain HTTP internally, you simplify internal traffic but must ensure network policies and cluster boundaries are trustworthy. If you re-encrypt to backends, you gain defense in depth but add certificate distribution and handshake overhead. In service mesh environments, mTLS between services is often handled by sidecars, while the edge still terminates external TLS.
Practical step-by-step: containerizing a static web server with NGINX
Step 1: Create a minimal NGINX config for SPA routing
Single-page applications often need “history mode” routing: unknown paths should return index.html. Create an NGINX config that serves static files and falls back to index.html when a file is not found.
server {
    listen 8080;
    server_name _;
    root /usr/share/nginx/html;
    index index.html;

    location / {
        try_files $uri $uri/ /index.html;
    }

    location /assets/ {
        expires 1y;
        add_header Cache-Control "public, max-age=31536000, immutable";
    }

    gzip on;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
}

Step 2: Use a multi-stage Dockerfile to build and serve
Build the frontend in a builder stage, then copy only the compiled assets into a small runtime image. This reduces image size and attack surface.
# Stage 1: build assets
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: runtime web server
FROM nginx:1.27-alpine
COPY nginx.conf /etc/nginx/conf.d/default.conf
COPY --from=build /app/dist/ /usr/share/nginx/html/
EXPOSE 8080
CMD ["nginx", "-g", "daemon off;"]

Step 3: Run locally and verify behavior
Build and run the container, then verify that deep links return index.html and that assets are cached with long-lived headers.
docker build -t my-spa:local .
docker run --rm -p 8080:8080 my-spa:local
curl -I http://localhost:8080/
curl -I http://localhost:8080/some/deep/link
curl -I http://localhost:8080/assets/app.12345.js

Check for HTTP 200 on “/some/deep/link” and confirm Cache-Control headers on assets. If you see 404s on deep links, your try_files rule is not matching your build output path.
Practical step-by-step: reverse proxying an app server with NGINX in a single Pod (sidecar-style)
When to use this pattern
This pattern is useful when you want consistent proxy behavior per workload (timeouts, buffering, header normalization) without relying on a shared proxy layer. It also helps when the app server is not ideal at serving static content or when you need to enforce strict request limits close to the app.
Step 1: Define upstream and proxy settings
Assume the app listens on localhost:3000 inside the Pod. NGINX listens on 8080 and proxies to the app. Configure timeouts and headers explicitly.
server {
    listen 8080;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Request-ID $request_id;
        proxy_connect_timeout 2s;
        proxy_read_timeout 30s;
        proxy_send_timeout 30s;
        client_max_body_size 10m;
    }

    location /healthz {
        access_log off;
        return 200 "ok";
    }
}

Step 2: Package the proxy config
In Kubernetes you would usually mount this config via a ConfigMap. Keep the proxy image generic and inject config at runtime. This allows you to update proxy behavior without rebuilding the application image.
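A sketch of that wiring, assuming the sidecar container is named nginx and the config file from Step 1 lives in a ConfigMap called web-proxy-config (both names are placeholders):

# ConfigMap holding the proxy config
apiVersion: v1
kind: ConfigMap
metadata:
  name: web-proxy-config
data:
  default.conf: |
    # server block from Step 1 goes here
---
# Relevant fragment of the Pod template: mount the ConfigMap read-only
# at the path the official NGINX image loads conf.d files from
spec:
  containers:
  - name: nginx
    image: nginx:1.27-alpine
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: proxy-config
      mountPath: /etc/nginx/conf.d
      readOnly: true
  volumes:
  - name: proxy-config
    configMap:
      name: web-proxy-config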
Step 3: Ensure graceful shutdown
When a Pod is terminated, Kubernetes sends SIGTERM and then waits for a grace period. Your app and proxy should stop accepting new connections and finish in-flight requests. For NGINX, ensure it runs in the foreground and can receive signals. For the app, implement signal handling to stop accepting new requests and close keep-alive connections. If you do not, you may see spikes of 502/499 during rollouts.
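One common way to wire this up in Kubernetes is a preStop hook plus an explicit grace period; this is a sketch with illustrative values, where the short sleep gives endpoint removal time to propagate before nginx -s quit triggers NGINX’s graceful shutdown.

# Fragment of the Pod template (illustrative values)
spec:
  terminationGracePeriodSeconds: 30   # total time before the kubelet sends SIGKILL
  containers:
  - name: nginx
    image: nginx:1.27-alpine
    lifecycle:
      preStop:
        exec:
          # wait for traffic to stop arriving, then drain in-flight requests gracefully
          command: ["/bin/sh", "-c", "sleep 5 && nginx -s quit"]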
Practical step-by-step: Envoy as a reverse proxy for modern HTTP (gRPC, WebSockets, and retries)
Why Envoy is common in cloud-native stacks
Envoy is frequently used as a data-plane proxy because it supports advanced L7 features: HTTP/2 and gRPC, rich routing, retries with budgets, outlier detection, and detailed metrics. It is also a common building block for service mesh sidecars and gateways.
Step 1: Minimal Envoy config to proxy HTTP to an upstream
This example listens on 8080 and forwards to an upstream cluster at 127.0.0.1:3000. It also sets a request timeout and enables access logs.
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 8080 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: app, timeout: 30s }
          http_filters:
          - name: envoy.filters.http.router
  clusters:
  - name: app
    connect_timeout: 2s
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: app
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 127.0.0.1, port_value: 3000 }

Step 2: Decide on retries carefully
Retries can improve perceived reliability but can also amplify load during partial outages. If you enable retries at the proxy, define which failures are retryable (for example, connect-failure, reset, 503) and cap retry budgets. Avoid retrying non-idempotent requests (POST) unless you have idempotency keys and the backend supports safe replays.
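As a sketch, a scoped retry policy on the route from Step 1 might look like the following; the retry conditions, counts, and per-try timeout are examples and should match what your backend can tolerate (retry budgets themselves are configured on the cluster’s circuit breakers).

# Route-level retry policy (fragment of the route_config above, illustrative values)
routes:
- match: { prefix: "/" }
  route:
    cluster: app
    timeout: 30s
    retry_policy:
      retry_on: "connect-failure,reset,retriable-status-codes"
      retriable_status_codes: [503]
      num_retries: 2
      per_try_timeout: 5s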
Step 3: Support WebSockets and streaming
For WebSockets, ensure the proxy preserves upgrade headers and does not buffer responses. Envoy generally supports WebSockets automatically through the HTTP connection manager, but you must ensure timeouts and idle settings match your application expectations. For gRPC, ensure HTTP/2 is enabled end-to-end and that max concurrent streams and message sizes are configured to prevent unexpected resets.
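For WebSockets specifically, the HTTP connection manager needs an upgrade config, and long-lived connections usually need a longer idle timeout; you may also need to relax the 30s route timeout from Step 1 for upgraded or streaming routes. A sketch of the relevant fragment, with illustrative values:

# Inside the HttpConnectionManager from Step 1 (fragment)
upgrade_configs:
- upgrade_type: websocket
common_http_protocol_options:
  idle_timeout: 3600s     # keep long-lived upgraded connections open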
Reverse proxy pitfalls and how to avoid them
Port confusion and binding to localhost
Inside containers, applications sometimes bind to 127.0.0.1 by default. That works only if the proxy is in the same network namespace (same Pod). If the app is meant to be reached from outside the container or Pod, it must bind to 0.0.0.0. A common failure mode is “connection refused” from the proxy because the app is listening only on localhost in a different container or different Pod.
Mismatch between proxy and app timeouts
If the proxy read timeout is shorter than the app’s worst-case response time, the proxy will return 504 even though the app eventually completes. If the app has a shorter idle timeout than the proxy keep-alive, the proxy may reuse a closed connection and see resets. Align timeouts across layers and prefer explicit, documented defaults per workload type (API, file upload, streaming).
Large headers and cookies
Modern authentication systems can produce large cookies or JWT headers. Proxies have header size limits; exceeding them yields 431 or 400 errors that can be hard to diagnose. If you expect large headers, tune proxy buffer sizes and consider reducing cookie bloat by storing session state server-side or using shorter tokens.
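In NGINX, the relevant knobs are the client header buffers and the proxy buffers; the sizes below are illustrative and should be tuned to your actual header sizes.

# Illustrative header/buffer limits for large cookies or JWTs
large_client_header_buffers 4 16k;   # max size of a single client request header line
proxy_buffer_size 16k;               # must hold the upstream response headers
proxy_buffers 8 16k;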
Incorrect client IP and scheme
If you do not forward X-Forwarded-For and X-Forwarded-Proto correctly, your app logs will show the proxy IP, and your app may generate HTTP redirects even when the user is on HTTPS. Ensure the proxy sets these headers and that the app framework is configured to trust them only from known proxy sources.
Choosing between NGINX, Envoy, and “no proxy”
Use “no proxy” when the platform already provides one
If your environment already has a robust edge proxy and you do not need per-service L7 features, running only the app server can be the best choice. You reduce CPU and memory overhead and avoid duplicated configuration. This is common when a gateway or mesh sidecar already handles retries, mTLS, and metrics.
Use NGINX when you need simple, high-performance HTTP serving
NGINX excels at static content, straightforward reverse proxying, buffering, and mature operational behavior. It is often the pragmatic choice for serving SPAs, handling uploads with size limits, and normalizing headers. Its configuration is widely understood, and it performs well with modest resource usage.
Use Envoy when you need advanced L7 routing and observability
Envoy is a strong fit for gRPC-heavy systems, sophisticated traffic management, and deep metrics. It is also a natural choice when you want consistency with service mesh data planes. The tradeoff is configuration complexity; you should treat Envoy config as code, validate it, and roll it out carefully.
Operational checklist for containerized web servers behind reverse proxies
Health endpoints and readiness behavior
Expose a lightweight health endpoint that does not depend on slow downstreams. Separate “liveness” (process is running) from “readiness” (safe to receive traffic). If the proxy is in front, ensure the readiness signal reflects the entire request path: proxy can reach the app, and the app can serve requests. Otherwise, you may route traffic to Pods where the proxy is up but the app is not yet accepting connections.
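A sketch of how that maps onto Kubernetes probes for the sidecar example earlier in this article; the /readyz path is a hypothetical endpoint that exercises the proxy-to-app path, and the thresholds are illustrative.

# Probes on the Pod's containers (illustrative endpoints and thresholds)
livenessProbe:
  httpGet:
    path: /healthz        # cheap "process is alive" check served by the proxy
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz         # hypothetical endpoint proxied through to the app
    port: 8080
  periodSeconds: 5
  failureThreshold: 2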
Logging and correlation IDs
Write access logs and application logs to stdout/stderr. Ensure a request ID is generated at the edge (or proxy) and propagated to the app and downstream calls. This is essential for debugging latency and 5xx errors in distributed systems. If your proxy can emit structured logs (JSON), align fields with your log aggregation queries.
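As an illustration, NGINX can emit structured access logs to stdout with the request ID included; the field names here are arbitrary and should be aligned with your log aggregation queries.

# JSON access log including the request ID (define in the http context; field names are examples)
log_format json_combined escape=json
  '{"time":"$time_iso8601","request_id":"$request_id",'
  '"method":"$request_method","uri":"$request_uri",'
  '"status":$status,"upstream_time":"$upstream_response_time"}';
access_log /dev/stdout json_combined;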
Resource sizing and limits
Reverse proxies consume memory for connection tracking and buffering. If you set aggressive buffering or allow large uploads, memory usage can spike. Size memory requests and limits accordingly, and prefer streaming where appropriate. For CPU, TLS termination and compression can be significant; if you enable both, benchmark under realistic traffic.
Security hardening basics
Run as non-root where possible, use read-only root filesystem for static servers, and mount configuration as read-only. Disable server tokens and unnecessary modules. Keep images minimal and patch regularly. For reverse proxies, ensure only required ports are exposed and that admin interfaces (if any) are not reachable from untrusted networks.
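A sketch of those defaults as a Kubernetes securityContext for a static-serving container; this assumes an image that tolerates running as non-root (for example, an unprivileged NGINX variant listening on 8080), and a read-only root filesystem typically requires mounting writable cache and temp directories as emptyDir volumes.

# Hardened container settings for a static web server (illustrative)
securityContext:
  runAsNonRoot: true
  readOnlyRootFilesystem: true       # mount /tmp and cache dirs as emptyDir volumes if needed
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]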