Cloud-Native Web Serving with Kubernetes Ingress and Service Mesh

Observability for Web Serving: Metrics, Logs, and Distributed Tracing

Chapter 10

Why Observability Matters for Web Serving

In cloud-native web serving, “observability” means you can understand what your system is doing from the signals it emits, without needing to reproduce issues locally. For a web workload, that translates into answering questions like: Are users seeing errors? Is latency increasing for a specific route? Which upstream dependency is slow? Did a rollout change behavior? Observability is typically built on three complementary signal types: metrics (numerical time series for trends and alerting), logs (event records for detailed context), and distributed traces (end-to-end request journeys across services). Each signal is useful alone, but the real power comes from correlating them: a metric alert points to a problematic endpoint, logs show the error details, and traces reveal where time was spent across hops.

The Three Pillars: Metrics, Logs, and Traces

Metrics: fast, cheap, and great for alerting

Metrics are aggregated numeric measurements over time. For web serving, the most common are request rate (RPS), error rate, and latency (often called the “RED” method: Rate, Errors, Duration). Metrics are ideal for dashboards and alerts because they are compact and queryable. The trade-off is that metrics rarely contain enough context to explain a specific failure without additional signals.

Logs: detailed context for debugging

Logs are discrete events, typically text or JSON. For web workloads, logs often include request method, path, status code, latency, user agent, and sometimes application-specific fields. Logs are excellent for answering “what happened?” and “why did it happen?” but can become expensive at high volume. The key is to standardize structure, reduce noise, and ensure logs can be correlated to metrics and traces using shared identifiers (for example, a trace ID).

Distributed tracing: follow a request across services

Distributed tracing records a request as it traverses multiple components (edge proxy, service mesh sidecars, application services, databases, external APIs). A trace is composed of spans; each span represents a timed operation with attributes (route, status code, peer service, error flag). Tracing is the most direct way to answer “where is the latency coming from?” and “which dependency caused the error?” For web serving, tracing becomes especially valuable when requests fan out to multiple backends or when retries/timeouts obscure root causes.

Golden Signals for Web Serving (What to Measure First)

To avoid collecting everything, start with a small set of signals that cover user experience and system health. Common “golden signals” for web serving are: latency (p50/p90/p99), traffic (requests per second), errors (5xx rate, 4xx rate by route), and saturation (CPU, memory, connection pools, queue depth). For an ingress or gateway, add TLS handshake errors, upstream connect time, and response size. For a service mesh, add mTLS handshake failures, retry counts, and circuit breaker events. These signals should be broken down by dimensions that matter: service, route, status code, and (when safe) tenant or region. Be cautious with high-cardinality labels like full URLs, user IDs, or raw IP addresses, which can overload metric storage.

Instrumenting Metrics for Web Workloads

Use consistent metric naming and labels

Whether you use Prometheus-style metrics or another backend, consistency matters more than the specific tool. A practical approach is to ensure every HTTP server exports at least: a request counter, a request duration histogram, and an in-flight gauge. Labels should include service name, method, route template (not raw path), and status code class (2xx/4xx/5xx). Route templates prevent cardinality explosions (for example, “/users/:id” instead of “/users/123”).
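
As a minimal sketch of this baseline, assuming the Python prometheus_client library and an illustrative service named checkout, the three instruments and their labels might be declared like this:

# Minimal sketch: baseline HTTP metrics with low-cardinality labels (prometheus_client assumed)
from prometheus_client import Counter, Gauge, Histogram, start_http_server

# Request counter labeled by route template and status class, never by raw path or user ID.
HTTP_REQUESTS = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["service", "method", "route", "status_class"],
)

# Duration histogram shares the service/method/route labels so queries can join the series.
HTTP_DURATION = Histogram(
    "http_request_duration_seconds",
    "HTTP request duration in seconds",
    ["service", "method", "route"],
)

# In-flight gauge as a simple saturation signal.
HTTP_IN_FLIGHT = Gauge(
    "http_requests_in_flight",
    "In-flight HTTP requests",
    ["service"],
)

# Expose /metrics on a separate port (8081 here) rather than on the public listener.
start_http_server(8081)

# Recording one request: note the route template "/users/:id" instead of "/users/123".
HTTP_REQUESTS.labels("checkout", "GET", "/users/:id", "2xx").inc()
HTTP_DURATION.labels("checkout", "GET", "/users/:id").observe(0.087)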

Step-by-step: add basic HTTP metrics to an application

The exact code depends on your language, but the steps are consistent:

  • Choose a metrics library compatible with your runtime (for example, Prometheus client libraries).
  • Expose a /metrics endpoint on a separate port or path (avoid mixing with public endpoints if possible).
  • Instrument request count and duration around your HTTP handler or middleware.
  • Normalize route labels (use framework route names or templates).
  • Deploy and verify by curling the metrics endpoint from inside the cluster.
# Example: verify metrics endpoint from a debug pod (conceptual)
kubectl run -it --rm debug --image=curlimages/curl -- sh -c 'curl -s http://my-service:8081/metrics | head'

When you add histograms for latency, choose buckets that match web serving realities (for example, 5 ms to 10 s). Too few buckets hide tail latency; too many increase cost. If you cannot afford histograms, export summary quantiles carefully, but note they are harder to aggregate across instances.
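
As a sketch of that trade-off, again assuming the Python prometheus_client library, explicit buckets from 5 ms to 10 s could refine the duration histogram from the earlier sketch (the exact boundaries are illustrative and should follow your own latency profile):

# Sketch: latency histogram with explicit buckets from 5 ms to 10 s (values in seconds)
from prometheus_client import Histogram

HTTP_DURATION = Histogram(
    "http_request_duration_seconds",
    "HTTP request duration in seconds",
    ["service", "method", "route"],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
)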

Ingress and gateway metrics

Your edge component is a critical observability vantage point because it sees all traffic. Ensure you can break down metrics by host, route, upstream service, and response code. Useful edge metrics include: request duration (total and upstream), upstream connect time, retries, active connections, and request/response bytes. If you run multiple gateways, label metrics by gateway instance and zone to detect localized issues.

Logging for Web Serving: Structure, Correlation, and Cost Control

Prefer structured logs (JSON) with stable fields

Structured logs enable reliable filtering and aggregation. For HTTP requests, a recommended baseline log schema includes: timestamp, level, service, environment, request_id, trace_id, method, route, status, duration_ms, client_ip (if policy allows), user_agent, and upstream service. Keep message strings short and put details into fields. Avoid logging secrets, authorization headers, or full request bodies by default.

Step-by-step: implement request logging with correlation IDs

  • Generate or accept an incoming request ID header (for example, X-Request-Id). If missing, create one at the edge or in the app.
  • Propagate the request ID to upstream calls and include it in every log line for that request.
  • If you use tracing, also include the trace ID and span ID in logs.
  • Log at the end of the request with status and duration; log errors with stack traces separately.
# Example JSON log line (conceptual)
{"ts":"2026-01-06T10:15:30Z","level":"info","service":"checkout","env":"prod","request_id":"a1b2c3","trace_id":"4f2d...","method":"POST","route":"/api/v1/orders","status":201,"duration_ms":87,"bytes_out":512}

Correlation IDs let you pivot from a metric spike (e.g., 5xx rate) to the exact failing requests in logs, and then to traces for those requests. If your edge proxy can inject request IDs and trace context, you reduce the burden on application teams and improve consistency.
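
A minimal sketch of these steps, assuming a Python service with a framework-agnostic wrapper around the handler (the service name and field names are illustrative and match the schema above):

# Sketch: accept or generate X-Request-Id and emit one structured log line per request
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
access_log = logging.getLogger("access")

def handle_request(headers, method, route, handler):
    # Reuse the request ID injected at the edge if present; otherwise create one here.
    request_id = headers.get("X-Request-Id") or uuid.uuid4().hex
    start = time.monotonic()
    status = 500
    try:
        status = handler()          # the handler should also forward request_id upstream
        return status
    finally:
        access_log.info(json.dumps({
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "level": "info",
            "service": "checkout",
            "request_id": request_id,
            "method": method,
            "route": route,
            "status": status,
            "duration_ms": round((time.monotonic() - start) * 1000),
        }))

# Usage: handle_request({"X-Request-Id": "a1b2c3"}, "POST", "/api/v1/orders", lambda: 201)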

Reduce log volume without losing signal

High-traffic web serving can generate massive logs. Practical controls include: sampling successful request logs (keep all errors), lowering verbosity for health checks, and using separate loggers for access logs vs application logs. Another effective technique is to log only aggregated summaries for very hot endpoints and rely on metrics for normal behavior, reserving detailed logs for anomalies.
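
One way to implement "keep all errors, sample successes" is a logging filter; the sketch below assumes Python's standard logging module and an illustrative 10% keep ratio:

# Sketch: keep every error log line, sample successful request logs
import logging
import random

class SampleSuccessFilter(logging.Filter):
    def __init__(self, keep_ratio=0.1):
        super().__init__()
        self.keep_ratio = keep_ratio

    def filter(self, record):
        status = getattr(record, "status", None)
        # Always keep errors and records without a status field; sample the rest.
        if status is None or status >= 500:
            return True
        return random.random() < self.keep_ratio

access_log = logging.getLogger("access")
access_log.addFilter(SampleSuccessFilter(keep_ratio=0.1))

# Usage: pass the status as an extra field so the filter can see it.
access_log.info("request complete", extra={"status": 200})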

Distributed Tracing in a Kubernetes Web Stack

Trace context propagation

Tracing works only if trace context is propagated across hops. The common standard is W3C Trace Context (traceparent and tracestate headers). Your edge proxy or service mesh can start traces and propagate context automatically, but application code must propagate context to downstream HTTP clients and background tasks. If you have asynchronous workflows, ensure you link spans correctly so traces remain navigable.
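
For application code, a minimal propagation sketch (assuming the OpenTelemetry Python SDK and the requests library; the upstream URL is illustrative) looks like this; auto-instrumentation packages can perform the same injection for you:

# Sketch: propagate W3C trace context (traceparent/tracestate) on an outgoing HTTP call
import requests
from opentelemetry.propagate import inject

def call_inventory(order_id):
    headers = {}
    # inject() copies the current trace context into the carrier dict,
    # so the downstream service continues the same trace.
    inject(headers)
    return requests.get(
        f"http://inventory:8080/api/v1/stock/{order_id}",
        headers=headers,
        timeout=2,
    )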

Step-by-step: enable tracing with a service mesh and an application

  • Enable tracing in the mesh control plane and configure a collector endpoint (for example, OpenTelemetry Collector).
  • Configure sampling (start with a low percentage in production; increase temporarily during incidents).
  • Ensure sidecars/gateways inject and propagate W3C trace headers.
  • Instrument the application with OpenTelemetry SDK to create spans for key operations (HTTP handlers, database calls, external API calls).
  • Export spans to the collector using OTLP (gRPC or HTTP).
  • Verify by generating traffic and checking that traces include both edge and service spans.
# Example environment variables for OTLP export (conceptual)
OTEL_SERVICE_NAME=checkout
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.05

Even with mesh-generated spans, application-level spans are essential because they show internal work (template rendering, cache lookups, DB queries). Without them, traces may show only “time spent in service” without explaining why.
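
A sketch of such application-level spans, assuming the OpenTelemetry Python SDK (the operation names and the lookup_cache/insert_order helpers are illustrative stand-ins):

# Sketch: application-level spans around internal work (cache lookup, database write)
from opentelemetry import trace

tracer = trace.get_tracer("checkout")

def lookup_cache(order_id):
    return None       # stand-in for a real cache client

def insert_order(order):
    pass              # stand-in for a real database write

def create_order(order):
    with tracer.start_as_current_span("create_order") as span:
        span.set_attribute("http.route", "/api/v1/orders")

        # Child span for the cache lookup so the trace shows where the time went.
        with tracer.start_as_current_span("cache.lookup") as cache_span:
            cached = lookup_cache(order["id"])
            cache_span.set_attribute("cache.hit", cached is not None)

        # Child span for the database write, with the exception recorded on failure.
        with tracer.start_as_current_span("db.insert_order") as db_span:
            try:
                insert_order(order)
            except Exception as exc:
                db_span.record_exception(exc)
                db_span.set_attribute("error", True)
                raise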

What to look for in traces

For web serving, traces are most valuable when you focus on: long tail latency (p95/p99 traces), error traces (HTTP 5xx or gRPC errors), retry storms (multiple similar spans), and dependency bottlenecks (slow DB spans). Add attributes like route, user-facing operation name, and upstream peer service. Avoid high-cardinality attributes such as raw user IDs unless you have a privacy-safe approach.

Correlation: Moving Between Metrics, Logs, and Traces

Use exemplars and trace IDs

A practical correlation workflow is: a dashboard shows a latency spike; you click a data point and jump to a representative trace (an “exemplar”); from the trace you retrieve the trace ID; then you search logs for that trace ID to see error messages and application context. To enable this, ensure your metrics backend supports exemplars (or at least you can surface trace IDs in logs and link them in your log UI). Also ensure your access logs at the edge include trace IDs so you can correlate user-facing failures with backend traces.
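
If your stack supports it, attaching exemplars at instrumentation time can look like the sketch below; this assumes a recent Python prometheus_client with exemplar support (exemplars are only exposed via the OpenMetrics format) plus the OpenTelemetry SDK, so treat it as an illustration rather than a drop-in snippet:

# Sketch: attach the current trace ID as an exemplar when observing request duration
from opentelemetry import trace
from prometheus_client import Histogram

HTTP_DURATION = Histogram(
    "http_request_duration_seconds",
    "HTTP request duration in seconds",
    ["service", "route"],
)

def observe_with_exemplar(duration_seconds, route):
    ctx = trace.get_current_span().get_span_context()
    # The exemplar links this observation to a trace the backend can open directly.
    exemplar = {"trace_id": format(ctx.trace_id, "032x")} if ctx.is_valid else None
    HTTP_DURATION.labels("checkout", route).observe(duration_seconds, exemplar)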

Step-by-step: build an incident drill-down path

  • Create a dashboard panel for RED metrics per route (rate, errors, duration).
  • Add a panel for saturation (CPU/memory) and for dependency errors (DB, external API).
  • Configure alerts on SLO-like thresholds (e.g., 5xx > 1% for 5 minutes, p99 > 1s for 10 minutes).
  • In the alert, include links to: the dashboard, a log query filtered by service/route/time, and a trace search filtered by service and status.
  • Validate the workflow by simulating a failure (e.g., force a 500 on a test route) and ensuring you can pivot across all three signals within minutes.

Service Level Objectives (SLOs) for Web Serving

Define user-centric SLOs

SLOs translate observability into reliability targets. For web serving, a common SLO is availability (successful responses) and latency (requests under a threshold). Define success carefully: for many APIs, 5xx are failures, but some 4xx might also indicate a user-impacting problem (for example, 429 rate limiting could be considered a failure if it blocks legitimate traffic). Latency SLOs should be route-specific: a static asset endpoint and a checkout endpoint should not share the same threshold.

Step-by-step: compute an error budget from metrics

  • Choose an SLI: for example, “percentage of HTTP requests with status < 500”.
  • Pick a window: 30 days is common for budgeting; shorter windows for alerting.
  • Compute error rate: errors / total requests, filtered by service and route group.
  • Set an SLO target: e.g., 99.9% success.
  • Derive error budget: 0.1% of requests may fail in the window.
  • Alert on burn rate: detect when you are consuming the budget too quickly (fast burn) or steadily (slow burn).
# Conceptual PromQL-like expressions (adapt to your stack)

# Total request rate
sum(rate(http_requests_total{service="checkout"}[5m]))

# Error request rate
sum(rate(http_requests_total{service="checkout",status=~"5.."}[5m]))

# Error ratio (5m)
(error_rate) / (total_rate)

Burn-rate alerting reduces noise compared to static thresholds because it accounts for both severity and duration. It also aligns alerts with user impact rather than internal symptoms.
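
The arithmetic behind an error budget and a burn rate is simple; the sketch below uses an illustrative traffic volume and the 99.9%/30-day example from the steps above:

# Sketch: error budget and burn-rate arithmetic for a 99.9% success SLO over 30 days
SLO_TARGET = 0.999
WINDOW_DAYS = 30
REQUESTS_PER_DAY = 10_000_000          # illustrative traffic volume

budget_ratio = 1 - SLO_TARGET          # 0.1% of requests may fail
budget_requests = budget_ratio * REQUESTS_PER_DAY * WINDOW_DAYS

def burn_rate(observed_error_ratio):
    # Burn rate 1.0 consumes the budget exactly over the full window;
    # 14.4 would exhaust a 30-day budget in roughly two days (a common fast-burn threshold).
    return observed_error_ratio / budget_ratio

print(f"Error budget: {budget_requests:,.0f} failed requests over {WINDOW_DAYS} days")
print(f"Burn rate at a sustained 1% error ratio: {burn_rate(0.01):.1f}x")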

Practical Deployment Patterns on Kubernetes

Collecting metrics

In Kubernetes, metrics collection typically relies on scraping endpoints or receiving OTLP metrics. Ensure each workload exposes metrics on a dedicated port and that your scraping configuration selects pods by labels. For multi-tenant clusters, enforce limits on label cardinality and retention. If you use a service mesh, decide whether you will rely on mesh metrics, application metrics, or both; in practice, you usually need both: mesh metrics for network behavior and app metrics for business logic.

Collecting logs

Standard practice is to write logs to stdout/stderr and let a node-level agent ship them to a backend. The key operational choices are: log format (JSON), parsing rules, retention, and sampling. Ensure Kubernetes metadata (namespace, pod, container, node) is attached to logs so you can filter quickly during incidents. If you run multiple versions during rollouts, include a version label in logs to compare behavior across revisions.

Collecting traces with OpenTelemetry Collector

The OpenTelemetry Collector is commonly deployed as a gateway (central deployment) or as an agent (per-node DaemonSet). The gateway model centralizes configuration; the agent model reduces cross-node traffic and can be more resilient. In either case, configure processors for batching, tail-based sampling (if needed), and attribute filtering to remove sensitive data. Export to your tracing backend of choice. For web serving, tail-based sampling can be useful to keep a higher percentage of slow or error traces while sampling normal traffic aggressively.

Debugging Scenarios (How to Apply the Signals)

Scenario: p99 latency spike on a single route

Start with metrics: identify which route and which status codes correlate with the spike. Check whether the spike is global or isolated to one zone or one gateway. Then pivot to traces filtered by that route and time window; look for a common slow span (database query, external API call, cache miss). Finally, use logs for the trace IDs to confirm error messages, timeouts, or payload anomalies. If you see retries in traces, verify whether retry policy is amplifying load; metrics like retry count and upstream connect time help confirm.

Scenario: increased 5xx after a new revision

Use metrics split by version label to confirm whether errors are concentrated in the new revision. Check logs for stack traces or validation failures; ensure your logs include version and route. Use traces to see whether failures occur before or after a downstream call. If failures happen quickly with no downstream spans, the issue is likely in request parsing or authentication; if failures occur after a downstream span, the issue may be a dependency or a timeout. This workflow avoids guessing and reduces time to rollback or fix.

Scenario: intermittent timeouts at the edge

Edge metrics can show upstream connect time and upstream response time separately. If connect time is high, suspect DNS, network policy, or pod readiness issues. If response time is high, suspect application slowness or dependency latency. Traces can confirm whether requests are queued, retried, or hitting circuit breakers. Logs can reveal whether timeouts align with garbage collection pauses, thread pool exhaustion, or rate limiting.

Common Pitfalls and How to Avoid Them

High-cardinality metrics

Avoid labels like full URL, raw user ID, or request ID in metrics. Put those in logs or traces instead. Use route templates and status code classes to keep metric cardinality manageable.
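
If your framework does not expose the matched route name, a small normalization step before labeling keeps cardinality bounded; the patterns below are illustrative:

# Sketch: collapse raw paths into route templates before using them as metric labels
import re

ROUTE_PATTERNS = [
    (re.compile(r"^/users/[^/]+$"), "/users/:id"),
    (re.compile(r"^/api/v1/orders/[^/]+$"), "/api/v1/orders/:id"),
]

def route_template(path):
    for pattern, template in ROUTE_PATTERNS:
        if pattern.match(path):
            return template
    return "other"  # bucket unknown paths instead of labeling each one individually

print(route_template("/users/123"))    # -> /users/:id
print(route_template("/healthz"))      # -> other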

Uncorrelated signals

If logs don’t include trace IDs, and traces don’t include route names, you lose the ability to pivot quickly. Standardize on W3C trace context and ensure your logging middleware enriches logs with trace_id and span_id.
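
A minimal way to do that enrichment in Python is a logging filter that reads the active span; the sketch assumes the OpenTelemetry SDK and reuses the trace_id/span_id field names from the log schema earlier in this chapter:

# Sketch: stamp trace_id and span_id onto every log record via a logging filter
import logging
from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    def filter(self, record):
        ctx = trace.get_current_span().get_span_context()
        record.trace_id = format(ctx.trace_id, "032x") if ctx.is_valid else ""
        record.span_id = format(ctx.span_id, "016x") if ctx.is_valid else ""
        return True

handler = logging.StreamHandler()
handler.addFilter(TraceContextFilter())
handler.setFormatter(logging.Formatter(
    '{"ts":"%(asctime)s","level":"%(levelname)s",'
    '"trace_id":"%(trace_id)s","span_id":"%(span_id)s","msg":"%(message)s"}'
))
logging.getLogger().addHandler(handler)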

Too much data, not enough insight

Collecting everything can make systems slower and more expensive. Start with golden signals, add targeted instrumentation for critical paths, and use sampling for logs and traces. Prefer tail-based sampling for traces when you need to retain slow/error requests.

Now answer the exercise about the content:

A team wants to debug a p99 latency spike on a single route in a Kubernetes web service. Which workflow best uses observability signals to find the root cause efficiently?

Metrics quickly show which route and status codes correlate with the spike. Traces then reveal where time is spent across hops, and logs searched by trace ID provide detailed error context and confirm timeouts or anomalies.

Next chapter

Ingress and Service Monitoring Dashboards and Alerting Signals
