Why structured logging and correlation matter
When an incident happens, you usually need to answer: what request triggered it, which user was affected, what code path ran, how long it took, and what failed. Free-form logs (random strings) make this slow because you can’t reliably filter, aggregate, or correlate events across middleware, controllers, services, and external calls.
Structured logging means logging as consistent key/value fields (typically JSON). Request correlation means attaching a stable identifier (request ID / correlation ID) to every log line produced while handling that request, so you can trace a single request across layers and across services.
Recommended baseline fields
timestamp(usually added by the logger)level:debug,info,warn,errormsg: short human-readable messagerequestId: correlation IDroute/method/statusCodedurationMsuserId(if known), but never raw tokens/passwordserrorobject fields when applicable (name/message/stack in non-production)
Step-by-step: build a logger with consistent fields
1) Create a logger module (JSON logs, levels)
Use a JSON logger so every log entry is machine-parseable. The example below uses pino because it’s fast and produces clean JSON, but the pattern applies to other loggers.
// src/observability/logger.js
const pino = require('pino');
const isProd = process.env.NODE_ENV === 'production';
const logger = pino({
level: process.env.LOG_LEVEL || (isProd ? 'info' : 'debug'),
base: {
service: process.env.SERVICE_NAME || 'api',
env: process.env.NODE_ENV || 'development'
},
redact: {
// Redact common sensitive locations if they appear in logged objects
paths: [
'req.headers.authorization',
'req.headers.cookie',
'req.body.password',
'req.body.token',
'user.password',
'user.token'
],
remove: true
},
serializers: {
// Keep request/response logging small and predictable
req(req) {
return {
method: req.method,
url: req.url,
headers: {
'user-agent': req.headers['user-agent'],
'x-request-id': req.headers['x-request-id']
},
remoteAddress: req.socket?.remoteAddress
};
},
res(res) {
return {
statusCode: res.statusCode
};
},
err(err) {
return {
type: err.name,
message: err.message,
stack: isProd ? undefined : err.stack
};
}
}
});
module.exports = { logger };Key idea: define a stable schema early. If every team member logs different keys, you lose most of the value of structured logging.
2) Add request ID middleware (correlation)
Accept an incoming request ID from a trusted upstream (like a gateway) or generate one. Return it in the response so clients can report it.
Continue in our app.
You can listen to the audiobook with the screen off, receive a free certificate for this course, and also have access to 5,000 other free online courses.
Or continue reading below...Download the app
// src/observability/requestId.js
const { randomUUID } = require('crypto');
function requestIdMiddleware(req, res, next) {
const incoming = req.header('x-request-id');
const requestId = (incoming && incoming.length < 200) ? incoming : randomUUID();
req.requestId = requestId;
res.setHeader('x-request-id', requestId);
next();
}
module.exports = { requestIdMiddleware };3) Attach a per-request logger (consistent fields everywhere)
Create a child logger for each request so you don’t have to manually include requestId on every call. Add other stable fields that help debugging (method, route, userId when known).
// src/observability/requestLoggerContext.js
const { logger } = require('./logger');
function requestLoggerContext(req, res, next) {
// route is not available yet; we can add it later in access logs
req.log = logger.child({
requestId: req.requestId,
method: req.method
});
next();
}
module.exports = { requestLoggerContext };Usage rule: in controllers/services, log via req.log (or a context logger) rather than a global logger, so correlation is automatic.
Request/response logging middleware (access logs)
Separate access logs (one line per request with method/route/status/duration) from application logs (business events, warnings, errors). Access logs are high-volume and are often routed/stored differently.
4) Implement access logging with timing and route
Capture start time, then log on response finish. Prefer res.on('finish') so you log even when the response is sent early.
// src/observability/accessLog.js
function accessLogMiddleware(options = {}) {
const {
sampleRate = 1.0, // 1.0 = log all requests
ignorePaths = ['/health', '/metrics']
} = options;
return function accessLog(req, res, next) {
if (ignorePaths.includes(req.path)) return next();
if (Math.random() > sampleRate) return next();
const start = process.hrtime.bigint();
res.on('finish', () => {
const end = process.hrtime.bigint();
const durationMs = Number(end - start) / 1e6;
// Express sets req.route only after a match; fallback to req.path
const route = req.route?.path ? `${req.baseUrl}${req.route.path}` : req.path;
req.log.info({
type: 'access',
route,
statusCode: res.statusCode,
durationMs: Math.round(durationMs)
}, 'request completed');
});
next();
};
}
module.exports = { accessLogMiddleware };Sampling tip: keep sampleRate at 1.0 in early stages, then consider sampling only successful requests (e.g., sample 10% of 2xx, but 100% of 4xx/5xx) once volume grows.
5) Wire the observability middleware early
Order matters: request ID first, then request logger context, then access log.
// src/app.js
const express = require('express');
const { requestIdMiddleware } = require('./observability/requestId');
const { requestLoggerContext } = require('./observability/requestLoggerContext');
const { accessLogMiddleware } = require('./observability/accessLog');
const app = express();
app.use(requestIdMiddleware);
app.use(requestLoggerContext);
app.use(accessLogMiddleware({ sampleRate: 1.0 }));
module.exports = { app };Logging inside services without leaking sensitive data
Service-layer logs should describe what happened and include identifiers that help you find related data, but should avoid raw payloads and secrets. A safe pattern is to log:
- IDs (orderId, userId, paymentId)
- Counts (itemsCount)
- State transitions (from/to)
- External dependency name and result category (timeout, 5xx, invalid response)
- Never full tokens, passwords, full card numbers, or full request bodies
Pattern: pass a logger (or context) into services
Instead of importing a global logger everywhere, accept a logger parameter so the caller can provide the request-scoped logger.
// src/services/orderService.js
async function createOrder({ log, userId, cartId }) {
log.info({ userId, cartId }, 'creating order');
// ... business logic
log.info({ userId, cartId, orderId: 'ord_123' }, 'order created');
return { orderId: 'ord_123' };
}
module.exports = { createOrder };Controller usage:
// inside a controller handler
const { createOrder } = require('../services/orderService');
async function postOrder(req, res) {
const userId = req.user?.id; // if you have auth
const cartId = req.body.cartId;
const result = await createOrder({
log: req.log,
userId,
cartId
});
res.status(201).json(result);
}Redaction strategies
Redaction should be layered:
- At the logger: use
redactpaths for common sensitive fields. - At the call site: avoid logging large objects; log a curated subset.
- At boundaries: if you must log payloads for debugging, log only in non-production and ensure it’s still redacted.
Example of safe “pick fields” logging:
// Avoid: log.info({ body: req.body }, 'incoming request')
// Prefer:
req.log.debug({
type: 'input',
cartId: req.body.cartId,
itemsCount: Array.isArray(req.body.items) ? req.body.items.length : undefined
}, 'received order input');Capturing error context (user, route, timing)
Errors are most useful when they include enough context to reproduce and scope impact, but not sensitive data. You want:
requestId(automatic via request logger)userId(if known)route,method,statusCodedurationMs(time to failure)errorfields (type/message/stack depending on environment)
6) Add a lightweight timing context
Store a start time on the request so both access logs and error logs can include duration.
// src/observability/requestTiming.js
function requestTiming(req, res, next) {
req._startAt = process.hrtime.bigint();
next();
}
function getDurationMs(req) {
if (!req._startAt) return undefined;
const end = process.hrtime.bigint();
return Math.round(Number(end - req._startAt) / 1e6);
}
module.exports = { requestTiming, getDurationMs };Wire it near the top:
// app.js
const { requestTiming } = require('./observability/requestTiming');
app.use(requestTiming);7) Log errors with request context (without duplicating response logic)
In your error handler, log once with structured context. Keep error logs as type: 'app_error' (application logs), distinct from access logs.
// src/observability/errorLog.js
const { getDurationMs } = require('./requestTiming');
function errorLoggingMiddleware(err, req, res, next) {
const route = req.route?.path ? `${req.baseUrl}${req.route.path}` : req.path;
req.log.error({
type: 'app_error',
route,
statusCode: res.statusCode || 500,
durationMs: getDurationMs(req),
userId: req.user?.id,
err
}, 'request failed');
next(err);
}
module.exports = { errorLoggingMiddleware };Important: avoid logging the same error multiple times (e.g., once in a service and again in the error handler). A good rule is: services may log context (what they were doing) at warn/info, but the final error is logged centrally once.
Separating application logs from access logs
Even if both are JSON, treat them as different streams:
- Access logs: high-volume, one per request, used for traffic patterns and latency.
- Application logs: business events, warnings, errors, dependency failures, state transitions.
Practical separation options:
- Different
typefield (as shown:type: 'access'vstype: 'app_error') so you can filter in your log system. - Different logger instances writing to different destinations (stdout vs file, or different transports).
- Different sampling: access logs may be sampled; application errors usually are not.
Sampling guidance (keep signal, reduce noise)
| Log kind | Suggested sampling | Why |
|---|---|---|
| 5xx access logs | 100% | Rare and high value |
| 4xx access logs | Often 100% initially; later consider 10–50% depending on volume | Useful for client issues/abuse patterns |
| 2xx access logs | 10–100% depending on traffic | Mostly noise at scale |
| Application errors | 100% | Primary debugging signal |
| Debug logs | 0% in production (disabled via level) | Costly and risky |
Implement sampling in the access log middleware (as shown). For application logs, prefer controlling volume via log level and by logging fewer, higher-quality events.
Health endpoints for maintainability
Health endpoints provide fast signals for load balancers and on-call debugging. Keep them lightweight and avoid leaking internal details.
8) Liveness and readiness endpoints
- Liveness: “process is running” (no dependency checks).
- Readiness: “process can serve traffic” (optionally checks critical dependencies like DB connectivity).
// src/routes/health.js
const express = require('express');
const router = express.Router();
router.get('/health/live', (req, res) => {
res.status(200).json({ status: 'ok' });
});
router.get('/health/ready', async (req, res) => {
// Keep it fast; timebox checks if you add them.
// Example placeholder: await db.ping({ timeoutMs: 200 });
res.status(200).json({ status: 'ok' });
});
module.exports = { healthRouter: router };Operational tip: exclude these paths from access logs or sample them heavily to avoid noise.
Lightweight metrics signals (latency buckets, error rates)
Metrics complement logs: logs tell you what happened for a specific request; metrics tell you how often and how bad across time. You can implement basic metrics without a heavy stack by exposing counters and histograms in-memory.
9) Implement minimal counters and latency buckets
Below is a small in-memory metrics module with:
- Request count by status class (2xx/4xx/5xx)
- Error count
- Latency buckets (histogram-like)
// src/observability/metrics.js
const latencyBucketsMs = [25, 50, 100, 250, 500, 1000, 2500, 5000];
const metrics = {
startedAt: Date.now(),
requestsTotal: 0,
responsesByClass: { '2xx': 0, '3xx': 0, '4xx': 0, '5xx': 0 },
errorsTotal: 0,
latencyBuckets: Object.fromEntries(latencyBucketsMs.map(b => [String(b), 0])),
latencyOverflow: 0
};
function observeRequest({ statusCode, durationMs }) {
metrics.requestsTotal += 1;
const klass = `${Math.floor(statusCode / 100)}xx`;
if (metrics.responsesByClass[klass] !== undefined) metrics.responsesByClass[klass] += 1;
const bucket = latencyBucketsMs.find(b => durationMs <= b);
if (bucket) metrics.latencyBuckets[String(bucket)] += 1;
else metrics.latencyOverflow += 1;
if (statusCode >= 500) metrics.errorsTotal += 1;
}
function snapshot() {
return {
uptimeSec: Math.floor((Date.now() - metrics.startedAt) / 1000),
...metrics
};
}
module.exports = { observeRequest, snapshot, latencyBucketsMs };10) Record metrics in middleware
// src/observability/metricsMiddleware.js
const { observeRequest } = require('./metrics');
const { getDurationMs } = require('./requestTiming');
function metricsMiddleware(options = {}) {
const { ignorePaths = ['/health/live', '/health/ready', '/metrics'] } = options;
return function (req, res, next) {
if (ignorePaths.includes(req.path)) return next();
res.on('finish', () => {
const durationMs = getDurationMs(req) ?? 0;
observeRequest({ statusCode: res.statusCode, durationMs });
});
next();
};
}
module.exports = { metricsMiddleware };11) Expose a /metrics endpoint (simple JSON)
For a lightweight setup, JSON is enough. If you later adopt Prometheus, you can switch to a text format without changing the recording middleware.
// src/routes/metrics.js
const express = require('express');
const { snapshot } = require('../observability/metrics');
const router = express.Router();
router.get('/metrics', (req, res) => {
res.status(200).json(snapshot());
});
module.exports = { metricsRouter: router };12) Wire health + metrics routes and middleware
// src/app.js
const { healthRouter } = require('./routes/health');
const { metricsRouter } = require('./routes/metrics');
const { metricsMiddleware } = require('./observability/metricsMiddleware');
const { errorLoggingMiddleware } = require('./observability/errorLog');
app.use(metricsMiddleware());
app.use(healthRouter);
app.use(metricsRouter);
// ... your other routes
// Log errors with context before your error response middleware
app.use(errorLoggingMiddleware);
// app.use(yourErrorResponseMiddleware);Operational checklist: maintainable logging and signals
- Every request gets a
requestId; it’s returned inx-request-id. - Use a request-scoped logger (
req.log) so correlation is automatic. - Access logs are one-per-request with
route,statusCode,durationMs, and sampling. - Application logs focus on business events and dependency failures; avoid payload dumps.
- Redact secrets at the logger level and avoid logging sensitive fields at call sites.
- Error logs include
userId(if known), route, timing, and error details (stack only outside production). - Expose
/health/liveand/health/readyfor operational checks. - Expose lightweight
/metricswith request counts, error rate signals, and latency buckets.