Outbox Pattern and Logical Replication for Reliable Pipelines

Chapter 13

Estimated reading time: 11 minutes


Why the Outbox Pattern Exists

Reliable event pipelines often fail in a very specific place: the boundary between a PostgreSQL transaction and an external side effect such as publishing to Kafka, enqueueing to a queue, or calling a webhook. If you update the database and then publish an event as a separate step, you can end up with a committed state change but no event (publish failed), or an event without a committed state change (publish succeeded but the transaction rolled back). The outbox pattern removes this gap by making “the event to publish” part of the same PostgreSQL transaction as the business change.

The core idea is to write an outbox record (an event row) into PostgreSQL in the same transaction as the domain update. A separate relay process later reads those outbox rows and publishes them to the external system. Because the outbox row is committed atomically with the domain change, the relay can crash, restart, and still publish everything that was committed.

Logical replication (logical decoding) complements the outbox pattern by providing a streaming mechanism from PostgreSQL’s WAL (Write-Ahead Log). Instead of polling the outbox table, you can stream committed inserts from the outbox via a replication slot. This typically reduces load, improves latency, and gives you a durable resume position (the slot’s confirmed LSN) for recovery.

Two Implementation Styles: Polling vs Logical Replication

Outbox + Polling Relay

The relay periodically queries the outbox table for unpublished events, claims a batch, publishes them, then marks them as published. This approach is simple and works without replication privileges. It does, however, create extra database work: repeated queries, row locking, and updates to mark published rows.

Outbox + Logical Replication (CDC-style)

The application inserts outbox rows (ideally append-only). A connector consumes those inserts from WAL using a logical replication slot and publishes them to the broker. The connector advances the slot only after successful delivery, so restarts resume from the last confirmed position. This approach is efficient and low-latency, but requires PostgreSQL logical replication configuration and careful monitoring of slot lag and WAL retention.


Outbox Table Design as an Integration Contract

Think of the outbox table as a contract between your application and the pipeline. It should be stable, easy to insert into, and easy to consume from. A practical schema:

CREATE TABLE outbox_events (
  id                bigserial PRIMARY KEY,
  aggregate_type    text NOT NULL,
  aggregate_id      text NOT NULL,
  event_type        text NOT NULL,
  payload           jsonb NOT NULL,
  headers           jsonb NOT NULL DEFAULT '{}'::jsonb,
  occurred_at       timestamptz NOT NULL DEFAULT now(),
  trace_id          text,
  tenant_id         text,
  -- polling-oriented fields (optional if using logical replication only)
  published_at      timestamptz,
  publish_attempts  integer NOT NULL DEFAULT 0,
  next_attempt_at   timestamptz
);

Practical guidance for these columns:

  • aggregate_type, aggregate_id help downstream routing and partitioning decisions (for example, message key = aggregate_id).
  • event_type is a stable name consumers can subscribe to (for example, OrderCreated).
  • payload is the event body; keep it self-contained enough for consumers to act.
  • headers is where you put schema version, content type, correlation identifiers, or routing hints.
  • occurred_at is the event creation timestamp inside the transaction (useful for observability and debugging).
  • published_at and retry fields are needed for polling; for logical replication you can often avoid updating rows entirely.

Step-by-Step: Writing Domain Changes and Outbox Events Atomically

The application must insert into the outbox in the same database transaction as the domain write. Example: create an order and emit OrderCreated.

BEGIN;

INSERT INTO orders (order_id, customer_id, status, total_amount, created_at)
VALUES ($1, $2, 'created', $3, now());

INSERT INTO outbox_events
  (aggregate_type, aggregate_id, event_type, payload, headers, trace_id, tenant_id)
VALUES (
  'order',
  $1,
  'OrderCreated',
  jsonb_build_object(
    'order_id', $1,
    'customer_id', $2,
    'status', 'created',
    'total_amount', $3,
    'created_at', now()
  ),
  jsonb_build_object(
    'schema_version', 1,
    'content_type', 'application/json'
  ),
  $4,
  $5
);

COMMIT;

What this buys you operationally:

  • If the transaction commits, the outbox row exists and will eventually be published.
  • If the transaction rolls back, there is no outbox row to publish.
  • The relay can be restarted without losing committed events.

Polling Relay: A Practical, Concurrent-Safe Pattern

A polling relay should do three things repeatedly: claim a batch, publish it, then mark it as published (or schedule a retry). The key is to claim rows in a way that supports multiple workers without double-processing.

Indexes for Polling

For a table that is frequently queried for unpublished rows, add a partial index:

CREATE INDEX outbox_unpublished_idx
ON outbox_events (id)
WHERE published_at IS NULL;

If you use backoff via next_attempt_at:

CREATE INDEX outbox_retry_idx
ON outbox_events (next_attempt_at, id)
WHERE published_at IS NULL;

Claiming Rows with FOR UPDATE SKIP LOCKED

This pattern allows multiple relay instances to share the work safely:

WITH batch AS (
  SELECT id
  FROM outbox_events
  WHERE published_at IS NULL
    AND (next_attempt_at IS NULL OR next_attempt_at <= now())
  ORDER BY id
  LIMIT 100
  FOR UPDATE SKIP LOCKED
)
UPDATE outbox_events o
SET publish_attempts = publish_attempts + 1
FROM batch
WHERE o.id = batch.id
RETURNING o.id, o.aggregate_type, o.aggregate_id, o.event_type,
          o.payload, o.headers, o.trace_id, o.tenant_id, o.occurred_at;

Important operational detail: keep this transaction short. The relay should not hold locks while performing network calls. A common approach is:

  • Transaction A: claim rows and return them.
  • Publish outside the transaction.
  • Transaction B: mark published or schedule retry.
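A single relay cycle following this split can be sketched in application code. The helpers below (`claim_batch`, `publish`, `mark_published`, `schedule_retry`) are hypothetical stand-ins for the SQL statements in this section and your broker client; the sketch only shows the ordering that keeps database transactions short:

```python
from typing import Callable

def relay_cycle(
    claim_batch: Callable[[], list[dict]],        # Transaction A: SKIP LOCKED claim
    publish: Callable[[dict], None],              # network call, outside any DB transaction
    mark_published: Callable[[list[int]], None],  # Transaction B: set published_at
    schedule_retry: Callable[[list[int]], None],  # Transaction B': set next_attempt_at
) -> int:
    """Run one claim/publish/mark cycle; returns the number of events published."""
    batch = claim_batch()  # short transaction; no locks are held after it commits
    if not batch:
        return 0
    ok, failed = [], []
    for event in batch:
        try:
            publish(event)  # may raise on broker errors
            ok.append(event["id"])
        except Exception:
            failed.append(event["id"])
    if ok:
        mark_published(ok)      # separate short transaction
    if failed:
        schedule_retry(failed)  # separate short transaction
    return len(ok)
```

Because publishing happens between the two transactions, a slow or failing broker never extends row-lock hold times in PostgreSQL.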

Marking Published

UPDATE outbox_events
SET published_at = now(), next_attempt_at = NULL
WHERE id = ANY($1::bigint[]);

Scheduling Retries with Backoff

If publishing fails, do not set published_at. Instead, schedule a retry:

UPDATE outbox_events
SET next_attempt_at = now() + interval '30 seconds'
WHERE id = ANY($1::bigint[]);

Practical retry policy:

  • Use exponential backoff to avoid hammering a broken broker.
  • After N attempts, move the event to a dead-letter table and alert.
  • Record the last error message in a separate column if you need debugging (but be mindful of unbounded text growth).

Reducing Table Bloat in Polling Mode

Polling mode typically updates rows (attempt counters, published timestamps), which can create bloat. Common mitigations:

  • Delete or archive published rows on a schedule (for example, keep 7–30 days online).
  • Partition the outbox by time and drop old partitions (if you already operate partitioning safely).
  • Keep the outbox minimal: avoid large payloads if you can store a reference and fetch details elsewhere (but note that fetching details later can reintroduce coupling and failure modes).

Logical Replication: Streaming the Outbox from WAL

Logical replication reads committed changes from PostgreSQL’s WAL and emits them as a change stream. For outbox usage, the cleanest approach is to make the outbox table append-only and stream only INSERT operations. Each insert corresponds to one event to publish.

This approach avoids the “mark published” update entirely. The connector’s progress is tracked by the replication slot position, not by mutating outbox rows.

PostgreSQL Configuration Essentials

Logical decoding requires settings like:

-- postgresql.conf (values depend on your environment)
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10

You also need a replication-capable role (availability depends on hosting):

CREATE ROLE outbox_replicator WITH LOGIN REPLICATION PASSWORD '...';
GRANT SELECT ON outbox_events TO outbox_replicator;

Step-by-Step: Publication and Slot

1) Create a publication that includes only the outbox table:

CREATE PUBLICATION outbox_pub FOR TABLE outbox_events;

2) Create a logical replication slot. Some connectors create slots automatically; creating it manually helps with visibility:

SELECT pg_create_logical_replication_slot('outbox_slot', 'pgoutput');

3) Run a connector/relay that:

  • Connects via the replication protocol.
  • Subscribes to the publication.
  • Receives inserted outbox rows.
  • Publishes them to the broker.
  • Acknowledges WAL positions (LSNs) only after successful publish.

The acknowledgment rule is the reliability core: if the connector reads changes but crashes before publishing, it must not acknowledge the LSN. On restart, PostgreSQL will resend from the last confirmed position.
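The acknowledge-after-publish ordering can be isolated into a small loop. The sketch below uses an in-memory stream of (LSN, row) pairs rather than a real replication connection; `publish` and `ack` are hypothetical callables standing in for your broker client and the replication protocol's status-update message:

```python
from typing import Callable, Iterable, Tuple

def consume_outbox_stream(
    changes: Iterable[Tuple[int, dict]],  # (commit LSN, outbox row), in commit order
    publish: Callable[[dict], None],      # broker publish; raises on failure
    ack: Callable[[int], None],           # confirm this LSN back to PostgreSQL
) -> int:
    """Publish each change, acknowledging its LSN only after a successful publish.

    If publish raises, the LSN is never acknowledged, so a restarted consumer
    is resent everything from the last confirmed position.
    Returns the last acknowledged LSN (0 if nothing was acknowledged).
    """
    last_acked = 0
    for lsn, row in changes:
        publish(row)  # a crash here means no ack, so the change replays on restart
        ack(lsn)      # only now may the slot discard WAL up to this position
        last_acked = lsn
    return last_acked
```

Reversing the two calls inside the loop (ack before publish) silently converts at-least-once delivery into at-most-once: a crash between them would lose the event.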

Monitoring Replication Slots to Avoid WAL Accumulation

Replication slots retain WAL until consumers confirm progress. Monitor slot state:

SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'outbox_slot';

Operational practices:

  • Alert on increasing lag (difference between current WAL LSN and confirmed_flush_lsn).
  • Alert when a slot is inactive for too long.
  • Have a documented procedure for decommissioning consumers: dropping a slot discards the resume position and can cause data loss in the stream if not handled carefully.
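Inside PostgreSQL you can compute lag with pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn); external monitoring code often does the same arithmetic itself by treating an LSN string as a 64-bit byte position. A sketch:

```python
def lsn_to_int(lsn: str) -> int:
    """Parse a PostgreSQL LSN like '16/B374D848' into a 64-bit byte position.

    The part before the slash is the high 32 bits, the part after is the low
    32 bits, both hexadecimal.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def slot_lag_bytes(current_wal_lsn: str, confirmed_flush_lsn: str) -> int:
    """WAL bytes retained behind the slot; alert when this keeps growing."""
    return lsn_to_int(current_wal_lsn) - lsn_to_int(confirmed_flush_lsn)
```

Feeding this with pg_current_wal_lsn() and the slot's confirmed_flush_lsn gives a byte count suitable for a time-series metric and threshold alerts.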

Append-Only Outbox Enforcement (Optional)

If you commit to logical replication and want a clean stream, you can enforce that the outbox table is append-only (no updates/deletes). This prevents accidental “mark published” updates that would appear in the change stream.

CREATE OR REPLACE FUNCTION prevent_outbox_update_delete()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  RAISE EXCEPTION 'outbox_events is append-only';
END;
$$;

CREATE TRIGGER outbox_no_update_delete
BEFORE UPDATE OR DELETE ON outbox_events
FOR EACH ROW EXECUTE FUNCTION prevent_outbox_update_delete();

Use this only if you have an alternative retention strategy (for example, partition dropping) and you are confident you will not need to mutate outbox rows for operational reasons.

Event Shaping: Turning Outbox Rows into Broker Messages

Whether you poll or stream via logical replication, you still need a consistent message format. A practical approach is to publish a clean envelope that is independent of the CDC tool’s internal representation. For example:

{
  "event_type": "OrderCreated",
  "aggregate_type": "order",
  "aggregate_id": "ORD-123",
  "occurred_at": "2026-01-11T10:15:00Z",
  "trace_id": "c9f1...",
  "tenant_id": "t-42",
  "schema_version": 1,
  "payload": {
    "order_id": "ORD-123",
    "customer_id": "C-9",
    "status": "created",
    "total_amount": "49.90"
  }
}

Mapping rules that work well in practice:

  • Topic selection: aggregate_type (for example, order.events) or event_type (for example, OrderCreated topic). Prefer fewer topics with clear filtering rules unless you have strong operational reasons to split.
  • Message key: aggregate_id so all events for an aggregate land in the same partition in partitioned brokers.
  • Headers: copy trace_id, tenant_id, and schema_version for observability and compatibility.
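These mapping rules can be captured in one function. A sketch, assuming outbox rows arrive as dicts with the columns from the schema above (the `to_message` name and the (key, value, headers) return shape are illustrative, not a specific broker client's API):

```python
import json

def to_message(row: dict) -> tuple[str, bytes, dict]:
    """Map an outbox row to (key, value, headers) for a partitioned broker.

    Key = aggregate_id, so all events for an aggregate land in one partition.
    """
    envelope = {
        "event_type": row["event_type"],
        "aggregate_type": row["aggregate_type"],
        "aggregate_id": row["aggregate_id"],
        "occurred_at": row["occurred_at"],
        "trace_id": row.get("trace_id"),
        "tenant_id": row.get("tenant_id"),
        "schema_version": row["headers"].get("schema_version", 1),
        "payload": row["payload"],
    }
    headers = {
        "trace_id": row.get("trace_id"),
        "tenant_id": row.get("tenant_id"),
        "schema_version": str(envelope["schema_version"]),
        "outbox_id": str(row["id"]),  # stable identifier for consumer-side dedup
    }
    return row["aggregate_id"], json.dumps(envelope).encode(), headers
```

Keeping this mapping in one place is what decouples the published envelope from whatever internal representation the polling relay or CDC connector uses.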

Handling Multi-Event Transactions and Atomic Visibility

Applications sometimes write multiple outbox rows in one transaction (for example, OrderCreated plus InventoryReserved). Both polling and logical replication can handle this, but there are differences in how you should think about it:

  • Polling: if you claim rows by id, you may publish them in the same order they were inserted, but you are not inherently aware of transaction boundaries unless you store a transaction identifier in the outbox row. If atomic “publish as a unit” matters, add a tx_id column (application-generated) and publish grouped by it.
  • Logical replication: changes are emitted in commit order and typically grouped by transaction. A connector can preserve transaction boundaries or flatten them depending on configuration. If you need “all or nothing” publication for a group, ensure your connector supports transaction metadata or that your downstream processing can tolerate partial visibility.
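For the polling case, grouping a claimed batch by an application-generated tx_id is a one-liner once rows are ordered by id (so rows from the same transaction are contiguous). A sketch, assuming a hypothetical tx_id column has been added to the outbox:

```python
from itertools import groupby

def group_by_transaction(rows: list[dict]) -> list[list[dict]]:
    """Group claimed outbox rows by tx_id so each transaction's events can be
    published as a unit.

    Assumes rows are ordered by id, which makes rows from one transaction
    contiguous; itertools.groupby only merges adjacent equal keys.
    """
    return [list(group) for _, group in groupby(rows, key=lambda r: r["tx_id"])]
```

The relay can then publish group by group and only mark a group published when every event in it was delivered.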

Operational Tradeoffs: When to Prefer Each Approach

Prefer Polling When

  • You cannot enable logical replication (permissions, hosting restrictions).
  • You want the simplest operational model and can accept slightly higher database load.
  • You need fine-grained retry scheduling and dead-letter handling directly in SQL.

Prefer Logical Replication When

  • You need low-latency streaming with minimal database overhead.
  • You want a durable resume cursor without updating outbox rows.
  • You already operate CDC connectors and have monitoring for replication slots and lag.

Common Failure Modes and How to Design for Them

Relay Publishes but Crashes Before Marking Published (Polling)

This creates a duplicate publish risk because the row remains unpublished. The standard mitigation is to make the publish operation safe to retry at the broker/consumer side and to include stable identifiers in the message (for example, outbox id as a message header). Even if consumers handle duplicates, you should still aim to minimize them by marking published promptly and using small batches.
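Consumer-side deduplication keyed on that outbox id header can be sketched as below. This in-memory version is illustrative only; a real consumer would persist seen ids (for example, in a processed_events table updated in the same transaction as its own side effects):

```python
class IdempotentConsumer:
    """Deduplicate deliveries using the outbox id carried in message headers."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.processed: list[dict] = []

    def handle(self, headers: dict, payload: dict) -> bool:
        """Process a message once; return False for duplicate deliveries."""
        outbox_id = headers["outbox_id"]
        if outbox_id in self._seen:
            return False  # duplicate from a relay crash/retry: skip side effects
        self._seen.add(outbox_id)
        self.processed.append(payload)  # stand-in for the real side effect
        return True
```

With this in place, a relay crash between publish and mark-published costs a redelivery, not a double side effect.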

Connector Stops Consuming a Replication Slot (Logical Replication)

WAL accumulates. Mitigate with monitoring and alerting on lag, and ensure connectors are deployed with high availability. If you must rebuild a connector, plan how to recover the stream position (for example, by keeping the slot and reusing it, or by replaying from a known point if you intentionally drop it).

Schema Changes to the Outbox Table

Because the outbox is an integration contract, treat schema changes carefully:

  • Additive changes (new nullable columns) are usually safe.
  • Renames and type changes can break connectors or downstream transforms.
  • If using logical replication, ensure the publication and connector handle the new columns as expected.

Practical Checklist for Production Readiness

  • Outbox inserts are in the same transaction as domain changes.
  • Outbox rows contain routing metadata (aggregate_type/id, event_type) and observability metadata (trace_id, tenant_id).
  • Polling mode: FOR UPDATE SKIP LOCKED claiming, retry backoff, dead-letter strategy, and cleanup/retention plan.
  • Logical replication mode: wal_level=logical, publication limited to outbox table, replication slot monitoring, and a plan for connector restarts and slot lifecycle.
  • Message format is stable and decoupled from CDC envelopes; include schema version and stable identifiers.

Now answer the exercise about the content:

What is the main reliability benefit of writing an outbox event row in the same PostgreSQL transaction as the domain change?


By inserting the outbox row atomically with the domain update, a commit means the event is durably recorded for later publishing, and a rollback means no event exists. This avoids mismatches between database state and external side effects.

Next chapter

Analytics-Ready Event Data and Operational-to-Analytical Flows
