Idempotency, Deduplication, and Exactly-Once Effects

Chapter 3

Estimated reading time: 13 minutes


Why idempotency matters in event-driven PostgreSQL systems

In event-driven systems, the same event can be delivered more than once. Retries happen because of network timeouts, consumer crashes, broker redeliveries, or manual replays. If your database writes are not designed to tolerate duplicates, you can accidentally double-apply business effects: double-charge a customer, double-decrement inventory, or create duplicate rows that later require cleanup.

Idempotency is the property that applying the same operation multiple times produces the same final state as applying it once. Deduplication is the mechanism you use to detect and ignore duplicates. “Exactly-once effects” means that even if the message delivery is at-least-once, the business effect in PostgreSQL is applied once (or equivalently, multiple applications are harmless).

In practice, you rarely get true end-to-end exactly-once delivery across all components. What you can reliably build is exactly-once effects at the database boundary: the consumer may see duplicates, but PostgreSQL state changes remain correct.

Core patterns for exactly-once effects

Pattern 1: Idempotent writes using a unique event identifier

The simplest approach is to ensure every event has a stable unique identifier (often a UUID). You then record which event IDs have been applied. If the same event arrives again, you detect it and skip applying the effect.

There are two common ways to do this in PostgreSQL:


  • Insert the event ID into a “processed events” table with a unique constraint, and only apply the effect if the insert succeeds.
  • Use the event ID as a unique key directly on the target table (when the effect is naturally represented as a row keyed by event ID).

Pattern 2: Deduplication via UPSERT (INSERT ... ON CONFLICT)

PostgreSQL’s UPSERT lets you express “insert if new, otherwise do nothing” or “insert if new, otherwise update deterministically.” This is a workhorse for idempotency because it moves the race-condition handling into the database’s concurrency control.
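As a minimal sketch, assuming a hypothetical payment table keyed directly by the triggering event's ID, "insert if new, otherwise do nothing" looks like this:

```sql
-- Hypothetical target table; event_id is the dedup key.
INSERT INTO payment (event_id, customer_id, amount_cents)
VALUES ($1::uuid, $2::uuid, $3::bigint)
ON CONFLICT (event_id) DO NOTHING;
```

Because the conflict is resolved by the database's own concurrency control, two workers racing on the same event never both insert; neither needs to check first.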

Pattern 3: Transactional coupling of dedup + effect

To achieve exactly-once effects, deduplication and the business update must be in the same database transaction. Otherwise, you can mark an event as processed but fail to apply the effect (or apply the effect but fail to mark processed), leading to inconsistencies on retries.

Designing a processed-events table

A dedicated table is a flexible, explicit way to deduplicate across many different projections or side effects.

Schema

CREATE TABLE processed_event_ids (
  consumer_name text NOT NULL,
  event_id uuid NOT NULL,
  processed_at timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (consumer_name, event_id)
);

Key points:

  • consumer_name scopes deduplication to a specific consumer/projection. Two different consumers can legitimately process the same event independently.
  • The composite primary key enforces uniqueness and provides an efficient lookup.
  • processed_at is useful for debugging and for retention policies.

Step-by-step: apply an event exactly once

Assume an incoming event with event_id and payload that should update a read model table (for example, a customer balance projection). The steps are:

  • Start a transaction.
  • Attempt to insert (consumer_name, event_id) into processed_event_ids.
  • If the insert succeeds, apply the business update.
  • If the insert conflicts, do nothing (the event is a duplicate for this consumer).
  • Commit.

In SQL, you can express this with a CTE so the effect runs only when the insert succeeds:

BEGIN;

WITH inserted AS (
  INSERT INTO processed_event_ids (consumer_name, event_id)
  VALUES ('balance_projection', $1::uuid)
  ON CONFLICT DO NOTHING
  RETURNING 1
)
UPDATE customer_balance b
SET balance_cents = b.balance_cents + $2::bigint
FROM inserted
WHERE b.customer_id = $3::uuid;

COMMIT;

How it works:

  • If the event is new, inserted returns one row, and the UPDATE runs.
  • If the event is a duplicate, inserted returns zero rows, and the UPDATE affects zero rows.

This is safe under concurrency: two workers racing to process the same event will contend on the primary key; only one will “win” and apply the update.

Handling missing target rows

Sometimes the projection row might not exist yet (e.g., first time you see a customer). You can combine insert-or-update logic with the same gating CTE:

BEGIN;

WITH inserted AS (
  INSERT INTO processed_event_ids (consumer_name, event_id)
  VALUES ('balance_projection', $1::uuid)
  ON CONFLICT DO NOTHING
  RETURNING 1
)
INSERT INTO customer_balance (customer_id, balance_cents)
SELECT $3::uuid, $2::bigint
FROM inserted
ON CONFLICT (customer_id) DO UPDATE
SET balance_cents = customer_balance.balance_cents + EXCLUDED.balance_cents;

COMMIT;

This applies the delta exactly once per event, even if the row is created on the fly.

Deduplication without a separate table

Using event_id as a natural unique key

If the effect of an event is “create a row representing this event,” you can store the event ID directly in that table with a unique constraint. Example: a table of issued invoices where each invoice is created by an event.

CREATE TABLE invoice (
  invoice_id uuid PRIMARY KEY,
  created_from_event_id uuid UNIQUE NOT NULL,
  customer_id uuid NOT NULL,
  amount_cents bigint NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

Then your consumer can do:

INSERT INTO invoice (invoice_id, created_from_event_id, customer_id, amount_cents)
VALUES ($1::uuid, $2::uuid, $3::uuid, $4::bigint)
ON CONFLICT (created_from_event_id) DO NOTHING;

This is idempotent: duplicates are ignored. It is also self-documenting: you can trace which event created the row.

When this is not enough

This approach works when the event maps to a single row creation. It is less suitable when the event applies a delta to an existing row (balances, counters, inventory), because you need to ensure the delta is not applied twice. In those cases, a processed-events table or a per-entity dedup ledger is usually better.

Exactly-once effects for counters and balances

Counters and balances are where duplicates hurt the most because the operation is not naturally idempotent (adding 10 twice is not the same as adding 10 once). The solution is to make the effect conditional on “first time seen.”

Per-entity dedup ledger

Instead of a global processed-events table, you can keep a ledger keyed by the entity (e.g., account) and event ID. This can improve locality and make retention easier.

CREATE TABLE account_postings (
  account_id uuid NOT NULL,
  event_id uuid NOT NULL,
  delta_cents bigint NOT NULL,
  posted_at timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (account_id, event_id)
);

Then apply the posting and update the balance in one transaction:

BEGIN;

WITH posting AS (
  INSERT INTO account_postings (account_id, event_id, delta_cents)
  VALUES ($1::uuid, $2::uuid, $3::bigint)
  ON CONFLICT DO NOTHING
  RETURNING delta_cents
)
UPDATE account_balance
SET balance_cents = balance_cents + (SELECT COALESCE(SUM(delta_cents), 0) FROM posting)
WHERE account_id = $1::uuid;

COMMIT;

Notes:

  • If the event is a duplicate, posting returns no rows and the balance update adds 0.
  • This pattern also leaves an auditable trail of applied deltas.

Idempotency with deterministic recomputation

Another approach is to avoid applying deltas and instead compute the current value deterministically from a set of facts. For example, you can store postings and compute balances as a sum. This can be done on demand or via a materialized projection. The idempotency then reduces to ensuring postings are unique by event ID.

Even if you maintain a cached balance for performance, keeping the postings table gives you a way to repair or verify the cache.
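As a sketch using the account_postings and account_balance tables from the previous section, the balance can be derived on demand, and a cached value can be checked and repaired against the postings:

```sql
-- Derive the balance deterministically from the unique postings.
SELECT COALESCE(SUM(delta_cents), 0) AS balance_cents
FROM account_postings
WHERE account_id = $1::uuid;

-- Repair a cached balance that has drifted from the postings.
UPDATE account_balance b
SET balance_cents = p.total
FROM (
  SELECT COALESCE(SUM(delta_cents), 0) AS total
  FROM account_postings
  WHERE account_id = $1::uuid
) p
WHERE b.account_id = $1::uuid
  AND b.balance_cents <> p.total;
```

Because postings are unique per (account_id, event_id), recomputing the sum is idempotent by construction: replaying any subset of events yields the same total.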

Deduplication windows, retention, and storage growth

A processed-events table grows over time. You need a retention strategy that matches your replay and retry behavior.

Choosing a retention period

  • If duplicates only occur due to short-term retries, you can retain processed IDs for a limited window (e.g., 7–30 days).
  • If you support long replays (months), you may need longer retention or a different strategy (e.g., per-entity ledgers that are part of the domain record).

Time-based cleanup

If you choose a window, you can delete old rows:

DELETE FROM processed_event_ids
WHERE consumer_name = 'balance_projection'
  AND processed_at < now() - interval '30 days';

Operational considerations:

  • Run cleanup in small batches to avoid long locks and bloat.
  • Make sure your replay horizon does not exceed your dedup retention, or duplicates during replay will be treated as new.
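One common way to bound each cleanup run, sketched here using the ctid system column, is to delete a capped batch of matching rows and repeat until no rows are affected:

```sql
-- Delete at most 10,000 expired rows per invocation;
-- rerun (e.g., from a cron job) until 0 rows are affected.
DELETE FROM processed_event_ids
WHERE ctid IN (
  SELECT ctid
  FROM processed_event_ids
  WHERE consumer_name = 'balance_projection'
    AND processed_at < now() - interval '30 days'
  LIMIT 10000
);
```

Short transactions keep lock durations small and give autovacuum a chance to reclaim space between batches.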

Partitioning for large volumes

For high-throughput consumers, consider partitioning processed_event_ids by time (e.g., monthly) so you can drop old partitions quickly. The primary key must include the partition key if you want global uniqueness per partitioning scheme; alternatively, keep uniqueness within partitions and ensure your application’s dedup window aligns with partitions.
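A declarative-partitioning sketch follows; table and partition names are illustrative. Note that PostgreSQL requires the partition key in the primary key, so uniqueness here is only per partition, and your dedup window must align with partition boundaries:

```sql
CREATE TABLE processed_event_ids_part (
  consumer_name text NOT NULL,
  event_id uuid NOT NULL,
  processed_at timestamptz NOT NULL DEFAULT now(),
  -- processed_at must appear in the PK; the same event could in principle
  -- be recorded twice in different partitions, so align the dedup window
  -- with partition boundaries.
  PRIMARY KEY (consumer_name, event_id, processed_at)
) PARTITION BY RANGE (processed_at);

CREATE TABLE processed_event_ids_2024_01
  PARTITION OF processed_event_ids_part
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Retention becomes a cheap metadata operation instead of a batched DELETE.
DROP TABLE processed_event_ids_2024_01;
```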

Idempotent updates with “last seen version” (ordering-aware dedup)

Sometimes you don’t just need to ignore duplicates; you need to ignore out-of-order or older events. A common technique is to store a per-entity “last applied position” (a monotonic sequence number, version, or logical timestamp) and only apply updates that move the state forward.

Schema example

CREATE TABLE customer_profile_projection (
  customer_id uuid PRIMARY KEY,
  email text,
  phone text,
  last_event_version bigint NOT NULL DEFAULT 0
);

Step-by-step conditional update

Assume each event for a customer carries event_version that increases by 1 per customer.

UPDATE customer_profile_projection
SET email = $2::text,
    phone = $3::text,
    last_event_version = $4::bigint
WHERE customer_id = $1::uuid
  AND last_event_version < $4::bigint;

This is idempotent and ordering-safe:

  • Duplicate event with same version: condition fails after first apply.
  • Older event arrives late: condition fails.
  • Newer event applies successfully.

Trade-off: you must have a trustworthy monotonic version per entity. If you only have a global stream position, you can store last_stream_position instead, but be careful when multiple streams/entities interleave.

Exactly-once effects when producing outgoing messages (outbox-style effects)

A common “exactly-once effect” requirement is: update PostgreSQL state and emit a message/event exactly once. The hard part is avoiding the case where the database commits but the message publish fails (or vice versa).

The database-friendly way to get exactly-once effects is to persist the intent to publish in the same transaction as the state change, and have a separate publisher process send it. The deduplication key for publishing is the outbox row’s primary key (or event ID), and the publisher must be idempotent too (safe to retry sending).

Minimal outbox table for idempotent publishing

CREATE TABLE outbox_message (
  outbox_id bigserial PRIMARY KEY,
  message_id uuid NOT NULL UNIQUE,
  topic text NOT NULL,
  payload jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now(),
  published_at timestamptz
);

In the same transaction where you apply an incoming event (deduped), you insert an outbox row. The message_id only needs to be stable once committed: the example below generates one inline, which is safe because the insert commits atomically with the dedup marker; deriving message_id deterministically from the triggering event ID works equally well.

BEGIN;

WITH inserted AS (
  INSERT INTO processed_event_ids (consumer_name, event_id)
  VALUES ('billing', $1::uuid)
  ON CONFLICT DO NOTHING
  RETURNING 1
),
effect AS (
  UPDATE invoice_status
  SET status = 'paid'
  FROM inserted
  WHERE invoice_status.invoice_id = $2::uuid
  RETURNING invoice_id
)
INSERT INTO outbox_message (message_id, topic, payload)
SELECT gen_random_uuid(), 'invoice.paid', jsonb_build_object('invoice_id', invoice_id)
FROM effect;

COMMIT;

Important detail: the outbox insert is gated by the same dedup decision. If the incoming event is a duplicate, no outbox message is created, preventing duplicate downstream notifications.

Idempotent publisher loop

The publisher reads unpublished rows and marks them published. The marking must be safe under concurrency (multiple publisher instances) and safe under retries.

WITH next AS (
  SELECT outbox_id
  FROM outbox_message
  WHERE published_at IS NULL
  ORDER BY outbox_id
  FOR UPDATE SKIP LOCKED
  LIMIT 100
)
UPDATE outbox_message o
SET published_at = now()
FROM next
WHERE o.outbox_id = next.outbox_id
RETURNING o.outbox_id, o.message_id, o.topic, o.payload;

Your application publishes the returned messages. If publishing fails after the DB update, you risk losing the message. To avoid that, many implementations reverse the order: lock rows, publish, then mark published. That introduces the opposite risk (duplicates) if the process crashes after publish but before marking. The practical resolution is to make the publish operation idempotent using message_id as a dedup key on the broker/consumer side, or to use a broker that supports producer idempotency keyed by message ID. From PostgreSQL’s perspective, the outbox ensures the message is not forgotten; idempotent publishing ensures duplicates are harmless.

Common pitfalls and how to avoid them

Pitfall: dedup outside the transaction

If you check “have I processed this event?” with a SELECT and only later apply the effect, you have a race condition: two workers can both see “not processed” and both apply the effect. Always use a unique constraint plus ON CONFLICT (or equivalent) inside the same transaction as the effect.
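For contrast, the racy check-then-act version looks like this (do not use it):

```sql
-- Anti-pattern: two workers can both see zero rows here...
SELECT 1
FROM processed_event_ids
WHERE consumer_name = 'balance_projection'
  AND event_id = $1::uuid;

-- ...and both reach this point, applying the delta twice.
UPDATE customer_balance
SET balance_cents = balance_cents + $2::bigint
WHERE customer_id = $3::uuid;
```

No isolation level short of serializing the two workers fixes this pattern; the unique constraint plus ON CONFLICT in one transaction does.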

Pitfall: non-deterministic updates on conflict

UPSERT can be idempotent or not depending on the update clause. For example, SET count = count + 1 on conflict is not idempotent for duplicates. Prefer either DO NOTHING or deterministic updates based on immutable event data, or gate the update behind a processed-event insert as shown earlier.
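Two hypothetical tables make the distinction concrete:

```sql
-- NOT idempotent: a redelivered event increments the counter a second time.
INSERT INTO page_views (page_id, views)
VALUES ($1::uuid, 1)
ON CONFLICT (page_id) DO UPDATE
SET views = page_views.views + 1;

-- Idempotent per event: the conflict action assigns state carried by the
-- event itself, so reapplying the same event converges to the same row.
INSERT INTO customer_email (customer_id, email)
VALUES ($1::uuid, $2::text)
ON CONFLICT (customer_id) DO UPDATE
SET email = EXCLUDED.email;
```

Note that the second form is last-write-wins across distinct events; if ordering matters, combine it with the version check shown earlier.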

Pitfall: using timestamps as dedup keys

Timestamps are not stable identifiers. Two distinct events can share a timestamp, and the same event can be reserialized with a different timestamp. Use a true event ID or a deterministic hash of immutable fields (only if you are certain the fields are identical across retries).

Pitfall: retention shorter than replay horizon

If you delete processed IDs after 30 days but later replay 90 days of events, the consumer will reapply effects for older events. Align retention with operational practices, or use per-entity ledgers that are part of the durable domain record.

Testing idempotency and deduplication in PostgreSQL

Step-by-step test plan

  • Pick a representative event type that updates a projection (e.g., balance delta).
  • Apply the event once; record the resulting state.
  • Apply the same event again (same event_id); verify state is unchanged.
  • Apply the same event concurrently from two sessions; verify state changes once.
  • Simulate crash between dedup and effect by forcing an error; verify transaction rollback leaves neither dedup marker nor partial effect.

Concurrent duplicate simulation

In two psql sessions, run the same transaction with the same event_id. With the CTE gating pattern, one session will apply the update; the other will do nothing. Your assertion is that the final state equals “applied once.”
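A sketch of the two-session experiment, with arbitrary literal UUIDs standing in for a real event and customer:

```sql
-- Run in session A, then in session B before A commits:
BEGIN;
WITH inserted AS (
  INSERT INTO processed_event_ids (consumer_name, event_id)
  VALUES ('balance_projection', '11111111-1111-1111-1111-111111111111')
  ON CONFLICT DO NOTHING
  RETURNING 1
)
UPDATE customer_balance b
SET balance_cents = b.balance_cents + 100
FROM inserted
WHERE b.customer_id = '22222222-2222-2222-2222-222222222222';
-- Session B blocks on the primary-key insert until A commits; then B's
-- insert takes the DO NOTHING path, returns no rows, and updates nothing.
COMMIT;
```

After both sessions commit, the balance reflects exactly one application of the delta.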

Choosing the right approach

Use these guidelines:

  • If the effect is “create a row,” prefer a unique constraint on event_id (or created_from_event_id) and ON CONFLICT DO NOTHING.
  • If the effect is a delta (counters/balances), gate the update behind a dedup insert (processed-events table) or record the delta in a unique ledger table and derive/update totals.
  • If events can arrive out of order, store and compare a per-entity version/position and only apply forward-moving updates.
  • If you must emit outgoing messages, persist an outbox record in the same transaction as the state change, and make publishing idempotent using stable message IDs.

Now answer the exercise about the content:

Which approach best ensures exactly-once effects when an event may be delivered multiple times to a PostgreSQL consumer that applies a balance delta?


Exactly-once effects at the database boundary require deduplication and the business update to be coupled in the same transaction. A unique constraint with ON CONFLICT (often gated via a CTE) ensures duplicates do not reapply the delta, even under concurrency.

Next chapter

Time, Sequence, and Ordering Guarantees in Event Streams
