Why idempotency matters in event-driven PostgreSQL systems
In event-driven systems, the same event can be delivered more than once. Retries happen because of network timeouts, consumer crashes, broker redeliveries, or manual replays. If your database writes are not designed to tolerate duplicates, you can accidentally double-apply business effects: double-charge a customer, double-decrement inventory, or create duplicate rows that later require cleanup.
Idempotency is the property that applying the same operation multiple times produces the same final state as applying it once. Deduplication is the mechanism you use to detect and ignore duplicates. “Exactly-once effects” means that even if the message delivery is at-least-once, the business effect in PostgreSQL is applied once (or equivalently, multiple applications are harmless).
In practice, you rarely get true end-to-end exactly-once delivery across all components. What you can reliably build is exactly-once effects at the database boundary: the consumer may see duplicates, but PostgreSQL state changes remain correct.
Core patterns for exactly-once effects
Pattern 1: Idempotent writes using a unique event identifier
The simplest approach is to ensure every event has a stable unique identifier (often a UUID). You then record which event IDs have been applied. If the same event arrives again, you detect it and skip applying the effect.
There are two common ways to do this in PostgreSQL:
- Insert the event ID into a “processed events” table with a unique constraint, and only apply the effect if the insert succeeds.
- Use the event ID as a unique key directly on the target table (when the effect is naturally represented as a row keyed by event ID).
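A minimal sketch of the first option, using Python's built-in sqlite3 module as a lightweight stand-in for PostgreSQL (table and column names are illustrative). The key property is that the dedup insert and the business effect commit or roll back together:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processed_event_ids (
        consumer_name TEXT NOT NULL,
        event_id TEXT NOT NULL,
        PRIMARY KEY (consumer_name, event_id)
    );
    CREATE TABLE customer_balance (
        customer_id TEXT PRIMARY KEY,
        balance_cents INTEGER NOT NULL
    );
    INSERT INTO customer_balance VALUES ('cust-1', 0);
""")

def apply_event(event_id: str, customer_id: str, delta_cents: int) -> None:
    """Apply a balance delta at most once per event_id."""
    with conn:  # dedup insert and effect commit (or roll back) together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_event_ids VALUES (?, ?)",
            ("balance_projection", event_id),
        )
        if cur.rowcount == 1:  # first time this event is seen
            conn.execute(
                "UPDATE customer_balance SET balance_cents = balance_cents + ?"
                " WHERE customer_id = ?",
                (delta_cents, customer_id),
            )

apply_event("evt-1", "cust-1", 500)
apply_event("evt-1", "cust-1", 500)  # duplicate delivery: ignored
balance = conn.execute(
    "SELECT balance_cents FROM customer_balance WHERE customer_id = 'cust-1'"
).fetchone()[0]
print(balance)  # 500, not 1000
```

In PostgreSQL the same shape is `INSERT ... ON CONFLICT DO NOTHING` with the effect gated on the insert's row count (or a `RETURNING` CTE, as shown later).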
Pattern 2: Deduplication via UPSERT (INSERT ... ON CONFLICT)
PostgreSQL’s UPSERT lets you express “insert if new, otherwise do nothing” or “insert if new, otherwise update deterministically.” This is a workhorse for idempotency because it moves the race-condition handling into the database’s concurrency control.
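A sketch of the "insert if new, otherwise update deterministically" variant, again with sqlite3 standing in for PostgreSQL (both support the `ON CONFLICT ... DO UPDATE` syntax; names are illustrative). Because the update sets columns to the event's own values rather than deriving them from current state, replaying the statement leaves the same final state:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_contact (customer_id TEXT PRIMARY KEY, email TEXT)"
)

# Deterministic upsert: re-running it with the same event data yields
# the same final row, so duplicate deliveries are harmless.
upsert = ("INSERT INTO customer_contact (customer_id, email) VALUES (?, ?)"
          " ON CONFLICT (customer_id) DO UPDATE SET email = excluded.email")

conn.execute(upsert, ("cust-1", "a@example.com"))
conn.execute(upsert, ("cust-1", "a@example.com"))  # duplicate delivery
conn.commit()

row = conn.execute("SELECT email FROM customer_contact").fetchone()
count = conn.execute("SELECT COUNT(*) FROM customer_contact").fetchone()[0]
print(row[0], count)  # a@example.com 1
```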
Pattern 3: Transactional coupling of dedup + effect
To achieve exactly-once effects, deduplication and the business update must be in the same database transaction. Otherwise, you can mark an event as processed but fail to apply the effect (or apply the effect but fail to mark processed), leading to inconsistencies on retries.
Designing a processed-events table
A dedicated table is a flexible, explicit way to deduplicate across many different projections or side effects.
Schema
```sql
CREATE TABLE processed_event_ids (
    consumer_name text NOT NULL,
    event_id uuid NOT NULL,
    processed_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (consumer_name, event_id)
);
```

Key points:
- consumer_name scopes deduplication to a specific consumer/projection. Two different consumers can legitimately process the same event independently.
- The composite primary key enforces uniqueness and provides an efficient lookup.
- processed_at is useful for debugging and for retention policies.
Step-by-step: apply an event exactly once
Assume an incoming event with event_id and payload that should update a read model table (for example, a customer balance projection). The steps are:
- Start a transaction.
- Attempt to insert (consumer_name, event_id) into processed_event_ids.
- If the insert succeeds, apply the business update.
- If the insert conflicts, do nothing (the event is a duplicate for this consumer).
- Commit.
In SQL, you can express this with a CTE so the effect runs only when the insert succeeds:
```sql
BEGIN;

WITH inserted AS (
    INSERT INTO processed_event_ids (consumer_name, event_id)
    VALUES ('balance_projection', $1::uuid)
    ON CONFLICT DO NOTHING
    RETURNING 1
)
UPDATE customer_balance b
SET balance_cents = b.balance_cents + $2::bigint
FROM inserted
WHERE b.customer_id = $3::uuid;

COMMIT;
```

How it works:
- If the event is new, inserted returns one row, and the UPDATE runs.
- If the event is a duplicate, inserted returns zero rows, and the UPDATE affects zero rows.
This is safe under concurrency: two workers racing to process the same event will contend on the primary key; only one will “win” and apply the update.
Handling missing target rows
Sometimes the projection row might not exist yet (e.g., first time you see a customer). You can combine insert-or-update logic with the same gating CTE:
```sql
BEGIN;

WITH inserted AS (
    INSERT INTO processed_event_ids (consumer_name, event_id)
    VALUES ('balance_projection', $1::uuid)
    ON CONFLICT DO NOTHING
    RETURNING 1
)
INSERT INTO customer_balance (customer_id, balance_cents)
SELECT $3::uuid, $2::bigint
FROM inserted
ON CONFLICT (customer_id) DO UPDATE
SET balance_cents = customer_balance.balance_cents + EXCLUDED.balance_cents;

COMMIT;
```

This applies the delta exactly once per event, even if the row is created on the fly.
Deduplication without a separate table
Using event_id as a natural unique key
If the effect of an event is “create a row representing this event,” you can store the event ID directly in that table with a unique constraint. Example: a table of issued invoices where each invoice is created by an event.
```sql
CREATE TABLE invoice (
    invoice_id uuid PRIMARY KEY,
    created_from_event_id uuid UNIQUE NOT NULL,
    customer_id uuid NOT NULL,
    amount_cents bigint NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);
```

Then your consumer can do:
```sql
INSERT INTO invoice (invoice_id, created_from_event_id, customer_id, amount_cents)
VALUES ($1::uuid, $2::uuid, $3::uuid, $4::bigint)
ON CONFLICT (created_from_event_id) DO NOTHING;
```

This is idempotent: duplicates are ignored. It is also self-documenting: you can trace which event created the row.
When this is not enough
This approach works when the event maps to a single row creation. It is less suitable when the event applies a delta to an existing row (balances, counters, inventory), because you need to ensure the delta is not applied twice. In those cases, a processed-events table or a per-entity dedup ledger is usually better.
Exactly-once effects for counters and balances
Counters and balances are where duplicates hurt the most because the operation is not naturally idempotent (adding 10 twice is not the same as adding 10 once). The solution is to make the effect conditional on “first time seen.”
Per-entity dedup ledger
Instead of a global processed-events table, you can keep a ledger keyed by the entity (e.g., account) and event ID. This can improve locality and make retention easier.
```sql
CREATE TABLE account_postings (
    account_id uuid NOT NULL,
    event_id uuid NOT NULL,
    delta_cents bigint NOT NULL,
    posted_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (account_id, event_id)
);
```

Then apply the posting and update the balance in one transaction:
```sql
BEGIN;

WITH posting AS (
    INSERT INTO account_postings (account_id, event_id, delta_cents)
    VALUES ($1::uuid, $2::uuid, $3::bigint)
    ON CONFLICT DO NOTHING
    RETURNING delta_cents
)
UPDATE account_balance
SET balance_cents = balance_cents
    + (SELECT COALESCE(SUM(delta_cents), 0) FROM posting)
WHERE account_id = $1::uuid;

COMMIT;
```

Notes:
- If the event is a duplicate, posting returns no rows and the balance update adds 0.
- This pattern also leaves an auditable trail of applied deltas.
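The ledger transaction above can be driven from application code. Here is a sqlite3 sketch (sqlite3 standing in for PostgreSQL, names taken from the example; sqlite lacks data-modifying CTEs, so the gate is expressed with the insert's row count instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_postings (
        account_id TEXT NOT NULL,
        event_id TEXT NOT NULL,
        delta_cents INTEGER NOT NULL,
        PRIMARY KEY (account_id, event_id)
    );
    CREATE TABLE account_balance (
        account_id TEXT PRIMARY KEY,
        balance_cents INTEGER NOT NULL
    );
    INSERT INTO account_balance VALUES ('acct-1', 0);
""")

def post(account_id: str, event_id: str, delta_cents: int) -> None:
    with conn:  # posting and balance update share one transaction
        cur = conn.execute(
            "INSERT OR IGNORE INTO account_postings VALUES (?, ?, ?)",
            (account_id, event_id, delta_cents),
        )
        if cur.rowcount == 1:  # posting recorded for the first time
            conn.execute(
                "UPDATE account_balance SET balance_cents = balance_cents + ?"
                " WHERE account_id = ?",
                (delta_cents, account_id),
            )

post("acct-1", "evt-1", 700)
post("acct-1", "evt-1", 700)   # duplicate: the ledger already has this posting
post("acct-1", "evt-2", -200)
balance = conn.execute(
    "SELECT balance_cents FROM account_balance WHERE account_id = 'acct-1'"
).fetchone()[0]
trail = conn.execute("SELECT COUNT(*) FROM account_postings").fetchone()[0]
print(balance, trail)  # 500 2 (two auditable postings, applied once each)
```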
Idempotency with deterministic recomputation
Another approach is to avoid applying deltas and instead compute the current value deterministically from a set of facts. For example, you can store postings and compute balances as a sum. This can be done on demand or via a materialized projection. The idempotency then reduces to ensuring postings are unique by event ID.
Even if you maintain a cached balance for performance, keeping the postings table gives you a way to repair or verify the cache.
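A sketch of that repair path, reusing the account_postings/account_balance shape from above (sqlite3 as a stand-in; the drifted cache value is contrived for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_postings (
        account_id TEXT NOT NULL,
        event_id TEXT NOT NULL,
        delta_cents INTEGER NOT NULL,
        PRIMARY KEY (account_id, event_id)
    );
    CREATE TABLE account_balance (
        account_id TEXT PRIMARY KEY,
        balance_cents INTEGER NOT NULL
    );
    INSERT INTO account_postings VALUES
        ('acct-1', 'evt-1', 700),
        ('acct-1', 'evt-2', -200);
    -- a cached balance that has drifted from the truth
    INSERT INTO account_balance VALUES ('acct-1', 999);
""")

def repair_balance(account_id: str) -> int:
    """Recompute the balance from the unique postings and fix the cache."""
    with conn:
        true_balance = conn.execute(
            "SELECT COALESCE(SUM(delta_cents), 0) FROM account_postings"
            " WHERE account_id = ?", (account_id,),
        ).fetchone()[0]
        conn.execute(
            "UPDATE account_balance SET balance_cents = ?"
            " WHERE account_id = ?", (true_balance, account_id),
        )
    return true_balance

repaired = repair_balance("acct-1")
stored = conn.execute(
    "SELECT balance_cents FROM account_balance WHERE account_id = 'acct-1'"
).fetchone()[0]
print(repaired, stored)  # 500 500
```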
Deduplication windows, retention, and storage growth
A processed-events table grows over time. You need a retention strategy that matches your replay and retry behavior.
Choosing a retention period
- If duplicates only occur due to short-term retries, you can retain processed IDs for a limited window (e.g., 7–30 days).
- If you support long replays (months), you may need longer retention or a different strategy (e.g., per-entity ledgers that are part of the domain record).
Time-based cleanup
If you choose a window, you can delete old rows:
```sql
DELETE FROM processed_event_ids
WHERE consumer_name = 'balance_projection'
  AND processed_at < now() - interval '30 days';
```

Operational considerations:
- Run cleanup in small batches to avoid long locks and bloat.
- Make sure your replay horizon does not exceed your dedup retention, or duplicates during replay will be treated as new.
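PostgreSQL's DELETE has no LIMIT clause, so batched cleanup is usually expressed as a loop around a key-selecting subquery. A sqlite3 sketch of the loop shape (in PostgreSQL you would select primary-key tuples or ctid instead of rowid, and size batches so each transaction stays short):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_event_ids ("
    " consumer_name TEXT, event_id TEXT, processed_at TEXT,"
    " PRIMARY KEY (consumer_name, event_id))"
)
# 250 rows older than the cutoff, 50 newer (dates as ISO strings)
rows = [("balance_projection", f"evt-{i}",
         "2024-01-01" if i < 250 else "2024-06-01") for i in range(300)]
conn.executemany("INSERT INTO processed_event_ids VALUES (?, ?, ?)", rows)
conn.commit()

BATCH = 100
deleted_total = 0
while True:
    with conn:  # each batch is its own short transaction
        cur = conn.execute(
            "DELETE FROM processed_event_ids WHERE rowid IN ("
            "  SELECT rowid FROM processed_event_ids"
            "  WHERE processed_at < '2024-03-01' LIMIT ?)",
            (BATCH,),
        )
    if cur.rowcount == 0:
        break
    deleted_total += cur.rowcount

remaining = conn.execute(
    "SELECT COUNT(*) FROM processed_event_ids"
).fetchone()[0]
print(deleted_total, remaining)  # 250 50
```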
Partitioning for large volumes
For high-throughput consumers, consider partitioning processed_event_ids by time (e.g., monthly) so you can drop old partitions quickly. Note that on a partitioned table, every primary key or unique constraint must include the partition key columns, so partitioning by processed_at means uniqueness is enforced per (consumer_name, event_id, processed_at) rather than globally; make sure your application’s dedup window aligns with the partitions you keep.
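A sketch of the time-partitioned variant using PostgreSQL declarative partitioning (available since PostgreSQL 11; names and date ranges are illustrative). Dropping a whole partition replaces row-by-row cleanup:

```sql
CREATE TABLE processed_event_ids (
    consumer_name text NOT NULL,
    event_id uuid NOT NULL,
    processed_at timestamptz NOT NULL DEFAULT now(),
    -- the partition key must be part of the primary key
    PRIMARY KEY (consumer_name, event_id, processed_at)
) PARTITION BY RANGE (processed_at);

-- One partition per month.
CREATE TABLE processed_event_ids_2024_01 PARTITION OF processed_event_ids
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Retention: drop whole months instead of deleting rows.
DROP TABLE processed_event_ids_2024_01;
```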
Idempotent updates with “last seen version” (ordering-aware dedup)
Sometimes you don’t just need to ignore duplicates; you need to ignore out-of-order or older events. A common technique is to store a per-entity “last applied position” (a monotonic sequence number, version, or logical timestamp) and only apply updates that move the state forward.
Schema example
```sql
CREATE TABLE customer_profile_projection (
    customer_id uuid PRIMARY KEY,
    email text,
    phone text,
    last_event_version bigint NOT NULL DEFAULT 0
);
```

Step-by-step conditional update
Assume each event for a customer carries event_version that increases by 1 per customer.
```sql
UPDATE customer_profile_projection
SET email = $2::text,
    phone = $3::text,
    last_event_version = $4::bigint
WHERE customer_id = $1::uuid
  AND last_event_version < $4::bigint;
```

This is idempotent and ordering-safe:
- Duplicate event with the same version: the condition fails after the first application.
- Older event arriving late: the condition fails.
- Newer event: the condition holds and the update applies.
Trade-off: you must have a trustworthy monotonic version per entity. If you only have a global stream position, you can store last_stream_position instead, but be careful when multiple streams/entities interleave.
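The conditional update above is easy to exercise end to end. A sqlite3 sketch (standing in for PostgreSQL; names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_profile_projection ("
    " customer_id TEXT PRIMARY KEY, email TEXT, phone TEXT,"
    " last_event_version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "INSERT INTO customer_profile_projection VALUES ('cust-1', NULL, NULL, 0)"
)

def apply_profile_event(customer_id, email, phone, version):
    # Only apply events that move the projection forward.
    with conn:
        conn.execute(
            "UPDATE customer_profile_projection"
            " SET email = ?, phone = ?, last_event_version = ?"
            " WHERE customer_id = ? AND last_event_version < ?",
            (email, phone, version, customer_id, version),
        )

apply_profile_event("cust-1", "new@example.com", "555-0101", 2)  # applies
apply_profile_event("cust-1", "old@example.com", "555-0100", 1)  # stale: ignored
apply_profile_event("cust-1", "new@example.com", "555-0101", 2)  # duplicate: ignored

email, version = conn.execute(
    "SELECT email, last_event_version FROM customer_profile_projection"
).fetchone()
print(email, version)  # new@example.com 2
```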
Exactly-once effects when producing outgoing messages (outbox-style effects)
A common “exactly-once effect” requirement is: update PostgreSQL state and emit a message/event exactly once. The hard part is avoiding the case where the database commits but the message publish fails (or vice versa).
The database-friendly way to get exactly-once effects is to persist the intent to publish in the same transaction as the state change, and have a separate publisher process send it. The deduplication key for publishing is the outbox row’s primary key (or event ID), and the publisher must be idempotent too (safe to retry sending).
Minimal outbox table for idempotent publishing
```sql
CREATE TABLE outbox_message (
    outbox_id bigserial PRIMARY KEY,
    message_id uuid NOT NULL UNIQUE,
    topic text NOT NULL,
    payload jsonb NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    published_at timestamptz
);
```

In the same transaction where you apply an incoming event (deduped), you insert an outbox row with a stable message_id (often derived from the triggering event ID).
```sql
BEGIN;

WITH inserted AS (
    INSERT INTO processed_event_ids (consumer_name, event_id)
    VALUES ('billing', $1::uuid)
    ON CONFLICT DO NOTHING
    RETURNING 1
),
effect AS (
    UPDATE invoice_status
    SET status = 'paid'
    FROM inserted
    WHERE invoice_status.invoice_id = $2::uuid
    RETURNING invoice_id
)
INSERT INTO outbox_message (message_id, topic, payload)
SELECT gen_random_uuid(), 'invoice.paid',
       jsonb_build_object('invoice_id', invoice_id)
FROM effect;

COMMIT;
```

Important detail: the outbox insert is gated by the same dedup decision. If the incoming event is a duplicate, no outbox message is created, preventing duplicate downstream notifications.
Idempotent publisher loop
The publisher reads unpublished rows and marks them published. The marking must be safe under concurrency (multiple publisher instances) and safe under retries.
```sql
WITH next AS (
    SELECT outbox_id
    FROM outbox_message
    WHERE published_at IS NULL
    ORDER BY outbox_id
    FOR UPDATE SKIP LOCKED
    LIMIT 100
)
UPDATE outbox_message o
SET published_at = now()
FROM next
WHERE o.outbox_id = next.outbox_id
RETURNING o.outbox_id, o.message_id, o.topic, o.payload;
```

Your application publishes the returned messages. If publishing fails after the DB update, you risk losing the message. To avoid that, many implementations reverse the order: lock rows, publish, then mark published. That introduces the opposite risk (duplicates) if the process crashes after publish but before marking.

The practical resolution is to make the publish operation idempotent using message_id as a dedup key on the broker/consumer side, or to use a broker that supports producer idempotency keyed by message ID. From PostgreSQL’s perspective, the outbox ensures the message is not forgotten; idempotent publishing ensures duplicates are harmless.
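The publish-then-mark ordering with an idempotent broker can be sketched end to end. Here sqlite3 stands in for PostgreSQL and a dict keyed by message_id stands in for a broker that dedups on a producer-supplied ID (so a rerun after a crash republishes harmlessly); SKIP LOCKED has no sqlite equivalent and is omitted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox_message ("
    " outbox_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " message_id TEXT NOT NULL UNIQUE,"
    " payload TEXT NOT NULL,"
    " published_at TEXT)"
)
conn.executemany(
    "INSERT INTO outbox_message (message_id, payload) VALUES (?, ?)",
    [("msg-1", '{"invoice_id": "a"}'), ("msg-2", '{"invoice_id": "b"}')],
)
conn.commit()

broker = {}  # stand-in for a broker that dedups by message_id

def publish_pending(batch_size: int = 100) -> None:
    rows = conn.execute(
        "SELECT outbox_id, message_id, payload FROM outbox_message"
        " WHERE published_at IS NULL ORDER BY outbox_id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for outbox_id, message_id, payload in rows:
        # Publish first; the stable message_id makes a retry harmless.
        broker[message_id] = payload
        with conn:  # then mark as published
            conn.execute(
                "UPDATE outbox_message SET published_at = datetime('now')"
                " WHERE outbox_id = ?", (outbox_id,),
            )

publish_pending()
publish_pending()  # a rerun (e.g. after a crash) finds nothing new to send

pending = conn.execute(
    "SELECT COUNT(*) FROM outbox_message WHERE published_at IS NULL"
).fetchone()[0]
print(len(broker), pending)  # 2 0
```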
Common pitfalls and how to avoid them
Pitfall: dedup outside the transaction
If you check “have I processed this event?” with a SELECT, then later write effects, you have a race condition: two workers can both see “not processed” and both apply the effect. Always use a unique constraint plus ON CONFLICT (or equivalent) inside the same transaction as the effect.
Pitfall: non-deterministic updates on conflict
UPSERT can be idempotent or not depending on the update clause. For example, SET count = count + 1 on conflict is not idempotent for duplicates. Prefer either DO NOTHING or deterministic updates based on immutable event data, or gate the update behind a processed-event insert as shown earlier.
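To see the pitfall concretely, here is a sqlite3 sketch (PostgreSQL behaves the same way; names are illustrative): the same upsert statement applied twice double-counts, because the update reads current state instead of event data. A deterministic alternative such as `SET views = excluded.views` would be idempotent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT PRIMARY KEY, views INTEGER)")

# Non-idempotent upsert: the conflict branch derives the new value
# from the current row, so every redelivery increments again.
stmt = ("INSERT INTO page_views VALUES (?, 1)"
        " ON CONFLICT (page) DO UPDATE SET views = views + 1")
conn.execute(stmt, ("/home",))
conn.execute(stmt, ("/home",))  # duplicate delivery double-counts
conn.commit()

views = conn.execute("SELECT views FROM page_views").fetchone()[0]
print(views)  # 2: applied twice, even though it was one logical event
```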
Pitfall: using timestamps as dedup keys
Timestamps are not stable identifiers. Two distinct events can share a timestamp, and the same event can be reserialized with a different timestamp. Use a true event ID or a deterministic hash of immutable fields (only if you are certain the fields are identical across retries).
Pitfall: retention shorter than replay horizon
If you delete processed IDs after 30 days but later replay 90 days of events, the consumer will reapply effects for older events. Align retention with operational practices, or use per-entity ledgers that are part of the durable domain record.
Testing idempotency and deduplication in PostgreSQL
Step-by-step test plan
- Pick a representative event type that updates a projection (e.g., balance delta).
- Apply the event once; record the resulting state.
- Apply the same event again (same event_id); verify state is unchanged.
- Apply the same event concurrently from two sessions; verify state changes once.
- Simulate crash between dedup and effect by forcing an error; verify transaction rollback leaves neither dedup marker nor partial effect.
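Step 5 can be exercised without killing a process: raise an error inside the transaction and verify that rollback removed the dedup marker along with any partial effect. A sqlite3 sketch (the connection context manager rolls back on exceptions; names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_event_ids ("
    " consumer_name TEXT, event_id TEXT,"
    " PRIMARY KEY (consumer_name, event_id))"
)

try:
    with conn:  # rolls back automatically if the block raises
        conn.execute(
            "INSERT INTO processed_event_ids VALUES ('balance_projection', 'evt-9')"
        )
        raise RuntimeError("simulated crash between dedup and effect")
except RuntimeError:
    pass

markers = conn.execute(
    "SELECT COUNT(*) FROM processed_event_ids"
).fetchone()[0]
print(markers)  # 0: the marker was rolled back, so a retry will reapply cleanly
```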
Concurrent duplicate simulation
In two psql sessions, run the same transaction with the same event_id. With the CTE gating pattern, one session will apply the update; the other will do nothing. Your assertion is that the final state equals “applied once.”
Choosing the right approach
Use these guidelines:
- If the effect is “create a row,” prefer a unique constraint on event_id (or created_from_event_id) and ON CONFLICT DO NOTHING.
- If the effect is a delta (counters/balances), gate the update behind a dedup insert (processed-events table) or record the delta in a unique ledger table and derive/update totals.
- If events can arrive out of order, store and compare a per-entity version/position and only apply forward-moving updates.
- If you must emit outgoing messages, persist an outbox record in the same transaction as the state change, and make publishing idempotent using stable message IDs.