Modeling Append-Only Event Tables in PostgreSQL

Chapter 2

Estimated reading time: 14 minutes


What “append-only” means in PostgreSQL event tables

An append-only event table is a relational table designed so that rows are inserted once and never updated or deleted. Each row represents an immutable fact that happened at a point in time: an event. In PostgreSQL, “append-only” is primarily a modeling and operational discipline rather than a built-in table type: you design schema constraints, permissions, and ingestion patterns so that the only permitted data change is INSERT.

This style is useful when you want a durable, auditable record of change over time. Instead of storing only the latest state (for example, the current email address), you store a sequence of events (for example, email changed from A to B at time T). Downstream views, projections, and read models can then be derived from the event stream.
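As a sketch of such a derived read model (assuming the app_events envelope defined later in this chapter, and a hypothetical customer.email_changed event whose payload carries an email field), the current email per customer can be projected directly from the event stream:

```sql
-- Hypothetical projection: latest email per stream, derived from the
-- event history rather than stored as mutable state.
SELECT DISTINCT ON (stream_id)
       stream_id,
       payload->>'email' AS current_email
FROM app_events
WHERE event_type = 'customer.email_changed'
ORDER BY stream_id, stream_seq DESC;
```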

In practice, modeling append-only event tables in PostgreSQL involves decisions about: event identity and ordering, time columns, payload representation, constraints that enforce immutability, indexing for common access patterns, partitioning for growth, and operational safeguards for ingestion and retention.

Core columns: the minimal event envelope

Most event tables benefit from a small set of “envelope” columns that are consistent across event types. A common baseline includes:

  • event_id: a unique identifier for the event (often a UUID).
  • stream_id: the identifier of the entity or stream the event belongs to (for example, customer_id, order_id).
  • event_type: a discriminator such as customer.email_changed.
  • occurred_at: when the event happened in the domain (business time).
  • recorded_at: when the event was recorded in the database (ingestion time).
  • payload: event data (often JSONB) containing event-specific fields.
  • metadata: optional JSONB for correlation IDs, causation IDs, actor, tenant, source, schema version, etc.

Two time columns are worth emphasizing. occurred_at captures the time the event is said to have happened (which may arrive late or be corrected by later events). recorded_at captures the time the database accepted the insert and is useful for operational ordering, backfills, and debugging ingestion pipelines.
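One concrete payoff of keeping both columns is detecting late-arriving events. A sketch, assuming the app_events table defined below:

```sql
-- Events recorded more than an hour after they occurred in the domain:
-- candidates for late arrival or clock-skew investigation.
SELECT event_id, event_type, occurred_at, recorded_at,
       recorded_at - occurred_at AS ingestion_lag
FROM app_events
WHERE recorded_at - occurred_at > interval '1 hour'
ORDER BY ingestion_lag DESC;
```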


Step-by-step: create a generic append-only event table

The following example models a single table that can store multiple event types for a given domain. It uses UUIDs, JSONB payloads, and constraints that support immutability and ordering.

1) Enable useful extensions

CREATE EXTENSION IF NOT EXISTS pgcrypto;

pgcrypto provides gen_random_uuid() for event IDs. On PostgreSQL 13 and later, gen_random_uuid() is built in, so the extension is only needed on older versions (or for other pgcrypto functions).

2) Create the table

CREATE TABLE app_events (
  event_id       uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  stream_id      uuid NOT NULL,
  event_type     text NOT NULL,
  occurred_at    timestamptz NOT NULL,
  recorded_at    timestamptz NOT NULL DEFAULT now(),
  payload        jsonb NOT NULL,
  metadata       jsonb NOT NULL DEFAULT '{}'::jsonb,
  -- Optional: a per-stream sequence number for deterministic ordering
  stream_seq     bigint NOT NULL,
  -- Optional: schema versioning for payload evolution
  schema_version integer NOT NULL DEFAULT 1,
  -- Basic sanity checks
  CONSTRAINT payload_is_object CHECK (jsonb_typeof(payload) = 'object'),
  CONSTRAINT metadata_is_object CHECK (jsonb_typeof(metadata) = 'object')
);

This design includes stream_seq, a monotonically increasing sequence number per stream_id. It is extremely useful for deterministic ordering within a stream, especially when multiple events share the same timestamp or when clocks are not perfectly synchronized.

3) Enforce per-stream ordering uniqueness

CREATE UNIQUE INDEX app_events_stream_seq_ux
  ON app_events (stream_id, stream_seq);

This prevents two events from claiming the same position in the same stream.

4) Add indexes for common queries

Typical access patterns include: read all events for a stream in order, read events by type in a time window, and read events since a checkpoint.

-- Stream replay (most common) is already served by the unique index
-- app_events_stream_seq_ux on (stream_id, stream_seq) from step 3,
-- so no additional index is needed for it.

-- Time-based scans (operational, backfills, analytics)
CREATE INDEX app_events_recorded_at_ix
  ON app_events (recorded_at);
CREATE INDEX app_events_occurred_at_ix
  ON app_events (occurred_at);

-- Filter by event type
CREATE INDEX app_events_type_recorded_ix
  ON app_events (event_type, recorded_at);

If you frequently filter on fields inside payload (for example, payload->>'email'), consider adding expression indexes for those specific fields rather than relying on a broad GIN index everywhere.
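A sketch of that trade-off, using a hypothetical email field in the payload:

```sql
-- Targeted expression index for one frequently queried payload field.
CREATE INDEX app_events_payload_email_ix
  ON app_events ((payload->>'email'));

-- A broad GIN index supports ad-hoc containment queries (payload @> ...)
-- but is larger and slower to maintain on every insert; add it only if
-- you genuinely need open-ended JSONB search.
-- CREATE INDEX app_events_payload_gin_ix
--   ON app_events USING gin (payload);
```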

5) Enforce append-only behavior (no UPDATE/DELETE)

Append-only is best enforced with a combination of privileges and triggers. Privileges prevent accidental writes by most roles; triggers provide a safety net.

-- Example roles (adapt to your environment)
REVOKE UPDATE, DELETE ON app_events FROM PUBLIC;

-- Only allow inserts and reads for the writer role
GRANT INSERT, SELECT ON app_events TO app_writer;
GRANT SELECT ON app_events TO app_reader;

Then add a trigger to block updates and deletes even if a privileged role tries to run them.

CREATE OR REPLACE FUNCTION prevent_event_mutation()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
  RAISE EXCEPTION 'append-only table: % on % is not allowed', TG_OP, TG_TABLE_NAME;
END;
$$;

CREATE TRIGGER app_events_no_update
  BEFORE UPDATE ON app_events
  FOR EACH ROW EXECUTE FUNCTION prevent_event_mutation();

CREATE TRIGGER app_events_no_delete
  BEFORE DELETE ON app_events
  FOR EACH ROW EXECUTE FUNCTION prevent_event_mutation();

With this in place, the only way to “change history” is to insert compensating events (for example, customer.email_corrected) or to run privileged maintenance operations in a controlled manner.

Choosing an event identity and ordering strategy

Event tables need a clear notion of identity (to deduplicate) and ordering (to replay deterministically). PostgreSQL gives you multiple options, each with trade-offs.

Event identity: UUID, ULID, or natural keys

  • UUID is common and easy. It is random by default, which can reduce index locality (more page splits) at very high insert rates.
  • Time-sortable IDs (such as ULID-like schemes) improve locality. PostgreSQL doesn’t ship ULID natively, but you can store them as uuid or text if your application generates them.
  • Natural keys (like (source_system, source_event_id)) can be used for deduplication when ingesting from external systems.

A practical approach is to keep event_id as a UUID primary key and add a separate unique constraint for idempotency when needed (for example, (metadata->>'idempotency_key') or (source, source_event_id)).

Ordering: timestamps vs per-stream sequence

Relying only on occurred_at for ordering can be ambiguous when multiple events share the same timestamp or when events arrive out of order. A per-stream sequence number (stream_seq) provides a stable ordering for replay and simplifies “read from position N” queries.

To allocate stream_seq, you typically do it transactionally at write time. One pattern is to maintain a per-stream counter table.

Step-by-step: allocate per-stream sequence numbers safely

Create a table that tracks the next sequence per stream:

CREATE TABLE event_streams (
  stream_id uuid PRIMARY KEY,
  next_seq  bigint NOT NULL
);

Then, in a single transaction, upsert the stream row and use it to assign the next sequence number. Here is a pattern that avoids race conditions by locking the stream row:

BEGIN;

-- Ensure the stream row exists
INSERT INTO event_streams (stream_id, next_seq)
VALUES ($1, 1)
ON CONFLICT (stream_id) DO NOTHING;

-- Lock the stream row and fetch the next sequence number
SELECT next_seq
FROM event_streams
WHERE stream_id = $1
FOR UPDATE;

-- Use the fetched next_seq as stream_seq ($2) in the event insert
INSERT INTO app_events (stream_id, stream_seq, event_type, occurred_at, payload, metadata)
VALUES ($1, $2, $3, $4, $5::jsonb, $6::jsonb);

-- Increment the counter for the next writer
UPDATE event_streams
SET next_seq = next_seq + 1
WHERE stream_id = $1;

COMMIT;

This approach guarantees that each stream gets a contiguous sequence with no duplicates. If you can tolerate gaps (common in distributed systems), you can use other allocation methods, but contiguous per-stream sequences are convenient for correctness checks and replay logic.
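If round trips matter, the lock-fetch-increment steps can be collapsed into a single statement. A sketch, using the same event_streams table and assuming the stream row already exists:

```sql
-- Atomically claim the next sequence number and insert the event.
-- The UPDATE takes the row lock, so concurrent writers serialize
-- per stream exactly as in the multi-statement version.
WITH claimed AS (
  UPDATE event_streams
  SET next_seq = next_seq + 1
  WHERE stream_id = $1
  RETURNING next_seq - 1 AS stream_seq
)
INSERT INTO app_events (stream_id, stream_seq, event_type, occurred_at, payload, metadata)
SELECT $1, claimed.stream_seq, $2, $3, $4::jsonb, $5::jsonb
FROM claimed;
```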

Modeling the payload: JSONB vs typed columns

Event payloads can be stored in different ways:

  • JSONB payload: flexible, easy to evolve, good for heterogeneous event types in one table. You trade some type safety and may need additional indexes for performance.
  • Typed columns: strong constraints and fast queries for specific fields, but schema changes are more frequent and multi-type tables become awkward.
  • Hybrid: keep a JSONB payload plus a few extracted columns that are frequently queried (for example, tenant_id, actor_id, order_total).

A common pattern is to keep the canonical event in JSONB and add generated or maintained columns for high-value query dimensions.
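On PostgreSQL 12 and later, the "maintained columns" half of that hybrid can be a stored generated column, so the database keeps it in sync automatically. A sketch with a hypothetical order_total payload field:

```sql
-- Hypothetical: expose a frequently queried payload field as a typed,
-- indexable column that PostgreSQL derives at insert time.
ALTER TABLE app_events
  ADD COLUMN order_total numeric
  GENERATED ALWAYS AS ((payload->>'order_total')::numeric) STORED;
```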

Example: extracted columns for filtering

ALTER TABLE app_events
  ADD COLUMN tenant_id uuid;

-- Backfill tenant_id from metadata if present.
-- Note: this one-off UPDATE will be blocked by the
-- prevent_event_mutation trigger from step 5; run it as a controlled
-- maintenance operation (for example, with the trigger temporarily
-- disabled) rather than as part of normal ingestion.
UPDATE app_events
SET tenant_id = NULLIF(metadata->>'tenant_id', '')::uuid
WHERE tenant_id IS NULL;

CREATE INDEX app_events_tenant_recorded_ix
  ON app_events (tenant_id, recorded_at);

If you always include tenant_id in metadata, you can set it at insert time in the application, or use a trigger to populate it. Triggers add overhead; application-side population is usually simpler and faster.
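If you do choose the trigger route, a minimal sketch might look like this (assuming tenant_id always travels in metadata as a UUID string):

```sql
-- Populate the extracted tenant_id column from metadata on insert.
CREATE OR REPLACE FUNCTION set_event_tenant_id()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
  NEW.tenant_id := NULLIF(NEW.metadata->>'tenant_id', '')::uuid;
  RETURN NEW;
END;
$$;

CREATE TRIGGER app_events_set_tenant
  BEFORE INSERT ON app_events
  FOR EACH ROW EXECUTE FUNCTION set_event_tenant_id();
```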

Immutability beyond UPDATE/DELETE: handling corrections and redactions

Append-only does not mean “never fix mistakes”; it means you fix them by appending new facts. Two common cases:

  • Corrections: an earlier event was wrong (wrong amount, wrong email). You append a correction event that references the earlier event via metadata (for example, metadata.correction_of_event_id).
  • Redactions: sensitive data should not be stored long-term. Prefer not to put secrets in events in the first place. If you must handle regulatory deletion, you can separate sensitive fields into a different store or encrypt them with key rotation. If physical deletion is required, it breaks strict append-only; treat it as a controlled maintenance process with audit logging.

For corrections, it helps to standardize metadata fields:

-- Example metadata fields inside JSONB
-- {
--   "correlation_id": "...",
--   "causation_id": "...",
--   "actor": "user:123",
--   "correction_of": "",
--   "idempotency_key": "..."
-- }

Idempotency and deduplication at the database layer

Event ingestion often needs to be idempotent: if the same command is retried, you should not insert duplicate events. PostgreSQL can enforce this with unique constraints.

Step-by-step: idempotency key constraint

Assume the writer includes an idempotency key in metadata. You can enforce uniqueness per stream (or per tenant) using an expression index.

CREATE UNIQUE INDEX app_events_idempotency_ux
  ON app_events (stream_id, (metadata->>'idempotency_key'))
  WHERE (metadata ? 'idempotency_key');

Then inserts can use ON CONFLICT to avoid duplicates:

-- Note: ON CONFLICT ON CONSTRAINT accepts only constraint names, and
-- app_events_idempotency_ux is a (partial) index, not a constraint.
-- Use index inference instead: list the indexed expressions and repeat
-- the index's WHERE predicate.
INSERT INTO app_events (stream_id, stream_seq, event_type, occurred_at, payload, metadata)
VALUES ($1, $2, $3, $4, $5::jsonb, $6::jsonb)
ON CONFLICT (stream_id, (metadata->>'idempotency_key'))
  WHERE metadata ? 'idempotency_key'
DO NOTHING;

Be careful: if you also enforce (stream_id, stream_seq) uniqueness, you need a consistent strategy for allocating stream_seq on retries (for example, allocate sequence only after checking idempotency, or reuse the same stream_seq for the same idempotency key).
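One way to keep retries deterministic is to check for an existing event under the idempotency key before allocating a new sequence number. A sketch:

```sql
-- On retry: if this idempotency key was already recorded for the
-- stream, return the existing event instead of allocating a fresh
-- stream_seq.
SELECT event_id, stream_seq
FROM app_events
WHERE stream_id = $1
  AND metadata->>'idempotency_key' = $2;

-- Only if no row comes back does the writer run the sequence-allocation
-- transaction and the insert shown earlier in this chapter.
```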

Partitioning append-only event tables for scale

Event tables grow continuously. PostgreSQL partitioning helps manage large volumes by splitting data into smaller physical chunks, improving maintenance (vacuum, index rebuilds), and enabling partition pruning for time-based queries.

Two common partitioning strategies:

  • Partition by recorded_at (time): good for operational queries, retention policies, and bulk archival.
  • Partition by tenant_id (hash/list): good for multi-tenant isolation and parallelism, but retention becomes more complex.

Time-based partitioning is often the simplest starting point.

Step-by-step: monthly partitions by recorded_at

CREATE TABLE app_events_p (
  event_id       uuid NOT NULL,
  stream_id      uuid NOT NULL,
  event_type     text NOT NULL,
  occurred_at    timestamptz NOT NULL,
  recorded_at    timestamptz NOT NULL,
  payload        jsonb NOT NULL,
  metadata       jsonb NOT NULL,
  stream_seq     bigint NOT NULL,
  schema_version integer NOT NULL,
  PRIMARY KEY (event_id, recorded_at)
) PARTITION BY RANGE (recorded_at);

Note: on a partitioned table, primary key and unique constraints must include all of the partition key columns. Including recorded_at in the primary key is a common workaround, at the cost of no longer enforcing global uniqueness of event_id on its own.

Create partitions:

CREATE TABLE app_events_2026_01 PARTITION OF app_events_p
  FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE app_events_2026_02 PARTITION OF app_events_p
  FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

Add indexes on each partition (or, on PostgreSQL 11 and later, create them once on the partitioned parent so they propagate to every partition):

CREATE INDEX ON app_events_2026_01 (stream_id, stream_seq);
CREATE INDEX ON app_events_2026_01 (event_type, recorded_at);

When partitioning by time, you typically insert using recorded_at defaulting to now(). If you backfill historical events with old recorded_at values, ensure the corresponding partitions exist.
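A backfill or maintenance job can create missing monthly partitions up front. A hedged sketch using a DO block, assuming partitions follow the app_events_YYYY_MM naming scheme:

```sql
-- Create the partition for a given month if it does not exist yet.
DO $$
DECLARE
  start_month date := date '2025-07-01';  -- hypothetical backfill month
BEGIN
  EXECUTE format(
    'CREATE TABLE IF NOT EXISTS app_events_%s PARTITION OF app_events_p
       FOR VALUES FROM (%L) TO (%L)',
    to_char(start_month, 'YYYY_MM'),
    start_month,
    start_month + interval '1 month'
  );
END;
$$;
```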

Constraints and validation for event quality

Because events are immutable, validating them at insert time is valuable. PostgreSQL constraints can catch many issues early:

  • NOT NULL on required envelope fields.
  • CHECK constraints for JSON shape basics (object vs array).
  • CHECK constraints for allowed event types (if you have a controlled vocabulary).
  • Foreign keys are usually avoided for event streams (to prevent write coupling), but can be used carefully for reference data like tenants.

Example: constrain event_type to a known set

CREATE TABLE event_types (
  event_type text PRIMARY KEY
);

INSERT INTO event_types (event_type)
VALUES ('customer.created'), ('customer.email_changed'), ('order.placed');

ALTER TABLE app_events
  ADD CONSTRAINT app_events_event_type_fk
  FOREIGN KEY (event_type) REFERENCES event_types (event_type);

This adds safety but also adds coupling: you must deploy new event types to event_types before writing them. Many teams prefer a looser approach: validate event types in application code and keep the database generic.

Reading patterns: replay, snapshots, and checkpoints

Append-only event tables are written once and read many times. Efficient read patterns depend on stable ordering and good indexes.

Replay a single stream

SELECT event_id, event_type, occurred_at, payload, metadata, stream_seq
FROM app_events
WHERE stream_id = $1
ORDER BY stream_seq;

This query should use the (stream_id, stream_seq) index and remain fast even as the table grows, because it touches a narrow slice per stream.

Read from a checkpoint (per stream)

SELECT event_id, event_type, occurred_at, payload, metadata, stream_seq
FROM app_events
WHERE stream_id = $1
  AND stream_seq > $2
ORDER BY stream_seq;

This is the basis for incremental projections: store the last processed stream_seq and resume from there.

Read by recorded_at for catch-up processing

SELECT event_id, stream_id, event_type, recorded_at, payload
FROM app_events
WHERE recorded_at >= $1 AND recorded_at < $2
ORDER BY recorded_at, event_id;

Ordering by recorded_at is useful for operational pipelines. Adding event_id as a tie-breaker makes pagination stable.
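The same (recorded_at, event_id) ordering supports keyset pagination through large time windows, which stays fast regardless of how deep into the window you are. A sketch:

```sql
-- Fetch the next page after the last (recorded_at, event_id) pair seen.
-- Row-value comparison keeps the ordering consistent with the
-- ORDER BY clause and avoids OFFSET's linear cost.
SELECT event_id, stream_id, event_type, recorded_at, payload
FROM app_events
WHERE (recorded_at, event_id) > ($1, $2)
ORDER BY recorded_at, event_id
LIMIT 1000;
```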

Operational safeguards: vacuum, fillfactor, and bloat considerations

Append-only tables have a favorable bloat profile because they avoid updates, which are a major source of dead tuples. However, you still need to consider:

  • Index growth: indexes grow with every insert. Keep only the indexes you need.
  • HOT updates are irrelevant: since you don’t update, you don’t benefit from HOT; focus on insert performance.
  • Autovacuum: still needed for visibility map maintenance and to vacuum indexes in some cases, but workload is generally simpler than update-heavy tables.
  • Fillfactor: for append-only tables, default fillfactor is usually fine; for indexes, consider settings only if you observe heavy page splits.

If you partition by time, you can reduce maintenance by setting older partitions to read-only and, if needed, moving them to cheaper storage or dumping them to archival systems.
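"Read-only" for an old partition is typically enforced with privileges rather than a storage flag. A sketch for a closed month, reusing the app_writer role from earlier:

```sql
-- Freeze a closed partition: no further inserts, reads still allowed.
REVOKE INSERT ON app_events_2026_01 FROM app_writer;

-- Optionally run a final freeze pass so autovacuum has little left
-- to do on this partition.
VACUUM (FREEZE, ANALYZE) app_events_2026_01;
```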

Design variations: one table for all events vs per-aggregate tables

There are two common structural approaches:

  • Single event table: one table stores all event types across the domain. Pros: simple ingestion, uniform tooling, easy global queries. Cons: very large table, heterogeneous payloads, more indexing compromises.
  • Multiple event tables: separate tables per bounded area or per stream category (for example, customer_events, order_events). Pros: clearer ownership, tailored indexes, smaller tables. Cons: more schema objects, cross-domain queries require unions.

A practical compromise is to keep one table per major domain area and standardize the envelope columns across them so that ingestion and querying patterns remain consistent.

Putting it together: a practical insert workflow

To make append-only event tables reliable, combine: transactional sequence allocation, idempotency, and strict immutability. A typical write path looks like:

  • Validate the command in application code.
  • Compute the event payload and metadata (including correlation/idempotency keys).
  • Start a transaction.
  • Check idempotency (or rely on a unique constraint and handle conflict).
  • Allocate stream_seq for the stream.
  • Insert the event row.
  • Commit.
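The steps above can be sketched as one transaction, combining the event_streams allocation pattern and the idempotency lookup shown earlier ($1 = stream_id, $2 = idempotency key, $3 = allocated stream_seq, remaining parameters as before):

```sql
BEGIN;

-- 1. Idempotency: if this key was already recorded, the application
--    commits here and returns the existing event instead of writing.
SELECT event_id
FROM app_events
WHERE stream_id = $1
  AND metadata->>'idempotency_key' = $2;

-- 2. Allocate stream_seq under a per-stream row lock.
INSERT INTO event_streams (stream_id, next_seq)
VALUES ($1, 1)
ON CONFLICT (stream_id) DO NOTHING;

UPDATE event_streams
SET next_seq = next_seq + 1
WHERE stream_id = $1
RETURNING next_seq - 1;  -- the application passes this back as $3

-- 3. Insert the event with the allocated sequence number.
INSERT INTO app_events (stream_id, stream_seq, event_type, occurred_at, payload, metadata)
VALUES ($1, $3, $4, $5, $6::jsonb, $7::jsonb);

COMMIT;
```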

When you follow this discipline, PostgreSQL becomes a robust event store: inserts are fast, history is preserved, and consumers can replay deterministically using stream_seq and time columns.

Now answer the exercise about the content:

Which approach best enforces an append-only event table in PostgreSQL so that history is preserved and changes are made by new facts?


Append-only is enforced by preventing UPDATE/DELETE through privileges plus safety-net triggers. Corrections are handled by inserting new compensating or correction events rather than mutating existing rows.

Next chapter

Idempotency, Deduplication, and Exactly-Once Effects
