Event-Driven Thinking for Database Design

Chapter 1

Estimated reading time: 13 minutes


What “event-driven thinking” means in a database context

Event-driven thinking treats the database not only as a place to store the latest state, but as a place to record what happened, when it happened, and why it changed. Instead of modeling your system primarily around mutable entities (for example, “Order” with a status column that gets overwritten), you model around facts that occur in time (for example, “OrderPlaced”, “PaymentAuthorized”, “OrderShipped”). Each fact is an event: an immutable record that something occurred.

This shift changes how you design tables, keys, constraints, queries, and downstream integrations. The core idea is simple: state is derived; events are primary. In practice, you often keep both: an append-only event log for truth and auditability, plus derived read models (tables/materialized views) for fast queries.

Events vs. state: a concrete contrast

State-first modeling typically stores one row per entity and updates it in place. Event-driven modeling stores one row per event and rarely updates it. The “current state” becomes a computation over events.

  • State-first: orders(status, updated_at) is updated from PLACED to PAID to SHIPPED.
  • Event-first: order_events(event_type, occurred_at, payload) gets new rows: OrderPlaced, PaymentCaptured, OrderShipped. Current status is derived by selecting the latest relevant event(s).

Event-driven thinking is not “store everything as JSON and hope for the best.” It is a disciplined approach to capturing change as data, with explicit semantics, ordering, and constraints so that the database can enforce correctness.

Why event-driven thinking affects database design decisions

Immutability and auditability become first-class

When events are immutable, you gain a reliable audit trail: you can answer “what happened?” without relying on overwritten columns or fragile application logs. This influences schema design: you optimize for inserts, ordering, and integrity of append-only records.


Time becomes a key dimension

Events are inherently temporal. Your schema needs a clear approach to timestamps (occurred time vs. recorded time), ordering, and late arrivals. This affects indexing, partitioning, and query patterns. Even if you maintain a current-state table, you will often compute it from events using time-aware logic.

Integration becomes simpler and safer

Many systems need to publish changes to other services, analytics, or search. If your database already stores changes as events, you can stream them out with fewer “diff” computations. The database design can support this by making event identifiers stable, ensuring idempotency, and enabling efficient incremental reads.

Core building blocks of an event-driven database model

1) Event identity

Every event should have a unique identifier that is stable across retries and replays. In PostgreSQL, you commonly use uuid (or bigint sequences if you control generation). The key is that the same logical event should not create duplicates if the producer retries.
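As a sketch of what retry-safe appends can look like, assuming the order_events table defined later in this chapter, the primary key can absorb duplicate deliveries:

```sql
-- Idempotent append: a retried insert with the same event_id is a no-op.
-- Assumes the order_events table defined later in this chapter.
INSERT INTO order_events (event_id, order_id, event_type, occurred_at, payload)
VALUES ($1, $2, $3, $4, $5::jsonb)
ON CONFLICT (event_id) DO NOTHING;
```

With this shape, a producer that crashes after sending but before receiving an acknowledgment can simply resend the same event.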

2) Aggregate identity (the “thing” the event belongs to)

Events typically relate to an aggregate (for example, an order, an account, a subscription). You store an aggregate_id so you can fetch the event stream for that aggregate efficiently.

3) Event type

Store an explicit event type (string or enum). Avoid overloading a single “type” column with ambiguous meanings. The type should map to a well-defined schema and business meaning.

4) Event payload

Payload can be normalized columns, JSONB, or a hybrid. A practical approach is: keep frequently queried fields as columns (for indexing and constraints), and keep the full event body in jsonb for flexibility and forward compatibility.

5) Occurred time and recorded time

It is common to store both:

  • occurred_at: when the business event happened (from the producer’s perspective).
  • recorded_at: when the database accepted it.

This distinction matters when events arrive late or out of order.

6) Causation and correlation

To debug and trace workflows, store optional metadata like correlation_id (ties multiple events to one business process) and causation_id (the event that triggered this event). These are invaluable when reconstructing why something happened.

A practical schema pattern in PostgreSQL

The following schema illustrates a general-purpose event table for an “order” domain. It is intentionally explicit about identity, ordering, and metadata.

CREATE TABLE order_events (
  event_id        uuid PRIMARY KEY,
  order_id        uuid NOT NULL,
  event_type      text NOT NULL,
  occurred_at     timestamptz NOT NULL,
  recorded_at     timestamptz NOT NULL DEFAULT now(),
  correlation_id  uuid NULL,
  causation_id    uuid NULL,
  actor_type      text NULL,
  actor_id        text NULL,
  payload         jsonb NOT NULL,
  -- Optional: a lightweight version number for payload schema evolution
  schema_version  int NOT NULL DEFAULT 1
);

CREATE INDEX ON order_events (order_id, occurred_at);
CREATE INDEX ON order_events (order_id, recorded_at);
CREATE INDEX ON order_events (event_type, recorded_at);
CREATE INDEX order_events_payload_gin ON order_events USING gin (payload);

Design notes:

  • Primary key: event_id supports idempotent inserts (the producer can retry with the same UUID).
  • Ordering: You can order by occurred_at for business time, or recorded_at for ingestion time. Index both if you need both query styles.
  • Payload indexing: A GIN index on payload can help ad-hoc queries, but don’t rely on it for core access patterns if you can model key fields as columns.

Step-by-step: modeling a workflow as events

This step-by-step process helps you convert a state-based design into an event-driven one without guessing.

Step 1: List business facts, not screens or API endpoints

Write down facts that are meaningful in the domain, phrased in past tense. For an order workflow:

  • OrderPlaced
  • ItemAddedToOrder
  • ItemRemovedFromOrder
  • PaymentAuthorized
  • PaymentCaptured
  • OrderShipped
  • OrderCancelled

A good test: if the fact happened yesterday, would it still be true today that it happened? If yes, it is a good candidate for an event.

Step 2: Define the aggregate boundary and event stream key

Choose the identifier that groups events into a coherent stream. For orders, that is order_id. Ensure every event includes it. If you have events that span multiple aggregates (for example, inventory reservation affecting multiple orders), decide whether to duplicate references (store both order_id and reservation_id) or create separate streams per aggregate with correlation IDs.

Step 3: Specify minimal required fields per event type

For each event type, decide what must be present to make the event meaningful and replayable. Example:

  • OrderPlaced: customer_id, initial items, currency
  • PaymentAuthorized: amount, payment_provider, authorization_id
  • OrderShipped: carrier, tracking_number, shipped_items

In PostgreSQL, you can enforce some of this with CHECK constraints on JSONB, or by extracting key fields into columns. A hybrid approach is common: store amount and currency as columns for indexing and constraints, and keep the rest in payload.
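As an illustrative sketch of the CHECK-constraint option, using the jsonb `?` key-existence operator (the event type and payload key names here are assumptions):

```sql
-- Illustrative only: require PaymentAuthorized payloads to carry
-- an amount and an authorization_id.
ALTER TABLE order_events
  ADD CONSTRAINT payment_authorized_payload_check
  CHECK (
    event_type <> 'PaymentAuthorized'
    OR (payload ? 'amount' AND payload ? 'authorization_id')
  );
```

Constraints like this only verify key presence, not value types or ranges; for stronger guarantees, promote the field to a typed column.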

Step 4: Decide how to handle ordering and concurrency

Event streams need a consistent notion of order. If multiple writers can append to the same stream, you need a strategy to prevent conflicting sequences.

A practical pattern is to maintain an expected_version (optimistic concurrency) per aggregate. You store a monotonically increasing stream_version on each event and enforce uniqueness on (order_id, stream_version).

ALTER TABLE order_events ADD COLUMN stream_version bigint NOT NULL;

CREATE UNIQUE INDEX order_events_stream_version_uniq
  ON order_events (order_id, stream_version);

Then, when appending events, you compute the next version based on the current max and ensure it matches the expected version. You can do this in a transaction to avoid races.

-- Append one event with optimistic concurrency (simplified)
BEGIN;

-- Lock the stream's existing rows, then compute the current version.
-- (PostgreSQL does not allow FOR UPDATE directly with aggregates,
-- so lock the rows in a subquery and aggregate over them.)
SELECT coalesce(max(stream_version), 0) AS current_version
FROM (
  SELECT stream_version
  FROM order_events
  WHERE order_id = $1
  FOR UPDATE
) locked;

-- Application checks current_version == expected_version

-- Insert the next event
INSERT INTO order_events
  (event_id, order_id, stream_version, event_type, occurred_at, payload)
VALUES
  ($2, $1, $3, $4, $5::jsonb);

COMMIT;

If you want a cleaner lock target, create a separate order_streams table with one row per order and lock that row when appending.
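A minimal sketch of that stream table, with names chosen for illustration:

```sql
-- One row per order gives a precise lock target when appending.
CREATE TABLE order_streams (
  order_id         uuid PRIMARY KEY,
  current_version  bigint NOT NULL DEFAULT 0
);

-- When appending, lock the stream row, check the expected version,
-- then bump it in the same transaction as the event insert:
-- SELECT current_version FROM order_streams WHERE order_id = $1 FOR UPDATE;
-- UPDATE order_streams SET current_version = current_version + 1
--   WHERE order_id = $1;
```

Locking one small row avoids scanning the event table under lock and keeps contention limited to writers of the same stream.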

Step 5: Build derived state (read model) for common queries

Many queries want “current order status” or “orders awaiting shipment.” Computing this on the fly from events can be expensive. Event-driven thinking encourages you to create derived tables that are updated from events.

A simple derived table might look like:

CREATE TABLE orders_current (
  order_id        uuid PRIMARY KEY,
  customer_id     uuid NOT NULL,
  status          text NOT NULL,
  total_amount    numeric(12,2) NOT NULL DEFAULT 0,
  currency        text NOT NULL,
  last_event_at   timestamptz NOT NULL
);

CREATE INDEX ON orders_current (status, last_event_at);

You can update this table in the same transaction that inserts events (synchronous projection) or asynchronously (background worker/consumer). Synchronous updates simplify consistency but can increase write latency; asynchronous updates improve decoupling but require handling eventual consistency.

Ensuring correctness with database constraints

Event-driven design still benefits from relational constraints; you just apply them differently.

Idempotency: prevent duplicates

If producers may retry, use a stable event_id and let the primary key reject duplicates. If you cannot guarantee stable IDs, use a natural idempotency key (for example, payment_provider + authorization_id) and enforce uniqueness with a partial unique index for that event type.

-- Example: prevent duplicate PaymentAuthorized events per authorization_id
CREATE UNIQUE INDEX order_events_payment_auth_uniq
ON order_events ((payload->>'authorization_id'))
WHERE event_type = 'PaymentAuthorized';

This pattern is powerful but should be used carefully: expression indexes on JSONB require consistent payload structure.

Domain invariants: enforce “can’t happen” rules

Some invariants are best enforced at write time. Examples:

  • An order cannot be shipped before it is placed.
  • An order cannot be cancelled after it is shipped.

In an event store, these are typically enforced by checking the current derived state (or last relevant event) before inserting a new event. In PostgreSQL, you can implement this in application logic inside a transaction, or with a trigger that rejects invalid sequences. Triggers can centralize rules but may become complex; keep them focused on invariants that must never be bypassed.
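As a sketch of the trigger option, focused on a single invariant from the list above (the event type names mirror this chapter's examples; the function name is an assumption):

```sql
-- Reject OrderCancelled events for orders that have already shipped.
CREATE FUNCTION reject_cancel_after_ship() RETURNS trigger AS $$
BEGIN
  IF NEW.event_type = 'OrderCancelled' AND EXISTS (
    SELECT 1
    FROM order_events
    WHERE order_id = NEW.order_id
      AND event_type = 'OrderShipped'
  ) THEN
    RAISE EXCEPTION 'order % already shipped; cannot cancel', NEW.order_id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_events_invariants
  BEFORE INSERT ON order_events
  FOR EACH ROW EXECUTE FUNCTION reject_cancel_after_ship();
```

Because the trigger runs inside the insert's transaction, the check and the append are atomic, which is what makes this suitable for invariants that must never be bypassed.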

Referential integrity: keep links valid

Events often reference other entities: customer_id, product_id, warehouse_id. You can store these references as columns and use foreign keys, even if the rest of the payload is JSONB. This gives you strong guarantees without forcing full normalization of event bodies.

ALTER TABLE order_events ADD COLUMN customer_id uuid;

ALTER TABLE order_events
  ADD CONSTRAINT order_events_customer_fk
  FOREIGN KEY (customer_id) REFERENCES customers (customer_id);

Whether every event should carry customer_id depends on your access patterns. If you frequently query events by customer, denormalizing it into the event table is often worth it.

Query patterns you design for

Fetch an aggregate’s event stream

This is the most important query. It should be fast and predictable.

SELECT event_id, stream_version, event_type, occurred_at, payload
FROM order_events
WHERE order_id = $1
ORDER BY stream_version ASC;

Indexing on (order_id, stream_version) supports this efficiently.

Find the latest event of a type

SELECT *
FROM order_events
WHERE order_id = $1
  AND event_type = 'OrderShipped'
ORDER BY occurred_at DESC, recorded_at DESC
LIMIT 1;

If this is frequent, consider a composite index like (order_id, event_type, occurred_at DESC).
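A possible definition of that index, assuming the order_events table from this chapter:

```sql
-- Supports "latest event of a given type for an order" lookups.
CREATE INDEX order_events_type_time
  ON order_events (order_id, event_type, occurred_at DESC);
```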

Build state on the fly (when acceptable)

For small streams, you can derive state in a query. Example: compute current status as the most recent status-changing event.

SELECT event_type AS current_status, occurred_at
FROM order_events
WHERE order_id = $1
  AND event_type IN ('OrderPlaced', 'PaymentCaptured', 'OrderShipped', 'OrderCancelled')
ORDER BY occurred_at DESC, recorded_at DESC
LIMIT 1;

This is simple but can become expensive at scale; it also requires careful definition of which events affect status.

Designing event types and payloads for evolution

Prefer additive changes

Event payloads will evolve. Design so consumers can tolerate new fields. In PostgreSQL, JSONB makes additive evolution easy, but you should still version your event schema (for example, schema_version) and keep event types stable.

Don’t rewrite old events

Immutability is central. If you need to correct something, append a compensating event (for example, PaymentCaptureReversed) rather than updating the original. This keeps the timeline truthful and avoids breaking downstream consumers that already processed the event.

Use explicit event names rather than overloaded “status changed”

A generic StatusChanged event can be tempting, but it often loses meaning and makes projections harder. Prefer explicit events that capture intent and context. If you do use a generic event, ensure the payload contains enough detail to interpret it unambiguously.

Bridging event-driven storage with transactional requirements

Many systems still need strong transactional guarantees for certain operations (for example, reserving inventory and recording an order placement). Event-driven thinking does not remove the need for transactions; it changes what you commit.

A practical approach is: in one database transaction, append the event(s) and update any necessary derived tables or constraints. The transaction boundary becomes “the event is recorded and the system’s authoritative log reflects it.” From there, other processes can react.

Example: place an order with a derived current-state update

BEGIN;

-- 1) Insert OrderPlaced event
INSERT INTO order_events
  (event_id, order_id, stream_version, event_type, occurred_at, customer_id, payload)
VALUES
  ($event_id, $order_id, 1, 'OrderPlaced', $occurred_at, $customer_id, $payload::jsonb);

-- 2) Initialize current state
INSERT INTO orders_current
  (order_id, customer_id, status, total_amount, currency, last_event_at)
VALUES
  ($order_id, $customer_id, 'PLACED', $total_amount, $currency, $occurred_at);

COMMIT;

This pattern keeps the event log and the read model consistent at commit time. If you later decide to project asynchronously, you can keep the event insert as the only synchronous step and let a projector update orders_current.

Common design mistakes and how to avoid them

Storing events but still thinking in overwrites

A frequent mistake is to store events but treat them as “change records” for debugging only, while the real logic still depends on mutable state tables. If the event log is not trusted, it will drift from reality. Decide which source is authoritative for which questions. If events are the source of truth for business history, ensure all meaningful changes are represented as events.

Using one giant event table without access-pattern indexes

An event table can grow quickly. If you do not index for the primary queries (fetch stream by aggregate, filter by type, range by time), performance will degrade. Start with the stream access index and add others based on measured needs.

Overusing JSONB for everything

JSONB is excellent for flexible payloads, but core fields that you filter, join, or constrain should often be columns. A good heuristic: if a field appears in WHERE clauses or JOIN conditions frequently, promote it to a column.
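One way to apply that heuristic without changing producers, sketched here with an assumed payload field name, is a generated column (PostgreSQL 12+) derived from the JSONB body:

```sql
-- Promote a frequently filtered payload field to a typed, indexable column.
-- The 'amount' field name is an assumption for illustration.
ALTER TABLE order_events
  ADD COLUMN amount numeric
  GENERATED ALWAYS AS ((payload->>'amount')::numeric) STORED;

CREATE INDEX order_events_amount ON order_events (amount);
```

The column is computed and stored at write time, so queries filter on a plain numeric column while the event payload remains the single source written by producers.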

Ignoring late or out-of-order events

In distributed systems, events can arrive late. If you only store recorded_at and treat it as business time, you may compute incorrect state. Store occurred_at and be explicit about which timestamp your projections use.

Practical checklist for event-driven database design

  • Define event types as past-tense business facts with clear meaning.
  • Choose an aggregate_id that groups events into a stream and index for stream reads.
  • Use stable event identifiers for idempotency; add unique constraints for natural keys when needed.
  • Store occurred_at and recorded_at if late arrival is possible.
  • Decide on a concurrency strategy (stream_version + uniqueness is a common baseline).
  • Keep payload flexible but promote frequently queried fields to columns.
  • Create derived read models for common queries; update them synchronously or asynchronously based on consistency needs.
  • Enforce critical invariants at write time (transactional checks or focused triggers).
  • Plan for schema evolution with additive changes and optional schema_version metadata.

Now answer the exercise about the content:

In an event-driven database model, what is the primary purpose of keeping an append-only event log alongside a derived current-state table?


An append-only event log preserves immutable history and auditability. A derived current-state table can then be computed and maintained for efficient reads, while the event log remains the authoritative record of change over time.

Next chapter

Modeling Append-Only Event Tables in PostgreSQL
