All courses > Technology and Programming > Databases ::

Designing Event Payloads with JSONB and Typed Columns

Capítulo 5

Estimated reading time: 15 minutes

Why payload design matters: flexibility vs. guarantees

In an event-driven database, the event payload is the part that carries business facts: what happened, to whom, and with what attributes. PostgreSQL gives you two strong but different tools for payloads: JSONB (flexible, schema-on-read) and typed columns (strict, schema-on-write). Designing payloads well is not about picking one and ignoring the other; it is about choosing where you need flexibility, where you need guarantees, and how you will evolve the payload over time without breaking consumers.

JSONB is excellent for capturing event-specific attributes that vary by event type, evolve frequently, or are sparse. Typed columns are excellent for attributes that are queried often, must be validated, indexed efficiently, or participate in constraints and joins. A robust design commonly uses a hybrid: a small set of typed “envelope” columns for stable, high-value fields, plus a JSONB “payload” for the rest.

Core patterns for event payloads in PostgreSQL

Pattern A: All-JSONB payloads

In this pattern, the event table has a JSONB column (often named data or payload) that contains most event attributes. You still typically keep some typed columns for routing and filtering (like event_type), but the business fields live in JSONB.

Pros: fast iteration, easy to add fields, can store nested structures naturally.
Cons: weaker guarantees unless you add explicit checks, more complex indexing, risk of inconsistent shapes across producers.

Pattern B: Fully typed event tables per event type

Here, each event type (or small family) gets its own table with typed columns. This gives maximum validation and query performance but can lead to many tables and more migration work as payloads evolve.

Pros: strong constraints, simple SQL, predictable performance.
Cons: schema churn, more DDL, harder to support “unknown future fields”.

Pattern C: Hybrid envelope + JSONB payload (recommended default)

This pattern uses typed columns for stable, frequently queried fields and JSONB for the variable part. It is often the best balance for event-driven modeling in PostgreSQL.

Continue in our app.

Listen to the audio with the screen off.
Earn a certificate upon completion.
Over 5000 courses for you to explore!

Or continue reading below...

Download the app

Pros: strong guarantees where needed, flexible evolution elsewhere, good indexing options.
Cons: requires discipline: decide what belongs in typed columns vs JSONB, and enforce JSON shape for critical fields.

Designing the envelope: what should be typed?

Even when you store most attributes in JSONB, you should identify a minimal set of typed columns that make the system operable and queryable. The exact set depends on your domain, but the following categories are common:

Event classification: event_type (text), event_version (int), sometimes schema_name (text) if you have multiple domains.
Entity targeting: entity_type (text) and entity_id (uuid/text) when you frequently filter by “all events for X”.
Correlation: correlation_id, causation_id (uuid/text) to connect events to a request or parent event.
Producer metadata: producer (text), trace_id (text/uuid) if you integrate with tracing.
Operational flags: is_redacted (boolean), contains_pii (boolean) if you need compliance controls.

Typed columns are also useful for fields that you will index heavily. Indexing JSONB is possible, but you should not force every common query to extract from JSONB if the field is stable and central to your access patterns.

Designing the JSONB payload: shape, naming, and evolution

Prefer stable top-level keys and consistent types

JSONB makes it easy to add fields, but it also makes it easy to accidentally create multiple shapes for the “same” event type. Establish conventions:

Use consistent key naming: choose snake_case or camelCase and stick to it.
Keep stable keys at the top level: for example, amount, currency, reason.
Use nested objects for grouped data: customer, address, payment_method.
Keep types stable: do not switch a field from string to number later; add a new field if needed.

Include explicit versioning for payload changes

When payloads evolve, consumers need a way to interpret them. A practical approach is to store an integer event_version as a typed column and treat it as the version of the payload schema for that event_type. When you introduce a breaking change (rename a field, change meaning), increment event_version and support both versions in consumers until migration is complete.

Represent optionality explicitly

In JSON, missing keys and null are different. Decide a rule:

Missing key means “not applicable / not provided”.
null means “known to be empty” (use sparingly).

This matters for queries and constraints. If you want to enforce that a field exists, check for key existence rather than non-null value.

Step-by-step: implementing a hybrid event table

Step 1: Create the table with typed envelope + JSONB payload

The following example shows a single event table that can store multiple event types, with typed columns for common access patterns and a JSONB payload for event-specific attributes.

CREATE TABLE app_event_log (  event_id        uuid PRIMARY KEY,  event_type      text NOT NULL,  event_version   integer NOT NULL DEFAULT 1,  entity_type     text NOT NULL,  entity_id       uuid NOT NULL,  correlation_id  uuid,  causation_id    uuid,  producer        text NOT NULL,  occurred_at     timestamptz NOT NULL,  payload         jsonb NOT NULL,  headers         jsonb NOT NULL DEFAULT '{}'::jsonb );

payload holds business data. headers can hold operational metadata that may vary (for example, request IP, user agent, feature flags). Keeping these separate helps you avoid mixing business facts with transport/operational details.

Step 2: Add basic JSONB sanity constraints

JSONB columns accept any JSON value. For event payloads, you usually want an object at the top level.

ALTER TABLE app_event_log  ADD CONSTRAINT payload_is_object  CHECK (jsonb_typeof(payload) = 'object');

If you also want headers to be an object:

ALTER TABLE app_event_log  ADD CONSTRAINT headers_is_object  CHECK (jsonb_typeof(headers) = 'object');

Step 3: Enforce required keys per event type (targeted checks)

For critical event types, enforce that required keys exist. PostgreSQL’s JSONB operators make this straightforward. Example: for invoice_issued you require invoice_id, amount, and currency.

ALTER TABLE app_event_log  ADD CONSTRAINT invoice_issued_required_keys  CHECK (    event_type <> 'invoice_issued'    OR (payload ? 'invoice_id' AND payload ? 'amount' AND payload ? 'currency')  );

You can also enforce types. Example: amount should be numeric (in JSON it will be a number), and currency should be a string.

ALTER TABLE app_event_log  ADD CONSTRAINT invoice_issued_key_types  CHECK (    event_type <> 'invoice_issued'    OR (      jsonb_typeof(payload->'amount') = 'number'      AND jsonb_typeof(payload->'currency') = 'string'    )  );

This approach scales when you apply it only to event types where correctness is essential and the schema is stable enough to justify constraints.

Step 4: Promote frequently queried fields into typed columns (generated columns)

A common pain point is repeatedly extracting the same JSONB keys in queries and indexes. PostgreSQL supports generated columns, which can “materialize” extracted values as typed columns. This gives you typed access and indexing while still storing the source of truth in JSONB.

Example: promote payload.invoice_id and payload.amount for invoice_issued events.

ALTER TABLE app_event_log  ADD COLUMN invoice_id uuid  GENERATED ALWAYS AS (    CASE WHEN event_type = 'invoice_issued'         THEN (payload->>'invoice_id')::uuid         ELSE NULL END  ) STORED,  ADD COLUMN invoice_amount numeric  GENERATED ALWAYS AS (    CASE WHEN event_type = 'invoice_issued'         THEN (payload->>'amount')::numeric         ELSE NULL END  ) STORED;

Now you can index and query these columns efficiently without repeating JSON extraction logic everywhere.

Step 5: Add indexes for both typed and JSONB access

Indexes should reflect your read patterns. With the hybrid approach, you typically combine:

B-tree indexes on typed columns for equality/range filters.
GIN indexes on JSONB for containment queries when you need flexible filtering.

Examples:

CREATE INDEX app_event_log_entity_idx  ON app_event_log (entity_type, entity_id, occurred_at); CREATE INDEX app_event_log_type_time_idx  ON app_event_log (event_type, occurred_at);

For JSONB containment queries (e.g., find events where payload contains a key/value pair):

CREATE INDEX app_event_log_payload_gin  ON app_event_log USING gin (payload jsonb_path_ops);

jsonb_path_ops is efficient for containment (@>) but supports fewer operator classes than the default. If you need broader JSONB querying, use the default GIN operator class:

CREATE INDEX app_event_log_payload_gin_default  ON app_event_log USING gin (payload);

Choose one based on your actual query operators; avoid creating both unless you have measured the need.

Querying JSONB payloads safely and efficiently

Extracting scalar values

Use ->> to extract text and cast to the desired type. Example: filter invoice events by currency.

SELECT event_id, occurred_at, payload->>'currency' AS currency FROM app_event_log WHERE event_type = 'invoice_issued'   AND payload->>'currency' = 'USD';

If you cast, guard against missing keys or wrong types by combining with key checks or constraints. Example:

SELECT event_id, (payload->>'amount')::numeric AS amount FROM app_event_log WHERE event_type = 'invoice_issued'   AND payload ? 'amount';

Containment queries with @>

Containment is powerful for matching partial JSON structures. Example: find events where status is cancelled.

SELECT event_id, occurred_at FROM app_event_log WHERE payload @> '{"status":"cancelled"}'::jsonb;

This is where a GIN index on payload can help significantly.

Querying nested structures

Example: payload has customer object with country.

SELECT event_id FROM app_event_log WHERE event_type = 'invoice_issued'   AND payload->'customer'->>'country' = 'DE';

If this becomes a common filter, consider promoting customer.country to a generated column and indexing it.

Typed columns: when strict schema is the better choice

Typed columns are not only about performance; they are about correctness and maintainability. Prefer typed columns when:

You need constraints: foreign keys, check constraints, uniqueness, non-null requirements.
You need consistent numeric semantics: money amounts, quantities, rates, where casting from JSON text repeatedly is risky.
You need efficient joins: joining on a JSON-extracted value is possible but often slower and harder to index well.
You need stable reporting: BI queries benefit from typed columns and predictable schemas.

A practical compromise is to keep the canonical value in typed columns and store a JSONB payload for additional context. For example, store amount and currency as typed columns, and keep payload for optional details like discount breakdown or UI-specific metadata.

Step-by-step: designing payloads for a small set of event types

Consider three event types in a billing domain: invoice_issued, payment_received, and invoice_cancelled. The goal is to design payloads that are consistent, evolvable, and queryable.

Step 1: Define minimal required fields per event type

invoice_issued: invoice_id, customer_id, amount, currency, due_date
payment_received: payment_id, invoice_id, amount, currency, method
invoice_cancelled: invoice_id, reason

Keep identifiers consistently typed in JSON (strings that can be cast to UUID) and keep amounts as JSON numbers.

Step 2: Create constraints for required keys and types

ALTER TABLE app_event_log  ADD CONSTRAINT payment_received_required  CHECK (    event_type <> 'payment_received' OR (      payload ? 'payment_id' AND payload ? 'invoice_id' AND payload ? 'amount' AND payload ? 'currency' AND payload ? 'method'    )  ); ALTER TABLE app_event_log  ADD CONSTRAINT payment_received_types  CHECK (    event_type <> 'payment_received' OR (      jsonb_typeof(payload->'amount') = 'number'      AND jsonb_typeof(payload->'currency') = 'string'      AND jsonb_typeof(payload->'method') = 'string'    )  );

This does not require you to fully formalize every field for every event type, but it prevents the most damaging inconsistencies.

Step 3: Promote cross-event join keys

If you frequently join events by invoice_id across multiple event types, promote it. You can do this with a generated column that extracts invoice_id for the relevant event types.

ALTER TABLE app_event_log  ADD COLUMN invoice_id_promoted uuid  GENERATED ALWAYS AS (    CASE      WHEN payload ? 'invoice_id' THEN (payload->>'invoice_id')::uuid      ELSE NULL    END  ) STORED; CREATE INDEX app_event_log_invoice_id_promoted_idx  ON app_event_log (invoice_id_promoted, occurred_at);

Now you can efficiently query “all events for invoice X” without caring which event type produced them.

Schema evolution strategies with JSONB and typed columns

Additive changes: add new keys

Adding a new key to JSONB is typically safe for consumers that ignore unknown fields. When you add a key that becomes important for filtering or reporting, consider promoting it to a generated column later, after it stabilizes.

Renames and semantic changes: introduce new keys and deprecate old ones

Renaming a key in-place forces consumers to update in lockstep. A safer approach is:

Add the new key (e.g., total_amount).
Keep the old key (e.g., amount) for a transition period.
Optionally increment event_version for new events.
Update consumers to prefer the new key, fallback to the old key.

In PostgreSQL queries, you can coalesce:

SELECT COALESCE(payload->>'total_amount', payload->>'amount') AS amount_text FROM app_event_log WHERE event_type = 'invoice_issued';

When to migrate JSONB fields into typed columns

Promoting fields is a common evolution step. A practical workflow:

Observe: identify fields that appear in many queries or dashboards.
Stabilize: ensure the field’s meaning and type are stable across producers.
Promote: add a generated column (or a regular column populated by a backfill) and index it.
Enforce: add constraints ensuring the JSONB field exists and has the correct type for relevant event types.

If you need to support older events that lack the field, keep the promoted column nullable and handle nulls in queries.

Validation options beyond simple CHECK constraints

JSON Schema validation (application-side or database-side)

PostgreSQL does not include built-in JSON Schema validation, but you can still enforce structure in two ways:

Application-side validation: validate payloads before insert using a JSON Schema library, then store validated JSONB.
Database-side validation: use CHECK constraints for critical keys/types and optionally PL/pgSQL triggers for more complex rules.

Database-side validation is valuable when multiple producers write to the same database and you need a single enforcement point.

Trigger-based validation for complex rules

Some rules are hard to express as a single CHECK constraint, such as conditional requirements based on multiple fields. A trigger can validate and raise an error. Keep trigger logic small and deterministic to avoid operational surprises.

CREATE OR REPLACE FUNCTION validate_invoice_issued_payload() RETURNS trigger AS $$ BEGIN   IF NEW.event_type = 'invoice_issued' THEN     IF NOT (NEW.payload ? 'due_date') THEN       RAISE EXCEPTION 'invoice_issued requires due_date';     END IF;     IF jsonb_typeof(NEW.payload->'due_date') <> 'string' THEN       RAISE EXCEPTION 'due_date must be a string (ISO-8601)';     END IF;   END IF;   RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER trg_validate_invoice_issued BEFORE INSERT OR UPDATE ON app_event_log FOR EACH ROW EXECUTE FUNCTION validate_invoice_issued_payload();

This example enforces presence and type. If you want to validate date format, you can attempt a cast to timestamptz and catch exceptions, but be careful: heavy parsing in triggers can become a bottleneck at high write rates.

Handling arrays and repeated structures in payloads

Events sometimes include arrays, such as line items on an invoice. JSONB is a natural fit, but querying arrays can be more expensive. Example payload:

{  "invoice_id": "3d2f...",  "currency": "USD",  "items": [    {"sku": "A1", "qty": 2, "unit_price": 10.00},    {"sku": "B2", "qty": 1, "unit_price": 25.00}  ]}

If you only need the array for downstream processing, keep it in JSONB. If you need frequent analytics (e.g., top SKUs), consider extracting items into a separate relational table keyed by event_id (or by invoice_id), while still storing the original JSONB payload for completeness.

For occasional queries, you can use jsonb_array_elements:

SELECT e.event_id, item->>'sku' AS sku, (item->>'qty')::int AS qty FROM app_event_log e CROSS JOIN LATERAL jsonb_array_elements(e.payload->'items') AS item WHERE e.event_type = 'invoice_issued';

Use this sparingly for large datasets; it can expand rows significantly.

Security and privacy considerations in payload design

JSONB payloads can easily accumulate sensitive data because they are flexible. Establish rules about what is allowed in payloads and what must be stored elsewhere (or not stored at all). Practical measures:

Separate PII into dedicated fields or tables with stricter access controls, and store only references in the event payload.
Use typed flags like contains_pii to support auditing and selective access.
Consider redaction workflows where payload can be replaced with a redacted form and is_redacted set to true, while keeping the event record.

From a querying perspective, keeping sensitive fields out of JSONB reduces the chance of accidental exposure through ad-hoc JSON extraction.

Practical checklist: choosing JSONB vs typed columns per field

Query frequency: if you filter/join/group by it often, prefer typed (or generated) columns.
Validation need: if wrong values would cause incorrect downstream effects, prefer typed + constraints.
Volatility: if the field changes often or is experimental, keep it in JSONB first.
Sparsity: if only a small fraction of events have it, JSONB avoids many nullable columns.
Structure: nested/variable structures fit JSONB; flat stable attributes fit typed columns.
Indexing: if you need fast lookups, typed columns are simplest; JSONB can work with GIN but requires careful operator choice.

Now answer the exercise about the content:

In a hybrid event table design, why would you promote a frequently queried JSONB field into a typed generated column?

You are right! Congratulations, now go to the next page

You missed! Try again.

Generated columns can materialize values extracted from JSONB into typed fields. This makes common filters and indexes faster and simpler while keeping the JSONB payload as the source of truth for variable attributes.