Querying Historical State and Point-in-Time Views

Chapter 8

Estimated reading time: 14 minutes


Why historical state matters

Event-driven systems rarely need only the “current” truth. You often need to answer questions like: “What did we believe at 10:03 yesterday?”, “Which customers were in segment A on the day we sent the campaign?”, or “What was the account balance immediately before the disputed transaction?” These are point-in-time questions. They require reconstructing state as-of a timestamp (or as-of a logical position in a stream) and doing it in a way that is explainable, repeatable, and performant.

In PostgreSQL, there are two broad ways to query historical state:

  • Reconstruct on demand from events (compute the state at query time).
  • Materialize state over time (store snapshots or validity intervals so point-in-time queries become direct lookups).

This chapter focuses on querying patterns and data structures for historical state and point-in-time views, assuming you already have append-only events and appropriate temporal indexes/partitioning in place.

Define “as-of”: timestamp vs. stream position

A point-in-time view must be defined against a time axis. In practice you will choose one of these:

  • Wall-clock time (e.g., occurred_at): “state as it was in the real world at time T”.
  • Ingestion/processing time (e.g., ingested_at): “state as the system knew it at time T”.
  • Stream position (e.g., global_seq or per-aggregate version): “state after applying events up to position P”.

Be explicit because late-arriving events can change “real-world” history. If you ask “as-of occurred_at”, you are asking for a view that may change when late events arrive (unless you freeze history). If you ask “as-of ingested_at”, you get a stable audit of what the system knew then. If you ask “as-of stream position”, you get deterministic replay semantics.
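As a sketch of how that choice changes a query, here the same count is bounded three different ways against the event table defined later in this chapter (global_seq stands in for whatever stream-position column you record; it is illustrative, not prescribed):

```sql
-- Real-world time: the answer may change as late events arrive.
SELECT COUNT(*) FROM account_events e WHERE e.occurred_at <= $1;

-- System knowledge: stable once events are ingested.
SELECT COUNT(*) FROM account_events e WHERE e.ingested_at <= $1;

-- Stream position: deterministic replay semantics
-- (global_seq is a hypothetical monotonic position column).
SELECT COUNT(*) FROM account_events e WHERE e.global_seq <= $2;
```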


Practical rule

Use ingested_at or a stream position for reproducible analytics and audits; use occurred_at for business questions about the real world, but be prepared to handle late data and corrections.

On-demand reconstruction: compute state as-of T

On-demand reconstruction means you compute the state for an entity by applying all events up to the as-of boundary. This is the most flexible approach and requires no extra storage, but can be expensive if you do it for many entities or long histories.

Example event table assumptions

Assume an event table like:

-- Simplified shape; columns may differ in your system design.
CREATE TABLE account_events (
  event_id uuid PRIMARY KEY,
  account_id uuid NOT NULL,
  occurred_at timestamptz NOT NULL,
  ingested_at timestamptz NOT NULL,
  event_type text NOT NULL,
  amount_cents bigint,
  payload jsonb
);

We want to answer: “What was the account balance as-of a timestamp?” If events are deposits/withdrawals, the balance is a sum of signed amounts.

Single-entity as-of query

This query reconstructs balance for one account as-of a time boundary:

SELECT
  e.account_id,
  COALESCE(SUM(
    CASE e.event_type
      WHEN 'deposit' THEN e.amount_cents
      WHEN 'withdrawal' THEN -e.amount_cents
      ELSE 0
    END
  ), 0) AS balance_cents
FROM account_events e
WHERE e.account_id = $1
  AND e.occurred_at <= $2
GROUP BY e.account_id;

Step-by-step:

  • Filter to the entity (account_id).
  • Filter to events up to the as-of boundary (occurred_at <= $2).
  • Aggregate by applying a deterministic reducer (here, signed sum).

This pattern generalizes to any reducer: last-known value, set membership, counters, etc.
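For example, a simple counter reducer over the same as-of boundary (a sketch against the same event table):

```sql
-- Number of deposits the account had received as-of the boundary.
SELECT COUNT(*) AS deposit_count
FROM account_events e
WHERE e.account_id = $1
  AND e.event_type = 'deposit'
  AND e.occurred_at <= $2;
```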

Last-known value as-of T (latest row per group)

For attributes that change over time (e.g., account status), you often want the last event that sets the attribute at or before T.

SELECT DISTINCT ON (e.account_id)
  e.account_id,
  (e.payload->>'status') AS status,
  e.occurred_at AS status_as_of
FROM account_events e
WHERE e.account_id = $1
  AND e.event_type = 'status_changed'
  AND e.occurred_at <= $2
ORDER BY e.account_id, e.occurred_at DESC;

Step-by-step:

  • Filter to the attribute-setting event type.
  • Restrict to events at or before the boundary.
  • Pick the latest event using DISTINCT ON with descending order.

DISTINCT ON is PostgreSQL-specific and very effective for “latest row per group” queries.

Point-in-time view for many entities

Point-in-time queries become harder when you need the state for many entities at once (e.g., “all accounts as-of midnight”). A naive approach would compute each account independently. Instead, use grouping and window functions.

Example: last-known status for all accounts as-of T:

WITH candidates AS (
  SELECT
    e.account_id,
    e.payload->>'status' AS status,
    e.occurred_at,
    ROW_NUMBER() OVER (PARTITION BY e.account_id ORDER BY e.occurred_at DESC) AS rn
  FROM account_events e
  WHERE e.event_type = 'status_changed'
    AND e.occurred_at <= $1
)
SELECT account_id, status, occurred_at AS status_as_of
FROM candidates
WHERE rn = 1;

Step-by-step:

  • Filter to relevant events and the as-of boundary.
  • Rank events per entity by descending time.
  • Pick rank 1 per entity.

This is flexible but can still be heavy if the candidate set is large. That’s where materialized historical state helps.

Materializing historical state: snapshots and validity intervals

Materialization means you store derived state so that point-in-time queries are direct lookups rather than replays. There are two common shapes:

  • Snapshots: store full state at specific times/positions (e.g., daily, hourly, or every N events).
  • Validity intervals: store rows with valid_from and valid_to so you can query “which row was valid at time T”.

Snapshots are simpler to compute incrementally and are great for “as-of” queries with a coarse resolution plus a small replay gap. Validity intervals provide exact point-in-time answers without replay, at the cost of maintaining interval boundaries.

Snapshot table pattern

For an account balance snapshot:

CREATE TABLE account_balance_snapshots (
  account_id uuid NOT NULL,
  as_of timestamptz NOT NULL,
  balance_cents bigint NOT NULL,
  PRIMARY KEY (account_id, as_of)
);

To answer “balance as-of T”, you pick the latest snapshot at or before T and then replay events after the snapshot up to T.

WITH snap AS (
  SELECT DISTINCT ON (s.account_id)
    s.account_id, s.as_of, s.balance_cents
  FROM account_balance_snapshots s
  WHERE s.account_id = $1
    AND s.as_of <= $2
  ORDER BY s.account_id, s.as_of DESC
), delta AS (
  SELECT
    e.account_id,
    COALESCE(SUM(CASE e.event_type
      WHEN 'deposit' THEN e.amount_cents
      WHEN 'withdrawal' THEN -e.amount_cents
      ELSE 0
    END), 0) AS delta_cents
  FROM account_events e
  JOIN snap ON snap.account_id = e.account_id
  WHERE e.occurred_at > snap.as_of
    AND e.occurred_at <= $2
  GROUP BY e.account_id
)
SELECT
  snap.account_id,
  snap.balance_cents + COALESCE(delta.delta_cents, 0) AS balance_cents,
  $2::timestamptz AS as_of
FROM snap
LEFT JOIN delta USING (account_id);  -- LEFT JOIN: delta has no row when no events follow the snapshot

Step-by-step:

  • Find the latest snapshot at or before the boundary.
  • Sum the event deltas after the snapshot time up to the boundary.
  • Add delta to snapshot value.

If there is no snapshot yet, you can fall back to replay from the beginning, or store an initial snapshot at account creation.
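One way to fold the no-snapshot case into the same query is to treat "no snapshot" as a zero balance at the beginning of time, so the delta replay covers the whole history. This is a sketch under that assumption, using the tables defined above:

```sql
-- Fall back to balance 0 at -infinity when no snapshot exists,
-- so the delta term below replays all events for the account.
WITH snap AS (
  SELECT
    COALESCE(s.as_of, '-infinity'::timestamptz) AS as_of,
    COALESCE(s.balance_cents, 0) AS balance_cents
  FROM (SELECT $1::uuid AS account_id) a
  LEFT JOIN LATERAL (
    SELECT b.as_of, b.balance_cents
    FROM account_balance_snapshots b
    WHERE b.account_id = a.account_id
      AND b.as_of <= $2
    ORDER BY b.as_of DESC
    LIMIT 1
  ) s ON true
)
SELECT
  snap.balance_cents + COALESCE((
    SELECT SUM(CASE e.event_type
      WHEN 'deposit' THEN e.amount_cents
      WHEN 'withdrawal' THEN -e.amount_cents
      ELSE 0 END)
    FROM account_events e
    WHERE e.account_id = $1
      AND e.occurred_at > snap.as_of
      AND e.occurred_at <= $2
  ), 0) AS balance_cents
FROM snap;
```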

Validity interval (bitemporal-lite) pattern

For attributes like status, you can store a row per status period:

CREATE TABLE account_status_history (
  account_id uuid NOT NULL,
  status text NOT NULL,
  valid_from timestamptz NOT NULL,
  valid_to timestamptz,
  PRIMARY KEY (account_id, valid_from)
);

To query status as-of T:

SELECT status
FROM account_status_history
WHERE account_id = $1
  AND valid_from <= $2
  AND (valid_to IS NULL OR valid_to > $2)
ORDER BY valid_from DESC
LIMIT 1;

Step-by-step:

  • Find rows whose validity interval covers the boundary.
  • Pick the most recent interval start if there are overlaps (ideally there are none).

Maintaining valid_to requires updating the previous row when a new status starts. That update is acceptable because the history table is derived state, not the immutable event log.
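Closing the previous interval and opening the new one belong in a single transaction. A sketch, where $1 is the account id, $2 the new status, and $3 the time the new status takes effect:

```sql
BEGIN;

-- Close the currently open interval, if any,
-- at the instant the new status starts.
UPDATE account_status_history
SET valid_to = $3
WHERE account_id = $1
  AND valid_to IS NULL;

-- Open the new interval with an open end.
INSERT INTO account_status_history (account_id, status, valid_from, valid_to)
VALUES ($1, $2, $3, NULL);

COMMIT;
```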

Building point-in-time views with SQL views and functions

Once you have a reliable as-of query pattern, wrap it so application code doesn’t reimplement it inconsistently. PostgreSQL gives you several options:

  • SQL functions that accept (entity_id, as_of) and return a row.
  • Views for “current state” (as-of now), often backed by derived tables.
  • Parameterized views don’t exist directly, but functions can act as parameterized views.

Function returning point-in-time state

Example: return account status and balance as-of T by combining snapshot+delta and status history.

CREATE OR REPLACE FUNCTION account_state_as_of(p_account_id uuid, p_as_of timestamptz)
RETURNS TABLE (
  account_id uuid,
  as_of timestamptz,
  balance_cents bigint,
  status text
)
LANGUAGE sql STABLE AS $$
  WITH snap AS (
    SELECT DISTINCT ON (s.account_id)
      s.account_id, s.as_of, s.balance_cents
    FROM account_balance_snapshots s
    WHERE s.account_id = p_account_id
      AND s.as_of <= p_as_of
    ORDER BY s.account_id, s.as_of DESC
  ), delta AS (
    SELECT
      e.account_id,
      COALESCE(SUM(CASE e.event_type
        WHEN 'deposit' THEN e.amount_cents
        WHEN 'withdrawal' THEN -e.amount_cents
        ELSE 0
      END), 0) AS delta_cents
    FROM account_events e
    JOIN snap ON snap.account_id = e.account_id
    WHERE e.occurred_at > snap.as_of
      AND e.occurred_at <= p_as_of
    GROUP BY e.account_id
  ), bal AS (
    SELECT snap.account_id,
           snap.balance_cents + COALESCE(delta.delta_cents, 0) AS balance_cents
    FROM snap
    LEFT JOIN delta USING (account_id)  -- delta has no row when no events follow the snapshot
  ), st AS (
    SELECT h.status
    FROM account_status_history h
    WHERE h.account_id = p_account_id
      AND h.valid_from <= p_as_of
      AND (h.valid_to IS NULL OR h.valid_to > p_as_of)
    ORDER BY h.valid_from DESC
    LIMIT 1
  )
  SELECT p_account_id, p_as_of, bal.balance_cents, st.status
  FROM bal CROSS JOIN st;
$$;

Notes:

  • STABLE indicates the function returns consistent results within a statement (appropriate for as-of queries).
  • Returning a table makes it easy to join this function into larger queries.
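For example, the function can be applied per row with a lateral join to enrich a set of accounts (a sketch; the accounts table here is a hypothetical source of account ids):

```sql
SELECT a.account_id, s.balance_cents, s.status
FROM accounts a
CROSS JOIN LATERAL account_state_as_of(a.account_id, '2024-01-01T00:00:00Z') s;
```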

Handling corrections and late events in point-in-time queries

Historical queries become tricky when the event stream includes corrections (e.g., a reversal) or late-arriving events (occurred earlier than already-processed events). Your query strategy must match your business rules.

Option A: “As the system knew then” (ingestion-time)

If you define point-in-time by ingested_at, your as-of queries are stable: once an event is ingested, it affects all future as-of queries, but not past ingestion-time queries.

Example: balance as-of ingestion time:

SELECT COALESCE(SUM(CASE e.event_type
  WHEN 'deposit' THEN e.amount_cents
  WHEN 'withdrawal' THEN -e.amount_cents
  ELSE 0
END), 0) AS balance_cents
FROM account_events e
WHERE e.account_id = $1
  AND e.ingested_at <= $2;

Option B: “As it happened” (occurred-time) with corrections

If you define point-in-time by occurred_at, late events can change the answer for past times. That may be correct (you are refining history) or unacceptable (you need frozen reports). If you need frozen reports, you can:

  • Use ingestion-time for the report definition.
  • Persist report outputs with a run timestamp and treat them as immutable artifacts.
  • Introduce a “finalization” rule (e.g., after 7 days, treat history as closed) and route late events into adjustment streams.

In SQL terms, the key is to choose the boundary column and stick to it consistently across all point-in-time views.

Temporal joins: point-in-time across multiple entities

Many questions require combining historical state from different tables as-of the same time. This is a temporal join: “join A as-of T to B as-of T”. If you store validity intervals, temporal joins become straightforward.

Example: join account status at time T to transactions at time T

Assume a transaction table (could be derived from events) with occurred_at. You want to label each transaction with the account status at that time.

SELECT
  t.tx_id,
  t.account_id,
  t.occurred_at,
  t.amount_cents,
  h.status
FROM transactions t
JOIN account_status_history h
  ON h.account_id = t.account_id
 AND h.valid_from <= t.occurred_at
 AND (h.valid_to IS NULL OR h.valid_to > t.occurred_at)
WHERE t.occurred_at >= $1 AND t.occurred_at < $2;

Step-by-step:

  • Filter transactions to the time window.
  • Join to the status interval that covers each transaction time.
  • Project status alongside the transaction.

This pattern is the backbone of point-in-time enrichment in analytics.

Generating validity intervals from events (practical workflow)

If you only have events, you can derive validity intervals with a batch job or incremental process. The core idea is: for each entity, sort attribute-change events by time, then set valid_to of each row to the next row’s valid_from.

Batch rebuild example for status history

This example rebuilds account_status_history from scratch for a subset of accounts. It uses window functions to compute the next change time.

-- Rebuild for accounts in a given list (or a time window).
DELETE FROM account_status_history h
WHERE h.account_id = ANY($1::uuid[]);

INSERT INTO account_status_history (account_id, status, valid_from, valid_to)
SELECT
  account_id,
  status,
  occurred_at AS valid_from,
  LEAD(occurred_at) OVER (PARTITION BY account_id ORDER BY occurred_at) AS valid_to
FROM (
  SELECT
    e.account_id,
    e.occurred_at,
    e.payload->>'status' AS status
  FROM account_events e
  WHERE e.account_id = ANY($1::uuid[])
    AND e.event_type = 'status_changed'
) x;

Step-by-step:

  • Delete existing derived history for the target accounts.
  • Select all status change events for those accounts.
  • Use LEAD to compute the end time of each status period.
  • Insert intervals; the last interval has valid_to null.

In an incremental pipeline, you would update only affected accounts when new status events arrive, but the SQL logic for interval boundaries remains the same.

Point-in-time views for aggregates and cohorts

Historical state is not only per-entity. You often need cohort membership or aggregate metrics as-of T: “How many active accounts existed at time T?” or “Which customers were in tier Gold at time T?” These are easiest when you have validity intervals for the attribute that defines membership.

Count active accounts as-of T

SELECT COUNT(*) AS active_accounts
FROM account_status_history h
WHERE h.status = 'active'
  AND h.valid_from <= $1
  AND (h.valid_to IS NULL OR h.valid_to > $1);

List accounts in a segment as-of T

SELECT h.account_id
FROM account_status_history h
WHERE h.status = 'gold'
  AND h.valid_from <= $1
  AND (h.valid_to IS NULL OR h.valid_to > $1)
ORDER BY h.account_id;

These queries are direct and predictable because the time logic is encoded in the derived history table.

Testing point-in-time correctness with targeted fixtures

Point-in-time logic is easy to get subtly wrong (off-by-one boundaries, inclusive vs exclusive end times, missing initial state). Treat it like business logic and test it with deterministic fixtures.

Boundary conventions

Pick a consistent interval convention. A common choice is:

  • valid_from is inclusive
  • valid_to is exclusive

Then “valid at time T” is valid_from <= T AND (valid_to IS NULL OR valid_to > T). This avoids double-counting at exact change instants.

Minimal test dataset approach

Create a small set of events for one entity with three changes and query multiple as-of times: before the first change, exactly at a change, between changes, and after the last change. Verify that:

  • Exactly-at-change returns the new value if you treat valid_from as inclusive.
  • Exactly-at-next-change does not return the previous value if valid_to is exclusive.
  • Missing initial state is handled explicitly (either null, a default, or a required creation event).
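A minimal fixture along those lines might look like this (a sketch; the uuid, statuses, and timestamps are illustrative):

```sql
INSERT INTO account_status_history (account_id, status, valid_from, valid_to) VALUES
  ('00000000-0000-0000-0000-000000000001', 'pending', '2024-01-01T00:00:00Z', '2024-01-02T00:00:00Z'),
  ('00000000-0000-0000-0000-000000000001', 'active',  '2024-01-02T00:00:00Z', '2024-01-03T00:00:00Z'),
  ('00000000-0000-0000-0000-000000000001', 'closed',  '2024-01-03T00:00:00Z', NULL);

-- Exactly at a change instant, the inclusive-from/exclusive-to
-- convention returns the new value ('active', not 'pending').
SELECT status
FROM account_status_history
WHERE account_id = '00000000-0000-0000-0000-000000000001'
  AND valid_from <= '2024-01-02T00:00:00Z'
  AND (valid_to IS NULL OR valid_to > '2024-01-02T00:00:00Z');
```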

Choosing the right strategy

Use on-demand reconstruction when:

  • You query point-in-time state rarely or for a small number of entities.
  • You need maximum flexibility across many event types.
  • You can tolerate higher latency for deep history queries.

Use snapshots when:

  • You need fast as-of queries for numeric/accumulating state (balances, counters).
  • You can accept replaying a small tail of events after the snapshot.
  • You want a simple incremental maintenance process.

Use validity intervals when:

  • You need frequent point-in-time lookups and temporal joins.
  • You need exact answers without replay.
  • The state can be represented as “one value valid during a time range” (status, tier, owner, price plan).

Now answer the exercise about the content:

For reproducible audits, which as-of definition is recommended because it stays stable even when late events arrive?


Answer: ingested_at gives a stable audit of what the system knew at time T, and a stream position provides deterministic replay semantics. Using occurred_at can change past answers when late events arrive.

Next chapter

Building Read Models with Projections and Materialized Views
