Why time and order are hard in event streams
In an event-driven system, many decisions depend on “what happened first” and “what is the latest known state.” In practice, event streams rarely deliver a perfect, single, global ordering. Producers run on different machines with different clocks, networks reorder packets, retries create duplicates, and partitions/shards create parallel timelines. PostgreSQL can store events reliably, but you must be explicit about what “time” and “order” mean for your domain and what guarantees you can (and cannot) enforce.
This chapter focuses on three distinct notions that are often conflated:
- Event time: when the real-world event occurred (as asserted by the producer or device).
- Ingestion time: when your system first observed/accepted the event (database commit time, broker append time).
- Processing time: when a consumer handled the event (which may be later and may vary per consumer).
Ordering guarantees are always relative to a scope. You can often guarantee order within a single entity (an order, an account, a device), but not across all entities. The key design move is to define ordering scopes and encode them in your schema and queries.
Choosing the ordering scope: global, partition, entity
Before designing columns and constraints, decide what ordering you actually need:
- Global ordering (all events across the system): expensive and often unnecessary; becomes a bottleneck.
- Partition ordering (all events in a shard/stream partition): common in brokers; still not necessarily meaningful for business rules.
- Entity ordering (all events for a given aggregate/entity id): usually what business logic needs (e.g., “for this account, apply events in sequence”).
In PostgreSQL, you can represent these scopes by storing an entity_id (or multiple keys) and maintaining a per-entity sequence number, plus timestamps for event time and ingestion time. You then query and enforce order within that entity.
Representing time correctly in PostgreSQL
Use the right timestamp types
Prefer timestamptz for anything that represents an instant in time. Store event time as timestamptz when the producer can provide a reliable instant. Use timestamp (without time zone) only for “wall clock” local times that are not instants (rare in event streams).
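As a quick illustration of the difference (a sketch; run in psql with any session time zone), `timestamptz` interprets an input's UTC offset and stores an instant, while `timestamp` silently discards the offset:

```sql
-- timestamptz interprets the offset and stores an instant, displayed in the
-- session's time zone; timestamp simply drops the offset.
set timezone = 'UTC';
select
  '2024-03-01 12:00:00+02'::timestamptz as as_instant,    -- 2024-03-01 10:00:00+00
  '2024-03-01 12:00:00+02'::timestamp   as as_wall_clock; -- 2024-03-01 12:00:00
```

This is why event times from producers in different time zones are safe to compare as `timestamptz` but not as `timestamp`.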
Common columns you will see in an event table:
- occurred_at timestamptz: producer-asserted event time.
- ingested_at timestamptz: database time when inserted, typically default now().
- recorded_at timestamptz: sometimes used as a synonym for ingestion time; choose one term and be consistent.
Clock skew and trust boundaries
Producer clocks can be wrong. If you accept occurred_at from devices or external systems, treat it as data, not truth. A practical pattern is to store both occurred_at and ingested_at and define rules such as:
- Use occurred_at for user-facing timelines when it is within a reasonable skew window.
- Use ingested_at for deterministic ordering when occurred_at is missing or suspicious.
You can encode “suspicious” as a computed flag (e.g., occurred_at more than 10 minutes in the future relative to ingested_at) and handle it explicitly in queries.
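A sketch of such a flag as a query expression, assuming the occurred_at/ingested_at columns described above (the 10-minute threshold and table name are illustrative):

```sql
-- Flag events whose producer-asserted time is implausibly far in the future
-- relative to when we ingested them; 10 minutes is an example threshold.
select
  event_id,
  occurred_at,
  ingested_at,
  (occurred_at is not null
   and occurred_at > ingested_at + interval '10 minutes') as occurred_at_suspicious
from account_events;
```

Queries that build user-facing timelines can then fall back to ingested_at whenever the flag is true.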
Sequence numbers: the backbone of deterministic ordering
Timestamps alone rarely provide a stable total order. Two events can share the same timestamp, and event time can arrive out of order. For deterministic replay and consistent projections, introduce a sequence number that is monotonic within an ordering scope.
There are two common sequence concepts:
- Stream position: a monotonically increasing number assigned by the storage/broker for a stream/partition (e.g., Kafka offset). This is great for resuming consumption and for “what did I read last.”
- Entity version: a monotonically increasing number per entity (e.g., account_version). This is great for enforcing correct per-entity ordering and detecting gaps.
In PostgreSQL, you can store both, but they serve different purposes. If you only store one, prefer the one that matches your business ordering scope (often entity version).
Schema example: per-entity version and ingestion time
```sql
create table account_events (
  event_id        uuid primary key,
  account_id      uuid not null,
  account_version bigint not null,
  event_type      text not null,
  occurred_at     timestamptz,
  ingested_at     timestamptz not null default now(),
  payload         jsonb not null,
  unique (account_id, account_version)
);
```

The unique (account_id, account_version) constraint ensures you cannot have two different events claiming to be the same version for the same account. It does not, by itself, guarantee that versions are contiguous (no gaps), but it provides a strong foundation for deterministic ordering.
Step-by-step: assigning per-entity sequence numbers safely
Assigning account_version is easy in a single-threaded world and tricky under concurrency. You want to avoid two writers assigning the same next version. A practical approach is to keep a separate “entity head” table that tracks the latest version and update it transactionally.
1) Create an entity head table
```sql
create table account_event_heads (
  account_id   uuid primary key,
  last_version bigint not null
);
```

2) Insert the first head row when the account is created

```sql
insert into account_event_heads (account_id, last_version)
values ($1, 0);
```

3) Append an event with a transactional version increment
The pattern below increments last_version and uses the new value as account_version for the event. It relies on row-level locking of the head row.
```sql
with next as (
  update account_event_heads
  set last_version = last_version + 1
  where account_id = $1
  returning last_version
)
insert into account_events (
  event_id, account_id, account_version, event_type, occurred_at, payload
)
select gen_random_uuid(), $1, next.last_version, $2, $3, $4
from next
returning event_id, account_version;
```

Step-by-step what happens:
- The update locks the head row for that account_id.
- Concurrent writers for the same account will queue, ensuring versions are assigned in order.
- The returned last_version becomes the event’s account_version.
This gives you a strict per-account order that is independent of timestamps and robust to out-of-order arrival from upstream systems (as long as you assign versions at ingestion time in your boundary service).
4) Handling missing head rows
If events can arrive before the head row exists, you can upsert the head row. Be careful: you must still ensure a single head row and correct increments.
```sql
insert into account_event_heads (account_id, last_version)
values ($1, 0)
on conflict (account_id) do nothing;
```

Then run the append statement. This splits initialization from incrementing, which is often simpler to reason about.
Ordering rules for reading: stable sort keys
When replaying events, always use a stable, deterministic ordering. A common mistake is ordering only by occurred_at. Instead, order by the sequence number and use timestamps as secondary metadata.
Per-entity replay
```sql
select *
from account_events
where account_id = $1
order by account_version asc;
```

Timeline view with tie-breakers
If you need a cross-entity timeline (e.g., an audit log), define a deterministic tie-breaker. If you do not have a global stream position, you can use ingested_at plus a unique id as a tie-breaker, understanding that this is ingestion order, not real-world order.
```sql
select event_id, account_id, event_type, occurred_at, ingested_at
from account_events
order by ingested_at asc, event_id asc;
```

This produces a stable order for pagination and UI, but it does not claim “truth” about causality across accounts.
Out-of-order arrival and late events
Late events are events whose occurred_at is earlier than events you have already processed for that entity or window. They are normal in distributed systems. You need a policy for how projections and reports react to them.
Common policies:
- Recompute on late arrival: keep projections rebuildable; when a late event arrives, recompute affected aggregates.
- Allow corrections: store correction events that adjust derived values rather than rewriting history.
- Watermarks: treat event time as “final” only up to a moving boundary (e.g., “we accept late events up to 24 hours”).
In PostgreSQL, a practical technique is to maintain projections with a “last processed version” per entity (or per partition) and to re-run projection logic for an entity if you detect a late event that changes the ordering by event time. If you rely on account_version as the canonical order, late events are simply events that occurred earlier but were recorded later; they do not disrupt replay order, but they may affect event-time analytics.
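One possible shape for that “last processed version” bookkeeping, as a sketch (table and column names are illustrative, not a fixed convention):

```sql
-- Tracks, per account, the highest event version already folded into the
-- projection. A projector reads events with account_version greater than
-- last_processed_version, applies them in order, and advances the pointer.
create table account_projection_heads (
  account_id             uuid primary key,
  last_processed_version bigint not null default 0,
  updated_at             timestamptz not null default now()
);

-- Advance the pointer after applying events up to version $2 for account $1.
-- The guard keeps the pointer monotonic even if jobs overlap.
update account_projection_heads
set last_processed_version = $2,
    updated_at = now()
where account_id = $1
  and last_processed_version < $2;
```

A recomputation job for a late event can then reset the pointer for the affected account and replay from an earlier version.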
Detecting late events relative to event time
```sql
select e.*
from account_events e
join (
  select account_id, max(occurred_at) as max_occurred
  from account_events
  group by account_id
) m using (account_id)
where e.occurred_at is not null
  and e.occurred_at < m.max_occurred
  and e.ingested_at >= now() - interval '1 day';
```

This identifies candidates for “arrived late” within a recent ingestion window. You can use such queries to trigger recomputation jobs or to monitor upstream clock issues.
Gaps, duplicates, and “did we miss something?”
Even with per-entity versions, you may see gaps (missing versions) if events are produced elsewhere and you are consuming them, or if you assign versions upstream and ingest downstream. Gaps matter because they break the assumption that “I have a complete history up to version N.”
Checking for gaps per entity
```sql
select
  account_id,
  account_version,
  lag(account_version) over (
    partition by account_id
    order by account_version
  ) as prev_version
from account_events;
```

You can wrap this to find where account_version is not prev_version + 1. For large datasets, do this incrementally (e.g., only for recently ingested events) and index (account_id, account_version).
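Wrapping the lag() comparison to surface only the gaps might look like this (a sketch; the incremental restriction to recent ingestion is omitted for brevity):

```sql
-- Report every place where a per-account version sequence skips a number.
with versions as (
  select
    account_id,
    account_version,
    lag(account_version) over (
      partition by account_id
      order by account_version
    ) as prev_version
  from account_events
)
select account_id, prev_version, account_version
from versions
where prev_version is not null
  and account_version <> prev_version + 1;
```

An empty result means every account's history is contiguous from its earliest stored version.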
When you detect gaps, you need a domain-specific response:
- Pause processing for that entity until missing events arrive.
- Fetch missing events from the source of truth.
- Mark the entity as “incomplete history” and avoid producing certain outputs.
Concurrency and ordering: what PostgreSQL guarantees
PostgreSQL provides strong transactional guarantees, but it does not magically impose a meaningful order on concurrent transactions unless you define one. Key points:
- Commit order is not a business order: two transactions can commit in either order depending on scheduling.
- Sequences are global counters, not per-entity order: a bigserial gives a unique increasing number, but it interleaves all entities and can have gaps due to rollbacks.
- Row-level locks can serialize per-entity writes: the head-row update pattern creates a clear order for one entity without bottlenecking all entities.
If you need strict per-entity ordering, do not rely on ingested_at or event_id alone. Use a per-entity version assigned under a lock (or assigned by a single writer per entity).
Pagination and “stable ordering” for APIs
Event streams are often exposed via APIs: “give me events after X.” Offset-based pagination (limit/offset) is unstable under inserts. Prefer keyset pagination using a stable sort key.
Keyset pagination by ingestion order
```sql
select event_id, ingested_at, payload
from account_events
where (ingested_at, event_id) > ($1::timestamptz, $2::uuid)
order by ingested_at, event_id
limit 200;
```

This works well for audit feeds and integration endpoints. It is stable because the tuple (ingested_at, event_id) is unique and ordered.
Keyset pagination by entity version
```sql
select event_id, account_version, payload
from account_events
where account_id = $1
  and account_version > $2
order by account_version
limit 200;
```

This is the simplest and most robust way to stream events for a single entity.
Event-time analytics: ordering within windows
Analytics often care about event time: “count signups per hour based on when they happened.” Late events complicate this because a window you thought was complete may change. In PostgreSQL, you can model this by computing aggregates over occurred_at and deciding how far back you are willing to update.
Windowed aggregation with a lateness allowance
Suppose you accept late events up to 48 hours. You can recompute aggregates for the last 48 hours on a schedule.
```sql
select date_trunc('hour', occurred_at) as hour_bucket, count(*)
from account_events
where occurred_at >= now() - interval '48 hours'
group by 1
order by 1;
```

For larger systems, you would store hourly aggregates in a separate table and upsert the affected buckets. The key is that your policy (48 hours) is explicit and operationally manageable.
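A sketch of that upsert approach, assuming an hourly rollup table (the table name event_counts_hourly is illustrative):

```sql
-- Illustrative rollup table keyed by hour bucket.
create table event_counts_hourly (
  hour_bucket timestamptz primary key,
  event_count bigint not null
);

-- Recompute only the buckets inside the 48-hour lateness window and upsert
-- them; buckets older than the window are treated as final and left alone.
insert into event_counts_hourly (hour_bucket, event_count)
select date_trunc('hour', occurred_at), count(*)
from account_events
where occurred_at >= now() - interval '48 hours'
group by 1
on conflict (hour_bucket)
do update set event_count = excluded.event_count;
```

Run on a schedule, this keeps recent buckets eventually correct as late events arrive, while the lateness window bounds how much work each run does.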
When you need both: causal order vs. display order
Many domains need two different orderings:
- Causal/application order: the order in which you apply events to compute state. This should be deterministic and usually uses per-entity version.
- Human timeline order: the order you show to users, often based on occurred_at (with fallbacks).
Do not force one ordering to serve both purposes. Store the data needed for each and be explicit in queries. A common approach is:
- Apply events by account_version.
- Display by coalesce(occurred_at, ingested_at) with tie-breakers.
```sql
select event_id, event_type, occurred_at, ingested_at
from account_events
where account_id = $1
order by coalesce(occurred_at, ingested_at) asc, account_version asc;
```

This gives a user-friendly timeline while still keeping a deterministic secondary key.
Practical checklist for designing ordering guarantees
- Define the ordering scope you truly need (usually per entity).
- Store both event time (occurred_at) and ingestion time (ingested_at).
- Introduce a deterministic sequence number for the ordering scope (e.g., account_version).
- Assign sequence numbers transactionally (head-row update) or via a single writer per entity.
- Always query with a stable order by that matches your guarantee.
- Adopt an explicit late-event policy for event-time analytics (watermark/lateness window).
- Use keyset pagination with composite cursors for feeds.