Why temporal and stream-read indexing is different
Temporal queries and stream reads look similar because both are “time-shaped,” but they stress PostgreSQL in different ways. Temporal queries typically ask: “What was true at time T?” or “What changed between T1 and T2?” Stream reads typically ask: “Give me the next N events after my cursor, in order, as fast as possible.” The indexing strategy must therefore optimize two distinct access patterns:
- Range scans over time: filter by a time window and often by an entity or category, then aggregate or reconstruct state.
- Ordered incremental scans: fetch the next page of events after a cursor (time-based, sequence-based, or both), usually with a stable order and minimal random I/O.
This chapter focuses on indexes that make those patterns fast and predictable, and on how to verify that PostgreSQL is actually using them.
Core index building blocks for temporal workloads
B-tree for ordered scans and equality + range
B-tree indexes are the default choice for temporal access because they support ordered scans and efficient range conditions. They are ideal when your predicates look like:
```sql
occurred_at >= $from AND occurred_at < $to
entity_id = $id AND occurred_at >= $from
stream_id = $stream AND (occurred_at, event_id) > ($cursor_time, $cursor_id)
```
They also support index-only scans when the query can be satisfied from the index plus the visibility map (more on that below).
BRIN for large, append-heavy time series
BRIN (Block Range INdex) is designed for very large tables where rows are physically correlated with the indexed column (commonly time). BRIN is tiny and cheap to maintain, and it can prune large portions of the table for time-range queries. It is not as precise as B-tree; it narrows down which heap blocks might contain matching rows, then PostgreSQL checks those blocks.
BRIN is often a good fit when:
- Data is inserted in roughly increasing occurred_at order.
- Queries are mostly time-window scans over large ranges.
- You want minimal write overhead.
BRIN is less ideal for tight “next page after cursor” reads where you need exact ordering and small page sizes; B-tree usually wins there.
GiST/SP-GiST for range types (valid time)
If you model validity as a range (for example valid_during tstzrange), GiST indexes can accelerate “contains” and “overlaps” queries. This is common in bitemporal or slowly-changing facts where you ask “which version was valid at time T?”
Typical predicate: valid_during @> $as_of_time
Indexing for “read the stream” queries
Define a stable cursor and order
Stream reads need a stable, total order. In practice you usually order by a time column plus a tie-breaker that is unique and monotonic enough for your stream. A common pattern is:
ORDER BY occurred_at, event_id
where event_id is unique (often a bigserial or generated identity column). Even if timestamps collide, the pair gives a strict total order.
Create a composite B-tree index matching the order
For a stream read that filters by stream key (for example stream_id) and then orders by time and id, create an index that starts with the filter columns and then the order columns:
```sql
CREATE INDEX CONCURRENTLY events_stream_cursor_idx
    ON events (stream_id, occurred_at, event_id);
```
This index supports:
- Finding the first row for a given stream quickly.
- Continuing from a cursor using a multi-column comparison.
- Returning rows already in the desired order (no sort step).
Use keyset pagination (seek method), not OFFSET
OFFSET pagination forces PostgreSQL to scan and discard rows, which becomes slower as you move forward. Keyset pagination uses the last seen key as a cursor:
```sql
SELECT event_id, occurred_at, event_type, payload
FROM events
WHERE stream_id = $1
  AND (occurred_at, event_id) > ($2, $3)
ORDER BY occurred_at, event_id
LIMIT 500;
```
With the composite index above, PostgreSQL can do an index range scan that starts exactly at the cursor position.
Step-by-step: verify the plan uses the index
1) Ensure the index exists and is valid.
```
\d+ events
```
2) Run the query with EXPLAIN (ANALYZE, BUFFERS) in a staging environment.
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT event_id, occurred_at, event_type
FROM events
WHERE stream_id = 42
  AND (occurred_at, event_id) > ('2026-01-01T00:00:00Z', 1000000)
ORDER BY occurred_at, event_id
LIMIT 500;
```
3) Look for an Index Scan or Index Only Scan on events_stream_cursor_idx and confirm there is no explicit Sort node. Also check buffers: a good plan for “next page” reads should touch few heap pages.
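For orientation, a healthy next-page plan has roughly this shape (illustrative output only; costs, row counts, and buffer numbers will differ on your data):

```
Limit  (actual time=... rows=500 loops=1)
  Buffers: shared hit=...
  ->  Index Scan using events_stream_cursor_idx on events
        Index Cond: ((stream_id = 42) AND (ROW(occurred_at, event_id) > ROW(...)))
```

The key signals are the absence of a Sort node and a small buffer count relative to the number of rows returned.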
Covering indexes for stream reads (INCLUDE)
If your consumer needs a small set of columns, you can reduce heap access by using an index that “covers” the query. PostgreSQL supports INCLUDE columns that are stored in the index leaf pages but not part of the sort key.
```sql
CREATE INDEX CONCURRENTLY events_stream_cursor_cover_idx
    ON events (stream_id, occurred_at, event_id)
    INCLUDE (event_type);
```
Now a query that selects only event_id, occurred_at, and event_type can often be served via an index-only scan, provided the visibility map is sufficiently set (which depends on vacuuming and how recently pages were modified).
Practical guidance:
- Keep INCLUDE columns small and frequently read.
- Avoid including large payload columns; it bloats the index and increases write cost.
- Measure: index-only scans are not guaranteed; they depend on MVCC visibility.
Partial indexes for “hot” subsets
Stream consumers often read only recent events (for example, last 7 days) or only a subset of event types. A partial index can speed up those reads while keeping index size small.
Note that PostgreSQL only allows immutable expressions in index predicates, so a predicate like occurred_at >= now() - interval '7 days' is rejected; use a fixed boundary instead:
```sql
CREATE INDEX CONCURRENTLY events_recent_stream_idx
    ON events (stream_id, occurred_at, event_id)
    WHERE occurred_at >= '2026-01-01 00:00:00+00';
```
Notes:
- Partial indexes are most effective when the predicate is stable and matches the query exactly.
- Because now() is not allowed in an index predicate (predicates must be immutable), use a fixed boundary that you update periodically (for example, recreate the index monthly), or a predicate based on a persisted “bucket” column (like occurred_date).
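The bucket-column variant can be sketched as follows, assuming a hypothetical occurred_date generated column and an illustrative boundary date:

```sql
-- Hypothetical persisted bucket column; pinning the time zone keeps the
-- expression immutable, as generated columns require
ALTER TABLE events
    ADD COLUMN occurred_date date
    GENERATED ALWAYS AS ((occurred_at AT TIME ZONE 'UTC')::date) STORED;

-- Partial index over the hot subset; refresh the boundary periodically
CREATE INDEX CONCURRENTLY events_recent_stream_date_idx
    ON events (stream_id, occurred_at, event_id)
    WHERE occurred_date >= '2026-01-01';
```

Queries must repeat the occurred_date predicate verbatim for the planner to consider the partial index.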
Indexing for time-window analytics and temporal filters
Time-range scans with B-tree
For queries that filter by time window and then group or aggregate, a simple B-tree on occurred_at can help, but it may not be selective enough if the window is large. Still, it can be valuable when combined with another filter (like event_type or entity_id).
```sql
CREATE INDEX CONCURRENTLY events_time_idx
    ON events (occurred_at);
```
Better for common patterns:
```sql
CREATE INDEX CONCURRENTLY events_type_time_idx
    ON events (event_type, occurred_at);
```
This supports queries like:
```sql
SELECT date_trunc('hour', occurred_at) AS hour, count(*)
FROM events
WHERE event_type = 'payment_authorized'
  AND occurred_at >= $1
  AND occurred_at < $2
GROUP BY 1
ORDER BY 1;
```
BRIN for wide windows and massive tables
When the table is huge and inserts are append-like, BRIN can be the difference between scanning everything and scanning only the relevant block ranges.
```sql
CREATE INDEX CONCURRENTLY events_occurred_at_brin
    ON events USING brin (occurred_at)
    WITH (pages_per_range = 64);
```
pages_per_range controls granularity. Smaller values make the index more precise but larger; larger values make it smaller but less selective. A practical step-by-step approach:
1) Start with a moderate value like 64 or 128.
2) Run representative time-window queries with EXPLAIN (ANALYZE, BUFFERS).
3) If too many heap blocks are visited, reduce pages_per_range.
4) If the BRIN index is large relative to expectations, increase pages_per_range.
5) Periodically run:
```sql
VACUUM (ANALYZE) events;  -- and consider BRIN summarization if needed
```
BRIN relies on summaries; if data arrives out of order, BRIN may become less effective because block ranges contain mixed time values.
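The BRIN summarization mentioned above can also be triggered explicitly: brin_summarize_new_values summarizes any block ranges that have not been summarized yet and returns how many it processed.

```sql
-- Summarize not-yet-summarized block ranges of the BRIN index
SELECT brin_summarize_new_values('events_occurred_at_brin');
```

This is useful right after a large batch load, before the next vacuum pass.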
Composite indexes for “entity over time” lookups
A frequent temporal query is: “show me all changes for entity X in a time window.” Index for that by placing the equality column first:
```sql
CREATE INDEX CONCURRENTLY events_entity_time_idx
    ON events (entity_id, occurred_at, event_id);
```
This supports both:
- Time-window scans for a single entity.
- Cursor-based reads per entity (if you treat entity as a stream).
Be careful about creating too many similar composite indexes. Each additional index increases write amplification. Prefer indexes that serve multiple critical queries.
Indexing “as-of” queries with range validity
If you store versions with a validity range, you can accelerate “as-of time” lookups using GiST. Example table:
```sql
CREATE TABLE entity_versions (
    entity_id    bigint    NOT NULL,
    valid_during tstzrange NOT NULL,
    version_id   bigint    NOT NULL,
    data         jsonb     NOT NULL,
    PRIMARY KEY (entity_id, version_id)
);
```
Create a GiST index on the range, typically combined with entity_id via a multicolumn GiST (possible with the btree_gist extension) or via separate indexes depending on query shape. A common approach is:
```sql
CREATE INDEX CONCURRENTLY entity_versions_valid_gist
    ON entity_versions USING gist (valid_during);

CREATE INDEX CONCURRENTLY entity_versions_entity_btree
    ON entity_versions (entity_id);
```
Query:
```sql
SELECT *
FROM entity_versions
WHERE entity_id = $1
  AND valid_during @> $2
ORDER BY lower(valid_during) DESC
LIMIT 1;
```
Practical note: ordering by range bounds may still require extra work; if you frequently need “latest valid at time T,” consider also storing a generated column for valid_from and indexing (entity_id, valid_from DESC) while still using the range for correctness checks.
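The valid_from suggestion can be implemented with a stored generated column; a sketch, with illustrative column and index names:

```sql
-- lower() of a range is immutable, so it can back a generated column
ALTER TABLE entity_versions
    ADD COLUMN valid_from timestamptz
    GENERATED ALWAYS AS (lower(valid_during)) STORED;

CREATE INDEX CONCURRENTLY entity_versions_entity_from_idx
    ON entity_versions (entity_id, valid_from DESC);
```

The B-tree then drives the ORDER BY ... LIMIT 1, while the @> predicate on valid_during keeps the answer correct.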
Handling ties, clock skew, and “same timestamp” bursts
Temporal indexing often assumes time is strictly increasing, but real systems produce bursts with identical timestamps and sometimes out-of-order arrivals. Indexing should still guarantee deterministic reads:
- Always include a tie-breaker in the ordering and index (for example event_id).
- When consumers rely on “read after cursor,” use the same tuple ordering in both the query and the index.
- If you have a separate ingestion time and event time, decide which one drives stream reads. Many systems read by ingestion order for operational consumers and use event time for analytics.
Example of ingestion-ordered stream read:
```sql
CREATE INDEX CONCURRENTLY events_stream_ingest_idx
    ON events (stream_id, ingested_at, event_id);
```
Then read with (ingested_at, event_id) as the cursor key.
Index maintenance considerations for high write rates
Write amplification and choosing the minimum viable set
Every index must be updated on insert, and potentially on update. For append-heavy event tables, inserts dominate, so the cost is predictable but can be large. A practical approach to index selection:
- List your top 3–5 critical query patterns (stream reads, entity timeline, time-window aggregation, etc.).
- Create one index per pattern, but try to reuse one composite index for multiple patterns.
- Prefer BRIN for broad time-window filtering when B-tree would be huge and not selective enough.
- Avoid indexing large payload fields; instead index small extracted columns that represent the filter keys.
Fillfactor and page splits
B-tree indexes can suffer from page splits if inserts land in the middle of the index order. If your index order is mostly increasing (time plus increasing id), page splits are reduced. If you index on a column with random distribution (like UUID) as the leading key, you can get heavy churn.
If you must use a random leading key, consider:
- Using a different leading key that matches the access pattern.
- Setting a lower fillfactor on the index to leave room for inserts (trade-off: larger index).
```sql
CREATE INDEX CONCURRENTLY events_stream_cursor_idx
    ON events (stream_id, occurred_at, event_id)
    WITH (fillfactor = 90);
```
VACUUM, visibility map, and index-only scans
Index-only scans are attractive for stream reads, but they depend on the visibility map indicating that heap pages contain only tuples visible to all transactions. If the table is constantly receiving inserts, recent pages may not be marked all-visible yet. Practical steps:
- Ensure autovacuum is tuned to keep up with insert volume.
- Expect index-only scans to work better for slightly older data than for the newest pages.
- Measure with EXPLAIN (ANALYZE, BUFFERS): if an Index Only Scan reports many “Heap Fetches,” the index is not fully “covering” in practice.
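To estimate how much of the table is currently eligible for index-only scans, compare relallvisible with relpages in pg_class (both are maintained by VACUUM and ANALYZE, so treat the numbers as approximate):

```sql
SELECT relpages,
       relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'events';
```

A low percentage on an insert-heavy table is expected for the newest pages and usually improves after vacuum catches up.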
Designing indexes around common temporal query templates
Template 1: “Events for a stream between two times”
Query:
```sql
SELECT event_id, occurred_at, event_type
FROM events
WHERE stream_id = $1
  AND occurred_at >= $2
  AND occurred_at < $3
ORDER BY occurred_at, event_id;
```
Index:
```sql
CREATE INDEX CONCURRENTLY events_stream_time_idx
    ON events (stream_id, occurred_at, event_id);
```
Template 2: “Next page after cursor”
Query:
```sql
SELECT event_id, occurred_at, event_type
FROM events
WHERE stream_id = $1
  AND (occurred_at, event_id) > ($2, $3)
ORDER BY occurred_at, event_id
LIMIT $4;
```
Index: same as Template 1. This reuse is a key reason to prefer the composite index.
Template 3: “Count events by type in a time window”
Query:
```sql
SELECT event_type, count(*)
FROM events
WHERE occurred_at >= $1
  AND occurred_at < $2
GROUP BY event_type;
```
Index options:
- If the window is narrow and you frequently filter by type too: (event_type, occurred_at).
- If the window is wide and the table is huge: BRIN on occurred_at plus possibly a separate B-tree on event_type (the planner may combine them with a BitmapAnd).
```sql
CREATE INDEX CONCURRENTLY events_type_time_idx
    ON events (event_type, occurred_at);

CREATE INDEX CONCURRENTLY events_time_brin
    ON events USING brin (occurred_at);
```
Template 4: “Latest event per entity as of time T”
When you need the most recent event before a timestamp for an entity, you can use a descending index to reduce scanning:
```sql
CREATE INDEX CONCURRENTLY events_entity_time_desc_idx
    ON events (entity_id, occurred_at DESC, event_id DESC);
```
Query:
```sql
SELECT event_id, occurred_at, payload
FROM events
WHERE entity_id = $1
  AND occurred_at <= $2
ORDER BY occurred_at DESC, event_id DESC
LIMIT 1;
```
This pattern is common in “as-of” reconstruction where you want the last known change before a point in time.
Diagnosing slow temporal queries: a practical checklist
1) Confirm the predicate matches the index
PostgreSQL can only use the leftmost prefix of a B-tree index efficiently. If your index is (stream_id, occurred_at, event_id) but your query filters only by occurred_at, it may not use that index as expected. Either add a separate time index (B-tree or BRIN) or rewrite queries to include the leading key when appropriate.
2) Avoid functions on indexed columns in WHERE
Predicates like date_trunc('day', occurred_at) = $day prevent direct use of a plain index on occurred_at. Prefer range predicates:
```sql
WHERE occurred_at >= $day_start
  AND occurred_at < $day_start + interval '1 day'
```
If you truly need function-based filtering, consider an expression index:
```sql
-- For timestamptz columns, pin the time zone: the two-argument date_trunc is
-- only stable, and index expressions must be immutable (three-argument form
-- requires PostgreSQL 14+)
CREATE INDEX CONCURRENTLY events_occurred_day_idx
    ON events (date_trunc('day', occurred_at, 'UTC'));
```
Queries must use the identical expression to match the index. Use expression indexes sparingly; they add maintenance cost and can multiply quickly across variants.
3) Watch for sorts and large bitmap heap scans
For stream reads, a Sort node is a red flag: it means the index does not provide the required order. For time-window analytics, a bitmap heap scan can be fine, but if it touches a large fraction of the table, BRIN or different clustering may help.
4) Consider clustering for read-heavy timelines
If you frequently read by (stream_id, occurred_at), clustering the table on that index can improve locality, reducing random I/O:
```sql
CLUSTER events USING events_stream_time_idx;
```
Clustering is a one-time rewrite and does not maintain itself automatically. It is most useful when the table is relatively stable or when you can periodically recluster during maintenance windows.
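CLUSTER records which index it used, so a periodic maintenance job can re-run it without repeating the USING clause (note that CLUSTER takes an ACCESS EXCLUSIVE lock for the duration of the rewrite):

```sql
-- Re-clusters events using the previously recorded index
CLUSTER events;
```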
Putting it together: a minimal, high-impact index set
For many event-driven systems, a pragmatic baseline for temporal and stream-read performance is:
- Composite B-tree for stream reads: (stream_id, occurred_at, event_id)
- Composite B-tree for entity timelines (if needed): (entity_id, occurred_at, event_id)
- BRIN on occurred_at for broad time-window scans on very large tables
- Optional covering index using INCLUDE for the most common consumer projection
From there, add partial or specialized indexes only when a query is both frequent and slow, and confirm the improvement with EXPLAIN (ANALYZE, BUFFERS) and production-like data volumes.