Why read models exist: separating write truth from query convenience
In an event-driven database, the event log is the source of truth for what happened. It is optimized for durable writes and for reconstructing state, but it is rarely the best shape for serving application queries directly. Most application queries want “current state” or “aggregated state” with fast lookups, joins, and filters. A read model is a query-optimized representation derived from events, built so that reads are simple and fast.
Two common ways to build read models in PostgreSQL are: (1) projections maintained incrementally as events arrive, and (2) materialized views refreshed from the event log (or from intermediate tables). Both approaches produce tables or view-like structures that are not authoritative; they can be rebuilt from the event log if needed. The key design goal is to make them correct, maintainable, and operationally predictable.
Projections: incremental read models updated from events
A projection is a process (often a background worker or application component) that consumes events in order and updates one or more read tables. The read tables are ordinary PostgreSQL tables designed for the application’s query patterns: “get order summary by id”, “list open orders by customer”, “search products with current price”, “show account balance”, etc.
Projection patterns you will use most often
- Latest-state table: one row per entity (e.g., one row per order) with the current fields needed for reads.
- Index table: a table that supports a specific query, often denormalized (e.g., order_id, customer_id, status, updated_at) with indexes that match filters and sorting.
- Aggregate table: precomputed counts/sums (e.g., daily revenue per product, open orders per customer).
- Join/denormalized table: merges data from multiple event streams into a single read-optimized row (e.g., order + customer snapshot fields for a dashboard).
Projections are incremental: they update only what changed since the last processed event. This is typically the fastest way to keep read models near-real-time.
Materialized views: batch-built read models
A PostgreSQL materialized view stores the result of a query physically. You can refresh it periodically (or on demand) to reflect new events. Materialized views are attractive when: you can tolerate some staleness, the query is complex but stable, and you prefer database-managed refresh over custom projection code.
Materialized views are not incremental by default; REFRESH MATERIALIZED VIEW recomputes the whole result set. REFRESH MATERIALIZED VIEW CONCURRENTLY allows reads during refresh but requires a unique index on the materialized view and still recomputes the full view. For very high-volume event logs, you often combine materialized views with intermediate tables (e.g., daily rollups) to keep refresh costs bounded.
Choosing between projections and materialized views
Use projections when you need low latency, fine control over updates, and predictable per-event work. Use materialized views when you need a compact way to publish a derived dataset and can accept periodic refresh. Many systems use both: projections for entity-level “current state” and materialized views for reporting-style aggregates.
Decision checklist
- Latency: projections can be seconds or sub-second; materialized views are typically minutes or longer depending on refresh.
- Complexity: projections require code and operational handling; materialized views require careful refresh scheduling and indexing.
- Cost model: projections cost per event; materialized views cost per refresh (often proportional to total data scanned).
- Failure recovery: projections need checkpointing; materialized views need refresh retry and possibly rebuild.
- Query flexibility: projections can be tailored to specific endpoints; materialized views are best for stable queries.
Step-by-step: building a projection-backed read model
The steps below show a practical workflow for creating a read model from an event table. The example uses an order domain, but the technique applies to any entity stream.
1) Define the read model tables around query needs
Start from the queries your application must serve. Suppose you need: (a) fetch an order summary by id, (b) list open orders for a customer sorted by last update, and (c) show totals (items_count, total_amount).
Create a primary “order_summary” table and a secondary index table for listing.
```sql
CREATE TABLE read_order_summary (
  order_id      uuid PRIMARY KEY,
  customer_id   uuid NOT NULL,
  status        text NOT NULL,
  items_count   integer NOT NULL DEFAULT 0,
  total_amount  numeric(12,2) NOT NULL DEFAULT 0,
  currency      text NOT NULL DEFAULT 'USD',
  last_event_id uuid NOT NULL,
  last_event_at timestamptz NOT NULL,
  updated_at    timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX read_order_summary_customer_status_idx
  ON read_order_summary (customer_id, status, last_event_at DESC);
```

last_event_id and last_event_at are critical for operational correctness: they let you detect whether an incoming event is newer than what you have applied, and they support troubleshooting ("what was the last applied event for this order?").
2) Create a projection checkpoint table
A projection needs to remember how far it has processed the event stream. Store a checkpoint per projection (and optionally per partition/shard if you parallelize).
```sql
CREATE TABLE projection_checkpoint (
  projection_name         text PRIMARY KEY,
  last_processed_position bigint NOT NULL,
  updated_at              timestamptz NOT NULL DEFAULT now()
);

INSERT INTO projection_checkpoint (projection_name, last_processed_position)
VALUES ('order_summary_projection', 0)
ON CONFLICT (projection_name) DO NOTHING;
```

The "position" is a monotonically increasing stream cursor (often a sequence-based column in the event table). Your projection reads events where position > last_processed_position, applies them, then advances the checkpoint in the same transaction (or in a carefully controlled sequence of transactions).
3) Write deterministic event-to-row handlers
Implement handlers that map each event type to an update on the read model. Keep handlers deterministic: given the same event and the same prior read state, they produce the same result. Avoid querying external systems during projection.
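In application code, such handlers are often organized as a dispatch table keyed by event type. The sketch below is a minimal illustration under assumptions (event dicts with `event_type`, `occurred_at`, and `payload` keys are hypothetical shapes, not from the source); each handler is a pure function from prior row and event to new row, with no external I/O:

```python
# Deterministic handler dispatch: (prior_row, event) -> new_row.
# Event and row shapes are assumptions for illustration.

def apply_order_created(row, event):
    # Create-like events fully (re)initialize the row, so replays are safe.
    return {
        "order_id": event["order_id"],
        "customer_id": event["payload"]["customer_id"],
        "status": "CREATED",
        "items_count": 0,
        "total_amount": 0,
        "last_event_at": event["occurred_at"],
    }

def apply_order_item_added(row, event):
    # Update only the relevant fields; everything else carries over.
    updated = dict(row)
    updated["items_count"] = row["items_count"] + event["payload"]["quantity"]
    updated["total_amount"] = row["total_amount"] + event["payload"]["line_total"]
    updated["last_event_at"] = event["occurred_at"]
    return updated

HANDLERS = {
    "OrderCreated": apply_order_created,
    "OrderItemAdded": apply_order_item_added,
}

def apply_event(row, event):
    handler = HANDLERS.get(event["event_type"])
    return handler(row, event) if handler else row  # unknown types are ignored
```

Because each handler depends only on its inputs, replaying the same event sequence always yields the same read state, which is what makes rebuilds trustworthy.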
Below is an example of applying events using SQL upserts. Assume an event table order_events with columns: position, event_id, event_type, order_id, occurred_at, and payload fields as needed.
```sql
-- Pseudocode-ish SQL for applying one event (run inside a transaction)
WITH e AS (
  SELECT * FROM order_events WHERE position = $1
)
INSERT INTO read_order_summary AS ros (
  order_id, customer_id, status, items_count, total_amount,
  currency, last_event_id, last_event_at, updated_at
)
SELECT
  e.order_id,
  (e.payload->>'customer_id')::uuid,
  'CREATED',
  0,
  0,
  COALESCE(e.payload->>'currency', 'USD'),
  e.event_id,
  e.occurred_at,
  now()
FROM e
WHERE e.event_type = 'OrderCreated'
ON CONFLICT (order_id) DO UPDATE SET
  customer_id   = EXCLUDED.customer_id,
  status        = EXCLUDED.status,
  currency      = EXCLUDED.currency,
  last_event_id = EXCLUDED.last_event_id,
  last_event_at = EXCLUDED.last_event_at,
  updated_at    = now();
```

For subsequent events, update only the relevant fields. For example, when an item is added, increment counts and totals. When status changes, update status.
```sql
-- Apply OrderItemAdded
WITH e AS (
  SELECT * FROM order_events WHERE position = $1
)
UPDATE read_order_summary ros
SET items_count   = ros.items_count + (e.payload->>'quantity')::int,
    total_amount  = ros.total_amount + (e.payload->>'line_total')::numeric,
    last_event_id = e.event_id,
    last_event_at = e.occurred_at,
    updated_at    = now()
FROM e
WHERE e.event_type = 'OrderItemAdded'
  AND ros.order_id = e.order_id
  AND e.occurred_at >= ros.last_event_at; -- guard against late/out-of-order delivery
```

The guard condition is a pragmatic safety net when events can arrive late or be replayed. Depending on your ordering guarantees, you might instead compare stream position or per-entity sequence. The important part is: do not let older events overwrite newer read state.
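When per-entity ordering comes from a stream position rather than timestamps, a strictly-greater comparison has a useful extra property: replayed duplicates become no-ops instead of double-applying. A minimal sketch (field names are assumptions for illustration):

```python
def should_apply(row, event):
    # Strictly-greater comparison on stream position: late duplicates and
    # replayed events are dropped instead of double-counted.
    if row is None:
        return True
    return event["position"] > row["last_position"]

def apply_item_added(row, event):
    if not should_apply(row, event):
        return row  # stale or duplicate event: leave read state untouched
    updated = dict(row)
    updated["items_count"] += event["quantity"]
    updated["last_position"] = event["position"]
    return updated
```

With the timestamp guard (`>=`) in the SQL above, a replayed OrderItemAdded would still pass the guard and double-count; the strict position comparison avoids that, at the cost of requiring a total per-entity order.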
4) Process events in batches and advance the checkpoint
Projections are usually run in a loop: fetch a batch of events after the checkpoint, apply them in order, then update the checkpoint. Batching reduces overhead and improves throughput.
```sql
-- Example batch fetch
SELECT position, event_id, event_type, order_id, occurred_at, payload
FROM order_events
WHERE position > $last_position
ORDER BY position ASC
LIMIT 500;
```

Apply each event (or group by type) and then update the checkpoint. A common approach is to update the checkpoint to the maximum position successfully applied in the batch.
```sql
UPDATE projection_checkpoint
SET last_processed_position = $max_position,
    updated_at = now()
WHERE projection_name = 'order_summary_projection';
```

Operationally, you want the checkpoint update to reflect exactly what you have applied. If you apply events and crash before advancing the checkpoint, you will reprocess some events on restart. That is acceptable if your handlers are safe to reapply (for example, by using upserts and monotonic guards).
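The fetch–apply–advance loop can be sketched in application code. This in-memory version (a list of event dicts ordered by position, plus a checkpoint dict; names are assumptions) mirrors what a worker does against PostgreSQL, where the applies and the checkpoint update share a transaction:

```python
def process_batches(events, checkpoint, apply_fn, batch_size=500):
    # events: dicts ordered by 'position'; checkpoint: {'last_processed_position': int}.
    # Apply each batch in order, then advance the checkpoint to the max applied position.
    pos = checkpoint["last_processed_position"]
    pending = [e for e in events if e["position"] > pos]
    for i in range(0, len(pending), batch_size):
        batch = pending[i:i + batch_size]
        for event in batch:
            apply_fn(event)
        # Against PostgreSQL this update runs in the same transaction as the applies,
        # so a crash either reprocesses the whole batch or none of it.
        checkpoint["last_processed_position"] = batch[-1]["position"]
    return checkpoint
```

Run in a loop (with a short sleep when no events are pending), this is the whole projection engine; everything else is handler logic.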
5) Make the read model query-friendly
Once the table exists, tune it for the read patterns. Add indexes that match filters and sort orders. Keep rows narrow and avoid storing fields you never query. If you need full-text search or complex filtering, consider dedicated columns or generated columns derived from payload fields.
```sql
-- Example: list open orders for a customer
SELECT order_id, status, items_count, total_amount, currency, last_event_at
FROM read_order_summary
WHERE customer_id = $1
  AND status IN ('CREATED', 'PAID')
ORDER BY last_event_at DESC
LIMIT 50;
```

Handling corrections and rebuilds
Because read models are derived, you must plan for rebuilds. Rebuilds happen when you change projection logic, add new fields, fix a bug, or need to recover from corruption.
Rebuild strategy: truncate and replay
The simplest rebuild is: stop the projection, truncate the read tables, reset the checkpoint, and replay all events. This is easy but can be slow for large histories.
```sql
BEGIN;
TRUNCATE TABLE read_order_summary;
UPDATE projection_checkpoint
SET last_processed_position = 0,
    updated_at = now()
WHERE projection_name = 'order_summary_projection';
COMMIT;
```

Then restart the projection to process from the beginning.
Rebuild strategy: backfill in parallel, then switch
For large datasets, build a new table side-by-side and then swap. This reduces downtime for reads.
```sql
CREATE TABLE read_order_summary_v2 (LIKE read_order_summary INCLUDING ALL);

-- backfill into v2 using a dedicated projection instance
-- then swap atomically
ALTER TABLE read_order_summary RENAME TO read_order_summary_old;
ALTER TABLE read_order_summary_v2 RENAME TO read_order_summary;
DROP TABLE read_order_summary_old;
```

When swapping, ensure that application queries point to the correct table (or use a stable view name that you repoint).
Materialized views in practice: building reporting read models
Materialized views shine for reporting queries that scan many events and produce aggregates. For example, you might need daily revenue per currency, or counts of orders by status per day. These are expensive to compute on the fly from the event log but can be refreshed periodically.
1) Create an intermediate “fact” table (optional but recommended)
If your event log is large, refreshing a materialized view directly from it can be too slow. A common pattern is to first project events into a compact fact table (still derived, but smaller and more queryable), then build materialized views from that fact table.
Example: a fact table of payments captured.
```sql
CREATE TABLE fact_payments (
  payment_id  uuid PRIMARY KEY,
  order_id    uuid NOT NULL,
  customer_id uuid NOT NULL,
  amount      numeric(12,2) NOT NULL,
  currency    text NOT NULL,
  paid_at     timestamptz NOT NULL
);

CREATE INDEX fact_payments_paid_at_idx ON fact_payments (paid_at);
```

This fact table can be maintained by a projection that listens for PaymentCaptured events and inserts one row per payment.
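The per-event work of that fact projection is a simple mapping. A sketch of turning a PaymentCaptured event into one fact_payments row, under assumptions: the event shape is hypothetical, and reusing the event id as payment_id is an assumption (your events may carry an explicit payment id):

```python
def payment_captured_to_fact(event):
    # Map one PaymentCaptured event to one fact_payments row.
    # Event shape is assumed; adapt the keys to your actual payload.
    p = event["payload"]
    return {
        "payment_id": event["event_id"],  # assumption: event id doubles as payment id
        "order_id": event["order_id"],
        "customer_id": p["customer_id"],
        "amount": p["amount"],
        "currency": p.get("currency", "USD"),
        "paid_at": event["occurred_at"],
    }
```

The resulting row can be inserted with a plain `INSERT ... ON CONFLICT (payment_id) DO NOTHING`, which makes replays harmless.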
2) Define the materialized view for the report
Now define a materialized view that aggregates the fact table.
```sql
CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT
  date_trunc('day', paid_at) AS day,
  currency,
  sum(amount) AS revenue,
  count(*) AS payments_count
FROM fact_payments
GROUP BY 1, 2;

CREATE UNIQUE INDEX mv_daily_revenue_day_currency_uidx
  ON mv_daily_revenue (day, currency);
```

The unique index enables concurrent refresh.
3) Refresh strategy and scheduling
Choose a refresh cadence based on acceptable staleness and cost. For example, refresh every 5 minutes during business hours and hourly overnight. Use concurrent refresh if you need the view to remain readable during refresh.
```sql
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue;
```

If refresh time grows, consider partitioning the fact table by time and creating multiple materialized views per time window, or maintaining a rolling aggregate table via projection instead of a materialized view.
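A cadence rule like "every 5 minutes during business hours, hourly overnight" can live in the scheduler that issues the refresh. A tiny sketch (the hour boundaries and intervals are illustrative assumptions, tuned to your staleness budget):

```python
def refresh_interval_minutes(hour):
    # Business hours (09:00-18:00): refresh every 5 minutes; otherwise hourly.
    # A scheduler would sleep this long between REFRESH statements.
    return 5 if 9 <= hour < 18 else 60
```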
Keeping projections and materialized views consistent with application reads
Applications often need to know “how fresh” a read model is. Instead of guessing, expose freshness explicitly. Add metadata columns (like last_event_at) and/or a small table that records the projection’s last processed position and timestamp. Your API can then decide whether to serve the read model, wait briefly, or fall back to another path.
```sql
SELECT projection_name, last_processed_position, updated_at
FROM projection_checkpoint
WHERE projection_name IN ('order_summary_projection');
```

For materialized views, store the last refresh time in a separate table you update after refresh, or query PostgreSQL system catalogs carefully. A simple explicit table is often easiest operationally.
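The serve/wait/fallback decision can then be a small pure function over the measured lag. The thresholds below are illustrative assumptions; pick values that match each endpoint's staleness budget:

```python
def freshness_decision(lag_seconds, soft_limit=5.0, hard_limit=60.0):
    # Decide how an API should treat a read model given its lag behind the stream.
    if lag_seconds <= soft_limit:
        return "serve"     # fresh enough: serve the read model directly
    if lag_seconds <= hard_limit:
        return "wait"      # briefly wait/retry for the projection to catch up
    return "fallback"      # too stale: use another path or signal degraded data
```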
Designing read models for change: schema evolution and new fields
Read models evolve as product requirements change. Because they are derived, you can add columns and backfill them by replaying events or by running a one-time backfill query. Prefer additive changes: add a new column, populate it, then update application code to read it.
Add a column and populate incrementally
Suppose you add shipping_country to the order summary. Add the column (nullable at first), then update the projection to set it when it sees an event that contains shipping info.
```sql
ALTER TABLE read_order_summary ADD COLUMN shipping_country text;
CREATE INDEX read_order_summary_shipping_country_idx
  ON read_order_summary (shipping_country);
```

If historical orders need the value, run a backfill by scanning relevant events and updating the table, or rebuild the read model. Choose based on volume and operational constraints.
Operational safeguards: performance and correctness knobs
Transaction sizing and locking
Apply events in batches, but keep transactions bounded. Very large transactions can bloat WAL, increase lock durations, and make retries expensive. A common approach is to process 100–1000 events per transaction, depending on event size and update cost.
Hot rows and contention
If many events update the same entity (a “hot” order or account), you can see contention on that row. Mitigations include: minimizing updated columns, avoiding unnecessary index updates, and splitting read models (e.g., keep frequently changing counters in a separate table). For aggregates, consider writing per-interval buckets (per minute/hour) and aggregating at query time.
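The per-interval bucket idea can be sketched as follows: writers increment a per-minute bucket instead of one hot row, so concurrent updates spread across rows, and readers sum the buckets at query time (the bucket width is an illustrative choice):

```python
from collections import defaultdict

def bucket_key(ts_epoch, bucket_seconds=60):
    # Truncate a timestamp to its bucket start (per-minute by default).
    return ts_epoch - (ts_epoch % bucket_seconds)

def record(buckets, ts_epoch, amount):
    # Writers touch one bucket row, not a single global counter row.
    buckets[bucket_key(ts_epoch)] += amount

def total(buckets):
    # Readers aggregate across buckets at query time.
    return sum(buckets.values())
```

In PostgreSQL the same shape is a table keyed by (entity, bucket_start) with an upsert that increments the bucket, and a `sum(...)` over buckets when reading.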
Monitoring projection lag
Track lag as the difference between the latest event position and the checkpoint position, and as the time difference between “now” and the last applied event timestamp. Alert when lag exceeds thresholds.
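Both lag signals reduce to simple arithmetic once the inputs are fetched; the alert thresholds below are illustrative assumptions:

```python
def projection_lag(latest_position, checkpoint_position, now_ts, last_event_ts):
    # Two lag signals: how many events behind, and how many seconds behind.
    return {
        "events_behind": latest_position - checkpoint_position,
        "seconds_behind": now_ts - last_event_ts,
    }

def should_alert(lag, max_events=10_000, max_seconds=300):
    # Alert when either signal exceeds its threshold.
    return lag["events_behind"] > max_events or lag["seconds_behind"] > max_seconds
```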
```sql
-- Latest position in the stream
SELECT max(position) AS latest_position FROM order_events;

-- Projection position
SELECT last_processed_position
FROM projection_checkpoint
WHERE projection_name = 'order_summary_projection';
```

Reprocessing and replay safety
Design projection updates so that reprocessing does not corrupt state. Use upserts for create-like events, monotonic guards to prevent older events from overwriting newer state, and store the last applied marker per entity. When you need to change logic in a non-additive way (e.g., fix a calculation), plan a rebuild or a targeted correction job.
Combining projections and materialized views: a common architecture
A practical architecture in PostgreSQL is layered:
- Layer 1: entity projections produce current-state tables (fast lookups, API reads).
- Layer 2: fact projections produce append-only or insert-only fact tables (payments, shipments, logins).
- Layer 3: materialized views aggregate facts for dashboards and reports.
This layering keeps each component simple: entity projections handle “latest state”, fact projections provide clean inputs for analytics, and materialized views provide fast aggregates without forcing your application endpoints to run heavy group-bys.
Worked example: from events to an API endpoint
Assume an endpoint GET /customers/{id}/open-orders needs to return the latest 50 open orders with totals. With the projection table created earlier, the endpoint becomes a single indexed query.
```sql
SELECT order_id, status, items_count, total_amount, currency, last_event_at
FROM read_order_summary
WHERE customer_id = $1
  AND status IN ('CREATED', 'PAID')
ORDER BY last_event_at DESC
LIMIT 50;
```

Without a read model, you might need to scan many events per order, compute totals, and filter by current status—expensive and complex. The projection turns that into a straightforward query with predictable performance.
Worked example: a reporting query with a materialized view
Assume a dashboard needs “revenue per day for the last 30 days” and must load quickly. With mv_daily_revenue, the query is simple and stable.
```sql
SELECT day, currency, revenue, payments_count
FROM mv_daily_revenue
WHERE day >= date_trunc('day', now()) - interval '30 days'
ORDER BY day ASC, currency ASC;
```

The heavy work happens during refresh, not during dashboard load. If refresh becomes too slow, move more work into the fact projection (for example, maintaining a rolling daily aggregate table incrementally) and keep the materialized view as a thin layer or remove it entirely.