SQL for Business Intelligence: Common Table Expressions for Readable, Auditable Queries

Chapter 10

Estimated reading time: 9 minutes

Why CTEs are a primary BI pattern

In BI work, a single query often becomes a long-lived asset: it feeds a dashboard tile, is scheduled in a warehouse, and gets reviewed when numbers change. Common Table Expressions (CTEs) let you structure that query as a readable pipeline of named steps. Each step can be audited independently, and the final SELECT becomes a clear “assembly” of prepared datasets.

A CTE is defined with a WITH clause and a name, followed by a SELECT statement. You can define multiple CTEs, separated by commas, and then reference them in a final SELECT (or in later CTEs). Think of CTEs as temporary, named result sets that exist only for the duration of the query.

WITH cte_name AS (
  SELECT ...
)
SELECT ...
FROM cte_name;
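
For instance, here is a minimal, complete example (assuming a hypothetical orders table with order_id, customer_id, total_amount, and order_timestamp columns): a CTE isolates the rows in scope, and the final SELECT aggregates them.

-- Hypothetical table: orders(order_id, customer_id, total_amount, order_timestamp)
WITH recent_orders AS (
  -- Named step: only the columns and rows in scope
  SELECT order_id, customer_id, total_amount
  FROM orders
  WHERE order_timestamp >= DATE '2025-01-01'
)
SELECT
  customer_id,
  SUM(total_amount) AS revenue
FROM recent_orders
GROUP BY customer_id;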

CTE naming conventions that communicate intent

Dashboards are maintained by teams. Names should communicate purpose, grain, and whether the dataset is raw, cleaned, or aggregated. A practical convention is to use prefixes that mirror a BI pipeline:

  • stg_: staging (minimal reshaping, selecting needed columns, basic joins)
  • cln_: cleaning (standardizing values, handling edge cases, applying business rules)
  • agg_: aggregations (grouping to a defined grain, KPI calculations)
  • fct_ or final: final dataset for the dashboard tile
  • qa_: quality checks (row counts, duplicate detection, reconciliation)

Also encode the grain in the name when helpful, for example agg_daily_revenue or agg_customer_month. This reduces the risk of accidentally joining datasets at incompatible grains later.
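
As a sketch (again assuming the hypothetical orders table, with DATE_TRUNC syntax as in PostgreSQL or Snowflake), two aggregation CTEs whose names spell out their grains might look like this:

WITH agg_daily_revenue AS (
  -- Grain: one row per day
  SELECT
    CAST(order_timestamp AS DATE) AS order_date,
    SUM(total_amount) AS revenue
  FROM orders
  GROUP BY CAST(order_timestamp AS DATE)
),
agg_customer_month AS (
  -- Grain: one row per customer per month
  SELECT
    customer_id,
    DATE_TRUNC('month', order_timestamp) AS order_month,
    SUM(total_amount) AS revenue
  FROM orders
  GROUP BY customer_id, DATE_TRUNC('month', order_timestamp)
)
SELECT * FROM agg_daily_revenue ORDER BY order_date;

A reviewer can tell from the names alone that joining these two CTEs directly would mix grains.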

Building a dashboard query as a CTE chain

A strong BI pattern is to chain CTEs that each do one job: stage, clean, aggregate, then produce the final shape. Each CTE should be testable by running SELECT * from that CTE (usually with a small LIMIT during development) to confirm row counts and key fields.
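
During development, a quick way to do this is to temporarily end the WITH chain with a SELECT against the CTE you want to inspect (LIMIT syntax varies by dialect; SQL Server uses TOP, for example):

WITH stg_orders AS (
  SELECT order_id, customer_id, order_timestamp, status, total_amount
  FROM orders
  WHERE order_timestamp >= DATE '2025-01-01'
)
-- Temporary development tail: inspect the staged rows, then swap this
-- back for the rest of the pipeline and the real final SELECT
SELECT *
FROM stg_orders
LIMIT 100;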

Step-by-step example: daily revenue and order count

Scenario: You need a daily time series for a dashboard: total revenue and number of completed orders per day for a given date range. You want the query to be auditable and easy to modify.

1) Staging CTE: select the base rows at the intended grain

Start from the table that represents the base event you want to count. For orders, the natural grain is typically one row per order. Keep only the columns you need and apply non-controversial filters (like excluding test data if that is a standard rule in your organization).

WITH stg_orders AS (
  SELECT
    o.order_id,
    o.customer_id,
    o.order_timestamp,
    o.status,
    o.total_amount
  FROM orders o
  WHERE o.order_timestamp >= DATE '2025-01-01'
    AND o.order_timestamp <  DATE '2025-02-01'
),

Audit tip: at this stage, you should be able to answer “how many orders are in scope?” by running SELECT COUNT(*) FROM stg_orders.

2) Cleaning CTE: standardize and apply business rules

Cleaning is where you define what “counts” for the metric. For example, you may only want completed/paid orders, and you may want to derive an order_date for grouping. Keep this CTE focused on a single purpose: defining valid orders and the derived fields used downstream.

cln_orders AS (
  SELECT
    order_id,
    customer_id,
    CAST(order_timestamp AS DATE) AS order_date,
    total_amount
  FROM stg_orders
  WHERE status IN ('completed', 'paid')
),

Audit tip: validate that the filter behaves as expected by checking counts by status before and after. Because each CTE is selectable, you can quickly run: SELECT status, COUNT(*) FROM stg_orders GROUP BY status and compare to SELECT COUNT(*) FROM cln_orders.

3) Aggregation CTE: define the reporting grain explicitly

Now aggregate to the grain required by the dashboard tile: one row per day. This is where you calculate KPIs.

agg_daily_kpis AS (
  SELECT
    order_date,
    COUNT(*) AS orders_completed,
    SUM(total_amount) AS revenue
  FROM cln_orders
  GROUP BY order_date
),

Audit tip: the name agg_daily_kpis signals that the grain is daily. This helps prevent later joins that accidentally multiply rows.

4) Final SELECT: shape the dataset for the dashboard

The final SELECT should be short and focused on presentation: selecting columns, applying final formatting, and ordering. Avoid hiding business logic here; keep logic in earlier CTEs so it is testable.

final AS (
  SELECT
    order_date,
    orders_completed,
    revenue
  FROM agg_daily_kpis
)
SELECT *
FROM final
ORDER BY order_date;

Auditability: make each CTE testable and single-purpose

To keep queries auditable, treat each CTE as a checkpoint with a single responsibility. A practical checklist:

  • One purpose per CTE: stage columns, clean rules, aggregate, or final shape.
  • Stable keys: if a CTE is at order grain, it should retain order_id (or the natural key) until you intentionally aggregate away from it.
  • Test with SELECT: during development, run SELECT * from each CTE to inspect sample rows and confirm derived fields.
  • Document intent via naming: cln_ should not aggregate; agg_ should clearly state its grain.

Common pitfall: unintentionally changing grain between CTEs

The most frequent BI bug is a silent grain change that inflates or deflates metrics. This often happens when you join a “one row per entity” dataset to a “many rows per entity” dataset without re-aggregating, or when you aggregate and then join back to detail rows.

Example of a grain trap

Suppose you compute daily revenue (one row per day) and then join it to order-level rows (many rows per day). If you then sum revenue again, you will multiply it by the number of orders per day.

WITH cln_orders AS (
  SELECT
    order_id,
    CAST(order_timestamp AS DATE) AS order_date,
    total_amount
  FROM orders
  WHERE status IN ('completed', 'paid')
),
agg_daily_revenue AS (
  SELECT
    order_date,
    SUM(total_amount) AS revenue
  FROM cln_orders
  GROUP BY order_date
)
SELECT
  o.order_date,
  SUM(r.revenue) AS wrong_revenue
FROM cln_orders o
JOIN agg_daily_revenue r
  ON o.order_date = r.order_date
GROUP BY o.order_date;

Why it’s wrong: agg_daily_revenue has one row per day, but joining it to cln_orders repeats that revenue value for every order on that day.

How to avoid it

  • Join at the same grain: if your final output is daily, keep the final assembly at daily grain (join daily-to-daily), as in the corrected sketch after this list.
  • Aggregate after joins when needed: if you must join detail tables, do it before aggregating, and then aggregate once at the end.
  • Encode grain in CTE names: it becomes obvious when you are about to join agg_daily_* to stg_order_*.
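
Here is one corrected sketch of the query above, keeping the final assembly at daily grain by building a second daily-grain CTE for order counts and joining daily-to-daily:

WITH cln_orders AS (
  SELECT
    order_id,
    CAST(order_timestamp AS DATE) AS order_date,
    total_amount
  FROM orders
  WHERE status IN ('completed', 'paid')
),
agg_daily_revenue AS (
  -- Grain: one row per day
  SELECT order_date, SUM(total_amount) AS revenue
  FROM cln_orders
  GROUP BY order_date
),
agg_daily_orders AS (
  -- Grain: one row per day
  SELECT order_date, COUNT(*) AS orders_completed
  FROM cln_orders
  GROUP BY order_date
)
SELECT
  r.order_date,
  r.revenue,            -- not repeated: both sides of the join are daily grain
  o.orders_completed
FROM agg_daily_revenue r
JOIN agg_daily_orders o
  ON o.order_date = r.order_date
ORDER BY r.order_date;

Because both inputs are already at daily grain, revenue is never re-summed across order rows, and each CTE can still be audited on its own.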

Adding QA CTEs for row-count and duplication checks

For dashboard queries, add explicit QA CTEs that compute checks you can review during development and keep for future audits. These QA CTEs should not change the final dataset; they should produce diagnostics you can query when investigating issues.

Row-count checks between stages

A common audit question is: “Where did rows drop?” Create a QA CTE that reports counts at each stage. This is especially helpful when cleaning rules change.

WITH stg_orders AS (
  SELECT order_id, order_timestamp, status, total_amount
  FROM orders
  WHERE order_timestamp >= DATE '2025-01-01'
    AND order_timestamp <  DATE '2025-02-01'
),
cln_orders AS (
  SELECT
    order_id,
    CAST(order_timestamp AS DATE) AS order_date,
    total_amount
  FROM stg_orders
  WHERE status IN ('completed', 'paid')
),
agg_daily_kpis AS (
  SELECT
    order_date,
    COUNT(*) AS orders_completed,
    SUM(total_amount) AS revenue
  FROM cln_orders
  GROUP BY order_date
),
qa_counts AS (
  SELECT 'stg_orders' AS step, COUNT(*) AS row_count FROM stg_orders
  UNION ALL
  SELECT 'cln_orders' AS step, COUNT(*) AS row_count FROM cln_orders
  UNION ALL
  SELECT 'agg_daily_kpis' AS step, COUNT(*) AS row_count FROM agg_daily_kpis
)
SELECT * FROM qa_counts ORDER BY step;

How to use it: run the QA query when validating changes. If cln_orders drops unexpectedly, you know the issue is in the cleaning rules, not the aggregation.

Duplicate key checks (grain validation)

If a CTE is supposed to be one row per key (for example, one row per order_id), add a QA CTE that finds duplicates. This catches join explosions early.

WITH stg_orders AS (
  SELECT order_id, customer_id, order_timestamp
  FROM orders
),
qa_dupe_orders AS (
  SELECT
    order_id,
    COUNT(*) AS rows_per_order
  FROM stg_orders
  GROUP BY order_id
  HAVING COUNT(*) > 1
)
SELECT * FROM qa_dupe_orders ORDER BY rows_per_order DESC;

If this returns rows, your “order grain” assumption is broken (or you staged from a source that is not truly one row per order). Fix the staging step before building metrics.
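
One common fix, sketched here under the assumption that the latest row per order_id is the correct one (verify that rule against your source system before adopting it), is to deduplicate in the staging step with ROW_NUMBER():

WITH ranked_orders AS (
  SELECT
    order_id,
    customer_id,
    order_timestamp,
    -- Keep the most recent row per order_id (an assumed rule, for illustration)
    ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_timestamp DESC) AS rn
  FROM orders
),
stg_orders AS (
  SELECT order_id, customer_id, order_timestamp
  FROM ranked_orders
  WHERE rn = 1
)
SELECT
  COUNT(*) AS row_count,
  COUNT(DISTINCT order_id) AS distinct_orders   -- should match row_count
FROM stg_orders;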

Reconciliation checks (sum of parts vs total)

When you aggregate, you can add a QA CTE that compares totals computed two ways. For example, total revenue from cleaned orders should equal the sum of daily revenue.

WITH cln_orders AS (
  SELECT
    CAST(order_timestamp AS DATE) AS order_date,
    total_amount
  FROM orders
  WHERE status IN ('completed', 'paid')
),
agg_daily AS (
  SELECT order_date, SUM(total_amount) AS revenue
  FROM cln_orders
  GROUP BY order_date
),
qa_reconcile AS (
  SELECT
    (SELECT SUM(total_amount) FROM cln_orders) AS revenue_from_detail,
    (SELECT SUM(revenue) FROM agg_daily) AS revenue_from_daily
)
SELECT * FROM qa_reconcile;

If these differ, you likely have a filtering mismatch, a date derivation issue, or unintended duplication.

Design pattern: keep the final SELECT simple and stable

Dashboards benefit from stable schemas. Use CTEs to isolate change: if business rules change, update cln_*; if KPI definitions change, update agg_*; keep the final output columns consistent.

  • Prefer adding new columns in earlier CTEs so they can be tested in isolation.
  • Avoid hidden logic in the final SELECT; treat it as a contract for the dashboard.
  • Keep joins localized: do joins in staging CTEs where you can validate row counts before aggregation.

Chaining multiple CTEs for complex BI metrics

For more complex metrics, you can extend the pipeline with additional CTEs while preserving the “single purpose” rule. A common structure is:

  • stg_*: bring in base facts and dimensions needed
  • cln_*: apply business definitions and derive fields
  • agg_*: aggregate to the dashboard grain
  • qa_*: validate counts, duplicates, reconciliations
  • final: select and order the dashboard dataset

When the query grows, this structure makes it easier to review changes in code review and to troubleshoot metric shifts by running each CTE independently.

Exercise

You notice a BI query is inflating revenue after joining a daily revenue CTE to order-level rows. Which change best prevents this grain trap while keeping the query auditable?

Answer: Inflation happens when a one-row-per-day CTE is joined to many order rows, repeating the daily revenue for every order. Prevent it by joining at the same grain (daily-to-daily) or by joining detail rows first and aggregating once at the end.

Next chapter

SQL for Business Intelligence: Building Dashboard-Ready Datasets and Validating Results
