SQL for Business Intelligence: GROUP BY for Reporting Tables and Time Series

Capítulo 7

Estimated reading time: 7 minutes

+ Exercise

Why GROUP BY matters for dashboards and recurring reports

Dashboards and recurring reports rarely show raw rows. They show summarized tables: revenue by month, orders by region, active customers by product category. GROUP BY is the tool that turns detailed data into a reporting table where each row represents a bucket (a dimension combination) and each column is a metric (an aggregated measure).

Think of a BI reporting table as: dimensions (the “by” columns) + metrics (the aggregated numbers). GROUP BY defines the grain (row-level meaning) of the result.

GROUP BY basics: building a reporting table

The rule that prevents SQL errors and logic bugs

In a grouped query, every selected column must be either:

  • Included in the GROUP BY (a dimension), or
  • Wrapped in an aggregate function (a metric).

If you select a non-aggregated column that is not grouped, many databases will throw an error. Some systems may allow it but return an arbitrary value, which creates silent logic bugs in BI outputs.

Example: sales by region

SELECT  o.region, SUM(o.order_amount) AS revenue, COUNT(*) AS orders FROM orders o GROUP BY o.region;

Here, region is the dimension (one row per region). revenue and orders are metrics.

Continue in our app.
  • Listen to the audio with the screen off.
  • Earn a certificate upon completion.
  • Over 5000 courses for you to explore!
Or continue reading below...
Download App

Download the app

Common mistake: selecting a non-grouped column

-- Incorrect: o.order_date is neither grouped nor aggregated SELECT  o.region, o.order_date, SUM(o.order_amount) AS revenue FROM orders o GROUP BY o.region;

Fix it by either grouping by the date (changing the grain) or removing it from the select list.

Step-by-step pattern for BI reporting tables

Step 1: Choose the dimensions (define the grain)

Ask: “What should one row represent?” Examples:

  • One row per month and region
  • One row per product_category
  • One row per week, channel, and country

Step 2: Define the metrics (aggregated measures)

Pick metrics that match the business question. Examples:

  • SUM(order_amount) as revenue
  • COUNT(*) as number of orders
  • COUNT(DISTINCT customer_id) as unique customers

Step 3: Write the query with only grouped columns + metrics

SELECT  o.region, DATE_TRUNC('month', o.order_date) AS month, SUM(o.order_amount) AS revenue, COUNT(*) AS orders FROM orders o GROUP BY  o.region, DATE_TRUNC('month', o.order_date);

Step 4: Verify the grain: one row per unique dimension combination

A quick QA check is to compare row counts:

  • The result should have at most one row per (region, month).
  • If you see duplicates, you likely grouped by the wrong columns or introduced duplication earlier in your pipeline.

You can validate by checking the number of distinct dimension combinations in the source data matches the number of rows in your grouped output.

-- Distinct dimension combinations in the raw data SELECT  COUNT(*) AS distinct_region_months FROM (   SELECT DISTINCT     o.region,     DATE_TRUNC('month', o.order_date) AS month   FROM orders o ) x;

Grouping time series for reporting (day, week, month)

Time series dashboards depend on consistent time buckets. The key is to transform a timestamp/date into a bucket start (day/week/month) and group by that bucket.

Daily buckets

SELECT  CAST(o.order_date AS DATE) AS day, SUM(o.order_amount) AS revenue FROM orders o GROUP BY CAST(o.order_date AS DATE) ORDER BY day;

Weekly buckets

Many SQL dialects support DATE_TRUNC. If yours differs, use the equivalent function for week bucketing.

SELECT  DATE_TRUNC('week', o.order_date) AS week_start, COUNT(*) AS orders FROM orders o GROUP BY DATE_TRUNC('week', o.order_date) ORDER BY week_start;

Monthly buckets

SELECT  DATE_TRUNC('month', o.order_date) AS month_start, SUM(o.order_amount) AS revenue, COUNT(DISTINCT o.customer_id) AS customers FROM orders o GROUP BY DATE_TRUNC('month', o.order_date) ORDER BY month_start;

Practical QA: ensure your bucket column is stable

Always group by the exact expression you select (or by its alias if your database allows it). If you group by the raw date but select a truncated month, you will get multiple rows per month.

-- Incorrect: grouping by order_date creates many rows per month SELECT  DATE_TRUNC('month', o.order_date) AS month_start, SUM(o.order_amount) AS revenue FROM orders o GROUP BY o.order_date;

Grouping by business dimensions (region, product category)

Dimensions are descriptive attributes used to slice metrics. Common dashboard dimensions include region, product category, channel, and customer segment.

One dimension: revenue by product category

SELECT  p.category, SUM(oi.quantity * oi.unit_price) AS revenue FROM order_items oi JOIN products p ON p.product_id = oi.product_id GROUP BY p.category ORDER BY revenue DESC;

Multiple dimensions: monthly revenue by region and category

SELECT  DATE_TRUNC('month', o.order_date) AS month_start, o.region, p.category, SUM(oi.quantity * oi.unit_price) AS revenue FROM orders o JOIN order_items oi ON oi.order_id = o.order_id JOIN products p ON p.product_id = oi.product_id GROUP BY  DATE_TRUNC('month', o.order_date), o.region, p.category ORDER BY month_start, o.region, p.category;

Grain check: what does one row mean?

In the query above, one row represents exactly one (month_start, region, category) combination. If you later add product_id to the select list, you must decide whether you want to change the grain (group by product_id too) or keep the current grain (do not select product_id).

Filtering aggregated results with HAVING

WHERE filters rows before grouping. HAVING filters groups after metrics are computed. Use HAVING for conditions on aggregated values, such as “only show regions with at least 100 orders”.

Example: keep only high-volume regions

SELECT  o.region, COUNT(*) AS orders, SUM(o.order_amount) AS revenue FROM orders o GROUP BY o.region HAVING COUNT(*) >= 100 ORDER BY revenue DESC;

Example: monthly buckets with minimum revenue threshold

SELECT  DATE_TRUNC('month', o.order_date) AS month_start, SUM(o.order_amount) AS revenue FROM orders o GROUP BY DATE_TRUNC('month', o.order_date) HAVING SUM(o.order_amount) >= 50000 ORDER BY month_start;

Combine WHERE and HAVING intentionally

A common reporting need is: filter the input dataset to the relevant scope (using WHERE), then filter the summarized output (using HAVING).

SELECT  o.region, DATE_TRUNC('month', o.order_date) AS month_start, SUM(o.order_amount) AS revenue FROM orders o WHERE o.order_date >= DATE '2025-01-01' GROUP BY o.region, DATE_TRUNC('month', o.order_date) HAVING SUM(o.order_amount) > 0;

Practical checklist to avoid GROUP BY reporting bugs

1) Confirm the intended grain before writing SQL

  • Write it in words: “One row per ____.”
  • List the dimension columns that define that row.

2) Keep SELECT clean: only dimensions + metrics

  • If a column is not a dimension, it should not appear unless aggregated.
  • If you need a descriptive label (like category name), ensure it is functionally determined by your grouped keys (or include it in the group by).

3) Group by the same expressions you select

  • Especially for time buckets: group by the truncated date expression, not the raw timestamp.

4) Validate uniqueness of dimension combinations

  • Row count should match the number of distinct dimension combinations.
  • If it does not, revisit your dimensions or upstream joins/filters.

5) Use HAVING only for aggregated conditions

  • Use WHERE for row-level filters (date ranges, status, region scope).
  • Use HAVING for group-level filters (minimum orders, minimum revenue).

Now answer the exercise about the content:

When building a monthly revenue report, which approach best prevents getting multiple rows per month?

You are right! Congratulations, now go to the next page

You missed! Try again.

Time series reporting needs stable buckets. If you select a truncated month, you should group by that same expression; grouping by the raw date creates many rows within the same month.

Next chapter

SQL for Business Intelligence: Working with NULLs and Data Quality Edge Cases

Arrow Right Icon
Free Ebook cover SQL for Business Intelligence: From Zero to Confident Querying
64%

SQL for Business Intelligence: From Zero to Confident Querying

New course

11 pages

Download the app to earn free Certification and listen to the courses in the background, even with the screen off.