Why GROUP BY matters for dashboards and recurring reports
Dashboards and recurring reports rarely show raw rows. They show summarized tables: revenue by month, orders by region, active customers by product category. GROUP BY is the tool that turns detailed data into a reporting table where each row represents a bucket (a dimension combination) and each column is a metric (an aggregated measure).
Think of a BI reporting table as: dimensions (the “by” columns) + metrics (the aggregated numbers). GROUP BY defines the grain (row-level meaning) of the result.
GROUP BY basics: building a reporting table
The rule that prevents SQL errors and logic bugs
In a grouped query, every selected column must be either:
- Included in the
GROUP BY(a dimension), or - Wrapped in an aggregate function (a metric).
If you select a non-aggregated column that is not grouped, many databases will throw an error. Some systems may allow it but return an arbitrary value, which creates silent logic bugs in BI outputs.
Example: sales by region
SELECT o.region, SUM(o.order_amount) AS revenue, COUNT(*) AS orders FROM orders o GROUP BY o.region;Here, region is the dimension (one row per region). revenue and orders are metrics.
- Listen to the audio with the screen off.
- Earn a certificate upon completion.
- Over 5000 courses for you to explore!
Download the app
Common mistake: selecting a non-grouped column
-- Incorrect: o.order_date is neither grouped nor aggregated SELECT o.region, o.order_date, SUM(o.order_amount) AS revenue FROM orders o GROUP BY o.region;Fix it by either grouping by the date (changing the grain) or removing it from the select list.
Step-by-step pattern for BI reporting tables
Step 1: Choose the dimensions (define the grain)
Ask: “What should one row represent?” Examples:
- One row per
monthandregion - One row per
product_category - One row per
week,channel, andcountry
Step 2: Define the metrics (aggregated measures)
Pick metrics that match the business question. Examples:
SUM(order_amount)as revenueCOUNT(*)as number of ordersCOUNT(DISTINCT customer_id)as unique customers
Step 3: Write the query with only grouped columns + metrics
SELECT o.region, DATE_TRUNC('month', o.order_date) AS month, SUM(o.order_amount) AS revenue, COUNT(*) AS orders FROM orders o GROUP BY o.region, DATE_TRUNC('month', o.order_date);Step 4: Verify the grain: one row per unique dimension combination
A quick QA check is to compare row counts:
- The result should have at most one row per (region, month).
- If you see duplicates, you likely grouped by the wrong columns or introduced duplication earlier in your pipeline.
You can validate by checking the number of distinct dimension combinations in the source data matches the number of rows in your grouped output.
-- Distinct dimension combinations in the raw data SELECT COUNT(*) AS distinct_region_months FROM ( SELECT DISTINCT o.region, DATE_TRUNC('month', o.order_date) AS month FROM orders o ) x;Grouping time series for reporting (day, week, month)
Time series dashboards depend on consistent time buckets. The key is to transform a timestamp/date into a bucket start (day/week/month) and group by that bucket.
Daily buckets
SELECT CAST(o.order_date AS DATE) AS day, SUM(o.order_amount) AS revenue FROM orders o GROUP BY CAST(o.order_date AS DATE) ORDER BY day;Weekly buckets
Many SQL dialects support DATE_TRUNC. If yours differs, use the equivalent function for week bucketing.
SELECT DATE_TRUNC('week', o.order_date) AS week_start, COUNT(*) AS orders FROM orders o GROUP BY DATE_TRUNC('week', o.order_date) ORDER BY week_start;Monthly buckets
SELECT DATE_TRUNC('month', o.order_date) AS month_start, SUM(o.order_amount) AS revenue, COUNT(DISTINCT o.customer_id) AS customers FROM orders o GROUP BY DATE_TRUNC('month', o.order_date) ORDER BY month_start;Practical QA: ensure your bucket column is stable
Always group by the exact expression you select (or by its alias if your database allows it). If you group by the raw date but select a truncated month, you will get multiple rows per month.
-- Incorrect: grouping by order_date creates many rows per month SELECT DATE_TRUNC('month', o.order_date) AS month_start, SUM(o.order_amount) AS revenue FROM orders o GROUP BY o.order_date;Grouping by business dimensions (region, product category)
Dimensions are descriptive attributes used to slice metrics. Common dashboard dimensions include region, product category, channel, and customer segment.
One dimension: revenue by product category
SELECT p.category, SUM(oi.quantity * oi.unit_price) AS revenue FROM order_items oi JOIN products p ON p.product_id = oi.product_id GROUP BY p.category ORDER BY revenue DESC;Multiple dimensions: monthly revenue by region and category
SELECT DATE_TRUNC('month', o.order_date) AS month_start, o.region, p.category, SUM(oi.quantity * oi.unit_price) AS revenue FROM orders o JOIN order_items oi ON oi.order_id = o.order_id JOIN products p ON p.product_id = oi.product_id GROUP BY DATE_TRUNC('month', o.order_date), o.region, p.category ORDER BY month_start, o.region, p.category;Grain check: what does one row mean?
In the query above, one row represents exactly one (month_start, region, category) combination. If you later add product_id to the select list, you must decide whether you want to change the grain (group by product_id too) or keep the current grain (do not select product_id).
Filtering aggregated results with HAVING
WHERE filters rows before grouping. HAVING filters groups after metrics are computed. Use HAVING for conditions on aggregated values, such as “only show regions with at least 100 orders”.
Example: keep only high-volume regions
SELECT o.region, COUNT(*) AS orders, SUM(o.order_amount) AS revenue FROM orders o GROUP BY o.region HAVING COUNT(*) >= 100 ORDER BY revenue DESC;Example: monthly buckets with minimum revenue threshold
SELECT DATE_TRUNC('month', o.order_date) AS month_start, SUM(o.order_amount) AS revenue FROM orders o GROUP BY DATE_TRUNC('month', o.order_date) HAVING SUM(o.order_amount) >= 50000 ORDER BY month_start;Combine WHERE and HAVING intentionally
A common reporting need is: filter the input dataset to the relevant scope (using WHERE), then filter the summarized output (using HAVING).
SELECT o.region, DATE_TRUNC('month', o.order_date) AS month_start, SUM(o.order_amount) AS revenue FROM orders o WHERE o.order_date >= DATE '2025-01-01' GROUP BY o.region, DATE_TRUNC('month', o.order_date) HAVING SUM(o.order_amount) > 0;Practical checklist to avoid GROUP BY reporting bugs
1) Confirm the intended grain before writing SQL
- Write it in words: “One row per ____.”
- List the dimension columns that define that row.
2) Keep SELECT clean: only dimensions + metrics
- If a column is not a dimension, it should not appear unless aggregated.
- If you need a descriptive label (like category name), ensure it is functionally determined by your grouped keys (or include it in the group by).
3) Group by the same expressions you select
- Especially for time buckets: group by the truncated date expression, not the raw timestamp.
4) Validate uniqueness of dimension combinations
- Row count should match the number of distinct dimension combinations.
- If it does not, revisit your dimensions or upstream joins/filters.
5) Use HAVING only for aggregated conditions
- Use
WHEREfor row-level filters (date ranges, status, region scope). - Use
HAVINGfor group-level filters (minimum orders, minimum revenue).