Why NULLs and edge cases matter in BI
In BI, small data quality issues can quietly distort KPIs. Two common sources are NULL values (unknown or missing) and edge cases like divide-by-zero. SQL treats NULL differently than “0”, “empty string”, or “Unknown”, so you need explicit handling to keep calculations stable and labels consistent.
What NULL means (and what it does not)
- NULL = missing/unknown/not applicable. It is not a value you can safely treat as zero unless the business definition says so.
- NULL propagates through most expressions: if any input is NULL, the result is often NULL.
- NULL is not equal to anything, including another NULL; comparisons require special handling (covered elsewhere). Here we focus on calculation and reporting impacts.
How NULL impacts arithmetic and KPI math
Arithmetic with NULL: why totals become NULL
If any term is NULL, the whole arithmetic expression can become NULL. This commonly breaks derived measures like revenue = price * quantity, or margin = revenue - cost.
-- Example: line item math with missing values (NULLs propagate) SELECT order_id, unit_price, quantity, unit_price * quantity AS line_revenue FROM order_lines;If quantity is NULL for a row, line_revenue becomes NULL, which may later disappear from sums or create unexpected gaps.
Aggregations and NULL: what gets counted and summed
Aggregations typically ignore NULL inputs, which can be helpful or dangerous depending on meaning.
SUM(column)ignores NULL rows (treats them as “no contribution”).AVG(column)ignores NULL rows (averages only known values).COUNT(column)counts only non-NULL values.COUNT(*)counts rows regardless of NULLs.
-- Spot the difference between COUNT(*) and COUNT(column) SELECT COUNT(*) AS total_rows, COUNT(discount_amount) AS rows_with_discount_value FROM order_lines;BI pitfall: if discount_amount is NULL when “no discount” and also NULL when “unknown discount”, then AVG(discount_amount) and COUNT(discount_amount) can misrepresent reality. Decide whether NULL means “0” or “unknown” and encode accordingly.
- Listen to the audio with the screen off.
- Earn a certificate upon completion.
- Over 5000 courses for you to explore!
Download the app
Using COALESCE to set defaults (without changing meaning)
COALESCE(a, b, c) returns the first non-NULL value. In BI, it’s used to provide a reporting-friendly default while keeping logic explicit.
Step-by-step: defaulting numeric NULLs for arithmetic
Goal: prevent NULL propagation in arithmetic when the business meaning supports a default.
- Step 1: Identify which fields can safely default to 0 (example:
discount_amountwhere NULL truly means “no discount recorded”). - Step 2: Apply
COALESCEat calculation time, not by overwriting raw data. - Step 3: Keep the raw column available for QA and auditing.
-- If NULL discount means “no discount”, default to 0 for net revenue math SELECT order_id, line_id, unit_price, quantity, COALESCE(discount_amount, 0) AS discount_amount_filled, (unit_price * quantity) - COALESCE(discount_amount, 0) AS net_revenue FROM order_lines;Important: if NULL means “unknown discount”, defaulting to 0 would understate discounts and overstate net revenue. In that case, keep NULL and handle reporting differently (for example, separate “Unknown” bucket or exclude from certain KPIs).
Defaulting text NULLs for consistent labels
Dimensions often have missing attributes (for example, region not captured). If you leave them NULL, reports may show blank labels or group them inconsistently. A consistent label improves readability and prevents “blank” from being interpreted as a real category.
-- Consistent labeling for missing dimension values SELECT customer_id, COALESCE(region, 'Unknown') AS region_label FROM customers;Choose one standard label (e.g., 'Unknown' or 'Unassigned') and use it everywhere so dashboards don’t split the same concept across multiple spellings.
Safe division: avoid divide-by-zero and misleading rates
Rates and ratios are common BI metrics (conversion rate, margin %, average order value). Two edge cases break them: denominator = 0 and denominator = NULL. You want a predictable output: either NULL (unknown/undefined) or 0 (only if the business definition says so).
Step-by-step: implement safe division with NULLIF
- Step 1: Convert zero denominators to NULL using
NULLIF(denominator, 0). - Step 2: Divide by that result; division by NULL yields NULL instead of an error.
- Step 3: Optionally apply
COALESCEto display a default, but only if that matches business meaning.
-- Safe division: returns NULL when orders = 0 (undefined rate) SELECT region, returned_orders, total_orders, returned_orders * 1.0 / NULLIF(total_orders, 0) AS return_rate FROM region_order_stats;If your dashboard requires a numeric value, decide explicitly how to display undefined rates. For example, you might keep NULL (so BI tools can show “N/A”), or you might display 0 only when “no orders” should be treated as “0% return rate” (often it should not).
-- Optional display default (use with caution) SELECT region, COALESCE(returned_orders * 1.0 / NULLIF(total_orders, 0), 0) AS return_rate_display FROM region_order_stats;Missing dimension values: “unknown region” and consistent bucketing
Missing dimension values can come from incomplete customer profiles, late-arriving dimension records, or imperfect joins. In reporting, you typically want these rows to remain visible (so you can quantify the issue) while keeping categories stable.
Pattern: label unknowns consistently in the reporting layer
-- Example: sales by region with consistent unknown label SELECT COALESCE(c.region, 'Unknown') AS region_label, SUM(o.order_amount) AS revenue FROM orders o JOIN customers c ON o.customer_id = c.customer_id GROUP BY COALESCE(c.region, 'Unknown');Practical tip: use the exact same expression in both SELECT and the grouping clause so that all NULL regions roll into one bucket.
Pattern: distinguish “missing” vs “not applicable” when needed
Sometimes NULL has multiple meanings. If you can detect the reason, label it differently to avoid mixing issues.
-- Example: differentiate missing region vs not applicable SELECT CASE WHEN c.customer_type = 'B2B' AND c.region IS NULL THEN 'Not Applicable' WHEN c.region IS NULL THEN 'Unknown' ELSE c.region END AS region_label, SUM(o.order_amount) AS revenue FROM orders o JOIN customers c ON o.customer_id = c.customer_id GROUP BY CASE WHEN c.customer_type = 'B2B' AND c.region IS NULL THEN 'Not Applicable' WHEN c.region IS NULL THEN 'Unknown' ELSE c.region END;QA checklist: NULL rates, meaning checks, and documentation
1) Check NULL rates per column (profiling for BI readiness)
Before you decide on defaults, measure how often NULL occurs. High NULL rates can indicate upstream collection issues or broken pipelines.
-- Null rate per column (example for a few key fields) SELECT COUNT(*) AS total_rows, SUM(CASE WHEN region IS NULL THEN 1 ELSE 0 END) AS region_nulls, SUM(CASE WHEN region IS NULL THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS region_null_rate, SUM(CASE WHEN discount_amount IS NULL THEN 1 ELSE 0 END) AS discount_nulls, SUM(CASE WHEN discount_amount IS NULL THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS discount_null_rate FROM customers;Repeat this pattern for critical fields used in KPIs: revenue components, status fields, dates, and key dimensions.
2) Confirm defaulting does not change business meaning
Defaulting NULLs can silently change KPIs. Validate by comparing metrics with and without defaulting, and by isolating the affected population.
-- Compare revenue with and without defaulting discount NULLs SELECT SUM((unit_price * quantity) - discount_amount) AS net_revenue_raw, SUM((unit_price * quantity) - COALESCE(discount_amount, 0)) AS net_revenue_filled, SUM(CASE WHEN discount_amount IS NULL THEN (unit_price * quantity) ELSE 0 END) AS revenue_from_null_discount_rows FROM order_lines;- If
net_revenue_rawis much lower due to NULL propagation, you may be losing rows in the calculation. - If
net_revenue_filledincreases materially, confirm that NULL truly means “0 discount” and not “unknown discount”.
3) Validate safe division outputs (no hidden inflation)
When using safe division, ensure you’re not turning undefined rates into 0 without noticing. Track how many rows have zero/NULL denominators.
-- QA for denominator issues SELECT SUM(CASE WHEN total_orders IS NULL THEN 1 ELSE 0 END) AS denom_null_rows, SUM(CASE WHEN total_orders = 0 THEN 1 ELSE 0 END) AS denom_zero_rows, COUNT(*) AS total_rows FROM region_order_stats;4) Document assumptions for downstream reporting
Every time you replace NULL with a default or label unknowns, write down the assumption so downstream users interpret dashboards correctly. Keep documentation close to the SQL (comments) and in your metric definitions.
-- Assumptions: -- 1) region NULL is labeled as 'Unknown' for reporting consistency. -- 2) discount_amount NULL is treated as 0 ONLY because upstream system stores no-discount as NULL. -- 3) return_rate is NULL when total_orders = 0 (undefined), not forced to 0. SELECT COALESCE(region, 'Unknown') AS region_label, SUM(order_amount) AS revenue FROM orders_enriched GROUP BY COALESCE(region, 'Unknown');