Why “missing matches” matter after you already chose the right join
Once you use an outer join (most often a LEFT JOIN), you have explicitly decided to keep rows even when the other side has no match. The next challenge is what to do with the columns that become NULL on the non-matching side. If you ignore those NULLs, you can end up with confusing reports (blank labels, missing totals) or, worse, accidentally undo the outer join by filtering in the wrong place.
This chapter focuses on three practical skills that come up constantly in real SQL work:
- Providing defaults for missing matches (display defaults and calculation defaults).
- Using COALESCE (and related expressions) correctly and intentionally.
- Filtering outer-joined queries without turning them into accidental inner joins.
Recognizing “NULL because missing match” vs “NULL because the data is NULL”
In an outer join result, NULLs can appear for two different reasons:
- NULL from the base table: the column is actually NULL in the stored row.
- NULL from the join: the row on the other side didn’t exist, so every selected column from that table is NULL in the result.
These two cases can look identical in the output but often mean different things. For example, if you join customers to orders and see NULL in orders.order_id, that usually means “no order exists.” But if you see NULL in customers.email, that might mean “email not provided.”
When you apply defaults, be clear about which kind of NULL you are addressing. Defaulting an email to 'unknown' might be acceptable for display, but defaulting an order_id to 0 can be misleading because it suggests an order exists.
Continue in our app.
You can listen to the audiobook with the screen off, receive a free certificate for this course, and also have access to 5,000 other free online courses.
Or continue reading below...Download the app
Defaults for display: making reports readable without changing meaning
For reporting, you often want a human-friendly placeholder when a match is missing. Typical examples:
- Show
'No order'when a customer has none. - Show
'Unassigned'when an employee has no manager. - Show
0for a missing aggregate like “total spend” (because “no rows” should behave like zero in many business contexts).
The key is to default in the SELECT list (or in a presentation layer) rather than altering join logic. This preserves the correct row set and makes the output clearer.
Example: customers with optional orders (display defaults)
SELECT c.customer_id, c.name, COALESCE(o.status, 'No order') AS order_status, o.order_id FROM customers c LEFT JOIN orders o ON o.customer_id = c.customer_id;Here, customers without orders still appear. For those rows, o.status is NULL because there is no matching order; COALESCE replaces it with 'No order' for readability.
Be careful: defaulting o.order_id to 0 might be confusing. It’s usually better to leave identifiers as NULL and default only descriptive fields.
COALESCE: the workhorse for defaults
COALESCE(a, b, c, ...) returns the first non-NULL value from the list. It is standard SQL and widely supported. You can use it for:
- Display defaults: show a placeholder when a column is NULL.
- Fallback logic: choose a value from multiple sources (e.g., shipping address vs billing address).
- Calculation defaults: treat NULL as 0 for arithmetic, when that matches the business meaning.
Step-by-step: using COALESCE for a fallback chain
Scenario: you want a contact email for each customer. Prefer customers.primary_email, but if it’s NULL, use customers.secondary_email, and if that’s also NULL, show 'no-email@invalid' for display.
SELECT customer_id, COALESCE(primary_email, secondary_email, 'no-email@invalid') AS contact_email FROM customers;Step-by-step reasoning:
- If
primary_emailis not NULL, you get it. - Otherwise, if
secondary_emailis not NULL, you get it. - Otherwise, you get the placeholder.
This pattern is especially useful after joins when you want to prefer a value from one table but fall back to another.
COALESCE for numeric calculations: when NULL should behave like 0
Aggregates and arithmetic are common places where missing matches cause NULLs. For example, if you LEFT JOIN customers to payments and compute total paid, customers with no payments might produce NULL totals (depending on how you write the query). Often, you want 0 instead.
SELECT c.customer_id, c.name, COALESCE(SUM(p.amount), 0) AS total_paid FROM customers c LEFT JOIN payments p ON p.customer_id = c.customer_id GROUP BY c.customer_id, c.name;Important nuance: SUM(p.amount) returns NULL when there are no matching payment rows (because there are no values to sum). Wrapping it in COALESCE(..., 0) makes “no payments” show as 0, which is usually the desired business interpretation.
Defaults in join outputs: choosing between COALESCE and CASE
COALESCE is ideal when “NULL means missing” and you simply want a fallback value. Sometimes you need more explicit logic, such as distinguishing between “missing match” and “matched row with NULL in a column.” In those cases, a CASE expression can be clearer.
Example: distinguish missing match from a real NULL value
Suppose orders.status can be NULL even for real orders (maybe status not set yet). You want:
'No order'when there is no order row.'Pending'when an order exists but status is NULL.- Otherwise, the actual status.
SELECT c.customer_id, c.name, CASE WHEN o.order_id IS NULL THEN 'No order' WHEN o.status IS NULL THEN 'Pending' ELSE o.status END AS order_status FROM customers c LEFT JOIN orders o ON o.customer_id = c.customer_id;This uses o.order_id IS NULL as the “missing match” indicator, because if there is no joined row, all o.* columns are NULL. Using a key column (like order_id) is typically the most reliable way to detect missing matches.
Filtering without accidental inner joins: the most common outer join bug
A frequent mistake is to write a LEFT JOIN and then add a WHERE filter on a column from the right table. This can eliminate the NULL-extended rows and effectively turn the query into an inner join, even though the SQL still says LEFT JOIN.
Conceptually, if you filter on o.status = 'PAID' in the WHERE clause, rows where o.status is NULL (i.e., customers with no order) will not satisfy the condition, so they are removed. That defeats the purpose of the outer join.
Example of the accidental inner join
SELECT c.customer_id, c.name, o.order_id, o.status FROM customers c LEFT JOIN orders o ON o.customer_id = c.customer_id WHERE o.status = 'PAID';Despite the LEFT JOIN, this returns only customers who have a matching order with status PAID. Customers with no orders are removed because o.status is NULL and fails the WHERE predicate.
Correct patterns for filtering while preserving unmatched rows
There are several correct patterns. Which one you choose depends on the business question you’re answering.
Pattern 1: Put the filter in the ON clause (keep unmatched rows)
If you want to keep all customers, but only attach orders that meet a condition, move the condition into the join predicate.
SELECT c.customer_id, c.name, o.order_id, o.status FROM customers c LEFT JOIN orders o ON o.customer_id = c.customer_id AND o.status = 'PAID';Now the meaning is: “Return all customers; join only their PAID orders if they exist.” Customers without PAID orders still appear, with NULLs in o.*.
Step-by-step reasoning:
- Start with all customers.
- Try to match orders by customer_id.
- Additionally require the order to be PAID to count as a match.
- If no PAID order exists, the customer row remains with NULLs for order columns.
Pattern 2: Keep the WHERE filter but explicitly allow NULLs (keep unmatched rows)
Sometimes you want to filter on a right-table column but still keep unmatched rows. You can do that by allowing the NULL-extended rows through.
SELECT c.customer_id, c.name, o.order_id, o.status FROM customers c LEFT JOIN orders o ON o.customer_id = c.customer_id WHERE o.status = 'PAID' OR o.order_id IS NULL;This keeps customers with PAID orders and customers with no orders. However, it can be less readable and easier to get wrong, especially when multiple conditions are involved. Prefer the ON-clause pattern when the intent is “filter which rows match,” not “filter which left rows appear.”
Pattern 3: Filter after aggregation (keep left rows, compute conditional metrics)
Often you don’t actually need to filter rows; you need conditional totals or counts. Instead of filtering the joined rows away, compute conditional aggregates.
Example: count paid orders per customer, but still show customers with zero paid orders.
SELECT c.customer_id, c.name, COALESCE(SUM(CASE WHEN o.status = 'PAID' THEN 1 ELSE 0 END), 0) AS paid_order_count FROM customers c LEFT JOIN orders o ON o.customer_id = c.customer_id GROUP BY c.customer_id, c.name;This approach avoids accidental inner joins entirely because you are not filtering out rows; you are counting selectively.
Filtering for “missing matches” intentionally (anti-join style)
Another common task is to find rows that do not have a match (e.g., customers with no orders, products never sold, employees with no manager record). With a LEFT JOIN, you can filter for missing matches by checking a key column from the right table for NULL.
Example: customers with no orders
SELECT c.customer_id, c.name FROM customers c LEFT JOIN orders o ON o.customer_id = c.customer_id WHERE o.order_id IS NULL;This is not an accidental inner join; it’s the opposite. You are intentionally selecting only the unmatched rows.
Step-by-step:
- Start with all customers.
- Attempt to match orders.
- Keep only those where the order key is NULL, meaning no match was found.
Tip: use a non-nullable key column from the right table (like order_id) for the NULL check. Checking a nullable attribute (like o.status) can misclassify rows where an order exists but the attribute is NULL.
Defaults and filtering together: common real-world scenarios
Scenario A: show all customers, label segment, and filter to a region
Imagine customers optionally match a segments table (some customers not segmented yet). You want to show all customers in region 'West' and display 'Unsegmented' when there is no segment match.
SELECT c.customer_id, c.name, c.region, COALESCE(s.segment_name, 'Unsegmented') AS segment_name FROM customers c LEFT JOIN segments s ON s.segment_id = c.segment_id WHERE c.region = 'West';Filtering on the left table (c.region) is safe: it does not remove unmatched segment rows because the customer row is still present.
Scenario B: show all customers, attach only “active” subscriptions, and compute a boolean flag
Suppose customers may have subscriptions, and you only care about active ones. You want all customers, plus a flag indicating whether they currently have an active subscription.
SELECT c.customer_id, c.name, CASE WHEN s.subscription_id IS NULL THEN 0 ELSE 1 END AS has_active_subscription FROM customers c LEFT JOIN subscriptions s ON s.customer_id = c.customer_id AND s.status = 'ACTIVE';Here the filter belongs in the ON clause because it determines what counts as a match. The CASE then turns “missing match” into a 0/1 indicator.
Scenario C: show all products, but only include products whose latest price is above a threshold (without losing products with no price)
Sometimes the business rule is subtle: you might want to keep products even if they have no price record yet, but if they do have a price, you only want those above a threshold. That’s a classic case where a naive WHERE filter breaks the outer join.
SELECT p.product_id, p.name, pr.price FROM products p LEFT JOIN product_prices pr ON pr.product_id = p.product_id AND pr.is_current = 1 WHERE pr.price >= 100 OR pr.product_id IS NULL;This keeps products with current price ≥ 100 and products with no current price row. If you instead put WHERE pr.price >= 100 alone, you would drop products with no price row.
Depending on your SQL dialect and data model, you might prefer moving the threshold into the ON clause and then filtering on the left side only, but note the difference in meaning: putting pr.price >= 100 in ON means “match only expensive prices,” which will show cheap-priced products as if they had no price (NULL). That may or may not be acceptable. The OR pattern above distinguishes “cheap price exists” from “no price exists” by keeping only expensive or missing.
COALESCE in predicates: powerful but easy to misuse
You can use COALESCE inside a WHERE clause, but be careful: it can change the meaning of NULLs and can hide the difference between “missing match” and “real NULL.”
Example: treating missing status as a specific value
SELECT c.customer_id, c.name, o.status FROM customers c LEFT JOIN orders o ON o.customer_id = c.customer_id WHERE COALESCE(o.status, 'NO_ORDER') IN ('PAID', 'NO_ORDER');This keeps customers with PAID orders and customers with no orders. It reads nicely, but it also treats orders with NULL status as 'NO_ORDER', which may be wrong. If NULL status means “unknown yet,” you just mixed it with “no order exists.” In that case, prefer explicit checks like o.order_id IS NULL for missing matches.
Practical checklist: avoid surprises with missing matches
- Default in SELECT, not in join logic, when your goal is readability (labels like 'Unassigned', 'No order').
- Use COALESCE for simple fallback; use
CASEwhen you must distinguish different kinds of NULL. - When filtering on right-table columns in an outer join, decide whether the filter defines what counts as a match (put it in ON) or whether it defines which left rows should be included (put it in WHERE, but then handle NULL-extended rows intentionally).
- To find missing matches, filter with
WHERE right_table.key IS NULLusing a non-nullable key column. - For metrics, prefer conditional aggregation over filtering away rows, so you can keep the full left-side population and compute zeros correctly.