All courses > Technology and Programming > Databases ::

Join Conditions: ON vs WHERE, Multi-Column Keys, and Null-Safe Thinking

Capítulo 7

Estimated reading time: 14 minutes

Why Join Conditions Matter More Than the Join Type

Once you know which join type you need, the next source of bugs is usually not the join type itself, but the join condition: where you place it, how many columns it uses, and how it behaves with NULLs. This chapter focuses on three practical areas: (1) putting predicates in ON vs WHERE, (2) joining with multi-column keys (composite keys), and (3) thinking “null-safely” so that missing values do not silently change your result set.

Throughout, assume a typical SQL engine (PostgreSQL, MySQL, SQL Server, etc.). Some details vary by database, but the core ideas are portable.

ON vs WHERE: Two Places to Filter, Two Different Meanings

In a query with joins, you can filter rows in two main places:

ON clause: defines how rows from two tables match (and can also include additional restrictions on the matched rows).
WHERE clause: filters the rows after the join has been formed.

For inner joins, moving a predicate between ON and WHERE often produces the same final rows (not always, but often). For outer joins, it can completely change the meaning, because outer joins intentionally keep unmatched rows. A WHERE filter can remove those preserved rows, effectively turning an outer join into an inner join.

Example Data Model

Use a simple scenario with customers and orders. Customers may have no orders; orders may have a status.

Continue in our app.

Listen to the audio with the screen off.
Earn a certificate upon completion.
Over 5000 courses for you to explore!

Or continue reading below...

Download the app

customers(customer_id, name, country_code) orders(order_id, customer_id, status, created_at, total_amount)

Filtering the Joined Table: Put It in ON to Preserve Unmatched Rows

Suppose you want a list of all customers, plus their “PAID” orders if they have any. Customers without paid orders should still appear.

Correct approach (filter in ON):

SELECT c.customer_id, c.name, o.order_id, o.status, o.total_amount FROM customers c LEFT JOIN orders o   ON o.customer_id = c.customer_id  AND o.status = 'PAID';

What happens:

The LEFT JOIN keeps every customer row.
The ON clause matches only orders that belong to the customer and are PAID.
If a customer has no PAID orders, the order columns are NULL, but the customer row remains.

Common mistake (filter in WHERE):

SELECT c.customer_id, c.name, o.order_id, o.status, o.total_amount FROM customers c LEFT JOIN orders o   ON o.customer_id = c.customer_id WHERE o.status = 'PAID';

This removes customers with no orders (or no PAID orders), because for those rows o.status is NULL, and WHERE o.status = 'PAID' evaluates to unknown/false. The result behaves like an inner join with respect to PAID orders.

Step-by-Step: How the Database Thinks About It

It helps to reason in phases:

Phase 1: FROM + JOIN + ON builds a joined row set. For a LEFT JOIN, unmatched left rows are kept with NULLs on the right side.
Phase 2: WHERE removes rows that do not satisfy the predicate.

So if you want to keep unmatched rows, avoid putting conditions on the “optional” side in WHERE unless you explicitly handle NULLs.

When It’s Fine to Use WHERE with a LEFT JOIN

Sometimes you actually want to remove unmatched rows, even if you wrote a LEFT JOIN for convenience or readability. In that case, a WHERE predicate on the right table is intentional.

Example: “Show customers who have at least one PAID order.”

SELECT DISTINCT c.customer_id, c.name FROM customers c LEFT JOIN orders o   ON o.customer_id = c.customer_id WHERE o.status = 'PAID';

This is logically valid, but many teams prefer to express it as an inner join to make intent obvious:

SELECT DISTINCT c.customer_id, c.name FROM customers c JOIN orders o   ON o.customer_id = c.customer_id WHERE o.status = 'PAID';

Either way, the key is that the WHERE clause is now intentionally filtering out customers without a matching PAID order.

Filtering the Primary Table: WHERE Is Usually the Right Place

If you filter columns from the preserved (left) table in a LEFT JOIN, putting the filter in WHERE typically does not change the “outer” behavior, because those rows exist regardless of matches.

Example: “All US customers, plus their PAID orders if any.”

SELECT c.customer_id, c.name, o.order_id FROM customers c LEFT JOIN orders o   ON o.customer_id = c.customer_id  AND o.status = 'PAID' WHERE c.country_code = 'US';

This keeps only US customers, and still preserves US customers with no PAID orders.

Multi-Column Keys: Joining on Composite Identifiers

Not every relationship is a single-column foreign key. Many real schemas use composite keys: the identity of a row is defined by multiple columns together. Joining on only one of the columns can create duplicates, false matches, or missing matches.

Typical Composite-Key Scenarios

Line items: (order_id, line_number) identifies a line in an order.
Time series: (device_id, reading_timestamp) identifies a reading.
Localization: (product_id, locale) identifies a translated description.
Multi-tenant systems: (tenant_id, entity_id) identifies an entity unique within a tenant.

Example: Order Lines Joined to Shipments

Assume:

order_lines(order_id, line_no, sku, qty) shipments(order_id, line_no, shipped_qty, shipped_at)

Each shipment record refers to a specific line. If you join only on order_id, every line will match every shipment for that order, multiplying rows.

Incorrect join (missing part of the key):

SELECT l.order_id, l.line_no, l.sku, s.shipped_qty FROM order_lines l LEFT JOIN shipments s   ON s.order_id = l.order_id;

If an order has 3 lines and 2 shipment rows, you may get 6 joined rows, which is wrong for most reporting needs.

Correct join (full composite key):

SELECT l.order_id, l.line_no, l.sku, s.shipped_qty FROM order_lines l LEFT JOIN shipments s   ON s.order_id = l.order_id  AND s.line_no  = l.line_no;

Step-by-Step Checklist for Composite Joins

When you suspect a composite key, use this checklist:

Identify the natural key: Which columns together uniquely identify a row in each table?
Confirm uniqueness: If you can, verify with a constraint or by checking duplicates.
Join on all key columns: Include every column needed to uniquely match.
Watch data types and formatting: A join on CHAR(10) vs VARCHAR or different casing can cause mismatches.

Composite Keys and Additional Predicates

Sometimes you need both: match on the composite key and apply a filter to the joined table. Keep the key equality predicates and the “optional-side filters” together in ON when you want to preserve unmatched rows.

Example: “Show all order lines, plus shipments that happened in 2026.”

SELECT l.order_id, l.line_no, l.sku, s.shipped_qty, s.shipped_at FROM order_lines l LEFT JOIN shipments s   ON s.order_id = l.order_id  AND s.line_no  = l.line_no  AND s.shipped_at >= DATE '2026-01-01'  AND s.shipped_at <  DATE '2027-01-01';

This keeps all lines even if they have no shipments in 2026.

Multi-Column Joins and Performance Notes

Composite joins can be fast if indexed appropriately. Practical guidance:

Index the foreign key columns in the child table in the same order you join on them (database-specific, but generally helpful).
Avoid applying functions to join columns (e.g., UPPER(col)) because it can prevent index use; normalize data instead or use computed indexes if supported.
If you must join on expressions, consider creating a persisted computed column (SQL Server) or generated column (MySQL) and indexing it.

Null-Safe Thinking: NULL Is Not a Value

NULL represents “unknown” or “missing,” and it behaves differently from normal values. The most important rule: NULL = NULL is not true; it is unknown. This affects joins, filters, and anti-joins.

How NULL Affects Equality in Join Conditions

Consider a table of contacts and a table of email preferences. Some contacts may have NULL email addresses.

contacts(contact_id, email) email_preferences(email, marketing_opt_in)

If you join on email:

SELECT c.contact_id, c.email, p.marketing_opt_in FROM contacts c LEFT JOIN email_preferences p   ON p.email = c.email;

Rows where c.email is NULL will not match any preference row, even if p.email is also NULL (and typically it should not be NULL, but data happens). The result will show NULLs on the preference side.

This is usually correct: a missing email should not “match” another missing email. But you must decide intentionally.

Null-Safe Equality (Database-Specific Options)

Sometimes you do want NULLs to match (for example, a composite key where one component can be NULL but still participates in identity, or when comparing “optional attributes”). Standard SQL provides IS NOT DISTINCT FROM (supported by PostgreSQL and some others) to treat NULLs as equal.

PostgreSQL-style null-safe join:

SELECT a.id, a.optional_code, b.description FROM table_a a LEFT JOIN table_b b   ON a.optional_code IS NOT DISTINCT FROM b.optional_code;

In MySQL, a common null-safe equality operator is <=>:

SELECT a.id, a.optional_code, b.description FROM table_a a LEFT JOIN table_b b   ON a.optional_code <=> b.optional_code;

In SQL Server, you often emulate null-safe equality with explicit logic:

SELECT a.id, a.optional_code, b.description FROM table_a a LEFT JOIN table_b b   ON (a.optional_code = b.optional_code)   OR (a.optional_code IS NULL AND b.optional_code IS NULL);

Use these patterns only when matching NULL-to-NULL is truly desired; otherwise you can accidentally create matches that hide missing data problems.

NULLs in Multi-Column Keys: Decide Whether NULL Is Allowed

Composite keys and NULLs are a risky combination. If a column participates in a key, it is often best modeled as NOT NULL. If the schema allows NULL anyway, you must decide how to treat it in joins:

Strict matching: use normal equality; NULL never matches, so rows with NULL in key parts will remain unmatched.
Null-safe matching: use IS NOT DISTINCT FROM / <=> / explicit OR logic to allow NULL-to-NULL matches.

Be consistent across queries; inconsistent null handling is a common source of “why does this report disagree with that report?” issues.

WHERE Predicates and NULL: Three-Valued Logic

SQL uses three-valued logic: true, false, and unknown. Any comparison with NULL (except IS NULL / IS NOT NULL) yields unknown. In WHERE, unknown behaves like false (the row is filtered out).

Example: “Find orders not shipped.” If shipped_at is NULL when not shipped:

SELECT order_id FROM orders WHERE shipped_at = NULL;

This returns no rows, because = NULL is never true. Correct:

SELECT order_id FROM orders WHERE shipped_at IS NULL;

Null-Safe Filtering After a LEFT JOIN

If you must filter on the right table in WHERE while keeping unmatched rows, you typically need to include an explicit NULL allowance.

Example: “All customers, and if they have orders, show only PAID orders; but still keep customers with no orders.”

One approach is to keep the filter in ON (usually simplest). If you cannot (for example, because of a complex OR condition you want to centralize), you can do:

SELECT c.customer_id, c.name, o.order_id, o.status FROM customers c LEFT JOIN orders o   ON o.customer_id = c.customer_id WHERE o.status = 'PAID'    OR o.order_id IS NULL;

The OR o.order_id IS NULL keeps the unmatched customer rows. Choose a right-table column that is guaranteed non-NULL when a match exists (often the primary key).

Practical Patterns and Pitfalls

Pattern: Put “Optional-Side” Filters in ON

Rule of thumb for outer joins: if a predicate refers to columns from the table that may be missing (the nullable side of the join), put it in ON when you want to preserve unmatched rows.

-- Keep all products, attach only active promotions SELECT p.product_id, p.name, pr.promo_id FROM products p LEFT JOIN promotions pr   ON pr.product_id = p.product_id  AND pr.is_active = 1;

Pitfall: Turning a LEFT JOIN into an INNER JOIN Accidentally

Any WHERE predicate that requires a non-NULL value from the right table will remove unmatched rows. Common examples:

WHERE right_table.some_column = 'X'
WHERE right_table.some_column > 0
WHERE right_table.some_column IN (...)

If you truly need such a filter, decide whether you intended an inner join, or add explicit NULL-handling logic.

Pattern: Join on the Full Business Key, Not Just “What Seems to Work”

If you see duplicated rows after a join, suspect an incomplete join condition. Before adding DISTINCT as a quick fix, verify whether you joined on all necessary columns.

Example with localization:

products(product_id, sku) product_names(product_id, locale, name)

Correct join includes locale:

SELECT p.product_id, pn.name FROM products p LEFT JOIN product_names pn   ON pn.product_id = p.product_id  AND pn.locale = 'en-US';

Joining only on product_id would return multiple rows per product (one per locale), which might be correct for some tasks but wrong for “one name per product” output.

Pattern: Anti-Join with NULL Awareness

A frequent requirement is “find rows in A that have no match in B.” The robust pattern is a LEFT JOIN plus a NULL check on a non-nullable column from B (often the primary key).

-- Customers with no orders SELECT c.customer_id, c.name FROM customers c LEFT JOIN orders o   ON o.customer_id = c.customer_id WHERE o.order_id IS NULL;

This works because when there is no match, all columns from orders are NULL. Use a column that cannot be NULL in real matched rows; otherwise you can misclassify matched rows as unmatched.

Pitfall: NOT IN and NULLs

When excluding values, NOT IN can behave unexpectedly if the subquery returns NULL. For example:

SELECT c.customer_id FROM customers c WHERE c.customer_id NOT IN (SELECT o.customer_id FROM orders o);

If orders.customer_id contains NULL, the NOT IN predicate can become unknown for all rows, returning nothing. Prefer a null-safe anti-join pattern:

SELECT c.customer_id FROM customers c LEFT JOIN orders o   ON o.customer_id = c.customer_id WHERE o.customer_id IS NULL;

Or use NOT EXISTS, which is typically NULL-safe:

SELECT c.customer_id FROM customers c WHERE NOT EXISTS (   SELECT 1   FROM orders o   WHERE o.customer_id = c.customer_id );

Hands-On Walkthrough: Debugging a Join That “Loses Rows”

Use this process when a query returns fewer rows than expected after adding a join.

Step 1: Start with the primary table only

SELECT COUNT(*) FROM customers;

Record the count.

Step 2: Add the join without extra filters

SELECT COUNT(*) FROM customers c LEFT JOIN orders o   ON o.customer_id = c.customer_id;

If the count decreases, you did not actually do a LEFT JOIN (or you later added a WHERE filter). If the count increases dramatically, you may have a one-to-many relationship (expected) or an incomplete join condition (bug).

Step 3: Add filters one at a time, choosing ON vs WHERE intentionally

If the filter is about which right-table rows should attach (but you still want all left rows), keep it in ON:

SELECT COUNT(*) FROM customers c LEFT JOIN orders o   ON o.customer_id = c.customer_id  AND o.status = 'PAID';

If you put it in WHERE, verify you truly want to remove customers without paid orders.

Step 4: Check for composite key needs

If you see unexpected multiplication, inspect candidate keys. For example, if joining order lines to shipments, ensure you used both order_id and line_no.

Step 5: Inspect NULL behavior explicitly

If a join column can be NULL, decide whether NULL should match. If not, accept unmatched rows. If yes, use a null-safe equality operator appropriate to your database.

Now answer the exercise about the content:

In a LEFT JOIN that should keep all rows from the left table, where should you place a filter on a column from the right table (for example, only PAID orders) to avoid removing unmatched left rows?

You are right! Congratulations, now go to the next page

You missed! Try again.

For a LEFT JOIN, putting right-table filters in ON limits which right rows match while still preserving unmatched left rows with NULLs on the right. A right-table predicate in WHERE can filter out those NULL-extended rows, effectively turning it into an inner join.