All courses > Technology and Programming > Databases ::

Left Join: Preserving the Primary Table While Filling Missing Matches

Capítulo 3

Estimated reading time: 11 minutes

What a LEFT JOIN Guarantees

A LEFT JOIN returns all rows from the table on the left side of the join (often called the “primary” table in the query), and it returns matching rows from the table on the right side when they exist. When there is no match on the right, the query still keeps the left row and fills the right-side columns with NULL.

This makes LEFT JOIN the default choice when your question is “show me everything from A, and add data from B when available.” It is especially useful for reporting, data quality checks, and building datasets where the left table defines the population you care about (customers, orders, products, employees, etc.).

Key properties you can rely on

Row preservation: every row from the left table appears at least once in the result.
Optional enrichment: right-table columns are populated only when a match exists; otherwise they are NULL.
One-to-many expansion: if a left row matches multiple right rows, the left row is repeated once per match (this is expected and often desired).
Join condition matters: what counts as a “match” is defined entirely by the ON clause.

A Practical Scenario: Customers and Their Latest Support Ticket

Suppose you want a list of all customers, plus their most recent support ticket if they have one. Customers without tickets should still appear.

We’ll use a simplified schema:

customers(customer_id, name, email, created_at)

tickets(ticket_id, customer_id, status, created_at)

Sample data (conceptual):

Continue in our app.

Listen to the audio with the screen off.
Earn a certificate upon completion.
Over 5000 courses for you to explore!

Or continue reading below...

Download the app

Customer 1: has 2 tickets
Customer 2: has 0 tickets
Customer 3: has 1 ticket

A basic LEFT JOIN to list customers and any tickets:

SELECT
  c.customer_id,
  c.name,
  t.ticket_id,
  t.status
FROM customers c
LEFT JOIN tickets t
  ON t.customer_id = c.customer_id
ORDER BY c.customer_id, t.created_at;

What you should expect:

Customer 2 will still appear, but t.ticket_id and t.status will be NULL.
Customer 1 will appear twice (once per ticket) because of the one-to-many relationship.

Step-by-step: building the query safely

When you write a LEFT JOIN query, build it in small steps so you can verify row counts and avoid accidental filtering.

Step 1: Start with the left table only

SELECT
  customer_id,
  name
FROM customers;

At this step, you know exactly how many rows you expect: one per customer.

Step 2: Add the LEFT JOIN with only the key match

SELECT
  c.customer_id,
  c.name,
  t.ticket_id,
  t.status
FROM customers c
LEFT JOIN tickets t
  ON t.customer_id = c.customer_id;

Now check:

Do you still see all customers?
Did the row count increase due to multiple tickets per customer? If yes, that’s normal.

Step 3: Add right-table filters carefully (ON vs WHERE)

This is where many LEFT JOIN queries accidentally turn into “inner-like” results. If you put a condition on the right table in the WHERE clause, you may remove rows where the right side is NULL, which defeats the purpose of preserving left rows.

Example goal: “Show all customers, and include only tickets with status = 'open' when they exist.”

Correct (filter in ON):

SELECT
  c.customer_id,
  c.name,
  t.ticket_id,
  t.status
FROM customers c
LEFT JOIN tickets t
  ON t.customer_id = c.customer_id
 AND t.status = 'open';

This keeps customers with no open tickets (or no tickets at all). Those customers will have NULL ticket columns.

Common mistake (filter in WHERE):

SELECT
  c.customer_id,
  c.name,
  t.ticket_id,
  t.status
FROM customers c
LEFT JOIN tickets t
  ON t.customer_id = c.customer_id
WHERE t.status = 'open';

This removes customers whose joined ticket row is NULL (because NULL = 'open' is not true). The result no longer preserves all customers.

Finding “Missing Matches” with LEFT JOIN

LEFT JOIN is a powerful tool for detecting gaps: “Which left-side records have no corresponding right-side record?” This is a common data quality and business question.

Example: Customers with no tickets

SELECT
  c.customer_id,
  c.name
FROM customers c
LEFT JOIN tickets t
  ON t.customer_id = c.customer_id
WHERE t.ticket_id IS NULL;

Why this works: customers without tickets produce a joined row where all t.* columns are NULL. Filtering for t.ticket_id IS NULL isolates those unmatched left rows.

Example: Products with no sales

Another classic: list products that have never been ordered.

SELECT
  p.product_id,
  p.product_name
FROM products p
LEFT JOIN order_items oi
  ON oi.product_id = p.product_id
WHERE oi.product_id IS NULL;

Note the WHERE clause is safe here because you are intentionally selecting the unmatched rows (the ones with NULL on the right).

LEFT JOIN with Aggregation: Counting Matches Without Losing Left Rows

LEFT JOIN is frequently combined with GROUP BY to compute counts and sums while keeping all left-side entities, even those with zero matches.

Example: Count tickets per customer (including zero)

SELECT
  c.customer_id,
  c.name,
  COUNT(t.ticket_id) AS ticket_count
FROM customers c
LEFT JOIN tickets t
  ON t.customer_id = c.customer_id
GROUP BY
  c.customer_id,
  c.name
ORDER BY c.customer_id;

Important detail: use COUNT(t.ticket_id), not COUNT(*).

COUNT(t.ticket_id) counts only non-NULL ticket IDs, so customers with no tickets get 0.
COUNT(*) counts rows in the joined result; even unmatched customers still produce one row, so they would incorrectly get 1.

Example: Sum revenue per product (including zero)

SELECT
  p.product_id,
  p.product_name,
  COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS revenue
FROM products p
LEFT JOIN order_items oi
  ON oi.product_id = p.product_id
GROUP BY
  p.product_id,
  p.product_name
ORDER BY revenue DESC;

SUM(...) returns NULL when there are no matched rows, so COALESCE(..., 0) converts that to 0 for reporting.

Choosing the “Primary Table” Intentionally

In a LEFT JOIN, the left table defines what must be kept. That means you should choose the left table based on the population you want in the final result.

Example: Orders vs Customers

If you want a report of all orders, including orders placed by customers that might be missing from the customers table (data issue), you would put orders on the left:

SELECT
  o.order_id,
  o.customer_id,
  c.name
FROM orders o
LEFT JOIN customers c
  ON c.customer_id = o.customer_id;

If you instead want all customers and their orders (including customers with no orders), put customers on the left:

SELECT
  c.customer_id,
  c.name,
  o.order_id
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id;

Same two tables, different question, different “primary” table.

LEFT JOIN and Multiple Matches: Avoiding Accidental Multiplication

LEFT JOIN can multiply rows when the right table has multiple matches per left key. This is correct behavior, but it can surprise you when you later aggregate or join additional tables.

Example: Customers, Orders, and Payments

Imagine:

One customer has many orders.
Each order can have multiple payments (partial payments, retries, etc.).

If you join all three tables directly, you can multiply rows (orders × payments), which can inflate sums.

SELECT
  c.customer_id,
  o.order_id,
  p.payment_id,
  p.amount
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id
LEFT JOIN payments p
  ON p.order_id = o.order_id;

This is fine for detail-level listings, but if you want totals, you should aggregate at the right level first.

Step-by-step fix: pre-aggregate the right side

Goal: total paid per order, while keeping all orders (even unpaid).

SELECT
  o.order_id,
  o.customer_id,
  COALESCE(paid.total_paid, 0) AS total_paid
FROM orders o
LEFT JOIN (
  SELECT
    order_id,
    SUM(amount) AS total_paid
  FROM payments
  GROUP BY order_id
) paid
  ON paid.order_id = o.order_id;

Now each order matches at most one row from the derived table paid, preventing multiplication.

LEFT JOIN with “Latest Row” Lookups

A frequent requirement is: “For each left row, attach the most recent related row from the right table.” A naive LEFT JOIN returns all related rows, not just the latest. You can solve this with window functions (when available) or with a correlated subquery pattern.

Option A: Window function (recommended when supported)

Goal: each customer plus their latest ticket (if any).

SELECT
  c.customer_id,
  c.name,
  t.ticket_id,
  t.status,
  t.created_at AS ticket_created_at
FROM customers c
LEFT JOIN (
  SELECT
    ticket_id,
    customer_id,
    status,
    created_at,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY created_at DESC
    ) AS rn
  FROM tickets
) t
  ON t.customer_id = c.customer_id
 AND t.rn = 1
ORDER BY c.customer_id;

Why the filter t.rn = 1 is in the ON clause: it preserves customers with no tickets. If you put WHERE t.rn = 1, customers with no tickets would be removed.

Option B: Correlated subquery (portable but can be slower)

SELECT
  c.customer_id,
  c.name,
  t.ticket_id,
  t.status,
  t.created_at AS ticket_created_at
FROM customers c
LEFT JOIN tickets t
  ON t.customer_id = c.customer_id
 AND t.created_at = (
   SELECT MAX(t2.created_at)
   FROM tickets t2
   WHERE t2.customer_id = c.customer_id
 );

This attaches the ticket with the maximum created_at per customer. If multiple tickets share the same timestamp, you may get multiple rows; in that case, add a tie-breaker (e.g., max ticket_id) or use the window function approach.

Handling NULLs from the Right Table

Because unmatched right-side columns become NULL, you often need to decide how to present those values.

Replacing NULL with a default value

SELECT
  c.customer_id,
  c.name,
  COALESCE(t.status, 'no_ticket') AS latest_ticket_status
FROM customers c
LEFT JOIN (
  SELECT
    customer_id,
    status,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
  FROM tickets
) t
  ON t.customer_id = c.customer_id
 AND t.rn = 1;

COALESCE is a simple way to make reports clearer, but be careful not to confuse “missing” with a real status value. Choose defaults that cannot be mistaken for genuine data.

LEFT JOIN vs Filtering the Left Table

LEFT JOIN preserves left rows, but you can still filter the left table in the WHERE clause without breaking that guarantee (as long as your filters reference only left-table columns or expressions that don’t depend on right-table values).

Example: Only active customers, plus any tickets

SELECT
  c.customer_id,
  c.name,
  t.ticket_id,
  t.status
FROM customers c
LEFT JOIN tickets t
  ON t.customer_id = c.customer_id
WHERE c.created_at >= DATE '2025-01-01';

This is safe: you are narrowing the left-side population first, then enriching it with optional ticket data.

Common LEFT JOIN Pitfalls and How to Avoid Them

Pitfall 1: Right-table conditions in WHERE remove unmatched rows

If you want to keep left rows regardless of matches, put right-table conditions in ON, not WHERE. Use WHERE for left-table filters, or for intentional unmatched detection (WHERE right_key IS NULL).

Pitfall 2: Counting with COUNT(*)

When aggregating, prefer COUNT(right_table.primary_key) to get 0 for unmatched rows. Use COUNT(*) only when you truly want to count joined result rows.

Pitfall 3: Unintended row multiplication

When joining multiple one-to-many relationships, pre-aggregate or select the “one row per key” representation before joining further.

Pitfall 4: Joining on non-unique or dirty keys

If the join key is not unique on the right side (or contains duplicates due to data issues), LEFT JOIN will multiply rows. If you expect one match, enforce it by joining to a deduplicated subquery:

SELECT
  c.customer_id,
  c.name,
  x.segment
FROM customers c
LEFT JOIN (
  SELECT
    customer_id,
    MAX(segment) AS segment
  FROM customer_segments
  GROUP BY customer_id
) x
  ON x.customer_id = c.customer_id;

This turns a potentially many-row right table into one row per customer before the join.

Checklist: Writing LEFT JOIN Queries That Stay “Left”

Decide which table defines the population you must keep; put it on the left.
Start with the left table alone; confirm expected row count.
Add the LEFT JOIN with only key matching in ON.
Put right-table filters in ON if you still want unmatched left rows.
Use WHERE right_key IS NULL when you specifically want the missing matches.
When aggregating, use COUNT(right_key) and COALESCE for sums.
Watch for one-to-many expansions; pre-aggregate when you need one row per left key.

Now answer the exercise about the content:

In a LEFT JOIN between customers (left) and tickets (right), where should you place a filter like t.status = open if you want to keep customers who have no tickets or no open tickets?

You are right! Congratulations, now go to the next page

You missed! Try again.

Putting right-table conditions in ON keeps all left-table rows. If the filter is in WHERE, rows with NULLs from the right side are removed, making the result behave more like an inner join.