What a LEFT JOIN Guarantees
A LEFT JOIN returns all rows from the table on the left side of the join (often called the “primary” table in the query), and it returns matching rows from the table on the right side when they exist. When there is no match on the right, the query still keeps the left row and fills the right-side columns with NULL.
This makes LEFT JOIN the default choice when your question is “show me everything from A, and add data from B when available.” It is especially useful for reporting, data quality checks, and building datasets where the left table defines the population you care about (customers, orders, products, employees, etc.).
Key properties you can rely on
- Row preservation: every row from the left table appears at least once in the result.
- Optional enrichment: right-table columns are populated only when a match exists; otherwise they are
NULL. - One-to-many expansion: if a left row matches multiple right rows, the left row is repeated once per match (this is expected and often desired).
- Join condition matters: what counts as a “match” is defined entirely by the
ONclause.
A Practical Scenario: Customers and Their Latest Support Ticket
Suppose you want a list of all customers, plus their most recent support ticket if they have one. Customers without tickets should still appear.
We’ll use a simplified schema:
customers(customer_id, name, email, created_at)
tickets(ticket_id, customer_id, status, created_at)Sample data (conceptual):
Continue in our app.
You can listen to the audiobook with the screen off, receive a free certificate for this course, and also have access to 5,000 other free online courses.
Or continue reading below...Download the app
- Customer 1: has 2 tickets
- Customer 2: has 0 tickets
- Customer 3: has 1 ticket
A basic LEFT JOIN to list customers and any tickets:
SELECT
c.customer_id,
c.name,
t.ticket_id,
t.status
FROM customers c
LEFT JOIN tickets t
ON t.customer_id = c.customer_id
ORDER BY c.customer_id, t.created_at;What you should expect:
- Customer 2 will still appear, but
t.ticket_idandt.statuswill beNULL. - Customer 1 will appear twice (once per ticket) because of the one-to-many relationship.
Step-by-step: building the query safely
When you write a LEFT JOIN query, build it in small steps so you can verify row counts and avoid accidental filtering.
Step 1: Start with the left table only
SELECT
customer_id,
name
FROM customers;At this step, you know exactly how many rows you expect: one per customer.
Step 2: Add the LEFT JOIN with only the key match
SELECT
c.customer_id,
c.name,
t.ticket_id,
t.status
FROM customers c
LEFT JOIN tickets t
ON t.customer_id = c.customer_id;Now check:
- Do you still see all customers?
- Did the row count increase due to multiple tickets per customer? If yes, that’s normal.
Step 3: Add right-table filters carefully (ON vs WHERE)
This is where many LEFT JOIN queries accidentally turn into “inner-like” results. If you put a condition on the right table in the WHERE clause, you may remove rows where the right side is NULL, which defeats the purpose of preserving left rows.
Example goal: “Show all customers, and include only tickets with status = 'open' when they exist.”
Correct (filter in ON):
SELECT
c.customer_id,
c.name,
t.ticket_id,
t.status
FROM customers c
LEFT JOIN tickets t
ON t.customer_id = c.customer_id
AND t.status = 'open';This keeps customers with no open tickets (or no tickets at all). Those customers will have NULL ticket columns.
Common mistake (filter in WHERE):
SELECT
c.customer_id,
c.name,
t.ticket_id,
t.status
FROM customers c
LEFT JOIN tickets t
ON t.customer_id = c.customer_id
WHERE t.status = 'open';This removes customers whose joined ticket row is NULL (because NULL = 'open' is not true). The result no longer preserves all customers.
Finding “Missing Matches” with LEFT JOIN
LEFT JOIN is a powerful tool for detecting gaps: “Which left-side records have no corresponding right-side record?” This is a common data quality and business question.
Example: Customers with no tickets
SELECT
c.customer_id,
c.name
FROM customers c
LEFT JOIN tickets t
ON t.customer_id = c.customer_id
WHERE t.ticket_id IS NULL;Why this works: customers without tickets produce a joined row where all t.* columns are NULL. Filtering for t.ticket_id IS NULL isolates those unmatched left rows.
Example: Products with no sales
Another classic: list products that have never been ordered.
SELECT
p.product_id,
p.product_name
FROM products p
LEFT JOIN order_items oi
ON oi.product_id = p.product_id
WHERE oi.product_id IS NULL;Note the WHERE clause is safe here because you are intentionally selecting the unmatched rows (the ones with NULL on the right).
LEFT JOIN with Aggregation: Counting Matches Without Losing Left Rows
LEFT JOIN is frequently combined with GROUP BY to compute counts and sums while keeping all left-side entities, even those with zero matches.
Example: Count tickets per customer (including zero)
SELECT
c.customer_id,
c.name,
COUNT(t.ticket_id) AS ticket_count
FROM customers c
LEFT JOIN tickets t
ON t.customer_id = c.customer_id
GROUP BY
c.customer_id,
c.name
ORDER BY c.customer_id;Important detail: use COUNT(t.ticket_id), not COUNT(*).
COUNT(t.ticket_id)counts only non-NULL ticket IDs, so customers with no tickets get 0.COUNT(*)counts rows in the joined result; even unmatched customers still produce one row, so they would incorrectly get 1.
Example: Sum revenue per product (including zero)
SELECT
p.product_id,
p.product_name,
COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS revenue
FROM products p
LEFT JOIN order_items oi
ON oi.product_id = p.product_id
GROUP BY
p.product_id,
p.product_name
ORDER BY revenue DESC;SUM(...) returns NULL when there are no matched rows, so COALESCE(..., 0) converts that to 0 for reporting.
Choosing the “Primary Table” Intentionally
In a LEFT JOIN, the left table defines what must be kept. That means you should choose the left table based on the population you want in the final result.
Example: Orders vs Customers
If you want a report of all orders, including orders placed by customers that might be missing from the customers table (data issue), you would put orders on the left:
SELECT
o.order_id,
o.customer_id,
c.name
FROM orders o
LEFT JOIN customers c
ON c.customer_id = o.customer_id;If you instead want all customers and their orders (including customers with no orders), put customers on the left:
SELECT
c.customer_id,
c.name,
o.order_id
FROM customers c
LEFT JOIN orders o
ON o.customer_id = c.customer_id;Same two tables, different question, different “primary” table.
LEFT JOIN and Multiple Matches: Avoiding Accidental Multiplication
LEFT JOIN can multiply rows when the right table has multiple matches per left key. This is correct behavior, but it can surprise you when you later aggregate or join additional tables.
Example: Customers, Orders, and Payments
Imagine:
- One customer has many orders.
- Each order can have multiple payments (partial payments, retries, etc.).
If you join all three tables directly, you can multiply rows (orders × payments), which can inflate sums.
SELECT
c.customer_id,
o.order_id,
p.payment_id,
p.amount
FROM customers c
LEFT JOIN orders o
ON o.customer_id = c.customer_id
LEFT JOIN payments p
ON p.order_id = o.order_id;This is fine for detail-level listings, but if you want totals, you should aggregate at the right level first.
Step-by-step fix: pre-aggregate the right side
Goal: total paid per order, while keeping all orders (even unpaid).
SELECT
o.order_id,
o.customer_id,
COALESCE(paid.total_paid, 0) AS total_paid
FROM orders o
LEFT JOIN (
SELECT
order_id,
SUM(amount) AS total_paid
FROM payments
GROUP BY order_id
) paid
ON paid.order_id = o.order_id;Now each order matches at most one row from the derived table paid, preventing multiplication.
LEFT JOIN with “Latest Row” Lookups
A frequent requirement is: “For each left row, attach the most recent related row from the right table.” A naive LEFT JOIN returns all related rows, not just the latest. You can solve this with window functions (when available) or with a correlated subquery pattern.
Option A: Window function (recommended when supported)
Goal: each customer plus their latest ticket (if any).
SELECT
c.customer_id,
c.name,
t.ticket_id,
t.status,
t.created_at AS ticket_created_at
FROM customers c
LEFT JOIN (
SELECT
ticket_id,
customer_id,
status,
created_at,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY created_at DESC
) AS rn
FROM tickets
) t
ON t.customer_id = c.customer_id
AND t.rn = 1
ORDER BY c.customer_id;Why the filter t.rn = 1 is in the ON clause: it preserves customers with no tickets. If you put WHERE t.rn = 1, customers with no tickets would be removed.
Option B: Correlated subquery (portable but can be slower)
SELECT
c.customer_id,
c.name,
t.ticket_id,
t.status,
t.created_at AS ticket_created_at
FROM customers c
LEFT JOIN tickets t
ON t.customer_id = c.customer_id
AND t.created_at = (
SELECT MAX(t2.created_at)
FROM tickets t2
WHERE t2.customer_id = c.customer_id
);This attaches the ticket with the maximum created_at per customer. If multiple tickets share the same timestamp, you may get multiple rows; in that case, add a tie-breaker (e.g., max ticket_id) or use the window function approach.
Handling NULLs from the Right Table
Because unmatched right-side columns become NULL, you often need to decide how to present those values.
Replacing NULL with a default value
SELECT
c.customer_id,
c.name,
COALESCE(t.status, 'no_ticket') AS latest_ticket_status
FROM customers c
LEFT JOIN (
SELECT
customer_id,
status,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM tickets
) t
ON t.customer_id = c.customer_id
AND t.rn = 1;COALESCE is a simple way to make reports clearer, but be careful not to confuse “missing” with a real status value. Choose defaults that cannot be mistaken for genuine data.
LEFT JOIN vs Filtering the Left Table
LEFT JOIN preserves left rows, but you can still filter the left table in the WHERE clause without breaking that guarantee (as long as your filters reference only left-table columns or expressions that don’t depend on right-table values).
Example: Only active customers, plus any tickets
SELECT
c.customer_id,
c.name,
t.ticket_id,
t.status
FROM customers c
LEFT JOIN tickets t
ON t.customer_id = c.customer_id
WHERE c.created_at >= DATE '2025-01-01';This is safe: you are narrowing the left-side population first, then enriching it with optional ticket data.
Common LEFT JOIN Pitfalls and How to Avoid Them
Pitfall 1: Right-table conditions in WHERE remove unmatched rows
If you want to keep left rows regardless of matches, put right-table conditions in ON, not WHERE. Use WHERE for left-table filters, or for intentional unmatched detection (WHERE right_key IS NULL).
Pitfall 2: Counting with COUNT(*)
When aggregating, prefer COUNT(right_table.primary_key) to get 0 for unmatched rows. Use COUNT(*) only when you truly want to count joined result rows.
Pitfall 3: Unintended row multiplication
When joining multiple one-to-many relationships, pre-aggregate or select the “one row per key” representation before joining further.
Pitfall 4: Joining on non-unique or dirty keys
If the join key is not unique on the right side (or contains duplicates due to data issues), LEFT JOIN will multiply rows. If you expect one match, enforce it by joining to a deduplicated subquery:
SELECT
c.customer_id,
c.name,
x.segment
FROM customers c
LEFT JOIN (
SELECT
customer_id,
MAX(segment) AS segment
FROM customer_segments
GROUP BY customer_id
) x
ON x.customer_id = c.customer_id;This turns a potentially many-row right table into one row per customer before the join.
Checklist: Writing LEFT JOIN Queries That Stay “Left”
- Decide which table defines the population you must keep; put it on the left.
- Start with the left table alone; confirm expected row count.
- Add the LEFT JOIN with only key matching in
ON. - Put right-table filters in
ONif you still want unmatched left rows. - Use
WHERE right_key IS NULLwhen you specifically want the missing matches. - When aggregating, use
COUNT(right_key)andCOALESCEfor sums. - Watch for one-to-many expansions; pre-aggregate when you need one row per left key.