All courses > Technology and Programming > Databases ::

Self Join: Relating a Table to Itself with Aliases and Clear Conditions

Capítulo 6

Estimated reading time: 12 minutes

What a Self Join Is (and Why You’d Use It)

A self join is a join where the left and right sides come from the same physical table, but you treat them as two different logical tables by giving each side a different alias. This is useful whenever a row needs to be related to other rows in the same table: employees to managers, products to “similar products,” events to prior events, customers to referrals, categories to parent categories, and so on.

The key idea is that the table contains a relationship that points back into itself (directly or indirectly). In practice, that relationship is usually represented by a foreign key column that references the table’s own primary key (for example, employees.manager_id referencing employees.employee_id), or by a column that can be matched to another row’s column (for example, two rows share the same department_id but have different employee_id values).

Because both sides are the same table, clarity depends on two things: (1) aliases that make each role obvious, and (2) a join condition that precisely describes how one row relates to another. Without those, self joins can quickly become confusing or produce incorrect results (like pairing a row with itself or creating duplicate pairs).

Core Pattern: Aliases Create Two Roles

When you self join, you must alias the table at least twice. Each alias represents a role in the relationship. A common pattern is:

SELECT  ...columns from both roles...  FROM table_name AS role1 JOIN table_name AS role2   ON ...relationship between role1 and role2...

Even though role1 and role2 refer to the same underlying table, SQL treats them as separate sources for the purpose of the query. This lets you select columns from each role independently and compare them.

Continue in our app.

Listen to the audio with the screen off.
Earn a certificate upon completion.
Over 5000 courses for you to explore!

Or continue reading below...

Download the app

Example dataset: employees

Assume a table like:

employees(  employee_id   INT PRIMARY KEY,  full_name     VARCHAR(200),  title         VARCHAR(100),  manager_id    INT NULL,  department_id INT,  hire_date     DATE)

Here, manager_id stores the employee_id of the manager. That is a classic self-referencing relationship.

Step-by-Step: Employee to Manager (Most Common Self Join)

Step 1: Decide the two roles and name them clearly

We want each employee and their manager. The two roles are:

e = employee (the person we’re listing)
m = manager (the person referenced by e.manager_id)

Step 2: Write the join condition that connects the roles

The relationship is: the employee’s manager_id equals the manager’s employee_id.

e.manager_id = m.employee_id

Step 3: Choose join type based on whether every employee has a manager

Top-level employees (like a CEO) may have manager_id as NULL. If you want to keep them in the result, use a left join from employees to managers. If you only want employees who have a manager, use an inner join. In many org charts, keeping top-level employees is useful, so we’ll use a left join.

Step 4: Select columns with explicit aliasing

Use column aliases to make output readable and avoid ambiguity.

SELECT  e.employee_id,  e.full_name  AS employee_name,  e.title      AS employee_title,  m.full_name  AS manager_name,  m.title      AS manager_title FROM employees AS e LEFT JOIN employees AS m   ON e.manager_id = m.employee_id ORDER BY e.employee_id;

This query reads naturally: “from employees as e, left join employees as m on e.manager_id = m.employee_id.” The aliases make it clear which columns belong to which role.

Preventing Common Self Join Mistakes

Mistake 1: Ambiguous column references

Because both sides share the same column names, you must qualify columns with the alias. This will fail or behave unexpectedly:

SELECT employee_id, full_name FROM employees e JOIN employees m ON manager_id = employee_id;

It’s ambiguous: which employee_id is on the right side of the join condition? Always qualify:

... ON e.manager_id = m.employee_id

Mistake 2: Joining a row to itself unintentionally

Some self joins are based on “same group” matching (same department, same category, same date). In those cases, if you join on equality of the group key, you will match each row to itself unless you explicitly exclude it.

Example: pairing employees in the same department.

SELECT  e1.full_name AS employee_1,  e2.full_name AS employee_2,  e1.department_id FROM employees AS e1 JOIN employees AS e2   ON e1.department_id = e2.department_id;

This includes pairs like (Alice, Alice). To exclude self-pairs, add:

AND e1.employee_id <> e2.employee_id

Mistake 3: Duplicate pairs (A,B) and (B,A)

When you create pairs from the same table, you often want each pair only once. If you join employees to employees in the same department, you’ll get both (Alice, Bob) and (Bob, Alice). To keep only one ordering, enforce a consistent rule such as “left id is less than right id.”

SELECT  e1.full_name AS employee_1,  e2.full_name AS employee_2,  e1.department_id FROM employees AS e1 JOIN employees AS e2   ON e1.department_id = e2.department_id  AND e1.employee_id < e2.employee_id ORDER BY e1.department_id, e1.full_name, e2.full_name;

This does two things at once: it removes self-pairs and removes mirrored duplicates.

Self Join for Hierarchies: Parent/Child Categories

Self joins are a natural fit for hierarchical data stored in a single table. Consider:

categories(  category_id   INT PRIMARY KEY,  category_name VARCHAR(200),  parent_id     INT NULL)

Each category may have a parent category. To list categories with their parent names:

SELECT  c.category_id,  c.category_name,  p.category_name AS parent_category_name FROM categories AS c LEFT JOIN categories AS p   ON c.parent_id = p.category_id ORDER BY c.category_name;

Again, the aliases communicate roles: c is the child category, p is the parent category.

Adding a second level (grandparent) with multiple self joins

You can chain self joins to walk up the hierarchy a fixed number of levels. For example, category → parent → grandparent:

SELECT  c.category_name        AS category,  p.category_name        AS parent,  gp.category_name       AS grandparent FROM categories AS c LEFT JOIN categories AS p   ON c.parent_id = p.category_id LEFT JOIN categories AS gp   ON p.parent_id = gp.category_id ORDER BY c.category_name;

This is still a self join conceptually, but you’re using the same table three times with three aliases. This approach is practical when you only need a small, known depth. For unknown depth, you typically need a recursive query (often a recursive CTE), which is a different technique than a basic join.

Self Join for “Previous Row” Comparisons (Same Table, Different Time)

Another common use is comparing a row to another row in the same table that represents a different moment in time: previous order, previous status change, previous price, previous login, etc. This can be done with a self join when you can define “previous” using a key or date.

Example dataset: product_prices

product_prices(  product_id   INT,  valid_from   DATE,  price        DECIMAL(10,2),  PRIMARY KEY(product_id, valid_from))

We want to compare each price record to the immediately prior price record for the same product. One approach is to self join each row to the maximum earlier valid_from date for that product.

Step-by-step:

Alias pp as the “current” price row.
Alias prev as the “previous” price row.
Join on same product and ensure prev.valid_from is earlier than pp.valid_from.
Pick only the closest earlier row using a correlated subquery in the join condition.

SELECT  pp.product_id,  pp.valid_from      AS current_valid_from,  pp.price           AS current_price,  prev.valid_from    AS previous_valid_from,  prev.price         AS previous_price FROM product_prices AS pp LEFT JOIN product_prices AS prev   ON prev.product_id = pp.product_id  AND prev.valid_from = (    SELECT MAX(p2.valid_from)    FROM product_prices AS p2    WHERE p2.product_id = pp.product_id      AND p2.valid_from < pp.valid_from  ) ORDER BY pp.product_id, pp.valid_from;

This is a self join because product_prices appears twice, and the join condition relates each row to another row in the same table. The correlated subquery ensures you get the immediately prior record, not all earlier records.

Note: Many SQL dialects also support window functions like LAG for this pattern, but the self join approach is widely applicable and helps reinforce how to relate rows within a single table using explicit conditions.

Self Join for Finding “Peers” (Same Attribute, Different Entity)

Sometimes you want to relate entities that share an attribute. For example, customers in the same city, students in the same class, or employees with the same title. This is a self join where the join condition matches on the shared attribute and excludes the same entity.

Example: employees with the same title in the same department

We’ll pair employees who share both department_id and title. To avoid duplicates and self-pairs, we’ll use < on the primary key.

SELECT  e1.department_id,  e1.title,  e1.full_name AS employee_1,  e2.full_name AS employee_2 FROM employees AS e1 JOIN employees AS e2   ON e1.department_id = e2.department_id  AND e1.title = e2.title  AND e1.employee_id < e2.employee_id ORDER BY e1.department_id, e1.title, e1.full_name, e2.full_name;

This produces unique peer pairs. If you instead want a list of each employee and the count of peers, you might self join and aggregate, but be careful: pairing creates multiple rows per employee. For counts, you can join and count distinct peers.

SELECT  e1.employee_id,  e1.full_name,  COUNT(e2.employee_id) AS peer_count FROM employees AS e1 LEFT JOIN employees AS e2   ON e1.department_id = e2.department_id  AND e1.title = e2.title  AND e1.employee_id <> e2.employee_id GROUP BY e1.employee_id, e1.full_name ORDER BY peer_count DESC, e1.full_name;

Here we keep all employees (left join) and count how many other employees match the peer criteria.

Self Join with Non-Equality Conditions (Range and Overlap Matching)

Self joins are not limited to equality. You can relate rows using inequality conditions, which is useful for detecting overlaps, gaps, or sequences.

Example dataset: room_bookings

room_bookings(  booking_id INT PRIMARY KEY,  room_id    INT,  start_ts   TIMESTAMP,  end_ts     TIMESTAMP)

To find overlapping bookings in the same room, self join bookings to bookings where time ranges overlap. Use an ordering rule to avoid duplicates.

SELECT  b1.room_id,  b1.booking_id AS booking_1,  b2.booking_id AS booking_2,  b1.start_ts   AS booking_1_start,  b1.end_ts     AS booking_1_end,  b2.start_ts   AS booking_2_start,  b2.end_ts     AS booking_2_end FROM room_bookings AS b1 JOIN room_bookings AS b2   ON b1.room_id = b2.room_id  AND b1.booking_id < b2.booking_id  AND b1.start_ts < b2.end_ts  AND b2.start_ts < b1.end_ts ORDER BY b1.room_id, b1.booking_id, b2.booking_id;

The overlap logic (b1.start_ts < b2.end_ts and b2.start_ts < b1.end_ts) is a clear condition that relates two rows. The b1.booking_id < b2.booking_id rule prevents mirrored duplicates and self-matching.

Making Self Joins Readable: Alias and Column Naming Practices

Because self joins inherently reuse the same column names, readability is not optional. A few practical habits make a big difference:

Use role-based aliases: e/m (employee/manager), c/p (child/parent), pp/prev (current/previous), b1/b2 (booking pair).
Alias output columns so the result set is self-explanatory: employee_name, manager_name, previous_price.
Keep join conditions grouped by meaning: first match the entity grouping (same product, same room), then add the relationship rule (previous date, overlap), then add the de-duplication rule (<).
Be explicit about nullability: if the relationship column can be NULL (like manager_id), choose a join type that matches your intent and expect nulls on the “other side” columns.

Performance and Correctness Considerations Specific to Self Joins

Indexes that matter

Self joins can be efficient if the relationship columns are indexed. Common helpful indexes include:

employees(employee_id) (usually already the primary key) and employees(manager_id) for employee-manager lookups.
categories(category_id) and categories(parent_id) for hierarchy lookups.
Composite keys like product_prices(product_id, valid_from) to support “previous row” matching.
room_bookings(room_id, start_ts, end_ts) (or at least room_id plus one time column) for overlap detection.

Cardinality can grow quickly

Some self joins create many combinations. Pairing peers in a department can produce roughly N×(N-1)/2 rows per group. Overlap detection can also be heavy if many bookings exist per room. When you see unexpectedly large result sets, check whether your join condition is too broad (for example, matching only on department_id when you meant department_id and title), and ensure you’ve included de-duplication rules where appropriate.

Validate with small slices

When building a self join, it’s practical to test on a single entity or group first (one department, one product, one room) by adding a WHERE filter. This helps confirm that the join condition produces the intended pairs before running it across the whole table.

SELECT ... FROM room_bookings b1 JOIN room_bookings b2   ON ... WHERE b1.room_id = 101;

Now answer the exercise about the content:

When pairing employees in the same department using a self join, what condition best ensures the results exclude self-pairs and avoid mirrored duplicates like (A,B) and (B,A)?

You are right! Congratulations, now go to the next page

You missed! Try again.

In self-join pairing queries, enforcing a consistent ordering like e1.employee_id < e2.employee_id prevents matching a row to itself and removes mirrored duplicates by keeping only one direction of each pair.