Lesson Beginner

Self Join

A self join is any join where both sides are the same table.
You use it to relate rows within a single entity: employees to their managers, orders to other orders by the same user, consecutive events for the same customer.

The Alias Rule #

Warning

You must give both copies different aliases.
Without aliases, every column reference is ambiguous; SQL has no way to know which copy of the table you mean. Pick aliases that describe the role each copy plays in the relationship: child/parent, earlier/later, a/b. Never use the bare table name on both sides.

Parent-Child Hierarchy #

The classic example: an employees table where each row has a manager_id pointing to another row in the same table. To show each employee alongside their manager's name, join the table to itself.

(The example builds a tiny employees table with WITH. WITH names a subquery, covered properly in the CTE lesson later. For now, read it as a named intermediate result.)

sql
WITH employees AS (
  SELECT * FROM (VALUES
    (1, 'CEO',       NULL),
    (2, 'CTO',       1),
    (3, 'Data Lead', 2),
    (4, 'Analyst',   3)
  ) AS t(emp_id, emp_name, manager_id)
)
SELECT
  child.emp_name  AS employee,
  parent.emp_name AS manager
FROM employees AS child
LEFT JOIN employees AS parent
  ON child.manager_id = parent.emp_id;

LEFT JOIN here is deliberate: the CEO has manager_id = NULL and should still appear (with NULL in the manager column). An INNER JOIN would silently drop the CEO.

Pair Matching #

Self joins also generate pairs of rows from the same table: "all pairs of orders by the same user", "all events within 5 minutes of each other". The trick is to add an inequality like a.id < b.id so each pair appears once, not twice (and no row pairs with itself).

sql
-- All pairs of orders from the same user (each pair once)
SELECT
  a.id AS order_a,
  b.id AS order_b,
  a.user_id
FROM orders AS a
INNER JOIN orders AS b
  ON a.user_id = b.user_id
  AND a.id < b.id;

Try It #

Match employees to their managers. Then switch to INNER JOIN and note what disappears.

Loading SQL editor...

Practice #

Using the real orders table, find every pair of paid orders placed by the same user. Return user_id, order_a (the smaller order id), and order_b (the larger). Each pair should appear exactly once. No ordering requirement.

Loading SQL editor...

Mistakes to Watch For #

  • Forgetting to alias both copies. "Ambiguous column" errors are the most common self-join failure.
  • Using INNER JOIN for hierarchies and losing root rows (like the CEO), when LEFT JOIN preserves them.
  • Omitting a.id < b.id in pair-matching queries, which gives you every pair twice and each row paired with itself.
  • Chaining many self joins to walk a variable-depth tree — recursive CTEs handle that case properly (different lesson).

Knowledge check #

3 questions

0 / 3 answered
  1. Why are aliases required when self-joining?

  2. For "find each row alongside the next row by date," modern SQL prefers:

  3. A self-join without an explicit "exclude this row" condition:

Next Step #

Continue to Join debugging: diagnosing duplicate rows and missing rows after a join.