Lesson Intermediate

Subqueries (IN / EXISTS)

A subquery is a SELECT inside another SELECT. Three shapes show up over and over:

FormReturnsUsed in
col IN (subquery)a list of valuesWHERE
EXISTS (subquery)a booleanWHERE
(subquery) as scalara single valueSELECT or WHERE

They all let you express "match against another query's result" without materializing an intermediate table.

IN (subquery) #

col IN (subquery) keeps rows where col matches any value returned by the inner query. The subquery returns one column, any number of rows.

sql
-- Products that appear in at least one paid order
SELECT id, name
FROM products
WHERE id IN (
  SELECT DISTINCT product_id
  FROM orders
  WHERE status = 'paid'
);

EXISTS (subquery) #

EXISTS (subquery) returns true for the outer row if the inner query returns at least one row. The inner SELECT list doesn't matter; SELECT 1 is the convention.

sql
-- Users who have placed at least one paid order
SELECT id, email
FROM users AS u
WHERE EXISTS (
  SELECT 1
  FROM orders AS o
  WHERE o.user_id = u.id
    AND o.status = 'paid'
);

IN vs EXISTS #

Both answer "does a match exist?" but they differ in shape:

IN (subquery)EXISTS (subquery)
Subquery returnsone column of valuesanything (usually SELECT 1)
Correlates with outer row?usually nousually yes
NULL in result listbreaks NOT IN (see below)safe

Modern query optimizers often translate both into the same plan, but EXISTS is the idiomatic choice when you're correlating with the outer row.

Warning

NOT IN (subquery) silently returns zero rows when the subquery contains NULL.
WHERE col NOT IN (a, b, NULL) becomes UNKNOWN for every row, so nothing matches. If the column the subquery returns is nullable, use NOT EXISTS instead, or filter out NULL in the subquery.

Scalar Subqueries #

A scalar subquery returns exactly one row and one column. You can drop it anywhere a single value is expected:

sql
-- Compare each order's total to the overall average
SELECT
  id,
  total,
  ROUND(total - (SELECT AVG(total) FROM orders), 2) AS diff_from_avg
FROM orders;

If a "scalar" subquery accidentally returns more than one row at runtime, the query fails. Keep them narrow: aggregates like AVG, MAX, or LIMIT 1 are the safe patterns.

Try It #

Find products that appear in at least one paid order. Then rewrite the same query with EXISTS.

Loading SQL editor...

Practice #

Return id, name, and price for every product that has appeared in at least one paid order. Use a subquery with IN or EXISTS. No ordering requirement.

Loading SQL editor...

Mistakes to Watch For #

  • Using NOT IN against a subquery that can return NULL. Reach for NOT EXISTS instead.
  • Writing an IN (subquery) that returns multiple columns. The subquery must return exactly one column.
  • Using a scalar subquery that sometimes returns more than one row. The query will fail at runtime, not at compile time.
  • Reaching for a subquery when a JOIN reads better. Subqueries shine for existence checks; joins shine when you actually need columns from both sides.

Knowledge check #

3 questions

0 / 3 answered
  1. EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.id) is true when:

  2. NOT IN (subquery) is not equivalent to NOT EXISTS when:

  3. SELECT (SELECT MAX(price) FROM products) AS top_price, name FROM products is an example of:

Next Step #

Continue to Correlated subqueries: the same shape, but where the inner query references the outer row on every evaluation.