Subqueries (IN / EXISTS)
A subquery is a SELECT inside another SELECT. Three shapes show up over and over:
| Form | Returns | Used in |
|---|---|---|
col IN (subquery) | a list of values | WHERE |
EXISTS (subquery) | a boolean | WHERE |
(subquery) as scalar | a single value | SELECT or WHERE |
They all let you express "match against another query's result" without materializing an intermediate table.
IN (subquery) #
col IN (subquery) keeps rows where col matches any value returned by the inner query. The subquery returns one column, any number of rows.
-- Products that appear in at least one paid order
SELECT id, name
FROM products
WHERE id IN (
SELECT DISTINCT product_id
FROM orders
WHERE status = 'paid'
);EXISTS (subquery) #
EXISTS (subquery) returns true for the outer row if the inner query returns at least one row. The inner SELECT list doesn't matter; SELECT 1 is the convention.
-- Users who have placed at least one paid order
SELECT id, email
FROM users AS u
WHERE EXISTS (
SELECT 1
FROM orders AS o
WHERE o.user_id = u.id
AND o.status = 'paid'
);IN vs EXISTS #
Both answer "does a match exist?" but they differ in shape:
IN (subquery) | EXISTS (subquery) | |
|---|---|---|
| Subquery returns | one column of values | anything (usually SELECT 1) |
| Correlates with outer row? | usually no | usually yes |
NULL in result list | breaks NOT IN (see below) | safe |
Modern query optimizers often translate both into the same plan, but EXISTS is the idiomatic choice when you're correlating with the outer row.
NOT IN (subquery) silently returns zero rows when the subquery contains NULL.WHERE col NOT IN (a, b, NULL) becomes UNKNOWN for every row, so nothing matches. If the column the subquery returns is nullable, use NOT EXISTS instead, or filter out NULL in the subquery.
Scalar Subqueries #
A scalar subquery returns exactly one row and one column. You can drop it anywhere a single value is expected:
-- Compare each order's total to the overall average
SELECT
id,
total,
ROUND(total - (SELECT AVG(total) FROM orders), 2) AS diff_from_avg
FROM orders;If a "scalar" subquery accidentally returns more than one row at runtime, the query fails. Keep them narrow: aggregates like AVG, MAX, or LIMIT 1 are the safe patterns.
Try It #
Find products that appear in at least one paid order. Then rewrite the same query with EXISTS.
Practice #
Return id, name, and price for every product that has appeared in at least one paid order. Use a subquery with IN or EXISTS. No ordering requirement.
Mistakes to Watch For #
- Using
NOT INagainst a subquery that can returnNULL. Reach forNOT EXISTSinstead. - Writing an
IN (subquery)that returns multiple columns. The subquery must return exactly one column. - Using a scalar subquery that sometimes returns more than one row. The query will fail at runtime, not at compile time.
- Reaching for a subquery when a
JOINreads better. Subqueries shine for existence checks; joins shine when you actually need columns from both sides.
Knowledge check #
3 questions
EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.id)is true when:NOT IN (subquery)is not equivalent toNOT EXISTSwhen:SELECT (SELECT MAX(price) FROM products) AS top_price, name FROM productsis an example of:
Next Step #
Continue to Correlated subqueries: the same shape, but where the inner query references the outer row on every evaluation.