Subqueries & set operations: five drills
Five drills on subqueries and set operations, ordered easy → hard. Each one isolates a single pattern. The default editor shows a working solution. Try clearing it and writing your own first; the hidden checks will grade what's there when you press Run.
Tables: users, products, orders, reviews.
1. IN subquery: match against a list (Easy) #
Task: find every user who has at least one paid order. Return id and email, one row per user, ordered by id.
The trap: joining users to orders repeats a user once per paid order. IN (subquery) (or EXISTS) asks "is there a match?" without multiplying rows.
2. NOT EXISTS: the anti-join (Easy) #
Task: find every product that has never been ordered, not in any order, regardless of status. Return id, name, and price, ordered by id.
The skill: NOT EXISTS flips the existence check, keeping the outer row only when the subquery finds nothing. It is the cleanest, NULL-safe way to ask "which X has no Y?".
3. The NOT IN + NULL trap, done right (Medium) #
Task: using the inline feedback data in the editor, return every user (from the users table) who has no feedback. Return id and name, ordered by id.
The trap: feedback.user_id contains a NULL (an anonymous entry). A single NULL in a NOT IN subquery list makes the whole predicate UNKNOWN for every row, so WHERE id NOT IN (SELECT user_id FROM feedback) silently returns zero rows, with no error. Write this with NOT EXISTS (NULL-safe by design), or with NOT IN plus an explicit WHERE user_id IS NOT NULL guard. A naive unguarded NOT IN does not count as solved.
4. UNION ALL: stack with a tag column (Medium) #
Task: bucket every user by whether they have written a review. Return bucket (the literal 'has_review' or 'no_review'), id, and email: exactly one row per user, ordered by id. Build it by stacking two tagged SELECTs with UNION ALL.
The discipline: both branches must return the same columns in the same position (set operations match by position, not by name), and the two branches' conditions must be exact complements so no user lands in both buckets (or neither).
5. Correlated scalar subquery: each order vs its user’s average (Hard) #
Task: for every order, show how it compares to that user's average order total. Return order_id, user_id, total, user_avg_total (the average of that user's orders, rounded to 2 decimals), and diff_from_user_avg (total minus the user's average, rounded to 2 decimals). One row per order, ordered by user_id, then order_id as the tie-break.
You may use a correlated scalar subquery or the window-function rewrite (AVG(total) OVER (PARTITION BY user_id)); grading is on output only. The trap with the subquery form: forget the o2.user_id = o1.user_id correlation and you silently compare every order to the global average.
The patterns to remember:
IN (subquery)= "match against a list." One column out of the subquery; deduplicates implicitly, never fans out.NOT EXISTS= the anti-join. NULL-safe by design, the default choice for "which X has no Y?".NOT IN+NULL= silent empty result. One NULL in the list makes every comparison UNKNOWN. Guard withIS NOT NULLin the subquery, or just useNOT EXISTS.UNION ALL+ tag column = stack same-shaped queries and keep track of which branch each row came from. Columns match by position, not name; complementary branch filters keep the grain clean.- Correlated scalar subquery = a per-row aggregate (
this row vs its group). The correlation clause is load-bearing: without it you get a global aggregate. When it appears in the SELECT list, a window function is usually the cleaner rewrite.