GROUP BY: five drills
Five GROUP BY drills, ordered easy to hard. Each isolates one aggregation concept. The default editor shows a working solution - try clearing it and writing your own first; the hidden checks will grade what's there when you press Run.
Tables: orders, products.
1. COUNT distinctions - three forms, three meanings (Easy) #
Task: every user_id in orders is non-NULL, so this drill uses a small inline dataset where the three counts genuinely differ. Start your query with this VALUES CTE:
WITH signups AS (SELECT * FROM (VALUES (1), (1), (2), (NULL), (3), (NULL), (2)) AS t(user_id))
From signups, return three numbers in one row: total rows (total_rows), non-null user_ids (non_null_users), and distinct user_ids (distinct_users).
The skill: the three COUNT forms differ.
COUNT(*)- total rows, including NULLs in any column.COUNT(col)- non-NULL values ofcol. Skips NULLs.COUNT(DISTINCT col)- unique non-NULL values ofcol.
You'll use this distinction constantly when summarizing data.
2. WHERE vs HAVING - filter timing matters (Easy) #
Task: for each user, count their paid orders. Return only users with more than 1 paid order. Output user_id, paid_order_count. Order by user_id.
The skill: WHERE filters rows before grouping; HAVING filters groups after.
- The
status = 'paid'filter belongs inWHERE- it's a row-level condition. - The
COUNT(*) > 1filter belongs inHAVING- it's an aggregate condition.
The trap: putting status = 'paid' in HAVING is wasteful (the engine groups every row first, then drops). Putting COUNT() > 1 in WHERE is illegal* (WHERE runs before aggregation, the count doesn't exist yet).
3. Multi-column grouping (Medium) #
Task: for each (user_id, status) combination, return the order count and total revenue. Output user_id, status, order_count, revenue. Round revenue to 2 decimals. Order by user_id, status.
The skill: a GROUP BY with multiple columns produces one row per combination of values - the group key is the tuple. Common shape for "metrics by X and Y" reports.
4. ROLLUP - subtotals and grand total in one query (Medium) #
Task: show order revenue at three levels in one result: per (user_id, status), per user_id alone, and the grand total. Use ROLLUP. NULL in a column indicates a rollup (subtotal or grand total) at that level. Output user_id, status, revenue. Round revenue to 2 decimals. Order so user-level subtotals come after each user's per-status rows, and the grand total comes last.
The skill: GROUP BY ROLLUP (a, b) produces three groupings in one query: (a, b), (a), and () (grand total). Same for GROUPING SETS ((a, b), (a), ()). Engines that don't support ROLLUP (e.g. SQLite) need three UNION ALL queries instead.
5. Conditional aggregation - count subsets in one pass (Hard) #
Task: for each product category, return: total products in the category, products that have at least 1 order, and products with zero orders. Output category, total_products, ordered_products, never_ordered. Order by category.
The skill: counting subsets of rows within each group, in a single pass. Two equivalent shapes:
SUM(CASE WHEN cond THEN 1 ELSE 0 END)- portable everywhere.COUNT(*) FILTER (WHERE cond)- cleaner, but only on Postgres / DuckDB / SQLite (3.30+) / Snowflake (limited).
The trap: a naive LEFT JOIN orders and counting matched rows gives you order counts, not "products with at least one order." Use DISTINCT product_id carefully.
The GROUP BY patterns to remember:
- **
COUNT()= rows.COUNT(col)= non-NULL values of col.COUNT(DISTINCT col)* = unique non-NULL values. The three are not interchangeable. WHEREfilters rows before grouping (cheap, sargable).HAVINGfilters groups after (uses aggregates). Putting a row-level filter inHAVINGworks but wastes compute.GROUP BY a, bproduces one row per (a, b) tuple. Every non-aggregate column inSELECTmust appear inGROUP BY.ROLLUP (a, b)=(a, b)+(a)+().CUBE (a, b)= all 4 combinations.GROUPING SETS= explicit list. NULL in a grouped column flags a rollup level.- Conditional counting =
SUM(CASE WHEN cond THEN 1 ELSE 0 END)(portable) orCOUNT(*) FILTER (WHERE cond)(Postgres / DuckDB / SQLite 3.30+).