Practice

GROUP BY: five drills

Five GROUP BY drills, ordered easy to hard. Each isolates one aggregation concept. The default editor shows a working solution - try clearing it and writing your own first; the hidden checks will grade what's there when you press Run.

Tables: orders, products.

1. COUNT distinctions - three forms, three meanings (Easy) #

Task: every user_id in orders is non-NULL, so this drill uses a small inline dataset where the three counts genuinely differ. Start your query with this VALUES CTE:

WITH signups AS (SELECT * FROM (VALUES (1), (1), (2), (NULL), (3), (NULL), (2)) AS t(user_id))

From signups, return three numbers in one row: total rows (total_rows), non-null user_ids (non_null_users), and distinct user_ids (distinct_users).

The skill: the three COUNT forms differ.

COUNT(*) - total rows, including NULLs in any column.
COUNT(col) - non-NULL values of col. Skips NULLs.
COUNT(DISTINCT col) - unique non-NULL values of col.

You'll use this distinction constantly when summarizing data.

Loading SQL editor...

2. WHERE vs HAVING - filter timing matters (Easy) #

Task: for each user, count their paid orders. Return only users with more than 1 paid order. Output user_id, paid_order_count. Order by user_id.

The skill: WHERE filters rows before grouping; HAVING filters groups after.

The status = 'paid' filter belongs in WHERE - it's a row-level condition.
The COUNT(*) > 1 filter belongs in HAVING - it's an aggregate condition.

The trap: putting status = 'paid' in HAVING is wasteful (the engine groups every row first, then drops). Putting COUNT() > 1 in WHERE is illegal* (WHERE runs before aggregation, the count doesn't exist yet).

Loading SQL editor...

3. Multi-column grouping (Medium) #

Task: for each (user_id, status) combination, return the order count and total revenue. Output user_id, status, order_count, revenue. Round revenue to 2 decimals. Order by user_id, status.

The skill: a GROUP BY with multiple columns produces one row per combination of values - the group key is the tuple. Common shape for "metrics by X and Y" reports.

Loading SQL editor...

4. ROLLUP - subtotals and grand total in one query (Medium) #

Task: show order revenue at three levels in one result: per (user_id, status), per user_id alone, and the grand total. Use ROLLUP. NULL in a column indicates a rollup (subtotal or grand total) at that level. Output user_id, status, revenue. Round revenue to 2 decimals. Order so user-level subtotals come after each user's per-status rows, and the grand total comes last.

The skill: GROUP BY ROLLUP (a, b) produces three groupings in one query: (a, b), (a), and () (grand total). Same for GROUPING SETS ((a, b), (a), ()). Engines that don't support ROLLUP (e.g. SQLite) need three UNION ALL queries instead.

Loading SQL editor...

5. Conditional aggregation - count subsets in one pass (Hard) #

Task: for each product category, return: total products in the category, products that have at least 1 order, and products with zero orders. Output category, total_products, ordered_products, never_ordered. Order by category.

The skill: counting subsets of rows within each group, in a single pass. Two equivalent shapes:

SUM(CASE WHEN cond THEN 1 ELSE 0 END) - portable everywhere.
COUNT(*) FILTER (WHERE cond) - cleaner, but only on Postgres / DuckDB / SQLite (3.30+) / Snowflake (limited).

The trap: a naive LEFT JOIN orders and counting matched rows gives you order counts, not "products with at least one order." Use DISTINCT product_id carefully.

Loading SQL editor...

Tip

The GROUP BY patterns to remember:

**COUNT() = rows. COUNT(col) = non-NULL values of col. COUNT(DISTINCT col)* = unique non-NULL values. The three are not interchangeable.
WHERE filters rows before grouping (cheap, sargable). HAVING filters groups after (uses aggregates). Putting a row-level filter in HAVING works but wastes compute.
GROUP BY a, b produces one row per (a, b) tuple. Every non-aggregate column in SELECT must appear in GROUP BY.
ROLLUP (a, b) = (a, b) + (a) + (). CUBE (a, b) = all 4 combinations. GROUPING SETS = explicit list. NULL in a grouped column flags a rollup level.
Conditional counting = SUM(CASE WHEN cond THEN 1 ELSE 0 END) (portable) or COUNT(*) FILTER (WHERE cond) (Postgres / DuckDB / SQLite 3.30+).

GROUP BY: five drills

1. COUNT distinctions - three forms, three meanings (Easy) #

2. WHERE vs HAVING - filter timing matters (Easy) #

3. Multi-column grouping (Medium) #

4. ROLLUP - subtotals and grand total in one query (Medium) #

5. Conditional aggregation - count subsets in one pass (Hard) #

Related In Skill Drills