GROUP BY
An aggregate without GROUP BY collapses the whole table into one row.GROUP BY splits the table into buckets first, then runs the aggregate once per bucket. That's how you get "orders per status", "revenue per day", or "average basket size per user".
Syntax #
List the columns you want to group on after GROUP BY. Each unique combination becomes one output row.
SELECT
status,
COUNT(*) AS order_count,
ROUND(SUM(total), 2) AS revenue
FROM orders
GROUP BY status;One output row per distinct status value. The aggregates compute within each group.
The Core Rule #
Every non-aggregated column in SELECT must appear in GROUP BY.
If you SELECT status, user_id, COUNT(*) and only write GROUP BY status, SQL doesn't know which user_id to return for a group with many users. You'll get a clear error. Either add user_id to GROUP BY (narrower grain) or remove it from SELECT.
Multiple Grouping Columns #
Grouping on multiple columns gives one row per unique combination.
SELECT
status,
user_id,
COUNT(*) AS orders
FROM orders
GROUP BY status, user_id;The grain is now "per (status, user_id) pair". Adding more columns to GROUP BY makes the grain finer: more output rows, smaller groups.
Grouping on Expressions #
You can group on computed values, not just raw columns. A daily revenue rollup groups by a truncated timestamp:
SELECT
created_at::DATE AS order_day,
COUNT(*) AS orders,
ROUND(SUM(total), 2) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY created_at::DATE
ORDER BY order_day;created_at::DATE casts the timestamp down to a date, stripping the time component. Any function, CASE expression, or arithmetic can be a grouping key, as long as the same expression appears in SELECT (or SQL can match it by alias, in some dialects).
NULL as a Group #
If the grouping column has NULL values, all of them collapse into a single NULL group.
This is the one place NULL = NULL effectively holds, similar to DISTINCT.
Try It #
Group orders by status and compute three metrics per group.
Practice #
For paid orders only, return product_id and product_revenue (sum of total per product, rounded to 2 decimals). One row per distinct paid-order product_id. No ordering requirement.
Mistakes to Watch For #
- Selecting a non-aggregated column that isn't in
GROUP BY. The query errors out. - Grouping at the wrong grain: grouping by
user_idwhen you meant per-user-per-day, or vice versa. Write the grain in words before writing theGROUP BY. - Assuming aggregates see the ungrouped rows. They don't. Once you add
GROUP BY, every aggregate operates inside its group. - Joining tables before aggregating can inflate
SUMandCOUNTdue to fan-out. The Joins section covers the fix; for now, aggregate before joining whenever possible.
Knowledge check #
4 questions
After
GROUP BY region, which columns can appear inSELECTwithout an aggregate?You want to keep only groups where the total revenue exceeds 10,000. Where does the filter go?
A column you group by has some
NULLvalues. What happens?SELECT user_id, COUNT(*) FROM orders GROUP BY user_idreturns:
Next Step #
Continue to HAVING to filter groups after the aggregates have been computed.