Lesson Beginner

GROUP BY

An aggregate without GROUP BY collapses the whole table into one row.
GROUP BY splits the table into buckets first, then runs the aggregate once per bucket. That's how you get "orders per status", "revenue per day", or "average basket size per user".

Syntax #

List the columns you want to group on after GROUP BY. Each unique combination becomes one output row.

sql

SELECT
  status,
  COUNT(*) AS order_count,
  ROUND(SUM(total), 2) AS revenue
FROM orders
GROUP BY status;

One output row per distinct status value. The aggregates compute within each group.

The Core Rule #

Warning

Every non-aggregated column in SELECT must appear in GROUP BY.
If you SELECT status, user_id, COUNT(*) and only write GROUP BY status, SQL doesn't know which user_id to return for a group with many users. You'll get a clear error. Either add user_id to GROUP BY (narrower grain) or remove it from SELECT.

Multiple Grouping Columns #

Grouping on multiple columns gives one row per unique combination.

sql

SELECT
  status,
  user_id,
  COUNT(*) AS orders
FROM orders
GROUP BY status, user_id;

The grain is now "per (status, user_id) pair". Adding more columns to GROUP BY makes the grain finer: more output rows, smaller groups.

Grouping on Expressions #

You can group on computed values, not just raw columns. A daily revenue rollup groups by a truncated timestamp:

sql

SELECT
  created_at::DATE AS order_day,
  COUNT(*) AS orders,
  ROUND(SUM(total), 2) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY created_at::DATE
ORDER BY order_day;

created_at::DATE casts the timestamp down to a date, stripping the time component. Any function, CASE expression, or arithmetic can be a grouping key, as long as the same expression appears in SELECT (or SQL can match it by alias, in some dialects).

NULL as a Group #

If the grouping column has NULL values, all of them collapse into a single NULL group.
This is the one place NULL = NULL effectively holds, similar to DISTINCT.

Try It #

Group orders by status and compute three metrics per group.

Loading SQL editor...

Practice #

For paid orders only, return product_id and product_revenue (sum of total per product, rounded to 2 decimals). One row per distinct paid-order product_id. No ordering requirement.

Loading SQL editor...

Mistakes to Watch For #

Selecting a non-aggregated column that isn't in GROUP BY. The query errors out.
Grouping at the wrong grain: grouping by user_id when you meant per-user-per-day, or vice versa. Write the grain in words before writing the GROUP BY.
Assuming aggregates see the ungrouped rows. They don't. Once you add GROUP BY, every aggregate operates inside its group.
Joining tables before aggregating can inflate SUM and COUNT due to fan-out. The Joins section covers the fix; for now, aggregate before joining whenever possible.

Knowledge check #

4 questions

0 / 4 answered

After GROUP BY region, which columns can appear in SELECT without an aggregate?
You want to keep only groups where the total revenue exceeds 10,000. Where does the filter go?
A column you group by has some NULL values. What happens?
SELECT user_id, COUNT(*) FROM orders GROUP BY user_id returns:

Next Step #

Continue to HAVING to filter groups after the aggregates have been computed.