Lesson Intermediate

Top-N Per Group

"Top 3 products per category." "Highest-spending customer per region." "Latest order per user." These all fit the same shape: rank within a group, then keep the top N.
The canonical pattern is a two-step query:

sql

WITH ranked AS (
  SELECT *, ROW_NUMBER() OVER (
    PARTITION BY group_key
    ORDER BY sort_key DESC
  ) AS rn
  FROM t
)
SELECT * FROM ranked WHERE rn <= N;

You need a CTE (or subquery) because WHERE can't filter on a window function in the same SELECT. DuckDB and Snowflake accept QUALIFY as a one-liner; everywhere else uses this shape.

Top-1 (Latest / Best Per Group) #

The most common specialization keeps exactly one row per group: latest order per user, highest-scoring attempt per student, most-recent status per ticket.

sql

-- Latest order per user
WITH ranked AS (
  SELECT
    id AS order_id,
    user_id,
    total,
    created_at,
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY created_at DESC, id DESC
    ) AS rn
  FROM orders
)
SELECT order_id, user_id, total, created_at
FROM ranked
WHERE rn = 1;

Top-N (N > 1) #

sql

-- Top 2 paid orders per user by total
WITH ranked AS (
  SELECT
    id AS order_id,
    user_id,
    total,
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY total DESC, id
    ) AS rn
  FROM orders
  WHERE status = 'paid'
)
SELECT order_id, user_id, total, rn
FROM ranked
WHERE rn <= 2
ORDER BY user_id, rn;

Warning

Always tie-break the ORDER BY inside the window.
If two rows share every column in ORDER BY total DESC, the database is free to pick either one for rn = 1, and the choice can change between runs. Add a stable tie-breaker (usually a primary key): ORDER BY total DESC, id. The output becomes deterministic, which matters for tests, dashboards, and anything downstream that cares which row won.

ROW_NUMBER vs RANK vs DENSE_RANK #

Which ranking function you use depends on tie semantics:

ROW_NUMBER(): want exactly N rows per group, ties broken arbitrarily (or by your tie-breaker). Default choice.
RANK(): ties share a rank. WHERE rn <= 3 can return more than 3 rows if rank 3 has ties.
DENSE_RANK(): distinct levels, no gaps. "Top 3 distinct totals" rather than "top 3 rows".

If "exactly N rows" matters for the report, use ROW_NUMBER. If ties should survive the cut, use RANK.

Alternative: QUALIFY (DuckDB, Snowflake) #

DuckDB supports QUALIFY as a shorthand: same result, less syntax:

sql

-- Same top-2-per-user, no CTE needed
SELECT id AS order_id, user_id, total
FROM orders
WHERE status = 'paid'
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC, id) <= 2;

Stick with the CTE form when writing portable SQL. QUALIFY works in DuckDB, Snowflake, BigQuery, and Teradata, but not Postgres or MySQL.

Try It #

Each user's top 2 paid orders by total. Watch tie behavior across the ranking functions in the hints.

Loading SQL editor...

Practice #

Return each user's top 2 paid orders by total, tie-broken by lowest order_id. Output: order_id, user_id, total, rn (1 or 2). No ordering requirement.

Loading SQL editor...

Mistakes to Watch For #

Writing WHERE rn <= N in the same SELECT that computes rn. Window functions run after WHERE, so use a CTE, subquery, or QUALIFY.
Forgetting the tie-breaker, which makes "top row per group" non-deterministic when the sort key ties.
Using RANK when you meant ROW_NUMBER: WHERE rn <= 3 can return 5 rows if ranks 1–3 have ties.
Using LIMIT N at the end of the outer query hoping it means "per group". LIMIT is global; it cuts the final result to N rows total. Per-group limiting needs ranking.
PARTITION BY on the wrong column. "Top 2 per category" means PARTITION BY category, not PARTITION BY product_id. Double-check the group key matches the business question.

Knowledge check #

4 questions

0 / 4 answered

For "top 3 orders per user," the canonical pattern is:
You filter WHERE rn <= 3 after computing rn = ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC). The result returns:
For "top 3 with ties — return all rows tied at rank 3," replace ROW_NUMBER with:
On DuckDB, Snowflake, BigQuery, the wrapper subquery for filtering on ROW_NUMBER can be replaced with:

Next Step #

Continue to Cohorts & retention, grouping users by signup period and tracking how many return over later months.