Top-N Per Group
"Top 3 products per category." "Highest-spending customer per region." "Latest order per user." These all fit the same shape: rank within a group, then keep the top N.
The canonical pattern is a two-step query:
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER (
PARTITION BY group_key
ORDER BY sort_key DESC
) AS rn
FROM t
)
SELECT * FROM ranked WHERE rn <= N;You need a CTE (or subquery) because WHERE can't filter on a window function in the same SELECT. DuckDB and Snowflake accept QUALIFY as a one-liner; everywhere else uses this shape.
Top-1 (Latest / Best Per Group) #
The most common specialization keeps exactly one row per group: latest order per user, highest-scoring attempt per student, most-recent status per ticket.
-- Latest order per user
WITH ranked AS (
SELECT
id AS order_id,
user_id,
total,
created_at,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY created_at DESC, id DESC
) AS rn
FROM orders
)
SELECT order_id, user_id, total, created_at
FROM ranked
WHERE rn = 1;Top-N (N > 1) #
-- Top 2 paid orders per user by total
WITH ranked AS (
SELECT
id AS order_id,
user_id,
total,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY total DESC, id
) AS rn
FROM orders
WHERE status = 'paid'
)
SELECT order_id, user_id, total, rn
FROM ranked
WHERE rn <= 2
ORDER BY user_id, rn;Always tie-break the ORDER BY inside the window.
If two rows share every column in ORDER BY total DESC, the database is free to pick either one for rn = 1, and the choice can change between runs. Add a stable tie-breaker (usually a primary key): ORDER BY total DESC, id. The output becomes deterministic, which matters for tests, dashboards, and anything downstream that cares which row won.
ROW_NUMBER vs RANK vs DENSE_RANK #
Which ranking function you use depends on tie semantics:
ROW_NUMBER(): want exactly N rows per group, ties broken arbitrarily (or by your tie-breaker). Default choice.RANK(): ties share a rank.WHERE rn <= 3can return more than 3 rows if rank 3 has ties.DENSE_RANK(): distinct levels, no gaps. "Top 3 distinct totals" rather than "top 3 rows".
If "exactly N rows" matters for the report, use ROW_NUMBER. If ties should survive the cut, use RANK.
Alternative: QUALIFY (DuckDB, Snowflake) #
DuckDB supports QUALIFY as a shorthand: same result, less syntax:
-- Same top-2-per-user, no CTE needed
SELECT id AS order_id, user_id, total
FROM orders
WHERE status = 'paid'
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC, id) <= 2;Stick with the CTE form when writing portable SQL. QUALIFY works in DuckDB, Snowflake, BigQuery, and Teradata, but not Postgres or MySQL.
Try It #
Each user's top 2 paid orders by total. Watch tie behavior across the ranking functions in the hints.
Practice #
Return each user's top 2 paid orders by total, tie-broken by lowest order_id. Output: order_id, user_id, total, rn (1 or 2). No ordering requirement.
Mistakes to Watch For #
- Writing
WHERE rn <= Nin the sameSELECTthat computesrn. Window functions run afterWHERE, so use a CTE, subquery, orQUALIFY. - Forgetting the tie-breaker, which makes "top row per group" non-deterministic when the sort key ties.
- Using
RANKwhen you meantROW_NUMBER:WHERE rn <= 3can return 5 rows if ranks 1–3 have ties. - Using
LIMIT Nat the end of the outer query hoping it means "per group".LIMITis global; it cuts the final result to N rows total. Per-group limiting needs ranking. PARTITION BYon the wrong column. "Top 2 per category" meansPARTITION BY category, notPARTITION BY product_id. Double-check the group key matches the business question.
Knowledge check #
4 questions
For "top 3 orders per user," the canonical pattern is:
You filter
WHERE rn <= 3after computingrn = ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC). The result returns:For "top 3 with ties — return all rows tied at rank 3," replace
ROW_NUMBERwith:On DuckDB, Snowflake, BigQuery, the wrapper subquery for filtering on
ROW_NUMBERcan be replaced with:
Next Step #
Continue to Cohorts & retention, grouping users by signup period and tracking how many return over later months.