Rank rows within a group
Short answer: three ranking functions, three different tie-handling behaviors. ROW_NUMBER gives every row a unique rank (ties broken arbitrarily). RANK gives tied rows the same rank but skips the next number (1, 2, 2, 4). DENSE_RANK gives tied rows the same rank without skipping (1, 2, 2, 3). Pick by what should happen on ties in your data.
Quick comparison #
| Score: 95, 90, 90, 85 | ROW_NUMBER | RANK | DENSE_RANK |
|---|---|---|---|
| 95 | 1 | 1 | 1 |
| 90 (tied) | 2 | 2 | 2 |
| 90 (tied) | 3 | 2 | 2 |
| 85 | 4 | 4 | 3 |
| Use for | "exactly one row per rank" (Top-N) | "tied scores share rank, gaps OK" (sports) | "tied scores share rank, no gaps" (pricing tiers) |
Canonical SQL #
SELECT
user_id,
region,
revenue,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC) AS row_num,
RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS dense_rnk
FROM user_revenue;Example #
Common patterns #
-- Top-N per group: most common analytics ask.
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC) AS rn
FROM user_revenue
) ranked
WHERE rn <= 5; -- Top 5 per region
-- Percentile rank (0 to 1).
SELECT
user_id,
revenue,
PERCENT_RANK() OVER (ORDER BY revenue) AS pct_rank,
CUME_DIST() OVER (ORDER BY revenue) AS cumulative_dist
FROM user_revenue;
-- "Highest in group" tie-broken by a secondary column.
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (
PARTITION BY region
ORDER BY revenue DESC, signup_date ASC, user_id ASC -- multi-key tie break
) AS rn
FROM user_revenue
) ranked
WHERE rn = 1;
-- NTILE: split rows into N equal-sized buckets (quartiles, deciles).
SELECT
user_id,
revenue,
NTILE(4) OVER (ORDER BY revenue) AS quartile
FROM user_revenue;When to use which #
ROW_NUMBER: when you need exactly one row per rank position. For "Top 3 customers per region," you want exactly 3 rows per region, even if two customers are tied at #3. The engine picks one; add a deterministic tie-breaker column to make the choice reproducible.
RANK: when ties should share a rank but the next position should reflect how many rows came before it. Sports leaderboards work this way: two athletes tied for second place, the next is fourth (not third). RANK produces 1, 2, 2, 4.
DENSE_RANK: when ties should share a rank but the next position is just the next integer. Pricing tiers and segmentation buckets work this way, with Tier 1 / Tier 2 / Tier 3 regardless of how many products are in each tier. DENSE_RANK produces 1, 2, 2, 3.
NTILE(n): divides rows into n evenly-sized buckets. Quartiles (NTILE(4)), deciles (NTILE(10)), percentiles (NTILE(100)). Different problem from ranking, same family of functions.
Dialect notes #
ROW_NUMBER,RANK,DENSE_RANK,NTILE,PERCENT_RANK,CUME_DIST: ANSI standard, supported on Postgres, MySQL (8+), SQL Server, Oracle, Snowflake, BigQuery, DuckDB, Redshift, SQLite (3.25+).QUALIFYclause: DuckDB / Snowflake / BigQuery / Databricks / Teradata let you filter on a window function without the subquery wrapper:SELECT ... QUALIFY ROW_NUMBER() OVER (...) <= 5. Cleaner.- MySQL pre-8.0: no window functions. Emulate
ROW_NUMBERper group with user-defined variables (@row := IF(@grp = group, @row + 1, 1)); it works but is fragile. RANKvsDENSE_RANKperformance: identical on every modern engine. Choose by semantics, not by speed.
The single most-Googled SQL question is "Top N per group," and ROW_NUMBER() OVER (PARTITION BY g ORDER BY metric DESC) <= N (filtered in a subquery or with QUALIFY) is the canonical answer. Memorize the pattern, swap ROW_NUMBER for RANK / DENSE_RANK based on tie semantics, and you've got 80% of the analytics ranking work covered. References: Top N per group pattern, first or last row per group.