Cheat sheet

SQL window functions cheat sheet

Q: What's the difference between OVER and GROUP BY?

GROUP BY collapses rows to one output row per group. OVER (window function) computes a per-group value but keeps every input row. Use GROUP BY when you only want the aggregate; use OVER when you need each row alongside the aggregate.

Q: When should I use ROW_NUMBER, RANK, or DENSE_RANK?

ROW_NUMBER always assigns unique sequential numbers, best for "exactly N rows per group" (Top-N pattern). RANK gives ties the same rank but skips numbers (1, 2, 2, 4), like a sports leaderboard. DENSE_RANK gives ties the same rank without gaps (1, 2, 2, 3), like pricing tiers.

Q: Can I filter on a window function in WHERE?

No. Window functions are evaluated after WHERE. Wrap the query in a subquery / CTE and filter on the outer level: SELECT FROM (SELECT , ROW_NUMBER() OVER (...) AS rn FROM t) WHERE rn = 1. On DuckDB, Snowflake, BigQuery, and Databricks, QUALIFY lets you filter window results in one query.

Q: Why does LAST_VALUE return the wrong row?

The default frame with ORDER BY is UNBOUNDED PRECEDING TO CURRENT ROW, so LAST_VALUE returns the current row's value. To get the actual partition's last value, specify the full frame: LAST_VALUE(col) OVER (PARTITION BY g ORDER BY t ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).

Q: What's the difference between ROWS and RANGE in a window frame?

ROWS walks physical rows. RANGE walks by the ORDER BY value, meaning ties on the order column are all included together. Use ROWS for "last 7 rows"; use RANGE for "last 7 days." RANGE with intervals (RANGE BETWEEN INTERVAL '7' DAY PRECEDING ...) is supported on Postgres 11+, Oracle, and DuckDB.

Window functions compute a value across a set of rows related to the current row without collapsing rows like GROUP BY does. The whole family on one page.

The OVER clause #

sql

function(args) OVER (
  [PARTITION BY col_a, col_b]      -- group rows; analogous to GROUP BY
  [ORDER BY col_t [ASC|DESC]]      -- sort within each partition
  [ROWS|RANGE BETWEEN ... AND ...] -- frame: which rows in the partition to operate on
)

Ranking: ROW_NUMBER, RANK, DENSE_RANK #

sql

SELECT
  user_id,
  total,
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC) AS rn,    -- 1, 2, 3, 4
  RANK()       OVER (PARTITION BY user_id ORDER BY total DESC) AS rk,    -- 1, 2, 2, 4 (ties skip)
  DENSE_RANK() OVER (PARTITION BY user_id ORDER BY total DESC) AS dr     -- 1, 2, 2, 3 (ties don't skip)
FROM orders;

Pick ROW_NUMBER for "exactly one row per rank position" (Top-N). RANK for "ties share rank, gaps OK." DENSE_RANK for "ties share rank, no gaps."

Top-N per group (the most useful pattern) #

sql

-- 3 most recent orders per user.
SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) AS rn
  FROM orders
) ranked
WHERE rn <= 3;

-- DuckDB / Snowflake / BigQuery / Databricks: use QUALIFY to skip the wrapper.
SELECT *
FROM   orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) <= 3;

LAG / LEAD: previous and next row #

sql

SELECT
  day,
  revenue,
  LAG(revenue)  OVER (ORDER BY day) AS prev_day,
  LEAD(revenue) OVER (ORDER BY day) AS next_day,
  revenue - LAG(revenue) OVER (ORDER BY day) AS day_over_day_delta,
  LAG(revenue, 7)    OVER (ORDER BY day) AS revenue_week_ago,
  LAG(revenue, 1, 0) OVER (ORDER BY day) AS prev_or_zero       -- default for first row
FROM daily_revenue;

Aggregates as window functions #

sql

SELECT
  region,
  user_id,
  revenue,
  -- Group total without collapsing rows.
  SUM(revenue)   OVER (PARTITION BY region)                   AS region_total,
  -- Per-row percentage of region total.
  100.0 * revenue / SUM(revenue) OVER (PARTITION BY region)   AS pct_of_region,
  -- Running total within region, ordered by date.
  SUM(revenue)   OVER (PARTITION BY region ORDER BY day)      AS running_total,
  -- Average so far for that region.
  AVG(revenue)   OVER (PARTITION BY region ORDER BY day)      AS running_avg,
  -- Min / Max so far.
  MIN(revenue)   OVER (PARTITION BY region ORDER BY day)      AS region_low_so_far,
  MAX(revenue)   OVER (PARTITION BY region ORDER BY day)      AS region_high_so_far
FROM region_daily;

Frames (ROWS / RANGE) #

sql

-- Last 7 rows (rolling 7-day if the data is one row per day).
SUM(revenue) OVER (
  ORDER BY day
  ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
)

-- 30-day moving average.
AVG(revenue) OVER (
  ORDER BY day
  ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
)

-- Forward-looking: this row plus next 4.
SUM(revenue) OVER (
  ORDER BY day
  ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING
)

-- Whole partition (e.g., for LAST_VALUE to actually return the last).
LAST_VALUE(revenue) OVER (
  PARTITION BY region ORDER BY day
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

Default frame with ORDER BY: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Includes ties on the order column. ROWS walks physical rows, usually what you want when ties exist.

FIRST_VALUE / LAST_VALUE / NTH_VALUE #

sql

-- First and last revenue in each region (sorted by date).
SELECT
  region,
  day,
  revenue,
  FIRST_VALUE(revenue) OVER (PARTITION BY region ORDER BY day)               AS first_day_rev,
  LAST_VALUE(revenue)  OVER (PARTITION BY region ORDER BY day
                              ROWS BETWEEN UNBOUNDED PRECEDING
                                       AND UNBOUNDED FOLLOWING)              AS last_day_rev,
  NTH_VALUE(revenue, 2) OVER (PARTITION BY region ORDER BY day)              AS second_day_rev
FROM region_daily;

Trap: LAST_VALUE with the default frame returns the current row's value, not the partition's last. Always specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING for "the actual last."

NTILE: quartiles, deciles, percentiles #

sql

SELECT
  user_id,
  revenue,
  NTILE(4)   OVER (ORDER BY revenue) AS quartile,
  NTILE(10)  OVER (ORDER BY revenue) AS decile,
  NTILE(100) OVER (ORDER BY revenue) AS percentile
FROM user_revenue;

PERCENT_RANK / CUME_DIST #

sql

SELECT
  value,
  PERCENT_RANK() OVER (ORDER BY value) AS pct_rank,        -- 0 to 1
  CUME_DIST()    OVER (ORDER BY value) AS cumulative_dist  -- 0 to 1
FROM measurements;

PERCENTILE_CONT / PERCENTILE_DISC #

sql

-- As an aggregate (one row per group): WITHIN GROUP form.
SELECT
  region,
  PERCENTILE_CONT(0.5)  WITHIN GROUP (ORDER BY revenue) AS median,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY revenue) AS p95
FROM   user_revenue
GROUP  BY region;

-- As a window function: per-row.
SELECT
  region, user_id, revenue,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY revenue)
    OVER (PARTITION BY region) AS region_median
FROM user_revenue;

Reusable WINDOW clause (avoid copy-paste) #

sql

SELECT
  user_id,
  total,
  ROW_NUMBER() OVER w AS rn,
  RANK()       OVER w AS rk,
  LAG(total)   OVER w AS prev_total
FROM   orders
WINDOW w AS (PARTITION BY user_id ORDER BY order_date);     -- Postgres / MySQL 8 / DuckDB / SQLite / BigQuery / SQL Server 2022+ / Oracle 21c+ (not Snowflake)

Common mistakes #

Mixing GROUP BY and window functions naively: window functions run after GROUP BY. SUM(total) OVER (...) over a grouped query operates on the grouped rows, not the raw data.
LAST_VALUE returning the wrong value: default frame is UNBOUNDED PRECEDING ... CURRENT ROW, so LAST_VALUE is the same as the current row's value. Always specify UNBOUNDED FOLLOWING explicitly.
ORDER BY with no tie-breaker: ROW_NUMBER over tied values is non-deterministic. Add a tie-breaker column (ORDER BY t, id) for reproducible results.
Using window functions where GROUP BY would do: if you don't need per-row output, plain GROUP BY is faster.

FAQ #

What's the difference between OVER and GROUP BY?

GROUP BY collapses rows to one output row per group. OVER (window function) computes a per-group value but keeps every input row. Use GROUP BY when you only want the aggregate; use OVER when you need each row alongside the aggregate.

When should I use ROW_NUMBER, RANK, or DENSE_RANK?

ROW_NUMBER always assigns unique sequential numbers, best for "exactly N rows per group" (Top-N pattern). RANK gives ties the same rank but skips numbers (1, 2, 2, 4), like a sports leaderboard. DENSE_RANK gives ties the same rank without gaps (1, 2, 2, 3), like pricing tiers.

Can I filter on a window function in WHERE?

No. Window functions are evaluated after WHERE. Wrap the query in a subquery / CTE and filter on the outer level: SELECT FROM (SELECT , ROW_NUMBER() OVER (...) AS rn FROM t) WHERE rn = 1. On DuckDB, Snowflake, BigQuery, and Databricks, QUALIFY lets you filter window results in one query.

Why does LAST_VALUE return the wrong row?

The default frame with ORDER BY is UNBOUNDED PRECEDING TO CURRENT ROW, so LAST_VALUE returns the current row's value. To get the actual partition's last value, specify the full frame: LAST_VALUE(col) OVER (PARTITION BY g ORDER BY t ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).

What's the difference between ROWS and RANGE in a window frame?

ROWS walks physical rows. RANGE walks by the ORDER BY value, meaning ties on the order column are all included together. Use ROWS for "last 7 rows"; use RANGE for "last 7 days." RANGE with intervals (RANGE BETWEEN INTERVAL '7' DAY PRECEDING ...) is supported on Postgres 11+, Oracle, and DuckDB.

How do I compute a 7-day rolling average?

AVG(value) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW). The ROWS BETWEEN frame is required: the default frame would give a cumulative average, not a rolling one. For per-group rolling averages, add PARTITION BY group_col inside the OVER.

Tip

Two patterns cover 80% of analytics window-function work. (1) Top-N per group via ROW_NUMBER. (2) Running total / rolling average via SUM / AVG with ORDER BY and a frame. Master those two and the rest are variants. Detail pages: Running total, LAG / LEAD, First/last per group, Ranking.