Lesson Intermediate

QUALIFY

Every window-function lesson so far has hit the same wall: you compute rn, then you can't filter on it.

sql
-- Fails: rn doesn't exist when WHERE runs
SELECT
  id,
  user_id,
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
FROM orders
WHERE rn = 1;

WHERE is evaluated before window functions are computed, and HAVING filters aggregates, not windows. So the standard workaround is the two-step CTE you've been writing: compute rn in a CTE, filter WHERE rn = 1 outside.

QUALIFY removes the workaround. It's a dedicated filter clause that runs after window functions are computed: WHERE for rows, HAVING for groups, QUALIFY for window results.

Syntax #

sql
SELECT column_list
FROM t
WHERE row_filter          -- runs first, before windows
GROUP BY ...              -- optional
HAVING group_filter       -- optional
QUALIFY window_predicate  -- runs after windows are computed
ORDER BY ...
LIMIT n;

The logical evaluation order:

FROMWHEREGROUP BYHAVINGwindow functionsQUALIFYDISTINCTORDER BYLIMIT

Two consequences worth internalizing:

  • Rows removed by WHERE never receive window numbers. The window only sees survivors.
  • QUALIFY sees fully-computed window values, so it can compare them, combine them with AND/OR, or test them against LAG/LEAD results.
Tip

DuckDB and Snowflake also let QUALIFY reference a window alias from SELECT: SELECT ..., ROW_NUMBER() OVER (...) AS rn FROM orders QUALIFY rn = 1. Handy when you want rn in the output anyway.

CTE vs QUALIFY: Same Query, Two Shapes #

The highest-total order within each status. First, the portable CTE form you already know:

sql
-- Portable everywhere
WITH ranked AS (
  SELECT
    id AS order_id,
    user_id,
    status,
    total,
    ROW_NUMBER() OVER (
      PARTITION BY status
      ORDER BY total DESC, id
    ) AS rn
  FROM orders
)
SELECT order_id, user_id, status, total
FROM ranked
WHERE rn = 1;

Now the QUALIFY form, identical result, no wrapper:

sql
-- DuckDB / Snowflake / BigQuery / Databricks / Teradata
SELECT id AS order_id, user_id, status, total
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY status ORDER BY total DESC, id) = 1;

Nothing changed semantically. QUALIFY is sugar over the CTE pattern, and the optimizer treats both the same. What changed is readability: the filter sits next to the query it filters, and there's one less name (ranked, rn) to invent.

The Three Canonical Uses #

Top-N per group #

sql
-- Top 2 paid orders per user by total
SELECT id AS order_id, user_id, total
FROM orders
WHERE status = 'paid'
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC, id) <= 2;

Dedupe: keep the latest row per key #

sql
-- Most recent order per user, full row detail
SELECT id, user_id, total, status, created_at
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC, id DESC) = 1;

Change detection with LAG #

QUALIFY isn't limited to ranking functions; any window result works. Here's "rows where the value differs from the previous row," the building block of change-history tables:

sql
-- Orders where the total differs from the user's previous order
SELECT user_id, id AS order_id, total, created_at
FROM orders
QUALIFY total IS DISTINCT FROM
  LAG(total) OVER (PARTITION BY user_id ORDER BY created_at, id);

IS DISTINCT FROM instead of <> is deliberate: each user's first order has LAG(total) = NULL, and total <> NULL is NULL (filtered out). IS DISTINCT FROM treats NULL as a comparable value, so first rows survive, which is usually what change detection wants.

Portability #

QUALIFY is not standard SQL. It originated in Teradata and spread through the analytical engines:

EngineQUALIFYWhat to write instead
DuckDBYes
SnowflakeYes
BigQueryYes
Databricks / TeradataYes
PostgreSQLNoCTE / subquery + WHERE rn = 1
MySQLNoCTE / subquery + WHERE rn = 1
SQL ServerNoCTE / subquery + WHERE rn = 1
SQLiteNoCTE / subquery + WHERE rn = 1

The editors on this site run DuckDB, so QUALIFY works everywhere here. But know both forms: if your job runs on Postgres or SQL Server, or an interviewer asks for portable SQL, the CTE pattern is the answer. QUALIFY saves typing; the CTE version is the one that runs everywhere.

Try It #

Each user's highest-total order, with their name, in one query and no CTE. The QUALIFY line does what an entire wrapper query used to.

Loading SQL editor...

Practice #

Return each user's latest paid order: latest by created_at, tie-broken by highest id. Output: order_id, user_id, total, created_at. Use QUALIFY, with no CTE and no subquery. No ordering requirement on the final result.

Loading SQL editor...

Mistakes to Watch For #

  • Shipping QUALIFY to Postgres, MySQL, SQL Server, or SQLite, where it's a syntax error. Translate to the CTE + WHERE rn = 1 form before porting.
  • Still trying WHERE rn = 1 or HAVING rn = 1 on a window alias in the same SELECT. Both run before windows exist; QUALIFY is the clause that runs after.
  • Dropping the row filter and hoping QUALIFY handles it. Without WHERE status = 'paid', a cancelled order can take rn = 1 and knock out the paid order you wanted. Row filters belong in WHERE; window filters belong in QUALIFY.
  • Missing tie-breaker in the window's ORDER BY — same non-determinism as always, and QUALIFY does nothing to fix it.
  • Treating QUALIFY as a HAVING replacement. HAVING filters aggregate results after GROUP BY; QUALIFY filters window results. A query can legitimately use both.

Knowledge check #

4 questions

0 / 4 answered
  1. SELECT *, ROW_NUMBER() OVER (...) AS rn FROM orders WHERE rn = 1 fails because:

  2. QUALIFY is natively supported in:

  3. To port QUALIFY ROW_NUMBER() OVER (...) = 1 to Postgres, you:

  4. A query has both WHERE status = 'paid' and QUALIFY rn = 1. Which rows receive window numbers?

Next Step #

That wraps up the Window Functions section. You now have the full toolkit, including the clause that makes it pleasant to filter. Continue to String Cleanup, the TRIM / LOWER / REPLACE techniques that turn messy text into values you can join, group, and dedupe.