Lesson Beginner

DISTINCT

DISTINCT removes duplicate rows from a result.
It sounds simple, and on one column it is. But the word "duplicate" hides a couple of subtleties worth knowing before you use it in real queries.

One Column #

Place DISTINCT right after SELECT. The query returns each unique value once.

sql
SELECT DISTINCT status
FROM orders;

This returns each status once, useful for quickly seeing the set of values a column takes.

Multiple Columns: Whole-Row Uniqueness #

DISTINCT applies to the entire row of selected columns, not just the first one.

sql
SELECT DISTINCT
  user_id,
  status
FROM orders;

One row is returned for every unique (user_id, status) pair. A user can appear several times, once per distinct status they have.
This trips people up: they expect one row per user_id, then get many.

COUNT(DISTINCT col) #

When you want to count unique values, put DISTINCT inside the aggregate.

sql
SELECT COUNT(DISTINCT user_id) AS unique_buyers
FROM orders
WHERE status = 'paid';

COUNT(*) would count orders. COUNT(DISTINCT user_id) counts the number of different users who placed at least one paid order. Mixing these up is one of the most common beginner analytics bugs.

DISTINCT and NULL #

Even though SQL normally treats NULL as "unknown", DISTINCT treats all NULL values as equal. A column full of NULLs collapses to a single NULL in the output.
This is one of the few places in SQL where NULL = NULL effectively holds. GROUP BY does the same, collapsing all NULLs into a single group.

Warning

DISTINCT is often a sign that something upstream is wrong.
If your query returns duplicates you did not expect, DISTINCT hides the symptom instead of fixing the cause, usually a join that fans out rows, or a missing GROUP BY. Use DISTINCT deliberately. When in doubt, ask first: where did the duplicates come from?

Try It #

Run each variant and compare row counts.

Loading SQL editor...

Mistakes to Watch For #

  • Expecting DISTINCT col1, col2 to deduplicate on col1 only.
  • Using DISTINCT to paper over a bad join instead of fixing the join.
  • Confusing COUNT(*) with COUNT(DISTINCT col); they measure different things.
  • Adding DISTINCT "just in case". It has a real cost because the database has to sort or hash the rows to deduplicate.

Practice #

How many different products appear in the orders table? Return a single row with one column called unique_products.

Loading SQL editor...

Knowledge check #

3 questions

0 / 3 answered
  1. SELECT DISTINCT region, country FROM orders deduplicates by:

  2. How does DISTINCT treat NULL values?

  3. SELECT DISTINCT region FROM orders and SELECT region FROM orders GROUP BY region are:

Next Step #

Continue to Aliases to rename columns and tables so your queries stay readable as they grow.