DISTINCT
DISTINCT removes duplicate rows from a result.
It sounds simple, and on one column it is. But the word "duplicate" hides a couple of subtleties worth knowing before you use it in real queries.
One Column #
Place DISTINCT right after SELECT. The query returns each unique value once.
SELECT DISTINCT status
FROM orders;This returns each status once, useful for quickly seeing the set of values a column takes.
Multiple Columns: Whole-Row Uniqueness #
DISTINCT applies to the entire row of selected columns, not just the first one.
SELECT DISTINCT
user_id,
status
FROM orders;One row is returned for every unique (user_id, status) pair. A user can appear several times, once per distinct status they have.
This trips people up: they expect one row per user_id, then get many.
COUNT(DISTINCT col) #
When you want to count unique values, put DISTINCT inside the aggregate.
SELECT COUNT(DISTINCT user_id) AS unique_buyers
FROM orders
WHERE status = 'paid';COUNT(*) would count orders. COUNT(DISTINCT user_id) counts the number of different users who placed at least one paid order. Mixing these up is one of the most common beginner analytics bugs.
DISTINCT and NULL #
Even though SQL normally treats NULL as "unknown", DISTINCT treats all NULL values as equal. A column full of NULLs collapses to a single NULL in the output.
This is one of the few places in SQL where NULL = NULL effectively holds. GROUP BY does the same, collapsing all NULLs into a single group.
DISTINCT is often a sign that something upstream is wrong.
If your query returns duplicates you did not expect, DISTINCT hides the symptom instead of fixing the cause, usually a join that fans out rows, or a missing GROUP BY. Use DISTINCT deliberately. When in doubt, ask first: where did the duplicates come from?
Try It #
Run each variant and compare row counts.
Mistakes to Watch For #
- Expecting
DISTINCT col1, col2to deduplicate oncol1only. - Using
DISTINCTto paper over a bad join instead of fixing the join. - Confusing
COUNT(*)withCOUNT(DISTINCT col); they measure different things. - Adding
DISTINCT"just in case". It has a real cost because the database has to sort or hash the rows to deduplicate.
Practice #
How many different products appear in the orders table? Return a single row with one column called unique_products.
Knowledge check #
3 questions
SELECT DISTINCT region, country FROM ordersdeduplicates by:How does
DISTINCTtreatNULLvalues?SELECT DISTINCT region FROM ordersandSELECT region FROM orders GROUP BY regionare:
Next Step #
Continue to Aliases to rename columns and tables so your queries stay readable as they grow.