Recipe

Find duplicate rows

Short answer: SELECT col, COUNT() FROM t GROUP BY col HAVING COUNT() > 1 finds duplicates by a single column. For "rows that match on multiple columns," group by all of them. For "rows that are exact duplicates of every column," use a window function with ROW_NUMBER(). Choose by what exactly counts as a duplicate in your data.

Canonical SQL #

sql
-- Single-column dupes (e.g., duplicate emails).
SELECT email, COUNT(*) AS dup_count
FROM   users
GROUP  BY email
HAVING COUNT(*) > 1;

-- Multi-column dupes (e.g., same name + signup date).
SELECT name, created_at, COUNT(*) AS dup_count
FROM   users
GROUP  BY name, created_at
HAVING COUNT(*) > 1;

-- Whole-row dupes (group by every column).
SELECT id, email, name, created_at, COUNT(*) AS dup_count
FROM   users
GROUP  BY id, email, name, created_at
HAVING COUNT(*) > 1;

-- See the duplicate rows themselves (not just counts) — using ROW_NUMBER.
SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
  FROM users
) ranked
WHERE rn > 1;

Example #

Loading SQL editor...

Common variants #

sql
-- Duplicates after normalization (case-insensitive, whitespace-trimmed).
SELECT LOWER(TRIM(email)) AS normalized_email, COUNT(*) AS dup_count
FROM   users
GROUP  BY LOWER(TRIM(email))
HAVING COUNT(*) > 1;

-- Duplicates within a recent window only.
SELECT email, COUNT(*) AS dup_count
FROM   users
WHERE  created_at >= CURRENT_DATE - INTERVAL '7 days'
GROUP  BY email
HAVING COUNT(*) > 1;

-- "True" duplicates including row count beyond the simple GROUP BY (every column).
WITH counted AS (
  SELECT u.*,
         COUNT(*) OVER (PARTITION BY id, email, name, created_at) AS row_count
  FROM users u
)
SELECT * FROM counted WHERE row_count > 1;

-- Find duplicate composite keys in a fact table (data quality check).
SELECT order_id, product_id, COUNT(*) AS dup_count
FROM   ecom_order_items
GROUP  BY order_id, product_id
HAVING COUNT(*) > 1;

Dialect notes #

  • Every major engine supports GROUP BY ... HAVING and window functions for this. The syntax above works on Postgres, MySQL, SQL Server, Oracle, Snowflake, BigQuery, DuckDB, Redshift, SQLite.
  • DuckDB / Snowflake / BigQuery / Databricks support QUALIFY, which lets you skip the subquery wrapper: SELECT * FROM signups QUALIFY ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) > 1. See QUALIFY.
  • Postgres has the DISTINCT ON (col) shortcut for "one row per value of col, keeping the first by some order" — useful for the opposite operation (keep canonical, drop dupes). Not standard SQL.
Tip

**A duplicate-detection query without a normalization step usually misses real duplicates.** 'ana@example.com' and 'Ana@Example.com ' (with trailing space) are different strings to the database; they are the same email to a human. Always think about which fields might have casing, whitespace, leading-zero, or accented-character variations and apply normalization (LOWER, TRIM, UNACCENT) inside the GROUP BY before declaring the data clean. The next step after finding duplicates is usually deciding what to keep and removing the rest: see Delete duplicate rows.