Find duplicate rows
Short answer: SELECT col, COUNT() FROM t GROUP BY col HAVING COUNT() > 1 finds duplicates by a single column. For "rows that match on multiple columns," group by all of them. For "rows that are exact duplicates of every column," use a window function with ROW_NUMBER(). Choose by what exactly counts as a duplicate in your data.
Canonical SQL #
-- Single-column dupes (e.g., duplicate emails).
SELECT email, COUNT(*) AS dup_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- Multi-column dupes (e.g., same name + signup date).
SELECT name, created_at, COUNT(*) AS dup_count
FROM users
GROUP BY name, created_at
HAVING COUNT(*) > 1;
-- Whole-row dupes (group by every column).
SELECT id, email, name, created_at, COUNT(*) AS dup_count
FROM users
GROUP BY id, email, name, created_at
HAVING COUNT(*) > 1;
-- See the duplicate rows themselves (not just counts) — using ROW_NUMBER.
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM users
) ranked
WHERE rn > 1;Example #
Common variants #
-- Duplicates after normalization (case-insensitive, whitespace-trimmed).
SELECT LOWER(TRIM(email)) AS normalized_email, COUNT(*) AS dup_count
FROM users
GROUP BY LOWER(TRIM(email))
HAVING COUNT(*) > 1;
-- Duplicates within a recent window only.
SELECT email, COUNT(*) AS dup_count
FROM users
WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY email
HAVING COUNT(*) > 1;
-- "True" duplicates including row count beyond the simple GROUP BY (every column).
WITH counted AS (
SELECT u.*,
COUNT(*) OVER (PARTITION BY id, email, name, created_at) AS row_count
FROM users u
)
SELECT * FROM counted WHERE row_count > 1;
-- Find duplicate composite keys in a fact table (data quality check).
SELECT order_id, product_id, COUNT(*) AS dup_count
FROM ecom_order_items
GROUP BY order_id, product_id
HAVING COUNT(*) > 1;Dialect notes #
- Every major engine supports
GROUP BY ... HAVINGand window functions for this. The syntax above works on Postgres, MySQL, SQL Server, Oracle, Snowflake, BigQuery, DuckDB, Redshift, SQLite. - DuckDB / Snowflake / BigQuery / Databricks support
QUALIFY, which lets you skip the subquery wrapper:SELECT * FROM signups QUALIFY ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) > 1. See QUALIFY. - Postgres has the
DISTINCT ON (col)shortcut for "one row per value of col, keeping the first by some order" — useful for the opposite operation (keep canonical, drop dupes). Not standard SQL.
**A duplicate-detection query without a normalization step usually misses real duplicates.** 'ana@example.com' and 'Ana@Example.com ' (with trailing space) are different strings to the database; they are the same email to a human. Always think about which fields might have casing, whitespace, leading-zero, or accented-character variations and apply normalization (LOWER, TRIM, UNACCENT) inside the GROUP BY before declaring the data clean. The next step after finding duplicates is usually deciding what to keep and removing the rest: see Delete duplicate rows.