Compare

UNION vs UNION ALL

Short answer: UNION removes duplicate rows from the combined result; UNION ALL keeps every row. UNION ALL is faster because it skips the dedup step, which on large result sets means a sort or hash that the engine no longer has to do. Default to UNION ALL and only reach for UNION when you genuinely need duplicate elimination.

Quick comparison #

UNIONUNION ALL
Duplicate rowsremovedkept
Performancesort / hash to dedupconcat, fast
Default choice in analytics?noyes
Result preserves order?no (dedup may reorder)left side then right side, on most engines
Use whencombining sets where dupes are meaningful and unwantedcombining sets where dupes don't appear or shouldn't be removed

See the difference #

Loading SQL editor...

When to use which #

UNION ALL is the default. Use whenever you're stitching partitions together (SELECT FROM events_2024 UNION ALL SELECT FROM events_2025), combining branches of a CASE-like split, or building a result where each row already has a natural source identifier. Faster, more predictable, no chance of accidentally collapsing rows that look identical but have different meaning.

UNION is for when duplicates are genuinely undesired in the output. Take "distinct email addresses across customers and prospects": SELECT email FROM customers UNION SELECT email FROM prospects returns each email once. Equivalent to UNION ALL followed by SELECT DISTINCT, just one keyword shorter.

Dialect notes #

  • Both are ANSI standard and behave identically across Postgres, MySQL, SQLite, SQL Server, Oracle, Snowflake, BigQuery, DuckDB, Redshift.
  • Column names: come from the first SELECT. The second SELECT only needs matching column count and compatible types; its names are ignored.
  • Type coercion: most engines implicitly cast (INTBIGINTBIGINT). Mixing incompatible types (VARCHARDATE) errors. BigQuery is stricter, so explicit CAST may be required.
  • ORDER BY: applies to the combined result, must reference column position or names from the first SELECT.
  • INTERSECT and EXCEPT: same family. INTERSECT returns rows in both; EXCEPT (MINUS in Oracle) returns rows in the left but not the right. See INTERSECT / EXCEPT.

FAQ #

Is UNION ALL really faster than UNION?

Yes, on any non-trivial dataset. UNION removes duplicate rows by sorting or hashing the combined result, and that's the cost. UNION ALL skips dedup and just concatenates. On a billion-row union, the difference is minutes vs seconds.

When should I use UNION instead of UNION ALL?

When duplicates are genuinely undesired in the output. Example: "all distinct email addresses across customers and prospects". A single email might appear in both tables, and you want it once. UNION returns each unique row once. Equivalent to UNION ALL followed by SELECT DISTINCT.

Do columns need to match between UNION'd queries?

Yes: same number of columns and compatible types. Column names in the result come from the first SELECT; the second query's column names are ignored. Type mismatches (e.g., VARCHARDATE) error. Implicit type coercion (INTBIGINTBIGINT) usually works on most engines except BigQuery (stricter, may need explicit CAST).

Can I ORDER BY across a UNION?

Yes. ORDER BY applies to the combined result. Place it at the very end of the union, not after each SELECT. Reference column position or names from the first SELECT. Example: SELECT id, name FROM a UNION ALL SELECT id, name FROM b ORDER BY name.

Does UNION DISTINCT do anything?

UNION DISTINCT is equivalent to UNION; it just explicitly states that duplicates should be removed. Some teams use it for clarity. UNION ALL is the explicit non-dedup form.

What's the difference between UNION, INTERSECT, and EXCEPT?

UNION returns rows from either set. INTERSECT returns rows in both sets. EXCEPT (MINUS in Oracle) returns rows in the first set but not the second. All three remove duplicates by default; all three have an ALL variant that keeps duplicates. See INTERSECT / EXCEPT.

Warning

Defaulting to UNION "to be safe" is a real performance bug in production analytics. A UNION of two billion-row partitions schedules a sort or hash over two billion rows for no reason if those partitions don't overlap. The dev-time difference is one keyword; the prod-time difference can be minutes vs seconds. If you don't have a positive reason to dedup, type the two extra letters: UNION ALL.