COUNT(*) vs COUNT(1) vs COUNT(column)
Short answer: COUNT() and COUNT(1) count every row in the group. They are identical in modern engines, both in result and in performance. COUNT(column) is different: it counts only rows where column IS NOT NULL. The "is COUNT() faster than COUNT(1)?" debate is folklore from twenty years ago; the only meaningful question is whether you want to skip nulls.
Quick comparison #
COUNT(*) | COUNT(1) | COUNT(col) | |
|---|---|---|---|
| What it counts | every row | every row | rows where col IS NOT NULL |
| Affected by NULL? | no | no | yes |
| Performance | identical to COUNT(1) | identical to COUNT(*) | usually same; can read column data |
| Idiomatic? | yes, the standard form | works, but stylistic | yes, when nulls are meaningful |
| Use for | "how many rows are in this group?" | (don't bother — use *) | "how many rows have a value here?" |
See the NULL behavior #
When to use which #
**COUNT()* — the default. "How many rows are in this table / group / partition?" Use it everywhere unless nulls are part of the question. ANSI standard, idiomatic across every engine, and the optimizer will not read column data: it counts rows directly.
COUNT(1) is equivalent to COUNT() in every modern engine (Postgres, MySQL, SQL Server, Oracle, SQLite, Snowflake, BigQuery, DuckDB). The folklore that COUNT(1) is faster came from old Oracle versions and isn't true today. There's no behavioral or performance reason to choose it; COUNT() is more idiomatic.
COUNT(col) — when you specifically want to count non-null values. "How many users have provided a phone number?" maps to COUNT(phone). "How many orders have a coupon applied?" maps to COUNT(coupon_code). The same as SUM(CASE WHEN col IS NOT NULL THEN 1 ELSE 0 END) but shorter. This is the only real use case where the choice between the three matters.
COUNT(DISTINCT col) answers an adjacent question: how many distinct non-null values? COUNT(DISTINCT user_id) is one of the most common analytics aggregations ("unique users this month"). It is notably expensive on large datasets, since the engine has to dedup, often via a hash table or sort. Approximate alternatives (APPROX_COUNT_DISTINCT in BigQuery / Snowflake / Redshift; HLL in DuckDB) trade a small accuracy loss for big speedups.
Common mistakes #
- **Using
COUNT(col)when you meantCOUNT()*: silently undercounts whenevercolhas nulls. Most common in surveys (COUNT(rating)instead ofCOUNT(*)understates the response count) and joined tables (COUNT(orders.id)after aLEFT JOINexcludes users with no orders, which may or may not be what you want). - **Believing
COUNT(1)is faster thanCOUNT()*: it isn't. Both compile to the same plan; the engine recognizes "count rows" and skips reading any column data. - **
COUNT()after aLEFT JOIN*: counts the joined rows, not the original left-side rows. If a user has 5 orders, that user contributes 5 toCOUNT(*). UseCOUNT(DISTINCT user.id)if you want one row per user.
FAQ #
Is COUNT(*) slower than COUNT(1)?
No. They're identical in every modern engine, both in result and in performance. The optimizer recognizes "count rows" and skips reading any column data either way. The folklore that COUNT(1) is faster came from old Oracle versions and isn't true today.
When should I use COUNT(column) instead of COUNT(*)?
When you specifically want to count non-null values of that column. "How many users have a phone number?" → COUNT(phone). "How many rows are in the users table?" → COUNT(*). The difference matters when the column has nulls.
Why did my COUNT change after a LEFT JOIN?
COUNT() after a LEFT JOIN counts the joined* rows — including rows where the right side was NULL-padded. COUNT(orders.id) after a LEFT JOIN orders excludes users with no orders (because orders.id is NULL on those). The two answer different questions; pick by the count you want.
How do I count unique values?
COUNT(DISTINCT column) — counts unique non-null values. The standard pattern for "how many unique users / customers / SKUs." Note: COUNT(DISTINCT) is expensive on large datasets — for approximate counts at scale, use APPROX_COUNT_DISTINCT (BigQuery, Snowflake, Redshift, Databricks).
Why is COUNT(DISTINCT col) slow?
The engine has to deduplicate before counting, typically via a sort or hash table over every distinct value. On a billion-row table this is genuinely expensive. Approximate alternatives (APPROX_COUNT_DISTINCT) use HLL sketches for ~1% error and orders-of-magnitude speed.
Can COUNT return NULL?
No. COUNT always returns a non-negative integer (0 for empty input). SUM, AVG, MIN, MAX over empty input return NULL — that's the asymmetry. Use COALESCE(SUM(col), 0) if your downstream code expects a number from SUM.
The mental model that prevents 80% of count bugs: COUNT() answers "how many rows?" and COUNT(col) answers "how many rows have a value in this column?". When in doubt, write the question as a sentence first — the right COUNT form falls out of which sentence you wrote. COUNT(1) answers neither question better than COUNT(); treat it as legacy syntax and don't reach for it.