PRIMARY KEY
The column (or combination of columns) that uniquely identifies every row. Implicitly NOT NULL and UNIQUE, and on most engines automatically backed by an index. A table can have at most one primary key, so pick the column you'd use to answer "which row are we talking about?".
Syntax #
-- Inline, single column.
CREATE TABLE users (
id INTEGER PRIMARY KEY,
email VARCHAR
);
-- Table-level constraint, single or composite.
CREATE TABLE order_items (
order_id INTEGER,
line_no INTEGER,
product_id INTEGER,
PRIMARY KEY (order_id, line_no)
);
-- Add to an existing table.
ALTER TABLE users
ADD CONSTRAINT users_pk PRIMARY KEY (id);Example #
Natural vs surrogate keys #
- Natural key: a column already meaningful in the domain (
email,isbn,tax_id). Pros: no extra column, joins read naturally. Cons: meaningful values change (people change emails, governments reissue tax IDs), and that change cascades through every foreign key. - Surrogate key: a synthetic identifier with no business meaning (auto-incremented
id, UUID, hash). Pros: stable forever, decouples the model from business reality. Cons: extra column, joins less self-documenting.
Default to surrogate keys for any table that other tables reference. Add a UNIQUE constraint on the natural key alongside, so the database still enforces business uniqueness.
Dialect notes #
- Auto-incrementing:
SERIAL/GENERATED ALWAYS AS IDENTITY(Postgres),AUTO_INCREMENT(MySQL),IDENTITY(1,1)(SQL Server),INTEGER PRIMARY KEY AUTOINCREMENT(SQLite). - Snowflake / BigQuery:
PRIMARY KEYis accepted but not enforced; it's metadata only. You're responsible for uniqueness via your ETL or test layer (e.g., dbt'suniquetest). - Composite vs single column: composite keys are correct for many-to-many bridge tables (e.g.,
(order_id, product_id)) but make foreign keys verbose. A surrogateidplus aUNIQUE (order_id, product_id)constraint is often cleaner.
FAQ #
Should I use a natural key or a surrogate key?
Default to surrogate keys (synthetic id column) for any table that other tables reference. Pros: stable forever, decoupled from business reality. Add a UNIQUE constraint on the natural key (email, isbn, etc.) so the database still enforces business uniqueness. Use natural keys only when the value is genuinely immutable (ISO country codes are stable; emails are not).
Can a table have multiple primary keys?
No. A table has at most one primary key. But that primary key can be composite (multiple columns combined): PRIMARY KEY (order_id, line_no). For "secondary unique columns," use UNIQUE constraints instead.
Are primary keys enforced on Snowflake and BigQuery?
No. Both warehouses accept PRIMARY KEY declarations as metadata only. They're not enforced. You're responsible for uniqueness via your ETL or test layer (e.g., dbt's unique test). Same applies to UNIQUE and FOREIGN KEY constraints in those engines.
Can a primary key column be NULL?
No. PRIMARY KEY implicitly enforces NOT NULL and UNIQUE. That's part of the contract: every row has a non-null, distinct identifier.
How do I add a primary key to an existing table?
ALTER TABLE t ADD CONSTRAINT t_pk PRIMARY KEY (id). The column must already be NOT NULL and free of duplicates, otherwise the statement fails. For a new surrogate, you may need to add the column, populate it, then add the constraint.
Should the primary key be auto-incremented?
Convenient default: yes. Use BIGINT GENERATED ALWAYS AS IDENTITY (Postgres modern), BIGINT AUTO_INCREMENT (MySQL), BIGINT IDENTITY(1,1) (SQL Server). UUIDs are an alternative when you need to generate IDs client-side or across distributed systems. Sequential integers are more compact and index better; UUIDs are more flexible.
Every table should have a primary key, even if the engine doesn't force it. A table without one is a table where you can't reliably point to "this row." Update and delete operations become unsafe (UPDATE ... WHERE may match multiple rows you didn't intend), CDC tools can't track changes, and replication breaks. If a table genuinely has no natural identifier, add a synthetic id column.