SCD Type 1, 2, and 3
Use this page as a practical implementation guide for Slowly Changing Dimensions (SCDs).
Goal:
- choose the correct SCD pattern for each attribute
- load dimension history correctly and repeatably
- avoid duplicate current rows and broken historical joins
- preserve reporting trust when attributes change over time
When to Use Type 1 vs Type 2 vs Type 3 #
- Type 1: overwrite old value. Use for corrected attributes where history is not analytically meaningful.
- Type 2: add new row version. Use when historical point-in-time analysis must be preserved.
- Type 3: keep current + one prior value. Use sparingly for narrow comparison use cases.
Dimension Contract (Required) #
For each dimension, define:
- business key (natural key)
- surrogate key
- SCD policy per attribute
- effective start/end timestamps
- current-row flag
- late-arriving update policy
Reference Table Design (Type 2) #
CREATE TABLE dim_customer (
customer_sk BIGINT,
customer_id VARCHAR, -- business key
customer_name VARCHAR,
segment VARCHAR,
country VARCHAR,
email VARCHAR,
effective_start_ts TIMESTAMP,
effective_end_ts TIMESTAMP,
is_current BOOLEAN,
record_source VARCHAR,
load_ts TIMESTAMP
);Type 1 Load Pattern (Overwrite) #
UPDATE dim_customer d
SET
customer_name = s.customer_name,
email = s.email,
load_ts = CURRENT_TIMESTAMP
FROM stg_customer s
WHERE d.customer_id = s.customer_id
AND (
d.customer_name IS DISTINCT FROM s.customer_name
OR d.email IS DISTINCT FROM s.email
);In a hybrid Type 1/Type 2 dimension, a Type 1 overwrite must update every version of the member, not just the is_current row. Otherwise historical versions keep the stale value and point-in-time reports disagree with current ones. Note the change detection uses IS DISTINCT FROM rather than <>, so NULL-to-value and value-to-NULL transitions are detected too.
Type 2 Load Pattern (Versioned History) #
-- Step 1: close existing current rows when tracked attributes changed
UPDATE dim_customer d
SET
effective_end_ts = s.change_ts,
is_current = FALSE,
load_ts = CURRENT_TIMESTAMP
FROM stg_customer_changes s
WHERE d.customer_id = s.customer_id
AND d.is_current = TRUE
AND (
d.segment IS DISTINCT FROM s.segment
OR d.country IS DISTINCT FROM s.country
);
-- Step 2: insert new current version
INSERT INTO dim_customer (
customer_sk,
customer_id,
customer_name,
segment,
country,
email,
effective_start_ts,
effective_end_ts,
is_current,
record_source,
load_ts
)
SELECT
s.customer_sk,
s.customer_id,
s.customer_name,
s.segment,
s.country,
s.email,
s.change_ts,
TIMESTAMP '9999-12-31 00:00:00',
TRUE,
s.record_source,
CURRENT_TIMESTAMP
FROM stg_customer_changes s
LEFT JOIN dim_customer d
ON d.customer_id = s.customer_id
AND d.is_current = TRUE
WHERE d.customer_id IS NULL
OR d.segment IS DISTINCT FROM s.segment
OR d.country IS DISTINCT FROM s.country;Type 3 Pattern (Current + Previous) #
ALTER TABLE dim_customer ADD COLUMN prev_segment VARCHAR;
UPDATE dim_customer d
SET
prev_segment = d.segment,
segment = s.segment,
load_ts = CURRENT_TIMESTAMP
FROM stg_customer s
WHERE d.customer_id = s.customer_id
AND d.is_current = TRUE
AND d.segment IS DISTINCT FROM s.segment;Fact Join Rule for Type 2 #
For Type 2 dimensions, join facts to dimensions using business key plus event timestamp window:
- fact.event_ts >= dim.effective_start_ts
- fact.event_ts < dim.effective_end_ts
Do not join only on business key for historical reporting.
SELECT
f.order_id,
f.order_ts,
d.segment,
f.net_amount
FROM fact_order_item f
JOIN dim_customer d
ON d.customer_id = f.customer_id
AND f.order_ts >= d.effective_start_ts
AND f.order_ts < d.effective_end_ts;Validation SQL You Should Always Run #
-- 1) exactly one current row per business key
SELECT customer_id, COUNT(*) AS current_rows
FROM dim_customer
WHERE is_current = TRUE
GROUP BY customer_id
HAVING COUNT(*) <> 1;
-- 2) overlapping validity windows
SELECT a.customer_id
FROM dim_customer a
JOIN dim_customer b
ON a.customer_id = b.customer_id
AND a.customer_sk <> b.customer_sk
AND a.effective_start_ts < b.effective_end_ts
AND b.effective_start_ts < a.effective_end_ts
GROUP BY a.customer_id;
-- 3) orphan fact joins (Type 2 time window)
SELECT COUNT(*) AS orphan_fact_rows
FROM fact_order_item f
LEFT JOIN dim_customer d
ON d.customer_id = f.customer_id
AND f.order_ts >= d.effective_start_ts
AND f.order_ts < d.effective_end_ts
WHERE d.customer_sk IS NULL;Advanced Engineer Notes #
- Separate tracked vs non-tracked attributes to avoid unnecessary Type 2 churn.
- Use deterministic change detection (hashdiff or explicit column compare).
- Define restatement policy for late-arriving corrections.
- Treat SCD policy changes as breaking contract changes and version carefully.
Senior Engineer Playbook #
- Temporal join audit: assert the fact row count is identical before and after the Type 2 join: any delta means the validity windows drop or duplicate facts (the inclusive-BETWEEN trap).
- Failure postmortem pattern: track when duplicate current rows were introduced and which pipeline run caused overlap.
- Migration playbook: when moving Type 1 -> Type 2, run dual-publish windows and reconcile metric deltas before cutover.
Interactive SQL Check #
Run this temporal join check to simulate a validity-window style join between premium events and policy records.
Apply the half-open rule from above, adapted to month grain: premium fact rows sit at the first of the month while policies can start mid-month, so compare month_date against the policy start truncated to its month. The seed end_date is an inclusive last covered day, so the exclusive upper bound is end_date plus one day. An inclusive BETWEEN on the raw dates silently drops every first-month premium row for mid-month policy starts.
Related Lessons #
Most historical metric defects come from dimension versioning errors, not from aggregation SQL.