Reference

SCD Type 1, 2, and 3

Use this page as a practical implementation guide for Slowly Changing Dimensions (SCDs).

Goal:

choose the correct SCD pattern for each attribute
load dimension history correctly and repeatably
avoid duplicate current rows and broken historical joins
preserve reporting trust when attributes change over time

When to Use Type 1 vs Type 2 vs Type 3 #

Type 1: overwrite old value. Use for corrected attributes where history is not analytically meaningful.
Type 2: add new row version. Use when historical point-in-time analysis must be preserved.
Type 3: keep current + one prior value. Use sparingly for narrow comparison use cases.

Dimension Contract (Required) #

For each dimension, define:

business key (natural key)
surrogate key
SCD policy per attribute
effective start/end timestamps
current-row flag
late-arriving update policy

Reference Table Design (Type 2) #

sql

CREATE TABLE dim_customer (
  customer_sk BIGINT,
  customer_id VARCHAR,                 -- business key
  customer_name VARCHAR,
  segment VARCHAR,
  country VARCHAR,
  email VARCHAR,
  effective_start_ts TIMESTAMP,
  effective_end_ts TIMESTAMP,
  is_current BOOLEAN,
  record_source VARCHAR,
  load_ts TIMESTAMP
);

Type 1 Load Pattern (Overwrite) #

sql

UPDATE dim_customer d
SET
  customer_name = s.customer_name,
  email = s.email,
  load_ts = CURRENT_TIMESTAMP
FROM stg_customer s
WHERE d.customer_id = s.customer_id
  AND (
    d.customer_name IS DISTINCT FROM s.customer_name
    OR d.email IS DISTINCT FROM s.email
  );

In a hybrid Type 1/Type 2 dimension, a Type 1 overwrite must update every version of the member, not just the is_current row. Otherwise historical versions keep the stale value and point-in-time reports disagree with current ones. Note the change detection uses IS DISTINCT FROM rather than <>, so NULL-to-value and value-to-NULL transitions are detected too.

Type 2 Load Pattern (Versioned History) #

sql

-- Step 1: close existing current rows when tracked attributes changed
UPDATE dim_customer d
SET
  effective_end_ts = s.change_ts,
  is_current = FALSE,
  load_ts = CURRENT_TIMESTAMP
FROM stg_customer_changes s
WHERE d.customer_id = s.customer_id
  AND d.is_current = TRUE
  AND (
    d.segment IS DISTINCT FROM s.segment
    OR d.country IS DISTINCT FROM s.country
  );

-- Step 2: insert new current version
INSERT INTO dim_customer (
  customer_sk,
  customer_id,
  customer_name,
  segment,
  country,
  email,
  effective_start_ts,
  effective_end_ts,
  is_current,
  record_source,
  load_ts
)
SELECT
  s.customer_sk,
  s.customer_id,
  s.customer_name,
  s.segment,
  s.country,
  s.email,
  s.change_ts,
  TIMESTAMP '9999-12-31 00:00:00',
  TRUE,
  s.record_source,
  CURRENT_TIMESTAMP
FROM stg_customer_changes s
LEFT JOIN dim_customer d
  ON d.customer_id = s.customer_id
 AND d.is_current = TRUE
WHERE d.customer_id IS NULL
   OR d.segment IS DISTINCT FROM s.segment
   OR d.country IS DISTINCT FROM s.country;

A dim_customer table with two rows for customer C001: version 1 (segment SMB) end-dated on 2024-03-01 with is_current FALSE, and version 2 (segment Enterprise) open to 9999-12-31 with is_current TRUE — Type 2 keeps history: when a tracked attribute changes, the current row is end-dated and is_current set FALSE, and a new open-ended version is inserted under the same business key.

Type 3 Pattern (Current + Previous) #

sql

ALTER TABLE dim_customer ADD COLUMN prev_segment VARCHAR;

UPDATE dim_customer d
SET
  prev_segment = d.segment,
  segment = s.segment,
  load_ts = CURRENT_TIMESTAMP
FROM stg_customer s
WHERE d.customer_id = s.customer_id
  AND d.is_current = TRUE
  AND d.segment IS DISTINCT FROM s.segment;

Fact Join Rule for Type 2 #

For Type 2 dimensions, join facts to dimensions using business key plus event timestamp window:

fact.event_ts >= dim.effective_start_ts
fact.event_ts < dim.effective_end_ts

Do not join only on business key for historical reporting.

sql

SELECT
  f.order_id,
  f.order_ts,
  d.segment,
  f.net_amount
FROM fact_order_item f
JOIN dim_customer d
  ON d.customer_id = f.customer_id
 AND f.order_ts >= d.effective_start_ts
 AND f.order_ts < d.effective_end_ts;

Validation SQL You Should Always Run #

sql

-- 1) exactly one current row per business key
SELECT customer_id, COUNT(*) AS current_rows
FROM dim_customer
WHERE is_current = TRUE
GROUP BY customer_id
HAVING COUNT(*) <> 1;

-- 2) overlapping validity windows
SELECT a.customer_id
FROM dim_customer a
JOIN dim_customer b
  ON a.customer_id = b.customer_id
 AND a.customer_sk <> b.customer_sk
 AND a.effective_start_ts < b.effective_end_ts
 AND b.effective_start_ts < a.effective_end_ts
GROUP BY a.customer_id;

-- 3) orphan fact joins (Type 2 time window)
SELECT COUNT(*) AS orphan_fact_rows
FROM fact_order_item f
LEFT JOIN dim_customer d
  ON d.customer_id = f.customer_id
 AND f.order_ts >= d.effective_start_ts
 AND f.order_ts < d.effective_end_ts
WHERE d.customer_sk IS NULL;

Advanced Engineer Notes #

Separate tracked vs non-tracked attributes to avoid unnecessary Type 2 churn.
Use deterministic change detection (hashdiff or explicit column compare).
Define restatement policy for late-arriving corrections.
Treat SCD policy changes as breaking contract changes and version carefully.

Senior Engineer Playbook #

Temporal join audit: assert the fact row count is identical before and after the Type 2 join: any delta means the validity windows drop or duplicate facts (the inclusive-BETWEEN trap).
Failure postmortem pattern: track when duplicate current rows were introduced and which pipeline run caused overlap.
Migration playbook: when moving Type 1 -> Type 2, run dual-publish windows and reconcile metric deltas before cutover.

Interactive SQL Check #

Run this temporal join check to simulate a validity-window style join between premium events and policy records.

Apply the half-open rule from above, adapted to month grain: premium fact rows sit at the first of the month while policies can start mid-month, so compare month_date against the policy start truncated to its month. The seed end_date is an inclusive last covered day, so the exclusive upper bound is end_date plus one day. An inclusive BETWEEN on the raw dates silently drops every first-month premium row for mid-month policy starts.

Loading SQL editor...

Warning

Most historical metric defects come from dimension versioning errors, not from aggregation SQL.