Reference

Star Schema Basics

Use this page as an implementation reference for building a datamart with fact and dimension tables.

Goal:

  • define the correct grain
  • model dimensions and facts cleanly
  • load data in a safe order
  • validate metric correctness before dashboard use
Star schema diagram: a central fact_sales table surrounded by dim_date, dim_product, dim_customer, and dim_store, each joined many-to-one from a foreign key in the fact table
The shape that gives the pattern its name: one fact table in the middle, dimensions around it. Every fact row carries one key per dimension; joins fan out from the center, never dimension-to-dimension.

When to Use a Star Schema #

Choose star schema when:

  • analysts need fast, readable SQL
  • KPIs are mostly aggregate reporting metrics
  • you want reusable dimensions across many facts
  • BI tools are primary consumers

Core Components #

  • Fact table: measurable events or snapshots
  • Dimension tables: descriptive context (customer, product, date, channel)
  • Surrogate keys: stable integer keys for joins
  • Conformed dimensions: shared dimensions used by multiple facts

Fact Table Types #

  • Transaction fact: one row per event (order line, claim)
  • Periodic snapshot fact: one row per entity per time period
  • Accumulating snapshot fact: one row tracking multi-step lifecycle
  1. Choose one business process (for example: orders).
  2. Declare exact fact grain in one sentence.
  3. List dimensions needed for slicing/filtering.
  4. Define additive and non-additive metrics.
  5. Choose SCD policy per dimension.
  6. Build dimension tables first, then fact.
  7. Add data tests and reconciliation checks.
  8. Expose as mart/semantic layer for BI.

Step 1: Declare Grain Explicitly #

Example grain statement:
One row in fact_order_item represents one product line on one completed order.

If this sentence is ambiguous, stop and fix it before modeling.

Step 2: Dimension Design #

For each dimension, define:

  • business key (natural key)
  • surrogate key
  • attributes used by analysis
  • SCD behavior (Type 1 or Type 2)
  • null/default handling rules

Step 3: Fact Design #

For fact table columns:

  • foreign keys to dimensions
  • event timestamp/date key
  • numeric metrics
  • optional degenerate dimensions (order_number, invoice_id)

Classify metrics:

  • additive (safe to sum across all dims)
  • semi-additive (safe across some dims, not all)
  • non-additive (ratios/percentages; compute from base measures)

SCD Considerations #

Dimension history strategy affects metric interpretation:

  • Type 1: overwrite (no history)
  • Type 2: row versioning (history preserved)
  • Type 3: limited historical columns

Pick SCD policy per business question, not per engineering convenience.

Starter SQL Templates #

sql
-- Dimension templates
CREATE TABLE dim_customer (
  customer_key INTEGER PRIMARY KEY,
  customer_id VARCHAR,              -- business key
  customer_name VARCHAR,
  segment VARCHAR,
  country VARCHAR,
  effective_start_ts TIMESTAMP,     -- for SCD2
  effective_end_ts TIMESTAMP,       -- for SCD2
  is_current BOOLEAN
);

CREATE TABLE dim_product (
  product_key INTEGER PRIMARY KEY,
  product_id VARCHAR,
  product_name VARCHAR,
  category VARCHAR,
  brand VARCHAR
);

CREATE TABLE dim_date (
  date_key INTEGER PRIMARY KEY,     -- yyyymmdd
  calendar_date DATE,
  year INTEGER,
  quarter INTEGER,
  month INTEGER,
  week INTEGER,
  day_of_month INTEGER
);
sql
-- Fact template
-- Grain: one row = one product line on one order -> (order_id, order_item_id)
CREATE TABLE fact_order_item (
  order_item_key INTEGER PRIMARY KEY,
  order_id VARCHAR,                 -- degenerate dimension
  order_item_id VARCHAR,            -- degenerate line key (completes the grain)
  order_date_key INTEGER,           -- FK -> dim_date
  customer_key INTEGER,             -- FK -> dim_customer
  product_key INTEGER,              -- FK -> dim_product
  quantity INTEGER,
  gross_amount DECIMAL(18,2),
  discount_amount DECIMAL(18,2),
  net_amount DECIMAL(18,2),
  cost_amount DECIMAL(18,2)
);
sql
-- Query pattern
SELECT
  DATE_TRUNC('month', d.calendar_date) AS calendar_month,
  c.segment,
  SUM(f.net_amount) AS revenue,
  SUM(f.net_amount - f.cost_amount) AS gross_margin
FROM fact_order_item AS f
JOIN dim_date AS d ON d.date_key = f.order_date_key
JOIN dim_customer AS c ON c.customer_key = f.customer_key
GROUP BY DATE_TRUNC('month', d.calendar_date), c.segment;

Load Order and Data Contracts #

Recommended load sequence:

  1. stage source tables
  2. load/update dimensions
  3. resolve surrogate keys
  4. load fact rows
  5. run quality + reconciliation tests

Contract examples:

  • no duplicate fact grain keys
  • no orphan fact foreign keys
  • metric totals reconcile to trusted source aggregates

Validation SQL You Should Always Run #

sql
-- 1) Grain uniqueness on the true grain key
-- (order_id, product_key) is NOT the grain: an order can legitimately
-- repeat the same product on separate lines
SELECT order_id, order_item_id, COUNT(*) AS row_count
FROM fact_order_item
GROUP BY order_id, order_item_id
HAVING COUNT(*) > 1;

-- 2) Orphan keys
SELECT COUNT(*) AS missing_customer_keys
FROM fact_order_item f
LEFT JOIN dim_customer c ON c.customer_key = f.customer_key
WHERE c.customer_key IS NULL;

-- 3) Reconciliation
SELECT
  ROUND(SUM(net_amount), 2) AS fact_total
FROM fact_order_item;

Performance and Maintenance Notes #

  • Keep fact tables narrow; move verbose text to dimensions.
  • Partition/cluster fact tables by common time filters where supported.
  • Pre-aggregate common dashboard grains into summary marts.
  • Use conformed dimensions to avoid duplicated business logic.
  • Version metric definitions when formulas change.

Common Modeling Mistakes #

  • Mixed grain in one fact table.
  • Ratios stored as base fact metrics (instead of deriving from numerators/denominators).
  • SCD policy chosen without business sign-off.
  • Facts loaded before dimensions are stable.
  • Hidden duplicate joins caused by non-unique dimension business keys.

Interactive SQL Check #

Run this query to practice a classic star-style aggregate across orders, items, and product dimensions.

Loading SQL editor...
Tip

A datamart is production-ready when:
1) grain is explicit and tested,
2) metrics reconcile,
3) dimensions are trustworthy,
4) analysts can query without custom join logic.