Reference

Medallion Architecture (Bronze / Silver / Gold)

Use this page to understand how raw data is refined into trustworthy, business-ready models through the medallion architecture — the bronze/silver/gold layering popularized by lakehouse platforms and now widely adopted.

Goal:

know what each layer (bronze, silver, gold) is responsible for
map the modeling concepts in this section to the layer they belong to
treat each layer boundary as a quality contract
avoid putting business logic in the wrong layer

The Three Layers #

Data flows from OLTP and event sources into Bronze (raw, as-is), is cleaned and conformed into Silver (dimensions and facts), then modeled into Gold (business marts and metrics) consumed by BI, ML, and reports — Data is refined left to right. Each layer reads only from the one before it and adds quality, then business meaning.

Data moves left to right, and each layer reads only from the one before it:

Bronze (raw): source data landed as-is — append-only, no business logic, schema close to the source. Its job is faithful capture and replayability, not cleanliness.
Silver (cleaned & conformed): deduplicated, type-cast, validated, and joined into conformed dimensions and facts. Most of the modeling work in this section lives here. Trustworthy, but not yet shaped for a specific question.
Gold (business marts): aggregated and modeled for consumption — star schemas, metric marts, and KPI tables that dashboards and ML read directly.

The metals signal increasing refinement and value, not increasing volume — gold is usually the smallest layer.

Naming varies by stack: bronze is also called the raw or landing layer; silver overlaps with dbt-style staging and intermediate models (the stg_ tables used elsewhere in this section); gold with marts. The layers matter more than the labels.

What Each Layer Owns #

Layer	Holds	Transformations	Example tables
Bronze	raw source / event data	ingest only (as-is)	`raw_orders`, `raw_events`
Silver	conformed dimensions & facts	dedupe, cast, validate, join	`dim_customer`, `fact_orders`
Gold	marts & metrics	aggregate, derive KPIs	`mart_monthly_revenue`

How It Maps to the Rest of This Section #

Most of this section is really about the silver and gold layers:

The OLTP source from OLTP vs OLAP and Grain (normalized 3NF tables) feeds bronze.
Conformed fact tables, SCD dimensions, and bridge tables are the silver layer.
Star schemas and the metric layer / serving marts are the gold layer.

So "where does this table live?" usually answers itself: a source mirror is bronze, a conformed model is silver, a query-ready aggregate is gold.

Layer Boundaries Are Contracts #

Promote data to the next layer only when it meets that layer's contract, and run quality checks at each boundary (see Data Quality for Models):

Bronze → Silver: schema/type validation, deduplication, referential integrity.
Silver → Gold: grain uniqueness, additivity rules, metric reconciliation.

A failed check should block promotion, not silently ship bad numbers downstream.

sql

-- Bronze: land source rows as-is (append-only, no logic)
CREATE TABLE raw_orders AS SELECT * FROM source.orders;

-- Silver: clean + conform to a fact at order grain
CREATE TABLE fact_orders AS
SELECT
  CAST(order_id AS BIGINT) AS order_id,
  CAST(customer_id AS BIGINT) AS customer_id,
  CAST(order_date AS DATE) AS order_date,
  order_status,
  CAST(order_total AS DECIMAL(18, 2)) AS order_total
FROM raw_orders
WHERE order_id IS NOT NULL;

-- Gold: aggregate into a query-ready mart
CREATE TABLE mart_monthly_revenue AS
SELECT
  CAST(DATE_TRUNC('month', order_date) AS DATE) AS order_month,
  ROUND(SUM(order_total), 2) AS revenue
FROM fact_orders
WHERE order_status = 'completed'
GROUP BY CAST(DATE_TRUNC('month', order_date) AS DATE);

Interactive SQL Check #

The seeded ecom_orders table is already a clean, conformed silver fact. Build the gold monthly-revenue mart from it: one row per month, completed orders only, with the order count and revenue (summed order_total, rounded to 2 decimals).

Loading SQL editor...

Reconcile Gold to Silver #

A gold aggregate must tie back to its silver source. Total revenue across the mart should equal total completed revenue in the fact — run this after every rebuild:

sql

-- Should return TRUE: the mart reconciles to the fact it was built from
SELECT
  (SELECT ROUND(SUM(revenue), 2) FROM mart_monthly_revenue)
  = (SELECT ROUND(SUM(order_total), 2) FROM fact_orders WHERE order_status = 'completed')
  AS reconciles;

Common Mistakes #

Putting business logic in bronze. Transforming on ingest destroys the raw source of truth and makes replay/backfill impossible.
Letting dashboards query silver directly for heavy aggregations. Gold exists so consumers do not re-derive metrics on every hit.
Cleaning data in gold. Quality belongs in silver; gold should be pure aggregation and shaping.
Skipping the silver → gold reconciliation. A mart that does not tie back to its fact is a silent metric bug.
Reading the metals as data volume. Gold is the most refined layer and usually the smallest.

Performance and Operational Notes #

Bronze is cheap, high-volume storage. Keep it append-only and partition by ingest time for fast backfills.
Gold pre-computes the expensive joins and scans once, so reads stay fast and cheap at dashboard concurrency.
Rebuild gold incrementally from silver; backfill by replaying from bronze (see Incremental and Backfill Strategy).
Name tables by layer so the layer — and its guarantees — is obvious from the name alone.

Tip

Name tables by layer — raw_ (bronze), dim_ / fact_ (silver), mart_ (gold) — so anyone can tell a table's layer, and its guarantees, from its name alone.