Medallion Architecture (Bronze / Silver / Gold)
Use this page to understand how raw data is refined into trustworthy, business-ready models through the medallion architecture — the bronze/silver/gold layering popularized by lakehouse platforms and now widely adopted.
Goal:
- know what each layer (bronze, silver, gold) is responsible for
- map the modeling concepts in this section to the layer they belong to
- treat each layer boundary as a quality contract
- avoid putting business logic in the wrong layer
The Three Layers #
Data moves left to right, and each layer reads only from the one before it:
- Bronze (raw): source data landed as-is — append-only, no business logic, schema close to the source. Its job is faithful capture and replayability, not cleanliness.
- Silver (cleaned & conformed): deduplicated, type-cast, validated, and joined into conformed dimensions and facts. Most of the modeling work in this section lives here. Trustworthy, but not yet shaped for a specific question.
- Gold (business marts): aggregated and modeled for consumption — star schemas, metric marts, and KPI tables that dashboards and ML read directly.
The metals signal increasing refinement and value, not increasing volume — gold is usually the smallest layer.
Naming varies by stack: bronze is also called the raw or landing layer; silver overlaps with dbt-style staging and intermediate models (the stg_ tables used elsewhere in this section); gold with marts. The layers matter more than the labels.
What Each Layer Owns #
| Layer | Holds | Transformations | Example tables |
|---|---|---|---|
| Bronze | raw source / event data | ingest only (as-is) | raw_orders, raw_events |
| Silver | conformed dimensions & facts | dedupe, cast, validate, join | dim_customer, fact_orders |
| Gold | marts & metrics | aggregate, derive KPIs | mart_monthly_revenue |
How It Maps to the Rest of This Section #
Most of this section is really about the silver and gold layers:
- The OLTP source from OLTP vs OLAP and Grain (normalized 3NF tables) feeds bronze.
- Conformed fact tables, SCD dimensions, and bridge tables are the silver layer.
- Star schemas and the metric layer / serving marts are the gold layer.
So "where does this table live?" usually answers itself: a source mirror is bronze, a conformed model is silver, a query-ready aggregate is gold.
Layer Boundaries Are Contracts #
Promote data to the next layer only when it meets that layer's contract, and run quality checks at each boundary (see Data Quality for Models):
- Bronze → Silver: schema/type validation, deduplication, referential integrity.
- Silver → Gold: grain uniqueness, additivity rules, metric reconciliation.
A failed check should block promotion, not silently ship bad numbers downstream.
-- Bronze: land source rows as-is (append-only, no logic)
CREATE TABLE raw_orders AS SELECT * FROM source.orders;
-- Silver: clean + conform to a fact at order grain
CREATE TABLE fact_orders AS
SELECT
CAST(order_id AS BIGINT) AS order_id,
CAST(customer_id AS BIGINT) AS customer_id,
CAST(order_date AS DATE) AS order_date,
order_status,
CAST(order_total AS DECIMAL(18, 2)) AS order_total
FROM raw_orders
WHERE order_id IS NOT NULL;
-- Gold: aggregate into a query-ready mart
CREATE TABLE mart_monthly_revenue AS
SELECT
CAST(DATE_TRUNC('month', order_date) AS DATE) AS order_month,
ROUND(SUM(order_total), 2) AS revenue
FROM fact_orders
WHERE order_status = 'completed'
GROUP BY CAST(DATE_TRUNC('month', order_date) AS DATE);Interactive SQL Check #
The seeded ecom_orders table is already a clean, conformed silver fact. Build the gold monthly-revenue mart from it: one row per month, completed orders only, with the order count and revenue (summed order_total, rounded to 2 decimals).
Reconcile Gold to Silver #
A gold aggregate must tie back to its silver source. Total revenue across the mart should equal total completed revenue in the fact — run this after every rebuild:
-- Should return TRUE: the mart reconciles to the fact it was built from
SELECT
(SELECT ROUND(SUM(revenue), 2) FROM mart_monthly_revenue)
= (SELECT ROUND(SUM(order_total), 2) FROM fact_orders WHERE order_status = 'completed')
AS reconciles;Common Mistakes #
- Putting business logic in bronze. Transforming on ingest destroys the raw source of truth and makes replay/backfill impossible.
- Letting dashboards query silver directly for heavy aggregations. Gold exists so consumers do not re-derive metrics on every hit.
- Cleaning data in gold. Quality belongs in silver; gold should be pure aggregation and shaping.
- Skipping the silver → gold reconciliation. A mart that does not tie back to its fact is a silent metric bug.
- Reading the metals as data volume. Gold is the most refined layer and usually the smallest.
Performance and Operational Notes #
- Bronze is cheap, high-volume storage. Keep it append-only and partition by ingest time for fast backfills.
- Gold pre-computes the expensive joins and scans once, so reads stay fast and cheap at dashboard concurrency.
- Rebuild gold incrementally from silver; backfill by replaying from bronze (see Incremental and Backfill Strategy).
- Name tables by layer so the layer — and its guarantees — is obvious from the name alone.
Related Lessons #
Name tables by layer — raw_ (bronze), dim_ / fact_ (silver), mart_ (gold) — so anyone can tell a table's layer, and its guarantees, from its name alone.