Reference

OLTP vs OLAP and Grain

Use this page as a practical reference for choosing the right modeling shape between OLTP and OLAP systems, and for defining grain correctly.

Goal:

  • separate transactional and analytical concerns clearly
  • define row-level grain as an explicit contract
  • prevent double counting and metric drift
  • build models that are both correct and performant

What OLTP and OLAP Optimize For #

DimensionOLTPOLAP
Primary goalTransaction correctnessAnalytical insight
Workload patternmany concurrent small writesfewer but heavier reads
Query shapepoint lookup, insert, updatescan, join, aggregate
Data model tendencynormalized (3NF)star/mart/serving-oriented
Latency focuslow write latencystable query latency
Freshnessnear real-timebatch/micro-batch/stream windows

System Boundary Rule #

Do not force one system to optimize for both worlds.

Recommended:

  • OLTP stores transactional truth and process state.
  • OLAP stores analytical truth and query-friendly structures.

Analytical models should be derived from OLTP (or event streams), not embedded into OLTP tables.

Grain: The Core Contract #

Grain defines what one row means.

Contract template:

  • one row represents: [entity/event + time context]
  • uniqueness key: [column set]
  • event timestamp column: [event_ts]
  • processing timestamp column: [load_ts]
  • additive metrics: [safe to sum]
  • non-additive metrics: [recompute from base measures]
  • late-arriving policy: [upsert/restate/ignore]

Example Grain Contracts #

sql
-- Transaction fact example
-- one row = one product line on one order
-- unique key = (order_id, order_item_id)
-- additive metrics = quantity, net_amount, cost_amount

-- Daily snapshot fact example
-- one row = one account at end of day
-- unique key = (account_id, snapshot_date)
-- semi-additive metric = ending_balance (additive by account, not across dates)

-- Period aggregate example
-- one row = one customer x month
-- unique key = (customer_id, month_start_date)
-- non-additive metric = margin_pct (derive from sums, do not sum percentages)

OLTP to OLAP Mapping #

Typical mapping from transactional tables to analytics models:

  • OLTP orders and order_items -> OLAP fact_order_item
  • OLTP customers -> OLAP dim_customer
  • OLTP status history -> OLAP snapshots or status-change facts

Keep raw transactional IDs for traceability, but use analytical keys for joins and governance.

Reference SQL: Build a Correct Fact Grain #

sql
CREATE TABLE fact_order_item AS
SELECT
  oi.order_id,
  oi.order_item_id,
  o.customer_id,
  o.order_ts,
  o.order_ts::DATE AS order_date,
  oi.product_id,
  oi.quantity,
  oi.unit_price,
  (oi.quantity * oi.unit_price) AS gross_amount,
  COALESCE(oi.discount_amount, 0) AS discount_amount,
  (oi.quantity * oi.unit_price) - COALESCE(oi.discount_amount, 0) AS net_amount
FROM orders o
JOIN order_items oi
  ON oi.order_id = o.order_id
WHERE o.order_status = 'completed';

Validation SQL You Should Always Run #

sql
-- 1) Grain uniqueness
SELECT order_id, order_item_id, COUNT(*) AS row_count
FROM fact_order_item
GROUP BY order_id, order_item_id
HAVING COUNT(*) > 1;

-- 2) Unexpected multiplicative join check
SELECT
  SUM(net_amount) AS net_before
FROM fact_order_item;

SELECT
  SUM(f.net_amount) AS net_after
FROM fact_order_item f
JOIN dim_customer c
  ON c.customer_id = f.customer_id;

-- Compare net_before vs net_after; mismatch suggests duplicated joins.

-- 3) Time semantics check
SELECT
  MIN(order_ts) AS min_order_ts,
  MAX(order_ts) AS max_order_ts,
  MIN(order_date) AS min_order_date,
  MAX(order_date) AS max_order_date
FROM fact_order_item;

Event Time vs Processing Time #

Track both:

  • event time: when business action occurred
  • processing time: when warehouse received/processed data

This separation is required for late-arriving events, backfills, and SLA diagnostics.

Common Failure Modes #

  • Mixing order-level and item-level facts in the same table.
  • Summing pre-aggregated monthly tables into quarterly numbers without contract checks.
  • Joining dimensions with non-unique business keys.
  • Using percentages as additive metrics.
  • Undefined date meaning (order date vs invoice date vs settlement date).
  • Treating snapshots like transactions in trend analysis.

Performance Considerations #

  • OLTP indexing prioritizes point lookups and updates.
  • OLAP structures prioritize scan efficiency and join locality.
  • Pre-aggregate high-frequency dashboard paths when raw facts are expensive.
  • Keep fact tables at lowest useful grain, then publish curated aggregates for speed.
  • Enforce partitioning/filter conventions on common time predicates.

Implementation Checklist #

Before publishing an analytical model:

  1. Grain statement exists in docs and code comments.
  2. Uniqueness test is automated on grain keys.
  3. Additivity class is defined for each metric.
  4. Event-time and processing-time columns are present.
  5. Reconciliation against source totals is complete.
  6. Consumer query examples are tested at target grain.

Advanced Engineer Notes #

In mature platforms, grain is treated as metadata and contract:

  • captured in model spec files
  • validated in CI before deployment
  • linked to semantic metric definitions
  • versioned when grain changes require restatement

A grain change is a breaking change and should follow controlled migration paths.

Interactive SQL Check #

Run this check to verify an event-level analytical grain (one row per order item on completed orders).

Loading SQL editor...
Warning

If grain is ambiguous, every downstream KPI is fragile. Treat grain as a first-class engineering artifact, not a documentation afterthought.