Reference

OLTP vs OLAP and Grain

Use this page as a practical reference for choosing the right modeling shape between OLTP and OLAP systems, and for defining grain correctly.

Goal:

separate transactional and analytical concerns clearly
define row-level grain as an explicit contract
prevent double counting and metric drift
build models that are both correct and performant

What OLTP and OLAP Optimize For #

Dimension	OLTP	OLAP
Primary goal	Transaction correctness	Analytical insight
Workload pattern	many concurrent small writes	fewer but heavier reads
Query shape	point lookup, insert, update	scan, join, aggregate
Data model tendency	normalized (3NF)	star/mart/serving-oriented
Latency focus	low write latency	stable query latency
Freshness	near real-time	batch/micro-batch/stream windows

System Boundary Rule #

Do not force one system to optimize for both worlds.

Recommended:

OLTP stores transactional truth and process state.
OLAP stores analytical truth and query-friendly structures.

Analytical models should be derived from OLTP (or event streams), not embedded into OLTP tables.

Grain: The Core Contract #

Grain defines what one row means.

Contract template:

one row represents: [entity/event + time context]
uniqueness key: [column set]
event timestamp column: [event_ts]
processing timestamp column: [load_ts]
additive metrics: [safe to sum]
non-additive metrics: [recompute from base measures]
late-arriving policy: [upsert/restate/ignore]

Example Grain Contracts #

sql

-- Transaction fact example
-- one row = one product line on one order
-- unique key = (order_id, order_item_id)
-- additive metrics = quantity, net_amount, cost_amount

-- Daily snapshot fact example
-- one row = one account at end of day
-- unique key = (account_id, snapshot_date)
-- semi-additive metric = ending_balance (additive by account, not across dates)

-- Period aggregate example
-- one row = one customer x month
-- unique key = (customer_id, month_start_date)
-- non-additive metric = margin_pct (derive from sums, do not sum percentages)

OLTP to OLAP Mapping #

Typical mapping from transactional tables to analytics models:

OLTP orders and order_items -> OLAP fact_order_item
OLTP customers -> OLAP dim_customer
OLTP status history -> OLAP snapshots or status-change facts

Keep raw transactional IDs for traceability, but use analytical keys for joins and governance.

Reference SQL: Build a Correct Fact Grain #

sql

CREATE TABLE fact_order_item AS
SELECT
  oi.order_id,
  oi.order_item_id,
  o.customer_id,
  o.order_ts,
  o.order_ts::DATE AS order_date,
  oi.product_id,
  oi.quantity,
  oi.unit_price,
  (oi.quantity * oi.unit_price) AS gross_amount,
  COALESCE(oi.discount_amount, 0) AS discount_amount,
  (oi.quantity * oi.unit_price) - COALESCE(oi.discount_amount, 0) AS net_amount
FROM orders o
JOIN order_items oi
  ON oi.order_id = o.order_id
WHERE o.order_status = 'completed';

Validation SQL You Should Always Run #

sql

-- 1) Grain uniqueness
SELECT order_id, order_item_id, COUNT(*) AS row_count
FROM fact_order_item
GROUP BY order_id, order_item_id
HAVING COUNT(*) > 1;

-- 2) Unexpected multiplicative join check
SELECT
  SUM(net_amount) AS net_before
FROM fact_order_item;

SELECT
  SUM(f.net_amount) AS net_after
FROM fact_order_item f
JOIN dim_customer c
  ON c.customer_id = f.customer_id;

-- Compare net_before vs net_after; mismatch suggests duplicated joins.

-- 3) Time semantics check
SELECT
  MIN(order_ts) AS min_order_ts,
  MAX(order_ts) AS max_order_ts,
  MIN(order_date) AS min_order_date,
  MAX(order_date) AS max_order_date
FROM fact_order_item;

Event Time vs Processing Time #

Track both:

event time: when business action occurred
processing time: when warehouse received/processed data

This separation is required for late-arriving events, backfills, and SLA diagnostics.

Common Failure Modes #

Mixing order-level and item-level facts in the same table.
Summing pre-aggregated monthly tables into quarterly numbers without contract checks.
Joining dimensions with non-unique business keys.
Using percentages as additive metrics.
Undefined date meaning (order date vs invoice date vs settlement date).
Treating snapshots like transactions in trend analysis.

Performance Considerations #

OLTP indexing prioritizes point lookups and updates.
OLAP structures prioritize scan efficiency and join locality.
Pre-aggregate high-frequency dashboard paths when raw facts are expensive.
Keep fact tables at lowest useful grain, then publish curated aggregates for speed.
Enforce partitioning/filter conventions on common time predicates.

Implementation Checklist #

Before publishing an analytical model:

Grain statement exists in docs and code comments.
Uniqueness test is automated on grain keys.
Additivity class is defined for each metric.
Event-time and processing-time columns are present.
Reconciliation against source totals is complete.
Consumer query examples are tested at target grain.

Advanced Engineer Notes #

In mature platforms, grain is treated as metadata and contract:

captured in model spec files
validated in CI before deployment
linked to semantic metric definitions
versioned when grain changes require restatement

A grain change is a breaking change and should follow controlled migration paths.

Interactive SQL Check #

Run this check to verify an event-level analytical grain (one row per order item on completed orders).

Loading SQL editor...

Warning

If grain is ambiguous, every downstream KPI is fragile. Treat grain as a first-class engineering artifact, not a documentation afterthought.