OLTP vs OLAP and Grain
Use this page as a practical reference for choosing the right modeling shape between OLTP and OLAP systems, and for defining grain correctly.
Goal:
- separate transactional and analytical concerns clearly
- define row-level grain as an explicit contract
- prevent double counting and metric drift
- build models that are both correct and performant
What OLTP and OLAP Optimize For #
| Dimension | OLTP | OLAP |
|---|---|---|
| Primary goal | Transaction correctness | Analytical insight |
| Workload pattern | many concurrent small writes | fewer but heavier reads |
| Query shape | point lookup, insert, update | scan, join, aggregate |
| Data model tendency | normalized (3NF) | star/mart/serving-oriented |
| Latency focus | low write latency | stable query latency |
| Freshness | near real-time | batch/micro-batch/stream windows |
System Boundary Rule #
Do not force one system to optimize for both worlds.
Recommended:
- OLTP stores transactional truth and process state.
- OLAP stores analytical truth and query-friendly structures.
Analytical models should be derived from OLTP (or event streams), not embedded into OLTP tables.
Grain: The Core Contract #
Grain defines what one row means.
Contract template:
- one row represents: [entity/event + time context]
- uniqueness key: [column set]
- event timestamp column: [event_ts]
- processing timestamp column: [load_ts]
- additive metrics: [safe to sum]
- non-additive metrics: [recompute from base measures]
- late-arriving policy: [upsert/restate/ignore]
Example Grain Contracts #
-- Transaction fact example
-- one row = one product line on one order
-- unique key = (order_id, order_item_id)
-- additive metrics = quantity, net_amount, cost_amount
-- Daily snapshot fact example
-- one row = one account at end of day
-- unique key = (account_id, snapshot_date)
-- semi-additive metric = ending_balance (additive by account, not across dates)
-- Period aggregate example
-- one row = one customer x month
-- unique key = (customer_id, month_start_date)
-- non-additive metric = margin_pct (derive from sums, do not sum percentages)OLTP to OLAP Mapping #
Typical mapping from transactional tables to analytics models:
- OLTP
ordersandorder_items-> OLAPfact_order_item - OLTP
customers-> OLAPdim_customer - OLTP status history -> OLAP snapshots or status-change facts
Keep raw transactional IDs for traceability, but use analytical keys for joins and governance.
Reference SQL: Build a Correct Fact Grain #
CREATE TABLE fact_order_item AS
SELECT
oi.order_id,
oi.order_item_id,
o.customer_id,
o.order_ts,
o.order_ts::DATE AS order_date,
oi.product_id,
oi.quantity,
oi.unit_price,
(oi.quantity * oi.unit_price) AS gross_amount,
COALESCE(oi.discount_amount, 0) AS discount_amount,
(oi.quantity * oi.unit_price) - COALESCE(oi.discount_amount, 0) AS net_amount
FROM orders o
JOIN order_items oi
ON oi.order_id = o.order_id
WHERE o.order_status = 'completed';Validation SQL You Should Always Run #
-- 1) Grain uniqueness
SELECT order_id, order_item_id, COUNT(*) AS row_count
FROM fact_order_item
GROUP BY order_id, order_item_id
HAVING COUNT(*) > 1;
-- 2) Unexpected multiplicative join check
SELECT
SUM(net_amount) AS net_before
FROM fact_order_item;
SELECT
SUM(f.net_amount) AS net_after
FROM fact_order_item f
JOIN dim_customer c
ON c.customer_id = f.customer_id;
-- Compare net_before vs net_after; mismatch suggests duplicated joins.
-- 3) Time semantics check
SELECT
MIN(order_ts) AS min_order_ts,
MAX(order_ts) AS max_order_ts,
MIN(order_date) AS min_order_date,
MAX(order_date) AS max_order_date
FROM fact_order_item;Event Time vs Processing Time #
Track both:
- event time: when business action occurred
- processing time: when warehouse received/processed data
This separation is required for late-arriving events, backfills, and SLA diagnostics.
Common Failure Modes #
- Mixing order-level and item-level facts in the same table.
- Summing pre-aggregated monthly tables into quarterly numbers without contract checks.
- Joining dimensions with non-unique business keys.
- Using percentages as additive metrics.
- Undefined date meaning (order date vs invoice date vs settlement date).
- Treating snapshots like transactions in trend analysis.
Performance Considerations #
- OLTP indexing prioritizes point lookups and updates.
- OLAP structures prioritize scan efficiency and join locality.
- Pre-aggregate high-frequency dashboard paths when raw facts are expensive.
- Keep fact tables at lowest useful grain, then publish curated aggregates for speed.
- Enforce partitioning/filter conventions on common time predicates.
Implementation Checklist #
Before publishing an analytical model:
- Grain statement exists in docs and code comments.
- Uniqueness test is automated on grain keys.
- Additivity class is defined for each metric.
- Event-time and processing-time columns are present.
- Reconciliation against source totals is complete.
- Consumer query examples are tested at target grain.
Advanced Engineer Notes #
In mature platforms, grain is treated as metadata and contract:
- captured in model spec files
- validated in CI before deployment
- linked to semantic metric definitions
- versioned when grain changes require restatement
A grain change is a breaking change and should follow controlled migration paths.
Interactive SQL Check #
Run this check to verify an event-level analytical grain (one row per order item on completed orders).
Related Lessons #
If grain is ambiguous, every downstream KPI is fragile. Treat grain as a first-class engineering artifact, not a documentation afterthought.