Reference

End-to-End Datamart Case Study

Use this case study as a blueprint to design and deliver a production-ready datamart.

Scenario:

insurance domain
policy, premium, and claims analytics
business needs written premium, earned premium, claims ratio, and trend cuts by product/channel/region

Step 1: Business Questions and Grain #

Primary questions:

What is the claims ratio by product, month over month?
How do written and earned premium trend by region?

Fact grain decision:

one row in fact_policy_monthly represents one policy for one month.

Active-policy counts by segment and channel-level profitability are natural follow-on questions, but they need customer and channel dimensions this case study does not model. Scope them as a second phase rather than letting them blur the grain decision.

Step 2: Core Star Model #

sql

CREATE TABLE dim_policy (
  policy_key BIGINT,
  policy_id VARCHAR,
  product_key BIGINT,
  channel_key BIGINT,
  customer_key BIGINT,
  region_key BIGINT,
  status VARCHAR
);

CREATE TABLE fact_policy_monthly (
  policy_key BIGINT,
  month_start_date DATE,
  written_premium DECIMAL(18,2),
  earned_premium DECIMAL(18,2),
  paid_claim_amount DECIMAL(18,2),
  reported_claim_amount DECIMAL(18,2),
  claim_count INTEGER
);

The fact carries month_start_date directly, so the model skips a dim_date: at month grain there is nothing for it to add yet. Likewise the bare product_key, channel_key, and region_key on dim_policy would each become a proper dimension table in a fuller build; they stay keys here to keep the case study focused on fact grain.

Step 3: Serving Layer View #

sql

CREATE OR REPLACE VIEW mart_insurance_monthly AS
SELECT
  f.month_start_date,
  p.policy_id,
  p.product_key,
  p.channel_key,
  p.region_key,
  f.written_premium,
  f.earned_premium,
  f.paid_claim_amount,
  f.reported_claim_amount,
  f.claim_count,
  f.reported_claim_amount / NULLIF(f.earned_premium, 0) AS reported_claim_ratio
FROM fact_policy_monthly f
JOIN dim_policy p ON p.policy_key = f.policy_key;

Step 4: KPI Query Examples #

sql

-- Claims ratio by month and product
SELECT
  month_start_date,
  product_key,
  SUM(reported_claim_amount) / NULLIF(SUM(earned_premium), 0) AS claims_ratio
FROM mart_insurance_monthly
GROUP BY month_start_date, product_key
ORDER BY month_start_date, product_key;

-- Written vs earned premium by region
SELECT
  month_start_date,
  region_key,
  SUM(written_premium) AS written_premium,
  SUM(earned_premium) AS earned_premium
FROM mart_insurance_monthly
GROUP BY month_start_date, region_key
ORDER BY month_start_date, region_key;

Step 5: Data Quality and Reconciliation #

sql

-- 1) grain uniqueness
SELECT policy_key, month_start_date, COUNT(*) AS row_count
FROM fact_policy_monthly
GROUP BY policy_key, month_start_date
HAVING COUNT(*) > 1;

-- 2) orphan keys
SELECT COUNT(*) AS orphan_policy_rows
FROM fact_policy_monthly f
LEFT JOIN dim_policy p ON p.policy_key = f.policy_key
WHERE p.policy_key IS NULL;

-- 3) metric sanity
SELECT COUNT(*) AS negative_rows
FROM fact_policy_monthly
WHERE written_premium < 0
   OR earned_premium < 0
   OR reported_claim_amount < 0;

Step 6: Tradeoff Decisions #

Keep star schema for analyst usability.
Add summary serving tables for heavy dashboard slices.
Version claims-ratio metric if formula changes.
Keep source-to-mart lineage for auditability.

Production Readiness Checklist #

Before launch:

Grain and metric contracts documented.
All model tests automated in pipeline.
Reconciliation reports reviewed with business owners.
Freshness SLA and owner defined.
Backfill/restate runbook approved.

Senior Engineer Playbook #

Query-plan diagnostics: profile hottest KPI queries and verify join selectivity at intended datamart grain.
Failure postmortem pattern: keep a runbook for claims-ratio spikes caused by grain mismatch or late adjustments.
Migration playbook: roll out the case-study model in waves (shadow, pilot, default) using the month x product_line claims-ratio surface as the regression gate for sign-off.

Interactive SQL Check #

Run this case-study check to compute claims ratio at the mart's own grain (month x product_line, the same grain as fact_policy_monthly) using fanout-safe policy-month pre-aggregations.

Loading SQL editor...

Tip

A good case-study datamart is not just queryable. It is governable, testable, and resilient under change.