End-to-End Datamart Case Study
Use this case study as a blueprint to design and deliver a production-ready datamart.
Scenario:
- insurance domain
- policy, premium, and claims analytics
- business needs written premium, earned premium, claims ratio, and trend cuts by product/channel/region
Step 1: Business Questions and Grain #
Primary questions:
- What is the claims ratio by product, month over month?
- How do written and earned premium trend by region?
Fact grain decision:
- one row in fact_policy_monthly represents one policy for one month.
Active-policy counts by segment and channel-level profitability are natural follow-on questions, but they need customer and channel dimensions this case study does not model. Scope them as a second phase rather than letting them blur the grain decision.
Step 2: Core Star Model #
CREATE TABLE dim_policy (
policy_key BIGINT,
policy_id VARCHAR,
product_key BIGINT,
channel_key BIGINT,
customer_key BIGINT,
region_key BIGINT,
status VARCHAR
);
CREATE TABLE fact_policy_monthly (
policy_key BIGINT,
month_start_date DATE,
written_premium DECIMAL(18,2),
earned_premium DECIMAL(18,2),
paid_claim_amount DECIMAL(18,2),
reported_claim_amount DECIMAL(18,2),
claim_count INTEGER
);The fact carries month_start_date directly, so the model skips a dim_date: at month grain there is nothing for it to add yet. Likewise the bare product_key, channel_key, and region_key on dim_policy would each become a proper dimension table in a fuller build; they stay keys here to keep the case study focused on fact grain.
Step 3: Serving Layer View #
CREATE OR REPLACE VIEW mart_insurance_monthly AS
SELECT
f.month_start_date,
p.policy_id,
p.product_key,
p.channel_key,
p.region_key,
f.written_premium,
f.earned_premium,
f.paid_claim_amount,
f.reported_claim_amount,
f.claim_count,
f.reported_claim_amount / NULLIF(f.earned_premium, 0) AS reported_claim_ratio
FROM fact_policy_monthly f
JOIN dim_policy p ON p.policy_key = f.policy_key;Step 4: KPI Query Examples #
-- Claims ratio by month and product
SELECT
month_start_date,
product_key,
SUM(reported_claim_amount) / NULLIF(SUM(earned_premium), 0) AS claims_ratio
FROM mart_insurance_monthly
GROUP BY month_start_date, product_key
ORDER BY month_start_date, product_key;
-- Written vs earned premium by region
SELECT
month_start_date,
region_key,
SUM(written_premium) AS written_premium,
SUM(earned_premium) AS earned_premium
FROM mart_insurance_monthly
GROUP BY month_start_date, region_key
ORDER BY month_start_date, region_key;Step 5: Data Quality and Reconciliation #
-- 1) grain uniqueness
SELECT policy_key, month_start_date, COUNT(*) AS row_count
FROM fact_policy_monthly
GROUP BY policy_key, month_start_date
HAVING COUNT(*) > 1;
-- 2) orphan keys
SELECT COUNT(*) AS orphan_policy_rows
FROM fact_policy_monthly f
LEFT JOIN dim_policy p ON p.policy_key = f.policy_key
WHERE p.policy_key IS NULL;
-- 3) metric sanity
SELECT COUNT(*) AS negative_rows
FROM fact_policy_monthly
WHERE written_premium < 0
OR earned_premium < 0
OR reported_claim_amount < 0;Step 6: Tradeoff Decisions #
- Keep star schema for analyst usability.
- Add summary serving tables for heavy dashboard slices.
- Version claims-ratio metric if formula changes.
- Keep source-to-mart lineage for auditability.
Production Readiness Checklist #
Before launch:
- Grain and metric contracts documented.
- All model tests automated in pipeline.
- Reconciliation reports reviewed with business owners.
- Freshness SLA and owner defined.
- Backfill/restate runbook approved.
Senior Engineer Playbook #
- Query-plan diagnostics: profile hottest KPI queries and verify join selectivity at intended datamart grain.
- Failure postmortem pattern: keep a runbook for claims-ratio spikes caused by grain mismatch or late adjustments.
- Migration playbook: roll out the case-study model in waves (shadow, pilot, default) using the month x product_line claims-ratio surface as the regression gate for sign-off.
Interactive SQL Check #
Run this case-study check to compute claims ratio at the mart's own grain (month x product_line, the same grain as fact_policy_monthly) using fanout-safe policy-month pre-aggregations.
Related Lessons #
A good case-study datamart is not just queryable. It is governable, testable, and resilient under change.