Practice

Meet the Insurance Dataset

Before you write a single KPI query, meet the data. Analysts who skip this step ship wrong numbers with confidence.

The routine is always the same:

Grain first: how many rows, and what does one row mean in each table?
Ranges next: what time window does the data actually cover?
Referential sanity: do the facts actually point at real dimension rows?

Every editor on this page is a sandbox. Nothing is graded, so run the queries, change them, break them.

1. Row Counts and Grain #

Start with raw row counts, then confirm the declared grain:

ins_customers: one row per customer
ins_policies: one row per policy
ins_premium_facts: one row per policy per month
ins_claims: one row per claim

A count alone does not prove grain. Compare COUNT(*) against COUNT(DISTINCT key) to prove there are no duplicates.

Loading SQL editor...

2. Date Ranges #

Next question: what period does this book of business cover? Premium accrual, claim activity, and policy starts each have their own window — and they rarely line up perfectly.

Loading SQL editor...

3. Policy Status Mix #

Status mix tells you how much of the portfolio is still in force. It also warns you which filters matter later: a metric "for active policies" excludes a different population than a metric for the whole book.

Loading SQL editor...

4. Trace One Policy End-to-End #

Pick one policy and follow it through every table. Policy 1001 is an Auto policy that starts in January: it should show its full written premium in month one, a steady earned amount every month after, and one claim landing in February.

Tracing a single entity end-to-end is the fastest way to internalize how the fact tables relate to the dimension.

Loading SQL editor...

5. Referential Sanity #

Facts are only trustworthy if every fact row points at a real dimension row. The classic test is an anti-join: keep fact rows whose policy does not exist. An empty result is the pass.

Loading SQL editor...

6. Find the Oddity #

One policy in ins_premium_facts behaves differently from every other policy that started in the same month. Profile months-present per policy and find it — then explain why using what you learned in the status-mix section.

Loading SQL editor...

Tip

This 15-minute routine (counts, grain proof, date ranges, one end-to-end trace, referential anti-joins) works on any unfamiliar dataset, not just this one. Do it before every new project and you will catch fanout bugs and missing-data surprises before they reach a dashboard.

Next Step #

Now formalize what you just eyeballed:

Data Quality Audit — graded checks that prove the dataset is clean.

Meet the Insurance Dataset

1. Row Counts and Grain #

2. Date Ranges #

3. Policy Status Mix #

4. Trace One Policy End-to-End #

5. Referential Sanity #

6. Find the Oddity #

Next Step #

Related In Insurance Analytics