Reference

Star Schema Interview Drills

This drill focuses on grain definition, fact/dimension boundaries, and KPI safety in star schema design questions.

Hard Prompt #

Design a sales mart over this dataset: monthly net revenue and return rate by product category, at a fact grain of one row per order item.

The dataset gives you orders, order items, products (with category), returns, and customers. Three dimensions interviewers will probe are design discussion only, not in this dataset:

  • Margin needs a unit-cost column the product table does not have. Say where cost would live (the product dimension, or a snapshotted cost on the fact row) and why.
  • Channel needs an acquisition or order channel attribute that neither orders nor customers carry. Discuss which table would own it.
  • Point-in-time customer attributes need a slowly changing dimension (SCD2). The customer table here holds only current values, so historical cohort views would silently use today's attributes.

Name the gap, sketch the design, then build what the data supports:

Loading SQL editor...
sql
-- Grain uniqueness validation example
SELECT
  oi.order_id,
  oi.product_id,
  COUNT(*) AS row_count
FROM ecom_order_items oi
GROUP BY oi.order_id, oi.product_id
HAVING COUNT(*) > 1;

Scoring Rubric #

  • grain clarity and consistency (35%), strong: names the fact grain before any SQL and never aggregates across mixed grains
  • dimension/fact design quality (30%), strong: places margin, channel, and SCD2 attributes in the right tables and flags what the current data cannot answer
  • KPI safety and validation plan (20%) — strong: guards ratio denominators and proposes a grain-uniqueness check like the one below
  • explanation clarity (15%) — strong: walks the interviewer from grain to dimensions to KPI in that order