Meet the E-commerce Dataset
Before you write a single KPI query, meet the data. Analysts who skip this step ship wrong numbers with confidence.
The routine is always the same:
- Grain first. How many rows, and what does one row mean in each table?
- Status and ranges next. Which rows count, and what time window do they cover?
- Reconciliation and referential sanity. Do the facts agree with each other?
Every editor on this page is a sandbox. Nothing is graded, so run the queries, change them, break them.
1. Row Counts and Grain #
Six tables, six grains:
ecom_customers: one row per customerecom_products: one row per productecom_orders: one row per order headerecom_order_items: one row per product line within an orderecom_returns: one row per returned line (referencingorder_item_id)ecom_events: one row per clickstream event
A count alone does not prove grain. Compare COUNT(*) against COUNT(DISTINCT key) to prove there are no duplicates.
2. Order Status Mix #
Almost every metric in this track filters to order_status = 'completed'. Before trusting that filter, see what statuses exist and how much volume each carries.
3. Date Ranges #
What window does each fact table actually cover? Orders, returns, and clickstream events each have their own range, and they do not line up. Knowing the ranges stops you from reading "zero purchases in October" as a crash when it is just the order book ending earlier than the event log.
4. Headers vs Lines: a Reconciliation Peek #
ecom_orders.order_total and SUM(ecom_order_items.line_total) describe the same money at two grains. If they disagree, every revenue number you publish depends on which table you summed. Eyeball the comparison here — the next page turns it into a graded audit.
5. Who Actually Buys? #
Dimensions list who could act; facts record who did. The gap between them matters: customers with no orders are invisible to any query that starts from ecom_orders.
6. The Events Table Tour #
ecom_events is the clickstream: view → add_to_cart → checkout → purchase. It powers the funnel pages later in this track, so take its measure now: which event types exist, how many customers appear, and where the data is thin.
This 15-minute routine (counts, grain proof, status mix, date ranges, one reconciliation peek, a fact-coverage check) works on any unfamiliar dataset, not just this one. Do it before every new project and you will catch fanout bugs and missing-data surprises before they reach a dashboard.
Next Step #
Now formalize what you just eyeballed:
- Data Quality Audit: graded checks that prove the dataset is clean.