Practice

Meet the E-commerce Dataset

Before you write a single KPI query, meet the data. Analysts who skip this step ship wrong numbers with confidence.

The routine is always the same:

Grain first. How many rows, and what does one row mean in each table?
Status and ranges next. Which rows count, and what time window do they cover?
Reconciliation and referential sanity. Do the facts agree with each other?

Every editor on this page is a sandbox. Nothing is graded, so run the queries, change them, break them.

1. Row Counts and Grain #

Six tables, six grains:

ecom_customers: one row per customer
ecom_products: one row per product
ecom_orders: one row per order header
ecom_order_items: one row per product line within an order
ecom_returns: one row per returned line (referencing order_item_id)
ecom_events: one row per clickstream event

A count alone does not prove grain. Compare COUNT(*) against COUNT(DISTINCT key) to prove there are no duplicates.

Loading SQL editor...

2. Order Status Mix #

Almost every metric in this track filters to order_status = 'completed'. Before trusting that filter, see what statuses exist and how much volume each carries.

Loading SQL editor...

3. Date Ranges #

What window does each fact table actually cover? Orders, returns, and clickstream events each have their own range, and they do not line up. Knowing the ranges stops you from reading "zero purchases in October" as a crash when it is just the order book ending earlier than the event log.

Loading SQL editor...

4. Headers vs Lines: a Reconciliation Peek #

ecom_orders.order_total and SUM(ecom_order_items.line_total) describe the same money at two grains. If they disagree, every revenue number you publish depends on which table you summed. Eyeball the comparison here — the next page turns it into a graded audit.

Loading SQL editor...

5. Who Actually Buys? #

Dimensions list who could act; facts record who did. The gap between them matters: customers with no orders are invisible to any query that starts from ecom_orders.

Loading SQL editor...

6. The Events Table Tour #

ecom_events is the clickstream: view → add_to_cart → checkout → purchase. It powers the funnel pages later in this track, so take its measure now: which event types exist, how many customers appear, and where the data is thin.

Loading SQL editor...

Tip

This 15-minute routine (counts, grain proof, status mix, date ranges, one reconciliation peek, a fact-coverage check) works on any unfamiliar dataset, not just this one. Do it before every new project and you will catch fanout bugs and missing-data surprises before they reach a dashboard.

Next Step #

Now formalize what you just eyeballed:

Data Quality Audit: graded checks that prove the dataset is clean.

Meet the E-commerce Dataset

1. Row Counts and Grain #

2. Order Status Mix #

3. Date Ranges #

4. Headers vs Lines: a Reconciliation Peek #

5. Who Actually Buys? #

6. The Events Table Tour #

Next Step #

Related In E-commerce Analytics