Lesson Advanced

Capstone Part 3: Segments & Recommendation

Part 1 sized the business. Part 2 asked whether growth was sticky. Now the last question, the one the business actually pays analysts for: "Where should we double down?"
Segmentation is how you answer. Rank products inside categories to find the breakout SKUs. Tier customers by lifetime value to find the ones worth a retention campaign. Then translate the numbers into a recommendation: a short written claim a stakeholder can act on.

Step 1: Top Products Within Category #

"Top product by revenue" overall is one number. "Top product within each category" is the business-relevant cut, where each category competes on its own merits. This is the top-N-per-group pattern from Analytics Patterns applied verbatim.

sql
-- Top product per category by paid revenue
WITH product_revenue AS (
  SELECT
    p.category,
    p.id AS product_id,
    p.name AS product_name,
    ROUND(SUM(o.total), 2) AS revenue
  FROM orders AS o
  JOIN products AS p ON p.id = o.product_id
  WHERE o.status = 'paid'
  GROUP BY p.category, p.id, p.name
),
ranked AS (
  SELECT
    category,
    product_id,
    product_name,
    revenue,
    ROW_NUMBER() OVER (
      PARTITION BY category
      ORDER BY revenue DESC, product_id
    ) AS rn
  FROM product_revenue
)
SELECT category, product_name, revenue
FROM ranked
WHERE rn = 1
ORDER BY revenue DESC;

The ORDER BY revenue DESC, product_id tie-breaker inside the window is deliberate: two products with identical revenue need a deterministic "winner" so the report is reproducible.

Step 2: Customer LTV + Tiering #

LTV (lifetime value) = sum of paid revenue per user. Once you have it, segment customers into tiers so the business can design different treatments: a whale gets personal outreach, an occasional buyer gets a re-engagement email.

sql
WITH ltv AS (
  SELECT
    u.id AS user_id,
    u.email,
    COALESCE(SUM(o.total), 0) AS lifetime_revenue
  FROM users AS u
  LEFT JOIN orders AS o
    ON o.user_id = u.id
    AND o.status = 'paid'
  GROUP BY u.id, u.email
)
SELECT
  user_id,
  email,
  ROUND(lifetime_revenue, 2) AS lifetime_revenue,
  CASE
    WHEN lifetime_revenue >= 500 THEN 'whale'
    WHEN lifetime_revenue >= 50 THEN 'regular'
    ELSE 'occasional'
  END AS tier
FROM ltv
ORDER BY lifetime_revenue DESC;

The LEFT JOIN + COALESCE pattern matters here for the same reason as Part 2: a user with zero paid orders must still appear in the result. They're the occasional tier with LTV = 0, and they're the population a re-activation campaign targets.

Info

Tier thresholds are business decisions, not SQL decisions. 500 / 50 look arbitrary because they are arbitrary. A real engagement would anchor these on the distribution of actual LTV (e.g., top 10% = whale, next 40% = regular, bottom 50% = occasional) or on a stakeholder's explicit spending bands. Don't ship tier boundaries without writing down where they came from.

Step 3: Turn Numbers Into a Recommendation #

The SQL is the easy part. The hard part is converting four queries into two sentences a VP can act on. Good recommendations follow a simple shape:

> Finding: <what the data shows>. Action: <what to do>. Caveat: <what might break the claim>.

Example for this capstone:
> Finding: Electronics generates 87% of paid revenue, led by Laptop Pro 15; 40% of signups convert to paid buyers; 2 of 10 users account for 67% of LTV. Action: protect the Laptop Pro 15 line (inventory, featured placement) and launch a whale-retention program for the top 2 users. Caveat: with only 10 users in the sample, conversion-rate confidence is low, so rerun after the next 100 signups before committing budget.

Notice what's not in there: every number from every query. A stakeholder-facing recommendation picks the 2-3 numbers that support the action. The rest is supporting detail in an appendix.

Try It #

Run the top-product-per-category query. Then modify it to return the top 2 per category. Does the #2 product in each category look like a challenger or a distant also-ran?

Loading SQL editor...

Practice #

Build the Top 2 customers per LTV tier report. Columns: user_id, lifetime_revenue (rounded to 2 decimals), tier, rn (1 or 2, rank within tier by LTV DESC, tie-break by user_id ASC).
Tiers:

  • whale: LTV >= 500
  • regular: 50 <= LTV < 500
  • occasional: LTV < 50

Include all 10 users, even those with zero paid revenue (they're the occasional tier). Filter the final result to rn <= 2. No ordering requirement on the final output.

Loading SQL editor...

Mistakes to Watch For #

  • Ranking without a deterministic tie-breaker. If two customers have identical LTV, the "top whale" can flip between runs. Always add a stable secondary sort (usually user_id).
  • Dropping zero-LTV users by using INNER JOIN. Your occasional tier disappears and the re-activation campaign never gets a target list.
  • Choosing tier thresholds by SQL convenience, not business reality. >= 500 is easy to type; it should still be justified by distribution or stakeholder input.
  • Reporting top-N without the magnitude. "Top product per category" with no revenue column is useless — #1 with $12M is a different story than #1 with $300.
  • Delivering the SQL instead of the recommendation. Stakeholders care about the action, not the query. Write the finding → action → caveat sentence; the SQL lives in the appendix.

Capstone Complete #

You've run the full arc: grain validation → KPI baseline → cohort retention → segmentation + recommendation. This is what analytics engineers and SQL analysts do every week, at scale. The SQL techniques transfer directly; the judgment calls (definitions, thresholds, what to surface) are the craft.
Where next:

  • Modeling section — learn how to design the fact/dimension tables these queries sit on top of in production.
  • Data Engineering section — learn how pipelines move raw source data into those analytical tables reliably.
  • Interview Prep / Practice — drill the individual techniques against timed problems and industry-specific datasets.