Capstone Part 3: Segments & Recommendation
Part 1 sized the business. Part 2 asked whether growth was sticky. Now the last question, the one the business actually pays analysts for: "Where should we double down?"
Segmentation is how you answer. Rank products inside categories to find the breakout SKUs. Tier customers by lifetime value to find the ones worth a retention campaign. Then translate the numbers into a recommendation: a short written claim a stakeholder can act on.
Step 1: Top Products Within Category #
"Top product by revenue" overall is one number. "Top product within each category" is the business-relevant cut, where each category competes on its own merits. This is the top-N-per-group pattern from Analytics Patterns applied verbatim.
-- Top product per category by paid revenue
WITH product_revenue AS (
SELECT
p.category,
p.id AS product_id,
p.name AS product_name,
ROUND(SUM(o.total), 2) AS revenue
FROM orders AS o
JOIN products AS p ON p.id = o.product_id
WHERE o.status = 'paid'
GROUP BY p.category, p.id, p.name
),
ranked AS (
SELECT
category,
product_id,
product_name,
revenue,
ROW_NUMBER() OVER (
PARTITION BY category
ORDER BY revenue DESC, product_id
) AS rn
FROM product_revenue
)
SELECT category, product_name, revenue
FROM ranked
WHERE rn = 1
ORDER BY revenue DESC;The ORDER BY revenue DESC, product_id tie-breaker inside the window is deliberate: two products with identical revenue need a deterministic "winner" so the report is reproducible.
Step 2: Customer LTV + Tiering #
LTV (lifetime value) = sum of paid revenue per user. Once you have it, segment customers into tiers so the business can design different treatments: a whale gets personal outreach, an occasional buyer gets a re-engagement email.
WITH ltv AS (
SELECT
u.id AS user_id,
u.email,
COALESCE(SUM(o.total), 0) AS lifetime_revenue
FROM users AS u
LEFT JOIN orders AS o
ON o.user_id = u.id
AND o.status = 'paid'
GROUP BY u.id, u.email
)
SELECT
user_id,
email,
ROUND(lifetime_revenue, 2) AS lifetime_revenue,
CASE
WHEN lifetime_revenue >= 500 THEN 'whale'
WHEN lifetime_revenue >= 50 THEN 'regular'
ELSE 'occasional'
END AS tier
FROM ltv
ORDER BY lifetime_revenue DESC;The LEFT JOIN + COALESCE pattern matters here for the same reason as Part 2: a user with zero paid orders must still appear in the result. They're the occasional tier with LTV = 0, and they're the population a re-activation campaign targets.
Tier thresholds are business decisions, not SQL decisions. 500 / 50 look arbitrary because they are arbitrary. A real engagement would anchor these on the distribution of actual LTV (e.g., top 10% = whale, next 40% = regular, bottom 50% = occasional) or on a stakeholder's explicit spending bands. Don't ship tier boundaries without writing down where they came from.
Step 3: Turn Numbers Into a Recommendation #
The SQL is the easy part. The hard part is converting four queries into two sentences a VP can act on. Good recommendations follow a simple shape:
> Finding: <what the data shows>. Action: <what to do>. Caveat: <what might break the claim>.
Example for this capstone:
> Finding: Electronics generates 87% of paid revenue, led by Laptop Pro 15; 40% of signups convert to paid buyers; 2 of 10 users account for 67% of LTV. Action: protect the Laptop Pro 15 line (inventory, featured placement) and launch a whale-retention program for the top 2 users. Caveat: with only 10 users in the sample, conversion-rate confidence is low, so rerun after the next 100 signups before committing budget.
Notice what's not in there: every number from every query. A stakeholder-facing recommendation picks the 2-3 numbers that support the action. The rest is supporting detail in an appendix.
Try It #
Run the top-product-per-category query. Then modify it to return the top 2 per category. Does the #2 product in each category look like a challenger or a distant also-ran?
Practice #
Build the Top 2 customers per LTV tier report. Columns: user_id, lifetime_revenue (rounded to 2 decimals), tier, rn (1 or 2, rank within tier by LTV DESC, tie-break by user_id ASC).
Tiers:
whale: LTV >= 500regular: 50 <= LTV < 500occasional: LTV < 50
Include all 10 users, even those with zero paid revenue (they're the occasional tier). Filter the final result to rn <= 2. No ordering requirement on the final output.
Mistakes to Watch For #
- Ranking without a deterministic tie-breaker. If two customers have identical LTV, the "top whale" can flip between runs. Always add a stable secondary sort (usually
user_id). - Dropping zero-LTV users by using INNER JOIN. Your
occasionaltier disappears and the re-activation campaign never gets a target list. - Choosing tier thresholds by SQL convenience, not business reality.
>= 500is easy to type; it should still be justified by distribution or stakeholder input. - Reporting top-N without the magnitude. "Top product per category" with no revenue column is useless — #1 with $12M is a different story than #1 with $300.
- Delivering the SQL instead of the recommendation. Stakeholders care about the action, not the query. Write the finding → action → caveat sentence; the SQL lives in the appendix.
Capstone Complete #
You've run the full arc: grain validation → KPI baseline → cohort retention → segmentation + recommendation. This is what analytics engineers and SQL analysts do every week, at scale. The SQL techniques transfer directly; the judgment calls (definitions, thresholds, what to surface) are the craft.
Where next:
- Modeling section — learn how to design the fact/dimension tables these queries sit on top of in production.
- Data Engineering section — learn how pipelines move raw source data into those analytical tables reliably.
- Interview Prep / Practice — drill the individual techniques against timed problems and industry-specific datasets.