Practice

RFM Customer Segmentation

Marketing wants to stop sending the same email to everyone. The classic fix is RFM segmentation: score every customer on three behaviors and group them:

  • Recency: how recently did they buy? (recent buyers respond best)
  • Frequency: how often do they buy? (habits beat one-offs)
  • Monetary: how much have they spent? (protect the big spenders)

Three graded steps: compute the raw R/F/M values, score them into tiers with NTILE, then label segments with a CASE. All metrics use completed orders only, and every date calculation is anchored to a fixed as-of date (DATE '2024-06-01'), never CURRENT_DATE, so the same query gives the same answer next week.

1. Raw R/F/M Values #

Task: one row per customer with at least one completed order (customers who never completed a purchase have no recency and are out of scope). Output:

  • customer_id
  • recency_days: days from the customer's last completed order_date to DATE '2024-06-01' (date subtraction gives whole days)
  • frequency: count of completed orders
  • monetary: completed revenue, ROUND(..., 2)

Order by customer_id.

Loading SQL editor...

2. Scoring with NTILE(3) #

Task: turn the raw values into 1-3 scores where 3 is always best. Output customer_id, r_score, f_score, m_score, ordered by customer_id. Score definitions, exactly these:

  • r_score = NTILE(3) OVER (ORDER BY recency_days DESC, customer_id): oldest buyers first, so the most recent land in tier 3
  • f_score = NTILE(3) OVER (ORDER BY frequency ASC, customer_id): most frequent land in tier 3
  • m_score = NTILE(3) OVER (ORDER BY monetary ASC, customer_id): biggest spenders land in tier 3

Tie handling, stated explicitly: NTILE does not understand ties. It deals rows into buckets purely by row order, so two customers with identical frequency can land in different tiers wherever a bucket boundary falls between them. That is why every ORDER BY above carries the customer_id tie-break: it makes the row order, and therefore every score, deterministic. Watch it happen in this data: customers 9 and 10 both have frequency 1, yet they split across f_score 1 and 2.

Loading SQL editor...

3. Segment Labels #

Task: collapse the scores into named segments and count customers in each. The rules, exactly these, evaluated top-down in a CASE:

  1. r_score = 3 AND f_score >= 2Champions (recent and repeat)
  2. r_score >= 2 AND f_score >= 2Loyal (repeat, reasonably recent)
  3. f_score >= 2At Risk (used to buy repeatedly, has gone quiet; this is who win-back emails are for)
  4. everything else → Dormant (no repeat habit; includes recent one-time buyers who have not yet come back)

Output segment, customers, ordered by customers DESC, then segment. On this dataset every segment is non-empty — At Risk is the smallest with a single customer.

Loading SQL editor...
Warning

Production RFM is a metric contract, not a formula: the as-of date, the status filter, the tier count, the tie-break, and the segment rules all change the labels. A customer who silently moves from Loyal to At Risk because someone changed NTILE(3) to NTILE(5) will get the wrong email. Version the rules and anchor the as-of date, exactly as you did here.

Next Step #

Segments tell you who to target; the funnel tells you where you lose them: