Tier Customers Based on Revenue & Usage

How do you find the thresholds that determine a decent customer, a good customer, and your most important customers? Most marketers will try to guess. Luckily, you have the power of scripts on your side!

We are going to be introducing one of the easier machine learning principles with a very simple example: Customer tiering.

Why is Customer Tiering Important?

First, let’s discuss what customer tiering is and why it’s beneficial. Obviously, we want every single customer to be treated with respect and care.

That being said, some customers are more likely to churn than others and some should have a dedicated account manager or even have gifts sent to them.

I’ve seen software companies mark specific companies as Tier 3, knowing that a larger percentage of churn is okay for that client. Meanwhile Tier 1 clients are the most important and should have hours spent saving them.

Tiering Thresholds

Next comes the question of which metrics to tier on. A common choice, borrowed from marketing analytics, is Recency, Frequency, and Monetary value, abbreviated as RFM.

First, we need to define which actions are important to track. The easiest action to track is a purchase; however, if you sell usage-based software, it may make sense to use logins or actions in the app.

  • Recency: How many days has it been since they last completed the action?
  • Frequency: How many times have they completed the action in the specified timeframe?
  • Monetary: How much revenue have they generated for the business in this timeframe?

The question that we want to solve with this script is “What are these thresholds?”

An Introduction to Clustering

Clustering is one of the most common machine learning tasks, alongside Classification and Regression.

Clustering groups unlabeled data points so that similar points end up together, without being told in advance where the boundaries lie. It shows up in areas such as NLP, lifetime value prediction, and customer tiering.

This makes clustering a good fit for tiering customers, as we don’t yet know which boundaries separate a decent, good, and great customer.

We will be using the KMeans class from scikit-learn in Python to find the cluster centers (the means) that best fit the data. We will create clusters for each attribute separately: Recency, Frequency, and Monetary. We will then add the cluster scores together so that product usage, not just revenue, has an impact on the tier.
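
To make the idea concrete, here is a minimal sketch of KMeans on a single column of numbers. The spend values are made up for illustration, and n_init and random_state are set only so the run is repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up monthly spend for nine customers, forming three obvious groups.
spend = np.array([10, 12, 11, 100, 105, 98, 500, 520, 510], dtype=float).reshape(-1, 1)

# n_init and random_state are set only to make the run repeatable.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(spend)

# The cluster centers are the "means" in KMeans; sort them for readability.
centers = np.sort(model.cluster_centers_.ravel())
print(centers)

# Labels are arbitrary integers: customers in the same group share a label,
# but the largest label is not guaranteed to mean the highest spenders.
print(model.labels_)
```

The sorted centers give a rough picture of what a typical customer in each group spends, which is the kind of threshold we are after.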

Import Clustering Libraries

For this project, we will need seaborn, Pandas, and the KMeans class from scikit-learn. Seaborn is a graphing library built on top of matplotlib. Pandas is a DataFrame library that helps us handle data like a spreadsheet. KMeans is the key to this project and allows us to cluster our customers.

import seaborn as sns
import pandas as pd
from sklearn.cluster import KMeans
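
The snippets in this article assume a DataFrame named purchases with one row per purchase and the columns customer_email, purchase_id, purchase_date, and price. If you want to follow along without your own data, a small stand-in could look like the sketch below. Every value in it is invented.

```python
import pandas as pd

# Hypothetical sample data; replace this with your own export.
purchases = pd.DataFrame({
    'purchase_id': [1, 2, 3, 4, 5, 6],
    'customer_email': ['a@example.com', 'a@example.com', 'b@example.com',
                       'c@example.com', 'c@example.com', 'c@example.com'],
    'purchase_date': ['2023-01-05', '2023-03-01', '2023-02-10',
                      '2023-01-20', '2023-02-15', '2023-03-10'],
    'price': [20.0, 35.0, 15.0, 50.0, 60.0, 40.0],
})

# Dates read from a CSV arrive as strings; convert them so date arithmetic works.
purchases['purchase_date'] = pd.to_datetime(purchases['purchase_date'])
print(purchases.dtypes)
```

Converting purchase_date up front matters because the recency calculation subtracts dates, which only works on datetime columns.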

Create Recency Metric

Recency is the number of days since a client has made a purchase or made an important action. It is an important reference on the activity of a client and the best recency number is 0 days.

We will first group our purchases data by the customer email or by a unique id. We then want to find the latest purchase date using the max method. Finally, we will reset the index so that it only returns the customer_email and their last purchase.

latest_purchase = purchases.groupby('customer_email')['purchase_date'].max().reset_index()
latest_purchase.columns = ['customer_email', 'last_purchase']

We will then create a column in our latest_purchase DataFrame called “Recency”. It will be the difference between the most recent purchase across all customers and this specific customer’s last purchase, expressed as a number of days. This assumes purchase_date is stored as a datetime column.

latest_purchase['Recency'] = (latest_purchase['last_purchase'].max() - latest_purchase['last_purchase']).dt.days
recency = latest_purchase

Create Frequency Metric

Next, we want to calculate the frequency for each user. In this case, we want to count the number of purchases or actions they made in the timeframe we are looking at.

We will first group our purchases DataFrame by their customer email or a unique id. We will then count the number of purchases that have occurred. I recommend that you use a purchase or transaction ID so that the count will always be unique. We will then reset the index and change the column names so we have a data frame of the customer email or unique ID and their purchase count.

frequency = purchases.groupby('customer_email')['purchase_id'].count().reset_index()
frequency.columns = ['customer_email', 'Frequency']

Create Total Revenue Metric

If you have a metric in your database that already holds total revenue per customer, that’s great. You don’t have to complete this step. If you run a more transactional business with multiple purchases per customer, you will need to group them and sum the values. We rename the summed column to Revenue because the later steps refer to it by that name.

revenue = purchases.groupby('customer_email')['price'].sum().reset_index()
revenue.columns = ['customer_email', 'Revenue']

Merge Metrics into Your Data

Now that we have created all these amazing metrics for Recency, Frequency, and Monetary, we need to merge them into our original data.

Our original data frame purchases has a lot of data, so let’s cut it down to one row per customer email in a new data frame called users. The double brackets ensure that we get a DataFrame, not a Series, and drop_duplicates keeps each customer only once so the merges below do not repeat rows.

users = purchases[['customer_email']].drop_duplicates()

We will then use the merge method to combine our values.

users = users.merge(recency, on='customer_email')
users = users.merge(frequency, on='customer_email')
users = users.merge(revenue, on='customer_email')

Our users data frame now has the customer email as an ID, plus the last purchase date, recency, frequency, and revenue, all in one easy data frame. We can now create clusters with them.
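
As an alternative to building three separate frames and merging them, the same metrics can be computed in one pass with pandas named aggregation. This is a sketch on invented sample data, not a replacement for the steps above; the column names follow the ones used in this article.

```python
import pandas as pd

# Hypothetical sample data with the columns this article assumes.
purchases = pd.DataFrame({
    'purchase_id': [1, 2, 3, 4, 5, 6],
    'customer_email': ['a@example.com', 'a@example.com', 'b@example.com',
                       'c@example.com', 'c@example.com', 'c@example.com'],
    'purchase_date': pd.to_datetime(['2023-01-05', '2023-03-01', '2023-02-10',
                                     '2023-01-20', '2023-02-15', '2023-03-10']),
    'price': [20.0, 35.0, 15.0, 50.0, 60.0, 40.0],
})

# One row per customer: latest purchase, purchase count, and summed price.
users = purchases.groupby('customer_email').agg(
    last_purchase=('purchase_date', 'max'),
    Frequency=('purchase_id', 'count'),
    Revenue=('price', 'sum'),
).reset_index()

# Recency in days, measured against the most recent purchase in the data.
users['Recency'] = (users['last_purchase'].max() - users['last_purchase']).dt.days
print(users)
```

Because the grouping produces exactly one row per customer, this version also avoids the duplicate rows that can appear when merging onto a frame with one row per purchase.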

Create Clusters for Each Metric

For this project, we will keep it simple and only use 3 clusters per feature. That being said, there is a more principled way to choose the number of clusters, called the “Elbow Method”.
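
For readers curious about the Elbow Method, the rough idea is to fit KMeans for several cluster counts and watch how quickly the inertia (the sum of squared distances to the nearest center) stops dropping. The sketch below uses made-up Frequency values in place of the real users data.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up purchase counts standing in for the real users[['Frequency']].
users = pd.DataFrame({'Frequency': [1, 1, 2, 2, 3, 10, 11, 12, 30, 32, 35, 31]})

inertia = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(users[['Frequency']])
    # inertia_ is the sum of squared distances from each point to its cluster center.
    inertia[k] = model.inertia_

# Look for the k after which the numbers stop falling sharply (the "elbow").
for k, value in inertia.items():
    print(k, round(value, 1))
```

Plotting these values against k, for example with seaborn, makes the bend easier to spot than reading the numbers.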

We will then fit KMeans on the Frequency column of users, passed as its own one-column DataFrame. We will store the prediction from the KMeans model in a new column called “FrequencyCluster”.

Finally, we will group our users by their FrequencyCluster and describe the Frequency values that define each cluster. The clusters are labeled 0 to 2, and KMeans assigns those numbers arbitrarily, so check the output to see which label holds the most frequent buyers.

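# n_init and random_state make the result repeatable between runs
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
# fit on a one-column DataFrame (double brackets), then label every user
kmeans.fit(users[['Frequency']])
users['FrequencyCluster'] = kmeans.predict(users[['Frequency']])
# summary statistics show the Frequency range covered by each cluster
print(users.groupby('FrequencyCluster')['Frequency'].describe())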

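You will then repeat this exact code, replacing the word Frequency with “Revenue” and then with “Recency”. You now have clusters for each customer. Because KMeans numbers its clusters arbitrarily, relabel them before scoring so that a higher number always means a better customer. For Frequency and Revenue, bigger values are better; for Recency, fewer days is better.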
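
One way to handle the repetition and the relabeling together is a small helper that clusters a column and then renumbers the clusters from worst to best. The helper name add_ordered_cluster and the sample values are hypothetical; this is a sketch, assuming three clusters per metric as above.

```python
import pandas as pd
from sklearn.cluster import KMeans

def add_ordered_cluster(df, column, ascending, n_clusters=3):
    """Cluster one column and relabel clusters 0..n-1 from worst to best.

    Hypothetical helper: ascending=True means larger values are better
    (Frequency, Revenue); ascending=False means smaller is better (Recency).
    """
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    raw = model.fit_predict(df[[column]])
    # Rank the raw labels by the mean of the column within each cluster.
    means = df.groupby(raw)[column].mean().sort_values(ascending=ascending)
    mapping = {old: new for new, old in enumerate(means.index)}
    df[column + 'Cluster'] = pd.Series(raw, index=df.index).map(mapping)
    return df

# Made-up metrics for nine customers.
users = pd.DataFrame({
    'Recency': [1, 2, 3, 40, 45, 50, 200, 210, 190],
    'Frequency': [30, 28, 32, 10, 12, 11, 1, 2, 1],
    'Revenue': [900, 950, 1000, 300, 310, 290, 20, 25, 30],
})

for column, ascending in [('Recency', False), ('Frequency', True), ('Revenue', True)]:
    users = add_ordered_cluster(users, column, ascending)

print(users)
```

After this step, label 2 in every cluster column means the best group and label 0 the worst, which is what makes adding the three columns together meaningful.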

Create Overall Scores

Now that you have a Recency cluster, Frequency cluster, and Revenue cluster, you can create an overall score simply by adding them together. With three clusters per metric, each labeled 0 to 2, the score runs from 0 to 6. This assumes that a higher cluster label means a better customer for every metric and that all three metrics have equal weight.

users['OverallScore'] = users['RecencyCluster'] + users['FrequencyCluster'] + users['RevenueCluster']
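
If equal weighting does not suit your business, a weighted sum is a small change. The weights below are invented for illustration, and the cluster values are made-up stand-ins for the real columns.

```python
import pandas as pd

# Made-up cluster labels for three customers (0 = worst, 2 = best).
users = pd.DataFrame({
    'RecencyCluster': [2, 1, 0],
    'FrequencyCluster': [2, 0, 1],
    'RevenueCluster': [2, 1, 0],
})

# Hypothetical weights: here revenue counts twice as much as the other two.
weights = {'RecencyCluster': 1, 'FrequencyCluster': 1, 'RevenueCluster': 2}

# Multiply each cluster column by its weight and add the results row by row.
users['WeightedScore'] = sum(users[col] * w for col, w in weights.items())
print(users)
```

Keep in mind that changing the weights changes the range of the score, so any tier cut-offs need to be adjusted to match.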

Tier Your Overall Scores

Now that we have created a score for each user from 0 to 6, we can assign an overall tier and identify the values that define those tiers.

We will default all users first to a “Decent” tier. Next, we set any user with a score over 2 to “Good” and any user with a score over 4 to “Great”. With three clusters per metric the score tops out at 6, so these cut-offs split the range roughly into thirds; adjust them to suit your business.

We will then group all customers by their tier and look at the mean of each metric within each tier.

# assign segment names
users['Tier'] = 'Decent'
users.loc[users['OverallScore'] > 2, 'Tier'] = 'Good'
users.loc[users['OverallScore'] > 4, 'Tier'] = 'Great'
# double brackets select several columns from the grouped data
print(users.groupby('Tier')[['Recency', 'Frequency', 'Revenue']].mean())

By printing the mean of each segment, we can see the typical recency, frequency, and revenue within each tier and use those numbers as working markers of a decent, good, and great client.

Export Your Customer Tiers

Finally, let’s say we want to export our customer tier values into a CRM so we can act on these values. We will want to export this as a CSV.

Simply call the “to_csv” method and specify the file you want to save to. Passing index=False leaves the row numbers out of the file.

users.to_csv('users.csv', index=False)

This is a very simple implementation of this solution. There are many other ways that you can improve the accuracy and the precision of this script. That being said, this will get you numbers that will allow you to act today since information is only half the battle.