Understanding your customer as a marketer is often the hardest part of marketing. It’s very easy to make guesses about your customers and waste budgets testing hypotheses, but it can push your success back by months. This article will show you how to learn about your customer’s with a purchase behavior heatmap.
To show you the best way to create a purchase behavior heatmap, I need to show you the process of using real data. Please remember that this code may change depending on your data. Formatting the purchase data will be the hardest part of this project.
We will be using data from this dataset. The source is Kaggle, a dataset marketplace. The data we will be using is Supermarket Sales by Aung Pyae. You can download the dataset by making a Kaggle account.
If you want to find a different dataset to use, visit this article of finding free datasets for passion projects.
In order to produce a product behavior heatmap, you will need to import pandas, numpy, matplotlib, calendar, seaborn, and datetime.
Pandas is a library that acts like spreadsheets on steroids.
Numpy allows us to calculate medians and standardize the heatmap.
Matplotlib is a plotting library that allows us to set the axes, titles, and labels of our chart that we will export. Seaborn is another plotting library that is more advanced than Matplotlib and can be used to create advanced graphs like heatmaps and more.
Calendar and datetime are used to calculate timestamps, timezone, and datetimes as well as calculate the exact weekday of the year that a specific day falls on.
import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns import calendar import datetime as dt
In order to load the purchase data, we need to use the “read_csv” function in the pandas library. We will do this by creating a “purchase_data” variable where we will store the results of reading the “supermarket_sales.csv”.
We want to take a look at our full data so we will print the purchase data. That being said, the print function will often only show a set amount of columns. This can make it tough to read the data.
I recommend using “pd.options.display.width = 0” as a line of code that configures your Pandas library to show the width of the full DataFrame with all the columns.
purchase_data = pd.read_csv('supermarket_sales.csv') pd.options.display.width = 0 print(purchase_data)
In order to reformat the datetimes into weekdays and hours in the next step, we need to create functions that we can use for our DataFrames ( spreadsheets ).
Remember: a function is simply the definition of an input, steps to process that input, and output. You can define a function by calling the keyword “def” followed by the function name with inputs inside the parentheses followed by a colon. After that, you want to indent any actions inside the function.
The four functions we want to use are very similar. They will take in a time string and return the hour, day, month, or year on the time.
In order to turn a string into a datetime, we must us the strptime function. This function takes in a datestring and a datetime format to return a datetime object that can be manipulated more easily.
In this case, our datetime format will be ‘%m/%d/%Y %H:%M’. This means that all of our dates are stored as 04/02/2020 13:01.
Once our strings are turned into a datetime object, we can receive the hour by typing “.hour”. This means that we are pulling this variable from that object.
return dt.datetime.strptime(event_time, '%m/%d/%Y %H:%M').hour
We will create functions for day, month, and year.
def event_time_to_day(event_time): return dt.datetime.strptime(event_time, '%m/%d/%Y %H:%M').day def event_time_to_month(event_time): return dt.datetime.strptime(event_time, '%m/%d/%Y %H:%M').month def event_time_to_year(event_time): return dt.datetime.strptime(event_time, '%m/%d/%Y %H:%M').year
In order to create a heatmap from our data, we need to change the datetime into a cross between hour and weekday.
We want to create a column in DataFrame for the hour, day, month, and year. We will first combine the time and date into a datetime string so we can use it in our functions above.
We will use a type of function called a “lambda”. A lambda is essentially a one-lined function. They are often used to manipulated columns of DataFrames or entire rows of DataFrames.
In this first line, we will manipulate a row. First, we want to store the result of our lambda into the column on “event_datetime”. We will then call the “apply” function on the entire purchase_data DataFrame. We will pass into a lambda that takes in a variable called row.
From there, we will get the Date column of that row and combine it with an empty space and the Time column of that row.
The most important part is to add a comma and pass in another input called axis. We will set this to 1. When the axis is set to 1, it tells the apply method to work for every row rather than a column ( the default ).
purchase_data['event_datetime'] = purchase_data.apply(lambda row: row['Date'] + ' ' + row['Time'], axis=1)
Next, we want to create 4 new columns in our “purchase_data” DataFrame called “year”, “month”, “day”, and “hour”. We will store the result of lambda for each one of these and use the functions we created earlier.
Notice, there is no axis input on this apply statement as we are running it on the column, not the row.
purchase_data['year'] = purchase_data['event_datetime'].apply(lambda event_datetime: event_time_to_year(event_datetime)) purchase_data['month'] = purchase_data['event_datetime'].apply(lambda event_datetime: event_time_to_month(event_datetime)) purchase_data['day'] = purchase_data['event_datetime'].apply(lambda event_datetime: event_time_to_day(event_datetime)) purchase_data['hour'] = purchase_data['event_datetime'].apply(lambda event_datetime: event_time_to_hour(event_datetime)) print(purchase_data)
Now that we have broken the datetime values into individual numbers in the “year”, “month”, “day”, and “hour” columns, we can find the weekdays of these dates.
We will first create a list of 7 strings that have the labels of the weekdays.
weekdays =['M', 'T', 'W', 'Th', 'F', 'S', 'Su']
Next, we will create a function called “calculate_weekday”. This function will take in the number of the day, month, and year. We will then pass those into the calendar library’s method weekday and return the weekday as a number with 0 meaning Monday.
def calculate_weekday(day, month, year): return calendar.weekday(year, month, day)
We will now use our new function in an apply method using the row method as specified earlier.
purchase_data['weekday'] = purchase_data.apply(lambda row: calculate_weekday(year=row['year'], month=row['month'], day=row['day']), axis=1) print(purchase_data)
Finally, we are able to create the product behavior heatmaps we are looking forward to!
We will create a single function called “create_full_chart” and pass in our augmented purchase data. The function will save the heatmap as an image for us.
We will first run the “pivot_table” function from the Pandas library. We will pass in the purchase_data. We will set the index to be the hour and the columns to be the weekdays.
The values will be all unique “Invoice Id” with the aggregation function being count. This simply means that the values inside the new pivot table will have each column show the number of invoices that align with that hour.
We will next fill the pivot table with a zero for anytime NaN appears in the table. Next, we will calculate the median using Numpy. This will provide a color baseline for the heatmap.
To create the heatmap, we will use the Seaborn library and call the heatmap function. We will pass in the pivot table, the median as the center, and then the Color map ( cmap ) as “coolwarm”. This sets the color.
We will then call the get_figure function on the heatmap. The figure allows us to export to a PNG. Finally, we will set a few labels on the graph and save the picture as a PNG.
def create_full_chart(purchase_data): purchase_heatmap = pd.pivot_table(purchase_data, values=['Invoice ID'], index=['hour'], columns=['weekday'], aggfunc='count') purchase_heatmap = purchase_heatmap.fillna(0) print(purchase_heatmap) median = np.median(purchase_heatmap) heatmap = sns.heatmap(purchase_heatmap, center=median, linewidths=.5, cbar=True, cmap="coolwarm") figure = heatmap.get_figure() plt.title("Supermarket Purchase Data") plt.xlabel("Weekday") plt.xticks(ticks=[num + .5 for num in range(7)], labels=weekdays) axes = figure.axes figure.savefig("purchase.png", bbox_inches="tight") figure.show()
Once we finish the function, we want to simply call it on our new data.
Now, you have a lovely heatmap for your data and can see at what times and days, your customers take specific actions at your store, product, or software.
As you take a look at your new heatmap, you will notice that in this case, the highest density of purchases are on Tuesday and Sunday at 7 o clock.
If you are trying to increase purchases, you may want to anticipate this need and advertise at these times. You may also want to ask why people are buying at these times so much. It may lead to a better understanding of your customers.
Start getting creative with all the ways you can analyze your purchase behavior heatmap.