Measure Customer Happiness from Messages

Once a company has more than 500 customers, it becomes increasingly difficult to keep track of how those customers feel about it. By spotting growing delight or anger in your customers’ messages, you can respond quickly. The secret is learning how to measure your customers’ happiness from their messages.

Luckily, this isn’t incredibly difficult with Python. In fact, you can learn the basics quickly and integrate it into your proactive customer management strategy as soon as you download this tutorial.

Retrieve All Contacts

Before we get all the customer emails, we need to pull all the contacts out of HubSpot automatically. This is a simple HTTP request to the HubSpot API with our API key.

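import json
import requests
import pandas as pd
from pandas import json_normalize

# hapikey is assumed to be defined elsewhere as your HubSpot API key string.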

headers = {"Content-Type": "application/json", "Accept-All": "application/json"}
def get_all_contact_properties():
    url = '' + hapikey
    response = requests.get(url=url, headers=headers)
    return json_normalize(response.json())

def get_all_contacts():
    contact_properties = get_all_contact_properties()
    all_properties = contact_properties['name'].to_list()
    url = "" + hapikey

    querystring = {"limit": "100","properties": all_properties, "paginateAssociations": "false", "archived": "false"}
    all_contacts = pd.DataFrame()

    has_more = True
    while has_more:
        response = requests.get(url=url, params=querystring, headers=headers)
        response = response.json()
        contacts = response['results']
        contacts = json_normalize(contacts)
        if len(contacts) == 0:
            return contacts
        # Strip the literal 'properties.' prefix from the flattened column names.
        contacts.columns = contacts.columns.str.replace('properties.', '', regex=False)

        if 'paging' in response:
            offset = response['paging']['next']['after']
            querystring['after'] = offset
            has_more = True
        else:
            has_more = False
        frames = [all_contacts, contacts]
        all_contacts = pd.concat(frames, sort=False, ignore_index=True)

    return all_contacts
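
The loop above follows a cursor pattern: each response may carry a `paging.next.after` value, which is sent back as the `after` parameter until no cursor is returned. The following is a minimal sketch of that pattern on its own, with no network access. The names `fetch_all_pages` and `fake_fetch_page` are hypothetical, and the fake responses only mimic the shape the code above reads from; they are not a description of the real API.

```python
from typing import Callable, Dict, List, Optional


def fetch_all_pages(fetch_page: Callable[[Optional[str]], Dict]) -> List[Dict]:
    """Collect results from every page of a cursor-paginated endpoint."""
    results: List[Dict] = []
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        results.extend(page.get("results", []))
        # Stop when the response carries no cursor for a next page.
        after = page.get("paging", {}).get("next", {}).get("after")
        if after is None:
            break
    return results


# Hypothetical in-memory stand-in for the API: three contacts, two per page.
_FAKE_CONTACTS = [{"id": "1"}, {"id": "2"}, {"id": "3"}]


def fake_fetch_page(after: Optional[str]) -> Dict:
    start = int(after) if after is not None else 0
    end = start + 2
    page: Dict = {"results": _FAKE_CONTACTS[start:end]}
    if end < len(_FAKE_CONTACTS):
        page["paging"] = {"next": {"after": str(end)}}
    return page


all_contacts_demo = fetch_all_pages(fake_fetch_page)
print([c["id"] for c in all_contacts_demo])
```

Keeping the paging logic separate from the HTTP call makes it possible to test the loop without an API key.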

Segment Customers

Any HubSpot account, or any CRM, that has been around for two years or more can hold a lot of emails. Sadly, the HubSpot Email API isn’t very fast.

That’s why we want to work only with our customers: it reduces the time it takes to run this script. Luckily, this is pretty easy.

We simply need to get all the contacts with the method defined above. I wrote it so it would return a dataframe. From there, we will segment on lifecycle stage. We can accomplish this with the following pattern:

dataframe = dataframe[dataframe[field] == field_value]

This returns a filtered dataframe containing only the rows where that field equals the given value.

contacts = get_all_contacts()
contacts = contacts[contacts['lifecyclestage'] == 'customer']
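
To see what the filter does step by step, here is a small sketch on a made-up dataframe. The emails and lifecycle values are invented for illustration; only the `lifecyclestage` column name and the `'customer'` value come from the code above.

```python
import pandas as pd

demo_contacts = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "lifecyclestage": ["customer", "lead", "customer"],
})

# The comparison yields a True/False Series, one entry per row.
is_customer = demo_contacts["lifecyclestage"] == "customer"

# Indexing with that Series keeps only the rows marked True.
customers = demo_contacts[is_customer].reset_index(drop=True)
print(customers)
```

Splitting the mask from the indexing step is optional, but it makes the intermediate result easy to inspect when a filter returns fewer rows than expected.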

Pull Customer Emails

Next, we want to use the Engagement API, HubSpot’s activity endpoint, to pull the emails from HubSpot for the customers we segmented earlier.

All we need to pull the customer emails is the “VID”, HubSpot’s name for a contact ID. I get distracted if my dataframes have too much information, so I call the “filter” method to keep only the columns I need. Since that is just vid in this case, we’ll filter by it.

contacts = contacts.filter(['vid'])

def read_all_engagements():
    offset = 0
    has_more = True
    headers = {"Content-Type": "application/json", "Accept-All": "application/json"}
    all_engagements = pd.DataFrame()
    needed_data = ['associations.companyIds', 'associations.contactIds',
                   'associations.contentIds', 'associations.dealIds', 'engagement.createdAt', 'engagement.createdBy',
                   '', 'engagement.ownerId', 'engagement.timestamp', 'engagement.type',
                   '', '']
    while has_more:
        url = '' + hapikey + '&limit=250' + '&offset=' + str(offset)
        response = requests.get(url=url, headers=headers)
        response = response.json()
        offset = response['offset']
        has_more = response['hasMore']
        results = response['results']

        engagements = json_normalize(results)
        try:
            engagements = engagements.filter(needed_data)
        except KeyError:
            pass  # keep the unfiltered frame if filtering fails
        # Drop tasks; they are not messages.
        engagements = engagements[engagements['engagement.type'] != 'TASK'].reset_index()
        frames = [all_engagements, engagements]
        all_engagements = pd.concat(frames, sort=False, ignore_index=True)
    return all_engagements
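
Column names such as `engagement.type` come from flattening nested JSON: `json_normalize` joins nested keys with a dot. The sketch below imitates that flattening in plain Python so the naming is easier to follow. The `flatten` helper and the sample record are hypothetical, written only to mirror the column names used above.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into a single level with dot-separated keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


# Hypothetical engagement record, shaped like the column names used above.
sample = {
    "engagement": {"type": "INCOMING_EMAIL", "timestamp": 1600000000000},
    "associations": {"contactIds": [101]},
    "metadata": {"text": "Thanks, this works great!"},
}

flat_sample = flatten(sample)
print(sorted(flat_sample))
```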

Segment Incoming Emails

We will now filter out any emails sent by our account managers so that every remaining email comes from a customer. Our account managers should always be cheery so we don’t need to measure their sentiment!

Yet again, we will filter the dataframe, this time on the column “engagement.type”, and keep only incoming emails, which are marked in caps as INCOMING_EMAIL.

engagements = read_all_engagements()
engagements = engagements[engagements['engagement.type'] == 'INCOMING_EMAIL']

Score Each Email

Next, we will use TextBlob, a Python text-processing library, to measure the polarity and subjectivity of each email.

from textblob import TextBlob

Polarity measures the emotion of the message on a scale from -1 (negative) to 1 (positive).

Subjectivity indicates how opinionated a message is, from 0 (not very opinionated) to 1 (very opinionated).

engagements = engagements.filter(['', 'metadata.text'])
all_engagements = pd.merge(left=contacts, right=engagements, left_on='email', right_on='')
all_engagements['metadata.text'] = all_engagements['metadata.text'].fillna('')
all_engagements['sentiment'] = all_engagements['metadata.text'].apply(lambda text: TextBlob(text).sentiment)
all_engagements['polarity'] = all_engagements['sentiment'].apply(lambda sentiment: sentiment[0])
all_engagements['subjectivity'] = all_engagements['sentiment'].apply(lambda sentiment: sentiment[1])
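
To build some intuition for what a polarity score represents, here is a deliberately tiny word-list scorer. It is a toy written for this illustration, not TextBlob's implementation, and the four-word lexicon and its scores are invented.

```python
import re

# Hypothetical mini-lexicon; real analyzers ship with far larger ones.
TOY_LEXICON = {"great": 0.8, "love": 0.5, "slow": -0.3, "terrible": -1.0}


def toy_polarity(text: str) -> float:
    """Average the scores of known words; 0.0 when no known word appears."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("The new dashboard is great"))
print(toy_polarity("Support was terrible"))
print(toy_polarity("Please send the invoice"))
```

A message with no emotionally loaded words scores 0.0 here, which is why an empty string is a safe fill value for emails without text.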

Aggregate Their Scores

In this step, we simply average the scores across each contact’s emails. The code below averages polarity, since that is the value we group on in the next step. While this is not a perfect solution, it will get us started asking how to accurately use NLP and sentiment analysis to build account management at scale.

all_engagements = all_engagements.filter(['email','id', 'polarity', 'subjectivity'])
all_engagements = all_engagements.groupby(['id'])['polarity'].mean().reset_index(drop=False)
all_engagements.columns = ['id', 'polarity']
all_engagements = all_engagements.sort_values(['polarity'], ascending=False)

Figure: Scores per contact on average polarity and subjectivity
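
If you want the average subjectivity alongside the average polarity, one possible variation is to select both columns before taking the mean. This is a sketch on made-up scores, not the article's original code; the column names match the ones used above.

```python
import pandas as pd

scores = pd.DataFrame({
    "id": ["1", "1", "2"],
    "polarity": [0.5, -0.1, 0.3],
    "subjectivity": [0.6, 0.4, 0.9],
})

# Average both columns per contact, then sort happiest first.
per_contact = (
    scores.groupby("id")[["polarity", "subjectivity"]]
    .mean()
    .reset_index()
    .sort_values("polarity", ascending=False)
)
print(per_contact)
```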

Assign Groups According to Scores

This is a very basic article so we’re going to focus on simply marking whether the relationship with our customer is Positive, Negative, or Neutral.

In this example, I’ll mark any polarity above 0.1 as Positive, anything at or below -0.1 as Negative, and everything in between as Neutral. In a perfect world with more data, I would tier the positivity and negativity using clustering.

def sentiment_score(polarity):
    if polarity > .1:
        return 'Positive'
    elif polarity > -.1:
        return 'Neutral'
    else:
        return 'Negative'

all_engagements['sentiment_score'] = all_engagements['polarity'].apply(sentiment_score)
all_engagements = all_engagements.filter(['id', 'sentiment_score'])
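
Since the 0.1 cutoff is a judgment call, it can help to make it a parameter and look at how many contacts land in each group. The sketch below does that on invented polarity values. `label_polarity` is a hypothetical name, and the boundary handling follows the function above: exactly 0.1 is Neutral and exactly -0.1 is Negative.

```python
from collections import Counter


def label_polarity(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity in [-1, 1] to a label using a symmetric threshold."""
    if polarity > threshold:
        return "Positive"
    if polarity > -threshold:
        return "Neutral"
    return "Negative"


demo_polarities = [0.45, 0.1, 0.0, -0.1, -0.6]
labels = [label_polarity(p) for p in demo_polarities]
print(labels)
print(Counter(labels))
```

If nearly every contact ends up Neutral, the threshold is probably too wide for your data.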

Create a Sentiment Score in HubSpot

Now that we have a score on the sentiment on all the emails from our customers, we can import it back into HubSpot as a “Sentiment Score”.

We need to create a property under each contact called “Sentiment Score”. You can either do this manually pretty easily or just run a line of code. I wonder which one I’d rather do…

def create_contact_dropdown(contact_property, options):
    url = '' + hapikey

    # Build one option entry per value, numbered in the order given.
    reordered_options = []
    display_order = 0
    for value in options:
        name = value
        sublabel = value
        new_option = {
            'readOnly': False,
            'doubleData': None,
            'description': None,
            'label': sublabel,
            'value': name,
            'hidden': True,
            'displayOrder': display_order,
        }
        reordered_options.append(new_option)
        display_order += 1
    payload = json.dumps({
        "name": contact_property,
        "label": contact_property.replace('_', ' ').capitalize(),
        "groupName": "hubspot_integration",
        "description": "",
        "formField": False,
        "type": "enumeration",
        "fieldType": "select",
        "options": reordered_options
    })
    response = requests.post(url=url, data=payload, headers=headers)
create_contact_dropdown("sentiment_score", ['Positive', 'Neutral', 'Negative'])
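
Before sending anything to HubSpot, it can be useful to build the option list on its own and print it. The following sketch does only that, using a subset of the keys shown above. `build_dropdown_options` is a hypothetical helper, and it makes no request.

```python
import json
from typing import Dict, List


def build_dropdown_options(values: List[str]) -> List[Dict]:
    """Turn plain strings into option dicts, numbered in the order given."""
    return [
        {"label": value, "value": value, "displayOrder": order}
        for order, value in enumerate(values)
    ]


options = build_dropdown_options(["Positive", "Neutral", "Negative"])
print(json.dumps(options, indent=2))
```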

Import Our Customer Happiness Scores

Finally, we simply need to update our contacts according to the values we created.

In order to update our contacts quickly, whether we have 100 or 10,000 of them, we should use the HubSpot batch update method, sending updates 10 at a time.

It is always better to use batch requests when available, since sending one request per contact is incredibly slow and can take 30 minutes in some cases.

batch_update_url = "" + hapikey
all_updates = []
# Build one update entry per contact.
for index, row in all_engagements.iterrows():
    payload = {
        'properties': {
            'sentiment_score': row['sentiment_score']
        },
        'id': row['id']
    }
    all_updates.append(payload)
batch_size = 10
offset = 0
# Send the updates in slices of batch_size.
while offset < len(all_updates):
    if offset + batch_size > len(all_updates):
        batch = (all_updates[offset:])
    else:
        batch = (all_updates[offset:offset+batch_size])
    payload = json.dumps({
        'inputs': batch
    })
    response = requests.request("POST", batch_update_url, data=payload, headers=headers)
    offset += batch_size

Figure: Sentiment Score now in HubSpot
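
The batching logic can also be written as a small reusable generator. This is a sketch with a hypothetical helper name, `chunked`, and invented update records. Because Python slices stop at the end of the list on their own, no special case is needed for the last, shorter batch.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


demo_updates = [{"id": str(i)} for i in range(23)]
batch_sizes = [len(batch) for batch in chunked(demo_updates, 10)]
print(batch_sizes)
```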

Next Steps to Measure Customer Happiness

While this process will work to measure happiness and negativity in your customers’ emails, it has its blind spots. It does not take into account what the customer is speaking negatively or positively about. In cases where a customer loves one feature but hates another, it will simply average those two statements out.
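
The averaging problem is easy to see with numbers. In this sketch, the two sentences and their scores are invented for illustration: one strongly positive and one strongly negative statement average to zero, so the email would be labeled Neutral even though it contains a clear complaint.

```python
from statistics import mean

# Hypothetical per-sentence polarity scores for a single email.
sentence_scores = {
    "I love the new reporting feature.": 0.6,
    "The mobile app is awful.": -0.6,
}

overall = mean(sentence_scores.values())
most_negative = min(sentence_scores, key=sentence_scores.get)

print(overall)
print(most_negative)
```

Tracking the most negative sentence alongside the average is one simple way to keep such complaints visible.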

We will talk about that in the next article where we dive deeper into how to accurately automate or assist your account management teams.