Finding email addresses by hand is slow, and finding them automatically in bulk can be even harder.
In this tutorial, I'm going to show you how you can use a third-party tool, Hunter.io, to pull thousands of emails at once. Hunter is commonly the #1 email provider used by data researchers on Upwork.
By using the Hunter API yourself, you can cut out third-party contractors, who are much more expensive, and start sourcing new leads in seconds, not days.
Affiliate Disclaimer: All Hunter links in this article are marked with my affiliate code. I do make a commission on any purchases using my link.
First, create a Hunter.io account. You do not need a paid plan to test the API; however, if you want to pull leads in bulk for your account lists, I recommend you pay for the Growth plan at $62 a month. It is a strong option if you're looking to source leads, and it is cheap compared to relying on someone on Upwork, which can cost up to $200 for 400 leads and requires a lot more management.
On the Growth plan, you can make 5,000 requests, which can end up yielding as many as 50,000 email addresses. You can purchase a Hunter subscription using this link.
Once you create a new account, click on your profile in the top right corner and click API. You will see the section below. Click on that eyeball icon and copy your API key.
As always, we want to start by importing our libraries into Python. For this project, we will need to import “pprint”, “requests”, “json_normalize”, and the “pandas” library.
The pprint module prints the JSON response in an indented, readable form. Requests will allow us to call the Hunter API. The json_normalize function comes from Pandas and lets us turn JSON into a DataFrame. Finally, we need to import the Pandas library itself so we can manage DataFrames.
import pprint
import requests
import pandas as pd
from pandas import json_normalize  # pandas 1.0+; older releases used pandas.io.json
We can now build a function that finds email addresses automatically.
We want to create a function for the Hunter API that we can use repeatedly in any projects we build.
Please keep in mind that all code after this point will be indented within this function so we can call it in one line.
We will first create a few variables and settings to make this process easier. We will set the pandas display width option to 0 so that DataFrames print at the full width of the terminal.
We will set our offset to 0 as a default. We will set results to 1 so that our while loop runs at least once; after each call to the Hunter API, we will store the real results count here. Finally, we will set a limit of 100 contacts per call.
pd.options.display.width = 0
offset = 0
results = 1
limit = 100
Next, we want to create a DataFrame to store all of the emails we find. Hunter returns fields such as Email, Phone Number, and so on, and these will become the columns.
We will create an empty DataFrame called contacts; the columns are filled in automatically when the first page of results is added.
contacts = pd.DataFrame()
We will next call the Hunter API to add each email found to our contacts DataFrame and keep moving through each page of contacts since the API has a limit of 100 contacts at a time.
while len(contacts) < results:
We will start with a while loop that checks whether our DataFrame still holds fewer rows than the total number of contacts Hunter has for the domain. We will update that total after each call. When we have collected all the emails, the while loop will stop.
We will call the Hunter API by creating a URL with the variables we created earlier. The URL will contain the domain we want emails for, our API key, the limit, and our page offset.
We will then call the requests library with a “get” method and store the JSON response.
url = 'https://api.hunter.io/v2/domain-search?domain=' + str(domain) + '&api_key=' + hunter_key + "&limit=" + str(limit) + "&offset=" + str(offset)
response = requests.get(url)
response = response.json()
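Joining the URL by hand works as long as none of the values contain characters that need escaping. As a minimal sketch of a safer alternative, the query string can be built from a dictionary with the standard library. The helper name build_domain_search_url is hypothetical; the endpoint and parameter names are the ones used in the URL above.

```python
from urllib.parse import urlencode

HUNTER_DOMAIN_SEARCH = 'https://api.hunter.io/v2/domain-search'


def build_domain_search_url(domain, hunter_key, limit=100, offset=0):
    """Return the domain-search URL with every query value safely escaped."""
    # Hypothetical helper: same parameters as the hand-built URL above.
    query = urlencode({
        'domain': domain,
        'api_key': hunter_key,
        'limit': limit,
        'offset': offset,
    })
    return HUNTER_DOMAIN_SEARCH + '?' + query


url = build_domain_search_url('hunter.io', 'YOUR_API_KEY')
print(url)
```

The requests library can also do this escaping for you if you pass the same dictionary through the params argument of requests.get.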
When we print the response, we will see up to 100 email addresses along with other information like Phone Numbers, Job Titles, and First Names.
After you print out your response, you will notice that the Hunter API response is split up into several segments. The two we care about are the “data” segment and the “meta” segment.
We will store these segments in two variables called “data” and “meta”. We access each one by looking up the key “data” or “meta” in the response from Hunter. Remember that the parsed JSON is a Python dictionary, so we pass in the string keys we are looking for.
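This is where the pprint import comes in handy. The sketch below pretty-prints a small hand-written stand-in for the response so you can see the two segments; the values inside it are made up for illustration and are not real Hunter output.

```python
import pprint

# Hand-written stand-in shaped like the response described above (not real data).
sample_response = {
    'data': {
        'pattern': '{first}',
        'emails': [{'value': 'ann@example.com'}],
    },
    'meta': {'results': 1, 'limit': 100, 'offset': 0},
}

# Pretty-print the nested dictionary so each segment is easy to spot.
pprint.pprint(sample_response)

# The top-level keys are the segments we will pull apart next.
top_level_keys = sorted(sample_response)
print(top_level_keys)
```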
data = response['data']
meta = response['meta']
The data we want from these variables are pattern, the email data, the results count, and the offset number.
pattern = data['pattern']
emails = data['emails']
results = meta['results']
offset = meta['offset'] + limit  # meta echoes the offset we sent, so add limit to reach the next page
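The paging logic can be tried out without spending any API requests by swapping the real call for a stand-in function. The sketch below assumes that the offset counts how many contacts to skip, so it moves forward by limit after every page. Both collect_emails and fake_fetch_page are hypothetical names used only for this illustration.

```python
def collect_emails(fetch_page, limit=100):
    """Walk through every page and return all email records in one list."""
    collected = []
    offset = 0
    results = 1  # placeholder so the loop runs once; replaced by meta['results']
    while len(collected) < results:
        response = fetch_page(limit, offset)
        emails = response['data']['emails']
        results = response['meta']['results']
        if not emails:
            break  # stop rather than loop forever if a page comes back empty
        collected.extend(emails)
        offset += limit  # skip the contacts we already have
    return collected


def fake_fetch_page(limit, offset):
    """Stand-in for the API call: serves slices of 250 made-up contacts."""
    everyone = [{'value': f'person{i}@example.com'} for i in range(250)]
    return {
        'data': {'pattern': '{first}', 'emails': everyone[offset:offset + limit]},
        'meta': {'results': len(everyone), 'limit': limit, 'offset': offset},
    }


emails = collect_emails(fake_fetch_page, limit=100)
print(len(emails))
```

The early break is a small safety net: if a page ever comes back empty while the total says more contacts exist, the loop ends instead of requesting the same page again.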
Now that we have stored this information, we can work with the list of emails from our response.
We want to take that list and turn it into a DataFrame. Luckily, the json_normalize function does exactly this, flattening nested JSON fields into columns.
We also want to add the pattern, which comes from the “data” segment, as its own column. Additionally, if you are searching multiple domains, make sure you track the domain so you can easily filter on it later.
new_data = json_normalize(emails)
new_data['Pattern'] = pattern
new_data['Domain'] = domain
Finally, we want to create a list containing our global DataFrame and the new DataFrame from the most recent response. Then, we will use the concat method to combine the two and store the result back in the contacts DataFrame.
frames = [contacts, new_data]
contacts = pd.concat(frames, sort=False, ignore_index=True)
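To see what the normalize and concat steps do, here is a small sketch that runs them on two hand-written pages instead of a live response. The records and their field names (value, first_name, position, phone_number) are illustrative assumptions, not a guaranteed description of what Hunter returns.

```python
import pandas as pd

# Two made-up "pages" of email records with slightly different fields.
page_one = [{'value': 'ann@example.com', 'first_name': 'Ann', 'position': 'CEO'}]
page_two = [{'value': 'bob@example.com', 'first_name': 'Bob', 'phone_number': '555-0100'}]

contacts = pd.DataFrame()
for emails in (page_one, page_two):
    new_data = pd.json_normalize(emails)  # one row per record, one column per field
    new_data['Pattern'] = '{first}'
    new_data['Domain'] = 'example.com'
    # Combine with what we already have; missing fields become NaN.
    contacts = pd.concat([contacts, new_data], sort=False, ignore_index=True)

print(contacts)
```

Because the pages do not share every field, concat fills the gaps with NaN, which is why the final cleanup step replaces missing values with empty strings.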
This loop will repeat until there are no more results in the Hunter API. Finally, we need to return our email data so we can store it.
Remember, we are creating this entire script in one function called “find_contacts_for_domain”. We need to clean our data and return the contacts so that we can use them in whatever way we like.
We fill missing values with empty strings, reset the index, and save a copy to test.csv. We then check whether the DataFrame is empty and return None if it is. Otherwise, we return the contacts DataFrame to be used in our other scripts.
contacts = contacts.fillna('').reset_index(drop=True)
contacts.to_csv('test.csv', index=False)
if contacts.empty:
    return
return contacts
Finally, we want to call this function to see if it works!
It should return a DataFrame of all the email addresses associated with Hunter.io.
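Putting the pieces together, the whole function and a call to it might look like the sketch below. A few details are my assumptions rather than part of the steps above: the parameter order (domain, hunter_key), passing the query through params, the 30-second timeout, the raise_for_status check, the guard against an empty page, and reading the key from a HUNTER_API_KEY environment variable. The CSV export is left out to keep it short.

```python
import os

import pandas as pd
import requests


def find_contacts_for_domain(domain, hunter_key, limit=100):
    """Collect every contact Hunter lists for `domain` into one DataFrame."""
    contacts = pd.DataFrame()
    offset = 0
    results = 1  # placeholder so the loop runs once; replaced by meta['results']

    while len(contacts) < results:
        params = {'domain': domain, 'api_key': hunter_key,
                  'limit': limit, 'offset': offset}
        response = requests.get('https://api.hunter.io/v2/domain-search',
                                params=params, timeout=30)
        response.raise_for_status()  # fail loudly on HTTP errors
        payload = response.json()

        data = payload['data']
        meta = payload['meta']
        emails = data['emails']
        results = meta['results']
        if not emails:
            break  # nothing more to add, so avoid looping forever

        new_data = pd.json_normalize(emails)
        new_data['Pattern'] = data['pattern']
        new_data['Domain'] = domain
        contacts = pd.concat([contacts, new_data], sort=False, ignore_index=True)
        offset += limit  # move on to the next page

    if contacts.empty:
        return None
    return contacts.fillna('').reset_index(drop=True)


if __name__ == '__main__':
    # Assumed setup: the API key is stored in an environment variable.
    key = os.environ.get('HUNTER_API_KEY')
    if key:
        found = find_contacts_for_domain('hunter.io', key)
        print(found if found is not None else 'No contacts found')
```

Keeping the key in an environment variable rather than in the script makes it harder to share it by accident when you publish or send the code to someone.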
If you want to get started on this project, simply visit Hunter.io and sign up for a free account to test it out for your industry.