Learn how to use a data integration tool (Airbyte) and a data transformation tool (dbt) to create a single customer view on a data warehouse (BigQuery).
A single customer view is a holistic view of every interaction that a customer has with a brand. A typical online business knows a surprising amount of details about its customers. For example, an e-commerce store knows when you visit their site, which products you look at, how many orders you place, and how often you have called customer support. It also knows whether or not you open a promotional email or click on one of its ads on social networks. However, tracking all this information and using it to make better decisions are two separate things.
This article will give the case for building a single customer view in your data warehouse and how you can do that easily with open-source tools. Apart from setting up a data warehouse (BigQuery), we will use an open-source data integration tool (Airbyte) and an open-source data modeling tool (dbt). By the end of this tutorial, you will have two modeled tables with all your customers’ data, ready to be used by the business.
Having a single customer view can empower marketers, improve the customer service experience and enable predictive analytics. Here are some example use cases.
Choosing the right message or offer to send to a customer is essential for marketing success. Without data, this is extremely difficult and expensive to pull off. With raw data, though, a marketer can connect different data sources and manually extract the target audience for the campaign. However, this is often time-consuming, repetitive, and error-prone. The marketer is better off focusing on doing marketing work rather than data work.
Rather than manually extracting data from multiple sources and writing complex queries for each use case, with a single customer view the marketer can easily use SQL or any BI/Marketing tool to create a customer segment. In the examples above, they can segment out people in the age range of new parents (20 - 40) and have bought baby-related categories in the last 90 days. They can further refine the filters to experiment with the best segment for a particular campaign.
The previous use case is time-based, meaning that you send out communications to customers in batches. What if, instead, our marketer wants to create event-based campaigns? For instance, if a customer spends $1,000 with the company, send out a thank you note with buying statistics. You can't handle this manually. You would need a data pipeline and a marketing automation tool to deliver.
The single customer view makes it straightforward to integrate with other marketing systems to deliver automated marketing. Marketers can set up campaigns, let them run autonomously, and monitor campaign results and costs.
Imagine when customers contact customer service through chat or phone, and the agent already has all the information. With an automated routing system, customer service agents can be even more efficient.
“Hello, I see that you are calling to refund a recent transaction. Are you referring to order X with item Y?”. An occasion like this can be an excellent chance to deliver a “wow” customer service experience. “I see that you have enjoyed buying item Z with us monthly for the past year. As a token of our appreciation, I have added a 10% discount coupon for your next purchase”.
Use cases like this require data from your ordering system, website tracking, and probably more systems integrated with the call center software.
With enough customer data, there are many predictive analytics use cases that your data science team can deliver. Some examples include:
All of the use cases mentioned above revolve around customer data. Having a holistic single customer view in your data warehouse will make data scientists’ jobs much more manageable.
There are proprietary technologies out there that allow you to achieve a single customer view with more functionalities. They are often referred to as Customer Data Platform (CDP). You can choose CDP solutions from big tech companies like Adobe, Salesforce, and Microsoft or specialized companies like Segment, mParticle, and Treasure Data.
However, there are some downsides of using a CDP for building a single customer view. For example:
It is easier than ever to build a single customer view in your data warehouse for your business today with open-source tools. To build a single customer view we need a data warehouse (we choose BigQuery), a data integration tool (Airbyte) and a transformation tool (dbt).
Here are the prerequisites for following along with this article.
Before diving into the technical hands-on, you need to understand the requirements of this use case. To have a unified view of a customer across your business, you need to track customers when they interact with different systems and processes.
Merging all this information together is often called identity resolution, matching various customers’ identifiers across touch points. For example, in one system, a customer may use his phone number to log in. But in another, they may use their email address or their passport number. Without a connection between the two identifiers, you have no way of knowing that they belong to the same person.
Discussing identity resolution is outside the scope of this article, but it ranges from the basic one-to-one mapping of identifiers to using machine learning models to do that. If you have systems that you want to incorporate into your customer journey, but you don’t have a way of connecting to other systems yet, it is good to start thinking about capturing more data.
The first step to build a single customer view is to integrate all the customer data in a data warehouse. With well over 100 connectors available today, Airbyte is one of the best tools for customer data integration. Airbyte is open-source so can run it in your own infrastructure to protect customer data. A typical real-world business can have the following sources set up in Airbyte:
We will be using a synthesized e-commerce dataset for this article, so you can use dbt seed command instead of Airbyte to load the data. Feel free to follow along with your dataset if you have access to one. I will show how you can add a Shopify connector and load data easily with Airbyte.
Start an Airbyte instance locally by simply running:
Visit the Airbyte deployment using your browser of choice at http://localhost:8000. Finish the onboarding process to get to the home screen and set up a Shopify source.
Next, create a BigQuery destination with the following configuration.
Finally, choose all the tables to replicate, select the sync frequency, and start syncing. Just like that, you will create a pipeline to extract data from a source and write to a unified data warehouse.
For this article, I have used the Faker Python library to synthesize an e-commerce company’s data. We have the following data sources and tables.
dbt is an open-source data transformation tool for your data warehouse. dbt will be used to create the the single customer view using SQL queries from your separate customer data sources. You can then integrate dbt transformations to run on Airbyte after extracting the data or you can run dbt standalone like in this article. To keep it simple, I will only use one e-commerce order source. But adding other systems only requires adding a new connection in Airbyte and performing identity resolution in data modeling phase.
First clone this dbt project from Github by running:
Before running the dbt project, you need to set up the profiles.yml file to configure connection and credentials to your BigQuery dataset. You can look at the sample_profile.yml file and the BigQuery profile dbt documentation on how to do that. With the latest version of dbt installed (0.21.0 at the time of this writing), you will only need to run one command:
Let’s take a look at the tables. The first one, dim_user_attributes contains all dimensions (mutable) of any users who have signed up with the company.
I built several staging tables grouped by customer_id and calculated some metrics around that. For example, in the order_items model, I calculated metrics like total revenue and order count. I calculated metrics like most purchased category names and total revenue per item in the products model. Then I joined that with the customers table to get customer demographic information.
The result is a table with all of this information.
With this table, a marketer can easily filter out customers whose most purchased category was jeans, is female, and has spent more than $500 to do a targeted campaign. Or she can set up an automated campaign based on any of the values in this table.
The next table is fct_user_events_unioned (immutable), which contains user orders and web events.
With this table, you can see see the customer journey. The idea here is to use a timestamp column and provide some context around a customer event. An event can be an order, a visit to the website, or returning of a purchase. It can also be a call to customer support or clicking on a link via an email campaign.
Let’s look at some example events of the customer journey. This customer first visited the website on March 18th and ordered some products. They then visited the website and viewed some products in April but only returned to buy in July. Note that event_value is the items they purchased in an order. Here is your single customer view available on your data warehouse to empower all business units.
With Airbyte your can add a customer data source in minutes, and with dbt you can easily build a single customer view in your data warehouse. This view can power many use cases, as discussed above. You can use the code presented in this article as a starting point and add other customer data sources to suit your use case.
From here, you can plug your single customer view with BI tools. You can also add reverse-ETL to integrate your customer data from your data warehouse back to your marketing tools, customer support tools, and your company’s custom applications.
Tuan Nguyen is the CTO of Joon Solutions, a Data as a Service company. Tuan has built and managed data science teams, analytics infrastructure and analytics use cases.
Get all your ELT data pipelines running in minutes with Airbyte.
Learn how to detect data quality issues on your Airbyte syncs with re_data.
Set up PostgreSQL CDC in minutes using Airbyte, leveraging a powerful tool like Debezium to build a near real-time EL(T).