
Replicate Salesforce data to BigQuery

Learn how to configure your Salesforce account to replicate leads data to BigQuery.


By integrating marketing, sales, service, and IT teams into one platform, Salesforce has transformed the way businesses operate. However, in today’s changing data landscape, you cannot rely on Salesforce alone to derive maximum value from your data. You need a combination of Salesforce and a powerful data warehouse like Google’s BigQuery to open the doors to more robust analytical insights and, in turn, drive increased revenue and growth.

In this recipe, we will explore why you should consider moving your Salesforce data into a data warehouse, and then demonstrate how you can easily leverage Airbyte to do the job.

Why centralize Salesforce data to a data warehouse?

Let’s look at some of the reasons why you might want to centralize your Salesforce data into a data warehouse.

Out-of-the-box reporting is insufficient and difficult to use

While Salesforce is a powerful tool, its default data capabilities are limited. There are hard restrictions on reporting and dashboard capabilities, which makes them inflexible across different scenarios. Moreover, when analyzing historical data in Salesforce, the timing of the snapshots has to be planned in advance and cannot be changed on the fly. On top of that, large reports, such as those with over 100 fields, run into additional limitations.

Customizing Salesforce can be very expensive and time-consuming

Customizing Salesforce workflows is typically a multi-step process involving hiring certified Salesforce consultants, building an in-house development team, and ultimately collaborating. Once the solution is built, your staff will need to be trained to use it effectively. All of this not only takes time and effort, but can also be quite expensive. The cost of highly customized Salesforce implementations can range from $10,000 to well over $50,000 depending on the complexity of the company's internal processes.

Salesforce's pricing limits access

Salesforce's pricing model is based on pay-per-user, and purchasing an annual license is required to get started. This policy will likely force you to limit the number of Salesforce users in your enterprise to bring down costs. With limited sharing options, collaboration becomes challenging, even though it is one of the reasons Salesforce was implemented to begin with. For example, with this limitation, reports can only be shared with paid Salesforce users. A C-level executive who might only need to view a report will be unable to do so without paying a license fee.

Why use Airbyte to extract Salesforce data

Now that you realize the value of moving Salesforce data into a data warehouse, how do you implement that process?

Many businesses start by writing custom ETL scripts, but the truth is that few succeed with them. Manually writing scripts slows down your project's velocity, and these scripts are typically brittle: constant care and time must be devoted to keeping them running. With automation, you can ditch the complex hardcoded scripts that handle data wrangling and scheduling, enabling your teams to work efficiently. Airbyte's connectors are open source and easily customizable, helping you seamlessly integrate data residing across your business apps and databases into your data warehouse. With Airbyte, you get full control over your data in an effortless way: data is deduplicated and can be transformed on the fly with SQL based on custom business logic. Airbyte also has built-in scheduling, orchestration, and monitoring. Its ready-to-go scheduler lets you replicate data either fully or incrementally, removing manual intervention and allowing your developers to focus on more critical matters.

Now, let’s get started with the how-to’s of using Airbyte to replicate data from Salesforce to BigQuery.

Prerequisites

Below are the prerequisite tools you’ll need to get started on backing up your Salesforce data to Google BigQuery.

  1. You’ll need to get Airbyte to do the data replication for you. To deploy Airbyte, follow the simple instructions in our documentation here.
  2. You will need a Salesforce account.
  3. You also need a Google Cloud Platform account with the BigQuery service enabled.

Salesforce accounts can be created by signing up at the link here. Note: You will need at least a developer account for Airbyte to be able to access Salesforce REST APIs. To set up a GCP account with the BigQuery service enabled, follow the instructions at the link here.

Configure your Salesforce account 

The first step is to configure your Salesforce account to allow external applications to connect and access data from the account. Log in to Salesforce and go to Setup > Create > Apps, then select New in the Connected Apps section.

Fill in the required fields.

Next, enable the OAuth settings, enter https://login.salesforce.com/ under Callback URL, and select the following OAuth scopes.

Save the connected app. Once saved, you should be able to see your Consumer Key and Consumer Secret. Make a note of these values; they will be used later.


Next, you will have to allow access to the connected app and generate an authorization code, which will then be used to get the refresh token required to configure Airbyte for Salesforce.

Log in to Salesforce and, in a new browser tab, go to the following URL: https://<YOUR_INSTANCE>.salesforce.com/services/oauth2/authorize?response_type=code&client_id=<CONSUMER_KEY>&redirect_uri=https://login.salesforce.com/

<YOUR_INSTANCE> can be found in your Salesforce account verification email. After visiting the URL, you will be prompted to allow access.

Copy the value of the code parameter in the URL you are redirected to. This will be used as the authorization code in the next step.
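Put together, the two steps above can be sketched in Python with the standard library. The instance name, consumer key, and redirect URL below are placeholders; substitute your own values.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Placeholders -- replace with your own instance and connected app values.
INSTANCE = "yourInstance"            # from your Salesforce verification email
CONSUMER_KEY = "YOUR_CONSUMER_KEY"   # from the connected app you created
REDIRECT_URI = "https://login.salesforce.com/"

# Build the authorization URL to open in a browser tab.
params = urlencode({
    "response_type": "code",
    "client_id": CONSUMER_KEY,
    "redirect_uri": REDIRECT_URI,
})
auth_url = f"https://{INSTANCE}.salesforce.com/services/oauth2/authorize?{params}"

# After you allow access, Salesforce redirects to a URL like the one below;
# extract the value of its `code` query parameter.
redirect = "https://login.salesforce.com/?code=aPrx1234example"
auth_code = parse_qs(urlparse(redirect).query)["code"][0]
```

The redirect URL shown is an illustration; use the actual URL your browser lands on after granting access.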


Next, you will need to make a POST request to https://<YOUR_INSTANCE>.salesforce.com/services/oauth2/token with the following parameters.

Set client_id to <YOUR_CONSUMER_KEY>, code to the authorization code from the previous step, grant_type to authorization_code, client_secret to <YOUR_CONSUMER_SECRET>, and redirect_uri to https://login.salesforce.com/

The response will contain a value for refresh_token which will be required to configure Airbyte.
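As a minimal sketch, the token exchange can be built in Python with the standard library. The function below only assembles the POST request from the parameters listed above; sending it requires a live Salesforce instance, so that part is shown as a comment.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def build_token_request(instance, consumer_key, consumer_secret, auth_code):
    """Build the POST request that exchanges the authorization code
    for a token response containing refresh_token."""
    url = f"https://{instance}.salesforce.com/services/oauth2/token"
    payload = {
        "grant_type": "authorization_code",
        "client_id": consumer_key,
        "client_secret": consumer_secret,
        "code": auth_code,
        "redirect_uri": "https://login.salesforce.com/",
    }
    return Request(url, data=urlencode(payload).encode(), method="POST")

# Sending the request (requires a live Salesforce instance):
# with urlopen(build_token_request("yourInstance", key, secret, code)) as resp:
#     refresh_token = json.load(resp)["refresh_token"]
```

The JSON response also contains an access_token, but only the refresh_token is needed to configure Airbyte.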

Add sample data to Salesforce

In the next step, we will generate and import sample data into Salesforce. For this example, we will create Leads data. Go to https://www.mockaroo.com/ and create a Leads dataset with the following fields.

This will generate a CSV with sample data.
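If you prefer to generate the CSV locally instead of using Mockaroo, a short Python sketch with the standard library will do. The field names below are assumptions; match them to the Lead columns your import wizard expects.

```python
import csv
import random

# Assumed Lead columns -- adjust to match your Salesforce import mapping.
FIELDS = ["first_name", "last_name", "company", "email"]

def write_sample_leads(path, n=100):
    """Write n rows of fake lead data to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for i in range(n):
            writer.writerow({
                "first_name": f"Test{i}",
                "last_name": f"Lead{i}",
                "company": f"Company {random.randint(1, 50)}",
                "email": f"test.lead{i}@example.com",
            })

write_sample_leads("leads.csv", n=100)
```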

Next, log in to Salesforce and go to Setup > Data > Data Import Wizard. Choose the Leads object, upload the CSV, and follow the wizard's steps to upload the data.

Once complete, you can view the data by going to the Salesforce homepage > Leads.

Set up Google BigQuery

The next step is to create a BigQuery dataset and generate the credentials JSON file required to configure Airbyte. Log in to your GCP account and go to BigQuery.

Select Create a Dataset and add the required information.

Next, go to the Service Account page and Create a new Service account. Enter the required information. Grant the service account the BigQuery Data Owner role.

Once the service account is created, go to Keys > Add Key and create a new key. You will be prompted to download a credentials file; download the JSON version, which will be used later.

Set up Salesforce as your Airbyte source

Next, set up the connection for the source, which will be your Salesforce account. Under client_id, enter <YOUR_CONSUMER_KEY>, under client_secret enter <YOUR_CONSUMER_SECRET>, and under refresh_token, enter the refresh token obtained earlier when configuring your Salesforce account.


Set up BigQuery as your Airbyte destination

Next, set up Airbyte to use BigQuery as the destination for the data replication. Enter your GCP project ID, dataset ID, and dataset location, and copy in the contents of the credentials JSON.


Create a Salesforce to BigQuery connection

Once configured, you will see a list of Salesforce streams from which data can be backed up.

Scroll through and select the Leads stream that contains the sample data.  

Note: You need to have billing enabled in your BigQuery service for normalization to work without any errors.

Once configured, you can manually trigger a sync. When it completes, your data will be backed up to BigQuery.

You can see two tables created in your dataset. The Lead table contains the normalized data.

You can also view your data by going to the dataset and going to the _airbyte_raw_lead table.

You can test out incremental append syncs by generating another sample data file and uploading it to Salesforce. Running the sync again will add the new items to BigQuery.

In this case, an additional 1065 items were added. You can verify the total row count in your dataset by running the following in the BigQuery UI:
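A simple count over the normalized table is enough; the project and dataset names below are placeholders, and the lead table name assumes Airbyte's default normalized output for the Leads stream:

```sql
SELECT COUNT(*) AS total_rows
FROM `your-project.your_dataset.lead`;
```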


Wrapping up

Now that you have replicated your Salesforce data to Google BigQuery, you can leverage the rich analytical capabilities of BigQuery to extract more insights from this data. Good luck, and we wish you all the best using Airbyte! 

Join the conversation at Airbyte’s community Slack Channel to share your ideas with over 1000 data engineers and help make everyone’s project successful.
