Warehouses and Lakes
Marketing Analytics

How to load data from Google Analytics 4 (GA4) to BigQuery

Learn how to use Airbyte to synchronize your Google Analytics 4 (GA4) data into BigQuery within minutes.

TL;DR

This can be done by building a data pipeline manually, usually a Python script (you can leverage a tool as Apache Airflow for this). This process can take more than a full week of development. Or it can be done in minutes on Airbyte in three easy steps:

  1. set up Google Analytics 4 (GA4) as a source connector (using Auth, or usually an API key)
  2. set up BigQuery as a destination connector
  3. define which data you want to transfer and how frequently

You can choose to self-host the pipeline using Airbyte Open Source or have it managed for you with Airbyte Cloud.

This tutorial’s purpose is to show you how.

What is Google Analytics 4 (GA4)

Google Analytics is a web service that provides integrated analytical tools and essential statistics to aid companies in understanding data and performing search engine optimization (SEO) to improve marketing. Google Analytics provides a range of free marketing tools that help businesses track their website performance and gather insights from visitors. Analytics’ integration with other Google solutions like Google’s advertising and publishing products promotes a seamless workflow that saves time and increases efficiency and insights.

What is BigQuery

BigQuery is an enterprise data warehouse that draws on the processing power of Google Cloud Storage to enable fast processing of SQL queries through massive datasets. BigQuery helps businesses select the most appropriate software provider to assemble their data, based on the platforms the business uses. Once a business’ data is acculumated, it is moved into BigQuery. The company controls access to the data, but BigQuery stores and processes it for greater speed and convenience.

Integrate Google Analytics 4 (GA4) with BigQuery in minutes

Try for free now

Prerequisites

  1. A Google Analytics 4 (GA4) account to transfer your customer data automatically from.
  2. A BigQuery account.
  3. An active Airbyte Cloud account, or you can also choose to use Airbyte Open Source locally. You can follow the instructions to set up Airbyte on your system using docker-compose.

Airbyte is an open-source data integration platform that consolidates and streamlines the process of extracting and loading data from multiple data sources to data warehouses. It offers pre-built connectors, including Google Analytics 4 (GA4) and BigQuery, for seamless data migration.

When using Airbyte to move data from Google Analytics 4 (GA4) to BigQuery, it extracts data from Google Analytics 4 (GA4) using the source connector, converts it into a format BigQuery can ingest using the provided schema, and then loads it into BigQuery via the destination connector. This allows businesses to leverage their Google Analytics 4 (GA4) data for advanced analytics and insights within BigQuery, simplifying the ETL process and saving significant time and resources.

Step 1: Set up Google Analytics 4 (GA4) as a source connector

1. Go to the Google Analytics website and log in to your account.

2. Click on the Admin button in the bottom left corner of the screen.

3. In the Admin section, click on the Account User Management option.

4. Click on the + button to add a new user.

5. Enter the email address associated with your Airbyte account and select the permissions you want to grant.

6. Click on the Add button to save the new user.

7. Go to the Airbyte website and navigate to the Sources page.

8. Click on the Google Analytics source connector.

9. Enter the credentials for your Google Analytics account, including the email address and password associated with the account.

10. Click on the Test button to ensure that the connection is working properly.

11. If the test is successful, click on the Save button to save the credentials and complete the setup process.

12. You can now use the Google Analytics source connector to import data from your Google Analytics account into Airbyte.

Step 2: Set up BigQuery as a destination connector

1. First, navigate to the Airbyte dashboard and select the "Destinations" tab on the left-hand side of the screen.

2. Scroll down until you find the "BigQuery" destination connector and click on it.

3. Click the "Create Destination" button to begin setting up your BigQuery destination.

4. Enter your Google Cloud Platform project ID and service account credentials in the appropriate fields.

5. Next, select the dataset you want to use for your destination and enter the table prefix you want to use.

6. Choose the schema mapping for your data, which will determine how your data is organized in BigQuery.

7. Finally, review your settings and click the "Create Destination" button to complete the setup process.

8. Once your destination is created, you can begin configuring your source connectors to start syncing data to BigQuery.

9. To do this, navigate to the "Sources" tab on the left-hand side of the screen and select the source connector you want to use.

10. Follow the prompts to enter your source credentials and configure your sync settings.

11. When you reach the "Destination" step, select your BigQuery destination from the dropdown menu and choose the dataset and table prefix you want to use.

12. Review your settings and click the "Create Connection" button to start syncing data from your source to your BigQuery destination.

Step 3: Set up a connection to sync your Google Analytics 4 (GA4) data to BigQuery

Once you've successfully connected Google Analytics 4 (GA4) as a data source and BigQuery as a destination in Airbyte, you can set up a data pipeline between them with the following steps:

  1. Create a new connection: On the Airbyte dashboard, navigate to the 'Connections' tab and click the '+ New Connection' button.
  2. Choose your source: Select Google Analytics 4 (GA4) from the dropdown list of your configured sources.
  3. Select your destination: Choose BigQuery from the dropdown list of your configured destinations.
  4. Configure your sync: Define the frequency of your data syncs based on your business needs. Airbyte allows both manual and automatic scheduling for your data refreshes.
  5. Select the data to sync: Choose the specific Google Analytics 4 (GA4) objects you want to import data from towards BigQuery. You can sync all data or select specific tables and fields.
  6. Select the sync mode for your streams: Choose between full refreshes or incremental syncs (with deduplication if you want), and this for all streams or at the stream level. Incremental is only available for streams that have a primary cursor.
  7. Test your connection: Click the 'Test Connection' button to make sure that your setup works. If the connection test is successful, save your configuration.
  8. Start the sync: If the test passes, click 'Set Up Connection'. Airbyte will start moving data from Google Analytics 4 (GA4) to BigQuery according to your settings.

Remember, Airbyte keeps your data in sync at the frequency you determine, ensuring your BigQuery data warehouse is always up-to-date with your Google Analytics 4 (GA4) data.

Use Cases to transfer your Google Analytics 4 (GA4) data to BigQuery

Integrating data from Google Analytics 4 (GA4) to BigQuery provides several benefits. Here are a few use cases:

  1. Advanced Analytics: BigQuery’s powerful data processing capabilities enable you to perform complex queries and data analysis on your Google Analytics 4 (GA4) data, extracting insights that wouldn't be possible within Google Analytics 4 (GA4) alone.
  2. Data Consolidation: If you're using multiple other sources along with Google Analytics 4 (GA4), syncing to BigQuery allows you to centralize your data for a holistic view of your operations, and to set up a change data capture process so you never have any discrepancies in your data again.
  3. Historical Data Analysis: Google Analytics 4 (GA4) has limits on historical data. Syncing data to BigQuery allows for long-term data retention and analysis of historical trends over time.
  4. Data Security and Compliance: BigQuery provides robust data security features. Syncing Google Analytics 4 (GA4) data to BigQuery ensures your data is secured and allows for advanced data governance and compliance management.
  5. Scalability: BigQuery can handle large volumes of data without affecting performance, providing an ideal solution for growing businesses with expanding Google Analytics 4 (GA4) data.
  6. Data Science and Machine Learning: By having Google Analytics 4 (GA4) data in BigQuery, you can apply machine learning models to your data for predictive analytics, customer segmentation, and more.
  7. Reporting and Visualization: While Google Analytics 4 (GA4) provides reporting tools, data visualization tools like Tableau, PowerBI, Looker (Google Data Studio) can connect to BigQuery, providing more advanced business intelligence options. If you have a Google Analytics 4 (GA4) table that needs to be converted to a BigQuery table, Airbyte can do that automatically.

Wrapping Up

To summarize, this tutorial has shown you how to:

  1. Configure a Google Analytics 4 (GA4) account as an Airbyte data source connector.
  2. Configure BigQuery as a data destination connector.
  3. Create an Airbyte data pipeline that will automatically be moving data directly from Google Analytics 4 (GA4) to BigQuery after you set a schedule

With Airbyte, creating data pipelines take minutes, and the data integration possibilities are endless. Airbyte supports the largest catalog of API tools, databases, and files, among other sources. Airbyte's connectors are open-source, so you can add any custom objects to the connector, or even build a new connector from scratch without any local dev environment or any data engineer within 10 minutes with the no-code connector builder.

We look forward to seeing you make use of it! We invite you to join the conversation on our community Slack Channel, or sign up for our newsletter. You should also check out other Airbyte tutorials, and Airbyte’s content hub!

What should you do next?

Hope you enjoyed the reading. Here are the 3 ways we can help you in your data journey:

flag icon
Easily address your data movement needs with Airbyte Cloud
Take the first step towards extensible data movement infrastructure that will give a ton of time back to your data team. 
Get started with Airbyte for free
high five icon
Talk to a data infrastructure expert
Get a free consultation with an Airbyte expert to significantly improve your data movement infrastructure. 
Talk to sales
stars sparkling
Improve your data infrastructure knowledge
Subscribe to our monthly newsletter and get the community’s new enlightening content along with Airbyte’s progress in their mission to solve data integration once and for all.
Subscribe to newsletter

At Airbyte’s content team, analyzing Google Analytics (GA) is essential for the visibility of our website’s performance. Currently, we export our Google analytics data to a Google Data Studio report to track our content KPIs. Still, it doesn’t provide us with the capabilities we would like.

If you’re familiar with it, you know that GA does not have an ideal format for in-depth analysis. Hence, moving the data to a data warehouse becomes necessary, especially when combining GA data with other sources like Google Search Console or other SEO data.

Since our analytics team is already making extensive use of BigQuery, we thought it’s a great idea to use it for GA, therefore centralizing all our data. The Google Analytics to BigQuery integration will serve three primary purposes: querying raw GA data, combining data from other sources, and exporting data for visualization.

I decided to use our newest product: Airbyte Cloud, to set up the integration! I have to say I was impressed with how easy it is to configure it. I had never created a data integration this fast before, and the best part is that I didn’t need to code.

In addition to providing hosting and management, Airbyte Cloud enables companies to have multiple workspaces and access management for their teams. It also supports OAuth authentication to help less technical users to connect their tools.

But why would you use Airbyte if GA and BigQuery integrate natively? Well, there are a couple of reasons. The BigQuery Data Transfer Service (DTS) cannot be used to integrate services outside of Google. Airbyte can be integrated with practically any service. Also, BigQuery DTS goes only in one direction: into BigQuery, which means you’ll need to resort to third-party solutions to move data further. Airbyte can replicate data into and from BigQuery.

Airbyte provides the advantage of being the only service you need for all your data integrations, making data management a lot easier.

Now that you have all the context let’s see how to export Google Analytics data to BigQuery!



{{COMPONENT_CTA}}

Prerequisites

Set up a Google Analytics source

Note that Airbyte’s Google Analytics source is currently in Beta, which means the engineering team is still working on new features and bug fixes. It’s also worth mentioning that Airbyte’s GA connector is based on the Google Analytics Reporting API v4, and it only works with Universal Analytics properties – GA4 properties will be supported soon. You can keep track of progress in this Github issue.

Setting up a GA source on Airbyte Cloud is very easy. Just go to the “Sources” tab and create a new source. As the source type, select Google Analytics. Then, a menu will appear, and you’ll need to fill in the following parameters.

Name

Write a name to help you identify the source in your workspace. You may have more than one GA source if you are a marketing agency or marketing SaaS tool, so selecting a descriptive name is convenient.

Credentials

Currently, there are two options to authenticate: via Google using OAuth or using the JSON key of your GCS’s service account. I used OAuth since it only requires a few clicks. OAuth is a standard behind the scenes to ensure the secure handling of permissions.

Start date

Select when you want to start replicating data.

View ID

Input the ID for the Google Analytics View you want to fetch data from. If you go to Google Analytics Account Explorer, you’ll see it under the “View” column.

Custom reports

The GA connector can replicate Custom Reports as well. We will explore Custom Reports in a dedicated section of this article. For now, you can leave it blank as it's an optional parameter.

Data request time increment in days

The Data request time increment in days parameter allows you to specify the number of days that data each chunk contains. It doesn’t change how the data at the destination is presented, but it can potentially affect fetch speed – in general, the bigger the value, the faster the fetch.

This parameter is also relevant to reducing sampling. If you get a warning in the sync logs about your data being sampled, you can try using a smaller window. We talk more about sampling in the Challenges section.

Now your source is set up. Check if the connection can be established, and that’s it! You’re ready to export Google Analytics data. For more information about the Google Analytics V4 connector, check Airbyte’s documentation.

Set up a BigQuery destination

Now it’s time to set up the destination of your data, BigQuery. To do that, go to “Destinations” and create a new one. As destination type select BigQuery, and the following parameters will appear.

Name

You can write a name to help you identify the destination in your workspace.

Default Dataset ID

Indicate the dataset ID where your tables will be stored in BigQuery. If you don’t have a dataset created yet, check GCS’s guide. It’s important to note that the default dataset will be used if you don’t specify a namespace in your connection settings.

Project ID

The ID of the project containing the BigQuery dataset defined above.

Loading method

As a loading method for BigQuery, you can select between “Standard Inserts” and “GCS Staging” options. The “Standard Inserts” option is inefficient and should only be used for testing. If you want to sync data regularly, we recommend setting up GCS Staging.

Credentials JSON

You may also need a credentials JSON key. To create it, check out GCS's helpful guide.

Dataset Location

The Dataset Location parameter is optional, but in my experience, you should set it to match the value of the “Data location” in your BigQuery dataset. Otherwise, you may run into errors when syncing the data.

Transformation Query Run Type

An advanced option indicates when the transformation queries – used for Airbyte’s basic normalization – are executed in BigQuery. If you don’t know which method is best for you, read more about interactive and batch query jobs.

Google Big Query Client Chunk Size

This is the BigQuery client's buffer size, from 1MB to 15MB. I used the default option, “15”.

You’re all set! Now set up the destination and wait for the connection to be tested.

Set up a Google Analytics to BigQuery connection

Now it’s time to set up the connection from Google Analytics to BigQuery. For that, go to “Connections” and add a new connection. Then, select the source and destination that you just created.

Airbyte will try to fetch the schema of your source and, if everything goes well, you’ll see a screen with the following options to set up your connection.

Replication frequency

As the name suggests, the replication frequency parameter indicates how often your data should be synced. I recommend setting it to “manual” to avoid triggering unnecessary syncs if you're just testing.

Destination namespace

When using BigQuery as a destination, the namespace refers to the dataset where your tables will be stored.

At Airbyte, we load data from multiple sources to a single BigQuery project. Because I don’t want data to be held in the default dataset specified in the destination configuration, I used the “Custom format” option. I indicated the dataset ID I wanted to use for GA’s data. In this case, I named it “google_analytics.” If you want to know more about namespaces, check Airbyte’s documentation.

Destination stream prefix

If you would like your BigQuery tables to have a prefix, you can indicate it in this parameter. In my case, I added the “ga_” prefix to identify the tables coming from GA quickly.

Next, you need to set up the streams. In this case, Airbyte automatically showed me all the available streams that have been configured for the GA connector. Each stream represents a table that will be created in BigQuery.

You can expand each stream to see the fields they contain, their data type, how they will be named in BigQuery, etc. You can also select which streams you want to sync by using the “Sync” toggle. I chose all of them.

Sync mode

One of the essential aspects that you should set up is the Sync mode, as this defines how the data will be replicated from the source to the destination.

To understand the implications of choosing a Sync mode for GA’s data, we should clarify the concept of golden data: Data is golden when the same request will not produce any new results if asked later.

In other words, golden is data that will not change – like page views from 2 months ago. Not-golden data still needs to be updated – today’s page views, for example.

You have the following sync options:

  1. Full refresh | Overwrite

Sync all data from GA and replace data in BigQuery by overwriting it.

Pros:

  • No duplicate data.

Cons:

  • Inefficient – it could cause problems if you need to sync large amounts of data or sync very often.
  1. Full refresh | Append

Sync all data from GA and append data in BigQuery.

Pros:

  • Keeps data history.

Cons:

  • Duplicate data – you might need to deduplicate it for reporting.
  • Fast-growing tables – since you’re appending the same reports with each sync.
  • Inefficient – it could cause problems if you need to sync large amounts of data or sync very often.
  1. Incremental | Append

Sync new records from GA and append data in BigQuery. The incremental cursor is the column “ga_date.” The connector adds a lookback window of 2 days to ensure any previously synced non-golden data is re-synced.

Pros:

  • Efficient since you’re only getting new records.

Cons:

  • It doesn’t have automatic duplicate data removal - you might need to deduplicate it for reporting.
  1. Incremental | Deduped + history

Sync new records from GA and append data in BigQuery to keep a history of changes. The incremental cursor is the column “ga_date.” Same as the previous sync mode, the connector adds a lookback window of 2 days to ensure any previously synced non-golden data is re-synced.

The final table is produced by de-duplicating the intermediate ones using a primary key, which you can select.

Pros:

  • Efficient since you’re only getting new records.
  • Keeps data history.
  • No duplicate data.

Cons:

  • There’s no suitable primary key for deduplication provided by the API – However, a composite primary key could be generated from the combination of several columns.

Each of the methods above has advantages and disadvantages, and your selection should depend on your specific use case. If you want to know more, you can read about sync modes.

At Airbyte, we selected the Full refresh | Overwrite sync mode because it provides the most advantages, like no duplicate data, with acceptable disadvantages since we currently don’t have large amounts of data. We sync every 24 hours, which is not so often.

Normalization

On top of replication, Airbyte provides the option to enable or disable an additional transformation step at the end of the sync, called basic normalization. In this case, the basic normalization maps the JSON object coming from the Google Analytics API to a format compatible with BigQuery. Hence, I suggest using the “Normalized tabular data” option.

Now you’re ready to start your first sync! To do that, click on “Set up connection” and the sync process should start running.

Meanwhile, you could monitor the logs to see what’s happening under the hood. If there were any errors, the logs would allow you to debug. If everything goes well, you should see a “Success” status for your sync.

In my case, replicating 102 MB took around 7 minutes.

Query Google Analytics data in BigQuery

After a successful sync, you should be able to see the data in BigQuery. If you go to your dataset, you’ll see that several sets of tables were created.

Raw tables

The tables named _airbyte_raw_* contain the raw JSON data that was fetched from the source. They only have 3 columns: _airbyte_ab_id, airbyte_emitted_at, _airbyte_data.

Normalized tables

Tables whose name matches the stream name are your final, normalized tables. The raw JSON data should have been mapped into columns.

Now, you could easily run queries or create reports on top of the normalized tables. For example, I was curious to know which of Airbyte’s pages had the most unique page views during Q1 2022, and there you have it!

Sync Google Analytics Custom Reports

Airbyte’s GA connector allows you to sync data from Custom Reports, too. A Custom Report is a report that you create: you pick the dimensions and metrics and decide how they should be displayed. Read more about creating custom reports here.

In the Custom Reports parameter of the GA source, you can input a JSON array of objects, using the following parameters:


{"name": string, "dimensions": [string], "metrics": [string]}

For example, I want to generate a report that shows how many users we have per day and where they are, and another report showing the users per city. I name these reports “new_users_per_day” and “users_per_city”.

Note that incremental syncs are supported only if you add “ga:date” dimension to your custom report.


[{"name": "new_users_per_day", "dimensions": ["ga:date","ga:country","ga:region"], "metrics": ["ga:newUsers"]}, {"name": "users_per_city", "dimensions": ["ga:city"], "metrics": ["ga:users"]}]

If you had already configured your connection without using Custom Reports in the source, it’s important to refresh the source schema in your replication settings, so the new streams are added.

Then, go ahead and run a sync. If everything goes well, you should see your reports as new tables in BigQuery!

Challenges

The following are some limitations that you may encounter when replicating data from Google Analytics (Beta version connector) to BigQuery using Airbyte Cloud.

Sampling

Google Analytics Reporting API v4 may limit the returned data by using sampling, which can affect the results of your data and is the reason why sometimes the values returned from the API do not match the web interface. Sampling is an approximation of the actual data.

The data request time increment in days parameter is used to specify the number of days to look back and can be used to avoid sampling. The smaller the number of days, the less chance of sampling.

When sampling occurs, a warning is logged to the sync log.

Cursor field

Currently, the cursor field for incremental syncs is set to “ga_date” for all streams and it cannot be changed to another column.

Conclusion

Taking your Google Analytics data to the next level by replicating it to a data warehouse can give you great advantages, and this tutorial has shown you that achieving it is not difficult and requires no code.

In this tutorial you have learned:

  • How Airbyte has benefits over using native Google replication solutions.
  • How to export Google Analytics data to BigQuery using Airbyte Cloud.
  • How to replicate Google Analytics custom reports to BigQuery.
  • Challenges and limitations you may encounter when using the Google Analytics connector.

Now that we have Google Analytics raw data in BigQuery, the next step for the content team will be to create some transformations using dbt and build a Metabase dashboard. Our plan is to include more sources, like Google Search Console, to enrich our data. We might cover our next steps in upcoming tutorials!

Join the conversation at Airbyte’s community Slack Channel to share your ideas with over 5000 data engineers and help make everyone’s project a success. With Airbyte, the integration possibilities are endless, and we can't wait to see what you're going to build!

What should you do next?

Hope you enjoyed the reading. Here are the 3 ways we can help you in your data journey:

flag icon
Easily address your data movement needs with Airbyte Cloud
Take the first step towards extensible data movement infrastructure that will give a ton of time back to your data team. 
Get started with Airbyte for free
high five icon
Talk to a data infrastructure expert
Get a free consultation with an Airbyte expert to significantly improve your data movement infrastructure. 
Talk to sales
stars sparkling
Improve your data infrastructure knowledge
Subscribe to our monthly newsletter and get the community’s new enlightening content along with Airbyte’s progress in their mission to solve data integration once and for all.
Subscribe to newsletter

Frequently Asked Questions

What data can you extract from Google Analytics 4 (GA4)?

Google Analytics API provides access to a wide range of data related to website traffic and user behavior. The following are the categories of data that can be accessed through the API:  

1. Audience data: This includes information about the demographics, interests, and behavior of website visitors.  

2. Acquisition data: This includes data related to how visitors are finding the website, such as through search engines, social media, or referral links.  

3. Behavior data: This includes data related to how visitors are interacting with the website, such as which pages they are visiting, how long they are staying on the site, and which actions they are taking.  

4. Conversion data: This includes data related to the goals and conversions on the website, such as the number of purchases, form submissions, or other desired actions.  

5. E-commerce data: This includes data related to online sales, such as revenue, average order value, and product performance.  

6. Real-time data: This includes data related to the current activity on the website, such as the number of active users and the pages they are currently viewing.  

Overall, the Google Analytics API provides a wealth of data that can be used to gain insights into website performance and user behavior.

What data can you transfer to BigQuery?

You can transfer a wide variety of data to BigQuery. This usually includes structured, semi-structured, and unstructured data like transaction records, log files, JSON data, CSV files, and more, allowing robust, scalable data integration and analysis.

What are top ETL tools to transfer data from Google Analytics 4 (GA4) to BigQuery?

The most prominent ETL tools to transfer data from Google Analytics 4 (GA4) to BigQuery include:

  • Airbyte
  • Fivetran
  • Stitch
  • Matillion
  • Talend Data Integration

These tools help in extracting data from Google Analytics 4 (GA4) and various sources (APIs, databases, and more), transforming it efficiently, and loading it into BigQuery and other databases, data warehouses and data lakes, enhancing data management capabilities.