How to load data from MongoDB to BigQuery

Learn how to use Airbyte to synchronize your MongoDB data into BigQuery within minutes.

Building your pipeline or Using Airbyte

Airbyte is the only open solution empowering data teams to meet all their growing custom business demands in the new AI era.

Building in-house pipelines
Bespoke pipelines are:
  • Inconsistent and inaccurate
  • Laborious and expensive
  • Brittle and inflexible
Furthermore, covering all your needs with Y sources and Z destinations means building and maintaining Y × Z pipelines.
After Airbyte
Airbyte connections are:
  • Reliable and accurate
  • Extensible and scalable for all your needs
  • Deployed and governed your way
All your pipelines in minutes, however custom they are, thanks to Airbyte’s connector marketplace and Connector Builder.

Start syncing with Airbyte in 3 easy steps within 10 minutes

Set up a MongoDB connector in Airbyte

Connect to MongoDB or one of 400+ pre-built or 10,000+ custom connectors through simple account authentication.

Set up BigQuery for your extracted MongoDB data

Select BigQuery as the destination for the data from your MongoDB source. You can also choose other cloud data warehouses, databases, data lakes, vector databases, or any other supported Airbyte destinations.

Configure the MongoDB to BigQuery connection in Airbyte

This includes selecting the data you want to extract (streams and columns), the sync frequency, and where in the destination you want that data to be loaded.

TL;DR

You can load MongoDB data into BigQuery by building a data pipeline manually, usually with a Python script (you can leverage a tool such as Apache Airflow for this). This process can take more than a full week of development. Alternatively, you can do it in minutes with Airbyte in three easy steps:

  1. set up MongoDB as a source connector (using your database credentials)
  2. set up BigQuery as a destination connector
  3. define which data you want to transfer and how frequently

You can choose to self-host the pipeline using Airbyte Open Source or have it managed for you with Airbyte Cloud.

This tutorial’s purpose is to show you how.

What is MongoDB

MongoDB is a popular open-source NoSQL database that stores data in a flexible, document-based format. It is designed to handle large volumes of unstructured data and is highly scalable, making it a popular choice for modern web applications. MongoDB uses a JSON-like format to store data, which allows for easy integration with web applications and APIs. It also supports dynamic queries, indexing, and aggregation, making it a powerful tool for data analysis. MongoDB is widely used in industries such as finance, healthcare, and e-commerce, and is known for its ease of use and flexibility.

What is BigQuery

BigQuery is Google Cloud's fully managed, serverless enterprise data warehouse. It draws on the processing power of Google's infrastructure to run fast SQL queries over massive datasets. Data accumulated from the platforms a business uses is loaded into BigQuery. The company controls access to the data, while BigQuery stores and processes it for greater speed and convenience.

Prerequisites

  1. A MongoDB instance, and credentials for it, from which to transfer your data.
  2. A BigQuery account.
  3. An active Airbyte Cloud account, or Airbyte Open Source running locally. You can follow the instructions to set up Airbyte on your system using docker-compose.

Airbyte is an open-source data integration platform that consolidates and streamlines the process of extracting and loading data from multiple data sources to data warehouses. It offers pre-built connectors, including MongoDB and BigQuery, for seamless data migration.

When using Airbyte to move data from MongoDB to BigQuery, it extracts data from MongoDB using the source connector, converts it into a format BigQuery can ingest using the provided schema, and then loads it into BigQuery via the destination connector. This allows businesses to leverage their MongoDB data for advanced analytics and insights within BigQuery, simplifying the ETL process and saving significant time and resources.
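
To make the conversion step more concrete, here is a minimal sketch of how a MongoDB-style document could be turned into one line of newline-delimited JSON, a format BigQuery can load. This is not Airbyte's actual implementation. The function names `flatten` and `to_ndjson_row`, the sample document, and the choice to flatten nested fields with underscores are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def flatten(doc, parent_key="", sep="_"):
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat = {}
    for key, value in doc.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def to_ndjson_row(doc):
    """Serialize one document as a single JSON line."""
    def default(value):
        # Datetimes are not JSON-serializable; emit ISO 8601 strings instead.
        if isinstance(value, datetime):
            return value.isoformat()
        # Any other unknown type (for example an ObjectId) falls back to str().
        return str(value)

    return json.dumps(flatten(doc), default=default, sort_keys=True)


# Hypothetical document, shaped like something a "books" collection might hold.
sample = {
    "_id": "abc123",
    "title": "Example Book",
    "meta": {"pages": 120, "published": datetime(2020, 1, 1, tzinfo=timezone.utc)},
}
row = to_ndjson_row(sample)
print(row)
```

Flattening is only one possible design choice: BigQuery can also store nested data, so a real pipeline might keep the nested structure instead.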

Step 1: Set up MongoDB as a source connector

1. First, you need to have a MongoDB instance running and accessible from the internet. You will also need to have the necessary credentials to access the database.

2. In the Airbyte dashboard, click on "Sources" and then click on "New Source."

3. Select "MongoDB" from the list of available sources.

4. In the "Connection Configuration" section, enter the following information:
- Host: The hostname or IP address of your MongoDB instance.
- Port: The port number on which your MongoDB instance is running.
- Username: The username you use to access your MongoDB instance.
- Password: The password you use to access your MongoDB instance.
- Authentication Database: The name of the database where your authentication credentials are stored.

5. Click on "Test Connection" to ensure that Airbyte can connect to your MongoDB instance.

6. If the connection is successful, click on "Save" to save your MongoDB source configuration.

7. You can now create a new pipeline and select your MongoDB source as the input. You can then configure the pipeline to transform and load your data into your desired destination.
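
The fields entered in step 4 correspond to the parts of a standard MongoDB connection string. The sketch below assembles one so you can test the same credentials outside Airbyte before saving the source. The function name `build_mongodb_uri` and the sample values are hypothetical, and Airbyte may construct its connection differently internally.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(host, port, username, password, auth_db="admin"):
    """Assemble a standard mongodb:// connection string from the form fields."""
    # Credentials are percent-encoded so characters like '@' or ':' are safe.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}:{port}/?authSource={auth_db}"


# Placeholder values for illustration only.
uri = build_mongodb_uri("db.example.com", 27017, "reader", "p@ss:word")
print(uri)
```

The `authSource` option plays the role of the "Authentication Database" field in the form.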

Step 2: Set up BigQuery as a destination connector

1. First, navigate to the Airbyte dashboard and select the "Destinations" tab on the left-hand side of the screen.

2. Scroll down until you find the "BigQuery" destination connector and click on it.

3. Click the "Create Destination" button to begin setting up your BigQuery destination.

4. Enter your Google Cloud Platform project ID and service account credentials in the appropriate fields.

5. Next, select the dataset you want to use for your destination and enter the table prefix you want to use.

6. Choose the schema mapping for your data, which will determine how your data is organized in BigQuery.

7. Finally, review your settings and click the "Create Destination" button to complete the setup process.

8. Once your destination is created, you can begin configuring your source connectors to start syncing data to BigQuery.

9. To do this, navigate to the "Sources" tab on the left-hand side of the screen and select the source connector you want to use.

10. Follow the prompts to enter your source credentials and configure your sync settings.

11. When you reach the "Destination" step, select your BigQuery destination from the dropdown menu and choose the dataset and table prefix you want to use.

12. Review your settings and click the "Create Connection" button to start syncing data from your source to your BigQuery destination.

Step 3: Set up a connection to sync your MongoDB data to BigQuery

Once you've successfully connected MongoDB as a data source and BigQuery as a destination in Airbyte, you can set up a data pipeline between them with the following steps:

  1. Create a new connection: On the Airbyte dashboard, navigate to the 'Connections' tab and click the '+ New Connection' button.
  2. Choose your source: Select MongoDB from the dropdown list of your configured sources.
  3. Select your destination: Choose BigQuery from the dropdown list of your configured destinations.
  4. Configure your sync: Define the frequency of your data syncs based on your business needs. Airbyte allows both manual and automatic scheduling for your data refreshes.
  5. Select the data to sync: Choose the specific MongoDB collections you want to load into BigQuery. You can sync all data or select specific collections and fields.
  6. Select the sync mode for your streams: Choose between full refreshes and incremental syncs (optionally with deduplication), either for all streams at once or per stream. Incremental sync is only available for streams that have a cursor field.
  7. Test your connection: Click the 'Test Connection' button to make sure that your setup works. If the connection test is successful, save your configuration.
  8. Start the sync: If the test passes, click 'Set Up Connection'. Airbyte will start moving data from MongoDB to BigQuery according to your settings.

Remember, Airbyte keeps your data in sync at the frequency you determine, so your BigQuery data warehouse stays up to date with your MongoDB data.
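
The difference between the sync modes in step 6 can be illustrated with a simplified in-memory model. This is a sketch only, not how Airbyte is implemented: the function `incremental_sync`, the `updated_at` cursor field, and the dictionary standing in for the destination table are all assumptions. A full refresh would copy every record on every run, whereas the incremental mode below copies only records whose cursor value is newer than the last one seen, and deduplicates by upserting on the primary key.

```python
def incremental_sync(source_records, destination, last_cursor, cursor_field, primary_key):
    """Copy records newer than `last_cursor` and upsert them by primary key."""
    new_cursor = last_cursor
    for record in source_records:
        value = record[cursor_field]
        if last_cursor is not None and value <= last_cursor:
            continue  # already synced in a previous run
        destination[record[primary_key]] = record  # upsert acts as deduplication
        if new_cursor is None or value > new_cursor:
            new_cursor = value
    return new_cursor


# Hypothetical source data; `updated_at` is a simple increasing counter here.
source = [
    {"_id": 1, "title": "A", "updated_at": 1},
    {"_id": 2, "title": "B", "updated_at": 2},
]
destination = {}

# First run: no cursor yet, so both records are copied.
cursor = incremental_sync(source, destination, None, "updated_at", "_id")

# One record changes in the source, then a second run picks up only that change.
source[0] = {"_id": 1, "title": "A (revised)", "updated_at": 3}
cursor = incremental_sync(source, destination, cursor, "updated_at", "_id")
print(len(destination), cursor)
```

Without a cursor field there is no way to tell which records are new, which is why incremental sync depends on one.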

Use Cases to transfer your MongoDB data to BigQuery

Integrating data from MongoDB to BigQuery provides several benefits. Here are a few use cases:

  1. Advanced Analytics: BigQuery’s powerful data processing capabilities enable you to perform complex queries and data analysis on your MongoDB data, extracting insights that wouldn't be possible within MongoDB alone.
  2. Data Consolidation: If you're using multiple other sources along with MongoDB, syncing to BigQuery allows you to centralize your data for a holistic view of your operations, and to set up a change data capture process that reduces discrepancies in your data.
  3. Historical Data Analysis: MongoDB deployments may not retain historical data indefinitely. Syncing data to BigQuery allows for long-term data retention and analysis of historical trends over time.
  4. Data Security and Compliance: BigQuery provides robust data security features. Syncing MongoDB data to BigQuery helps keep your data secured and allows for advanced data governance and compliance management.
  5. Scalability: BigQuery can handle large volumes of data without affecting performance, providing an ideal solution for growing businesses with expanding MongoDB data.
  6. Data Science and Machine Learning: By having MongoDB data in BigQuery, you can apply machine learning models to your data for predictive analytics, customer segmentation, and more.
  7. Reporting and Visualization: While MongoDB provides reporting tools, data visualization tools such as Tableau, Power BI, and Looker Studio (formerly Google Data Studio) can connect to BigQuery, providing more advanced business intelligence options. If you have a MongoDB collection that needs to be converted to a BigQuery table, Airbyte can do that automatically.

Wrapping Up

To summarize, this tutorial has shown you how to:

  1. Configure a MongoDB instance as an Airbyte data source connector.
  2. Configure BigQuery as a data destination connector.
  3. Create an Airbyte data pipeline that automatically moves data from MongoDB to BigQuery on the schedule you set.

With Airbyte, creating data pipelines takes minutes, and the data integration possibilities are endless. Airbyte supports the largest catalog of API tools, databases, and files, among other sources. Airbyte's connectors are open-source, so you can add custom objects to a connector, or even build a new connector from scratch in about 10 minutes with the no-code Connector Builder, without a local dev environment or a data engineer.

We look forward to seeing you make use of it! We invite you to join the conversation on our community Slack Channel, or sign up for our newsletter. You should also check out other Airbyte tutorials, and Airbyte’s content hub!

In this section, you will use Clever Cloud to create a MongoDB instance. Once you sign up, choose the option to create an add-on from your personal space.

From the available list of add-ons, choose the MongoDB add-on.

For the instance size, choose the DEV plan, which is free to use.

Enter an add-on name and select a region as shown below, and choose Next.

You should now have a new MongoDB database and the details to connect to it. To connect to the instance, copy the mongo CLI connection command from the Clever Cloud add-on dashboard shown below:

Replace “mongo” with “mongosh” before executing the command in your terminal, as demonstrated below:

You should now be connected to the PRIMARY replica of the MongoDB replica set (indicated by [primary] in the shell).

In the Airbyte connection to MongoDB, you will make use of the URL for the primary replica. This can be retrieved by running rs.isMaster().primary in the MongoDB shell, which will respond with a string in the format of [hostname]:[port]. In our case, the URL returned by this command is n2-c2-mongodb-clevercloud-customers.services.clever-cloud.com:27017.
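
Airbyte's source form asks for the host and port as separate fields, so the [hostname]:[port] string has to be split. A minimal sketch of that split, using a hypothetical helper named `split_host_port`:

```python
def split_host_port(address):
    """Split a 'hostname:port' string into (hostname, port)."""
    # rpartition splits on the last colon, which is the one before the port.
    host, _, port = address.rpartition(":")
    return host, int(port)


host, port = split_host_port(
    "n2-c2-mongodb-clevercloud-customers.services.clever-cloud.com:27017"
)
print(host, port)
```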

For this demo, download a sample “books” collection from a publicly available dataset. Then execute the following command:

mongoimport --host=n1-c2-mongodb-clevercloud-customers.services.clever-cloud.com --port=27017 --username= --password= --db= --collection=books books.json

Create an Airbyte MongoDB source by choosing sources from your Airbyte dashboard and clicking on the New source button. Then from the list of sources, choose MongoDB, and you should see a UI similar to the following:

To keep this tutorial simple, and for demonstration purposes only, in the above image we have selected a Standalone MongoDB instance. However, you may also consider selecting one of the alternative MongoDB configuration parameters if you wish to have a more resilient connection to your MongoDB cluster. Enter the Host, Port, Username, DB Name, and Password that were shown earlier in the Clever Cloud MongoDB configuration UI. Then choose Set up source.

To set up a BigQuery Airbyte destination, you first need to create a BigQuery dataset. Log in to your Google Cloud dashboard. From the Welcome page, click the Run a query in BigQuery button as shown below:

From the 3-dot menu of your cloud project, choose the only option to create a dataset.


For the Dataset ID, choose a descriptive name like mongodb_dataset. Dataset IDs may contain only letters, numbers, and underscores.
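
A quick way to check a candidate Dataset ID before typing it into the console is a small pattern match. The sketch below assumes the BigQuery naming rule as we understand it (letters, digits, and underscores, up to 1,024 characters). The helper name `is_valid_dataset_id` is hypothetical, and the official naming documentation remains the authority.

```python
import re

# Assumed rule: 1 to 1024 characters drawn from letters, digits, and underscores.
DATASET_ID_PATTERN = re.compile(r"[A-Za-z0-9_]{1,1024}")


def is_valid_dataset_id(dataset_id):
    """Return True if the whole string matches the assumed naming rule."""
    return DATASET_ID_PATTERN.fullmatch(dataset_id) is not None


print(is_valid_dataset_id("mongodb_dataset"))
print(is_valid_dataset_id("mongodb-dataset"))
```

The first name passes; the second fails because of the hyphen.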

Choose the dataset location from the menu and click Create Dataset. You should now be able to see a newly created dataset. Take note of the Dataset ID you just entered since you will need it to set up BigQuery as an Airbyte Destination.

Get a copy of the account keys to access our BigQuery project. Choose API & Services from the Quick Access menu in your cloud dashboard.

From the credentials menu, choose to create a new Service account.

Pick a name for your Service account and click Create and continue.

Next, you need to specify which resources this service account can access. From the available roles, BigQuery Data Editor and BigQuery Job User should be sufficient. Alternatively, BigQuery Admin should work (and is shown in the image below), but more specific security roles should be used in production systems.

Click Done, and you should see a new service account.

Choose this newly created service account email and add a new key by clicking the Add Key button.

From the Create private key pop-up, choose JSON.

Once you select Create, a new private key file will be downloaded to your system. The final step involves creating a new Airbyte destination.

From your Airbyte dashboard, choose Destinations, select New Destination, and pick BigQuery from the available options. You will then see a UI similar to the following:

Complete the fields as shown above, and for Service Account Key JSON you should copy the entire contents of the JSON file that you downloaded from BigQuery.
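
Pasting a truncated or wrong file into the Service Account Key JSON field is an easy mistake to make. The sketch below runs a few sanity checks on the pasted text before you submit it. The helper name `check_service_account_key` and the list of fields checked are assumptions for illustration; the sample key is fake and the check does not verify that the key actually works.

```python
import json

# Fields a downloaded service account key file is expected to contain.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json):
    """Return a list of problems found in the pasted key; empty means it looks fine."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["expected a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    if key.get("type") not in (None, "service_account"):
        problems.append("type is not 'service_account'")
    return problems


# Fake key for illustration only.
sample_key = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "not-a-real-key",
    "client_email": "airbyte@my-project.iam.gserviceaccount.com",
})
print(check_service_account_key(sample_key))
print(check_service_account_key('{"type": "service_account"}'))
```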

The final step for this tutorial involves building a connection between our newly set up MongoDB source and our BigQuery warehouse. To achieve this, go to Connections and choose to set up a New connection. Select the source and destination that you just created, and Airbyte will show you the collections (referred to as streams in Airbyte) that can be synced.

Airbyte has detected the books collection that you imported into MongoDB. For sync mode, choose one of the available modes; for more information you may wish to consult the blog: An overview of Airbyte’s replication modes. For the Replication frequency, specify the interval between sync runs. Once you are done with the configurations, choose Set up connection and Airbyte will start its first sync. Once complete, you will be able to see how many records were replicated.

Let’s head over to our dataset dashboard on BigQuery.

Choose the books table to preview its contents and schema.

Airbyte Cloud correctly moved 431 MongoDB documents.

That’s it for the tutorial - you now have a data pipeline that automatically transfers data from MongoDB to Google BigQuery. You can now use BigQuery's powerful analytics capabilities to do complex analysis of your data.

What sets Airbyte Apart

Modern GenAI Workflows

Streamline AI workflows with Airbyte: load unstructured data into vector stores like Pinecone, Weaviate, and Milvus. Supports RAG transformations with LangChain chunking and embeddings from OpenAI, Cohere, etc., all in one operation.

Move Large Volumes, Fast

Quickly get up and running with a 5-minute setup that supports both incremental and full refreshes, for databases of any size.

An Extensible Open-Source Standard

More than 1,000 developers contribute to Airbyte’s connectors, different interfaces (UI, API, Terraform Provider, Python Library), and integrations with the rest of the stack. Airbyte’s Connector Builder lets you edit or add new connectors in minutes.

Full Control & Security

Airbyte secures your data with cloud-hosted, self-hosted or hybrid deployment options. Single Sign-On (SSO) and Role-Based Access Control (RBAC) ensure only authorized users have access with the right permissions. Airbyte acts as a HIPAA conduit and supports compliance with CCPA, GDPR, and SOC2.

Fully Featured & Integrated

Airbyte automates schema evolution for seamless data flow, and utilizes efficient Change Data Capture (CDC) for real-time updates. Select only the columns you need, and leverage our dbt integration for powerful data transformations.

Enterprise Support with SLAs

Airbyte Self-Managed Enterprise comes with dedicated support and guaranteed service level agreements (SLAs), ensuring that your data movement infrastructure remains reliable and performant, and expert assistance is available when needed.

What our users say

Jean-Mathieu Saponaro
Data & Analytics Senior Eng Manager

"The intake layer of Datadog’s self-serve analytics platform is largely built on Airbyte. Airbyte’s ease of use and extensibility allowed any team in the company to push their data into the platform - without assistance from the data team!"

Chase Zieman
Chief Data Officer

“Airbyte helped us accelerate our progress by years, compared to our competitors. We don’t need to worry about connectors and focus on creating value for our users instead of building infrastructure. That’s priceless. The time and energy saved allows us to disrupt and grow faster.”

Alexis Weill
Data Lead

“We chose Airbyte for its ease of use, its pricing scalability and its absence of vendor lock-in. Having a lean team makes them our top criteria.
The value of being able to scale and execute at a high level by maximizing resources is immense”

FAQs

ETL, an acronym for Extract, Transform, Load, is a vital data integration process. It involves extracting data from diverse sources, transforming it into a usable format, and loading it into a database, data warehouse or data lake. This process enables meaningful data analysis, enhancing business intelligence.

MongoDB gives access to a wide range of data and features, including:

1. Documents: MongoDB stores data in the form of documents, which are similar to JSON objects. Each document contains a set of key-value pairs that represent the data.
2. Collections: A collection is a group of related documents that are stored together in MongoDB. Collections can be thought of as tables in a relational database.
3. Indexes: MongoDB supports various types of indexes, including single-field, compound, and geospatial indexes. Indexes are used to improve query performance.
4. GridFS: MongoDB's GridFS is a specification for storing and retrieving large files, such as images and videos, in MongoDB.
5. Aggregation: MongoDB's aggregation framework provides a way to perform complex data analysis operations, such as grouping, filtering, and sorting, on large datasets.
6. Transactions: MongoDB supports multi-document transactions, which allow multiple operations to be performed atomically.
7. Change streams: MongoDB's change streams provide a way to monitor changes to data in real-time, allowing applications to react to changes as they occur.

Overall, MongoDB provides access to a flexible and powerful data model that can handle a wide range of data types and use cases.

You can load MongoDB data into BigQuery by building a data pipeline manually, usually with a Python script (you can leverage a tool such as Apache Airflow for this). This process can take more than a full week of development. Alternatively, you can do it in minutes with Airbyte in three easy steps:
1. Set up MongoDB as a source connector (using your database credentials)
2. Choose a destination (more than 50 available destination databases, data warehouses or lakes) to sync data to and set it up as a destination connector
3. Define which data you want to transfer from MongoDB to BigQuery and how frequently
You can choose to self-host the pipeline using Airbyte Open Source or have it managed for you with Airbyte Cloud.

ELT, standing for Extract, Load, Transform, is a modern take on the traditional ETL data integration process. In ELT, data is first extracted from various sources, loaded directly into a data warehouse, and then transformed. This approach enhances data processing speed, analytical flexibility and autonomy.

ETL and ELT are critical data integration strategies with key differences. ETL (Extract, Transform, Load) transforms data before loading, ideal for structured data. In contrast, ELT (Extract, Load, Transform) loads data before transformation, perfect for processing large, diverse data sets in modern data warehouses. ELT is becoming the new standard as it offers a lot more flexibility and autonomy to data analysts.
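
The difference between the two approaches comes down to where the transformation happens. The toy sketch below models a warehouse as a plain Python list to show the ordering. The function names and the title-normalizing transformation are illustrative assumptions, not part of any real tool.

```python
def transform(record):
    """Example transformation: tidy up the title field."""
    return {**record, "title": record["title"].strip().title()}


def etl(source):
    """ETL: transform first, so only transformed rows reach the warehouse."""
    warehouse = [transform(record) for record in source]
    return warehouse


def elt(source):
    """ELT: load raw rows first, then transform inside the warehouse."""
    raw_table = list(source)  # raw data is kept as loaded
    transformed_view = [transform(record) for record in raw_table]
    return raw_table, transformed_view


source = [{"title": "  the hobbit "}, {"title": "dune"}]
etl_result = etl(source)
raw_table, elt_result = elt(source)
print(etl_result == elt_result, raw_table == source)
```

Both approaches yield the same transformed rows here, but only ELT still has the raw data in the warehouse, which is what lets analysts redefine transformations later without re-extracting.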

Warehouses and Lakes
Databases

How to load data from MongoDb to BigQuery

Learn how to use Airbyte to synchronize your MongoDb data into BigQuery within minutes.

TL;DR

This can be done by building a data pipeline manually, usually a Python script (you can leverage a tool as Apache Airflow for this). This process can take more than a full week of development. Or it can be done in minutes on Airbyte in three easy steps:

  1. set up MongoDb as a source connector (using Auth, or usually an API key)
  2. set up BigQuery as a destination connector
  3. define which data you want to transfer and how frequently

You can choose to self-host the pipeline using Airbyte Open Source or have it managed for you with Airbyte Cloud.

This tutorial’s purpose is to show you how.

What is MongoDb

MongoDB is a popular open-source NoSQL database that stores data in a flexible, document-based format. It is designed to handle large volumes of unstructured data and is highly scalable, making it a popular choice for modern web applications. MongoDB uses a JSON-like format to store data, which allows for easy integration with web applications and APIs. It also supports dynamic queries, indexing, and aggregation, making it a powerful tool for data analysis. MongoDB is widely used in industries such as finance, healthcare, and e-commerce, and is known for its ease of use and flexibility.

What is BigQuery

BigQuery is Google Cloud's fully managed enterprise data warehouse. It draws on the processing power of Google's infrastructure to run fast SQL queries over massive datasets. Businesses load data into BigQuery from the platforms they use. The company controls access to the data, while BigQuery stores and processes it for greater speed and convenience.

Prerequisites

  1. A MongoDB instance, with credentials, that holds the data you want to transfer.
  2. A BigQuery account.
  3. An active Airbyte Cloud account, or you can also choose to use Airbyte Open Source locally. You can follow the instructions to set up Airbyte on your system using docker-compose.

Airbyte is an open-source data integration platform that consolidates and streamlines the process of extracting and loading data from multiple data sources to data warehouses. It offers pre-built connectors, including MongoDb and BigQuery, for seamless data migration.

When using Airbyte to move data from MongoDb to BigQuery, it extracts data from MongoDb using the source connector, converts it into a format BigQuery can ingest using the provided schema, and then loads it into BigQuery via the destination connector. This allows businesses to leverage their MongoDb data for advanced analytics and insights within BigQuery, simplifying the ETL process and saving significant time and resources.

Step 1: Set up MongoDb as a source connector

1. First, you need to have a MongoDB instance running and accessible from the internet. You will also need to have the necessary credentials to access the database.

2. In the Airbyte dashboard, click on "Sources" and then click on "New Source."

3. Select "MongoDB" from the list of available sources.

4. In the "Connection Configuration" section, enter the following information:
- Host: The hostname or IP address of your MongoDB instance.
- Port: The port number on which your MongoDB instance is running.
- Username: The username you use to access your MongoDB instance.
- Password: The password you use to access your MongoDB instance.
- Authentication Database: The name of the database where your authentication credentials are stored.

5. Click on "Test Connection" to ensure that Airbyte can connect to your MongoDB instance.

6. If the connection is successful, click on "Save" to save your MongoDB source configuration.

7. You can now create a new pipeline and select your MongoDB source as the input. You can then configure the pipeline to transform and load your data into your desired destination.
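
The connection fields listed in step 4 map onto a standard MongoDB connection string. As a minimal sketch (the helper name `build_mongo_uri` and the sample values are illustrative assumptions, not part of Airbyte), the following Python shows how they combine. This can be handy for checking the same credentials with another client before entering them in Airbyte:

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port, username, password, auth_db="admin"):
    """Combine connection fields into a mongodb:// connection string."""
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}:{port}/?authSource={quote_plus(auth_db)}"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(build_mongo_uri("db.example.com", 27017, "reader", "p@ss:word"))
```

The `authSource` option corresponds to the Authentication Database field.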

Step 2: Set up BigQuery as a destination connector

1. First, navigate to the Airbyte dashboard and select the "Destinations" tab on the left-hand side of the screen.

2. Scroll down until you find the "BigQuery" destination connector and click on it.

3. Click the "Create Destination" button to begin setting up your BigQuery destination.

4. Enter your Google Cloud Platform project ID and service account credentials in the appropriate fields.

5. Next, select the dataset you want to use for your destination and enter the table prefix you want to use.

6. Choose the schema mapping for your data, which will determine how your data is organized in BigQuery.

7. Finally, review your settings and click the "Create Destination" button to complete the setup process.

8. Once your destination is created, you can begin configuring your source connectors to start syncing data to BigQuery.

9. To do this, navigate to the "Sources" tab on the left-hand side of the screen and select the source connector you want to use.

10. Follow the prompts to enter your source credentials and configure your sync settings.

11. When you reach the "Destination" step, select your BigQuery destination from the dropdown menu and choose the dataset and table prefix you want to use.

12. Review your settings and click the "Create Connection" button to start syncing data from your source to your BigQuery destination.
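
A common stumbling block in step 4 is pasting an incomplete or wrong credentials file. The sketch below is an optional pre-check, not something Airbyte requires. The helper name `check_service_account_key` is hypothetical, and the list of required fields is an assumption based on the usual layout of a Google Cloud service account key file:

```python
import json

# Fields normally present in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json):
    """Return a list of problems found in a key file's text; empty means OK."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value must be an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

Calling it with the text of the downloaded key file should return an empty list before you paste the contents into the credentials field.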

Step 3: Set up a connection to sync your MongoDb data to BigQuery

Once you've successfully connected MongoDb as a data source and BigQuery as a destination in Airbyte, you can set up a data pipeline between them with the following steps:

  1. Create a new connection: On the Airbyte dashboard, navigate to the 'Connections' tab and click the '+ New Connection' button.
  2. Choose your source: Select MongoDb from the dropdown list of your configured sources.
  3. Select your destination: Choose BigQuery from the dropdown list of your configured destinations.
  4. Configure your sync: Define the frequency of your data syncs based on your business needs. Airbyte allows both manual and automatic scheduling for your data refreshes.
  5. Select the data to sync: Choose the specific MongoDB collections you want to import into BigQuery. You can sync all data or select specific collections and fields.
  6. Select the sync mode for your streams: Choose between full refreshes or incremental syncs (optionally with deduplication), either for all streams at once or per stream. Incremental sync is only available for streams that have a cursor field.
  7. Test your connection: Click the 'Test Connection' button to make sure that your setup works. If the connection test is successful, save your configuration.
  8. Start the sync: If the test passes, click 'Set Up Connection'. Airbyte will start moving data from MongoDb to BigQuery according to your settings.

Remember, Airbyte keeps your data in sync at the frequency you determine, ensuring your BigQuery data warehouse is always up-to-date with your MongoDb data.
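
To make the difference between the sync modes concrete, here is a conceptual sketch of an incremental sync with deduplication. It is not Airbyte's implementation. The function name, the `updated_at` cursor field, and the sample records are all assumptions chosen for illustration:

```python
def incremental_dedup(existing, incoming, primary_key, cursor_field, last_cursor):
    """Merge records newer than last_cursor into existing rows, one row per key."""
    merged = {row[primary_key]: row for row in existing}
    new_cursor = last_cursor
    for row in incoming:
        # Skip rows already covered by a previous sync.
        if last_cursor is not None and row[cursor_field] <= last_cursor:
            continue
        current = merged.get(row[primary_key])
        # Keep only the most recent version of each primary key.
        if current is None or row[cursor_field] >= current[cursor_field]:
            merged[row[primary_key]] = row
        if new_cursor is None or row[cursor_field] > new_cursor:
            new_cursor = row[cursor_field]
    return list(merged.values()), new_cursor


if __name__ == "__main__":
    existing = [{"_id": 1, "updated_at": "2024-01-01", "title": "A"}]
    incoming = [
        {"_id": 1, "updated_at": "2024-01-02", "title": "A2"},
        {"_id": 2, "updated_at": "2024-01-03", "title": "B"},
    ]
    rows, cursor = incremental_dedup(existing, incoming, "_id", "updated_at", "2024-01-01")
    print(rows, cursor)
```

A full refresh, by contrast, would discard `existing` and reload every record on each run.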

Use Cases to transfer your MongoDb data to BigQuery

Integrating data from MongoDb to BigQuery provides several benefits. Here are a few use cases:

  1. Advanced Analytics: BigQuery’s powerful data processing capabilities enable you to perform complex queries and data analysis on your MongoDb data, extracting insights that wouldn't be possible within MongoDb alone.
  2. Data Consolidation: If you're using multiple other sources along with MongoDb, syncing to BigQuery allows you to centralize your data for a holistic view of your operations, and to set up a change data capture process so you never have any discrepancies in your data again.
  3. Historical Data Analysis: Operational MongoDB deployments often keep only recent data. Syncing data to BigQuery allows for long-term data retention and analysis of historical trends over time.
  4. Data Security and Compliance: BigQuery provides robust data security features. Syncing MongoDb data to BigQuery ensures your data is secured and allows for advanced data governance and compliance management.
  5. Scalability: BigQuery can handle large volumes of data without affecting performance, providing an ideal solution for growing businesses with expanding MongoDb data.
  6. Data Science and Machine Learning: By having MongoDb data in BigQuery, you can apply machine learning models to your data for predictive analytics, customer segmentation, and more.
  7. Reporting and Visualization: While MongoDB provides reporting tools, data visualization tools like Tableau, Power BI, and Looker Studio (formerly Google Data Studio) can connect to BigQuery, providing more advanced business intelligence options. If you have a MongoDB collection that needs to be converted to a BigQuery table, Airbyte can do that automatically.

Wrapping Up

To summarize, this tutorial has shown you how to:

  1. Configure a MongoDb account as an Airbyte data source connector.
  2. Configure BigQuery as a data destination connector.
  3. Create an Airbyte data pipeline that automatically moves data from MongoDB to BigQuery on the schedule you set.

With Airbyte, creating data pipelines takes minutes, and the data integration possibilities are endless. Airbyte supports the largest catalog of API tools, databases, and files, among other sources. Airbyte's connectors are open-source, so you can add custom objects to a connector, or build a new connector from scratch with the no-code connector builder in about 10 minutes, without a local dev environment or a data engineer.

We look forward to seeing you make use of it! We invite you to join the conversation on our community Slack Channel, or sign up for our newsletter. You should also check out other Airbyte tutorials, and Airbyte’s content hub!

What should you do next?

Hope you enjoyed the reading. Here are the 3 ways we can help you in your data journey:

Easily address your data movement needs with Airbyte Cloud
Take the first step towards extensible data movement infrastructure that will give a ton of time back to your data team. 
Get started with Airbyte for free
Talk to a data infrastructure expert
Get a free consultation with an Airbyte expert to significantly improve your data movement infrastructure. 
Talk to sales
Improve your data infrastructure knowledge
Subscribe to our monthly newsletter and get the community’s new enlightening content along with Airbyte’s progress in their mission to solve data integration once and for all.
Subscribe to newsletter

As organizations scale and their data needs grow more complex, many find themselves at a crossroads between the flexibility of NoSQL databases like MongoDB and the analytical power of cloud data warehouses such as Google BigQuery. This transition from MongoDB to BigQuery is more than just a change in storage – it's a shift in how data is structured, queried, and utilized for business intelligence. In this article, we'll explore two methods for accomplishing this migration: using Airbyte, an open-source data integration platform, and a manual approach using Google Cloud's native tools.

Benefits of moving to BigQuery from MongoDB

  1. Fast analytics without impacting operational workloads: BigQuery is designed for fast and efficient analytics. By creating a copy of your operational data in BigQuery, you can execute complex analytical queries against that copy, without impacting your operational workloads.
  2. Create a single source of truth and optimize reporting workflows: It can be time-consuming and challenging for analysts to work with multiple platforms. Combining data from multiple systems into a centralized data warehouse such as BigQuery reduces this workload by serving as a single source of truth.
  3. Improved security: Replicating data out of MongoDB into an analytical system such as BigQuery removes the need to grant permissions to data analysts on operational systems, which can improve security.

Methods to Move Data From MongoDB to BigQuery

  • Method 1: Connecting MongoDB to BigQuery using Airbyte.
  • Method 2: Connecting MongoDB to BigQuery manually.

Method 1: Connecting MongoDB to BigQuery using Airbyte.

Prerequisites

  1. Clever Cloud - hosting of the MongoDB database.
  2. mongosh - CLI client to access a MongoDB database.
  3. Google Cloud - to create a BigQuery data warehouse.
  4. Airbyte Cloud - a data integration tool that is used to replicate and synchronize data between MongoDB and BigQuery. Alternatively, you may choose to install Airbyte OSS locally.

Step 1-a: Launch MongoDB

In this section, you will use Clever Cloud to create a MongoDB instance. Once you sign up, choose the option to create an add-on from your personal space.

From the available list of add-ons, choose the MongoDB add-on.

For the instance size, choose the DEV plan, which is free to use.

Enter an add-on name, select a region, and choose Next.

You should now have a new MongoDB database and the details to connect to it. To connect to the instance, copy the mongo CLI connection command from the Clever Cloud add-on dashboard.

Replace “mongo” with “mongosh” before executing the command in your terminal.

You should now be connected to the PRIMARY replica of the MongoDB replica set (indicated by [primary] in the shell).

In the Airbyte connection to MongoDB, you will make use of the URL for the primary replica. This can be retrieved by running rs.isMaster().primary in the MongoDB shell, which will respond with a string in the format of [hostname]:[port]. In our case, the URL returned by this command is n2-c2-mongodb-clevercloud-customers.services.clever-cloud.com:27017
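
Airbyte asks for the host and port as separate fields, so the [hostname]:[port] string needs to be split. A small sketch of doing this in Python follows. The helper name `split_host_port` is hypothetical, and falling back to 27017 (MongoDB's default port) when no port is given is an assumption of this example:

```python
def split_host_port(address, default_port=27017):
    """Split a 'hostname:port' string into (hostname, port)."""
    host, sep, port = address.rpartition(":")
    if not sep:
        # No colon present: treat the whole string as the host.
        return address, default_port
    return host, int(port)


if __name__ == "__main__":
    print(split_host_port("n2-c2-mongodb-clevercloud-customers.services.clever-cloud.com:27017"))
```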

Step 1-b: Add data to MongoDB (optional)

For this demo, download a sample “books” collection from a publicly available dataset. Then execute the following command, filling in the --username, --password, and --db values from your Clever Cloud add-on dashboard:

mongoimport --host=n1-c2-mongodb-clevercloud-customers.services.clever-cloud.com --port=27017 --username= --password= --db= --collection=books books.json

Step 2: Configure a MongoDB source connector

Create an Airbyte MongoDB source by choosing Sources from your Airbyte dashboard and clicking on the New source button. Then, from the list of sources, choose MongoDB to open the source configuration form.

To keep this tutorial simple, and for demonstration purposes only, we select a Standalone MongoDB instance. However, you may also consider selecting one of the alternative MongoDB configuration parameters if you wish to have a more resilient connection to your MongoDB cluster. Enter the Host, Port, Username, DB Name, and Password that were shown earlier in the Clever Cloud MongoDB configuration UI. Then choose Set up source.

Step 3: Launch BigQuery

To set up a BigQuery Airbyte destination, you first need to create a BigQuery dataset. Log in to your Google Cloud dashboard. From the Welcome page, click on the Run a query in BigQuery button.

From the 3-dot menu of your cloud project, choose the option to create a dataset.


For the Dataset ID, choose a descriptive name like mongodb_dataset. Dataset IDs may contain only letters, numbers, and underscores.
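
If you generate dataset names programmatically, a quick check can catch invalid IDs before BigQuery rejects them. The sketch below assumes the naming rule is letters, digits, and underscores with a maximum length of 1,024 characters. Treat that rule and the helper name `is_valid_dataset_id` as assumptions, and confirm them against the current BigQuery documentation:

```python
import re

# Assumed rule: letters, digits, and underscores, at most 1,024 characters.
_DATASET_ID = re.compile(r"[A-Za-z0-9_]{1,1024}")


def is_valid_dataset_id(name):
    """Return True if name looks like an acceptable BigQuery dataset ID."""
    return _DATASET_ID.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("mongodb_dataset", "mongodb-dataset"):
        print(candidate, is_valid_dataset_id(candidate))
```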

Choose the dataset location from the menu and click Create Dataset. You should now be able to see a newly created dataset. Take note of the Dataset ID you just entered since you will need it to set up BigQuery as an Airbyte Destination.

Next, get a copy of the service account key used to access your BigQuery project. Choose APIs & Services from the Quick Access menu in your cloud dashboard.

From the credentials menu, choose to create a new Service account.

Pick a name for your Service account and click Create and continue.

Next, you need to specify what resources can be accessed from this service account. From the available roles, BigQuery Data Editor and BigQuery Job User should be sufficient. Alternatively, BigQuery Admin should work, but more specific security roles should be used in production systems.

Click Done, and you should see a new service account.

Choose this newly created service account email and add a new key by clicking the Add Key button.

From the Create private key pop-up, choose JSON.

Once you select Create, a new private key file will be downloaded to your system. The final step involves creating a new Airbyte destination.

Step 4: Configure a BigQuery destination connector

From your Airbyte dashboard, choose Destinations, select New Destination, and pick BigQuery from the available options to open the destination configuration form.

Complete the fields, including the Dataset ID you noted earlier. For Service Account Key JSON, copy in the entire contents of the JSON key file that you downloaded from Google Cloud.

Step 5: Set up an Airbyte Connection from MongoDB to BigQuery

The final step for this tutorial involves building a connection between the newly set up MongoDB source and the BigQuery destination. To achieve this, go to Connections and choose to set up a New connection. Select the source and destination that you just created, and Airbyte will show you the collections (referred to as streams in Airbyte) that can be synced.

Airbyte has detected the books collection that you imported into MongoDB. For sync mode, choose one of the available modes; for more information you may wish to consult the blog post An overview of Airbyte’s replication modes. For the Replication frequency, specify the interval between sync runs. Once you are done with the configuration, choose Set up connection and Airbyte will start its first sync. Once complete, you will be able to see how many records were replicated.

Let’s head over to our dataset dashboard on BigQuery.

Choose the books table to preview its contents and schema.

Airbyte Cloud correctly moved 431 MongoDB documents.

That’s it for the tutorial - you now have a data pipeline that automatically transfers data from MongoDB to Google BigQuery. You can now use BigQuery's powerful analytics capabilities to do complex analysis of your data.

Method 2: Connecting MongoDB to BigQuery manually.

Moving data from MongoDB to Google BigQuery manually can be a bit challenging, but it's definitely possible. Here's a step-by-step guide to help you accomplish this task:

Step 1: Export Data from MongoDB

Before you can move your data, you need to export it from MongoDB. You can use the mongoexport command-line tool to export your data in a JSON or CSV format.

  1. Open your terminal or command prompt.
  2. Use the mongoexport command to export your collection. For example, to export a collection named myCollection from a database named myDatabase to a JSON file, you would use:
    mongoexport --db=myDatabase --collection=myCollection --out=myCollection.json
  3. To export to a CSV file, you would specify the fields and use the --type=csv option:
    mongoexport --db=myDatabase --collection=myCollection --type=csv --fields=field1,field2 --out=myCollection.csv
  4. Once the export command completes, you should have a myCollection.json or myCollection.csv file with your data.
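
If you want to script the export rather than type it by hand, the commands above can be assembled in Python. This is a minimal sketch: the helper name `mongoexport_args` is hypothetical, it uses only the flags shown above, and it assumes mongoexport is installed and on your PATH if you actually run the command:

```python
import shlex


def mongoexport_args(db, collection, out, fields=None):
    """Build the argument list for mongoexport (JSON by default, CSV if fields given)."""
    args = ["mongoexport", f"--db={db}", f"--collection={collection}"]
    if fields:
        # CSV export requires an explicit list of fields.
        args += ["--type=csv", f"--fields={','.join(fields)}"]
    args.append(f"--out={out}")
    return args


if __name__ == "__main__":
    cmd = mongoexport_args("myDatabase", "myCollection", "myCollection.json")
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems when database or field names contain unusual characters.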

Step 2: Prepare Your Data for BigQuery

BigQuery expects the data to be in a certain format, especially if you're using JSON. Make sure your exported data conforms to the following:

  • JSON file should contain newline-delimited JSON (NDJSON).
  • The structure of the data should match the BigQuery table schema you plan to use.

If necessary, you can write a script to transform your data into the correct format or structure.
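
As one possible shape for such a script, the sketch below converts exported documents to NDJSON and unwraps MongoDB's `{"$oid": ...}` and `{"$date": ...}` wrappers, whose `$`-prefixed keys do not map cleanly to BigQuery column names. The function names are hypothetical, and the code assumes the export uses the relaxed Extended JSON form, where `$date` holds a plain string. Other wrapper types are left untouched:

```python
import json


def simplify(value):
    """Replace {"$oid": ...} and {"$date": ...} wrappers with their inner value."""
    if isinstance(value, dict):
        if len(value) == 1:
            key, inner = next(iter(value.items()))
            if key in ("$oid", "$date"):
                return simplify(inner)
        return {k: simplify(v) for k, v in value.items()}
    if isinstance(value, list):
        return [simplify(v) for v in value]
    return value


def to_ndjson(text):
    """Accept a JSON array or one-document-per-line text; return NDJSON."""
    stripped = text.strip()
    if stripped.startswith("["):
        docs = json.loads(stripped)
    else:
        docs = [json.loads(line) for line in stripped.splitlines() if line.strip()]
    return "".join(json.dumps(simplify(doc)) + "\n" for doc in docs)


if __name__ == "__main__":
    sample = '{"_id": {"$oid": "abc"}, "published": {"$date": "2009-04-01T07:00:00Z"}}'
    print(to_ndjson(sample), end="")
```

For large exports you would process the file line by line instead of reading it all into memory.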

Step 3: Upload Your Data to Google Cloud Storage

BigQuery can load data directly from Google Cloud Storage. You need to upload your exported files there first.

Create a Google Cloud Storage bucket if you don't already have one:
gsutil mb gs://your-bucket-name

Upload your data file to the bucket:
gsutil cp myCollection.json gs://your-bucket-name

Step 4: Create a BigQuery Dataset and Table

  1. Go to the BigQuery web UI in your Google Cloud Platform console.
  2. Click on your project name, then click on "Create dataset."
  3. Fill in the details for your dataset and click "Create dataset."
  4. Inside the dataset, click on "Create table."
  5. Set the "Create table from" option to "Google Cloud Storage" and enter the path to your file (gs://your-bucket-name/myCollection.json).
  6. Specify the table name and table type (Native table).
  7. In the Schema section, either manually input the schema that matches your MongoDB data or use the "Auto-detect" option if your data is well structured.
  8. Click "Create table."
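
If you prefer to enter the schema manually in step 7, a rough draft can be derived from a sample document. The sketch below is a simplified illustration, not how BigQuery's auto-detect works. The function names and the sample book document are hypothetical, it looks at a single document only, and it treats unknown or null values as STRING:

```python
import json


def bq_type(value):
    """Map a Python scalar to a BigQuery column type name."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    return "STRING"


def infer_schema(doc):
    """Infer a BigQuery-style schema (list of field dicts) from one sample document."""
    fields = []
    for name, value in doc.items():
        mode = "NULLABLE"
        if isinstance(value, list):
            mode = "REPEATED"
            # Assumption: arrays are homogeneous; empty arrays default to STRING.
            value = value[0] if value else ""
        if isinstance(value, dict):
            fields.append({"name": name, "type": "RECORD", "mode": mode,
                           "fields": infer_schema(value)})
        else:
            fields.append({"name": name, "type": bq_type(value), "mode": mode})
    return fields


if __name__ == "__main__":
    sample = {"title": "Dune", "pageCount": 412, "authors": ["Frank Herbert"]}
    print(json.dumps(infer_schema(sample), indent=2))
```

Review the result by hand before using it, since one document rarely shows every field a collection can contain.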

Step 5: Load Data into BigQuery

If you did not load the data in the previous step while creating the table, you can manually load it as follows:

  1. In the BigQuery UI, navigate to your dataset and table.
  2. Click on your table, then click on the "Load data" button.
  3. Choose the source format and select the file from Google Cloud Storage.
  4. Configure the schema settings as needed.
  5. Click "Start job" to begin loading your data into BigQuery.

Step 6: Verify Data Integrity

After loading, it's crucial to verify that your data has been correctly imported:

  1. Run some queries in the BigQuery UI to check if the data looks correct.
  2. Compare the number of records and some sample data with your original MongoDB data to ensure completeness and accuracy.
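
The comparison in step 2 can be partly automated. The sketch below assumes you have collected the document IDs from both sides, for example from the exported NDJSON file and from a query result saved in the same format. The function names and the `_id` default are assumptions for illustration:

```python
import json


def ids_from_ndjson(text, id_field="_id"):
    """Collect the ID field from newline-delimited JSON text."""
    return [json.loads(line)[id_field] for line in text.splitlines() if line.strip()]


def compare_ids(source_ids, loaded_ids):
    """Report record counts and any IDs missing from, or unexpected in, the target."""
    source, loaded = set(source_ids), set(loaded_ids)
    return {
        "source_count": len(source),
        "loaded_count": len(loaded),
        "missing": sorted(source - loaded),
        "unexpected": sorted(loaded - source),
    }


if __name__ == "__main__":
    exported = '{"_id": "a"}\n{"_id": "b"}\n{"_id": "c"}\n'
    loaded = '{"_id": "a"}\n{"_id": "c"}\n'
    print(compare_ids(ids_from_ndjson(exported), ids_from_ndjson(loaded)))
```

Matching counts alone can hide offsetting errors, which is why the sketch also lists the specific IDs that differ.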

Step 7: Clean Up

Once your data is successfully moved and verified, you might want to clean up to avoid unnecessary storage costs:

Delete the exported data from Google Cloud Storage:
gsutil rm gs://your-bucket-name/myCollection.json

If you created any temporary files or scripts for transforming your data, consider archiving or deleting them if they are no longer needed.

By following these steps, you should be able to move your data from MongoDB to Google BigQuery manually. Remember to handle your data securely and comply with any relevant data protection regulations during the process.

Ready to Migrate MongoDB to BigQuery? Which method should you choose?

Data synchronization

Manual: Requires custom scripts for incremental updates, which can be complex to maintain and error-prone.

Airbyte: Supports both incremental and full refreshes out of the box, accommodating databases of any size. This flexibility ensures efficient data transfer and reduces unnecessary strain on resources.

Automation

Manual: Relies on external schedulers like cron jobs, which may lack robust error handling and reporting.

Airbyte: Offers built-in scheduling for full-refresh, incremental, and log-based Change Data Capture (CDC) replications. This feature provides a seamless, automated data pipeline that keeps your BigQuery instance up-to-date with minimal intervention.

Security

Manual: Requires careful implementation of security best practices, which can be overlooked or improperly configured.

Airbyte: Provides secure connections to databases using industry-standard methods like SSL/TLS and SSH tunnels, ensuring data integrity and confidentiality during transfer.

Data transformation

Manual: Necessitates separate ETL processes, often requiring additional tools or complex scripts.

Airbyte: Integrates with dbt, allowing you to add custom transformations directly within the application. This streamlines the ETL process and provides a unified platform for both data extraction and transformation.

Error handling

Manual: Typically requires custom logging implementations, which may not capture all necessary details for troubleshooting.

Airbyte: Logs all errors in full detail, providing comprehensive information to help you understand and resolve issues quickly. This feature is crucial for maintaining data integrity and ensuring smooth operations.

Scalability

Manual: Can become unmanageable as data volume and complexity increase, often requiring significant refactoring.

Airbyte: Designed to handle databases of any size, with the ability to scale horizontally as your data needs grow.

Conclusion

Migrating data from MongoDB to BigQuery represents a significant step in evolving your data infrastructure. While manual methods offer granular control, Airbyte emerges as the superior solution for most scenarios. Its automated pipelines, robust security features, and built-in transformation capabilities streamline the migration process, reducing complexity and potential errors. As data volumes grow and real-time analytics become increasingly critical, tools like Airbyte will play a pivotal role in ensuring seamless data integration. The choice between manual and automated approaches depends on your specific needs, but for organizations seeking efficiency, scalability, and ease of use, Airbyte provides a compelling path forward in the MongoDB to BigQuery migration journey.

With Airbyte, the data integration possibilities are endless, and we look forward to seeing you use it! We invite you to join the conversation on our community Slack Channel, participate in discussions on Airbyte’s discourse, or sign up for our newsletter. You may also wish to check out other Airbyte tutorials and Airbyte’s blog!

Frequently Asked Questions

What data can you extract from MongoDb?

MongoDB gives access to a wide range of data types, including:

1. Documents: MongoDB stores data in the form of documents, which are similar to JSON objects. Each document contains a set of key-value pairs that represent the data.
2. Collections: A collection is a group of related documents that are stored together in MongoDB. Collections can be thought of as tables in a relational database.
3. Indexes: MongoDB supports various types of indexes, including single-field, compound, and geospatial indexes. Indexes are used to improve query performance.
4. GridFS: MongoDB's GridFS is a specification for storing and retrieving large files, such as images and videos, in MongoDB.
5. Aggregation: MongoDB's aggregation framework provides a way to perform complex data analysis operations, such as grouping, filtering, and sorting, on large datasets.
6. Transactions: MongoDB supports multi-document transactions, which allow multiple operations to be performed atomically.
7. Change streams: MongoDB's change streams provide a way to monitor changes to data in real-time, allowing applications to react to changes as they occur.

Overall, MongoDB provides access to a flexible and powerful data model that can handle a wide range of data types and use cases.

What data can you transfer to BigQuery?

You can transfer a wide variety of data to BigQuery. This usually includes structured, semi-structured, and unstructured data like transaction records, log files, JSON data, CSV files, and more, allowing robust, scalable data integration and analysis.

What are top ETL tools to transfer data from MongoDb to BigQuery?

The most prominent ETL tools to transfer data from MongoDb to BigQuery include:

  • Airbyte
  • Fivetran
  • Stitch
  • Matillion
  • Talend Data Integration

These tools help in extracting data from MongoDb and various sources (APIs, databases, and more), transforming it efficiently, and loading it into BigQuery and other databases, data warehouses and data lakes, enhancing data management capabilities.
