
Synchronize data from MongoDB to BigQuery

Learn how to use Airbyte to easily synchronize your MongoDB data into BigQuery.

MongoDB is a distributed document-oriented database that is built for modern transactional workloads. It may be used for storing rapidly changing multi-structured data, and is commonly used for storing the data required to power applications.

On the other hand, BigQuery is a cost-effective, serverless, multi-cloud enterprise data warehouse that is often used to store current and historical data from one or more systems. As part of an overall data integration strategy, organizations often choose to complement their databases with a data warehouse such as BigQuery, which provides the following benefits:

  1. Fast analytics without impacting operational workloads: BigQuery is designed for fast and efficient analytics. By creating a copy of your operational data in BigQuery, you can execute complex analytical queries against that copy, without impacting your operational workloads.
  2. Create a single source of truth and optimize reporting workflows: It can be time-consuming and challenging for analysts to work with multiple platforms. Combining data from multiple systems into a centralized data warehouse such as BigQuery reduces this workload by serving as a single source of truth.
  3. Improved security: Replicating data out of MongoDB into an analytical system such as BigQuery removes the need to grant permissions to data analysts on operational systems, which can improve security. 

To achieve the above benefits, you will need to replicate your data from MongoDB to BigQuery. This tutorial will demonstrate how to use Airbyte to do this for you.

Prerequisites

  1. Clever Cloud - hosting of the MongoDB database.
  2. mongosh - CLI client to access a MongoDB database.
  3. Google cloud - to create a BigQuery data warehouse.
  4. Airbyte Cloud - a data integration tool that is used to replicate and synchronize data between MongoDB and BigQuery. Alternatively, you may choose to install Airbyte OSS locally.

Step 1-a: Launch MongoDB

In this section, you will use Clever Cloud to create a MongoDB instance. Once you sign up, choose the option to create an add-on from your personal space.

From the available list of add-ons, choose the MongoDB add-on.

For the instance size, choose the DEV plan, which is free to use.

Enter an add-on name and select a region as shown below, and choose Next.

You should now have a new MongoDB database and the details to connect to it. To connect to the instance, copy the mongo CLI connection command from the Clever Cloud add-on dashboard shown below:

Replace “mongo” with “mongosh” before executing the command in your terminal, as demonstrated below:
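For example, the edited command will look something like the following (the connection string here is a placeholder for the one copied from your own dashboard):

```shell
# The Clever Cloud dashboard gives a command starting with "mongo";
# swap the binary name to "mongosh" and run it with your own credentials.
mongosh "mongodb://USER:PASSWORD@HOST:27017/DBNAME"
```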

You should now be connected to the PRIMARY replica of the MongoDB replica set (indicated by [primary] in the shell).

In the Airbyte connection to MongoDB, you will make use of the URL for the primary replica. This can be retrieved by running rs.isMaster().primary in the MongoDB shell, which will respond with a string in the format of [hostname]:[port]. In our case, the URL returned by this command is n2-c2-mongodb-clevercloud-customers.services.clever-cloud.com:27017
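If you prefer to script this step rather than use the interactive shell, the primary's address can also be retrieved with mongosh's --eval flag. The connection string below is a placeholder for the one from your Clever Cloud dashboard:

```shell
# Print the primary replica's host:port without opening an interactive shell.
# Replace the connection string with your own credentials.
mongosh "mongodb://USER:PASSWORD@HOST:27017/DBNAME" --quiet --eval "rs.isMaster().primary"
```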

Step 1-b: Add data to MongoDB (optional)

For this demo, download a sample “books” collection from a publicly available dataset. Then execute the following command:

mongoimport --host=n1-c2-mongodb-clevercloud-customers.services.clever-cloud.com --port=27017 --username= --password= --db= --collection=books books.json
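As a quick sanity check, you can confirm the import by counting the documents in the new collection. The connection string below is a placeholder for your own:

```shell
# Count the imported documents; if you imported the same sample file used in
# this tutorial, this should report 431 documents.
mongosh "mongodb://USER:PASSWORD@HOST:27017/DBNAME" --quiet --eval "db.books.countDocuments({})"
```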

Step 2: Configure a MongoDB source connector

Create an Airbyte MongoDB source by choosing sources from your Airbyte dashboard and clicking on the New source button. Then from the list of sources, choose MongoDB, and you should see a UI similar to the following:

To keep this tutorial simple, and for demonstration purposes only, in the above image we have selected a Standalone MongoDB instance. However, you may also consider selecting one of the alternative MongoDB configuration parameters if you wish to have a more resilient connection to your MongoDB cluster. Enter the Host, Port, Username, DB Name, and Password that were shown earlier in the Clever Cloud MongoDB configuration UI. Then choose Set up source.

Step 3: Launch BigQuery

To set up a BigQuery Airbyte destination, you first need to create a BigQuery dataset. Log in to your Google Cloud dashboard. From the Welcome page, click on the Run a query in BigQuery button as shown below:

From the 3-dot menu of your cloud project, choose the option to create a dataset.


For the Dataset ID, choose a descriptive name like mongodb_dataset. Dataset IDs may contain only letters, numbers, and underscores.

Choose the dataset location from the menu and click Create Dataset. You should now be able to see a newly created dataset. Take note of the Dataset ID you just entered since you will need it to set up BigQuery as an Airbyte Destination.
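If you prefer the command line, the same dataset can be created with the bq tool that ships with the Google Cloud SDK. The project ID and location below are placeholders:

```shell
# Create the dataset from the CLI instead of the console
# (replace my_project with your own project ID).
bq --location=US mk --dataset my_project:mongodb_dataset
```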

Next, get a copy of the service account keys needed to access your BigQuery project. Choose APIs & Services from the Quick Access menu in your cloud dashboard.

From the credentials menu, choose to create a new Service account.

Pick a name for your Service account and click Create and continue.

Next, you need to specify which resources this service account can access. From the available roles, BigQuery Data Editor and BigQuery Job User should be sufficient. Alternatively, BigQuery Admin should also work (and is shown in the image below), but more restrictive roles should be used in production systems.

Click Done, and you should see a new service account.

Choose this newly created service account email and add a new key by clicking the Add Key button.

From the Create private key pop-up, choose JSON.

Once you select Create, a new private key file will be downloaded to your system. The final step involves creating a new Airbyte destination. 
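The console steps above can also be scripted with gcloud. This is a sketch in which the project ID (my-project) and service account name (airbyte-bq) are assumptions; substitute your own:

```shell
# Create a service account for Airbyte.
gcloud iam service-accounts create airbyte-bq --display-name="Airbyte BigQuery"

# Grant the two roles mentioned above (replace my-project with your project ID).
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:airbyte-bq@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:airbyte-bq@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Download a JSON key for the account.
gcloud iam service-accounts keys create key.json \
  --iam-account=airbyte-bq@my-project.iam.gserviceaccount.com
```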

Step 4: Configure a BigQuery destination connector

From your Airbyte dashboard, choose Destinations, select New Destination, and pick BigQuery from the available options. You will then see a UI similar to the following: 

Complete the fields as shown above, and for Service Account Key JSON you should copy the entire contents of the JSON file that you downloaded from BigQuery.

Step 5: Set up an Airbyte Connection between MongoDB and BigQuery

The final step for this tutorial involves building a connection between our newly set up MongoDB instance and our BigQuery warehouse. To achieve this, go to Connections and choose to set up a New connection. Select the source and destination that you just created, and Airbyte will show you the tables (referred to as streams in Airbyte) that can be synced.

Airbyte has detected the books collection that you imported into MongoDB. For sync mode, choose one of the available modes – for more information you may wish to consult the blog post An overview of Airbyte’s replication modes. For the Replication frequency, specify the interval between sync runs. Once you are done with the configuration, choose Set up connection and Airbyte will start its first sync. Once complete, you will be able to see how many records were replicated.

Let’s head over to our dataset dashboard on BigQuery.

Choose the books table to preview its contents and schema.

Airbyte Cloud correctly moved 431 MongoDB documents.
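You can also verify the record count directly from the command line. The dataset and table names below assume the setup earlier in this tutorial:

```shell
# Count the rows in the replicated table with standard SQL;
# for the sample data this should report 431 rows.
bq query --use_legacy_sql=false 'SELECT COUNT(*) AS n FROM mongodb_dataset.books'
```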

That’s it for the tutorial - you now have a data pipeline that automatically transfers data from MongoDB to Google BigQuery. You can now use BigQuery's powerful analytics capabilities to do complex analysis of your data.

Conclusion

In summary, in this tutorial you have learned how to:

  1. Launch a MongoDB database on Clever Cloud, and then how to add some test data to it.
  2. Configure an Airbyte source connector to read data from MongoDB.
  3. Launch a BigQuery database on Google Cloud.
  4. Configure an Airbyte destination connector to send data into BigQuery.
  5. Create an Airbyte connection that replicates data from MongoDB to BigQuery.

With Airbyte, the data integration possibilities are endless, and we look forward to seeing you use it! We invite you to join the conversation on our community Slack Channel, participate in discussions on Airbyte’s discourse, or sign up for our newsletter. You may also wish to check out other Airbyte tutorials and Airbyte’s blog!
