
How to load data from MongoDB to PostgreSQL

Learn how to use Airbyte to synchronize your MongoDB data into PostgreSQL within minutes.

TL;DR

You can do this by building a data pipeline manually, usually as a Python script (you can orchestrate it with a tool such as Apache Airflow). This can take more than a full week of development. Alternatively, you can do it in minutes with Airbyte in three steps:

  1. set up MongoDB as a source connector (using your database credentials)
  2. set up PostgreSQL as a destination connector
  3. define which data you want to transfer and how frequently

You can choose to self-host the pipeline using Airbyte Open Source or have it managed for you with Airbyte Cloud.

This tutorial’s purpose is to show you how.

What is MongoDB

MongoDB is a popular source-available NoSQL database that stores data as flexible, JSON-like documents (internally in the binary BSON format). It is designed to handle large volumes of semi-structured and unstructured data and scales horizontally, which makes it a common choice for modern web applications. Its document format integrates easily with web applications and APIs, and it supports dynamic queries, indexing, and aggregation, which also makes it useful for data analysis. MongoDB is widely used in industries such as finance, healthcare, and e-commerce, and is known for its ease of use and flexibility.

What is PostgreSQL

PostgreSQL is an open-source object-relational database management system. It handles a wide range of workloads, conforms to much of the SQL standard, and is cross-platform, running on numerous operating systems including Microsoft Windows, Solaris, Linux, and FreeBSD. It is highly extensible and supports more than 12 procedural languages, spatial data (through the PostGIS extension), GIN and GiST indexes, and more. Many web, mobile, and analytics applications use PostgreSQL as their primary data store or data warehouse.

Prerequisites

  1. A MongoDB database, and credentials for it, to transfer your data from.
  2. A PostgreSQL database to load the data into.
  3. An active Airbyte Cloud account, or a local installation of Airbyte Open Source. You can follow the instructions to set up Airbyte on your system using docker-compose.

Airbyte is an open-source data integration platform that consolidates and streamlines the process of extracting and loading data from multiple data sources into data warehouses and databases. It offers pre-built connectors, including connectors for MongoDB and PostgreSQL.

When you use Airbyte to move data from MongoDB to PostgreSQL, Airbyte extracts the data from MongoDB with the source connector, converts it into a format PostgreSQL can ingest based on the detected schema, and then loads it into PostgreSQL with the destination connector. This lets you use your MongoDB data for SQL-based analytics in PostgreSQL without writing and maintaining your own ETL code.
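To make the "convert" step more concrete, here is a minimal Python sketch of one way a MongoDB document could be turned into a PostgreSQL-ready row. It assumes documents arrive as plain dicts, that top-level fields map to columns, and that nested documents and arrays are stored as JSON text in a jsonb column. `to_row` and the sample document are hypothetical, and this is not Airbyte's internal implementation.

```python
import json
from datetime import datetime, timezone
from typing import Any


def to_row(doc: dict[str, Any]) -> dict[str, Any]:
    """Map one MongoDB-style document (a plain dict) to a flat row.

    Hypothetical helper for illustration: top-level scalars become column
    values, nested documents and arrays become JSON text destined for a
    jsonb column, and datetimes become ISO 8601 strings.
    """
    row: dict[str, Any] = {}
    for key, value in doc.items():
        if isinstance(value, (dict, list)):
            # default=str stringifies values json can't encode (e.g. nested datetimes)
            row[key] = json.dumps(value, default=str, sort_keys=True)
        elif isinstance(value, datetime):
            row[key] = value.isoformat()
        else:
            row[key] = value
    return row


# Made-up sample document, shaped like an entry in a restaurant collection.
sample = {
    "_id": "abc123",
    "name": "Example Diner",
    "address": {"zipcode": "10001", "street": "Main St"},
    "tags": ["pizza"],
    "opened": datetime(2024, 1, 2, tzinfo=timezone.utc),
}
print(to_row(sample))
```

Keeping nested values as JSON avoids having to flatten every possible document shape up front, and PostgreSQL's jsonb operators can still query them later.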

Step 1: Set up MongoDB as a source connector

1. First, you need a running MongoDB instance that is reachable from your Airbyte deployment (for Airbyte Cloud, this means accessible from the internet). You will also need credentials to access the database.

2. In the Airbyte dashboard, click on "Sources" and then click on "New Source."

3. Select "MongoDB" from the list of available sources.

4. In the "Connection Configuration" section, enter the following information:
- Host: The hostname or IP address of your MongoDB instance.
- Port: The port number on which your MongoDB instance is running.
- Username: The username you use to access your MongoDB instance.
- Password: The password you use to access your MongoDB instance.
- Authentication Database: The name of the database where your authentication credentials are stored.

5. Click on "Test Connection" to ensure that Airbyte can connect to your MongoDB instance.

6. If the connection is successful, click on "Save" to save your MongoDB source configuration.

7. You can now select this MongoDB source when you create a connection, and configure that connection to load your data into your chosen destination.
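The connection fields in step 4 (Host, Port, Username, Password, Authentication Database) are the pieces of a standard MongoDB connection string. The following sketch assembles them into one, which can be handy for checking your credentials with mongosh before entering them in Airbyte. `mongo_uri` and the example host and credentials are hypothetical placeholders.

```python
from urllib.parse import quote_plus


def mongo_uri(host: str, port: int, username: str, password: str,
              auth_db: str, database: str = "") -> str:
    """Assemble a mongodb:// connection string from the Airbyte form fields.

    Hypothetical helper for illustration only.
    """
    # Usernames and passwords must be percent-encoded inside a URI,
    # e.g. '@' becomes '%40' and ':' becomes '%3A'.
    user = quote_plus(username)
    pwd = quote_plus(password)
    # authSource names the database that stores the user's credentials
    # (the "Authentication Database" field).
    return (f"mongodb://{user}:{pwd}@{host}:{port}/{database}"
            f"?authSource={quote_plus(auth_db)}")


print(mongo_uri("db.example.com", 27017, "alice", "p@ss:word", "admin"))
# mongodb://alice:p%40ss%3Aword@db.example.com:27017/?authSource=admin
```

You could then pass the resulting string to mongosh, which accepts a connection string as its argument, to confirm that the credentials work.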

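Step 2: Set up PostgreSQL as a destination connector

1. In the Airbyte dashboard, click on "Destinations" and then click on "New Destination."

2. Select "PostgreSQL" from the list of available destinations.

3. Enter the connection details of your PostgreSQL database: Host, Port, DB Name, Username, and Password.

4. Click on "Set up destination" to save your PostgreSQL destination configuration.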

Step 3: Set up a connection to sync your MongoDB data to PostgreSQL

Once you've set up MongoDB as a source and PostgreSQL as a destination in Airbyte, you can create a connection between them with the following steps:

  1. Create a new connection: On the Airbyte dashboard, navigate to the 'Connections' tab and click the '+ New Connection' button.
  2. Choose your source: Select MongoDB from the dropdown list of your configured sources.
  3. Select your destination: Choose PostgreSQL from the dropdown list of your configured destinations.
  4. Configure your sync: Define the frequency of your data syncs based on your business needs. Airbyte allows both manual and automatic scheduling for your data refreshes.
  5. Select the data to sync: Choose the MongoDB collections you want to sync to PostgreSQL. You can sync all collections or select specific collections and fields.
  6. Select the sync mode for your streams: Choose between full refresh and incremental syncs (optionally with deduplication), either for all streams or per stream. Incremental syncs require a cursor field, and deduplication also requires a primary key.
  7. Test your connection: Click the 'Test Connection' button to make sure that your setup works. If the connection test is successful, save your configuration.
  8. Start the sync: If the test passes, click 'Set Up Connection'. Airbyte will start moving data from MongoDB to PostgreSQL Destination according to your settings.
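To illustrate what the sync modes in step 6 mean, here is a simplified in-memory model of full refresh, cursor-based incremental reads, and deduplication on a primary key. The function names and the `id`/`updated` fields are hypothetical, and this is only a conceptual sketch, not how Airbyte implements these modes.

```python
from typing import Any

Record = dict[str, Any]


def full_refresh(source: list[Record]) -> list[Record]:
    """Full refresh: re-read everything and replace the destination copy."""
    return [dict(r) for r in source]


def incremental(source: list[Record], cursor: str,
                state: Any = None) -> tuple[list[Record], Any]:
    """Incremental: read only records whose cursor value is past the saved state.

    Returns the new records and the updated state (the highest cursor seen).
    A real implementation must also handle ties on the cursor value.
    """
    new = [r for r in source if state is None or r[cursor] > state]
    new_state = max((r[cursor] for r in new), default=state)
    return new, new_state


def dedupe(rows: list[Record], primary_key: str, cursor: str) -> list[Record]:
    """Deduplication: keep only the latest version of each primary key."""
    latest: dict[Any, Record] = {}
    for r in rows:
        key = r[primary_key]
        if key not in latest or r[cursor] >= latest[key][cursor]:
            latest[key] = r
    return list(latest.values())


# Usage: a first sync, then a second sync after document 1 was updated.
source_v1 = [{"id": 1, "updated": 1}, {"id": 2, "updated": 2}]
source_v2 = [{"id": 1, "updated": 3}, {"id": 2, "updated": 2}]
batch1, state = incremental(source_v1, "updated")
batch2, state = incremental(source_v2, "updated", state)
print(batch2, state)                            # [{'id': 1, 'updated': 3}] 3
print(dedupe(batch1 + batch2, "id", "updated"))  # one row per id, latest version
```

Appending both batches keeps the full change history; deduplicating on the primary key yields a table that mirrors the current state of the collection.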

Airbyte then keeps your data in sync at the frequency you choose, so your PostgreSQL database reflects your MongoDB data as of the most recent sync.

Use Cases for transferring your MongoDB data to PostgreSQL

Integrating data from MongoDB into PostgreSQL provides several benefits. Here are a few use cases:

  1. Advanced Analytics: PostgreSQL's SQL engine lets you run complex queries on your MongoDB data, including joins and window functions across collections and other relational data, which can be cumbersome to express in MongoDB alone.
  2. Data Consolidation: If you use other sources alongside MongoDB, syncing them all to PostgreSQL lets you centralize your data for a holistic view of your operations, and incremental or change data capture (CDC) syncs help keep that copy consistent with the sources.
  3. Historical Data Analysis: Syncing data to PostgreSQL lets you retain it for the long term, separately from your operational MongoDB database, and analyze historical trends over time.
  4. Data Security and Compliance: PostgreSQL provides robust security features such as roles, privileges, and row-level security. Syncing MongoDB data to PostgreSQL lets you apply these controls to it as part of your data governance and compliance management.
  5. Scalability: PostgreSQL can handle large volumes of data, and running analytical queries against a PostgreSQL copy keeps that load off the MongoDB database that serves your application.
  6. Data Science and Machine Learning: With your MongoDB data in PostgreSQL, you can apply machine learning models to it for predictive analytics, customer segmentation, and more.
  7. Reporting and Visualization: While MongoDB provides its own reporting tools (such as MongoDB Charts), BI tools like Tableau, Power BI, Looker, and Looker Studio (formerly Google Data Studio) can connect to PostgreSQL, providing more business intelligence options. When you sync a MongoDB collection, Airbyte automatically creates a corresponding PostgreSQL table.
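As an illustration of what converting a collection into a table involves, this sketch infers one PostgreSQL column type per top-level field from sample documents and builds a CREATE TABLE statement. The type mapping and the conflict rules (mixed integers and floats widen to double precision, and other conflicts fall back to jsonb) are our own assumptions, not Airbyte's documented behavior. All names and sample documents are hypothetical.

```python
from typing import Any

# Assumed mapping from JSON-compatible Python types to PostgreSQL column types.
PG_TYPES = {bool: "boolean", int: "bigint", float: "double precision", str: "text"}


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def infer_columns(docs: list[dict[str, Any]]) -> dict[str, str]:
    """Pick one PostgreSQL type per top-level field seen in the sample documents."""
    seen: dict[str, set[str]] = {}
    for doc in docs:
        for key, value in doc.items():
            types = seen.setdefault(key, set())
            if value is None:
                continue  # a null fits any column type
            if isinstance(value, (dict, list)):
                types.add("jsonb")  # nested documents and arrays stay as JSON
            else:
                # Exact type lookup, so True maps to boolean rather than bigint.
                types.add(PG_TYPES.get(type(value), "text"))
    columns: dict[str, str] = {}
    for key, types in seen.items():
        if not types:
            columns[key] = "text"  # only nulls observed
        elif len(types) == 1:
            columns[key] = next(iter(types))
        elif types == {"bigint", "double precision"}:
            columns[key] = "double precision"  # widen mixed numbers
        else:
            columns[key] = "jsonb"  # conflicting types: keep the JSON value
    return columns


def create_table_sql(table: str, columns: dict[str, str]) -> str:
    """Build a CREATE TABLE statement with safely quoted identifiers."""
    cols = ", ".join(f"{quote_ident(name)} {pg}" for name, pg in columns.items())
    return f"CREATE TABLE {quote_ident(table)} ({cols});"


docs = [
    {"_id": "a1", "name": "Example Diner", "stars": 4, "address": {"zipcode": "10001"}},
    {"_id": "a2", "name": "Other Place", "stars": 4.5, "open": True, "note": None},
]
print(create_table_sql("restaurant", infer_columns(docs)))
```

Because MongoDB does not enforce a schema, any such inference is only as good as the documents it samples, which is why a permissive fallback type is useful.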

Wrapping Up

To summarize, this tutorial has shown you how to:

  1. Configure a MongoDB database as an Airbyte source connector.
  2. Configure PostgreSQL as an Airbyte destination connector.
  3. Create an Airbyte data pipeline that automatically moves data from MongoDB to PostgreSQL on the schedule you set.

With Airbyte, creating data pipelines takes minutes, and the data integration possibilities are endless. Airbyte supports the largest catalog of API tools, databases, and files, among other sources. Airbyte's connectors are open source, so you can add custom objects to a connector, or build a new connector from scratch with the no-code Connector Builder in about 10 minutes, without a local development environment.

We look forward to seeing you make use of it! We invite you to join the conversation on our community Slack Channel, or sign up for our newsletter. You should also check out other Airbyte tutorials, and Airbyte’s content hub!

MongoDB is a distributed database built for modern transactional and analytical applications, and it is well suited to rapidly changing, multi-structured data. PostgreSQL, on the other hand, is a SQL database with all the features you would expect from a relational database. If you are unsure of the differences between these systems, the MongoDB website has an article that compares PostgreSQL and MongoDB.

Choosing between MongoDB and PostgreSQL may not be your only option. Because each database has different strengths, you may wish to use them side by side, in which case you may need to sync data between them.

Custom-building a data pipeline to replicate data from MongoDB to Postgres is time-consuming and tedious. Airbyte, on the other hand, is designed exactly for this task. This article demonstrates how to use Airbyte to replicate and synchronize data from MongoDB to PostgreSQL!

Prerequisites

This tutorial makes use of the following tools:

  1. Clever Cloud - hosting of the MongoDB and PostgreSQL databases.
  2. mongosh - a CLI client to interface with the MongoDB database.
  3. psql - the terminal-based front-end to PostgreSQL.
  4. Airbyte Cloud - a data integration tool that will be used to replicate and synchronize data between MongoDB and PostgreSQL. Alternatively, you may choose to install Airbyte OSS locally.

Step 1-a: Launch MongoDB

In this section, you will use Clever Cloud to create a MongoDB instance. Once you sign up, choose the option to create an add-on from your personal space.

From the available list of add-ons, choose the MongoDB add-on.

For the instance size, choose the DEV plan, which is free to use.

Enter an add-on name, select a region, and then click Next.

You should now have a new MongoDB database, along with all the details needed to connect to it. Copy the mongo CLI connection command from the Clever Cloud add-on dashboard.

In the copied command, replace “mongo” with “mongosh” before executing it in your terminal (mongosh has replaced the legacy mongo shell in recent MongoDB releases).

You should now be connected to the PRIMARY replica of the MongoDB replica set (indicated by [primary] in the shell).

In the Airbyte connection to MongoDB, you will use the address of the primary replica. You can retrieve it by running rs.isMaster().primary in the MongoDB shell, which responds with a string in the format [hostname]:[port]. In our case, this command returns n2-c2-mongodb-clevercloud-customers.services.clever-cloud.com:27017
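Airbyte's MongoDB form asks for the host and port separately, so the [hostname]:[port] string returned by rs.isMaster().primary has to be split. A tiny helper that does so is sketched below. `split_host_port` is hypothetical, and falling back to 27017 (MongoDB's default port) when no port is present is our own choice.

```python
def split_host_port(addr: str, default_port: int = 27017) -> tuple[str, int]:
    """Split a '[hostname]:[port]' string into the Host and Port form fields.

    Hypothetical helper; it splits on the last ':' so it does not handle
    unbracketed IPv6 addresses.
    """
    host, sep, port = addr.rpartition(":")
    if not sep:
        # No port in the string: fall back to MongoDB's default port.
        return addr, default_port
    return host, int(port)


print(split_host_port(
    "n2-c2-mongodb-clevercloud-customers.services.clever-cloud.com:27017"))
```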

Step 1-b: Add data to MongoDB (optional)

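For this demo, we download a sample restaurant dataset and import it into a restaurant collection with the mongoimport command-line tool. Fill in the username, password, and database name shown in your Clever Cloud dashboard: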


mongoimport --host=n2-c2-mongodb-clevercloud-customers.services.clever-cloud.com --port=27017 --username= --password= --db= --collection=restaurant restaurant.json

Step 2: Configure a MongoDB source connector

Create an Airbyte MongoDB source by choosing Sources in your Airbyte dashboard and clicking the New source button. Then choose MongoDB from the list of sources to open its configuration form.

To keep this tutorial simple, and for demonstration purposes only, we select a Standalone MongoDB instance. However, you may wish to select one of the alternative MongoDB instance types if you want a more resilient connection to your MongoDB cluster.

Enter the Host, Port, Username, DB Name, and Password that were shown earlier in the Clever Cloud MongoDB configuration UI. Then choose Set up source.

Step 3: Launch PostgreSQL

To set up a PostgreSQL database, create a new add-on on your Clever Cloud dashboard, and choose PostgreSQL from the available add-ons.

For the plan, choose the DEV option, which provides 256 MB of storage.

Give a name to your add-on, and choose a location.

Click on Next once you are satisfied with your configurations, and then Clever Cloud should show you the PostgreSQL database credentials with information that will be required by Airbyte, including host, user, password, and database name.

To connect to the newly created database, copy the Connection URI and pass it as an argument to the psql CLI tool (psql accepts a connection URI in place of a database name).
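Airbyte's PostgreSQL destination form asks for individual parameters rather than a URI. Clever Cloud lists them separately as well, but if you only have the Connection URI at hand, the hypothetical helper below splits it into host, port, database, username, and password. It assumes a URI of the form postgresql://user:password@host:port/database and ignores any query parameters such as sslmode. The example URI is made up.

```python
from urllib.parse import unquote, urlsplit


def pg_fields(uri: str) -> dict[str, object]:
    """Split a postgresql:// connection URI into Airbyte's destination form fields.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"not a PostgreSQL URI: {uri!r}")
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is PostgreSQL's default port
        "database": unquote(parts.path.lstrip("/")),
        # urlsplit leaves credentials percent-encoded, so decode them here.
        "username": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
    }


print(pg_fields("postgresql://alice:s%40cret@db.example.com:5432/mydb"))
```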

Step 4: Configure a PostgreSQL destination connector

Go to Destinations in your Airbyte dashboard, choose Create destination, and select PostgreSQL from the list.

Enter the PostgreSQL parameters returned by Clever Cloud (host, port, database name, user, and password), and click Set up destination.

Step 5: Set up an Airbyte connection from MongoDB to Postgres

Go to Connections in your Airbyte dashboard and choose New connection. Select the source and the destination that you just created.

Airbyte has correctly detected the restaurant collection as a stream, and you can choose how it should be replicated to PostgreSQL. For sync mode, choose one from the available modes – for more information you may wish to consult the blog: An overview of Airbyte’s replication modes.

For Replication frequency, specify the interval between sync runs. Once you are done with the configurations, choose Set up connection and Airbyte will start its first sync. Once complete, you will be able to see how many records were replicated.

Log in to the Postgres host with psql to see the replicated data. Note that you may need to set search_path to the schema that Airbyte wrote the tables into, which depends on the namespace settings of your connection and PostgreSQL destination.

You should now be able to view the replicated data using standard SQL commands.

Conclusion

In summary, in this tutorial you have learned how to:

  1. Launch a MongoDB database on Clever Cloud and add test data to it.
  2. Configure an Airbyte source connector to read data from MongoDB.
  3. Launch a PostgreSQL database on Clever Cloud.
  4. Configure an Airbyte destination connector to send data into PostgreSQL.
  5. Create an Airbyte connection that replicates data from MongoDB to Postgres.

With Airbyte, the data integration possibilities are endless, and we look forward to seeing you use it! We invite you to join the conversation on our community Slack Channel, participate in discussions on Airbyte’s discourse, or sign up for our newsletter. You may also be interested in other Airbyte tutorials and Airbyte’s blog!

Frequently Asked Questions

What data can you extract from MongoDB?

MongoDB gives access to a wide range of data and features, including:

1. Documents: MongoDB stores data in the form of documents, which are similar to JSON objects. Each document contains a set of key-value pairs that represent the data.
2. Collections: A collection is a group of related documents that are stored together in MongoDB. Collections can be thought of as tables in a relational database.
3. Indexes: MongoDB supports various types of indexes, including single-field, compound, and geospatial indexes. Indexes are used to improve query performance.
4. GridFS: MongoDB's GridFS is a specification for storing and retrieving large files, such as images and videos, in MongoDB.
5. Aggregation: MongoDB's aggregation framework provides a way to perform complex data analysis operations, such as grouping, filtering, and sorting, on large datasets.
6. Transactions: MongoDB supports multi-document transactions, which allow multiple operations to be performed atomically.
7. Change streams: MongoDB's change streams provide a way to monitor changes to data in real time, allowing applications to react to changes as they occur.

Overall, MongoDB provides access to a flexible and powerful data model that can handle a wide range of data types and use cases.

What data can you transfer to PostgreSQL?

You can transfer a wide variety of data to PostgreSQL. This usually includes structured, semi-structured, and unstructured data such as transaction records, log files, JSON data, CSV files, and more, enabling robust, scalable data integration and analysis.

What are top ETL tools to transfer data from MongoDB to PostgreSQL?

The most prominent ETL tools to transfer data from MongoDB to PostgreSQL include:

  • Airbyte
  • Fivetran
  • Stitch
  • Matillion
  • Talend Data Integration

These tools help extract data from MongoDB and various other sources (APIs, databases, and more), transform it efficiently, and load it into PostgreSQL and other databases, data warehouses, and data lakes, enhancing data management capabilities.