
Replicate Data Between PostgreSQL Databases

Migrate data between any PostgreSQL instances in under 10 minutes.

PostgreSQL database replication has several purposes. It can help with high availability (if a server fails, a backup one can take over) and load-balancing (distributing traffic across multiple servers that provide the same data).

Enabling analytics is another increasingly popular reason for database replication. Imagine that you have a Django application that uses PostgreSQL as its operational database. If you’d like to enable reporting and analytics on top of that database, it’s best to replicate it to another server so that analytical queries don’t interfere with the production application.

No matter your replication purpose, this tutorial will show you how to easily replicate data from a PostgreSQL source to a PostgreSQL destination using Airbyte.

I also introduce PostgreSQL native replication and the advantages of using Airbyte. If you want to skip those sections, feel free to jump to the prerequisites.

If you prefer to follow along with a video instead, we have created one, too!

Native PostgreSQL replication techniques

There are various techniques for replicating PostgreSQL databases, which can be divided into disk-based, log-based, and SQL-based replication. We will primarily focus on log-based replication techniques.

PostgreSQL itself provides several kinds of log-based replication: log-shipping, streaming, and logical replication.

Log-shipping replication

With log-shipping replication, the primary server accepts data changes while one or more standby servers track those changes. The primary runs in continuous archiving mode, generating Write-Ahead Logging (WAL) files that the standby replays to stay in sync. The standby therefore runs in recovery mode, which is read-only.

Because no modifications to the database tables are necessary to activate this functionality, it has a lower administrative overhead than alternative replication methods.
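
As a rough sketch, log-shipping comes down to archiving WAL files on the primary and restoring them on the standby. The settings below are illustrative, and /mnt/wal_archive is a hypothetical shared archive location:

# postgresql.conf on the primary
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'

# postgresql.conf on the standby (plus an empty standby.signal file in its data directory)
restore_command = 'cp /mnt/wal_archive/%f %p'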

Streaming replication

Streaming replication is similar to log-shipping, except that the standby connects directly to the primary, which streams WAL records as they are generated instead of waiting for a WAL file to fill up.
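
A minimal streaming setup looks roughly like the sketch below, assuming a dedicated replication role; the role name, host, and data directory are illustrative:

# postgresql.conf on the primary
wal_level = replica
max_wal_senders = 10

# pg_hba.conf on the primary: allow the standby to connect for replication
# host  replication  replicator  <standby_ip>/32  scram-sha-256

# on the standby: clone the primary; -R writes standby.signal and primary_conninfo
pg_basebackup -h primary_host -U replicator -D /var/lib/postgresql/data -R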

The fundamental disadvantage of log-shipping and streaming replication is that the standby server remains in recovery mode, where modifications are not permitted. Every change made on the primary is replicated to the standby, so it is impossible to replicate tables or databases selectively.

Logical replication

Logical replication is PostgreSQL’s technique for replicating data objects and their modifications based on their identity, usually a primary key. This type of replication also uses WAL files to build a stream of logical data changes, and it implements a publish-and-subscribe approach: changes on the publisher are delivered to subscribers in real time.

Unlike log-shipping and streaming, logical replication allows for per-table replication.
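
For example, publish-and-subscribe replication of a single table can be set up with plain SQL. The table, publication, and connection details below are illustrative; the publisher must run with wal_level = logical, and the subscriber needs a matching table definition:

-- on the publisher
CREATE PUBLICATION my_pub FOR TABLE public.users;

-- on the subscriber
CREATE SUBSCRIPTION my_sub
  CONNECTION 'host=primary_host dbname=postgres user=postgres password=password'
  PUBLICATION my_pub;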

The most significant disadvantage of logical replication is that schema changes are not replicated and must be kept in sync manually. Tables lacking primary keys or unique identifiers will also have issues. You can read more about logical replication restrictions here.

Advantages of using Airbyte for PostgreSQL replication

If PostgreSQL supports replication natively, why would you use Airbyte? Built-in Postgres replication may have minimal overhead, but, as we saw above, it is not without drawbacks.

The main advantage of using Airbyte for PostgreSQL replication is flexibility: You can use different replication methods for each table within the same database. 

Airbyte supports several sync modes for Postgres: full refresh, incremental, and logical replication (also called Change Data Capture, or CDC).

It’s not uncommon to have different replication needs for different tables in your databases: some of your tables may be small, while others are huge. Some tables may have primary keys, while others do not. You may need to track deletes for some tables, while you may need to keep just a snapshot for others.

Suppose you want to use logical replication, but some of your tables don’t have primary keys. When using Airbyte, you can choose to replicate tables without a primary key using full or incremental replication; meanwhile, other tables in the same database can use logical replication.

Now that you have some background, let’s get hands-on!

Prerequisites

To follow along, you need Docker running on your machine (the example source and destination databases run as containers) and a local Airbyte deployment, with the UI available at http://localhost:8000.

Step 1: Set up your source Postgres database (optional)

If you don’t have a readily available Postgres database to sync, here are some quick instructions. Run the following commands in a new terminal window to start the source and destination databases in the background:


docker run --rm --name airbyte-source -e POSTGRES_PASSWORD=password -p 2000:5432 -d postgres
docker run --rm --name airbyte-destination -e POSTGRES_PASSWORD=password -p 3000:5432 -d postgres

Add two tables with a few rows to the source database:


docker exec -it airbyte-source psql -U postgres -c "CREATE TABLE users(id SERIAL PRIMARY KEY, col1 VARCHAR(200));"
docker exec -it airbyte-source psql -U postgres -c "INSERT INTO public.users(col1) VALUES('record1');"
docker exec -it airbyte-source psql -U postgres -c "INSERT INTO public.users(col1) VALUES('record2');"
docker exec -it airbyte-source psql -U postgres -c "INSERT INTO public.users(col1) VALUES('record3');"

docker exec -it airbyte-source psql -U postgres -c "CREATE TABLE cities(city_code VARCHAR(8), city VARCHAR(200));"
docker exec -it airbyte-source psql -U postgres -c "INSERT INTO public.cities(city_code, city) VALUES('BCN', 'Barcelona');"
docker exec -it airbyte-source psql -U postgres -c "INSERT INTO public.cities(city_code, city) VALUES('MAD', 'Madrid');"   
docker exec -it airbyte-source psql -U postgres -c "INSERT INTO public.cities(city_code, city) VALUES('VAL', 'Valencia');"
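
If you want to sanity-check the seed data before moving on, query both tables:

docker exec airbyte-source psql -U postgres -c "SELECT * FROM public.users;"
docker exec airbyte-source psql -U postgres -c "SELECT * FROM public.cities;"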

You now have a Postgres database ready to be replicated.

Alternatively, you can use a local Postgres database on your computer: use host.docker.internal (if you are on Mac) as the host instead of localhost when setting up the source and destination.

Step 2: Set up an Airbyte Postgres source and destination

If you didn’t do it before when deploying Airbyte, go to http://localhost:8000 to launch the UI.

Then, click on sources and add a new source. As the connector type, select Postgres. If you used our instructions above to create a Postgres database, fill in the following configuration fields.

  • Host: localhost
  • Port: 2000
  • User: postgres
  • Password: password
  • DB Name: postgres

We will use the “Standard” replication method to keep things simple for this tutorial. But you can use logical replication if needed. Follow along with this video to configure it for Postgres.
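
If you do want to try logical replication (CDC) with the Docker source from Step 1, the database needs logical decoding enabled, plus a replication slot and a publication for Airbyte to read from. The slot and publication names below are only examples:

docker exec airbyte-source psql -U postgres -c "ALTER SYSTEM SET wal_level = logical;"
docker restart airbyte-source
docker exec airbyte-source psql -U postgres -c "SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');"
docker exec airbyte-source psql -U postgres -c "CREATE PUBLICATION airbyte_publication FOR ALL TABLES;"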

We will also use “No Tunnel” as the SSH option. If you want to use SSH, check out our documentation.

Click on "Set up source", and now you’re ready to configure a destination.

Go to destinations and add a new one. Select Postgres as the destination type and fill in with the following details.

  • Host: localhost
  • Port: 3000
  • User: postgres
  • Password: password
  • DB Name: postgres

That’s it. You have configured the source and destination.

Step 3: Create an Airbyte connection

Go to connections and create a new connection. Select the existing Postgres source you just created, and do the same for the destination. Once you’re done, you can set up the connection as follows.

  • Replication Frequency: I recommend setting it to “manual” if you’re testing. When you’re ready, you can change it to any frequency that makes sense for your use case.
  • Destination Namespace: I selected “Mirror source structure”, so the schema and table names match the source.
  • Destination Stream Prefix: I added the prefix tutorial_, so my tables will be created as tutorial_users and tutorial_cities.

Then, it’s time to configure the streams, which in this case are the tables in our database. If we expand the tables, we can see their columns. We can also see that they’re part of the public namespace, or schema. The destination schema will also be public.

Now, you should select a sync mode. I chose Full refresh | Overwrite to sync the cities table and Incremental | Append for the users table, since it has an id column (the primary key) that is suitable as the incremental cursor field. The most important thing to note is that you can have different sync modes for each table! Learn more about sync modes in our documentation.
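
Conceptually, an incremental sync with a cursor field boils down to a query like the one below on each run. This is only an illustrative sketch, not Airbyte’s literal implementation:

-- read only rows whose cursor value is greater than the last value recorded by the previous sync
SELECT id, col1
FROM public.users
WHERE id > 3   -- 3 = highest id seen so far
ORDER BY id;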

Once you’re ready, save the changes. Then, you can run your first sync by clicking on “Sync now.” You can check your run logs to verify everything is going well. Just wait for the sync to be completed, and that’s it! You’ve synchronized two Postgres databases.


Step 4: Verify that the sync worked

Now, let's verify that this worked. Let's output the contents of the destination database:


docker exec airbyte-destination psql -U postgres -c "SELECT * FROM public.tutorial_users;"


docker exec airbyte-destination psql -U postgres -c "SELECT * FROM public.tutorial_cities;"

You should see the rows from the source database inside the destination database. Note that Airbyte added extra columns which contain metadata used to support synchronization.
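
To see the Incremental | Append mode at work, insert one more row into the source users table, run the sync again from the Airbyte UI, and re-query the destination. This reuses the containers from Step 1:

docker exec -it airbyte-source psql -U postgres -c "INSERT INTO public.users(col1) VALUES('record4');"
docker exec airbyte-destination psql -U postgres -c "SELECT * FROM public.tutorial_users;"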

Wrapping up

In this tutorial, you learned about PostgreSQL native replication and how to synchronize two Postgres databases with Airbyte. Apart from being straightforward, Airbyte is also highly flexible when it comes to selecting different synchronization methods for your replication.

If you enjoyed using Airbyte, you might want to check out our fully managed solution: Airbyte Cloud.

We also invite you to join the conversation on our community Slack Channel to share your ideas with thousands of data engineers and help make everyone’s project a success. With Airbyte, the integration possibilities are endless, and we can't wait to see what you will build.
