Warehouses and Lakes
Databases

How to load data from Postgres to Snowflake destination

Learn how to use Airbyte to synchronize your Postgres data into Snowflake destination within minutes.

TL;DR

This can be done by building a data pipeline manually, usually a Python script (you can leverage a tool as Apache Airflow for this). This process can take more than a full week of development. Or it can be done in minutes on Airbyte in three easy steps:

  1. set up Postgres as a source connector (using Auth, or usually an API key)
  2. set up Snowflake destination as a destination connector
  3. define which data you want to transfer and how frequently

You can choose to self-host the pipeline using Airbyte Open Source or have it managed for you with Airbyte Cloud.

This tutorial’s purpose is to show you how.

What is Postgres

An object-relational database management system, PostgreSQL is able to handle a wide range of workloads, supports multiple standards, and is cross-platform, running on numerous operating systems including Microsoft Windows, Solaris, Linux, and FreeBSD. It is highly extensible, and supports more than 12 procedural languages, Spatial data support, Gin and GIST Indexes, and more. Many webs, mobile, and analytics applications use PostgreSQL as the primary data warehouse or data store.

What is Snowflake destination

A cloud data platform, Snowflake Data Cloud provides a warehouse-as-a-service built specifically for the cloud. The Snowflake platform is designed to empower many types of data workloads, and offers secure, immediate, governed access to a comprehensive network of data. Snowflake’s innovative technology goes above the capabilities of the ordinary database, supplying users all the functionality of database storage, query processing, and cloud services in one package.

Integrate Postgres with Snowflake destination in minutes

Try for free now

Prerequisites

  1. A Postgres account to transfer your customer data automatically from.
  2. A Snowflake destination account.
  3. An active Airbyte Cloud account, or you can also choose to use Airbyte Open Source locally. You can follow the instructions to set up Airbyte on your system using docker-compose.

Airbyte is an open-source data integration platform that consolidates and streamlines the process of extracting and loading data from multiple data sources to data warehouses. It offers pre-built connectors, including Postgres and Snowflake destination, for seamless data migration.

When using Airbyte to move data from Postgres to Snowflake destination, it extracts data from Postgres using the source connector, converts it into a format Snowflake destination can ingest using the provided schema, and then loads it into Snowflake destination via the destination connector. This allows businesses to leverage their Postgres data for advanced analytics and insights within Snowflake destination, simplifying the ETL process and saving significant time and resources.

Step 1: Set up Postgres as a source connector

1. Open your PostgreSQL database and create a new user with the necessary permissions to access the data you want to replicate.

2. Obtain the hostname or IP address of your PostgreSQL server and the port number it is listening on.

3. Create a new database in PostgreSQL that will be used to store the replicated data.

4. Obtain the name of the database you just created.

5. In Airbyte, navigate to the PostgreSQL source connector and click on "Create Connection".

6. Enter a name for your connection and fill in the required fields, including the hostname or IP address, port number, database name, username, and password.

7. Test the connection to ensure that Airbyte can successfully connect to your PostgreSQL database.

8. Select the tables or views you want to replicate and configure any necessary settings, such as the replication frequency and the replication method.

9. Save your configuration and start the replication process.

10. Monitor the replication process to ensure that it is running smoothly and troubleshoot any issues that arise.

Step 2: Set up Snowflake destination as a destination connector

1. First, navigate to the Airbyte website and log in to your account.

2. Once you are logged in, click on the "Destinations" tab on the left-hand side of the screen.

3. Scroll down until you find the Snowflake Data Cloud destination connector and click on it.

4. You will be prompted to enter your Snowflake account information, including your account name, username, and password.

5. After entering your account information, click on the "Test" button to ensure that the connection is successful.

6. If the test is successful, click on the "Save" button to save your Snowflake Data Cloud destination connector settings.

7. You can now use the Snowflake Data Cloud destination connector to transfer data from your Airbyte sources to your Snowflake account.

8. To set up a data transfer, navigate to the "Sources" tab on the left-hand side of the screen and select the source you want to transfer data from.

9. Click on the "Create New Connection" button and select the Snowflake Data Cloud destination connector as your destination.

10. Follow the prompts to set up your data transfer, including selecting the tables or data sources you want to transfer and setting up any necessary transformations or mappings.

11. Once you have set up your data transfer, click on the "Run" button to start the transfer process.

Step 3: Set up a connection to sync your Postgres data to Snowflake destination

Once you've successfully connected Postgres as a data source and Snowflake destination as a destination in Airbyte, you can set up a data pipeline between them with the following steps:

  1. Create a new connection: On the Airbyte dashboard, navigate to the 'Connections' tab and click the '+ New Connection' button.
  2. Choose your source: Select Postgres from the dropdown list of your configured sources.
  3. Select your destination: Choose Snowflake destination from the dropdown list of your configured destinations.
  4. Configure your sync: Define the frequency of your data syncs based on your business needs. Airbyte allows both manual and automatic scheduling for your data refreshes.
  5. Select the data to sync: Choose the specific Postgres objects you want to import data from towards Snowflake destination. You can sync all data or select specific tables and fields.
  6. Select the sync mode for your streams: Choose between full refreshes or incremental syncs (with deduplication if you want), and this for all streams or at the stream level. Incremental is only available for streams that have a primary cursor.
  7. Test your connection: Click the 'Test Connection' button to make sure that your setup works. If the connection test is successful, save your configuration.
  8. Start the sync: If the test passes, click 'Set Up Connection'. Airbyte will start moving data from Postgres to Snowflake destination according to your settings.

Remember, Airbyte keeps your data in sync at the frequency you determine, ensuring your Snowflake destination data warehouse is always up-to-date with your Postgres data.

Use Cases to transfer your Postgres data to Snowflake destination

Integrating data from Postgres to Snowflake destination provides several benefits. Here are a few use cases:

  1. Advanced Analytics: Snowflake destination’s powerful data processing capabilities enable you to perform complex queries and data analysis on your Postgres data, extracting insights that wouldn't be possible within Postgres alone.
  2. Data Consolidation: If you're using multiple other sources along with Postgres, syncing to Snowflake destination allows you to centralize your data for a holistic view of your operations, and to set up a change data capture process so you never have any discrepancies in your data again.
  3. Historical Data Analysis: Postgres has limits on historical data. Syncing data to Snowflake destination allows for long-term data retention and analysis of historical trends over time.
  4. Data Security and Compliance: Snowflake destination provides robust data security features. Syncing Postgres data to Snowflake destination ensures your data is secured and allows for advanced data governance and compliance management.
  5. Scalability: Snowflake destination can handle large volumes of data without affecting performance, providing an ideal solution for growing businesses with expanding Postgres data.
  6. Data Science and Machine Learning: By having Postgres data in Snowflake destination, you can apply machine learning models to your data for predictive analytics, customer segmentation, and more.
  7. Reporting and Visualization: While Postgres provides reporting tools, data visualization tools like Tableau, PowerBI, Looker (Google Data Studio) can connect to Snowflake destination, providing more advanced business intelligence options. If you have a Postgres table that needs to be converted to a Snowflake destination table, Airbyte can do that automatically.

Wrapping Up

To summarize, this tutorial has shown you how to:

  1. Configure a Postgres account as an Airbyte data source connector.
  2. Configure Snowflake destination as a data destination connector.
  3. Create an Airbyte data pipeline that will automatically be moving data directly from Postgres to Snowflake destination after you set a schedule

With Airbyte, creating data pipelines take minutes, and the data integration possibilities are endless. Airbyte supports the largest catalog of API tools, databases, and files, among other sources. Airbyte's connectors are open-source, so you can add any custom objects to the connector, or even build a new connector from scratch without any local dev environment or any data engineer within 10 minutes with the no-code connector builder.

We look forward to seeing you make use of it! We invite you to join the conversation on our community Slack Channel, or sign up for our newsletter. You should also check out other Airbyte tutorials, and Airbyte’s content hub!

What should you do next?

Hope you enjoyed the reading. Here are the 3 ways we can help you in your data journey:

flag icon
Easily address your data movement needs with Airbyte Cloud
Take the first step towards extensible data movement infrastructure that will give a ton of time back to your data team. 
Get started with Airbyte for free
high five icon
Talk to a data infrastructure expert
Get a free consultation with an Airbyte expert to significantly improve your data movement infrastructure. 
Talk to sales
stars sparkling
Improve your data infrastructure knowledge
Subscribe to our monthly newsletter and get the community’s new enlightening content along with Airbyte’s progress in their mission to solve data integration once and for all.
Subscribe to newsletter

Imagine you’re replicating a database containing millions of records for a fast-growing online store to a data warehouse for analysis. Your data replication solution has to be fast, efficient, and cost-effective.

In this tutorial, you’ll learn how to use Airbyte to transfer data from Postgres to Snowflake. You’ll also learn how to deal with edge cases, like partitioned tables and custom data types.

OLTP vs OLAP

Before you start replicating data, it's important to understand what OLTP and OLAP mean.

OLTP is a form of data storage and processing that executes a number of concurrent transactions. OLTP workloads are optimized for database inserts. Example application areas of OLTP include online banking, online shopping, order entry, and sending text messages. OLTP follows ACID properties. In basic terms, this means a transaction to the database either completes or fails.

OLAP is a form of data storage and processing that focuses on storing denormalized data to make querying easier when carrying out data analysis. OLAP systems are optimized for data analysis. This form of storage is used in data warehouses and data lakess. Data analysts can make queries on historical data from an OLAP environment to make better analytical decisions.

Snowflake is an OLAP is a cloud-based data warehouse. There is no hardware to configure, no software to install, and no maintenance required. Snowflake ships with an integrated data warehouse, as well as analytics tools for running OLAP workloads. These features make Snowflake a great choice for OLAP workloads.

There are two main sync modes to replicate data from OLTP to OLAP systems.

  • A Full refresh replicates all of the data from the OLTP source to the OLAP destination.
  • An Incremental sync replicates data at intervals from a OLTP source to a OLAP destination as new data is added.

Airbyte is a data integration tool that allows you to replicate data to data warehouses, lakes, and databases. Airbyte supports both full refresh and incremental syncs. In this tutorial, you will use full refresh mode to replicate data from PostgreSQL to Snowflake.

How to load data from Postgres to Snowflake?

Sowflake supports a rich set of data types, including semi-structured data types such as JSON and XML. This means that most tables won’t require extra data type conversion. For tables with data types that Snowflake doesn’t support, Airbyte converts that data to an Airbyte-specific string type which ultimately ends up as a VARCHAR type on Snowflake.

Airbyte streams data from a source, builds it into temp files, and delivers the data to the destination. You can choose to use your local machine for storing temp files as part of the replication process, or use an external staging area like Amazon S3 or Google Cloud Storage (GCS). An external staging area offers faster replication speeds than your local machine.

Let's learn  to seamlessly transfer data from Postgres to Snowflake in just three simple steps:

  • Step 1a: Set up PostgreSQL
  • Step 1b: Set Up Snowflake
  • Step 2a: Set Up a PostgreSQL Source
  • Step 2b: Set Up a Snowflake Destination
  • Step 3: Setup a Connection from Postgres to Snowflake

Pre-requisites

Before you can start replicating data, you’ll need to ensure the following conditions are met:

  1. A staging area (Amazon S3 or Google Cloud Storage) for fast and efficient loading of data from PostgreSQL to Snowflake.
  2. An instance of PostgreSQL. Follow these instructions to install PostgreSQL if you don't have it locally.
  3. Docker and Docker Compose installed.

Now that you understand the prerequisites, you’ll go over the steps of replicating a simple PostgreSQL database to Snowflake with Airbyte.

Step 1a: Set up PostgreSQL

The first step is to create and populate a local PostgreSQL instance. You’ll need to clone the below GitHub repo. The repo contains all the code and files you’ll use throughout the tutorial

git clone https://github.com/vicradon/postgres-airbyte-snowflake-tut.git

After cloning the repo, navigate to the clone repo. Then create a new database.

createdb airbyte_tut -U postgres

After creating the database, run the command below to create a table, movies and populate it with data.

psql -d airbyte_tut -f movies.sql -U postgres

Now, enter the psql shell as user postgres, psql -U postgres, then connect to the airbyte_tut database, \c airbyte_tut. Next, create a dedicated read-only user that will have access to the relevant tables in the airbyte_tut database.

CREATE USER airbyte PASSWORD 'your_password_here';

Then, give the Airbyte user access to the airbtye_tut database.

GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO airbyte;

Step 1b: Set Up Snowflake

When creating a Snowflake account you’ll need to pick a Snowflake edition and a cloud provider as part of the account creation process. Next, you’ll receive an email containing your login URL similar to https://id.region.cloud-provider.snowflakecomputing.com/console/login. Make sure to bookmark this URL for future logins.

After you create your username and password you will be redirected to the Snowflake dashboard after your account has been successfully created. The worksheet area will be the primary place you’ll run scripts for creating and modifying resources.

You need to create the destination database, user, role, and schema on Snowflake where the sync will occur. Airbyte already provides a script for running this automation. Paste the snippet below into your worksheet area on Snowflake. Make sure to change the airbyte_password variable to your preferred password before running the script.

-- set variables (these need to be uppercase)
set airbyte_role = 'AIRBYTE_ROLE';
set airbyte_username = 'AIRBYTE_USER';
set airbyte_warehouse = 'AIRBYTE_WAREHOUSE';
set airbyte_database = 'AIRBYTE_DATABASE';
set airbyte_schema = 'AIRBYTE_SCHEMA';
-- set user password
set airbyte_password ='YOUR_AIRBYTE_PASSWORD';

begin;

-- create Airbyte role
use role securityadmin;
create role if not exists identifier($airbyte_role);
grant role identifier($airbyte_role) to role SYSADMIN;

-- create Airbyte user
create user if not exists identifier($airbyte_username)
password = $airbyte_password
default_role = $airbyte_role
default_warehouse = $airbyte_warehouse;

grant role identifier($airbyte_role) to user identifier($airbyte_username);

-- change role to sysadmin for warehouse/database steps
use role sysadmin;

-- create Airbyte warehouse
create warehouse if not exists identifier($airbyte_warehouse)
warehouse_size = xsmall
warehouse_type = standard
auto_suspend = 60
auto_resume = true
initially_suspended = true;

-- create Airbyte database
create database if not exists identifier($airbyte_database);

-- grant Airbyte warehouse access
grant USAGE
on warehouse identifier($airbyte_warehouse)
to role identifier($airbyte_role);

-- grant Airbyte database access
grant OWNERSHIP
on database identifier($airbyte_database)
to role identifier($airbyte_role);

commit;

begin;

USE DATABASE identifier($airbyte_database);

-- create schema for Airbyte data
CREATE SCHEMA IF NOT EXISTS identifier($airbyte_schema);

commit;

begin;

-- grant Airbyte schema access
grant OWNERSHIP
on schema identifier($airbyte_schema)
to role identifier($airbyte_role);

commit;

Select All queries and run the script.

Step 2a: Set Up a PostgreSQL Source

Ensure you have Docker Compose on the environment where you want to run Airbyte. Now, run the commands below to initiate Airbyte setup:

git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker-compose up

You can then open your app on http://localhost:8000. After inputting your email, you’ll be presented with an onboarding experience to create a first Airbyte connection, starting with the source.

Select Postgres as your source and set up the connection parameters as explained in the Postgres source documentation.

Step 2b: Set Up a Snowflake Destination

After inputting the name and destination type, you’ll be presented with several required fields that Airbyte will use to connect to the Snowflake destination. Fill in the form with the default details.

Host =YOUR_SNOWFLAKE_HOST (https://id.region.cloud-provider.snowflakecomputing.com)
Role = AIRBYTE_ROLE
Warehouse = AIRBYTE_WAREHOUSE
Database = AIRBYTE_DATABASE
Schema = AIRBYTE_SCHEMA
User = AIRBYTE_USER
Password = password

When you get to the loading method section, select your preferred loading method and input the required details. The default loading method, uses your local machine to generate the temp files. An example using AWS S3 is shown below.

Step 3: Setup a Connection from Postgres to Snowflake

After verifying your connection, you’ll be redirected to set it up. Choose a manual sync frequency and set the namespace configuration to Destination Connector Settings.

Ensure the `movies` table shows up. If it doesn’t, check that the table is present in the `airbyte_tut` database and refresh the schema. Complete the process by clicking Set up connection.

Then you can On your Airbyte dashboard, navigate to the Connections page and click on the first connection.

This will redirect you to a page with more details about that connection. Click Sync now to start synchronizing data.

This action will trigger a sync to Snowflake. When the sync completes, 18 records will be pushed to Snowflake.

Handling Special Cases

Next we will cover three special cases to replicate partitioned tables, custom data types and working with geospatial data.

#1 Replicating PostgreSQL Partitioned Tables to Snowflake

Replicating tables with thousands of rows can be a slow process, so tables are often broken down into logical units called partitions. Using Airbyte, it’s possible to partition the tables by user-signup date, then push each of these partitions to Snowflake.

To begin, navigate to the tutorial folder and execute the table-partitioning.sql file. This loads data into a users table and creates two partitions. To replicate this partitioned table to Snowflake, you have two options:

  • Export the table as a single view.
  • Export each partition as separate files.

The first option involves selecting the users table only. The second involves selecting each partition.

It’s important to know that privileges applied to a user don’t affect partitioned tables after the privileges have been applied, you have to reassign privileges. Simply rerun the original command you used to assign privileges.

GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO airbyte;

The advantage of exporting each partition is the speed of parallel execution is that each table will be replicated in its own stream. After the replication process is complete, the tables can then be combined into a single table on Snowflake.

To get each partitioned table into Snowflake, navigate to the Settings tab on the Airbyte-Snowflake connection. Click Update latest source schema to load the new tables.

Finally, save the changes to restart the migration.

#2 Replicating Custom Data Types from PostgreSQL to Snowflake

Custom data types (also called user-defined types) in PostgreSQL are abstractions of the basic data types. A custom data type could be a basic data type like VARCHAR, with the addition of a check to ensure that imputed data has no spaces. An example is a countries table defined by the following query.

CREATE TABLE countries (
   id serial primary key,
   code VARCHAR NOT NULL UNIQUE,
   name VARCHAR,
   CHECK (
       code !~ 's'
   )
)

You could define a custom data type for the code field in the table using the CREATE DOMAIN statement.

CREATE DOMAIN COUNTRY_CODE AS
  VARCHAR NOT NULL UNIQUE CHECK (value !~ 's');

You can then use this data type in the normal way.

CREATE TABLE countries (
   id serial primary key,
   code COUNTRY_CODE,
   name VARCHAR
)

Snowflake does not support user-defined types, so Airbyte converts all user-defined types to VARCHAR before replicating the data to Snowflake to handle this. To demonstrate the process of replicating custom data types to Snowflake, navigate to the tutorial folder and load a table with user-defined types.

i user-defined-types.sql

After the table is created and populated with data, update your schema with the latest table.

Save the configuration to restart the sync process. When the sync succeeds, check Snowflake for the new data.

You’ll notice that the data type of the code column is VARCHAR. This is due to the transformation that occurred on the data.

PostgreSQL has support for unstructured data through: JSON, JSONB, HStore and XML. Fortunately, Snowflake supports JSON and XML data types. The other two data types, JSONB and HStore, are converted to the VARCHAR data type.

#3 Replicating Geospatial Data from PostgreSQL to Snowflake

Most modern applications and systems utilize one form of spatial data or another. Spatial data is usually represented as using a coordinate system like the cartesian or spherical systems. For PostgreSQL databases, PostGIS is used to enable spatial data storage. It enables PostgreSQL databases to store and query location related data such as longitude and latitude through its custom data types.

PostGIS offers the following data types: Box2d, Box3d, Geometry, Geometry_dump, Geography.

The Geography data type is the most important data type offered by PostGIS, it provides a way to store longitude and latitude data on PostgreSQL. This is done using the Point type, which takes in longitude and latitude as inputs.

Snowflake supports the Geography data type, meaning no typecasting will occur during replication. The replication process is the same as before.

To replicate a table containing airports data to Snowflake, install the PostGIS extension for Windows or Ubuntu. Then, activate the extension.

CREATE EXTENSION postgis

Create a table containing airports and their coordinates by running the airports.sql file from the tutorial folder:

i airports.sql

Update your schema on Airbyte. Notice the geography_columns and geometry_columns tables that were added. These tables originated from the PostGIS extension.

Sync the data using Airbyte, then view the replicated data on Snowflake.

Geography data is converted to VARCHAR by default, but Snowflake has a SQL function, TO_GEOGRAPHY, for converting this VARCHAR into the Geography data type.

Conclusion

Data warehouses like Snowflake are optimized for analytical workloads. For this reason, it's common to replicate data from transactional databases like PostgreSQL. To leran more, checkout this amazing tutorial about Postgres Replication.

Airbyte makes it easy to replicate data from PostgreSQL to Snowflake. However, it's important to know what to expect from some special cases, like replicating partitioned tables and how custom data types are mapped.

If you enjoyed using Airbyte, you might want to check our fully managed solution: Airbyte Cloud.

About the author

Osinachi Chukwujama is a web developer and technical writer. He enjoys building backend applications and utilizing cloud computing. He plays the organ and creates casual animations when he isn't coding.

What should you do next?

Hope you enjoyed the reading. Here are the 3 ways we can help you in your data journey:

flag icon
Easily address your data movement needs with Airbyte Cloud
Take the first step towards extensible data movement infrastructure that will give a ton of time back to your data team. 
Get started with Airbyte for free
high five icon
Talk to a data infrastructure expert
Get a free consultation with an Airbyte expert to significantly improve your data movement infrastructure. 
Talk to sales
stars sparkling
Improve your data infrastructure knowledge
Subscribe to our monthly newsletter and get the community’s new enlightening content along with Airbyte’s progress in their mission to solve data integration once and for all.
Subscribe to newsletter

Should you build or buy your data pipelines?

Download our free guide and discover the best approach for your needs, whether it's building your ELT solution in-house or opting for Airbyte Open Source or Airbyte Cloud.

Download now

Frequently Asked Questions

What data can you extract from Postgres?

PostgreSQL gives access to a wide range of data types, including:  

1. Numeric data types: This includes integers, floating-point numbers, and decimal numbers.  

2. Character data types: This includes strings, text, and character arrays.  

3. Date and time data types: This includes dates, times, and timestamps.  

4. Boolean data types: This includes true/false values.  

5. Network address data types: This includes IP addresses and MAC addresses.  

6. Geometric data types: This includes points, lines, and polygons.  

7. Array data types: This includes arrays of any of the above data types.  

8. JSON and JSONB data types: This includes JSON objects and arrays.  

9. XML data types: This includes XML documents.  

10. Composite data types: This includes user-defined data types that can contain multiple fields of different data types.  

Overall, PostgreSQL's API provides access to a wide range of data types, making it a versatile and powerful tool for data management and analysis.

What data can you transfer to Snowflake destination?

You can transfer a wide variety of data to Snowflake destination. This usually includes structured, semi-structured, and unstructured data like transaction records, log files, JSON data, CSV files, and more, allowing robust, scalable data integration and analysis.

What are top ETL tools to transfer data from Postgres to Snowflake destination?

The most prominent ETL tools to transfer data from Postgres to Snowflake destination include:

  • Airbyte
  • Fivetran
  • Stitch
  • Matillion
  • Talend Data Integration

These tools help in extracting data from Postgres and various sources (APIs, databases, and more), transforming it efficiently, and loading it into Snowflake destination and other databases, data warehouses and data lakes, enhancing data management capabilities.