An object-relational database management system, PostgreSQL is able to handle a wide range of workloads, supports multiple standards, and is cross-platform, running on numerous operating systems including Microsoft Windows, Solaris, Linux, and FreeBSD. It is highly extensible, and supports more than 12 procedural languages, Spatial data support, Gin and GIST Indexes, and more. Many webs, mobile, and analytics applications use PostgreSQL as the primary data warehouse or data store.
A cloud data platform, Snowflake Data Cloud provides a warehouse-as-a-service built specifically for the cloud. The Snowflake platform is designed to empower many types of data workloads, and offers secure, immediate, governed access to a comprehensive network of data. Snowflake’s innovative technology goes above the capabilities of the ordinary database, supplying users all the functionality of database storage, query processing, and cloud services in one package.
1. Open your PostgreSQL database and create a new user with the necessary permissions to access the data you want to replicate.
2. Obtain the hostname or IP address of your PostgreSQL server and the port number it is listening on.
3. Create a new database in PostgreSQL that will be used to store the replicated data.
4. Obtain the name of the database you just created.
5. In Airbyte, navigate to the PostgreSQL source connector and click on "Create Connection".
6. Enter a name for your connection and fill in the required fields, including the hostname or IP address, port number, database name, username, and password.
7. Test the connection to ensure that Airbyte can successfully connect to your PostgreSQL database.
8. Select the tables or views you want to replicate and configure any necessary settings, such as the replication frequency and the replication method.
9. Save your configuration and start the replication process.
10. Monitor the replication process to ensure that it is running smoothly and troubleshoot any issues that arise.
1. First, navigate to the Airbyte website and log in to your account.
2. Once you are logged in, click on the "Destinations" tab on the left-hand side of the screen.
3. Scroll down until you find the Snowflake Data Cloud destination connector and click on it.
4. You will be prompted to enter your Snowflake account information, including your account name, username, and password.
5. After entering your account information, click on the "Test" button to ensure that the connection is successful.
6. If the test is successful, click on the "Save" button to save your Snowflake Data Cloud destination connector settings.
7. You can now use the Snowflake Data Cloud destination connector to transfer data from your Airbyte sources to your Snowflake account.
8. To set up a data transfer, navigate to the "Sources" tab on the left-hand side of the screen and select the source you want to transfer data from.
9. Click on the "Create New Connection" button and select the Snowflake Data Cloud destination connector as your destination.
10. Follow the prompts to set up your data transfer, including selecting the tables or data sources you want to transfer and setting up any necessary transformations or mappings.
11. Once you have set up your data transfer, click on the "Run" button to start the transfer process.
With Airbyte, creating data pipelines take minutes, and the data integration possibilities are endless. Airbyte supports the largest catalog of API tools, databases, and files, among other sources. Airbyte's connectors are open-source, so you can add any custom objects to the connector, or even build a new connector from scratch without any local dev environment or any data engineer within 10 minutes with the no-code connector builder.
We look forward to seeing you make use of it! We invite you to join the conversation on our community Slack Channel, or sign up for our newsletter. You should also check out other Airbyte tutorials, and Airbyte’s content hub!
What should you do next?
Hope you enjoyed the reading. Here are the 3 ways we can help you in your data journey:
Imagine you’re replicating a database containing millions of records for a fast-growing online store to a data warehouse for analysis. Your data replication solution has to be fast, efficient, and cost-effective.
In this tutorial, you’ll learn how to use Airbyte to transfer data from Postgres to Snowflake. You’ll also learn how to deal with edge cases, like partitioned tables and custom data types.
OLTP vs OLAP
Before you start replicating data, it's important to understand what OLTP and OLAP mean.
OLTP is a form of data storage and processing that executes a number of concurrent transactions. OLTP workloads are optimized for database inserts. Example application areas of OLTP include online banking, online shopping, order entry, and sending text messages. OLTP follows ACID properties. In basic terms, this means a transaction to the database either completes or fails.
OLAP is a form of data storage and processing that focuses on storing denormalized data to make querying easier when carrying out data analysis. OLAP systems are optimized for data analysis. This form of storage is used in data warehouses and data lakess. Data analysts can make queries on historical data from an OLAP environment to make better analytical decisions.
Snowflake is an OLAP is a cloud-based data warehouse. There is no hardware to configure, no software to install, and no maintenance required. Snowflake ships with an integrated data warehouse, as well as analytics tools for running OLAP workloads. These features make Snowflake a great choice for OLAP workloads.
There are two main sync modes to replicate data from OLTP to OLAP systems.
- A Full refresh replicates all of the data from the OLTP source to the OLAP destination.
- An Incremental sync replicates data at intervals from a OLTP source to a OLAP destination as new data is added.
Airbyte is a data integration tool that allows you to replicate data to data warehouses, lakes, and databases. Airbyte supports both full refresh and incremental syncs. In this tutorial, you will use full refresh mode to replicate data from PostgreSQL to Snowflake.
{{COMPONENT_CTA}}
How to load data from Postgres to Snowflake?
- Method 1: Connecting Postgres to Snowflake using Airbyte.
- Method 2: Connecting Postgres to Snowflake manually.
Method 1: Connecting Postgres to Snowflake using Airbyte
Sowflake supports a rich set of data types, including semi-structured data types such as JSON and XML. This means that most tables won’t require extra data type conversion. For tables with data types that Snowflake doesn’t support, Airbyte converts that data to an Airbyte-specific string type which ultimately ends up as a VARCHAR type on Snowflake.
Airbyte streams data from a source, builds it into temp files, and delivers the data to the destination. You can choose to use your local machine for storing temp files as part of the replication process, or use an external staging area like Amazon S3 or Google Cloud Storage (GCS). An external staging area offers faster replication speeds than your local machine.
Let's learn to seamlessly transfer data from Postgres to Snowflake in just three simple steps:
- Step 1a: Set up PostgreSQL
- Step 1b: Set Up Snowflake
- Step 2a: Set Up a PostgreSQL Source
- Step 2b: Set Up a Snowflake Destination
- Step 3: Setup a Connection from Postgres to Snowflake
Pre-requisites
Before you can start replicating data, you’ll need to ensure the following conditions are met:
- A staging area (Amazon S3 or Google Cloud Storage) for fast and efficient loading of data from PostgreSQL to Snowflake.
- An instance of PostgreSQL. Follow these instructions to install PostgreSQL if you don't have it locally.
- Docker and Docker Compose installed.
Now that you understand the prerequisites, you’ll go over the steps of replicating a simple PostgreSQL database to Snowflake with Airbyte.
Step 1a: Set up PostgreSQL
The first step is to create and populate a local PostgreSQL instance. You’ll need to clone the below GitHub repo. The repo contains all the code and files you’ll use throughout the tutorial
git clone https://github.com/vicradon/postgres-airbyte-snowflake-tut.git
After cloning the repo, navigate to the clone repo. Then create a new database.
createdb airbyte_tut -U postgres
After creating the database, run the command below to create a table, movies and populate it with data.
psql -d airbyte_tut -f movies.sql -U postgres
Now, enter the psql shell as user postgres, psql -U postgres, then connect to the airbyte_tut database, \c airbyte_tut. Next, create a dedicated read-only user that will have access to the relevant tables in the airbyte_tut database.
CREATE USER airbyte PASSWORD 'your_password_here';
Then, give the Airbyte user access to the airbtye_tut database.
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO airbyte;
Step 1b: Set Up Snowflake
When creating a Snowflake account you’ll need to pick a Snowflake edition and a cloud provider as part of the account creation process. Next, you’ll receive an email containing your login URL similar to https://id.region.cloud-provider.snowflakecomputing.com/console/login. Make sure to bookmark this URL for future logins.
After you create your username and password you will be redirected to the Snowflake dashboard after your account has been successfully created. The worksheet area will be the primary place you’ll run scripts for creating and modifying resources.
You need to create the destination database, user, role, and schema on Snowflake where the sync will occur. Airbyte already provides a script for running this automation. Paste the snippet below into your worksheet area on Snowflake. Make sure to change the airbyte_password variable to your preferred password before running the script.
-- set variables (these need to be uppercase)
set airbyte_role = 'AIRBYTE_ROLE';
set airbyte_username = 'AIRBYTE_USER';
set airbyte_warehouse = 'AIRBYTE_WAREHOUSE';
set airbyte_database = 'AIRBYTE_DATABASE';
set airbyte_schema = 'AIRBYTE_SCHEMA';
-- set user password
set airbyte_password ='YOUR_AIRBYTE_PASSWORD';
begin;
-- create Airbyte role
use role securityadmin;
create role if not exists identifier($airbyte_role);
grant role identifier($airbyte_role) to role SYSADMIN;
-- create Airbyte user
create user if not exists identifier($airbyte_username)
password = $airbyte_password
default_role = $airbyte_role
default_warehouse = $airbyte_warehouse;
grant role identifier($airbyte_role) to user identifier($airbyte_username);
-- change role to sysadmin for warehouse/database steps
use role sysadmin;
-- create Airbyte warehouse
create warehouse if not exists identifier($airbyte_warehouse)
warehouse_size = xsmall
warehouse_type = standard
auto_suspend = 60
auto_resume = true
initially_suspended = true;
-- create Airbyte database
create database if not exists identifier($airbyte_database);
-- grant Airbyte warehouse access
grant USAGE
on warehouse identifier($airbyte_warehouse)
to role identifier($airbyte_role);
-- grant Airbyte database access
grant OWNERSHIP
on database identifier($airbyte_database)
to role identifier($airbyte_role);
commit;
begin;
USE DATABASE identifier($airbyte_database);
-- create schema for Airbyte data
CREATE SCHEMA IF NOT EXISTS identifier($airbyte_schema);
commit;
begin;
-- grant Airbyte schema access
grant OWNERSHIP
on schema identifier($airbyte_schema)
to role identifier($airbyte_role);
commit;
Select All queries and run the script.
Step 2a: Set Up a PostgreSQL Source
Ensure you have Docker Compose on the environment where you want to run Airbyte. Now, run the commands below to initiate Airbyte setup:
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker-compose up
You can then open your app on http://localhost:8000. After inputting your email, you’ll be presented with an onboarding experience to create a first Airbyte connection, starting with the source.
Select Postgres as your source and set up the connection parameters as explained in the Postgres source documentation.
Step 2b: Set Up a Snowflake Destination
After inputting the name and destination type, you’ll be presented with several required fields that Airbyte will use to connect to the Snowflake destination. Fill in the form with the default details.
Host =YOUR_SNOWFLAKE_HOST (https://id.region.cloud-provider.snowflakecomputing.com)
Role = AIRBYTE_ROLE
Warehouse = AIRBYTE_WAREHOUSE
Database = AIRBYTE_DATABASE
Schema = AIRBYTE_SCHEMA
User = AIRBYTE_USER
Password = password
When you get to the loading method section, select your preferred loading method and input the required details. The default loading method, uses your local machine to generate the temp files. An example using AWS S3 is shown below.
Step 3: Setup a Connection from Postgres to Snowflake
After verifying your connection, you’ll be redirected to set it up. Choose a manual sync frequency and set the namespace configuration to Destination Connector Settings.
Ensure the `movies` table shows up. If it doesn’t, check that the table is present in the `airbyte_tut` database and refresh the schema. Complete the process by clicking Set up connection.
Then you can On your Airbyte dashboard, navigate to the Connections page and click on the first connection.
This will redirect you to a page with more details about that connection. Click Sync now to start synchronizing data.
This action will trigger a sync to Snowflake. When the sync completes, 18 records will be pushed to Snowflake.
Method 2: Connecting Postgres to Snowflake manually
Moving data from PostgreSQL to Snowflake without using third-party connectors or integrations involves several steps, including exporting data from PostgreSQL, preparing the data for Snowflake, and importing the data into Snowflake. Here's a detailed step-by-step guide:
Step 1: Export Data from PostgreSQL
1. Connect to PostgreSQL Database:
Use `psql` or any PostgreSQL client to connect to your database.
```sh
psql -h hostname -p port -U username -d databasename
```
2. Choose Data to Export:
Decide which tables or data you want to migrate to Snowflake.
3. Export Data to CSV:
Use the `COPY` command to export the data to a CSV file. For each table, run:
```sql
COPY (SELECT * FROM your_table) TO '/path/to/your_table.csv' WITH CSV HEADER;
```
Replace `your_table` with the name of your table and `/path/to/your_table.csv` with the path where you want to save the CSV file.
Step 2: Prepare Data for Snowflake
1. Check Data Types:
Review the exported data and ensure that data types are compatible with Snowflake. You may need to convert data types that are not directly compatible.
2. Clean Data:
If necessary, clean the data to remove any inconsistencies or to comply with Snowflake's data format requirements.
3. Split Large Files:
If you have very large CSV files, consider splitting them into smaller files to make the upload process more manageable and to avoid timeouts.
4. Compress Files:
Compress the CSV files using GZIP to save space and reduce upload time.
```sh
gzip /path/to/your_table.csv
```
Step 3: Upload Data to a Cloud Storage Service
1. Choose a Cloud Storage Service:
Snowflake supports Amazon S3, Google Cloud Storage, and Azure Blob Storage. Choose one that you have access to and that is supported in your Snowflake region.
2. Upload Files:
Use the cloud storage provider's tools or SDKs to upload your GZIP files.
Step 4: Create File Format in Snowflake
1. Log in to Snowflake:
Use the Snowflake web interface or the Snowflake client to log in to your account.
2. Create a File Format:
Define a file format that matches the format of your CSV files.
```sql
CREATE OR REPLACE FILE FORMAT my_csv_format
TYPE = 'CSV'
FIELD_DELIMITER = ','
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ('NULL', 'null')
COMPRESSION = 'GZIP';
```
Step 5: Create a Stage in Snowflake
1. Create a Stage:
Create a stage object that points to the location of your uploaded files in the cloud storage.
```sql
CREATE OR REPLACE STAGE my_stage
URL = 's3://mybucket/myfolder/'
FILE_FORMAT = my_csv_format;
```
Replace `s3://mybucket/myfolder/` with the path to your files in cloud storage.
Step 6: Copy Data into Snowflake
1. Create Tables in Snowflake:
Create tables in Snowflake that match the schema of your PostgreSQL tables.
2. Copy Data:
Use the `COPY INTO` command to load data from the stage into your Snowflake tables.
```sql
COPY INTO my_table
FROM @my_stage/your_table.csv.gz
FILE_FORMAT = (FORMAT_NAME = my_csv_format)
ON_ERROR = 'CONTINUE';
```
Repeat this step for each table you are importing.
Step 7: Validate Data
1. Check Row Counts:
Compare the row counts in Snowflake tables with the original PostgreSQL tables to ensure completeness.
2. Sample Data:
Query random samples of data in Snowflake and compare them with the original data in PostgreSQL for accuracy.
3. Check for Errors:
Review the load history and error logs in Snowflake to identify any issues that occurred during the data load.
Step 8: Finalize the Migration
1. Perform Additional Data Validation:
Depending on the complexity of your data, you may need to perform additional validation, such as checking data integrity, foreign key relationships, and indexes.
2. Adjust Queries and Stored Procedures:
Update any queries, views, or stored procedures to work with Snowflake's SQL syntax and features.
3. Test Applications:
If the data is used by applications, thoroughly test them to ensure they work correctly with the new data in Snowflake.
4. Schedule Incremental Updates:
If your data in PostgreSQL will continue to change, plan for incremental updates to keep the Snowflake data in sync.
By following these steps, you should be able to move data from PostgreSQL to Snowflake without using third-party connectors or integrations. Remember to always perform thorough testing at each step to ensure the integrity and accuracy of your data migration.
Handling Special Cases
Next we will cover three special cases to replicate partitioned tables, custom data types and working with geospatial data.
#1 Replicating PostgreSQL Partitioned Tables to Snowflake
Replicating tables with thousands of rows can be a slow process, so tables are often broken down into logical units called partitions. Using Airbyte, it’s possible to partition the tables by user-signup date, then push each of these partitions to Snowflake.
To begin, navigate to the tutorial folder and execute the table-partitioning.sql file. This loads data into a users table and creates two partitions. To replicate this partitioned table to Snowflake, you have two options:
- Export the table as a single view.
- Export each partition as separate files.
The first option involves selecting the users table only. The second involves selecting each partition.
It’s important to know that privileges applied to a user don’t affect partitioned tables after the privileges have been applied, you have to reassign privileges. Simply rerun the original command you used to assign privileges.
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO airbyte;
The advantage of exporting each partition is the speed of parallel execution is that each table will be replicated in its own stream. After the replication process is complete, the tables can then be combined into a single table on Snowflake.
To get each partitioned table into Snowflake, navigate to the Settings tab on the Airbyte-Snowflake connection. Click Update latest source schema to load the new tables.
Finally, save the changes to restart the migration.
#2 Replicating Custom Data Types from PostgreSQL to Snowflake
Custom data types (also called user-defined types) in PostgreSQL are abstractions of the basic data types. A custom data type could be a basic data type like VARCHAR, with the addition of a check to ensure that imputed data has no spaces. An example is a countries table defined by the following query.
CREATE TABLE countries (
id serial primary key,
code VARCHAR NOT NULL UNIQUE,
name VARCHAR,
CHECK (
code !~ 's'
)
)
You could define a custom data type for the code field in the table using the CREATE DOMAIN statement.
CREATE DOMAIN COUNTRY_CODE AS
VARCHAR NOT NULL UNIQUE CHECK (value !~ 's');
You can then use this data type in the normal way.
CREATE TABLE countries (
id serial primary key,
code COUNTRY_CODE,
name VARCHAR
)
Snowflake does not support user-defined types, so Airbyte converts all user-defined types to VARCHAR before replicating the data to Snowflake to handle this. To demonstrate the process of replicating custom data types to Snowflake, navigate to the tutorial folder and load a table with user-defined types.
i user-defined-types.sql
After the table is created and populated with data, update your schema with the latest table.
Save the configuration to restart the sync process. When the sync succeeds, check Snowflake for the new data.
You’ll notice that the data type of the code column is VARCHAR. This is due to the transformation that occurred on the data.
PostgreSQL has support for unstructured data through: JSON, JSONB, HStore and XML. Fortunately, Snowflake supports JSON and XML data types. The other two data types, JSONB and HStore, are converted to the VARCHAR data type.
#3 Replicating Geospatial Data from PostgreSQL to Snowflake
Most modern applications and systems utilize one form of spatial data or another. Spatial data is usually represented as using a coordinate system like the cartesian or spherical systems. For PostgreSQL databases, PostGIS is used to enable spatial data storage. It enables PostgreSQL databases to store and query location related data such as longitude and latitude through its custom data types.
PostGIS offers the following data types: Box2d, Box3d, Geometry, Geometry_dump, Geography.
The Geography data type is the most important data type offered by PostGIS, it provides a way to store longitude and latitude data on PostgreSQL. This is done using the Point type, which takes in longitude and latitude as inputs.
Snowflake supports the Geography data type, meaning no typecasting will occur during replication. The replication process is the same as before.
To replicate a table containing airports data to Snowflake, install the PostGIS extension for Windows or Ubuntu. Then, activate the extension.
CREATE EXTENSION postgis
Create a table containing airports and their coordinates by running the airports.sql file from the tutorial folder:
i airports.sql
Update your schema on Airbyte. Notice the geography_columns and geometry_columns tables that were added. These tables originated from the PostGIS extension.
Sync the data using Airbyte, then view the replicated data on Snowflake.
Geography data is converted to VARCHAR by default, but Snowflake has a SQL function, TO_GEOGRAPHY, for converting this VARCHAR into the Geography data type.
Use Cases To Transfer Data From Postgres To Snowflake
1. Data Warehousing and Analytics
Companies are moving data from Postgres to Snowflake to exploit the latter's data warehousing and analytics capabilities. Snowflake’s architecture is designed to support highly scalable data and complex queries, making it easy for BI and analytics.
Example: A retail company can move sales data from Postgres to Snowflake. Snowflake can then run complex analytical queries to help make informed decisions by aggregating sales performance across different regions and time zones.
2. Data Integration and ETL Processes
Another common use case for transferring data from Postgres to Snowflake is streamlining ETL processes. Snowflake makes integrating a wide array of data sources and preparing and transforming data for analysis much more accessible.
Example: A financial services firm can use ETL tools to automate the transfer of transaction data from Postgres to Snowflake, where it can be transformed and loaded for real-time financial reporting and compliance monitoring.
3. Backup and Disaster Recovery
For a robust backup and disaster recovery strategy, one can transfer data from Postgres to Snowflake. Snowflake's cloud infrastructure provides high availability and durability for the stored data, ensuring it's always there when you need it, and protecting it against loss or system failures.
Handling data types from Postgres to Snowflake
Data migration from Postgres to Snowflake requires precise identification of data types that are compatible and can be migrated from the former to the latter. Most of the data types can be migrated between the two platforms without issues, but some data types require conversion to maintain data consistency and fast query response. In the case of unsupported Postgres data types, the migration process can be made easier with the help of data integration tools like Airbyte, through features such as data type mapping and transformation. Postgres and Snowflake are two connectors that are available in Airbyte and allow users to integrate the two platforms with less effort.
Conclusion
Data warehouses like Snowflake are optimized for analytical workloads. For this reason, it's common to replicate data from transactional databases like PostgreSQL. To leran more, checkout this amazing tutorial about Postgres Replication.
Airbyte makes it easy to replicate data from PostgreSQL to Snowflake. However, it's important to know what to expect from some special cases, like replicating partitioned tables and how custom data types are mapped.
If you enjoyed using Airbyte, you might want to check our fully managed solution: Airbyte Cloud.
FAQs
1. How do you choose the sync mode for Postgres to Snowflake migration?
When choosing a sync mode to migrate your data from Postgres to Snowflake, it should be based on the freshness of your data and the frequency of changes. Incremental sync is efficient as it only transfers new or changed data and is ideal for high-frequency updates.
2. How can data security be ensured during the transfer?
Ensuring the security of your data during the transfer from Postgres to Snowflake is paramount. By encrypting data using SSL/TLS during transit, you can rest assured that your data is safe. Snowflake's robust access control and monitoring tools further enhance this security, providing you with peace of mind about the safety of your data.
3. Is PostgreSQL compatible with Snowflake?
PostgreSQL is fully compatible with Snowflake. This compatibility opens up a wide range of tools and connectors for data migration from Postgres to Snowflake, ensuring seamless integration.
4. Why is Snowflake better than Postgres?
Snowflake & PostgreSQL are designed to address different needs and use cases in data management. Snowflake is a cloud-based data warehousing solution aimed at fast, scalable analytics on large amounts of data. The architecture is such that it can effectively share resources, thus delivering excellent elasticity needed in big data environments. On the other hand, PostgreSQL is a versatile relational database system. It best serves in transactional workload use cases where data integrity and complex query capabilities are at the core of requirements. The choice between them depends on your specific data needs.
About the author
Osinachi Chukwujama is a web developer and technical writer. He enjoys building backend applications and utilizing cloud computing. He plays the organ and creates casual animations when he isn't coding.
What should you do next?
Hope you enjoyed the reading. Here are the 3 ways we can help you in your data journey:
Ready to get started?
Frequently Asked Questions
PostgreSQL gives access to a wide range of data types, including:
1. Numeric data types: This includes integers, floating-point numbers, and decimal numbers.
2. Character data types: This includes strings, text, and character arrays.
3. Date and time data types: This includes dates, times, and timestamps.
4. Boolean data types: This includes true/false values.
5. Network address data types: This includes IP addresses and MAC addresses.
6. Geometric data types: This includes points, lines, and polygons.
7. Array data types: This includes arrays of any of the above data types.
8. JSON and JSONB data types: This includes JSON objects and arrays.
9. XML data types: This includes XML documents.
10. Composite data types: This includes user-defined data types that can contain multiple fields of different data types.
Overall, PostgreSQL's API provides access to a wide range of data types, making it a versatile and powerful tool for data management and analysis.