FAQs
What is ETL?
ETL, an acronym for Extract, Transform, Load, is a vital data integration process. It involves extracting data from diverse sources, transforming it into a usable format, and loading it into a database, data warehouse or data lake. This process enables meaningful data analysis, enhancing business intelligence.
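To make the pattern concrete, here is a minimal, illustrative ETL sketch in Python: rows are extracted from a hypothetical CSV export, transformed in memory, and only then loaded into a warehouse table. The file name, table name, and the use of SQLite as a stand-in warehouse are assumptions for the example, not part of any specific Airbyte setup.

```python
import csv
import sqlite3  # stand-in for a real warehouse connection

# Extract: read raw rows from a hypothetical CSV export.
with open("orders.csv", newline="") as f:
    raw_rows = list(csv.DictReader(f))

# Transform: clean and reshape the data *before* loading.
transformed = [
    (row["order_id"], row["country"].strip().upper(), float(row["amount"]))
    for row in raw_rows
    if row["amount"]  # drop rows with missing amounts
]

# Load: write the already-transformed rows into the target table.
conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, country TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)
conn.commit()
conn.close()
```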
What is Amazon Redshift?
Amazon Redshift is a fully managed data warehouse service in the Amazon Web Services (AWS) cloud, designed for the storage and analysis of large-scale datasets. Redshift allows businesses to scale from a few hundred gigabytes to more than a petabyte (a million gigabytes) and uses machine learning techniques to analyze queries, offering businesses new insights from their data. Users can query and combine exabytes of data using standard SQL and easily save their query results to their S3 data lake.
What data can you extract from Amazon Redshift?
Amazon Redshift provides access to a wide range of data related to the Redshift cluster, including:
1. Cluster metadata: Information about the cluster, such as its configuration, status, and performance metrics.
2. Query execution data: Details about queries executed on the cluster, including query text, execution time, and resource usage.
3. Cluster events: Notifications about events that occur on the cluster, such as node failures or cluster scaling.
4. Cluster snapshots: Point-in-time backups of the cluster, including metadata and data files.
5. Cluster security: Information about the cluster's security configuration, including user accounts, permissions, and encryption settings.
6. Cluster logs: Detailed logs of cluster activity, including system events, query execution, and error messages.
7. Cluster performance metrics: Metrics related to the cluster's performance, such as CPU usage, disk I/O, and network traffic.
Overall, Redshift's API provides a comprehensive set of data that can be used to monitor and optimize the performance of Redshift clusters, as well as to troubleshoot issues and manage security.
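As an illustration, the sketch below uses the AWS SDK for Python (boto3) to pull a few of these data points programmatically. The cluster identifier, AWS credentials, and region are assumptions; substitute any cluster you have access to.

```python
import boto3

# Assumes AWS credentials and a default region are already configured.
redshift = boto3.client("redshift")

# 1. Cluster metadata: configuration and status.
cluster = redshift.describe_clusters(ClusterIdentifier="redshift-cluster")["Clusters"][0]
print(cluster["NodeType"], cluster["ClusterStatus"], cluster["Endpoint"])

# 3. Cluster events from the last 24 hours (node failures, scaling, etc.).
events = redshift.describe_events(
    SourceIdentifier="redshift-cluster", SourceType="cluster", Duration=1440
)["Events"]

# 4. Point-in-time snapshots of the cluster.
snapshots = redshift.describe_cluster_snapshots(ClusterIdentifier="redshift-cluster")["Snapshots"]

print(f"{len(events)} recent events, {len(snapshots)} snapshots")
```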
What is ELT?
ELT, standing for Extract, Load, Transform, is a modern take on the traditional ETL data integration process. In ELT, data is first extracted from various sources, loaded directly into a data warehouse, and then transformed. This approach enhances data processing speed, analytical flexibility and autonomy.
Difference between ETL and ELT?
ETL and ELT are critical data integration strategies with key differences. ETL (Extract, Transform, Load) transforms data before loading, ideal for structured data. In contrast, ELT (Extract, Load, Transform) loads data before transformation, perfect for processing large, diverse data sets in modern data warehouses. ELT is becoming the new standard as it offers a lot more flexibility and autonomy to data analysts.
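For contrast with the ETL sketch above, here is a minimal ELT illustration: the raw records are loaded into the warehouse untouched, and the transformation happens afterwards as SQL inside the warehouse. The table names and the use of SQLite as a stand-in warehouse are, again, assumptions for the example only.

```python
import csv
import sqlite3  # stand-in for a real warehouse

conn = sqlite3.connect("warehouse.db")

# Load: land the raw data as-is, no cleanup yet.
conn.execute("CREATE TABLE IF NOT EXISTS raw_orders (order_id TEXT, country TEXT, amount TEXT)")
with open("orders.csv", newline="") as f:
    rows = [(r["order_id"], r["country"], r["amount"]) for r in csv.DictReader(f)]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

# Transform: reshape inside the warehouse with SQL, after loading.
conn.execute("""
    CREATE TABLE IF NOT EXISTS orders_clean AS
    SELECT order_id,
           UPPER(TRIM(country)) AS country,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    WHERE amount IS NOT NULL AND amount != ''
""")
conn.commit()
conn.close()
```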
What is Snowflake Data Cloud?
Snowflake Data Cloud is a cloud data platform that provides a warehouse-as-a-service built specifically for the cloud. The Snowflake platform is designed to power many types of data workloads and offers secure, immediate, governed access to a comprehensive network of data. Snowflake's technology goes beyond the capabilities of an ordinary database, giving users all the functionality of database storage, query processing, and cloud services in one package.
To set up Amazon Redshift as a source connector in Airbyte:
1. Open the Airbyte UI and navigate to the "Sources" tab.
2. Click on the "Create a new connection" button and select "Redshift" as the source.
3. Enter a name for the connection and click "Next".
4. Enter the necessary credentials for your Redshift database, including the host, port, database name, username, and password.
5. Test the connection to ensure that the credentials are correct and the connection is successful (a quick way to verify the same credentials outside Airbyte is sketched after this list).
6. Select the tables or views that you want to replicate from Redshift to Airbyte.
7. Choose the replication method, either full or incremental, and set any necessary parameters.
8. Click "Create connection" to save the configuration and start the replication process.
9. Monitor the replication progress and troubleshoot any errors that may occur.
10. Once the replication is complete, you can use the data in Airbyte for further analysis or integration with other tools.
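Redshift speaks the PostgreSQL wire protocol, so if the connection test in step 5 fails, a short psycopg2 check like the sketch below can help confirm whether the credentials or the network rules are the problem. The host, database, user, and password shown are placeholders for your own cluster details.

```python
import psycopg2

# Placeholder connection details; use the values from your Redshift cluster dashboard.
conn = psycopg2.connect(
    host="redshift-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439,  # default Redshift port, the same one opened in the security group later
    dbname="dev",
    user="awsuser",
    password="********",
)

with conn.cursor() as cur:
    # List a few tables visible to this user to confirm the credentials work.
    cur.execute("SELECT table_schema, table_name FROM information_schema.tables LIMIT 10")
    for schema, table in cur.fetchall():
        print(f"{schema}.{table}")

conn.close()
```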
To set up Snowflake Data Cloud as a destination connector in Airbyte:
1. First, navigate to the Airbyte website and log in to your account.
2. Once you are logged in, click on the "Destinations" tab on the left-hand side of the screen.
3. Scroll down until you find the Snowflake Data Cloud destination connector and click on it.
4. You will be prompted to enter your Snowflake account information, including your account name, username, and password.
5. After entering your account information, click on the "Test" button to ensure that the connection is successful (a way to double-check the same credentials outside Airbyte is sketched after this list).
6. If the test is successful, click on the "Save" button to save your Snowflake Data Cloud destination connector settings.
7. You can now use the Snowflake Data Cloud destination connector to transfer data from your Airbyte sources to your Snowflake account.
8. To set up a data transfer, navigate to the "Sources" tab on the left-hand side of the screen and select the source you want to transfer data from.
9. Click on the "Create New Connection" button and select the Snowflake Data Cloud destination connector as your destination.
10. Follow the prompts to set up your data transfer, including selecting the tables or data sources you want to transfer and setting up any necessary transformations or mappings.
11. Once you have set up your data transfer, click on the "Run" button to start the transfer process.
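A minimal sketch for verifying the Snowflake credentials from step 5 outside Airbyte, using the snowflake-connector-python package; the account identifier, user, password, and object names are placeholders.

```python
import snowflake.connector

# Placeholder credentials; use your own account identifier, user, and password.
conn = snowflake.connector.connect(
    account="xy12345.us-east-1",
    user="AIRBYTE_USER",
    password="********",
    warehouse="COMPUTE_WH",
    database="SNOWFLAKE_DESTINATION",
)

cur = conn.cursor()
# Confirm which role, warehouse, and database the connection actually resolved to.
cur.execute("SELECT CURRENT_ROLE(), CURRENT_WAREHOUSE(), CURRENT_DATABASE()")
print(cur.fetchone())

cur.close()
conn.close()
```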
With Airbyte, creating data pipelines takes minutes, and the data integration possibilities are endless. Airbyte supports the largest catalog of API tools, databases, and files, among other sources. Airbyte's connectors are open source, so you can add custom objects to a connector, or even build a new connector from scratch with the no-code connector builder in about 10 minutes, without a local dev environment or a dedicated data engineer.
We look forward to seeing you make use of it! We invite you to join the conversation on our community Slack Channel, or sign up for our newsletter. You should also check out other Airbyte tutorials, and Airbyte’s content hub!
For decades, data warehousing solutions have been the backbone of enterprise reporting and business intelligence. But, in recent years, cloud-based data warehouses like Amazon Redshift and Snowflake have become extremely popular. So, why would someone want to migrate from one cloud-based data warehouse to another?
The answer is simple: more scale and flexibility. With Snowflake, users can quickly scale data and compute resources independently, with nodes added automatically. Using the VARIANT data type, Snowflake also supports storing richer data such as objects, arrays, and JSON. And, as Redshift users know, debugging Redshift is not always straightforward. Sometimes the trigger goes beyond feature differences: maybe your team simply knows Snowflake better than Redshift, or your organization wants to standardize on one particular technology.
This recipe will explain the steps you need to take to migrate from Redshift to Snowflake to maximize your business value using Airbyte.
Prerequisites
1. You'll need Airbyte to move your data. To deploy Airbyte, follow the simple instructions in our documentation.
2. Both Redshift and Snowflake are managed cloud services, and you'll need an account on each platform to get started.
3. When you create a data warehouse cluster with Redshift, it includes sample data by default. Our data is stored in the ‘redshift-cluster’ that we created, within the ‘dev’ database. The clusters you have created can be seen on the dashboard under ‘Amazon Redshift > Clusters’. To create a new cluster, click on the icon highlighted below and follow along.
4a) For the destination, you will need to create an empty database and warehouse within Snowflake to host your data. To do so, click on the Databases icon in the navigation bar and hit ‘Create...’.
Provide a name for your database. In the example below, we have named our database ‘Snowflake_Destination’.
4b) After setting up the database, click on the warehouses icon and ‘Create’ a warehouse named ‘COMPUTE_WH’. In our example, we have used an X-Small compute instance; however, you can scale up by using bigger instance types or adding more instances. This can be done with a few simple clicks and will be demonstrated in a later section to illustrate the business value of the migration. You now have all the prerequisites to start the migration (if you'd rather script this setup than click through the UI, see the sketch below).
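A minimal sketch of that scripted setup, using snowflake-connector-python with placeholder account and login values; the database and warehouse names match the ones used in this recipe.

```python
import snowflake.connector

# Placeholder credentials; any role allowed to create databases and warehouses will do.
conn = snowflake.connector.connect(
    account="xy12345.us-east-1", user="YOUR_USER", password="********", role="SYSADMIN"
)
cur = conn.cursor()

# Empty database that will receive the data migrated from Redshift.
cur.execute("CREATE DATABASE IF NOT EXISTS SNOWFLAKE_DESTINATION")

# X-Small warehouse, as in the example; it can be resized later if you need more compute.
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS COMPUTE_WH
    WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE
""")

cur.close()
conn.close()
```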
Step 1: Set up the Redshift Source in Airbyte
1a) Open Airbyte by navigating to http://localhost:8000 in your web browser.
1b) Proceed to set up the source by filling out the details as follows:
In the ‘Set up the source’ Airbyte screen, we have named the source 'Redshift_Source' in this example, but you can change it to something else if you like. We picked ‘Redshift’ as the source type. The remaining information can be obtained from the redshift-cluster dashboard, as illustrated in the highlighted parts of the screenshot below.
Apart from this, a few more settings need to be adjusted in your Redshift console. For example, if you are running Airbyte locally on your machine and connecting to Redshift, you will need to make your cluster publicly accessible over the internet so that Airbyte can connect to it. You can do so by opening the ‘Actions’ dropdown menu on the Redshift console, clicking ‘Modify publicly accessible setting’, and changing it to ‘Enabled’ (if it isn't already).
Next, go to the VPC service offered by AWS. You can do so by searching for it in the search bar, as shown below.
Then navigate to the ‘Security Groups’ tab on the left to land on a page like the one below, where you can create custom inbound and outbound rules. For example, you will need to create a custom TCP rule over port 5439 that allows incoming connections from your local IP address. After the rule is saved, it will be displayed on the dashboard, as highlighted in blue.
The custom inbound rule needs the specifications outlined in the image below. Note that the same step needs to be done for the outbound rules as well, and then your Redshift source will be ready to connect with Airbyte.
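These console steps can also be scripted. The sketch below uses boto3 to make the cluster publicly accessible and to open port 5439 for a single IP; the cluster identifier, security group ID, and IP address are placeholders you would replace with your own values.

```python
import boto3

redshift = boto3.client("redshift")
ec2 = boto3.client("ec2")

# Allow Airbyte to reach the cluster over the internet (skip if Airbyte runs inside the VPC).
redshift.modify_cluster(ClusterIdentifier="redshift-cluster", PubliclyAccessible=True)

# Open the default Redshift port for your machine's IP only.
rule = {
    "IpProtocol": "tcp",
    "FromPort": 5439,
    "ToPort": 5439,
    "IpRanges": [{"CidrIp": "203.0.113.10/32", "Description": "Airbyte host"}],
}
ec2.authorize_security_group_ingress(GroupId="sg-0123456789abcdef0", IpPermissions=[rule])
ec2.authorize_security_group_egress(GroupId="sg-0123456789abcdef0", IpPermissions=[rule])
```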
Step 2: Set up the Snowflake destination in Airbyte
After the Source is configured, proceed to the ‘Set up the destination’ page in Airbyte to configure the destination.
Similar to our previous step, the destination name is customizable. In this example, we have used “Snowflake_Destination”. Then, we picked the destination type as Snowflake. Refer to the image below for details on the other required fields.
The host value can be retrieved from your Snowflake dashboard.
Note: For this example, we are using the SYSADMIN role in Snowflake, but it is highly recommended to create a custom role in Snowflake with reduced privileges for use with Airbyte (a sketch of such a role is shown below). By default, Airbyte writes data to Snowflake's default schema, but you can put the data in another schema if you wish (like REDSHIFT_SCHEMA in our case). Having a separate schema is helpful when you want to keep the data you are migrating apart from existing data. Lastly, the username and password should match the credentials set up when the Snowflake account was created.
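As a sketch of what such a reduced-privilege setup could look like, the statements below create a dedicated role and user for Airbyte and grant only what is needed on the warehouse and destination database. The role, user, and password names are illustrative, not an official Airbyte requirement; adapt them to your own security policies.

```python
import snowflake.connector

# Run with a role that can create users/roles and manage grants; placeholder credentials.
conn = snowflake.connector.connect(
    account="xy12345.us-east-1", user="ADMIN_USER", password="********", role="SECURITYADMIN"
)
cur = conn.cursor()

for stmt in [
    "CREATE ROLE IF NOT EXISTS AIRBYTE_ROLE",
    "CREATE USER IF NOT EXISTS AIRBYTE_USER PASSWORD = 'choose-a-strong-password' "
    "DEFAULT_ROLE = AIRBYTE_ROLE DEFAULT_WAREHOUSE = COMPUTE_WH",
    "GRANT ROLE AIRBYTE_ROLE TO USER AIRBYTE_USER",
    # Only the objects Airbyte needs in order to write the migrated data.
    "GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE AIRBYTE_ROLE",
    "GRANT USAGE, CREATE SCHEMA ON DATABASE SNOWFLAKE_DESTINATION TO ROLE AIRBYTE_ROLE",
]:
    cur.execute(stmt)

cur.close()
conn.close()
```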
Hit the “Set up the destination” button, and if everything goes well, you should see a message telling you that all the connection tests have passed.
Step 3: Set up the connection from Redshift to Snowflake
Once the destination is set up, you will be directed to the ‘Set up the connection’ screen in Airbyte. In this step, you will notice that Airbyte has already detected the tables and schemas to migrate. By default, all tables are selected for migration; however, if you only want a subset of the data to be migrated, you can deselect the tables you wish to skip. In this step, you can also specify details such as the sync frequency between source and destination. There are several interval options, from every 5 minutes to every hour, as shown in the figure. The granularity of the sync operation can also be set by selecting the right sync mode for your use case. Read about sync modes in the Airbyte docs for more details.
After we have specified all our customizations, we can click on ‘Set up Connection’ to kick off the data migration from Redshift to Snowflake. At a glance, you'll be able to see the last sync status of your connection and when the previous sync happened.
That's how easy it is to move your data from Redshift to Snowflake using Airbyte. If you want to validate that the migration has happened, look at the destination database in Snowflake and you will notice that additional data tables are present.
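Beyond eyeballing the table list, a quick way to validate the migration is to compare row counts on both sides. The sketch below does that for one table; the table and schema names, plus both sets of credentials, are placeholders for your own environment.

```python
import psycopg2
import snowflake.connector

TABLE = "users"  # placeholder table that exists on both sides

# Row count in the Redshift source.
rs = psycopg2.connect(host="redshift-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
                      port=5439, dbname="dev", user="awsuser", password="********")
with rs.cursor() as cur:
    cur.execute(f"SELECT COUNT(*) FROM public.{TABLE}")
    source_count = cur.fetchone()[0]

# Row count in the Snowflake schema Airbyte wrote to.
sf = snowflake.connector.connect(account="xy12345.us-east-1", user="AIRBYTE_USER",
                                 password="********", warehouse="COMPUTE_WH",
                                 database="SNOWFLAKE_DESTINATION", schema="REDSHIFT_SCHEMA")
cur = sf.cursor()
cur.execute(f"SELECT COUNT(*) FROM {TABLE}")
dest_count = cur.fetchone()[0]

print(f"{TABLE}: source={source_count}, destination={dest_count}, match={source_count == dest_count}")
```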
Step 4: Evaluating the results
After successfully migrating the data, let's evaluate the key Snowflake features that inspired the migration. In Snowflake, compute and storage are decoupled, so storage capacity does not depend on the cluster size. Furthermore, Snowflake lets you scale your warehouse with three simple clicks, compared to Redshift's more cumbersome resizing process.
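The same scaling operation can also be expressed as a single SQL statement if you'd rather script it; a minimal sketch, assuming the COMPUTE_WH warehouse from earlier and placeholder credentials:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-1", user="YOUR_USER", password="********", role="SYSADMIN"
)
# Bump the warehouse one size up; storage is unaffected because it is decoupled from compute.
conn.cursor().execute("ALTER WAREHOUSE COMPUTE_WH SET WAREHOUSE_SIZE = 'SMALL'")
conn.close()
```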
As shown in the figure below, Airbyte loads JSON data into a Snowflake table using the VARIANT data type. Using Snowflake's powerful JSON querying tools, you can work with JSON data stored in a table alongside non-JSON data. Redshift, on the other hand, has limited support for semi-structured data types, often requiring multiple complex sub-table joins to produce a reporting view.
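As an illustration of those querying tools, the sketch below reads fields straight out of a VARIANT column with Snowflake's path syntax and flattens a nested array. The table name and the shape of the JSON are made up for the example; substitute the table and column Airbyte created for your data.

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-1", user="AIRBYTE_USER", password="********",
    warehouse="COMPUTE_WH", database="SNOWFLAKE_DESTINATION", schema="REDSHIFT_SCHEMA",
)
cur = conn.cursor()

# 'events' and its 'payload' VARIANT column are hypothetical; ':' walks into the JSON,
# '::' casts the extracted value, and FLATTEN turns a nested array into rows.
cur.execute("""
    SELECT payload:user.id::INT         AS user_id,
           payload:user.country::STRING AS country,
           item.value:sku::STRING       AS sku
    FROM events,
         LATERAL FLATTEN(input => payload:items) item
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)

conn.close()
```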
Wrapping up
To summarize, here is what we’ve done during this recipe:
1. Configured a Redshift Airbyte source
2. Configured a Snowflake Airbyte destination
3. Created an Airbyte connection that automatically migrates data from Redshift to Snowflake
4. Explored Snowflake's easy-to-use scaling feature and its support for JSON data via the VARIANT data type
We know that development and operations teams working on fast-moving projects with tight timelines need quick answers to their questions from developers who are actively developing Airbyte. They also want to share their learnings with experienced community members who have “been there and done that.”
Explore our detailed comparison of these two cloud data warehouse leaders, Snowflake vs. Redshift, to uncover their subtle differences and make informed decisions in the dynamic data management arena.
Join the conversation at Airbyte’s community Slack Channel to share your ideas with over 1000 data engineers and help make everyone’s project a success.
With Airbyte, the integration possibilities are endless, and we can't wait to see what you're going to build!