Learn how to replicate data from a Microsoft SQL Server database to a Redshift data warehouse.
Replicating data from Microsoft SQL Server (MSSQL) to Amazon Redshift with Airbyte can make analytical queries significantly faster. Data analytics needs dedicated compute resources: when processing terabytes to petabytes of data for analytics, SQL Server can be slow and expensive because of its per-socket, CPU-based licensing model. Moving this workload to Redshift can significantly reduce both processing time and running costs.
Airbyte Cloud allows you to seamlessly move data between any data source and destination, including popular databases, data warehouses, and business applications. Airbyte's database replication to data warehouses uses change data capture (CDC) with checkpointing and scheduling, so syncs simply pick up where they left off.
This tutorial will take you through the critical steps to set up Airbyte Cloud and replicate data from your SQL Server instance running in Amazon RDS to Redshift.
Below are the prerequisite tools you’ll need to get started replicating your SQL Server data to Redshift.
In this example, we will configure an existing SQL Server instance hosted on AWS as our Airbyte Cloud Source. Log into AWS and go to Amazon RDS > <YOUR_DATABASE> > Connectivity and Security to make note of your endpoint and port that will be required to configure Airbyte.
Note: Ensure that the DB instance's public accessibility is set to Yes to allow external connections. To modify the Public access setting, see Modifying an Amazon RDS DB instance.
Next, create a security group for the instance by going to EC2 > Security Groups > Create security group. Give the group a name and description. In the Inbound Rules section, select MSSQL as the type, enter ‘220.127.116.11/32’ (Airbyte Cloud IP) in the Source section, and add the rule. Then add another rule with the MSSQL type and choose ‘My IP’ from the drop-down; this allows you to connect to your instance from your local machine.
Once the security group has been updated, you can add some data to your instance using any database management tool you prefer. Now that your SQL Server instance accepts connections from Airbyte Cloud, you can configure the Airbyte source. Log in to Airbyte Cloud, create a new connection, select Microsoft SQL Server as the source type, and give it a name. You can find more information about the SQL Server Airbyte connector at the link here. Enter the host, the port, the database name (airbyte in this case), and the user and password you used when setting up SQL Server. Once configured, click Set up source and move on to the destination.
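As a sketch of what the sample data might look like, here is a minimal T-SQL snippet you could run against the airbyte database. The table and column names are illustrative assumptions; the tutorial does not prescribe a schema beyond the customers table used later.

```sql
-- Hypothetical sample schema for the airbyte database (T-SQL).
-- Table and column names are assumptions for illustration.
CREATE TABLE customers (
    id INT IDENTITY(1,1) PRIMARY KEY,
    name NVARCHAR(100) NOT NULL,
    email NVARCHAR(255),
    created_at DATETIME2 DEFAULT SYSDATETIME()
);

INSERT INTO customers (name, email) VALUES
    ('Jane Doe', 'jane@example.com'),
    ('John Smith', 'john@example.com');
```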
To set up Redshift as your Airbyte destination you will have to allow connections from Airbyte Cloud to your cluster. Login to AWS and go to Amazon Redshift > Clusters > <YOUR_CLUSTER> and make a note of the endpoint for your cluster.
Your endpoint will be in the format <YOUR_REDSHIFT_HOST>:<YOUR_REDSHIFT_PORT>/<YOUR_DATABASE_NAME>. Make a note of these three values, which are required to configure Airbyte Cloud.
Next, you will also need to edit the inbound rules for the security group for your Redshift cluster. You can find the security group for your Redshift cluster in the Network and security settings section.
Once you know which security group is being used by Redshift, go back to EC2 > Security Groups > <YOUR SECURITY GROUP>. In the Inbound Rules section, select Redshift as the type, enter ‘18.104.22.168/32’ (Airbyte Cloud IP) in the Source section, and add the rule.
Next, from the Redshift cluster page, go to Actions > Modify publicly accessible settings.
In the pop-up, select the Enable option and save changes. You can confirm that your cluster is publicly accessible by going to Properties > Network and security settings which should now be listed as Enabled.
To set up your destination, select Redshift as the destination type and give it a name. You can find more information about the Redshift Airbyte connector at the link here. Enter the host, the port, and the database name (dev in this case), along with the user and password you used when creating your Redshift cluster, then click Set up destination.
Once the source and destination are configured, you can access your connection settings. You can set the Replication frequency depending on how often you want Airbyte to replicate your data.
Next, choose which tables to sync and set the sync mode for each table individually. This example selects the customers table and sets the sync mode to Incremental | Append.
You can also choose between using Raw Data or Basic Normalization. We will select Basic Normalization to set up the connection in this example. You can also choose to apply custom data transformations, but we will keep it simple by skipping the data transformation part in this example.
Once configured, save the connection and select Sync now to run your first sync. When the sync completes, you should see how many rows were replicated (849 in this case).
To view the replicated data, go to the Redshift Query editor for your cluster and select your database to view the tables created by Airbyte Cloud.
Clicking on a particular table will show you the schema generated by Airbyte for your data.
You can run the following query from the query pane to view your data.
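The original query was not preserved here, but a minimal example would be a simple SELECT. This assumes the customers table from earlier lands under the default schema; Airbyte may place it under a different schema in your cluster.

```sql
-- Preview the replicated data (table/schema name is an assumption).
SELECT * FROM customers LIMIT 10;
```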
You can also view the row count for your table by running the following query.
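Again, the exact query was not preserved; a row count in Redshift would look like the following, assuming the same customers table name.

```sql
-- Count the replicated rows (should match the rows reported by the sync).
SELECT COUNT(*) FROM customers;
```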
To test out the incremental sync, add some more rows to your SQL Server table. In this example, 10 more rows were added. Once you add the data, run another sync. The 10 newly added rows are replicated to Redshift.
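For example, you could insert the extra rows on the SQL Server side with a statement like the one below (values and column names are illustrative assumptions, matching the hypothetical customers schema used above):

```sql
-- Add new rows on the SQL Server source to exercise the incremental sync.
INSERT INTO customers (name, email) VALUES
    ('New Customer 1', 'new1@example.com'),
    ('New Customer 2', 'new2@example.com');
```

After the next sync, only these newly inserted rows are appended to the Redshift table, since the sync mode is Incremental | Append.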
To summarize, we looked at how to replicate data from SQL Server to Redshift using Airbyte Cloud by:
We know that development and operations teams working on fast-moving projects with tight timelines need quick answers to their questions from developers actively developing Airbyte. They also want to share their learnings with experienced community members who have "been there and done that." Join the conversation at Airbyte's community Slack Channel to share your ideas with over 1000 data engineers and help make everyone's project successful.