Data is a crucial aspect of business success in today's technology landscape. For leveraging data to grow, you must collect and transform it into a form that fits for analysis. That is where processes like Change Data Capture (CDC) come into play. It allows you to synchronize data in real time between two operational systems. CDC is very common in relational databases, and in this article, you will learn about this feature from the standpoint of a popular SQL database, SQL Server. So, let's get started and learn how to implement CDC on Microsoft SQL Server.
What is Change Data Capture?
Change Data Capture (CDC) is the process of tracking and identifying changes to data in a storage system such as a database or data warehouse. It provides near real-time or real-time data movement by processing and moving data continuously as new changes are made. In practice, CDC is often used to replicate data between two data storage systems. It automatically synchronizes databases as soon as the source data changes. As a result, CDC is most frequently used in an Extraction, Loading, and Transformation (ELT) application. ELT applications like Airbyte allow you to incrementally load change data with CDC from the operational system to the data warehouse.
SQL Server Overview
SQL Server is a popular relational database management system (RDBMS) created in 1989. The database stores data in tabular format within rows and columns and supports SQL for querying and manipulating data. SQL Server gives advanced enterprise features to ensure high availability and scalability options that reduce downtime for critical applications. These features include database mirroring, failover clustering, and always-on-availability groups.
The database also provides a wide range of tools dedicated to business intelligence and reporting, such as SSRS, SSIS, and SSAS. Major organizations that use SQL Server in their tech stack include Accenture, Dell, Cognizant, and Microsoft.
What is CDC in SQL Server?
CDC in SQL Server tracks changes in the database and delivers the changes to downstream systems. To be more specific, it is a process that records insert, update, and delete operations applied to a user table and then delivers the changes to other systems. With the CDC, only the changed data gets processed to downstream applications within a specified time rather than whole tables to maintain efficiency.
In SQL Server, the data flows from the transaction log (which is the source). As the inserts, deletes, and updates are applied to source tables, entries that describe the changes are added to the log. Then, from the log, it reads data and adds information about the changes to the tables you want to update. Functions are provided to enumerate the changes appearing in associated tables over a particular range, returning the information as a filtered result set.
The CDC process has a small overhead on the database resources and does not impact the tables.
How to Implement CDC in SQL Server?
Enabling change data capture is straightforward if you have little understanding of SQL language. You just have to track changes by using SQL queries. Here's a detailed guide:
Prerequisites
- SQL Server Server.
- Windows Powershell.
- Having "sysadmin" privileges in database.
Step 1: Create or Select a Database And Table
Make sure SQL Server Server is installed on your system, then open the terminal and type in the following code:
The above command will create a database with the name of database_name and enable admin access. Then, it will create a table with table_name and the other fields.
Step 2: Enable CDC in Database
To enable CDC in the entire database, use the following stored procedure query:
This will enable CDC on the database_name. However, if you have multiple tables and you want to enable CDC on specific tables, you can use the following code:
In the above code, fill in the field accordingly, including schema_name, table_name, role_name (as your database role), and capture_instance_name.
Step 3: Insert & Update Values into Table
Now, let’s insert some values into the table and then update it to check if the CDC works accurately. Here is a query to insert values into the table:
Now, let’s make a few changes to the table data to verify if SQL Server CDC. Below are some query examples:
Now you can look at the CDC CT (Change Table) and identify the updates on the table data in SQL Server. Done.
Disable CDC In SQL Server
After enabling CDC for any reason, if you want to disable the feature, you can easily do that using simple SQL commands.
To disable it on the database level, you can use the following command:
Replace database_name with the name of the database.
To disable CDC in its enabled tables one at a time. You can again use the code below:
SQL Server CDC Using Airbyte
While manually performing CDC in SQL Server gives you flexibility, it requires technical expertise and can be complex to implement. That’s where tools like Airbyte help you out.
Airbyte is a data integration tool that provides a dedicated CDC connector for SQL Server Server. It uses SQL Server CDC feature for you to capture row-level INSERT, UPDATE, and DELETE operations that occur on cdc-enabled tables.
To automate SQL Server CDC, you can configure it as a source and synchronize it for incremental loads with the destination of your choice. Here’s a step-by-step guide on how to perform this task:
Step 1: Configure SQL Server As Source
- Sign up or log in to the Airbyte cloud.
- On the home page, click on the Sources section from the left navigation menu.
- In the Sources section, you’ll see a search bar type in SQL Server there. A connector card will appear with the SQL Server icon on it. Click on the card.
- You’ll be directed to the Create a Source page. There, fill in the required details, including Host, Port, Database, Username, and Password. In the Update Method, select Read Changes using the CDC option to enable the feature.
- Lastly, click on Set up Source.
Step 2: Configure the Destination of your Choice
- After setting up the CDC source connector, click on the Destinations tab just below the Sources menu.
- There, select the search bar from the top and type in the destination of your choice to synchronize data.
- After selecting the destination, click on the connector card.
- On the Create a destination page, fill in the required details.
- Click on Set up destination.
Step 3: Connect Source And Destination
- Go to the main dashboard of Airbyte Cloud. Click on Connections > Create a new Connection.
- Select SQL Server (Step 1) as source and destination (Step 2) as destination to establish a connection between them.
- Enter a unique Connection Name and configure other details, such as Replication frequency, according to your requirements. You can choose to tweak other configurations, including Destination namespace, Schedule type, Detect and propagate schema changes, and Destination Stream Prefix.
- Below is an Activate the streams you want to sync section on the connection page. You can select which streams you want to sync, and they will be loaded in the destination. Learn more about sync modes from here.
- Lastly, click the Set up connection option. Once the setup is complete, you must run sync by clicking the Sync now button.
That’s it. The above steps will establish a connection between the SQL Server and the storage system of your choice.
Conclusion
You have learned about change data capture and how to implement it in SQL Server. It involves a series of steps, from choosing a database, creating a table, enabling CDC, and verifying it. However, once it is set up, it can be very beneficial for your data management practices. By following the steps mentioned above, you can easily set up CDC in SQL Server.
If you are still finding it difficult to implement the feature, you can use Airbyte to automate the change data capture data pipeline for you. Using the pre-built connector of SQL Server from the Airbyte cloud, you can synchronize data from this database to your destination with a few clicks.
💡Related Read: Mongodb CDC