Bring Your Own Infra

Davin Chia
April 13, 2023

The second and more sophisticated approach is “Upgrade all infrastructure and processes to adhere to the new regulations” and do a lift-and-shift to the EU. For brevity, we skip the US and EU privacy law analysis and jump straight to a general data-aware strategy: move all data processing to the region with stricter data-privacy laws. This is the EU in this scenario. Compliance follows because all data is then processed under the stricter rules.

The only wrinkle in this plan: migrations are often among the hardest projects a team can take on. Further, this still isn’t a long-term solution. What happens if GlobaMart starts directly processing payments and needs to become PCI compliant? Will we upgrade all infrastructure and processes once more? Overall, we would exert a lot of effort and end up right where we started. Surely there must be a better way?

Control Data Plane Split: The Key to Flexibility

A far more sophisticated solution is the Control Data Plane Split.

This tried-and-tested approach involves separating the architecture into two distinct components:

  1. Control Plane: The ‘Brain’. This component houses all the business logic complexity and serves as the central configuration location, allowing easy and efficient development and operation.
  2. Data Plane: The ‘Hands’. Simple workers perform atomic operations. These workers can be moved anywhere, enabling businesses to bring data processing to the source and maintain compliance with regional regulations.

Let’s use Airbyte, a Data Integration Platform, to illustrate how this works in practice.

This diagram illustrates Airbyte Cloud’s architecture with the Control Data Plane split.

Some details to note:

  • The Control Plane - Airbyte’s Brain - is on the left. This contains all the complex business logic and state, such as scheduling, configuration, permissioning and so on.
  • The Data Plane - Airbyte’s Hands - is on the right. We see two planes: the first is in Paris, while the second is a temporary plane for development.
  • The control and data planes communicate asynchronously via queues specific to each data plane. The control plane’s Routing Service places jobs in the relevant queue. The data planes constantly poll their own queues for work, execute jobs as soon as they become aware of them, and update the control plane afterwards.

Thus, Airbyte’s Control Data Plane split is a pull-based model with queues. There are many flavors of splits, and analyzing tradeoffs is outside the scope of this blog post.
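To make the pull-based model concrete, here is a minimal in-process sketch in Python. It is not Airbyte's implementation: the class names, the `data_plane` routing key and the status strings are all assumptions made for illustration, and an in-memory `queue.Queue` stands in for whatever queueing system a real deployment would use.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    data_plane: str  # hypothetical routing key, e.g. "eu-paris"


@dataclass
class ControlPlane:
    """The 'Brain': owns state and decides which queue a job goes to."""
    queues: dict = field(default_factory=dict)    # one queue per data plane
    statuses: dict = field(default_factory=dict)  # job_id -> latest status

    def register_data_plane(self, name: str) -> queue.Queue:
        self.queues[name] = queue.Queue()
        return self.queues[name]

    def route(self, job: Job) -> None:
        # Stand-in for the Routing Service: enqueue on the target plane's queue.
        self.statuses[job.job_id] = "queued"
        self.queues[job.data_plane].put(job)

    def report(self, job_id: str, status: str) -> None:
        self.statuses[job_id] = status


@dataclass
class DataPlane:
    """The 'Hands': polls its own queue, runs the job, reports back."""
    name: str
    inbox: queue.Queue
    control_plane: ControlPlane

    def poll_once(self) -> bool:
        try:
            job = self.inbox.get_nowait()
        except queue.Empty:
            return False  # nothing to do on this poll
        # Placeholder for the real work (for example, running a sync).
        self.control_plane.report(job.job_id, f"succeeded on {self.name}")
        return True


control = ControlPlane()
paris = DataPlane("eu-paris", control.register_data_plane("eu-paris"), control)
dev = DataPlane("dev", control.register_data_plane("dev"), control)

control.route(Job("sync-1", "eu-paris"))
dev.poll_once()    # finds nothing: the job is not on the dev queue
paris.poll_once()  # picks up sync-1 and reports back to the control plane
```

Note that the control plane never calls into a data plane. Each data plane initiates every interaction, which is what lets it run anywhere it can reach its queue.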

It is immediately obvious that an architecture like this easily solves GlobaMart’s issue - stand up a data plane in the EU region and configure jobs to be scheduled in the new region. Immediate business value with little to no complexity!
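As a rough illustration of why this is a configuration change rather than a migration, the Python sketch below models routing as a lookup table. The workspace name, the region labels and the `resolve_data_plane` helper are hypothetical and do not reflect Airbyte's actual configuration format.

```python
# Hypothetical default: jobs run in the original region unless told otherwise.
DEFAULT_DATA_PLANE = "us"


def resolve_data_plane(workspace: str, routing: dict) -> str:
    """Return the data plane that should run jobs for this workspace."""
    return routing.get(workspace, DEFAULT_DATA_PLANE)


routing = {}
before = resolve_data_plane("globamart-eu", routing)

# After standing up an EU data plane, one routing entry redirects the jobs.
routing["globamart-eu"] = "eu"
after = resolve_data_plane("globamart-eu", routing)
```

Under these assumptions, nothing about the jobs themselves changes. Only the control plane's record of where they should run does.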

Careful readers will notice one interesting detail in the above diagram - the Control and Data planes are in different Clouds! Indeed, this deployment flexibility is another benefit of this specific flavor of Control Data Plane split, and it is due to the Airbyte Data Plane’s minimal infrastructure requirements.

An Airbyte Data Plane only has two infrastructure requirements:

  1. The ability to run Docker containers.
    Docker is a commonly accepted infrastructure layer and a widely understood commodity technology. All public Cloud providers have numerous Docker offerings with various tuning levers.
  2. The ability to make outbound network connections.
    No firewall rule changes are needed, so security departments can rest easy.

This minimal set of requirements explains how the Control and Data planes can exist in different Clouds, as the above diagram shows. This is how Airbyte Cloud runs today. We are available in GCP and AWS, and going to Azure (simply spinning up another data plane) is a question of when, not how.

By separating the “Brain” from the “Hands”, the Control Data plane split minimizes operational complexity while ensuring businesses can scale their data operations to meet changing compliance requirements.

Airbyte's BYOI Solution

So, what does this mean for Airbyte’s users?

Cloud Users interested in using Airbyte Cloud while maintaining Data Control are now able to deploy an Airbyte Data Plane into their infrastructure. Airbyte will work with you to do the hard work of operating the Data Plane in your infrastructure.

OSS Users who want to continue using Airbyte OSS across different regions without the hassle of maintaining multiple Airbyte instances are now able to deploy multiple Airbyte Data Planes within their own infrastructure. This is a premium OSS offering, and Airbyte will work with you to help set up the initial data plane and provide continuous operational advice and support as you scale your Airbyte usage.

Both of these are currently in Alpha, so please reach out here if you are interested! Please reach out on Slack for any questions or comments.

Thank you for using Airbyte and being patient as we work to make Airbyte better!
