One of the most common requirements we see with our customers is data sovereignty. Your organization collects data in various operational systems from users around the world. But those users, and their data, are often governed by laws and regulations that dictate when that data can be collected, how it is to be handled, and who can access it.
At some point, you’ve probably found yourself in a situation where you had to comply with GDPR (European Union), PIPEDA (Canada), HIPAA (USA), APPs (Australia), PIPL (China), or a similar framework. Each of these frameworks regulates how data can move within and across borders, and noncompliance can have serious financial consequences. For example, noncompliance with GDPR can result in fines of up to 20 million euros or 4% of global annual turnover, whichever is higher. That’s not a good thing for anyone. Or perhaps you don’t have any compliance obligations, but you don’t want prying eyes in other countries to have access to your data.
Fortunately, Airbyte runs on Kubernetes, and Kubernetes is equipped to handle situations like this. In a basic deployment, Kubernetes assumes the control plane and data plane are in the same network boundary. However, separating the Kubernetes control plane from the data plane is a powerful architecture pattern when scaling your Airbyte deployment, ensuring that control operations are kept separate from customer data, and data can remain in the jurisdiction from which it originated. In this article, we’ll take a look at the difference between a control plane and a data plane and how you can implement this feature to help you manage your compliance obligations.
Control plane: the brain of the cluster

Think of the control plane like a brain. It manages the state of the cluster and coordinates work. In Airbyte, the control plane is responsible for Airbyte's user interface, APIs, Terraform provider, and orchestrating work.
Data plane: the muscle of the cluster

Think of the data plane like a muscle. It’s where containers are actually run. In Airbyte, the data plane initiates jobs, syncs data, completes jobs, and reports its status back to the control plane.
How these go together

In this remote worker node setup, each Airbyte data plane connects to a centralized control plane. However, data never passes through the control plane, ensuring your data is always regionalized.
The control plane cluster runs the airbyte-server, connector-builder-server, etc. It stores only metadata. The individual data planes run worker pods and connector pods, as instructed by the control plane.
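To make that division of labor concrete, here’s a toy Python model of it. It is not Airbyte’s implementation: the class names, the per-region queue, and the polling style are all assumptions made for illustration. What it shows is the property described above: the control plane only ever holds job metadata and statuses, while the records a job produces stay inside the data plane that ran it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    region: str        # region whose data plane must run this job
    connection: str    # metadata only: which sync to run
    status: str = "pending"


@dataclass
class ControlPlane:
    """Hypothetical control plane: stores metadata, never synced records."""
    queues: dict = field(default_factory=dict)    # region -> deque of Job
    statuses: dict = field(default_factory=dict)  # job_id -> status string

    def schedule(self, job: Job) -> None:
        self.queues.setdefault(job.region, deque()).append(job)
        self.statuses[job.job_id] = job.status

    def next_job(self, region: str):
        # Hand out work only to the data plane for the matching region.
        queue = self.queues.get(region)
        return queue.popleft() if queue else None

    def report(self, job_id: int, status: str) -> None:
        self.statuses[job_id] = status


@dataclass
class DataPlane:
    """Hypothetical data plane: runs jobs and keeps their records local."""
    region: str
    local_records: list = field(default_factory=list)

    def run_once(self, control: ControlPlane) -> bool:
        job = control.next_job(self.region)
        if job is None:
            return False
        # Stand-in for a real sync. The records never leave this object.
        self.local_records.append(f"records for {job.connection}")
        # Only the status travels back to the control plane.
        control.report(job.job_id, "succeeded")
        return True


if __name__ == "__main__":
    control = ControlPlane()
    control.schedule(Job(job_id=1, region="eu", connection="postgres-to-warehouse"))
    eu, us = DataPlane("eu"), DataPlane("us")
    print(us.run_once(control), eu.run_once(control))
    print(control.statuses, us.local_records, eu.local_records)
```

In this sketch, the job scheduled for the "eu" region is invisible to the "us" data plane, and the control plane ends up knowing only that job 1 succeeded.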
Setting up independent data planes in Airbyte

Each Airbyte workspace runs in a single region. By default, that’s going to be the same as your control plane, but it’s not particularly difficult to configure new data planes in Airbyte. There are a few steps you need to take to ensure your data plane can authenticate with your control plane.
1. Create a region in Airbyte. Regions are objects that contain data planes, and which you associate with workspaces. You create a region with a single API request to /api/public/v1/regions.

2. Create a data plane within that region. Again, it only takes a single API request to /api/public/v1/dataplanes. Airbyte responds with a client ID and client secret. Make note of these values, because you’ll need them later.

3. Associate your region with one of your workspaces. Once again, you can accomplish this with a single API request to /api/public/v1/workspaces, or you can set this from the workspace settings in Airbyte’s UI.

4. Configure Kubernetes secrets. Your data plane relies on Kubernetes secrets to identify itself to the control plane. Here’s a lightweight example file you can create.

    apiVersion: v1
    kind: Secret
    metadata:
      name: airbyte-config-secrets
    type: Opaque
    # stringData accepts plain-text values; the data field requires base64-encoded values
    stringData:
      # Enterprise License Key
      license-key: your-airbyte-license-key
      # Insert the data plane credentials received in step 2
      DATA_PLANE_CLIENT_ID: your-data-plane-client-id
      DATA_PLANE_CLIENT_SECRET: your-data-plane-client-secret

5. Create your deployment values.yaml file.
    airbyteUrl: https://airbyte.example.com # Base URL for the control plane so Airbyte knows where to authenticate
    edition: enterprise # Required for Self-Managed Enterprise

    dataPlane:
      # Used to render the data plane creds secret into the Helm chart.
      secretName: airbyte-config-secrets
      id: "preview-data-plane"

      # Describe secret name and key where each of the client ID and secret are stored
      clientIdSecretName: airbyte-config-secrets
      clientIdSecretKey: "DATA_PLANE_CLIENT_ID"
      clientSecretSecretName: airbyte-config-secrets
      clientSecretSecretKey: "DATA_PLANE_CLIENT_SECRET"

    # Describe the secret name and key where the Airbyte license key is found
    enterprise:
      secretName: airbyte-config-secrets
      licenseKeySecretKey: license-key # Must match the license key name in the secret from step 4

6. Deploy your data plane to your Kubernetes cluster with Helm.
    $ kubectl create namespace airbyte-dataplane
    $ kubectl apply --namespace airbyte-dataplane -f secrets.yaml
    $ helm repo add airbyte https://airbytehq.github.io/helm-charts
    $ helm repo update
    $ helm install airbyte-enterprise airbyte/airbyte-data-plane \
        --version AIRBYTE_VERSION \
        --namespace airbyte-dataplane \
        --create-namespace \
        --values values.yaml

Although this example is simplified and idealized, as you can see, Airbyte is well-equipped to handle multiple data planes. Whether for operational security or compliance, we’re ready to meet your data sovereignty needs.
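If you’d rather script steps 1 through 4 than issue each request by hand, here’s a minimal Python sketch of what that could look like. Only the three endpoint paths, the secret name, and the secret’s key names come from this article. Everything else is an assumption for illustration: the helper names, bearer-token authentication, the JSON body fields (name, organizationId, regionId), and the PATCH method with a workspace ID in the path. Check Airbyte’s API reference for the real request shapes. The requests are built but never sent, and the Secret helper base64-encodes its values, which is what Kubernetes expects under a Secret’s data field.

```python
import base64
import json
import urllib.request

# Control plane URL, taken from the values.yaml example above.
BASE_URL = "https://airbyte.example.com"


def build_request(method: str, path: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request. Bearer auth is an assumption."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def create_region_request(token: str, name: str, organization_id: str):
    # Step 1. The body field names are assumptions.
    return build_request("POST", "/api/public/v1/regions", token,
                         {"name": name, "organizationId": organization_id})


def create_data_plane_request(token: str, name: str, region_id: str):
    # Step 2. The body field names are assumptions.
    return build_request("POST", "/api/public/v1/dataplanes", token,
                         {"name": name, "regionId": region_id})


def assign_region_request(token: str, workspace_id: str, region_id: str):
    # Step 3. PATCH, the ID suffix, and the field name are assumptions.
    return build_request("PATCH", f"/api/public/v1/workspaces/{workspace_id}", token,
                         {"regionId": region_id})


def secret_manifest(client_id: str, client_secret: str, license_key: str,
                    name: str = "airbyte-config-secrets") -> dict:
    """Step 4. Render the Secret with base64-encoded values under data."""
    def b64(value: str) -> str:
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            "license-key": b64(license_key),
            "DATA_PLANE_CLIENT_ID": b64(client_id),
            "DATA_PLANE_CLIENT_SECRET": b64(client_secret),
        },
    }


if __name__ == "__main__":
    request = create_region_request("my-token", "eu-west", "my-organization-id")
    print(request.get_method(), request.full_url)
    print(json.dumps(secret_manifest("my-client-id", "my-client-secret", "my-license"), indent=2))
```

To actually send one of these requests, pass it to urllib.request.urlopen. Because kubectl accepts JSON manifests as well as YAML, you can save the printed Secret to a file and apply it with kubectl apply.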
Separate control and data planes are available now in the Enterprise edition of Airbyte.