Use Octavia CLI to import, edit, and apply Airbyte application configurations to replicate data from Postgres to BigQuery.
Note: This tutorial uses Octavia CLI, an unofficial alpha CLI that will not be maintained. Since this tutorial was published, Airbyte has released an official Terraform Provider, which we recommend using instead of the CLI.
Most Airbyte users get started with the Airbyte web app UI to configure sources, destinations, and connections to replicate data. The latest configuration is stored in a Postgres database installed with Airbyte.
Configuring resources in the UI is easy, but it becomes error-prone and inefficient as you onboard more users, add more connections, and manage several Airbyte instances. For example, when relying solely on the UI, data engineers cannot review configuration changes made by a peer before applying them. Configuring a single Airbyte resource in the UI is fast, but replicating changes across local, staging, and production environments is better handled with code.
To serve power users, Airbyte provides Configuration as Code (CaC) in YAML and a command line interface (Octavia CLI) to manage resource configurations. Octavia CLI uses the Airbyte API under the hood but provides a better developer experience through a command line interface (CLI) than directly interacting with the API.
Octavia CLI provides commands to import, edit, and apply Airbyte resource configurations: sources, destinations, and connections. Note that Octavia manages Airbyte Configuration as Code (CaC), not Airbyte Infrastructure as Code (IaC), so Octavia CLI cannot provision an Airbyte instance. Here are the most popular Octavia CLI use cases.
When Airbyte configurations are edited frequently, you may want to keep a history of the source, destination, and connection configurations in a Git repository.
When several users edit Airbyte configurations, you may want to review and test configuration changes in a Pull Request before applying them to production.
When you have to manage multiple Airbyte instances, such as local, testing, staging, and production, you may want to copy configuration files between them, either manually or as part of your CI/CD pipeline.
When you have many connections, or your connections contain many Airbyte streams (database tables or API endpoints), you may want to generate YAML configurations programmatically.
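Generating configuration programmatically mostly means producing the repetitive per-stream settings from a list of table names. The sketch below is illustrative only: the key names (name, sync_mode, destination_sync_mode, selected) are placeholders rather than the exact schema Octavia generates, and build_stream_configs is a hypothetical helper. The defaults full_refresh and overwrite are Airbyte sync mode names. The script prints JSON, which YAML 1.2 parsers generally accept, so no third-party YAML library is needed.

```python
import json


def build_stream_configs(tables, sync_mode="full_refresh", destination_sync_mode="overwrite"):
    """Return one stream-settings dict per table name, in input order.

    Hypothetical layout: the keys below are placeholders, not Octavia's schema.
    """
    return [
        {
            "name": table,
            "sync_mode": sync_mode,
            "destination_sync_mode": destination_sync_mode,
            "selected": True,
        }
        for table in tables
    ]


if __name__ == "__main__":
    streams = build_stream_configs(["users", "orders", "payments"])
    # JSON output can be pasted into a YAML file or merged into a template.
    print(json.dumps({"streams": streams}, indent=2))
```

Generating the list from a single table inventory keeps dozens of streams consistent, and the result can still be reviewed in a Pull Request like any hand-written file.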
Since we announced the Octavia CLI in April this year, we’ve added support for extra features and have seen hundreds of open-source Airbyte users use it successfully in production. In this tutorial, you will learn how to use Octavia CLI to configure Airbyte resources to move data between Postgres and BigQuery. More precisely, you will learn how to:
You can install Octavia CLI as a command available in your bash profile, run it with Docker, or modify your Airbyte docker-compose.yml file to apply the configuration on start. You can explore these three modes in the Octavia CLI documentation. Note that you should install the same Octavia version as the targeted Airbyte instance to avoid API incompatibility issues.
Here is how to install the latest Octavia version as a bash command:
The ~/.octavia file is where you configure environment variables and secrets. Its content is used to set the environment variables of the Octavia CLI container that runs when you call the octavia command. Below you can inspect what was appended to my .bashrc file, in case you need to run Octavia CLI without the alias.
If you are using Airbyte version 0.40.16 or later, which comes with Basic Authentication for the web app, API, and CLI, you need to add the AIRBYTE_USERNAME and AIRBYTE_PASSWORD environment variables to the .octavia file.
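Because the .octavia file only holds environment variables, you can sanity-check it before running octavia. The sketch below assumes one KEY=VALUE pair per line with optional # comments (quoting and export prefixes are not handled); parse_octavia_env and missing_auth_vars are hypothetical helpers, and the sample values are placeholders.

```python
def parse_octavia_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # partition splits on the first '=', so values may contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Malformed line (expected KEY=VALUE): {raw_line!r}")
        env[key.strip()] = value.strip()
    return env


def missing_auth_vars(env):
    """Return the Basic Authentication variables that are unset or empty."""
    required = ("AIRBYTE_USERNAME", "AIRBYTE_PASSWORD")
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    sample = """
    # hypothetical ~/.octavia contents
    AIRBYTE_USERNAME=airbyte
    AIRBYTE_PASSWORD=password
    POSTGRES_PASSWORD=change-me
    """
    env = parse_octavia_env(sample)
    print(missing_auth_vars(env))  # prints []
```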
Then you can bootstrap an Octavia project in a new folder with the octavia init command.
This will create folders for the source, destination, and connection resource definitions. It also creates an api_http_headers.yaml file.
If you have an existing Airbyte instance that you want to version control, you can get its configurations with the octavia import all command after you initialize an Octavia CLI project.
Next, you will configure a Postgres source. To create a source definition with Octavia CLI, use the octavia generate source command, which takes a DEFINITION_ID and a RESOURCE_NAME. Due to a current limitation, you first need to look up the definition ID of the source connector. Run octavia list connectors sources to get it:
Then you can bootstrap a Postgres source with the octavia generate source command:
The CLI creates a postgres folder under sources with a configuration.yaml file.
The YAML file contains all the fields with the default values you see in the UI, with each field's description as a comment. Below you can see the beginning of the file.
You must edit the configuration above before applying the changes to your Airbyte instance: fill in the values for the REQUIRED fields, and edit, comment out, or delete the lines for the OPTIONAL fields. Commenting out OPTIONAL fields is useful if you plan to edit the configuration later, because the file keeps every option available, as in the UI. Otherwise, you can generate a new source configuration to see all available options.
Source and destination configurations contain credential fields that you don't want to store in plain text. Octavia handles secrets through environment variable expansion in configuration files: you set environment variables in your ~/.octavia file and reference them in your configuration files, for example with ${POSTGRES_PASSWORD}. After editing the configuration, it should look like this:
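To make the substitution concrete, here is a minimal sketch of ${VAR} expansion. It is not Octavia's code: it assumes only the ${NAME} form is recognized and that an unset variable should be an error rather than an empty string. expand_env_vars is a hypothetical name.

```python
import os
import re

# Matches ${VAR_NAME}; Octavia's exact expansion rules are not reproduced here.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_vars(text, env=None):
    """Replace each ${NAME} in text with env[NAME]; raise KeyError if unset."""
    env = os.environ if env is None else env

    def _substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"Environment variable {name} is not set")
        # Returned verbatim: re.sub does not process escapes from a function.
        return env[name]

    return _VAR_PATTERN.sub(_substitute, text)


if __name__ == "__main__":
    config_line = "password: ${POSTGRES_PASSWORD}"
    print(expand_env_vars(config_line, {"POSTGRES_PASSWORD": "s3cret"}))
    # prints: password: s3cret
```

Failing on a missing variable is a deliberate choice in this sketch: silently expanding to an empty password would only surface later as a confusing connection error.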
To apply the changes to your local Airbyte instance, run octavia apply. Octavia validates the configuration against a JSON schema and fails to apply the changes if it finds any configuration error. If an error occurs, you get a stack trace from the API response.
Then you can check that the configuration is also available in the UI.
Unlike saving connector settings in the UI, applying configuration changes with Octavia does not run a connection test (see GitHub issue). You can still use the UI to test that the source settings allow Airbyte to connect.
After you apply changes, Octavia creates a state.yaml file in the resource directory containing the checksum of the latest applied configuration and the generation timestamp. State files are specific to an instance and workspace, so they are only useful when multiple users or Octavia CLI processes work on the same instance and workspace. If you apply the same configuration across multiple instances, you don't need to commit state files to your Git repository.
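The idea behind the state file fits in a few lines: hash the configuration text and compare the result with the hash recorded at the last apply. SHA-256 and the field names configuration_hash and generation_timestamp are assumptions for illustration; check a generated state.yaml for the real format. All helper names are hypothetical.

```python
import hashlib
import time


def configuration_checksum(config_text):
    """Return the SHA-256 hex digest of a configuration file's text."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def build_state(config_text):
    """Hypothetical state record: checksum plus a Unix generation timestamp."""
    return {
        "configuration_hash": configuration_checksum(config_text),
        "generation_timestamp": int(time.time()),
    }


def changed_since_last_apply(config_text, state):
    """True when the local file no longer matches the recorded checksum."""
    return configuration_checksum(config_text) != state["configuration_hash"]


if __name__ == "__main__":
    original = "host: localhost\nport: 5432\n"
    state = build_state(original)
    print(changed_since_last_apply(original, state))  # prints False
    print(changed_since_last_apply(original.replace("5432", "5433"), state))  # prints True
```

A checksum only tells you that something changed, not what changed; comparing the actual field values is a separate step.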
Each time you run the apply command, Octavia also computes and displays the differences between two states: the local resource configuration, including any changes since you last ran the apply command, and the configuration in your Airbyte instance, including any changes you may have made in the UI.
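A recursive comparison of two nested configurations shows what such a diff computes. This is a simplified stand-in, not Octavia's implementation: it reports changed leaves using the same root['key'] path style that appears in Octavia's diff output, treats lists of different lengths as a single change, and uses made-up example values.

```python
class _Missing:
    """Marker for a key that exists on only one side of the comparison."""

    def __repr__(self):
        return "<missing>"


MISSING = _Missing()


def diff_configs(local, remote, path="root"):
    """Return {path: (local_value, remote_value)} for every differing leaf."""
    differences = {}
    if isinstance(local, dict) and isinstance(remote, dict):
        # Walk the union of keys so that additions and removals both show up.
        for key in sorted(set(local) | set(remote), key=str):
            differences.update(
                diff_configs(
                    local.get(key, MISSING),
                    remote.get(key, MISSING),
                    f"{path}[{key!r}]",
                )
            )
    elif isinstance(local, list) and isinstance(remote, list) and len(local) == len(remote):
        # Same-length lists are compared element by element.
        for index, (left, right) in enumerate(zip(local, remote)):
            differences.update(diff_configs(left, right, f"{path}[{index}]"))
    elif local != remote:
        differences[path] = (local, remote)
    return differences


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the comparison.
    local = {"name": "postgres_to_bigquery", "schedule_type": "basic"}
    remote = {"name": "postgres_to_bigquery", "schedule_type": "manual", "status": "active"}
    for changed_path, (mine, theirs) in diff_configs(local, remote).items():
        print(changed_path, mine, "->", theirs)
```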
After editing the YAML configuration file for your Airbyte source, you can run git add and git commit to version control your Airbyte configuration.
Next, you can configure a BigQuery destination. After you get the BigQuery definition ID with the octavia list connectors destinations command, you can bootstrap the configuration file destinations/bigquery/configuration.yaml for the BigQuery destination.
After you edit the destination template, it will look something like this:
After adding the environment variables in your .octavia file, you can apply the changes to your instance.
Then you can see that the changes were applied in the UI. Remember to test the connection.
Once you have source and destination configuration files, you can create a connection template with the octavia generate connection command.
After editing, your configuration should look like this:
Notice that the stream configuration was generated automatically and, as stated in the comments, “ONLY edit streams.config, streams.stream should not be edited as schema cannot be changed”. After editing the configuration, you can apply the changes to your Airbyte instance. Note that you can pass a single configuration file to the apply command with the -f option.
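Because only streams.config may change, a pre-commit check can compare the streams.stream sections of the generated and edited files. This sketch works on already-parsed data (for example, loaded with a YAML library) and assumes each stream entry has a stream mapping with a name plus a config mapping; edited_stream_schemas is a hypothetical helper.

```python
def edited_stream_schemas(original_streams, edited_streams):
    """Return names of streams whose 'stream' section was altered or added.

    Assumed entry shape: {"stream": {"name": ..., ...}, "config": {...}}.
    Streams removed from the edited list are not reported.
    """
    originals = {entry["stream"]["name"]: entry["stream"] for entry in original_streams}
    changed = []
    for entry in edited_streams:
        name = entry["stream"]["name"]
        # Edits under "config" are ignored; any change under "stream" is flagged.
        if originals.get(name) != entry["stream"]:
            changed.append(name)
    return changed
```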
As before, this will create a state file with a configuration hash. You can now see the configuration on Airbyte UI as well.
When you edit the YAML configuration files, Octavia will validate the new configuration and compute a diff before applying the changes.
Imagine, for example, that you want to change the connection schedule from manual to hourly. This is where commenting out the optional fields of the default template, rather than removing them, comes in handy. If you edit a field with the wrong syntax, Octavia fails to apply the changes and displays an error message naming the field that failed validation.
Above, I misspelled the time_unit value as hour instead of hours. When running octavia apply -f connections/postgres_to_bigquery/configuration.yaml, you get a long stack trace ending with an error message.
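The kind of check that catches this typo can be sketched as an enum test on the schedule block. The set of accepted time units and the units/time_unit layout below are assumptions for illustration, not copied from Airbyte's JSON schema. validate_basic_schedule is a hypothetical helper that collects errors instead of raising, so every problem is reported at once.

```python
# Assumed set of accepted values; check the generated template's comments.
ALLOWED_TIME_UNITS = {"minutes", "hours", "days", "weeks", "months"}


def validate_basic_schedule(schedule):
    """Return a list of error messages; an empty list means the block is valid."""
    errors = []
    time_unit = schedule.get("time_unit")
    if time_unit not in ALLOWED_TIME_UNITS:
        errors.append(f"time_unit: {time_unit!r} is not one of {sorted(ALLOWED_TIME_UNITS)}")
    units = schedule.get("units")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(units, int) or isinstance(units, bool) or units < 1:
        errors.append(f"units: {units!r} is not a positive integer")
    return errors


if __name__ == "__main__":
    print(validate_basic_schedule({"time_unit": "hour", "units": 1}))
```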
After you fix the error, Octavia computes and displays the differences. It also asks you to confirm the changes unless you use the --force option.
Above, you will notice the two differences that correspond to our local configuration changes: root['schedule_data'] and root['schedule_type']. The remaining differences come from extra fields that Airbyte sets after applying the local configuration. To avoid them, import the Airbyte configuration locally and commit these fields.
If you have already configured an Airbyte instance and want to version control changes or manage configurations with Octavia, you can retrieve the instance configuration with the octavia import all command. This command retrieves all source, destination, and connection configurations, which you can then commit to a Git repository. Once you start editing Airbyte resources with Octavia CLI, it is better to stop editing them in the UI; otherwise, you will keep seeing differences when importing changes.
Before retrieving the configurations, you have to bootstrap an Octavia project. For example, if you change to a new folder locally, create an Octavia project with octavia init, and run the import command, you will get this output.
This will create the following files.
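After an import, you may want to check which resources landed in the project before committing. The sketch assumes the folder/resource_name/configuration.yaml layout described earlier for the Postgres source, under sources, destinations, and connections; find_resource_configs is a hypothetical helper.

```python
from pathlib import Path

RESOURCE_FOLDERS = ("sources", "destinations", "connections")


def find_resource_configs(project_dir):
    """Map each resource folder to the sorted names of resources with a configuration.yaml."""
    root = Path(project_dir)
    found = {}
    for folder in RESOURCE_FOLDERS:
        # A missing folder simply yields no matches.
        found[folder] = sorted(
            path.parent.name for path in (root / folder).glob("*/configuration.yaml")
        )
    return found


if __name__ == "__main__":
    print(find_resource_configs("."))
```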
Note that when importing resources, Octavia replaces the environment variable references you originally configured with their actual values and replaces secrets with '**********'. The configuration file will look like this:
Given this limitation, you can still version control Airbyte configurations, but you won't be able to push them to a different instance without replacing the masked secrets in the template and adding the environment variables that may change between instances.
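One way to prepare an imported file for another instance is to swap each masked value back to an environment variable reference before committing. The sketch assumes masked secrets appear as the exact string '**********' on a mapping key and that you maintain a mapping from field names to variable names (password to POSTGRES_PASSWORD is only an example). restore_secret_placeholders is a hypothetical helper that fails loudly on an unmapped secret, so a masked value is never pushed unnoticed.

```python
MASK = "**********"


def restore_secret_placeholders(config, key_to_env_var):
    """Return a copy of config where masked secrets become ${VAR} references."""
    if isinstance(config, dict):
        restored = {}
        for key, value in config.items():
            if value == MASK:
                if key not in key_to_env_var:
                    raise KeyError(f"No environment variable mapped for masked field {key!r}")
                restored[key] = "${" + key_to_env_var[key] + "}"
            else:
                restored[key] = restore_secret_placeholders(value, key_to_env_var)
        return restored
    if isinstance(config, list):
        return [restore_secret_placeholders(item, key_to_env_var) for item in config]
    # Scalars (strings, numbers, booleans, None) are returned unchanged.
    return config


if __name__ == "__main__":
    imported = {"host": "localhost", "password": MASK}
    print(restore_secret_placeholders(imported, {"password": "POSTGRES_PASSWORD"}))
```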
You can then apply these changes to a different Airbyte instance.
You can manage Airbyte resources in the web app or with YAML files. The web app makes it easy to get started and create configurations. When used in combination with Octavia CLI, it gives you both an accessible ELT tool and support for Configuration as Code.
For more details, you can check the Octavia CLI documentation and join the #octavia-cli channel in our community Slack. As demonstrated in this tutorial, it’s easy to version control your existing Airbyte configuration with Octavia CLI commands.
You can track upcoming Octavia features on GitHub Issues. To support Airbyte deployments with a growing number of users, connections, and instances, we want to expand support for other GitOps use cases, such as managing Airbyte Infrastructure as Code (IaC), showcasing how to build CI/CD pipelines to deploy Airbyte, and testing Airbyte configurations.