Let’s face it: doing a full reset every time the source schema changes is cumbersome. Data pipelines should maintain themselves. This is part of our mission to commoditize data integration, and commoditizing means that maintaining a pipeline shouldn’t carry any operational load. So we’re very excited to announce that schema propagation is now available in both Airbyte Open Source and Airbyte Cloud.
As with column selection, this is part of our latest effort to give users more control over the schema of the data replicated to their data warehouse. With schema propagation, you can specify how Airbyte should handle schema changes in the source. For example, you can now ask Airbyte to automatically replicate any new field or stream detected in the source.
How do you activate schema propagation in Airbyte?
When setting up a new connection, you can ask Airbyte to automatically add to the replication any stream or field that is added to the source in the future.
You can also change this setting on existing connections. We now offer two additional levels of schema change propagation:
- Propagate column changes only, which automatically replicates column changes within streams that are already synced;
- Propagate all changes, which also propagates new streams detected in the source.
If you prefer a more manual approach, you can choose to Ignore schema changes. In that case, the schema you’ve set up stays the same, even when the source schema changes, until you approve the changes.
You can also choose to Pause the connection on any schema change if you want to review each change manually before syncing resumes.
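The four options above can be read as a policy applied to each non-breaking change Airbyte detects. The sketch below only illustrates that reading: `SchemaChangePolicy`, `SchemaDiff`, `Decision` and `apply_policy` are hypothetical names, not part of Airbyte’s codebase or API, and the sketch assumes the list of new columns and streams has already been computed.

```python
from dataclasses import dataclass, field
from enum import Enum


class SchemaChangePolicy(Enum):
    """Hypothetical names for the four settings described in this post."""
    PROPAGATE_COLUMNS = "propagate_columns"  # "Propagate column changes only"
    PROPAGATE_ALL = "propagate_all"          # "Propagate all changes"
    IGNORE = "ignore"                        # "Ignore"
    PAUSE = "pause"                          # "Pause the connection"


@dataclass
class SchemaDiff:
    """Hypothetical container for non-breaking changes found in the source."""
    new_columns: dict[str, set[str]] = field(default_factory=dict)  # stream -> new columns
    new_streams: set[str] = field(default_factory=set)


@dataclass
class Decision:
    pause: bool
    columns_to_add: dict[str, set[str]]
    streams_to_add: set[str]


def apply_policy(policy: SchemaChangePolicy, diff: SchemaDiff) -> Decision:
    """Decide what to do with a non-breaking diff under the chosen setting."""
    has_changes = bool(diff.new_columns or diff.new_streams)
    if policy is SchemaChangePolicy.PAUSE:
        # Stop syncing whenever anything changed, so a human can review it.
        return Decision(pause=has_changes, columns_to_add={}, streams_to_add=set())
    if policy is SchemaChangePolicy.IGNORE:
        # Keep the configured schema untouched until the user approves changes.
        return Decision(pause=False, columns_to_add={}, streams_to_add=set())
    # Both propagate settings add new columns to streams that are already synced.
    columns = {stream: set(cols) for stream, cols in diff.new_columns.items()}
    # Only "Propagate all changes" also adds newly detected streams.
    streams = set(diff.new_streams) if policy is SchemaChangePolicy.PROPAGATE_ALL else set()
    return Decision(pause=False, columns_to_add=columns, streams_to_add=streams)
```

For example, with a diff containing a new `email` column on `users` and a new `invoices` stream, "Propagate column changes only" adds the column but not the stream, while "Propagate all changes" adds both.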
Reliability is an important concern when replicating data, so Airbyte only attempts to propagate changes that won’t break downstream workflows. If a primary key or a cursor goes missing, for example, we always pause the replication. For more complex changes, the replication process always stops, and you are prompted to review the schema changes and decide what to do.
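A check for the breaking changes mentioned above, a missing primary key or cursor, might look like the minimal sketch below. `StreamConfig` and `breaking_reasons` are hypothetical names. The sketch assumes primary keys and cursors are top-level field names (the Airbyte protocol can express nested field paths) and does not attempt to detect the other, more complex changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StreamConfig:
    """Hypothetical view of how one stream is configured in a connection."""
    name: str
    primary_key: tuple[str, ...]  # assumption: top-level field names only
    cursor_field: Optional[str]   # None when no cursor is configured


def breaking_reasons(config: StreamConfig, discovered_fields: set[str]) -> list[str]:
    """List why a newly discovered field set would break this stream.

    An empty list means none of the checks in this sketch found a breaking change;
    a non-empty list means the replication should be paused for review.
    """
    reasons = []
    missing_pk = [col for col in config.primary_key if col not in discovered_fields]
    if missing_pk:
        reasons.append("primary key missing: " + ", ".join(missing_pk))
    if config.cursor_field is not None and config.cursor_field not in discovered_fields:
        reasons.append("cursor missing: " + config.cursor_field)
    return reasons
```

With a stream keyed on `id` and using `updated_at` as its cursor, a discovered schema of `{"id", "amount"}` yields a single reason, the missing cursor, which would be enough to pause the connection.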
How does schema propagation work?
The Airbyte platform relies on existing Airbyte protocol primitives to implement schema propagation: the same DiscoverSchema operation that runs when a user sets up a new connection is also run automatically before syncs. The platform then compares the newly fetched schema with the one currently stored for the connection.
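The comparison step can be pictured as a diff between two catalogs. This is a simplified sketch, not Airbyte’s implementation: it assumes catalogs shaped like the protocol’s `AirbyteCatalog` (a `streams` list whose entries carry a `name` and a `json_schema`), compares only top-level property names, and ignores namespaces and type changes. `diff_catalogs` and `CatalogDiff` are hypothetical names.

```python
from dataclasses import dataclass, field


def top_level_fields(catalog: dict) -> dict[str, set[str]]:
    """Map each stream name to the names of its top-level JSON schema properties.

    Assumes an AirbyteCatalog-like dict; nested properties are not inspected.
    """
    return {
        stream["name"]: set(stream.get("json_schema", {}).get("properties", {}))
        for stream in catalog.get("streams", [])
    }


@dataclass
class CatalogDiff:
    added_streams: set[str] = field(default_factory=set)
    removed_streams: set[str] = field(default_factory=set)
    added_fields: dict[str, set[str]] = field(default_factory=dict)    # stream -> fields
    removed_fields: dict[str, set[str]] = field(default_factory=dict)  # stream -> fields

    def is_empty(self) -> bool:
        return not (self.added_streams or self.removed_streams
                    or self.added_fields or self.removed_fields)


def diff_catalogs(stored: dict, discovered: dict) -> CatalogDiff:
    """Compare the stored catalog with the one returned by a fresh discovery."""
    old, new = top_level_fields(stored), top_level_fields(discovered)
    result = CatalogDiff(
        added_streams=set(new) - set(old),
        removed_streams=set(old) - set(new),
    )
    # For streams present in both catalogs, record field-level additions and removals.
    for name in set(old) & set(new):
        added, removed = new[name] - old[name], old[name] - new[name]
        if added:
            result.added_fields[name] = added
        if removed:
            result.removed_fields[name] = removed
    return result
```

An empty diff means the sync can proceed as configured; a non-empty one is what the connection’s schema change setting then acts on.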
Here again, relying on the existing Airbyte protocol lets us make this functionality available to all connectors: those provided by Airbyte and those you built yourself with our CDK or the Connector Builder.
Our goal with schema propagation is to reduce the operational load of maintaining pipelines. New replications should be easy to set up and should require minimal involvement from the user to keep running, staying resilient to infrastructure issues and source changes. This is an area of focus for us. If you have ideas for reducing your operational load while using Airbyte, please reach out to us on Slack.