Column selection is now available to the community on both Airbyte Open Source and Airbyte Cloud. It works with every existing connector, as well as any connector you build with the no-code connector builder or the low-code CDK.
When setting up a new connection, you can now select which fields to synchronize for each stream you replicate from your API or database source. The table created in your destination will include only the columns you select.
How to use column selection within Airbyte
In the connection setup process, you will be asked to select the streams you want to sync.
Clicking the + button on a stream’s line opens a popup where you can select the columns to replicate. It’s as easy as that.
How we built column selection so it’s available across all connectors
One challenge of implementing this functionality in Airbyte was making sure that all connectors support it.
Column selection could benefit from being handled directly by source connectors: database source connectors would run select statements only on the fields users care about. But that approach brings no benefit for API connectors, and it requires every source connector to be updated before users can take advantage of the new option. Moreover, one strength of Airbyte’s open-source approach is that our users can build their own connectors for sources that we do not support natively just yet. We want to make sure our functionality is compatible with all connectors in the wild.
This is why our implementation does not rely on any changes to connectors. Instead, during the sync, the Airbyte infrastructure (workers) removes any fields that were not selected. This allows us to make the functionality available to all connectors without any protocol change.
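To make the idea concrete, here is a minimal sketch of what dropping unselected fields between source and destination could look like. It is an illustration only, not Airbyte’s actual worker code: the names `filter_record` and `filter_stream`, the record shape, and the assumption that selection applies to top-level fields are all hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, Set

# Hypothetical record shape: one dict of field name -> value per row.
Record = Dict[str, Any]


def filter_record(record: Record, selected_fields: Set[str]) -> Record:
    """Return a copy of the record that keeps only the selected top-level fields."""
    return {name: value for name, value in record.items() if name in selected_fields}


def filter_stream(records: Iterable[Record], selected_fields: Set[str]) -> Iterator[Record]:
    """Lazily filter every record emitted by a source (assumed to be an iterable)."""
    for record in records:
        yield filter_record(record, selected_fields)


# Example: the source still emits every field, but only the selected ones
# are passed on toward the destination.
source_records = [
    {"id": 1, "email": "a@example.com", "plan": "free"},
    {"id": 2, "email": "b@example.com", "plan": "pro"},
]
filtered = list(filter_stream(source_records, {"id", "plan"}))
```

Because the filtering happens outside the connector in this sketch, the source needs no knowledge of the selection, which is the property that lets the feature work with connectors that were never updated.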
Why use column selection?
Column selection has been one of the features most requested by the community, and for good reason! It is crucial in ELT processes for several reasons:
- Data Minimization: Not all data is useful for every analysis or report. Including unnecessary columns increases storage costs and slows down processing times. Selecting only necessary columns optimizes storage and improves efficiency.
- Data Privacy and Security: Certain columns may contain sensitive information such as personally identifiable information (PII) or proprietary data. By excluding such columns from the ELT process, organizations can protect this data and comply with data privacy regulations.
- Data Relevance: Some columns may be irrelevant for certain business analyses. Column selection enables you to tailor the data to the specific requirements of the analysis, leading to faster and more meaningful results.
Column selection will give Airbyte users more control over the columns they replicate to their data warehouse. But the source schema can also change, and we plan to offer you more ways to control replication behavior when that happens. Stay tuned!