Improving Security for Open Source Airbyte Users

We were dismayed to learn that an Airbyte user self-hosting an unsecured instance had their connector credentials stolen. Transparency is a core value at Airbyte, so we are choosing to highlight this to our community and discuss the steps we will take to improve Airbyte's security defaults.

Airbyte takes security extremely seriously, but as an open source project, we avoid making too many assumptions about our users' infrastructure. From inception we have strongly recommended that self-hosted Airbyte instances not be exposed to the public internet, a security model similar to that of Redis, Elasticsearch, and Airflow.

Data pipelines are a particularly rich target for attackers because they are, by nature, repositories of credentials. The shared responsibility of open source means both Airbyte users and the Airbyte team must take steps to keep our pipelines secure.

The facts

  • Before March 2022, Airbyte allowed users to export their entire Airbyte configuration, including secrets, via an /export API endpoint which was also available in the UI. 
    This endpoint/UI button was used to allow users to upgrade their Airbyte instance to receive new product features & security updates. 
  • In March 2022, we changed Airbyte’s upgrade flow to no longer require this export/import process, instead allowing the upgrade to happen in-place without exporting any credentials. 
    As part of that change, this export endpoint scrubbed all secrets from the output by default, unless the Airbyte instance operator manually disabled that secret scrubbing behavior. 
    As a result, all Airbyte instances running version 0.35.63 or later do not have the problem faced by the user who reported this incident. The current version of Airbyte at time of writing is 0.40.
  • In August 2022, a user reported that their EC2 instance running Airbyte was hacked. The root cause was that the AWS security groups for the EC2 instance were incorrectly configured, potentially exposing the instance to the public internet. As a result, the Airbyte instance running on it was compromised as well.
  • This user was running v0.35.5, which predates the March security update, so the intruder was able to easily discover and download the user’s credentials.
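
The version threshold in the facts above can be checked mechanically. The following is a minimal sketch, not an Airbyte tool: the function names are hypothetical, and it assumes plain numeric version strings such as `0.35.63`, optionally prefixed with `v`.

```python
# Hypothetical helper, not part of Airbyte: compares a version string against
# 0.35.63, the first release named above as scrubbing secrets by default.
SCRUBBING_SINCE = (0, 35, 63)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'v0.35.5' or '0.35.5' into (0, 35, 5). Assumes numeric parts only."""
    return tuple(int(part) for part in version.strip().lstrip("vV").split("."))


def scrubs_secrets_by_default(version: str) -> bool:
    """True if the version is 0.35.63 or later."""
    return parse_version(version) >= SCRUBBING_SINCE


for v in ("v0.35.5", "0.35.63", "0.40"):
    print(v, scrubs_secrets_by_default(v))
```

The versions are compared as tuples of integers rather than as strings, because a string comparison would wrongly rank `0.35.7` above `0.35.63`.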

Airbyte’s other existing security measures include:

  • Since inception, our deployment documentation has carried a red danger sign warning users not to expose Airbyte over the public internet.
  • We also have standalone documentation on securing Airbyte.
  • For Airbyte Cloud users, customer secrets are currently stored in a dedicated secret store (KMS), separate from the database. This feature will be upstreamed to open source in the near future.

Action Items

I run open source Airbyte. What should I do?

  • Make sure you secure your Airbyte instance, per our docs (AWS, Azure, GCP, and other options).
    Specifically, don’t expose Airbyte to the public internet.
  • Upgrade frequently to receive security patches.
    We publish updates on GitHub (per release), Slack (biweekly), and our newsletter (monthly).
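
One rough way to verify the first point is to try connecting to your instance from a machine outside your private network. The sketch below is an illustration only: the function name is hypothetical, and the port you test is whatever port your deployment serves the Airbyte UI on (8000 is assumed here as a common default, so confirm it against your own setup).

```python
import socket


def is_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or routing failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from outside your network, `is_port_reachable("<your-instance-public-address>", 8000)` should return `False`. A `True` result suggests the instance is reachable from the public internet and that your firewall or security group rules need attention. A `False` result is not proof of safety, since it only covers the one port and vantage point you tested.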

What will Airbyte do to improve security?

  • We will implement basic password authentication on the Airbyte UI.
    Most people don’t change default passwords, so this password will be autogenerated so as to be unique per user.
    This is still imperfect because attackers can read logs if the instance is compromised, but it will at least be harder to access.
  • We will enable external secret storage in Airbyte Core.
    This work was already done for Airbyte Cloud and was already on our roadmap for Airbyte’s v1.0 release.
    This improves the security profile by using proper encryption at rest and ensuring that even a compromised Airbyte instance does not leak credentials unless the attacker obtains further access privileges.
  • We will implement scanners to detect publicly exposed Airbyte instances.
    Our team has prior experience sending proactive warnings to the operators of unsecured instances that we find.
  • We will improve our open source documentation.
    Hosting security is a very complex topic - please let us know if something can be improved and of course PRs are welcome (all our docs are open source!)
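
To make the autogenerated password idea from the first item concrete, here is a minimal sketch using Python's standard `secrets` module. It is not Airbyte's implementation: the function name, alphabet, and default length are assumptions chosen for illustration.

```python
import secrets
import string

# Letters and digits only, so the password is easy to copy from logs or a terminal.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 24) -> str:
    """Return a random password drawn from a cryptographically secure source."""
    if length < 16:
        raise ValueError("use at least 16 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```

Because each installation generates its own value, there is no shared default password for an attacker to try against every exposed instance.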

FAQ

Where does Airbyte open source store credentials once they are entered in the UI?

Secrets are currently stored inside one of the containers that Airbyte manages. The storage of credentials did not play a role in the reported issue. However, if the EC2 instance hosting Airbyte is compromised, the credentials stored on it are compromised as well.

Airbyte has plans to bring external secret storage to open source in the near future, based on our work on Airbyte Cloud.

Does Airbyte have a public API that exposes secrets?

No. As described in our documentation, self-hosted Airbyte instances should not be exposed to the public internet.

In addition, as mentioned above, the `/export` API endpoint was previously available to facilitate the upgrade process. The upgrade flow was improved in v0.35.63 so that it no longer requires importing or exporting credentials, and secrets are no longer exposed by that endpoint by default.
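
To illustrate what scrubbing secrets from an exported configuration means in general, here is a minimal sketch. It does not reflect Airbyte's actual code: the key names, the mask string, and the function name are all hypothetical, and a real system would more likely mark secret fields in its connector schemas than match on key names.

```python
from typing import Any

# Hypothetical key names used only for this illustration.
SECRET_KEYS = {"password", "api_key", "token", "secret"}
MASK = "**********"


def scrub(value: Any) -> Any:
    """Return a copy of a nested config with secret-looking values masked."""
    if isinstance(value, dict):
        return {
            key: MASK
            if isinstance(key, str) and key.lower() in SECRET_KEYS
            else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


config = {
    "host": "db.example.com",
    "credentials": {"user": "etl", "password": "hunter2"},
}
print(scrub(config))
```

The function builds a new structure instead of modifying its input, so the original configuration stays usable by the running instance while only the masked copy leaves it.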

I have spotted another security vulnerability in Airbyte. What are your policies for responsible disclosure?

TL;DR: email security@airbyte.io
Please do not file GitHub issues for security vulnerabilities because that could put other Airbyte users at risk!

Airbyte takes security issues very seriously. If you have any concern around Airbyte or believe you have uncovered a vulnerability, please get in touch via the e-mail address security@airbyte.io. In the message, try to provide a description of the issue and ideally a way of reproducing it. The security team will get back to you as soon as possible.

Note that this security address should be used only for undisclosed vulnerabilities. Questions about already-fixed issues, or general questions on how to use the security features, should go through the regular user and dev lists. Please report any security problems to us before disclosing them publicly.

Must I use Airbyte Cloud in order for Airbyte to be secure?

No! Airbyte is committed to security by default for open source users as well. We provide instructions on how to secure an Airbyte instance in our docs, and we regularly ship security patches to open source Airbyte (Log4Shell, for example, was patched on the same day). This includes upstreaming improvements from our open source users and from our work on Airbyte Cloud.

What security measures does Airbyte Cloud take?

  • The `/export` API endpoint does not exist in Airbyte Cloud.
  • Encryption at rest: We use a dedicated secrets store (KMS) instead of database storage - this means credentials are encrypted and stored separately from Airbyte instances.
  • Encryption in transit: We use HTTPS everywhere.
  • All Cloud infrastructure runs in a private network secured with VPN access that only Airbyte Engineering has.
  • External endpoints are only accessible to authenticated users and are secured following our cloud providers’ recommended best practices. Further, all Airbyte Cloud public systems (UI & API) are scoped by Workspace and secured by Role-based Access Control.
  • We have completed a SOC 2 Type 2 assessment by an independent third party and have committed to our Security and Data Privacy Policy.

We will also be bringing many of these security improvements from our work on Airbyte Cloud into Airbyte Core, specifically the secrets management.
