Airbyte 1.0 has been two years in the making because we hold ourselves to a very high standard for what a 1.0 release means. We have defined a set of criteria that, we think, data teams should use to evaluate any data movement solution.
Airbyte has unique differentiators that set it apart: a) full extensibility across its platform and connectors, b) a community- and partner-powered Marketplace of connectors, all customizable in the low-code / no-code / AI-powered Connector Builder (both released in 1.0), and c) deployment flexibility (self-hosted, cloud, or hybrid). Even so, Airbyte also needs to be best in class on these 1.0 evaluation criteria to be best in class in the data movement category.
Let’s dive into the four criteria that have earned Airbyte its new 1.0 designation:
Criterion 1: Broad Deployments Across All Major Use Cases
The first requirement for reaching 1.0 is that a product has been deployed broadly and hardened over many years in production. During Airbyte’s first year, we set ourselves the goal of becoming the open-source standard for data movement. Today Airbyte is the most used data movement solution in the world with:
170,000+ deployments
7,000+ companies syncing data daily
900+ contributors with over 1,200 PR contributions
10,000+ connectors created with our Connector Builder
We had several use cases in mind when we created Airbyte back in 2020, but we didn’t expect the full range we’ve since been exposed to, from standard database replication to warming caches (!!). Three major use cases keep recurring: core reporting, self-service analytics across many teams, and advanced AI-driven use cases, including RAG and AI chatbots.
Supporting major use cases is great, but users will struggle to succeed if they can’t easily deploy the solution and work with their existing DevOps and CI/CD toolchains. As a company founded on open source, with a vibrant community of more than 20k practitioners, we listened to your feedback and delivered a fully managed cloud environment, abctl to let you deploy anywhere in under 5 minutes, and support for popular tools such as Helm charts and our Terraform Provider.
Criterion 2: Setting a New Reliability Standard
Reliability is the cornerstone of any production-grade data movement platform. Airbyte 1.0 sets a new standard for reliable pipelines with several key features:
Record Change History: Increasing Resilience Against Problematic Rows
Record change history prevents sync failures caused by problematic rows. Where an oversized or invalid record would previously have broken the sync, Airbyte now modifies that record in transit, logs the change, and lets the sync complete. This boosts resilience and lets data engineers handle edge cases gracefully.
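To make the idea concrete, here is a minimal sketch of the pattern, not Airbyte's actual implementation. It nulls out any field whose serialized size exceeds a limit and records what was changed next to the record. The function name, the `MAX_FIELD_BYTES` limit, and the shape of the change log are all assumptions made for this example.

```python
import json

# Hypothetical per-field size limit, chosen only for this example.
MAX_FIELD_BYTES = 64


def sanitize_record(record: dict, max_field_bytes: int = MAX_FIELD_BYTES):
    """Return (clean_record, changes) instead of failing on an oversized field."""
    clean = {}
    changes = []
    for field, value in record.items():
        size = len(json.dumps(value).encode("utf-8"))
        if size > max_field_bytes:
            # Null the offending value and log why, so the sync can continue.
            clean[field] = None
            changes.append(
                {
                    "field": field,
                    "change": "NULLED",
                    "reason": f"size {size} > {max_field_bytes} bytes",
                }
            )
        else:
            clean[field] = value
    return clean, changes


record = {"id": 1, "note": "x" * 500}
clean, changes = sanitize_record(record)
print(clean)
print(changes)
```

The record still reaches the destination with its valid fields intact, and the change log tells you exactly which value was altered and why.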
Resumable Full Refresh: Building Resilient Systems for Syncing Data
Resumable full refresh lets a full refresh sync resume from its last checkpoint if the sync is interrupted. This is especially useful for data sources that don’t support incremental syncs, and it makes large syncs far more reliable.
Refreshes: Reimport Historical Data with Zero Downtime
The Refresh feature in Airbyte allows users to reimport historical data while avoiding data downtime. Unlike the previous Reset operation, which cleared destination data and triggered a new sync, Refresh ensures that data is only cleared after the new set of data has been successfully read. Additionally, Refresh keeps both newly synced and previously captured records, ensuring no data is lost during the reimporting process, even for sources with limited history.
Automatic Detection of Dropped Records: Ensuring Data Integrity
The automatic detection of dropped records feature adds an extra layer of reliability to Airbyte syncs. It monitors data at the source, platform, and destination levels, ensuring that all records are accounted for and no data is lost during the sync process. By comparing record counts across these stages, Airbyte can detect if any records were dropped due to issues like serialization problems or large data payloads. If a discrepancy is detected, users are notified so they can take corrective action.
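The count comparison described above can be sketched in a few lines. This is only an illustration of the principle: the function name and the dictionary of per-stage counts are hypothetical, and Airbyte's real bookkeeping is certainly more involved.

```python
def find_count_discrepancies(counts: dict) -> list:
    """Compare per-stage record counts; every stage should match the source."""
    expected = counts["source"]
    problems = []
    for stage in ("platform", "destination"):
        if counts[stage] != expected:
            problems.append(
                f"{stage}: expected {expected} records, saw {counts[stage]}"
            )
    return problems


healthy = {"source": 1000, "platform": 1000, "destination": 1000}
lossy = {"source": 1000, "platform": 1000, "destination": 997}

print(find_count_discrepancies(healthy))
print(find_count_discrepancies(lossy))
```

An empty list means every record emitted by the source was accounted for downstream. Any entry in the list is the kind of discrepancy that would trigger a notification.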
Monitoring Sync Progress and Solving OOM Failures
To improve sync success rates, Airbyte has focused on eliminating stuck syncs, a problem often caused by source throttling, memory limits, or platform limitations. Airbyte introduced smarter handling of API rate limits, real-time progress tracking, and adjustments to memory management, so syncs proceed efficiently even with large data volumes or slow data sources. By adding visibility into sync status and automating recovery from common issues, Airbyte has significantly reduced the number of sync failures.
Checkpointing: Ensuring Sync Continuity
Airbyte’s checkpointing ensures that any sync failure, whether due to a network outage or system crash, can resume from the last successful state. This feature dramatically improves reliability, especially for large data syncs that could take hours or days to complete.
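The general checkpoint-and-resume pattern can be shown with a toy example. This is a sketch of the technique under simplifying assumptions, not Airbyte's code: records are an in-memory list, the "destination" is another list, and `state`, `batch_size`, and `fail_at` are names invented for the illustration. A batch is written and the cursor saved together, so a crash never leaves half a batch behind.

```python
def run_sync(records, destination, state, batch_size=2, fail_at=None):
    """Copy records in batches, saving a cursor in `state` after each batch."""
    start = state.get("cursor", 0)  # resume from the last checkpoint, if any
    batch = []
    for index in range(start, len(records)):
        if fail_at is not None and index == fail_at:
            raise ConnectionError("simulated network outage")
        batch.append(records[index])
        if len(batch) == batch_size:
            # Commit the batch and the checkpoint together.
            destination.extend(batch)
            state["cursor"] = index + 1
            batch = []
    # Flush the final partial batch and mark the sync complete.
    destination.extend(batch)
    state["cursor"] = len(records)


records = ["a", "b", "c", "d", "e"]
destination, state = [], {}

try:
    run_sync(records, destination, state, fail_at=3)  # first attempt dies mid-sync
except ConnectionError:
    pass

run_sync(records, destination, state)  # second attempt resumes from the cursor
print(destination)
```

The first attempt commits only the first batch before failing. The retry starts at the saved cursor rather than at record zero, and the destination ends up with each record exactly once. The same pattern is what makes a resumable full refresh possible.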
WASS (WAL Acquisition Synchronization System): Supporting Large CDC Syncs
WASS enhances CDC syncs by enabling Airbyte to handle very large databases without losing sync positions in the transaction log. This allows for more efficient and reliable incremental syncs, particularly in databases with high transaction volumes.
Notifications and Webhooks: Monitoring Syncs in Real Time
Airbyte’s notifications and webhooks let users monitor the health of their ETL jobs effortlessly. Notifications can alert users to failed jobs, schema changes, and successful syncs via email or Slack. For more advanced automation, webhooks can trigger specific actions, such as transformations or notifications in other systems, based on significant events within the sync. This reduces the time spent monitoring data pipelines and allows for seamless integration with other workflow tools.
Connection Timeline (coming in a few days)
Replacing the “Job History” tab, the Connection Timeline offers a more comprehensive view of the events that affect your data connections. In addition to tracking syncs, refreshes, and clears, it now includes detailed records of schema changes and connection modifications, with insights into which user initiated each action. This improved transparency helps users troubleshoot issues faster and monitor the evolution of their data connections with greater clarity.
Connector Quality Improvements
We increased test coverage in CI, and connectors now have regression tests that compare outputs before a new version is released.
We have also lowered the barrier to contributing to, or simply editing, a connector by migrating API connectors to the low-code framework and letting you open a PR directly from the Connector Builder.
Connectors now also report usage and success-rate stats, so you know better what to expect from each one.
These features contribute to a system that can handle terabytes of data while maintaining integrity, a critical need for data engineers managing large-scale data workflows.
Criterion 3: Setting a New Throughput Performance Standard
Performance is a key driver for any large-scale data pipeline, and Airbyte 1.0 has significantly improved throughput for:
Database sources: from 1MBps in 2023 to 15MBps overall today. This came from many database-specific improvements: MongoDB, MySQL, Postgres (benchmark and lessons).
API sources: from 2MBps to 8MBps today. As a simple example, we switched from the json library to orjson, which sped up record serialization by 1.8x.
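For readers unfamiliar with the swap, here is a small sketch of what moving from `json` to `orjson` looks like for record serialization. It is an illustration, not the connector code: the `serialize_record` helper and the sample record are made up, and the block falls back to the standard library when `orjson` is not installed. Note that `orjson.dumps` returns `bytes` rather than `str`.

```python
import json

try:
    import orjson  # third-party: pip install orjson
except ImportError:
    orjson = None


def serialize_record(record: dict) -> bytes:
    """Serialize one record to UTF-8 JSON bytes, preferring orjson if available."""
    if orjson is not None:
        return orjson.dumps(record)
    # Standard-library fallback, encoded so both paths return bytes.
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


record = {"id": 42, "email": "user@example.com", "tags": ["a", "b"]}
payload = serialize_record(record)

# Either path round-trips to the same Python object.
print(json.loads(payload) == record)
```

Because both paths produce equivalent JSON, this kind of change can be made without touching anything downstream of the serializer.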
The actual sync speed will depend on the API limits and the destination you choose, but our goal is that Airbyte will soon no longer be the bottleneck on sync speed.
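To put the database throughput figures in perspective, the short calculation below estimates how long a transfer takes at a sustained rate. The 100 GB dataset size is an arbitrary example, and the helper assumes 1 GB = 1000 MB and ignores source and destination limits.

```python
def transfer_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to move size_gb at a sustained throughput (1 GB = 1000 MB)."""
    return size_gb * 1000 / throughput_mb_per_s / 3600


for rate in (1, 15):
    print(f"{rate} MB/s: {transfer_hours(100, rate):.1f} hours for 100 GB")
# 1 MB/s: 27.8 hours for 100 GB
# 15 MB/s: 1.9 hours for 100 GB
```

Under these assumptions, a sync that used to take more than a day fits into under two hours.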
Criterion 4: Fitting All Production Workflows
Airbyte 1.0 has been designed to integrate seamlessly into production environments, providing flexibility through multiple management interfaces and deep integration with modern data stacks.
We can now safely say that Airbyte has an interface for all your production workflows:
Our UI (released in 2021), mostly focused on analytics engineers and startups.
Our API (released in 2022), enabling programmatic use of Airbyte, such as embedding connectors in your own product or managing a large number of connections.
Our Terraform Provider (released in 2023), to power CI/CD, infrastructure-as-code, and enterprise-scale data movement.
Our Python library PyAirbyte (released in 2024), to enable code-based and AI-focused use cases and application building.
Each of them has clear documentation, and we will soon release our first official Airbyte course, which will cover each one.
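As a taste of the code-based workflow, the sketch below follows the shape of the PyAirbyte quickstart as best understood at the time of writing. Check the current PyAirbyte documentation before relying on it. It assumes the `airbyte` package is installed and that the demo `source-faker` connector can be downloaded. The `read_sample_streams` wrapper is a name invented for this example.

```python
def read_sample_streams(record_count: int = 1000) -> dict:
    """Read the demo source-faker connector and return record counts per stream."""
    import airbyte as ab  # third-party: pip install airbyte

    source = ab.get_source(
        "source-faker",
        config={"count": record_count},
        install_if_missing=True,
    )
    source.check()               # validate the configuration
    source.select_all_streams()  # sync every stream the source exposes
    result = source.read()       # records land in PyAirbyte's default local cache
    return {name: len(list(records)) for name, records in result.streams.items()}
```

Calling `read_sample_streams()` in a notebook or script returns a dictionary mapping each stream name to the number of records read, which is a quick way to confirm that a source works before wiring it into a larger pipeline.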
Airbyte also integrates natively with all the major players in data infrastructure:
Orchestration: Airbyte integrates with popular orchestration platforms like Airflow, Dagster, and Prefect, making it easy to fit into existing ETL workflows.
Transformation: Airbyte supports tools like dbt or SQL to handle data modeling post-sync.
Metadata for observability, as in Metaplane’s tutorial.
AI- and RAG-specific transformation, including chunking powered by LangChain or Llama and embeddings enabled by OpenAI, Cohere, and other providers, allowing you to load, transform, and store data in a single operation.
All major destinations, of course: databases, warehouses, and lakes, but also vector databases. We’re also releasing Databricks as a certified destination today.
Airbyte even provides a well-defined experience for contributors, so they can easily contribute to the main repo or to any connector for the whole community’s benefit. This is now the case with our new Marketplace.
Wrapping up
Airbyte 1.0 is not just another release; it’s a robust, production-ready platform that meets the complex and evolving needs of data and AI engineers. By focusing on broad use case coverage, reliability, performance, and integration with the rest of the data stack, Airbyte has earned its 1.0 status as the go-to data movement platform for modern data and AI infrastructures.
If you’re ready to future-proof your data pipelines with Airbyte 1.0, get started today or join our upcoming webinar on Airbyte 1.0 to learn more about the exciting features powering this release.
You can also check out the other Airbyte 1.0 announcements.
The future of data integration is here, and it’s open, both powered by AI and powering it. Let’s build it together!