Introduction
Today, we’re excited to launch Data Activation (also known as Reverse ETL). With data activation, you can sync insights from the consolidated and transformed data in your warehouse to the operational tools your business teams actually use. Operationalize your data by syncing insights from Snowflake, BigQuery, and other data warehouses into destinations like Salesforce, Customer.io, and HubSpot without relying on brittle, custom data pipelines.
Why We Prefer the Term Data Activation
At Airbyte, we've embraced the moniker "Data Activation" because we believe it more accurately captures what we’re doing with our technology. It’s not just moving data backwards from warehouses, but bringing insights to life wherever they create value. While "reverse ETL" suggests a simple reversal of traditional data flows, Data Activation represents a fundamental shift in how we think about data movement: not forward or reverse directions, but seamless, omnidirectional flows that adapt to wherever modern workflows demand them. Most importantly, this term emphasizes the transformative moment when warehouse-resident insights, ML models, and AI outputs become actionable. They activate features and capabilities within the tools your teams use every day, turning dormant analytical assets into real business impact at the point of decision.
Why We Built It
Our mission has always been to help you move data effortlessly, freeing you to focus on solving hard problems rather than building and maintaining infrastructure. With hundreds of connectors and a robust open source community, we’ve made it easy to bring data into your warehouse. But getting value from that data doesn’t end there.
Modern data teams need to push insights back out to sales tools, CRMs, and support platforms, so frontline teams can take action and make more informed decisions. That’s the challenge to which Data Activation rises: turning analytics into action.
But this process is difficult. APIs are brittle, mapping is complex, and observability is often an afterthought. So we built Data Activation with a few core principles:
Declarative and reliable: Built on Airbyte’s battle-tested platform, with support for retries, monitoring, and sync observability.
Warehouse-native: Start from the source you trust, your data warehouse.
Flexible identity resolution: Map user identifiers flexibly between sources and destinations.
Built-in error handling and reporting: Know when things go wrong, track rejected records, and plan follow-up actions with confidence.
Building Data Activation on Airbyte's Foundation
Airbyte’s robust, open-source sync engine provided the foundation for Data Activation, allowing us to build on top of proven, scalable infrastructure. Key adaptations include:
Schema Discovery & Field Mapping: Leveraging Airbyte’s connectors, we extended automatic schema discovery to not just read source fields but also present destination fields for flexible mapping, including identity resolution.
Filters & Conditional Logic: We built a layer on top of the sync engine that lets users define multiple filters and conditional rules for the data being synced.
Operational Observability: We expanded logging and monitoring to include real-time stats for activated data, including accepted, filtered, and rejected records.
Destination-Specific Enhancements: We added support for the nuances of operational tools like Salesforce, HubSpot, and Customer.io, such as object-level mapping, custom fields, and lifecycle triggers.
Secure, Flexible Authentication: We reused OAuth/API key handling, ensuring that both sources and destinations maintain enterprise-grade security.
These adaptations transform Airbyte from a warehouse-centric ETL tool into a full Data Activation platform, bridging modeled data in the warehouse directly to operational systems where teams take action.
How It Works
You can configure a destination like HubSpot or Salesforce and set up a sync that pushes your warehouse data directly to specific objects like Contacts or Leads. Through our intuitive UI, you can:
Authenticate securely (OAuth or API key)
Discover schemas dynamically
Map fields with confidence (including identity resolution)
Sync records (insert/update mode with smart cursor management)
Monitor performance and observe rejected records in real time
Rejected records, such as rows that fail schema validation or API-level rules, are stored in a bucket you manage yourself. A UI-accessible link makes it easy to download and triage issues. This feature is available to you no matter which Airbyte plan you’re on, and the experience is consistent in both products.
Identity Resolution and Field Mapping
Pretty quickly, we had to overcome a couple of big challenges while designing a data activation solution.
Destination schema: Unlike traditional data warehouse destinations, the places where you want to activate your data already have strict schemas. The final format of the data needs to satisfy that schema. You can’t, for example, bring data into a CRM that requires a unique identifier like an email address when records in your data warehouse don’t have email addresses.
Different identities: Different upstream sources tend to base records on different identifiers. For example, your go-to-market organization might rely on various touchpoints (emails, cookies, mobile apps, CRMs, transactions, social media) to develop a profile of your customers, but each of these sources uses different identifiers. Some might result in an email address or phone number, but some rely on probabilistic matching like device fingerprints and browsing patterns. As part of identity resolution, enriching warehouse models with browser fingerprinting signals (e.g., canvas, WebGL, fonts, TLS) can surface high-risk devices and suspected bots.
Your organization might have a customer who is a 32-year-old man named Richard who works for a company called Pied Piper, but the data that expresses Richard and his use of your product and marketing ecosystem exists in different systems with different levels of confidence. This presents certain challenges when you want to get these disparate pieces of data into a CRM that has less tolerance for partial records.
Airbyte has two ways to simplify these challenges. Together, they help ensure data activation has a low barrier to entry, even if the current state of data hygiene in your organization is lower than you’d like.
To see Airbyte in action, let’s look at an example where you want to activate marketing analytics data from Google BigQuery and bring it into your Contacts table in your CRM, HubSpot.
Field mapping
When you create a connection from your source data warehouse to your destination CRM, Airbyte asks you to map fields. During this process, you select the fields from your source that you want to transfer and the corresponding fields they should sync to in your destination.
You can map as few as one field, but you probably want to map many. As you select those fields, you can also apply Airbyte’s more traditional mapping operations (hashing, encrypting, and filtering) so that poorly formatted fields and unexpected PII don’t sneak into your destination.
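To make the idea concrete, here is a minimal sketch of what field mapping with a hash and a filter amounts to. It is not Airbyte's implementation: the column names, the `FIELD_MAP` table, and the `is_test_account` filter are all hypothetical, chosen only to illustrate the shape of the operation.

```python
import hashlib
from typing import Optional

# Hypothetical mapping from warehouse columns to destination properties.
FIELD_MAP = {
    "user_email": "email",
    "plan_name": "plan",
    "phone_number": "phone_hash",
}
# Hypothetical choice: hash the phone number so raw PII never leaves the warehouse.
HASHED_FIELDS = {"phone_number"}


def sha256_hex(value: str) -> str:
    """Hash a value with SHA-256 and return the hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def map_record(row: dict) -> Optional[dict]:
    """Return a destination-shaped record, or None if the row is filtered out."""
    # Filter step: drop rows flagged as test accounts (an illustrative rule).
    if row.get("is_test_account"):
        return None
    mapped = {}
    for source_field, destination_field in FIELD_MAP.items():
        value = row.get(source_field)
        if value is None:
            continue  # unmapped or missing source values are simply skipped
        if source_field in HASHED_FIELDS:
            value = sha256_hex(str(value))
        mapped[destination_field] = value
    return mapped


row = {"user_email": "richard@example.com", "plan_name": "pro", "phone_number": "5550100"}
print(map_record(row))
```

Only the fields named in the mapping reach the destination, which is why selecting fields explicitly also acts as a guard against unexpected columns.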
This is a highly flexible process. You select the source stream, including a sync mode and cursor, if applicable. Then you select the destination stream, insertion method, and primary key.
The insertion method is critical in different ways. Going back to the identity resolution problem, you can imagine a situation where an email exists in many tables in your data warehouse, each derived from a different upstream analytics tool, and each containing a subset of your total data about that person. While it might seem desirable to have a single complete record about this contact in your data warehouse, it’s not actually necessary.
Thanks to the insertion method option, you can choose to upsert data and keep updating the same record as various syncs run from different tables in your data warehouse, eventually compiling one single and complete record about a person in your CRM, even if that complete record was lacking in your data warehouse.
One source might describe product usage while another describes support tickets and another describes marketing campaign engagement. Together these different data sets add up to one very well-rounded record that’s available to your front-line people in the place they already work: HubSpot.
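The effect of upserting from several warehouse tables can be sketched in a few lines. This is an in-memory illustration only, assuming email is the matching key; the field names and the `upsert` helper are hypothetical, and a real destination would perform the merge through its own API.

```python
def upsert(destination: dict, records: list, key: str = "email") -> None:
    """Merge each record into the destination entry that shares its key value."""
    for record in records:
        identity = record.get(key)
        if identity is None:
            continue  # a record without the key cannot be matched to a contact
        existing = destination.setdefault(identity, {})
        # Only overwrite with values that are actually present in this record.
        existing.update({k: v for k, v in record.items() if v is not None})


# Three hypothetical warehouse tables, each holding a partial view of one person.
product_usage = [{"email": "richard@example.com", "last_login": "2025-01-10"}]
support_tickets = [{"email": "richard@example.com", "open_tickets": 2}]
campaign_engagement = [{"email": "richard@example.com", "last_campaign": "spring-launch"}]

crm_contacts = {}
for table in (product_usage, support_tickets, campaign_engagement):
    upsert(crm_contacts, table)

print(crm_contacts["richard@example.com"])
```

After the three syncs, the single contact carries usage, support, and campaign fields, even though no one warehouse table held the complete record.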
Error Handling and Rejected Records
In our BigQuery to HubSpot example, email is HubSpot’s primary key, so we have a requirement that all our contacts must have an email. Yet, if we accept that some of the records in BigQuery are inevitably incomplete and the upstream source did not assign the record an email, what happens?
The answer is that HubSpot rejects the record. Rejected records are records Airbyte was unable to sync to your destination, even though the sync itself was otherwise successful. Records become rejected because they don't conform to the schema of the destination. The underlying reasons for this can be complex, but from Airbyte’s perspective, it was imperative that we achieve two things:
Not fail the sync because of a rejected record
Inform you of the problem so you can repair it
When you set up a data activation destination like HubSpot or any other, you have the option to specify an object storage location. In this case, it’s S3. You can parse this log from S3 at regular intervals to identify incomplete and problematic records.
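The two goals above can be sketched as a partition step that sets bad records aside with a reason instead of raising an error. This is an illustration under stated assumptions, not Airbyte's code: the `REQUIRED_FIELDS` rule, the `partition` and `to_json_lines` helpers, and the JSON Lines output are all hypothetical, and the actual layout of the rejected-record files in your bucket may differ.

```python
import json

# Assumption for this sketch: the destination requires an email on every contact.
REQUIRED_FIELDS = ("email",)


def partition(records: list) -> tuple:
    """Split records into (accepted, rejected) without raising on bad rows."""
    accepted, rejected = [], []
    for record in records:
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            rejected.append({
                "record": record,
                "reason": "missing required field(s): " + ", ".join(missing),
            })
        else:
            accepted.append(record)
    return accepted, rejected


def to_json_lines(rejected: list) -> str:
    """Serialize rejected records one per line (a hypothetical storage format)."""
    return "\n".join(json.dumps(item, sort_keys=True) for item in rejected)


rows = [
    {"email": "richard@example.com", "company": "Pied Piper"},
    {"email": None, "company": "Pied Piper"},
]
accepted, rejected = partition(rows)
print(len(accepted), "accepted;", len(rejected), "rejected")
print(to_json_lines(rejected))
```

Keeping the reason next to each rejected record is what makes later triage practical, because you can group by reason before deciding what to repair.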
Airbyte’s UI communicates to you that this has happened. Here, we have a particularly egregious example of a successful sync with 0 records loaded, but it demonstrates the point that this is a data issue and not a sync failure in the traditional sense.
This report is also available in Airbyte’s log for the sync, which you can find on the connection Timeline page.
Sync summary: {
  "totalStats": {
    "recordsRejected": 1000
  },
  "streamStats": [
    {
      "streamName": "USERS",
      "streamNamespace": "DATA_PRODUCT",
      "stats": {
        "recordsRejected": 1000
      }
    }
  ],
  "performanceMetrics": {
    "mappers": {
      "field-renaming": 0
    }
  }
}
Should you repair rejected records?
Our opinion is that you should when you can. These records may contain valuable data that you want to sync, and in large numbers, their absence can erode the effectiveness of your data activation initiative.
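To judge whether rejections are numerous enough to be worth repairing, the sync summary shown above can be read programmatically. The sketch below assumes you have already copied the full `Sync summary:` block out of the sync log as one string; the `rejected_by_stream` helper is hypothetical, and only the field names that appear in the example summary are used.

```python
import json

# The sync summary from the log, captured as a single string.
LOG_EXCERPT = """Sync summary: {
  "totalStats": {"recordsRejected": 1000},
  "streamStats": [
    {
      "streamName": "USERS",
      "streamNamespace": "DATA_PRODUCT",
      "stats": {"recordsRejected": 1000}
    }
  ],
  "performanceMetrics": {"mappers": {"field-renaming": 0}}
}"""


def rejected_by_stream(log_text: str) -> dict:
    """Return {"NAMESPACE.STREAM": rejected_count} parsed from a sync summary."""
    # Everything after the "Sync summary:" prefix is expected to be JSON.
    _, _, payload = log_text.partition("Sync summary:")
    summary = json.loads(payload)
    counts = {}
    for stream in summary.get("streamStats", []):
        name = stream["streamName"]
        namespace = stream.get("streamNamespace")
        label = namespace + "." + name if namespace else name
        counts[label] = stream.get("stats", {}).get("recordsRejected", 0)
    return counts


print(rejected_by_stream(LOG_EXCERPT))
```

A per-stream count like this is a reasonable starting point for deciding which dataset to repair first and which may not be worth syncing at all.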
At the same time, rejected records may simply illustrate to you that a particular dataset isn’t robust enough for your modern needs, and it might be time to stop trying to sync the data.
Regardless of your choice, when you repair records, either in your data warehouse or the upstream source that syncs to your data warehouse, Airbyte can process them again during your next sync.
What’s Live Today
We’re launching with a focused but powerful MVP:
Sources: Snowflake, BigQuery, Postgres
Destinations: Salesforce (Enterprise), Customer.io, HubSpot
Features: UI mapping, schema discovery, record sync, retry logic, rejected record tracking, and stats reporting (extracted, loaded, rejected)
These capabilities are available on all Airbyte plans. Most capabilities, except rejected records handling, are also available on Core, our open source offering. We’ve deliberately scoped this MVP to focus on high-leverage go-to-market destinations first, prioritizing quality and reliability over surface area.
Conclusion
Data Activation represents a critical milestone in Airbyte's journey to unified data movement. By maintaining our commitment to sovereignty while expanding into data activation, we've proven that sovereignty and functionality aren't mutually exclusive. The engineering challenges, from adapting our sync engine to building flexible identity resolution, pushed us to create novel solutions that benefit our entire platform.
As we expand our destination catalog monthly, we're not just adding connectors—we're enabling data teams to close the loop between analytics and action, all while maintaining complete control over their data. The future of data movement is bidirectional, sovereign, and built on proven infrastructure.
Ready to activate your warehouse data? Get started with Data Activation and turn your analytics into action. All features are available in the current Airbyte version 1.8.3 - no new install required!