How to Build Embedded Integrations in SaaS

Compare four embedded integration approaches for SaaS and learn which architecture supports AI agent context delivery.

March 9, 2026

Most teams pick their embedded integration architecture based on how fast they can ship the first replication connector. Then they spend the next two years paying for that decision in maintenance, credential failures, and schema drift. The architecture choice becomes even more consequential as AI agents shift the integration requirement from syncing records to delivering deep, permission-governed, multi-format context.

TL;DR

Embedded integrations let your product connect to customers' tools through two layers: an embedded runtime (sync logic) and an embedded marketplace (catalog).
Four approaches trade off control vs. speed differently: custom build, unified API, embedded iPaaS, and replication connector platforms.
Maintenance dominates total cost of ownership — credential failures, schema drift, rate limits, and API deprecations accumulate faster than the initial build.
AI agents require a different architecture than record syncing: provider-specific fields, unstructured docs, embeddings, sub-minute freshness, row-level permissions, and vector DB delivery.

What Are Embedded Integrations?

An embedded integration is integration functionality built into your SaaS product.

Engineers commonly group two architectural layers under "embedded":

Embedded Runtime: Sync logic, data transformation, and error handling running on the provider's infrastructure but orchestrated by your product via API.
Embedded Marketplace: A catalog of available integrations.

Most embedded integration platforms provide both. The choice of approach determines how much you build vs. buy.

Which Embedded Integration Architecture Fits Your Use Case?

The right choice depends on what kind of data your product needs and how much maintenance your team can absorb. The following table compares the four approaches across the dimensions that matter most in production.

Dimension	Custom Build	Unified API	Embedded iPaaS	Replication Connector Platform (Airbyte)
What you build	Everything: auth, mapping, sync, UI, monitoring	Application logic on top of normalized API	Workflow configuration using visual designer + pre-built replication connectors	Agent logic on top of replicated data
Data depth	Full (you control what you access)	Shallow (normalized common fields only)	Medium (full provider API available, but you build each workflow)	Deep (full replication preserving provider-specific fields)
Unstructured data	If you build it	No	Limited (depends on replication connector and workflow)	Yes (files + records in same connection, automatic metadata)
Auth management	You build OAuth, token refresh, credential storage	Provider handles all auth	Provider handles replication connector auth; you configure per-customer	Provider handles auth
Time to first integration	2–4 months per integration	Days per category	Weeks per integration	Hours to days per replication connector
Maintenance owner	You (100%)	Provider (replication connectors) + You (app logic)	Provider (replication connectors) + You (workflow logic)	Provider (replication connectors) + You (agent logic)
Data freshness	You control (build polling, webhooks, Change Data Capture (CDC))	CRON polling (minutes to hours)	Depends on platform and replication connector	Incremental sync + CDC (sub-minute)
Permission scoping	You build it	Over-scoped by default (unified OAuth)	Integration-specific scopes	Row-level + user-level access control lists (ACLs) built in
Deployment	Your infrastructure	Provider cloud by default, with additional self‑hosted, hybrid, and on‑premises options supported by major vendors	Provider cloud (some offer private cloud)	Cloud, multi-cloud, on-prem, hybrid
Best for	Core differentiating integrations requiring full control	Broad, shallow integrations across many providers in one category	Complex, customer-configurable workflows across categories	AI agents needing deep, governed, multi-format context

When custom build is worth the cost: Your integration is the product differentiator, and you have the engineering capacity to maintain it. Custom projects consistently take 2–4 months per integration, with actual timelines frequently reaching 2–3x initial estimates. Most products have one to three integrations worth building custom and dozens that aren't.

When unified APIs make sense: Your product needs to support many providers in the same category with standard data. A unified API gets you there in days per category rather than months per provider, though you're limited to the normalized fields the provider exposes.

When embedded iPaaS fits: Your customers need configurable, multi-step workflows that vary by customer. Embedded iPaaS platforms also work well when you integrate across multiple application categories; unified APIs handle this poorly since each category requires a separate provider. The tradeoff is implementation time: each workflow requires configuration, testing, and ongoing maintenance as replication connectors update.

When a replication connector platform fits: Your product needs full-depth data replication with provider-specific fields, unstructured files, automatic embeddings, row-level permissions, and vector database delivery. A replication connector platform like Airbyte handles all of this in one pipeline, with the provider maintaining replication connectors and surfacing schema changes automatically. This is the architecture AI agents require: governed, multi-format context delivered where your agent can retrieve it, not a normalized subset of structured records.

What Does Maintenance Actually Cost?

Every approach claims to reduce maintenance. The actual burden breaks into six categories, and each approach shifts ownership differently.

Maintenance Category	What Breaks	Custom Build	Unified API	Embedded iPaaS	Replication Connector Platform
Credential lifecycle	OAuth tokens expire, refresh tokens rotate, API keys revoked	You handle everything	Provider handles	Provider handles	Provider handles
Schema drift	Provider adds/removes/renames fields across API versions	You detect and fix	Provider absorbs for normalized fields; custom fields unaffected (not in schema)	Provider updates replication connector; you update workflow if affected	Provider updates replication connector; schema changes surfaced automatically
Rate limit changes	Provider adjusts limits by plan tier or policy	You detect and handle	Provider handles within their calling patterns	Provider handles at replication connector level; you handle at workflow level	Provider handles
API version deprecation	Provider sunsets API versions (e.g., Salesforce SOAP → REST, HubSpot V1 → V3)	You migrate manually	Provider migrates	Provider migrates replication connector; you verify workflow compatibility	Provider migrates
New provider features	Provider ships new endpoints, objects, or capabilities	You add support manually	Available only if they fit normalized schema	Available if replication connector is updated and you build workflow	Available when replication connector is updated
Customer support for broken connections	Customer's integration fails; your team investigates	Full investigation burden on your team	Provider logs available; investigation split	Platform provides logs, alerts, debugging tools	Platform provides observability, tracing, metrics

Schema Drift Is the Hidden Tax

Salesforce maintains a release schedule that regularly adds, renames, or deprecates fields. HubSpot migrated from V1 to V3 APIs, and this migration changed field naming conventions entirely; their marketing_emails stream migration note documents significant schema changes and removed fields that were previously present. Even "stable" providers make incremental schema changes that break mapping logic. The cumulative effect is a maintenance load that grows with every provider you support, whether you built the replication connector or not.
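One way to catch this kind of drift before it breaks mapping logic is to compare a stored snapshot of a provider object's fields against what the API currently returns. The following is a minimal sketch under stated assumptions, not any platform's implementation: the `diff_schema` helper, the `SchemaDiff` type, and the example field names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaDiff:
    added: frozenset[str]
    removed: frozenset[str]
    retyped: frozenset[str]

    @property
    def breaking(self) -> bool:
        # Removed or retyped fields can break mappings; pure additions usually do not.
        return bool(self.removed or self.retyped)


def diff_schema(expected: dict[str, str], observed: dict[str, str]) -> SchemaDiff:
    """Compare two {field_name: type_name} snapshots of one provider object."""
    expected_names, observed_names = set(expected), set(observed)
    shared = expected_names & observed_names
    return SchemaDiff(
        added=frozenset(observed_names - expected_names),
        removed=frozenset(expected_names - observed_names),
        retyped=frozenset(name for name in shared if expected[name] != observed[name]),
    )


# Hypothetical snapshots of an email object before and after an API migration.
before = {"id": "integer", "name": "string", "ab_test": "boolean"}
after = {"id": "string", "name": "string", "is_published": "boolean"}

drift = diff_schema(before, after)
if drift.breaking:
    print("removed:", sorted(drift.removed), "retyped:", sorted(drift.retyped))
```

Note that a renamed field shows up here as one removal plus one addition, so a rename still needs a human (or a mapping table) to confirm which new field replaces the old one.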

Credential Lifecycle Compounds at Scale

OAuth token lifetimes vary dramatically by provider. Salesforce uses an activity-based expiration model and breaks OAuth convention by not returning the expires_in parameter in token responses. HubSpot token refresh responses may include a refresh token, and your system should always store the value returned rather than assume the old one still applies. When a customer's refresh token fails (revoked, expired session, changed permissions), the integration silently stops working.

At 10 customers, you handle this manually. At 1,000 customers across 15 providers, credential failures become a daily support ticket category. That daily volume is where the maintenance cost shifts from engineering time to product trust.
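The credential behaviors described in this section can be sketched as a small refresh routine. This is an illustration under assumptions, not a provider SDK: the `Connection` type, `RefreshRejected`, the `refresh` callable, and the fallback TTL are hypothetical, and the only provider-facing names used are the standard OAuth 2.0 response fields `access_token`, `refresh_token`, and `expires_in`.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: conservative lifetime used when a response omits `expires_in`.
DEFAULT_TTL_SECONDS = 900


class RefreshRejected(Exception):
    """Raised by the refresh callable when the provider rejects the refresh token."""


@dataclass
class Connection:
    customer_id: str
    access_token: str
    refresh_token: str
    expires_at: float  # epoch seconds, computed locally
    needs_reauth: bool = False


def apply_token_response(conn: Connection, payload: dict, now: float) -> None:
    """Update a connection from a token-endpoint JSON payload."""
    conn.access_token = payload["access_token"]
    # Some providers omit `expires_in`, so fall back to a conservative TTL.
    conn.expires_at = now + float(payload.get("expires_in", DEFAULT_TTL_SECONDS))
    # Store any refresh token the provider returns; keep the old one otherwise.
    conn.refresh_token = payload.get("refresh_token", conn.refresh_token)


def ensure_fresh(
    conn: Connection,
    refresh: Callable[[str], dict],
    now: Optional[float] = None,
    skew: float = 60.0,
) -> bool:
    """Refresh when near expiry. Returns False if the customer must re-authorize."""
    now = time.time() if now is None else now
    if conn.needs_reauth:
        return False
    if now < conn.expires_at - skew:
        return True
    try:
        payload = refresh(conn.refresh_token)
    except RefreshRejected:
        # Flag the failure so it is surfaced instead of stopping silently.
        conn.needs_reauth = True
        return False
    apply_token_response(conn, payload, now)
    return True
```

The `needs_reauth` flag is the design point: a rejected refresh becomes an explicit state your product can show to the customer, rather than a sync that quietly stops.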

What Changes When AI Agents Need the Data?

AI agents shift the integration requirement from record syncing to context delivery. The following table maps each agent-specific requirement against the four approaches.

Agent Requirement	Why It Matters	Custom Build	Unified API	Embedded iPaaS	Replication Connector Platform
Provider-specific fields	Agent needs actual Salesforce stage values, not normalized labels	Full access	Lowest common denominator (LCD) schema only	If workflow accesses full API	Full replication
Unstructured data (docs, messages, files)	Agent reasons over documents, not just records	If you build it	Not supported	Limited by replication connector	Files + records in same pipeline
Automatic embeddings + metadata	RAG pipeline needs vectors, not raw text	You build pipeline	Not supported	Not supported	Automatic generation
Sub-minute freshness	Stale data produces stale answers	If you build CDC	CRON polling	Platform-dependent	Incremental sync + CDC
Row-level permissions	Agent must only see what querying user can access	You build ACLs	Over-scoped OAuth	If replication connector supports	Built-in ACLs
Vector database delivery	Agent retrieves via semantic search	You build pipeline	Not supported	Not supported	Delivers to Pinecone, Weaviate, Milvus, Chroma
On-prem / data sovereignty	Enterprise won't send data to third-party cloud	Your infrastructure	Provider cloud by default; some vendors offer self-hosted options	Some offer private cloud	Deploy anywhere

Consider an agent answering "What's the status of the Acme renewal?" That agent needs the deal record from CRM (with provider-specific stage values), the latest proposal document from Google Drive, the relevant Slack thread, and the permission verification that the user asking is authorized to see Acme's data.

Normalization discards provider-specific data at ingest time, and iPaaS platforms don't generate embeddings or deliver to vector databases. These aren't missing features a product update could add — they reflect architectural decisions about what the pipeline preserves and where it delivers. For AI agent applications, "integration" means governed, multi-format context delivery, and that demands different architecture.
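To make the permission requirement in the Acme example concrete, here is a minimal sketch of filtering multi-source context by the asking user before it reaches the agent. It is an illustration only, not Airbyte's API: `ContextItem`, `visible_context`, the per-item `allowed_users` ACL, and all sample data are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    source: str                    # where the item came from, e.g. "crm" or "slack"
    account: str                   # customer account the item is about
    text: str                      # content handed to the agent
    allowed_users: frozenset[str]  # hypothetical ACL copied from the source system


def visible_context(items: list[ContextItem], user: str, account: str) -> list[ContextItem]:
    """Keep only items about `account` that `user` is allowed to read."""
    return [
        item for item in items
        if item.account == account and user in item.allowed_users
    ]


# Illustrative data for the Acme renewal question; every value here is made up.
ITEMS = [
    ContextItem("crm", "Acme", "Renewal deal, StageName=Negotiation/Review", frozenset({"dana", "lee"})),
    ContextItem("drive", "Acme", "Proposal v3: 24-month renewal terms", frozenset({"dana"})),
    ContextItem("slack", "Acme", "Thread: legal review still pending", frozenset({"lee"})),
    ContextItem("crm", "Globex", "New business deal, StageName=Prospecting", frozenset({"dana"})),
]

context_for_dana = visible_context(ITEMS, user="dana", account="Acme")
for item in context_for_dana:
    print(item.source, "->", item.text)
```

In this sketch the filter runs at retrieval time, so the same stored items can serve different users. It only works if the pipeline kept the source's permissions and provider-specific values (such as the raw stage name) in the first place.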

How Do Airbyte Agents Deliver Context for AI Agents?

Airbyte Agents provides embedded data infrastructure designed for the shift from record syncing to governed, multi-format context delivery for AI agents. The pipeline includes:

600+ replication connectors with full data replication (provider-specific fields preserved)
Structured records and unstructured files in the same connection with automatic metadata extraction
Embedding generation and delivery to vector databases (Pinecone, Weaviate, Milvus, Chroma)
Row-level and user-level access controls across all sources
Incremental sync with CDC for sub-minute replication
Deployment anywhere (cloud, multi-cloud, on-prem, hybrid)

Each of these capabilities maps directly to an agent requirement that traditional embedded integration platforms leave unaddressed. The difference between syncing a contact record and delivering governed, retrieval-ready context is the difference between a product that reads data and one an AI agent can reason over.

What's the Best Way to Build Embedded Integrations for Your Product?

Start with the use case. If your product syncs structured records across a single SaaS category, a unified API is the fastest path. If customers need configurable multi-step workflows, embedded iPaaS provides the flexibility. If one or two integrations are your core differentiator, build those custom and use a platform for the rest.

If your product delivers AI agent context, the architecture changes entirely, and that requires context engineering infrastructure purpose-built for the job. For AI agent applications, Airbyte Agents pairs this infrastructure with a Context Store: a live, searchable index of business data populated by Airbyte replication connectors and agent connectors. That index gives agents governed context they can retrieve without rebuilding permissions, freshness, and retrieval logic source by source.
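The guidance in this section can be summarized as a simple decision rule. The sketch below is one possible encoding, assuming a precedence order (differentiator first, then agent context, then workflows) that the text above does not state explicitly; the `recommend` function and its parameter names are hypothetical.

```python
from enum import Enum


class Approach(Enum):
    CUSTOM_BUILD = "custom build"
    UNIFIED_API = "unified API"
    EMBEDDED_IPAAS = "embedded iPaaS"
    CONNECTOR_PLATFORM = "replication connector platform"


def recommend(
    *,
    core_differentiator: bool = False,
    agent_context: bool = False,
    configurable_workflows: bool = False,
) -> Approach:
    """Pick an approach for ONE integration. The precedence order is an assumption."""
    if core_differentiator:
        # The one to three integrations that define the product.
        return Approach.CUSTOM_BUILD
    if agent_context:
        # Deep, governed, multi-format context for AI agents.
        return Approach.CONNECTOR_PLATFORM
    if configurable_workflows:
        # Multi-step workflows that vary by customer.
        return Approach.EMBEDDED_IPAAS
    # Structured records across a single SaaS category.
    return Approach.UNIFIED_API


print(recommend(agent_context=True).value)
```

Because the function is applied per integration, a single product can end up with a mix: custom for its differentiating integrations and a platform for the rest.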

Get a demo to see how Airbyte Agents provides AI agent infrastructure, or try Airbyte Agents today.

Frequently Asked Questions

What is the difference between embedded iPaaS and unified API?

Embedded iPaaS provides a visual workflow builder with pre-built replication connectors for creating complex, customer-configurable integrations across any software category. Unified APIs provide a single normalized interface per category (CRM, HRIS, Accounting) with a standard data model. Embedded iPaaS offers more flexibility and depth per integration; unified APIs offer more speed and breadth across providers in a category.

How long does it take to build an embedded integration?

Timelines range from hours (replication connector platform) to months (custom build), with the comparison table above showing specifics per approach. The hidden variable is maintenance: the initial build is often 20–30% of the total cost of ownership over two years.
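As a rough worked example of that ratio, the sketch below converts a build estimate into the two-year total it implies. It assumes engineering months are a fair proxy for cost and uses a 3-month build, which sits inside the 2–4 month custom-build range cited earlier; the function name and inputs are illustrative.

```python
def implied_two_year_total(build_months: float, build_share_pct: int) -> float:
    """Total effort implied when the initial build is `build_share_pct`% of two-year cost."""
    if not 0 < build_share_pct <= 100:
        raise ValueError("build_share_pct must be in (0, 100]")
    return build_months * 100 / build_share_pct


BUILD_MONTHS = 3.0  # assumption: one custom integration

for share_pct in (20, 30):
    total = implied_two_year_total(BUILD_MONTHS, share_pct)
    maintenance = total - BUILD_MONTHS
    print(f"build = {share_pct}% of total: total {total:.0f} months, maintenance {maintenance:.0f} months")
```

Under these assumptions, a 3-month build implies 10 to 15 months of total effort over two years, so maintenance is roughly two to four times the original build.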

What maintenance do embedded integrations require?

Maintenance grows with the number of providers you support, not the number of replication connectors you build. Schema drift, credential failures, and API deprecations compound across providers regardless of whether you or a platform owns the replication connector. The maintenance table above breaks down ownership by approach.

Can embedded integrations handle unstructured data?

Most embedded platforms are built for structured records. Processing documents, messages, or recordings alongside those records, and making that content retrievable for AI agents through embeddings and vector search, requires deeper infrastructure than traditional embedded platforms provide.

Do I need different integration infrastructure for AI agents?

The agent requirements table above maps the gap. Traditional platforms cover record syncing well. Agent context delivery requires a pipeline that preserves source fidelity, processes multiple formats, generates embeddings, enforces permissions, and delivers to vector databases as a single integrated system.
