In my intro post, I mentioned that context has many layers, from memory and first-party data extraction to multiple transformation layers that make context actionable for agents.
After conversations with teams building agents (some succeeding and many failing), I've learned that context is everything. It’s all the external signals and information you inject into the prompt for the LLM to reason over. It can be data, past conversations, or memory, like ChatGPT knowing your name. None of this information is encoded in the model itself, so the more relevant context you supply, the less the model has to guess and the less it tends to hallucinate.
When you look at how an agent actually works, there's a clear architecture. A model runs on compute, follows instructions from a prompt, and reasons over the context it’s given in the moment. Without context, you're tied to what the LLM was trained on, which is mainly internet data.
Context doesn’t exist on its own. Infrastructure creates it.
Most teams build three layers. Production needs nine. That's why demos sparkle, and production collapses.
Why "Infrastructure" Not Just "Context"
Most teams obsess over prompts and retrieval. Which embeddings should we use? Which vector database? These things matter, but they make up only three of the nine layers needed in production. The other six emerge as severe, often catastrophic problems. Brittle connector code breaks when it meets real customer diversity. Permission models don’t transfer across systems. Entity resolution works on samples, then falls apart at scale.
In dozens of conversations, the pattern is consistent. Teams nail retrieval, and their RAG pipeline is clean. Then they reach scale and realize their agents don’t retain conversations across sessions, and that they can’t trace an agent’s action back to the information that informed it. Every action should be traceable that way, so decisions can be explained and audited.
The context pipeline works well until it runs into real customer diversity. That’s when different authentication mechanisms, unexpected rate limits, and the complexity of entity resolution start to surface.
This isn’t a prompting problem or a model problem. It’s an infrastructure problem. And infrastructure has layers.
The 9 Layers: A Complete Map
Here's what a complete "agent infrastructure" looks like:
1. Models: LLMs, SLMs, VLMs, the foundation of generative AI use cases
2. Extraction & Integration: SaaS and database connectors, authentication, schema normalization, rate limits, retries, pagination, API change handling
3. Preparation & Transformation: Data cleaning, document parsing, chunking, embeddings, metadata, entity resolution, semantic linking
4. Indexing & Retrieval: Vector and hybrid search, cross-source discovery, relevance ranking, context assembly
5. Delivery & Orchestration: Multi-agent coordination, task routing, tool calls, workflows, error handling
6. Memory & State: Session memory, long-term enterprise memory, summarization, state awareness
7. Governance & Safety: Identity, access control, policy enforcement, audit logs, data sovereignty, secure deployment
8. Cost & Latency Optimization: Caching, deduplication, streaming responses, parallel I/O, performance tuning
9. Observability & Evaluation: Traces, context inspection, latency, freshness metrics, authorization validation, auditability
Layer 1: Models
This is the foundation of generative AI use cases. LLMs, SLMs, VLMs. I won't spend time here because, frankly, models aren't the bottleneck anymore. They're on a predictable curve. Smarter every quarter, cheaper every year. They're becoming commodities.
The real infrastructure work that'll decide whether agents work in production happens above the foundation of GPU, models, prompt, and context layers.
Layer 2: Extraction and Integration
How do you get data from Salesforce, Slack, Stripe, and your internal databases into a place where agents can use it?
You need connectors handling OAuth, API keys, and service accounts. These connectors must normalize schemas across different systems. They need to respect rate limits so your integrations don't break under load.
You also need to handle API changes without breaking your product. That’s just for one connector.
This is where many teams reach for MCP servers or similar tools. They’re great for spinning up connector prototypes in minutes. You can point an agent at an API and see value almost immediately. It can be easy to think the integration work is complete until you try to productize it.
MCP servers on their own typically don’t handle rate limits, backoff logic, retries, pagination quirks, or messy edge cases. So when teams move from a demo to production, they often realize they still have to build all the hard parts themselves. At that point, they’re back at square one.
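To make the "hard parts" concrete, here is a minimal sketch of two of them: retrying with exponential backoff when an API rate-limits you, and following cursor-based pagination. Everything named here is an assumption for illustration: `RateLimited`, the `fetch_page` callable, and the `items` / `next_cursor` response fields are hypothetical and do not belong to any specific vendor API.

```python
import random
import time
from typing import Callable, Iterator, Optional


class RateLimited(Exception):
    """Hypothetical error a connector raises when the API answers HTTP 429."""


def with_backoff(call: Callable[[], dict], max_attempts: int = 5,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Retry `call` on RateLimited, doubling the wait each attempt, plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("max_attempts must be at least 1")


def fetch_all(fetch_page: Callable[[Optional[str]], dict],
              **retry_kwargs) -> Iterator[dict]:
    """Follow cursor pagination until the API stops returning a cursor."""
    cursor: Optional[str] = None
    while True:
        page = with_backoff(lambda: fetch_page(cursor), **retry_kwargs)
        yield from page["items"]
        cursor = page.get("next_cursor")  # assumed field name
        if cursor is None:
            break
```

A real connector would also need to honor vendor-specific hints such as a retry-after value, and to cope with offset-based or link-based pagination. That per-vendor variation is a large part of the maintenance burden described here.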
One team told me they spent six weeks just maintaining Salesforce and HubSpot connectors while key pieces of their roadmap stalled.
Connectors are a living system. You can manage one. Managing two, still doable. But once the count grows, maintenance takes over. Systems change, integrations break, and the last thing you want is a product that stalls because a single connector failed.
All of this work just gets the data out of source systems. Raw data isn't enough, though.
Layer 3: Preparation and Transformation
Once you have raw data, you need to transform it into something agents can reason with. That means more than just chunking documents intelligently or generating embeddings. The real work is understanding the connections between different pieces of data.
If you're feeding an agent information about charges from Stripe, but it has no way to pull customer information or link those charges to specific people, you have a problem. Entity resolution makes sure "John Smith" in Salesforce is recognized as the same person as "jsmith@company.com" in your support system.
Without that semantic layer, your context is just raw data sitting in a vector database. With it, that data becomes something the agent can understand and act on.
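As a rough illustration of entity resolution, the sketch below groups records from different systems under one key per person. It assumes the simplest possible matching rule, a normalized email address. Real systems usually need fuzzy name matching, multiple identifiers, and human review. The `Record` shape and the sample IDs are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    source: str                  # e.g. "salesforce", "support", "stripe"
    record_id: str
    name: Optional[str] = None
    email: Optional[str] = None


def normalize_email(email: str) -> str:
    return email.strip().lower()


def resolve_entities(records: list[Record]) -> dict[str, list[Record]]:
    """Group records that share a normalized email under one entity key."""
    groups: dict[str, list[Record]] = defaultdict(list)
    for record in records:
        if record.email:
            groups[normalize_email(record.email)].append(record)
        else:
            # No shared identifier: keep it as its own entity rather than guess.
            groups[f"{record.source}:{record.record_id}"].append(record)
    return dict(groups)


records = [
    Record("salesforce", "003A", name="John Smith", email="JSmith@company.com"),
    Record("support", "t-17", email="jsmith@company.com "),
    Record("stripe", "ch_1", name="J. Smith"),
]
entities = resolve_entities(records)
```

Here the Salesforce contact and the support ticket land in one group, while the Stripe charge stays unlinked because it carries no email. That unlinked record is exactly the kind of gap that works on samples and then falls apart at scale.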
Now you can build search.
Layer 4: Indexing and Retrieval
Agents need the ability to discover and retrieve relevant information from multiple sources in real time. This means searching across data silos so they understand which information is available and how it connects. You need to surface the most relevant results for the agent's current task.
This is the layer everyone talks about. It's important. But it's just one layer.
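One common way to combine results from several retrievers (say, a vector index and a keyword index) is reciprocal rank fusion. This post doesn't prescribe a ranking method, so treat the sketch below as one option rather than the recommended one. The document IDs are made up, and `k = 60` is a conventional default, not a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores 1 / (k + rank) per list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]    # hypothetical results from a vector index
keyword_hits = ["b", "d", "a"]   # hypothetical results from a keyword index
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that appear in both lists ("a" and "b") rise above those found by only one retriever, which is the behavior you usually want from hybrid search.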
At the next layer, systems thinking matters.
Layer 5: Delivery and Orchestration
When multiple agents work together, it quickly becomes a distributed systems architecture applied to agents. Orchestration handles task routing between agents, reliable tool use lets them call APIs and act in the real world, and resilient error handling keeps the system working when one agent in the chain fails.
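A minimal sketch of task routing with fallback, assuming agents can be modeled as plain callables. The `Orchestrator` class and the agent names are hypothetical. A production system would add timeouts, retries, and durable queues on top.

```python
from typing import Callable

Handler = Callable[[str], str]


class Orchestrator:
    def __init__(self) -> None:
        self._routes: dict[str, list[Handler]] = {}

    def register(self, task_type: str, *handlers: Handler) -> None:
        """Handlers are tried in order; later ones act as fallbacks."""
        self._routes.setdefault(task_type, []).extend(handlers)

    def dispatch(self, task_type: str, payload: str) -> str:
        errors: list[str] = []
        for handler in self._routes.get(task_type, []):
            try:
                return handler(payload)
            except Exception as exc:  # one failing agent should not sink the chain
                errors.append(f"{handler.__name__}: {exc}")
        raise RuntimeError(f"no handler succeeded for {task_type!r}: {errors}")


def flaky_billing_agent(payload: str) -> str:
    raise TimeoutError("upstream timed out")


def backup_billing_agent(payload: str) -> str:
    return f"handled:{payload}"


orchestrator = Orchestrator()
orchestrator.register("billing", flaky_billing_agent, backup_billing_agent)
result = orchestrator.dispatch("billing", "refund #1")
```

The collected `errors` list matters as much as the fallback itself: without it, a silently failing primary agent looks identical to a healthy one.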
Layer 6: Memory and State
Agents need access to data that matters. Not just the current interaction, but the full context of what's available across enterprise systems. What customer data exists in the CRM? What tickets are open? What preferences have been set in other tools?
From what I’ve seen, agents need different types of memory: session memory for short-term context, long-term memory for persistent understanding, and summarization so memory doesn’t grow unbounded.
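Here is a small sketch of session memory with summarization: recent turns are kept verbatim, and older turns are folded into a bounded summary. The `naive_summarize` function is a placeholder (it only concatenates and truncates). A real system would likely call a model there. Long-term memory across sessions is not shown.

```python
from collections import deque
from typing import Callable


def naive_summarize(previous: str, evicted: str, max_chars: int = 200) -> str:
    """Placeholder summarizer: append the evicted turn, keep the tail."""
    combined = f"{previous} | {evicted}" if previous else evicted
    return combined[-max_chars:]


class SessionMemory:
    def __init__(self, max_turns: int = 3,
                 summarize: Callable[[str, str], str] = naive_summarize) -> None:
        self.recent: deque = deque()   # verbatim short-term turns
        self.summary = ""              # compressed older turns
        self.max_turns = max_turns
        self._summarize = summarize

    def add(self, turn: str) -> None:
        self.recent.append(turn)
        while len(self.recent) > self.max_turns:
            self.summary = self._summarize(self.summary, self.recent.popleft())

    def context(self) -> str:
        """Text to prepend to the prompt: summary first, then recent turns."""
        parts = [f"Summary: {self.summary}"] if self.summary else []
        return "\n".join(parts + list(self.recent))


memory = SessionMemory(max_turns=2)
for turn in ["u: hi", "a: hello", "u: my name is Ada", "a: noted"]:
    memory.add(turn)
```

After four turns with `max_turns=2`, the two oldest turns live only in `memory.summary`, so the prompt stays bounded no matter how long the session runs.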
Every enterprise team I talk to asks about the next layer first.
Layer 7: Governance and Safety
This is the layer that keeps you out of legal trouble. When agents are handling sensitive internal data, personal information, or regulated information, you need to know exactly what they can access and what they cannot.
Open-source infrastructure is essential here precisely because these systems touch such sensitive information. You need the ability to deploy and have full control over what's happening, to see the technology transparently, and to audit every access and action.
You need audit logs, policy enforcement, and clear control over data sovereignty.
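As a sketch of how policy enforcement and audit logging fit together, the snippet below checks every read against a role-based grant table and records the decision whether or not access is allowed. The role names, resource names, and `PolicyEngine` class are hypothetical. Real deployments would use the identity and policy systems they already have.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class PolicyEngine:
    # Maps a role to the resources it may read. Anything absent is denied.
    grants: dict[str, set[str]]
    audit_log: list[dict] = field(default_factory=list)

    def check(self, agent_id: str, role: str, resource: str) -> bool:
        allowed = resource in self.grants.get(role, set())
        # Log denials too: they are often the most interesting entries.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent": agent_id,
            "role": role,
            "resource": resource,
            "allowed": allowed,
        })
        return allowed

    def read(self, agent_id: str, role: str, resource: str,
             loader: Callable[[str], str]) -> str:
        if not self.check(agent_id, role, resource):
            raise PermissionError(f"{role} may not read {resource}")
        return loader(resource)


engine = PolicyEngine(grants={"support_agent": {"crm.contacts", "tickets"}})
rows = engine.read("agent-1", "support_agent", "tickets",
                   lambda resource: f"rows from {resource}")
```

The design choice worth noting is deny by default: a role or resource missing from the table is refused, so a new data source is invisible to agents until someone grants access explicitly.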
For enterprise deployments, this layer is make-or-break. When you're dealing with security compliance issues and sensitive information, getting this foundation right determines whether you can ship anything at all.
The next layer is where demos break.
Layer 8: Cost and Latency Optimization
This is the layer where systems that looked good in demos start to break at scale. Real-time data access becomes slow and expensive, and doesn’t hold up under real load.
Agentic data infrastructure needs to be reliable and observable. That means connectors must handle authentication, schema variations, and rate limits.
Smart caching defines reuse windows, deduplication removes redundant fetches, and streaming responses allow agents to think before retrieval finishes. With parallel I/O, a slow Salesforce API never stalls the rest of the system.
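The sketch below combines three of these ideas using Python's `asyncio`: a time-to-live cache (the reuse window), deduplication of identical in-flight requests, and parallel fetching. `CachedFetcher` and `slow_fetch` are hypothetical names, the 60-second window is arbitrary, and streaming is not shown.

```python
import asyncio
import time
from typing import Awaitable, Callable


class CachedFetcher:
    def __init__(self, fetch: Callable[[str], Awaitable[str]],
                 ttl_seconds: float = 60.0) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._cache: dict[str, tuple[float, str]] = {}
        self._in_flight: dict[str, asyncio.Task] = {}

    async def get(self, key: str) -> str:
        hit = self._cache.get(key)
        if hit is not None and time.monotonic() - hit[0] < self._ttl:
            return hit[1]                       # still inside the reuse window
        task = self._in_flight.get(key)
        if task is None:                        # first caller starts the fetch
            task = asyncio.create_task(self._fetch(key))
            self._in_flight[key] = task
        try:
            value = await task                  # duplicates await the same task
        finally:
            self._in_flight.pop(key, None)
        self._cache[key] = (time.monotonic(), value)
        return value

    async def get_many(self, keys: list[str]) -> list[str]:
        """Fetch all keys concurrently; results keep the order of `keys`."""
        return list(await asyncio.gather(*(self.get(k) for k in keys)))


calls: list[str] = []


async def slow_fetch(key: str) -> str:
    calls.append(key)            # count how often the "API" is really hit
    await asyncio.sleep(0.01)
    return f"value:{key}"


async def main() -> tuple[list[str], str]:
    fetcher = CachedFetcher(slow_fetch, ttl_seconds=60)
    first = await fetcher.get_many(["acct-1", "acct-1", "acct-2"])
    second = await fetcher.get("acct-1")        # served from cache
    return first, second


first, second = asyncio.run(main())
```

Four logical reads produce only two upstream calls. Note that `gather` still waits for the slowest fetch, so keeping one slow source from stalling everything else would also need per-source timeouts, which this sketch omits.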
This layer separates real-time data access from demo environments.
Layer 9: Observability and Evaluation
How do you know your agent is working? How do you debug when it's not?
In practice, teams need traces that show exactly which data an agent retrieved, which tools it called, and what context shaped its decision. You also need to define acceptable performance for context freshness and accuracy. Agents must only access data they’re authorized to see, and every action needs to remain auditable.
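A minimal sketch of such a trace: each retrieval or tool call becomes a span recording what was touched, how long it took, and whether it succeeded. The `Trace` class and span fields are hypothetical. In practice you would more likely emit these through an existing tracing standard than roll your own.

```python
import json
import time
import uuid
from contextlib import contextmanager
from dataclasses import asdict, dataclass, field


@dataclass
class Trace:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list[dict] = field(default_factory=list)

    @contextmanager
    def span(self, kind: str, name: str, **attrs):
        """Record one step (retrieval, tool call, ...) with timing and status."""
        record = {"kind": kind, "name": name, **attrs}
        start = time.perf_counter()
        try:
            yield record                 # caller can attach doc IDs, etc.
            record["status"] = "ok"
        except Exception as exc:
            record["status"] = f"error: {exc}"
            raise
        finally:
            record["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
            self.spans.append(record)

    def to_json(self) -> str:
        return json.dumps(asdict(self), default=str)


trace = Trace()
with trace.span("retrieval", "crm.contacts", query="open tickets for Acme") as s:
    s["doc_ids"] = ["c-1", "c-2"]        # which data shaped the decision
with trace.span("tool_call", "create_ticket"):
    pass                                  # the tool would run here
```

Storing the retrieved document IDs on the span is what lets you later answer "what context shaped this decision?" instead of guessing.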
Without this, you're guessing.
The Historical Pattern: Infrastructure Captures Value
Infrastructure maturation follows a consistent pattern. APIs proliferated before Stripe and Twilio made them reliable. Cloud computing existed before AWS made it accessible. The breakthrough wasn't the technology but the infrastructure.
Value doesn't sit with the raw technology. It sits with the infrastructure that makes it dependable.
Innovation moves so fast that no single company can know the best patterns on its own. They only emerge through broad adoption and learning from how people use the technology in production. And because agents increasingly handle sensitive internal data, including HR systems, you need infrastructure with full control and transparency.
That's why the need for open, deployable infrastructure is real.
Why Layers Have Dependencies
You can't skip layers. They build on each other.
Each layer requires different expertise. Layer 2 relies on security and integration engineering. Layer 5 needs distributed systems thinking. Layer 7 depends on compliance and risk expertise. Layer 9 needs DevOps and SRE experience. No single engineer covers all nine layers, which is why teams struggle.
When layers are addressed out of order, failures compound. Skip Layer 2 and Layer 4 retrieval breaks because there’s no reliable data to index. Build governance in Layer 7 without observability in Layer 9, and you can’t audit what your policies actually enforce. Optimize Layer 8 without understanding entity resolution in Layer 3, and you end up caching the wrong data.
Without reliable data ingress, effective agent orchestration is impossible. Governance and observability are interdependent. If you don't track permissions, your audit logs are incomplete.
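The dependencies named above can be written down as a small map and checked mechanically. This sketch encodes only the relationships stated in this section. It is not a complete dependency graph, and the `at_risk` helper is a hypothetical name.

```python
# Layer -> layers it relies on, as stated in this section.
DEPENDS_ON: dict[int, set[int]] = {
    4: {2},   # retrieval needs reliable extraction to have data to index
    5: {2},   # orchestration needs reliable data ingress
    7: {9},   # governance needs observability to audit what is enforced
    8: {3},   # caching needs entity resolution to cache the right data
    9: {7},   # audit logs are incomplete without tracked permissions
}


def at_risk(built: set[int]) -> dict[int, set[int]]:
    """Return each built layer together with the prerequisites it is missing."""
    return {
        layer: missing
        for layer in sorted(built)
        if (missing := DEPENDS_ON.get(layer, set()) - built)
    }


# A team with models, retrieval, governance, and caching, but nothing else:
gaps = at_risk({1, 4, 7, 8})
```

For that team, `gaps` flags retrieval, governance, and caching as each resting on a layer that hasn't been built, which mirrors the compounding failures described above.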
Teams that shortcut foundational infrastructure work create problems that only surface later. By the time they're scaling to multiple customers, especially larger ones, the maintenance burden becomes overwhelming. Your engineers stop building features and start debugging connectors.
How to Use This Framework
Audit your stack layer by layer. Where do you have real solutions? Where are you hacking things together? Where have you built nothing at all?
Most teams discover they've invested heavily in certain core capabilities, while others are missing or poorly implemented. For example, they might have strong model infrastructure and retrieval systems, but lack proper governance, entity resolution, or real-time data access.
The pattern is consistent: visible technical components get attention, while the data infrastructure that actually makes agents work at scale is neglected.
That's your roadmap. The gaps are where your production failures will emerge. Prioritize by risk. If you're handling sensitive data, governance can't wait. If you're burning budget on model calls, optimization is urgent. If you can't debug failures, observability comes first.
The framework makes invisible complexity visible. It turns “we need agents” into a concrete set of engineering problems around data access, permissions, entity resolution, and safe write-backs. These are known problems with proven solutions that teams can build on instead of reinventing from scratch.
Making the Invisible Visible
The reason teams underestimate agent infrastructure is simple. Most failures happen silently, in data connectors and integration layers that nobody glamorizes. You don't see them in demos. You don't read about them in papers. They're invisible until they break.
Early demos usually look solid. Problems tend to surface when teams start adding more customer integrations and scaling beyond the first few.
What's Coming
In the coming weeks, I’ll keep returning to this 9-layer framework and deep-dive into the most critical layers. We’ll go beyond the framework into concrete patterns, architectures, and solutions.
You can expect answers to:
How do you actually build reliable extraction at scale?
What does good governance look like?
How do you optimize costs without sacrificing quality?
What observability patterns work for agents versus traditional software?
The answers to these questions determine whether your agent ships or dies in staging.
The model race is visible and thrilling. But the decisive work is happening where fewer people look: in the agentic data infrastructure that gives agents access to real data, reliable connectors, and trustworthy context.
That's where agentic AI will be won.
Subscribe to Agent Blueprint to learn more about agentic data infrastructure.