AI Agents

Control is Key for Scaling Agentic Products

Why control is essential for scaling agentic products—ensuring reliability, safety, and performance as autonomous systems operate at scale.

March 18, 2026

Michel Tricot

Most teams building agent products have never questioned the data their agents receive. They’ve tuned prompts, evaluated models, stress-tested orchestration logic. But the actual bytes flowing into their context windows? They let vendors decide that.

That’s the decision that breaks everything at scale.

At low volume, nobody notices. Your agent calls a vendor API, gets back ten fields when it only needs three, and the extra tokens cost fractions of a penny. But scale that to a million operations a month across ten data sources, and those unquestioned defaults become the single biggest source of cost overruns, hallucinations, and debugging nightmares.

The models hold up fine. The orchestration logic holds up fine. The data flowing into agents does not, because teams have delegated control over what enters their context windows to third-party MCP servers, SaaS APIs, and pre-configured services that were never designed for their specific workflows.

If you don't control your parameters and context windows, your AI becomes useless. The models are fine; it's the plumbing that's feeding them garbage.

The Hidden Cost of Default Parameters

The damage starts with something nobody thinks to question: the default API response.

Your agent queries a vendor API to get ticket information. The API returns 10 fields by default: ticket ID, status, assignee, creation date, last modified date, priority, description, full comment history, attachment metadata, and audit log. Your agent only needs three of those fields: status, assignee, and priority.

At 50 operations a day, nobody notices. The extra seven fields add maybe 900 tokens per call, which is less than a penny per call in wasted compute.

Now scale to 1 million operations per month. Those 900 unnecessary tokens per call become 900 million wasted tokens monthly. At roughly $0.01 per 1,000 tokens, you're burning $9,000 a month on data your agent doesn't need. And that's just one integration. Most production agents pull from five, ten, sometimes fifteen data sources.

Multiply by the number of sources. Multiply by growth in user base. Multiply by the fact that vendor APIs tend to add fields over time, not remove them. The waste that's invisible at prototype scale becomes the line item that kills your unit economics at production scale.

Control Lives Between Your Agents and Their Data

When I say "control," I don't mean hand-tuning every parameter for every API call. That doesn't scale either.

Control means building infrastructure layers that sit between your agents and their data sources. These layers intercept responses before they hit context windows and apply policies automatically. Think of it as the plumbing between your agent and the outside world.

In practice, this infrastructure handles several responsibilities.

Response interception captures what comes back from a vendor API before your agent ever sees it.
Size-aware filtering enforces limits like "never pass more than 2,000 tokens from any single source into context."
Schema transformation converts inconsistent vendor response formats into a standard structure your agent expects.
Field selection strips out everything except the data your agent actually needs for the task.
Audit logging records what was requested, what was returned, what was filtered, and what ultimately entered the context window.

This should be policy-driven and automatic. You define rules once, things like "All Slack responses get truncated to 2,000 tokens. All Jira responses strip comment history and audit logs. All Salesforce responses convert to our internal schema." Then the infrastructure enforces them on every call, at every scale.

What MCP Gets Right, and Where You Still Need Control

MCP is so convenient and effective that it is easy to slip into complacency. It gives your agents a standardized way to connect to tools without writing bespoke auth flows, handling schema quirks, or maintaining custom integrations for every vendor in your stack. For getting something working fast, it's hard to beat.

The question is what happens as you scale that convenience into production.

Every vendor MCP server returns what makes sense for its own product experience. Slack's server gives you message content, thread history, reactions, user metadata, and channel context: everything useful to someone navigating a conversation. Your agent might only need three decision points from a 200-message thread. It gets the rest anyway.

Multiply that across your stack and the defaults start to matter. Salesforce returns what Salesforce considers important. GitHub returns what GitHub considers important. None of those servers were designed with your agent's specific task in mind, and they can't be. That's just the nature of general-purpose tooling.

At low volume, the gap is manageable. At scale, it shows up in token costs, in context windows crowded with data the agent didn't need, and in accuracy problems that are hard to trace back to their source. That's the scenario where you need a layer between what MCP returns and what actually reaches your agent, one that filters, shapes, and enforces context budgets based on what your specific workflows require.

MCP gets you connected. Control keeps you from paying for data you never needed.

Context Window Economics at Scale

The architectural problems above have direct financial consequences, and the math gets ugly fast.

Here's an illustrative example. Assume a production agent product making 1 million calls per month, with average token pricing of $0.01 per 1,000 tokens.

With no control, each call pulls unfiltered responses from three data sources, averaging 5,000 tokens total per call. That's 5 billion tokens a month, running you $50,000.

With basic filtering, you implement field selection and strip unnecessary metadata, reducing average tokens to 2,000 per call. Monthly consumption drops to 2 billion tokens and $20,000. You've saved $30,000 per month just by controlling what gets returned.

With full control layers, you implement response interception, size limits, schema transformation, and task-aware filtering. Average tokens drop to 800 per call, bringing monthly consumption down to 800 million tokens and $8,000.

The difference between no control and full control is $42,000 per month. Over a year, that's more than $500,000. And this assumes flat usage. Most agent products are growing, which means every month of growth multiplies the gap between controlled and uncontrolled costs.
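The three scenarios above are easy to rerun with your own volumes and prices. The following is a minimal sketch, not a billing model: the function and variable names are assumptions made for illustration, and the token counts, call volume, and price are the illustrative figures from the example.

```python
# $0.01 per 1,000 tokens is the same as $10 per million tokens.
PRICE_PER_MILLION_TOKENS_USD = 10
CALLS_PER_MONTH = 1_000_000

# Average tokens per call for each scenario described above.
SCENARIOS = {
    "no_control": 5_000,
    "basic_filtering": 2_000,
    "full_control": 800,
}


def monthly_cost_usd(tokens_per_call: int) -> float:
    """Monthly spend for a given average context size per call."""
    total_tokens = tokens_per_call * CALLS_PER_MONTH
    return total_tokens * PRICE_PER_MILLION_TOKENS_USD / 1_000_000


costs = {name: monthly_cost_usd(tokens) for name, tokens in SCENARIOS.items()}
monthly_gap = costs["no_control"] - costs["full_control"]
annual_gap = monthly_gap * 12  # assumes flat usage, as in the text

for name, cost in costs.items():
    print(f"{name:>16}: ${cost:,.0f}/month")
print(f"gap: ${monthly_gap:,.0f}/month, ${annual_gap:,.0f}/year")
```

Swapping in your own per-call averages for each source is usually the fastest way to see which integration dominates the bill.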

Cost-Aware Routing: Matching Resources to Tasks

The savings don't stop at filtering. Control layers open up a second area of savings that's impossible without them: the ability to route different queries to different models based on what the task actually requires.

Without control, every agent call follows the same path. The same model, the same context window size, the same cost per query. A simple "who's assigned to this bug?" costs the same as "analyze the root cause across these 50 related incidents." That pricing uniformity makes no sense when the queries have fundamentally different complexity.

With control layers inspecting requests before execution, you can implement routing logic. A simple lookup needs 200 tokens of context and a fast, inexpensive model. A root cause analysis needs 4,000 tokens and a more capable model. Your infrastructure recognizes the difference and routes accordingly.

Here's what this looks like in practice. Your interception layer classifies incoming requests by complexity. Simple lookups, status checks, and single-field retrievals route to smaller models with tight context windows. Multi-source analyses, summarization tasks, and reasoning-heavy queries route to larger models with full context budgets.

The economics shift dramatically. In many production agent systems, simple lookups dominate. Suppose they account for 70% of calls and a comparable share of token spend. Routing those to a model that costs one-tenth as much per token brings total compute spend down to 0.7 × 0.1 + 0.3 = 0.37 of the original, a cut of more than half. Combined with filtering, you're looking at 80–90% cost reduction compared to the uncontrolled baseline.

This requires control at the architecture level. You can't route what you can't inspect. The control layer that filters your responses is the same layer that makes intelligent routing possible.
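The blended-cost arithmetic behind the routing claim can be checked in a few lines. This is a sketch under one stated assumption: the 70% figure is treated as a share of token spend. If simple lookups make up 70% of calls but carry much smaller contexts than complex queries, their share of spend, and therefore the savings from routing, will be smaller. The function name is hypothetical.

```python
from fractions import Fraction


def blended_cost_ratio(simple_share: Fraction, cheap_price_ratio: Fraction) -> Fraction:
    """Cost after routing, as a fraction of the cost with no routing.

    simple_share: share of token spend that can go to the cheaper model.
    cheap_price_ratio: cheaper model's per-token price relative to the default.
    """
    return simple_share * cheap_price_ratio + (1 - simple_share)


# Figures from the text: 70% simple lookups, a model at one-tenth the price.
ratio = blended_cost_ratio(Fraction(7, 10), Fraction(1, 10))
print(f"cost after routing: {float(ratio):.0%} of baseline")
print(f"savings from routing: {float(1 - ratio):.0%}")
```

Exact fractions are used here only to keep the arithmetic free of floating-point noise.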

Hallucinations Get Worse With More Data, Not Better

The default assumption is that giving agents more context improves accuracy. More data, better answers. The opposite is true.

Wasted tokens aren't just expensive. They actively degrade the quality of your agent's output.

Think about why. When an agent asks "what's the latest activity on the Acme account?" and receives 100 tokens with the recent activity summary, it generates an accurate response. When it receives 5,000 tokens including full field history, every internal note, sharing rules, and related opportunity metadata, the model has to figure out which information matters. More noise means more chances to latch onto irrelevant details and generate incorrect summaries.

The pattern in production confirms this. The agents with the worst accuracy are the ones drowning in context, not starving for it. Oversized responses filled with tangentially related data create exactly the conditions where models hallucinate.

Control layers address this by ensuring the information that reaches the context window is relevant to the specific task. Truncation, field selection, and prioritization filter down to signal and remove noise.

The Debugging Black Box Nobody Plans For

Cost overruns and hallucinations are painful, but at least you notice them. The third failure mode is worse, because something breaks and you can't figure out why.

Your agent generates a customer summary that's wrong. The user complains, your team investigates, and then what?

Without control layers, you're debugging a black box. The agent called some vendor services that returned something, but you don't know exactly what, how much data came back, which fields were included, or what filled the context window. You can see the prompt that went to the model and the response that came out. Everything in between is invisible.

With control layers, you have a complete audit trail. You can replay the exact conditions that caused the hallucination, identify whether the problem was bad source data, insufficient filtering, or context overflow, and fix the specific issue.

This isn't optional at production scale. When you're handling thousands of user interactions daily, you need observability into every step of the pipeline. If you can't inspect the data flowing through your system, you can't debug it. And if you can't debug it, you can't improve it.
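To make "complete audit trail" concrete, here is one possible shape for the record a control layer could write per intercepted call. It is a sketch, not a standard: the class name, field names, and the example ticket key are all assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    """One intercepted call: what was asked, what came back, what the agent saw."""
    source: str
    request: dict
    fields_returned: list
    fields_passed: list
    tokens_returned: int
    tokens_passed: int
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def fields_filtered(self) -> list:
        # Fields the vendor returned that never reached the context window.
        return [f for f in self.fields_returned if f not in self.fields_passed]

    def to_json(self) -> str:
        record = asdict(self)
        record["fields_filtered"] = self.fields_filtered
        return json.dumps(record, sort_keys=True)


# Hypothetical example: a status check against a ticketing source.
record = AuditRecord(
    source="jira",
    request={"ticket": "EXAMPLE-1"},
    fields_returned=["status", "assignee", "priority", "comment_history"],
    fields_passed=["status", "assignee", "priority"],
    tokens_returned=1200,
    tokens_passed=40,
)
print(record.to_json())
```

With records like this stored per call, "what filled the context window?" becomes a lookup rather than a guess.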

Building Control Layers: Practical Patterns

Control belongs in the infrastructure layer, between your connectors and your agents. Scattering it across individual agent implementations or burying it in application code creates the same maintenance burden you're trying to avoid.

Here are the patterns that work. If you're hands-on, the implementation examples below illustrate how each one could look in practice.

Response interception middleware

Build a layer that every vendor response passes through before reaching any agent. This layer applies policies, logs data, and enforces constraints. It's the single point where you can inspect and transform everything.

class ResponseInterceptor:
    """Single point that every vendor response passes through."""

    def __init__(self, policy_registry, logger, default_policy):
        self.policy_registry = policy_registry
        self.logger = logger
        self.default_policy = default_policy

    def intercept(self, source, raw_response, task_type=None):
        # Sources with no registered policy fall back to the default policy.
        policy = self.policy_registry.for_source(source) or self.default_policy
        # Prefer redacted logging for compliance
        self.logger.log_raw_redacted(source, policy.redact(raw_response))
        try:
            filtered = policy.apply_field_selection(raw_response, task_type=task_type)
            truncated = policy.enforce_token_limit(filtered, task_type=task_type)
            normalized = policy.transform_schema(truncated)
        except Exception as exc:
            # Fail closed (drop) or fail open (pass-through) per risk policy.
            self.logger.log_intercept_error(source, str(exc))
            normalized = self.default_policy.safe_fallback(raw_response)
        self.logger.log_processed(source, normalized)
        return normalized

You write these policies once. The infrastructure applies them to every call, regardless of which agent makes the request or how many operations you're running.
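The interceptor relies on a policy_registry object with a for_source method but does not show one. A minimal sketch of what such a registry could look like follows. The class name and the register method are assumptions; the only behavior taken from the interceptor is that for_source returns something falsy for unknown sources so the default policy applies.

```python
class PolicyRegistry:
    """Maps a source name to the policy object that governs its responses."""

    def __init__(self):
        self._policies = {}

    def register(self, source, policy):
        self._policies[source] = policy

    def for_source(self, source):
        # Returns None for unknown sources so the caller can fall back
        # to a default policy.
        return self._policies.get(source)


# Usage with a placeholder object standing in for a real policy.
slack_policy = object()
registry = PolicyRegistry()
registry.register("slack", slack_policy)

print(registry.for_source("slack") is slack_policy)
print(registry.for_source("unregistered-source"))
```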

Schema normalization

Different vendors return data in wildly different formats. Your control layer transforms all responses into a consistent schema your agents expect. Agent code doesn't need to handle vendor-specific quirks. It just consumes clean, predictable data.

def normalize_to_internal_schema(source, raw_data):
    """Convert vendor-specific responses to standard format."""
    if source == "slack":
        return {
            "content": raw_data.get("text", ""),
            "author": raw_data.get("user", {}).get("real_name", ""),
            "timestamp": raw_data.get("ts", ""),
            "source": "slack"
        }
    elif source == "jira":
        fields = raw_data.get("fields") or {}
        # Jira sends "assignee": null for unassigned issues, so guard against None.
        assignee = fields.get("assignee") or {}
        return {
            "content": fields.get("summary", ""),
            "author": assignee.get("displayName", ""),
            "timestamp": fields.get("updated", ""),
            "source": "jira"
        }
    # Same pattern for every source.
    # Agent code never touches vendor-specific formats.
    # Unknown sources fail loudly instead of silently returning None.
    raise ValueError(f"No schema mapping for source: {source}")

Your agents consume one predictable format regardless of where the data originated. Adding a new source means writing one transformation function, not updating every agent.

Policy-based size enforcement

Define maximum token budgets per source, per query type, and per agent workflow. Your interception layer enforces these limits automatically. "Slack responses never exceed 2,000 tokens" becomes a policy, not a per-query decision.

SOURCE_POLICIES = {
    "slack": {"max_tokens": 2000, "strip_fields": ["reactions", "attachments"], "schema": "internal_message_v1"},
    "jira": {"max_tokens": 1500, "strip_fields": ["comment_history"], "schema": "internal_ticket_v1"},
}


class ConfigBackedPolicy:
    """One source's policy, driven by its SOURCE_POLICIES entry."""

    # truncate_to_tokens and redact_sensitive_fields are helpers assumed to exist elsewhere.
    def __init__(self, source, cfg):
        self.source = source
        self.cfg = cfg

    def apply_field_selection(self, payload, task_type=None):
        strip = set(self.cfg.get("strip_fields", []))
        return {k: v for k, v in payload.items() if k not in strip}

    def enforce_token_limit(self, payload, task_type=None):
        max_tokens = self.cfg.get("max_tokens", 1000)
        return truncate_to_tokens(payload, max_tokens)

    def transform_schema(self, payload):
        # Normalization is keyed by source name; cfg["schema"] only names the target schema.
        return normalize_to_internal_schema(self.source, payload)

    def redact(self, payload):
        return redact_sensitive_fields(payload)

    def safe_fallback(self, payload):
        return {"content": "", "source": "unknown", "error": "intercept_failed"}

Every vendor response flows through this single layer. You define policies per source, apply them automatically, and log everything for debugging.
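The snippets in this section call a truncate_to_tokens helper without defining it. One possible sketch follows. It estimates tokens at roughly four characters each, which is a crude assumption rather than a real tokenizer, and it keeps fields in insertion order until the budget runs out. In production you would likely swap in your model's actual tokenizer.

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


def estimate_tokens(value) -> int:
    """Approximate token count of a value from its character length."""
    text = value if isinstance(value, str) else json.dumps(value, default=str)
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def truncate_to_tokens(payload: dict, max_tokens: int) -> dict:
    """Keep fields in order until the budget is spent.

    The first field that does not fit is cut short if it is a string,
    otherwise dropped; every field after it is dropped.
    """
    result = {}
    remaining = max_tokens
    for key, value in payload.items():
        cost = estimate_tokens(key) + estimate_tokens(value)
        if cost <= remaining:
            result[key] = value
            remaining -= cost
            continue
        if isinstance(value, str):
            room = (remaining - estimate_tokens(key)) * CHARS_PER_TOKEN
            if room > 0:
                result[key] = value[:room]
        break
    return result


payload = {"status": "open", "description": "x" * 100}
print(truncate_to_tokens(payload, 10))
```

Because earlier fields win, ordering the payload by importance before truncation doubles as a simple prioritization scheme.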

Task-aware filtering

This is the most sophisticated pattern. Your control layer knows what the agent is trying to accomplish and filters responses accordingly. A status check strips everything except status fields. A deep analysis passes through more context. Same source, different filtering based on the task.

TASK_FILTERS = {
    "status_check": {"include_fields": ["status", "assignee", "priority"], "max_tokens": 200},
    "root_cause_analysis": {"include_fields": ["status", "assignee", "description", "comments"], "max_tokens": 4000},
}


def filter_for_task(task_type, source_data):
    # Unknown task types pass every field through, capped at 1,000 tokens.
    task_filter = TASK_FILTERS.get(task_type, {"include_fields": list(source_data.keys()), "max_tokens": 1000})
    filtered = {k: v for k, v in source_data.items() if k in task_filter["include_fields"]}
    # truncate_to_tokens is a helper assumed to be defined elsewhere.
    return truncate_to_tokens(filtered, task_filter["max_tokens"])

Cost-aware routing

With parameter control, you can route simple queries to cheaper models with smaller context windows and complex queries to capable models with full context. Without control, every call uses the same expensive defaults.

def route_request(task_type, context_tokens):
    if task_type in ["status_check", "simple_lookup"] and context_tokens < 500:
        return {"model": "fast-small", "max_context": 1024}
    elif task_type in ["summarization", "root_cause_analysis"]:
        return {"model": "capable-large", "max_context": 8192}
    else:
        return {"model": "balanced-medium", "max_context": 4096}

These patterns share a common thread. They move control out of vendor hands and into your infrastructure. They apply automatically and compound over time, because every interception prevents downstream problems.

Governance Requires Control, Not Trust

For enterprise teams, control isn't just about cost and performance. It's a prerequisite for deploying agents at all.

Every enterprise security review asks the same questions. What data did this agent access? What was filtered out? Who approved these parameters? Can you produce an audit trail for any individual interaction?

Without the control layers described above, the answer to every one of those questions is "we don't know." That answer stops deployments cold.

The gap between trusting vendors and proving compliance is significant. When you use a vendor's pre-configured service, you're trusting that they handle data correctly, filter appropriately, and log what matters. You have no way to verify any of that. When you control parameters through your own infrastructure, you can produce the receipts.

Compliance Isn't Optional, and Vendors Can't Prove It for You

The compliance requirements are specific and non-negotiable. HIPAA demands audit trails showing exactly what patient data an agent accessed and how it was used. SOC 2 requires evidence that access controls are enforced consistently, not just configured. PCI DSS mandates that cardholder data never enters systems without explicit controls. Meeting any of these with uncontrolled services means asking your vendor for attestations you can't independently verify.

Control layers make governance automatic rather than aspirational. The same interception middleware that reduces costs and improves debugging also produces audit trails, enforces data minimization for HIPAA or GDPR, and documents every filtering decision. The work you do for performance directly serves your compliance posture.

Data Residency Adds Another Layer

Teams building internal agents at enterprises with strict data residency requirements face an additional constraint. If customer data can't leave a specific region or environment, you can't route it through a vendor's cloud-hosted service. You need control layers running in your own infrastructure, in your own region, under your own security policies. For many enterprise teams, on-premises deployment is a regulatory requirement, not a preference.

If you're building agent products that will eventually sell into enterprises, design your control layers for governance from the start. Retrofitting audit trails and compliance controls onto an architecture that delegates everything to vendors is orders of magnitude harder than building them in from the beginning.

When to Delegate vs. When to Control

Controlling everything is its own form of waste. The goal is intentional control, applied where it matters most.

Nobody has a perfect framework for this yet. The boundaries shift depending on your product, your scale, and your compliance requirements. But here's a starting point that's working for the teams I talk to.

Control when frequency is high, delegate when it's low. If an operation runs thousands of times per day, even small inefficiencies matter. Implement filtering, size limits, and response interception. If it runs a few times per week, default vendor responses are fine.
Control when costs are sensitive, delegate when they're not. Workflows where token usage directly impacts unit economics need parameter control. Internal tools with low usage and generous budgets can tolerate vendor defaults.
Control when governance matters, delegate when it's minimal. Any workflow touching customer data, financial information, or compliance-relevant decisions needs the audit trail capabilities described in the previous section. Informational queries with no compliance implications can use third-party services without custom control layers.
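The three rules above can be written down as a small checklist function. This is only a sketch of the heuristic, not a framework. The 1,000-calls-per-day cutoff is an assumed stand-in for "thousands of times per day", and the class, field, and workflow names are hypothetical.

```python
from dataclasses import dataclass

# Assumed cutoff for "high frequency"; tune it to your own volumes.
HIGH_FREQUENCY_CALLS_PER_DAY = 1_000


@dataclass
class Workflow:
    name: str
    calls_per_day: int
    cost_sensitive: bool          # token usage affects unit economics
    touches_regulated_data: bool  # customer, financial, or compliance-relevant


def reasons_to_control(workflow: Workflow) -> list:
    """Return why a workflow needs a control layer; an empty list means delegate."""
    reasons = []
    if workflow.calls_per_day >= HIGH_FREQUENCY_CALLS_PER_DAY:
        reasons.append("high frequency")
    if workflow.cost_sensitive:
        reasons.append("cost sensitive")
    if workflow.touches_regulated_data:
        reasons.append("governance")
    return reasons


ticket_status = Workflow("ticket_status", 20_000, True, False)
weekly_report = Workflow("weekly_report", 1, False, False)

print(reasons_to_control(ticket_status))
print(reasons_to_control(weekly_report))
```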

Most teams I talk to haven't made this decision consciously. They've defaulted to vendor control because it was easier during prototyping, and they haven't revisited that choice as they've scaled.

The Mistake Every Infrastructure Wave Repeats

There's a specific lesson from cloud computing and SaaS data tools that agent teams keep ignoring: the time to invest in control is before you need it.

Companies that adopted cloud infrastructure with default configurations didn't feel the pain at small scale. The defaults worked. But every month of growth locked in those defaults deeper. By the time teams realized they were paying ten times what they should for storage tiers, compute parameters, and network routing, migrating to controlled configurations meant re-architecting production systems under load.

The same thing happened with SaaS data tools. Teams that outsourced parameter decisions to vendors found themselves locked into formats, schedules, and pricing they couldn't change without months of migration work.

The teams that controlled their infrastructure parameters early didn't do it because they had perfect foresight. They did it because they understood that defaults are architectural debt. Every month you run on vendor-controlled parameters, you accumulate dependencies that make switching harder. Agent infrastructure is at this exact inflection point. The teams implementing control layers now, while volume is still low enough to make changes easily, are the ones whose economics and architecture will survive the transition to production scale.

Where to Start

Here's the audit I'd recommend for any team building agent products.

Map every data source your agents call, and for each one, answer three questions. Do you control what fields get returned? Do you control how much data enters the context window? Can you inspect and replay any individual call for debugging?

If the answer to any of those is no, you've found where to start.
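If it helps to keep the audit somewhere more durable than a whiteboard, the three questions map directly onto a small data structure. The sketch below is illustrative; the class name, field names, and the sample answers are assumptions, not findings about any real integration.

```python
from dataclasses import dataclass


@dataclass
class SourceAudit:
    source: str
    controls_fields: bool        # Do you control what fields get returned?
    controls_context_size: bool  # Do you control how much data enters context?
    can_replay_calls: bool       # Can you inspect and replay any individual call?

    def gaps(self) -> list:
        checks = {
            "field selection": self.controls_fields,
            "context size": self.controls_context_size,
            "inspect and replay": self.can_replay_calls,
        }
        return [name for name, ok in checks.items() if not ok]


def where_to_start(audits) -> dict:
    """Map each source with at least one 'no' answer to its missing controls."""
    return {a.source: a.gaps() for a in audits if a.gaps()}


# Hypothetical answers for two sources.
audits = [
    SourceAudit("slack", controls_fields=False, controls_context_size=True, can_replay_calls=False),
    SourceAudit("jira", controls_fields=True, controls_context_size=True, can_replay_calls=True),
]
print(where_to_start(audits))
```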

The model race is visible and exciting. But the decisive work is happening in the plumbing, in the control layers that determine what data your agents actually see, how much you pay for it, and whether you can debug it when things go wrong.

Control is about ensuring that as you scale from hundreds of operations to millions, your infrastructure decisions compound in your favor rather than against you.
