Agentic Data Engineering Resources

Resource

LangChain Agents Explained

Learn how LangChain agents use the ReAct loop, tool calling, and LangGraph runtime. Covers create_agent, middleware, and connecting agents to enterprise data.

Pedro Lopez

February 26, 2026

Summarize with AI:

LangChain became the default framework for building AI agents because it solved the right abstraction problem at the right time: it gave developers a standard interface for large language model (LLM) reasoning, tool calling, and dynamic workflows right as the industry shifted from static chains to autonomous agents. With 90,000+ GitHub stars and 1,000+ integrations, it anchors the ecosystem most teams build on.

TL;DR

LangChain agents use an LLM to dynamically choose actions and tools in a loop (the ReAct pattern), offering more flexibility than pre-defined chains. The current architecture is built on LangGraph for durable, stateful execution.
Agents consist of a model, tools, a system prompt, and state management. The create_agent function is the standard entry point, and middleware adds production features like human-in-the-loop approval.
The ecosystem is layered: LangGraph for custom workflows, LangChain for standard agents, and Deep Agents for complex autonomous tasks. LangSmith provides observability and debugging across all layers.
Connecting agents to real-world enterprise data is the biggest production hurdle. Authentication, permissions, and data normalization create infrastructure challenges that platforms like Airbyte Agents are purpose-built to handle.

What Are LangChain Agents?

A LangChain agent is a system that uses an LLM as its reasoning engine to decide which actions to take and in what order. The LLM doesn't just generate text. It evaluates the current situation, selects from available tools, interprets the results, and repeats until it produces a useful answer.

The key distinction is between agents and chains:

Chains follow a fixed sequence defined at development time: Step 1 feeds into Step 2, which feeds into Step 3. You define the pipeline, and it executes the same way every time.
Agents decide dynamically. The LLM assesses what's needed based on the input and context, then assembles its own workflow at runtime.

A concrete example makes the difference clear. A user asks: "What should I wear for my bike ride today?" A chain needs a pre-built pipeline designed for that exact question: check weather, then recommend clothing. An agent with access to weather, route planning, and clothing recommendation tools assembles the right workflow on the fly. It might check the weather first, realize it needs the route to assess wind exposure, call the route tool, then combine both results into a clothing recommendation. The agent determines the path; the developer provides the tools.

How Do LangChain Agents Work?

LangChain agents run the ReAct (Reasoning + Acting) pattern in a loop. LangGraph implements this loop through its graph-based runtime, with the model and tools as nodes connected by conditional edges. This runtime provides persistence, checkpointing, and human-in-the-loop support.

The ReAct Loop

The core decision loop operates as a cyclic graph structure with four steps:

Reasoning (Thought Phase): The model node receives the message history and performs chain-of-thought reasoning. It decides whether to use tools or provide a final answer.
Acting (Action Phase): If the model determines that tools are needed, it generates an AIMessage with tool_calls attributes containing structured data (tool name, arguments, and call IDs).
Observation Phase: The tools node executes the requested tools and returns ToolMessage objects containing the results. LangGraph appends these to the message history as an append-only conversation record in state.
Iteration: Control returns to the model node with the updated message history. The model sees the tool results and decides whether to call more tools or provide a final answer.

The conditional edge logic drives the cycle. It inspects the last message for tool_calls. If tool calls are present, it routes to the tools node for execution. Otherwise, it routes to END and terminates the agent loop. When the model produces a response without tool_calls, the agent returns the final output to the user.

<pre><code>from langchain.agents import create_agent

from langchain.tools import tool

@tool
def get_weather_for_location(city: str) -&gt; str:
    &quot;&quot;&quot;Get weather for a given city.&quot;&quot;&quot;
    return f&quot;It's always sunny in {city}!&quot;

model = init_chat_model(&quot;claude-sonnet-4-5-20250929&quot;, temperature=0)

SYSTEM_PROMPT = &quot;&quot;&quot;You are a helpful assistant that provides weather information.

If you can tell from the question that they mean wherever they are,
use the get_user_location tool to find their location.&quot;&quot;&quot;

agent = create_agent(
    model=model,
    system_prompt=SYSTEM_PROMPT,
    tools=[get_weather_for_location]
)

result = agent.invoke({
    &quot;messages&quot;: [{&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;What should I wear for my bike ride in SF today?&quot;}]
})</code></pre>

Core Components

Every LangChain agent is built from four parts:

Component	What It Does	Key Detail
Model	Acts as the reasoning engine. LangChain provides a standard interface across providers (OpenAI, Anthropic, Google, and open-source models) through init_chat_model.	Must support tool calling (function calling). You can swap models without rewriting agent logic.
Tools	Python functions the agent can call. Each tool has a name, description, and input schema. The LLM reads these descriptions to decide when and how to use each tool.	Can be static (defined at agent creation) or dynamic (resolved at runtime based on context, such as user permissions).
System prompt	Instructions that shape the agent's behavior, personality, and constraints.	Tells the LLM how to reason, what tone to use, and what boundaries to respect.
State and memory	LangGraph's state management keeps conversation history and tool results across the loop.	A checkpointer persists state across sessions. Thread IDs isolate different conversations from each other.

Tools are defined using the @tool decorator, which generates the name, description, and input schema automatically from type hints and docstrings:

<pre><code>from langchain_core.tools import tool

@tool
def get_weather(city: str) -&gt; str:
    &quot;&quot;&quot;Get the current weather for a given city.&quot;&quot;&quot;
    return f&quot;72°F, sunny in {city}&quot;</code></pre>

Middleware

Middleware is a new abstraction in LangChain's v1.0 architecture. It provides composable hooks that modify agent behavior without rewriting core logic. Middleware intercepts execution at three points: before the model call, after the model responds, and around tool execution.

HumanInTheLoopMiddleware gates sensitive tool calls behind human approval. SummarizationMiddleware compresses conversations automatically when context grows too long. Custom RedactionMiddleware strips sensitive data before logging. You can compose multiple middleware into a stack:

<pre><code>from langchain.agents.middleware import PIIMiddleware, HumanInTheLoopMiddleware

agent = create_agent(
    model=&quot;claude-sonnet-4-5-20250929&quot;,
    tools=[deploy_service, rollback_service],
    middleware=[
        SummarizationMiddleware(model=&quot;openai:gpt-4o-mini&quot;, max_tokens_before_summary=3000),
        HumanInTheLoopMiddleware(interrupt_on={&quot;deploy_service&quot;: True})
    ]
)</code></pre>

Teams add security, compliance, and operational concerns as middleware layers rather than embedding them in agent logic or forking the framework.

What Is the Difference Between LangChain, LangGraph, and Deep Agents?

This is the most confusing part of the LangChain ecosystem, and the rapid pace of change hasn't helped. Here's how the layers relate.

Layer	What It Is	When to Use It
LangChain agents (create_agent)	Pre-built ReAct agent with tool calling, streaming, and provider-agnostic model interfaces on top of LangGraph Runtime. Runs the Thought → Action → Observation loop for you.	Standard agent patterns with tool calling and basic loops.
LangGraph	Low-level orchestration framework at the foundation. Defines agent behavior as a graph of nodes (processing steps) and edges (routing logic). Handles durable execution, streaming, checkpointing, and state management. create_react_agent is also available as a lightweight starting point.	Custom workflows mixing deterministic and agentic steps, complex multi-agent coordination, or fine-grained control over execution flow.
Deep Agents	Sits on top of LangChain's agent abstraction. According to official documentation, it is "our agent harness for building autonomous, long-running agents that tackle complex, open-ended tasks over extended time horizons." Includes built-in task planning tools, filesystem access for context management, sub-agent spawning, automatic conversation compression, and long-term memory persistence.	Projects requiring planning, task decomposition, or extended autonomy.
LangSmith	Separate commercial platform providing tracing, debugging, evaluation, and cost monitoring across all three layers above.	Observability and production monitoring for any layer.

The relationship is hierarchical: Deep Agents build on LangChain agents, and LangChain agents build on LangGraph. Many teams start with create_agent and move to LangGraph when they need more control, or up to Deep Agents when they need long-running autonomy.

What Can You Build with LangChain Agents?

The ReAct loop and tool-calling architecture support a range of production patterns:

Research and Knowledge Assistants

These are agents that search across documents, web sources, and knowledge bases to answer complex questions. The agent decides whether to search, retrieve, or synthesize based on the query. Morningstar built Mo, a financial research assistant using LangChain, that lets analysts query their research database using natural language to generate concise investment insights in seconds.

Customer Support Agents

These are agents that handle customer inquiries by dynamically routing questions, looking up account information, checking order status, and pulling relevant knowledge base articles, all without pre-built decision trees dictating every path.

With access to customer relationship management (CRM) and ticketing tools, a support agent can authenticate a customer, retrieve their recent orders, check shipment tracking, and compose a resolution in a single conversation.

When an issue exceeds the agent's capabilities or involves sensitive actions such as issuing refunds above a threshold, the agent escalates to a human representative with full context attached. Klarna deployed an AI assistant using LangSmith and LangGraph that reduced customer resolution time by 80% while maintaining regulatory compliance in financial services.

Data Analysis and Reporting

You can also build agents that translate natural language questions into database queries, perform calculations, and present findings. A user asks "What was our churn rate last quarter compared to the previous one?" and the agent writes the SQL, executes it, interprets the results, and generates a summary with key trends highlighted. Teams often package this as a searchable database an analyst can query in plain language instead of writing SQL.

When an initial query doesn't fully answer the question, the agent iterates: it refines its SQL, pulls additional context, or breaks the analysis into sub-steps without requiring the user to specify each operation.

Multi-Agent Systems

LangGraph lets you orchestrate multiple specialized agents into coordinated workflows.

A supervisor agent routes tasks to domain-specific sub-agents, each with its own tools and prompt: a research agent, a coding agent, and a data agent working in concert. This pattern handles complex workflows where a single agent can't cover all required domains.

DocentPro built a multi-agent system in LangGraph for AI-powered travel itinerary search, with LangSmith tracing across autonomous components.

What Are the Limitations of LangChain Agents?

LangChain agents introduce tradeoffs that don't exist with static chains. Debugging and data access are where most teams run into friction.

Debugging and Predictability

Because agents choose their own path, behavior is harder to predict than fixed chains. The same input can produce different tool-calling sequences across runs. When something fails, you need to trace both the LLM's reasoning and the tool execution chain to understand what went wrong.

Silent failures compound this. A tool errors out, but the agent continues with broken assumptions. Retry storms amplify API usage without logging. Context grows without bound in long sessions, and token costs climb accordingly.

LangSmith helps with observability and tracing, but debugging still requires real investment in evaluation and testing infrastructure. Production teams typically implement bespoke test logic for each data point, single-step evaluations for specific decision points, full agent turn testing, multi-turn conversations with conditional logic, and proper environment setup with clean, reproducible test conditions.

Data Access at Scale

Tutorials show agents calling simple tools like a weather API, a calculator, or a web search. Production agents need access to customer data across CRMs, support tickets, knowledge bases, and communication tools. Each source has different APIs, authentication flows, rate limits, and data formats.

This is where most teams hit a wall. In practice, data access infrastructure often takes longer to build than the agent logic itself: OAuth token lifecycle management, user-delegated permission enforcement, service-specific rate limiting, schema normalization across heterogeneous sources, and enterprise security requirements.

Tokens expire mid-workflow, schemas misalign across systems, permission models vary between platforms, and burst-pattern rate limiting breaks across dozens of services. None of these issues surface when you build against a single, clean data source, which is why most tutorials skip them entirely. Full observability becomes essential because agents expose data access failures that tutorial examples never show.

How Do You Connect LangChain Agents to Enterprise Data?

Getting agents into production requires solving the data access layer separately from agent logic.

The Data Infrastructure Challenge

LangChain agents are only as useful as the data they can access. Most teams start by hardcoding a few API connections as custom tools, which works for prototypes. But scaling an AI agent to production reveals infrastructure requirements that go far beyond agent logic: multi-tenant credential management, automatic schema normalization, row-level access controls enforced at the data layer, and compliance-grade audit trails.

This is exactly the problem Airbyte Agents was built to solve.

Instead of writing custom tool code for each data source, Airbyte Agents provides automatic schema normalization, row-level access controls, live agent connectors with standardized interfaces, automatic embedding generation during replication, and a Context Store that stores relevant data in Airbyte-managed object storage for sub-second search, without repeatedly querying vendor APIs. Agent MCP integration lets agents discover and manage data sources programmatically through MCP.

Your team focuses on agent logic instead of data plumbing.

Airbyte as a Document Loader for LangChain

Airbyte integrates directly with LangChain through document loaders. Data loads as duck-typed Document objects compatible with LangChain's Document class, so data flows into existing workflows without conversion overhead. These document loaders support sources including Gong, Hubspot, Salesforce, Shopify, Stripe, Typeform, and Zendesk Support.

The integration supports incremental syncs through Airbyte's synchronization modes and only replicates changed data since the previous sync. This keeps vector databases current without full reloads. For sub-minute freshness, Change Data Capture (CDC) reads database transaction logs to track INSERT, UPDATE, and DELETE operations, with metadata fields like abcdc_updated_at and abcdc_deleted_at capturing timing information. Traditional batch syncs miss updates until the next scheduled run, so agents work with stale information between sync windows.

What Is the Fastest Way to Build LangChain Agents That Access Real Data?

LangChain gives you the agent framework, including the ReAct loop, tool calling architecture, middleware system, and checkpoint-based state persistence. The fastest path to production pairs this with purpose-built AI data infrastructure like Airbyte Agents, which handles the data layer so your team can focus on agent logic, retrieval quality, and tool design.

Airbyte provides document loader compatibility for LangChain and programmatic pipeline management for complex workflows. Agent SDK gives teams another way to plug Airbyte into existing agent-development workflows, while open-source agent connectors (including Gong, Zendesk Support, GitHub, HubSpot, Salesforce, Jira, Asana, Stripe, Greenhouse, and Linear) support live data operations with strongly typed interfaces.

For teams that need fast, reliable context retrieval across business systems, Context Store helps agents access relevant enterprise data without repeatedly querying source systems at runtime.

Get a demo to see how Airbyte Agents handles data access for your LangChain agents, or try Airbyte Agents today.

Frequently Asked Questions

Is LangChain free to use?

LangChain's core framework and LangGraph are both open source under the MIT license, free for personal and commercial use. LangSmith is a separate commercial platform that offers a free tier with 5,000 monthly traces and paid tiers for production workloads.

Should I use LangChain or LangGraph?

Start with create_agent or LangGraph's create_react_agent for standard agent patterns. Move to LangGraph's full graph API when you need custom workflows that mix deterministic and agentic steps, complex multi-agent coordination, or fine-grained control over the execution graph. Most teams begin with the higher-level abstraction and drop down to LangGraph when they hit its limits.

What models work with LangChain agents?

LangChain supports OpenAI (GPT-4o, GPT-4), Anthropic (Claude), Google (Gemini), and many open-source models through init_chat_model. Any model with tool calling (function calling) support works.

What are Deep Agents in LangChain?

Deep Agents sit on top of LangChain's agent abstraction and are designed for autonomous, long-running tasks that require planning, task decomposition, and extended context management. They include built-in task planning tools, sub-agent spawning, automatic conversation compression, and long-term memory persistence. Use Deep Agents when your workload needs hours of autonomous execution rather than a single request-response cycle.

How does Airbyte integrate with LangChain?

Airbyte provides LangChain document loaders with incremental sync and CDC support for keeping vector databases current. For production deployments, Airbyte Agents adds multi-tenant credential management, automatic embedding generation, row-level access controls, MCP integration, and sub-second search through the Context Store.

Try Airbyte Agents

Airbyte connects your agents to all of your data and assembles context before they run. Build agents that actually know your business.

Try it free Talk to sales

LangChain Agents Explained

Related posts

Try Airbyte Agents