LangChain provides prebuilt middleware for common use cases. Each middleware is production-ready and configurable for your specific needs.

Provider-agnostic middleware

The following middleware work with any LLM provider:
  • Summarization - Automatically summarize conversation history when approaching token limits.
  • Human-in-the-loop - Pause execution for human approval of tool calls.
  • Model call limit - Limit the number of model calls to prevent excessive costs.
  • Tool call limit - Control tool execution by limiting call counts.
  • Model fallback - Automatically fallback to alternative models when primary fails.
  • PII detection - Detect and handle Personally Identifiable Information (PII).
  • To-do list - Equip agents with task planning and tracking capabilities.
  • LLM tool selector - Use an LLM to select relevant tools before calling the main model.
  • Tool retry - Automatically retry failed tool calls with exponential backoff.
  • LLM tool emulator - Emulate tool execution using an LLM for testing purposes.
  • Context editing - Manage conversation context by trimming or clearing tool uses.
  • Shell tool - Expose a persistent shell session to agents for command execution.
  • File search - Provide Glob and Grep search tools over filesystem files.

Summarization

Automatically summarize conversation history when approaching token limits, preserving recent messages while compressing older context. Summarization is useful for the following:
  • Long-running conversations that exceed context windows.
  • Multi-turn dialogues with extensive history.
  • Applications where preserving full conversation context matters.
API reference: SummarizationMiddleware
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[weather_tool, calculator_tool],
    middleware=[
        SummarizationMiddleware(
            model="gpt-4o-mini",
            trigger={"tokens": 4000},
            keep={"messages": 20},
        ),
    ],
)
model
string | BaseChatModel
required
Model for generating summaries. Can be a model identifier string (e.g., 'openai:gpt-4o-mini') or a BaseChatModel instance. See init_chat_model for more information.
trigger
dict | list[dict]
Conditions for triggering summarization. Can be:
  • A single condition dict (all properties must be met - AND logic)
  • A list of condition dicts (any condition must be met - OR logic)
Each condition can include:
  • fraction (float): Fraction of model’s context size (0-1)
  • tokens (int): Absolute token count
  • messages (int): Message count
At least one property must be specified per condition. If not provided, summarization will not trigger automatically.
keep
dict
default:"{messages: 20}"
How much context to preserve after summarization. Specify exactly one of:
  • fraction (float): Fraction of model’s context size to keep (0-1)
  • tokens (int): Absolute token count to keep
  • messages (int): Number of recent messages to keep
token_counter
function
Custom token counting function. Defaults to character-based counting.
summary_prompt
string
Custom prompt template for summarization. Uses built-in template if not specified. The template should include {messages} placeholder where conversation history will be inserted.
trim_tokens_to_summarize
number
default:"4000"
Maximum number of tokens to include when generating the summary. Messages will be trimmed to fit this limit before summarization.
summary_prefix
string
Prefix to add to the summary message. If not provided, a default prefix is used.
max_tokens_before_summary
number
deprecated
Deprecated: Use trigger: {"tokens": value} instead. Token threshold for triggering summarization.
messages_to_keep
number
deprecated
Deprecated: Use keep: {"messages": value} instead. Recent messages to preserve.
The summarization middleware monitors message token counts and automatically summarizes older messages when thresholds are reached.
Trigger conditions control when summarization runs:
  • Single condition object (all properties must be met - AND logic)
  • Array of conditions (any condition must be met - OR logic)
  • Each condition can use fraction (of model’s context size), tokens (absolute count), or messages (message count)
Keep conditions control how much context to preserve (specify exactly one):
  • fraction - Fraction of model’s context size to keep
  • tokens - Absolute token count to keep
  • messages - Number of recent messages to keep
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware


# Single condition: trigger if tokens >= 4000 AND messages >= 10
agent = create_agent(
    model="gpt-4o",
    tools=[weather_tool, calculator_tool],
    middleware=[
        SummarizationMiddleware(
            model="gpt-4o-mini",
            trigger={"tokens": 4000, "messages": 10},
            keep={"messages": 20},
        ),
    ],
)

# Multiple conditions
agent2 = create_agent(
    model="gpt-4o",
    tools=[weather_tool, calculator_tool],
    middleware=[
        SummarizationMiddleware(
            model="gpt-4o-mini",
            trigger=[
                {"tokens": 5000, "messages": 3},
                {"tokens": 3000, "messages": 6},
            ],
            keep={"messages": 20},
        ),
    ],
)

# Using fractional limits
agent3 = create_agent(
    model="gpt-4o",
    tools=[weather_tool, calculator_tool],
    middleware=[
        SummarizationMiddleware(
            model="gpt-4o-mini",
            trigger={"fraction": 0.8},
            keep={"fraction": 0.3},
        ),
    ],
)
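You can also customize how tokens are counted and how the summary is written. The sketch below is illustrative: it assumes the token_counter callable receives the list of messages and returns an integer count, and it supplies a custom summary_prompt containing the required {messages} placeholder.
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware


def rough_token_count(messages) -> int:
    # Approximate tokens as one token per four characters of message content.
    return sum(len(str(m.content)) // 4 for m in messages)

agent_custom = create_agent(
    model="gpt-4o",
    tools=[weather_tool, calculator_tool],
    middleware=[
        SummarizationMiddleware(
            model="gpt-4o-mini",
            trigger={"tokens": 4000},
            keep={"messages": 20},
            token_counter=rough_token_count,
            summary_prompt=(
                "Summarize the following conversation, keeping key facts, "
                "decisions, and open questions:\n\n{messages}"
            ),
        ),
    ],
)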

Human-in-the-loop

Pause agent execution for human approval, editing, or rejection of tool calls before they execute. Human-in-the-loop is useful for the following:
  • High-stakes operations requiring human approval (e.g. database writes, financial transactions).
  • Compliance workflows where human oversight is mandatory.
  • Long-running conversations where human feedback guides the agent.
API reference: HumanInTheLoopMiddleware
Human-in-the-loop middleware requires a checkpointer to maintain state across interruptions.
from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import InMemorySaver

agent = create_agent(
    model="gpt-4o",
    tools=[read_email_tool, send_email_tool],
    checkpointer=InMemorySaver(),
    middleware=[
        HumanInTheLoopMiddleware(
            interrupt_on={
                "send_email_tool": {
                    "allowed_decisions": ["approve", "edit", "reject"],
                },
                "read_email_tool": False,
            }
        ),
    ],
)
For complete examples, configuration options, and integration patterns, see the Human-in-the-loop documentation.

Model call limit

Limit the number of model calls to prevent infinite loops or excessive costs. Model call limit is useful for the following:
  • Preventing runaway agents from making too many API calls.
  • Enforcing cost controls on production deployments.
  • Testing agent behavior within specific call budgets.
API reference: ModelCallLimitMiddleware
from langchain.agents import create_agent
from langchain.agents.middleware import ModelCallLimitMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[...],
    middleware=[
        ModelCallLimitMiddleware(
            thread_limit=10,
            run_limit=5,
            exit_behavior="end",
        ),
    ],
)
thread_limit
number
Maximum model calls across all runs in a thread. Defaults to no limit.
run_limit
number
Maximum model calls per single invocation. Defaults to no limit.
exit_behavior
string
default:"end"
Behavior when limit is reached. Options: 'end' (graceful termination) or 'error' (raise exception)
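Thread-level limits span every invocation that shares a thread ID, so they are most useful together with a checkpointer. The following sketch assumes an InMemorySaver checkpointer and a placeholder search_tool; both invocations share a thread and therefore draw from the same thread_limit budget.
from langchain.agents import create_agent
from langchain.agents.middleware import ModelCallLimitMiddleware
from langgraph.checkpoint.memory import InMemorySaver

agent = create_agent(
    model="gpt-4o",
    tools=[search_tool],
    checkpointer=InMemorySaver(),
    middleware=[
        ModelCallLimitMiddleware(
            thread_limit=10,  # total model calls across the whole thread
            run_limit=5,      # model calls per invocation
            exit_behavior="end",
        ),
    ],
)

config = {"configurable": {"thread_id": "thread-1"}}
agent.invoke({"messages": [{"role": "user", "content": "Research topic A"}]}, config)
# Counts from the first run carry over because the thread ID is the same.
agent.invoke({"messages": [{"role": "user", "content": "Now summarize topic B"}]}, config)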

Tool call limit

Control agent execution by limiting the number of tool calls, either globally across all tools or for specific tools. Tool call limits are useful for the following:
  • Preventing excessive calls to expensive external APIs.
  • Limiting web searches or database queries.
  • Enforcing rate limits on specific tool usage.
  • Protecting against runaway agent loops.
API reference: ToolCallLimitMiddleware
from langchain.agents import create_agent
from langchain.agents.middleware import ToolCallLimitMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[search_tool, database_tool],
    middleware=[
        # Global limit
        ToolCallLimitMiddleware(thread_limit=20, run_limit=10),
        # Tool-specific limit
        ToolCallLimitMiddleware(
            tool_name="search",
            thread_limit=5,
            run_limit=3,
        ),
    ],
)
tool_name
string
Name of specific tool to limit. If not provided, limits apply to all tools globally.
thread_limit
number
Maximum tool calls across all runs in a thread (conversation). Persists across multiple invocations with the same thread ID. Requires a checkpointer to maintain state. None means no thread limit.
run_limit
number
Maximum tool calls per single invocation (one user message → response cycle). Resets with each new user message. None means no run limit.
Note: At least one of thread_limit or run_limit must be specified.
exit_behavior
string
default:"continue"
Behavior when limit is reached:
  • 'continue' (default) - Block exceeded tool calls with error messages, let other tools and the model continue. The model decides when to end based on the error messages.
  • 'error' - Raise a ToolCallLimitExceededError exception, stopping execution immediately
  • 'end' - Stop execution immediately with a ToolMessage and AI message for the exceeded tool call. Only works when limiting a single tool; raises NotImplementedError if other tools have pending calls.
Specify limits with:
  • Thread limit - Max calls across all runs in a conversation (requires checkpointer)
  • Run limit - Max calls per single invocation (resets each turn)
Exit behaviors:
  • 'continue' (default) - Block exceeded calls with error messages, agent continues
  • 'error' - Raise exception immediately
  • 'end' - Stop with ToolMessage + AI message (single-tool scenarios only)
from langchain.agents import create_agent
from langchain.agents.middleware import ToolCallLimitMiddleware


global_limiter = ToolCallLimitMiddleware(thread_limit=20, run_limit=10)
search_limiter = ToolCallLimitMiddleware(tool_name="search", thread_limit=5, run_limit=3)
database_limiter = ToolCallLimitMiddleware(tool_name="query_database", thread_limit=10)
strict_limiter = ToolCallLimitMiddleware(tool_name="scrape_webpage", run_limit=2, exit_behavior="error")

agent = create_agent(
    model="gpt-4o",
    tools=[search_tool, database_tool, scraper_tool],
    middleware=[global_limiter, search_limiter, database_limiter, strict_limiter],
)

Model fallback

Automatically fallback to alternative models when the primary model fails. Model fallback is useful for the following:
  • Building resilient agents that handle model outages.
  • Cost optimization by falling back to cheaper models.
  • Provider redundancy across OpenAI, Anthropic, etc.
API reference: ModelFallbackMiddleware
from langchain.agents import create_agent
from langchain.agents.middleware import ModelFallbackMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[...],
    middleware=[
        ModelFallbackMiddleware(
            "gpt-4o-mini",
            "claude-3-5-sonnet-20241022",
        ),
    ],
)
first_model
string | BaseChatModel
required
First fallback model to try when the primary model fails. Can be a model identifier string (e.g., 'openai:gpt-4o-mini') or a BaseChatModel instance.
*additional_models
string | BaseChatModel
Additional fallback models to try in order if previous models fail
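Fallbacks are tried in the order they are passed. The sketch below mixes a provider-prefixed identifier string with a BaseChatModel instance; the tools are placeholders.
from langchain.agents import create_agent
from langchain.agents.middleware import ModelFallbackMiddleware
from langchain_anthropic import ChatAnthropic

agent = create_agent(
    model="gpt-4o",  # primary model
    tools=[search_tool],
    middleware=[
        ModelFallbackMiddleware(
            "openai:gpt-4o-mini",                               # tried first on failure
            ChatAnthropic(model="claude-3-5-sonnet-20241022"),  # tried next
        ),
    ],
)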

PII detection

Detect and handle Personally Identifiable Information (PII) in conversations using configurable strategies. PII detection is useful for the following:
  • Healthcare and financial applications with compliance requirements.
  • Customer service agents that need to sanitize logs.
  • Any application handling sensitive user data.
API reference: PIIMiddleware
from langchain.agents import create_agent
from langchain.agents.middleware import PIIMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[...],
    middleware=[
        PIIMiddleware("email", strategy="redact", apply_to_input=True),
        PIIMiddleware("credit_card", strategy="mask", apply_to_input=True),
    ],
)

Custom PII types

You can create custom PII types by providing a detector parameter. This allows you to detect patterns specific to your use case beyond the built-in types. Three ways to create custom detectors:
  1. Regex pattern string - Simple pattern matching
  2. Compiled regex pattern - Pattern matching with flags (e.g., case-insensitive)
  3. Custom function - Complex detection logic with validation
from langchain.agents import create_agent
from langchain.agents.middleware import PIIMiddleware
import re


# Method 1: Regex pattern string
agent1 = create_agent(
    model="gpt-4o",
    tools=[...],
    middleware=[
        PIIMiddleware(
            "api_key",
            detector=r"sk-[a-zA-Z0-9]{32}",
            strategy="block",
        ),
    ],
)

# Method 2: Compiled regex pattern
agent2 = create_agent(
    model="gpt-4o",
    tools=[...],
    middleware=[
        PIIMiddleware(
            "phone_number",
            detector=re.compile(r"\+?\d{1,3}[\s.-]?\d{3,4}[\s.-]?\d{4}"),
            strategy="mask",
        ),
    ],
)

# Method 3: Custom detector function
def detect_ssn(content: str) -> list[dict[str, str | int]]:
    """Detect SSN with validation.

    Returns a list of dictionaries with 'text', 'start', and 'end' keys.
    """
    matches = []
    pattern = r"\d{3}-\d{2}-\d{4}"
    for match in re.finditer(pattern, content):
        ssn = match.group(0)
        # Validate: first 3 digits shouldn't be 000, 666, or 900-999
        first_three = int(ssn[:3])
        if first_three not in [0, 666] and not (900 <= first_three <= 999):
            matches.append({
                "text": ssn,
                "start": match.start(),
                "end": match.end(),
            })
    return matches

agent3 = create_agent(
    model="gpt-4o",
    tools=[...],
    middleware=[
        PIIMiddleware(
            "ssn",
            detector=detect_ssn,
            strategy="hash",
        ),
    ],
)
Custom detector function signature: The detector function must accept a string (the content to scan) and return a list of dictionaries with text, start, and end keys:
def detector(content: str) -> list[dict[str, str | int]]:
    return [
        {"text": "matched_text", "start": 0, "end": 12},
        # ... more matches
    ]
For custom detectors:
  • Use regex strings for simple patterns
  • Use compiled regex patterns (re.compile) when you need flags (e.g., case-insensitive matching)
  • Use custom functions when you need validation logic beyond pattern matching; they give you full control over detection and can implement complex validation rules
pii_type
string
required
Type of PII to detect. Can be a built-in type (email, credit_card, ip, mac_address, url) or a custom type name.
strategy
string
default:"redact"
How to handle detected PII. Options:
  • 'block' - Raise exception when detected
  • 'redact' - Replace with [REDACTED_TYPE]
  • 'mask' - Partially mask (e.g., ****-****-****-1234)
  • 'hash' - Replace with deterministic hash
detector
function | regex
Custom detector function or regex pattern. If not provided, uses built-in detector for the PII type.
apply_to_input
boolean
default:"True"
Check user messages before model call
apply_to_output
boolean
default:"False"
Check AI messages after model call
apply_to_tool_results
boolean
default:"False"
Check tool result messages after execution
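Strategies and stages can be combined per PII type. The sketch below (with a placeholder customer-lookup tool) blocks emails in user input, masks credit card numbers in model output, and hashes IP addresses returned by tools.
from langchain.agents import create_agent
from langchain.agents.middleware import PIIMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[lookup_customer_tool],
    middleware=[
        # Reject user messages that contain an email address.
        PIIMiddleware("email", strategy="block", apply_to_input=True),
        # Partially mask card numbers the model echoes back.
        PIIMiddleware("credit_card", strategy="mask", apply_to_output=True),
        # Replace IPs in tool results with a deterministic hash.
        PIIMiddleware("ip", strategy="hash", apply_to_tool_results=True),
    ],
)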

To-do list

Equip agents with task planning and tracking capabilities for complex multi-step tasks. To-do lists are useful for the following:
  • Complex multi-step tasks requiring coordination across multiple tools.
  • Long-running operations where progress visibility is important.
This middleware automatically provides agents with a write_todos tool and system prompts to guide effective task planning.
API reference: TodoListMiddleware
from langchain.agents import create_agent
from langchain.agents.middleware import TodoListMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[read_file, write_file, run_tests],
    middleware=[TodoListMiddleware()],
)
system_prompt
string
Custom system prompt for guiding todo usage. Uses built-in prompt if not specified.
tool_description
string
Custom description for the write_todos tool. Uses built-in description if not specified.
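Both parameters are optional; the sketch below shows illustrative custom text for each.
from langchain.agents import create_agent
from langchain.agents.middleware import TodoListMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[read_file, write_file, run_tests],
    middleware=[
        TodoListMiddleware(
            system_prompt=(
                "Break the request into discrete todos with write_todos "
                "before starting, and update them as each step completes."
            ),
            tool_description="Create or update the todo list for the current task.",
        ),
    ],
)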

LLM tool selector

Use an LLM to intelligently select relevant tools before calling the main model. LLM tool selectors are useful for the following:
  • Agents with many tools (10+) where most aren’t relevant per query.
  • Reducing token usage by filtering irrelevant tools.
  • Improving model focus and accuracy.
This middleware uses structured output to ask an LLM which tools are most relevant for the current query. The structured output schema defines the available tool names and descriptions. Model providers often add this structured output information to the system prompt behind the scenes.
API reference: LLMToolSelectorMiddleware
from langchain.agents import create_agent
from langchain.agents.middleware import LLMToolSelectorMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[tool1, tool2, tool3, tool4, tool5, ...],
    middleware=[
        LLMToolSelectorMiddleware(
            model="gpt-4o-mini",
            max_tools=3,
            always_include=["search"],
        ),
    ],
)
model
string | BaseChatModel
Model for tool selection. Can be a model identifier string (e.g., 'openai:gpt-4o-mini') or a BaseChatModel instance. See init_chat_model for more information. Defaults to the agent’s main model.
system_prompt
string
Instructions for the selection model. Uses built-in prompt if not specified.
max_tools
number
Maximum number of tools to select. If the model selects more, only the first max_tools will be used. No limit if not specified.
always_include
list[string]
Tool names to always include regardless of selection. These do not count against the max_tools limit.
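If model is omitted, the agent's main model performs the selection. A minimal sketch with placeholder tools:
from langchain.agents import create_agent
from langchain.agents.middleware import LLMToolSelectorMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[search_tool, weather_tool, calculator_tool, database_tool],
    middleware=[
        LLMToolSelectorMiddleware(
            max_tools=2,
            always_include=["search"],  # always passed through; doesn't count toward max_tools
        ),
    ],
)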

Tool retry

Automatically retry failed tool calls with configurable exponential backoff. Tool retry is useful for the following:
  • Handling transient failures in external API calls.
  • Improving reliability of network-dependent tools.
  • Building resilient agents that gracefully handle temporary errors.
API reference: ToolRetryMiddleware
from langchain.agents import create_agent
from langchain.agents.middleware import ToolRetryMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[search_tool, database_tool],
    middleware=[
        ToolRetryMiddleware(
            max_retries=3,
            backoff_factor=2.0,
            initial_delay=1.0,
        ),
    ],
)
max_retries
number
default:"2"
Maximum number of retry attempts after the initial call (3 total attempts with default)
tools
list[BaseTool | str]
Optional list of tools or tool names to apply retry logic to. If None, applies to all tools.
retry_on
tuple[type[Exception], ...] | callable
default:"(Exception,)"
Either a tuple of exception types to retry on, or a callable that takes an exception and returns True if it should be retried.
on_failure
string | callable
default:"return_message"
Behavior when all retries are exhausted. Options:
  • 'return_message' - Return a ToolMessage with error details (allows LLM to handle failure)
  • 'raise' - Re-raise the exception (stops agent execution)
  • Custom callable - Function that takes the exception and returns a string for the ToolMessage content
backoff_factor
number
default:"2.0"
Multiplier for exponential backoff. Each retry waits initial_delay * (backoff_factor ** retry_number) seconds. Set to 0.0 for constant delay.
initial_delay
number
default:"1.0"
Initial delay in seconds before first retry
max_delay
number
default:"60.0"
Maximum delay in seconds between retries (caps exponential backoff growth)
jitter
boolean
default:"true"
Whether to add random jitter (±25%) to delay to avoid thundering herd
The middleware automatically retries failed tool calls with exponential backoff.
Key configuration:
  • max_retries - Number of retry attempts (default: 2)
  • backoff_factor - Multiplier for exponential backoff (default: 2.0)
  • initial_delay - Starting delay in seconds (default: 1.0)
  • max_delay - Cap on delay growth (default: 60.0)
  • jitter - Add random variation (default: True)
Failure handling:
  • on_failure='return_message' - Return error message
  • on_failure='raise' - Re-raise exception
  • Custom function - Function returning error message
from langchain.agents import create_agent
from langchain.agents.middleware import ToolRetryMiddleware


agent = create_agent(
    model="gpt-4o",
    tools=[search_tool, database_tool, api_tool],
    middleware=[
        ToolRetryMiddleware(
            max_retries=3,
            backoff_factor=2.0,
            initial_delay=1.0,
            max_delay=60.0,
            jitter=True,
            tools=["api_tool"],
            retry_on=(ConnectionError, TimeoutError),
            on_failure="return_message",
        ),
    ],
)
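retry_on and on_failure also accept callables, which is useful when the retry decision or the final error message needs custom logic. A sketch with a placeholder api_tool:
from langchain.agents import create_agent
from langchain.agents.middleware import ToolRetryMiddleware


def is_retryable(exc: Exception) -> bool:
    # Retry only transient network failures.
    return isinstance(exc, (ConnectionError, TimeoutError))

def describe_failure(exc: Exception) -> str:
    # The returned string becomes the ToolMessage content once retries are exhausted.
    return f"The tool failed after multiple attempts: {exc}. Try a different approach."

agent = create_agent(
    model="gpt-4o",
    tools=[api_tool],
    middleware=[
        ToolRetryMiddleware(
            max_retries=3,
            retry_on=is_retryable,
            on_failure=describe_failure,
        ),
    ],
)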

LLM tool emulator

Emulate tool execution using an LLM for testing purposes, replacing actual tool calls with AI-generated responses. LLM tool emulators are useful for the following:
  • Testing agent behavior without executing real tools.
  • Developing agents when external tools are unavailable or expensive.
  • Prototyping agent workflows before implementing actual tools.
API reference: LLMToolEmulator
from langchain.agents import create_agent
from langchain.agents.middleware import LLMToolEmulator

agent = create_agent(
    model="gpt-4o",
    tools=[get_weather, search_database, send_email],
    middleware=[
        LLMToolEmulator(),  # Emulate all tools
    ],
)
tools
list[str | BaseTool]
List of tool names (str) or BaseTool instances to emulate. If None (default), all tools will be emulated. If an empty list [], no tools will be emulated. If a list of tool names/instances is provided, only those tools will be emulated.
model
string | BaseChatModel
Model to use for generating emulated tool responses. Can be a model identifier string (e.g., 'anthropic:claude-sonnet-4-5-20250929') or a BaseChatModel instance. Defaults to the agent’s model if not specified. See init_chat_model for more information.
The middleware uses an LLM to generate plausible responses for tool calls instead of executing the actual tools.
from langchain.agents import create_agent
from langchain.agents.middleware import LLMToolEmulator
from langchain.tools import tool


@tool
def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return f"Weather in {location}"

@tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email."""
    return "Email sent"


# Emulate all tools (default behavior)
agent = create_agent(
    model="gpt-4o",
    tools=[get_weather, send_email],
    middleware=[LLMToolEmulator()],
)

# Emulate specific tools only
agent2 = create_agent(
    model="gpt-4o",
    tools=[get_weather, send_email],
    middleware=[LLMToolEmulator(tools=["get_weather"])],
)

# Use custom model for emulation
agent3 = create_agent(
    model="gpt-4o",
    tools=[get_weather, send_email],
    middleware=[LLMToolEmulator(model="anthropic:claude-sonnet-4-5-20250929")],
)

Context editing

Manage conversation context by clearing older tool call outputs when token limits are reached, while preserving recent results. This helps keep context windows manageable in long conversations with many tool calls. Context editing is useful for the following:
  • Long conversations with many tool calls that exceed token limits
  • Reducing token costs by removing older tool outputs that are no longer relevant
  • Maintaining only the most recent N tool results in context
API reference: ContextEditingMiddleware, ClearToolUsesEdit
from langchain.agents import create_agent
from langchain.agents.middleware import ContextEditingMiddleware, ClearToolUsesEdit

agent = create_agent(
    model="gpt-4o",
    tools=[...],
    middleware=[
        ContextEditingMiddleware(
            edits=[
                ClearToolUsesEdit(
                    trigger=100000,
                    keep=3,
                ),
            ],
        ),
    ],
)
edits
list[ContextEdit]
default:"[ClearToolUsesEdit()]"
List of ContextEdit strategies to apply
token_count_method
string
default:"approximate"
Token counting method. Options: 'approximate' or 'model'
ClearToolUsesEdit options:
trigger
number
default:"100000"
Token count that triggers the edit. When the conversation exceeds this token count, older tool outputs will be cleared.
clear_at_least
number
default:"0"
Minimum number of tokens to reclaim when the edit runs. If set to 0, clears as much as needed.
keep
number
default:"3"
Number of most recent tool results that must be preserved. These will never be cleared.
clear_tool_inputs
boolean
default:"False"
Whether to clear the originating tool call parameters on the AI message. When True, tool call arguments are replaced with empty objects.
exclude_tools
list[string]
default:"()"
List of tool names to exclude from clearing. These tools will never have their outputs cleared.
placeholder
string
default:"[cleared]"
Placeholder text inserted for cleared tool outputs. This replaces the original tool message content.
The middleware applies context editing strategies when token limits are reached. The most common strategy is ClearToolUsesEdit, which clears older tool results while preserving recent ones.
How it works:
  1. Monitor token count in conversation
  2. When threshold is reached, clear older tool outputs
  3. Keep most recent N tool results
  4. Optionally preserve tool call arguments for context
from langchain.agents import create_agent
from langchain.agents.middleware import ContextEditingMiddleware, ClearToolUsesEdit


agent = create_agent(
    model="gpt-4o",
    tools=[search_tool, calculator_tool, database_tool],
    middleware=[
        ContextEditingMiddleware(
            edits=[
                ClearToolUsesEdit(
                    trigger=2000,
                    keep=3,
                    clear_tool_inputs=False,
                    exclude_tools=[],
                    placeholder="[cleared]",
                ),
            ],
        ),
    ],
)
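For tighter control, you can switch to model-based token counting and require a minimum amount of reclaimed space per edit. The values below are illustrative.
from langchain.agents import create_agent
from langchain.agents.middleware import ContextEditingMiddleware, ClearToolUsesEdit

agent_strict = create_agent(
    model="gpt-4o",
    tools=[search_tool, database_tool],
    middleware=[
        ContextEditingMiddleware(
            edits=[
                ClearToolUsesEdit(
                    trigger=50000,
                    clear_at_least=1000,  # reclaim at least 1,000 tokens per edit
                    keep=5,
                ),
            ],
            token_count_method="model",
        ),
    ],
)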

Shell tool

Expose a persistent shell session to agents for command execution. Shell tool middleware is useful for the following:
  • Agents that need to execute system commands
  • Development and deployment automation tasks
  • Testing and validation workflows
  • File system operations and script execution
Security consideration: Use appropriate execution policies (HostExecutionPolicy, DockerExecutionPolicy, or CodexSandboxExecutionPolicy) to match your deployment’s security requirements.
Limitation: Persistent shell sessions do not currently work with interrupts (human-in-the-loop). We anticipate adding support for this in the future.
API reference: ShellToolMiddleware
from langchain.agents import create_agent
from langchain.agents.middleware import (
    ShellToolMiddleware,
    HostExecutionPolicy,
)

agent = create_agent(
    model="gpt-4o",
    tools=[search_tool],
    middleware=[
        ShellToolMiddleware(
            workspace_root="/workspace",
            execution_policy=HostExecutionPolicy(),
        ),
    ],
)
workspace_root
str | Path | None
Base directory for the shell session. If omitted, a temporary directory is created when the agent starts and removed when it ends.
startup_commands
tuple[str, ...] | list[str] | str | None
Optional commands executed sequentially after the session starts
shutdown_commands
tuple[str, ...] | list[str] | str | None
Optional commands executed before the session shuts down
execution_policy
BaseExecutionPolicy | None
Execution policy controlling timeouts, output limits, and resource configuration. Options:
  • HostExecutionPolicy - Full host access (default); best for trusted environments where the agent already runs inside a container or VM
  • DockerExecutionPolicy - Launches a separate Docker container for each agent run, providing harder isolation
  • CodexSandboxExecutionPolicy - Reuses the Codex CLI sandbox for additional syscall/filesystem restrictions
redaction_rules
tuple[RedactionRule, ...] | list[RedactionRule] | None
Optional redaction rules to sanitize command output before returning it to the model
tool_description
str | None
Optional override for the registered shell tool description
shell_command
Sequence[str] | str | None
Optional shell executable (string) or argument sequence used to launch the persistent session. Defaults to /bin/bash.
env
Mapping[str, Any] | None
Optional environment variables to supply to the shell session. Values are coerced to strings before command execution.
The middleware provides a single persistent shell session that agents can use to execute commands sequentially.
Execution policies:
  • HostExecutionPolicy (default) - Native execution with full host access
  • DockerExecutionPolicy - Isolated Docker container execution
  • CodexSandboxExecutionPolicy - Sandboxed execution via Codex CLI
from langchain.agents import create_agent
from langchain.agents.middleware import (
    ShellToolMiddleware,
    HostExecutionPolicy,
    DockerExecutionPolicy,
    RedactionRule,
)


# Basic shell tool with host execution
agent = create_agent(
    model="gpt-4o",
    tools=[search_tool],
    middleware=[
        ShellToolMiddleware(
            workspace_root="/workspace",
            execution_policy=HostExecutionPolicy(),
        ),
    ],
)

# Docker isolation with startup commands
agent_docker = create_agent(
    model="gpt-4o",
    tools=[],
    middleware=[
        ShellToolMiddleware(
            workspace_root="/workspace",
            startup_commands=["pip install requests", "export PYTHONPATH=/workspace"],
            execution_policy=DockerExecutionPolicy(
                image="python:3.11-slim",
                command_timeout=60.0,
            ),
        ),
    ],
)

# With output redaction
agent_redacted = create_agent(
    model="gpt-4o",
    tools=[],
    middleware=[
        ShellToolMiddleware(
            workspace_root="/workspace",
            redaction_rules=[
                RedactionRule(pii_type="api_key", detector=r"sk-[a-zA-Z0-9]{32}"),
            ],
        ),
    ],
)
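The session can also be seeded with environment variables and given cleanup commands to run on shutdown; the values below are illustrative.
from langchain.agents import create_agent
from langchain.agents.middleware import ShellToolMiddleware

# Environment variables and shutdown commands for the persistent session
agent_env = create_agent(
    model="gpt-4o",
    tools=[],
    middleware=[
        ShellToolMiddleware(
            workspace_root="/workspace",
            env={"APP_ENV": "staging", "LOG_LEVEL": "debug"},
            shutdown_commands=["rm -rf /workspace/tmp"],
        ),
    ],
)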

File search

Provide Glob and Grep search tools over filesystem files. File search middleware is useful for the following:
  • Code exploration and analysis
  • Finding files by name patterns
  • Searching code content with regex
  • Large codebases where file discovery is needed
API reference: FilesystemFileSearchMiddleware
from langchain.agents import create_agent
from langchain.agents.middleware import FilesystemFileSearchMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[],
    middleware=[
        FilesystemFileSearchMiddleware(
            root_path="/workspace",
            use_ripgrep=True,
        ),
    ],
)
root_path
str
required
Root directory to search. All file operations are relative to this path.
use_ripgrep
bool
default:"True"
Whether to use ripgrep for search. Falls back to Python regex if ripgrep is unavailable.
max_file_size_mb
int
default:"10"
Maximum file size to search in MB. Files larger than this are skipped.
The middleware adds two search tools to agents.
Glob tool - Fast file pattern matching:
  • Supports patterns like **/*.py, src/**/*.ts
  • Returns matching file paths sorted by modification time
Grep tool - Content search with regex:
  • Full regex syntax support
  • Filter by file patterns with include parameter
  • Three output modes: files_with_matches, content, count
from langchain.agents import create_agent
from langchain.agents.middleware import FilesystemFileSearchMiddleware
from langchain.messages import HumanMessage


agent = create_agent(
    model="gpt-4o",
    tools=[],
    middleware=[
        FilesystemFileSearchMiddleware(
            root_path="/workspace",
            use_ripgrep=True,
            max_file_size_mb=10,
        ),
    ],
)

# Agent can now use glob_search and grep_search tools
result = agent.invoke({
    "messages": [HumanMessage("Find all Python files containing 'async def'")]
})

# The agent will use:
# 1. glob_search(pattern="**/*.py") to find Python files
# 2. grep_search(pattern="async def", include="*.py") to find async functions

Provider-specific middleware

These middleware are optimized for specific LLM providers.

Anthropic

Middleware specifically designed for Anthropic’s Claude models.
  • Prompt caching - Reduce costs by caching repetitive prompt prefixes
  • Bash tool - Execute Claude’s native bash tool with local command execution
  • Text editor - Provide Claude’s text editor tool for file editing
  • Memory - Provide Claude’s memory tool for persistent agent memory
  • File search - Search tools for state-based file systems

Prompt caching

Reduce costs and latency by caching static or repetitive prompt content (like system prompts, tool definitions, and conversation history) on Anthropic’s servers. This middleware implements a conversational caching strategy that places cache breakpoints after the most recent message, allowing the entire conversation history (including the latest user message) to be cached and reused in subsequent API calls. Prompt caching is useful for the following:
  • Applications with long, static system prompts that don’t change between requests
  • Agents with many tool definitions that remain constant across invocations
  • Conversations where early message history is reused across multiple turns
  • High-volume deployments where reducing API costs and latency is critical
Learn more about Anthropic prompt caching strategies and limitations.
API reference: AnthropicPromptCachingMiddleware
from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import AnthropicPromptCachingMiddleware
from langchain.agents import create_agent

agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    system_prompt="<Your long system prompt here>",
    middleware=[AnthropicPromptCachingMiddleware(ttl="5m")],
)
type
string
default:"ephemeral"
Cache type. Only 'ephemeral' is currently supported.
ttl
string
default:"5m"
Time to live for cached content. Valid values: '5m' or '1h'
min_messages_to_cache
number
default:"0"
Minimum number of messages before caching starts
unsupported_model_behavior
string
default:"warn"
Behavior when using non-Anthropic models. Options: 'ignore', 'warn', or 'raise'
The middleware caches content up to and including the latest message in each request. On subsequent requests within the TTL window (5 minutes or 1 hour), previously seen content is retrieved from cache rather than reprocessed, significantly reducing costs and latency.
How it works:
  1. First request: System prompt, tools, and the user message “Hi, my name is Bob” are sent to the API and cached
  2. Second request: The cached content (system prompt, tools, and first message) is retrieved from cache. Only the new message “What’s my name?” needs to be processed, plus the model’s response from the first request
  3. This pattern continues for each turn, with each request reusing the cached conversation history
from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import AnthropicPromptCachingMiddleware
from langchain.agents import create_agent
from langchain.messages import HumanMessage


LONG_PROMPT = """
Please be a helpful assistant.

<Lots more context ...>
"""

agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    system_prompt=LONG_PROMPT,
    middleware=[AnthropicPromptCachingMiddleware(ttl="5m")],
)

# First invocation: Creates cache with system prompt, tools, and "Hi, my name is Bob"
agent.invoke({"messages": [HumanMessage("Hi, my name is Bob")]})

# Second invocation: Reuses cached system prompt, tools, and previous messages
# Only processes the new message "What's my name?" and the previous AI response
agent.invoke({"messages": [HumanMessage("What's my name?")]})

Bash tool

Execute Claude’s native bash_20250124 tool with local command execution. The bash tool middleware is useful for the following:
  • Using Claude’s built-in bash tool with local execution
  • Leveraging Claude’s optimized bash tool interface
  • Agents that need persistent shell sessions with Anthropic models
This middleware wraps ShellToolMiddleware and exposes it as Claude’s native bash tool.
API reference: ClaudeBashToolMiddleware
from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import ClaudeBashToolMiddleware
from langchain.agents import create_agent

agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    tools=[],
    middleware=[
        ClaudeBashToolMiddleware(
            workspace_root="/workspace",
        ),
    ],
)
ClaudeBashToolMiddleware accepts all parameters from ShellToolMiddleware, including:
workspace_root
str | Path | None
Base directory for the shell session
startup_commands
tuple[str, ...] | list[str] | str | None
Commands to run when the session starts
execution_policy
BaseExecutionPolicy | None
Execution policy (HostExecutionPolicy, DockerExecutionPolicy, or CodexSandboxExecutionPolicy)
redaction_rules
tuple[RedactionRule, ...] | list[RedactionRule] | None
Rules for sanitizing command output
See Shell tool for full configuration details.
from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import ClaudeBashToolMiddleware
from langchain.agents import create_agent
from langchain.agents.middleware import DockerExecutionPolicy


agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    tools=[],
    middleware=[
        ClaudeBashToolMiddleware(
            workspace_root="/workspace",
            startup_commands=["pip install requests"],
            execution_policy=DockerExecutionPolicy(
                image="python:3.11-slim",
            ),
        ),
    ],
)

# Claude can now use its native bash tool
result = agent.invoke({
    "messages": [{"role": "user", "content": "List files in the workspace"}]
})

Text editor

Provide Claude’s text editor tool (text_editor_20250728) for file creation and editing. The text editor middleware is useful for the following:
  • File-based agent workflows
  • Code editing and refactoring tasks
  • Multi-file project work
  • Agents that need persistent file storage
Available in two variants: State-based (files in LangGraph state) and Filesystem-based (files on disk).
API reference: StateClaudeTextEditorMiddleware, FilesystemClaudeTextEditorMiddleware
from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import StateClaudeTextEditorMiddleware
from langchain.agents import create_agent

# State-based (files in LangGraph state)
agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    tools=[],
    middleware=[
        StateClaudeTextEditorMiddleware(),
    ],
)
StateClaudeTextEditorMiddleware (state-based)
allowed_path_prefixes
Sequence[str] | None
Optional list of allowed path prefixes. If specified, only paths starting with these prefixes are allowed.
FilesystemClaudeTextEditorMiddleware (filesystem-based)
root_path
str
required
Root directory for file operations
allowed_prefixes
list[str] | None
Optional list of allowed virtual path prefixes (default: ["/"])
max_file_size_mb
int
default:"10"
Maximum file size in MB
Claude’s text editor tool supports the following commands:
  • view - View file contents or list directory
  • create - Create a new file
  • str_replace - Replace string in file
  • insert - Insert text at line number
  • delete - Delete a file
  • rename - Rename/move a file
from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import (
    StateClaudeTextEditorMiddleware,
    FilesystemClaudeTextEditorMiddleware,
)
from langchain.agents import create_agent


# State-based: Files persist in LangGraph state
agent_state = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    tools=[],
    middleware=[
        StateClaudeTextEditorMiddleware(
            allowed_path_prefixes=["/project"],
        ),
    ],
)

# Filesystem-based: Files persist on disk
agent_fs = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    tools=[],
    middleware=[
        FilesystemClaudeTextEditorMiddleware(
            root_path="/workspace",
            allowed_prefixes=["/src"],
            max_file_size_mb=10,
        ),
    ],
)

Memory

Provide Claude’s memory tool (memory_20250818) for persistent agent memory across conversation turns. The memory middleware is useful for the following:
  • Long-running agent conversations
  • Maintaining context across interruptions
  • Task progress tracking
  • Persistent agent state management
Claude’s memory tool uses a /memories directory and automatically injects a system prompt encouraging the agent to check and update memory.
API reference: StateClaudeMemoryMiddleware, FilesystemClaudeMemoryMiddleware
from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import StateClaudeMemoryMiddleware
from langchain.agents import create_agent

# State-based memory
agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    tools=[],
    middleware=[
        StateClaudeMemoryMiddleware(),
    ],
)
StateClaudeMemoryMiddleware (state-based)
allowed_path_prefixes
Sequence[str] | None
Optional list of allowed path prefixes. Defaults to ["/memories"].
system_prompt
str
System prompt to inject. Defaults to Anthropic’s recommended memory prompt that encourages the agent to check and update memory.
FilesystemClaudeMemoryMiddleware (filesystem-based)
root_path
str
required
Root directory for file operations
allowed_prefixes
list[str] | None
Optional list of allowed virtual path prefixes. Defaults to ["/memories"].
max_file_size_mb
int
default:"10"
Maximum file size in MB
system_prompt
str
System prompt to inject
from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import (
    StateClaudeMemoryMiddleware,
    FilesystemClaudeMemoryMiddleware,
)
from langchain.agents import create_agent


# State-based: Memory persists in LangGraph state
agent_state = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    tools=[],
    middleware=[
        StateClaudeMemoryMiddleware(),
    ],
)

# Filesystem-based: Memory persists on disk
agent_fs = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    tools=[],
    middleware=[
        FilesystemClaudeMemoryMiddleware(
            root_path="/workspace",
        ),
    ],
)

# The agent will automatically:
# 1. Check /memories directory at start
# 2. Record progress and thoughts during execution
# 3. Update memory files as work progresses

File search

Provide Glob and Grep search tools for files stored in LangGraph state. File search middleware is useful for the following:
  • Searching through state-based virtual file systems
  • Works with text editor and memory tools
  • Finding files by patterns
  • Content search with regex
API reference: StateFileSearchMiddleware
from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import (
    StateClaudeTextEditorMiddleware,
    StateFileSearchMiddleware,
)
from langchain.agents import create_agent

agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    tools=[],
    middleware=[
        StateClaudeTextEditorMiddleware(),
        StateFileSearchMiddleware(),  # Search text editor files
    ],
)
state_key
str
default:"text_editor_files"
State key containing files to search. Use "text_editor_files" for text editor files or "memory_files" for memory files.
The middleware adds Glob and Grep search tools that work with state-based files.
from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import (
    StateClaudeTextEditorMiddleware,
    StateClaudeMemoryMiddleware,
    StateFileSearchMiddleware,
)
from langchain.agents import create_agent


# Search text editor files
agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    tools=[],
    middleware=[
        StateClaudeTextEditorMiddleware(),
        StateFileSearchMiddleware(state_key="text_editor_files"),
    ],
)

# Search memory files
agent_memory = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    tools=[],
    middleware=[
        StateClaudeMemoryMiddleware(),
        StateFileSearchMiddleware(state_key="memory_files"),
    ],
)

OpenAI

Middleware specifically designed for OpenAI models.
  • Content moderation - Moderate agent traffic using OpenAI’s moderation endpoint

Content moderation

Moderate agent traffic (user input, model output, and tool results) using OpenAI’s moderation endpoint to detect and handle unsafe content. Content moderation is useful for the following:
  • Applications requiring content safety and compliance
  • Filtering harmful, hateful, or inappropriate content
  • Customer-facing agents that need safety guardrails
  • Meeting platform moderation requirements
Learn more about OpenAI’s moderation models and categories.
API reference: OpenAIModerationMiddleware
from langchain_openai import ChatOpenAI
from langchain_openai.middleware import OpenAIModerationMiddleware
from langchain.agents import create_agent

agent = create_agent(
    model=ChatOpenAI(model="gpt-4o"),
    tools=[search_tool, database_tool],
    middleware=[
        OpenAIModerationMiddleware(
            model="omni-moderation-latest",
            check_input=True,
            check_output=True,
            exit_behavior="end",
        ),
    ],
)
model
ModerationModel
default:"omni-moderation-latest"
OpenAI moderation model to use. Options: 'omni-moderation-latest', 'omni-moderation-2024-09-26', 'text-moderation-latest', 'text-moderation-stable'
check_input
bool
default:"True"
Whether to check user input messages before the model is called
check_output
bool
default:"True"
Whether to check model output messages after the model is called
check_tool_results
bool
default:"False"
Whether to check tool result messages before the model is called
exit_behavior
string
default:"end"
How to handle violations when content is flagged. Options:
  • 'end' - End agent execution immediately with a violation message
  • 'error' - Raise OpenAIModerationError exception
  • 'replace' - Replace the flagged content with the violation message and continue
violation_message
str | None
Custom template for violation messages. Supports template variables:
  • {categories} - Comma-separated list of flagged categories
  • {category_scores} - JSON string of category scores
  • {original_content} - The original flagged content
Default: "I'm sorry, but I can't comply with that request. It was flagged for {categories}."
client
OpenAI | None
Optional pre-configured OpenAI client to reuse. If not provided, a new client will be created.
async_client
AsyncOpenAI | None
Optional pre-configured AsyncOpenAI client to reuse. If not provided, a new async client will be created.
The middleware integrates OpenAI’s moderation endpoint to check content at different stages.
Moderation stages:
  • check_input - User messages before model call
  • check_output - AI messages after model call
  • check_tool_results - Tool outputs before model call
Exit behaviors:
  • 'end' (default) - Stop execution with violation message
  • 'error' - Raise exception for application handling
  • 'replace' - Replace flagged content and continue
from langchain_openai import ChatOpenAI
from langchain_openai.middleware import OpenAIModerationMiddleware
from langchain.agents import create_agent


# Basic moderation
agent = create_agent(
    model=ChatOpenAI(model="gpt-4o"),
    tools=[search_tool, customer_data_tool],
    middleware=[
        OpenAIModerationMiddleware(
            model="omni-moderation-latest",
            check_input=True,
            check_output=True,
        ),
    ],
)

# Strict moderation with custom message
agent_strict = create_agent(
    model=ChatOpenAI(model="gpt-4o"),
    tools=[search_tool, customer_data_tool],
    middleware=[
        OpenAIModerationMiddleware(
            model="omni-moderation-latest",
            check_input=True,
            check_output=True,
            check_tool_results=True,
            exit_behavior="error",
            violation_message=(
                "Content policy violation detected: {categories}. "
                "Please rephrase your request."
            ),
        ),
    ],
)

# Moderation with replacement behavior
agent_replace = create_agent(
    model=ChatOpenAI(model="gpt-4o"),
    tools=[search_tool],
    middleware=[
        OpenAIModerationMiddleware(
            check_input=True,
            exit_behavior="replace",
            violation_message="[Content removed due to safety policies]",
        ),
    ],
)
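If your application already constructs an OpenAI client, you can pass it in so the middleware reuses it instead of creating its own; search_tool is a placeholder.
from openai import OpenAI
from langchain_openai import ChatOpenAI
from langchain_openai.middleware import OpenAIModerationMiddleware
from langchain.agents import create_agent


# Reuse an existing OpenAI client for moderation calls
shared_client = OpenAI()

agent_shared = create_agent(
    model=ChatOpenAI(model="gpt-4o"),
    tools=[search_tool],
    middleware=[
        OpenAIModerationMiddleware(
            client=shared_client,
            check_input=True,
        ),
    ],
)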
