Runtime Control for AI Agents
Prompt-based defenses don’t hold up: adaptive attacks bypass them, and probabilistic LLM firewalls, over 90 percent of the time. Because a compromised agent will eventually attempt an unauthorized action, relying on input filtering leaves enterprise infrastructure exposed to runtime poisoning and destructive API calls. You cannot secure autonomous AI agents by filtering their text inputs; governing the action path is the only reliable alternative. By treating system tool calls as dynamic software capabilities, you can throttle, scope, and kill agent access using FeatureOps architectures. The following sections explain why static orchestrators fail, how to instrument a 5-layer runtime control stack, and how to use feature flags alongside the Model Context Protocol (MCP) to enforce hard execution boundaries without sacrificing deployment velocity.
TL;DR
- Prompt-based defenses fail over 90 percent of the time against adaptive attacks, rendering text filtering useless against execution poisoning.
- Modern frameworks mandate a 5-layer runtime stack that requires explicit approval, authorization, policy checks, containment, and observability before a tool runs.
- Static security gateways treat interception purely as an allow-or-block mechanism, breaking the progressive rollouts and failure-scope containment necessary for continuous delivery.
- Treating agent capabilities as feature flags gives you the power to throttle actions, manage identity scopes, restrict access, and trigger instant kill switches.
- Integrating a Model Context Protocol server allows developers to enforce flag creation and evaluation capabilities natively inside their IDE.
Why prompt engineering is a dead end for agent security
Developers default to writing monolithic system prompts to keep AI models well-behaved, but that logic shatters the moment the model pulls real levers in a production system. In standard benchmark tests, adaptive attacks easily maneuver around text-based defenses, bypassing 12 recent alignment mechanisms with a greater than 90 percent success rate and achieving an 86 percent partial success rate against autonomous web agents.
Consider a support agent forbidden by its prompt from deleting accounts with active billing states. A malicious user who submits a ticket with a hidden text payload can override those instructions and wipe three enterprise accounts before your infrastructure team even notices.
These security failures explain why the Open Worldwide Application Security Project (OWASP) targets runtime poisoning and unexpected code execution as emerging threats in agentic applications. Prompt-based defenses abandon you the moment a model hallucinates, so the security boundary belongs directly on the execution layer.
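An execution-layer boundary for the scenario above can be sketched as a guard that runs on the action path itself, so a prompt-injected instruction never reaches the database. All names here (delete_account, billing_state, the in-memory account store) are illustrative assumptions, not a specific product’s API:

```python
# Sketch of an execution-layer guard: the check lives in code on the
# action path, so no amount of prompt injection can talk the model past it.

ACCOUNTS = {
    "acme": {"billing_state": "active"},
    "initech": {"billing_state": "closed"},
}

class ActionBlocked(Exception):
    pass

def delete_account(account_id: str) -> str:
    # Hard stop enforced at execution time, not in the system prompt.
    if ACCOUNTS[account_id]["billing_state"] == "active":
        raise ActionBlocked(f"refusing to delete {account_id}: billing is active")
    del ACCOUNTS[account_id]
    return f"deleted {account_id}"

# Even if a hidden payload convinces the model to call the tool,
# the execution layer rejects the poisoned instruction:
try:
    delete_account("acme")
except ActionBlocked as exc:
    print(exc)

print(delete_account("initech"))  # a legitimate deletion still succeeds
```

The model remains free to hallucinate the call; the guard simply refuses to connect it to the system of record.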
Governing the action path with a 5-layer runtime stack
Autonomous agents do not just generate text; they execute commands. That’s why modern security architectures treat the action path, rather than the model’s output, as the primary control surface. Instead of policing what the agent says, you control how it calls tools, what identity it assumes, and what permissions it holds at the moment of execution.
To accomplish this, official LangGraph inference documentation mandates 5 explicit layers for intercepting commands: approval, authorization, policy checks, containment, and observability. Strong observability creates the audit trails and AI decision logs needed for SOC 2 compliance reviews. Implementing these boundaries requires hard stops in the software execution cycle.
Frameworks like LangGraph and AutoGen use explicit pause and resume patterns to force human-in-the-loop intercepts before critical actions take place. Setting up these manual stops secures high-risk operations like database migrations and satisfies the core capabilities of an AI control plane, but forcing a human to approve every basic data retrieval request destroys the ROI of automation. You need an architectural way to approve safe actions automatically while containing the failure scope of everything else; otherwise, manual review becomes a workflow bottleneck in production environments.
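One way to sketch those five layers is a single interceptor that every tool call must pass through, auto-approving low-risk reads and pausing only for risky writes. The tool names, risk tiers, and approve() callback below are illustrative assumptions, not any framework’s real API:

```python
# Minimal sketch of the five intercept layers: approval, authorization,
# policy checks, containment, and observability.
import logging

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("agent.audit")

ALLOWED = {"fetch_order", "run_migration"}   # authorization scope
HIGH_RISK = {"run_migration"}                # only these pause for a human

def run_tool(name, args, tools, approve):
    if name not in ALLOWED:                             # authorization
        raise PermissionError(f"{name}: outside agent scope")
    if name in HIGH_RISK and not approve(name, args):   # approval (HITL)
        raise PermissionError(f"{name}: human approval denied")
    if name == "run_migration" and args.get("env") == "prod":  # policy check
        raise PermissionError(f"{name}: prod changes need a change ticket")
    try:                                                # containment
        result = tools[name](**args)
    except Exception:
        audit.exception("tool %s failed; failure scope contained", name)
        raise
    audit.info("tool=%s args=%s ok", name, args)        # observability
    return result

# A low-risk read passes straight through; the human is only in the loop
# for the migration.
tools = {"fetch_order": lambda order_id: {"id": order_id, "status": "shipped"}}
print(run_tool("fetch_order", {"order_id": "o-42"}, tools, approve=lambda n, a: False))
```

The key design choice is that approval is a property of the operation, not the agent: the same interceptor serves every tool, so safe actions never queue behind a reviewer.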
The velocity tax of static security gates
When facing agent risk, engineering teams instinctively build static API gateways and deploy DAG-based orchestrators like Apache Airflow. However, these static tools are fundamentally incompatible with the long-running and event-driven nature of modern AI agents. Hardcoding a route to check whether the agent carries the right token causes serious friction when product teams try to release new capabilities progressively to beta users or canary audiences.
Many vendors attempt to solve this execution gating inside the model environment. While OpenAI provides native tool guardrails that evaluate YAML policies before every tool call, these checks contain major operational blind spots. They specifically exclude certain hosted tools and handoff calls from the evaluation pipeline, and deeper protocol specs reveal that these native guardrails remain insufficient without resilient sandbox execution and distinct agent boundaries. True AI governance begins at the execution phase, far outside the model boundary.
The identity scoping gap
Dynamic identity scoping adds another layer of complexity. Google’s agent safety guidance requires scoped identity checks to distinguish between ‘agent-auth’ and ‘user-auth’ permissions. Because hardcoded gateways struggle to differentiate whether an agent acts on its own service account or the session token of the requesting user, securing modern infrastructure requires treating system tool calls as dynamic software capabilities.
FeatureOps as the runtime primitive for AI agents
Feature flags serve as the control mechanism for governing agent actions: wrapping an API tool in a flag decouples the agent’s intent from the actual system execution. The model decides it wants to act, but you use the feature management platform to decide whether the execution connects. Agent execution loops process thousands of sub-steps a minute, so adding a network call to verify authorization at every step introduces significant latency overhead. You solve this by evaluating permissions locally.
With Unleash, you handle this scale by serving flag data locally with sub-millisecond latency. By pulling the rules payload down to the local worker node, the agent evaluates its permissions against an in-memory cache, eliminating the slow HTTP request back to the management server. Using this configuration, you can restrict specific agent tools to internal users, evaluate risk levels, monitor the impact metrics, and disable the tool instantly if the agent begins behaving erratically. FeatureOps is the operational framework that governs these independent systems, using feature flags as runtime primitives for agents.
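The pattern looks roughly like this. The cache structure and rule shape below are deliberately simplified assumptions, not the Unleash SDK’s actual payload format; the point is that evaluation and the kill switch both hit only local memory:

```python
# Sketch of local flag evaluation: rules are pulled once into an in-memory
# cache, so each agent sub-step checks permissions without a network hop.
FLAG_CACHE = {
    "agent.delete_account": {"enabled": True, "allowed_groups": {"internal"}},
    "agent.refund_order":   {"enabled": False, "allowed_groups": set()},
}

def is_enabled(flag: str, group: str) -> bool:
    rule = FLAG_CACHE.get(flag)
    if rule is None or not rule["enabled"]:
        return False
    return group in rule["allowed_groups"]

def kill_switch(flag: str) -> None:
    # An operator (or an automated alert) flips the cached rule; the very
    # next evaluation in the agent loop sees it.
    FLAG_CACHE[flag]["enabled"] = False

print(is_enabled("agent.delete_account", "internal"))  # True: scoped to internal users
print(is_enabled("agent.delete_account", "external"))  # False: outside the scope

kill_switch("agent.delete_account")
print(is_enabled("agent.delete_account", "internal"))  # False after the kill
```

In a real deployment the SDK refreshes this cache from the server on a polling interval in the background, so the hot path stays a dictionary lookup while operators retain central control.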
Proven at enterprise scale
Real-world enterprise deployments prove the architecture scales. Wayfair handles over 20,000 requests per second securely, managing high-spike traffic at one third the cost of their previous homegrown solution. Mercadona Tech releases to production over 100 times a day across 12 teams, relying on feature flags to reduce mean time to recovery.
Automating execution controls with the Unleash MCP server
Securing every tool becomes a bottleneck if the process demands writing manual orchestration code. The Model Context Protocol (MCP) solves this by establishing an open standard for connecting AI models to external data sources. Because it enables arbitrary data access and code execution paths, the protocol demands formal authorization and uncompromising safety checks.
With the Unleash MCP server, you connect your AI coding assistants directly to your feature management platform to bridge the operational gap. A developer can instruct an AI assistant to write a new system tool and immediately tell it to secure the endpoint. The agent connects to the server to execute commands like create_flag, evaluate_change, wrap_change, and set_flag_rollout natively within the development session.
With the server integration, you establish a clear workflow for designing fast, failure-tolerant agent updates based on the Evaluate-Detect-Create-Wrap-Cleanup lifecycle. Integrating development tasks with an MCP server embeds authorization boundaries natively within the coding cycle to eliminate manual configuration steps.
Turning security gates into release channels
Agent safety operates fundamentally as a software continuous delivery problem. The moment you accept that large language models will eventually hallucinate disastrous commands, you stop trying to fix the text prompt and start building execution boundaries that scale around the tools themselves. Minimizing the damage of poor reasoning takes precedence over forcing perfect reasoning accuracy.
A resilient AI control plane enforces these boundaries at scale. By treating agent action capabilities as feature flags within a feature management platform like Unleash, you combine instant rollback safety with high-velocity product delivery. Engineers can expose complex actions to specific tenancies and design for failure by cutting access instantly if performance metrics drop.
FAQs about runtime control for AI agents
What is runtime control for AI agents?
It shifts the security boundary from reading text prompts to actively governing how an agent executes API and system tool calls. It treats agent operations as dynamic endpoints that you manage using explicit authorization and containment layers. Researchers analyzing emerging architecture designs identify the execution path as the primary control surface. Production frameworks require a five-layer stack to implement these protections correctly and prevent rogue tool operations.
Why are LLM firewalls insufficient for autonomous agents?
LLM firewalls operate on probabilities and struggle to evaluate complex, looping reasoning chains. Because adaptive attacks bypass prompt defenses over 90 percent of the time, a compromised agent will likely attempt an unauthorized action. Only execution-level blocks can stop a corrupted execution sequence from reaching the core system database. The OWASP Agentic Top 10 specifically identifies unexpected code execution as a vulnerability that pure text filters misinterpret.
How do feature flags secure AI tool calls?
They act as instant code-level primitives that decouple the agent intent from the system execution. You wrap an internal tool in a flag to scope the action to specific identities or trigger an instant kill switch if performance metrics fail. Google’s agent safety guidelines require separating agent identity from human user identity, and feature flags enforce those precise authorization scopes efficiently at the code level.
What is the Model Context Protocol (MCP) in agent security?
It is an open-source standard enabling AI models to interact securely with local data and outward-facing system tools. Because the protocol opens up arbitrary data access paths across your environment, it mandates uncompromising tool safety checks and authorization boundaries. Platform engineering teams use connected MCP servers to automate the creation of these boundaries directly during the coding phase without writing manual routing files.
How does latency affect human-in-the-loop agent steps?
Explicit human intercepts add significant latency to the execution loop by pausing the sequence until a reviewer responds. You evaluate authorization rules locally in memory to prevent the agent flow from stalling on slow network round trips. Modern architectures achieve the required speed using distributed edge caches that resolve flag evaluations in less than a single millisecond.