Runtime Guardrails Are Only as Good as They Are Effective

Written by Shea Brown

Posted on 04/26/2026
In CEO's Corner

There is a real and growing body of serious work on making AI agent deployments safer at runtime. I want to engage with that work honestly, because it matters, and because I think there is an important distinction buried inside it that is worth drawing out.

What Does Runtime Assurance for AI Agents Actually Cover?



When people talk about runtime assurance or runtime security for AI agents, they are talking about a collection of mechanisms that operate while an agent is executing. The landscape here is broad. Microsoft recently released an open-source Agent Governance Toolkit that maps to all ten of OWASP’s agentic AI risk categories. Palo Alto, Oligo, and others have built runtime security platforms targeting AI-specific threats. NVIDIA’s NemoClaw introduces a policy enforcement layer beneath the agent runtime. Frameworks like A2AS, developed collaboratively by AWS, Google, Cisco, Meta, JPMorganChase, and others, propose structured approaches to context window integrity, prompt authentication, capability constraints, and behavior certification.

Research and development in this area is moving fast. Companies like Lucid Computing, EQTY Lab, and Fortanix are building platforms that combine confidential computing infrastructure with behavioral policy enforcement, producing cryptographic receipts of governance compliance that can be verified by customers and regulators. The underlying technology is genuinely interesting: Lucid’s AI Passports, EQTY Lab’s AI Notary system, and Fortanix’s continuous CPU and GPU attestation all address real gaps in how AI deployments are monitored and evidenced.

On the research side, a March 2026 paper from Sahara Labs AI proposes “Proof-of-Guardrail,” a system that uses TEE attestation to cryptographically prove that a specific guardrail ran during inference. The authors are candid about what their system does not provide: they note explicitly that guardrails can make errors and be jailbroken, and that their approach ensures the integrity of guardrail execution while the reliability of the guardrail itself remains an open question. A separate line of academic work, Attestable Audits out of Cambridge and Nokia Bell Labs, uses TEEs to run AI safety benchmarks and produce cryptographic proofs of the results. That approach gets closer to pre-deployment evaluation, but it still certifies the execution of a benchmark rather than behavioral reliability across a real operational domain.

Meanwhile, the broader runtime guardrail space, including NeMo Guardrails, Guardrails AI, and Amazon Bedrock Guardrails, focuses on LLM-based judges and rule classifiers to enforce policy at inference time.

These efforts address real problems: logging and audit trails, identity and permission management, prompt injection defense, capability sandboxing, and cryptographic integrity verification of agent requests. Each of these categories closes genuine gaps that matter in production deployments.

Why Behavioral Guardrails Are the Hardest Part of Runtime AI Security



Of all the mechanisms in the runtime assurance toolkit, the one I think deserves the most scrutiny is the behavioral guardrail (I’m biased because of our work at BABL AI). Specifically: the use of rules, policies, or LLM-based judges to determine whether an agent’s behavior is acceptable at the moment it is acting.

This is also the mechanism that is most commonly marketed as “assurance,” in a way that I think deserves some precision.

Frameworks like A2AS define what they call “Codified Policies”: rules embedded into the agent’s context window that constrain its behavior. The example they give is readable and intuitive: this app must not modify or send emails; emails labeled “Confidential” must not be processed. When these policies are crisp, deterministic, and testable, they function more like access controls than behavioral guardrails. That is a reasonable thing to build.
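To make the distinction concrete, here is a minimal sketch of what a deterministic codified policy check could look like. The function and field names are illustrative assumptions, not A2AS’s actual interface.

```python
# Illustrative sketch only; names are hypothetical, not the A2AS implementation.
FORBIDDEN_ACTIONS = {"email.send", "email.modify"}

def codified_policy_check(tool_call: dict) -> tuple[bool, str]:
    """Deterministically allow or deny a proposed agent tool call."""
    action = tool_call.get("action", "")
    labels = tool_call.get("data_labels", [])

    if action in FORBIDDEN_ACTIONS:
        return False, f"action '{action}' is forbidden by policy"
    if "Confidential" in labels:
        return False, "confidential-labeled data must not be processed"
    return True, "allowed"

# The same input always produces the same verdict, so the policy can be
# unit-tested exhaustively, like any other access control.
assert codified_policy_check({"action": "email.send"})[0] is False
assert codified_policy_check({"action": "calendar.read", "data_labels": []})[0] is True
```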

But the further you move from deterministic rules toward natural language policies and LLM-based enforcement, the more you are relying on something that behaves probabilistically, not reliably. And that gap matters a great deal for anyone trying to make a defensible claim about system behavior.
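For contrast, here is roughly what the same policy looks like once enforcement is delegated to an LLM judge. The call_llm stub stands in for whatever model API is used; nothing here is a specific vendor’s interface.

```python
POLICY_PROMPT = """You are a policy judge. The policy is:
- The agent must not modify or send emails.
- Emails labeled "Confidential" must not be processed.
Answer ALLOW or DENY for the proposed action below.

Proposed action: {action}"""

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; returns the judge's raw text."""
    raise NotImplementedError

def llm_policy_check(action_description: str) -> bool:
    verdict = call_llm(POLICY_PROMPT.format(action=action_description))
    # The outcome depends on how the model interprets both the policy and the
    # action description; semantically equivalent inputs phrased differently
    # can produce different verdicts, which is the reliability gap at issue.
    return verdict.strip().upper().startswith("ALLOW")
```

Nothing about the second version is testable in the way the first is until its behavior has actually been measured over a representative set of cases.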

Can You Trust a Guardrail That Has Never Been Independently Tested?



A2AS itself acknowledges this. In its known limitations section, the framework flags “security reasoning drift,” the risk that variations in model reasoning cause misinterpretation of security instructions, and “security misconfiguration risk,” noting that poorly written policies create false security. These are not edge cases. They are the central challenge of LLM-based policy enforcement.

“Misconfigured certificates or poorly written policies create false security.” — A2AS Framework, v1.0

The research literature is similarly honest. Published work has documented that classification-based guardrails can be bypassed, that instruction hierarchies fail under adversarial conditions, and that LLM judges exhibit inconsistency across semantically equivalent inputs. None of this is a criticism of the people building these systems; it reflects the genuine difficulty of the problem. But it does mean that a behavioral guardrail, however carefully designed, carries performance uncertainty that has to be characterized before it can be trusted.

And that is exactly where the gap opens up.

What Does Independent AI Assurance Actually Require?



Saying that a behavioral guardrail is in place is different from saying that it works, under what conditions, and with what reliability. The first claim is about architecture, while the second is about evidence.

From where I sit, AI assurance is about closing that gap. It means specifying what the behavioral requirement actually is, with enough precision to be testable. It means characterizing the operational design domain, the range of inputs and conditions the system will encounter, so you know what you are evaluating over. It means running structured tests, including adversarial ones, to probe the boundary conditions of claimed behaviors. It means producing evidence that is interpretable by someone with no stake in the outcome.
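As one concrete illustration of what that structured testing can look like, here is a minimal sketch that estimates a guardrail’s miss rate and false-alarm rate over a labeled case set, with adversarial cases broken out separately. The TestCase format and the guardrail callable are assumptions for illustration, not a reference to any particular product.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    prompt: str          # input the guardrail will see
    should_block: bool   # ground-truth label derived from the behavioral requirement
    adversarial: bool    # crafted specifically to evade the guardrail

def characterize_guardrail(guardrail: Callable[[str], bool],
                           cases: list[TestCase]) -> dict:
    """Estimate miss and false-alarm rates, overall and on adversarial cases."""
    results = [(case, guardrail(case.prompt)) for case in cases]  # one call per case

    harmful = [(c, blocked) for c, blocked in results if c.should_block]
    benign = [(c, blocked) for c, blocked in results if not c.should_block]
    adversarial = [(c, blocked) for c, blocked in harmful if c.adversarial]

    def miss_rate(pairs):
        return sum(1 for _, blocked in pairs if not blocked) / max(len(pairs), 1)

    return {
        "miss_rate": miss_rate(harmful),
        "false_alarm_rate": sum(1 for _, b in benign if b) / max(len(benign), 1),
        "adversarial_miss_rate": miss_rate(adversarial),
        "n_cases": len(cases),
    }
```

Numbers like these are what turn “a guardrail is in place” into a claim someone else can check.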

None of that is accomplished by the guardrail being present. And when the guardrail is a natural language policy prompt interpreted by an LLM, the performance uncertainty is compounded, partly because the policy itself may be ambiguous, but also because there is no stable behavioral surface to evaluate without prior testing.

There is also a recursive dimension worth naming. If an LLM-based judge is the mechanism by which you claim an agent is behaving correctly, that judge is itself an AI system whose reliability needs to be independently characterized. Asserting that a system is compliant because its own guardrail approved it is not really a robust assurance argument; it is a design feature whose effectiveness remains open.
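One way to make that concrete: before leaning on an LLM judge’s verdicts, measure how stable those verdicts are across semantically equivalent phrasings of the same case. The judge callable and case format below are illustrative assumptions.

```python
from collections import Counter
from typing import Callable

def judge_consistency(judge: Callable[[str], str],
                      paraphrase_sets: list[list[str]]) -> float:
    """Fraction of cases on which the judge is unanimous across paraphrases.

    Each inner list holds semantically equivalent descriptions of the same
    agent action; a reliable judge should give them all the same verdict.
    """
    unanimous = sum(
        1 for paraphrases in paraphrase_sets
        if len(Counter(judge(p) for p in paraphrases)) == 1
    )
    return unanimous / max(len(paraphrase_sets), 1)
```

A consistency score like this does not establish that the judge is right, only that it is stable; characterizing correctness still requires comparing its verdicts against independent labels.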

Runtime Security vs. Behavioral Assurance: Two Different Questions



Runtime security and behavioral assurance are not competing approaches. They operate at different layers and address different questions.

Runtime security asks: Is this system operating within defined parameters, is it protected against known attack vectors, and is there a record of what happened?

Behavioral assurance asks: Does this system do what it claims to do, reliably, across the range of conditions it will actually encounter?

Both questions matter. The first is largely an engineering and security problem. The second is an evaluation problem, and it requires independent evidence. The growing sophistication of runtime monitoring is a welcome development. The claim that it constitutes assurance in the stronger sense is the one that needs more precision.


Shea Brown is the Founder and CEO of BABL AI, an independent AI assurance firm specializing in AI testing, evaluation, verification, and validation.
