Getting the Control Layer Right: Why Your Agentic AI Architecture Collapses Without It
Katie Bowen
·
5 minute read
Agentic AI has crossed the threshold from demo to deployment. Gartner predicts that by 2028, 33% of enterprise software applications will include agentic AI, up from less than 1% in 2024, and that at least 15% of day-to-day work decisions will be made autonomously by agents. That's not a future problem. That's a now problem.
And the industry is starting to admit that the security model hasn't caught up.
OWASP's 2025 Agentic AI Threats and Mitigations guidance enumerates a taxonomy of agent-specific risks that didn't exist in traditional application security: memory poisoning, tool misuse, intent breaking, cascading hallucinations, identity spoofing across agents, and what they call "rogue agents" operating outside governance boundaries. NIST's AI Risk Management Framework and its Generative AI Profile (NIST AI 600-1) similarly highlight that agentic systems introduce compounding risks around confabulation, data leakage, and untraceable action chains that demand governance controls at the inference layer itself, not just at the application perimeter.
A useful way to organize the response to this is the three-pillar model that's emerging in agent-to-agent orchestration discourse from BricklayerAI: Context, Coordination, and Control. Context is what agents know. Coordination is how they work together. Control is what they're allowed to do.
Most teams are pouring their energy into context and coordination while treating control as an afterthought and the evidence says that's exactly backwards.
Think of your agentic architecture as a bridge. Context is the deck. Coordination is the cabling and trusses. Control is the foundation driven into bedrock. You can engineer the most elegant deck and the most sophisticated cable system in the world; if your piers are sunk into mud, the first real load brings the whole thing down.

What the Research Is Actually Telling Us
The empirical evidence on agentic risk is becoming hard to ignore.
- Prompt injection remains the #1 risk in the OWASP Top 10 for LLM Applications (2025), and indirect prompt injection, where malicious instructions are smuggled in via documents, web pages, or upstream agent outputs, is the most dangerous variant for multi-agent systems.
- Anthropic, in its public research on agent safety, has demonstrated that agents with tool access can be manipulated through indirect injection in retrieved content, executing actions the user never sanctioned.
- A 2024 study from researchers at Carnegie Mellon and elsewhere ("AgentDojo") evaluated agentic systems against realistic injection attacks and found that even frontier models executed adversarial instructions in a meaningful percentage of cases when those instructions arrived through legitimate-looking tool outputs.
- IBM's Cost of a Data Breach Report 2025 found that breaches involving shadow data, data flowing through systems without proper governance, were 16% more expensive and took longer to identify and contain. Agent systems generate shadow data at industrial scale by default.
- McKinsey's 2024 State of AI survey reports that inaccuracy, cybersecurity, and IP infringement are the top three risks organizations consider relevant from generative AI, yet fewer than half are actively mitigating any of them.⁸
The signal is consistent: the failure modes that matter most for agents are control failures, not context or coordination failures.
Why Control Is the Hardest Pillar
Context and coordination are largely engineering problems. They have well-understood patterns, observable failure modes, and forgiving error budgets. If your retrieval is mediocre, your agent gives a worse answer. Annoying, but recoverable.
Control failures are different. When an agent:
-
Leaks PII or PHI into a downstream tool call
-
Executes a prompt-injected instruction from a malicious document
-
Hallucinates a database write
-
Acts on a factually fabricated premise
-
Calls an external API with credentials it shouldn't have used in that context
…there is no graceful degradation. The blast radius is the entire system the agent can touch. And in agent-to-agent topologies, one compromised agent becomes the attack vector for every agent downstream of it, what OWASP characterizes as "cascading agent failure."²
It's the difference between a bumpy ride across the bridge and the bridge falling into the river. Context and coordination shape the ride. Control determines whether anyone makes it across.
This is why the control layer cannot be a single LLM-as-a-judge bolted on at the end. NIST's GenAI Profile explicitly calls for "layered, defense-in-depth measures" at the model interaction boundary. This means a first-class, deterministic, policy-driven plane between every inference and every action.
Why Self-Hosted Matters
Here's where most enterprises trip themselves up: they outsource their control plane to the same vendor that's running their inference.
Think about what that means. The thing deciding whether an output is safe, compliant, and policy-aligned is operated by the same party, and often the same model family, that produced the output in the first place. That's like having the bridge builder also sign off on their own structural inspection.
A self-hosted AI control plane changes the calculus entirely, and the regulatory environment is increasingly demanding it.
- Data sovereignty is non-negotiable. The EU AI Act, in force since August 2024, imposes obligations on high-risk AI systems that include logging, traceability, and human oversight, obligations that are functionally impossible to meet when your guardrail decisions are happening inside an opaque third-party SaaS. Forrester's 2024 analysis of AI governance argues that data residency and inference-time control will become regulatory requirements, not preferences, in regulated industries within 24 months.
- Latency and cost stay predictable. When every agent-to-agent handoff passes through a control checkpoint, round-trips to external SaaS guardrail providers stack up fast. Self-hosted means microseconds, not multi-hop hops over the public internet.
- You own the policy. Your industry, your regulatory regime, your risk appetite. A self-hosted plane lets you encode your policy, not a vendor's interpretation of a generic safety standard. The OWASP agentic guidance explicitly recommends organization-specific policy enforcement at the agent boundary.
- It works in air-gapped and on-prem environments. If you're in defense, healthcare, financial services, or critical infrastructure, the conversation ends here. There is no cloud-API control plane that meets your requirements.
What Prediction Guard Actually Brings to the Table
This is where Prediction Guard fits. It's purpose-built as a self-hosted inference and control layer that is the engineered foundation system, not a decorative cap on top. It maps directly to the mitigations called for in the OWASP, NIST, and MITRE ATLAS frameworks:
- PII and sensitive data detection and masking at the inference boundary, addressing NIST GenAI Profile concerns around data leakage and OWASP's "sensitive information disclosure" risk.
- Prompt injection detection that runs deterministically against inputs before an agent acts on them, directly mitigating OWASP LLM01 (Prompt Injection) and the indirect injection vectors documented in AgentDojo and Anthropic's research.
- Factuality and grounding checks that catch hallucinated tool arguments, addressing the "confabulation" risk NIST AI 600-1 calls out as a top-priority GenAI harm.
- Toxicity, bias, and policy violation screening that's auditable and tunable to your domain.
- Structured output validation that ensures agent-to-agent messages conform to expected schemas, preventing the malformed-handoff failures OWASP cites in cascading agent compromise scenarios.
- Self-hostable deployment so the entire control plane lives inside your VPC, your data center, or your air-gapped environment meeting the data residency expectations of EU AI Act, HIPAA, and emerging financial services guidance.
The point isn't that any single one of these features is revolutionary. The point is that they exist as a unified, governed, self-hosted plane that every agent interaction can be required to pass through, not a patchwork of OSS libraries duct-taped into your orchestration framework.
The Architectural Pattern That Works
If you take one thing from this post, take this: in your agent-to-agent topology, every edge in the graph should pass through your control plane. Not just user-to-agent edges. Every edge. Every span of the bridge sits on a pier.
.png?width=800&height=520&name=carbon%20(1).png)
This pattern aligns with what MITRE ATLAS calls "inference-time guarding" and what NIST refers to as "manage" function controls, placing measurable, auditable enforcement at every model interaction. The properties that emerge:
- Compromise containment. A prompt-injected Agent A cannot weaponize Agent B, because the malicious payload is screened at the boundary.
- Uniform observability. Every agent interaction is logged, scored, and auditable in one place thereby satisfying the traceability requirements in the EU AI Act and the "measure" function in NIST AI RMF.
- Policy as code. Changing your organization's stance on, say, financial advice or medical recommendations is a config change at the control plane, not a rewrite of every agent prompt.
- Model portability. Because control is decoupled from inference, you can swap models, open-weights, frontier, fine-tuned without rewriting your security posture.
Stop Treating Control as a Filter
The framing I want to leave you with is this: control is not a filter at the end of your pipeline. Control is the bedrock your agents stand on.
Context tells your agents what's true. Coordination tells them how to work together. But control is what makes any of it safe enough to put into production against real customer data, real money, and real consequences.
A three-pillar architecture with two strong pillars and one weak one isn't 67% as safe. It's a collapse waiting for a load event. Get your control layer right with a self-hosted, deterministic, policy-driven, and built on something like Prediction Guard and the rest of the architecture has bedrock to stand on.
Get it wrong, and it doesn't matter how elegant your coordination graph is. You've built a beautiful bridge to nowhere, and the first real traffic across it will find out exactly where you cut corners.
Choose accordingly.