Red-Teaming Agentic AI: findings from testing Prediction Guard against the OWASP Top 10

Written by Sharan Shirodkar | Jun 16, 2026 2:33:10 PM

With Anvesha Nain, Khushi Gupta, Saibya Padha, Saksham Singh, Shiven Saxena, Dong-Lin Shih, Jay Thota

Agentic AI is moving into production without any control over the blast radius. Cisco's State of AI Security 2026 report found that 83% of organizations plan to deploy agentic AI capabilities, while only 29% feel ready to do so securely. For organizations in regulated industries, that gap is not abstract. The gap determines whether deployments proceed, whether audits pass, and whether incidents stay contained.

Most security testing for agents today is ad hoc. Teams stitch together prompt injection tests, a couple open-source guardrails, and call it good. As highlighted recently by Anthropic, that approach is severely lacking the risks unique to agentic systems: tool misuse, identity and privilege abuse, memory and context poisoning, cascading failures across multi-agent workflows, and rogue agent behavior. The OWASP Top 10 for Agentic Applications was published to give the field a shared taxonomy, and frameworks like NIST AI RMF and ISO 42001 now reference it as teams work out how to build an AI governance framework for agentic systems.

Prediction Guard's AI control plane, part of its broader Solutions, solves this by embedding a "governance harness" directly within your secure infrastructure perimeter. To understand with rigor how the governance enforcement in our AI control plane holds up across the full OWASP taxonomy, we partnered with a brilliant group of students working with Purdue University's Data Mine.

About the partnership

The Data Mine is Purdue's flagship learning community for data science and AI, pairing student teams with industry partners on production-relevant problems over an academic year. Across fall 2025 and spring 2026, six students built, broke, tested, and re-tested agents exhibiting vulnerabilities corresponding to the OWASP Agentic AI Top 10. The team operated independently, with mentoring by Sharan Shirodkar at Prediction Guard and TA Gia Sareen at The Data Mine. The brief was direct: find where the governance enforcement in Prediction Guard's AI control plane works, and where it does not.

Methodology

The team implemented ten "problematic" AI agents, one per OWASP Agentic AI risk category. Each was intentionally built with realistic architectural weaknesses such as overpermissive tool access, insufficient input validation, shared memory across sessions, and implicit trust between agents. This was deliberate. Hardening every agent before testing would have measured agent design rather than the governance enforcement and control layer (Prediction Guard), and enterprise deployments under quarter deadlines often ship with the same shortcuts. In other words, developers are going to build vulnerable agents in your environments and software vendors are going to want to deploy poorly designed agents behind your firewall. Prediction Guard should allow you to both detect and remediate this problematic agent behavior before something goes wrong.

Every agent runs in two modes: unsafe (Prediction Guard governance turned off) and safeguarded (Prediction Guard governance enforced). A unified test runner executes all agents against an adversarial test suite and produces a per-risk attack outcome table.

METHODOLOGY

Build vulnerable agents per OWASP risk

Define adversarial test cases

Run unsafe baseline

Run with Prediction Guard governance enforced

The relevant governance modules available in Prediction Guard's AI control plane vary by risk category:

OWASP RISK	SAFEGUARDS APPLIED
ASI01 Agent Goal Hijack	`block_prompt_injection` · `pii: replace`
ASI02 Tool Misuse	`block_prompt_injection` · `pii: replace`
ASI03 Identity & Privilege Abuse	`block_prompt_injection`
ASI04 Agentic Supply Chain	`block_prompt_injection`
ASI05 Unexpected Code Execution	Input / output validation
ASI06 Memory & Context Poisoning	`block_prompt_injection`
ASI07 Inter-Agent Communication	`block_prompt_injection`
ASI08 Cascading Failures	Circuit breaker + injection detection
ASI09 Human-Agent Trust Exploitation	`block_prompt_injection` · `factuality` · `toxicity`
ASI10 Rogue Agents	Confirmation gate for high-risk actions

The harness is model-agnostic: the same agents and adversarial suite run against any underlying model, such that the governance, or policy, enforcement layer is what is being measured rather than a single model's behavior.

Findings

Across all ten risk categories, the safeguarded configuration substantially reduced the attack surface, with a structural gap we want to be transparent about.

OUT-OF-THE-BOX GOVERNANCE · OWASP AGENTIC AI TOP 10

5/10 categories where Prediction Guard blocked 100% of attacks

Across the full suite, Prediction Guard cut the attack success rate from 81% without it to 23% with it — blocking 77% of the attacks that succeeded against the unprotected agents.

Figures are pooled across all individual attack cases run against the ten agents. Per-category counts are small, so the per-risk breakdown below shows raw case counts alongside each bar.

What Prediction Guard does to the attacks that worked

For every category, the share of attacks Prediction Guard blocks (green) versus what remains not blocked (red), across the OWASP Agentic AI Top 10.

Blocked by Prediction Guard

Not blocked

ASI01

100% blocked

ASI02

100% blocked

ASI03

75% blocked

25%

ASI04

75% blocked

25%

ASI05

80% blocked

20%

ASI06

33%

67% · 1/3

ASI07

100% blocked

ASI08

33%

67% · 1/3

ASI09

100% blocked

ASI10

100% blocked

Overall

77% blocked

23%

Out-of-the-box defense is strong. Prediction Guard reached a 100% block rate on Prompt Injection (ASI01), PII Misuse (ASI02), Inter-Agent Communication (ASI07), Human-Agent Trust Exploitation (ASI09), and Rogue Agents (ASI10). These are the categories where malicious instructions are identifiable across handshakes within the agent infrastructure (or system). Because Prediction Guard applies a "Zero Trust" philosophy to agentic security and operates directly within the infrastructure on which agents run (rather than relying on perimeter-based security methods), the Prediction Guard governance harness holds the line.

ASI09 was one of the most striking examples of advanced governance enforcement and efficacy. Khushi constructed an agent probed with authority-framing attacks: requests wrapped in fake internal verification flows that pressured the agent into surfacing credentials it should never have disclosed. Uncontrolled (absent Prediction Guard), the manipulation worked. With Prediction Guard enabled, the same attacks were both detected and remediated.

Reading about Human-Agent Trust Manipulation is one thing. Actually engineering the prompts and watching an agent hand over secrets it should have protected was different. It confirmed how easily a production agent can be manipulated through authority framing alone, and how much the safeguarding layer matters once that happens.

— KHUSHI GUPTA

Structural risks are more challenging, and they surface the same cost, governance, and compliance trade-offs that come with scaling agentic AI. Results in two categories showed a much lower attack prevention rate for Prediction Guard's governance system at the time of testing, which was amazingly valuable feedback for the Prediction Guard AI engineering team. ASI06 Memory and Context Poisoning and ASI08 Cascading Failures both saw block rates of 33%.

Certain agents retain and index memory in unique ways (as evidenced by the leaked agent harness code from Claude Code). In many cases instructions are committed to an agent's memory management system before analysis by Prediction Guard's governance enforcement services. Once context is poisoned, subsequent benign-looking prompts may continue to produce manipulated output. Cascading failures behave similarly: an error or compromise propagating downstream between agents is a deeper, architectural problem.

These results immediately allowed the Prediction Guard AI Engineering team to understand where to focus governance harness and platform development. Guided by OWASP's remediation recommendations for ASI06 and ASI08, Prediction Guard has already released the following updates:

Provenance Tracking & Memory Sanitization

Every piece of data written to an agent's memory must be tagged with data lineage (provenance). Users of Prediction Guard's rapidly advancing "Agent Forge" platform can now look at this memory under "analytics," and security teams can export this lineage (knowledge base interactions and modifications) via telemetry coming off of a Prediction Guard deployment, classified against Prediction Guard's AI security event taxonomy for prioritizing which events warrant investigation.

Blast-Radius Caps and Least Agency

To adhere to a zero trust foundation for agentic AI, Prediction Guard's Admin Console updated its already strict AI system and governance segmentation, scoped API permissions, and controls over MCP tool access. An agent should only possess the minimum access required to do its job, ensuring a compromised agent cannot bring down adjacent systems.

Further, research and implementation is underway to better understand where an agent's outputs drift, or when it begins acting on and defending highly biased assumptions it shouldn't have learned. The results from this study were critical to guide this work.

What surprised me most was how persistent poisoned context could be, even after later interactions appeared completely harmless. The problem is much more stateful and cumulative than I had assumed. Prompt injection and toxicity filters caught some malicious inputs, but they were not enough once harmful instructions had already made their way into memory.

— SAIBYA PADHA, on ASI06

Closing the gaps requires governance enforcement going well beyond the request boundary where most "guardrail" systems stop. The results demonstrate that agentic security solutions must consider memory integrity validation, session isolation, circuit breakers and confirmation gates for high-impact actions, and explicit trust boundaries between cooperating agents. The team's confirmation gate prototype for ASI10 Rogue Agents performed well, and the pattern extends naturally to the structural risk categories.

The tool surface is high-leverage. Several risks converge on how agents use tools (e.g., via MCP servers). Saksham, who built a vulnerable customer support agent, identified this vector of attack.

Customer support is an interesting surface for red-teaming because the agent is designed to be helpful. That helpfulness becomes a weakness when a user pretends to be an admin, creates urgency, or hides a malicious instruction inside a normal-looking request. Building an agent is not just about getting good responses. It is about understanding what the agent should be trusted with in the first place.

— SAKSHAM SINGH

Shiven, who built much of the testing infrastructure that unified results across agents, framed the underlying difficulty:

Code is unique in the sense that it only does what you instruct it to. LLMs function more like a black box because they are trained on huge datasets with billions of parameters. We can't fully control or anticipate how they might behave, especially when malicious actors are actively trying to manipulate them. When companies deploy agentic AI in production, this kind of red-teaming work is mandatory to ensure that systems perform as intended and can't be broken.

— SHIVEN SAXENA

What this means for production agentic AI

Three implications for organizations deploying agents in regulated or high-stakes environments.

Red-team continuously, not at launch

The OWASP ASI Top 10 is a starting checklist. The attack catalog itself evolves every month, with new patterns appearing (poem-style prompts, multi-turn manipulation, stylized formatting attacks) faster than vendor benchmarks update. A static test suite goes stale fast. Build continuous adversarial testing into agent deployment processes, with results that feed back into both architecture and safeguarding policy.

Content filtering is necessary but not sufficient

ASI06 and ASI08 are not solvable by inspecting individual requests. Memory integrity, session isolation, trust boundaries between agents, and explicit human-in-the-loop gates for high-impact actions sit above the request boundary. Prediction Guard provides a governance layer with full audit trails. This is why getting the control layer right matters: the governance layer needs to be embedded beside the agent, and the surrounding architecture determines the actual blast radius of any compromise.

Test against realistic, not idealized, deployments

Agentic AI shipped under enterprise time pressure rarely matches reference architectures. Test harnesses that assume well-designed agents will pass tests that production fails. Adversarial evaluations should run against the kinds of weak agents that actually ship, because those are the agents safeguards have to defend.

Anvesha, who worked across multiple risk categories, summarized the lesson she wants to carry forward into her own engineering career:

Don't just look for massive system crashes. The most dangerous vulnerabilities are often the quiet ones where the agent subtly misinterprets its instructions but still looks completely confident.

— ANVESHA NAIN

From the mentor

MENTOR'S PERSPECTIVE

Working with this team across two semesters was one of the most valuable engagements I have had as a mentor. Adversarial thinking is hard to teach, and the team's evolution from defensive builders to systematic red-teamers was visible week over week.

What struck me most was their honesty in the findings. It would have been easy to inflate the block rates or quietly drop the harder structural risks. They didn't. The 33% block rate on ASI06 and ASI08 is the most important data point in this project, and it has directly shaped how we think about layering architectural governance on top of input-boundary controls inside Prediction Guard.

Shiven's proposal for an automated red-teaming agent that continuously scans other agents for vulnerabilities is exactly the kind of follow-on we want to pursue with academic partners. Thanks to our CEO Daniel Whitenack for backing student engagements as part of how we invest in the next generation of AI security practitioners.

Sharan Shirodkar

AI Engineer, Prediction Guard

Acknowledgments. Thank you to Anvesha Nain, Khushi Gupta, Saibya Padha, Saksham Singh, Shiven Saxena, Dong-Lin Shih and Jay Thota for their co-authorship on this work, and to the rest of the fall and spring student teams for their contributions across the year. Thanks to TA Gia Sareen, and to Pete Dragnev at The Data Mine for making this partnership possible. Thanks to Daniel Whitenack, CEO of Prediction Guard, for supporting academic engagements as part of how we build.

Deploying agentic AI in a regulated environment?

Prediction Guard is the sovereign AI control plane for healthcare, finance, defense, and government. If you are building or deploying agents under compliance constraints, we would like to compare notes.

Get in touch

View full post