Updated May 18, 2026
TL;DR: Deploy governed, auditable AI agents without rebuilding your existing application code. Route standard API patterns (OpenAI-compatible /chat/completions, Anthropic-compatible /messages, and LangChain via langchain-predictionguard) through a self-hosted sovereign AI control plane, and every agent interaction is checked against NIST AI Risk Management Framework (AI RMF) and OWASP AI security policies and monitored inside your own infrastructure. The Model Context Protocol (MCP) standardizes how agents access enterprise data, eliminating the ungoverned agent interactions that create audit exposure. Change the base_url. Keep the governance.
The hardest part of deploying AI agents in a regulated enterprise isn't building the agent. It's proving to your end customers or security team that the agent is under control (i.e., governed). Engineering teams spend months building these systems, then security blocks deployment because the team cannot produce exportable proof that the agent adheres to company or regulator policies: audit logs of policy violations, AI Bills of Materials (AIBOMs), and inventories of the AI assets in use with their corresponding risk. The integration layer, not the model itself, is where governance enforcement is determined.
AI agents autonomously plan and execute tasks using external tools, which means every tool call, every model invocation, and every retrieved document is a potential audit gap if it isn't monitored at the system level. Scaling these systems in regulated environments requires standard integration patterns routed through a self-hosted control plane so that NIST AI RMF, OWASP, or other policies are enforced without requiring developers to refactor existing codebases.
Developers already building with OpenAI-compatible APIs, Anthropic-compatible APIs, and LangChain shouldn't have to change that. This guide explains how to route each of those patterns through a self-hosted control plane so AI governance policies are enforced without disrupting existing workflows, covering OpenAI-compatible and Anthropic-compatible endpoints, the langchain-predictionguard package, and MCP server configuration.
The OpenAI API is the de facto standard interface for building chat, tool-calling, and streaming AI applications. Its /chat/completions endpoint and Python and TypeScript SDKs are what most developers reach for first, and the majority of AI application code in production today is written against it. The Anthropic API offers a Claude-compatible alternative with a slightly different message structure, favoured for structured reasoning tasks and teams already building on Claude-compatible models. LangChain is the most widely adopted agent orchestration framework: it gives developers a composable, model-agnostic layer for building multi-step agent workflows, RAG pipelines, and tool-calling sequences without coupling application logic to any single provider.
Developers value these tools because they are well-documented, widely supported, and already embedded in existing codebases. The problem isn't the tools. The problem is that none of them, used in isolation, give security and compliance teams the policy enforcement, audit logging, or data residency controls that regulated enterprise deployment requires.
Routing these same patterns through a self-hosted control plane resolves that conflict. Developers keep the SDKs and frameworks they already use. The control plane intercepts every call at the API layer, enforces NIST AI RMF and OWASP policies, and generates audit logs inside your own infrastructure. No application code refactoring required. Standardizing on compatible APIs is a strategic decision, not just a developer convenience, because it determines whether your governance architecture survives a model vendor change.
The four integration patterns covered here each map to a different industry-standard interface:
| Framework | Primary use case | Governance integration method | Deployment flexibility |
|---|---|---|---|
| OpenAI-compatible API | Chat, tool-calling, streaming agents | SDK client constructor redirects to control plane via base_url | Any model behind a compatible endpoint |
| Anthropic-compatible API | Claude-compatible agents and structured reasoning | SDK client constructor redirects to control plane via base_url | Any compatible endpoint in the control plane |
| LangChain | Multi-step agent workflows and RAG | Native control plane integration via partner package | Models explicitly supported by the langchain-predictionguard package |
| MCP (Model Context Protocol) | Standardized agent access to enterprise data | Registered MCP servers governed by control plane | Any MCP-compatible tool or data source |
Understanding where each pattern fits, including the fragmentation risk of not standardising, prevents the patchwork of custom connectors that creates audit exposure and blocks production sign-off. The harmonizing AI tools episode covers this in detail.
Developers expect model integrations to be stable. When a new model releases, the reasonable expectation is to evaluate it, swap it in if it performs better, and move on. That expectation is reasonable precisely because the SDK interface doesn't change: the same /chat/completions or /messages contract, the same tool definitions, the same application logic.
The problem is that custom integrations break that expectation. Every SDK initialization is a potential maintenance surface. Every authentication configuration is a manual update. Every model swap triggers a full re-verification cycle: locate every SDK call, update credentials, retest agent behavior, and re-verify that governance coverage still applies to the new model. In regulated environments, that last step is the one that blocks production deployment because security teams have no automated way to confirm that the new model is covered by the same policies as the old one.
A self-hosted control plane resolves this by isolating the model-switching decision from both application code and governance configuration. Developers change one parameter. Security teams update model policies in the Admin console, not in application code. Governance coverage carries forward automatically because enforcement lives at the API layer, not inside the integration. The golden path for AI post covers how this architecture reduces integration maintenance overhead at the system level.
External AI gateways generate audit logs outside your perimeter. AWS Bedrock Guardrails policies store configuration in a primary region, and while cross-region inference can distribute compute across AWS regions for guardrail evaluations, data sovereignty boundaries are still defined by the cloud provider: input prompts and outputs may transit regions during cross-region inference, and all governance configuration and audit data remains within AWS infrastructure.
A self-hosted control plane inverts this. Every governance policy decision and every monitored interaction log is generated and stored inside your own infrastructure. For regulated workloads, that isn't a preference. It's a prerequisite. The agentic AI threats episode covers specific threat vectors that ungoverned agent interactions introduce into enterprise environments.
A composable control plane registers Llama 3 variants, Mistral, Hermes-3, and closed vendor endpoints under one set of AI governance policies. Swap the underlying model and the policy enforcement layer doesn't change. The choosing an AI model episode addresses how to evaluate model performance without rebuilding governance each time a new model enters the market.
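One way to picture the model swap is a small task-to-model routing table in application configuration, with every request still aimed at the same governed endpoint. The sketch below is illustrative only: the task names, the `Route` type, and the `your-configured-model` placeholder are assumptions, not part of any Prediction Guard API.

```python
from dataclasses import dataclass

# Assumption: a single governed endpoint fronts every registered model.
CONTROL_PLANE_URL = "https://your-predictionguard-endpoint/v1"

# Hypothetical routing table; swapping a model means editing one entry here.
MODEL_FOR_TASK = {
    "summarize": "Hermes-3-Llama-3.1-70B",
    "classify": "your-configured-model",  # placeholder model name
}


@dataclass(frozen=True)
class Route:
    base_url: str
    model: str


def route_for(task: str) -> Route:
    """Resolve the model for a task; the endpoint stays the same for all tasks."""
    if task not in MODEL_FOR_TASK:
        raise ValueError(f"no model registered for task {task!r}")
    return Route(base_url=CONTROL_PLANE_URL, model=MODEL_FOR_TASK[task])
```

Because the endpoint never varies, call sites only consume `route_for(task).model`, and a model change does not touch the client constructor.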
Prediction Guard exposes an OpenAI-compatible /chat/completions endpoint that accepts standard OpenAI SDK calls. No new SDK is required and no agent logic changes.
For self-hosted deployments, data never transits Prediction Guard's infrastructure or any third-party environment because the control plane runs entirely inside your own Kubernetes environment, on-premises or in a cloud VPC. The architecture is hardware- and infrastructure-agnostic, with NVIDIA GPUs among the supported targets, so model inference runs on GPU or CPU depending on workload requirements. This directly addresses the data egress risk that prevents regulated AI workloads from using external model endpoints.
The single required code change is the base_url parameter in the OpenAI SDK initialization. In most cases, messages, tool definitions, and streaming flags require no changes:
```python
from openai import OpenAI

# Before: routes to api.openai.com
# client = OpenAI(api_key="sk-...")

# After: routes to the self-hosted control plane
client = OpenAI(
    base_url="https://your-predictionguard-endpoint/v1",
    api_key="your-pg-api-key",
)

response = client.chat.completions.create(
    model="Hermes-3-Llama-3.1-70B",
    messages=[
        {
            "role": "user",
            "content": "Summarize this contract.",
        }
    ],
)
```
API key isolation: Control plane API keys authenticate every model call through the governed API layer, preventing direct access to underlying model endpoints outside governance policy boundaries.
Preventing unsanctioned tool execution: Every agent call must route through the control plane before reaching a model or tool, ensuring governance policies evaluate every interaction before execution.
ASI01 coverage: Prompt injection detection runs before inputs reach any model, enforcing AI governance policy at the system level across every request and directly mitigating OWASP ASI01 (Agent Goal Hijack).
The standard /chat/completions endpoint supports synchronous and streaming responses without modification. Governance policies (toxicity filtering, factual consistency checking) apply before responses reach the calling application, and every policy decision is captured in the audit log inside your perimeter.
Prediction Guard also exposes an Anthropic-compatible /messages endpoint, covering teams that have built agents on Claude-compatible models without requiring a separate SDK or parallel governance configuration.
The control plane governs /messages calls identically to /chat/completions calls, with the same policy enforcement and the same audit log format. A single Admin console manages policies across both endpoint types, so security teams maintain one governance configuration rather than separate configurations per vendor.
The Anthropic /messages structure differs from OpenAI in one key way: Anthropic's API structure takes the system prompt as a top-level parameter rather than a message role. The governed endpoint accepts this structure as-is, so agent message logic written for Claude-compatible models continues working without modification:
```python
import anthropic

client = anthropic.Anthropic(
    api_key="your-pg-api-key",
    base_url="https://your-predictionguard-endpoint",
)

response = client.messages.create(
    model="your-configured-model",
    max_tokens=1024,
    system=(
        "You are a document analysis assistant "
        "for regulated financial data."
    ),
    messages=[
        {
            "role": "user",
            "content": "Extract key obligations from this agreement.",
        }
    ],
)
```
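To make the structural difference concrete, the sketch below converts an OpenAI-style message list (system prompt as a message role) into the pair of arguments the /messages structure expects (system prompt as a top-level parameter). `split_system_prompt` is a hypothetical helper, and it assumes every message has plain string content.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def split_system_prompt(messages: List[Message]) -> Tuple[str, List[Message]]:
    """Separate system-role messages from the conversation turns.

    Returns (system, turns), intended for the `system` and `messages`
    arguments of `client.messages.create(...)`.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    return "\n\n".join(system_parts), turns


openai_style = [
    {"role": "system", "content": "You are a document analysis assistant."},
    {"role": "user", "content": "Extract key obligations from this agreement."},
]
system, turns = split_system_prompt(openai_style)
```

Teams that maintain agents on both endpoint types could keep one canonical message list and derive the other shape this way, rather than maintaining two prompt definitions.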
Migrating an existing Claude deployment to a governed endpoint takes three steps:
1. Locate every anthropic.Anthropic(api_key=...) call in the codebase.
2. Set base_url: add base_url="https://your-predictionguard-endpoint" to the client constructor.
3. Replace the native Anthropic API key with a control plane API key.

Before swapping, store the original API key and base_url values in your secrets manager or environment configuration rather than deleting them. If the governed endpoint is unreachable or returns unexpected errors post-migration, revert by restoring the original API key from your secrets manager and removing the base_url parameter from the client constructor, with no other application code changes required:
```python
import os

import anthropic

# Rollback: remove base_url and restore the original Anthropic API key
client = anthropic.Anthropic(
    api_key=os.environ["ANTHROPIC_API_KEY"],
)
```
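The cutover and the rollback can also be driven purely by configuration, so neither direction needs a code change. The following is a sketch under assumptions: `PG_BASE_URL` and `PG_API_KEY` are hypothetical environment variable names chosen for illustration, and `anthropic_client_kwargs` is not part of either SDK.

```python
import os
from typing import Dict, Mapping


def anthropic_client_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build keyword arguments for anthropic.Anthropic(...) from the environment.

    If PG_BASE_URL (hypothetical name) is set, route through the governed
    endpoint with the control plane key; otherwise fall back to the original
    Anthropic key and the SDK's default endpoint.
    """
    base_url = env.get("PG_BASE_URL")
    if base_url:
        return {"api_key": env["PG_API_KEY"], "base_url": base_url}
    return {"api_key": env["ANTHROPIC_API_KEY"]}
```

With this in place, the constructor becomes `anthropic.Anthropic(**anthropic_client_kwargs())`, and unsetting `PG_BASE_URL` in the deployment environment performs the rollback.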
The integration pattern maintains compatibility with existing agent logic and message structures. The /messages endpoint contract is identical. The self-hosted vs. third-party deployment guide covers the regulatory trade-offs to evaluate before committing to this migration path.
LangChain is the most common agent orchestration framework in regulated enterprise environments. Prediction Guard publishes a LangChain integration package listed in the official LangChain integration catalogue.
Install the package and set the API key environment variable:
```shell
pip install langchain-predictionguard
export PREDICTIONGUARD_API_KEY="your-pg-api-key"
```
Alternatively, pass the key as an inline parameter, ChatPredictionGuard(model="Hermes-3-Llama-3.1-70B", predictionguard_api_key="your-pg-api-key"), consistent with the constructor pattern shown in the OpenAI and Anthropic sections above. Both approaches are equivalent. The environment variable pattern is shown here as the LangChain convention for managing credentials outside application code.
Once the environment variable is set, initialize the chat model in your agent code:
```python
from langchain_predictionguard import ChatPredictionGuard

chat = ChatPredictionGuard(
    model="Hermes-3-Llama-3.1-70B",
)
```
ChatPredictionGuard routes LangChain's standard chat interface through the OpenAI-compatible endpoint on the control plane, so existing agent code (tool definitions, memory configurations, retrieval chains) continues working without modification. The composability episode frames this standardization problem precisely: one controlled interface governs all the connected components.
Security teams configure which tools a LangChain agent is permitted to call directly in the control plane's Admin console, keeping governance policy decisions separate from application code. To configure tool permissions, security teams use the Admin console to locate the relevant agent, restrict or permit individual tools, and save the updated governance policy. Changes apply to all subsequent agent calls immediately, with no application redeployment required.
Tools set to Blocked are rejected at the control plane before execution, and the rejection is recorded in the audit log with the agent ID, tool name, and the governance policy rule that triggered the block. When an engineer adds a new tool to the agent's list, the control plane evaluates each call to that tool against governance policy before execution. This directly mitigates OWASP ASI05 (Unexpected Code Execution), which covers agents generating or running code and commands outside defined governance policy boundaries.
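The evaluate-then-log behavior described above can be sketched in a few lines. This is a conceptual illustration, not Prediction Guard's implementation: the `ToolPolicy` and `AuditEntry` names, the permission dictionary layout, and the default-deny treatment of unconfigured tools are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class AuditEntry:
    timestamp: str
    agent_id: str
    tool_name: str
    decision: str  # "allowed" or "blocked"
    rule: str      # the policy rule that produced the decision


class ToolPolicy:
    """Hypothetical per-agent tool policy; unconfigured tools are blocked (assumption)."""

    def __init__(self, permissions: Dict[str, Dict[str, str]]):
        # Layout: {agent_id: {tool_name: "Allowed" | "Blocked"}}
        self._permissions = permissions
        self.audit_log: List[AuditEntry] = []

    def evaluate(self, agent_id: str, tool_name: str) -> bool:
        """Decide whether the call may run, and record the decision either way."""
        setting = self._permissions.get(agent_id, {}).get(tool_name)
        if setting == "Allowed":
            decision, rule = "allowed", f"{agent_id}:{tool_name}=Allowed"
        elif setting == "Blocked":
            decision, rule = "blocked", f"{agent_id}:{tool_name}=Blocked"
        else:
            decision, rule = "blocked", "default-deny (tool not configured)"
        self.audit_log.append(AuditEntry(
            timestamp=datetime.now(timezone.utc).isoformat(),
            agent_id=agent_id,
            tool_name=tool_name,
            decision=decision,
            rule=rule,
        ))
        return decision == "allowed"
```

The point of the sketch is the ordering: the decision and its audit record are produced before any tool code runs, so a blocked call leaves evidence without ever executing.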
Multi-step LangChain workflows pass through the control plane at every model call. The audit log captures the full agent workflow sequence, which is the artifact an auditor needs to verify the agent operated within defined policy boundaries throughout its execution. The AI-driven document processing episode covers comparable multi-step workflow governance patterns in production environments.
The Model Context Protocol is an open standard introduced by Anthropic in November 2024 that standardizes how AI systems integrate with external tools, databases, and data sources. Before MCP, every data source integration required a custom connector, creating an N×M integration problem where each agent needed bespoke code to access each tool. MCP replaces that with a single, governable protocol. Prediction Guard's native MCP integration governs all MCP-registered tool calls at the control plane level.
An MCP server implements two core capabilities: advertising the tools it supports (including JSON schema for inputs and outputs) and responding to call_tool requests from agents. When an agent needs to access a tool, it queries the MCP client for available tools, selects the appropriate one, and sends a structured call. The MCP server validates the call, executes the underlying integration, and returns a standardized response. Every step is interceptable and loggable by the control plane.
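The two capabilities can be sketched as an in-memory handler. In the public MCP specification, tool discovery and invocation travel as JSON-RPC 2.0 messages using the methods `tools/list` and `tools/call`; the example below mirrors that message shape but deliberately skips the initialization handshake and transport, and it is not the MCP SDK. The `crm_lookup` schema, the handler, and the returned record are hypothetical, and the required-key check is a stand-in for full JSON Schema validation.

```python
import json
from typing import Any, Dict


def _crm_lookup(args: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for a real CRM query.
    return {"contact_id": args["contact_id"], "name": "Example Contact"}


TOOLS: Dict[str, Dict[str, Any]] = {
    "crm_lookup": {
        "description": "Look up a CRM contact record by ID.",
        "inputSchema": {
            "type": "object",
            "properties": {"contact_id": {"type": "string"}},
            "required": ["contact_id"],
        },
        "handler": _crm_lookup,
    }
}


def _error(request_id: Any, code: int, message: str) -> Dict[str, Any]:
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}


def handle(request: Dict[str, Any]) -> Dict[str, Any]:
    """Answer a JSON-RPC 2.0 request for the two tool methods."""
    request_id, method = request.get("id"), request.get("method")
    params = request.get("params") or {}
    if method == "tools/list":
        tools = [{"name": name, "description": t["description"],
                  "inputSchema": t["inputSchema"]} for name, t in TOOLS.items()]
        return {"jsonrpc": "2.0", "id": request_id, "result": {"tools": tools}}
    if method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(request_id, -32602, f"Unknown tool: {params.get('name')}")
        args = params.get("arguments") or {}
        missing = [k for k in tool["inputSchema"]["required"] if k not in args]
        if missing:
            return _error(request_id, -32602, f"Missing arguments: {missing}")
        text = json.dumps(tool["handler"](args))
        return {"jsonrpc": "2.0", "id": request_id,
                "result": {"content": [{"type": "text", "text": text}], "isError": False}}
    return _error(request_id, -32601, f"Method not found: {method}")
```

Because every request and response is a plain structured message, an intermediary such as a control plane can inspect and log each one before forwarding it.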
Here is how an agent accesses a CRM contact record via a governed MCP integration:
The MCP server advertises the crm_lookup tool schema with its JSON input and output contracts.
To configure an MCP server so that every data access request is monitored and auditable:
The alternative to MCP is individual teams building custom API connectors for each tool an agent needs to access. That approach creates audit gaps: each custom connector is a point integration with its own authentication model, its own failure modes, and no central audit log. MCP standardizes the access pattern, and the control plane registers every MCP server before it is accessible to agents. An agent cannot call an unregistered tool, giving security teams a complete tool inventory that feeds directly into the AIBOM.
Governed deployment means every model call, tool invocation, and policy decision is enforced and monitored inside your own infrastructure before an auditor ever asks for evidence. The three capabilities below cover how that enforcement is structured, how audit artifacts are generated, and why self-hosted deployment is the prerequisite for regulated workloads.
Prediction Guard enforces a strict separation of duties. Developers use OpenAI-compatible or Anthropic-compatible SDKs to point existing code at the control plane and ship features. Security and GRC teams configure AI governance policies in the Admin console without touching application code or triggering agent redeployment. The control plane applies those policies to every model call, regardless of which SDK or framework the developer used.
The self-hosted sovereignty episode explains why this separation matters structurally: governance policies defined in a document are not the same as AI governance policies enforced at the system level.
Every model interaction, tool call, and policy decision generates a structured log entry stored inside the customer's own infrastructure for self-hosted deployments. Separately, the control plane generates an AI Bill of Materials (AIBOM): a machine-readable inventory of every model, tool, dataset, and dependency in each AI system. To generate and export an AIBOM from the control plane, use the AIBOM export capability in the Admin console: select the AI system or agent scope you want to capture, choose CycloneDX as the export format, and download the resulting file.
The exported file contains every model identifier, tool registration, dataset reference, dependency version, and the governance policy applied to each, structured for direct submission to an auditor or ingestion into an audit tracking system without reformatting. This is the artifact auditors commonly need when asking which models processed regulated data and under which policies. The alternative, manually compiling this from spreadsheets and email threads, doesn't scale across production AI deployments.
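For orientation, a CycloneDX-shaped inventory might look roughly like the output of the sketch below. The top-level fields (`bomFormat`, `specVersion`, `serialNumber`, `version`, `components`) and the `machine-learning-model` and `data` component types come from the CycloneDX standard; the exact fields in Prediction Guard's export are not documented in this guide, so the `build_aibom` helper, the input layout, and the `example:governance-policy` property name are assumptions.

```python
import json
import uuid
from typing import Any, Dict, List


def _component(kind: str, item: Dict[str, str]) -> Dict[str, Any]:
    """Build one component entry, attaching the applied policy as a property."""
    return {
        "type": kind,
        "name": item["name"],
        "version": item["version"],
        # Hypothetical property name; CycloneDX allows name/value properties.
        "properties": [{"name": "example:governance-policy", "value": item["policy"]}],
    }


def build_aibom(models: List[Dict[str, str]],
                datasets: List[Dict[str, str]]) -> Dict[str, Any]:
    """Assemble a minimal CycloneDX-style BOM listing models and datasets."""
    components = [_component("machine-learning-model", m) for m in models]
    components += [_component("data", d) for d in datasets]
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,
        "components": components,
    }


bom = build_aibom(
    models=[{"name": "Hermes-3-Llama-3.1-70B", "version": "1", "policy": "pii-redaction"}],
    datasets=[{"name": "contracts-corpus", "version": "2026-05", "policy": "pii-redaction"}],
)
aibom_json = json.dumps(bom, indent=2)
```

The dataset name, version strings, and policy label are placeholders; the useful part is that each component carries its policy alongside its identifier, which is the pairing an auditor asks about.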
For self-hosted deployments, no data, no governance logic, and no audit log transits Prediction Guard's infrastructure or any third-party environment. This directly addresses the data egress risk that prevents manufacturing IP, Controlled Unclassified Information (CUI), ITAR-controlled data (International Traffic in Arms Regulations), and regulated financial workloads from running on external AI.
The air-gapped AI episode covers reliability and deployment architecture for constrained manufacturing and logistics environments specifically. Prediction Guard's internal analysis indicates up to a 4X TCO reduction compared to assembling fragmented point solutions. Independent verification of scope and methodology has not been published.
The NIST AI Risk Management Framework (AI RMF) Govern, Map, Measure, and Manage functions align to specific control plane capabilities:
The OWASP Agentic AI Top Ten items map directly to control plane enforcement capabilities. Prompt injection filtering at the API layer mitigates ASI01 (Agent Goal Hijack). Enforcing an allowlist of MCP-registered tools mitigates ASI05 (Unexpected Code Execution). Authentication requirements for inter-agent communication support mitigation of ASI07 (Insecure Inter-Agent Communication). The OWASP guidance episode walks through these mitigations at an implementation level.
A document processing agent using LangChain and ChatPredictionGuard routes every model call in a multi-step workflow through the control plane, so governed policies and audit logging apply across the full agent execution sequence. Factual consistency checking applies to outputs before they are written to a downstream system, and the full processing chain is monitored as a structured sequence of governed interactions.
For a ticketing workflow, a similar pattern applies: register Jira or ServiceNow as an MCP server in the control plane, and every ticket interaction is monitored with agent ID, action, and policy decision. The PII detection guide for regulated industries covers the redaction configuration that applies across both patterns when regulated data is in scope.
Book a deployment scoping call to assess whether self-hosted deployment fits your infrastructure and compliance requirements.
Yes. If your code uses the standard OpenAI Python or TypeScript SDK, changing the base_url parameter to the Prediction Guard endpoint URL routes all calls through the governed control plane without any other code changes. The /chat/completions endpoint contract is fully compatible.
For OpenAI, update base_url in the SDK client constructor and replace the API key with a control plane key. For Anthropic-compatible agents, follow the three-step migration path in the Anthropic integration section above: locate SDK initialization calls, redirect to the governed endpoint, and replace the native API key with a control plane key. Before cutting over to production, validate the governed endpoint in a staging environment using a representative sample of production requests, confirm responses are semantically consistent with pre-migration behaviour, and verify in the Admin console audit log that governance policies are being applied as expected. Neither migration pattern requires changes to agent logic, message structure, or tool definitions in most cases.
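The staging validation step could be scripted along these lines. This is a rough sketch: `baseline` and `governed` are caller-supplied functions that send a prompt to the original and governed endpoints respectively, and the character-level similarity from `difflib` is only a crude stand-in for a real semantic-consistency check such as embedding comparison or human review. The 0.8 threshold is an arbitrary illustrative value.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Tuple


def compare_endpoints(
    prompts: Iterable[str],
    baseline: Callable[[str], str],
    governed: Callable[[str], str],
    threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return (prompt, similarity) pairs whose responses fall below the threshold."""
    flagged = []
    for prompt in prompts:
        ratio = SequenceMatcher(None, baseline(prompt), governed(prompt)).ratio()
        if ratio < threshold:
            flagged.append((prompt, ratio))
    return flagged
```

An empty result over a representative sample is a signal to proceed to the Admin console check; any flagged prompt is worth reading side by side before cutover.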
Yes. The control plane registers open-source models, self-hosted models, and closed vendor endpoints under one AI governance policy configuration, so you can route different agent tasks to different models while applying the same governance policies across all of them without rebuilding policy enforcement per provider.
For self-hosted deployments, all AI inputs, outputs, and audit logs are processed and stored inside the customer's own infrastructure, with no data transiting Prediction Guard's systems or any third-party environment. For Prediction Guard Managed Cloud deployments, the same API compatibility, policy enforcement, and audit logging capabilities apply, but Prediction Guard operates the control plane on your behalf rather than inside your own Kubernetes environment. Data handling, residency, and perimeter boundaries are therefore governed by the terms of your service agreement rather than your own infrastructure controls. Regulated workloads with CUI, ITAR, or strict data residency requirements should evaluate those terms against their specific regulatory obligations before selecting Managed Cloud. Contact Prediction Guard directly to confirm current data residency options and applicable regulatory certifications.
Every model call routed through ChatPredictionGuard is processed by the control plane, which generates structured log entries inside your infrastructure capturing governance policy decisions tied to each interaction. These entries contribute to the AIBOM and the overall audit log available to your compliance and security teams.
OpenAI-compatible API: An endpoint that accepts the same request structure as OpenAI's /chat/completions route, allowing existing OpenAI SDK code to connect without modification by changing only the base_url parameter.
Anthropic-compatible API: An endpoint matching Anthropic's /messages contract, which differs from OpenAI primarily in how system prompts are passed as top-level parameters rather than message roles.
Model Context Protocol (MCP): An open standard introduced by Anthropic in November 2024 that defines how AI agents discover, authenticate, and call external tools through a governed interface, replacing N×M custom connector proliferation with a single auditable protocol.
NIST AI Risk Management Framework (AI RMF): A voluntary framework published by the National Institute of Standards and Technology that organises AI risk management into four functions (Govern, Map, Measure, and Manage), providing a structured approach for organisations to identify, assess, and mitigate AI-related risks. Referenced by regulated enterprises and auditors as a benchmark for evaluating AI governance programme maturity.
AI Bill of Materials (AIBOM): A machine-readable inventory of every model, dataset, tool, and dependency in an AI system, exported in CycloneDX format. Increasingly expected by auditors and regulatory frameworks to provide traceability over which models processed regulated data and under which governance policies, particularly in jurisdictions where AI transparency and supply chain disclosure obligations apply.
CycloneDX: An OWASP-maintained open standard for software bill of materials (SBOM) and security transparency, defining a machine-readable format for inventorying software components, dependencies, and supply chain metadata. Used by Prediction Guard's AIBOM export to provide auditors with a structured, standardized record of AI system composition.
Sovereign AI control plane: A self-hosted governance architecture that enforces NIST AI RMF and OWASP policies at the API layer while keeping all data, enforcement logic, and audit logs inside the customer's own infrastructure.
ASI codes (OWASP Agentic AI Top Ten): The numbered risk codes used in the OWASP Agentic AI Top Ten, such as ASI01 Agent Goal Hijack, ASI05 Unexpected Code Execution, and ASI07 Insecure Inter-Agent Communication. Each ASI code describes a threat vector specific to autonomous agent systems. The list is separate from the OWASP LLM Top Ten, which addresses traditional AI application risks.