Blog

Self-hosted vs. third-party deployment: A technical evaluation guide for regulated enterprises

Written by Daniel Whitenack | May 5, 2026 10:00:00 PM

Updated April 28, 2026

TL;DR: This guide focuses on fully self-contained, 100% data-sovereign AI systems, one architecture pattern Prediction Guard supports. The choice isn't whether you self-host the LLM (you may or may not). It's where AI governance and policy enforcement run. Cloud APIs create compliance gaps by moving audit logs outside your defined perimeter, while self-hosted deployments offer predictable costs at scale and full control over model dependencies. As one example, Prediction Guard offers a self-hosted control plane designed to enforce NIST and OWASP policies at the system level without requiring toolchain rebuilds; evaluate this and alternative solutions against your own infrastructure and compliance requirements.

Most engineering teams obsess over model latency while ignoring the structural risk of where their AI audit logs live. Cloud LLM APIs are fast to pilot, but when regulated data is in scope, you can't defensibly audit a trail that lives on someone else's servers. This guide speaks to the engineering and security leaders responsible for the architecture decision, and to the compliance and risk leaders who will sign off on it.

Self-hosted vs. third-party API

The architecture choice isn't primarily a performance decision. It's a governance decision. Where the control plane runs determines where your audit logs live, who controls your policy enforcement, and how much of your compliance posture depends on a vendor's attestation rather than your own verifiable evidence.

Auditable self-hosted AI setup

In a self-hosted deployment, a sovereign AI control plane, an internal infrastructure control plane that manages, secures, and logs all AI model interactions within your defined perimeter, means every component of the AI governance stack runs inside your own infrastructure rather than a vendor's. The data flow is fully contained: user prompt goes to your application, through your internal API, into a policy enforcement control plane, through your inference server, back through output validation, and into your local audit log storage. Nothing leaves your perimeter.

The operational reality includes real infrastructure management work. Models and AI components run across varied hardware, such as NVIDIA GPUs, CPUs, and accelerators, each with their own driver, library, and compatibility requirements that are prohibitive to manage and update without governance tooling.

On GPU infrastructure, updating CUDA environments across major versions (11.x to 12.x) requires a compatibility package, but does not automatically cause inference failures; within the same major version, minor version compatibility is guaranteed, and the NVIDIA forward compatibility package supports running newer toolkit applications on older base drivers across major release families, per the NVIDIA CUDA Compatibility documentation.

On CPU infrastructure, quantized models depend on runtime libraries such as ONNX Runtime or llama.cpp, where library version mismatches against the host OS or instruction set (AVX2 vs. AVX-512) can silently degrade throughput or cause inference failures that are difficult to reproduce across environments.

Large models (70B+ parameters) require careful memory management to avoid out-of-memory errors under high concurrency. Multi-node distributed inference requires RDMA over Converged Ethernet (RoCE) configuration and NVLink latency tuning. That's the cost of owning the stack, and for regulated environments, it's a cost worth understanding before you architect around it. Watch the EP12: Self-Hosted Sovereignty episode from the Prediction Guard webinar series for a practical walkthrough of what this architecture looks like in production.

Third-party API data retention and governance gaps

Cloud API architectures route AI inputs (prompts, injected context, documents, API and database information) through a vendor's infrastructure, where a external, closed, productized API gateway manages model access and retains logs and critically enforces governance in an external, closed, and opinionated way that is neither transparent nor configurable by your security team, according to the vendor's policies, not yours.

The deeper governance gap with external APIs is that policy enforcement happens outside your control. The Federal Government identified Anthropic as a supply chain risk specifically because their governance enforcement was non-configurable, non-transparent, and outside the user's control, not because of how data was retained.

Agentic AI compounds this exposure. As organizations deploy AI agents into high-trust environments, every ungoverned interaction those agents make becomes an unaudited compliance gap. The risk isn't employees using ChatGPT on a VPN. It's the agents your team built, taking actions on regulated data without policy enforcement on the outbound call.

Data residency and regulatory compliance

Where your AI workloads run determines where your data lives, who controls retention, and how much of your compliance posture depends on a vendor's policies rather than your own verifiable evidence.

Self-hosted LLM data location and transfer controls

In a self-hosted deployment, every AI input, every response, and every audit log stays in your defined perimeter. You control retention duration, access permissions, and audit formatting, making this a verifiable architecture fact rather than a vendor attestation you're accepting on trust. The Security & Self Hosted documentation details how this perimeter is maintained in production.

The compliance ambiguity with cloud APIs isn't hypothetical. CMMC Level 2 requires audit log retention with strict access controls; ITAR-controlled data cannot transit foreign infrastructure; the EU AI Act imposes documentation and post-market monitoring obligations on high-risk AI systems. Few external API providers' default retention and logging behaviors satisfy these requirements.

Even with a signed BAA, the legal weight depends on what the BAA explicitly covers. OpenAI's current HIPAA compliance analysis shows specific features like browsing and image generation are not HIPAA eligible, even when a BAA exists. The Aptible HIPAA-Claude BAA analysis confirms that organizations cannot rely on vendor logging to satisfy the six-year requirement without additional configuration.

Compliance reporting for regulated AI

A self-hosted deployment enables you to implement comprehensive prompt and response logging within your own infrastructure, with audit trails and performance metrics stored entirely inside your own security stack and forwardable to your existing SIEM.

See the Security & Self Hosted documentation for confirmed log format specifications and integration options. Follow data minimization principles (hashing or masking sensitive values per your retention policy) for each interaction. When an auditor asks who accessed what data and when, you produce a structured log from your own infrastructure, rather than requesting records from a vendor operating on a different retention schedule.

Preventing vendor lock-in for LLM governance

Vendor lock-in in LLM deployments is not just a commercial risk. It is a governance risk.

Auditable policy across LLM deployments

An AI governance policy that exists in a document but isn't enforced at the system level isn't a control. It's a liability that surfaces in the next audit cycle when an engineer under delivery pressure skipped the review step. Codifying governance into the system itself, as deterministic runtime enforcement rather than advisory checklists, eliminates the control drift that manual review processes create. The Golden Path for AI blog post makes this case from an infrastructure engineer's perspective.

Preventing model vendor lock-in

The model can be self-hosted or external, and the control plane treats both the same. Self-hosted control planes let you swap underlying models without rebuilding governance. You register a new model endpoint in the control plane, policies apply automatically, and the audit log format stays consistent. Changing models in a vendor-locked architecture means reconfiguring vendor-specific guardrails, re-mapping IAM roles, and rebuilding downstream integrations.

Preventing governance lock-in

AWS Bedrock Guardrails are configured through AWS-specific Console, SDK, and ApplyGuardrail API tooling tightly integrated with the AWS environment, and Azure AI Content Safety uses Azure-specific filter definitions. Neither configuration is portable to the other, and when you re-architect off a hyperscaler, the governance configuration built for that environment does not migrate with you.

Bedrock Guardrails and Azure Content Safety are point solutions providing individual checks (a temperature read), not comprehensive governance aligned with NIST AI RMF or OWASP. Holistic policy enforcement requires a control plane plugged into your security and monitoring infrastructure (the full health-care system, not the thermometer).

Preventing LLM vendor lock-in

Prediction Guard's self-hosted control plane exposes OpenAI-compatible and Anthropic-compatible endpoints, so existing application code connects to the governed control plane without rebuilding the toolchain, and regardless of which models you call. The LangChain integration via the langchain-predictionguard package is listed in the official LangChain integration catalogue. Setup requires one package installation and one environment variable:

pip install -qU langchain-predictionguard  import os os.environ["PREDICTIONGUARD_API_KEY"] = "<Your Prediction Guard API Key>"  from langchain_predictionguard import ChatPredictionGuard  chat = ChatPredictionGuard(model="Hermes-3-Llama-3.1-70B") messages = [     ("system", "You are an internal AI assistant for a defense-adjacent manufacturer.   Do not include controlled defense data (CUI) in outputs."),       ("human", "Draft a status summary of the Q2 production audit findings for         operations leadership.")   ] response = chat.invoke(messages) 

Output validation, toxicity filtering, and policy enforcement are configured in Prediction Guard's Admin console by security and GRC teams, not by developers. Whatever framework developers use (LangChain, LlamaIndex, plain HTTP), the configured policies are enforced. The LangChain integration guide for complete configuration options covers developer setup; governance configuration is documented separately. The EP10: The "USB-C" of AI episode from the Prediction Guard webinar series covers the composability argument in detail.

NIST AI RMF: your guide to responsible AI

The NIST AI Risk Management Framework, alongside CMMC and ITAR for defense-adjacent workloads and the EU AI Act for organizations operating in or selling into the EU, gives regulated organizations a structured method for generating verifiable governance evidence across the full AI lifecycle. This section maps each of the four Core Functions to the deployment architecture choices that determine whether that evidence stays inside your own infrastructure or depends on a vendor's attestation.

Ensuring AI regulatory compliance

The NIST AI RMF defines four core functions that regulated organizations must operationalize with verifiable, system-level evidence, not documented intentions: Govern, Map, Measure, and Manage. Self-hosted deployments generate that evidence inside the customer's own infrastructure.

Map risks: third-party AI governance architectures

The structural difference between architectures maps directly to where each NIST AI RMF function can be satisfied with auditable evidence:

NIST AI RMF Function

Self-hosted capability

External API limitation

Govern

Supports the Govern function by enabling system-enforced, monitored policies inside your perimeter, giving your organization direct control over policy execution evidence

Governance evidence depends on vendor-controlled enforcement infrastructure, limiting your organization's direct visibility into how policies are executed

Map

Supports the Map function by enabling a complete AIBOM with full model provenance and dependency tracing, giving your organization direct access to the contextual risk documentation the Map function requires

Satisfying the Map function's provenance and supply chain documentation requirements typically depends on vendor-provided transparency, which may limit your organization's ability to independently verify model training data and upstream dependencies

Measure

Supports the Measure function by providing direct access to system-level metrics, enabling your organization to define and apply the quantitative, qualitative, or mixed-method tools the Measure function requires

Satisfying the Measure function's assessment requirements typically depends on vendor-provided dashboards, which may limit your organization's ability to define custom measurement methodologies or access the underlying telemetry needed to implement the Measure-to-Manage feedback cycle through your own control plane

Manage

Supports the Manage function by enabling full organizational control over mitigation workflows, model versions, and update cycles, giving your team direct authority to implement and document risk response strategies as the Manage function requires

Satisfying the Manage function's risk response requirements typically depends on vendor-controlled update and access policy infrastructure, which may limit your organization's ability to independently implement, time, or evidence mitigation strategies when the underlying model or access policies change unilaterally

Assessing LLM RMF adherence

The NIST AI RMF Core Functions documentation specifies that Measure function outcomes feed the Manage function to assist risk monitoring and response. That feedback loop requires system-level metric access. Cloud APIs provide dashboards but not the underlying telemetry needed to implement the Measure-to-Manage cycle through your own control plane. Prediction Guard's capability-to-framework mapping tables document which NIST AI RMF functions each control addresses at the application, control plane, and infrastructure tiers.

Implementing OWASP LLM Top Ten defenses

The OWASP LLM Top Ten catalogues the critical security risks specific to large language model applications and prescribes system-level controls that go beyond advisory guidelines.

LLM prompt injection and output handling defenses

OWASP LLM01 (Prompt Injection) requires organizations to separate and clearly denote untrusted content to limit its influence on user prompts. It also requires regular penetration testing that treats the model as an untrusted user. A self-hosted control plane enforces this at the control plane level before prompts reach the model, blocking malicious patterns via policies deployed in the control plane rather than relying on model-level safety fine-tuning.

Prediction Guard's prompt injection detection documentation covers the system-level implementation. Watch the OWASP guidance for AI security episode from the Prediction Guard webinar series for the full implementation walkthrough.

OWASP LLM02 (Insecure Output Handling) requires output sanitization before downstream systems or users receive model responses. In a self-hosted deployment, automated PII detection and flow control occurs within your trusted boundary before the output is returned or monitored, so your sensitive data never leaves the organization. Cloud APIs, when called directly from the application layer, require organizations to implement their own output validation control; that control receives the unsanitized model output over an external API call, meaning the data has already crossed the perimeter before any redaction occurs. Where a self-hosted control plane sits between the application and the external endpoint, this changes: input governance — including PII detection and prompt injection screening — runs inside the trust boundary before the outbound call is made, and output governance runs on the returned response before it is passed back to the application, also inside the perimeter. The model may be external; the enforcement is not.

Securing LLM model dependencies

OWASP LLM08 (Supply Chain Vulnerabilities) requires tracking model dependencies for tampering, poisoned training data, and compromised software packages. An AI Bill of Materials is the structural mechanism: a standardized record identifying specific AI models, datasets, training pipelines, and software dependencies that transforms opaque AI systems into transparent, auditable assets.

The CycloneDX specification, which Prediction Guard supports, formalizes this structure with AI/ML and dataset components. These profiles define core documentation fields including model metadata, dataset metadata, software dependencies, infrastructure details, and security and governance documentation. Cloud APIs give you no visibility into the model's supply chain. A self-hosted deployment gives you full provenance for every component in the stack.

LLM DoS: self-hosted vs. third-party risks

Self-hosted deployments are subject to hardware-bounded throughput, so capacity planning is a direct engineering responsibility. Cloud APIs expose organizations to provider-side outages, rate limits, and policy changes that can interrupt production workloads with no notice. For air-gapped environments where internet connectivity is unavailable or prohibited, cloud API dependency is architecturally impossible, and the Air Gapped Deployment documentation details the configuration requirements for fully disconnected environments.

Operational costs: cloud vs. self-hosted LLM

The cost comparison between self-hosted and cloud API deployments is not linear: upfront hardware investment shifts the cost structure from variable per-token spend to a depreciating fixed asset, while governance infrastructure adds a build-versus-buy dimension that most teams underestimate.

Evaluating LLM deployment hardware

The primary cost variable in a self-hosted deployment is the control plane itself: the infrastructure, engineering time, and operational overhead required to run governance, policy enforcement, and audit logging inside your own perimeter.

Organizations can deploy the control plane on existing server infrastructure, on-premise, or within a private cloud environment, without any model hardware investment if inference is routed through external or third-party model endpoints. The cost structure separates into two lanes: the control plane, which is a fixed operational cost regardless of which models run behind it, and model inference hardware, which is an additional and optional capital investment for organizations that also self-host models.

Control plane infrastructure sizing depends on request volume, logging throughput, and policy evaluation latency requirements rather than model parameter count. Server infrastructure, networking, and storage requirements for the control plane alone are substantially lower than for model inference hardware, and the control plane can scale horizontally on commodity compute.

Hidden costs of control plane build vs. buy

Cloud LLM API pricing varies significantly by model tier: budget models are typically priced in the low cents per million tokens, while frontier and premium models can reach $15 or more per million tokens, though pricing changes frequently; consult official provider pricing pages such as OpenAI's pricing page, Google's Gemini API pricing, and Anthropic's pricing page for current verified figures.

At high token volumes, these costs compound linearly. The hidden addition is manual compliance overhead: spreadsheet-managed AI asset inventories, manual policy review documentation, and audit preparation spanning multiple fragmented tools. Those engineering hours don't go to product delivery.

The build-versus-buy decision for governance control plane infrastructure is where most teams underestimate cost. Building a control plane that satisfies NIST AI RMF alignment, enforces OWASP policy at the system level, and generates AIBOM output from scratch requires months of dedicated engineering work that produces no product delivery output during that period. Buying a pre-built, governed control plane converts that engineering investment into an integration task measured in days rather than months. The Decoding the LLM Landscape guide covers the evaluation framework in detail.

Cost component

Self-hosted

Cloud API

Upfront hardware

Significant capex (Gaudi 2 8-card kit ~$65k starting point)

$0

Annual API/token cost

Ongoing token-level costs are minimal once hardware is provisioned, though actual marginal cost varies by workload, model size, and infrastructure utilisation

Costs are billed per token and scale with usage volume and model tier; at high throughput, cumulative API spend can compound significantly, though actual trajectory depends on usage patterns and negotiated pricing

Governance migration

Governance configuration is designed to be portable across models within the same control plane, though actual portability depends on how policies are implemented and whether the new model's interface is compatible with existing enforcement logic

Governance configuration is typically coupled to provider-specific policy schemas and IAM structures, meaning a provider switch often requires rebuilding guardrails, access controls, and audit integrations, though the extent of that rebuild depends on how much provider-native tooling was adopted

5-year cost trajectory

Marginal cost per token may decrease over time as hardware is amortised across growing inference volume, though actual trajectory depends on hardware refresh cycles, model size changes, and operational staffing costs over the period

Per-token costs remain structurally tied to usage volume and model tier throughout the contract period, meaning cost trajectory is determined by usage growth and any pricing changes the provider applies, rather than by a fixed depreciating asset

Real-world LLM deployment in regulated sectors

Regulated sectors impose concrete, auditor-verifiable requirements on how AI systems are accessed, monitored, and controlled. These are the requirements that the architecture decision directly determines your ability to satisfy. This section covers CMMC and financial services obligations in detail, and addresses the air-gapped deployment scenario where cloud API dependency is not a governance tradeoff but an architectural impossibility.

CMMC and financial services LLM requirements

CMMC Level 2 maps directly to NIST SP 800-171 and requires defense contractors handling Controlled Unclassified Information (CUI) to satisfy Practice AC.1.001, which limits system access to authorized users and processes, and Practice AU.2.041, which mandates the creation and retention of audit logs sufficient to enable the monitoring, analysis, investigation, and reporting of unlawful or unauthorized activity.

Self-hosted deployments enforce access control at the Kubernetes RBAC layer: only roles with explicit CUI access invoke AI endpoints processing controlled data, API keys are scoped per use case, and network-level controls restrict CUI queries to secured VPN or bastion hosts. Audit logs generated inside your own perimeter satisfy AU.2.041 with records your organization controls directly.

Financial services regulators expect deterministic audit trails demonstrating that every model interaction was governed at the time of execution, not reconstructed after the fact. The structural problem with cloud APIs is that policy enforcement occurs on the vendor's infrastructure. If that enforcement changes unilaterally, the audit evidence for prior interactions may not reflect what your governance program documented.

Deploying LLMs in air-gapped environments

For mission-critical environments where network connectivity is unreliable or prohibited, cloud API dependency is architecturally impossible. SimWerx's medic copilot for military, EMS, and disaster relief field medics requires both speed and accuracy in pre-hospital environments.

"Prediction Guard provides a solution that enables them to host LLMs and generative AI behind the firewall, on their own premises." - Bill Streilein CTO, Noblis

The EP02: On-Prem & Air-Gapped AI episode from the Prediction Guard webinar series covers the operational requirements for air-gapped manufacturing and logistics deployments in comparable detail.

Mitigating LLM risks in regulated settings

Not every workload requires a self-hosted deployment, but in regulated environments, the scenarios that do are non-negotiable.

Mandatory self-hosted LLM scenarios

Self-hosted deployment is the defensible choice when:

  • Workloads process controlled defense data (CUI, ITAR), regulated financial data, manufacturing IP, or other data requiring multi-year audit log retention under your own control
  • The organization operates in air-gapped or restricted-network environments (defense-adjacent, secure government facilities)
  • Data sovereignty requirements prohibit any data egress outside a defined geographic or organizational boundary
  • Board-level risk posture requires auditable, internally-held evidence of AI governance rather than vendor attestation
  • Multi-cloud or multi-model architectures require portable governance that doesn't need rebuilding when the underlying vendor changes

When third-party model APIs remain viable

Third-party model APIs remain viable in both full-sovereign and hybrid architectures, provided they are routed through the self-hosted control plane rather than called directly from the application layer. When every outbound call to an external model endpoint passes through the control plane first, PII detection, prompt injection screening, and policy enforcement run inside your trust boundary before data leaves your perimeter.

The model lives outside; the governance does not. The scenarios where third-party model APIs become an unacceptable risk are those where the application layer calls them directly, bypassing the control plane entirely, meaning no policy enforcement fires, no audit log is generated inside your perimeter, and regulated data crosses the boundary without governance evidence. The architecture problem is never which model you use. It is whether governance runs inside your boundary on every interaction with it.

Audit-ready LLM deployment checklist

Moving from pilot to production in a regulated environment requires completing these steps in sequence:

  1. Enumerate all AI assets: Complete an AIBOM covering every model, tool, MCP server, and external API endpoint before any workload goes to production.
  2. Define data classification: Identify which data types each AI workload processes and map them to applicable regulatory requirements (HIPAA, financial services, export controls).
  3. Deploy system-level policy enforcement: Implement NIST AI RMF and OWASP LLM Top Ten controls at the control plane level, not as advisory guidelines.
  4. Validate and integrate audit logs: Confirm that every log field required for your regulatory framework is generated, retained locally, formatted for auditor review, and forwarded to your SIEM/SOAR for live monitoring
  5. Test policy enforcement deterministically: Verify that controls fire on every interaction, not on a sample. Probabilistic sampling fails regulatory audits that require evidence of comprehensive coverage.
  6. Document the architecture for GRC review: Produce an architecture diagram showing data flow, policy enforcement points, and audit log storage before presenting to the CISO or compliance team.

Self-hosted AI governance: what you need to know

Two questions that most commonly stall the self-hosted AI governance decision are whether a self-hosted control plane can deliver production-viable performance at scale, and how long it realistically takes to stand up a governed deployment compared to building equivalent governance infrastructure from scratch.

Self-hosted LLM performance vs. cloud APIs

With NVIDIA GPU deployment, time-to-first-token for production-class models lands well within ranges that support interactive applications. CPU-only deployments are viable for control plane operation and for smaller models where latency targets allow.

Estimating self-hosted LLM deployment time

Building equivalent governance infrastructure from scratch, including NIST AI RMF alignment, OWASP policy enforcement, AIBOM generation, and structured audit logging, takes months of dedicated engineering work with no product delivery output during that period, as the EP03: Agentic AI Automation episode addresses when discussing governed agentic deployment architecture.

If a regulator asked today which models are processing regulated data, under which policies, and where the audit logs for the last several years are stored, how long would it take your team to answer accurately? For most regulated enterprises currently operating on cloud APIs, that question exposes a governance gap that the next audit cycle will find regardless. The architecture decision you make now determines whether you're closing that gap proactively or explaining it reactively.

Book a deployment scoping call to assess whether self-hosted deployment fits your infrastructure and compliance requirements.

FAQs

What is the cost difference between self-hosted and cloud LLMs?

Self-hosted setups require upfront infrastructure investment (NVIDIA GPU servers if you self-host models, CPU servers for the control plane) plus engineering and operational costs. Third-party model APIs charge per token; pricing varies by model tier and scales with usage.

Can self-hosted LLMs run on CPU-only infrastructure?

The Prediction Guard control plane is always CPU-only. Models can optionally run on CPUs as well. Optimized smaller models perform adequately for many production workloads, while larger models or strict latency targets need NVIDIA GPUs. The control plane runs the same way regardless of where the model lives.

What OWASP LLM Top Ten items does a self-hosted control plane address directly?

System-level policy enforcement addresses LLM01 (Prompt Injection) through pre-inference input inspection, LLM02 (Insecure Output Handling) through post-inference PII detection and flow control within the trust boundary, and LLM08 (Supply Chain Vulnerabilities) through AIBOM generation that tracks model provenance and dependency integrity. See the OWASP LLM Top Ten documentation for complete item definitions.

What audit log retention does NIST AI RMF / CMMC require?

NIST AI RMF requires verifiable governance evidence retained inside your own infrastructure to satisfy the Manage-to-Measure feedback loop. CMMC Level 2 under NIST SP 800-171 requires AU.2.041 (audit log creation and retention sufficient for investigation and reporting) and AU.2.042 (individual user actions uniquely traceable), with NIST SP 800-92 recommending a minimum of twelve months of online retention plus archival storage. Cloud AI provider default retention policies do not satisfy either requirement without additional configuration. Organizations also subject to HIPAA must separately meet the Section 164.316(b)(2) six-year retention standard for systems containing ePHI.

Key terms glossary

Sovereign AI control plane: An internal infrastructure control plane that manages, secures, and logs all AI model interactions within an organization's defined perimeter, generating governance evidence that stays inside the customer's own environment rather than a vendor's.

AI Bill of Materials (AIBOM): A structured, auditable inventory of all AI models, datasets, training pipelines, software dependencies, and tools used within an enterprise AI system, based on the CycloneDX ML-BOM specification, which provides a standardized schema for documenting machine learning components, model metadata, datasets, and associated governance attributes.

Control drift: The gap that occurs when governance policies exist in written documents but are not deterministically enforced by the system at runtime, allowing inconsistent application of controls between audit cycles.

PII (Personally Identifiable Information): Any information that can be used to identify, contact, or locate a specific individual — either alone or in combination with other data — including names, email addresses, government ID numbers, financial account identifiers, and similar data elements subject to privacy and data protection regulations.

Prompt Injection: A class of attack in which malicious content embedded in a user prompt or external data source attempts to override, hijack, or manipulate the instructions governing a model's behavior, causing the model to ignore its intended system context or execute attacker-controlled instructions. Prediction Guard's prompt injection detection enforces this control at the control plane level before prompts reach the model.

PHI (Protected Health Information): Any individually identifiable health information covered under HIPAA's Privacy Rule.

ePHI (Electronic Protected Health Information): The electronic form of Protected Health Information specifically covered under HIPAA's Security Rule, and the form most relevant to AI system audit requirements.

NIST AI RMF: The NIST AI RMF defines four core functions (Govern, Map, Measure, Manage) for managing risk across the AI lifecycle, each requiring verifiable system-level evidence.