AI Engineers¶

ML Engineers, AI Developers, Data Scientists, Platform Engineers — implementation patterns, not governance theory.

Part of Stakeholder Views · AI Runtime Behaviour Security

The Problem You Have¶

You're building AI systems. Your security and risk teams have requirements that sound like governance bureaucracy. You've been asked for "guardrails," "a Judge," "human oversight," "PACE resilience" — but what you actually need is:

What do I implement? Concrete patterns, not abstract principles.
Where do I put it? Architecture-level placement in the pipeline.
How do I test it? Verification that controls actually work.
What breaks if I get it wrong? Failure modes you need to handle.
What already exists? Libraries, services, and platform features you can use instead of building from scratch.

What This Framework Gives You¶

The three things you're building¶

Every AI system needs some combination of these. Your risk tier determines how much:

1. Guardrails — input/output filters that run on every request.

What you're implementing: - Input: injection detection, content policy check, PII redaction, schema validation - Output: hallucination check (ground against source), PII scan, toxicity filter, format validation - Latency budget: ~10-20ms total - Libraries: NVIDIA NeMo Guardrails, Guardrails AI, LangChain output parsers, AWS Bedrock Guardrails, Azure AI Content Safety

2. LLM-as-Judge — an independent LLM that evaluates your task agent's output.

What you're implementing: - A separate model (different from your task agent) that receives the input, output, and context - A structured evaluation prompt that checks policy compliance, factual grounding, safety, quality - A scoring/classification response (pass/fail/escalate with confidence) - Routing logic: pass → deliver, fail → block, low-confidence → human review queue

Key constraint: the Judge must use a different model than your task agent. Same-model evaluation has correlated failure modes. If GPT-4 hallucinates a fact, GPT-4 evaluating that fact has a higher chance of missing it than Claude evaluating it, and vice versa.

Implementation guide: LLM-as-Judge Implementation
Prompt examples: Judge Prompt Examples
Model selection: Judge Model Selection
Calibration: Judge Assurance

3. Circuit breaker / PACE fail postures — what your system does when control layers fail.

What you're implementing: - Health checks on guardrail and Judge services - Fallback routing when each layer is unavailable - A kill switch that removes the AI from the path entirely - Pre-defined degradation: full service → limited scope → human-only → static fallback

This is infrastructure code, not AI code. Treat it like any service reliability pattern.

Implementation by risk tier¶

Tier	What You Build	Judge Configuration	Human Oversight
LOW	Basic input/output guardrails	Optional — 1-5% sampling for monitoring	None (exception-based)
MEDIUM	Standard guardrails + Judge integration	5-10% sampling, batch evaluation	Review flagged items only
HIGH	Full guardrail suite + Judge + routing	20-50% coverage, near real-time	Flagged items + random sampling
CRITICAL	Hardened guardrails + Judge + human gate	100% coverage, synchronous (blocks delivery)	All high-impact decisions reviewed

Platform-specific patterns¶

If you're building on a specific platform, these map framework controls to platform services:

Platform	Pattern Guide	Key Services
AWS Bedrock	AWS Bedrock Patterns	Bedrock Guardrails, CloudWatch, IAM
Azure AI	Azure AI Patterns	Azure AI Content Safety, Responsible AI toolkit
Databricks	Databricks Patterns	MLflow, Unity Catalog, Model Serving
LangChain / LangGraph	Integration Guide	LangSmith, callbacks, output parsers

Testing your controls¶

Controls that aren't tested don't work. The framework provides:

Testing Guidance — structured test scenarios for each control layer
Red Team Playbook — 13 adversarial scenarios (prompt injection, data exfiltration, privilege escalation, consensus manipulation)
Judge Assurance — how to measure Judge accuracy, calibrate confidence thresholds, detect drift
When the Judge Can Be Fooled — failure modes specific to the evaluation layer

Your Starting Path¶

#	Document	Why You Need It
1	Controls	Three-layer implementation reference — what to build
2	Quick Start	Zero to working controls in 30 minutes
3	LLM-as-Judge Implementation	Judge layer patterns, prompts, routing logic
4	Judge Assurance	How to measure and calibrate Judge accuracy
5	Checklist	Track what you've implemented

If you're building agents: Agentic Controls — tool scoping, action classification, confirmation gates.

If you're building multi-agent systems: MASO Integration Guide — message bus signing, per-agent identity, cross-agent DLP.

If you're building RAG: RAG Security — the attack surface you probably haven't considered.

What You Can Do Monday Morning¶

Add input guardrails. If you have no controls today, start with injection detection on input. NVIDIA NeMo, Guardrails AI, or your platform's built-in content safety. This alone catches ~90% of known-pattern attacks.
Add output grounding. If your system uses RAG, validate that the response is actually grounded in the retrieved documents. This catches hallucinated facts before they reach users.
Implement a Judge on 10% of traffic. Pick a different model from your task agent. Use the Judge Prompt Examples as starting points. Log results. Measure the catch rate. This tells you what your guardrails are missing.
Wire a circuit breaker. If your guardrail service goes down, your system should degrade to a safe state — not continue without protection. A simple health check and fallback route takes an afternoon.
Red team your own system. Spend an hour trying to break it. The Red Team Playbook has structured scenarios. Document what you find. This is the most effective way to identify control gaps.

Common Objections — With Answers¶

"The Judge adds latency to every request." Only for CRITICAL tier. For HIGH tier, run it asynchronously — it doesn't block the response. For MEDIUM tier, run it on a sample. For LOW tier, it's optional. See Cost & Latency.

"Our model is already aligned / fine-tuned / safe." Model alignment is necessary but insufficient. Alignment reduces the base rate of harmful outputs but doesn't eliminate it. Prompt injection bypasses alignment. RAG poisoning bypasses alignment. Edge cases that weren't in the training data bypass alignment. Runtime controls catch what alignment misses.

"We don't have budget for a second model (the Judge)." The Judge doesn't have to be expensive. A smaller, faster model (Haiku-class) running a focused evaluation prompt often outperforms a larger model for specific policy checks. Sample at 10% to start. The Judge Model Selection guide covers cost-effective configurations.

"Human oversight doesn't scale." Correct — which is why the framework doesn't require human review of every transaction (except at CRITICAL tier). The Judge handles scale. Humans handle the edge cases the Judge flags and the random samples that keep the system honest. See Humans Remain Accountable.

"This is security's job, not mine." Security sets the requirements. You implement them. The framework gives you concrete patterns so you're not guessing. The Controls document tells you exactly what to build. The Checklist tracks your progress. Security reviews the result, not the implementation approach.

AI Runtime Behaviour Security, 2026 (Jonathan Gill).