# AI Runtime Behaviour Security — Core
Implementing behavioral controls for AI systems in production.
This is the implementation companion to the Foundation overview. The Foundation explains the architecture and principles; this section contains the risk classification criteria, control definitions, checklists, and specialised controls you need to put them into practice.
## Reading Order
Start with the essentials, then branch into specialised topics based on your deployment:
Essential (read in order):

1. Risk Tiers — classify your system
2. Risk Assessment — quantify control effectiveness and residual risk per tier
3. Controls — implement the three-layer pattern
4. Agentic — add controls if your agent has tool access
5. IAM Governance — identity, lifecycle, delegation
6. Judge Assurance — measure and calibrate the Judge
7. Checklist — track implementation progress
Specialised (read based on your deployment type):
| If you're deploying... | Read |
|---|---|
| Multimodal models (image, audio, video) | Multimodal Controls |
| Reasoning models (chain-of-thought) | Reasoning Model Controls |
| Streaming responses | Streaming Controls |
| Persistent memory or long context | Memory and Context |
| Multi-agent systems | Multi-Agent Controls then MASO |
| Open-weight / self-hosted models | Open-Weight Models |
PACE resilience (read after controls):

- Control Layer Resilience — PACE for each control layer
- PACE for Agentic AI — PACE for agentic deployments
- PACE Checklist — verify your fail postures
## The Fundamental Shift
Traditional software can be tested before deployment. AI cannot — not fully.
| Traditional Software | AI Systems |
|---|---|
| Deterministic outputs | Non-deterministic |
| Testable at design time | Emergent behavior |
| Known failure modes | Adversarial discovery |
The shift: From design-time assurance to runtime behavioral monitoring.
## The Pattern
The industry is converging on three layers:
| Layer | Function | Timing |
|---|---|---|
| Guardrails | Block known-bad inputs/outputs | Real-time |
| Judge | Detect unknown-bad via LLM evaluation | Async |
| Human Oversight | Decide, act, remain accountable | As needed |
Guardrails prevent. Judge detects. Humans decide.
## Where This Pattern Exists
This isn't theoretical. Production implementations include:
| Platform | Implementation |
|---|---|
| NVIDIA NeMo Guardrails | Input, dialog, retrieval, execution, output rails |
| LangChain | Middleware + human-in-the-loop |
| Guardrails AI | Open-source validator framework |
| Galileo | Eval-to-guardrail lifecycle |
| DeepEval | LLM-as-judge evaluation |
| AWS Bedrock Guardrails | Managed filtering |
| Azure AI Content Safety | Content moderation |
→ For a detailed solution comparison, see Current Solutions.
What's been missing: clear guidance on why this pattern is necessary and how to implement it proportionate to risk.
## Scope
In: Custom LLM apps, AI decision support, document processing, agentic systems
Out: Vendor AI products, model training, data preparation
## Quick Start

### 1. Classify Your System
| Tier | Profile | Examples |
|---|---|---|
| CRITICAL | Direct decisions, customer/financial/safety impact | Credit decisions, fraud blocking |
| HIGH | Significant influence, sensitive data | Customer service with account access |
| MEDIUM | Moderate impact, human review expected | Internal Q&A, document drafting |
| LOW | Minimal impact, non-sensitive | Public FAQ, suggestions |
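The tier table above can be read as a decision rule. The function below is an illustrative reduction of it, not the authoritative criteria (those live in the Risk Tiers document); the parameter names are assumptions made for this sketch.

```python
def classify_tier(direct_decisions: bool, customer_impact: bool,
                  sensitive_data: bool, human_review_expected: bool) -> str:
    """Illustrative tier classification mirroring the table above.

    Evaluated top-down: the highest-risk profile that matches wins.
    """
    if direct_decisions and customer_impact:
        return "CRITICAL"   # e.g. credit decisions, fraud blocking
    if sensitive_data:
        return "HIGH"       # e.g. customer service with account access
    if human_review_expected:
        return "MEDIUM"     # e.g. internal Q&A, document drafting
    return "LOW"            # e.g. public FAQ, suggestions
```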
### 2. Select Controls
| Control | LOW | MEDIUM | HIGH | CRITICAL |
|---|---|---|---|---|
| Input guardrails | Basic | Standard | Enhanced | Maximum |
| Output guardrails | Basic | Standard | Enhanced | Maximum |
| Judge evaluation | — | Sampling | All | All + real-time |
| Human review | Exceptions | Sampling | Risk-based | All significant |
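Encoding the control matrix as data (rather than scattering tier checks through code) keeps the tier-to-control mapping auditable in one place. A possible sketch, with the dictionary keys chosen for this example:

```python
# Tier-to-control mapping, transcribed from the table above.
# None means the control is not required at that tier.
CONTROL_MATRIX = {
    "LOW":      {"input": "Basic",    "output": "Basic",    "judge": None,              "human": "Exceptions"},
    "MEDIUM":   {"input": "Standard", "output": "Standard", "judge": "Sampling",        "human": "Sampling"},
    "HIGH":     {"input": "Enhanced", "output": "Enhanced", "judge": "All",             "human": "Risk-based"},
    "CRITICAL": {"input": "Maximum",  "output": "Maximum",  "judge": "All + real-time", "human": "All significant"},
}

def controls_for(tier: str) -> dict:
    """Look up the required control levels for a classified tier."""
    return CONTROL_MATRIX[tier]
```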
### 3. Implement in Order

1. Logging — you can't evaluate what you don't capture
2. Basic guardrails — block obvious attacks
3. Judge in shadow mode — evaluate without acting
4. HITL queues — give findings somewhere to go
5. Operationalise — act on findings, tune continuously
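Step 3, shadow mode, is the one teams most often skip. The idea: the Judge scores every exchange, but its verdict is only logged, never enforced, until its calibration is trusted. A minimal sketch, where `judge_fn`, the 0.8 threshold, and the return strings are all placeholders:

```python
import logging

def judge_in_shadow(prompt: str, response: str, judge_fn, act: bool = False) -> str:
    """Run the Judge on an exchange; enforce its verdict only when act=True.

    judge_fn is a stand-in for your LLM-as-judge call returning a risk
    score in [0, 1]. With act=False (shadow mode) scores are logged so
    you can measure false-positive/negative rates before enforcement.
    """
    score = judge_fn(prompt, response)
    logging.info("judge score=%.2f act=%s", score, act)
    if act and score > 0.8:
        return "queue_for_review"  # enforced: route to the HITL queue
    return "logged_only"           # shadow: evaluate without action
```

Flipping `act` from False to True is then a one-line promotion from shadow mode to enforcement, once measured Judge accuracy justifies it.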
## Core Documents
| Document | Purpose |
|---|---|
| Risk Tiers | Classification criteria, control mapping |
| Risk Assessment | Quantitative control effectiveness, residual risk analysis, worked examples per tier |
| Controls | Guardrails, Judge, HITL implementation |
| Agentic | Additional controls for agents |
| IAM Governance | Identity governance, agent lifecycle, delegation, threats |
| Checklist | Implementation tracking |
| Emerging Controls | Multimodal, reasoning, streaming overview |
## Specialised Controls
| Document | Purpose |
|---|---|
| Judge Assurance | Judge accuracy measurement and calibration |
| Multi-Agent Controls | Controls for multi-agent systems |
| Multimodal Controls | Controls for image, audio, and video AI |
| Memory and Context | Long context and persistent memory controls |
| Reasoning Model Controls | Controls for chain-of-thought reasoning models |
| Streaming Controls | Controls for real-time streaming outputs |
## Analysis & Insights
| Document | Purpose |
|---|---|
| Oversight Readiness Problem | Why human-in-the-loop fails and how to fix it |
| When the Judge Can Be Fooled | Judge adversarial robustness |
| Open Weight Models Shift the Burden | Self-hosted model control implications |
| Future Considerations | Future framework scope |
## PACE Sections
| Document | Purpose |
|---|---|
| PACE Controls Section | PACE framework — controls |
| PACE Agentic Section | PACE framework — agentic controls |
| PACE Checklist Section | PACE framework — implementation checklist |
## Architecture Overview
## Extensions
| Folder | Contents |
|---|---|
| regulatory/ | ISO 42001, EU AI Act mapping |
| technical/ | Bypass prevention, infrastructure, metrics |
| templates/ | Playbooks, threat models |
| examples/ | Worked examples |
## Key Principles
- Match controls to risk — Don't over-engineer LOW tier systems
- Guardrails are necessary but not sufficient — They miss novel attacks and nuance
- Judge is assurance, not control — It detects; humans decide what to do
- Infrastructure beats instructions — Enforce technically, not via prompts
- Assume bypasses happen — Design for detection, not just prevention
- Humans remain accountable — AI assists; humans own outcomes
AI Runtime Behaviour Security, 2026 (Jonathan Gill).