
AI Runtime Behaviour Security — Core

Implementing behavioral controls for AI systems in production.

This is the implementation companion to the Foundation overview: the Foundation explains the architecture and principles, while this section provides the risk classification criteria, control definitions, checklists, and specialised controls needed to put them into practice.


Reading Order

Start with the essentials, then branch into specialised topics based on your deployment:

Essential (read in order):

1. Risk Tiers — classify your system
2. Risk Assessment — quantify control effectiveness and residual risk per tier
3. Controls — implement the three-layer pattern
4. Agentic — add controls if your agent has tool access
5. IAM Governance — identity, lifecycle, delegation
6. Judge Assurance — measure and calibrate the Judge
7. Checklist — track implementation progress

Specialised (read based on your deployment type):

| If you're deploying... | Read |
|---|---|
| Multimodal models (image, audio, video) | Multimodal Controls |
| Reasoning models (chain-of-thought) | Reasoning Model Controls |
| Streaming responses | Streaming Controls |
| Persistent memory or long context | Memory and Context |
| Multi-agent systems | Multi-Agent Controls, then MASO |
| Open-weight / self-hosted models | Open-Weight Models |

PACE resilience (read after controls):

- Control Layer Resilience — PACE for each control layer
- PACE for Agentic AI — PACE for agentic deployments
- PACE Checklist — verify your fail postures


The Fundamental Shift

Traditional software can be tested before deployment. AI cannot — not fully.

| Traditional Software | AI Systems |
|---|---|
| Deterministic outputs | Non-deterministic outputs |
| Testable at design time | Emergent behavior |
| Known failure modes | Adversarial discovery of failure modes |

The shift: From design-time assurance to runtime behavioral monitoring.


The Pattern

The industry is converging on three layers:

| Layer | Function | Timing |
|---|---|---|
| Guardrails | Block known-bad inputs/outputs | Real-time |
| Judge | Detect unknown-bad via LLM evaluation | Async |
| Human Oversight | Decide, act, remain accountable | As needed |

Guardrails prevent. Judge detects. Humans decide.

Where This Pattern Exists

This isn't theoretical. Production implementations include:

| Platform | Implementation |
|---|---|
| NVIDIA NeMo Guardrails | Input, dialog, retrieval, execution, output rails |
| LangChain | Middleware + human-in-the-loop |
| Guardrails AI | Open-source validator framework |
| Galileo | Eval-to-guardrail lifecycle |
| DeepEval | LLM-as-judge evaluation |
| AWS Bedrock Guardrails | Managed filtering |
| Azure AI Content Safety | Content moderation |

→ For detailed solution comparison, see Current Solutions

What's been missing: clear guidance on why this pattern is necessary and how to implement it proportionate to risk.


Scope

In: Custom LLM apps, AI decision support, document processing, agentic systems
Out: Vendor AI products, model training, data preparation


Quick Start

1. Classify Your System

| Tier | Profile | Examples |
|---|---|---|
| CRITICAL | Direct decisions, customer/financial/safety impact | Credit decisions, fraud blocking |
| HIGH | Significant influence, sensitive data | Customer service with account access |
| MEDIUM | Moderate impact, human review expected | Internal Q&A, document drafting |
| LOW | Minimal impact, non-sensitive | Public FAQ, suggestions |
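The tier criteria can be sketched as a simple decision ladder. The function and parameter names (`classify_tier`, `direct_decisions`, and so on) are hypothetical illustrations, not part of the framework's schema:

```python
def classify_tier(direct_decisions: bool,
                  sensitive_data: bool,
                  human_review_expected: bool) -> str:
    """Map system properties to a risk tier, highest applicable wins."""
    if direct_decisions:
        return "CRITICAL"   # e.g. credit decisions, fraud blocking
    if sensitive_data:
        return "HIGH"       # e.g. customer service with account access
    if human_review_expected:
        return "MEDIUM"     # e.g. internal Q&A, document drafting
    return "LOW"            # e.g. public FAQ, suggestions
```

The ordering matters: a system that both makes direct decisions and handles sensitive data is classified at the higher tier.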

2. Select Controls

| Control | LOW | MEDIUM | HIGH | CRITICAL |
|---|---|---|---|---|
| Input guardrails | Basic | Standard | Enhanced | Maximum |
| Output guardrails | Basic | Standard | Enhanced | Maximum |
| Judge evaluation | — | Sampling | All | All + real-time |
| Human review | Exceptions | Sampling | Risk-based | All significant |
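The selection matrix can be captured as a plain lookup table. This encoding is an assumption for the sketch (including treating Judge evaluation as not required at LOW); the strings are labels, not executable policy:

```python
# tier -> (input guardrails, output guardrails, judge, human review)
CONTROL_MATRIX = {
    "LOW":      ("Basic",    "Basic",    None,              "Exceptions"),
    "MEDIUM":   ("Standard", "Standard", "Sampling",        "Sampling"),
    "HIGH":     ("Enhanced", "Enhanced", "All",             "Risk-based"),
    "CRITICAL": ("Maximum",  "Maximum",  "All + real-time", "All significant"),
}

def required_controls(tier: str) -> dict:
    """Return the control levels for a tier as a named mapping."""
    guard_in, guard_out, judge, human = CONTROL_MATRIX[tier]
    return {
        "input_guardrails": guard_in,
        "output_guardrails": guard_out,
        "judge_evaluation": judge,
        "human_review": human,
    }
```

Keeping the mapping as data rather than branching logic makes the tier-to-control policy auditable and easy to diff when requirements change.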

3. Implement in Order

  1. Logging — Can't evaluate what you don't capture
  2. Basic guardrails — Block obvious attacks
  3. Judge in shadow mode — Evaluate without action
  4. HITL queues — Somewhere for findings to go
  5. Operationalise — Act on findings, tune continuously
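Steps 1 and 3 together might look like the shadow-mode wrapper below: every exchange is logged first, and the Judge records a verdict without touching the response. `shadow_judge` and `judge_fn` are hypothetical names; in practice the Judge would be an asynchronous LLM call rather than an inline function:

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-runtime")

def shadow_judge(prompt: str, response: str, judge_fn) -> str:
    """Log the exchange and a Judge verdict; never alter the response."""
    record = {"ts": time.time(), "prompt": prompt, "response": response}
    record["judge_verdict"] = judge_fn(prompt, response)  # evaluated...
    log.info(json.dumps(record))                          # ...and captured
    return response  # shadow mode: verdict is recorded, output unchanged
```

Running in shadow mode first lets you measure Judge accuracy against real traffic before any verdict is allowed to trigger action.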

Core Documents

| Document | Purpose |
|---|---|
| Risk Tiers | Classification criteria, control mapping |
| Risk Assessment | Quantitative control effectiveness, residual risk analysis, worked examples per tier |
| Controls | Guardrails, Judge, HITL implementation |
| Agentic | Additional controls for agents |
| IAM Governance | Identity governance, agent lifecycle, delegation, threats |
| Checklist | Implementation tracking |
| Emerging Controls | Multimodal, reasoning, streaming overview |

Specialized Controls

| Document | Purpose |
|---|---|
| Judge Assurance | Judge accuracy measurement and calibration |
| Multi-Agent Controls | Controls for multi-agent systems |
| Multimodal Controls | Controls for image, audio, and video AI |
| Memory and Context | Long context and persistent memory controls |
| Reasoning Model Controls | Controls for chain-of-thought reasoning models |
| Streaming Controls | Controls for real-time streaming outputs |

Analysis & Insights

| Document | Purpose |
|---|---|
| Oversight Readiness Problem | Why human-in-the-loop fails and how to fix it |
| When the Judge Can Be Fooled | Judge adversarial robustness |
| Open Weight Models Shift the Burden | Self-hosted model control implications |
| Future Considerations | Future framework scope |

PACE Sections

| Document | Purpose |
|---|---|
| PACE Controls Section | PACE framework — controls |
| PACE Agentic Section | PACE framework — agentic controls |
| PACE Checklist Section | PACE framework — implementation checklist |

Architecture Overview

(Diagram: architecture overview)


Extensions

| Folder | Contents |
|---|---|
| regulatory/ | ISO 42001, EU AI Act mapping |
| technical/ | Bypass prevention, infrastructure, metrics |
| templates/ | Playbooks, threat models |
| examples/ | Worked examples |

Key Principles

  1. Match controls to risk — Don't over-engineer LOW tier systems
  2. Guardrails are necessary but not sufficient — They miss novel attacks and nuance
  3. Judge is assurance, not control — It detects; humans decide what to do
  4. Infrastructure beats instructions — Enforce technically, not via prompts
  5. Assume bypasses happen — Design for detection, not just prevention
  6. Humans remain accountable — AI assists; humans own outcomes

AI Runtime Behaviour Security, 2026 (Jonathan Gill).