AI Runtime Behaviour Security — Single-Agent Controls¶
Runtime behavioural security for single-model AI deployments. Guardrails, LLM-as-Judge, and human oversight — scaled to risk.
Part of the AI Runtime Behaviour Security framework · Version 1.0 · February 2026 · Jonathan Gill
How This Section Is Organised¶
This page is the conceptual overview — it explains the architecture, the risk-scaling model, and how the pieces connect. The implementation details — risk classification criteria, specific control definitions, checklists, and specialised controls for multimodal, reasoning, streaming, and memory — live in the Core directory.
| If you want to... | Go here |
|---|---|
| Understand the architecture and principles | You're in the right place — keep reading |
| Classify a system and select controls | Core: Risk Tiers → Controls |
| See the implementation checklist | Core: Checklist |
| Read specialised controls (multimodal, reasoning, streaming, memory) | Core: Specialised Controls |
Architecture¶
Three layers, one principle: you can't fully test a non-deterministic system before deployment, so you continuously verify behaviour in production.
Layer 1 — Guardrails block known-bad inputs and outputs at machine speed (~10ms). Deterministic pattern matching: content filters, PII detection, topic restrictions, rate limits. Every request passes through. No exceptions.
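A minimal sketch of a Layer 1 guardrail, assuming regex-based PII and topic filtering. The specific patterns and the `GuardrailResult` shape are illustrative, not part of the framework; production deployments would load patterns from a managed policy source.

```python
import re
from dataclasses import dataclass

# Illustrative deny patterns only -- real deployments manage these as policy.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # US SSN-like identifier
    re.compile(r"\b\d{16}\b"),              # bare 16-digit card number
]
BLOCKED_TOPICS = [re.compile(r"\b(?:weapon|exploit)\b", re.IGNORECASE)]

@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""

def check_input(text: str) -> GuardrailResult:
    """Deterministic pattern matching: every request passes through here."""
    for pattern in PII_PATTERNS:
        if pattern.search(text):
            return GuardrailResult(False, "pii_detected")
    for pattern in BLOCKED_TOPICS:
        if pattern.search(text):
            return GuardrailResult(False, "blocked_topic")
    return GuardrailResult(True)
```

Because the checks are pure pattern matches, they run at machine speed and apply identically to inputs and outputs.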
Layer 2 — LLM-as-Judge catches unknown-bad through independent model evaluation (~500ms–5s). A separate LLM evaluates the task agent's outputs against policy, factual grounding, tone, and safety criteria. Catches what guardrails can't pattern-match.
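A sketch of the Judge layer, assuming the judge is any callable that takes a prompt and returns a verdict string. The `judge_model` parameter, prompt wording, and criteria list are illustrative; the point is that the evaluator is a separate model from the task agent.

```python
from typing import Callable

# Criteria from the framework's Judge layer description.
JUDGE_CRITERIA = ["policy", "factual grounding", "tone", "safety"]

JUDGE_PROMPT = (
    "You are an independent evaluator. Assess the following output "
    "against these criteria: {criteria}.\n"
    "Output:\n{output}\n"
    "Respond with PASS or FAIL and a short reason."
)

def judge_output(output: str, judge_model: Callable[[str], str]) -> bool:
    """Independent second-model evaluation; returns True if the output passes.
    judge_model is any prompt->verdict callable (a separate LLM in production)."""
    prompt = JUDGE_PROMPT.format(criteria=", ".join(JUDGE_CRITERIA), output=output)
    verdict = judge_model(prompt)
    return verdict.strip().upper().startswith("PASS")
```

In production the callable would wrap a real model API; keeping it a parameter makes the layer testable with a stub.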
Layer 3 — Human Oversight provides the accountability backstop. Scope scales with risk: low-risk systems get spot checks, high-risk systems get human approval before commit. Humans decide edge cases. Humans own outcomes.
Circuit Breaker stops all AI traffic and activates a non-AI fallback when any layer fails. Not a degradation — a full stop with a predetermined safe state.
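The circuit-breaker behaviour can be sketched as follows: once any layer reports failure, every request is served by the predetermined non-AI fallback until an operator resets the breaker. The class and function names are illustrative.

```python
from typing import Callable

class CircuitBreaker:
    """Full stop, not degradation: once tripped, no request reaches the
    AI pipeline until the breaker is explicitly reset."""

    def __init__(self, fallback: Callable[[str], str]):
        self.tripped = False
        self.fallback = fallback  # predetermined non-AI safe state

    def report_layer_failure(self, layer: str) -> None:
        # Any layer failure trips the breaker; there is no partial service.
        self.tripped = True

    def reset(self) -> None:
        # Deliberate operator action, after incident response completes.
        self.tripped = False

    def handle(self, request: str, ai_pipeline: Callable[[str], str]) -> str:
        if self.tripped:
            return self.fallback(request)
        return ai_pipeline(request)
```

The design choice worth noting: tripping is automatic, resetting is manual. A breaker that resets itself reintroduces the failed layer silently.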
This pattern already exists in production at major platforms: NVIDIA NeMo Guardrails, AWS Bedrock, Azure AI, LangChain, Guardrails AI, and others. This framework provides a vendor-neutral implementation: risk classification, controls, fail postures, and tested fallback paths.
Get Started¶
| If you want to... | Go here |
|---|---|
| Get the whole framework on one page | Cheat Sheet / Decision Poster |
| Deploy low-risk AI fast | Fast Lane |
| Understand the concepts in 30 minutes | Quick Start |
| Implement controls with working code | Implementation Guide |
| Classify a system by risk | Risk Tiers |
| Deploy an agentic AI system | Agentic Controls |
| Understand what happens when controls fail | PACE Resilience |
| Enforce controls at the infrastructure layer | Infrastructure Controls |
| Track your implementation | Checklist |
| Secure a multi-agent system | MASO Framework |
Before You Build Controls¶
The First Control: Choosing the Right Tool
The most effective way to reduce AI risk is to not use AI where it doesn't belong. Before guardrails, judges, or human oversight — ask whether AI is the right tool for this problem.
If your deployment is internal, read-only, handles no regulated data, and has a human reviewing output — start with the Fast Lane. You may not need the rest.
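The Fast Lane eligibility test above can be written as a single predicate. The attribute names are illustrative; the authoritative criteria are in the Fast Lane document.

```python
def fast_lane_eligible(internal: bool, read_only: bool,
                       regulated_data: bool, human_reviews_output: bool) -> bool:
    """All four conditions must hold. Failing any one of them means the
    deployment needs a full risk-tier classification instead."""
    return internal and read_only and not regulated_data and human_reviews_output
```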
Risk-Scaled Controls¶
Controls scale to risk so low-risk AI moves fast and high-risk AI stays safe.
| Risk Tier | Controls Required | PACE Posture | Use Case Examples |
|---|---|---|---|
| Low | Fast Lane: minimal guardrails, self-certification | P only (fail-open with logging) | Internal chatbots, document summarisation, code assistance |
| Medium | Guardrails + Judge, periodic human review | P + A configured | Customer-facing content, recommendation engines, search |
| High | All three layers, human-in-the-loop for writes | P + A + C configured and tested | Financial advice, medical support, regulatory decisions |
| Critical | Full architecture, mandatory human approval | Full PACE cycle with tested E→P recovery | Autonomous actions on regulated data, safety-critical systems |
Classify your system: Risk Tiers
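The tier-to-controls mapping in the table above can be sketched as a lookup driven by a classification function. The boolean attributes and thresholds here are illustrative; the real classification criteria live in Risk Tiers.

```python
# Controls and PACE postures, transcribed from the risk-tier table.
TIER_CONTROLS = {
    "low":      {"controls": "fast_lane",         "pace": "P"},
    "medium":   {"controls": "guardrails+judge",  "pace": "P+A"},
    "high":     {"controls": "all_three_layers",  "pace": "P+A+C"},
    "critical": {"controls": "full_architecture", "pace": "full_PACE"},
}

def classify(regulated_data: bool, external_users: bool, can_write: bool) -> str:
    """Illustrative classification: tier reflects deployment context,
    not model capability."""
    if regulated_data and can_write:
        return "critical"
    if regulated_data or (external_users and can_write):
        return "high"
    if external_users:
        return "medium"
    return "low"
```

The structure matters more than the thresholds: classification is a pure function of deployment attributes, so two teams deploying the same model in different contexts land in different tiers.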
PACE Resilience¶
Every control has a defined failure mode. The PACE methodology ensures that when a control layer degrades — and it will — the system fails safely rather than silently.
Primary: All layers operational. Normal production.
Alternate: One layer degraded. Backup activated. Scope tightened. Example: Judge layer is down → guardrails remain active, all outputs queued for human review.
Contingency: Multiple layers degraded. AI operates in supervised-only mode. Human approves every action. Reduced capacity, high assurance.
Emergency: Confirmed compromise or cascading failure. Circuit breaker fires. AI traffic stopped. Non-AI fallback activated. Incident response engaged.
Even at the lowest risk tier, there's a fallback plan. At the highest, there's a structured degradation path from full autonomy to full stop.
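The degradation path above can be sketched as a state function driven by the number of degraded layers, with a confirmed compromise jumping straight to Emergency. The thresholds are illustrative.

```python
def pace_state(degraded_layers: int, compromised: bool) -> str:
    """PACE degradation: a structured path from full autonomy to full stop."""
    if compromised:
        return "Emergency"     # circuit breaker fires, non-AI fallback
    if degraded_layers == 0:
        return "Primary"       # all layers operational
    if degraded_layers == 1:
        return "Alternate"     # backup activated, scope tightened
    return "Contingency"       # supervised-only, human approves every action
```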
Core Documents¶
| Document | Purpose |
|---|---|
| Cheat Sheet | Entire framework on one page — classify, control, fail posture, test |
| Decision Poster | Visual one-page reference |
| Fast Lane | Pre-approved minimal controls for low-risk AI |
| Risk Tiers | Classify your system, determine control and resilience requirements |
| Risk Assessment | Quantitative control effectiveness, residual risk per tier, NIST AI RMF aligned |
| Controls | Guardrails, Judge, and Human Oversight implementation with per-layer fail postures |
| Agentic | Controls for single autonomous AI agents including graceful degradation path |
| PACE Resilience | What happens when controls fail |
| Checklist | Track implementation and PACE verification progress |
| Emerging Controls | Multimodal, reasoning, and streaming considerations (theoretical) |
Infrastructure Controls¶
This framework defines what to enforce. The infrastructure section defines how — 80 technical controls across 11 domains, with standards mappings and platform-specific patterns.
Domains: Identity & Access Management (8), Logging & Observability (10), Network & Segmentation (8), Data Protection (8), Secrets & Credentials (8), Supply Chain (8), Incident Response (8), Tool Access (6), Session & Scope (5), Delegation Chains (5), Sandbox Patterns (6).
Standards mappings: Every control maps to the three-layer model, ISO 42001 Annex A, NIST AI RMF, and OWASP LLM/Agentic Top 10.
Platform patterns: AWS Bedrock, Azure AI, and Databricks reference architectures.
When You Need Multi-Agent¶
When AI agents collaborate, delegate tasks, and take autonomous actions across trust boundaries, the single-agent controls on this page are necessary but not sufficient. The MASO Framework extends this architecture into multi-agent orchestration.
| What MASO adds | Why single-agent controls aren't enough |
|---|---|
| Inter-agent message bus security | Agents communicating directly create uncontrolled trust boundaries |
| Non-Human Identity per agent | Shared credentials between agents create lateral movement risk |
| Epistemic integrity controls | Hallucinations compound across agent chains; confidence inflates without evidence |
| Transitive authority prevention | Delegation creates implicit privilege escalation |
| Kill switch architecture | Multi-agent cascading failures require system-wide emergency stop |
| Dual OWASP coverage | Agentic Top 10 (2026) risks only exist when agents act autonomously |
| Document | Purpose |
|---|---|
| MASO Overview | Architecture, PACE integration, OWASP dual mapping, 6 control domains |
| Tier 1 — Supervised | Low autonomy: human approves all writes |
| Tier 2 — Managed | Medium autonomy: NHI, signed bus, LLM-as-Judge, continuous monitoring |
| Tier 3 — Autonomous | High autonomy: self-healing PACE, adversarial testing, isolated kill switch |
| Red Team Playbook | 13 adversarial test scenarios for multi-agent systems |
| Integration Guide | LangGraph, AutoGen, CrewAI, AWS Bedrock implementation patterns |
| Worked Examples | Financial services, healthcare, critical infrastructure |
Extensions¶
| Folder | Contents |
|---|---|
| Regulatory | ISO 42001 and EU AI Act mapping |
| Technical | Bypass prevention, metrics |
| Industry Solutions | Guardrails, evaluators, and safety model reference |
| Templates | Risk assessment templates, implementation plans |
| Worked Examples | Per-tier implementation walkthroughs |
Insights¶
Foundational Arguments
| Article | Key Argument |
|---|---|
| The First Control: Choosing the Right Tool | Design thinking before technology selection |
| Why Your AI Guardrails Aren't Enough | Guardrails block known-bad; you need detection for unknown-bad |
| The Judge Detects. It Doesn't Decide. | Async evaluation beats real-time blocking for nuanced decisions |
| Infrastructure Beats Instructions | You can't secure AI systems with prompts alone |
| Risk Tier Is Use Case, Not Technology | Classification reflects deployment context, not model capability |
| Humans Remain Accountable | AI assists decisions; humans own outcomes |
Emerging Challenges
| Article | Key Argument |
|---|---|
| The Verification Gap | Current safety approaches can't confirm ground truth |
| Behavioral Anomaly Detection | Aggregating signals to detect drift from expected behaviour |
| Multimodal AI Breaks Your Text-Based Guardrails | Images, audio, and video create new attack surfaces |
| When AI Thinks Before It Answers | Reasoning models need reasoning-aware controls |
| When Agents Talk to Agents | Multi-agent accountability gaps → see MASO |
| The Memory Problem | Long context and persistent memory introduce novel risks |
| You Can't Validate What Hasn't Finished | Real-time streaming challenges existing validation |
| Open-Weight Models Shift the Burden | Self-hosted models inherit the provider's control responsibilities |
| When the Judge Can Be Fooled | The Judge layer needs its own threat model |
Platforms Implementing This Pattern¶
This isn't a theoretical proposal. These platforms already implement variants of the three-layer pattern:
| Platform | Approach |
|---|---|
| NVIDIA NeMo Guardrails | Five rail types: input, dialog, retrieval, execution, output |
| LangChain | Middleware chains with human-in-the-loop |
| Guardrails AI | Open-source validator framework |
| Galileo | Eval-to-guardrail lifecycle |
| DeepEval | LLM-as-judge evaluation framework |
| AWS Bedrock Guardrails | Managed input/output filtering |
| Azure AI Content Safety | Content filtering and moderation |
Standards Alignment¶
| Standard | Relevance | Mapping |
|---|---|---|
| OWASP LLM Top 10 | Security vulnerabilities in LLM applications | OWASP mapping |
| OWASP Agentic Top 10 | Risks specific to autonomous AI agents | MASO mapping |
| NIST AI RMF | AI risk management framework | NIST mapping |
| ISO 42001 | AI management system standard | ISO 42001 mapping |
| NIST SP 800-218A | Secure development for generative AI | SP 800-218A mapping |
| MITRE ATLAS | Adversarial threat landscape for AI | MASO threat intelligence |
| DORA | Digital operational resilience | MASO regulatory alignment |
Scope¶
In scope: Custom LLM applications, AI decision support, document processing, single-agent systems — from deployment through incident response.
Out of scope: Vendor AI products (use vendor controls), model training (see MLOps security guidance), and pre-deployment testing. This framework is about what happens in production.
Pre-deployment complement: For secure development practices covering data sourcing, training, fine-tuning, and model release, see NIST SP 800-218A. This framework begins where SP 800-218A ends.
For multi-agent systems: See MASO.
Contributing¶
Feedback, corrections, and extensions welcome. See CONTRIBUTING.md.
AI Runtime Behaviour Security, 2026 (Jonathan Gill).