
Resources

The AIRS Framework

This learning site is built on the AI Runtime Security (AIRS) framework, an open-source, MIT-licensed framework for monitoring, controlling, and constraining AI system behaviour in production environments.

Key framework resources

| Resource | Description |
| --- | --- |
| Architecture | The three-layer runtime defence architecture (Guardrails, Model-as-Judge, Human Oversight) with circuit breaker containment |
| MASO Framework | Multi-Agent Security Operations: 128 controls across 8 domains |
| Risk Tiers | Tier 1 (Supervised), Tier 2 (Managed), Tier 3 (Autonomous) classification |
| Python SDK | Reference implementation library with guardrail chains, judge evaluation, and circuit breakers (early-stage; not for production assurance) |
| Red Team Playbook | 16 adversarial scenarios for testing AI runtime controls (13 individual control tests + 3 compound attack chains) |
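To make the guardrail-chain idea concrete, here is a minimal, framework-independent sketch. The class and function names (`GuardrailResult`, `run_chain`, and so on) are illustrative assumptions, not the SDK's actual API:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""

# A guardrail is any fast, deterministic check on a candidate output.
Guardrail = Callable[[str], GuardrailResult]

def max_length(limit: int) -> Guardrail:
    def check(text: str) -> GuardrailResult:
        if len(text) > limit:
            return GuardrailResult(False, f"output exceeds {limit} chars")
        return GuardrailResult(True)
    return check

def deny_terms(terms: List[str]) -> Guardrail:
    def check(text: str) -> GuardrailResult:
        lowered = text.lower()
        for term in terms:
            if term in lowered:
                return GuardrailResult(False, f"blocked term: {term}")
        return GuardrailResult(True)
    return check

def run_chain(text: str, chain: List[Guardrail]) -> GuardrailResult:
    # Guardrails short-circuit: the first failure blocks the output.
    for guardrail in chain:
        result = guardrail(text)
        if not result.allowed:
            return result
    return GuardrailResult(True)

chain = [max_length(200), deny_terms(["rm -rf"])]
print(run_chain("Here is the summary you asked for.", chain).allowed)  # True
print(run_chain("Run rm -rf / to clean up.", chain).reason)
```

The short-circuit design matters: each check is cheap and deterministic, so the chain stays within a millisecond-scale budget regardless of how many guardrails fail.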

Insights

Research articles exploring the risks that require MASO controls and the arguments behind AI runtime security. Browse the insights →

Full documentation

The complete AIRS documentation (including stakeholder guides, infrastructure patterns, regulatory mappings, and worked examples) is available at airuntimesecurity.io.


Standards & frameworks referenced

| Standard | Relevance |
| --- | --- |
| OWASP LLM Top 10 | Foundational threat taxonomy for LLM applications |
| OWASP Agentic Top 10 | Emerging threat taxonomy for agentic AI systems |
| NIST AI RMF | Risk management framework for AI systems; AIRS maps to its Govern, Map, Measure, and Manage functions |
| ISO 42001 | AI management system standard; AIRS provides Annex A control alignment |
| EU AI Act | European regulation; AIRS provides crosswalk mapping for high-risk AI systems |
| MITRE ATLAS | Adversarial threat landscape for AI systems |

Concepts glossary

Epistemic integrity
The property that an agent's outputs are faithful to its actual reasoning inputs: what it claims to know rests on what it actually accessed and verified.
MASO (Multi-Agent Security Operations)
The AIRS control catalogue for multi-agent systems. Eight domains, 128 controls, organised by risk tier.
Three-layer architecture
Guardrails (~10ms, deterministic), Model-as-Judge (async ~500ms–5s, or inline ~10–50ms with a small language model (SLM); evaluative), Human Oversight (as needed, investigative). Each layer catches failures the others miss. A circuit breaker provides emergency containment when controls themselves fail.
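The layering can be sketched as a dispatch: deterministic guardrails run first, a judge scores anything they pass, and uncertain cases escalate to a human queue. All names and thresholds here are illustrative assumptions, not the framework's API, and the judge is a stub where a real deployment would call a model:

```python
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"  # route to human oversight (layer 3)

def guardrail_layer(text: str) -> bool:
    # Layer 1: ~10ms deterministic checks (here, a trivial length rule).
    return len(text) < 500

def judge_layer(text: str) -> float:
    # Layer 2: evaluative scoring; a real judge would call a model.
    # This stub returns a risk score in [0, 1] based on keywords.
    lowered = text.lower()
    if "wire transfer" in lowered:
        return 0.9
    if "refund" in lowered:
        return 0.6
    return 0.1

def defend(text: str, block_above: float = 0.8, review_above: float = 0.5) -> Verdict:
    if not guardrail_layer(text):
        return Verdict.BLOCK
    risk = judge_layer(text)
    if risk >= block_above:
        return Verdict.BLOCK
    if risk >= review_above:
        return Verdict.ESCALATE  # human reviews the flagged case
    return Verdict.ALLOW

print(defend("Summarise this report."))            # Verdict.ALLOW
print(defend("Initiate the wire transfer now."))   # Verdict.BLOCK
print(defend("Issue a refund for order 2041."))    # Verdict.ESCALATE
```

The point of the layering is that each tier trades latency for judgment: the cheap deterministic layer handles the bulk of traffic, while only ambiguous cases pay the cost of model evaluation or human review.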
Circuit breaker
An emergency failsafe that halts AI operations and activates a safe fallback when controls fail or confirmed compromise is detected. Operates independently of the three defence layers, providing containment when the layers themselves degrade.
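The standard circuit-breaker pattern behind this definition can be sketched in a few lines (a generic illustration, not the framework's implementation):

```python
import time
from typing import Optional

class CircuitBreaker:
    """Trips open after repeated failures; retries once after a cooldown."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: operations proceed normally
        if time.monotonic() - self.opened_at >= self.reset_after:
            return True  # half-open: permit one trial call
        return False     # open: halt operations, use the safe fallback

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # trip: containment engages

breaker = CircuitBreaker(failure_threshold=2)
breaker.record_failure()
print(breaker.allow())  # True: one failure, still closed
breaker.record_failure()
print(breaker.allow())  # False: tripped open, safe fallback takes over
```

Because the breaker keeps its own state and makes no calls into the guardrail, judge, or oversight layers, it can still contain the system when those layers themselves degrade.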
Objective Intent
A developer-declared Objective Intent Specification (OISpec) attached to every agent, judge, and workflow. Defines what the component is supposed to achieve, within what parameters, and against what success and failure criteria. Enables tactical evaluation (per-agent) and strategic evaluation (workflow-level) of whether the system is doing what it was designed to do.
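A minimal sketch of what such a specification might look like as a data structure. The field and class names here are hypothetical, chosen for illustration, and the criteria are simple callables rather than the framework's actual OISpec schema:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class OISpec:
    """Illustrative Objective Intent Specification (hypothetical schema)."""
    objective: str
    parameters: Dict[str, float]                   # declared operating bounds
    success_criteria: List[Callable[[dict], bool]]
    failure_criteria: List[Callable[[dict], bool]]

    def evaluate(self, outcome: dict) -> str:
        # Failure criteria dominate: any single breach marks the run failed.
        if any(check(outcome) for check in self.failure_criteria):
            return "failed"
        if all(check(outcome) for check in self.success_criteria):
            return "succeeded"
        return "indeterminate"

spec = OISpec(
    objective="Summarise support tickets within the latency budget",
    parameters={"max_latency_s": 2.0},
    success_criteria=[lambda o: o["summary_produced"]],
    failure_criteria=[lambda o: o["latency_s"] > 2.0],
)
print(spec.evaluate({"summary_produced": True, "latency_s": 1.2}))  # succeeded
print(spec.evaluate({"summary_produced": True, "latency_s": 5.0}))  # failed
```

Attaching one spec per agent supports the tactical (per-agent) evaluation described above; aggregating verdicts across a workflow's specs would support the strategic evaluation.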
PACE resilience
Primary, Alternate, Contingency, Emergency. A degradation pattern ensuring systems fail safely when controls fail.
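The degradation pattern amounts to an ordered fallback chain: try each tier, fall through on failure, and only halt when even the emergency tier is gone. A minimal sketch with hypothetical handler names:

```python
from typing import Callable, List, Tuple

def run_with_pace(task: str,
                  tiers: List[Tuple[str, Callable[[str], str]]]) -> Tuple[str, str]:
    """Try Primary, Alternate, Contingency, Emergency in order;
    fall through on failure so the system degrades rather than halts."""
    for name, handler in tiers:
        try:
            return name, handler(task)
        except Exception:
            continue  # this tier failed; degrade to the next one
    raise RuntimeError("all PACE tiers exhausted")

# Hypothetical handlers: the first two simulate an outage.
def primary(task: str) -> str:
    raise TimeoutError("model unavailable")

def alternate(task: str) -> str:
    raise TimeoutError("backup model unavailable")

def contingency(task: str) -> str:
    return f"cached answer for {task!r}"

def emergency(task: str) -> str:
    return "declined: safe static response"

tier_used, answer = run_with_pace("summarise Q3 report", [
    ("primary", primary), ("alternate", alternate),
    ("contingency", contingency), ("emergency", emergency),
])
print(tier_used)  # contingency
```

The emergency tier should never be able to fail in the same way as the tiers above it; here it returns a static refusal, which is the "fail safe" end state.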
Risk tiers
Tier 1 (Supervised): human-in-the-loop for all decisions. Tier 2 (Managed): automated controls, with human oversight for flagged cases. Tier 3 (Autonomous): full automation with controls at every layer.
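One way to picture how the tiers change runtime behaviour is as a review-gating rule. This is an illustrative simplification, not the framework's classification logic:

```python
from enum import IntEnum

class Tier(IntEnum):
    SUPERVISED = 1   # human-in-the-loop for all decisions
    MANAGED = 2      # automated controls; humans review flagged cases
    AUTONOMOUS = 3   # full automation, with controls at every layer

def needs_human_review(tier: Tier, flagged: bool) -> bool:
    if tier == Tier.SUPERVISED:
        return True        # every decision is reviewed
    if tier == Tier.MANAGED:
        return flagged     # only flagged cases reach a human
    return False           # autonomous: humans act via the oversight layer

print(needs_human_review(Tier.SUPERVISED, flagged=False))  # True
print(needs_human_review(Tier.MANAGED, flagged=True))      # True
print(needs_human_review(Tier.AUTONOMOUS, flagged=True))   # False
```

In the autonomous case the human role does not disappear; it moves from per-decision review into the standing oversight and circuit-breaker controls.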
Chain-of-trust propagation
When downstream agents treat upstream outputs as authoritative without independent verification, causing errors to amplify through the chain.
Reasoning-basis corruption
An agent produces correct output given its inputs, but the inputs themselves were incomplete, stale, or subtly wrong.
Verification receipt
Metadata passed alongside an agent's output that documents what data sources were accessed, retrieval completeness metrics, and processing metadata, enabling downstream verification.
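A sketch of how a downstream agent might consume such a receipt instead of trusting upstream output blindly. The field names and threshold are illustrative assumptions, not the framework's schema:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VerificationReceipt:
    """Illustrative receipt passed alongside an agent's output."""
    sources_accessed: List[str]
    retrieval_completeness: float   # fraction of expected sources retrieved
    processed_at: str               # ISO 8601 timestamp of processing

def accept_upstream(receipt: VerificationReceipt,
                    min_completeness: float = 0.9) -> bool:
    # Downstream agents inspect the receipt rather than treating the
    # upstream output as authoritative, interrupting chain-of-trust
    # propagation before errors can amplify.
    return (bool(receipt.sources_accessed)
            and receipt.retrieval_completeness >= min_completeness)

complete = VerificationReceipt(
    sources_accessed=["crm", "ticket_db"],
    retrieval_completeness=1.0,
    processed_at="2025-01-01T00:00:00Z",
)
partial = VerificationReceipt(
    sources_accessed=["crm"],
    retrieval_completeness=0.5,
    processed_at="2025-01-01T00:00:00Z",
)
print(accept_upstream(complete))  # True
print(accept_upstream(partial))   # False: incomplete retrieval, re-verify
```

A rejected receipt does not have to mean discarding the output; the downstream agent can instead re-fetch the missing sources or escalate, which also mitigates the reasoning-basis corruption described above.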

About

This learning site was created by Jonathan Gill as a companion to the AI Runtime Security framework. All content is MIT licensed.