Resources¶
The AIRS Framework¶
This learning site is built on the AI Runtime Security (AIRS) framework, an open-source, MIT-licensed framework for monitoring, controlling, and constraining AI system behaviour in production environments.
Key framework resources¶
| Resource | Description |
|---|---|
| Architecture | The three-layer runtime defence architecture (Guardrails, Model-as-Judge, Human Oversight) with circuit breaker containment |
| MASO Framework | Multi-Agent Security Operations: 128 controls across 8 domains |
| Risk Tiers | Tier 1 (Supervised), Tier 2 (Managed), Tier 3 (Autonomous) classification |
| Python SDK | Reference implementation library with guardrail chains, judge evaluation, circuit breakers (early-stage, not for production assurance) |
| Red Team Playbook | 16 adversarial scenarios for testing AI runtime controls (13 individual control tests + 3 compound attack chains) |
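The guardrail-chain idea from the Python SDK row above can be sketched in a few lines of plain Python. This is a minimal illustration of the pattern (deterministic checks run in order, first failure short-circuits); the class and function names are assumptions for this sketch, not the SDK's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical result type -- illustrative, not the AIRS SDK's actual API.
@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""

Guardrail = Callable[[str], GuardrailResult]

def deny_secrets(text: str) -> GuardrailResult:
    # Deterministic check: block outputs containing credential-like content.
    if "BEGIN PRIVATE KEY" in text or "api_key=" in text.lower():
        return GuardrailResult(False, "credential-like content")
    return GuardrailResult(True)

def max_length(limit: int) -> Guardrail:
    # Factory for a simple output-size guardrail.
    def check(text: str) -> GuardrailResult:
        if len(text) > limit:
            return GuardrailResult(False, f"output exceeds {limit} chars")
        return GuardrailResult(True)
    return check

def run_chain(text: str, chain: List[Guardrail]) -> GuardrailResult:
    # The first failing guardrail short-circuits the chain.
    for guard in chain:
        result = guard(text)
        if not result.allowed:
            return result
    return GuardrailResult(True)

chain = [deny_secrets, max_length(2000)]
print(run_chain("hello world", chain).allowed)    # True
print(run_chain("api_key=abc123", chain).reason)  # credential-like content
```

Because every check is deterministic and fast, a chain like this sits naturally in the ~10ms Guardrails layer of the three-layer architecture.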
Insights¶
Research articles exploring the risks that motivate MASO controls and the case for AI runtime security. Browse the insights →
Full documentation¶
The complete AIRS documentation (including stakeholder guides, infrastructure patterns, regulatory mappings, and worked examples) is available at airuntimesecurity.io.
Standards & frameworks referenced¶
| Standard | Relevance |
|---|---|
| OWASP LLM Top 10 | Foundational threat taxonomy for LLM applications |
| OWASP Agentic Top 10 | Emerging threat taxonomy for agentic AI systems |
| NIST AI RMF | Risk management framework for AI systems; AIRS maps to its Govern, Map, Measure, and Manage functions |
| ISO 42001 | AI management system standard; AIRS provides Annex A control alignment |
| EU AI Act | European regulation; AIRS provides crosswalk mapping for high-risk AI systems |
| MITRE ATLAS | Adversarial threat landscape for AI systems |
Concepts glossary¶
- Epistemic integrity
- The property that an agent's outputs are faithful to its actual reasoning inputs: what it claims to know is based on what it actually accessed and verified.
- MASO (Multi-Agent Security Operations)
- The AIRS control catalogue for multi-agent systems. Eight domains, 128 controls, organised by risk tier.
- Three-layer architecture
- Guardrails (~10ms, deterministic), Model-as-Judge (async ~500ms–5s, or inline ~10–50ms with a small language model (SLM); evaluative), Human Oversight (as needed, investigative). Each layer catches failures the others miss. A circuit breaker provides emergency containment when controls themselves fail.
- Circuit breaker
- An emergency failsafe that halts AI operations and activates a safe fallback when controls fail or confirmed compromise is detected. Operates independently of the three defence layers, providing containment when the layers themselves degrade.
- Objective Intent
- A developer-declared Objective Intent Specification (OISpec) attached to every agent, judge, and workflow. Defines what the component is supposed to achieve, within what parameters, and against what success and failure criteria. Enables tactical evaluation (per-agent) and strategic evaluation (workflow-level) of whether the system is doing what it was designed to do.
- PACE resilience
- Primary, Alternate, Contingency, Emergency. A degradation pattern ensuring systems fail safely when controls fail.
- Risk tiers
- Tier 1 (Supervised): human-in-the-loop for all decisions. Tier 2 (Managed): automated controls, with human oversight for flagged cases. Tier 3 (Autonomous): full automation with controls at every layer.
- Chain-of-trust propagation
- When downstream agents treat upstream outputs as authoritative without independent verification, causing errors to amplify through the chain.
- Reasoning-basis corruption
- An agent produces correct output given its inputs, but the inputs themselves were incomplete, stale, or subtly wrong.
- Verification receipt
- Metadata passed alongside an agent's output that documents what data sources were accessed, retrieval completeness metrics, and processing metadata, enabling downstream verification.
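The circuit breaker and PACE resilience entries above can be combined into a minimal sketch: repeated control failures walk the system down through the PACE modes until the breaker trips to Emergency. The threshold, class names, and mode transitions are assumptions for illustration, not the framework's reference implementation.

```python
from enum import Enum

class Mode(Enum):
    PRIMARY = "primary"          # normal operation
    ALTERNATE = "alternate"      # degraded but still automated
    CONTINGENCY = "contingency"  # human-assisted fallback
    EMERGENCY = "emergency"      # halt AI operations, safe fallback

class CircuitBreaker:
    """Trips to EMERGENCY after repeated control failures (illustrative only)."""

    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.mode = Mode.PRIMARY

    def record_failure(self) -> Mode:
        # Each control failure degrades one PACE step; at the threshold,
        # the breaker halts operations entirely.
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.mode = Mode.EMERGENCY
        elif self.failures == 2:
            self.mode = Mode.CONTINGENCY
        else:
            self.mode = Mode.ALTERNATE
        return self.mode

    def record_success(self) -> Mode:
        # A confirmed healthy cycle restores normal operation.
        self.failures = 0
        self.mode = Mode.PRIMARY
        return self.mode

breaker = CircuitBreaker()
breaker.record_failure()                # -> ALTERNATE
breaker.record_failure()                # -> CONTINGENCY
print(breaker.record_failure().value)   # emergency
```

The key design point from the glossary holds here: the breaker tracks control health independently of the layers themselves, so it still provides containment when those layers degrade.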
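The verification receipt and chain-of-trust entries pair naturally: a downstream agent checks the receipt rather than trusting upstream output blindly. A minimal sketch, assuming a hypothetical receipt shape (the field names, threshold, and `kb://` source identifiers are illustrative, not the framework's schema):

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical receipt shape -- illustrative, not the framework's schema.
@dataclass
class VerificationReceipt:
    sources_accessed: List[str]
    retrieval_completeness: float      # fraction of expected sources retrieved
    metadata: Dict[str, str] = field(default_factory=dict)

def accept_upstream(receipt: VerificationReceipt,
                    min_completeness: float = 0.9) -> bool:
    # Independent verification step: refuse outputs whose receipt shows
    # no sources or incomplete retrieval, rather than treating the
    # upstream agent as authoritative (chain-of-trust propagation).
    if not receipt.sources_accessed:
        return False
    return receipt.retrieval_completeness >= min_completeness

good = VerificationReceipt(["kb://policies/v7"], 0.95)
stale = VerificationReceipt(["kb://policies/v3"], 0.40)
print(accept_upstream(good))   # True
print(accept_upstream(stale))  # False
```

Rejecting the second receipt is exactly the guard against reasoning-basis corruption: the upstream output may be locally correct, but its inputs were too incomplete to rely on.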
About¶
This learning site was created by Jonathan Gill as a companion to the AI Runtime Security framework. All content is MIT licensed.
Have feedback, corrections, or suggestions? We'd love to hear from you.