Maturity & Validation¶

An honest assessment of what this framework is, what it's validated against, and where the gaps are.

Part of AI Runtime Behaviour Security Last updated: February 2026

Framework Status¶

This framework has not been deployed end-to-end in a production environment. No organisation has implemented the full control set and reported back on operational results. The adopters list is empty.

That's an important fact and we're stating it clearly rather than hoping no one notices.

What the framework is:

A practitioner's synthesis. 30+ years in IT, 20+ in enterprise security, applied to the specific problem of runtime AI security. The architecture draws on real operational experience in financial services, government, and critical infrastructure — applied to AI, not invented from theory.
Grounded in real incidents. Every major control addresses a documented, public security failure. The Incident Tracker maps 10 real-world incidents to specific controls. The Validated Against Real Incidents page inverts this — showing which controls have the strongest evidence base.
Aligned to established standards. Full crosswalks to OWASP LLM Top 10 (2025), OWASP Agentic Top 10 (2026), NIST AI RMF, ISO 42001, EU AI Act, NIST SP 800-218A, and MITRE ATLAS. These aren't surface-level mappings — they're control-by-control alignments.
Consistent with production patterns. The three-layer model (Guardrails, Judge, Human Oversight) exists in production at NVIDIA NeMo, AWS Bedrock, Azure AI Content Safety, LangChain, and Guardrails AI. This framework didn't invent the pattern — it codified it.

What the framework is not:

Not battle-tested at scale. No one has run this at 100K+ daily interactions and reported back on false positive rates, Judge accuracy, operational overhead, or PACE failover performance.
Not a certification or audit standard. Implementing this does not constitute compliance with any regulation. It may support compliance — the regulatory crosswalks show how — but that's your assessment to make, not ours.
Not complete. Several control domains are explicitly marked as emerging: multimodal, reasoning models, streaming, epistemic risk detection. These are well-articulated but lack the operational depth of the core three-layer pattern.

Validation Layers¶

We distinguish between four types of validation, ordered by strength:

1. Production Validation (Strongest)¶

Controls proven effective through real-world deployment and operational measurement.

Current status: None. No organisation has reported production deployment results.

What would change this: An organisation implements the framework (or a subset), operates it for 90+ days, and shares operational metrics — false positive rates, Judge accuracy, PACE failover performance, cost per interaction, incidents detected. Even anonymised or aggregated data would be valuable.

2. Incident Validation¶

Controls retroactively mapped to real-world security incidents, showing they would have prevented or detected the failure.

Current status: 10 incidents mapped. Each incident includes the specific attack vector, the controls that address it, the minimum effective implementation tier, and analysis of how the attack would amplify in a multi-agent system.

See: Validated Against Real Incidents

This is the framework's strongest current evidence base. It doesn't prove the controls will work in your environment — but it demonstrates they address real attack patterns, not theoretical ones.

3. Standards Alignment¶

Controls mapped to established international standards and industry frameworks.

Current status: 8 standards mapped. Full crosswalks with control-level alignment.

Standard	Mapping Depth
OWASP LLM Top 10 (2025)	Control-by-control across foundation + MASO
OWASP Agentic Top 10 (2026)	Control-by-control in MASO
NIST AI RMF	Govern, Map, Measure, Manage functions
ISO 42001	Annex A clause-level alignment
ISO 27001	Information security control mapping
EU AI Act	Articles 9, 14, 15 alignment
NIST SP 800-218A	Pre-deployment complement
MITRE ATLAS	Agent threat intelligence alignment

Standards alignment doesn't validate effectiveness — it validates relevance. These mappings show the framework addresses the same risk categories that international bodies have independently identified as critical.

4. Pattern Consistency¶

The architectural patterns used in this framework exist independently in production systems from multiple vendors.

Pattern	Production Evidence
Input/output guardrails	NVIDIA NeMo, AWS Bedrock, Azure AI, Guardrails AI
LLM-as-Judge evaluation	DeepEval, Galileo, LangSmith, custom implementations
Human-in-the-loop review	LangChain HITL, custom queue systems
Circuit breaker / kill switch	Standard resilience pattern (Netflix Hystrix lineage)
Agent sandboxing	Docker/gVisor patterns in LangGraph, CrewAI
Non-Human Identity	Service account patterns extended to agents

Pattern consistency doesn't prove this framework's specific implementation works — but it demonstrates the architectural approach is sound and independently adopted.

Known Gaps¶

We track what we know we don't know.

Gap	Status	Impact
Production metrics at scale	No data on false positive rates, latency, cost at >100K interactions/day	Can't quantify operational overhead
Judge accuracy baselines	Analysis of failure modes exists (When the Judge Can Be Fooled) but no published accuracy benchmarks	Can't specify expected detection rates
PACE failover performance	Failure scenarios are designed and documented; none have been tested under real incident conditions	Failover timing and reliability unproven
Epistemic risk detection	Six risk categories identified (groupthink, hallucination amplification, correlated errors, semantic drift, uncertainty stripping, synthetic corroboration); no algorithmic detection published	Detection thresholds are theoretical
Kill switch architecture	Identified as critical for MASO Tier 3; detailed implementation patterns still emerging	Preferred pattern for agent-to-human emergency communication unclear
Supply chain vetting automation	AIBOM and MCP vetting controls specified; enterprise tooling not yet mature	Manual review still required for novel components
RAG poisoning detection	Controls are mostly preventive (integrity checks, provenance); detective controls (finding poisoned data already in corpus) are limited	Retrospective detection weak

How You Can Help¶

Pilot the Framework¶

The single most valuable thing for this framework's credibility is a real deployment. If your organisation is implementing AI security controls — even partially — and would be willing to share results (anonymised is fine), we want to hear from you.

What a pilot looks like:

Pick a scope. One AI system, one risk tier. You don't need the whole framework.
Implement the controls. Use the Quick Start or Implementation Guide.
Measure. Track false positive rates, Judge accuracy, latency impact, operational overhead.
Share. Open an issue, submit a PR to Adopters, or contact the maintainer directly.

Even negative results are valuable. "We implemented X and it didn't work because Y" improves the framework more than silence.

Peer Review¶

If you're an AI security practitioner, red teamer, or compliance professional — review the framework and tell us what's wrong, what's missing, or what doesn't match your experience. Open an issue or submit a PR.

Specific areas where expert review would be most valuable:

Judge accuracy — Have you measured LLM-as-Judge performance in production? What accuracy ranges are realistic?
Epistemic controls — Are the six epistemic risk categories we've identified complete? Are there others?
PACE resilience — Have you implemented structured failover for AI systems? What worked?
Multi-agent security — Are the MASO control domains the right decomposition? What's missing?

Report Incidents¶

If you're aware of a public AI security incident not in our Incident Tracker, open an issue. We'll map it to controls and add it to the evidence base.

A Note on Honesty¶

This section exists because too many frameworks present themselves as established when they're theoretical, proven when they're proposed, and comprehensive when they're partial.

We'd rather be honest about where this framework stands and let you judge whether it's useful in its current state, than pretend it has an evidence base it doesn't.

The architecture is sound — the patterns exist independently in production. The controls are grounded — they address real incidents. The standards alignment is thorough. What's missing is the operational proof that comes only from deployment.

Help us close that gap.

AI Runtime Behaviour Security, 2026 (Jonathan Gill).