Validated Against Real Incidents¶

Every major control in this framework addresses a documented, public AI security failure. This page is the evidence.

Part of AI Runtime Behaviour Security Last updated: February 2026

How to Read This Page¶

The Incident Tracker is organised by incident — "here's what happened, here are the controls." This page inverts that view. It's organised by control — "here's the control, here's the evidence it addresses real threats."

Each control is mapped to the incidents it would have prevented or detected, with the specific mechanism explained. Controls validated against more incidents have a stronger evidence base. Controls validated against zero incidents are flagged — they may still be valuable, but they're based on threat modelling rather than observed attacks.

Validation does not mean proven. It means the control addresses a documented attack pattern. Whether the control would have actually prevented the incident in your environment depends on your implementation. This is retroactive analysis, not a guarantee.

Validation Summary¶

Controls by Evidence Strength¶

Evidence Level	Criteria	Control Count
Strong	Addresses 3+ real incidents	5 controls
Moderate	Addresses 1–2 real incidents	18 controls
Threat-modelled	Based on emerging threat analysis, not yet observed in production	Remaining controls

Most-Validated Controls¶

These controls are referenced across the highest number of documented incidents. They form the minimum credible defence.

Rank	Control	Incidents	Evidence Base
1	PG-1.1 Input guardrails per agent	8 of 10	INC-01, 02, 03, 04, 05, 07, 08, 09
2	EC-2.5 LLM-as-Judge gate	5 of 10	INC-01, 02, 05, 09, 10
3	EC-1.1 Human approval for write operations	3 of 10	INC-01, 06, 09
3	DP-2.1 DLP on message bus	3 of 10	INC-04, 06, 07
3	OB-3.1 Independent observability agent	3 of 10	INC-07, 09, 10

What this tells you: If you implement nothing else, input guardrails per agent and an LLM-as-Judge gate address the widest range of documented attack patterns. This is consistent with the framework's core architecture — Guardrails prevent, Judge detects.

Control-by-Control Validation¶

Prompt, Goal & Epistemic Integrity¶

PG-1.1 — Input Guardrails Per Agent¶

Evidence strength: Strong (8 incidents)

The single most broadly validated control. Addresses the widest range of attack vectors because prompt injection — direct and indirect — is the most common AI attack primitive.

Incident	Attack Vector	How PG-1.1 Helps
INC-01: Auto-GPT crypto transfer	Indirect injection via email	Detects injection patterns in email content before agent processes it
INC-02: Copilot RCE	Indirect injection via code comments	Filters injection patterns from code repository content
INC-03: Cursor IDE RCE	Configuration poisoning	Detects malicious patterns in configuration file content
INC-04: Perplexity data exfil	Indirect injection via web content	Catches injection payloads in scraped web pages
INC-05: PoisonedRAG	RAG corpus contamination	Identifies suspicious patterns in retrieved documents (partial — sophisticated poisoning may evade)
INC-07: Morris II worm	Self-replicating injection via inter-agent messages	Detects injection patterns in incoming agent-to-agent messages
INC-08: MCP supply chain	Poisoned MCP tool metadata	Filters injection from MCP tool descriptions and responses
INC-09: Banking AI fraud	Direct prompt injection via chat	Detects injection patterns in customer messages

Limitations: Guardrails are pattern-based. They catch known injection techniques effectively but can be evaded by novel or highly contextual attacks. This is exactly why the framework pairs guardrails with Judge evaluation (PG-1.1 + EC-2.5).

PG-1.2 — System Prompt Isolation¶

Evidence strength: Moderate (1 incident)

Incident	How PG-1.2 Helps
INC-02: Copilot RCE	Prevents external content (code comments) from overriding agent system instructions

Threat model basis: System prompt extraction and override are well-documented attack classes. While only one tracked incident directly exploits this, the attack primitive is fundamental to prompt injection.

PG-1.3 — Immutable Task Specification¶

Evidence strength: Moderate (1 incident)

Incident	How PG-1.3 Helps
INC-03: Cursor IDE RCE	Task definitions cannot be modified by external configuration files — prevents the path traversal attack vector

PG-1.4 — Message Source Tagging¶

Evidence strength: Moderate (2 incidents)

Incident	How PG-1.4 Helps
INC-01: Auto-GPT crypto transfer	Tags email-derived content as untrusted data, not instruction — agent processes it as data, not commands
INC-07: Morris II worm	Distinguishes legitimate inter-agent instructions from data payloads — worm content tagged as data, not instruction

Why this matters: The root cause of most indirect injection attacks is that AI systems treat all input as potential instruction. Message source tagging enforces the instruction/data boundary at the protocol level.

PG-2.1 — Inter-Agent Injection Detection¶

Evidence strength: Moderate (1 incident)

Incident	How PG-2.1 Helps
INC-07: Morris II worm	Judge evaluates all inter-agent messages for injection patterns — blocks worm propagation at the message bus

Threat model basis: This control is purpose-built for the Morris II attack class. As multi-agent systems become more common, inter-agent injection will become a primary attack vector.

PG-2.5 — Claim Provenance Enforcement¶

Evidence strength: Moderate (1 incident)

Incident	How PG-2.5 Helps
INC-05: PoisonedRAG	Unverified agent claims cannot be treated as facts — forces provenance tracking back to original source, exposing the poisoned documents

PG-2.6 — Self-Referential Evidence Prohibition¶

Evidence strength: Moderate (1 incident)

Incident	How PG-2.6 Helps
INC-05: PoisonedRAG	Agents cannot cite other agents' output as primary evidence — breaks the amplification chain where Agent B cites Agent A's poisoned claim as a source

PG-2.7 — Uncertainty Preservation¶

Evidence strength: Moderate (1 incident)

Incident	How PG-2.7 Helps
INC-05: PoisonedRAG	Confidence scores propagate through agent chains without inflation — a 60% confidence claim from Agent A cannot become 95% confidence at Agent C

PG-2.9 — Model Diversity Policy¶

Evidence strength: Moderate (1 incident)

Incident	How PG-2.9 Helps
INC-10: JudgeDeceiver	Judge uses a different model/provider than task agents — adversarial optimisation against the task model doesn't transfer to the Judge

Why this matters: JudgeDeceiver works by optimising against a known model. Model diversity raises the attack cost from "optimise against one model" to "optimise against multiple unknown models simultaneously."

PG-3.5 — Challenger Agent¶

Evidence strength: Moderate (2 incidents)

Incident	How PG-3.5 Helps
INC-05: PoisonedRAG	Adversarial agent actively attacks the primary hypothesis — challenges the poisoned claim with counter-evidence
INC-10: JudgeDeceiver	Adversarial agent tests Judge decisions — detects when the Judge has been manipulated

Identity & Access¶

IA-1.4 — Scoped Tool Permissions¶

Evidence strength: Moderate (2 incidents)

Incident	How IA-1.4 Helps
INC-02: Copilot RCE	Prevents agents from modifying IDE configuration files — even if injection succeeds, the agent can't write to settings.json
INC-08: MCP supply chain	Limits what each MCP server can access — poisoned server's blast radius is contained to its scoped permissions

IA-2.6 — Secrets Exclusion from Context¶

Evidence strength: Moderate (1 incident)

Incident	How IA-2.6 Helps
INC-03: Cursor IDE RCE	Configuration files with credentials are isolated from agent context — path traversal can't reach sensitive configuration

Data Protection¶

DP-1.1 — Data Classification Labels¶

Evidence strength: Moderate (2 incidents)

Incident	How DP-1.1 Helps
INC-04: Perplexity data exfil	Classifies browsing session data and prevents cross-boundary transfer
INC-06: Samsung code leak	Classifies proprietary code as confidential and blocks external transmission

DP-1.3 — Memory Isolation¶

Evidence strength: Moderate (1 incident)

Incident	How DP-1.3 Helps
INC-06: Samsung code leak	Prevents context leakage between agent sessions — data shared with one session doesn't persist to others

DP-2.1 — DLP on Message Bus¶

Evidence strength: Strong (3 incidents)

Incident	How DP-2.1 Helps
INC-04: Perplexity data exfil	Detects sensitive data (PII, credentials, internal URLs) in inter-agent messages
INC-06: Samsung code leak	Detects code patterns in outbound messages to external providers
INC-07: Morris II worm	Detects anomalous content patterns in inter-agent communication — worm payloads differ from normal message patterns

DP-2.2 — RAG Integrity with Freshness¶

Evidence strength: Moderate (1 incident)

Incident	How DP-2.2 Helps
INC-05: PoisonedRAG	Validates document provenance and freshness metadata — poisoned documents without valid provenance are flagged

Execution Control¶

EC-1.1 — Human Approval for Write Operations¶

Evidence strength: Strong (3 incidents)

Incident	How EC-1.1 Helps
INC-01: Auto-GPT crypto transfer	Human confirms all financial transactions — injection succeeds but transfer is blocked pending human approval
INC-06: Samsung code leak	Human reviews outbound data transfers — code submission to external AI caught at review step
INC-09: Banking AI fraud	All financial transactions require human confirmation — the most basic control for the most damaging outcome

Why this is non-negotiable for high-risk operations: Three separate incidents across different domains (crypto, code IP, banking) would have been prevented by this single control. If your AI system can take actions with financial or legal consequences, human approval for writes is the minimum.

EC-1.2 — Tool Allow-Lists¶

Evidence strength: Moderate (1 incident)

Incident	How EC-1.2 Helps
INC-01: Auto-GPT crypto transfer	Restricts wallet operations to explicitly approved task types — even if the agent is instructed to transfer, the tool isn't available for that task context

EC-1.4 — Blast Radius Caps¶

Evidence strength: Moderate (2 incidents)

Incident	How EC-1.4 Helps
INC-02: Copilot RCE	Limits scope of file modifications per agent — even if injection succeeds, the agent can only modify files within its scope
INC-04: Perplexity data exfil	Limits browsing agent's access to user session data — exfiltration scope is contained

EC-1.5 — Interaction Timeout¶

Evidence strength: Moderate (1 incident)

Incident	How EC-1.5 Helps
INC-07: Morris II worm	Caps propagation chains at maximum turn count — worm can't replicate indefinitely because agent interactions are bounded

EC-2.5 — LLM-as-Judge Gate¶

Evidence strength: Strong (5 incidents)

The second most broadly validated control after input guardrails.

Incident	How EC-2.5 Helps
INC-01: Auto-GPT crypto transfer	Independent evaluation flags unauthorised financial action before execution
INC-02: Copilot RCE	Flags settings.json modification as high-risk action inconsistent with coding task
INC-05: PoisonedRAG	Evaluates output quality and detects claims inconsistent with known facts (partial — depends on Judge's knowledge)
INC-09: Banking AI fraud	Independent evaluation of transaction legitimacy before execution
INC-10: JudgeDeceiver	This incident attacks the Judge — but hardened Judge with multiple criteria reduces single-point manipulation

Important caveat: INC-10 demonstrates that the Judge itself can be attacked. EC-2.5 is critical but not sufficient alone — it needs to be paired with model diversity (PG-2.9), independent observability (OB-3.1), and for highest-risk decisions, multi-judge consensus (EC-3.1).

EC-2.6 — Decision Commit Protocol¶

Evidence strength: Moderate (1 incident)

Incident	How EC-2.6 Helps
INC-09: Banking AI fraud	Committed transaction decisions cannot be reversed without human authorisation — prevents rapid-fire fraudulent transfers

EC-3.1 — Multi-Judge Consensus¶

Evidence strength: Moderate (1 incident)

Incident	How EC-3.1 Helps
INC-10: JudgeDeceiver	Multiple independent judges for high-risk decisions — attacker must bypass all judges simultaneously, which requires optimising against multiple unknown models

Observability¶

OB-2.1 — Anomaly Scoring¶

Evidence strength: Moderate (2 incidents)

Incident	How OB-2.1 Helps
INC-04: Perplexity data exfil	Flags unusual data transfer patterns — browsing agent suddenly exfiltrating session data is anomalous
INC-09: Banking AI fraud	Flags unusual transaction patterns — rapid or large transfers that deviate from baseline

OB-3.1 — Independent Observability Agent¶

Evidence strength: Strong (3 incidents)

Incident	How OB-3.1 Helps
INC-07: Morris II worm	Detects anomalous communication patterns across the entire system — worm propagation creates observable spikes in inter-agent traffic
INC-09: Banking AI fraud	Separate monitoring agent with own model detects patterns invisible to individual agents
INC-10: JudgeDeceiver	Cross-checks Judge decisions using independent model — catches Judge manipulation that individual agents can't detect

OB-3.2 — Circuit Breaker¶

Evidence strength: Moderate (1 incident)

Incident	How OB-3.2 Helps
INC-07: Morris II worm	Kill switch terminates all agent communication — stops worm propagation system-wide when detected

Supply Chain¶

SC-1.1 — Component Inventory (AIBOM)¶

Evidence strength: Moderate (1 incident)

Incident	How SC-1.1 Helps
INC-03: Cursor IDE RCE	Tracks all configuration sources and detects unauthorised changes — the path traversal attack modifies a tracked component

SC-1.2 — Signed Tool Manifests¶

Evidence strength: Moderate (1 incident)

Incident	How SC-1.2 Helps
INC-08: MCP supply chain	Verifies MCP server integrity before connection — unsigned or tampered servers are rejected

SC-2.1 — AIBOM with Provider Mapping¶

Evidence strength: Moderate (1 incident)

Incident	How SC-2.1 Helps
INC-06: Samsung code leak	Maps which data reaches which external provider — reveals the data exposure before it occurs

SC-2.2 — MCP Server Vetting¶

Evidence strength: Moderate (1 incident)

Incident	How SC-2.2 Helps
INC-08: MCP supply chain	Pre-approves MCP servers through a vetting process — denies connection to unsigned or unvetted servers

SC-2.3 — Runtime Component Audit¶

Evidence strength: Moderate (1 incident)

Incident	How SC-2.3 Helps
INC-08: MCP supply chain	Continuous verification of active MCP connections — detects if a previously vetted server has been compromised or swapped

Validation Coverage Map¶

By MASO Domain¶

Domain	Controls Validated	Total Controls	Coverage
Prompt, Goal & Epistemic Integrity	10	20	50%
Identity & Access	2	~12	~17%
Data Protection	5	~12	~42%
Execution Control	8	~15	~53%
Observability	3	~12	25%
Supply Chain	4	~10	40%

What's Not Yet Validated¶

Controls in these categories are based on threat modelling and architectural reasoning, not observed incidents:

Epistemic integrity (PG-2.4 consensus diversity gate, PG-2.8 assumption isolation) — These address the non-adversarial failure modes that emerge from multi-agent interaction. No public incident reports exist because organisations either aren't detecting them or aren't disclosing them. The threat model is strong (Emerging Threats ET-02, ET-05), but the evidence is research-based, not incident-based.
Advanced Identity & Access (zero-trust agent credentials, non-human identity lifecycle) — These extend standard NHI patterns to AI agents. The patterns are proven in traditional service-to-service authentication; the extension to AI agents is logical but not yet documented in public incidents.
Tier 3 autonomous controls (self-healing PACE, adversarial testing suites, independent kill switch) — These are designed for fully autonomous multi-agent systems, which are still rare in production. The controls are architecturally sound but won't be incident-validated until autonomous systems are common enough to be attacked.

How This Page Evolves¶

This is a living document. As new AI security incidents are publicly disclosed:

They're added to the Incident Tracker
The control mappings on this page are updated
Controls that were "threat-modelled only" may be upgraded to "incident-validated"
New controls may be added if incidents reveal gaps

If you know of a public AI security incident not listed here, open an issue. We'll map it to controls and update both pages.

AI Runtime Behaviour Security, 2026 (Jonathan Gill).