The Gap

You've seen the failure. Now let's understand why the standard security toolkit doesn't catch it.

What most organisations deploy today

A typical AI security posture in 2026 looks like this:

| Control | What it does | What it catches |
| --- | --- | --- |
| Input guardrails | Block prompt injection, jailbreaks, toxic inputs | Adversarial inputs |
| Output guardrails | Block harmful, off-topic, or policy-violating outputs | Bad outputs |
| Content filtering | Flag PII, credentials, restricted content in outputs | Data leakage |
| Logging & audit | Record all inputs, outputs, and metadata | Forensic review (after the fact) |
| Rate limiting | Prevent abuse and cost overruns | Resource exhaustion |
| Model evaluation | Benchmark accuracy, bias, and capability pre-deployment | Known weaknesses before production |

These controls are necessary. They catch real attacks and real failures. But they share a common assumption:

The hidden assumption

Every control above evaluates the agent's input or output in isolation. None of them verify that the reasoning process between input and output was based on complete, current, and relevant information.

Applying each control to Phantom Compliance

Input guardrails: Agent B received a well-formed input from Agent A. No injection, no jailbreak, no toxicity. Pass.

Output guardrails: Agent B produced a well-formatted compliance assessment. No blocked patterns, no hallucination flags on the text. Pass.

Content filtering: No PII, no credentials, no restricted content in the output. Pass.

Logging: The full transcript was recorded. The logs show a normal pipeline run. The partial retrieval is technically visible in the raw token data, but no alert triggers because no one defined "retrieval completeness" as a monitoring dimension. Pass (the failure is logged but invisible).

Rate limiting: Normal request volume. Pass.

Model evaluation: The model was evaluated before deployment. It performed well on compliance checking benchmarks. But benchmarks test the model with complete inputs. They don't test what happens when retrieval returns partial data under context pressure. Pass (the eval didn't cover this case).
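The logging gap in particular is closable once "retrieval completeness" is defined as a monitoring dimension. As a minimal sketch, assume each retrieval log record carries two hypothetical fields: how many items the store reported as matching, and how many were actually handed to the agent (the field names and numbers here are illustrative, not from any real logging schema):

```python
def completeness_alerts(log_records, threshold=1.0):
    """Flag retrieval steps that returned fewer items than matched.

    Each record is assumed to carry 'matched' (items the store reports
    as relevant) and 'returned' (items actually handed to the agent).
    """
    alerts = []
    for rec in log_records:
        if rec["matched"] == 0:
            continue  # nothing to retrieve; completeness is undefined
        coverage = rec["returned"] / rec["matched"]
        if coverage < threshold:
            alerts.append({"step": rec["step"], "coverage": coverage})
    return alerts

records = [
    {"step": "agent_b_retrieval", "matched": 412, "returned": 71},
    {"step": "agent_a_retrieval", "matched": 9, "returned": 9},
]
print(completeness_alerts(records))  # flags only agent_b_retrieval
```

With a rule like this in place, the partial retrieval in the Phantom Compliance run would have raised an alert from data the logs already contained.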


The three gaps

Gap 1: No verification of reasoning inputs

Current controls verify what goes into the agent (the user or upstream prompt) and what comes out (the response). But they don't verify the intermediate data the agent retrieves or generates during its reasoning process.

In the Phantom Compliance case, Agent B's retrieval of a partial securities list was an intermediate step. No control examined whether that retrieval was complete.
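One way to narrow this gap is to gate the intermediate step itself: check the retrieval result before the agent reasons over it, instead of only checking the final output. The result shape and the exception below are hypothetical illustrations, not part of any real framework:

```python
class RetrievalIncomplete(Exception):
    """Raised when a retrieval result is silently truncated."""

def require_complete(result, min_coverage=0.99):
    """Refuse to let an agent reason over an incomplete result set."""
    expected = result["total_matches"]   # what the store says exists
    got = len(result["documents"])       # what was actually returned
    if expected and got / expected < min_coverage:
        raise RetrievalIncomplete(
            f"retrieved {got} of {expected} matching documents"
        )
    return result["documents"]

# A truncated retrieval now fails loudly instead of passing silently.
partial = {"total_matches": 412, "documents": ["doc"] * 71}
try:
    require_complete(partial)
except RetrievalIncomplete as e:
    print(e)  # retrieved 71 of 412 matching documents
```

The design point is where the check runs: inside the reasoning process, at the moment the intermediate data is produced, rather than at the chain's edges.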

Gap 2: No cross-agent verification

Each agent is monitored independently. Agent A's output is checked. Agent B's output is checked. Agent C's output is checked. But nobody asks:

  • Is Agent B's output consistent with what a complete check would produce?
  • Does Agent C have enough information to verify Agent B's claims?
  • Did the chain as a whole maintain integrity, or did it silently degrade?

This is the difference between monitoring agents and monitoring chains.
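Monitoring a chain means carrying provenance forward across handoffs and checking an invariant over the whole sequence. A sketch under one simplifying assumption (each agent records, in hypothetical fields, how many items it produced and how many upstream items it actually consumed):

```python
def verify_chain(handoffs):
    """Check each handoff for silent narrowing: a downstream agent
    consuming fewer items than its upstream agent produced."""
    issues = []
    for upstream, downstream in zip(handoffs, handoffs[1:]):
        if downstream["items_consumed"] < upstream["items_produced"]:
            issues.append(
                f"{downstream['agent']} consumed "
                f"{downstream['items_consumed']} of "
                f"{upstream['items_produced']} items from {upstream['agent']}"
            )
    return issues

chain = [
    {"agent": "A", "items_produced": 412, "items_consumed": 412},
    {"agent": "B", "items_produced": 71, "items_consumed": 71},
    {"agent": "C", "items_produced": 1, "items_consumed": 71},
]
print(verify_chain(chain))  # the A-to-B handoff is flagged
```

Note that every individual agent in this chain would pass its own output check; only the chain-level view exposes where integrity degraded.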

Gap 3: No distinction between "looks right" and "is right"

Guardrails and output quality checks evaluate plausibility: does the output look like a reasonable response? In a multi-agent chain, plausibility is necessary but not sufficient. An output can be plausible, internally consistent, and confidently stated while being based on incomplete data.

The missing capability is epistemic verification: confirming not just that the output looks like a correct compliance check, but that the compliance check was actually performed against complete data.
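The distinction can be made concrete. A plausibility check inspects the output text; an epistemic check inspects the evidence behind it. Both functions below are illustrative sketches (the evidence fields are hypothetical):

```python
import re

def looks_right(report: str) -> bool:
    """Plausibility: does the text resemble a compliance assessment?"""
    return bool(re.search(r"compliant|no restricted securities", report, re.I))

def is_right(report: str, evidence: dict, corpus_size: int) -> bool:
    """Epistemic: was the assessment computed over complete data?"""
    return looks_right(report) and evidence["securities_checked"] == corpus_size

report = "Assessment: portfolio is compliant; no restricted securities found."
evidence = {"securities_checked": 71}  # retrieval only covered 71 of 412

print(looks_right(report))              # True: fluent and well-formed
print(is_right(report, evidence, 412))  # False: assessment was incomplete
```

Output guardrails implement something like the first function; the gap is that nothing in the standard toolkit implements the second.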


The cost of these gaps

In the Phantom Compliance scenario, the cost was a regulatory violation discovered three days later. But the same structural failure (agents acting on incomplete reasoning inputs, with downstream agents trusting upstream outputs without verification) can manifest as:

  • A customer-facing agent giving medical information based on a partial retrieval of contraindication data
  • A code generation agent that passes security review because its security-checking agent only evaluated a subset of the generated code
  • A procurement agent that approves a vendor because its due diligence agent's search returned truncated results
  • An autonomous research agent that draws conclusions from incomplete literature retrieval and passes those conclusions to a decision-making agent

The pattern is always the same: the output looks correct, the logs look clean, and the failure is only discovered when reality doesn't match what the system said.


The gap in one sentence: Current AI security controls verify that agents behave correctly but not that agents reason correctly, and in multi-agent systems, the distinction is the entire attack surface.


Where to go from here

You now understand the threat and the gap. The next step depends on your role.

Security Architects

You design and integrate security controls into systems

Your thread: threat model → MASO control domains → three-layer architecture → implementation patterns.

Start the Security Architects track →

Risk & Governance

You own risk frameworks, compliance, and oversight obligations

Your thread: threat model → why governance doesn't cover agent chains → MASO as extension layer → oversight obligations.

Start the Risk & Governance track →

Engineering Leads

You build and operate AI systems in production

Your thread: threat model → what breaks in practice → runtime vs design-time controls → instrumentation.

Start the Engineering Leads track →