What Went Wrong

Let's dissect the Phantom Compliance incident. Not the surface failure (the trade that should have been flagged) but the structural failure that made it invisible.

Three layers of failure

Layer 1: Incomplete retrieval without error

Agent B retrieved a partial restricted securities list. This isn't a hallucination. The agent genuinely accessed the data source and genuinely processed what it received. The problem is that LLMs don't have a built-in mechanism to detect completeness of retrieved data.

When you or I open a spreadsheet and see it ends at row 340 when we expected 429, we notice. An LLM processes whatever tokens are in its context window. It has no expectation of "429 rows"; it simply works with what's there.

This is not a retrieval-augmented generation (RAG) problem

It's tempting to categorise this as a RAG failure and move on. But RAG quality improvements won't solve it. Even with perfect retrieval infrastructure, an LLM under context pressure can receive truncated results and process them without flagging incompleteness. The failure is in the absence of a completeness verification step, not in the retrieval mechanism itself.
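The missing completeness verification step can be made concrete. Below is a minimal sketch, not anything from the incident itself: it assumes a hypothetical `RetrievalResult` wrapper in which the data source reports its own row count as metadata, and refuses to hand rows to the agent when that count doesn't match what arrived.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RetrievalResult:
    """Hypothetical wrapper pairing retrieved rows with the source's own row count."""
    rows: list
    expected_count: Optional[int]  # reported by the source's metadata, if available

def verify_completeness(result: RetrievalResult) -> list:
    """Refuse to pass rows downstream unless the count matches what the source reported."""
    if result.expected_count is None:
        raise ValueError("source reported no expected count; completeness is unverifiable")
    if len(result.rows) != result.expected_count:
        raise ValueError(
            f"incomplete retrieval: got {len(result.rows)} rows, "
            f"expected {result.expected_count}"
        )
    return result.rows
```

The point of the design is that truncation becomes a hard error before the LLM ever sees the data, rather than a silent property of whatever landed in the context window.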

Layer 2: Confident output from incomplete reasoning

Agent B's output stated it checked the restricted securities list. This is technically true: it did check the data it had. But the output carried the same confidence as if it had checked the complete list. There was no uncertainty signal, no caveat, no indication that the check was partial.

This is a fundamental property of how LLMs produce output: confidence is a function of pattern coherence, not of reasoning completeness. The output "no match found" is just as fluently generated whether the agent checked 340 entries or 429.
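One mitigation is to make coverage part of the output itself, so a partial check cannot read like a full one. The sketch below is illustrative, not the incident's actual tooling; it assumes the restricted list's expected size is known (for instance from the metadata check above).

```python
def check_restricted_list(entries: list, expected_count: int, ticker: str) -> dict:
    """Return a verdict that carries its own coverage, so "no match found"
    over 340 entries is visibly different from the same verdict over 429."""
    coverage = len(entries) / expected_count
    verdict = "match" if ticker in entries else "no match found"
    if coverage < 1.0:
        verdict += f" [PARTIAL: {len(entries)} of {expected_count} entries checked]"
    return {"verdict": verdict, "coverage": coverage}
```

With this shape, the fluency of the text no longer determines how much trust the verdict earns; the `coverage` field does.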

Layer 3: Downstream agents trust upstream outputs

Agent C received Agent B's compliance assessment and treated it as authoritative. It had no mechanism to:

  • Verify that Agent B's check was complete
  • Assess whether Agent B's confidence was warranted
  • Request evidence of the check beyond the summary output

This is the chain-of-trust problem in multi-agent systems. Each agent assumes the agent before it did its job correctly. In a single-agent system, you can evaluate the output against the input. In a chain, you need to verify that every link's reasoning basis was sound, not just that every link's output format was correct.
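A downstream gate along these lines could look like the following sketch. The field names (`rows_checked`, `rows_expected`, `source_version`) are assumptions for illustration; the idea is only that Agent C demands evidence of the check rather than accepting the summary verdict on its face.

```python
def accept_upstream(assessment: dict) -> bool:
    """Trust an upstream verdict only when it ships verifiable evidence
    that the underlying check was complete."""
    evidence = assessment.get("evidence", {})
    required = {"rows_checked", "rows_expected", "source_version"}
    if not required <= set(evidence):
        return False  # no evidence attached: escalate, don't trust the summary
    return evidence["rows_checked"] == evidence["rows_expected"]
```

Under this rule, Agent B's 340-of-429 check is rejected at the handoff instead of being amplified into an approval.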


Why it's hard to catch

This failure doesn't look like a failure from any single vantage point:

| Perspective | What it sees | Why it looks fine |
|---|---|---|
| Agent A | Trade analysis | Completed correctly, no issue |
| Agent B | Compliance check | Processed the data it had; output is internally consistent |
| Agent C | Decision synthesis | Both inputs are well-formed; recommendation follows logically |
| Guardrails | Output patterns | No blocked content, no format violations |
| Logging | Full transcript | Every step is recorded; the logs look clean |
| Human reviewer | Final decision | "Approved" with supporting rationale, nothing to flag |

The failure is only visible if you ask a question that nobody in this pipeline is asking:

"Was Agent B's reasoning based on complete and current data?"


The generalised problem

The Phantom Compliance scenario illustrates a class of failures, not a one-off bug:

  • An agent can do its job correctly with the wrong inputs and produce output that looks identical to output produced with the right inputs.
  • Downstream agents amplify the failure by building on the flawed output without independent verification.
  • Conventional monitoring watches the wrong signals: output format, latency, token counts, guardrail violations. None of these detect incomplete reasoning inputs.

This class of failure becomes more dangerous as agent chains get longer, as agents delegate to other agents, and as the gap between "what the agent checked" and "what the agent said it checked" becomes harder for any single observer to verify.
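To monitor for this class of failure, the audit has to inspect each step's recorded input basis rather than its output. A minimal sketch, assuming each step in the transcript carries a hypothetical `input_basis` record of what it actually saw versus what existed:

```python
def audit_chain(steps: list) -> list:
    """Flag steps whose recorded input basis was incomplete or missing,
    regardless of whether their outputs were well-formed."""
    findings = []
    for step in steps:
        basis = step.get("input_basis")
        if basis is None:
            findings.append((step["agent"], "no recorded input basis"))
        elif basis.get("rows_seen") != basis.get("rows_expected"):
            findings.append((step["agent"], "incomplete input basis"))
    return findings
```

Run over the Phantom Compliance transcript, a check like this would surface Agent B's 340-of-429 gap even though every output in the chain passed format and guardrail checks.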


Key takeaway: The problem isn't that the system produced wrong output. The problem is that the system produced wrong output that was indistinguishable from correct output at every monitoring point. The failure was in the epistemic basis of the reasoning, not in the reasoning itself.


Next: The Gap, why current controls don't catch this →