What Went Wrong

Let's dissect the Phantom Compliance incident. Not the surface failure (the trade that should have been flagged) but the structural failure that made it invisible.

Three layers of failure

Layer 1: Incomplete retrieval without error

Agent B retrieved a partial restricted securities list. This isn't a hallucination. The agent genuinely accessed the data source and genuinely processed what it received. The problem is that LLMs don't have a built-in mechanism to detect completeness of retrieved data.

When you or I open a spreadsheet and see it ends at row 340 when we expected 429, we notice. An LLM processes whatever tokens are in its context window. It has no expectation of "429 rows"; it simply works with what's there.

This is not a retrieval-augmented generation (RAG) problem

It's tempting to categorise this as a RAG failure and move on. But RAG quality improvements won't solve it. Even with perfect retrieval infrastructure, an LLM under context pressure can receive truncated results and process them without flagging incompleteness. The failure is in the absence of a completeness verification step, not in the retrieval mechanism itself.
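The missing completeness verification step can be made concrete. Below is a minimal sketch, not anything from the incident itself: it assumes a hypothetical `RetrievalResult` wrapper in which the data source reports its own row count as metadata, and refuses to hand rows to the agent when that count doesn't match what arrived.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RetrievalResult:
    """Hypothetical wrapper pairing retrieved rows with the source's own row count."""
    rows: list
    expected_count: Optional[int]  # reported by the source's metadata, if available

def verify_completeness(result: RetrievalResult) -> list:
    """Refuse to pass rows downstream unless the count matches what the source reported."""
    if result.expected_count is None:
        raise ValueError("source reported no expected count; completeness is unverifiable")
    if len(result.rows) != result.expected_count:
        raise ValueError(
            f"incomplete retrieval: got {len(result.rows)} rows, "
            f"expected {result.expected_count}"
        )
    return result.rows
```

The point of the design is that truncation becomes a hard error before the LLM ever sees the data, rather than a silent property of whatever landed in the context window.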

Layer 2: Confident output from incomplete reasoning

Agent B's output stated it checked the restricted securities list. This is technically true: it did check the data it had. But the output carried the same confidence as if it had checked the complete list. There was no uncertainty signal, no caveat, no indication that the check was partial.

This is a fundamental property of how LLMs produce output: confidence is a function of pattern coherence, not of reasoning completeness. The output "no match found" is just as fluently generated whether the agent checked 340 entries or 429.
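One mitigation is to make coverage part of the output itself, so a partial check cannot read like a full one. The sketch below is illustrative, not the incident's actual tooling; it assumes the restricted list's expected size is known (for instance from the metadata check above).

```python
def check_restricted_list(entries: list, expected_count: int, ticker: str) -> dict:
    """Return a verdict that carries its own coverage, so "no match found"
    over 340 entries is visibly different from the same verdict over 429."""
    coverage = len(entries) / expected_count
    verdict = "match" if ticker in entries else "no match found"
    if coverage < 1.0:
        verdict += f" [PARTIAL: {len(entries)} of {expected_count} entries checked]"
    return {"verdict": verdict, "coverage": coverage}
```

With this shape, the fluency of the text no longer determines how much trust the verdict earns; the `coverage` field does.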

Layer 3: Downstream agents trust upstream outputs

Agent C received Agent B's compliance assessment and treated it as authoritative. It had no mechanism to:

  • Verify that Agent B's check was complete
  • Assess whether Agent B's confidence was warranted
  • Request evidence of the check beyond the summary output

This is the chain-of-trust problem in multi-agent systems. Each agent assumes the agent before it did its job correctly. In a single-agent system, you can evaluate the output against the input. In a chain, you need to verify that every link's reasoning basis was sound, not just that every link's output format was correct.
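A downstream gate along these lines could look like the following sketch. The field names (`rows_checked`, `rows_expected`, `source_version`) are assumptions for illustration; the idea is only that Agent C demands evidence of the check rather than accepting the summary verdict on its face.

```python
def accept_upstream(assessment: dict) -> bool:
    """Trust an upstream verdict only when it ships verifiable evidence
    that the underlying check was complete."""
    evidence = assessment.get("evidence", {})
    required = {"rows_checked", "rows_expected", "source_version"}
    if not required <= set(evidence):
        return False  # no evidence attached: escalate, don't trust the summary
    return evidence["rows_checked"] == evidence["rows_expected"]
```

Under this rule, Agent B's 340-of-429 check is rejected at the handoff instead of being amplified into an approval.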


Why it's hard to catch

This failure doesn't look like a failure from any single vantage point:

| Perspective | What it sees | Why it looks fine |
|---|---|---|
| Agent A | Trade analysis | Completed correctly, no issue |
| Agent B | Compliance check | Processed the data it had; output is internally consistent |
| Agent C | Decision synthesis | Both inputs are well-formed; recommendation follows logically |
| Guardrails | Output patterns | No blocked content, no format violations |
| Logging | Full transcript | Every step is recorded; the logs look clean |
| Human reviewer | Final decision | "Approved" with supporting rationale, nothing to flag |

The failure is only visible if you ask a question that nobody in this pipeline is asking:

"Was Agent B's reasoning based on complete and current data?"


The generalised problem

The Phantom Compliance scenario illustrates a class of failures, not a one-off bug:

  • An agent can do its job correctly with the wrong inputs and produce output that looks identical to output produced with the right inputs.
  • Downstream agents amplify the failure by building on the flawed output without independent verification.
  • Conventional monitoring watches the wrong signals: output format, latency, token counts, guardrail violations. None of these detect incomplete reasoning inputs.

This class of failure becomes more dangerous as agent chains get longer, as agents delegate to other agents, and as the gap between "what the agent checked" and "what the agent said it checked" becomes harder for any single observer to verify.
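To monitor for this class of failure, the audit has to inspect each step's recorded input basis rather than its output. A minimal sketch, assuming each step in the transcript carries a hypothetical `input_basis` record of what it actually saw versus what existed:

```python
def audit_chain(steps: list) -> list:
    """Flag steps whose recorded input basis was incomplete or missing,
    regardless of whether their outputs were well-formed."""
    findings = []
    for step in steps:
        basis = step.get("input_basis")
        if basis is None:
            findings.append((step["agent"], "no recorded input basis"))
        elif basis.get("rows_seen") != basis.get("rows_expected"):
            findings.append((step["agent"], "incomplete input basis"))
    return findings
```

Run over the Phantom Compliance transcript, a check like this would surface Agent B's 340-of-429 gap even though every output in the chain passed format and guardrail checks.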


Key takeaway: The problem isn't that the system produced wrong output. The problem is that the system produced wrong output that was indistinguishable from correct output at every monitoring point. The failure was in the epistemic basis of the reasoning, not in the reasoning itself.


Next: The Gap, why current controls don't catch this →