# The Scenario: Phantom Compliance
After this section, you will be able to:
- Describe how a multi-agent system can produce a correct-looking output from a broken reasoning chain
- Identify why the failure is invisible to conventional monitoring
- Explain what "epistemic integrity" means in practical terms
## The setup
It's a Tuesday morning at Meridian Capital, a mid-tier investment firm. Six months ago, they deployed an AI-assisted compliance pipeline for trade pre-clearance. The system has been working well, flagging problematic trades, reducing the compliance team's review backlog by 60%, and catching two genuine issues that humans had missed.
The pipeline has three agents:
- **Agent A (Trade Analyser)** ingests the trade details (asset class, counterparty, size, timing) and enriches them with market context from the firm's data feeds.
- **Agent B (Compliance Checker)** takes Agent A's analysis and evaluates it against the firm's regulatory obligations: sanctions lists, concentration limits, restricted securities, and internal trading policies.
- **Agent C (Decision Synthesiser)** takes the outputs from both upstream agents and produces a final recommendation: approve, flag for review, or escalate.
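The pipeline above can be sketched as a simple sequential chain, where each agent's output becomes the next agent's input. This is an illustrative skeleton only, not Meridian's actual implementation; every name here (`Trade`, `analyse_trade`, `check_compliance`, `synthesise_decision`) is hypothetical, and the agent bodies are stubs standing in for LLM calls.

```python
from dataclasses import dataclass

@dataclass
class Trade:
    asset: str
    counterparty: str
    size: int

def analyse_trade(trade: Trade) -> dict:
    # Agent A: enrich the trade with market context (stubbed here).
    return {"trade": trade, "context": "market data placeholder"}

def check_compliance(analysis: dict) -> dict:
    # Agent B: evaluate against regulatory obligations (stubbed here).
    return {"status": "CLEAR"}

def synthesise_decision(analysis: dict, compliance: dict) -> str:
    # Agent C: approve, flag, or escalate based on upstream outputs.
    return "approve" if compliance["status"] == "CLEAR" else "escalate"

def pipeline(trade: Trade) -> str:
    analysis = analyse_trade(trade)
    compliance = check_compliance(analysis)
    return synthesise_decision(analysis, compliance)
```

Note that Agent C never sees the raw restricted list; it only sees Agent B's summary. That structural fact is what makes the incident below possible.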
Every output is logged. Every agent has guardrails. The system has been through security review.
## The incident
At 09:14, a block trade request comes in: 2,400 shares of a telecommunications company, Vertex Communications, for a client account. Routine.
Agent A analyses the trade. It correctly identifies the asset, the counterparty, the market conditions. Its output is clean and well-structured.
Agent B receives Agent A's analysis. It needs to check whether Vertex Communications appears on any restricted lists. Here's what happens:
Agent B's context window is at 87% capacity from processing a batch of earlier trades. When it retrieves the restricted securities list, the retrieval is partial. It gets the list headers and the first 340 entries, but the remaining 89 entries (including Vertex Communications, added last week after a pending regulatory action) are truncated.
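This failure mode is easy to reproduce in miniature: a membership check against a silently truncated list returns a confident "no match" with no error raised. A minimal sketch, with invented list contents and the truncation point taken from the scenario (entry 340 of 429):

```python
# Hypothetical restricted list: 340 older entries, then 89 recent
# additions that include Vertex Communications.
restricted = (
    [f"COMPANY_{i}" for i in range(340)]
    + ["VERTEX_COMMUNICATIONS"]
    + [f"RECENT_{i}" for i in range(88)]
)

def retrieve_restricted_list(max_entries: int) -> list[str]:
    # Simulates a context-limited retrieval: the tail is silently dropped.
    return restricted[:max_entries]

def check_restricted(name: str, max_entries: int) -> str:
    partial = retrieve_restricted_list(max_entries)
    # No exception, no warning: the check runs against whatever was retrieved.
    return "RESTRICTED" if name in partial else "CLEAR"

# Against the full list, the name is caught.
assert check_restricted("VERTEX_COMMUNICATIONS", len(restricted)) == "RESTRICTED"
# Truncated to the first 340 entries: same code, wrong answer, no error.
assert check_restricted("VERTEX_COMMUNICATIONS", 340) == "CLEAR"
```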
Agent B doesn't fail. It doesn't error. It processes the partial list, finds no match for Vertex Communications, and returns:
```
Compliance Status: CLEAR
Vertex Communications (VTX) checked against: sanctions list (no match),
concentration limits (within threshold), restricted securities list
(no match), internal trading policy (no restrictions).
Recommendation: No compliance concerns identified.
```
Every word is grammatically correct. The format matches the expected schema. The confidence indicators are normal.
Agent C receives this output alongside Agent A's analysis. It synthesises them into an approval recommendation. The trade is approved and executed at 09:16.
## The problem
Three days later, the compliance team discovers the issue during a routine audit. Vertex Communications was on the restricted list. The trade should have been flagged. They now face a potential regulatory violation, client notification obligations, and an internal investigation.
But here's what makes this different from a simple bug:
Every individual output looked correct. Agent A's analysis was accurate. Agent B's output was well-formatted, confident, and referenced all the right check categories. Agent C's synthesis was logically sound given its inputs. The guardrails on each agent passed: no toxicity, no format violations, no hallucination flags on the output text.
The failure was in the reasoning basis. Agent B stated it checked the restricted securities list. It did check a list, just not the complete one. Its output was factually wrong but structurally perfect.
## What did the monitoring catch?
Nothing.
- Guardrails: Passed. The output contained no blocked patterns, no format violations.
- Output quality checks: Passed. The response was coherent, well-structured, and used the correct terminology.
- Latency monitoring: Normal. Agent B responded within expected time bounds.
- Token usage: Slightly elevated but within one standard deviation.
- Log review: All three agents logged their inputs and outputs. The logs showed a clean, successful pipeline run.
The only signal was buried in the token count: Agent B's retrieval context was shorter than usual. But "shorter than usual" wasn't an alert condition. No one had defined what a complete retrieval looked like.
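One way to turn "shorter than usual" into a checkable condition is for the list's owner to publish a manifest (entry count plus a content digest) alongside the list, and for the consumer to verify the manifest before running any checks. A minimal sketch of that idea, assuming the list source can expose its own size; the function names are hypothetical:

```python
import hashlib

def publish_manifest(entries: list[str]) -> dict:
    # The list owner publishes the expected count and a digest of the content.
    digest = hashlib.sha256("\n".join(entries).encode()).hexdigest()
    return {"count": len(entries), "sha256": digest}

def verify_retrieval(entries: list[str], manifest: dict) -> bool:
    # The consumer refuses to treat an incomplete or altered list as valid.
    if len(entries) != manifest["count"]:
        return False
    digest = hashlib.sha256("\n".join(entries).encode()).hexdigest()
    return digest == manifest["sha256"]

full_list = [f"ENTRY_{i}" for i in range(429)]
manifest = publish_manifest(full_list)

assert verify_retrieval(full_list, manifest)        # complete retrieval passes
assert not verify_retrieval(full_list[:340], manifest)  # truncation is caught
```

With a check like this wired into Agent B, a partial retrieval becomes a hard failure (or an alert) instead of a silent "no match".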
**The core insight:** the answer can be right and the chain can still be broken.
Agent B didn't hallucinate. It didn't refuse. It didn't produce obviously wrong output. It performed a real check against real data, just incomplete data. And nothing in the monitoring stack was designed to verify that the reasoning inputs were complete, only that the reasoning outputs looked plausible.
This is the gap that AI runtime security exists to close.
## What's next
Before we look at solutions, let's make sure the failure mode is fully understood.