3. Epistemic Integrity¶
After this module you will be able to¶
- Define epistemic integrity in the context of AI runtime security
- Explain why epistemic integrity is the foundation of multi-agent control design
- Identify where epistemic integrity breaks in the AIRS three-layer architecture
- Distinguish between output correctness and epistemic soundness
The concept¶
Epistemic integrity means that an agent's outputs are faithful to its actual reasoning inputs: what it claims to know is based on what it actually accessed, processed, and verified.
In single-agent systems, this is relatively easy to verify: you can compare the agent's output against its input and check for consistency. In multi-agent chains, it becomes the central security challenge.
The definition: An agent has epistemic integrity when its stated confidence, claims, and conclusions are warranted by the data it actually accessed, not by the data it should have accessed or the data its output implies it accessed.
Why this is the differentiator¶
The AIRS framework identifies epistemic integrity as the first MASO control domain (Prompt, Goal & Epistemic Integrity) because it's the foundation on which all other controls depend:
- Guardrails can block bad outputs, but they can't determine whether a "good" output was derived from complete data
- Model-as-Judge can evaluate output quality, but it needs to know what to check, and epistemic integrity tells it to check the reasoning basis, not just the conclusion
- Human oversight is only effective when humans know where to look, and epistemic integrity violations tell them to look at the data the agent used, not just the answer it gave
Without epistemic integrity as a design principle, the other layers of defence are checking the wrong things.
Epistemic integrity in the three-layer architecture¶
The AIRS framework uses a three-layer runtime defence:
| Layer | Speed | What it does | Epistemic integrity role |
|---|---|---|---|
| Guardrails | ~10ms | Block known-bad inputs/outputs | Can verify basic structural claims (format, completeness markers) |
| Model-as-Judge | ~500ms–5s | Evaluate unknown threats via independent model | Can assess whether an agent's claims are consistent with its stated data sources |
| Human Oversight | Minutes–hours | Resolve genuinely ambiguous cases | Can investigate the actual data accessed vs. claimed |
For a security architect, the key design question is: at which layer do you verify epistemic integrity for each agent in the chain?
The answer depends on the risk tier (from the AIRS framework):
- Tier 1 (Supervised): Guardrails verify structural completeness markers; human reviews all agent chains
- Tier 2 (Managed): Guardrails + Model-as-Judge verify reasoning-basis consistency; humans review flagged chains
- Tier 3 (Autonomous): Full epistemic integrity verification at all three layers; circuit breaker triggers on verification failure
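The tier-to-layer mapping above can be expressed as a simple routing table. This is a sketch only: the dictionary keys and layer names are illustrative assumptions for this module, not a real AIRS API.

```python
# Illustrative mapping from AIRS risk tier to where epistemic integrity
# is verified. All names here are assumptions for this sketch.
TIER_POLICY = {
    1: {"layers": ["guardrail"], "human_review": "all_chains"},
    2: {"layers": ["guardrail", "judge"], "human_review": "flagged_chains"},
    3: {"layers": ["guardrail", "judge", "human"],
        "human_review": "flagged_chains",
        "circuit_breaker_on_failure": True},
}

def verification_layers(tier: int) -> list[str]:
    """Return which layers must verify epistemic integrity at this tier."""
    return TIER_POLICY[tier]["layers"]
```

A chain orchestrator could consult this table at deployment time, so raising an agent's risk tier automatically adds verification layers rather than relying on manual reconfiguration.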
Designing for epistemic integrity¶
For a security architect, epistemic integrity translates into three architectural requirements:
1. Reasoning-basis logging¶
Every agent must log not just its output, but what data it accessed during reasoning. This means:
- RAG retrieval: Log the query, the number of results, and a completeness metric (e.g., expected vs. actual result count)
- Tool calls: Log the call, the response, and whether the response was complete
- Context window: Log the utilisation percentage and whether any input was truncated
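The three logging requirements above can be captured in a single record per agent step. A minimal sketch follows; the field names are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field
import time

@dataclass
class ReasoningBasisRecord:
    """What one agent actually accessed while reasoning (illustrative fields)."""
    agent_id: str
    rag_query: str = ""
    rag_expected: int = 0       # result count expected for the query
    rag_actual: int = 0         # results actually retrieved
    tool_calls: list = field(default_factory=list)  # (tool_name, response_complete) pairs
    context_utilisation: float = 0.0  # fraction of context window used
    input_truncated: bool = False
    timestamp: float = field(default_factory=time.time)

    def retrieval_completeness(self) -> float:
        """1.0 means the retrieval returned everything that was expected."""
        return 1.0 if self.rag_expected == 0 else self.rag_actual / self.rag_expected
```

The key design choice is that the record stores a completeness *metric*, not just the raw results, so downstream layers can verify the reasoning basis without re-running the retrieval.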
2. Cross-agent assertion verification¶
When an agent claims "I checked X", a downstream verification step should confirm:
- Did the agent actually access the data source for X?
- Was the access complete (not partial, not cached, not stale)?
- Does the output's confidence match the completeness of the check?
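A downstream verification step for these three questions might look like the following sketch. The claim and log field names (`source`, `complete`, `cached`, `age_seconds`, `completeness`, `confidence`) and the staleness threshold are assumptions for illustration, not part of the AIRS framework.

```python
MAX_STALENESS_S = 300  # assumed freshness window for compliance data

def verify_claim(claim: dict, access_log: list[dict]) -> list[str]:
    """Check an agent's 'I checked X' claim against its access log.

    Returns a list of violations; an empty list means the claim is warranted.
    """
    hits = [e for e in access_log if e["source"] == claim["source"]]
    if not hits:
        return [f"no recorded access to {claim['source']}"]
    latest = max(hits, key=lambda e: e["timestamp"])
    violations = []
    if not latest["complete"]:
        violations.append("access was partial")
    if latest.get("cached", False) or latest["age_seconds"] > MAX_STALENESS_S:
        violations.append("data was cached or stale")
    if claim["confidence"] > latest.get("completeness", 1.0):
        violations.append("stated confidence exceeds completeness of the check")
    return violations
```

Returning a list of named violations, rather than a boolean, gives the Model-as-Judge and human-oversight layers something concrete to investigate.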
3. Chain-level integrity checks¶
At the end of a multi-agent chain (or at key checkpoints within it), an independent verification should confirm:
- The chain's output is consistent with the data sources accessed by all agents
- No agent introduced claims that aren't supported by upstream data
- Confidence levels didn't inflate as data passed through the chain
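The third chain-level check, confidence inflation, can be sketched as a pass over per-agent step summaries. The step fields (`agent`, `confidence`, `new_sources`) are illustrative assumptions.

```python
def check_confidence_inflation(chain: list[dict]) -> list[str]:
    """Flag hops where stated confidence rose without any new data access.

    Each step is a dict with illustrative fields: agent, confidence, new_sources.
    """
    flags = []
    for prev, cur in zip(chain, chain[1:]):
        if cur["confidence"] > prev["confidence"] and cur["new_sources"] == 0:
            flags.append(
                f"{cur['agent']}: confidence rose from {prev['confidence']:.2f} "
                f"to {cur['confidence']:.2f} with no new data accessed"
            )
    return flags
```

A rise in confidence is only suspicious when no new evidence entered the chain at that hop, which is why the check compares adjacent steps rather than just the endpoints.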
Epistemic integrity vs. hallucination detection¶
It's important to distinguish epistemic integrity from hallucination detection:
| | Hallucination detection | Epistemic integrity |
|---|---|---|
| What it checks | Is the output factually correct? | Was the reasoning based on complete and current data? |
| Where it looks | The output text | The reasoning process and data access |
| What it catches | Made-up facts | Correct-looking conclusions from incomplete data |
| Phantom Compliance | Wouldn't catch it (output is well-formed and internally consistent) | Would catch it (Agent B's retrieval was incomplete) |
Hallucination detection is a subset of epistemic integrity, not a replacement for it.
Reflection
Think about the Phantom Compliance scenario. At which layer of the three-layer architecture would you place the epistemic integrity check for Agent B? What would that check look like?
Consider
A guardrail-level check could verify the retrieval result count against an expected minimum. A Model-as-Judge check could compare Agent B's "checked against restricted securities list" claim against the actual retrieval metadata. Which gives you the right balance of speed and depth for a compliance-critical pipeline?
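The guardrail-level option described above is fast precisely because it is so simple. A minimal sketch, assuming a pipeline-specific threshold (the constant below is a hypothetical value, not from the scenario):

```python
# Assumed threshold: e.g. the known size of the restricted securities list.
EXPECTED_MIN_RESULTS = 25

def guardrail_retrieval_complete(actual_results: int,
                                 expected_min: int = EXPECTED_MIN_RESULTS) -> bool:
    """Fast (~ms) structural check: did the retrieval return enough rows?"""
    return actual_results >= expected_min
```

This catches gross incompleteness (an empty or truncated retrieval) at guardrail speed, while the deeper claim-vs-metadata comparison is left to the slower Model-as-Judge layer.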