3. Epistemic Integrity¶
After this module you will be able to¶
- Define epistemic integrity in the context of AI runtime security
- Explain why epistemic integrity is the foundation of multi-agent control design
- Identify where epistemic integrity breaks in the AIRS three-layer architecture
- Distinguish between output correctness and epistemic soundness
The concept¶
Epistemic integrity means that an agent's outputs are faithful to its actual reasoning inputs: what it claims to know is based on what it actually accessed, processed, and verified.
In single-agent systems, this is relatively easy to verify: you can compare the agent's output against its input and check for consistency. In multi-agent chains, it becomes the central security challenge.
The definition: An agent has epistemic integrity when its stated confidence, claims, and conclusions are warranted by the data it actually accessed, not by the data it should have accessed or the data its output implies it accessed.
Why this is the differentiator¶
The AIRS framework identifies epistemic integrity as the first MASO control domain (Prompt, Goal & Epistemic Integrity) because it's the foundation on which all other controls depend:
- Guardrails can block bad outputs, but they can't determine whether a "good" output was derived from complete data
- Model-as-Judge can evaluate output quality, but it needs to know what to check, and epistemic integrity tells it to check the reasoning basis, not just the conclusion
- Human oversight is only effective when humans know where to look, and epistemic integrity violations tell them to look at the data the agent used, not just the answer it gave
Without epistemic integrity as a design principle, the other layers of defence are checking the wrong things.
Epistemic integrity in the three-layer architecture¶
The AIRS framework uses a three-layer runtime defence:
| Layer | Speed | What it does | Epistemic integrity role |
|---|---|---|---|
| Guardrails | ~10ms | Block known-bad inputs/outputs | Can verify basic structural claims (format, completeness markers) |
| Model-as-Judge | ~500ms–5s | Evaluate unknown threats via independent model | Can assess whether an agent's claims are consistent with its stated data sources |
| Human Oversight | Minutes–hours | Resolve genuinely ambiguous cases | Can investigate the actual data accessed vs. claimed |
For a security architect, the key design question is: at which layer do you verify epistemic integrity for each agent in the chain?
The answer depends on the risk tier (from the AIRS framework):
- Tier 1 (Supervised): Guardrails verify structural completeness markers; human reviews all agent chains
- Tier 2 (Managed): Guardrails + Model-as-Judge verify reasoning-basis consistency; humans review flagged chains
- Tier 3 (Autonomous): Full epistemic integrity verification at all three layers; circuit breaker triggers on verification failure
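The tier-to-layer mapping above can be expressed as a simple routing table. This is a sketch only: the dictionary keys and layer names are illustrative assumptions for this module, not a real AIRS API.

```python
# Illustrative mapping from AIRS risk tier to where epistemic integrity
# is verified. All names here are assumptions for this sketch.
TIER_POLICY = {
    1: {"layers": ["guardrail"], "human_review": "all_chains"},
    2: {"layers": ["guardrail", "judge"], "human_review": "flagged_chains"},
    3: {"layers": ["guardrail", "judge", "human"],
        "human_review": "flagged_chains",
        "circuit_breaker_on_failure": True},
}

def verification_layers(tier: int) -> list[str]:
    """Return which layers must verify epistemic integrity at this tier."""
    return TIER_POLICY[tier]["layers"]
```

A chain orchestrator could consult this table at deployment time, so raising an agent's risk tier automatically adds verification layers rather than relying on manual reconfiguration.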
Designing for epistemic integrity¶
For a security architect, epistemic integrity translates into three architectural requirements:
1. Reasoning-basis logging¶
Every agent must log not just its output, but what data it accessed during reasoning. This means:
- RAG retrieval: Log the query, the number of results, and a completeness metric (e.g., expected vs. actual result count)
- Tool calls: Log the call, the response, and whether the response was complete
- Context window: Log the utilisation percentage and whether any input was truncated
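The three logging requirements above can be captured in a single record per agent step. A minimal sketch follows; the field names are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field
import time

@dataclass
class ReasoningBasisRecord:
    """What one agent actually accessed while reasoning (illustrative fields)."""
    agent_id: str
    rag_query: str = ""
    rag_expected: int = 0       # result count expected for the query
    rag_actual: int = 0         # results actually retrieved
    tool_calls: list = field(default_factory=list)  # (tool_name, response_complete) pairs
    context_utilisation: float = 0.0  # fraction of context window used
    input_truncated: bool = False
    timestamp: float = field(default_factory=time.time)

    def retrieval_completeness(self) -> float:
        """1.0 means the retrieval returned everything that was expected."""
        return 1.0 if self.rag_expected == 0 else self.rag_actual / self.rag_expected
```

The key design choice is that the record stores a completeness *metric*, not just the raw results, so downstream layers can verify the reasoning basis without re-running the retrieval.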
2. Cross-agent assertion verification¶
When an agent claims "I checked X", a downstream verification step should confirm:
- Did the agent actually access the data source for X?
- Was the access complete (not partial, not cached, not stale)?
- Does the output's confidence match the completeness of the check?
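A downstream verification step for these three questions might look like the following sketch. The claim and log field names (`source`, `complete`, `cached`, `age_seconds`, `completeness`, `confidence`) and the staleness threshold are assumptions for illustration, not part of the AIRS framework.

```python
MAX_STALENESS_S = 300  # assumed freshness window for compliance data

def verify_claim(claim: dict, access_log: list[dict]) -> list[str]:
    """Check an agent's 'I checked X' claim against its access log.

    Returns a list of violations; an empty list means the claim is warranted.
    """
    hits = [e for e in access_log if e["source"] == claim["source"]]
    if not hits:
        return [f"no recorded access to {claim['source']}"]
    latest = max(hits, key=lambda e: e["timestamp"])
    violations = []
    if not latest["complete"]:
        violations.append("access was partial")
    if latest.get("cached", False) or latest["age_seconds"] > MAX_STALENESS_S:
        violations.append("data was cached or stale")
    if claim["confidence"] > latest.get("completeness", 1.0):
        violations.append("stated confidence exceeds completeness of the check")
    return violations
```

Returning a list of named violations, rather than a boolean, gives the Model-as-Judge and human-oversight layers something concrete to investigate.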
3. Chain-level integrity checks¶
At the end of a multi-agent chain (or at key checkpoints within it), an independent verification should confirm:
- The chain's output is consistent with the data sources accessed by all agents
- No agent introduced claims that aren't supported by upstream data
- Confidence levels didn't inflate as data passed through the chain
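The third chain-level check, confidence inflation, can be sketched as a pass over per-agent step summaries. The step fields (`agent`, `confidence`, `new_sources`) are illustrative assumptions.

```python
def check_confidence_inflation(chain: list[dict]) -> list[str]:
    """Flag hops where stated confidence rose without any new data access.

    Each step is a dict with illustrative fields: agent, confidence, new_sources.
    """
    flags = []
    for prev, cur in zip(chain, chain[1:]):
        if cur["confidence"] > prev["confidence"] and cur["new_sources"] == 0:
            flags.append(
                f"{cur['agent']}: confidence rose from {prev['confidence']:.2f} "
                f"to {cur['confidence']:.2f} with no new data accessed"
            )
    return flags
```

A rise in confidence is only suspicious when no new evidence entered the chain at that hop, which is why the check compares adjacent steps rather than just the endpoints.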
Epistemic integrity vs. hallucination detection¶
It's important to distinguish epistemic integrity from hallucination detection:
| | Hallucination detection | Epistemic integrity |
|---|---|---|
| What it checks | Is the output factually correct? | Was the reasoning based on complete and current data? |
| Where it looks | The output text | The reasoning process and data access |
| What it catches | Made-up facts | Correct-looking conclusions from incomplete data |
| Phantom Compliance | Wouldn't catch it (output is well-formed and internally consistent) | Would catch it (Agent B's retrieval was incomplete) |
Hallucination detection is a subset of epistemic integrity, not a replacement for it.
Reflection
Think about the Phantom Compliance scenario. At which layer of the three-layer architecture would you place the epistemic integrity check for Agent B? What would that check look like?
Consider
A guardrail-level check could verify the retrieval result count against an expected minimum. A Model-as-Judge check could compare Agent B's "checked against restricted securities list" claim against the actual retrieval metadata. Which gives you the right balance of speed and depth for a compliance-critical pipeline?
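The guardrail-level option described above is fast precisely because it is so simple. A minimal sketch, assuming a pipeline-specific threshold (the constant below is a hypothetical value, not from the scenario):

```python
# Assumed threshold: e.g. the known size of the restricted securities list.
EXPECTED_MIN_RESULTS = 25

def guardrail_retrieval_complete(actual_results: int,
                                 expected_min: int = EXPECTED_MIN_RESULTS) -> bool:
    """Fast (~ms) structural check: did the retrieval return enough rows?"""
    return actual_results >= expected_min
```

This catches gross incompleteness (an empty or truncated retrieval) at guardrail speed, while the deeper claim-vs-metadata comparison is left to the slower Model-as-Judge layer.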