
3. Epistemic Integrity


After this module you will be able to

  • Define epistemic integrity as a measurable engineering property of agent chains
  • Design a verification receipt data structure for tracking reasoning-basis integrity
  • Identify which integrity checks must happen at runtime vs. design-time
  • Implement interfaces that propagate data quality metadata across agent boundaries

Not philosophy: engineering

The term "epistemic integrity" sounds academic. In engineering terms, it means something concrete:

An agent's output is only trustworthy if the data it reasoned over was complete, current, and correctly scoped for the task.

This is not a subjective judgement. It is measurable. You can instrument it. You can set thresholds. You can alert on it. This module shows you how.

The engineering definition: Epistemic integrity is the property that an agent's stated conclusions are warranted by the data it actually accessed. When you can measure and verify this property at runtime, you have epistemic integrity monitoring. When you can't, you have the Phantom Compliance problem.


From concept to data structure

Traditional software systems have well-established patterns for tracking data provenance. When a function returns a result, you can trace which database queries produced the inputs, whether those queries returned complete results, and whether the data was fresh. You do this with request context, database connection metadata, and cache headers.

Agent systems need the same thing, but the "database query" is a retrieval step, the "function" is an LLM inference, and the "result" is a natural-language output that carries no built-in provenance.

The solution is a verification receipt, a structured metadata object that travels alongside every agent output, recording what data the agent accessed and how complete that data was.

The verification receipt pattern

Here is a concrete schema:

{
  "receipt_id": "vr-2025-03-15-14-22-01-agent-b",
  "agent_id": "compliance-agent-b",
  "chain_id": "trade-review-4847",
  "timestamp": "2025-03-15T14:22:02Z",

  "reasoning_basis": {
    "data_sources": [
      {
        "source_id": "restricted-securities-vectorstore",
        "query": "restricted_securities_check",
        "expected_result_count": 312,
        "actual_result_count": 47,
        "completeness_ratio": 0.15,
        "freshness": "2025-03-15T14:21:58Z",
        "truncation_occurred": true,
        "truncation_reason": "context_window_limit"
      }
    ],
    "tool_calls": [],
    "context_window": {
      "capacity_tokens": 128000,
      "used_tokens": 94000,
      "utilisation": 0.73,
      "truncation_occurred": false
    }
  },

  "output_metadata": {
    "stated_confidence": 0.94,
    "warranted_confidence": 0.15,
    "confidence_gap": 0.79,
    "claims": [
      {
        "claim": "No restricted securities found in proposed trade",
        "basis": "Checked 47 of 312 known restricted securities",
        "coverage": 0.15
      }
    ]
  },

  "integrity_verdict": {
    "pass": false,
    "flags": ["retrieval_completeness_below_threshold"],
    "recommended_action": "escalate_or_retry"
  }
}

Let's break down the critical fields.

reasoning_basis.data_sources

This section records every data access the agent made during its reasoning process. For each source:

  • expected_result_count: How many results should this query return? This can come from a precomputed baseline, a count query against the source, or a configured threshold.
  • actual_result_count: How many results did the agent actually receive?
  • completeness_ratio: The ratio of actual to expected. This is the single most important metric for detecting Phantom Compliance-style failures.
  • truncation_occurred: A boolean flag indicating whether the data was truncated before reaching the agent.

output_metadata.confidence_gap

The confidence gap is the difference between what the agent claims its confidence is and what the data warrants. Agent B reported 94% confidence on a compliance check that covered 15% of the restricted securities list. The gap is 0.79, a clear signal that something is wrong.

Computing warranted confidence is domain-specific, but a simple heuristic works well: if you checked 15% of the data, your maximum warranted confidence in a "nothing found" claim is approximately 15%. Any confidence above that is unwarranted.
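As a sketch, that coverage heuristic (treating coverage as an upper bound on confidence in a "nothing found" claim) is only a few lines; the function names here are illustrative:

```python
def warranted_confidence(checked: int, expected: int) -> float:
    """Coverage-based upper bound on confidence in a 'nothing found' claim."""
    if expected <= 0:
        return 0.0
    return min(1.0, checked / expected)

def confidence_gap(stated: float, checked: int, expected: int) -> float:
    """How far the agent's stated confidence exceeds what the data warrants."""
    return max(0.0, stated - warranted_confidence(checked, expected))
```

For Agent B's check, `warranted_confidence(47, 312)` is roughly 0.15, so a stated confidence of 0.94 yields a gap of about 0.79.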

integrity_verdict

The verdict is a computed field based on configurable rules:

  • Completeness ratio below threshold? Flag.
  • Confidence gap above threshold? Flag.
  • Truncation occurred on a critical data source? Flag.

If any flag is set, the receipt's pass field is false, and the recommended action tells the system what to do next.
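A minimal sketch of that verdict computation, assuming the two thresholds and the flag names shown in the schema above (the `VerdictRules` structure and default values are illustrative, not prescribed):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VerdictRules:
    min_completeness: float = 0.8    # illustrative default
    max_confidence_gap: float = 0.2  # illustrative default

def compute_verdict(completeness_ratio: float, confidence_gap: float,
                    critical_truncation: bool,
                    rules: Optional[VerdictRules] = None) -> dict:
    """Apply the three flag rules and derive the pass/action fields."""
    rules = rules or VerdictRules()
    flags = []
    if completeness_ratio < rules.min_completeness:
        flags.append("retrieval_completeness_below_threshold")
    if confidence_gap > rules.max_confidence_gap:
        flags.append("confidence_gap_above_threshold")
    if critical_truncation:
        flags.append("critical_source_truncated")
    return {
        "pass": not flags,
        "flags": flags,
        "recommended_action": "escalate_or_retry" if flags else "proceed",
    }
```

Feeding in Agent B's numbers (completeness 0.15, gap 0.79, truncation on a critical source) raises all three flags and fails the verdict.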


Where the receipt lives in your architecture

The verification receipt is not a log entry. It is a first-class data object that travels with the agent's output through the chain.

Verification receipt propagation: each agent reads upstream receipts, generates its own, and passes both downstream

The architectural principle: Verification receipts make data quality a first-class citizen in inter-agent communication. Just as HTTP responses carry headers with cache metadata and content types, agent outputs carry receipts with reasoning-basis metadata. Downstream agents can make informed trust decisions instead of blindly accepting upstream outputs.


Runtime vs. design-time checks

Not every integrity check needs to happen at runtime. Some can be verified at design time (during development and deployment), and some must be verified at runtime (during execution). Getting this distinction right is critical for both performance and coverage.

Design-time checks

These checks verify structural properties that don't change between requests:

  • Schema validation: the receipt structure is well-formed. Implement with JSON Schema validation in CI/CD.
  • Source registration: every data source has an expected result count or baseline. Implement with configuration validation at deploy time.
  • Threshold configuration: completeness thresholds are set for every critical source. Implement with config review plus automated validation.
  • Receipt propagation: every agent in the chain produces and consumes receipts. Implement with integration tests that verify receipt flow.
  • Truncation handling: the framework is configured to emit truncation events. Implement with unit tests for the context assembly logic.

Design-time checks are your safety net against misconfiguration. They ensure that the runtime checks can work.
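The source-registration and threshold-configuration checks can be sketched as a deploy-time validator. The config keys used here (`expected_result_count`, `baseline_query`, `critical`, `completeness_threshold`) are assumed names for illustration, not part of the receipt schema:

```python
def validate_source_config(sources: list[dict]) -> list[str]:
    """Deploy-time check: every source needs a baseline, and every
    critical source needs a completeness threshold."""
    errors = []
    for src in sources:
        sid = src.get("source_id", "<unnamed>")
        if "expected_result_count" not in src and "baseline_query" not in src:
            errors.append(f"{sid}: no expected_result_count or baseline_query")
        if src.get("critical") and "completeness_threshold" not in src:
            errors.append(f"{sid}: critical source missing completeness_threshold")
    return errors
```

Running this in CI against the deployed configuration catches the "unknown unknowns" failure mode before it reaches production: a source with no baseline can never produce a meaningful completeness ratio.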

Runtime checks

These checks must execute on every request because their results depend on the specific data being processed:

  • Retrieval completeness: actual vs. expected result count for this specific query. Runs after every retrieval step.
  • Context utilisation: how much of the context window is used and whether truncation occurred. Runs before every LLM inference.
  • Response freshness: tool response timestamps are within acceptable staleness bounds. Runs after every tool call.
  • Cross-agent consistency: downstream confidence is warranted by upstream data quality. Runs at every inter-agent boundary.
  • Confidence gap: stated confidence does not exceed warranted confidence. Runs before emitting any agent output.

Runtime checks add latency. The retrieval completeness check requires either a count query against the source or a comparison against a cached baseline. Context utilisation checks are essentially free (you already have the token count). Response freshness checks require parsing one additional field from the tool response.

In practice, the total overhead for runtime epistemic integrity checks is 50-200ms per agent boundary, negligible compared to LLM inference time.
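As an illustration, the context utilisation check reduces to two comparisons once the framework exposes token counts (the function name and the 0.9 utilisation threshold are illustrative assumptions):

```python
def check_context(used_tokens: int, capacity_tokens: int,
                  truncation_occurred: bool,
                  max_utilisation: float = 0.9) -> list[str]:
    """Pre-inference check: flag truncation or a near-capacity context window."""
    flags = []
    if truncation_occurred:
        flags.append("context_truncated")
    if used_tokens / capacity_tokens > max_utilisation:
        flags.append("context_utilisation_high")
    return flags
```

With the receipt's numbers (94,000 of 128,000 tokens, no truncation) this returns no flags; a truncated, near-full window returns both.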


Implementing the interface

Here is a minimal interface definition for agents that support epistemic integrity:

from dataclasses import dataclass, field
from typing import Optional
from datetime import datetime

@dataclass
class DataSourceAccess:
    source_id: str
    query: str
    expected_count: Optional[int]
    actual_count: int
    freshness: datetime
    truncated: bool
    truncation_reason: Optional[str] = None

    @property
    def completeness_ratio(self) -> Optional[float]:
        if self.expected_count and self.expected_count > 0:
            return self.actual_count / self.expected_count
        return None

@dataclass
class VerificationReceipt:
    agent_id: str
    chain_id: str
    timestamp: datetime
    data_sources: list[DataSourceAccess] = field(default_factory=list)
    context_utilisation: float = 0.0
    context_truncated: bool = False
    stated_confidence: float = 0.0

    @property
    def warranted_confidence(self) -> float:
        """Minimum completeness ratio across all data sources."""
        ratios = [ds.completeness_ratio for ds in self.data_sources
                  if ds.completeness_ratio is not None]
        if not ratios:
            return 0.0
        return min(ratios)

    @property
    def confidence_gap(self) -> float:
        return max(0.0, self.stated_confidence - self.warranted_confidence)

    @property
    def integrity_pass(self) -> bool:
        return (
            self.confidence_gap < 0.2
            and all(
                (ds.completeness_ratio or 0) > 0.8
                for ds in self.data_sources
            )
            and not self.context_truncated
        )

@dataclass
class AgentOutput:
    content: str
    receipt: VerificationReceipt
    upstream_receipts: list[VerificationReceipt] = field(
        default_factory=list
    )

The critical design choice: AgentOutput bundles the content with its receipt and the chain of upstream receipts. Any downstream agent (or any monitoring system) can inspect the full provenance chain.


What this catches

Let's replay the Phantom Compliance scenario with verification receipts in place:

  1. Agent B retrieves 47 of 312 restricted securities. The receipt records completeness_ratio: 0.15 and truncated: true.

  2. Agent B produces CLEAR with 94% confidence. The receipt computes warranted_confidence: 0.15 and confidence_gap: 0.79. The integrity_pass field is false.

  3. Agent C receives Agent B's output and receipt. Before proceeding, Agent C checks receipt.integrity_pass. It is false. Agent C halts the chain and escalates.

Result: The trade is not approved. The failure is detected at runtime, at the inter-agent boundary, before any damage occurs. Total added latency: approximately 100ms for the receipt computation and check.
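The three steps above can be reproduced end-to-end. This sketch uses compact stand-ins for the interface from the previous section so it runs on its own; the thresholds match the integrity_pass rules:

```python
from dataclasses import dataclass, field

# Compact stand-ins mirroring the interface defined earlier in this module.
@dataclass
class DataSourceAccess:
    source_id: str
    expected_count: int
    actual_count: int
    truncated: bool

    @property
    def completeness_ratio(self) -> float:
        return self.actual_count / self.expected_count

@dataclass
class VerificationReceipt:
    agent_id: str
    stated_confidence: float
    data_sources: list[DataSourceAccess] = field(default_factory=list)
    context_truncated: bool = False

    @property
    def warranted_confidence(self) -> float:
        ratios = [ds.completeness_ratio for ds in self.data_sources]
        return min(ratios) if ratios else 0.0

    @property
    def confidence_gap(self) -> float:
        return max(0.0, self.stated_confidence - self.warranted_confidence)

    @property
    def integrity_pass(self) -> bool:
        return (self.confidence_gap < 0.2
                and all(ds.completeness_ratio > 0.8 for ds in self.data_sources)
                and not self.context_truncated)

# Steps 1-2: Agent B's truncated retrieval and overconfident CLEAR verdict.
receipt_b = VerificationReceipt(
    agent_id="compliance-agent-b",
    stated_confidence=0.94,
    data_sources=[DataSourceAccess(
        source_id="restricted-securities-vectorstore",
        expected_count=312, actual_count=47, truncated=True)],
)

# Step 3: Agent C inspects the receipt at the boundary before proceeding.
if not receipt_b.integrity_pass:
    # Halt the chain and escalate; the trade is never approved.
    escalation = (f"{receipt_b.agent_id}: confidence gap "
                  f"{receipt_b.confidence_gap:.2f}, halting chain")
```

The boundary check is the whole point of the pattern: Agent C never has to evaluate the content of Agent B's output, only its receipt.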


What this doesn't catch

Verification receipts are not a complete solution. They have limitations:

  • Unknown unknowns: If you don't configure an expected result count for a data source, the completeness ratio can't be computed. You need to register your data sources and their baselines.
  • Semantic completeness: Receipts track whether the agent got enough data, not whether it got the right data. If all 312 restricted securities are returned but the query was wrong and retrieved the wrong list, the receipt will show 100% completeness on the wrong data.
  • Gaming: If an agent is adversarial (or hallucinating), it could generate a receipt that claims high completeness. Receipts should be computed by the framework layer, not by the agent itself.

These limitations are addressed by the MASO controls in Module 4, which add independent verification layers on top of the receipt pattern.


Reflection

Look at the warranted_confidence calculation in the code above. It uses the minimum completeness ratio across all data sources. Is this the right heuristic for your use case? When might you want the average instead? When might you want a weighted calculation based on source criticality?

Consider

For compliance-critical systems (like the Phantom Compliance scenario), the minimum is correct, because your confidence can only be as high as your weakest data source. For advisory systems where multiple sources provide overlapping information, an average or weighted approach might be more appropriate. The key is that you are making this decision explicitly rather than defaulting to the agent's self-reported confidence.


Next: MASO Controls →