4. MASO Controls

After this module you will be able to

Name and describe the eight MASO control domains
Map each Phantom Compliance failure to the MASO domain that addresses it
Explain how MASO integrates with the three-layer runtime architecture
Select controls for a multi-agent pipeline based on risk tier

What is MASO?

MASO (Multi-Agent Security Operations) is the AIRS framework's control catalogue for multi-agent systems. It defines controls across eight domains, organised by risk tier.

Think of MASO as doing for agent chains what OWASP does for web applications: it gives you a structured way to identify which controls you need and where to place them.

The full MASO framework defines each control in detail. Here, we'll focus on how the eight domains apply to the Phantom Compliance failure, and how you'd architect them into a system.

The eight domains

Domain	What it controls	Phantom Compliance relevance
1. Prompt, Goal & Epistemic Integrity	Agent reasoning basis, goal fidelity, input completeness	Directly addresses the incomplete retrieval
2. Identity & Access	Who (or what) can call which agents, with what permissions	Controls which agents can invoke compliance checks
3. Data Protection	Data handling across agent boundaries	Controls how the restricted list flows through the chain
4. Execution Control	Runtime behaviour, delegation limits, circuit breakers	Stops the chain if a control fails
5. Observability	Logging, tracing, and monitoring across the chain	Makes the incomplete retrieval visible
6. Supply Chain	Model provenance, tool integrity, dependency management	Ensures the compliance model is the right model
7. Privileged Agent Governance	Controls for agents with elevated permissions	Agent C's ability to approve trades requires elevated oversight
8. Objective Intent	Developer-declared specifications for evaluating agent and workflow compliance	Provides the formal reference standard for evaluating whether agents did what they were designed to do

Mapping MASO to Phantom Compliance

Let's trace each failure through the MASO domains:

The incomplete retrieval (Domain 1: Epistemic Integrity)

What MASO says: Agents must verify the completeness of their reasoning inputs before producing output. For data retrieval, this means checking:

Result count verification: The agent should compare the number of retrieved results against an expected count or a completeness threshold
Source currency: The data must be current, not cached, not stale
Retrieval confidence: The agent must log a retrieval completeness metric, and downstream agents must have access to it

Architectural pattern: Place a retrieval verification guardrail between the RAG layer and the agent. The guardrail checks that the retrieval result count is within expected bounds before passing the data to the agent.

[Figure: RAG completeness check before Agent B]
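The guardrail pattern above can be sketched in a few lines. This is a minimal illustration, not MASO's reference implementation: `RetrievalResult`, `IncompleteRetrievalError`, `retrieval_guardrail`, and the `min_ratio` threshold are hypothetical names, and the sketch assumes the expected record count comes from a trusted source outside the RAG layer. It checks only the lower bound; an upper bound could be added the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalResult:
    """Hypothetical wrapper for what the RAG layer hands to the agent."""
    documents: list[str]
    expected_count: int  # assumption: supplied by a trusted index, not by the RAG layer


class IncompleteRetrievalError(Exception):
    """Raised when a retrieval falls below the completeness threshold."""


def completeness_ratio(result: RetrievalResult) -> float:
    """Fraction of the expected records that were actually retrieved."""
    if result.expected_count <= 0:
        # No trustworthy expectation: treat as unverifiable, not as complete.
        return 0.0
    return len(result.documents) / result.expected_count


def retrieval_guardrail(result: RetrievalResult, min_ratio: float = 1.0) -> RetrievalResult:
    """Pass the data to the agent only if enough of the expected records arrived."""
    ratio = completeness_ratio(result)
    if ratio < min_ratio:
        raise IncompleteRetrievalError(
            f"retrieved {len(result.documents)} of {result.expected_count} "
            f"expected records (ratio {ratio:.2f} < {min_ratio:.2f})"
        )
    return result
```

Raising an exception rather than returning a flag means the agent cannot receive partial data by accident; the caller has to handle the failure explicitly.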

The unchallenged trust (Domains 2 + 4: Identity/Access + Execution Control)

What MASO says: Agents must have scoped permissions, and downstream agents must be able to verify upstream claims. Agent C should not blindly trust Agent B's compliance assessment; it should be able to access Agent B's verification metadata.

Architectural pattern: Each agent passes a verification receipt alongside its output. The receipt includes:

What data sources were accessed
Retrieval completeness metrics
Processing metadata (context utilisation, truncation flags)

Downstream agents check the receipt before proceeding. If the receipt is missing or incomplete, the chain escalates.
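One way to picture the receipt and the downstream check is the sketch below. The field names, the `receipt_problems` and `next_step` helpers, and the `min_completeness` default are all assumptions made for illustration; MASO does not prescribe this shape here. The sketch also escalates on incomplete retrieval and on truncation, which goes slightly beyond "missing or incomplete receipt" and is a design choice, not a stated requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerificationReceipt:
    """Hypothetical receipt an upstream agent attaches to its output."""
    data_sources: tuple[str, ...]
    retrieval_completeness: Optional[float]  # 0.0-1.0, None if not measured
    context_utilisation: Optional[float]     # fraction of the context window used
    truncated: Optional[bool]                # True if input was cut to fit


def receipt_problems(receipt: Optional[VerificationReceipt],
                     min_completeness: float = 1.0) -> list[str]:
    """Return the reasons a downstream agent should not proceed (empty if none)."""
    if receipt is None:
        return ["receipt missing"]
    problems = []
    if not receipt.data_sources:
        problems.append("no data sources recorded")
    if receipt.retrieval_completeness is None:
        problems.append("retrieval completeness not recorded")
    elif receipt.retrieval_completeness < min_completeness:
        problems.append("retrieval incomplete")
    if receipt.context_utilisation is None or receipt.truncated is None:
        problems.append("processing metadata not recorded")
    elif receipt.truncated:
        problems.append("input was truncated")
    return problems


def next_step(receipt: Optional[VerificationReceipt],
              min_completeness: float = 1.0) -> str:
    """Downstream decision: proceed only when the receipt raises no problems."""
    return "escalate" if receipt_problems(receipt, min_completeness) else "proceed"
```

In the Phantom Compliance scenario, Agent C would call `next_step` on Agent B's receipt before acting on the compliance assessment.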

The invisible failure (Domain 5: Observability)

What MASO says: Observability must operate at the chain level, not just the agent level. Key metrics include:

Retrieval completeness ratio: Expected vs. actual results for every data access
Context utilisation: How full was the agent's context window?
Cross-agent consistency: Do downstream outputs align with upstream data quality?

Architectural pattern: A chain-level observability layer aggregates per-agent metrics and applies cross-agent rules:

[Figure: Chain-level observability: agent metrics flowing into integrity dashboard]
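A rough sketch of such a layer follows, to show how a cross-agent rule differs from a per-agent one. `AgentMetrics`, `chain_alerts`, the `confident_output` flag, and both thresholds are hypothetical; a real deployment would draw these values from its own tracing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentMetrics:
    """Hypothetical per-agent metrics collected for one chain run."""
    agent: str
    retrieval_completeness: float  # actual / expected results (1.0 if no retrieval)
    context_utilisation: float     # fraction of the context window used
    confident_output: bool         # True if the agent emitted an unqualified result


def chain_alerts(chain: list[AgentMetrics],
                 min_completeness: float = 1.0,
                 max_context: float = 0.9) -> list[str]:
    """Apply per-agent and cross-agent rules to a chain, in execution order."""
    alerts = []
    degraded_upstream = None  # first agent whose retrieval was incomplete
    for m in chain:
        # Per-agent rule: retrieval completeness ratio.
        if m.retrieval_completeness < min_completeness:
            alerts.append(f"{m.agent}: retrieval completeness {m.retrieval_completeness:.2f}")
            degraded_upstream = degraded_upstream or m.agent
        # Per-agent rule: context window close to full.
        if m.context_utilisation > max_context:
            alerts.append(f"{m.agent}: context utilisation {m.context_utilisation:.2f}")
        # Cross-agent rule: confident output downstream of degraded data.
        if degraded_upstream and m.agent != degraded_upstream and m.confident_output:
            alerts.append(
                f"{m.agent}: confident output despite degraded input from {degraded_upstream}"
            )
    return alerts
```

The last rule is the one no single agent can evaluate on its own: it only fires when one agent's output is read against another agent's data quality.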

MASO by risk tier

The AIRS framework defines three risk tiers. For a security architect, the tier determines which MASO controls are mandatory and which are recommended:

Tier 1: Supervised

Human-in-the-loop for all decisions. Typical for initial deployments and high-risk domains.

Epistemic integrity: Structural completeness checks via guardrails
Execution control: Human approves all chain outputs
Observability: Full logging, human reviews chain traces

Phantom Compliance at Tier 1: A human reviewer would see every chain. The failure might still occur, but it would be caught during human review, if the reviewer knows to check retrieval completeness.

Tier 2: Managed

Automated controls with human oversight for flagged cases. Typical for established systems with proven track records.

Epistemic integrity: Model-as-Judge verifies reasoning basis on a sample or risk basis
Execution control: Circuit breaker triggers on verification failure; human reviews escalations
Observability: Automated chain-level integrity monitoring with alerting

Phantom Compliance at Tier 2: The retrieval completeness check flags Agent B's partial retrieval. The Model-as-Judge confirms the flag. The chain escalates to human review. The trade is paused, not approved.

Tier 3: Autonomous

Full automation with controls at every layer. Typical for low-risk, high-volume operations with strong safety records.

Epistemic integrity: Guardrails + Model-as-Judge + automated verification at every inter-agent boundary
Execution control: Circuit breaker with PACE fallback (Primary → Alternate → Contingency → Emergency)
Observability: Real-time chain integrity monitoring with automated response

Phantom Compliance at Tier 3: The retrieval completeness guardrail blocks Agent B's output immediately. PACE activates the alternate path (retry with a forced full retrieval). If that fails, the contingency path (flag for human review) activates. If that fails, the emergency path (block the trade) activates.
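The PACE sequence described for Tier 3 can be sketched as an ordered fallback loop. The `run_pace` function, the `Path` alias, and the decision strings are illustrative assumptions; the sketch treats a path that returns `None` or raises as failed, and makes the emergency decision a fixed value so that it cannot itself fail.

```python
from typing import Callable, Optional

# A path returns a decision string on success, or None if it could not complete.
Path = Callable[[], Optional[str]]


def run_pace(paths: list[tuple[str, Path]],
             emergency_decision: str = "block_trade") -> tuple[str, str]:
    """Try each named path in order; fall through to the emergency decision."""
    for name, path in paths:
        try:
            decision = path()
        except Exception:
            decision = None  # a crashing path counts as a failed path
        if decision is not None:
            return name, decision
    return "emergency", emergency_decision


# Hypothetical run: the guardrail blocked the primary path, the forced full
# retrieval also failed, and the chain was flagged for human review.
outcome = run_pace([
    ("primary", lambda: None),
    ("alternate", lambda: None),
    ("contingency", lambda: "human_review"),
])
```

Here `outcome` is `("contingency", "human_review")`. If the contingency path had also failed, the function would have returned `("emergency", "block_trade")`.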

Domain 8: Objective Intent

The first seven domains catch specific faults: incomplete retrieval, unchallenged trust, invisible failures. But they cannot evaluate whether the system as a whole is doing what the developer designed it to do. That requires a declared reference standard.

Objective Intent is the newest MASO domain. It requires every agent, judge, and workflow to have a declared Objective Intent Specification (OISpec): a structured, versioned document that states what the component is supposed to achieve, within what parameters, and against what success and failure criteria.

The OISpec serves as the formal input to two levels of evaluation:

Tactical evaluation: A judge evaluates each agent's actions against its individual OISpec. Did this agent operate within its declared parameters? Did it pursue the goal it was assigned?
Strategic evaluation: A workflow-level evaluator assesses whether all agents collectively achieved the workflow's declared intent. This catches the most dangerous class of multi-agent failure: every agent complied with its own specification, but the aggregate result is wrong.
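The gap between the two levels is easier to see with a toy example. The sketch below is not the OISpec schema, which is defined elsewhere in the framework; the `OISpec` fields, the `evaluate` helper, the three example specs, and the observed records are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OISpec:
    """Hypothetical, heavily simplified Objective Intent Specification."""
    component: str
    version: str
    goal: str
    # Each criterion takes an observed record and returns True when satisfied.
    success_criteria: tuple[Callable[[dict], bool], ...]


def evaluate(spec: OISpec, observed: dict) -> bool:
    """A component complies when every success criterion holds."""
    return all(criterion(observed) for criterion in spec.success_criteria)


agent_b_spec = OISpec(
    "agent_b", "1.0", "Screen the trade against the restricted list",
    (lambda o: o["verdict"] in {"clear", "restricted"},),
)
agent_c_spec = OISpec(
    "agent_c", "1.0", "Approve only trades that Agent B cleared",
    (lambda o: (not o["approved"]) or o["upstream_verdict"] == "clear",),
)
workflow_spec = OISpec(
    "workflow", "1.0", "Never approve a restricted trade",
    (lambda o: not (o["approved"] and o["on_restricted_list"]),),
)

# Tactical evaluation: each agent is judged against its own spec.
tactical = {
    "agent_b": evaluate(agent_b_spec, {"verdict": "clear"}),
    "agent_c": evaluate(agent_c_spec, {"approved": True, "upstream_verdict": "clear"}),
}

# Strategic evaluation: the workflow is judged against its declared intent.
strategic = evaluate(workflow_spec, {"approved": True, "on_restricted_list": True})
```

With these made-up records both tactical checks pass while the strategic check fails, which is the pattern the strategic level exists to catch.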

Crucially, judges themselves have OISpecs. A judge without declared evaluation criteria is a black box with authority. Judge OISpecs are monitored by an independent meta-evaluator, closing the "who watches the watchmen" loop through explicit contracts at every level.

For a deeper treatment of the OISpec schema, tiered controls, and evaluation architecture, see Objective Intent.

Reflection

Which risk tier should the Meridian Capital compliance pipeline be running at? Consider: it's handling regulatory compliance for financial transactions, it's been running for six months, and until this incident its track record was clean.

Consider

A compliance pipeline for financial transactions is high-risk even with a good track record. Tier 2 (Managed) is the minimum, with automated controls and human oversight for edge cases. Tier 1 might be justified for the first deployment period, but the 60% backlog reduction suggests Meridian moved past that. The question is whether they built Tier 2 controls or just assumed Tier 1's human reviewers would catch everything.
