Multi-Agent Controls

Extends Agentic Controls for systems where agents interact with other agents.

This document uses the simplified three-tier system (Tier 1/2/3). See Risk Tiers — Simplified Tier Mapping for the mapping to LOW/MEDIUM/HIGH/CRITICAL.

The Problem

Single-agent systems have a clear accountability chain: User → Agent → Tools → Output.

Multi-agent systems break this. When Agent A delegates to Agent B, which calls Agent C:

  • Who is accountable for the final output?
  • Can Agent B do things Agent A can't?
  • Does Agent C know who originally requested the action?
  • If the chain produces a harmful outcome, where did it go wrong?

These are not edge cases. Every framework that supports agent-to-agent communication (CrewAI, AutoGen, LangGraph, custom MCP chains) creates these scenarios by default.


Trust Topologies

Multi-agent systems fall into three patterns. Each has different control requirements.

| Topology | Description | Risk Profile |
|---|---|---|
| Orchestrator | One agent coordinates, others execute | Moderate — single point of control and failure |
| Peer-to-peer | Agents communicate directly | High — no central authority, lateral movement risk |
| Hierarchical | Agents delegate down a chain | High — privilege can accumulate or escalate across levels |

Controls

1. Delegation Policy

Every agent-to-agent request must be governed by an explicit delegation policy.

| Rule | Implementation |
|---|---|
| No privilege escalation | Agent B cannot perform actions that Agent A's principal is not authorised for |
| Scope inheritance | Delegated tasks inherit the scope (and constraints) of the requesting agent |
| Delegation depth limits | Maximum chain length before requiring human approval |
| Allowlisted delegation pairs | Explicitly define which agents can call which agents |
```yaml
# Example delegation policy
delegation:
  max_depth: 3
  require_human_approval_at_depth: 2
  allowed_pairs:
    - from: research-agent
      to: search-agent
      scopes: [web_search, document_retrieval]
    - from: research-agent
      to: summarisation-agent
      scopes: [text_generation]
  denied_pairs:
    - from: research-agent
      to: payment-agent  # research agent should never trigger payments
```
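A minimal Python sketch of how such a policy could be enforced at each hop. The `DelegationPolicy` class and its method names are illustrative, not part of any framework; the fields mirror the example policy above.

```python
# Illustrative sketch: enforcing a delegation policy before any
# agent-to-agent call. Names are hypothetical, not from any framework.

class DelegationPolicy:
    def __init__(self, max_depth, allowed_pairs, denied_pairs):
        self.max_depth = max_depth
        # Map (from, to) -> set of permitted scopes.
        self.allowed = {(p["from"], p["to"]): set(p["scopes"]) for p in allowed_pairs}
        self.denied = {(p["from"], p["to"]) for p in denied_pairs}

    def check(self, src, dst, scope, depth):
        """Return (allowed, reason) for a proposed delegation."""
        if depth >= self.max_depth:
            return False, "max delegation depth exceeded"
        if (src, dst) in self.denied:
            return False, f"{src} -> {dst} is explicitly denied"
        scopes = self.allowed.get((src, dst))
        if scopes is None:
            return False, f"{src} -> {dst} is not on the allowlist"
        if scope not in scopes:
            return False, f"scope '{scope}' not granted for this pair"
        return True, "ok"

policy = DelegationPolicy(
    max_depth=3,
    allowed_pairs=[
        {"from": "research-agent", "to": "search-agent",
         "scopes": ["web_search", "document_retrieval"]},
        {"from": "research-agent", "to": "summarisation-agent",
         "scopes": ["text_generation"]},
    ],
    denied_pairs=[{"from": "research-agent", "to": "payment-agent"}],
)

print(policy.check("research-agent", "search-agent", "web_search", depth=1))  # (True, 'ok')
print(policy.check("research-agent", "payment-agent", "transfer", depth=1))  # denied
```

Denied pairs are checked before the allowlist so an explicit deny always wins, which keeps the policy safe even if an allow rule is added by mistake.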

2. Identity Propagation

Every request in a multi-agent chain must carry the originating identity.

| Requirement | Why |
|---|---|
| Principal identity | The human (or system) that initiated the chain |
| Agent chain | Ordered list of agents that have handled the request |
| Scope at each hop | What permissions were available at each step |
| Timestamp at each hop | When each agent acted |

This is the equivalent of HTTP request tracing (e.g., OpenTelemetry spans) applied to agent interactions. Without it, you cannot audit or attribute outcomes.

```json
{
  "trace_id": "abc-123",
  "principal": "user:jgill@example.com",
  "chain": [
    { "agent": "orchestrator", "scope": ["read", "write"], "timestamp": "2026-02-11T10:00:00Z" },
    { "agent": "research-agent", "scope": ["read"], "timestamp": "2026-02-11T10:00:01Z" },
    { "agent": "search-agent", "scope": ["web_search"], "timestamp": "2026-02-11T10:00:02Z" }
  ]
}
```
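One way to build such a trace is for each agent to append its own hop before forwarding. A sketch, assuming the field names in the JSON example above (the `add_hop` helper is hypothetical):

```python
# Illustrative sketch: appending a hop to a propagated identity trace
# before forwarding a request downstream. Field names mirror the JSON
# example; the helper itself is hypothetical.
import copy
from datetime import datetime, timezone

def add_hop(trace, agent, scope):
    """Return a new trace with this agent's hop appended.

    Copies rather than mutates, so an agent cannot rewrite upstream history.
    """
    new_trace = copy.deepcopy(trace)
    new_trace["chain"].append({
        "agent": agent,
        "scope": scope,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return new_trace

trace = {"trace_id": "abc-123", "principal": "user:jgill@example.com", "chain": []}
trace = add_hop(trace, "orchestrator", ["read", "write"])
trace = add_hop(trace, "research-agent", ["read"])
print([hop["agent"] for hop in trace["chain"]])  # ['orchestrator', 'research-agent']
```

Copy-on-forward matters: if agents share a mutable trace object, a compromised agent could erase earlier hops and defeat attribution.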

3. Inter-Agent Guardrails

Each agent in a chain should apply its own guardrails to incoming requests — not trust the upstream agent's validation.

Position Guardrail Responsibility
Receiving agent Validate that the request is within its declared scope
Receiving agent Verify the delegation policy allows this interaction
Receiving agent Apply its own input guardrails to the content, not just the metadata
Sending agent Apply output guardrails before forwarding results downstream

Zero trust applies to agents exactly as it applies to microservices. Trust nothing. Verify everything.
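A sketch of what the receiving-agent checks could look like in code. The function and parameter names are hypothetical; the point is that scope, policy, and content are all re-validated locally rather than trusted from upstream:

```python
# Illustrative sketch: zero-trust validation at the receiving agent.
# Names are hypothetical; the checks mirror the table above.

DECLARED_SCOPES = {"web_search", "document_retrieval"}  # this agent's declared scope

def validate_incoming(request, policy_allows, input_guardrail):
    """Run the receiving agent's own checks; raise on any failure.

    policy_allows: callable (sender, receiver) -> bool, the delegation policy.
    input_guardrail: callable (content) -> bool, this agent's own content check.
    """
    # 1. Is the requested action within this agent's declared scope?
    if request["scope"] not in DECLARED_SCOPES:
        raise PermissionError("request outside declared scope")
    # 2. Does the delegation policy permit this sender -> receiver pair?
    if not policy_allows(request["from"], request["to"]):
        raise PermissionError("delegation pair not allowed by policy")
    # 3. Apply our own input guardrails to the content, not just the metadata.
    if not input_guardrail(request["content"]):
        raise ValueError("content failed input guardrail")
    return True

request = {"from": "research-agent", "to": "search-agent",
           "scope": "web_search", "content": "recent rate changes"}
print(validate_incoming(request,
                        policy_allows=lambda s, d: True,
                        input_guardrail=lambda c: "ignore previous" not in c))  # True
```

Step 3 is the one most often skipped in practice: metadata checks alone do nothing against prompt injection carried in the request body.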

4. Circuit Breakers

Multi-agent chains can loop, cascade, or amplify errors. Circuit breakers prevent runaway behaviour.

| Trigger | Action |
|---|---|
| Token budget exceeded | Halt chain, return partial result with explanation |
| Delegation depth exceeded | Halt chain, escalate to human |
| Error rate threshold | Disable agent-to-agent path, fall back to simpler flow |
| Latency threshold | Timeout and escalate rather than waiting indefinitely |
| Repeated identical requests | Detect loops, break them |
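Three of these triggers (token budget, depth, loop detection) can be tracked with simple per-chain counters. A sketch, with hypothetical class names and thresholds:

```python
# Illustrative sketch: a per-chain circuit breaker covering three of the
# triggers above. Thresholds and names are hypothetical.
from collections import Counter

class ChainBreaker:
    def __init__(self, token_budget=50_000, max_depth=3, max_repeats=2):
        self.tokens_used = 0
        self.token_budget = token_budget
        self.max_depth = max_depth
        self.max_repeats = max_repeats
        self.seen_requests = Counter()  # request fingerprint -> count

    def admit(self, request_key, depth, tokens):
        """Return (ok, reason); callers halt or escalate when ok is False."""
        self.tokens_used += tokens
        self.seen_requests[request_key] += 1
        if self.tokens_used > self.token_budget:
            return False, "token budget exceeded: halt, return partial result"
        if depth > self.max_depth:
            return False, "delegation depth exceeded: escalate to human"
        if self.seen_requests[request_key] > self.max_repeats:
            return False, "repeated identical request: probable loop, break chain"
        return True, "ok"

breaker = ChainBreaker()
for _ in range(3):
    ok, reason = breaker.admit("summarise:report.pdf", depth=1, tokens=100)
print(ok, "-", reason)  # False - repeated identical request: probable loop, break chain
```

Error-rate and latency triggers need timing data and so usually live in the orchestration layer rather than in a counter like this.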

5. Outcome Attribution

When a multi-agent chain produces an output, you must be able to attribute each component of the output to the agent that generated it.

This is not optional for regulated environments. If an AI system makes a credit decision, the regulator will ask: "Which component made this determination and on what basis?"

| Requirement | Implementation |
|---|---|
| Per-agent output logging | Each agent logs its input, reasoning, and output |
| Contribution tagging | Final output is annotated with which agent contributed which parts |
| Decision audit trail | For consequential decisions, the full chain is reconstructable |
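Contribution tagging can be as simple as assembling the final output from (agent, fragment) pairs and keeping the pairing as the audit record. A minimal sketch (the `attribute` helper and sample fragments are illustrative):

```python
# Illustrative sketch: tagging each fragment of a final output with the
# agent that produced it, so the decision trail is reconstructable.

def attribute(fragments):
    """fragments: list of (agent, text) pairs, in output order.

    Returns the assembled final text plus a per-fragment audit record.
    """
    final = " ".join(text for _, text in fragments)
    audit = [{"agent": agent, "contribution": text} for agent, text in fragments]
    return final, audit

final, audit = attribute([
    ("search-agent", "Source documents retrieved: 3."),
    ("summarisation-agent", "Summary: all three sources agree."),
])
print(audit[0]["agent"])  # search-agent
```

In a regulated setting the audit record would also carry the trace ID and timestamps from the identity-propagation chain, so a single output fragment can be traced back to the principal who initiated it.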

Protocol-Level Risks

MCP (Model Context Protocol)

MCP enables agents to use tools, including other agents exposed as tools. Risks:

  • Tool impersonation — Malicious MCP server posing as a legitimate tool
  • Excessive tool access — Agent given access to more MCP tools than needed
  • No built-in authentication — MCP does not natively verify tool identity

Controls: Pin MCP server URIs, verify server identity, scope tool access per agent, log all MCP calls.
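A sketch of the URI-pinning control: resolve every MCP server through a local allowlist and refuse anything that does not match. The registry shape and function are assumptions, not part of MCP itself:

```python
# Illustrative sketch: pinning MCP server URIs to a local allowlist before
# connecting. The registry and check are hypothetical conventions; MCP does
# not mandate this mechanism.

PINNED_SERVERS = {
    "search-tools": "https://mcp.internal.example.com/search",
    "docs-tools": "https://mcp.internal.example.com/docs",
}

def resolve_server(name, advertised_uri):
    """Return the pinned URI for a named server, or raise on mismatch."""
    pinned = PINNED_SERVERS.get(name)
    if pinned is None:
        raise PermissionError(f"MCP server '{name}' is not allowlisted")
    if advertised_uri != pinned:
        # A legitimate name advertising a different URI is the impersonation case.
        raise PermissionError(f"URI mismatch for '{name}': possible impersonation")
    return pinned
```

Note the check always connects to the *pinned* URI, never the advertised one, so a spoofed advertisement cannot redirect the agent even if the name matches.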

A2A (Agent-to-Agent Protocol)

Google's A2A protocol enables cross-vendor agent communication. Risks:

  • Trust boundary collapse — External agent from another organisation gains access to internal tools
  • Schema injection — Malformed agent cards that manipulate receiving agents
  • Capability advertisement spoofing — Agent claims capabilities it doesn't have (or shouldn't use)

Controls: Validate agent cards against an allowlist, enforce capability constraints at the receiving end, treat all A2A inputs as untrusted.
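A sketch of the receiving-end capability check: compare what an inbound agent card claims against what your allowlist expects for that agent. The card field names here are assumptions; consult the A2A specification for the actual agent card schema:

```python
# Illustrative sketch: validating an inbound A2A agent card against an
# allowlist of expected capabilities. Field names ("name", "capabilities")
# are assumptions, not the verified A2A card schema.

EXPECTED_CAPABILITIES = {
    "partner-research-agent": {"web_search", "summarise"},
}

def validate_agent_card(card):
    """Return the verified capability set, or raise if the card overclaims."""
    expected = EXPECTED_CAPABILITIES.get(card.get("name"))
    if expected is None:
        raise PermissionError("agent is not on the allowlist")
    claimed = set(card.get("capabilities", []))
    if not claimed <= expected:
        # Capability advertisement spoofing: claiming more than allowed.
        raise PermissionError(f"unexpected capabilities claimed: {claimed - expected}")
    return claimed
```

Enforcing the constraint at the receiving end is the key design choice: the external agent's self-description is treated as a claim to be checked, never as an authorisation.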


Risk Tier Adjustment

Multi-agent systems should be classified at least one risk tier higher than the equivalent single-agent system performing the same task.

| Single-Agent Tier | Multi-Agent Equivalent |
|---|---|
| Tier 1 (Low) | Tier 2 (Medium) minimum |
| Tier 2 (Medium) | Tier 3 (High) minimum |
| Tier 3 (High) | Tier 3 + enhanced controls |

The rationale: every additional agent in a chain is an additional point of failure, an additional attack surface, and an additional accountability gap.
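The mapping is mechanical enough to encode directly, for example in a classification pipeline. A trivial sketch (the function name is illustrative):

```python
# Illustrative sketch: the "at least one tier higher" rule from the table
# above, on the simplified 1-3 tier scale.

def multi_agent_tier(single_agent_tier):
    """Bump a single-agent tier (1-3) for its multi-agent equivalent.

    Tier 3 stays Tier 3; the table adds "enhanced controls" rather than a
    fourth tier, which a numeric mapping cannot express.
    """
    return min(single_agent_tier + 1, 3)

print(multi_agent_tier(1))  # 2
```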

AI Runtime Behaviour Security, 2026 (Jonathan Gill).