When Agents Talk to Agents¶

Multi-agent systems create accountability gaps that require careful governance

One agent is hard enough to secure. Multiple agents — collaborating, delegating, negotiating — compound the problem in ways that single-agent frameworks don't address.

Multi-agent architectures are arriving in production. The security model hasn't caught up.

The Coordination Problem¶

A single agent has a clear accountability path: it receives input, takes actions, produces output. You can trace cause to effect.

Multi-agent systems break this:

Agent A receives a user request. Agent A delegates part of the task to Agent B. Agent B queries Agent C for information. Agent C returns data that contains an embedded instruction. Agent B acts on the instruction. Agent A incorporates the result without knowing what happened. User receives a compromised output.

Where did the failure occur? Who's accountable? What control should have caught it?

The answer is unclear because the architecture diffuses responsibility.

New Attack Surfaces¶

Agent-to-agent injection¶

When agents communicate, their messages become input. If Agent B trusts Agent A's output, and an attacker can influence Agent A, they can inject through the agent chain.

This is prompt injection at scale — not user-to-agent, but agent-to-agent.

Emergent behaviour¶

Individual agents may behave correctly in isolation. Together, they exhibit behaviour nobody designed.

Agent A optimises for speed. Agent B optimises for thoroughness. Together, they oscillate — A rushing, B slowing, A overriding, B resisting. The system fails not because either agent is broken but because their interaction is pathological.

Emergent behaviour is hard to test for and hard to predict.

Cascading failures¶

Agent A fails. Agent B, waiting for Agent A, times out. Agent C, depending on both, enters an error state. Agent D, orchestrating all of them, retries — triggering a cascade that amplifies the original failure.

Single-agent failure modes are contained. Multi-agent failure modes can propagate.

Responsibility diffusion¶

When something goes wrong, each agent can point at another. "I just followed the instruction from Agent B." "I just returned the data Agent C requested." "I just aggregated what everyone gave me."

No single agent did anything wrong. The system produced harm anyway.

What the Framework Covers (Partially)¶

The existing framework addresses single agents:

Control	Single Agent	Multi-Agent Gap
Scope enforcement	Agent stays in its lane	Agents may expand scope through delegation
Action validation	Actions checked before execution	Who validates when Agent B acts on Agent A's request?
Tool output sanitisation	External data treated as untrusted	Is Agent A's output "external"?
Circuit breakers	Stop runaway execution	Distributed execution harder to stop
Human approval	Human approves high-impact actions	Approval for each agent? For the orchestrator? For the final action only?

The principles apply. The implementation is unclear.

Framework Extensions Needed¶

Agent identity and trust¶

Each agent needs identity. Agent A knows it's receiving a message from Agent B, not a spoofed message claiming to be from Agent B.

Trust must be explicit: - What can Agent A ask Agent B to do? - What data can Agent B share with Agent A? - Can Agent A delegate to agents it hasn't been authorised to use?

Implicit trust ("we're all in the same system") is a vulnerability.

End-to-end accountability¶

Even if individual agents pass inspection, the system needs end-to-end accountability:

Who requested the original task? (Traceable through the entire chain)
What was the final outcome? (Attributable to the originating request)
Which agents contributed? (Logged and auditable)
Where did policy violations occur? (Identifiable despite diffusion)

The orchestrator — or a supervisory layer — needs visibility into the whole chain, not just its direct reports.

Distributed guardrails¶

Guardrails at the edge (user input, final output) aren't enough. You need validation at agent boundaries:

Agent A → Orchestrator → Guardrail check → Agent B
Agent B → Guardrail check → External tool
Agent C → Guardrail check → Agent A

Every boundary is a potential injection point. Every boundary needs inspection.

This is expensive. It may also be necessary.

System-level Judge¶

A Judge evaluating individual agent interactions misses system-level issues. You need a Judge that can:

Review the full multi-agent conversation
Identify coordination failures
Detect emergent behaviour patterns
Assess whether the system outcome matches the user intent

This is harder than single-agent evaluation. The Judge needs to understand agent roles, expected interactions, and acceptable deviations.

Practical Guidance¶

If you're deploying multi-agent systems:

Start simple — minimise agent count until you understand the dynamics
Treat agent-to-agent messages as untrusted input
Log the full conversation across all agents
Implement circuit breakers at the orchestrator level
Require human approval for system-level outcomes, not just individual actions
Design for graceful degradation when agents fail

If you're building framework:

Extend agent controls to cover agent-to-agent communication
Add orchestrator-level controls and accountability
Define trust relationships between agents explicitly
Build Judge capability for multi-turn, multi-agent evaluation
Plan for emergent behaviour — monitoring, anomaly detection, kill switches

The Trajectory¶

Multi-agent architectures will become common. They solve problems single agents can't — complex tasks, specialised knowledge, parallel execution.

The security and governance models will lag. We're building single-agent controls for multi-agent systems.

Expect failures. Design for them. Learn from them.¶

AI Runtime Behaviour Security, 2026 (Jonathan Gill).