Agentic AI Controls¶
This document extends the control framework for agentic AI systems — AI that takes autonomous multi-step actions, uses tools, and interacts with external systems.
Why Agentic AI Requires Additional Controls¶
Standard AI controls assume discrete request/response interactions. Agentic AI breaks this assumption:
| Standard AI | Agentic AI |
|---|---|
| Single interaction | Multi-step execution |
| Content generation | Real-world actions |
| Human reviews output | Actions happen autonomously |
| Evaluate one response | Evaluate trajectory |
| Undo = don't send | Actions may be irreversible |
Without additional controls, the standard architecture fails to provide coverage.
Agentic Control Model¶
Three Phases of Agentic Control¶
| Phase | Purpose | Controls |
|---|---|---|
| Planning | Review before execution | Plan guardrails, plan approval |
| Execution | Constrain during execution | Action guardrails, circuit breakers |
| Assurance | Evaluate after execution | Trajectory Judge, HITL review |
Control Reference¶
AG.1 Plan-Level Controls¶
AG.1.1 Plan Disclosure¶
Requirement: Agents must disclose their intended plan before execution.
Implementation: - Agent generates explicit plan before acting - Plan includes: goals, steps, tools to be used, expected outcomes - Plan is logged and available for review - No execution without plan disclosure
Risk tier application:
| Tier | Requirement |
|---|---|
| CRITICAL | Full plan with reasoning, mandatory human approval |
| HIGH | Full plan, human approval for high-risk plans |
| MEDIUM | Summary plan, auto-approve within bounds |
| LOW | Basic plan logging |
Evidence: Plan logs, plan templates
AG.1.2 Plan Guardrails¶
Requirement: Validate plans against policy before execution.
Implementation: - Automated policy check on proposed plans - Check for: prohibited actions, excessive scope, sensitive data access - Block plans that violate policy - Flag borderline plans for human review
Guardrail checks:
| Check | Purpose |
|---|---|
| Action allowlist | Only permitted actions in plan |
| Scope limits | Plan stays within defined boundaries |
| Resource limits | Plan won't exceed cost/time limits |
| Data access | Plan doesn't access prohibited data |
| External access | Plan doesn't contact prohibited systems |
Evidence: Plan validation logs, policy configuration
AG.1.3 Plan Approval¶
Requirement: Require human approval for plans above threshold.
Approval matrix:
| Plan Characteristic | CRITICAL | HIGH | MEDIUM | LOW |
|---|---|---|---|---|
| Any plan | Approve | — | — | — |
| External system access | Approve | Approve | Review | — |
| Data modification | Approve | Approve | Review | — |
| Financial transaction | Approve | Approve | Approve | Review |
| >10 steps | Approve | Approve | Review | — |
| Irreversible actions | Approve | Approve | Approve | Review |
Implementation: - Plan routed to approval queue - Approver sees: plan, context, risk assessment - Timeout: plan expires if not approved in SLA - Approval logged with approver identity and timestamp
Evidence: Approval workflow configuration, approval logs
AG.2 Execution-Level Controls¶
AG.2.1 Action Guardrails¶
Requirement: Validate each action before execution.
Implementation: - Every tool call / action passes through guardrail - Check: action permitted, parameters valid, within scope - Block actions that fail validation - Log all action attempts (pass and fail)
This is distinct from plan guardrails: - Plan guardrails check the intended plan - Action guardrails check each actual action at runtime - Agent may deviate from plan; action guardrails catch this
Guardrail checks per action:
| Check | Implementation |
|---|---|
| Action permitted | Allowlist of permitted actions |
| Parameters valid | Schema validation, range checks |
| Within scope | Action matches approved plan |
| Rate limit | Max actions per time window |
| Resource limit | Action won't exceed limits |
Evidence: Action validation logs, guardrail configuration
AG.2.2 Circuit Breakers¶
Requirement: Hard limits that halt execution automatically.
Circuit breaker types:
| Breaker | Trigger | Action |
|---|---|---|
| Step limit | >N steps in execution | Halt, require human review |
| Time limit | >N minutes elapsed | Halt, require human review |
| Cost limit | >$N spent (API, compute) | Halt, require human review |
| Error rate | >N% actions failing | Halt, investigate |
| Anomaly | Behaviour outside baseline | Halt, investigate |
| Deviation | Execution diverges from plan | Halt, require re-approval |
Limits by tier:
| Breaker | CRITICAL | HIGH | MEDIUM | LOW |
|---|---|---|---|---|
| Max steps | 10 | 25 | 50 | 100 |
| Max time | 5 min | 15 min | 30 min | 60 min |
| Max cost | $1 | $10 | $50 | $100 |
| Error threshold | 10% | 20% | 30% | 50% |
Implementation: - Circuit breakers enforced in execution runtime - Cannot be overridden by agent - Trigger halts execution immediately - Human must review and explicitly resume
Evidence: Circuit breaker configuration, trigger logs
AG.2.3 Scope Enforcement¶
Requirement: Enforce boundaries on what agents can access, modify, and achieve.
Scope dimensions:
| Dimension | Definition | Enforcement |
|---|---|---|
| Data scope | What data agent can read/write | Access controls, data classification |
| System scope | What systems agent can interact with | Network controls, API allowlists |
| Action scope | What actions agent can take | Action allowlist per agent |
| Authority scope | What agent can commit to | Approval thresholds |
| Outcome scope | What results the agent should produce | Outcome boundaries, not just action lists |
Outcome boundaries: Action-level scope is necessary but insufficient. An agent can take a series of individually permitted actions that produce an unintended aggregate outcome. Scope must include outcome constraints — "you can query this table for read-only customer service purposes" is better than "you can query this table."
Implementation: - Scope defined per agent / use case - Enforced at infrastructure level (not just agent code) - Attempted scope violations logged and blocked - Scope cannot be expanded by agent itself - Outcome boundaries validated after task completion (see AI.10.6)
Evidence: Scope definitions, access control configuration, violation logs, outcome boundary definitions
AG.2.4 Tool Controls¶
Requirement: Govern which tools agents can use and how.
Tool governance:
| Control | Purpose |
|---|---|
| Tool inventory | List of approved tools |
| Tool classification | Risk tier per tool |
| Tool parameters | Allowed parameter ranges |
| Tool rate limits | Max calls per tool |
| Tool approval | Some tools require per-use approval |
Tool risk classification:
| Risk | Examples | Controls |
|---|---|---|
| High | Database write, API call, file system, email send | Per-use approval or strict limits |
| Medium | Web search, document read, calculation | Rate limits, logging |
| Low | Text generation, formatting | Standard logging |
Evidence: Tool inventory, tool risk classifications, tool usage logs
AG.2.5 Tool Protocol Security¶
Requirement: Secure tool connectivity regardless of protocol (MCP, OpenAI function calling, LangChain, etc.).
Tool protocols standardise how agents invoke external capabilities. The security principles apply regardless of which protocol is used:
| Control | Implementation |
|---|---|
| Authentication | Authenticate tool endpoints; no anonymous tool access |
| Authorisation | Scope tool permissions to minimum required; use per-agent credentials |
| Input validation | Validate tool parameters against schema before execution |
| Output sanitisation | Treat tool responses as untrusted input; sanitise before use |
| Transport security | TLS for all tool communications; certificate validation |
| Logging | Log all tool calls with parameters and responses |
| Rate limiting | Limit tool call frequency per session/user/agent |
| Timeout handling | Set timeouts; handle gracefully; don't hang on unresponsive tools |
MCP-specific considerations:
| Concern | Mitigation |
|---|---|
| Server discovery | Allowlist approved MCP servers; don't allow dynamic discovery |
| Capability negotiation | Restrict to required capabilities only |
| Resource access | Apply data scope controls to MCP resource requests |
| Prompt injection via tools | Sanitise tool outputs before including in context |
Protocol-agnostic principles:
- Tools are trust boundaries — Every tool call crosses a trust boundary
- Least privilege — Grant minimum permissions required
- Defence in depth — Don't rely solely on tool-side security
- Assume compromise — Tool responses may be malicious or manipulated
- Audit everything — Full logging for investigation and compliance
Evidence: Tool endpoint configuration, authentication records, tool call logs
AG.3 Assurance-Level Controls¶
AG.3.1 Trajectory Logging¶
Requirement: Log complete execution trajectory for evaluation.
Log content:
{
"trajectory_id": "uuid",
"agent_id": "agent-identifier",
"start_time": "timestamp",
"end_time": "timestamp",
"status": "completed | halted | failed",
"plan": {
"goals": ["..."],
"steps": ["..."],
"approved_by": "user-id | auto",
"approved_at": "timestamp"
},
"execution": [
{
"step": 1,
"action": "action-name",
"parameters": {...},
"guardrail_result": "pass | block",
"outcome": "success | failure",
"timestamp": "timestamp"
}
],
"circuit_breakers": {
"steps_used": 5,
"steps_limit": 25,
"cost_used": 0.50,
"cost_limit": 10.00,
"triggered": false
},
"outcome": {
"goal_achieved": true | false,
"side_effects": ["..."],
"errors": ["..."]
}
}
Retention by tier:
| Tier | Retention |
|---|---|
| CRITICAL | 7 years |
| HIGH | 3 years |
| MEDIUM | 1 year |
| LOW | 90 days |
Evidence: Trajectory logs
AG.3.2 Trajectory Evaluation (Judge)¶
Requirement: Evaluate complete trajectories, not just single interactions.
Evaluation criteria:
| Criterion | Question |
|---|---|
| Goal achievement | Did the agent achieve its goal? |
| Plan adherence | Did execution match the approved plan? |
| Scope compliance | Did agent stay within boundaries? |
| Action appropriateness | Were individual actions appropriate? |
| Efficiency | Was execution efficient (steps, cost, time)? |
| Side effects | Were there unintended consequences? |
| Policy compliance | Did trajectory comply with policies? |
Judge prompt structure:
You are evaluating an AI agent's complete execution trajectory.
APPROVED PLAN:
{plan}
ACTUAL EXECUTION:
{trajectory}
OUTCOME:
{outcome}
EVALUATE:
1. Did the agent achieve its goal appropriately?
2. Did execution follow the approved plan?
3. Were individual actions appropriate and necessary?
4. Did the agent stay within its defined scope?
5. Were there any unintended side effects?
6. Did anything violate policy?
Provide:
- Overall assessment (acceptable / concerns / unacceptable)
- Specific findings with evidence
- Recommendations for guardrail/limit adjustments
Sampling by tier:
| Tier | Sampling Rate |
|---|---|
| CRITICAL | 100% of trajectories |
| HIGH | 50% of trajectories |
| MEDIUM | 10% of trajectories |
| LOW | 5% of trajectories |
Evidence: Trajectory evaluation logs, Judge findings
AG.3.3 HITL for Agentic Systems¶
Requirement: Human oversight adapted for agentic execution.
HITL touch points:
| Touch Point | When | Human Action |
|---|---|---|
| Plan approval | Before execution | Approve / reject / modify plan |
| Circuit breaker | Execution halted | Investigate, resume / abort |
| Trajectory review | After execution | Review findings, remediate |
| Exception handling | Agent requests help | Provide guidance, approve exception |
HITL queue structure for agentic:
| Queue | Trigger | SLA |
|---|---|---|
| Plan approval | CRITICAL/HIGH plans pending | 15 min |
| Circuit breaker | Execution halted | 30 min |
| Trajectory findings | Judge flags issues | 4 hours |
| Exception requests | Agent needs guidance | 1 hour |
Key difference from standard HITL: - Standard: Review findings after AI response sent - Agentic: Review before execution (plans) AND after (trajectories) - Agentic: Real-time intervention via circuit breakers
Evidence: HITL queue configuration, review logs, SLA metrics
AG.4 Multi-Agent Controls¶
AG.4.1 Agent Inventory¶
Requirement: Maintain inventory of all agents and their relationships.
Inventory content:
| Field | Purpose |
|---|---|
| Agent ID | Unique identifier |
| Agent type | Orchestrator / specialist / worker |
| Scope | What this agent can do |
| Tools | What tools this agent can use |
| Dependencies | What other agents it calls |
| Owner | Accountable human |
| Risk tier | Classification |
Evidence: Agent inventory
AG.4.2 Orchestration Controls¶
Requirement: Govern how agents coordinate and delegate.
Controls:
| Control | Purpose |
|---|---|
| Delegation allowlist | Which agents can call which |
| Scope inheritance | Child agent can't exceed parent scope |
| Aggregated limits | Total limits across agent chain |
| Attribution | Track which agent responsible for what |
Implementation: - Orchestrator can only delegate to registered agents - Delegated agent inherits scope constraints - Circuit breakers apply to total execution (all agents) - Full trace maintained across agent boundaries
Evidence: Orchestration rules, delegation logs, aggregated traces
AG.4.3 Trace Correlation¶
Requirement: Maintain end-to-end trace across multi-agent execution.
Implementation: - Single trace ID for entire execution - Each agent interaction tagged with trace ID - Parent-child relationships recorded - Full trace reconstructable for investigation
Evidence: Correlated trace logs
Control Selection by Risk Tier¶
| Control | CRITICAL | HIGH | MEDIUM | LOW |
|---|---|---|---|---|
| AG.1.1 Plan disclosure | Full + reasoning | Full | Summary | Basic |
| AG.1.2 Plan guardrails | ✅ Required | ✅ Required | ✅ Required | ⚠️ Recommended |
| AG.1.3 Plan approval | All plans | High-risk plans | Auto within bounds | Auto |
| AG.2.1 Action guardrails | ✅ Required | ✅ Required | ✅ Required | ✅ Required |
| AG.2.2 Circuit breakers | Strict limits | Standard limits | Relaxed limits | Basic limits |
| AG.2.3 Scope enforcement | ✅ Required | ✅ Required | ✅ Required | ⚠️ Recommended |
| AG.2.4 Tool controls | Per-use approval | Rate limits | Logging | Basic logging |
| AG.2.5 Tool protocol security | ✅ Required | ✅ Required | ✅ Required | ⚠️ Recommended |
| AG.3.1 Trajectory logging | Full, 7 years | Full, 3 years | Full, 1 year | Summary, 90 days |
| AG.3.2 Trajectory eval | 100% | 50% | 10% | 5% |
| AG.3.3 HITL | All touch points | Plan + findings | Findings only | Spot checks |
| AG.4.1 Agent inventory | ✅ Required | ✅ Required | ✅ Required | ⚠️ Recommended |
| AG.4.2 Orchestration | ✅ Required | ✅ Required | ⚠️ If applicable | ⚠️ If applicable |
| AG.4.3 Trace correlation | ✅ Required | ✅ Required | ✅ Required | ⚠️ Recommended |
Regulatory Alignment¶
EU AI Act¶
| Requirement | Agentic Control |
|---|---|
| Article 14: Human oversight | Plan approval, circuit breakers, HITL |
| Article 9: Risk management | Circuit breakers, trajectory evaluation |
| Article 12: Record-keeping | Trajectory logging |
| Article 13: Transparency | Plan disclosure |
GDPR Article 22¶
Agentic systems that make decisions affecting individuals must ensure: - Human approval for consequential decisions - Ability to explain decision basis (trajectory logs) - Right to human review (HITL)
SR 11-7 / SS1/23 (Model Risk)¶
| Requirement | Agentic Control |
|---|---|
| Effective challenge | Plan approval, trajectory evaluation |
| Ongoing monitoring | Circuit breakers, trajectory Judge |
| Documentation | Trajectory logging, agent inventory |
Implementation Checklist¶
Before Deploying Agentic AI¶
- Agent registered in inventory
- Scope defined and documented
- Tools identified and classified
- Circuit breaker limits set
- Plan guardrails configured
- Action guardrails configured
- Trajectory logging enabled
- Judge evaluation configured
- HITL workflows established
- Approval matrix defined
- Risk tier assigned
- Owner assigned
Ongoing Operations¶
- Plan approval queue monitored
- Circuit breaker triggers investigated
- Trajectory evaluations reviewed
- Scope violations investigated
- Limits tuned based on findings
- Agent inventory maintained
Summary¶
Agentic AI requires controls at three phases:
| Phase | Key Controls |
|---|---|
| Planning | Plan disclosure, plan guardrails, plan approval |
| Execution | Action guardrails, circuit breakers, scope enforcement |
| Assurance | Trajectory logging, trajectory Judge, HITL review |
The standard architecture extends, not replaces: - Guardrails → Plan guardrails + action guardrails - Judge → Trajectory evaluation - HITL → Plan approval + circuit breaker response + trajectory review
Key principle remains: Humans decide. Agents act within approved boundaries. Execution is constrained by circuit breakers. Trajectories are evaluated. Findings are reviewed.¶
AI Runtime Behaviour Security, 2026 (Jonathan Gill).