Agentic AI Controls¶

Additional controls for AI systems that take autonomous actions.

What Makes Agents Different¶

Characteristic	Chatbot	Agent
Actions	Responds only	Takes real-world actions
Autonomy	Single turn	Multi-step, self-directed
Scope	Fixed	May expand based on goals
Failure mode	Bad answer	Bad action with consequences

Key risk: Agents can cause harm at machine speed without human review.

The Two Core Problems¶

Agentic AI security reduces to two problems:

The Two Core Problems

Problem	Question	Failure Mode
1. System Access	Does the agent access only the right systems?	Reaches data/APIs it shouldn't
2. Request Integrity	Does the action match the user's actual intent?	Manipulated or misinterpreted requests

Problem 1: System Access¶

The agent should only reach systems it needs, with minimum necessary permissions. For the governance model, lifecycle, and threat landscape behind these controls, see IAM Governance for AI Systems.

Control	Implementation
Least-privilege credentials	Agent gets tokens scoped to specific resources
Network allowlists	Agent can only reach approved endpoints
Data views	Database exposes only permitted subset
Action allowlists	Only pre-approved action types permitted
Blast radius limits	Maximum records, funds, or scope per action

Test: If the agent is fully compromised, what's the worst it can do? Reduce that.

Problem 2: Request Integrity¶

The action the agent takes should match what the user actually wanted.

Threat	Control
Injection attacks	Input guardrails, tool output sanitisation
Instruction drift	Anchor to original request, not intermediate reasoning
Misinterpretation	Intent confirmation before irreversible actions
Manipulation via tools	Treat tool outputs as untrusted data

Test: Can you trace from the user's original request to the final action? Is the link intact?

Why Both Problems Matter¶

Scenario	Access OK?	Integrity OK?	Outcome
Normal operation	✓	✓	Correct action
Over-privileged agent	✗	✓	Correct action, but breach waiting to happen
Injection attack	✓	✗	Wrong action on right systems
Compromised agent	✗	✗	Catastrophic — wrong action, broad access

Both problems must be solved. Solving one doesn't help if the other fails.

Core Principle¶

Infrastructure beats instructions.

Don't tell the agent "only access customer service data."
Give it credentials that can only access customer service data.

Bad (Instruction)	Good (Infrastructure)
"Only access CS data"	Database view exposes only CS data
"Don't send emails without approval"	Email API requires approval token
"Stay within budget"	Hard spending cap at API gateway

Control Categories¶

1. Scope Enforcement¶

Limit what the agent can access and do — technically, not via prompts.

Control	Implementation
Network allowlist	Agent can only reach approved endpoints
Data views	Agent sees only authorised data subset
Action allowlist	Only permitted actions can execute
Resource caps	Hard limits on compute, API calls, cost
Time limits	Maximum execution duration

2. Action Validation¶

Validate every action independently. Don't trust agent reasoning.

Validation flow:

Action Validator Flow

3. Tool Output Sanitisation¶

Tool outputs are injection vectors. Treat as untrusted.

Control	Purpose
Scan for instructions	Detect "ignore previous" patterns
Truncate length	Limit context pollution
Mark as data	Clear framing that this is data, not instructions
Flag suspicious	Human review before continuing

4. Approval Workflows¶

Make approval meaningful, not rubber-stamp.

Bad	Good
"Approve?"	Show context, data, impact, expected outcome
Approve/Deny only	Approve / Deny / Modify / Escalate
Same approver for all	Different approvers by action type
No expiry	Approval expires, must re-request

5. Circuit Breakers¶

Hard stops that trigger regardless of agent "reasoning."

Threshold	Action
>100 actions in one task	Pause
>$50 in API calls	Pause
>30 minutes execution	Pause
>10% error rate	Pause
Any scope violation	Terminate
Any irreversible action	Require approval

Agent Risk Tiers¶

Agents are typically HIGH or CRITICAL tier. LOW/MEDIUM agents are rare.

Agent Type	Typical Tier	Key Controls
Read-only research	HIGH	Scope limits, output review
Internal automation	HIGH	Action allowlist, circuit breakers
Customer-facing	CRITICAL	Full approval workflow
Financial actions	CRITICAL	All controls, human approval

Judge for Agents¶

Agent interactions need deeper evaluation.

Additional Criteria	Question
Goal alignment	Did agent pursue stated goal?
Action appropriateness	Were actions proportionate?
Scope adherence	Did agent stay in bounds?
Reasoning quality	Was the reasoning sound?
Efficiency	Did agent take unnecessary steps?

Monitoring¶

Signal	Concern
Action volume spike	Runaway agent
Error rate increase	Agent confused or attacking
Novel action patterns	Unexpected behaviour
Scope boundary probes	Attempted breakout
Cost anomalies	Resource abuse

Recovery and Rollback¶

When integrity is compromised, you need to undo the damage.

Capability	Purpose
Action logging	Full audit trail of what agent did (not just said)
Reversibility windows	Delay irreversible actions to allow intervention
Automated rollback	Undo actions when integrity breach detected
Blast radius tracking	Know exactly what was affected

Not all actions are reversible. For those that aren't, require human approval.

Key Takeaways¶

Solve both problems — Access control AND integrity preservation
Enforce via infrastructure — Agents can ignore instructions
Validate every action — Independent of agent reasoning
Sanitise tool outputs — They're injection vectors
Use circuit breakers — Hard stops that can't be reasoned around
Require approval for impact — Irreversible actions need humans
Enable rollback — Assume integrity will sometimes fail
Monitor aggressively — Agents can cause harm fast

AI Runtime Behaviour Security, 2026 (Jonathan Gill).