AI Security Control Families¶

This document defines the control families for AI systems, organised by function and timing.

Control Model Overview¶

Control Layers

AI security controls operate across three layers:

Layer	Function	Timing	Can Block?
Guardrails	Block known-bad patterns	Inline, real-time	Yes
LLM-as-Judge	Detect issues, surface findings	Async, after-the-fact	No
Human Oversight	Review, decide, act	As needed	Yes

Key principle: Guardrails prevent. Judge detects. Humans decide.

Control Family Index¶

ID	Family	Purpose
AI.1	Governance	Policies, roles, accountability
AI.2	Risk Management	Classification, assessment, monitoring
AI.3	Inventory & Documentation	Registration, documentation, lineage
AI.4	Development Security	Secure development, testing, deployment
AI.5	Data Governance	Data quality, privacy, protection
AI.6	Model Security	Model protection, validation, monitoring
AI.7	Runtime Controls — Guardrails	Inline input/output validation
AI.8	Runtime Controls — LLM-as-Judge	Async assurance and monitoring
AI.9	Human Oversight	HITL, escalation, accountability
AI.10	Agentic Controls	Agent-specific safeguards
AI.11	Logging & Monitoring	Observability, alerting, audit
AI.12	Incident Response	Detection, response, recovery
AI.13	AI Supplier Management	Vendor assessment, agreements, monitoring
AI.14	AI Security Awareness	Training for AI-specific risks
AI.15	AI System Continuity	BCP for AI systems
AI.16	AI Intellectual Property	Model and data IP protection

ISO 27001 Alignment: See ISO 27001 Alignment for detailed mapping.

AI.1 Governance¶

AI.1.1 AI Policy Framework¶

Requirement: Establish policies governing AI development, deployment, and use.

Implementation: - Acceptable use policy for AI systems - AI ethics principles - Roles and responsibilities - Approval workflows by risk tier

Evidence: Policy documents, approval records

AI.1.2 Governance Structure¶

Requirement: Define governance bodies and decision rights for AI.

Implementation: - AI Governance Committee (CRITICAL tier approvals) - Risk and security sign-off (HIGH tier) - Business owner accountability - Clear escalation paths

Evidence: Committee charters, meeting minutes, approval records

AI.1.3 Accountability¶

Requirement: Assign clear accountability for AI system outcomes.

Implementation: - Named owner for each AI system - Accountability cannot be delegated to AI - Human decision-maker for consequential actions - Documented responsibility matrix

Evidence: RACI matrices, system ownership records

AI.2 Risk Management¶

AI.2.1 Risk Classification¶

Requirement: Classify all AI systems by risk tier.

Tiers:

Tier	Criteria	Examples
CRITICAL	Affects credit, employment, legal rights; high regulatory exposure	Credit decisions, hiring systems
HIGH	Customer-facing; accesses sensitive data; significant business impact	Customer service AI, document processing
MEDIUM	Internal use; limited sensitive data; moderate impact	Internal assistants, meeting summarisers
LOW	Experimental; no production data; minimal impact	POCs, sandboxes

Evidence: Classification assessments, approval records

AI.2.2 Risk Assessment¶

Requirement: Assess risks before deployment and periodically thereafter.

Assessment factors: - Decision impact - Data sensitivity - User population - Autonomy level - Regulatory scope - Reputational risk

Evidence: Risk assessment documents, review records

AI.2.3 Ongoing Risk Monitoring¶

Requirement: Continuously monitor AI systems for emerging risks.

Implementation: - Judge-based quality monitoring (async) - Statistical drift detection - Bias monitoring (where applicable) - Threat landscape monitoring

Evidence: Monitoring dashboards, trend reports

AI.3 Inventory & Documentation¶

AI.3.1 AI System Inventory¶

Requirement: Maintain a complete inventory of all AI systems.

Required fields: - System name and description - Risk tier - Owner - Technology stack - Data sources - Deployment status - Last review date

Evidence: Inventory records

AI.3.2 System Documentation¶

Requirement: Document AI systems proportionate to risk tier.

Tier	Documentation Required
CRITICAL	Full model documentation (SR 11-7 compliant), FRIA, validation reports
HIGH	System architecture, data flows, control documentation
MEDIUM	Basic architecture, risk assessment
LOW	Registration in inventory

Evidence: Documentation packages

AI.3.3 Data Lineage¶

Requirement: Document data sources, flows, and transformations.

Implementation: - Training data sources - Runtime data inputs - Data flow diagrams - Retention and deletion

Evidence: Data flow documentation

AI.3.4 Explainability Requirements¶

Requirement: Define and document the level of explainability required for each AI system, proportionate to risk tier.

AI models are inherently opaque — billions of parameters with no traceable decision logic. Explainability methods (attention maps, SHAP, etc.) are approximations. The required level of explainability must be defined per system and may constrain which models can be used.

Explainability tiers:

Tier	Requirement	Approach
CRITICAL	Full decision audit trail; human must be able to articulate reasoning	Constrained models, rule-augmented AI, mandatory human reasoning documentation
HIGH	Key factors identified; output rationale documented	Feature importance, source citation, Judge evaluation of reasoning
MEDIUM	General approach documented; exceptions explainable	System-level documentation, output with source references
LOW	System purpose and approach documented	Standard documentation

Documentation per system: - What can be explained (system architecture, data sources, general approach) - What cannot be explained (individual model decisions, parameter interactions) - What methods are used (SHAP, attention, source citation, Judge analysis) - What compensating controls exist (HITL review, Judge evaluation, output validation)

Regulatory alignment: - GDPR Article 22: Right to explanation for automated decisions - EU AI Act Article 13: Transparency requirements - SR 11-7: Model risk management, model validation - FCA/PRA: Consumer Duty, outcomes monitoring

Evidence: Explainability assessments per system, methodology documentation

AI.4 Development Security¶

AI.4.1 Secure Development¶

Requirement: Apply secure development practices to AI systems.

Implementation: - Secure coding standards - Code review requirements - Dependency management - Secret management

Evidence: Code review records, security scan results

AI.4.2 Testing¶

Requirement: Test AI systems for security and quality before deployment.

Testing types:

Type	Purpose	When
Functional	Correct behaviour	Pre-deployment
Security	Vulnerability identification	Pre-deployment
Adversarial	Robustness against attacks	Pre-deployment, periodic
Bias	Fairness across protected characteristics	Pre-deployment, periodic
Regression	Detect degradation	Ongoing
Statistical	Validate output distributions (not exact outputs)	Pre-deployment, ongoing
Semantic	Test for meaning-based evasion of controls	Pre-deployment, periodic

Non-determinism requirement: AI systems are probabilistic. Testing must evaluate output distributions and acceptable ranges, not exact expected outputs. Run each test case multiple times and validate that outputs fall within acceptance criteria.

Tier	Statistical test runs per case	Acceptance threshold
CRITICAL	≥50	99% within criteria
HIGH	≥20	95% within criteria
MEDIUM	≥10	90% within criteria
LOW	≥5	85% within criteria

Evidence: Test results, coverage reports, statistical analysis, adversarial test outcomes

AI.4.3 Pre-Deployment Review¶

Requirement: Security review before production deployment.

Tier	Review Required
CRITICAL	Independent security review, governance committee approval
HIGH	Security team review, risk sign-off
MEDIUM	Streamlined security review
LOW	Self-assessment

Evidence: Review reports, approval records

AI.5 Data Governance¶

AI.5.1 Data Classification¶

Requirement: Classify data used by AI systems.

Implementation: - Training data classification - Runtime input classification - Output classification - Apply handling rules based on classification

Evidence: Data classification records

AI.5.2 Data Quality¶

Requirement: Ensure data quality for AI systems.

Implementation: - Training data validation - Input data validation (guardrails) - Knowledge base quality (RAG systems) - Data freshness monitoring

Evidence: Quality metrics, validation records

AI.5.3 Privacy Protection¶

Requirement: Protect personal data in AI systems.

Implementation: - Data minimisation - Purpose limitation - PII detection and handling (guardrails) - Privacy impact assessments

Evidence: PIAs, data handling records

AI.5.4 RAG Content Integrity¶

Requirement: Validate and protect the integrity of content retrieved for AI context.

Retrieved content (RAG) is a primary attack vector. Poisoned knowledge base content can hijack model behaviour without triggering input guardrails, because the malicious content enters through the data path, not the user input path.

Implementation:

Control	Purpose
Content validation	Validate retrieved content hasn't been tampered with (checksums, signatures)
Content sanitisation	Strip potential injection payloads from retrieved content before inclusion in context
Source authentication	Verify the source of retrieved content
Freshness monitoring	Alert when knowledge base content exceeds staleness thresholds
Modification tracking	Log all changes to knowledge base content with who, what, when
Anomaly detection	Flag when retrieved content distribution shifts unexpectedly
Access control	Restrict who can modify knowledge base content

Freshness thresholds by tier:

Tier	Max staleness	Alert
CRITICAL	1 hour	Immediate
HIGH	24 hours	Within 1 hour
MEDIUM	7 days	Daily
LOW	30 days	Weekly

Evidence: Content integrity logs, freshness monitoring records, knowledge base change logs

AI.6.1 Model Protection¶

Requirement: Protect AI models from theft and tampering.

Implementation: - Access controls on model artifacts - Secure model storage - Model versioning - Integrity verification

Evidence: Access logs, integrity checks

AI.6.2 Model Validation¶

Requirement: Validate model behaviour before and during deployment.

Tier	Validation Required
CRITICAL	Independent validation (SR 11-7), annual revalidation
HIGH	Internal validation, periodic review
MEDIUM	Functional testing
LOW	Basic testing

Bias and fairness testing: All models making decisions affecting individuals must be tested for discriminatory outputs across protected characteristics (age, gender, ethnicity, disability, etc.) before deployment and periodically in production.

Continuous validation: Model validation is never complete. Validation must be ongoing using statistical methods to detect performance degradation, distributional shift, and emergent bias.

Validation Type	Frequency
Pre-deployment validation	Before each deployment
Periodic revalidation	Quarterly (CRITICAL/HIGH), biannually (MEDIUM/LOW)
Post-upgrade validation	After every model version change
Bias audit	Annually (CRITICAL/HIGH), biannually (MEDIUM/LOW)

Evidence: Validation reports, bias test results, continuous validation metrics

AI.6.3 Model Monitoring¶

Requirement: Monitor model performance and behaviour in production.

Implementation: - Performance metrics (accuracy, latency, throughput) - Drift detection (input distribution, output distribution) - Judge-based quality assurance (async) - Anomaly detection - Gradual degradation detection (trend analysis, not just threshold alerts) - Capability monitoring (track what the model is doing, not just how well)

Invisible degradation: AI systems can degrade silently — output quality drops with no error signal. Monitoring must include trend analysis to catch gradual decline, not just sudden failures.

Metric Type	What It Catches
Threshold alerts	Sudden failures, outages
Trend analysis	Gradual quality decline over days/weeks
Baseline comparison	Drift from validated behaviour
Distribution monitoring	Shift in output patterns

Evidence: Monitoring dashboards, alerts, trend reports

AI.6.4 Model Capability Assessment¶

Requirement: Assess model capabilities before deployment, and reassess when models are upgraded or changed.

AI models can develop emergent capabilities that weren't explicitly programmed. A new model version may have capabilities — beneficial or dangerous — that the previous version lacked. Controls designed for the old model may be insufficient for the new one.

Assessment triggers:

Trigger	Action
New model deployment	Full capability assessment
Model version upgrade	Delta assessment (what changed?)
Provider announces new capabilities	Evaluate relevance and risk
Anomalous behaviour detected	Investigate for unknown capabilities

Assessment scope:

Dimension	What to test
Intended capabilities	Does the model do what we need?
Unintended capabilities	Can the model do things we don't want? (code execution, data extraction, tool misuse)
Capability boundaries	Where does the model exceed or fall short of the previous version?
Risk profile change	Does the new capability change the risk tier of the use case?

Evidence: Capability assessment reports, risk reclassification records

AI.6.5 Baseline Comparison¶

Requirement: Maintain and periodically test against a baseline set of known-good inputs and outputs.

Invisible degradation — where AI quality drops with no error signal — is a novel risk. Baseline comparison is the primary detection method.

Implementation:

Component	Purpose
Baseline dataset	Curated set of inputs with known-good outputs, covering key scenarios
Periodic testing	Run baseline dataset against production system on schedule
Comparison analysis	Compare current outputs to baseline outputs using defined criteria
Drift alerting	Alert when baseline comparison scores fall below threshold

Testing frequency:

Tier	Frequency
CRITICAL	Daily
HIGH	Weekly
MEDIUM	Fortnightly
LOW	Monthly

Evidence: Baseline datasets, comparison results, drift alerts

AI.7 Runtime Controls — Guardrails¶

Guardrails are inline controls that operate in real-time on inputs and outputs.

AI.7.1 Input Guardrails¶

Requirement: Validate and filter inputs before AI processing.

Implementation:

Check	Purpose	Method
Length limits	Prevent resource abuse	Rules
Format validation	Ensure valid input structure	Rules
Injection detection	Block prompt injection	Patterns, classifiers
Semantic intent analysis	Detect meaning-based evasion	ML classifiers
Scope enforcement	Keep requests in bounds	Patterns, classifiers
Rate limiting	Prevent abuse	Rules
Retrieved content filtering	Sanitise RAG content before inclusion in context	Patterns, classifiers

Limitation acknowledged: Pattern-based and classifier-based guardrails reduce but cannot eliminate prompt injection. Instructions and data share the same channel (the context window) and there is no complete technical solution. Defence-in-depth is the only viable strategy.

Semantic attacks: Attackers exploit meaning, not syntax. Keyword filters miss rephrased harmful requests. Input guardrails should include semantic intent analysis where feasible, but the Judge (AI.8) is better positioned for deep semantic analysis due to the latency budget.

RAG content filtering: Retrieved context is an injection vector. Apply input guardrail checks to retrieved content, not just user input.

Performance requirement: <50ms latency budget

Evidence: Guardrail configuration, block logs, false positive rates, semantic classifier metrics

AI.7.2 Output Guardrails¶

Requirement: Filter outputs before delivery to users or downstream systems.

Implementation:

Check	Purpose	Method
PII detection	Prevent data leakage	Patterns, NER
Content filtering	Block policy violations	Patterns, classifiers
Format validation	Ensure valid output structure	Rules
Cross-reference check	Prevent cross-user leakage	Data lookups
Factual grounding check	Verify claims against retrieved source data	Comparison logic
Uncertainty markers	Inject appropriate hedging for low-confidence outputs	Rules, classifiers

Grounding verification: For CRITICAL and HIGH tier systems, output guardrails should cross-reference AI claims against the source data that was retrieved. Unsupported claims should be flagged or blocked.

Uncertainty markers: For high-risk use cases, outputs should include appropriate hedging ("Based on available data..." rather than presenting as absolute fact). The AI must be able to say "I don't know" rather than fabricate.

Performance requirement: <50ms latency budget

Evidence: Guardrail configuration, block logs, false positive rates, grounding check results

AI.7.3 Guardrail Maintenance¶

Requirement: Maintain and improve guardrails over time.

Implementation: - Regular pattern updates based on new threats - False positive monitoring and tuning - Feedback loop from Judge findings - Adversarial testing (including semantic/meaning-based evasion, not just known patterns) - Periodic effectiveness verification (don't assume guardrails still work — test them)

Guardrail effectiveness testing: Guardrails degrade over time as attackers adapt. Periodic red-team testing must include semantic evasion techniques — rephrased requests, multi-turn manipulation, and context-based attacks.

Tier	Adversarial testing frequency
CRITICAL	Monthly
HIGH	Quarterly
MEDIUM	Biannually
LOW	Annually

Evidence: Update logs, tuning records, test results, effectiveness test reports

AI.7.4 Context Isolation¶

Requirement: Prevent cross-user and cross-session context contamination.

In multi-user AI systems, information from one user's session must not leak into another user's session. Shared context, cached responses, or persistent model memory can create cross-user data leakage.

Implementation:

Control	Purpose
Stateless sessions	Each session starts with clean context; no carry-over between users
Session boundary enforcement	Hard isolation between user sessions at infrastructure level
No shared memory	Disable any persistent memory or context sharing between users
Cache isolation	If response caching is used, scope caches to individual users
Context window clearing	Ensure context window is fully cleared between sessions
Multi-tenant isolation	In SaaS deployments, isolate between organisational tenants

Tier requirements:

Tier	Isolation Level
CRITICAL	Dedicated model instances per user/session; no shared infrastructure
HIGH	Strict session isolation; no caching across users
MEDIUM	Session isolation; user-scoped caching permitted
LOW	Standard session management

Evidence: Isolation architecture documentation, session management configuration, penetration test results

AI.8 Runtime Controls — LLM-as-Judge¶

The Judge is an async assurance mechanism that evaluates AI interactions after the fact.

AI.8.1 Judge Evaluation¶

Requirement: Evaluate AI interactions for quality, policy compliance, and issues.

Evaluation areas:

Area	What It Assesses
Quality	Accuracy, helpfulness, appropriateness
Policy compliance	Adherence to system rules and constraints
Conduct risk	Potential for customer or business harm
Anomalies	Unusual patterns suggesting attacks or failures
Bias indicators	Potential unfair treatment (where applicable)
Hallucination detection	Unsupported claims — compare output against retrieved context
Instruction override detection	Signs that the model followed injected instructions rather than system prompt
Confidence calibration	Cases where model expresses high confidence on topics where it's likely unreliable

Hallucination detection: Judge compares AI output against the source data that was retrieved. Claims not supported by retrieved context should be flagged. This is the primary async defence against hallucination.

Instruction override detection: Judge evaluates whether the model's behaviour in an interaction is consistent with its system prompt. Behavioural anomalies — sudden topic changes, policy deviations, unusual output formats — may indicate the model followed injected instructions.

Criteria-based evaluation: Because AI is non-deterministic, Judge evaluates outputs against acceptance criteria, not expected exact outputs. "Was this response helpful, accurate, and within policy?" — not "Did this response match the expected answer?"

Evidence: Judge evaluation logs, finding summaries, hallucination detection rates, override detection rates

AI.8.2 Sampling Strategy¶

Requirement: Sample interactions for Judge evaluation based on risk tier.

Tier	Sampling Rate	Rationale
CRITICAL	100%	Full audit trail required
HIGH	20-50%	Statistically significant coverage
MEDIUM	5-10%	Trend detection
LOW	1-5% or triggered	Spot checks

Additional triggers for 100% evaluation: - Guardrail near-misses - Customer complaints - Unusual patterns - New feature areas - Post model upgrade (first 48 hours) - Baseline comparison drift detected

Baseline integration: Sampling should include periodic baseline queries (known-good inputs with expected outcomes) to detect invisible degradation. If baseline comparison shows drift, temporarily increase sampling to 100% until root cause identified.

Evidence: Sampling configuration, coverage metrics, baseline comparison results

AI.8.3 Finding Management¶

Requirement: Route Judge findings appropriately for human review.

Routing:

Finding Severity	Routing	SLA
Critical (bias, data leakage)	Immediate escalation	1 hour
High (policy violation, quality failure)	Priority queue	24 hours
Medium (minor issues)	Standard review	1 week
Low (observations)	Batch review	Monthly

Evidence: Finding logs, routing records, SLA compliance

Note: These are Judge finding management SLAs — the time to triage and route findings from automated evaluation. They are distinct from incident response SLAs in the AI Incident Playbook, which govern response to confirmed security incidents.

AI.8.4 Judge Governance¶

Requirement: Govern the Judge as an AI system subject to controls.

Implementation: - Validate Judge accuracy - Test Judge against known cases - Monitor Judge for drift - Human oversight of Judge findings

Evidence: Judge validation records, accuracy metrics

AI.8.5 Confidence Calibration¶

Requirement: Detect and flag cases where AI expresses inappropriate confidence.

AI presents every output with equal confidence — correct or incorrect. Users cannot distinguish between a confident correct answer and a confident wrong answer. This leads to over-reliance, automation bias, and cascading errors when confident-but-wrong outputs feed downstream systems.

Implementation:

Control	Purpose
Topic confidence mapping	Identify topics/domains where the AI is reliably accurate vs. unreliable
Uncertainty injection	For known-unreliable domains, inject hedging language into outputs
Source citation	Require AI to cite sources; flag outputs with no supporting source
Multi-model cross-check	For CRITICAL decisions, compare outputs from multiple models; flag disagreements
Confidence scoring	Where model provides confidence scores, calibrate and surface to users

Judge integration: Judge should flag cases where: - AI makes definitive claims on topics outside its reliable domain - AI provides specific numbers or dates without source data - AI contradicts information in its retrieved context - AI's output would be treated as authoritative by the downstream consumer

Evidence: Confidence calibration records, uncertainty injection logs, cross-check results

AI.9 Human Oversight¶

AI.9.1 Human-in-the-Loop¶

Requirement: Maintain human oversight proportionate to risk.

Tier	HITL Requirement
CRITICAL	Human decides all consequential actions
HIGH	Human reviews all Judge escalations; sampling of routine
MEDIUM	Periodic batch review; escalation path
LOW	Spot checks; standard IT escalation

Automation bias mitigation: HITL reviewers must be trained to challenge AI outputs, not just confirm them. Humans tend to defer to AI even when their own judgement is better (automation bias) and anchor on the first AI recommendation (anchoring bias).

Design requirements for HITL interfaces: - Present relevant source data alongside AI output so reviewers can verify - For CRITICAL decisions, require reviewer to form independent judgement before seeing AI recommendation - Randomise presentation order where possible to reduce anchoring - Include clear "I disagree" pathways with no friction penalty

Evidence: Review records, decision logs, reviewer training records

AI.9.2 Escalation Procedures¶

Requirement: Define clear escalation paths for AI issues.

Implementation: - Escalation triggers defined - Escalation paths documented - Escalation SLAs established - On-call coverage (for HIGH/CRITICAL) - Escalation trigger when HITL reviewers consistently agree with AI (may indicate rubber-stamping)

Evidence: Escalation procedures, escalation logs

AI.9.3 Human Override¶

Requirement: Humans can override AI recommendations.

Implementation: - Override capability in all workflows - Override reasoning documented - Override patterns monitored - No penalty for appropriate overrides

Evidence: Override logs, pattern analysis

AI.9.4 Accountability¶

Requirement: Humans remain accountable for outcomes.

Implementation: - AI is advisory; humans decide - Decision authority clearly assigned - Audit trail of who decided what - No "the AI did it" defence - AI recommendation does not transfer accountability to the system

Evidence: Decision logs with human attribution

AI.9.5 HITL Effectiveness Measurement¶

Requirement: Measure whether human oversight is genuinely effective, not just present.

Human oversight is a known failure mode in every industry that uses it (aviation, nuclear, financial services). Simply having a human "in the loop" does not guarantee effective oversight. Measure to verify.

Metrics:

Metric	What It Indicates	Concern Trigger
Override rate	How often reviewers disagree with AI	Very low rate may indicate automation bias, not AI perfection
Decision time	How long reviewers spend per review	Very fast times suggest rubber-stamping
Finding detection rate	How often reviewers catch known-bad items	Low rate indicates ineffective review
Inter-reviewer agreement	Whether different reviewers reach same conclusions	Low agreement suggests unclear criteria
Canary detection rate	How often reviewers catch deliberately inserted test cases	Direct measure of attention

Canary reviews: Periodically inject known findings (canary cases) into the HITL review queue. If reviewers don't catch them, the process is not working.

Tier	Canary frequency	Expected detection
CRITICAL	Weekly	100%
HIGH	Monthly	95%
MEDIUM	Quarterly	90%
LOW	Biannually	80%

Evidence: HITL effectiveness metrics, canary detection results, reviewer performance data

AI.10 Agentic Controls¶

Additional controls for autonomous AI agents (systems that take actions, not just generate content).

See Agentic Controls for comprehensive coverage.

Control ID note: Agentic controls use two complementary schemes. AG.x (AG.1–AG.4) provides structural decomposition by phase (planning, execution, assurance, multi-agent). AI.10.x provides implementation control IDs within the main control family numbering. See the control selection guide for the mapping: AI.10.1–10.6 implement AG.1–AG.4.

Agentic AI requires controls at three phases:

Phase	Controls
Planning	Plan disclosure, plan guardrails, plan approval
Execution	Action guardrails, circuit breakers, scope enforcement
Assurance	Trajectory logging, trajectory evaluation, HITL review

AG.1 Plan-Level Controls¶

Control	Purpose
AG.1.1 Plan disclosure	Agent discloses intended actions before execution
AG.1.2 Plan guardrails	Validate plans against policy
AG.1.3 Plan approval	Human approves plans above threshold

AG.2 Execution-Level Controls¶

Control	Purpose
AG.2.1 Action guardrails	Validate each action at runtime
AG.2.2 Circuit breakers	Hard limits that halt execution
AG.2.3 Scope enforcement	Enforce boundaries on access and actions
AG.2.4 Tool controls	Govern which tools agents can use
AG.2.5 Tool protocol security	Secure MCP, function calling, etc.

AG.3 Assurance-Level Controls¶

Control	Purpose
AG.3.1 Trajectory logging	Log complete execution path
AG.3.2 Trajectory evaluation	Judge evaluates full trajectory
AG.3.3 HITL for agentic	Human oversight at plan, execution, and review stages

AG.4 Multi-Agent Controls¶

Control	Purpose
AG.4.1 Agent inventory	Track all agents and relationships
AG.4.2 Orchestration controls	Govern delegation between agents
AG.4.3 Trace correlation	End-to-end trace across agents

AI.10.1 Scope Boundaries¶

Requirement: Define and enforce what agents can and cannot do.

Implementation: - Explicit action allowlist - Parameter constraints on actions - Scope enforcement in code - Boundary monitoring

Evidence: Scope definitions, boundary violation logs

AI.10.2 Approval Workflows¶

Requirement: Require human approval for high-impact agent actions.

Implementation: - Define which actions require approval - Implement approval workflows - Timeout if approval not received - Audit trail of approvals

Evidence: Approval workflow configuration, approval logs

AI.10.3 Action Logging¶

Requirement: Log all agent actions comprehensively.

Log content: - Action requested - Parameters - Context/reasoning - Outcome - Timestamp - Correlation ID

Evidence: Action logs

AI.10.4 Checkpoints¶

Requirement: Validate intermediate results in multi-step agent workflows.

Implementation: - Define checkpoint locations - Validation criteria at each checkpoint - Halt on validation failure - Human review option at checkpoints

Evidence: Checkpoint configuration, validation logs

AI.10.5 Rollback Capability¶

Requirement: Ability to undo agent actions where possible.

Implementation: - Identify reversible vs irreversible actions - Implement rollback for reversible actions - Extra scrutiny for irreversible actions - Rollback testing

Evidence: Rollback capability documentation, test records

AI.10.6 Outcome Validation¶

Requirement: After an agent completes a task, independently validate that the outcome matches the intended goal and has no unintended side effects.

Agentic AI pursues goals across multiple steps, choosing its own actions. Validating individual actions (AG.2.1) is necessary but insufficient — an agent can take a series of individually valid actions that produce an unintended aggregate outcome.

Implementation:

Control	Purpose
Goal-outcome comparison	Compare completed task outcome against the original goal/instruction
Side effect detection	Check for unintended changes to systems, data, or state
Boundary verification	Confirm agent stayed within its authorised scope
Resource accounting	Verify resources consumed are within expected bounds
Downstream impact check	Assess impact on systems that depend on modified data/state

Validation by tier:

Tier	Validation
CRITICAL	Automated outcome validation + human verification before results are committed
HIGH	Automated outcome validation; human review of exceptions
MEDIUM	Automated validation; spot-check human review
LOW	Automated validation

Evidence: Outcome validation logs, exception reports, human verification records

AI.11 Logging & Monitoring¶

AI.11.1 Comprehensive Logging¶

Requirement: Log AI interactions for audit, investigation, and improvement.

Log content by tier:

Tier	Logging Requirement
CRITICAL	Full content, all metadata, tamper-evident, 7-year retention
HIGH	Full content, all metadata, 3-year retention
MEDIUM	Metadata, sampled content, 1-year retention
LOW	Basic metadata, 90-day retention

Full context capture: Because AI is non-deterministic, reproducing an interaction requires capturing the complete context. Logs must include:

Field	Purpose
Model version and provider	Know exactly which model produced the output
Temperature and parameters	Reproduce generation conditions
System prompt version	Know which instructions the model was following
Retrieved context (RAG)	Know what data the model had access to
User identity	Know who initiated the interaction
Timestamp	Know when the interaction occurred
Guardrail results	Know what was filtered or flagged
Full input and output	The actual interaction content

Without full context capture, incident investigation is impossible — you cannot determine why the model produced a specific output.

Evidence: Log samples, retention compliance, context capture verification

AI.11.2 Real-Time Monitoring¶

Requirement: Monitor AI systems for operational and security issues.

Metrics:

Category	Metrics
Operational	Latency, throughput, error rate, availability
Security	Block rate, escalation rate, anomaly indicators
Quality	Judge scores, HITL findings, customer feedback
Cost	Inference spend, HITL hours

Evidence: Monitoring dashboards

AI.11.3 Alerting¶

Requirement: Alert on significant events and threshold breaches.

Alert categories:

Category	Examples	Response
Security	Injection spike, data leakage	Immediate
Quality	Judge escalation spike, quality drop	Same day
Operational	Latency increase, error spike	Per SLA
Cost	Budget threshold breach	Same day

Evidence: Alert configuration, alert logs

AI.12 Incident Response¶

AI.12.1 AI-Specific Playbooks¶

Requirement: Develop incident response playbooks for AI-specific scenarios.

Playbook scenarios: - Prompt injection campaign - Data leakage detection - Bias/fair lending alert - Model manipulation suspected - Judge failure - Agent runaway

Evidence: Playbooks, tabletop exercise records

AI.12.2 Investigation Capability¶

Requirement: Ability to investigate AI incidents effectively.

Implementation: - Access to logs and Judge evaluations - Ability to replay conversations - Root cause analysis methodology - Forensic preservation procedures

Evidence: Investigation reports

AI.12.3 Remediation¶

Requirement: Remediate issues and prevent recurrence.

Implementation: - Immediate containment options - Guardrail updates - Judge updates - Process improvements - Customer remediation (if harmed)

Evidence: Remediation records

AI.12.4 Notification¶

Requirement: Notify stakeholders and regulators as required.

Implementation: - Internal notification matrix - Regulatory notification triggers - Customer notification criteria - Communication templates

Evidence: Notification records

AI.13 AI Supplier Management¶

See ISO 27001 Alignment for detailed requirements.

AI.13.1 AI Vendor Assessment¶

Requirement: Assess AI vendors and foundation model providers for security.

Implementation: - Security questionnaire for AI vendors - Review of certifications (SOC 2, ISO 27001) - Assessment of data handling practices - Understanding of model provenance - Training data practices assessment (what data was used, how was bias mitigated, what content filtering was applied) - Data retention policy (does the provider retain your data? For how long? For what purpose?) - Model update notification (how does the provider communicate changes to model behaviour?)

Evidence: Vendor assessment records, training data practice assessments

AI.13.2 AI Vendor Agreements¶

Requirement: Include AI-specific terms in vendor agreements.

Key terms: - Data processing and residency - Model use restrictions (training on your data) - Security requirements - Incident notification - Audit rights - Zero-retention options for sensitive data - Model deprecation notice periods - Behavioural change notification requirements

Evidence: Contract terms

AI.13.3 Model Provenance¶

Requirement: Document provenance of AI models used.

Documentation: - Model identity and version - Known training data sources (where disclosed) - Known limitations and biases - License terms - Training data lineage where available; documented gap and compensating controls where unavailable

Evidence: Model documentation, provenance gap analysis

AI.13.4 Training Data Risk Assessment¶

Requirement: Assess the risks associated with foundation model training data for each use case.

The behaviour of AI systems is shaped by training data you don't control and likely can't fully audit. Training data risks include inherited bias, embedded misinformation, copyright issues, and cultural assumptions.

Assessment per model per use case:

Factor	Assessment
Bias risk	Could training data bias affect this use case? (e.g., lending, hiring)
Misinformation risk	Could incorrect training data lead to harmful outputs in this domain?
Copyright risk	Could the model reproduce copyrighted content relevant to this use case?
Cultural risk	Is the use case sensitive to cultural context the training data may not represent?
Recency risk	Does the use case require current information the training data may lack?

Decision framework:

Risk Level	Action
Training data risk is low for this use case	Accept — document rationale
Training data risk is moderate	Mitigate — RAG grounding, output validation, bias testing
Training data risk is high	Avoid — use a different model, fine-tune on curated data, or don't use AI for this use case

Evidence: Training data risk assessments per model per use case

AI.14 AI Security Awareness¶

AI.14.1 AI Security Training¶

Requirement: Train relevant personnel on AI security risks.

Training by audience:

Audience	Content
All staff	AI acceptable use, recognising AI outputs, confidence-competence gap ("The AI sounds sure — that doesn't mean it's right")
AI developers	Secure AI development, prompt injection, adversarial testing
AI operators	Guardrails, HITL processes
HITL reviewers	Cognitive bias training (automation bias, anchoring bias, authority bias), how to challenge AI outputs, canary exercise participation
Security team	AI threat landscape, monitoring, novel AI risks
Executives	AI risk literacy, accountability for AI decisions

HITL-specific training: Automation bias — the tendency to defer to AI even when human judgement is better — is the primary failure mode of human oversight. HITL reviewers must be specifically trained to recognise and counter this bias.

Evidence: Training records, cognitive bias assessment results

AI.15 AI System Continuity¶

AI.15.1 AI Continuity Planning¶

Requirement: Include AI systems in business continuity planning.

Implementation: - AI system criticality classification - Fallback procedures when AI unavailable - Recovery time objectives for AI systems - Vendor dependency planning

Evidence: BCP documentation

AI.15.2 AI System Resilience¶

Requirement: Design AI systems for resilience.

Implementation: - Graceful degradation - Fallback models - Circuit breakers (see AG.2.2) - Timeout handling

Evidence: Architecture documentation

AI.16 AI Intellectual Property¶

AI.16.1 Model IP Protection¶

Requirement: Protect intellectual property in AI models.

Implementation: - Access controls for custom models - Encryption of model weights - Protection of system prompts - Licensing for model use

Evidence: IP inventory

AI.16.2 Third-Party IP Compliance¶

Requirement: Ensure AI use complies with third-party IP rights.

Implementation: - Foundation model license compliance - Training data rights verification - Guardrails for copyright compliance

Evidence: License compliance records

Control Selection by Risk Tier¶

Summary Matrix¶

Control Family	CRITICAL	HIGH	MEDIUM	LOW
AI.1 Governance	Full	Full	Standard	Basic
AI.2 Risk Management	Full	Full	Standard	Basic
AI.3 Inventory & Documentation	Full	Full	Standard	Registration
AI.4 Development Security	Full	Full	Standard	Basic
AI.5 Data Governance	Full	Full	Standard	Basic
AI.6 Model Security	Full	Full	Standard	Basic
AI.7 Guardrails	Full	Full	Standard	Basic
AI.8 LLM-as-Judge	100%	20-50%	5-10%	Optional
AI.9 Human Oversight	All decisions	Escalations + sampling	Periodic	Spot checks
AI.10 Agentic Controls	Full (if applicable)	Full	Standard	Basic
AI.11 Logging & Monitoring	Full	Full	Standard	Basic
AI.12 Incident Response	Full	Full	Standard IT process	Standard IT process

Standards Mapping¶

Control Family	ISO 42001	NIST AI RMF	EU AI Act
AI.1 Governance	5.1, 5.2	GOVERN	Art. 9
AI.2 Risk Management	6.1	MAP, MEASURE	Art. 9
AI.3 Inventory	7.1	MAP	Art. 11
AI.7 Guardrails	8.2	MANAGE	Art. 9, 15
AI.8 Judge	8.2	MEASURE	Art. 9
AI.9 Human Oversight	8.4	GOVERN	Art. 14
AI.11 Logging	9.1	MEASURE	Art. 12
AI.12 Incident Response	10.1	MANAGE	Art. 9

AI Runtime Behaviour Security, 2026 (Jonathan Gill).