Controls to Three-Layer Mapping
Maps all 80 infrastructure controls to the three-layer behavioural security pattern: Guardrails → LLM-as-Judge → Human Oversight.
Part of the AI Security Infrastructure Controls framework. Companion to AI Runtime Behaviour Security.
How to Read This Mapping
Every infrastructure control supports all three layers of the behavioural security pattern. The tables below summarise how each control contributes to each layer, grouped by control domain. Detailed descriptions are in the individual control documents.
The three layers operate as concentric defences:
- Guardrails prevent — they block or constrain before or during model execution.
- Judge detects — it evaluates outputs and behaviour against policy, asynchronously.
- Human Oversight decides — humans review, approve, investigate, and adjust.
Infrastructure controls make these layers enforceable. Without the infrastructure, the behavioural pattern is aspirational.
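As a rough illustration of how the three layers relate, the following minimal Python sketch runs a guardrail inline, queues each exchange for later Judge evaluation, and routes Judge failures to a human review queue. All names here (`ThreeLayerPipeline`, `blocked_terms`, `complies`) are hypothetical and are not part of the framework; a substring match stands in for a real guardrail only to keep the sketch short.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional, Tuple

Exchange = Tuple[str, str]  # (request, response)


@dataclass
class ThreeLayerPipeline:
    # Hypothetical guardrail rule: a simple substring blocklist.
    blocked_terms: Tuple[str, ...] = ("ignore previous instructions",)
    judge_queue: Deque[Exchange] = field(default_factory=deque)
    review_queue: List[Exchange] = field(default_factory=list)

    def handle(self, request: str, model: Callable[[str], str]) -> Optional[str]:
        """Layer 1: the guardrail runs inline and can block before the model executes."""
        lowered = request.lower()
        if any(term in lowered for term in self.blocked_terms):
            return None
        response = model(request)
        # Layer 2 input: the exchange is queued; the caller is not delayed by the Judge.
        self.judge_queue.append((request, response))
        return response

    def run_judge(self, complies: Callable[[str, str], bool]) -> None:
        """Layer 2: evaluate queued exchanges later; failures go to Layer 3 (humans)."""
        while self.judge_queue:
            exchange = self.judge_queue.popleft()
            if not complies(*exchange):
                self.review_queue.append(exchange)
```

In this sketch the Judge never blocks a response; it only produces items for human review, which mirrors the prevent / detect / decide split described above.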
Identity & Access Management
| Control | Guardrails | Judge | Human Oversight |
|---|---|---|---|
| IAM-01 Authenticate all entities | Authentication gates prevent unauthorised access to model endpoints and tools | Judge receives verified identity context for evaluation, ensuring attributable actions | Human reviewers can trace every action to an authenticated identity |
| IAM-02 Enforce least privilege | Permission boundaries limit what each entity can do — the primary access guardrail | Judge can detect actions that fall within permissions but outside expected behavioural patterns | Humans define and review permission sets; least privilege reduces scope of decisions requiring oversight |
| IAM-03 Separate control/data planes | Prevents runtime paths from modifying security configuration — structural guardrail | Judge configuration is protected from manipulation via the data plane | Humans access control plane via MFA/VPN; configuration changes are human-only |
| IAM-04 Constrain agent tool invocation | Gateway enforces tool manifest — the guardrail for all agent actions | Gateway logs feed Judge evaluation of agent tool usage patterns | Humans define manifests and review agent behaviour against declared scope |
| IAM-05 Human approval for high-impact actions | Approval routing pauses irreversible actions — guardrail that defers to humans | Judge evaluates the pattern of actions requiring approval and approval outcomes | Direct human involvement in high-impact decisions; the oversight layer in action |
| IAM-06 Session-scoped credentials | Credential scope limits blast radius of compromise — time-bounded guardrail | Judge monitors credential usage patterns within sessions for anomalies | Session scope gives humans a bounded window to review; credentials expire automatically |
| IAM-07 Prevent credential exposure in context | Out-of-band credential injection prevents extraction via prompt injection — fundamental guardrail | Judge monitors for credential-like patterns in model outputs as a detection signal | Humans design credential architecture; exposure incidents trigger human investigation |
| IAM-08 Audit all access changes | Audit trail deters unauthorised changes — indirect guardrail via accountability | Judge can correlate access changes with subsequent behavioural changes in the system | Humans review access change audit trails; changes are attributable and reviewable |
Logging & Observability
| Control | Guardrails | Judge | Human Oversight |
|---|---|---|---|
| LOG-01 Log all model I/O | Logged I/O enables guardrail effectiveness measurement and tuning | Model I/O logs are the primary input for Judge evaluation of outputs | I/O logs give humans the raw data to review what the model actually said |
| LOG-02 Log guardrail decisions | Guardrail decision logs prove enforcement is active and measure block/pass rates | Judge uses guardrail decision logs to evaluate whether guardrails are functioning correctly | Humans review guardrail decision logs to tune rules and investigate bypasses |
| LOG-03 Log Judge evaluations | Guardrail tuning is informed by Judge evaluation trends (feedback loop) | Judge evaluation logs enable meta-evaluation — assessing whether the Judge itself is effective | Humans review Judge logs to calibrate thresholds and assess Judge accuracy |
| LOG-04 Log agent decision chains | Chain logs reveal when guardrails are being tested or probed by agent behaviour | Judge reconstructs full agent chains to evaluate whether multi-step behaviour was appropriate | Humans can forensically reconstruct what an agent did and why |
| LOG-05 Detect behavioural drift | Drift detection is a guardrail that triggers when model behaviour moves outside baseline bounds | Judge evaluation criteria can be updated based on drift detection signals | Drift alerts bring humans into the loop when automated systems detect change |
| LOG-06 Detect prompt injection | Injection detection is a direct guardrail — blocking or flagging injection attempts | Judge evaluates whether injection attempts correlate with unusual model outputs | Injection detection alerts trigger human investigation of targeted attacks |
| LOG-07 Protect log integrity | Log integrity ensures guardrail decision records cannot be tampered with | Judge relies on trustworthy logs — tampered logs produce unreliable evaluations | Humans need tamper-proof logs for audit, compliance, and investigation |
| LOG-08 Enforce retention policies | Retention ensures guardrail and evaluation data is available for the required period | Judge evaluation data is retained for trend analysis and meta-evaluation | Humans have historical data for audits, investigations, and compliance |
| LOG-09 Redact sensitive data in logs | PII redaction in logs prevents logging infrastructure from becoming a data leakage vector | Judge evaluates redacted logs — it does not need PII to assess behavioural patterns | Humans can review logs without exposure to unnecessary sensitive data |
| LOG-10 Correlate with enterprise SIEM | SIEM correlation connects AI guardrail events with broader security monitoring | Judge signals (evaluation failures, drift alerts) feed enterprise detection capabilities | Security operations teams gain visibility into AI-specific events alongside traditional alerts |
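One way to approach LOG-07 is a hash chain, in which each log entry commits to the one before it so that later modification becomes detectable. The sketch below is an assumption about how this could be done, not the framework's prescribed mechanism; `append_entry`, `verify_chain`, and the entry layout are hypothetical. It makes logs tamper-evident rather than tamper-proof: an attacker who can rewrite the whole chain is only stopped if the latest hash is also anchored somewhere they cannot reach.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list, event: dict) -> None:
    """Append an event whose hash covers both the event and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A Judge or human reviewer could run `verify_chain` before trusting guardrail decision logs as evidence.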
Network & Segmentation
| Control | Guardrails | Judge | Human Oversight |
|---|---|---|---|
| NET-01 Define network zone architecture | Zone boundaries are structural guardrails — they physically enforce separation | Judge infrastructure is isolated in its own zone, protecting evaluation independence | Zone architecture is designed by humans and enforced by infrastructure, not instructions |
| NET-02 Prevent guardrail bypass at network level | Network-enforced guardrail bypass prevention — no path to the model that skips guardrails | Judge can verify from logs that all model interactions transited the guardrail path | Humans can audit network policy to confirm no bypass paths exist |
| NET-03 Isolate Judge evaluation infrastructure | Judge isolation ensures guardrails cannot influence Judge evaluation | Direct protection of Judge independence — separate zone, async data flow, no runtime influence | Humans can verify Judge isolation through network policy review |
| NET-04 Control agent egress destinations | Egress proxy is a network-level guardrail on agent external communications | Judge can evaluate egress patterns against expected behaviour | Humans define egress allowlists and review blocked egress attempts |
| NET-05 Separate ingestion from runtime | Ingestion isolation prevents poisoned data from reaching runtime directly — structural guardrail | Judge can monitor ingestion pipeline outputs independently from runtime | Humans control ingestion approval processes without runtime pressure |
| NET-06 Protect control plane network path | Control plane protection prevents unauthorised modification of guardrail and Judge configuration | Judge configuration is protected by control plane network restrictions | Only authorised humans (MFA + VPN) can modify system configuration |
| NET-07 Enforce API gateway as single entry | Single entry point ensures all guardrails are applied to every request | All model interactions transit a path that generates Judge-consumable logs | Gateway provides humans with a single point of monitoring and control |
| NET-08 Monitor cross-zone traffic | Cross-zone monitoring detects anomalous traffic patterns that may indicate guardrail circumvention | Judge can incorporate cross-zone traffic anomalies as evaluation signals | Anomalous traffic alerts trigger human investigation |
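The egress allowlist in NET-04 can be pictured as a default-deny check on the destination host. The following is a minimal sketch under assumptions: the allowlist contents, the HTTPS-only rule, and the function name `egress_allowed` are all illustrative, and a real egress proxy would enforce this at the network layer rather than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; in practice humans would define and review this.
ALLOWED_HOSTS = frozenset({"api.example.com"})


def egress_allowed(url: str, allowlist: frozenset = ALLOWED_HOSTS) -> bool:
    """Default deny: only exact-match hosts over HTTPS are permitted."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # Compare the parsed hostname, not the raw string, so that userinfo
    # tricks and look-alike suffixes do not slip through.
    host = (parts.hostname or "").lower()
    return host in allowlist
```

Matching on the parsed hostname rather than a substring of the URL is the design choice that matters here; blocked attempts would be logged for the human review described in the table.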
Data Protection
| Control | Guardrails | Judge | Human Oversight |
|---|---|---|---|
| DAT-01 Classify data at AI boundaries | Classification drives guardrail rules — what data can enter or leave the model context | Judge can evaluate whether data classification policies are being followed in model I/O | Classification provides humans with context for data handling decisions |
| DAT-02 Enforce data minimisation | Minimisation is a guardrail that reduces the data available for exfiltration | Judge can evaluate whether model context contains more data than necessary for the task | Humans define minimisation policies based on data sensitivity and task requirements |
| DAT-03 Detect and redact PII | PII detection and redaction is a direct input/output guardrail | Judge can evaluate PII detection effectiveness by monitoring for PII in post-guardrail outputs | PII incidents trigger human review and guardrail tuning |
| DAT-04 Enforce access-controlled RAG | RAG access control is a guardrail ensuring users only retrieve documents they are authorised to see | Judge can detect patterns where RAG retrieval returns content inconsistent with user permissions | Humans define RAG access policies and review retrieval audit logs |
| DAT-05 Encrypt data at rest and in transit | Encryption is an infrastructure guardrail protecting data if other controls fail | Judge evaluation data is encrypted, protecting evaluation integrity | Encryption gives humans confidence that data is protected at rest and in transit |
| DAT-06 Prevent sensitive data leakage via responses | Output scanning is a direct output guardrail — blocking responses containing sensitive data | Judge evaluates response patterns for data leakage that may not match known patterns | Leakage incidents trigger human investigation and guardrail rule updates |
| DAT-07 Manage conversation history retention | Retention limits are a guardrail on context window accumulation — old data is purged | Judge can evaluate conversation history scope relative to task requirements | Humans define retention policies balancing utility with privacy requirements |
| DAT-08 Protect evaluation data sent to Judge | Tokenisation of evaluation data is a guardrail protecting PII that transits to the Judge | Judge receives necessary context without unnecessary PII exposure | Humans define what data the Judge needs and how it is protected |
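DAT-08 describes tokenising evaluation data before it reaches the Judge. A minimal sketch of one possible approach follows, assuming keyed hashing (HMAC) so that the same value always maps to the same token and the Judge can still correlate behaviour without seeing the raw value. The email regex is deliberately simplistic, and `tokenise` and the `<EMAIL:...>` token format are hypothetical.

```python
import hashlib
import hmac
import re

# Simplistic email pattern for illustration only; real PII detection needs more.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tokenise(text: str, key: bytes) -> str:
    """Replace each email address with a stable keyed token."""

    def _replace(match: "re.Match[str]") -> str:
        # Lower-case first so the same address in different casing gets one token.
        value = match.group(0).lower().encode("utf-8")
        digest = hmac.new(key, value, hashlib.sha256).hexdigest()
        return f"<EMAIL:{digest[:12]}>"

    return EMAIL.sub(_replace, text)
```

Because the token is keyed, someone without the key cannot confirm a guessed address by hashing it, while the Judge still sees that two messages refer to the same entity.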
Secrets & Credentials
| Control | Guardrails | Judge | Human Oversight |
|---|---|---|---|
| SEC-01 Never inject credentials into context windows | Context window exclusion is the fundamental credential guardrail — no credentials in the attack surface | Judge monitors for credential-like patterns in model I/O as a detection layer | Architecture designed by humans; violations trigger investigation |
| SEC-02 Use short-lived, scoped tokens | Token scope and expiry are time-bounded guardrails that limit credential utility if compromised | Judge can detect token usage patterns outside expected scope or time windows | Short-lived tokens shrink the window in which a compromised credential is usable, so human response is less time-critical |
| SEC-03 Centralise secrets in a vault | Centralised vault is a structural guardrail — secrets have one controlled access path | Judge can monitor vault access patterns for anomalies | Humans manage vault policies and review access logs |
| SEC-04 Scan model I/O for credential patterns | Credential pattern scanning is a direct I/O guardrail | Judge evaluates scanning effectiveness and correlates with other exfiltration signals | Scan alerts trigger human investigation and incident response |
| SEC-05 Rotate credentials on exposure | Automatic rotation is a guardrail that limits the utility window of exposed credentials | Judge monitors for continued use of rotated credentials as a compromise indicator | Rotation events trigger human investigation of the exposure source |
| SEC-06 Isolate agent credentials per session | Per-session isolation is a blast radius guardrail — compromise of one session doesn't affect others | Judge can detect credential usage inconsistent with session scope | Humans review session credential patterns and policy effectiveness |
| SEC-07 Protect model endpoint credentials | Endpoint credential protection prevents unauthorised model access — an access guardrail | Judge can detect anomalous patterns in model endpoint authentication | Humans manage endpoint credential policies and rotation schedules |
| SEC-08 Scan code and config for embedded credentials | Pre-deployment scanning is a preventive guardrail catching credentials before they reach production | Judge evaluation of deployment artefacts can include credential scanning results | Scan findings require human remediation before deployment proceeds |
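The credential pattern scanning in SEC-04 can be sketched as a small set of regular expressions applied to model input and output. The patterns below are illustrative assumptions, not an exhaustive or authoritative rule set: an AWS-style access key ID prefix, a PEM private key header, and a generic bearer token. The names `CREDENTIAL_PATTERNS`, `scan`, and `redact` are hypothetical.

```python
import re

# Illustrative patterns only; a production scanner would carry many more.
CREDENTIAL_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "pem_private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{20,}=*"),
}


def scan(text: str) -> list:
    """Return the sorted names of all credential patterns found in the text."""
    return sorted(name for name, pattern in CREDENTIAL_PATTERNS.items() if pattern.search(text))


def redact(text: str) -> str:
    """Replace every match with a labelled placeholder before the text leaves the boundary."""
    for name, pattern in CREDENTIAL_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text
```

A guardrail might call `redact` inline, while the findings from `scan` feed the Judge and trigger the human investigation described in the table.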
Supply Chain
| Control | Guardrails | Judge | Human Oversight |
|---|---|---|---|
| SUP-01 Verify model provenance | Provenance verification is a deployment guardrail — unverified models are blocked | Judge baselines are model-specific; verified provenance ensures valid baselines | Humans approve models for deployment based on provenance evidence |
| SUP-02 Assess model risk before adoption | Risk assessment determines guardrail configuration requirements per model | Assessment results calibrate Judge evaluation criteria and thresholds | Humans conduct assessments and make adoption decisions |
| SUP-03 Verify RAG data source integrity | Source allowlisting and content scanning are guardrails on the RAG ingestion path | Judge can evaluate whether retrieved content appears anomalous relative to known sources | Humans maintain source allowlists and review ingestion logs |
| SUP-04 Secure fine-tuning pipeline | Pipeline security prevents training-time attacks — a guardrail on model creation | Post-training evaluation provides Judge with validated safety baselines | Humans approve training data, review evaluation results, and authorise deployment |
| SUP-05 Audit tool and plugin supply chain | Tool registry and security assessment are guardrails on agent capability expansion | Judge can evaluate tool usage against declared capabilities, detecting anomalous patterns | Humans assess, approve, and monitor tools in the registry |
| SUP-06 Verify guardrail and safety model integrity | Integrity verification protects the guardrails themselves — the most critical supply chain control | Judge model integrity is directly protected; compromise would defeat the evaluation layer | Humans approve guardrail/Judge changes and review tamper detection alerts |
| SUP-07 Maintain AI-BOM | AI-BOM enables systematic guardrail coverage verification | AI-BOM tracks Judge model associations, ensuring consistent evaluation configuration | AI-BOM gives humans a single source of truth for what is deployed |
| SUP-08 Monitor for vulnerabilities | Vulnerability monitoring triggers guardrail rule updates for new attack patterns | New vulnerabilities may require Judge evaluation criteria updates | Humans assess vulnerability impact and prioritise remediation |
Incident Response
| Control | Guardrails | Judge | Human Oversight |
|---|---|---|---|
| IR-01 Define AI-specific incident categories | Category definitions guide guardrail monitoring — each category has associated detection rules | Judge evaluation failures are an explicit incident category, integrating Judge into IR | Humans use category definitions to triage and prioritise incidents |
| IR-02 Establish detection triggers | Detection triggers are automated guardrails that escalate anomalies to incident status | Judge evaluations are a primary source of detection triggers | Detection triggers bring humans into the loop when automated systems detect problems |
| IR-03 Define containment procedures | Containment procedures include guardrail hot-reload for rapid policy update | Containment may include Judge threshold adjustment to increase sensitivity | Humans execute containment decisions based on severity classification |
| IR-04 Implement model rollback and hot-reload | Rollback and hot-reload enable rapid guardrail and model restoration | Judge configuration can be rolled back alongside model changes | Humans authorise rollback decisions and verify system recovery |
| IR-05 Define investigation procedures | Investigation uses guardrail decision logs as primary evidence | Judge evaluation logs provide investigation context for output-related incidents | Humans conduct investigations using logs from all three layers |
| IR-06 Establish communication protocols | Guardrail status is included in incident communications to stakeholders | Judge evaluation findings inform communication about impact assessment | Humans manage communications, disclosures, and stakeholder engagement |
| IR-07 Conduct post-incident review | Post-incident reviews assess guardrail effectiveness and identify gaps | Post-incident reviews evaluate Judge detection performance and calibration | Humans lead reviews and implement improvements across all three layers |
| IR-08 Integrate with enterprise IR | AI guardrail events feed enterprise IR workflows and SIEM | Judge signals integrate with enterprise detection and response capabilities | AI incidents are handled within existing human IR structures and escalation paths |
Agentic — Tool Access
| Control | Guardrails | Judge | Human Oversight |
|---|---|---|---|
| TOOL-01 Declare tool permissions | Tool manifest is the guardrail — allowlist-based, deny by default | Judge evaluates whether tool usage patterns align with manifest intent | Humans define and approve manifests |
| TOOL-02 Enforce at gateway | Gateway is the enforcement point — deterministic, injection-proof | Gateway logs are ground truth for Judge evaluation of agent actions | Gateway enforcement means human-approved policies are actually enforced |
| TOOL-03 Constrain parameters | Parameter schemas are fine-grained guardrails within permitted tool access | Judge evaluates parameter patterns for anomalies within technically valid bounds | Humans define parameter constraints in the manifest |
| TOOL-04 Classify by reversibility | Classification drives guardrail escalation — irreversible actions require more oversight | Judge calibrates evaluation depth by action class | Classification determines when humans are brought into the loop |
| TOOL-05 Rate-limit invocations | Rate limits are quantitative guardrails preventing accumulation attacks | Rate limit proximity feeds Judge anomaly detection | Rate limit alerts surface sessions for human review |
| TOOL-06 Log every invocation | Invocation logs prove guardrails are active and measure enforcement | Complete invocation logs are primary Judge input for agent evaluation | Logs give humans full visibility into agent actions |
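The six tool access controls can be read together as one gateway decision. The sketch below is a simplified assumption of how that might look: a manifest acts as a deny-by-default allowlist (TOOL-01, TOOL-02), per-parameter predicates constrain arguments (TOOL-03), irreversible tools are routed to human approval (TOOL-04), a call counter stands in for rate limiting (TOOL-05), and every decision is logged (TOOL-06). `ToolRule`, `Gateway`, and the decision strings are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToolRule:
    allowed_params: Dict[str, Callable[[Any], bool]]  # parameter name -> validity check
    max_calls: int                                    # simple per-gateway call budget
    reversible: bool = True                           # irreversible tools need approval


class Gateway:
    def __init__(self, manifest: Dict[str, ToolRule]) -> None:
        self.manifest = manifest
        self.calls: Dict[str, int] = {}
        self.log: List[dict] = []

    def authorise(self, tool: str, params: Dict[str, Any]) -> str:
        decision = self._decide(tool, params)
        # Every invocation is logged, including denials.
        self.log.append({"tool": tool, "params": params, "decision": decision})
        return decision

    def _decide(self, tool: str, params: Dict[str, Any]) -> str:
        rule = self.manifest.get(tool)
        if rule is None:
            return "deny:not_in_manifest"  # deny by default
        for name, value in params.items():
            check = rule.allowed_params.get(name)
            if check is None or not check(value):
                return "deny:parameter"  # undeclared or out-of-bounds parameter
        if self.calls.get(tool, 0) >= rule.max_calls:
            return "deny:rate_limit"
        self.calls[tool] = self.calls.get(tool, 0) + 1
        return "allow" if rule.reversible else "needs_human_approval"
```

The decision depends only on the manifest and the call, never on model output text, which is what makes enforcement at the gateway deterministic.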
Agentic — Session & Scope
| Control | Guardrails | Judge | Human Oversight |
|---|---|---|---|
| SESS-01 Define session boundaries | Time/token/action limits are structural guardrails on agent runtime | Judge evaluates whether sessions approach or reach limits, signalling potential issues | Humans define session limits based on task requirements and risk |
| SESS-02 Isolate sessions | Session isolation prevents cross-contamination — a blast radius guardrail | Judge evaluates each session independently; isolation ensures evaluation integrity | Humans review sessions individually without cross-session confusion |
| SESS-03 Limit session scope | Task scope constraints prevent agents from exceeding their declared purpose | Judge evaluates whether actions within a session are consistent with the declared task | Humans define task scope and review sessions where scope boundaries were tested |
| SESS-04 Implement progressive trust | Progressive trust starts with minimal permissions — a conservative guardrail that relaxes with evidence | Judge evaluates whether trust escalation is justified by the session's clean behaviour history | Humans define trust escalation criteria and review escalation decisions |
| SESS-05 Clean up session state | Session cleanup prevents state leakage — a guardrail on persistent attack surface | Judge evaluates cleanup completeness, detecting residual state that could affect future sessions | Humans define cleanup policies and review cleanup audit logs |
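The time, token, and action limits in SESS-01 amount to a budget that is checked on every step. A minimal sketch follows, assuming the limits are enforced by whatever component drives the agent loop; `SessionLimits`, `Session`, and `charge` are hypothetical names, and the clock is injectable only so the behaviour can be tested.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SessionLimits:
    max_seconds: float
    max_tokens: int
    max_actions: int


class Session:
    def __init__(self, limits: SessionLimits, clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits
        self._clock = clock
        self._started = clock()
        self.tokens = 0
        self.actions = 0

    def charge(self, tokens: int, actions: int = 1) -> Optional[str]:
        """Record usage and return the name of the first exceeded limit, or None."""
        self.tokens += tokens
        self.actions += actions
        if self._clock() - self._started > self.limits.max_seconds:
            return "time"
        if self.tokens > self.limits.max_tokens:
            return "tokens"
        if self.actions > self.limits.max_actions:
            return "actions"
        return None
```

A non-`None` result would end the session; sessions that repeatedly come close to a limit are the signal the Judge column refers to.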
Agentic — Delegation Chains
| Control | Guardrails | Judge | Human Oversight |
|---|---|---|---|
| DEL-01 Enforce least delegation | Permission intersection is the guardrail preventing privilege escalation through delegation | Judge evaluates delegation patterns for escalation attempts, even within technically valid permissions | Humans define permission sets; intersection ensures human-approved boundaries hold |
| DEL-02 Maintain audit trail | Audit trail proves delegation guardrails are functioning | Complete chain logs are primary Judge input for multi-agent evaluation | Humans can reconstruct full chains for investigation and review |
| DEL-03 Limit delegation depth | Depth limits are structural guardrails on chain complexity | Depth limits keep chains short enough for the Judge to evaluate end to end | Depth limits keep chains tractable for human review |
| DEL-04 Require explicit authorisation | Delegation manifests are guardrails on agent-to-agent trust | Judge verifies all delegations match declared authorisations | Humans define delegation topology through manifests |
| DEL-05 Propagate user identity | Identity propagation ensures user permissions constrain the entire chain — a fundamental guardrail | User identity enables Judge to evaluate actions against user-level behavioural baselines | Every action in every chain is attributable to the initiating human |
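The permission intersection in DEL-01, combined with the depth limit in DEL-03 and the user-identity constraint in DEL-05, can be expressed as a set intersection along the chain. This is a minimal sketch, assuming permissions are plain strings and that the chain is known up front; the function name `effective_permissions` and the default `max_depth` of 3 are illustrative choices, not values from the framework.

```python
from typing import FrozenSet, Iterable, Sequence


def effective_permissions(
    user_perms: Iterable[str],
    chain: Sequence[Iterable[str]],
    max_depth: int = 3,
) -> FrozenSet[str]:
    """Permissions available at the end of a delegation chain.

    Starts from the initiating user's permissions and intersects with each
    agent in turn, so no hop can gain a permission that any earlier
    participant lacks.
    """
    if len(chain) > max_depth:
        raise PermissionError(f"delegation depth {len(chain)} exceeds limit {max_depth}")
    effective = frozenset(user_perms)
    for agent_perms in chain:
        effective &= frozenset(agent_perms)
    return effective
```

For example, a user holding `read` and `write` who delegates through an agent with `read`, `write`, `delete` and then one with `read`, `delete` ends up with `read` only; `delete` never becomes available because the user did not hold it.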
Agentic — Sandbox Patterns
| Control | Guardrails | Judge | Human Oversight |
|---|---|---|---|
| SAND-01 Execute in isolated sandboxes | Sandbox isolation is the guardrail boundary for generated code execution | Judge evaluates code behaviour within the sandbox, using execution logs as evidence | Humans define isolation levels based on risk tier |
| SAND-02 Restrict file system access | File system restrictions prevent generated code from accessing data outside declared scope | Judge monitors file access patterns for anomalies | Humans define allowed paths and review access violations |
| SAND-03 Restrict network access | Default-deny network is a guardrail preventing code from communicating externally | Judge evaluates any permitted network access for anomalous patterns | Humans approve exceptions to default-deny network policy |
| SAND-04 Enforce resource limits | Resource limits are guardrails against denial of service and resource abuse | Judge monitors resource consumption patterns for anomalies | Humans define resource limits and review high-consumption sessions |
| SAND-05 Prevent persistent state | Ephemeral environments prevent state accumulation — a guardrail on long-term compromise | Judge evaluates that session state does not persist beyond session boundaries | Humans verify ephemeral policy compliance |
| SAND-06 Scan code before execution | Pre-execution scanning is a guardrail catching malicious or dangerous code before it runs | Judge evaluates scanning results alongside execution behaviour | Humans review scan findings and define scanning policy |
AI Runtime Behaviour Security, 2026 (Jonathan Gill).