Incident Response for AI Systems

Control Domain: Response Controls
Purpose: Define AI-specific incident detection, classification, response, and recovery procedures that extend the organisation's existing IR capability.
Relationship: Relies on the Logging controls (LOG-01 through LOG-10) for detection data, the IAM controls for containment actions, and the three-layer model for escalation pathways.

Why AI Incident Response Is Different

Traditional incident response follows a pattern: detect, contain, eradicate, recover. AI incidents introduce complications that break conventional IR assumptions:

Assumption	Traditional IR	AI Systems
You can identify the payload	Malware has a hash, an exploit has a CVE	Prompt injection is natural language — no signature
You can isolate the affected system	Take the server offline	The model is stateless — the "infection" is in the prompt, not the system
You can determine impact	Forensics reveals what the attacker accessed	AI context windows are ephemeral — what the model "saw" may not be fully logged
You can prevent recurrence	Patch the vulnerability	The same injection technique can be paraphrased infinitely
Root cause is identifiable	Vulnerability + exploit chain	Model behaviour is non-deterministic — the same input might not reproduce the issue

The core problem: AI incidents often involve behavioural failures rather than system compromises. The model isn't "hacked" in the traditional sense — it's manipulated into behaving in ways that violate policy. This requires IR procedures that address behaviour, not just infrastructure.

Control Objectives

ID	Objective	Risk Tiers
IR-01	Define AI-specific incident categories and severity levels	All
IR-02	Establish AI incident detection triggers from logging and monitoring	All
IR-03	Define containment procedures for AI-specific incidents	All
IR-04	Implement model rollback and guardrail emergency update capability	Tier 2+
IR-05	Define investigation procedures for non-deterministic systems	Tier 2+
IR-06	Establish communication protocols for AI incidents	Tier 2+
IR-07	Conduct post-incident review with AI-specific root cause analysis	All
IR-08	Integrate AI IR with enterprise IR processes	All

IR-01: AI Incident Categories

Standard incident categories (malware, unauthorised access, data breach) don't adequately describe AI-specific incidents. Define additional categories:

AI-Specific Incident Types

Category	Description	Example
Prompt injection — successful	Attacker bypassed guardrails and manipulated model behaviour	Model disclosed system prompt, executed unintended tool calls
Prompt injection — attempted	Guardrails or Judge detected injection attempt	Blocked injection, but technique is novel and needs analysis
Guardrail failure	Guardrails passed content that should have been blocked	PII in output not caught, harmful content not filtered
Judge disagreement	Judge flagged content that guardrails passed (or vice versa) at significant rates	Systemic gap between detection layers
Model behavioural drift	Model behaviour shifted outside baseline parameters	Response quality degradation, topic drift, tone changes
Data poisoning	Malicious content entered the vector store or training pipeline	Manipulated RAG responses, biased fine-tuning outcomes
Agent autonomy violation	Agent took actions outside its declared permission set	Unauthorised tool invocation, exceeded scope
Credential exposure	Credentials appeared in model context, output, or logs	API key in model response, token in log file
Context window exfiltration	Sensitive data extracted from model context via adversarial prompts	System prompt, other users' data, retrieved documents leaked
Evaluation failure	Judge system failed, producing no evaluations during a period	Monitoring gap, undetected issues during failure window

Severity Classification

Severity	Criteria
Critical	Active data breach via AI system, successful agent autonomy violation with impact, credential compromise with confirmed exploitation
High	Successful prompt injection with policy violation, guardrail failure on Tier 3+ system, data poisoning confirmed
Medium	Novel injection technique detected (even if blocked), guardrail/Judge disagreement exceeding threshold, model drift beyond baseline
Low	Known injection technique blocked, single-instance guardrail false negative, minor drift within recovery parameters
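
The criteria in the table can be encoded as an ordered set of checks, so that triage gives the same answer for the same facts. The following is a minimal Python sketch under stated assumptions, not a definitive implementation: the `IncidentFacts` fields, the category strings, and the `classify` function are hypothetical names introduced here for illustration, and real thresholds would come from the organisation's own policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class IncidentFacts:
    """Triage inputs. All field names are hypothetical, chosen for illustration."""
    category: str                     # e.g. "guardrail_failure"
    active_data_breach: bool = False
    confirmed: bool = False           # confirmed impact, exploitation, or poisoning
    policy_violation: bool = False
    risk_tier: int = 1
    novel_technique: bool = False
    exceeds_threshold: bool = False   # disagreement rate or drift beyond baseline


def classify(f: IncidentFacts) -> Severity:
    """Return the highest severity whose criteria match; default to LOW."""
    c = f.category
    # Critical criteria are checked first so they always win.
    if f.active_data_breach:
        return Severity.CRITICAL
    if c in ("agent_autonomy_violation", "credential_exposure") and f.confirmed:
        return Severity.CRITICAL
    # High criteria.
    if c == "prompt_injection_successful" and f.policy_violation:
        return Severity.HIGH
    if c == "guardrail_failure" and f.risk_tier >= 3:
        return Severity.HIGH
    if c == "data_poisoning" and f.confirmed:
        return Severity.HIGH
    # Medium criteria.
    if c == "prompt_injection_attempted" and f.novel_technique:
        return Severity.MEDIUM
    if c in ("judge_disagreement", "model_drift") and f.exceeds_threshold:
        return Severity.MEDIUM
    return Severity.LOW
```

Checking the most severe criteria first means an incident that matches several rows is assigned the highest applicable level. A human reviewer would still confirm the result, in line with the oversight layer of the three-layer model.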

IR-02: Detection Triggers

AI incidents are detected through the logging and monitoring infrastructure (LOG-01 through LOG-10). Define specific triggers:

Automated Detection

Trigger	Source	Incident Category
Guardrail block rate spike (>2σ above baseline)	LOG-05	Potential injection campaign
Judge escalation rate exceeds threshold	LOG-03, LOG-05	Guardrail failure or model drift
Credential pattern in model I/O	SEC-04	Credential exposure
Agent tool call to undeclared endpoint	LOG-04, NET-04	Agent autonomy violation
System prompt hash mismatch	LOG-01, IAM-08	Configuration tampering
Cross-zone traffic anomaly	NET-08	Potential compromise
Single user generating anomalous volume	LOG-05	Adversarial probing
Judge availability below SLA	LOG-03	Evaluation failure
Vector store content classification change	SUP-03	Potential data poisoning
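
Two of the triggers above lend themselves to a short illustration: the block rate spike and the system prompt hash mismatch. The sketch below is one possible reading, assuming the 2σ threshold is measured against a rolling baseline of recent block rates and that the approved system prompt is tracked by its SHA-256 digest. The function names and parameters are hypothetical and are not defined by the LOG or IAM controls.

```python
import hashlib
import hmac
from statistics import mean, stdev


def is_block_rate_spike(baseline: list[float], current: float, sigmas: float = 2.0) -> bool:
    """True when `current` is more than `sigmas` standard deviations above the baseline mean."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sd = stdev(baseline)
    if sd == 0:
        # Flat baseline: any increase counts as a spike.
        return current > mu
    return (current - mu) / sd > sigmas


def prompt_hash_mismatch(deployed_prompt: str, approved_sha256: str) -> bool:
    """True when the deployed system prompt does not match the approved digest."""
    actual = hashlib.sha256(deployed_prompt.encode("utf-8")).hexdigest()
    return not hmac.compare_digest(actual, approved_sha256)
```

The spike check is one-sided on purpose: a drop in block rate may also be worth investigating, but it points to a different failure (for example a guardrail outage) rather than an injection campaign.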

Human-Reported

User reports unexpected model behaviour.
Security team identifies AI-related IoCs during other investigations.
Vendor notifies of model vulnerability or incident.
Regulatory or legal inquiry triggers review.

IR-03: Containment Procedures

AI containment differs from traditional containment. You can't "quarantine" a stateless model — but you can restrict what reaches it and what it can do.

Containment Actions by Severity

Severity	Containment Action
Critical	Disable the AI system endpoint. Route traffic to a static fallback. Revoke all agent credentials. Preserve logs.
High	Increase guardrail strictness (lower thresholds). Disable agent tool access. Enable synchronous Judge evaluation (block on flag). Increase logging verbosity.
Medium	Add targeted guardrail rules for the detected technique. Increase monitoring for the affected category. Alert human reviewers.
Low	Log for analysis. Update guardrail rules in next scheduled release. No immediate containment required.
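
Keeping the table above as data rather than prose lets responders look up the agreed actions quickly and lets the playbook be reviewed like any other configuration. A minimal sketch follows; the action identifiers and the `containment_plan` helper are hypothetical labels for the actions in the table, not names of real automation hooks.

```python
# Ordered containment actions per severity, mirroring the table above.
# Action names are illustrative placeholders, not real API calls.
CONTAINMENT_PLAYBOOK: dict[str, tuple[str, ...]] = {
    "critical": (
        "disable_endpoint",
        "route_to_static_fallback",
        "revoke_agent_credentials",
        "preserve_logs",
    ),
    "high": (
        "tighten_guardrail_thresholds",
        "disable_agent_tools",
        "enable_synchronous_judge",
        "increase_log_verbosity",
    ),
    "medium": (
        "add_targeted_guardrail_rules",
        "increase_category_monitoring",
        "alert_human_reviewers",
    ),
    "low": (
        "log_for_analysis",
        "schedule_guardrail_rule_update",
    ),
}


def containment_plan(severity: str) -> list[str]:
    """Return the ordered containment actions for a severity level."""
    key = severity.strip().lower()
    if key not in CONTAINMENT_PLAYBOOK:
        raise ValueError(f"unknown severity: {severity!r}")
    return list(CONTAINMENT_PLAYBOOK[key])
```

Rejecting unknown severities, rather than silently returning an empty plan, avoids the case where a typo in a ticket field results in no containment at all.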

AI-Specific Containment Principles

Fail closed, not fail open: If the guardrail system fails, block all traffic rather than allowing unfiltered access to the model.
Preserve forensic data: Before containment actions change system behaviour, ensure current logs and configurations are preserved.
Contain the session, not the system: If a single user session is compromised (injection, exfiltration), terminate that session without disabling the entire system (unless the attack vector is systemic).
Credential rotation is containment: Any credential exposure triggers immediate rotation as a containment action, not a later remediation step.
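
The first principle (block traffic when the guardrail system fails) can be illustrated with a small wrapper around the model call. This is a sketch only: `guardrail` and `model` stand in for whatever callables a given platform exposes, and `FALLBACK_MESSAGE` is a hypothetical static response.

```python
from typing import Callable

FALLBACK_MESSAGE = "This service is temporarily unavailable."  # hypothetical static fallback


def guarded_call(
    prompt: str,
    guardrail: Callable[[str], bool],
    model: Callable[[str], str],
) -> str:
    """Call `model` only if `guardrail` explicitly allows the prompt."""
    try:
        allowed = guardrail(prompt)
    except Exception:
        # Guardrail outage or error: block rather than pass unfiltered traffic.
        return FALLBACK_MESSAGE
    if allowed is not True:
        # Anything other than an explicit True (False, None, odd values) is a block.
        return FALLBACK_MESSAGE
    return model(prompt)
```

The wrapper treats an exception, a `False`, and an ambiguous return value identically, so a guardrail that degrades in an unexpected way cannot open a path to the model.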

IR-04: Rollback and Emergency Updates

The ability to rapidly roll back model deployments and update guardrails is critical for AI incident response.

Requirements

Model rollback: The ability to revert to a previous known-good model version within minutes, not hours.
Guardrail emergency update: The ability to deploy new guardrail rules (to block a newly discovered injection technique) without a full deployment cycle.
Judge criteria update: The ability to update evaluation criteria to catch a newly identified failure mode.
Vector store rollback: The ability to remove recently ingested documents that may be poisoned.
Agent permission reduction: The ability to immediately reduce agent permissions without redeployment.
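
As an illustration of the last requirement, agent permissions can be held in a store that is consulted on every tool call and can only be narrowed at runtime. The class below is a hypothetical in-memory sketch; a real deployment would presumably back it with a configuration service and audit every change.

```python
import threading


class AgentPermissions:
    """In-memory permission sets keyed by agent ID (hypothetical store)."""

    def __init__(self, initial: dict[str, set[str]]):
        self._lock = threading.Lock()
        self._allowed = {agent: frozenset(tools) for agent, tools in initial.items()}

    def is_allowed(self, agent: str, tool: str) -> bool:
        """Unknown agents and unknown tools are denied by default."""
        return tool in self._allowed.get(agent, frozenset())

    def reduce(self, agent: str, keep: set[str]) -> frozenset[str]:
        """Narrow an agent's tools to the intersection with `keep`; never widens."""
        with self._lock:
            current = self._allowed.get(agent, frozenset())
            self._allowed[agent] = current & frozenset(keep)
            return self._allowed[agent]
```

For example, `perms.reduce("support-bot", {"search_kb"})` would leave the agent with at most `search_kb`. Because `reduce` intersects with the current set, an emergency change made under pressure cannot accidentally grant a tool the agent did not already have.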

Deployment Requirements

Pre-staged rollback artefacts for current-1 and current-2 model versions.
Guardrail rule hot-reload capability (update rules without restarting the guardrail service).
Blue/green or canary deployment for model updates, enabling rapid rollback.
Agent permission sets managed as configuration, not code — enabling runtime updates.
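
Guardrail rule hot-reload can be sketched as a rule set that validates a new list of rules in full before swapping it in. The example assumes rules are regular expressions, which is a simplification; `GuardrailRules` and its methods are hypothetical names.

```python
import re
import threading


class GuardrailRules:
    """Regex-based guardrail rules that can be replaced without a restart."""

    def __init__(self, patterns: list[str]):
        self._lock = threading.Lock()
        self._compiled = self._compile(patterns)

    @staticmethod
    def _compile(patterns: list[str]) -> tuple[re.Pattern, ...]:
        return tuple(re.compile(p, re.IGNORECASE) for p in patterns)

    def reload(self, patterns: list[str]) -> None:
        """Compile every pattern first; swap only if all of them are valid."""
        compiled = self._compile(patterns)  # raises re.error before any swap
        with self._lock:
            self._compiled = compiled

    def blocks(self, text: str) -> bool:
        """True if any current rule matches the text."""
        rules = self._compiled  # snapshot so a concurrent reload cannot change it mid-check
        return any(rule.search(text) for rule in rules)
```

Compiling before swapping means a malformed emergency rule is rejected and the previous rule set stays active, which matters when rules are being written quickly during an incident.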

IR-05: Investigation Procedures

AI incident investigation must account for non-determinism. The same input may not reproduce the same output, making traditional reproduction-based debugging insufficient.

Investigation Framework

Reconstruct the interaction: Use LOG-01 (model I/O) and LOG-04 (agent chains) to reconstruct exactly what happened during the incident.
Identify the trigger: What input or sequence of inputs caused the undesired behaviour? Was it a single prompt or a multi-turn escalation?
Assess guardrail performance: Did guardrails evaluate the input/output? What was the confidence score? Did they pass content they should have blocked (LOG-02)?
Assess Judge performance: Did the Judge evaluate the interaction? What was the verdict? Was there a guardrail/Judge disagreement (LOG-03)?
Determine scope: Was this an isolated incident or part of a pattern? Search logs for similar inputs, techniques, or user behaviour.
Assess impact: What data was exposed, what actions were taken, what decisions were influenced? For agent incidents, reconstruct the full action chain.
Identify the control gap: Which control failed? Was it a guardrail rule gap, a Judge criteria gap, a network bypass, or a permission misconfiguration?
Attempt reproduction: Try to reproduce the behaviour in a sandboxed environment. Accept that non-determinism may prevent exact reproduction — focus on reproducing the category of failure.
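
Because a single replay proves little for a non-deterministic system, the reproduction step is better expressed as a rate over repeated trials. The sketch below assumes a sandboxed `run_once` callable that sends the prompt to the model and a `violates_policy` check for the category of failure under investigation; both are hypothetical placeholders supplied by the investigator.

```python
from typing import Callable


def reproduction_rate(
    run_once: Callable[[str], str],
    violates_policy: Callable[[str], bool],
    prompt: str,
    trials: int = 20,
) -> float:
    """Fraction of sandboxed trials in which the output shows the failure category."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    hits = sum(1 for _ in range(trials) if violates_policy(run_once(prompt)))
    return hits / trials
```

A rate of zero over a modest number of trials does not show the incident was a one-off; it only bounds how often the failure occurs under sandbox conditions, which may differ from production in context, retrieved documents, or sampling settings.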

IR-06: Communication Protocols

AI incidents may require communication with stakeholders who are absent from, or play a different role in, traditional IR:

Stakeholder	When to Notify	What to Communicate
Model provider	Model vulnerability exploited, unexpected model behaviour	Technique details (sanitised), impact, request for guidance
Data subjects	PII exposed via AI system	What was exposed, how, remediation steps
Regulators	Reportable breach involving AI system, EU AI Act compliance failure	Incident details per regulatory requirements
AI ethics/governance board	Bias incident, harmful output to vulnerable user, autonomy violation	Full incident report, proposed controls
Affected users	User received harmful, incorrect, or manipulated AI output	What happened, corrective actions, how to verify
Executive leadership	Critical or high severity, reputational risk, regulatory exposure	Business impact summary, containment status, timeline

IR-07: Post-Incident Review

AI post-incident reviews must go beyond traditional root cause analysis to address the non-deterministic nature of AI failures.

Review Elements

Timeline reconstruction using LOG-01 through LOG-04 data.
Control gap analysis: Which layer of the three-layer model failed, and why?
Guardrail rule update: If guardrails missed the issue, what new rules are needed?
Judge criteria update: If the Judge missed the issue, what evaluation criteria need refinement?
Detection improvement: How can this incident type be detected earlier or faster?
Injection technique catalogue: If prompt injection was involved, add the technique to the internal injection pattern library (LOG-06).
Baseline update: If model drift was involved, update behavioural baselines (LOG-05).
Non-determinism acknowledgement: Accept that some AI failures may not have a single root cause. Focus on strengthening detection and response capability rather than expecting prevention of all possible failures.

IR-08: Enterprise IR Integration

AI incident response must integrate with the existing enterprise IR process, not replace it.

Integration Points

AI incident categories map to the enterprise incident taxonomy.
AI severity levels align with enterprise severity levels.
AI incidents are tracked in the enterprise incident management system.
AI incidents that involve data breach, unauthorised access, or compliance violations trigger the enterprise IR playbook in parallel.
AI-specific forensic data (model I/O logs, agent chains, guardrail decisions) is available to the enterprise IR team.
AI incidents contribute to the enterprise risk register and inform control investment decisions.

Three-Layer Mapping

Control	Guardrails	LLM-as-Judge	Human Oversight
IR-01 Categories	Guardrail failure as incident category	Judge failure as incident category	Humans classify and prioritise
IR-02 Detection	Guardrail log anomalies trigger detection	Judge flag rate triggers detection	Humans review escalated alerts
IR-03 Containment	Guardrail strictness increased	Judge switched to synchronous mode	Humans authorise containment actions
IR-04 Rollback	Guardrail rules hot-reloaded	Judge criteria updated	Humans authorise rollback decisions
IR-05 Investigation	Guardrail logs provide evidence	Judge logs provide evidence	Humans conduct investigation
IR-06 Communication	Guardrail status communicated	Judge status communicated	Humans manage stakeholder communication
IR-07 Post-incident	Guardrail rules updated	Judge criteria refined	Humans lead review process
IR-08 Integration	Guardrail data feeds enterprise SIEM	Judge data feeds enterprise SIEM	Humans bridge AI and enterprise IR teams

Platform-Neutral Implementation Checklist

AI Runtime Behaviour Security, 2026 (Jonathan Gill).