AI Incident Response Playbook¶

Incident response procedures specific to AI systems. These playbooks supplement, not replace, existing incident response procedures.

Playbook Index¶

#	Incident Type	Severity	Page
1	Prompt Injection Attack	High-Critical	Below
2	Data Leakage via AI Output	Critical	Below
3	Hallucination with Business Impact	Medium-Critical	Below
4	AI System Producing Biased Outputs	High-Critical	Below
5	Model Provider Breach	High-Critical	Below
6	Guardrail Bypass	High	Below
7	Judge System Failure	Medium-High	Below
8	Agentic AI Taking Unintended Actions	Critical	Below
9	Knowledge Base Poisoning	High-Critical	Below
10	Silent Quality Degradation	Medium	Below

Severity Classification¶

Severity	Definition	Response Time
Critical	Active exploitation, data breach, regulatory breach, or significant customer harm	Immediate (< 1 hour)
High	Potential for significant harm, control bypass, or integrity compromise	< 4 hours
Medium	Quality issues, limited scope impact, potential for escalation	< 24 hours
Low	Minor issues, no customer impact, easily contained	< 72 hours

1. Prompt Injection Attack¶

Indicators¶

Guardrails flagging unusual input patterns
AI outputs that deviate from expected behaviour
Instructions appearing in outputs that don't match system prompt
User reporting unexpected AI behaviour
Judge flagging anomalous interactions

Immediate Actions (First 30 minutes)¶

Assess scope — Is this a single incident or pattern?
Preserve evidence — Capture full interaction logs (input, context, output, metadata)
Determine if attack was successful — Did the AI follow injected instructions?
Identify attack vector — Direct input? Indirect via retrieved content? Tool output?

Containment¶

If...	Then...
Attack via direct user input	Update input guardrails with pattern
Attack via RAG content	Quarantine affected knowledge base content
Attack via tool output	Disable affected tool integration
Attack successful and ongoing	Consider taking system offline

Investigation¶

Query logs for similar patterns across all interactions
Identify if other users/sessions were affected
Determine what information or actions the attacker obtained
Review guardrail effectiveness — why wasn't this blocked?
Check if attack pattern is known (OWASP, public research)

Recovery¶

Deploy updated guardrails
Re-enable system with monitoring on high alert
Inform affected users if data was exposed
Update Judge criteria to detect this pattern

Post-Incident¶

Root cause analysis
Guardrail gap analysis
Update adversarial testing suite
Consider architectural changes if fundamental weakness identified

2. Data Leakage via AI Output¶

Indicators¶

Output guardrails flagging PII/sensitive data
Customer complaint about receiving another customer's data
Judge detecting sensitive content in outputs
Audit finding of improper data disclosure

Immediate Actions (First 15 minutes)¶

Capture the output — Preserve exactly what was disclosed
Identify the data — What was leaked? Whose data? Classification level?
Identify recipients — Who received the leaked data?
Stop the bleeding — Can you prevent further disclosure?

Containment¶

Data Type	Action
Single customer's data to another customer	Disable feature, contact both customers
Multiple customers' data	Take system offline
Regulated data (PII, financial, health)	Invoke data breach procedure
Internal/confidential business data	Restrict access, assess impact

Regulatory Notification¶

Jurisdiction	Notification Requirement	Timeline
UK (ICO)	Notify if risk to individuals	72 hours
EU (GDPR)	Notify if risk to individuals	72 hours
US (varies by state)	Check state requirements	Varies
Sector-specific (PCI, HIPAA)	Check specific requirements	Varies

Investigation¶

How did the data enter the AI context? (RAG, prior conversation, training?)
Why didn't output guardrails catch it?
Was this a one-time error or systematic issue?
Full scope assessment — who else might be affected?

Recovery¶

Deploy enhanced output guardrails
Review data minimisation in prompts and RAG
Implement cross-reference checks (verify output doesn't contain data belonging to other users)
Consider data isolation architecture

3. Hallucination with Business Impact¶

Indicators¶

Customer complaint about incorrect information
Downstream system acting on false AI-generated data
Judge flagging unsupported claims
Audit finding discrepancy between AI output and source data

Immediate Actions¶

Verify the hallucination — Confirm output is actually false
Assess impact — What decisions were made based on this? What harm occurred?
Identify affected parties — Who received the false information?

Impact Categories¶

Impact	Example	Response Level
Informational error, no action taken	Wrong answer in internal chat	Low
Customer received incorrect advice	Wrong product feature description	Medium
Business decision based on false data	Incorrect financial figure in report	High
Regulatory/legal implications	False compliance statement	Critical
Safety implications	Incorrect safety guidance	Critical

Containment¶

Correct the record with affected parties
If output was used for decisions, flag those decisions for review
If output was forwarded downstream, trace and correct

Investigation¶

Was this a random hallucination or systematic pattern?
Did the AI have access to correct source data?
Did the AI ignore source data or fabricate?
Are guardrails/Judge configured to catch this type of hallucination?

Recovery¶

Implement grounding verification for this output type
Update Judge criteria for hallucination detection
Consider requiring source citation for this use case
Review if use case risk tier is appropriate

4. AI System Producing Biased Outputs¶

Indicators¶

Disparate outcomes detected across protected characteristics
Customer complaints alleging discrimination
Audit/testing reveals bias
Judge flagging potential fairness issues
Regulatory inquiry

Immediate Actions¶

Verify the bias — Statistical analysis of outputs across groups
Assess scope — How long has this been occurring? How many affected?
Preserve evidence — Full logs for investigation
Legal/compliance notification — This may be a regulatory matter

Containment¶

Bias Severity	Action
Statistical anomaly, unclear if bias	Continue monitoring, deeper analysis
Clear disparate impact, limited scope	Disable affected feature
Systematic discrimination	Take system offline
Active regulatory/legal matter	Follow legal counsel

Investigation¶

Is the bias in the model itself or in the data/prompts?
Can you identify the source of bias?
What protected characteristics are affected?
What is the quantified impact (rejection rates, outcomes, etc.)?

Remediation Options¶

Source	Remediation
Training data bias (foundation model)	Different model, fine-tuning, output calibration
RAG data bias	Curate knowledge base
Prompt bias	Revise prompts, add debiasing instructions
Structural bias	Architectural changes, human review

Regulatory Considerations¶

Document everything — investigation, findings, remediation
Consider proactive disclosure vs. wait for inquiry
Engage legal counsel for discrimination-related bias
Monitor for similar issues across other AI systems

5. Model Provider Breach¶

Indicators¶

Provider notification of security incident
News reports of provider breach
Unexplained changes in model behaviour
Provider communication about data exposure

Immediate Actions¶

Confirm the breach — Contact provider, review official communications
Assess exposure — What data of yours did the provider have?
Determine if your data was affected — Request specific confirmation

Data at Risk Assessment¶

Data Category	Risk Level	Action
API keys/credentials	Critical	Rotate immediately
Customer data in prompts	High	Assess scope, prepare notification
System prompts	Medium	Review for sensitive content
Interaction logs	Medium-High	Depends on content
Fine-tuning data	High	Assess sensitivity

Containment¶

Rotate all API keys and credentials
If zero-retention was not enabled, assume data was accessible
Consider temporarily switching to alternative provider
Review what data should not have been sent to provider

Regulatory Implications¶

If customer data was exposed via provider, this may be your breach too
Notification obligations may apply
Document your data processing agreement and provider's obligations

6. Guardrail Bypass¶

Indicators¶

Attack that should have been blocked reached the model
Known-bad content appearing in inputs or outputs
Adversarial testing reveals gap
User reports inappropriate content

Immediate Actions¶

Capture the bypass — Exact input that evaded detection
Assess impact — What happened after the bypass?
Determine method — How did the attacker bypass? (encoding, rephrasing, etc.)

Bypass Methods and Responses¶

Method	Detection	Mitigation
Encoding tricks (base64, rot13)	Add decoder to guardrails	Pattern expansion
Semantic rephrasing	Classifier miss	Retrain classifier, add examples
Multi-turn manipulation	Per-message check only	Add conversation-level analysis
Language switching	English-only guardrails	Multilingual guardrails
Prompt structure manipulation	Pattern too specific	More flexible patterns

Recovery¶

Deploy fix for specific bypass
Add to adversarial test suite
Review for related bypass vectors
Consider if Judge would have caught this (async defence)

7. Judge System Failure¶

Indicators¶

Judge not producing evaluations
Judge producing inconsistent/wrong evaluations
Backlog of unevaluated interactions
Judge costs spiking unexpectedly

Immediate Actions¶

Assess type of failure — Down? Degraded? Producing wrong results?
Determine duration — When did this start? What's the gap?
Assess risk — How many interactions were not evaluated? What tiers?

Impact Assessment¶

Tier	Judge Down Impact	Action
CRITICAL	100% evaluation required, gap is serious	Increase HITL, consider pause
HIGH	Significant sampling, gap matters	Backfill evaluation when restored
MEDIUM	Lower sampling, moderate gap	Backfill on best-effort
LOW	Spot checks, limited impact	Resume when restored

Recovery¶

Restore Judge operation
Backfill evaluations for gap period (prioritise by tier)
Review why failure occurred
Implement redundancy if single point of failure

8. Agentic AI Taking Unintended Actions¶

Indicators¶

Agent performed action outside expected scope
Downstream system received unexpected commands
Resource consumption spike (API calls, compute, cost)
Agent achieved goal through unintended means

Immediate Actions (First 5 minutes)¶

Stop the agent — Halt execution immediately
Assess what happened — What actions were taken?
Determine reversibility — Can actions be undone?

Action Assessment¶

Action Type	Example	Reversibility
Read-only queries	Database reads	N/A (no harm)
Reversible writes	Draft email saved	Undo
Sent communications	Email sent	Cannot undo, can follow up
Financial transactions	Payment made	May be reversible
Data deletion	Records deleted	Restore from backup
External API calls	Third-party action triggered	Depends on API

Containment¶

Undo reversible actions
For irreversible actions, assess damage and notify affected parties
Preserve full trajectory log for investigation
Disable agent until investigation complete

Investigation¶

Review full trajectory — planning, actions, reasoning
Identify where agent deviated from expected behaviour
Was scope enforcement working? Did agent exceed boundaries?
Was this goal hijacking (prompt injection) or emergent behaviour?
Were circuit breakers triggered? If not, why not?

Recovery¶

Tighten scope definitions
Add circuit breaker for this action type
Require approval checkpoint before this action
Update Judge to detect this trajectory pattern

9. Knowledge Base Poisoning¶

Indicators¶

AI outputs contain unexpected content
RAG returning content that shouldn't exist
Knowledge base audit reveals unauthorised content
AI behaviour changed without prompt changes

Immediate Actions¶

Identify the poisoned content — What was added/modified?
Quarantine — Remove or isolate affected content
Assess exposure — How many interactions used poisoned content?

Investigation¶

How was content modified? (Authorised user? Compromised account? Data pipeline?)
When was content modified?
What was the intent? (Injection attack? Misinformation? Vandalism?)
Full audit of knowledge base for other modifications

Recovery¶

Restore from known-good backup
Implement content integrity validation (checksums, signatures)
Review access controls on knowledge base
Add anomaly detection on content changes

10. Silent Quality Degradation¶

Indicators¶

Gradual decline in user satisfaction scores
Increasing escalation rates
Baseline comparison showing drift
Judge finding rates increasing
HITL reviewers reporting quality issues

Immediate Actions¶

Verify degradation — Compare current performance to baseline
Identify scope — All outputs or specific categories?
Determine cause — Model change? Data change? Guardrail change?

Common Causes¶

Cause	Detection	Mitigation
Provider model update	Check model version	Pin version, evaluate new version
RAG data staleness	Check data freshness	Refresh data, fix pipeline
Guardrail over-filtering	Check false positive rate	Tune guardrails
Prompt drift	Review prompt changes	Revert or fix prompts
Concept drift	Compare input distributions	Retrain/update as needed

Recovery¶

If model version changed, evaluate rolling back
If data staleness, refresh and validate
If guardrail issue, tune and test
Implement more aggressive baseline monitoring

Incident Report Template¶

INCIDENT REPORT

Incident ID: [YYYYMMDD-###]
Date/Time Detected: 
Date/Time Resolved:
Severity: [Critical/High/Medium/Low]
System Affected:
Risk Tier of System:

SUMMARY
[One paragraph description]

TIMELINE
[Chronological events]

ROOT CAUSE
[What caused the incident]

IMPACT
- Users affected:
- Data affected:
- Business impact:
- Regulatory implications:

RESPONSE ACTIONS
[What was done]

REMEDIATION
[What will prevent recurrence]

LESSONS LEARNED
[What we learned]

FOLLOW-UP ACTIONS
| Action | Owner | Due Date | Status |
|--------|-------|----------|--------|
| | | | |

¶

AI Runtime Behaviour Security, 2026 (Jonathan Gill).