PACE Resilience Checklist
Verification items for PACE resilience, organised by risk tier. Complete these before go-live and revalidate at the cadence specified for your tier.
This document uses the simplified three-tier system (Tier 1/2/3). See Risk Tiers — Simplified Tier Mapping for the mapping to LOW/MEDIUM/HIGH/CRITICAL.
All Tiers — Pre-Deployment
These items apply to every AI system, regardless of risk tier.
Fail Posture
- Fail posture (open or closed) defined for each control layer (Guardrails, Judge, Human Oversight)
- Fail posture documented in system runbook
- Fail posture decision reviewed and approved by system owner
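Recording the fail posture as structured configuration next to the runbook entry makes the owner's decision easy to review and diff. The sketch below assumes a Python service. The class names, the `approved_by` field, and the example values are hypothetical, and the postures shown are placeholders, not recommendations.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class FailPosture(Enum):
    OPEN = "open"      # on failure, traffic continues without this control
    CLOSED = "closed"  # on failure, traffic is blocked or held


@dataclass(frozen=True)
class LayerPosture:
    layer: str
    posture: FailPosture
    approved_by: str  # hypothetical field: system owner who approved the decision


# The three control layers named in this checklist.
REQUIRED_LAYERS = {"Guardrails", "Judge", "Human Oversight"}

# Illustrative values only; the real postures are the system owner's decision.
FAIL_POSTURES = [
    LayerPosture("Guardrails", FailPosture.CLOSED, "owner@example.com"),
    LayerPosture("Judge", FailPosture.CLOSED, "owner@example.com"),
    LayerPosture("Human Oversight", FailPosture.CLOSED, "owner@example.com"),
]


def undecided_layers(postures: List[LayerPosture]) -> Set[str]:
    """Return control layers that lack a fail posture with a recorded approver."""
    approved = {p.layer for p in postures if p.approved_by}
    return REQUIRED_LAYERS - approved
```

Running a check like `undecided_layers` in CI is one way to stop a deployment in which a control layer has no approved posture.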
Fallback Path
- Non-AI fallback path identified (the process that continues the business function without AI)
- Fallback path documented (who does what, using what tools)
- Fallback path tested at least once before go-live
- Team aware that fallback path exists and how to activate it
Recovery
- Recovery criteria defined for each control layer (what must be true before returning to normal)
- Recovery procedure documented in runbook
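Recovery criteria are easier to apply consistently when each one is a named, checkable condition. A minimal sketch, assuming metrics are available as a dictionary; the criteria names, metric keys, and thresholds are hypothetical placeholders for the values in your runbook:

```python
from typing import Callable, Dict, List, Tuple

# A recovery criterion is a named condition that must hold before a control
# layer returns to normal operation.
RecoveryCriteria = Dict[str, Callable[[dict], bool]]

# Hypothetical criteria for the Guardrails layer; replace with runbook values.
GUARDRAIL_RECOVERY: RecoveryCriteria = {
    "health check passing for 15 minutes": lambda m: m["healthy_minutes"] >= 15,
    "error rate below 1%": lambda m: m["error_rate"] < 0.01,
    "rule set matches approved version": (
        lambda m: m["rule_version"] == m["approved_rule_version"]
    ),
}


def recovery_check(criteria: RecoveryCriteria, metrics: dict) -> Tuple[bool, List[str]]:
    """Return (ready, unmet_criteria); recovery is allowed only if nothing is unmet."""
    unmet = [name for name, check in criteria.items() if not check(metrics)]
    return (not unmet, unmet)
```

Returning the unmet criteria, rather than a bare boolean, gives whoever runs the recovery procedure a concrete record of why normal operation was or was not restored.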
Tier 1 — Low Risk
Internal tools, content generation, employee productivity.
PACE Plan
- P (Primary) and A (Alternate) behaviour defined for each control layer (one sentence each)
- C (Contingency) and E (Emergency) combined into one statement: "Disable feature via [mechanism], revert to [manual process], contact [name]"
- PACE plan documented as a paragraph in the system runbook
Testing (Annual)
- Fallback path still works (manual process can be executed)
- Feature disable mechanism works (feature flag, deployment rollback, or equivalent)
- At least one team member knows how to activate fallback
- Runbook entry reviewed and still accurate
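The annual check that the feature disable mechanism works is simpler when disabling the feature is one explicit branch in the request path. A minimal sketch, assuming an environment-variable flag; the flag name, the fallback message, and the `ai_draft` callable are hypothetical, and a feature-flag service would fill the same role:

```python
import os
from typing import Callable

# Hypothetical flag name; a feature-flag service or config store works the same way.
FLAG = "AI_DRAFTING_ENABLED"


def ai_feature_enabled() -> bool:
    # Enabled when the flag is unset; set it to "false" to disable the feature.
    return os.environ.get(FLAG, "true").strip().lower() == "true"


def handle_request(text: str, ai_draft: Callable[[str], str]) -> dict:
    """Route to the AI feature, or to the manual process when the flag is off."""
    if not ai_feature_enabled():
        return {
            "mode": "manual",
            "message": "AI drafting is unavailable; please use the manual template.",
        }
    return {"mode": "ai", "draft": ai_draft(text)}
```

Exercising the test is then a matter of flipping the flag in a non-production environment and confirming that requests reach the manual path.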
Tier 2 — Medium Risk
Customer-facing content, decision support, document processing.
PACE Plan
- Full P/A/C/E defined for each control layer with specific behaviours at each level
- Transition triggers defined (measurable conditions for each state change)
- Automated transition mechanisms configured where possible (circuit breaker, health checks)
- Escalation contact list documented (primary and alternate for each control layer)
- Customer communication template pre-drafted for degraded service notification
- PACE plan documented as a dedicated runbook section
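Measurable transition triggers can be expressed as ordered conditions on monitored metrics. The sketch below evaluates one control layer; the metric names and thresholds are hypothetical placeholders and should come from your own baselines:

```python
from enum import IntEnum
from typing import Callable, List, Tuple


class PaceLevel(IntEnum):
    PRIMARY = 0
    ALTERNATE = 1
    CONTINGENCY = 2
    EMERGENCY = 3


# Hypothetical triggers for one control layer, most severe first.
TRIGGERS: List[Tuple[PaceLevel, Callable[[dict], bool]]] = [
    (PaceLevel.EMERGENCY, lambda m: not m["health_check_ok"]),
    (PaceLevel.CONTINGENCY, lambda m: m["error_rate"] >= 0.05),
    (PaceLevel.ALTERNATE, lambda m: m["error_rate"] >= 0.01 or m["p95_latency_ms"] >= 2000),
]


def evaluate_level(metrics: dict) -> PaceLevel:
    """Return the most severe PACE level whose trigger condition is met."""
    for level, condition in TRIGGERS:
        if condition(metrics):
            return level
    return PaceLevel.PRIMARY
```

The function only reports the level the current metrics justify. Stepping back up should still go through the recovery criteria and procedure, rather than happening as soon as a metric dips below a threshold.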
Fail-Closed Verification
- Guardrail failure results in traffic being blocked (not passed through)
- Judge failure results in outputs held for human review
- Human oversight unavailability results in conservative automated thresholds
- Circuit breaker health check is independent of AI system components
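The first two fail-closed items come down to what the calling code does when a control raises an error or times out. A minimal sketch, assuming the guardrail and Judge are exposed as callables that return True for an acceptable input or output; the function names, the review queue, and the choice to block on a negative Judge verdict are assumptions:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Decision:
    action: str  # "allow", "block", or "hold"
    reason: str


def check_input(prompt: str, guardrail: Callable[[str], bool]) -> Decision:
    """Fail closed: if the guardrail errors or times out, block the traffic."""
    try:
        allowed = guardrail(prompt)
    except Exception as exc:
        return Decision("block", f"guardrail unavailable: {exc!r}")
    return Decision("allow" if allowed else "block", "guardrail verdict")


def check_output(output: str, judge: Callable[[str], bool], review_queue: List[str]) -> Decision:
    """If the Judge fails, hold the output for human review instead of releasing it."""
    try:
        approved = judge(output)
    except Exception as exc:
        review_queue.append(output)
        return Decision("hold", f"judge unavailable: {exc!r}")
    # Assumption: a negative Judge verdict blocks; some plans route it to review instead.
    return Decision("allow" if approved else "block", "judge verdict")
```

Catching every exception is deliberate here: an unexpected error from a control should be treated as that control being unavailable, never as a pass.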
Non-AI Fallback Path
- Rule-based or templated fallback system operational
- Fallback handles 100% of AI traffic at degraded quality
- Fallback does not depend on any AI infrastructure component
- Fallback activation is automated (circuit breaker)
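A circuit breaker that routes traffic to a templated fallback can be sketched in a few lines. This is a simplified illustration, assuming synchronous callables; it omits the half-open state that many circuit breaker implementations use for automatic recovery, so resetting it stays a deliberate step. The class, threshold, and reply text are hypothetical:

```python
from typing import Callable


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive AI-path failures, or when an
    independent health check trips it, then routes every request to the
    non-AI fallback until it is explicitly reset."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.is_open = False

    def trip(self) -> None:
        # Called by a health check that does not share AI infrastructure.
        self.is_open = True

    def reset(self) -> None:
        # Recovery is a deliberate, documented step rather than automatic.
        self.is_open = False
        self.consecutive_failures = 0

    def call(self, request: str, ai_path: Callable[[str], str],
             fallback: Callable[[str], str]) -> str:
        if self.is_open:
            return fallback(request)
        try:
            result = ai_path(request)
        except Exception:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.is_open = True
            return fallback(request)  # the failed request is still served
        self.consecutive_failures = 0
        return result


def templated_reply(request: str) -> str:
    # Hypothetical rule-based fallback: no model calls, fixed wording.
    return "Thanks, we have received your request and will reply by email."
```

Because failed requests are also answered by the fallback, the fallback carries all traffic as soon as the AI path starts failing, not only after the breaker opens.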
Testing (Quarterly)
- Guardrail failure simulation: system fails closed correctly
- Judge failure simulation: outputs held or handled per PACE plan
- Human escalation exercise: flagged items reach reviewers within SLA
- Circuit breaker activation: non-AI fallback activates cleanly
- Fallback path operated with production-equivalent traffic
- Recovery procedure validated: step back up from fallback to normal
- Runbook entries reviewed and updated
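For the human escalation exercise, the pass or fail question is how many flagged items reached a reviewer within the SLA. A minimal sketch, assuming the exercise records when each item was flagged and when a reviewer received it; the record format is hypothetical:

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# One record per flagged item: (flagged at, reviewer received at, or None if never).
Record = Tuple[datetime, Optional[datetime]]


def escalation_sla_report(records: List[Record], sla: timedelta) -> dict:
    """Summarise how many flagged items reached a reviewer within the SLA."""
    within = 0
    breaches: List[Record] = []
    for flagged_at, received_at in records:
        if received_at is not None and received_at - flagged_at <= sla:
            within += 1
        else:
            breaches.append((flagged_at, received_at))
    total = len(records)
    return {
        "total": total,
        "within_sla": within,
        # None rather than 100% when nothing was flagged: the exercise tested nothing.
        "compliance": within / total if total else None,
        "breaches": breaches,
    }
```

Items that never arrive count as breaches, so a routing fault during the exercise shows up in the result rather than silently shrinking the sample.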
Testing (Semi-Annual)
- Full degradation walkthrough: P → A → C → E and recovery for at least one control layer
- Exercise involves same personnel and tools as real incident
Tier 3 — High Risk
Regulated decisions, autonomous agents with write access, financial/medical/legal domains.
PACE Plan
- Full P/A/C/E defined for each control layer with quantitative trigger criteria
- All transition triggers have automated monitoring and alerting
- Standalone operational resilience document created and reviewed by the risk function
- Regulatory notification templates prepared and reviewed by legal
- Forensic evidence preservation automated (audit logs immutable, state snapshots configured)
- Recovery governance defined: sign-off chain for each step-back-up
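Immutable audit logs normally rely on write-once storage. A hash chain adds tamper evidence on top: each entry's hash covers the previous entry's hash, so editing or removing an earlier entry breaks verification. Removing the newest entries is only detectable if the latest hash is also recorded somewhere the system cannot write to. A minimal sketch; the class name and event fields are hypothetical:

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64


def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class HashChainedAuditLog:
    """Append-only log in which each entry commits to the one before it.
    Provides tamper evidence, not tamper prevention."""

    def __init__(self) -> None:
        self._entries: List[dict] = []

    def append(self, event: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        entry_hash = _digest(prev, event)
        self._entries.append({"event": dict(event), "prev": prev, "hash": entry_hash})
        return entry_hash  # store this externally to detect truncation

    def verify(self) -> bool:
        prev = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev or _digest(prev, entry["event"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```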
Fail-Closed Verification
- No AI traffic can pass a degraded control under any condition
- Circuit breaker activates automatically when triggered
- Circuit breaker cannot be overridden without documented authorisation
- Simultaneous Emergency state in two or more layers triggers immediate circuit breaker activation
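The multi-layer rule and the override rule can be enforced in the same component. A minimal sketch, assuming each control layer reports its current PACE level as one of "P", "A", "C", or "E"; the class and the authorisation fields are hypothetical. It covers only the two-layer Emergency rule, so single-layer triggers still need their own handling:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class OverrideAuthorisation:
    approver: str   # hypothetical: named accountable person
    reference: str  # hypothetical: change or incident ticket
    reason: str


class Tier3Breaker:
    """Trips when two or more control layers are at Emergency at the same time;
    only a complete, recorded authorisation can reset it."""

    def __init__(self) -> None:
        self.tripped = False
        self.override_log: List[OverrideAuthorisation] = []

    def evaluate(self, layer_levels: Dict[str, str]) -> bool:
        emergencies = sum(1 for level in layer_levels.values() if level == "E")
        if emergencies >= 2:
            self.tripped = True
        # Once tripped, improving levels do not reset the breaker.
        return self.tripped

    def override(self, auth: Optional[OverrideAuthorisation]) -> None:
        if auth is None or not (auth.approver and auth.reference and auth.reason):
            raise PermissionError("override requires documented authorisation")
        self.override_log.append(auth)
        self.tripped = False
```

Keeping the breaker latched until a recorded override means every return to AI traffic leaves an authorisation trail.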
Non-AI Fallback Path
- Staffed parallel process operational with trained operators
- Fallback handles critical subset of AI functions at production quality
- Fallback shares no infrastructure with AI system
- Fallback staffing model defined (who, how many, availability)
Agentic Systems (if applicable)
- All five degradation phases defined (Normal, Constrained, Supervised, Bypassed, Full Stop)
- Trigger criteria defined for each phase transition
- Transaction resolution matrix completed for every tool in the agent's permission set
- Multi-agent cascade prevention designed and tested (if multi-agent)
- State preservation automation validated (memory, context, plans, tool calls, Judge state)
- Agent cannot modify its own logs during phase transition
- Recovery step-back-up procedure defined with authorisation gates at each phase
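The five phases, the transaction resolution matrix, and gated step-back-up can be modelled directly. A minimal sketch; the phase ordering follows the list above, while the resolution options, tool names, and the one-phase-at-a-time rule are assumptions for illustration:

```python
from enum import IntEnum
from typing import Dict, Iterable, List, Optional


class Phase(IntEnum):
    NORMAL = 0
    CONSTRAINED = 1
    SUPERVISED = 2
    BYPASSED = 3
    FULL_STOP = 4


# Hypothetical resolutions for a tool call in flight when the agent leaves Normal.
RESOLUTIONS = {"complete", "roll_back", "hold_for_review"}

# Hypothetical matrix for an example agent's tools.
RESOLUTION_MATRIX: Dict[str, str] = {
    "read_customer_record": "complete",
    "send_email": "hold_for_review",
    "issue_refund": "roll_back",
}


def matrix_gaps(permission_set: Iterable[str], matrix: Dict[str, str]) -> List[str]:
    """Tools the agent may call that have no valid resolution entry."""
    return sorted(t for t in permission_set if matrix.get(t) not in RESOLUTIONS)


def step_back_up(current: Phase, approver: Optional[str]) -> Phase:
    """Move one phase towards Normal, and only with a named approver."""
    if current == Phase.NORMAL:
        return current
    if not approver:
        raise PermissionError(f"leaving {current.name} requires authorisation")
    return Phase(current - 1)
```

A gap check like `matrix_gaps` can run whenever the agent's permission set changes, which also supports validating the matrix against the current tool set.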
Testing (Monthly)
- Guardrail failure simulation: system fails closed and no AI traffic passes
- Judge failure simulation: all traffic paused or held
- Circuit breaker activation: non-AI fallback activates within seconds
- Fallback path operated with production-equivalent traffic
- Recovery procedure validated
- Monitoring and alerting thresholds validated against current baselines
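Validating alert thresholds against current baselines can start with a simple statistical check. A minimal sketch, assuming an upper-bound threshold on a metric such as error rate; the mean-plus-k-standard-deviations band and its default multipliers are a heuristic assumption, not a standard:

```python
import statistics
from typing import Sequence


def review_threshold(baseline: Sequence[float], threshold: float,
                     k_min: float = 2.0, k_max: float = 6.0) -> str:
    """Compare an upper-bound alert threshold with recent baseline samples.

    Heuristic assumption: the threshold should sit between mean + k_min * stdev
    (below that, normal noise fires it) and mean + k_max * stdev (above that,
    real degradation may never reach it).
    """
    mean = statistics.fmean(baseline)
    sd = statistics.pstdev(baseline)
    if threshold < mean + k_min * sd:
        return "too_tight"
    if threshold > mean + k_max * sd:
        return "too_loose"
    return "ok"
```

A threshold flagged `too_tight` would fire on normal noise; one flagged `too_loose` may not fire before users notice degradation.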
Testing (Quarterly)
- Full degradation walkthrough: all five phases for agentic systems (or P → A → C → E for non-agentic)
- Exercise involves same personnel, tools, and communication channels as real incident
- Human escalation exercise with domain expert reviewers
- Transaction resolution matrix validated against current tool set
- Post-exercise review documented with lessons learned and plan updates
Testing (Semi-Annual)
- Degradation walkthrough with regulator observation (where required)
- Full operational resilience document reviewed and updated
- Fallback staffing model reviewed against current team composition
- PACE plan alignment reviewed against regulatory changes
Ongoing Maintenance — All Tiers
These items prevent PACE plan decay over time:
- PACE plan reviewed when any control layer is modified (guardrail rules updated, Judge model changed, reviewer pool changed)
- PACE plan reviewed when agent tool permissions change (agentic systems)
- Fallback path validated after any infrastructure change that could affect it
- New team members briefed on PACE plan and their role in it
- Departing team members' PACE responsibilities reassigned
- Lessons from any real degradation events incorporated into PACE plan
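Several of these maintenance triggers are changes to control components, which can be detected by comparing a snapshot taken at the last PACE review with the current configuration. A minimal sketch; the snapshot keys and values are hypothetical:

```python
from typing import Dict, List

# Hypothetical snapshot of components whose change should prompt a PACE review.
LAST_REVIEWED: Dict[str, object] = {
    "guardrail_rules_version": "2026.01",
    "judge_model": "judge-v3",
    "reviewer_pool": ("reviewer-a", "reviewer-b"),
    "agent_tools": ("read_customer_record", "send_email"),
}


def changes_requiring_review(reviewed: Dict[str, object],
                             current: Dict[str, object]) -> List[str]:
    """List components that differ from the snapshot taken at the last PACE review."""
    keys = set(reviewed) | set(current)
    return sorted(k for k in keys if reviewed.get(k) != current.get(k))
```

Any non-empty result is a prompt to review the PACE plan; it does not replace briefing people when the team itself changes.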
AI Runtime Behaviour Security, 2026 (Jonathan Gill).