PACE Resilience Checklist

Verification items for PACE resilience, organised by risk tier. Complete these before go-live and revalidate at the cadence specified for your tier.

This document uses the simplified three-tier system (Tier 1/2/3). See Risk Tiers — Simplified Tier Mapping for the mapping to LOW/MEDIUM/HIGH/CRITICAL.


All Tiers — Pre-Deployment

These items apply to every AI system, regardless of risk tier.

Fail Posture

  • Fail posture (open or closed) defined for each control layer (Guardrails, Judge, Human Oversight)
  • Fail posture documented in system runbook
  • Fail posture decision reviewed and approved by system owner
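One way to keep the fail posture items above checkable is to record them as data and flag any control layer that lacks an approved entry. The following is a minimal sketch; `LayerPosture`, `approved_by`, and `missing_postures` are hypothetical names chosen for illustration, not part of any existing tooling.

```python
from dataclasses import dataclass
from enum import Enum


class FailPosture(Enum):
    OPEN = "open"      # traffic continues when the control is unavailable
    CLOSED = "closed"  # traffic is blocked or held when the control is unavailable


@dataclass(frozen=True)
class LayerPosture:
    layer: str
    posture: FailPosture
    approved_by: str  # hypothetical field: system owner who signed off


# The three control layers named in the checklist.
REQUIRED_LAYERS = ("Guardrails", "Judge", "Human Oversight")


def missing_postures(entries):
    """Return the control layers that have no approved fail posture."""
    covered = {e.layer for e in entries if e.approved_by.strip()}
    return [layer for layer in REQUIRED_LAYERS if layer not in covered]
```

An empty list from `missing_postures` would indicate that every layer has a defined and approved posture; it does not verify that the runbook itself is up to date.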

Fallback Path

  • Non-AI fallback path identified (the process that continues the business function without AI)
  • Fallback path documented (who does what, using what tools)
  • Fallback path tested at least once before go-live
  • Team aware that fallback path exists and knows how to activate it

Recovery

  • Recovery criteria defined for each control layer (what must be true before returning to normal)
  • Recovery procedure documented in runbook

Tier 1 — Low Risk

Internal tools, content generation, employee productivity.

PACE Plan

  • P (Primary) and A (Alternate) behaviour defined for each control layer (one sentence each)
  • C (Contingency) and E (Emergency) combined statement: "Disable feature via [mechanism], revert to [manual process], contact [name]"
  • PACE plan documented as paragraph in system runbook

Testing (Annual)

  • Fallback path still works (manual process can be executed)
  • Feature disable mechanism works (feature flag, deployment rollback, or equivalent)
  • At least one team member knows how to activate fallback
  • Runbook entry reviewed and still accurate

Tier 2 — Medium Risk

Customer-facing content, decision support, document processing.

PACE Plan

  • Full P/A/C/E defined for each control layer with specific behaviours at each level
  • Transition triggers defined (measurable conditions for each state change)
  • Automated transition mechanisms configured where possible (circuit breaker, health checks)
  • Escalation contact list documented (primary and alternate for each control layer)
  • Customer communication template pre-drafted for degraded service notification
  • PACE plan documented as dedicated runbook section
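The transition triggers in the plan above can be expressed as measurable thresholds that drive a small state machine. The sketch below is illustrative only: the error-rate metric and the threshold values are assumptions, and it deliberately never steps back up on its own, leaving recovery to the documented procedure.

```python
from enum import IntEnum


class PaceState(IntEnum):
    PRIMARY = 0
    ALTERNATE = 1
    CONTINGENCY = 2
    EMERGENCY = 3


# Hypothetical triggers: error rate at or above which a layer moves to each state.
# Ordered from most to least severe so the first match wins.
TRIGGERS = [
    (0.20, PaceState.EMERGENCY),
    (0.05, PaceState.CONTINGENCY),
    (0.01, PaceState.ALTERNATE),
]


def next_state(current, error_rate):
    """Degrade automatically when a trigger fires; never recover automatically."""
    target = PaceState.PRIMARY
    for threshold, state in TRIGGERS:
        if error_rate >= threshold:
            target = state
            break
    # Keep the more degraded of the two states; stepping back up is a manual decision.
    return max(current, target)
```

For example, `next_state(PaceState.PRIMARY, 0.06)` moves the layer to `CONTINGENCY`, and a later reading of `0.0` leaves it there until someone runs the recovery procedure.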

Fail-Closed Verification

  • Guardrail failure results in traffic blocked (not passed)
  • Judge failure results in outputs held for human review
  • Human oversight unavailability results in conservative automated thresholds
  • Circuit breaker health check is independent of AI system components
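The first item in this list, a guardrail failure blocking traffic rather than passing it, can be sketched as a wrapper that treats any guardrail error as a block. This is a sketch under stated assumptions: `guardrail` and `model` are hypothetical callables standing in for the real components.

```python
def guarded_call(prompt, guardrail, model):
    """Return ("passed", output) or ("blocked", reason).

    Fails closed: if the guardrail raises for any reason, the request is
    blocked and the model is never invoked.
    """
    try:
        allowed = guardrail(prompt)
    except Exception as exc:
        return ("blocked", f"guardrail unavailable: {exc}")
    if not allowed:
        return ("blocked", "guardrail rejected input")
    return ("passed", model(prompt))
```

A failure simulation can then replace `guardrail` with a function that raises and confirm that the result is `"blocked"` and that `model` was never called.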

Non-AI Fallback Path

  • Rule-based or templated fallback system operational
  • Fallback handles 100% of AI traffic at degraded quality
  • Fallback does not depend on any AI infrastructure component
  • Fallback activation is automated (circuit breaker)
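Automated fallback activation might look roughly like the circuit breaker below. It is a simplified sketch: the class, its `failure_threshold` parameter, and the handler callables are hypothetical, and it counts failures in-line for brevity, whereas the checklist calls for a health check that is independent of the AI system components.

```python
class CircuitBreaker:
    """Route requests to a non-AI fallback after repeated AI failures."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold  # assumed value, tune per system
        self.failures = 0
        self.is_open = False  # open means all traffic goes to the fallback

    def call(self, ai_handler, fallback_handler, request):
        if self.is_open:
            return fallback_handler(request)
        try:
            result = ai_handler(request)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.is_open = True
            # The failed request is still served, at degraded quality.
            return fallback_handler(request)
        self.failures = 0  # a success resets the consecutive-failure count
        return result
```

Once open, the breaker stays open in this sketch; closing it again is left to the recovery procedure rather than to a timer.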

Testing (Quarterly)

  • Guardrail failure simulation: system fails closed correctly
  • Judge failure simulation: outputs held or handled per PACE plan
  • Human escalation exercise: flagged items reach reviewers within SLA
  • Circuit breaker activation: non-AI fallback activates cleanly
  • Fallback path operated with production-equivalent traffic
  • Recovery procedure validated: step back up from fallback to normal
  • Runbook entries reviewed and updated

Testing (Semi-Annual)

  • Full degradation walkthrough: P → A → C → E and recovery for at least one control layer
  • Exercise involves same personnel and tools as real incident

Tier 3 — High Risk

Regulated decisions, autonomous agents with write access, financial/medical/legal domains.

PACE Plan

  • Full P/A/C/E defined for each control layer with quantitative trigger criteria
  • All transition triggers have automated monitoring and alerting
  • Standalone operational resilience document created and reviewed by risk function
  • Regulatory notification templates prepared and reviewed by legal
  • Forensic evidence preservation automated (audit logs immutable, state snapshots configured)
  • Recovery governance defined: sign-off chain for each step-back-up

Fail-Closed Verification

  • No AI traffic can pass a degraded control under any condition
  • Circuit breaker activates automatically when triggered
  • Circuit breaker cannot be overridden without documented authorisation
  • Two or more layers at Emergency simultaneously trigger the circuit breaker immediately
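The rule that two or more layers at Emergency trip the circuit breaker can be checked with a few lines. In this sketch the per-layer state is assumed to be recorded as a single letter ("P", "A", "C", or "E"); that encoding and the function name are illustrative choices.

```python
def should_trip_breaker(layer_states):
    """Return True when two or more control layers are at Emergency ("E")."""
    at_emergency = [layer for layer, state in layer_states.items() if state == "E"]
    return len(at_emergency) >= 2
```

For example, `should_trip_breaker({"Guardrails": "E", "Judge": "E", "Human Oversight": "P"})` returns `True`, while a single layer at Emergency returns `False` and is handled by that layer's own PACE plan.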

Non-AI Fallback Path

  • Staffed parallel process operational with trained operators
  • Fallback handles critical subset of AI functions at production quality
  • Fallback shares no infrastructure with AI system
  • Fallback staffing model defined (who, how many, availability)

Agentic Systems (if applicable)

  • All five degradation phases defined (Normal, Constrained, Supervised, Bypassed, Full Stop)
  • Trigger criteria defined for each phase transition
  • Transaction resolution matrix completed for every tool in agent's permission set
  • Multi-agent cascade prevention designed and tested (if multi-agent)
  • State preservation automation validated (memory, context, plans, tool calls, Judge state)
  • Agent cannot modify its own logs during phase transition
  • Recovery step-back-up procedure defined with authorisation gates at each phase
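The five degradation phases and the authorisation gates on recovery could be modelled as below. This is a sketch, assuming that degradation may skip phases without sign-off while recovery moves one phase at a time and needs a named approver; the source does not state those rules explicitly, and `authorised_by` is a hypothetical parameter.

```python
from enum import IntEnum


class Phase(IntEnum):
    NORMAL = 0
    CONSTRAINED = 1
    SUPERVISED = 2
    BYPASSED = 3
    FULL_STOP = 4


def transition(current, target, authorised_by=None):
    """Return the new phase, enforcing an authorisation gate on every step back up."""
    if target >= current:
        # Degrading (or staying put) must never be slowed down by a sign-off.
        return target
    if target != current - 1:
        raise ValueError("recovery must step back up one phase at a time")
    if not authorised_by:
        raise PermissionError("step-back-up requires a named approver")
    return target
```

Calling `transition(Phase.FULL_STOP, Phase.BYPASSED, authorised_by="risk-lead")` succeeds, whereas omitting the approver raises `PermissionError`.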

Testing (Monthly)

  • Guardrail failure simulation: system fails closed, no AI traffic passes
  • Judge failure simulation: all traffic paused or held
  • Circuit breaker activation: non-AI fallback activates within seconds
  • Fallback path operated with production-equivalent traffic
  • Recovery procedure validated
  • Monitoring and alerting thresholds validated against current baselines

Testing (Quarterly)

  • Full degradation walkthrough: all five phases for agentic systems (or P → A → C → E for non-agentic)
  • Exercise involves same personnel, tools, and communication channels as real incident
  • Human escalation exercise with domain expert reviewers
  • Transaction resolution matrix validated against current tool set
  • Post-exercise review documented with lessons learned and plan updates

Testing (Semi-Annual)

  • Degradation walkthrough with regulator observation (where required)
  • Full operational resilience document reviewed and updated
  • Fallback staffing model reviewed against current team composition
  • PACE plan alignment reviewed against regulatory changes
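The testing cadences listed for each tier can be tracked with a small helper that reports which test cycles are overdue. The tier-to-cadence mapping below follows this checklist; the day counts are approximations chosen for the sketch, and the function and variable names are hypothetical.

```python
from datetime import date, timedelta

# Approximate interval lengths in days (assumed values for illustration).
CADENCE_DAYS = {"monthly": 30, "quarterly": 91, "semi-annual": 182, "annual": 365}

# Test cycles required per tier, as listed in this checklist.
TIER_CADENCES = {
    1: ["annual"],
    2: ["quarterly", "semi-annual"],
    3: ["monthly", "quarterly", "semi-annual"],
}


def overdue_tests(tier, last_run, today):
    """Return the cadences for a tier that have never run or are past their interval."""
    overdue = []
    for cadence in TIER_CADENCES[tier]:
        last = last_run.get(cadence)
        if last is None or today - last > timedelta(days=CADENCE_DAYS[cadence]):
            overdue.append(cadence)
    return overdue
```

Here `last_run` maps a cadence name to the date that cycle was last completed; a cadence with no recorded date is treated as overdue.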

Ongoing Maintenance — All Tiers

These items prevent PACE plan decay over time:

  • PACE plan reviewed when any control layer is modified (guardrail rules updated, Judge model changed, reviewer pool changed)
  • PACE plan reviewed when agent tool permissions change (agentic systems)
  • Fallback path validated after any infrastructure change that could affect it
  • New team members briefed on PACE plan and their role in it
  • Departing team members' PACE responsibilities reassigned
  • Lessons from any real degradation events incorporated into PACE plan

AI Runtime Behaviour Security, 2026 (Jonathan Gill).