Control Layer Resilience: Internal PACE
This section defines what happens when each control layer degrades. Every control has its own PACE plan (vertical axis) in addition to the architecture-level PACE across layers (horizontal axis). See the PACE Resilience Methodology for the full model.
This document uses the simplified three-tier system (Tier 1/2/3). See Risk Tiers — Simplified Tier Mapping for the mapping to LOW/MEDIUM/HIGH/CRITICAL.
Design Principle
Every control layer will eventually fail. The question is not whether but how. Before deploying any AI system, the architect must define the fail posture for each control at the assigned risk tier. This is a mandatory design input, not an operational afterthought.
Guardrails: Internal PACE
Guardrails are the Primary layer: deterministic, fast, always-on. When they degrade, the system must decide instantly whether to continue serving traffic (fail-open) or stop (fail-closed). The tier determines the answer.
| PACE | Condition | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|---|
| P — Normal | Engine running, all filters active | Standard filters: topic, length, PII | Full suite: content filters, schema validation, injection detection, hallucination indicators | Hardened multi-layer: regulatory constraint enforcement, action-level permissions, schema-strict validation. No bypass path. |
| A — Degraded | Engine slow, partial filter failure, rule update pending | Pass traffic. Log which filters are degraded. Flag for next business day review. | Fall back to stricter, simpler rule set. Over-block rather than under-block. Alert on-call. | Fall back to stricter rule set. Alert on-call immediately. Increase Judge evaluation to 100% if not already. |
| C — Down | Engine unresponsive or returning errors | Fail-open. Pass traffic. Log all requests. Rely on Judge and/or human review. | Fail-closed. Hold all AI outputs for Judge evaluation + human review. If Judge also degraded, route to non-AI fallback. | Fail-closed. Block all AI traffic immediately. Route to non-AI fallback path. Incident response initiated. |
| E — Compromised | Evidence of guardrail tampering or adversarial bypass at infrastructure level | Disable AI feature. Alert team. | Route to non-AI fallback. Incident response. Preserve guardrail configuration for forensic analysis. | Full stop on AI operations. Non-AI fallback path active. Forensic evidence preserved. Stakeholders and regulators notified. |
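The table above can be encoded as a lookup so that the fail posture is an explicit design input rather than behaviour discovered during an incident. The following is a minimal sketch, assuming integer tiers 1 to 3; the names `GuardrailState`, `FAIL_POSTURE`, and `fail_posture`, and all posture label strings, are hypothetical and only summarise the table cells.

```python
from enum import Enum


class GuardrailState(Enum):
    NORMAL = "P"
    DEGRADED = "A"
    DOWN = "C"
    COMPROMISED = "E"


# Hypothetical posture labels that summarise the table cells above.
# The label strings are illustrative assumptions, not names from the source.
FAIL_POSTURE = {
    (GuardrailState.DEGRADED, 1): "pass_traffic_log_degraded_filters",
    (GuardrailState.DEGRADED, 2): "stricter_rule_set_alert_on_call",
    (GuardrailState.DEGRADED, 3): "stricter_rule_set_judge_at_100_percent",
    (GuardrailState.DOWN, 1): "fail_open",
    (GuardrailState.DOWN, 2): "fail_closed_hold_for_review",
    (GuardrailState.DOWN, 3): "fail_closed_non_ai_fallback",
    (GuardrailState.COMPROMISED, 1): "disable_ai_feature",
    (GuardrailState.COMPROMISED, 2): "non_ai_fallback_incident_response",
    (GuardrailState.COMPROMISED, 3): "full_stop_notify_regulators",
}


def fail_posture(state: GuardrailState, tier: int) -> str:
    """Return the posture label for a guardrail state at a given risk tier."""
    if tier not in (1, 2, 3):
        raise ValueError(f"unknown tier: {tier}")
    if state is GuardrailState.NORMAL:
        return "normal_operation"
    return FAIL_POSTURE[(state, tier)]


print(fail_posture(GuardrailState.DOWN, 1))  # fail_open
print(fail_posture(GuardrailState.DOWN, 3))  # fail_closed_non_ai_fallback
```

Keeping the mapping as data makes the Tier 1 fail-open versus Tier 2/3 fail-closed split easy to review and to test before deployment.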
Guardrail Transition Triggers
| Transition | Recommended Triggers |
|---|---|
| P → A | Guardrail response latency >2x baseline for 5 min; any filter returning default/error responses; rule update deployed but not validated |
| A → C | Guardrail engine health check failing; >50% of filter categories non-functional; engine unresponsive for >30s |
| C → E | Evidence of configuration tampering; known vulnerability actively exploited; guardrail logs show signs of adversarial manipulation |
| Recovery: C/E → P | Engine restored, all filters validated with test suite, monitoring confirms normal operation for >15 min (Tier 1) / >1 hour (Tier 2/3) |
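One way to evaluate these triggers is a pure function over a health snapshot. The sketch below is illustrative only: the `GuardrailHealth` fields and the function name are hypothetical, the latency ratio is assumed to be already averaged over the 5-minute window, and the function classifies straight to the worst applicable state instead of stepping through transitions one at a time.

```python
from dataclasses import dataclass


@dataclass
class GuardrailHealth:
    """Hypothetical health snapshot; field names are assumptions."""
    latency_ratio: float           # current latency / baseline, sustained for 5 min
    categories_total: int          # number of filter categories configured
    categories_failing: int        # categories returning default/error responses
    health_check_ok: bool
    unresponsive_seconds: float
    unvalidated_rule_update: bool = False
    tamper_evidence: bool = False  # config tampering, exploited vuln, log manipulation


def classify_guardrail_state(h: GuardrailHealth) -> str:
    """Return 'P', 'A', 'C' or 'E' using the recommended thresholds."""
    if h.tamper_evidence:
        return "E"
    majority_failing = (
        h.categories_total > 0
        and h.categories_failing / h.categories_total > 0.5
    )
    if not h.health_check_ok or majority_failing or h.unresponsive_seconds > 30:
        return "C"
    if (
        h.latency_ratio > 2.0
        or h.categories_failing > 0
        or h.unvalidated_rule_update
    ):
        return "A"
    return "P"


slow = GuardrailHealth(2.5, 4, 0, True, 0.0)
print(classify_guardrail_state(slow))  # A
```

Recovery is deliberately left out: returning to P also depends on the test suite passing and a stable monitoring window, which a single snapshot cannot show.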
LLM-as-Judge: Internal PACE
The Judge is the Alternate layer: probabilistic, asynchronous, catches what guardrails miss. Its failure modes differ from those of guardrails (it can be slow, wrong, or manipulated), and the fallback response differs by tier.
| PACE | Condition | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|---|
| P — Normal | Judge evaluating outputs, scores within calibration range | Sample 10–20% of outputs. Log scores. | Evaluate 100% of outputs async. Auto-hold outputs below confidence threshold. | Evaluate 100% of outputs AND actions. Dual-model evaluation. Pre- and post-action verification for agents. |
| A — Degraded | Judge latency >3x baseline, or partial model errors, or score distribution drifting | Reduce sampling rate. Log coverage gap. Review gap at next business day. | Switch to priority-only evaluation: flag high-risk outputs first (customer-facing, financial, PII-containing). Accept evaluation gap on low-risk outputs. Alert on-call. | Queue all outputs. Accept increased latency rather than skipping evaluation. Alert on-call. If agent, constrain to read-only until Judge performance restored. |
| C — Down | Judge returning errors, nonsensical scores, or completely unresponsive | Suspend Judge. Guardrails-only operation. Increase human sampling of outputs if practical. | All outputs held for human review until Judge restored. If human review queue exceeds capacity, throttle AI throughput to match. | All AI traffic paused. Human-only operation for any in-flight work. No new AI requests accepted. |
| E — Compromised | Evidence of Judge model poisoning, adversarial manipulation of evaluation criteria, or score manipulation | Disable Judge. Guardrails only. Alert team. Investigate before restoring. | Activate circuit breaker. Route to non-AI fallback. Incident response. Do not trust any recent Judge scores — review outputs that passed since potential compromise. | Full stop. Forensic analysis of Judge model, evaluation prompts, and score history. All outputs evaluated by compromised Judge must be re-reviewed by humans. Regulators notified. |
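The Normal-state coverage in the first row (sampling at Tier 1, full evaluation at Tier 2 and Tier 3) could be implemented with deterministic, hash-based sampling so that the same request always gets the same decision. This is a sketch under stated assumptions: `should_evaluate` and `NORMAL_SAMPLING_RATE` are hypothetical names, and 0.15 is an arbitrary midpoint of the 10–20% range.

```python
import hashlib

# Assumed sampling rates for the Normal state. The source gives 10-20% for
# Tier 1; 0.15 is an arbitrary midpoint chosen for illustration.
NORMAL_SAMPLING_RATE = {1: 0.15, 2: 1.0, 3: 1.0}


def should_evaluate(request_id: str, tier: int) -> bool:
    """Decide whether the Judge evaluates this output in the Normal state."""
    rate = NORMAL_SAMPLING_RATE[tier]
    if rate >= 1.0:
        return True  # Tier 2 and Tier 3 evaluate every output
    # Map the request ID to a stable bucket in [0, 1] and compare to the rate.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


sampled = sum(should_evaluate(f"req-{i}", tier=1) for i in range(2000))
print(f"Tier 1 sampled {sampled} of 2000 outputs")
```

Hashing the request ID, in place of a random draw, makes coverage reproducible when reviewing which outputs were skipped during a later coverage-gap investigation.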
Judge Transition Triggers
| Transition | Recommended Triggers |
|---|---|
| P → A | Judge latency >3x baseline for 5 min; error rate >5% of evaluations; score distribution shifts >2 standard deviations from calibration baseline |
| A → C | Judge health check failing; >50% of evaluation requests returning errors; complete unresponsiveness for >60s |
| C → E | Anomalous score patterns (e.g., all outputs scoring identically); evidence of prompt injection in evaluation chain; Judge model integrity check fails |
| Recovery: C/E → P | Judge model reloaded from known-good checkpoint, validated against test suite, calibration confirmed against baseline, monitoring confirms normal operation for >1 hour |
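The P → A and C → E triggers above lend themselves to simple numeric checks. The sketch below is one possible reading, not a definitive implementation: it interprets "score distribution shifts >2 standard deviations" as the recent mean score moving more than two baseline standard deviations from the calibration mean, and all function and parameter names are hypothetical.

```python
from statistics import mean


def judge_should_degrade(
    latency_ratio: float,        # current latency / baseline, sustained for 5 min
    errors: int,
    evaluations: int,
    recent_scores: list[float],
    baseline_mean: float,
    baseline_std: float,
) -> bool:
    """Check the recommended P -> A triggers for the Judge."""
    error_rate = errors / evaluations if evaluations else 0.0
    if recent_scores and baseline_std > 0:
        # Assumed drift metric: shift of the recent mean in baseline std units.
        drift = abs(mean(recent_scores) - baseline_mean) / baseline_std
    else:
        drift = 0.0
    return latency_ratio > 3.0 or error_rate > 0.05 or drift > 2.0


def scores_identical(recent_scores: list[float], min_samples: int = 20) -> bool:
    """Flag the C -> E example: every recent output receiving the same score.

    min_samples is an assumed guard against tiny windows.
    """
    return len(recent_scores) >= min_samples and len(set(recent_scores)) == 1


print(judge_should_degrade(1.0, 0, 100, [0.9, 0.9], 0.5, 0.1))  # True (drift)
print(scores_identical([0.7] * 25))                             # True
```

Identical scores are only one anomalous pattern; the integrity-check and prompt-injection triggers need evidence that score statistics alone cannot provide.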
Human Oversight: Internal PACE
Human Oversight is the Contingency layer: slow, expensive, but brings judgment that automated systems lack. Its failure modes are organisational — staffing gaps, queue overflow, fatigue — not technical.
| PACE | Condition | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|---|
| P — Normal | Review queue staffed, items processed within SLA | Async review of flagged items. Next business day acceptable. | Dedicated reviewers with domain knowledge. SLA-bound response times. | Domain experts with regulatory knowledge. Dual approval for irreversible actions. SLA measured in minutes, not hours. |
| A — Degraded | Primary reviewer unavailable (leave, illness, turnover) | Flagged items queue until reviewer available. Acceptable backlog: 1–2 business days. | Escalate to secondary reviewer pool. Extend SLA by defined buffer (e.g., 2x) but don't skip review. | On-call escalation to alternate domain expert. No action proceeds without approval. If no alternate available within SLA, escalate to C. |
| C — Overloaded | Review queue exceeds capacity — flood of flags from Judge or guardrail changes | Throttle AI throughput to match review capacity. Extend queue SLA. | Tighten Judge thresholds to reduce flag volume (fewer borderline cases flagged). Activate additional reviewers from trained pool. If still overloaded, throttle AI throughput. | Constrain agent scope to reduce action volume. Activate crisis staffing from pre-identified pool. If still overloaded, move to Supervised phase (agent proposes, human approves every action). |
| E — Unavailable | No reviewers available (incident, holiday gap, mass resignation) | Disable AI features requiring review. Guardrails and Judge operate autonomously for remaining features. | Switch to automated-only: Judge must auto-approve/reject with conservative thresholds. Accept higher false-positive rate (more blocks) in exchange for no human review. Flag for urgent staffing resolution. | Suspend all AI operations requiring human approval. No exceptions. In-flight agent actions completed or rolled back per transaction resolution matrix. Full stop until oversight restored. |
Human Oversight Transition Triggers
| Transition | Recommended Triggers |
|---|---|
| P → A | Primary reviewer unavailable; queue wait time >1.5x SLA |
| A → C | Queue size >3x normal; wait time >2x SLA; multiple reviewers unavailable simultaneously |
| C → E | Zero qualified reviewers available; queue wait time >5x SLA with no resolution timeline |
| Recovery: E → P | Reviewers available and confirmed; queue backlog cleared or triaged; SLA performance confirmed for >4 hours |
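Because these triggers are ratios against the SLA and normal queue size, they can be checked from a few queue metrics. The following is a minimal sketch; the function name and parameters are hypothetical, "multiple reviewers unavailable" is assumed to mean two or more, and the function returns the worst applicable state directly.

```python
def classify_oversight_state(
    qualified_reviewers_available: int,
    reviewers_unavailable: int,
    primary_available: bool,
    wait_ratio: float,    # current queue wait time / SLA
    queue_ratio: float,   # current queue size / normal queue size
    resolution_timeline_known: bool = True,
) -> str:
    """Return 'P', 'A', 'C' or 'E' for the human oversight layer."""
    if qualified_reviewers_available == 0:
        return "E"
    if wait_ratio > 5.0 and not resolution_timeline_known:
        return "E"
    if queue_ratio > 3.0 or wait_ratio > 2.0 or reviewers_unavailable >= 2:
        return "C"
    if not primary_available or wait_ratio > 1.5:
        return "A"
    return "P"


# Primary reviewer is out, but the queue is still within limits.
print(classify_oversight_state(3, 1, False, 1.2, 1.0))  # A
```

The recovery condition (backlog cleared or triaged and SLA performance confirmed for more than 4 hours) needs a time series and is not covered by this snapshot check.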
Cross-Layer PACE: Architecture-Level Resilience
When an individual layer's PACE plan has been exhausted (the layer is at its Emergency state), the architecture-level PACE activates:
| Condition | Architecture Response |
|---|---|
| Guardrails at E (compromised) | Judge becomes sole automated defence. Human oversight scope expanded. Evaluate whether to activate circuit breaker based on Judge coverage and confidence. |
| Judge at E (compromised) | Guardrails continue blocking known-bad. Human oversight absorbs all quality assurance. Reduce AI throughput to match human capacity. |
| Human Oversight at E (unavailable) | Guardrails and Judge operate without human backstop. At Tier 2+, tighten all automated thresholds. At Tier 3, activate circuit breaker — automated-only operation is not acceptable for regulated decisions. |
| Two or more layers at E simultaneously | Circuit breaker activates immediately, regardless of tier. Route to non-AI fallback. This is a systemic failure requiring incident response. |
| Circuit breaker activated | Non-AI fallback path serves traffic. All AI components isolated. Incident response team assembled. Recovery requires layer-by-layer restoration with validation at each step. |
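The mandatory circuit-breaker conditions in the table reduce to a small decision function. This is a sketch with hypothetical names; it covers only the cases where the table requires activation, and leaves the discretionary case (guardrails alone at E, where activation depends on Judge coverage and confidence) to the operator.

```python
def circuit_breaker_required(
    guardrails_at_e: bool,
    judge_at_e: bool,
    oversight_at_e: bool,
    tier: int,
) -> bool:
    """Return True when the table mandates circuit-breaker activation."""
    layers_at_e = sum([guardrails_at_e, judge_at_e, oversight_at_e])
    # Two or more layers at Emergency: activate regardless of tier.
    if layers_at_e >= 2:
        return True
    # Tier 3 does not permit automated-only operation without human oversight.
    if oversight_at_e and tier == 3:
        return True
    # A single layer at E otherwise triggers compensating measures, not the
    # breaker; guardrails at E alone is a judgment call outside this function.
    return False


print(circuit_breaker_required(True, True, False, tier=1))   # True
print(circuit_breaker_required(False, False, True, tier=2))  # False
```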
The Non-AI Fallback Path
Every system at Tier 2 or above must have a documented, tested, and maintained non-AI fallback path. This is the last line of defence and must not share dependencies with the AI system.
| Aspect | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|
| What it is | The manual process the AI replaced | Rule-based system or templated responses | Staffed parallel process with trained operators |
| Who maintains it | Same team that runs the AI | Designated owner with quarterly review | Dedicated operational resilience function |
| Dependencies | Must not depend on the AI model or guardrail engine | Must not depend on any AI infrastructure component | Must not share any infrastructure with the AI system |
| Testing | Annually: confirm it still works | Quarterly: run production-equivalent traffic through it | Monthly: operate in parallel for a defined period |
| Activation | Manual (feature flag, deployment rollback) | Automated (circuit breaker with health checks) | Automated (circuit breaker) with manual confirmation within defined window |
| Capacity | Best effort | Must handle 100% of AI traffic at degraded quality | Must handle critical subset at production quality |
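The testing cadence in the table can be enforced with a simple overdue check, for example in a scheduled compliance job. The sketch below is illustrative: the function name is hypothetical, and the day counts are assumed approximations of "annually", "quarterly", and "monthly".

```python
from datetime import date

# Approximate day counts for the cadences in the table (assumed values).
MAX_DAYS_BETWEEN_TESTS = {1: 365, 2: 92, 3: 31}


def fallback_test_overdue(last_tested: date, today: date, tier: int) -> bool:
    """Return True if the non-AI fallback has gone untested for too long."""
    return (today - last_tested).days > MAX_DAYS_BETWEEN_TESTS[tier]


last = date(2026, 1, 1)
now = date(2026, 3, 1)  # 59 days later
print(fallback_test_overdue(last, now, tier=3))  # True: monthly cadence missed
print(fallback_test_overdue(last, now, tier=2))  # False: within the quarter
```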
AI Runtime Behaviour Security, 2026 (Jonathan Gill).