Beyond Security: Where This Architecture Transfers

This framework addresses AI runtime security, but its architecture isn't limited to security.

The Observation

This framework was built to answer a security question: how do you control AI systems that are non-deterministic, operate at scale, and can fail in ways no test suite anticipated?

The answer it arrived at was structural:

Layer controls independently — rules-based detection, ML-based evaluation, human judgment — so no single failure is catastrophic.
Tier by impact — decision authority, reversibility, sensitivity, audience, scale, regulation — so controls are proportionate to what's at stake.
Quantify residual risk — measure what each layer catches, compound the misses, compare to appetite.
Define fail posture before deployment — Primary, Alternate, Contingency, Emergency — so degradation is planned, not improvised.
Scale controls to risk — more critical systems get more layers, more coverage, more formal governance.
Test continuously — controls degrade. Verify they still work.

None of these principles mention security. They describe how to build reliable, layered, proportional controls for any risk domain where the thing you're controlling is uncertain and the consequences of failure vary.

What Changes, What Doesn't

The architecture has two parts: the structural patterns and the domain content. Only the content is security-specific.

Structural Pattern	Security Content	Domain-Agnostic Form
Three independent layers	Guardrails, Judge, Human	Detection → Evaluation → Judgment
Risk tiering	PII, injection, policy compliance	Impact dimensions → control depth
Quantitative risk model	P(injection miss), P(PII leak)	P(miss₁) × P(miss₂) × P(miss₃) = residual
PACE resilience	Guardrail bypass → Judge primary → circuit breaker	Primary fails → alternate activates → contingency holds → emergency stops
Graduated complexity	LOW: guardrails only → CRITICAL: all layers at 100%	Lower risk → fewer controls; higher risk → more controls
Phased implementation	Classify → guardrails → judge → human → PACE → test	Foundation → controls → resilience → verification

Swap the content. Keep the architecture. The framework still works.
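
To make the swap concrete, here is a minimal Python sketch, not the framework's implementation. `Verdict`, `run_controls`, `DriftSnapshot`, and the toy checks are hypothetical names invented for illustration, and the escalation rule (judgment runs only when an automated layer flags) is an assumption. The same pipeline function is handed security content first and drift content second.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


class Verdict(Enum):
    PASS = "pass"    # layer found nothing wrong
    FLAG = "flag"    # layer is unsure; escalate to human judgment
    BLOCK = "block"  # layer is confident the item should be stopped


def run_controls(
    item: T,
    detect: Callable[[T], Verdict],
    evaluate: Callable[[T], Verdict],
    judge: Callable[[T], Verdict],
) -> Tuple[Verdict, List[Tuple[str, Verdict]]]:
    """Detection -> Evaluation -> Judgment, returning the verdict and an audit trail."""
    trail: List[Tuple[str, Verdict]] = []
    flagged = False
    for name, check in (("detection", detect), ("evaluation", evaluate)):
        verdict = check(item)
        trail.append((name, verdict))
        if verdict is Verdict.BLOCK:
            return Verdict.BLOCK, trail
        flagged = flagged or verdict is Verdict.FLAG
    if not flagged:
        return Verdict.PASS, trail
    verdict = judge(item)  # assumed rule: humans only see escalated items
    trail.append(("judgment", verdict))
    return verdict, trail


# --- Security content (toy stand-ins for guardrails and a Judge) ---
def sec_detect(text: str) -> Verdict:
    return Verdict.BLOCK if "ignore previous instructions" in text.lower() else Verdict.PASS


def sec_evaluate(text: str) -> Verdict:
    return Verdict.FLAG if "password" in text.lower() else Verdict.PASS


# --- Drift content (thresholds are hypothetical placeholders) ---
@dataclass
class DriftSnapshot:
    shift: float             # output of some distribution-shift monitor
    holdout_accuracy: float  # accuracy on a labelled holdout set


def drift_detect(snap: DriftSnapshot) -> Verdict:
    return Verdict.FLAG if snap.shift > 0.2 else Verdict.PASS


def drift_evaluate(snap: DriftSnapshot) -> Verdict:
    return Verdict.FLAG if snap.holdout_accuracy < 0.9 else Verdict.PASS


def reviewer(_item) -> Verdict:
    return Verdict.BLOCK  # stand-in for a human who rejects what reaches them
```

Only the three callables change between domains; `run_controls` itself never refers to prompts, PII, or drift.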

How It Reads for Other AI Risks

This section doesn't prescribe controls for drift, fairness, or explainability. It shows what the structural patterns look like when you point them at a different problem. The reader does the thinking.

Model Drift

The problem: a model's accuracy degrades over time as the world changes. The inputs it was trained on no longer represent the inputs it receives.

Three layers, applied:

Layer	Security Version	Drift Version
Layer 1 (Detection)	Pattern-matching guardrails catch known-bad inputs	Statistical monitors catch distribution shift beyond threshold
Layer 2 (Evaluation)	LLM-as-Judge evaluates outputs for policy compliance	Validation pipeline evaluates predictions against labelled holdout set
Layer 3 (Judgment)	Human reviews escalated cases	Domain expert reviews flagged performance degradation and decides: retrain, adjust, or accept
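
As an illustration of what a Layer 1 statistical monitor might look like, the sketch below computes the Population Stability Index (PSI), one commonly used measure of distribution shift. The framework does not prescribe PSI or any other test; the function names, the equal-width binning, and the 0.2 alert threshold are all assumptions chosen for the example.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    # Interior bin edges taken from the reference range; values outside it
    # fall into the first or last bin.
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # eps avoids log(0) when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(expected: Sequence[float], actual: Sequence[float],
                threshold: float = 0.2) -> bool:
    """True when the shift exceeds the (hypothetical) alert threshold."""
    return psi(expected, actual) > threshold


reference = [i / 100 for i in range(100)]       # training-time feature values
same = list(reference)                          # live data that has not moved
shifted = [0.5 + i / 200 for i in range(100)]   # live data squeezed into the top half

print(drift_alert(reference, same))     # no shift, so no alert
print(drift_alert(reference, shifted))  # large shift, so alert
```

A monitor like this only says that the inputs moved. Whether accuracy actually suffered is the job of Layer 2, which is why the two layers use different mechanisms.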

PACE, applied:

Primary: Online drift detection + periodic retraining on fresh data.
Alternate: Freeze model, widen uncertainty bounds, increase human review.
Contingency: Revert to last validated model version.
Emergency: Route to non-AI decision path.
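
The point of PACE is that this ladder exists before anything breaks. A minimal sketch of encoding it follows, assuming a one-step-at-a-time escalation rule; `Posture`, `DRIFT_PLAN`, and `next_posture` are hypothetical names, and recovery back up the ladder is deliberately left out because the text above does not define it.

```python
from enum import IntEnum


class Posture(IntEnum):
    PRIMARY = 0
    ALTERNATE = 1
    CONTINGENCY = 2
    EMERGENCY = 3


# The drift plan from the list above, written down ahead of deployment.
DRIFT_PLAN = {
    Posture.PRIMARY: "Online drift detection + periodic retraining on fresh data",
    Posture.ALTERNATE: "Freeze model, widen uncertainty bounds, increase human review",
    Posture.CONTINGENCY: "Revert to last validated model version",
    Posture.EMERGENCY: "Route to non-AI decision path",
}


def next_posture(current: Posture, controls_healthy: bool) -> Posture:
    """Hold the current posture while its controls work; otherwise step down one level.

    EMERGENCY is terminal: there is nothing further to fall back to.
    """
    if controls_healthy:
        return current
    return Posture(min(current + 1, Posture.EMERGENCY))


posture = Posture.PRIMARY
for healthy in (True, False, False):  # simulated health checks
    posture = next_posture(posture, healthy)
print(posture.name, "->", DRIFT_PLAN[posture])
```

Swapping `DRIFT_PLAN` for a fairness or explainability plan changes the actions, not the transition logic.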

Tiering, applied: An internal search tool that drifts slightly is an inconvenience. A clinical decision support model that drifts silently is dangerous. Same architecture. Different tier. Different control depth.
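
One way to picture "different tier, different control depth" is a small classifier over the six impact dimensions. Everything numeric here is an assumption: the 0 to 3 scoring scale, the "worst dimension wins" rule, the MEDIUM and HIGH tier names, and the layer sets for the middle tiers are invented for illustration. Only the LOW and CRITICAL ends follow the pattern table earlier on this page.

```python
from dataclasses import dataclass, fields
from enum import IntEnum


class Tier(IntEnum):
    LOW = 0
    MEDIUM = 1    # assumed intermediate tier
    HIGH = 2      # assumed intermediate tier
    CRITICAL = 3


@dataclass(frozen=True)
class Impact:
    """Each dimension scored 0 (negligible) to 3 (severe); the scale is hypothetical."""
    decision_authority: int
    reversibility: int
    sensitivity: int
    audience: int
    scale: int
    regulation: int


def classify(impact: Impact) -> Tier:
    """Assumed rule: the highest-scoring dimension sets the tier."""
    worst = max(getattr(impact, f.name) for f in fields(impact))
    return Tier(worst)  # raises ValueError if a score is outside 0..3


# Which layers run at each tier. LOW and CRITICAL mirror the pattern table;
# the middle rows are placeholders.
CONTROL_DEPTH = {
    Tier.LOW: ("detection",),
    Tier.MEDIUM: ("detection", "evaluation"),
    Tier.HIGH: ("detection", "evaluation", "judgment"),
    Tier.CRITICAL: ("detection", "evaluation", "judgment"),
}

internal_search = Impact(0, 0, 0, 0, 0, 0)
clinical_support = Impact(decision_authority=2, reversibility=3, sensitivity=3,
                          audience=1, scale=1, regulation=3)

print(classify(internal_search).name, CONTROL_DEPTH[classify(internal_search)])
print(classify(clinical_support).name, CONTROL_DEPTH[classify(clinical_support)])
```

Nothing in `Impact` refers to drift specifically, so the same classification can drive control depth for any of the risk domains on this page.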

Fairness

The problem: a model produces outcomes that systematically disadvantage a protected group — sometimes because the training data encoded historical bias, sometimes because a proxy variable correlates with protected attributes.

Three layers, applied:

Layer	Security Version	Fairness Version
Layer 1 (Detection)	Guardrails block prohibited inputs	Disparity monitors flag when outcome rates diverge beyond threshold across protected groups
Layer 2 (Evaluation)	Judge evaluates output quality and compliance	Bias measurement pipeline evaluates model decisions against fairness metrics on sampled cohorts
Layer 3 (Judgment)	Human reviews escalated findings	Equity review board assesses whether statistical disparity reflects genuine bias or legitimate signal
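
A Layer 1 disparity monitor could be as simple as comparing favourable-outcome rates across groups. The sketch below is one hypothetical form of it: the function names are invented, the rate-ratio metric is just one of many fairness measures, and the 0.8 default echoes the four-fifths rule of thumb from US employment-selection guidance purely as a placeholder, not as a recommendation from the framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def selection_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Favourable-outcome rate per group from (group, favourable) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    favourable: Dict[str, int] = defaultdict(int)
    for group, outcome in decisions:
        totals[group] += 1
        favourable[group] += int(outcome)
    return {g: favourable[g] / totals[g] for g in totals}


def disparity_flag(decisions: Iterable[Tuple[str, bool]],
                   min_ratio: float = 0.8) -> Tuple[bool, Dict[str, float]]:
    """Flag when the lowest group rate falls below min_ratio times the highest."""
    rates = selection_rates(decisions)
    highest = max(rates.values())
    if highest == 0:
        return False, rates  # nobody received a favourable outcome; ratio undefined
    ratio = min(rates.values()) / highest
    return ratio < min_ratio, rates


# Toy data: group "a" approved 8 of 10 times, group "b" 5 of 10 times.
sample = ([("a", True)] * 8 + [("a", False)] * 2 +
          [("b", True)] * 5 + [("b", False)] * 5)

flagged, rates = disparity_flag(sample)
print(flagged, rates)
```

A flag here is a trigger for Layers 2 and 3, not a finding of bias. Deciding whether the gap reflects genuine bias or legitimate signal is the review board's call.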

PACE, applied:

Primary: Continuous disparity monitoring with automated alerts.
Alternate: Freeze model retraining, flag all decisions in affected cohort for manual review.
Contingency: Disable model for affected decision category, route to rule-based fallback.
Emergency: Halt all automated decisions, revert to fully human process.

Tiering, applied: A content recommendation model with slight demographic skew is low-tier. An automated hiring screen with disparate impact on a protected class is critical-tier. The architecture scales identically.

Explainability

The problem: a model produces a decision but cannot adequately explain why. In regulated domains, "the model said so" is not sufficient. In high-stakes domains, humans need to understand the reasoning to trust it — or override it.

Three layers, applied:

Layer	Security Version	Explainability Version
Layer 1 (Detection)	Guardrails validate input/output format and content	Explanation validators check that every decision includes a structured rationale meeting minimum criteria
Layer 2 (Evaluation)	Judge evaluates output against policy	Explanation quality scorer evaluates whether the rationale is consistent, complete, and faithful to the model's actual decision factors
Layer 3 (Judgment)	Human reviews escalated cases	Domain expert assesses whether the explanation is genuinely interpretable — not just present, but useful
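
A Layer 1 explanation validator is a structural check: it confirms a rationale is present and well-formed, and leaves quality to the later layers. Here is a minimal sketch under assumed criteria. The `Rationale` shape, the field names, and the minimum lengths are hypothetical; a real deployment would take its "minimum criteria" from its own policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rationale:
    summary: str                                       # plain-language reason for the decision
    factors: List[str] = field(default_factory=list)   # named decision factors


def validate_rationale(rationale: Optional[Rationale],
                       min_factors: int = 1,
                       min_summary_chars: int = 20) -> List[str]:
    """Return a list of problems; an empty list means the rationale passes Layer 1."""
    if rationale is None:
        return ["rationale missing"]
    problems: List[str] = []
    if len(rationale.summary.strip()) < min_summary_chars:
        problems.append("summary too short")
    named = [f for f in rationale.factors if f.strip()]
    if len(named) < min_factors:
        problems.append("too few decision factors")
    return problems


good = Rationale("Denied: debt-to-income ratio above policy limit", ["debt_to_income"])
bad = Rationale("n/a", [])

print(validate_rationale(good))
print(validate_rationale(bad))
print(validate_rationale(None))
```

Passing this check says nothing about whether the explanation is faithful or useful. That gap is exactly what the evaluation and judgment layers exist to cover.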

PACE, applied:

Primary: Full explanation generation with automated quality checks.
Alternate: Simplified explanation from pre-approved rationale templates.
Contingency: Flag decision as "explanation unavailable," route to manual justification.
Emergency: Halt autonomous decisions, require human-authored rationale for each action.

Tiering, applied: An internal summarisation tool that doesn't explain its choices is fine. An autonomous loan denial that can't articulate its reasoning is likely to breach regulatory requirements wherever lenders must give applicants the reasons for an adverse decision. Same architecture. Different tier. Different obligation.

Reliability

The problem: a model produces outputs that are inconsistent, contradictory, or confidently wrong — hallucination, confabulation, or simply unreliable performance under edge conditions.

Three layers, applied:

Layer	Security Version	Reliability Version
Layer 1 (Detection)	Guardrails catch known-bad patterns	Consistency checks flag outputs that contradict source material, prior outputs, or known facts
Layer 2 (Evaluation)	Judge evaluates policy compliance	Grounding evaluator assesses whether outputs are supported by retrieved evidence and internally coherent
Layer 3 (Judgment)	Human reviews escalated findings	Domain expert reviews flagged outputs for factual accuracy and coherence

PACE, applied:

Primary: Real-time consistency validation against source documents.
Alternate: Reduce model autonomy — present outputs as drafts requiring human confirmation.
Contingency: Switch to retrieval-only mode (return source documents, don't generate).
Emergency: Disable generative capability entirely.
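
The four postures above translate into four different response behaviours. The following sketch shows one way to route on posture; `respond` and its callable parameters are hypothetical, and treating Emergency as "return nothing at all" is an interpretation of the list above rather than something it states.

```python
from enum import Enum
from typing import Callable, Dict, List


class Posture(Enum):
    PRIMARY = "primary"
    ALTERNATE = "alternate"
    CONTINGENCY = "contingency"
    EMERGENCY = "emergency"


def respond(query: str,
            posture: Posture,
            retrieve: Callable[[str], List[str]],
            generate: Callable[[str, List[str]], str],
            is_consistent: Callable[[str, List[str]], bool]) -> Dict[str, object]:
    """Shape the response according to the current fail posture."""
    if posture is Posture.EMERGENCY:
        # Assumed reading: the capability is switched off entirely.
        return {"mode": "disabled", "answer": None, "sources": []}
    sources = retrieve(query)
    if posture is Posture.CONTINGENCY:
        # Retrieval-only: hand back source documents, generate nothing.
        return {"mode": "retrieval_only", "answer": None, "sources": sources}
    answer = generate(query, sources)
    if posture is Posture.ALTERNATE:
        # Reduced autonomy: output is a draft awaiting human confirmation.
        return {"mode": "draft", "answer": answer, "sources": sources}
    # PRIMARY: validate the generated answer against its sources.
    mode = "validated" if is_consistent(answer, sources) else "flagged"
    return {"mode": mode, "answer": answer, "sources": sources}


# Stub components so the sketch runs on its own.
def fake_retrieve(q: str) -> List[str]:
    return ["policy.pdf#p3"]


def fake_generate(q: str, s: List[str]) -> str:
    return "Refunds are available within 30 days."


def fake_check(a: str, s: List[str]) -> bool:
    return True


for p in Posture:
    print(p.name, respond("refund window?", p, fake_retrieve, fake_generate, fake_check)["mode"])
```

Because the posture is an explicit argument, degrading the system is a configuration change made by whoever owns the PACE plan, not an emergency code change.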

The Quantitative Model Transfers Directly

The risk assessment methodology calculates residual risk as the product of independent miss rates across layers. This calculation doesn't care what you're detecting.
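
Written out, the calculation looks like this. The symbols $R$, $n$, and $p_i$ are notation introduced here for convenience, and the numbers in the worked line are hypothetical.

```latex
% Residual risk across n layers, where p_i = P(miss_i) is the miss rate of layer i
% and the layers are assumed to miss independently:
\[
  R \;=\; \prod_{i=1}^{n} p_i \;=\; P(\text{miss}_1) \times P(\text{miss}_2) \times P(\text{miss}_3)
  \quad (n = 3)
\]

% Worked example with illustrative miss rates of 10%, 20% and 5%:
\[
  R \;=\; 0.10 \times 0.20 \times 0.05 \;=\; 0.001
\]

% Without independence, the chain rule gives the general form:
\[
  R \;=\; P(\text{miss}_1)\; P(\text{miss}_2 \mid \text{miss}_1)\; P(\text{miss}_3 \mid \text{miss}_1, \text{miss}_2)
\]
```

The last line shows why independence matters: if the layers share a blind spot, the conditional miss rates are higher than the standalone ones and the simple product understates the residual.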

For security: What is the probability that a prompt injection passes the guardrail, is missed by the Judge, and is not caught by the human reviewer?

For drift: What is the probability that an accuracy degradation exceeds the threshold, is missed by the statistical monitor, is not caught by the validation pipeline, and is not flagged by the domain expert?

For fairness: What is the probability that a disparate impact emerges, is missed by the disparity monitor, is not caught by the bias measurement pipeline, and is not identified by the equity review board?

Same math. Different inputs. The residual risk calculation, the severity weighting, and the recalibration cycle all transfer without modification, provided the layers in the new domain fail independently of one another.
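
To show that the arithmetic really is identical across domains, here is a small sketch that runs one function over three sets of miss rates. The function names, the miss rates, and the appetite value are all hypothetical; real figures would come from measuring each layer.

```python
import math
from typing import Sequence


def residual_risk(miss_rates: Sequence[float]) -> float:
    """Product of per-layer miss rates, assuming the layers miss independently."""
    if not miss_rates:
        raise ValueError("at least one layer is required")
    for p in miss_rates:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"miss rate {p} is not a probability")
    return math.prod(miss_rates)


def within_appetite(miss_rates: Sequence[float], appetite: float) -> bool:
    """Compare the compounded residual against the stated risk appetite."""
    return residual_risk(miss_rates) <= appetite


# Illustrative miss rates for (detection, evaluation, judgment) in each domain.
DOMAINS = {
    "security": [0.10, 0.20, 0.05],  # guardrail, Judge, human reviewer
    "drift":    [0.15, 0.10, 0.20],  # statistical monitor, validation pipeline, domain expert
    "fairness": [0.20, 0.25, 0.10],  # disparity monitor, bias pipeline, equity review board
}
APPETITE = 0.002  # hypothetical tolerance for an undetected failure

for name, rates in DOMAINS.items():
    r = residual_risk(rates)
    verdict = "within" if within_appetite(rates, APPETITE) else "exceeds"
    print(f"{name:9s} residual={r:.4f} ({verdict} appetite)")
```

When a domain exceeds appetite, the options are the same everywhere: improve a layer's catch rate, add a layer, or move the system to a tier with deeper controls.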

What This Framework Does Not Do

This framework does not provide domain-specific controls for drift, fairness, explainability, or reliability. It does not tell you which statistical test detects distribution shift, which fairness metric to use, or how to generate faithful explanations. Those are domain problems with mature, domain-specific tooling.

What it provides is the control architecture — the structural reasoning about how to layer defences, how to tier by impact, how to quantify what gets through, and how to degrade gracefully when controls fail. That architecture is domain-agnostic because it describes how to control, not what to control.

If you are building controls for AI risks beyond security, the framework offers a structural starting point:

Classify the system using the same impact dimensions — they are not security-specific.
Layer your controls so they fail independently — use different mechanisms at each layer.
Quantify your residual risk using the same compounding model — measure, don't assume.
Define your fail posture using the same PACE methodology — decide before deployment what happens when controls degrade.
Scale to the risk — not every AI system needs maximum controls for every risk dimension.

The security content in this framework is one instantiation of the architecture. Drift, fairness, explainability, and reliability are others. The architecture holds.

The Bottom Line

This framework was built for AI security. Its architecture — layered independence, proportional tiering, quantitative compounding, defined fail posture — was not built for any single risk domain. It was built for the structural problem of controlling systems that are uncertain, non-deterministic, and consequential.

Security is where we started. It is not where the architecture ends.

AI Runtime Behaviour Security, 2026 (Jonathan Gill).