Quantitative Risk Assessment for AI Controls¶

How the three-layer pattern reduces residual risk — with worked examples across all four tiers.

Part of AI Runtime Behaviour Security · Version 1.0 · February 2026 · Jonathan Gill

NIST AI RMF Alignment¶

This document implements activities from all four functions of the NIST AI Risk Management Framework (AI RMF 1.0). If your organisation already uses NIST AI RMF, this risk assessment plugs directly into your existing process.

NIST AI RMF Function	Subcategories	What This Document Provides
GOVERN	1.3, 1.4, 1.5	Risk management process structure; risk tolerance expressed as quantitative residual risk thresholds; recalibration schedule
MAP	2.1, 3.1, 3.2	AI system risk categorisation (four tiers); threat identification per scenario; lifecycle risk across transaction volumes
MEASURE	1.1, 1.2, 2.1, 2.2, 2.3, 2.6	Quantitative risk metrics; per-layer effectiveness measurement; residual risk calculation; recalibration methodology
MANAGE	1.1, 1.3, 2.1, 2.2, 2.4	Control selection proportionate to risk tier; compensating controls; PACE fail postures per tier; incident-driven recalculation

The methodology below maps onto the four NIST AI RMF functions: identify threats (MAP), measure control effectiveness (MEASURE), calculate residual risk for governance decisions (GOVERN), and define response actions when risk exceeds appetite (MANAGE).

For detailed infrastructure control mappings to all 51 NIST AI RMF subcategories, see NIST AI RMF Mapping.

Why Quantify Control Effectiveness¶

Most AI security guidance says "add guardrails" or "implement human oversight" without answering the question that risk committees actually ask: how much does each layer reduce the probability of harm, and what residual risk remains?

This document provides a quantitative model for answering that question. It uses illustrative effectiveness rates — not empirically validated benchmarks — to demonstrate the methodology. Your actual rates will depend on your implementation quality, threat landscape, and operational maturity. The point is the approach, not the specific numbers.

Important: The effectiveness percentages in this document are illustrative. They exist to demonstrate how layered controls compound to reduce residual risk. Your organisation should measure actual effectiveness through red teaming, Judge accuracy calibration (see Judge Assurance), and incident data. Replace the illustrative rates with your measured rates as they become available.

The Layered Control Model¶

Each layer in the three-layer pattern operates independently. When one layer misses a threat, the next layer has an independent opportunity to catch it. This is the same principle behind defence in depth in traditional security — except here we can model the compounding effect mathematically.

Illustrative Effectiveness Rates¶

Layer	Effectiveness	What This Means	Scope
Guardrails	~90%	Catches 90% of issues that reach it — known patterns, policy violations, format errors	Every transaction, real-time
LLM-as-Judge	~95%	Catches 95% of issues in the transactions it evaluates — semantic violations, subtle policy breaches, quality failures	Sampled or full coverage depending on tier
Human Oversight	~98%	Catches 98% of issues surfaced to reviewers — edge cases, nuanced judgement calls, novel threats	Flagged transactions + sampling

How Layers Compound¶

When all three layers are active and operating independently on a transaction:

P(issue reaches customer) = P(miss guardrail) × P(miss judge) × P(miss human)
                          = (1 - 0.90) × (1 - 0.95) × (1 - 0.98)
                          = 0.10 × 0.05 × 0.02
                          = 0.0001
                          = 0.01%

One in ten thousand. Compare this with what each partial configuration achieves:

Configuration	Residual Risk	Improvement Factor
No controls	100%	—
Guardrails only	10%	10×
Guardrails + Judge	0.5%	200×
Guardrails + Judge + Human	0.01%	10,000×

This is why the framework insists on layered controls. Each layer alone is insufficient. Together, they achieve orders-of-magnitude risk reduction.
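The compounding arithmetic above can be sketched in a few lines of Python. This is a minimal illustration that uses the illustrative 90/95/98 rates and assumes the layers fail independently; the function name `residual_fraction` and the `LAYERS` list are hypothetical, not part of the framework.

```python
from math import prod

def residual_fraction(effectiveness_rates):
    """Fraction of issues that slip past every layer, assuming independent layers."""
    return prod(1.0 - rate for rate in effectiveness_rates)

# Illustrative rates from the table above, not measured values.
LAYERS = [("Guardrails", 0.90), ("Judge", 0.95), ("Human", 0.98)]

# Add one layer at a time, in the same order as the configuration table.
for n in range(len(LAYERS) + 1):
    active = LAYERS[:n]
    residual = residual_fraction(rate for _, rate in active)
    names = " + ".join(name for name, _ in active) or "No controls"
    print(f"{names}: residual {residual:.4%}, improvement {1 / residual:,.0f}x")
```

Swapping in measured rates for the illustrative ones is the only change needed to reuse the sketch for a real system.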

Critical Caveat: Independence Assumption¶

This model assumes layers fail independently — a threat that bypasses guardrails is not inherently more likely to bypass the Judge. This holds when:

The Judge uses a different model than the task agent
Human reviewers have domain expertise beyond what the AI provides
Each layer uses different detection methods (pattern matching vs. semantic evaluation vs. human judgement)

If your Judge uses the same model as your task agent, or your human reviewers rubber-stamp AI outputs, the independence assumption breaks and your actual residual risk is higher than this model predicts.
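To see how much the independence assumption matters, the product can be written with conditional miss probabilities instead of marginal ones. The sketch below is illustrative only: the correlated figures (a Judge that misses 40% of what guardrails miss, reviewers who miss 20% of what both miss) are hypothetical values chosen to show the effect, not measurements.

```python
def residual_with_dependence(p_miss_guardrail,
                             p_miss_judge_given_guardrail_miss,
                             p_miss_human_given_both_miss):
    """Chain rule: P(all miss) = P(G miss) * P(J miss | G miss) * P(H miss | G and J miss)."""
    return (p_miss_guardrail
            * p_miss_judge_given_guardrail_miss
            * p_miss_human_given_both_miss)

# Independent layers: each conditional miss rate equals the layer's overall miss rate.
independent = residual_with_dependence(0.10, 0.05, 0.02)

# Hypothetical correlated case: the Judge shares blind spots with the guardrails
# and reviewers tend to accept what the Judge passed.
correlated = residual_with_dependence(0.10, 0.40, 0.20)

print(f"independent: {independent:.4%}")
print(f"correlated:  {correlated:.4%}")
print(f"residual is {correlated / independent:.0f}x higher under correlation")
```

Red team results that record which layers each attack bypassed are one way to estimate these conditional rates directly.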

Worked Example: Customer Product Chatbot (HIGH Tier)¶

System Description¶

A customer-facing AI chatbot that helps customers browse products, compare options, and complete purchases. The system has:

Product catalog access (read) — prices, specifications, availability
Shopping cart management (write) — add/remove items, apply promotions
Payment processing (write via API) — charge customer payment methods
Order management (write) — create and confirm orders
Customer account access (read) — order history, saved addresses, payment methods

Risk Classification¶

Dimension	Assessment	Rationale
Decision authority	High	Takes actions (processes payments, creates orders)
Reversibility	Medium	Payments can be refunded but create operational cost
Data sensitivity	High	PII, payment data, purchase history
Audience	Critical	External customers
Scale	High	Thousands of transactions per day
Regulatory	Medium	Consumer protection, PCI-DSS adjacency

Assigned tier: HIGH (payment processing pushes toward CRITICAL, but payment gateway provides independent validation — see compensating controls)

Threat Scenarios¶

Five failure modes that matter for this system, with per-layer analysis across 1,000 transactions.

Scenario 1: Prompt Injection — Price Manipulation¶

Threat: Attacker crafts input that causes the chatbot to apply unauthorized discounts, override pricing, or bypass payment validation.

Inherent likelihood: ~20 attempts per 1,000 transactions (2%). Most e-commerce chatbots see regular probing from both malicious actors and curious users.

Layer	Detection Method	Effectiveness	Catches	Misses
Guardrails	Pattern matching for injection signatures, encoding detection, input normalisation	90%	18.0	2.0
Judge	Semantic evaluation — "did the chatbot apply pricing outside catalog parameters?"	95%	1.9	0.1
Human	Review flagged transactions, price anomaly alerts	98%	0.098	0.002

Residual: 0.002 successful price manipulations per 1,000 transactions = 1 in 500,000 transactions

Metric	Value
Inherent risk (no controls)	20 per 1,000
Residual risk (all layers)	0.002 per 1,000
Risk reduction factor	10,000×
Annualised (1M transactions/year)	~2 incidents
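The Catches and Misses columns follow from passing each layer only what the previous layer missed. A minimal sketch of that cascade, using the Scenario 1 figures and the illustrative rates (the helper name `layer_cascade` is hypothetical):

```python
def layer_cascade(inherent_per_1k, layers):
    """Return (name, catches, misses) per layer; each layer sees only prior misses."""
    rows = []
    remaining = inherent_per_1k
    for name, effectiveness in layers:
        catches = remaining * effectiveness
        remaining -= catches
        rows.append((name, catches, remaining))
    return rows

# Illustrative rates, not measured values.
LAYERS = [("Guardrails", 0.90), ("Judge", 0.95), ("Human", 0.98)]

rows = layer_cascade(20, LAYERS)  # 20 injection attempts per 1,000 transactions
for name, catches, misses in rows:
    print(f"{name:<10} catches {catches:.4g}  misses {misses:.4g}")

residual_per_1k = rows[-1][2]
print(f"1 in {1000 / residual_per_1k:,.0f} transactions")
print(f"annualised at 1M transactions: {residual_per_1k / 1000 * 1_000_000:.1f}")
```

The same function reproduces the tables for the other four scenarios when called with their inherent likelihoods.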

Scenario 2: Hallucinated Product Information¶

Threat: Chatbot fabricates product specifications, availability, pricing, or warranty terms that don't match the catalog. Customer makes a purchase decision based on wrong information.

Inherent likelihood: ~50 per 1,000 transactions (5%). Hallucination rates vary by model and RAG implementation quality, but product attribute errors are common when the model generates rather than retrieves.

Layer	Detection Method	Effectiveness	Catches	Misses
Guardrails	Output validation against product catalog API — price match, spec match, availability check	90%	45.0	5.0
Judge	Semantic grounding evaluation — "are all stated product attributes supported by the catalog source?"	95%	4.75	0.25
Human	Review flagged hallucination alerts, spot-check random transactions	98%	0.245	0.005

Residual: 0.005 undetected hallucinations per 1,000 transactions = 1 in 200,000 transactions

Metric	Value
Inherent risk (no controls)	50 per 1,000
Residual risk (all layers)	0.005 per 1,000
Risk reduction factor	10,000×
Annualised (1M transactions/year)	~5 incidents

Scenario 3: PII Leakage¶

Threat: Chatbot includes another customer's personal data, order history, or payment details in a response. Could occur through context window contamination, shared session state, or RAG retrieval pulling the wrong customer's data.

Inherent likelihood: ~10 per 1,000 transactions (1%). Lower base rate than hallucination, but higher impact per incident.

Layer	Detection Method	Effectiveness	Catches	Misses
Guardrails	Output PII scanner — regex + ML for names, addresses, card numbers, account IDs not belonging to the current session	90%	9.0	1.0
Judge	Cross-reference evaluation — "does the response contain any data elements not attributable to the requesting customer?"	95%	0.95	0.05
Human	Review PII alerts, periodic audit of cross-customer data access patterns	98%	0.049	0.001

Residual: 0.001 PII leakage incidents per 1,000 transactions = 1 in 1,000,000 transactions

Metric	Value
Inherent risk (no controls)	10 per 1,000
Residual risk (all layers)	0.001 per 1,000
Risk reduction factor	10,000×
Annualised (1M transactions/year)	~1 incident

Scenario 4: Unauthorized or Incorrect Payment¶

Threat: Chatbot processes a payment for the wrong amount, charges the wrong payment method, or initiates a transaction the customer didn't authorize.

Inherent likelihood: ~5 per 1,000 transactions (0.5%). Lower base rate because payment flows are typically more structured, but highest financial impact per incident.

Layer	Detection Method	Effectiveness	Catches	Misses
Guardrails	Amount validation against cart total, payment method confirmation check, duplicate transaction detection	90%	4.5	0.5
Judge	Transaction integrity evaluation — "does the payment amount, method, and authorization match the conversation flow?"	95%	0.475	0.025
Human	Payment anomaly queue, high-value transaction review	98%	0.0245	0.0005

Residual: 0.0005 unauthorized payments per 1,000 transactions = 1 in 2,000,000 transactions

Metric	Value
Inherent risk (no controls)	5 per 1,000
Residual risk (all layers)	0.0005 per 1,000
Risk reduction factor	10,000×
Annualised (1M transactions/year)	~0.5 incidents

Scenario 5: Inappropriate or Harmful Response¶

Threat: Chatbot generates offensive content, makes inappropriate recommendations, provides dangerous advice (e.g., regarding product use), or behaves in a way that damages brand reputation.

Inherent likelihood: ~15 per 1,000 transactions (1.5%). Includes both adversarial prompting and edge cases where the model's training produces unexpected outputs in a commercial context.

Layer	Detection Method	Effectiveness	Catches	Misses
Guardrails	Content policy filter, toxicity classifier, brand guideline checker	90%	13.5	1.5
Judge	Tone and policy evaluation — "is this response appropriate for a customer-facing commercial interaction?"	95%	1.425	0.075
Human	Escalation queue review, customer complaint correlation	98%	0.0735	0.0015

Residual: 0.0015 inappropriate responses per 1,000 transactions = 1 in ~667,000 transactions

Metric	Value
Inherent risk (no controls)	15 per 1,000
Residual risk (all layers)	0.0015 per 1,000
Risk reduction factor	10,000×
Annualised (1M transactions/year)	~1.5 incidents

Combined Risk Summary — Product Chatbot¶

Threat	Inherent (per 1K)	Residual (per 1K)	Annualised (1M txn)	Severity
Prompt injection — price manipulation	20	0.002	~2	High
Hallucinated product info	50	0.005	~5	Medium
PII leakage	10	0.001	~1	Critical
Unauthorized payment	5	0.0005	~0.5	Critical
Inappropriate response	15	0.0015	~1.5	Medium
Total	100	0.0100	~10

Interpretation: Without controls, roughly 10% of transactions would have some form of issue. With all three layers active, the residual rate drops to approximately 0.001% — about 10 incidents per year at 1M transactions. Of those, the critical-severity incidents (PII, unauthorized payment) are expected at fewer than 2 per year.
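The summary row and the critical-severity subtotal can be checked with a short aggregation. This sketch assumes, as the table does, that the five scenarios are counted separately and that all three layers see every issue; the dictionary layout is an illustrative choice.

```python
# Scenario name: (inherent issues per 1,000 transactions, severity), from the table above.
SCENARIOS = {
    "Prompt injection": (20, "High"),
    "Hallucinated product info": (50, "Medium"),
    "PII leakage": (10, "Critical"),
    "Unauthorized payment": (5, "Critical"),
    "Inappropriate response": (15, "Medium"),
}

PASS_THROUGH = 0.10 * 0.05 * 0.02   # fraction missed by all three illustrative layers
ANNUAL_TRANSACTIONS = 1_000_000

def annualised(inherent_per_1k):
    """Expected undetected incidents per year for one scenario."""
    return inherent_per_1k * PASS_THROUGH / 1000 * ANNUAL_TRANSACTIONS

total_inherent = sum(inherent for inherent, _ in SCENARIOS.values())
total_residual = total_inherent * PASS_THROUGH
total_annual = sum(annualised(inherent) for inherent, _ in SCENARIOS.values())
critical_annual = sum(annualised(inherent)
                      for inherent, severity in SCENARIOS.values()
                      if severity == "Critical")

print(f"inherent {total_inherent} per 1K, residual {total_residual:.4f} per 1K")
print(f"annualised: {total_annual:.1f} total, {critical_annual:.1f} critical")
```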

Compensating Controls¶

The three-layer AI pattern does not operate in isolation. Existing infrastructure provides independent controls that further reduce residual risk. These are not substitutes for AI-specific controls — they are additional layers in the overall defence.

For the Product Chatbot¶

Compensating Control	What It Catches	Independence From AI Layers
Payment gateway validation	Amount limits, card verification, 3D Secure, duplicate detection	Operates at payment infrastructure level — catches errors regardless of what the chatbot sends
API input validation	Malformed requests, out-of-range values, schema violations	Application layer — rejects structurally invalid API calls before they reach backend systems
Fraud detection system	Anomalous transaction patterns, velocity checks, device fingerprinting	Operates on transaction data, not chatbot outputs — independent signal
Rate limiting (API gateway)	Bulk exploitation, automated attacks, enumeration	Network/infrastructure level — limits blast radius regardless of individual transaction success
Order confirmation workflow	Customer verifies order details before final payment	Human-in-the-loop at the customer level — the customer themselves is a control
Inventory management system	Prevents fulfillment of out-of-stock items, catches quantity errors	Backend system of record — chatbot hallucination about availability is caught at fulfillment
Refund/chargeback process	Enables recovery from payment errors	Not a preventive control, but reduces financial impact of residual failures

Adjusted Residual Risk with Compensating Controls¶

Taking the two highest-severity scenarios and applying compensating controls:

Unauthorized payment (0.0005 per 1,000 AI-layer residual):

Additional Layer	Catches	Remaining
Payment gateway validation (amount, card, 3DS)	~95% of remaining	0.000025
Fraud detection system	~80% of remaining	0.000005
Customer order confirmation	~90% of remaining	0.0000005

Effective residual with compensating controls: ~1 in 2 billion transactions

PII leakage (0.001 per 1,000 AI-layer residual):

Additional Layer	Catches	Remaining
API response filtering (DLP at gateway)	~85% of remaining	0.00015
Session isolation (infrastructure)	~70% of remaining	0.000045

Effective residual with compensating controls: ~1 in 22 million transactions

The point: Compensating controls don't excuse weak AI-specific controls. But when a risk committee asks "what's the realistic probability of a customer being charged incorrectly?" the answer includes the full control stack, not just the AI layers. Present both the AI-layer residual and the compensated residual.
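Both compensated figures above come from applying each additional control to whatever the previous one left behind. A minimal sketch, using the illustrative catch rates from the two tables and assuming the compensating controls are independent of each other and of the AI layers (helper names are hypothetical):

```python
def apply_controls(residual_per_1k, catch_rates):
    """Apply each compensating control to what remains after the previous one."""
    remaining = residual_per_1k
    for rate in catch_rates:
        remaining *= 1.0 - rate
    return remaining

def one_in_n(per_1k):
    """Convert a per-1,000 rate into '1 in N transactions'."""
    return 1000 / per_1k

# Unauthorized payment: gateway validation, fraud detection, customer confirmation.
payment = apply_controls(0.0005, [0.95, 0.80, 0.90])
# PII leakage: DLP at the gateway, session isolation.
pii = apply_controls(0.001, [0.85, 0.70])

print(f"payment: {payment:.7f} per 1K, 1 in {one_in_n(payment):,.0f}")
print(f"PII:     {pii:.6f} per 1K, 1 in {one_in_n(pii):,.0f}")
```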

Risk Tier Scenarios¶

The product chatbot is a HIGH-tier system. Here's how the same methodology applies across all four tiers, showing how control depth scales with risk.

LOW Tier: Public FAQ Chatbot¶

System: Answers general product questions from public documentation. No customer data access, no transaction capability, no personalization.

Control configuration: Guardrails only. No Judge (or optional 1-5% sampling). No human-in-the-loop (exception-only).

Threat	Inherent (per 1K)	Controls Applied	Residual (per 1K)
Hallucinated FAQ answer	30	Guardrails (90%): output grounding check against FAQ corpus	3.0
Inappropriate response	10	Guardrails (90%): content policy filter	1.0
Brand reputation harm	5	Guardrails (90%): tone checker	0.5

Residual: ~4.5 issues per 1,000 interactions

Why this is acceptable: No financial impact, no data exposure, no irreversible actions. The FAQ bot gives wrong or awkward answers ~0.45% of the time. Users can verify against the website. The cost of these failures is low. Adding Judge and HITL would improve accuracy but the investment isn't proportionate to the risk.

PACE posture: Primary only (fail-open with logging). If guardrails fail, the chatbot continues to operate but all outputs are logged for batch review.

MEDIUM Tier: Internal Document Assistant¶

System: Helps internal employees search and summarise company policy documents. Has access to internal knowledge base (read-only). Users are employees with domain knowledge who are expected to verify outputs.

Control configuration: Guardrails + Judge (5-10% sampling, batch daily). Human review on flags only.

Threat	Inherent (per 1K)	Controls Applied	Residual (per 1K)
Hallucinated policy detail	40	Guardrails (90%) + Judge on 10% sample (95%)	4.0 after guardrails, ~3.62 effective*
PII in internal docs exposed incorrectly	8	Guardrails (90%) + Judge on 10% sample (95%)	0.8 after guardrails, ~0.72 effective*
Confidential doc outside need-to-know	5	Guardrails (90%): access control check + Judge	0.5 after guardrails, ~0.45 effective*

* Judge at 10% sampling catches 95% of the 10% it evaluates. Effective additional catch rate: 0.10 × 0.95 = 9.5% of guardrail misses.

How sampling affects the math:

With full Judge coverage:  P(miss) = 0.10 × 0.05 = 0.005 (0.5%)
With 10% Judge sampling:   P(miss) = 0.10 × (0.90 + 0.10 × 0.05) = 0.10 × 0.905 = 0.0905 (~9%)

Residual: ~5 issues per 1,000 interactions (with sampling)
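The sampling arithmetic above generalises to any coverage level. The sketch below assumes the Judge samples transactions at random and that unsampled transactions pass the Judge layer unchecked; `miss_probability` is a hypothetical helper name.

```python
def miss_probability(guardrail_eff, judge_eff, judge_coverage):
    """P(issue passes guardrails and a Judge that evaluates only a fraction of traffic)."""
    # Unsampled traffic is never caught; sampled traffic is missed at (1 - judge_eff).
    judge_miss = (1 - judge_coverage) + judge_coverage * (1 - judge_eff)
    return (1 - guardrail_eff) * judge_miss

for coverage in (0.0, 0.10, 0.20, 0.50, 1.0):
    p = miss_probability(0.90, 0.95, coverage)
    print(f"Judge coverage {coverage:>4.0%}: P(miss) = {p:.4f}")
```

Running the sweep shows that residual risk falls roughly linearly with coverage, which is why low sampling rates mainly detect systematic problems rather than individual incidents.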

Why this is acceptable: Internal users with domain expertise will catch most residual errors. The document assistant is an accelerator, not a decision-maker. Employees are expected to verify critical details against source documents. The Judge sampling catches systematic errors (drifting summaries, recurring hallucination patterns) even if it doesn't catch every individual instance.

PACE posture: P + A configured. If guardrails degrade, scope narrows to read-only retrieval (no summarisation). If Judge is unavailable, guardrail-only mode with increased human spot-checking.

HIGH Tier: Customer Product Chatbot¶

See the detailed worked example above.

Control configuration: Guardrails + Judge (20-50% coverage, near real-time) + Human oversight (flagged transactions + sampling).

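Residual: ~0.01 issues per 1,000 transactions across all threat categories. This figure assumes the Judge evaluates every transaction the guardrails miss, as in the worked example; at 20-50% coverage, apply the sampling adjustment shown for the MEDIUM tier.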

PACE posture: P + A + C configured and tested.
- Alternate: Judge down → guardrails remain active, all transactions flagged for human review queue, response latency accepted
- Contingency: Guardrails degraded → chatbot enters "assisted browse" mode (read-only, no transactions), human reviews every interaction
- Emergency: Multiple layers down → circuit breaker fires, chatbot replaced with static product pages + "contact us" fallback
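The degradation path above amounts to a small decision function over layer health. The following is a sketch only: it assumes health is available as two boolean signals and treats "multiple layers down" as both guardrails and Judge being unhealthy. The `Pace` enum and `high_tier_posture` are hypothetical names, not part of the framework.

```python
from enum import Enum

class Pace(Enum):
    PRIMARY = "full three-layer operation"
    ALTERNATE = "guardrails active, every transaction queued for human review"
    CONTINGENCY = "assisted browse: read-only, human reviews every interaction"
    EMERGENCY = "circuit breaker: static product pages and contact-us fallback"

def high_tier_posture(guardrails_healthy: bool, judge_healthy: bool) -> Pace:
    """Map layer health to the HIGH-tier posture described above."""
    if not guardrails_healthy and not judge_healthy:
        return Pace.EMERGENCY      # multiple layers down
    if not guardrails_healthy:
        return Pace.CONTINGENCY    # guardrails degraded
    if not judge_healthy:
        return Pace.ALTERNATE      # Judge down
    return Pace.PRIMARY

print(high_tier_posture(guardrails_healthy=True, judge_healthy=False).name)
```

Keeping the mapping in one pure function makes each posture transition easy to exercise in tests before it is needed in production.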

CRITICAL Tier: Credit Decisioning System¶

System: AI evaluates loan applications and produces credit decisions that are auto-executed for standard cases. Decisions affect customer finances directly. Regulatory obligations (fair lending, adverse action notices).

Control configuration: Full three-layer deployment. Judge at 100% coverage, real-time. Human review of all adverse decisions and all decisions above a threshold.

Threat	Inherent (per 1K)	Controls Applied	Residual (per 1K)
Discriminatory decision	15	Guardrails (90%): protected-class input filtering + bias detection. Judge (95%): fairness evaluation per decision. Human (98%): all adverse decisions reviewed	0.0015
Hallucinated financial data	20	Guardrails (90%): data validation against bureau records. Judge (95%): source verification. Human (98%): sample review of auto-approved decisions	0.002
Incorrect risk score	30	Guardrails (90%): range and consistency checks. Judge (95%): independent risk recalculation on sample. Human (98%): all high-value decisions reviewed	0.003
Regulatory violation	10	Guardrails (90%): compliance rule engine. Judge (95%): regulatory checklist evaluation. Human (98%): compliance officer review	0.001
Model drift — gradual accuracy degradation	5	Guardrails (90%): statistical drift detection. Judge (95%): decision distribution monitoring. Human (98%): monthly portfolio review	0.0005

Residual: ~0.008 issues per 1,000 decisions across all threat categories = 1 in 125,000 decisions

Compensating controls that further reduce risk:
- Regulatory model validation: independent model risk management (OCC/Fed guidance)
- Adverse action notice process: customer can challenge decisions, creating a feedback loop
- Portfolio-level monitoring: statistical analysis catches systematic bias even if individual decisions pass
- Audit trail requirements: every decision is logged with full reasoning chain for regulatory examination

PACE posture: Full PACE cycle with tested E→P recovery.
- Alternate: Judge degraded → all decisions require human approval (no auto-execution)
- Contingency: Multiple layers degraded → system enters "manual underwriting" mode: AI provides data retrieval only, all decisions made by human underwriters
- Emergency: Circuit breaker → AI removed from decision path entirely, application queue held, existing commitments honoured through manual process

Cross-Tier Summary¶

Tier	Example	Layers Active	Judge Coverage	Inherent Risk (per 1K)	Residual Risk (per 1K)	Reduction Factor
LOW	Public FAQ	Guardrails	Optional 1-5%	~45	~4.5	10×
MEDIUM	Internal docs	Guardrails + Judge (sampled)	5-10%	~53	~5.0	10×
HIGH	Product chatbot	All three	20-50%	~100	~0.01	10,000×
CRITICAL	Credit decisions	All three (full)	100%	~80	~0.008	10,000×

Key insight: The jump from 10× to 10,000× risk reduction happens when the Judge moves from sampling to substantial coverage and Human Oversight moves from exception-only to systematic review. This is why the framework requires full three-layer deployment for HIGH and CRITICAL tiers.

What These Numbers Do Not Tell You¶

1. Severity is not uniform. One PII leakage incident may matter more than fifty hallucinated FAQ answers. The residual risk numbers are per-incident counts, not impact-weighted. Weight your residual risk by impact severity when reporting to risk committees.

2. Effectiveness rates change over time. Adversaries adapt. Models drift. Guardrail bypass techniques evolve. The 90/95/98 rates are a snapshot. Schedule quarterly recalibration through:
- Red team exercises against guardrails
- Judge accuracy measurement against labelled datasets (see Judge Assurance)
- Human reviewer agreement studies

3. Correlated failures break the model. If a novel attack technique bypasses both guardrails AND the Judge (because both rely on similar detection approaches), the independence assumption fails and residual risk is higher than predicted. This is why the framework emphasises different models, different methods, and different perspectives across layers.

4. The "unknown unknown" isn't modelled. This analysis covers known threat categories. Novel failure modes — threats you haven't imagined — are not captured. The Judge layer's semantic evaluation and Human Oversight provide some coverage for novel threats, but the model cannot quantify what it cannot anticipate. This is the fundamental argument for defence in depth: you need layers precisely because you can't predict everything.

5. Compensating controls have their own failure rates. The payment gateway, fraud detection system, and API validation layer can all fail too. A complete risk assessment would model these as additional independent layers with their own effectiveness rates, producing a full probability tree. The simplified analysis above is sufficient for directional decision-making.
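Point 1 can be made concrete by multiplying each residual rate by an impact weight before ranking. The weights below (Critical 100, High 10, Medium 1) are hypothetical placeholders chosen purely for illustration; an organisation would substitute its own impact scale, for example expected cost per incident.

```python
# Hypothetical impact weights; replace with your organisation's own scale.
SEVERITY_WEIGHT = {"Critical": 100, "High": 10, "Medium": 1}

# (scenario, residual per 1,000 transactions, severity) from the chatbot summary table.
RESIDUALS = [
    ("Prompt injection", 0.002, "High"),
    ("Hallucinated product info", 0.005, "Medium"),
    ("PII leakage", 0.001, "Critical"),
    ("Unauthorized payment", 0.0005, "Critical"),
    ("Inappropriate response", 0.0015, "Medium"),
]

weighted = [(name, residual * SEVERITY_WEIGHT[severity])
            for name, residual, severity in RESIDUALS]
# Highest impact-weighted residual first.
ranked = sorted(weighted, key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name:<28} weighted residual {score:.4f}")
```

With these placeholder weights the ranking differs from the raw counts: hallucination has the highest residual frequency but PII leakage has the highest weighted residual.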

Using This in Practice¶

For Risk Committees¶

If your organisation uses NIST AI RMF, frame this assessment in those terms:

What You're Presenting	NIST RMF Function	Language to Use
Threat scenarios and likelihood	MAP	"We've identified and categorised the AI-specific risks for this system"
Per-layer effectiveness data	MEASURE	"We've measured control effectiveness through red teaming and Judge calibration"
Residual risk calculation	MEASURE	"Residual risk after all control layers is X per Y transactions"
Risk appetite comparison	GOVERN	"This residual risk is within/outside our stated risk tolerance"
Compensating controls and PACE postures	MANAGE	"We have compensating controls and defined degradation paths when controls fail"

Present two numbers:
1. AI-layer residual risk: what's left after guardrails, Judge, and human oversight
2. Compensated residual risk: what's left after existing infrastructure controls also apply

Frame the discussion around whether the compensated residual risk is within appetite, not whether it's zero. It will never be zero.

For Engineering Teams¶

Use the per-scenario tables to:
- Prioritise control implementation: highest inherent likelihood × severity first
- Justify Judge coverage levels: show the math on sampling vs. full coverage
- Identify where compensating controls reduce urgency: payment gateway validation may mean you can deploy with guardrails-only initially while building out the Judge layer

For Incident Response¶

When an incident occurs, update the effectiveness rates:
- If a prompt injection bypasses guardrails, your guardrail effectiveness for that attack class drops
- Recalculate residual risk with updated rates
- Determine whether the remaining layers still bring residual risk within appetite
- If not, implement additional controls or reduce system scope
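The recalculation steps above might look like the following sketch. Both the post-incident guardrail rate (60%) and the appetite threshold (0.005 per 1,000) are hypothetical values used only to show the comparison; neither comes from the framework.

```python
from math import prod

def residual_per_1k(inherent_per_1k, effectiveness_rates):
    """Residual issues per 1,000 transactions, assuming independent layers."""
    return inherent_per_1k * prod(1.0 - rate for rate in effectiveness_rates)

APPETITE_PER_1K = 0.005  # hypothetical risk appetite for this scenario

# Prompt injection scenario: 20 attempts per 1,000 transactions.
before = residual_per_1k(20, [0.90, 0.95, 0.98])
# After an incident, guardrail effectiveness for this attack class is re-estimated
# at 60% (hypothetical); Judge and human rates are unchanged.
after = residual_per_1k(20, [0.60, 0.95, 0.98])

for label, value in (("before incident", before), ("after incident", after)):
    status = "within appetite" if value <= APPETITE_PER_1K else "EXCEEDS appetite"
    print(f"{label}: {value:.4f} per 1K ({status})")
```

When the recalculated figure exceeds appetite, the last step in the list applies: add controls or reduce system scope until it falls back within the threshold.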

Recalibration Schedule¶

Activity	Frequency	Updates
Red team guardrail testing	Quarterly	Guardrail effectiveness rate
Judge accuracy evaluation	Quarterly	Judge effectiveness rate
Human reviewer agreement study	Twice a year	Human oversight effectiveness rate
Incident-driven recalculation	Per incident	Specific scenario rates
Full risk assessment refresh	Annually	All rates, all scenarios, all tiers
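Each recalibration activity produces a count of issues caught out of issues presented, and a small sample gives an uncertain rate. One way to express that uncertainty, offered here as a suggestion rather than something the framework prescribes, is a Wilson score interval. The red team counts below (180 of 200 caught) are hypothetical.

```python
from math import sqrt

def wilson_interval(caught, attempts, z=1.96):
    """Wilson score interval for a detection rate (z=1.96 gives roughly 95% confidence)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = caught / attempts
    denom = 1 + z * z / attempts
    centre = (p + z * z / (2 * attempts)) / denom
    half_width = z * sqrt(p * (1 - p) / attempts
                          + z * z / (4 * attempts * attempts)) / denom
    return centre - half_width, centre + half_width

# Hypothetical quarterly red team result: guardrails caught 180 of 200 attacks.
low, high = wilson_interval(180, 200)
print(f"point estimate 90.0%, interval {low:.1%} to {high:.1%}")
```

Feeding the lower bound rather than the point estimate into the residual risk calculation is one conservative option when sample sizes are small.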

Template: Applying This to Your System¶

For each AI system, complete this assessment. NIST AI RMF function labels are included so you can slot each step into your existing risk management process.

1. System description and tier classification (NIST RMF: MAP 1.1, MAP 2.1)
- What does the system do?
- What data does it access? What actions can it take?
- What is the assigned risk tier and why?

2. Threat scenario identification (NIST RMF: MAP 3.1, MAP 3.2)
- List 3-7 realistic failure modes
- Estimate inherent likelihood per 1,000 transactions (use incident data, red team results, or informed estimates)
- Rate severity: Critical / High / Medium / Low

3. Per-layer control analysis (NIST RMF: MEASURE 1.1, MEASURE 2.1, MEASURE 2.6)
- For each scenario, describe what each layer detects and how
- Apply your measured or estimated effectiveness rates
- Calculate residual risk

4. Compensating controls (NIST RMF: MANAGE 1.1, MANAGE 2.2)
- List existing infrastructure controls that independently reduce risk
- Estimate their effectiveness against each scenario
- Calculate compensated residual risk

5. Appetite comparison (NIST RMF: GOVERN 1.5, MANAGE 2.4)
- Does the compensated residual risk fall within your risk appetite?
- If not, what additional controls or scope reductions are needed?

6. Recalibration plan (NIST RMF: MEASURE 2.3, GOVERN 1.4)
- When will you re-measure effectiveness rates?
- What triggers an unscheduled reassessment?
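Teams that prefer to keep the assessment in code could capture the template as a small data structure. This is a sketch under stated assumptions: the class names, field names, and the appetite value of 0.0001 per 1,000 are all hypothetical, and the figures are the illustrative ones from the chatbot example. It requires Python 3.9 or later for the built-in generic type hints.

```python
from dataclasses import dataclass, field
from math import prod

@dataclass
class ThreatScenario:
    name: str
    inherent_per_1k: float                 # step 2: inherent likelihood
    severity: str                          # Critical / High / Medium / Low
    layer_effectiveness: list[float]       # step 3: guardrails, Judge, human
    compensating_effectiveness: list[float] = field(default_factory=list)  # step 4

    def ai_layer_residual(self) -> float:
        return self.inherent_per_1k * prod(1 - e for e in self.layer_effectiveness)

    def compensated_residual(self) -> float:
        return self.ai_layer_residual() * prod(
            1 - e for e in self.compensating_effectiveness)

@dataclass
class RiskAssessment:
    system: str                            # step 1: description and tier
    tier: str
    appetite_per_1k: float                 # step 5: hypothetical appetite threshold
    scenarios: list[ThreatScenario]

    def out_of_appetite(self) -> list[str]:
        """Names of scenarios whose compensated residual exceeds appetite."""
        return [s.name for s in self.scenarios
                if s.compensated_residual() > self.appetite_per_1k]

pii = ThreatScenario("PII leakage", 10, "Critical",
                     [0.90, 0.95, 0.98], [0.85, 0.70])
payment = ThreatScenario("Unauthorized payment", 5, "Critical",
                         [0.90, 0.95, 0.98], [0.95, 0.80, 0.90])
assessment = RiskAssessment("Customer product chatbot", "HIGH", 0.0001,
                            [pii, payment])

print(assessment.out_of_appetite())
```

Storing both residual figures per scenario keeps the two numbers recommended for risk committees available from a single record.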
