Risk & Governance
CROs, Risk Managers, GRC Teams — how to quantify AI risk, set appetite, and demonstrate control effectiveness to the board.
Executive Summary
Your organisation is deploying AI systems that can behave differently each time they run. Traditional risk frameworks assume deterministic systems: test it, prove it works, assign a rating, move on. AI breaks that assumption. The same system can pass every test today and hallucinate a medical dosage, leak customer data, or approve an unauthorised transaction tomorrow.
Three things the board needs to know:
- You can quantify AI risk the same way you quantify any operational risk — inherent risk, control effectiveness per layer, residual risk, annualised frequency. This framework provides the methodology.
- Controls exist and are measurable. The industry has converged on a layered pattern: automated guardrails (real-time), independent AI evaluation (async), human oversight (as needed), and circuit breakers (immediate). Each layer's effectiveness can be measured through red team testing, evaluation datasets, and agreement studies.
- Failure modes are defined in advance. Every control layer has a predetermined degradation path — from full operation to supervised-only to full stop. The system doesn't fail silently. It transitions to a safe state you've already approved.
The ask: Classify your AI systems by risk tier, set quantified appetite per tier, and require measured control evidence — not assertions. This page shows you how.
Time to read: 5 minutes. Time to act: Start Monday.
Five Runtime Control Principles
The framework is built on five principles. Every control, every metric, every degradation path follows from these:
- Prevent known-bad in real time. Guardrails block prompt injection, data leakage, and policy violations before the AI responds. This is your first line — fast, pattern-based, always on.
- Detect unknown-bad independently. A separate AI evaluates outputs the guardrails passed. It catches semantic violations, hallucinated facts, and subtle policy breaches that pattern matching misses.
- Keep humans in the loop where it matters. Not on every transaction — on the ones that matter. High-risk decisions, edge cases, and flagged outputs route to qualified reviewers with defined SLAs.
- Fail to a safe state, not to silence. When controls degrade, the system doesn't continue unchecked. It transitions through predetermined states — each with defined risk implications — down to full stop if needed.
- Measure everything. Assert nothing. Control effectiveness is quantified through red team testing, evaluation datasets, and human agreement studies. "We have guardrails" is not a risk treatment.
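The fifth principle names three sources of evidence: red team testing, evaluation datasets, and human agreement studies. Here is a minimal sketch of how each could be reduced to a single number. The function names are hypothetical, and Cohen's kappa is used only as one common agreement statistic; the text does not say which statistic the framework prescribes.

```python
from collections import Counter

def block_rate(blocked: int, attempts: int) -> float:
    """Guardrail effectiveness: share of red team attempts that were blocked."""
    return blocked / attempts

def recall(labels: list[bool], flagged: list[bool]) -> float:
    """Evaluation accuracy: share of truly bad outputs (label True) that were flagged."""
    hits = sum(1 for is_bad, was_flagged in zip(labels, flagged) if is_bad and was_flagged)
    return hits / sum(labels)

def cohens_kappa(reviewer_a: list[str], reviewer_b: list[str]) -> float:
    """Reviewer agreement corrected for the agreement expected by chance."""
    n = len(reviewer_a)
    observed = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / n
    counts_a, counts_b = Counter(reviewer_a), Counter(reviewer_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Illustrative inputs only, not measured results.
guardrail_rate = block_rate(blocked=188, attempts=200)
evaluator_recall = recall([True, True, False, True], [True, False, False, True])
kappa = cohens_kappa(["ok", "ok", "bad", "bad"], ["ok", "bad", "bad", "bad"])
```

Each result is a rate you can report and recalibrate, which is the difference between measured control evidence and an assertion.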
How the Layers Work Together
Three independent control layers, each catching what the previous one missed:
| Layer | Function | Speed | What It Catches |
|---|---|---|---|
| Guardrails | Block known-bad patterns | Real-time (<100ms) | Prompt injection, PII leakage, policy violations |
| AI Evaluation | Assess output quality and safety | Near-real-time | Hallucination, semantic policy breaches, tone violations |
| Human Oversight | Review flagged and sampled outputs | Hours to days | Edge cases, novel risks, calibration drift |
Circuit breakers sit across all three: if any layer detects a critical failure or becomes unavailable, the system transitions to a predetermined safe state.
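The ordering of the layers and the circuit breaker can be sketched in a few lines. This is an illustration under simplifying assumptions, not the framework's implementation: the checks are toy stand-ins, every name is hypothetical, and the evaluation layer runs synchronously here although the framework describes it as asynchronous.

```python
from typing import Callable

# Hypothetical signature: a check returns True when the text passes that layer.
Check = Callable[[str], bool]

def review_output(text: str, guardrail: Check, evaluator: Check, needs_human: Check) -> str:
    """Route one AI output through the three layers and return its disposition."""
    try:
        if not guardrail(text):      # layer 1: block known-bad patterns
            return "blocked"
        if not evaluator(text):      # layer 2: independent evaluation of what passed
            return "flagged"
        if needs_human(text):        # layer 3: route to a qualified reviewer
            return "human_review"
        return "released"
    except Exception:
        # Circuit breaker: an unavailable layer never results in "released".
        return "safe_state"

# Toy stand-ins for real controls (illustrative only).
def toy_guardrail(text: str) -> bool:
    return "ignore previous instructions" not in text.lower()

def toy_evaluator(text: str) -> bool:
    return "guaranteed" not in text.lower()

def toy_needs_human(text: str) -> bool:
    return "refund" in text.lower()

def broken_evaluator(text: str) -> bool:
    raise TimeoutError("evaluation service unavailable")
```

Note the design choice in the `except` branch: a failure in any layer produces an explicit safe-state result rather than letting the output through unchecked.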
What This Adds That Existing Standards Don't
You already have NIST AI RMF, ISO 42001, and the EU AI Act. This framework isn't competing with them — it fills three gaps they don't address:
| Gap | What's Missing in Standards | What This Framework Provides |
|---|---|---|
| Runtime behaviour | Standards focus on design-time risk assessment and pre-deployment testing. They don't specify how to monitor and control AI after it's deployed, in production, under real conditions. | A layered runtime control architecture with quantified per-layer effectiveness |
| Defined failure modes | Standards require "robustness" and "resilience" but don't define what happens when specific control layers fail. | PACE degradation: four predetermined states (Primary → Alternate → Contingency → Emergency) with pre-approved risk implications |
| Multi-agent security | Standards address single AI systems. They don't cover agent-to-agent communication, delegated authority, tool access chains, or emergent behaviour in multi-agent workflows. | MASO: six control domains for multi-agent orchestration with risk-tiered implementation |
This framework is the implementation layer that sits between what standards require and what engineering teams build. It turns "implement risk management measures" (EU AI Act Art. 9) into "here are the three control layers, here's how to measure each one, and here's what happens when they fail."
Why AI Risk Is Different
AI introduces a risk category your frameworks weren't designed for: non-deterministic system behaviour.
Your existing risk models assume systems behave predictably. AI doesn't. The same input can produce different outputs at different times. An AI system that passed every test last quarter might produce harmful outputs tomorrow, not because something broke, but because generative models are probabilistic by design.
Three questions your board is already asking:
| Board Question | What They Need | Where It Is |
|---|---|---|
| How much AI risk do we have? | Inventory and classification | Risk Tiers — six dimensions, four tiers |
| Are our controls actually working? | Measurement, not assertion | Risk Assessment — quantified per layer |
| What happens when controls fail? | Defined resilience posture | PACE Resilience — predetermined degradation |
The Cost of Doing Nothing
AI risk isn't theoretical. These are public, documented incidents:
- Air Canada (2024) — Chatbot fabricated a bereavement fare policy. Customer relied on it. Airline held liable by tribunal. No runtime monitoring detected the hallucination before the customer acted on it.
- Chevrolet dealership (2023) — AI chatbot agreed to sell a vehicle for $1. No guardrail prevented the commitment. No human oversight caught it.
- DPD (2024) — Customer service AI swore at customers and criticised the company. Went viral. No behavioural monitoring flagged the output before delivery.
- Samsung (2023) — Engineers pasted proprietary source code into an AI tool. Data exfiltrated to the model provider. No data loss prevention on the AI interface.
- Mata v. Avianca (2023) — Lawyer submitted AI-generated legal brief citing fabricated case law. Sanctioned by the court. No independent evaluation verified the AI's output.
Every one of these was preventable with controls this framework describes. Every one caused measurable financial, legal, or reputational damage.
Regulatory exposure is increasing:
| Regulation | AI Relevance | Penalty |
|---|---|---|
| EU AI Act | Risk management, human oversight, robustness requirements | Up to 7% of global annual turnover (the maximum, reserved for the most serious infringements) |
| DORA | Digital operational resilience for AI in financial services | Regulatory sanctions, licence conditions |
| NIST AI RMF | US federal AI risk management expectations | Procurement disqualification, reputational |
| ISO 42001 | AI management system certification | Market access, contractual requirements |
The question isn't whether to manage AI risk. It's whether you have a defensible methodology when the regulator asks how.
What This Framework Gives You
Quantified residual risk, in language the board already uses
Your risk committee can read this without translation. The Risk Assessment methodology produces these outputs directly:
Example: HIGH-tier system (customer-facing chatbot, 1M transactions/year):
| Threat Scenario | Inherent Risk (incidents per 1K transactions) | Residual Risk (incidents per 1K transactions) | Expected Annual Incidents |
|---|---|---|---|
| Prompt injection | 20 | 0.002 | ~2 |
| Hallucinated information | 50 | 0.005 | ~5 |
| PII leakage | 10 | 0.001 | ~1 |
| Unauthorised action | 5 | 0.0005 | ~0.5 |
This is inherent risk, control effectiveness, residual risk, annualised frequency. The same language you use for every other operational risk. Not "we have guardrails."
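The arithmetic behind a row of the table can be sketched as follows. Two things here are assumptions, not figures from the framework: the per-layer effectiveness rates (90%, 90%, 99%) are illustrative values chosen only because together they reproduce the table's 10,000-fold reduction, and the layers are treated as independent so that their miss rates multiply.

```python
import math

def residual_per_1k(inherent_per_1k: float, layer_effectiveness: list[float]) -> float:
    """Residual rate = inherent rate x product of each layer's miss rate (1 - effectiveness)."""
    return inherent_per_1k * math.prod(1.0 - e for e in layer_effectiveness)

def annual_incidents(rate_per_1k: float, transactions_per_year: int) -> float:
    """Annualised frequency for a rate expressed per 1,000 transactions."""
    return rate_per_1k * transactions_per_year / 1000

# Illustrative rates for guardrails, AI evaluation, human oversight (assumed, not measured).
ASSUMED_LAYERS = [0.90, 0.90, 0.99]

prompt_injection_residual = residual_per_1k(20, ASSUMED_LAYERS)               # about 0.002
prompt_injection_annual = annual_incidents(prompt_injection_residual, 1_000_000)  # about 2
```

If the layers are correlated, for example because the evaluator misses the same attacks the guardrails miss, the true residual will be higher than this product suggests.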
The methodology is aligned to NIST AI RMF (Govern, Map, Measure, Manage):
| NIST AI RMF Function | What the Framework Provides |
|---|---|
| GOVERN | Risk tolerance expressed as quantitative residual risk thresholds |
| MAP | Threat identification per scenario, per system, per tier |
| MEASURE | Per-layer effectiveness measurement with compounding calculations |
| MANAGE | Control selection proportionate to risk; defined fail postures per tier |
A classification scheme that maps to risk appetite
Four tiers. Six scoring dimensions. Each produces a measurable assessment, not a subjective rating:
| Dimension | What It Measures |
|---|---|
| Decision authority | Can the AI take action, or only advise? |
| Reversibility | Can outcomes be undone? At what cost? |
| Data sensitivity | PII, financial, health, legal, classified? |
| Audience | Internal employees or external customers? |
| Scale | Hundreds or millions of transactions? |
| Regulatory | Which regulatory obligations apply? |
The tier drives everything downstream: which controls apply, how much evaluation coverage, how often humans review, what resilience posture is required. Risk appetite is expressed through tier boundaries, not abstract statements.
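To make the idea of a scored classification concrete, here is a minimal sketch. Almost everything in it is assumed for illustration: the 1 to 4 scale, the rule that the highest-scoring dimension sets the tier, and all tier names other than HIGH. The actual scoring rules are in the Risk Tiers document.

```python
from dataclasses import dataclass, astuple

# Hypothetical tier names; the text names only the HIGH tier.
TIERS = ("LOW", "MEDIUM", "HIGH", "CRITICAL")

@dataclass(frozen=True)
class DimensionScores:
    """One score per dimension, from 1 (lowest exposure) to 4 (highest). Scale is assumed."""
    decision_authority: int
    reversibility: int
    data_sensitivity: int
    audience: int
    scale: int
    regulatory: int

    def __post_init__(self) -> None:
        for value in astuple(self):
            if value not in (1, 2, 3, 4):
                raise ValueError(f"score {value!r} is outside the assumed 1-4 scale")

def classify(scores: DimensionScores) -> str:
    """Assumed rule: the highest-scoring dimension sets the tier."""
    return TIERS[max(astuple(scores)) - 1]

# Illustrative scoring of a customer-facing chatbot.
chatbot_tier = classify(DimensionScores(
    decision_authority=2, reversibility=1, data_sensitivity=3,
    audience=3, scale=2, regulatory=2,
))
```

Whatever rule you adopt, encoding it this way keeps the classification repeatable: the same scores always yield the same tier.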
Defined resilience for operational risk
Every control layer has a predetermined degradation path:
| State | What It Means | Risk Implication |
|---|---|---|
| Primary | All control layers operational | Within approved risk appetite |
| Alternate | Backup controls active, primary being restored | Elevated monitoring, risk still within tolerance |
| Contingency | Supervised-only mode, human approval required | Reduced throughput, risk contained |
| Emergency | AI traffic stopped, non-AI fallback active | No AI risk exposure, operational impact |
This maps directly to DORA, BCM, and operational risk appetite statements. The AI system doesn't fail silently. It transitions through states you've already approved, each with defined risk implications.
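The four states can be modelled as an ordered enumeration with explicit transitions. The sketch below is illustrative: the function names and the step-by-step degradation rule are assumptions, while the per-state handling follows the table above.

```python
from enum import IntEnum

class PaceState(IntEnum):
    PRIMARY = 0      # all control layers operational
    ALTERNATE = 1    # backup controls active, primary being restored
    CONTINGENCY = 2  # supervised-only mode, human approval required
    EMERGENCY = 3    # AI traffic stopped, non-AI fallback active

def degrade(state: PaceState, critical: bool = False) -> PaceState:
    """Move one step toward EMERGENCY, or straight there on a critical failure."""
    if critical:
        return PaceState.EMERGENCY
    return PaceState(min(state + 1, PaceState.EMERGENCY))

def handling(state: PaceState) -> str:
    """How a single request is handled in each state."""
    if state is PaceState.EMERGENCY:
        return "non_ai_fallback"
    if state is PaceState.CONTINGENCY:
        return "ai_with_human_approval"
    return "ai_with_automated_controls"
```

Because every state and transition is enumerated in advance, there is no code path in which the system keeps serving AI output after its controls have failed.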
Regulatory mapping you can hand to auditors
Pre-built crosswalks to the standards your GRC team already tracks:
| Standard | Mapping |
|---|---|
| NIST AI RMF 1.0 | 51 subcategories mapped |
| ISO 42001 | Annex A alignment |
| EU AI Act | Art. 9, 14, 15 crosswalk |
| OWASP LLM Top 10 | Full control mapping |
| NIST CSF 2.0 | Function mapping |
Your Starting Path
| # | Document | Why You Need It | Time |
|---|---|---|---|
| 1 | Risk Tiers | The classification scheme — six dimensions, four tiers, governance approval gates | 15 min |
| 2 | Risk Assessment | Quantitative methodology — worked examples at every tier, NIST AI RMF aligned | 20 min |
| 3 | Controls | What each control layer does, so you can evaluate whether they're implemented correctly | 15 min |
| 4 | PACE Resilience | Operational resilience — defined fail postures and degradation paths | 15 min |
| 5 | AI Governance Operating Model | Organisational structure for AI risk governance | 20 min |
For regulated industries: Add High-Risk Financial Services and EU AI Act Crosswalk.
What You Can Do Monday Morning
- Inventory and classify every AI system using the Risk Tiers six-dimension scoring. Most organisations don't know how many AI systems they're running or at what tier.
- Define risk appetite per tier. Use the Risk Assessment template: "For HIGH-tier systems, we accept residual risk below X per 1,000 transactions." This converts abstract appetite into measurable thresholds.
- Require quantified control evidence. Stop accepting "we have guardrails" as a risk treatment. Require the worked example format: inherent risk → per-layer effectiveness → residual risk → compensated residual.
- Add resilience posture to your risk register. For each AI system, record which state it's in (Primary / Alternate / Contingency / Emergency) and what triggers transitions between states. This is your operational resilience evidence.
- Schedule quarterly recalibration. Control effectiveness rates change. Require red team results, evaluation accuracy measurements, and human reviewer agreement studies to update residual risk calculations. The recalibration schedule is in the Risk Assessment document.
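A risk register entry that supports these steps could look like the sketch below. The field names are hypothetical, and the appetite threshold of 0.004 is a placeholder chosen only to make the example run; the value of X is yours to set. The residual rates are the ones from the HIGH-tier example earlier on this page.

```python
from dataclasses import dataclass

@dataclass
class RegisterEntry:
    system: str
    tier: str
    pace_state: str                    # Primary / Alternate / Contingency / Emergency
    residual_per_1k: dict[str, float]  # threat scenario -> residual incidents per 1K

def appetite_breaches(entry: RegisterEntry, appetite_per_1k: dict[str, float]) -> list[str]:
    """Return the threat scenarios whose residual risk is not below the tier's appetite."""
    limit = appetite_per_1k[entry.tier]
    return sorted(name for name, rate in entry.residual_per_1k.items() if rate >= limit)

chatbot = RegisterEntry(
    system="customer-facing chatbot",
    tier="HIGH",
    pace_state="Primary",
    residual_per_1k={
        "Prompt injection": 0.002,
        "Hallucinated information": 0.005,
        "PII leakage": 0.001,
        "Unauthorised action": 0.0005,
    },
)

# Hypothetical appetite for HIGH-tier systems, for illustration only.
breaches = appetite_breaches(chatbot, {"HIGH": 0.004})
```

Rerunning the check after each quarterly recalibration shows whether updated effectiveness measurements have pushed any scenario outside appetite.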
Anticipating Pushback
"We already have an AI risk framework." Does it quantify residual risk per control layer? Does it define what happens when controls fail? Does it map to NIST AI RMF at the subcategory level? This framework isn't competing with your existing GRC tooling — it provides the AI-specific risk methodology that plugs into it.
"The AI team says the risk is low." Risk classification is a governance function, not an engineering function. The six-dimension scoring is designed to be completed jointly by the AI team and the risk function. Engineers assess technical dimensions (reversibility, scale); risk assesses business dimensions (regulatory, data sensitivity, audience).
"We can't measure AI control effectiveness." You can. Guardrail effectiveness is measured through red team exercises. Evaluation layer accuracy is measured through labelled datasets. Human reviewer effectiveness is measured through agreement studies. The Risk Assessment methodology explains exactly how — and what to do with illustrative rates before you have measured ones.
"This is too complex for our current maturity." Start with the Cheat Sheet and apply tier classification only. Even classifying your AI systems into four tiers, without implementing any new controls, gives you a risk inventory you didn't have before. That alone is a board-reportable improvement.
AI Runtime Behaviour Security, 2026 (Jonathan Gill).