AI Security Operational Metrics

How do you know if the framework is working? This document defines the metrics to track.


Metrics Framework

The Three Questions

Every metric should help answer one of these questions:

  1. Is the AI system working correctly? (Quality metrics)
  2. Is the AI system secure? (Security metrics)
  3. Is the control framework effective? (Assurance metrics)

Quality Metrics

Q1. Output Accuracy

| Metric | Definition | Target by Tier | Alert Threshold |
| --- | --- | --- | --- |
| Hallucination rate | % of outputs containing unsupported claims (detected by Judge) | CRITICAL: <1%, HIGH: <3%, MEDIUM: <5% | >2x target |
| Grounding score | % of claims that can be traced to source data | CRITICAL: >95%, HIGH: >90% | <target - 5% |
| Factual error rate | % of outputs with verifiable factual errors | CRITICAL: <0.5%, HIGH: <1% | >2x target |
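
As an illustration of the ">2x target" alert rule, the sketch below encodes the hallucination-rate targets in a hypothetical `HALLUCINATION_TARGETS` mapping keyed by risk tier; it is not tied to any particular monitoring stack.

```python
# Hypothetical tier targets for hallucination rate, expressed as fractions of outputs.
HALLUCINATION_TARGETS = {"CRITICAL": 0.01, "HIGH": 0.03, "MEDIUM": 0.05}

def hallucination_alert(observed_rate: float, tier: str, multiplier: float = 2.0) -> bool:
    """Return True when the observed rate exceeds 2x the tier target."""
    return observed_rate > multiplier * HALLUCINATION_TARGETS[tier]

# Example: a HIGH-tier system observing 7% hallucinations (target 3%, alert above 6%).
assert hallucination_alert(0.07, "HIGH")
```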

Q2. Output Quality

| Metric | Definition | Target | Alert Threshold |
| --- | --- | --- | --- |
| User satisfaction | Rating from user feedback (1-5) | >4.0 | <3.5 |
| Task completion rate | % of interactions that achieve the user's goal | >85% | <75% |
| Escalation rate | % of interactions escalated to a human | Varies by tier | >2x baseline |
| Baseline drift | Deviation from known-good baseline outputs | <10% | >20% |

Q3. Availability

| Metric | Definition | Target | Alert Threshold |
| --- | --- | --- | --- |
| Uptime | % of time the system is available | >99.5% | <99% |
| Response latency (p50) | Median response time | <2s | >5s |
| Response latency (p99) | 99th percentile response time | <10s | >30s |
| Error rate | % of requests returning errors | <1% | >3% |
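
The p50 and p99 targets imply a percentile calculation over raw request latencies. A minimal nearest-rank sketch, assuming latencies are collected in milliseconds:

```python
import math

def percentile(latencies_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest observed value at or above the p-th percentile."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Example against the alert thresholds above (values in milliseconds).
samples = [120, 340, 560, 780, 950, 1200, 2100, 3400, 9800, 31000]
p50_alert = percentile(samples, 50) > 5_000    # False: median is 950 ms
p99_alert = percentile(samples, 99) > 30_000   # True: worst sample is 31 s
```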

Security Metrics

S1. Guardrail Effectiveness

| Metric | Definition | Target | Alert Threshold |
| --- | --- | --- | --- |
| Input guardrail block rate | % of inputs blocked | Baseline ± 2σ | >3σ from baseline |
| Output guardrail block rate | % of outputs blocked | Baseline ± 2σ | >3σ from baseline |
| False positive rate | % of blocks that were incorrect (sampled) | <5% | >10% |
| Bypass detection rate | % of known-bad inputs blocked (adversarial testing) | >95% | <90% |
| Time to guardrail update | Days between a new threat and the guardrail update | <7 days | >14 days |
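
The block-rate targets are defined relative to a baseline and its standard deviation. A minimal sketch of the 3σ check against a rolling history of daily block rates; the window length and the zero-variance handling are assumptions:

```python
import statistics

def block_rate_alert(history: list[float], todays_rate: float, sigmas: float = 3.0) -> bool:
    """Alert when today's guardrail block rate sits more than `sigmas` standard
    deviations from the baseline computed over the rolling history (>= 2 days)."""
    baseline = statistics.mean(history)
    sigma = statistics.stdev(history)
    if sigma == 0:
        return todays_rate != baseline   # a flat history alerts on any change
    return abs(todays_rate - baseline) > sigmas * sigma

# Example: block rate has hovered around 2%; a jump to 9% trips the alert.
assert block_rate_alert([0.021, 0.019, 0.020, 0.022, 0.018], 0.09)
```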

S2. Attack Detection

| Metric | Definition | Target | Alert Threshold |
| --- | --- | --- | --- |
| Prompt injection detection rate | % of injection attempts detected (any layer) | >90% | <80% |
| Mean time to detect (MTTD) | Time from attack to detection | <1 hour | >4 hours |
| Attack volume trend | Week-over-week change in attack attempts | Track only | >50% increase |

S3. Data Protection

| Metric | Definition | Target | Alert Threshold |
| --- | --- | --- | --- |
| PII leakage incidents | Count of PII in outputs not caught by guardrails | 0 | Any |
| Cross-user leakage incidents | Count of data appearing in the wrong user's context | 0 | Any |
| Context isolation verification | % of sessions verified for isolation | 100% (CRITICAL) | <100% |

Assurance Metrics

A1. Judge Effectiveness

| Metric | Definition | Target | Alert Threshold |
| --- | --- | --- | --- |
| Judge coverage | % of interactions evaluated by the Judge | Per-tier target | <target |
| Judge latency | Time from interaction to evaluation | <5 min (CRITICAL), <1 hr (HIGH) | >2x target |
| Judge accuracy | % agreement with human reviewers (sampled) | >90% | <85% |
| Finding rate | % of evaluated interactions with findings | Track trend | >2x baseline |
| Finding backlog | Count of findings awaiting human review | <24 hrs' worth | >48 hrs' worth |
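
"Hours' worth" of backlog is simply the open finding count divided by review throughput. A small sketch, assuming a hypothetical throughput figure derived from recent reviewer activity:

```python
def backlog_hours(open_findings: int, findings_reviewed_per_hour: float) -> float:
    """Express the Judge finding backlog as hours of work at the observed review throughput."""
    return open_findings / findings_reviewed_per_hour

# Example: 300 open findings at 10 reviews/hour is a 30-hour backlog,
# over the 24-hour target but under the 48-hour alert threshold.
over_target = backlog_hours(300, 10) > 24   # True
alert = backlog_hours(300, 10) > 48         # False
```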

A2. HITL Effectiveness

| Metric | Definition | Target | Alert Threshold |
| --- | --- | --- | --- |
| Override rate | % of AI recommendations overridden by HITL | Track trend | <5% (may indicate rubber-stamping) |
| Decision time | Average time for HITL to review and decide | Benchmark | <50% of benchmark |
| Canary detection rate | % of planted test cases caught by HITL | >95% (CRITICAL), >90% (HIGH) | <target |
| Inter-reviewer agreement | % agreement between different reviewers (sampled) | >85% | <75% |
| Reviewer fatigue index | Quality decline over a shift (measured by canary performance) | No decline | Significant decline |
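
A sketch of the canary and rubber-stamping checks, assuming hypothetical counters for planted canaries, overrides, and total reviews are already being collected:

```python
def canary_detection_rate(planted: int, caught: int) -> float:
    """Fraction of planted test cases that HITL reviewers caught."""
    return caught / planted

def rubber_stamping_suspected(total_reviews: int, overrides: int, floor: float = 0.05) -> bool:
    """Flag a review queue whose override rate falls below the 5% floor noted above."""
    return total_reviews > 0 and overrides / total_reviews < floor

# Example: a CRITICAL-tier queue must catch more than 95% of canaries.
coverage_ok = canary_detection_rate(40, 39) > 0.95   # 0.975 -> passes
suspect = rubber_stamping_suspected(500, 9)          # 1.8% override rate -> flagged
```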

A3. Control Coverage

| Metric | Definition | Target | Alert Threshold |
| --- | --- | --- | --- |
| Systems with guardrails | % of AI systems with guardrails deployed | 100% | <100% |
| Systems with Judge | % of HIGH+ systems with Judge coverage | 100% | <100% |
| Control testing coverage | % of controls tested in the last quarter | 100% | <80% |
| Audit findings open | Count of open audit findings | 0 critical/high | Any critical |

Operational Metrics

O1. Cost

Cost-Quality Frontier

| Metric | Definition | Target | Alert Threshold |
| --- | --- | --- | --- |
| Cost per interaction | Total AI cost / interactions | Budget | >120% of budget |
| Judge cost ratio | Judge cost / primary model cost | <30% | >50% |
| Cost variance | Actual vs. forecast | ±10% | >25% variance |
| Cost by risk tier | Breakdown of cost across tiers | Track | Unexpected shift |

O2. Capacity

| Metric | Definition | Target | Alert Threshold |
| --- | --- | --- | --- |
| Rate limit utilisation | % of rate limit consumed | <70% | >85% |
| Queue depth | Pending requests in queue | <100 | >1,000 |
| HITL queue depth | Interactions awaiting human review | <8 hrs | >24 hrs |
| Judge queue depth | Interactions awaiting evaluation | <1 hr (CRITICAL) | >2 hrs |

O3. Change Management

| Metric | Definition | Target | Alert Threshold |
| --- | --- | --- | --- |
| Prompt changes per week | Count of prompt modifications | Track | Unexpected spike |
| Guardrail changes per week | Count of guardrail updates | Track | Unexpected spike |
| Model version changes | Count of model version updates | Track | Unplanned change |
| Rollback rate | % of changes that required rollback | <10% | >20% |

Metric Dashboards

Executive Dashboard

For leadership review (monthly):

| Metric | This Period | Prior Period | Trend | Status |
| --- | --- | --- | --- | --- |
| AI system availability | | | | |
| Security incidents | | | | |
| Hallucination rate (CRITICAL systems) | | | | |
| User satisfaction | | | | |
| Cost vs. budget | | | | |
| Open audit findings | | | | |

Operations Dashboard

For daily monitoring:

| Category | Key Metrics | Update Frequency |
| --- | --- | --- |
| Health | Uptime, latency, error rate | Real-time |
| Security | Guardrail blocks, attack volume | Real-time |
| Quality | Hallucination rate, user satisfaction | Daily |
| Assurance | Judge coverage, finding backlog | Daily |
| Cost | Daily spend, variance | Daily |

Security Dashboard

For security team:

| Category | Key Metrics | Update Frequency |
| --- | --- | --- |
| Threats | Attack volume, injection attempts | Real-time |
| Controls | Guardrail effectiveness, bypass attempts | Hourly |
| Incidents | Open incidents, MTTD, MTTR | Real-time |
| Compliance | Control coverage, audit status | Weekly |

Alerting Thresholds

Critical Alerts (Immediate Response)

| Condition | Action |
| --- | --- |
| PII leakage detected | Page on-call, invoke incident response |
| Uptime <95% for 15 minutes | Page on-call |
| Security incident detected | Page on-call, invoke incident response |
| CRITICAL system Judge coverage <100% | Page on-call |

High Alerts (Response within 4 hours)

| Condition | Action |
| --- | --- |
| Hallucination rate >2x target | Notify AI operations |
| Guardrail bypass detected | Notify security team |
| Cost >150% of daily budget | Notify AI operations |
| HITL queue >24 hours | Notify HITL team lead |

Medium Alerts (Response within 24 hours)

| Condition | Action |
| --- | --- |
| Baseline drift >20% | Create ticket for investigation |
| Judge accuracy <85% | Create ticket for recalibration |
| User satisfaction <3.5 | Create ticket for review |
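
One way to operationalise these tables is a small rule registry that maps each condition to a severity and an action. The sketch below is illustrative only; the metric keys in the snapshot dictionary are assumptions, not a defined schema.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AlertRule:
    name: str
    severity: str                        # "critical", "high", or "medium"
    condition: Callable[[dict], bool]    # evaluated against a snapshot of current metrics
    action: str

# Hypothetical rules mirroring the tables above; metric keys are illustrative.
RULES = [
    AlertRule("pii_leakage", "critical",
              lambda m: m["pii_leakage_incidents"] > 0,
              "Page on-call, invoke incident response"),
    AlertRule("hallucination_2x_target", "high",
              lambda m: m["hallucination_rate"] > 2 * m["hallucination_target"],
              "Notify AI operations"),
    AlertRule("baseline_drift", "medium",
              lambda m: m["baseline_drift"] > 0.20,
              "Create ticket for investigation"),
]

def evaluate(snapshot: dict) -> list[AlertRule]:
    """Return every rule whose condition holds for the current metric snapshot."""
    return [rule for rule in RULES if rule.condition(snapshot)]
```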

Metric Collection

Data Sources

| Metric Category | Primary Source | Backup Source |
| --- | --- | --- |
| Quality | Judge evaluations, user feedback | HITL reviews |
| Security | Guardrail logs, SIEM | Incident database |
| Assurance | Judge system, HITL system | Audit logs |
| Operational | Platform metrics, cost APIs | Manual sampling |

Sampling Requirements

| Risk Tier | Minimum Sample for Statistical Validity |
| --- | --- |
| CRITICAL | 100% (no sampling) |
| HIGH | 20% or 1,000/day, whichever is greater |
| MEDIUM | 5% or 500/day, whichever is greater |
| LOW | 1% or 100/day, whichever is greater |
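
A sketch of the "percentage or absolute floor, whichever is greater" rule, with the tier floors above encoded in a hypothetical `SAMPLING_FLOORS` mapping:

```python
import math

# Hypothetical per-tier floors: (fraction of traffic, absolute minimum per day).
SAMPLING_FLOORS = {
    "CRITICAL": (1.00, 0),
    "HIGH": (0.20, 1000),
    "MEDIUM": (0.05, 500),
    "LOW": (0.01, 100),
}

def daily_sample_size(tier: str, interactions_per_day: int) -> int:
    """Take the greater of the tier percentage and the absolute floor, capped at total volume."""
    fraction, floor = SAMPLING_FLOORS[tier]
    return min(interactions_per_day, max(math.ceil(fraction * interactions_per_day), floor))

# Example: a HIGH-tier system with 3,000 interactions/day samples max(600, 1000) = 1,000.
assert daily_sample_size("HIGH", 3000) == 1000
```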

Data Retention

| Metric Type | Granularity | Retention |
| --- | --- | --- |
| Real-time alerts | Event-level | 90 days |
| Daily aggregates | Daily | 2 years |
| Weekly reports | Weekly | 5 years |
| Monthly reports | Monthly | 7 years |

Reporting Cadence

| Report | Audience | Frequency | Key Content |
| --- | --- | --- | --- |
| Operational summary | AI operations | Daily | Health, incidents, anomalies |
| Security summary | Security team | Weekly | Threats, control effectiveness |
| Executive summary | Leadership | Monthly | KPIs, trends, risks |
| Audit report | Compliance | Quarterly | Control coverage, findings |
| Annual review | Board | Annually | Strategy, maturity, roadmap |

Maturity Progression

Level 1: Basic

  • Uptime and error rate tracked
  • Manual incident counting
  • No systematic quality metrics

Level 2: Developing

  • Guardrail metrics tracked
  • Basic quality metrics (user feedback)
  • Manual reporting

Level 3: Defined

  • Full metric framework implemented
  • Automated dashboards
  • Regular reporting cadence

Level 4: Managed

  • Real-time alerting
  • Trend analysis and forecasting
  • Metrics-driven decision making

Level 5: Optimising

  • Predictive analytics
  • Automated remediation
  • Continuous improvement feedback loops

AI Runtime Behaviour Security, 2026 (Jonathan Gill).