Human Factors

Skills, time, learning capacity, and the organisational readiness nobody wants to audit.

Part of From Strategy to Production

The Missing Variable

AI strategies account for technology, data, and budget. They rarely account for people.

Not "headcount" — that's a budget line. People: their skills, their capacity to learn new things, their available time, their willingness to change how they work, and their ability to operate AI systems safely day after day.

The framework's novel risk #12 — Human-AI Interaction Risk — identifies automation bias, deskilling, and accountability gaps as risks that emerge from how humans work alongside AI. But these aren't just runtime risks. They're strategic risks. If your AI strategy assumes human capabilities that don't exist, it will fail before any security control is tested.

The Five Human Constraints

1. Skills: What People Can Do Today

AI systems require skills across three domains — and most organisations have significant gaps in at least two.

Human Factors Skills Map

Skill Domain	Who Needs It	Current Gap
AI technical — Building, deploying, maintaining AI systems	Engineering team	Competitive market; experienced AI engineers are expensive and scarce
AI operational — Monitoring, tuning guardrails, operating Judge, managing HITL queues	Security/ops team	Almost nobody has this skill set yet; it didn't exist 3 years ago
AI-aware domain expertise — Using AI outputs critically, knowing when to trust and when to challenge	Business users	Most domain experts have never worked with non-deterministic tools

The Build Skills Gap

Can your team build the AI system the strategy requires?

What You Need	Typical Availability	Realistic Option
Prompt engineering	Increasingly common	Train existing developers — 2-4 weeks
RAG pipeline development	Common among senior engineers	Hire or upskill — 1-3 months
Fine-tuning / model training	Specialist skill	Hire ML engineers or use vendor services
Guardrail implementation	Rare (emerging)	Train security engineers using framework guides — 1-2 months
Judge evaluation design	Very rare	Train using Judge prompt examples — ongoing
Multi-agent orchestration	Very rare	Hire specialists or partner with vendor
AI security architecture	Very rare	This framework is a starting point; experience takes time

Strategic implication: The skills required to build a Fast Lane deployment (basic guardrails, logging, feature flag) can be acquired in weeks. The skills required for a Tier 3 autonomous agent system take months to years. Strategy must align ambition with available (or realistically acquirable) skills.

The Operate Skills Gap

Building the system is only half the problem. Who operates it?

The framework's governance operating model specifies these roles:

Role	What They Do	Where They Come From
HITL reviewers	Review AI outputs flagged by Judge	Domain experts redeployed from existing roles
Judge operators	Calibrate Judge prompts, manage sampling rates, review accuracy	Security or QA analysts — retrained
Guardrail maintainers	Update guardrail patterns, manage false positives	Security operations — retrained
AI incident responders	Investigate AI-specific incidents	Security incident team — with additional training
AI risk analysts	Classify risk, assess controls, report to governance	Risk team — with AI-specific training

These are not new hires (in most cases). They're existing people who need new skills. But the training takes time, and the people need to be freed from their current responsibilities to learn and then to operate.

Real-world scenario: A retail bank implements an AI customer service assistant classified as HIGH tier. The framework requires human review of all flagged outputs within a 4-hour SLA. The bank assigns this to the existing customer complaints team. The problems:

Complaints team has no training on AI-specific failure modes
They can't tell a hallucination apart from a response that is genuinely unusual but legitimate
They apply their existing judgement framework (customer intent, complaint handling) rather than AI-specific criteria (accuracy, policy compliance, data leakage)
Review times are 3x longer than estimated because they don't know what they're evaluating
SLA compliance drops below 60% within the first month

The system works. The people don't. Not because they're incapable — because they weren't prepared.
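The SLA collapse in this scenario is measurable well before the month-one review. A minimal Python sketch, assuming each flagged output is logged with the time it was flagged and the time a reviewer closed it (`None` while still open); the function name and log shape are illustrative, not part of the framework. Items still open but inside the 4-hour window are treated as undecided rather than as breaches.

```python
from datetime import datetime, timedelta
from typing import Optional

# The 4-hour review SLA from the scenario above.
REVIEW_SLA = timedelta(hours=4)


def sla_compliance(
    reviews: list[tuple[datetime, Optional[datetime]]], now: datetime
) -> Optional[float]:
    """Share of decided items reviewed within the SLA, or None if none are decided yet.

    Each item is (flagged_at, reviewed_at); reviewed_at is None while still open.
    """
    met = breached = 0
    for flagged_at, reviewed_at in reviews:
        if reviewed_at is None:
            # An open item only counts once its deadline has passed.
            if now - flagged_at > REVIEW_SLA:
                breached += 1
            continue
        if reviewed_at - flagged_at <= REVIEW_SLA:
            met += 1
        else:
            breached += 1
    decided = met + breached
    return met / decided if decided else None


t0 = datetime(2026, 1, 5, 9, 0)
log = [
    (t0, t0 + timedelta(hours=2)),     # reviewed in time
    (t0, t0 + timedelta(hours=6)),     # reviewed late
    (t0, None),                        # still open and already overdue
    (t0 + timedelta(hours=10), None),  # still open, inside the window
]
rate = sla_compliance(log, now=t0 + timedelta(hours=11))
# rate is 1/3: one item met the SLA, two breached it, one is still pending
```

Run weekly with an alert below an agreed threshold (the scenario's 60% is one candidate), a check like this would surface the problem while it is still a training issue rather than an SLA failure.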

2. Time: What People Can Learn

Every AI initiative requires people to learn new things. The question is whether there's time to learn them before the system goes live.

Learning Time Estimates

These are realistic, not optimistic. They assume motivated professionals with relevant background.

Skill	Target Audience	Time to Basic Competence	Time to Operational Competence
Understanding AI limitations (non-determinism, hallucination)	All AI users	2-4 hours (awareness)	2-4 weeks (working knowledge)
Using AI tools critically (not trusting blindly)	Business users	1 day	1-2 months (habit formation)
HITL review for AI outputs	Domain experts	2-3 days (training)	1 month (calibrated judgement)
Guardrail configuration and tuning	Security engineers	1-2 weeks	2-3 months
Judge prompt design and calibration	QA/security analysts	2-4 weeks	3-6 months
AI risk classification	Risk analysts	1-2 weeks	3 months
AI incident investigation	Security incident team	2-3 weeks	6 months
Multi-agent security operations	Security architects	1-2 months	6-12 months

The gap that matters: The time between "basic competence" and "operational competence" is where mistakes happen. People know enough to do the job but not enough to do it well. For LOW and MEDIUM tier systems, this is acceptable — errors are low-impact and recoverable. For HIGH and CRITICAL tier systems, this gap is dangerous.
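The gap can be checked mechanically at planning time. A minimal Python sketch, assuming training starts a known number of weeks before launch and using the upper end of each "operational competence" estimate from the table; the role keys, the months-to-weeks rounding, and the function names are assumptions for illustration.

```python
import math
from dataclasses import dataclass

# Upper end of "Time to Operational Competence" from the table above, in months.
# Role keys are hypothetical labels chosen for this sketch.
MONTHS_TO_OPERATIONAL = {
    "hitl_review": 1,
    "guardrail_tuning": 3,
    "ai_risk_classification": 3,
    "judge_calibration": 6,
    "ai_incident_investigation": 6,
}

# Tiers where going live inside the competence gap is treated as blocking.
BLOCKING_TIERS = {"HIGH", "CRITICAL"}


@dataclass
class RolePlan:
    role: str
    training_weeks_before_launch: int


def weeks_to_operational(role: str) -> int:
    # 52/12 weeks per month, rounded up so the estimate stays on the realistic side.
    return math.ceil(MONTHS_TO_OPERATIONAL[role] * 52 / 12)


def competence_gaps(tier: str, plans: list[RolePlan]) -> tuple[dict[str, int], bool]:
    """Return {role: weeks still short of operational competence at go-live}
    and whether that gap should block launch for this tier."""
    gaps = {}
    for plan in plans:
        shortfall = weeks_to_operational(plan.role) - plan.training_weeks_before_launch
        if shortfall > 0:
            gaps[plan.role] = shortfall
    return gaps, tier in BLOCKING_TIERS and bool(gaps)


gaps, blocking = competence_gaps(
    "HIGH",
    [RolePlan("hitl_review", 3), RolePlan("guardrail_tuning", 13)],
)
# gaps == {"hitl_review": 2}, blocking is True
```

The same plans against a MEDIUM system would still report the gap but not block, which matches the paragraph's view that the gap is tolerable at lower tiers.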

The Learning Capacity Problem

Organisations have a finite capacity to absorb change. AI is not the only thing people are being asked to learn.

Competing Demands	Reality
Cloud migration	Still ongoing in many organisations
Regulatory changes	Continuous compliance burden
Security awareness	Annual training, phishing exercises
New tooling	Every year brings new platforms and processes
Business-as-usual	The work that was there before AI arrived

Mandating "learn AI" on top of an already-full training calendar doesn't work. Something else needs to give. Strategy should identify what gets deprioritised to make room for AI capability building.

3. Capacity: What People Can Absorb

Even with time and training, there's a limit to how much change people can absorb at once.

The absorption curve:

Phase	What Happens	Duration
Awareness	"I know AI is coming"	Days
Understanding	"I understand what it means for my role"	Weeks
Competence	"I can do the new things"	Months
Confidence	"I trust my judgement with AI systems"	Months-years
Mastery	"I know when to trust AI and when to override"	Years

Most AI strategies plan for the Competence phase. The framework's controls — particularly HITL review — require the Confidence phase to work properly. A human reviewer who has reached Competence can follow the review process. A human reviewer who has reached Confidence knows when the process isn't capturing the right thing.

The automation bias problem (framework risk #12) is a confidence-phase problem. At the Competence phase, reviewers follow the process. At the Confidence phase, they develop genuine independent judgement. In between, there's a dangerous period where they're fast enough to process high volumes but not experienced enough to catch subtle AI failures.

4. Willingness: What People Will Actually Do

Training assumes willingness. Willingness isn't guaranteed.

Resistance Factor	What It Looks Like	Impact
Fear of replacement	"This AI is going to take my job"	People undermine adoption; withhold domain knowledge; don't engage with training
Expertise threat	"I have 20 years of experience and now a chatbot is doing my job"	Senior experts disengage; HITL quality drops because experts don't take review seriously
Workflow disruption	"This is slower than what I was doing before"	Workarounds; shadow processes; people bypass the AI system
Trust deficit	"I don't trust this thing"	Over-checking (inefficient) or ignoring AI outputs entirely (defeats the purpose)
Change fatigue	"Not another transformation programme"	Compliance without engagement; minimum effort

Strategic implication: Willingness is not a training problem. It's a communication and leadership problem. People need to understand:

What the AI does and doesn't replace
How their role changes (specifically, not vaguely)
What new skills they need and how they'll be supported in acquiring them
That their domain expertise is more valuable with AI, not less — because the AI needs humans who can judge its outputs

The framework's principle "Humans Remain Accountable" is the right message, but it needs to reach the people doing the work, not just the governance committee.

5. Sustainability: What People Can Maintain

Day 1 is not the problem. Month 6 is the problem.

Sustainability Risk	What Happens	When
Alert fatigue	HITL reviewers stop reading flagged outputs carefully	2-3 months
Guardrail drift	Nobody updates guardrail patterns as threats evolve	3-6 months
Judge calibration decay	Judge prompts aren't recalibrated; accuracy drops silently	3-6 months
Knowledge attrition	Key people leave; replacements aren't trained	6-12 months
Process erosion	Shortcuts become normal; reviews happen in name only	6-12 months

The framework's PACE resilience model addresses technical degradation. But human degradation follows the same pattern:

PACE Phase	Technical Equivalent	Human Equivalent
Primary	All controls active	All roles staffed, trained, engaged
Alternate	One layer degraded	Key person leaves; coverage maintained by remaining team
Contingency	Multiple layers degraded	Team understaffed; reviews backlogged; guardrails stale
Emergency	Full stop	Nobody qualified to operate the system; knowledge lost

The framework doesn't have a human PACE model. This is a gap. Technical systems have failover; human systems typically don't. When the one person who understands Judge calibration leaves, there's no automatic failover to a backup.
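One way to start closing this gap is to treat staffing the way PACE treats controls. The sketch below is hypothetical: it counts trained people per role against a required number and maps the result onto the four phases in the table, with "one role short" standing in for "one layer degraded" and "any role with nobody trained" standing in for Emergency. Role names and thresholds are assumptions, not part of the framework.

```python
from enum import Enum


class HumanPace(Enum):
    PRIMARY = "all roles staffed, trained, engaged"
    ALTERNATE = "a key person is missing; remaining team covers"
    CONTINGENCY = "several roles understaffed; backlogs and stale guardrails likely"
    EMERGENCY = "nobody qualified to operate part of the system"


def human_pace_phase(required: dict[str, int], qualified: dict[str, int]) -> HumanPace:
    """Map staffing levels to a human PACE phase.

    required and qualified map a role to a head count of trained people.
    A role missing from qualified counts as zero.
    """
    short = [role for role, need in required.items() if qualified.get(role, 0) < need]
    if any(qualified.get(role, 0) == 0 for role in required):
        return HumanPace.EMERGENCY
    if len(short) >= 2:
        return HumanPace.CONTINGENCY
    if len(short) == 1:
        return HumanPace.ALTERNATE
    return HumanPace.PRIMARY


required = {"hitl_reviewer": 3, "judge_operator": 2, "guardrail_maintainer": 2}
phase = human_pace_phase(
    required, {"hitl_reviewer": 3, "judge_operator": 1, "guardrail_maintainer": 2}
)
# phase is HumanPace.ALTERNATE: one role is short, but every role is still covered
```

Re-running a check like this on every roster change, rather than at an annual review, is what would turn a departure into a visible Alternate-phase event instead of a surprise.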

Human Factors by Risk Tier

The human requirements scale with risk tier, just as technical controls do:

Factor	Fast Lane / LOW	MEDIUM	HIGH	CRITICAL
Users	Basic AI awareness training	AI limitations training; know when to distrust	Detailed training on system-specific failure modes	Expert users only; mandatory certification
HITL reviewers	None required	General domain knowledge; spot-check capability	Domain experts with AI-specific training; calibrated judgement	Senior experts; independent judgement; regular accuracy testing
Operators	Any engineer	Engineer with guardrail experience	Dedicated AI security engineer	Specialist team with 24/7 coverage
Training frequency	Annual refresher	Quarterly	Monthly recalibration	Continuous (part of role)
Backup personnel	Not required	Identified but not dedicated	Trained and available	Active rotation; no single point of failure
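Because the table is effectively configuration, it can be checked rather than just read. A minimal sketch, assuming each role's last training date and backup status are recorded somewhere queryable; it encodes only the training-frequency and backup-personnel rows, treats "Fast Lane / LOW" as `LOW`, and the day limits and level names are assumptions.

```python
from dataclasses import dataclass
from datetime import date

# Two rows of the table above as data. Day limits approximate "Annual",
# "Quarterly" and "Monthly"; modelling CRITICAL's "Continuous" as 7 days is an assumption.
TIER_RULES = {
    "LOW": {"max_training_age_days": 365, "backup": "none"},
    "MEDIUM": {"max_training_age_days": 92, "backup": "identified"},
    "HIGH": {"max_training_age_days": 31, "backup": "trained"},
    "CRITICAL": {"max_training_age_days": 7, "backup": "rotation"},
}
# Backup coverage levels, weakest first.
BACKUP_LEVELS = ["none", "identified", "trained", "rotation"]


@dataclass
class RoleStatus:
    role: str
    last_training: date
    backup: str  # one of BACKUP_LEVELS


def tier_findings(tier: str, roles: list[RoleStatus], today: date) -> list[str]:
    rules = TIER_RULES[tier]
    findings = []
    for r in roles:
        age = (today - r.last_training).days
        if age > rules["max_training_age_days"]:
            findings.append(
                f"{r.role}: last trained {age} days ago "
                f"(limit {rules['max_training_age_days']})"
            )
        # Compare positions in BACKUP_LEVELS: lower index means weaker coverage.
        if BACKUP_LEVELS.index(r.backup) < BACKUP_LEVELS.index(rules["backup"]):
            findings.append(
                f"{r.role}: backup '{r.backup}' is below required '{rules['backup']}'"
            )
    return findings


findings = tier_findings(
    "HIGH",
    [
        RoleStatus("judge_operator", date(2026, 3, 1), "identified"),
        RoleStatus("hitl_reviewer", date(2026, 5, 20), "trained"),
    ],
    today=date(2026, 6, 1),
)
# Two findings, both for judge_operator: stale training and insufficient backup.
```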

Real-World Scenarios

Scenario 1: The Under-Skilled HITL

Context: Healthcare organisation deploys AI to summarise patient notes for clinicians. Classified as HIGH tier.

Assumption: Clinicians will review AI summaries before making clinical decisions.

Reality: Clinicians are overworked. Average consultation time is 10 minutes. They don't have time to cross-reference AI summaries against full notes. They scan the summary, confirm it looks plausible, and move on. Effective review rate: near zero.

Framework impact: The HITL control exists on paper. In practice, the human layer isn't functioning. The system is operating as if it were at MEDIUM or LOW tier — basic guardrails only, with no effective human oversight.

Strategic response:
Redesign the AI output to highlight uncertainty ("confidence: low on medication history")
Reduce the review burden: instead of reviewing every summary, review only those the Judge flags
Measure actual review behaviour (time spent, override rate), not just claimed process compliance
Consider whether the risk tier is appropriate given realistic human capacity
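Measuring actual behaviour can be as simple as comparing how long each summary was open against how long it would take to read. A minimal sketch, assuming the review tool logs time-on-screen and whether the clinician changed the output; the reading-speed floor is an assumed figure, not a clinical standard, and the names are illustrative.

```python
from dataclasses import dataclass

# Hypothetical floor: about 240 words per minute, a fast skim. Sign-offs quicker
# than this cannot have involved reading the whole summary.
WORDS_PER_SECOND = 4.0


@dataclass
class ReviewEvent:
    seconds_open: float  # time the summary was on screen before sign-off
    summary_words: int
    overridden: bool     # reviewer edited or rejected the AI summary


def review_behaviour(events: list[ReviewEvent]) -> dict[str, float]:
    if not events:
        raise ValueError("no review events to measure")
    plausible = [
        e for e in events if e.seconds_open >= e.summary_words / WORDS_PER_SECOND
    ]
    return {
        "effective_review_rate": len(plausible) / len(events),
        "override_rate": sum(e.overridden for e in events) / len(events),
    }


stats = review_behaviour([
    ReviewEvent(60, 200, False),  # 50 s needed, 60 s spent: plausible review
    ReviewEvent(5, 200, False),   # signed off in 5 s: not a real review
    ReviewEvent(3, 120, False),
    ReviewEvent(40, 120, True),
])
# stats == {"effective_review_rate": 0.5, "override_rate": 0.25}
```

An effective review rate near zero, as in this scenario, would show the HITL control is not functioning even while process compliance reports look clean.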

Scenario 2: The Missing Operator

Context: Financial services firm deploys AI fraud detection with Judge evaluation of flagged transactions. Classified as CRITICAL tier.

Assumption: Security operations team will manage Judge calibration, guardrail updates, and escalation triage.

Reality: The security operations team has one person who understands the Judge system. That person also manages three other security tools. Judge calibration happens when they have time — roughly once a quarter instead of the weekly cadence specified. When they take holiday, nobody monitors Judge accuracy.

Framework impact: Judge accuracy degrades without calibration. The framework's invisible degradation risk materialises — not through technical failure, but through human capacity failure.

Strategic response:
Fund a dedicated AI security operations role (the governance model's "Technical Operations Team")
Cross-train at least one additional person on Judge operations
Automate what can be automated (calibration alerts, accuracy dashboards)
Accept that until staffing is adequate, the system should operate at a lower autonomy level
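Calibration alerts can be small. A minimal sketch, assuming the date of the last Judge calibration and the number of trained operators currently on duty are known; the function name and messages are illustrative, and the weekly cadence is the one specified in this scenario.

```python
from datetime import date, timedelta

# The weekly calibration cadence specified in the scenario.
CALIBRATION_CADENCE = timedelta(days=7)


def calibration_alerts(
    last_calibrated: date, today: date, qualified_on_duty: int
) -> list[str]:
    alerts = []
    overdue = today - last_calibrated - CALIBRATION_CADENCE
    if overdue > timedelta(0):
        alerts.append(f"Judge calibration overdue by {overdue.days} days")
    if qualified_on_duty == 0:
        alerts.append("No qualified Judge operator on duty: accuracy is unmonitored")
    if alerts:
        # Mirrors the last strategic response: reduce autonomy until staffing recovers.
        alerts.append("Consider operating at a lower autonomy level until resolved")
    return alerts


# Quarterly instead of weekly, with the only operator on holiday:
alerts = calibration_alerts(date(2026, 1, 1), date(2026, 4, 1), qualified_on_duty=0)
```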

Scenario 3: The Resistant Expert

Context: Insurance company deploys AI to assist claims assessors. Classified as MEDIUM tier.

Assumption: Claims assessors will use AI recommendations as input to their decisions.

Reality: Senior assessors with 15+ years' experience refuse to use the system. "I don't need a computer to tell me how to assess a claim." Junior assessors use it for everything without critical evaluation. The senior experts' knowledge isn't being captured; the junior assessors aren't developing independent judgement.

Framework impact: The system works technically but creates two failure modes:
1. Senior assessors bypass the AI, gaining no benefit
2. Junior assessors trust it uncritically, which is automation bias (framework risk #12)

Strategic response:
Engage senior assessors in Judge calibration: their expertise is exactly what's needed to evaluate AI quality
Position the AI as a second opinion, not a replacement for expertise
Monitor override rates: very high (seniors ignoring AI) and very low (juniors trusting blindly) are both warning signals
Structure HITL so senior and junior assessors review each other's AI-assisted decisions
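Tracking override rates per cohort makes both failure modes visible in one report. A minimal sketch, assuming overrides and AI-assisted decisions are counted per cohort (here, by seniority); the 2% and 50% band limits are placeholders, not framework values, and would need setting from a baseline for each use case.

```python
# Hypothetical band limits; set these per use case from an observed baseline.
LOW_OVERRIDE = 0.02
HIGH_OVERRIDE = 0.50


def override_flags(counts: dict[str, tuple[int, int]]) -> dict[str, str]:
    """counts maps a cohort to (overrides, AI-assisted decisions)."""
    flags = {}
    for cohort, (overrides, decisions) in counts.items():
        if decisions == 0:
            continue  # no AI-assisted decisions yet, nothing to judge
        rate = overrides / decisions
        if rate < LOW_OVERRIDE:
            flags[cohort] = f"{rate:.1%} overrides: possible automation bias"
        elif rate > HIGH_OVERRIDE:
            flags[cohort] = f"{rate:.1%} overrides: AI is being bypassed"
    return flags


flags = override_flags({"senior": (70, 100), "junior": (1, 200), "mid": (10, 100)})
# "senior" and "junior" are flagged for opposite reasons; "mid" sits inside the band.
```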

The Deskilling Problem

The framework identifies deskilling as a novel risk. It's also a strategic risk.

When AI handles tasks that humans used to do, humans lose the ability to do those tasks. This matters because:

PACE resilience requires human fallback. If the AI fails and humans can't do the task manually, there's no contingency.
HITL quality depends on domain expertise. If reviewers have lost domain knowledge, they can't effectively evaluate AI outputs.
Model drift is detected by humans. If nobody remembers what "normal" looks like, nobody notices when the AI drifts.

Deskilling Timeline	What's Lost	Impact
3-6 months	Speed at manual process	Inconvenient but manageable if AI fails
6-12 months	Routine judgement calls	Errors increase when reverting to manual
1-2 years	Nuanced expertise	Manual fallback quality degrades significantly
2+ years	Institutional knowledge	Organisation cannot operate without AI — single point of failure

Strategic mitigation:
Maintain manual process capability through periodic exercises (like disaster recovery testing)
Rotate staff between AI-assisted and manual work
Document manual processes before they're automated, not after
Build deskilling risk into the PACE plan: if humans can't fall back, the Emergency phase is incomplete
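Treating the deskilling timeline as a lookup lets a team report exposure per process. A minimal sketch, assuming "months without practice" is counted from the last manual process exercise for that process; that exercises reset the clock is an assumption about how the mitigation works, not something the timeline states.

```python
from typing import Optional

# The deskilling timeline above: (months without manual practice, what's lost, impact).
DESKILLING_STAGES = [
    (3, "Speed at manual process", "Inconvenient but manageable if AI fails"),
    (6, "Routine judgement calls", "Errors increase when reverting to manual"),
    (12, "Nuanced expertise", "Manual fallback quality degrades significantly"),
    (24, "Institutional knowledge", "Organisation cannot operate without AI"),
]


def deskilling_exposure(months_without_practice: float) -> Optional[tuple[str, str]]:
    """Return (what's lost, impact) for the furthest stage reached, or None below 3 months."""
    reached = None
    for threshold, lost, impact in DESKILLING_STAGES:
        if months_without_practice >= threshold:
            reached = (lost, impact)
    return reached


exposure = deskilling_exposure(8)
# ("Routine judgement calls", "Errors increase when reverting to manual")
```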

What to Do About This

Before Starting an AI Initiative

Action	Purpose	Time
Skills audit	Identify gaps in build, operate, and use capabilities	1-2 weeks
Training plan	What skills, who needs them, how they'll be acquired	1 week to plan
Capacity assessment	Do people have time to learn and operate this?	1 week
Resistance assessment	Where will resistance come from? How will it be addressed?	1 week
Sustainability plan	Who operates this on day 180? What happens when they leave?	1 week

During Implementation

Action	Purpose	When
Train before deploy	People are ready when the system goes live	2-4 weeks before launch
Shadow period	AI runs but humans make all decisions; builds competence	2-4 weeks after launch
Graduated autonomy	AI takes on more responsibility as human confidence grows	Months 2-6
Measure human performance	Are reviews effective? Are overrides appropriate?	Continuously
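Graduated autonomy is easier to hold to when it is an explicit gate rather than a judgement made under delivery pressure. A minimal sketch, assuming shadow-period decisions are logged as (AI recommendation, human decision) pairs and that the team has some measure of review quality; the level names, sample size, and thresholds are placeholders. Requiring both conditions reflects the table's point that autonomy follows human confidence, not model accuracy alone.

```python
# Hypothetical autonomy ladder, lowest first; names are illustrative.
AUTONOMY_LEVELS = ["shadow", "ai_suggests", "ai_acts_with_review", "ai_acts_with_sampling"]


def next_autonomy_level(
    current: str,
    shadow_pairs: list[tuple[str, str]],
    effective_review_rate: float,
    min_samples: int = 200,
    min_agreement: float = 0.95,
    min_review_rate: float = 0.9,
) -> str:
    """Advance at most one level, and only if both the AI and the humans look ready.

    shadow_pairs holds (ai_recommendation, human_decision) from the shadow period.
    effective_review_rate is the share of reviews judged genuine, however measured.
    """
    if len(shadow_pairs) < min_samples:
        return current  # not enough evidence yet
    agreement = sum(ai == human for ai, human in shadow_pairs) / len(shadow_pairs)
    humans_ready = effective_review_rate >= min_review_rate
    idx = AUTONOMY_LEVELS.index(current)
    if agreement >= min_agreement and humans_ready and idx + 1 < len(AUTONOMY_LEVELS):
        return AUTONOMY_LEVELS[idx + 1]
    return current


pairs = [("approve", "approve")] * 195 + [("approve", "reject")] * 5
level = next_autonomy_level("shadow", pairs, effective_review_rate=0.95)
# level == "ai_suggests": 97.5% agreement over 200 cases and reviewers engaged
```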

After Deployment

Action	Purpose	When
Recalibration training	Refresh skills, share lessons learned	Quarterly
Override rate monitoring	Detect automation bias or excessive distrust	Monthly
Backup personnel check	Is there someone who can cover every role?	Quarterly
Manual process exercise	Verify the fallback still works	Twice a year
Exit interview knowledge capture	When operators leave, capture what they know	Every departure

The Framework Gap

The framework treats human factors as an implementation detail. It specifies that HITL reviewers should exist, that they should have domain expertise, and that they should review within SLAs. But it doesn't address:

How those humans are trained
How long training takes
What happens when they're unavailable
How to detect when they're not performing effectively
How to prevent deskilling over time
How to sustain human capability as the AI portfolio grows

This is partly by design — the framework is a security controls framework, not an organisational change management programme. But strategy cannot treat human factors as somebody else's problem. The controls don't work if the humans operating them aren't ready, willing, and sustainably capable.

Recommendation: For any deployment above Fast Lane, include a human factors assessment alongside the technical risk assessment. The framework's risk classification asks "what can this system do?" The human factors assessment asks "can our people safely operate this system?" Both questions need answers before deployment.
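The recommendation can be made concrete as a deployment gate that needs both answers. A minimal sketch; the `FAST_LANE` tier label, the field names (taken from the five pre-initiative actions listed earlier), and recording each action as a pass/fail flag are all simplifying assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class HumanFactorsAssessment:
    # One flag per row of the "Before Starting an AI Initiative" table.
    skills_audit: bool
    training_plan: bool
    capacity_assessment: bool
    resistance_assessment: bool
    sustainability_plan: bool


def deployment_gate(
    tier: str, technical_assessment_done: bool, hf: HumanFactorsAssessment
) -> tuple[bool, list[str]]:
    """Return (may deploy, missing human factors items)."""
    if tier == "FAST_LANE":
        # The recommendation applies above Fast Lane only.
        return technical_assessment_done, []
    missing = [f.name for f in fields(hf) if not getattr(hf, f.name)]
    return technical_assessment_done and not missing, missing


ok, missing = deployment_gate(
    "HIGH", True, HumanFactorsAssessment(True, True, False, True, True)
)
# ok is False, missing == ["capacity_assessment"]
```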

AI Runtime Behaviour Security, 2026 (Jonathan Gill).