Enterprise Architects¶
Solution Architects, Platform Architects, Technical Leads — where controls go in your pipeline, what they cost, and how they fail.
The Problem You Have¶
You're designing AI systems or integrating AI into existing architectures. Your security and risk teams are asking for "guardrails" and "oversight," but nobody's told you:
- Where in the pipeline do controls go?
- What's the latency and cost impact of adding an evaluation layer?
- How do controls degrade when upstream services fail?
- What's different about securing a RAG pipeline vs. a fine-tuned model vs. a multi-agent system?
- What can your existing infrastructure (API gateway, IAM, DLP) already handle?
You don't need governance theory. You need an architecture reference.
What This Framework Gives You¶
Control placement in the request/response flow¶
Every AI request passes through a pipeline, and controls intercept at specific points along it.
Key architectural decisions:

- The Judge runs asynchronously for most tiers (it doesn't block the response) and synchronously for the CRITICAL tier (it blocks until evaluation completes).
- Guardrails add ~10ms per layer; the Judge adds ~500ms–5s depending on model and prompt complexity.
- The Judge should use a different model from the task agent — same-model evaluation has correlated failure modes.
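The placement decisions above can be sketched as a request handler. This is a minimal illustration, not a real API: `run_guardrails`, `run_judge`, and `Tier` are all hypothetical names, and the checks are placeholders.

```python
import asyncio
from enum import Enum

class Tier(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

def run_guardrails(text: str) -> bool:
    # ~10ms pattern/policy checks; blocks the response on failure.
    return "ssn" not in text.lower()  # placeholder check

async def run_judge(prompt: str, response: str) -> bool:
    # Independent LLM evaluation (~500ms-5s). Should use a different
    # model from the task agent to avoid correlated failure modes.
    await asyncio.sleep(0)  # stands in for the evaluation call
    return True

async def handle(prompt: str, response: str, tier: Tier) -> str:
    if not run_guardrails(response):
        return "[blocked by guardrail]"
    if tier is Tier.CRITICAL:
        # Synchronous: hold the response until the Judge approves.
        if not await run_judge(prompt, response):
            return "[blocked by judge]"
    elif tier in (Tier.HIGH, Tier.MEDIUM):
        # Asynchronous: return immediately, evaluate in the background.
        asyncio.create_task(run_judge(prompt, response))
    return response
```

The key point is the branch on tier: only CRITICAL pays the Judge's latency in the request path; everything else gets guardrail latency only.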
What your existing infrastructure already covers¶
Before adding AI-specific controls, map what you already have:
| Existing Infrastructure | AI Control Coverage | Gap |
|---|---|---|
| API gateway (rate limiting, auth) | Request throttling, identity verification | No content-aware filtering |
| WAF | Some injection patterns | Doesn't detect semantic injection or indirect prompt injection |
| DLP | PII in structured data | Misses PII in natural language, generated content |
| IAM | User identity, RBAC | No agent identity, no credential scoping per AI session |
| Logging / SIEM | Request/response metadata | No semantic evaluation, no decision chain audit |
| Content delivery | Response caching, edge logic | No output quality evaluation |
The framework fills the gaps rather than replacing the stack. See Infrastructure Controls for the full 80-control mapping.
Architecture patterns by deployment type¶
| If You're Building... | Read | Key Architecture Decision |
|---|---|---|
| RAG pipeline | RAG Security | Retrieval layer is your biggest attack surface — poisoned documents become instructions |
| Single agent with tools | Agentic Controls | Tool access scoping, action classification (read/write/irreversible), confirmation gates |
| Multi-agent orchestration | MASO Integration Guide | Message bus signing, per-agent NHI, cross-agent DLP, delegation depth limits |
| Streaming responses | Streaming Controls | You can't evaluate output that hasn't finished — buffer or accept partial validation |
| Multimodal (image/audio/video) | Multimodal Controls | Text guardrails don't work on images — you need modality-specific evaluation |
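The streaming trade-off noted above (you can't evaluate output that hasn't finished) can be sketched as a chunked guardrail pass: buffer tokens, evaluate each chunk, and only release chunks that pass. The `chunk_guardrail` check and chunk size here are hypothetical placeholders.

```python
from typing import Iterable, Iterator

def buffered_stream(tokens: Iterable[str], chunk_size: int = 32) -> Iterator[str]:
    """Buffer streamed tokens and guardrail each chunk before release."""
    def chunk_guardrail(text: str) -> bool:
        return "secret" not in text.lower()  # placeholder check

    buf: list[str] = []
    for tok in tokens:
        buf.append(tok)
        if len(buf) >= chunk_size:
            chunk = "".join(buf)
            if not chunk_guardrail(chunk):
                # Partial output may already be with the client; all you
                # can do is stop the stream and log the event.
                yield "[stream halted by guardrail]"
                return
            yield chunk
            buf = []
    tail = "".join(buf)
    if tail:
        yield tail if chunk_guardrail(tail) else "[stream halted by guardrail]"
```

Larger chunks give the guardrail more context but add more latency before the first token reaches the client; that is the buffer-vs-partial-validation trade-off in miniature.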
Cost and latency budgets¶
The Judge layer isn't free. Budget for it:
| Configuration | Added Latency | Added Cost (per 1K txn) | When to Use |
|---|---|---|---|
| Guardrails only | ~10-20ms | ~$0.01-0.05 | LOW tier |
| Guardrails + Judge (sampled 10%) | ~10-20ms p50, ~2s p90 (sampled) | ~$0.50-2.00 | MEDIUM tier |
| Guardrails + Judge (full, async) | ~10-20ms (non-blocking) | ~$5-20 | HIGH tier |
| Guardrails + Judge (full, sync) | ~1-5s added | ~$5-20 | CRITICAL tier |
Full analysis: Cost & Latency
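A back-of-envelope calculation makes the table concrete. The per-1K cost is an input you take from the table's ranges; the figures below are illustrative, not vendor pricing.

```python
def judge_budget(txns_per_day: int, cost_per_1k: float,
                 sample_rate: float = 1.0) -> dict:
    """Rough monthly budget for the evaluation layer.

    cost_per_1k: added cost per 1K evaluated transactions (from the table).
    sample_rate: fraction of traffic the Judge evaluates (1.0 = full).
    """
    evaluated = txns_per_day * sample_rate
    monthly_cost = evaluated / 1000 * cost_per_1k * 30
    return {"evaluated_per_day": evaluated,
            "monthly_cost_usd": round(monthly_cost, 2)}
```

For example, 100K transactions/day at $10 per 1K with 10% sampling evaluates 10K transactions/day for roughly $3,000/month; moving to full evaluation multiplies that by ten.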
PACE fail postures — what you wire into your architecture¶
Each control layer needs a defined failure mode. These aren't operational procedures — they're architecture decisions you make at design time:
| Layer Failure | Architectural Response |
|---|---|
| Guardrail service down | Route through bypass with full logging → trigger Judge on 100% of traffic |
| Judge service down | Continue with guardrails only → flag all responses for human review queue |
| Judge + Guardrails down | Circuit breaker activates → serve static fallback / disable AI path |
| Human review queue overflows | Auto-hold new requests → expand queue capacity → degrade to narrower scope |
Wire these as health checks and circuit breakers in your service mesh or orchestration layer, not as runbooks. See PACE Resilience.
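One way to encode the fail postures above is as a routing decision fed by health checks, so the degraded behaviour is a design-time artefact rather than a runbook. The field names here are illustrative.

```python
def control_path(guardrail_up: bool, judge_up: bool) -> dict:
    """Map layer health to the architectural responses in the table above."""
    if not guardrail_up and not judge_up:
        # Circuit breaker: serve a static fallback, disable the AI path.
        return {"ai_enabled": False, "fallback": "static"}
    if not guardrail_up:
        # Guardrails down: bypass with full logging, Judge on 100% of traffic.
        return {"ai_enabled": True, "log_full": True, "judge_sample": 1.0}
    if not judge_up:
        # Judge down: guardrails only, flag all responses for human review.
        return {"ai_enabled": True, "human_review_all": True,
                "judge_sample": 0.0}
    # Normal operation: sampled Judge evaluation (rate depends on tier).
    return {"ai_enabled": True, "judge_sample": 0.1}
```

In practice the two booleans come from your service mesh health checks, and the returned posture is applied by the gateway or orchestrator in front of the model.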
Your Starting Path¶
| # | Document | Why You Need It |
|---|---|---|
| 1 | Controls | Three-layer pattern with implementation detail — the core architectural reference |
| 2 | Risk Tiers | Determines your control requirements — different tiers, different architectures |
| 3 | Infrastructure Controls | 80 controls across 11 domains — what to enforce at infrastructure level |
| 4 | Cost & Latency | Budget the evaluation layer — latency vs. coverage trade-offs |
| 5 | PACE Resilience | Fail postures as architecture decisions |
If you're building with a specific platform: AWS Bedrock · Azure AI · Databricks
If you're building multi-agent: MASO Integration Guide — LangGraph, AutoGen, CrewAI, AWS Bedrock patterns.
If you want to see one transaction end-to-end: Runtime Telemetry Reference — every JSON event, every threshold, every evidence artefact for a single request through the full control stack.
What You Can Do Monday Morning¶
1. Map your existing infrastructure against the Infrastructure Controls to identify what you already cover and where the AI-specific gaps are.
2. Add the Judge layer to your architecture. Pick one HIGH or CRITICAL tier system and add an independent LLM evaluation step — even async, even sampled. The LLM-as-Judge Implementation guide has the patterns.
3. Wire PACE health checks. Add circuit breakers for your guardrail and Judge services, define what the system does when each is unavailable, and test the failover path.
4. Scope your agent's permissions. If you have an agentic system, classify every tool by action type (read / write / irreversible) and add confirmation gates for write and irreversible actions. See Agentic Controls.
5. Budget the evaluation layer. Use the Cost & Latency analysis to present the cost of the Judge layer vs. the cost of the incidents it prevents. The Risk Assessment gives you the incident frequency numbers.
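The tool classification and confirmation gates described above can be sketched as a design-time registry. The tool names and registry structure here are hypothetical; the point is that every tool gets an explicit action class before the agent ever calls it.

```python
from enum import Enum

class Action(Enum):
    READ = "read"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"

# Hypothetical tool registry: classify every tool at design time.
TOOL_ACTIONS = {
    "search_docs": Action.READ,
    "update_ticket": Action.WRITE,
    "send_payment": Action.IRREVERSIBLE,
}

def requires_confirmation(tool: str) -> bool:
    """Write and irreversible actions pass through a confirmation gate."""
    return TOOL_ACTIONS[tool] in (Action.WRITE, Action.IRREVERSIBLE)
```

An unregistered tool raises a `KeyError` here, which is the desirable default: a tool with no classification should fail closed rather than execute.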
Common Objections — With Answers¶
"Adding a Judge layer doubles our latency." Only if you run it synchronously. For HIGH tier, run it async — the guardrails provide real-time protection while the Judge evaluates in the background. Only CRITICAL tier needs synchronous Judge evaluation. Budget ~10ms for guardrails, not seconds.
"We're using [vendor]'s built-in guardrails. That's enough." Vendor guardrails are your first layer. The framework's three-layer pattern adds an independent evaluation layer (different model, different detection approach) and human oversight. Single-layer controls have a ~10% miss rate. Three layers compound to ~0.01%. Why Guardrails Aren't Enough.
"Our RAG pipeline grounds the model — it won't hallucinate." RAG reduces but doesn't eliminate hallucination. More importantly, RAG creates a new attack surface: poisoned documents in your retrieval corpus become instructions to the model. RAG Is Your Biggest Attack Surface.
"The infrastructure team handles security, not us." Infrastructure handles network, identity, and data-at-rest. Nobody handles the AI-specific controls (semantic evaluation, injection detection, decision chain audit, agent credential scoping) unless you design them into the pipeline. That's your job.
AI Runtime Behaviour Security, 2026 (Jonathan Gill).