Decision Exercise: Security Architects

This exercise tests whether you can:

  • Interpret ambiguous signals from a multi-agent pipeline
  • Apply epistemic integrity reasoning under uncertainty
  • Make a defensible intervene/allow decision with incomplete information
  • Articulate the risk basis for your decision

The situation

You are the security architect for Aurelian Health, a digital health platform. Your team operates a three-agent clinical decision support pipeline:

  • Agent R (Research): Retrieves relevant clinical literature and guidelines
  • Agent A (Analysis): Synthesises the literature into a clinical summary
  • Agent D (Decision Support): Produces a recommendation for the clinician

The system is Tier 2 (Managed), with automated controls and human clinician oversight for all final decisions. The pipeline runs 800 queries per day.

At 14:22, your monitoring dashboard flags the following:


Signal 1: Retrieval anomaly

Agent R's retrieval for query #4,847 returned 12 results from the clinical literature database. The rolling 30-day average for this query type (drug interaction checks) is 23 results. The retrieval completeness guardrail has a threshold of 10 results minimum, so it passed.

What you know

The retrieval returned roughly half the usual results. The guardrail passed because the absolute threshold was met. You don't know why the result count is low; it could be a database issue, a query formulation issue, or simply a less common drug combination with fewer studies.
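The gap between the absolute guardrail and the rolling average is the crux of this signal. A minimal sketch of that distinction, using hypothetical function and field names (the actual guardrail implementation is not described in this exercise):

```python
from dataclasses import dataclass

@dataclass
class RetrievalCheck:
    """Outcome of a retrieval-completeness check (hypothetical structure)."""
    passed: bool     # did the absolute floor pass?
    anomalous: bool  # is the count well below the rolling average?

def check_retrieval(result_count: int,
                    rolling_avg: float,
                    absolute_min: int = 10,
                    anomaly_ratio: float = 0.6) -> RetrievalCheck:
    # Absolute floor: the only check the pipeline actually enforces here.
    passed = result_count >= absolute_min
    # Relative check: flag counts far below the 30-day average for this
    # query type, even when the absolute floor is met. The 0.6 ratio is
    # an illustrative choice, not a value from the scenario.
    anomalous = result_count < rolling_avg * anomaly_ratio
    return RetrievalCheck(passed=passed, anomalous=anomalous)

# The scenario's numbers: 12 results against a 30-day average of 23.
check = check_retrieval(result_count=12, rolling_avg=23.0)
print(check.passed, check.anomalous)  # True True: passes the guardrail, yet anomalous
```

This is why the dashboard can flag a query that the guardrail allowed: the two checks measure different things.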

Signal 2: Judge flag

The Model-as-Judge evaluated Agent A's clinical summary and returned a marginal confidence score of 0.62 (your threshold for automatic pass is 0.70; your threshold for automatic block is 0.40). This puts it in the "review" band.

The Judge's reasoning: "The clinical summary references 'limited evidence' and 'emerging data' but concludes with a specific dosing recommendation. The confidence of the conclusion appears to exceed the stated evidence basis."

What you know

The Judge noticed a mismatch between the evidence language ("limited", "emerging") and the conclusion confidence (specific dosing). This could be a genuine epistemic integrity issue, or it could be correct medical practice (in some cases, clinical guidelines make specific recommendations from limited evidence because the alternative is no guidance at all).
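The Judge's three-band policy (pass at or above 0.70, block below 0.40, review in between) can be sketched as a simple mapping; the function name and boundary handling are assumptions for illustration:

```python
def judge_band(score: float,
               pass_threshold: float = 0.70,
               block_threshold: float = 0.40) -> str:
    """Map a Model-as-Judge confidence score to an action band.

    Thresholds are the ones stated in the exercise; boundary handling
    (>= for pass) is an assumption.
    """
    if score >= pass_threshold:
        return "pass"    # released automatically
    if score < block_threshold:
        return "block"   # rejected automatically
    return "review"      # the marginal band: requires a decision

print(judge_band(0.62))  # "review" — the score from Signal 2
```

A score of 0.62 lands in the review band, which is exactly why this decision reaches you rather than being resolved automatically.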

Signal 3: Downstream output

Agent D produced a recommendation that includes:

"Based on current evidence, recommend starting at 5mg daily with monitoring. Note: evidence base for this interaction is limited to three retrospective cohort studies and one case series."

What you know

Agent D's output explicitly caveats the limited evidence. The recommendation itself appears conservative (low starting dose with monitoring). The clinician will see this recommendation alongside the caveats.


Your decision

You need to decide now: the clinician is waiting for the decision support output.

Option A: Allow

Let the recommendation through to the clinician with the existing caveats. The clinician is the final decision-maker and the caveats are visible.

Option B: Intervene (soft block)

Append an additional flag to the output: "RETRIEVAL ANOMALY: This query returned fewer literature results than expected. The clinician should verify the recommendation against primary sources before acting." Let it through with the extra warning.

Option C: Intervene (hard block)

Block the recommendation entirely. Return a message to the clinician: "Decision support unavailable for this query; automated review flagged a potential data completeness issue. Please consult primary sources directly." Trigger a retry of the full pipeline.

Option D: Intervene (escalate)

Route to your clinical safety team for manual review before releasing to the clinician. This adds 15–30 minutes of delay.


Think it through

Before reading the analysis below, make your choice and write down your reasoning. The exercise is more valuable if you commit to a decision before seeing the discussion.

Analysis (click to reveal after you've decided)

There is no single correct answer. This is an ambiguous case by design. Here's how to evaluate each option:

Option A (Allow) is defensible if you trust the Tier 2 design: the clinician is the final decision-maker, the caveats are explicit, and the recommendation is conservative. The risk is that the retrieval anomaly means the recommendation is missing relevant contraindication data, and the caveats, while honest about limited evidence, don't distinguish between "limited published evidence" and "limited because our retrieval was incomplete."

Option B (Soft block) adds transparency without blocking the clinician. This is the most operationally practical option and addresses the epistemic integrity concern directly: it tells the clinician that the system's confidence in its own data completeness is lower than usual. The risk is alert fatigue: if you flag every retrieval anomaly, clinicians will ignore the flags.

Option C (Hard block) is the most conservative. It's appropriate if you believe the retrieval anomaly indicates a systematic issue (database problem, not just a sparse literature area). But it costs the clinician time and may not produce a better result on retry if the issue is data sparsity, not a retrieval failure.

Option D (Escalate) introduces significant delay for a decision support tool. It's appropriate for safety-critical edge cases but may not be proportionate here, since the recommendation is conservative and caveated, and the clinician has access to primary sources.

The epistemic integrity lens: The core question is whether Agent R's retrieval was genuinely complete for the available literature (in which case the low count reflects sparse evidence) or incomplete due to a system issue (in which case relevant data may be missing). The Judge's flag suggests the latter: the mismatch between evidence language and conclusion confidence is a signal that the reasoning basis may be incomplete.

A strong answer would be Option B, with a follow-up action: investigate whether the retrieval anomaly is systematic (affecting other queries) or isolated to this query type. If systematic, escalate to a hard block and investigate the data pipeline.
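The follow-up check (systematic versus isolated) could look something like the sketch below. All names, data shapes, and thresholds are hypothetical; the idea is simply to compare anomaly rates across query types rather than reasoning from one flagged query:

```python
def anomaly_rate(counts: list[int], rolling_avg: float,
                 ratio: float = 0.6) -> float:
    """Fraction of recent retrievals that fell well below the rolling average."""
    flagged = [c for c in counts if c < rolling_avg * ratio]
    return len(flagged) / len(counts)

def is_systematic(recent_by_query_type: dict[str, list[int]],
                  averages: dict[str, float],
                  rate_threshold: float = 0.2) -> bool:
    """Treat the anomaly as systematic if more than one query type shows an
    elevated anomaly rate, suggesting a data-pipeline issue rather than
    sparse literature for a single drug combination."""
    elevated = sum(
        anomaly_rate(counts, averages[qt]) >= rate_threshold
        for qt, counts in recent_by_query_type.items()
    )
    return elevated > 1

# Illustrative data: only drug-interaction checks look anomalous here.
recent = {
    "drug_interaction": [12, 11, 22, 24, 13],
    "dosing": [30, 28, 31, 29, 27],
}
avgs = {"drug_interaction": 23.0, "dosing": 29.0}
print(is_systematic(recent, avgs))  # False — looks isolated, not systematic
```

If multiple query types showed elevated rates, the escalation path in the strong answer (hard block plus a data-pipeline investigation) would apply.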

What this exercises: The ability to distinguish between "the system's output looks reasonable" and "the system's reasoning basis is verifiably sound," which is the core skill this entire track has been building.


After the exercise

You've completed the Security Architects track. Take a moment to consolidate:

  1. The mental model: Multi-agent security is about chains, not agents. Controls must operate on the vertical dimension (reasoning integrity) as well as the horizontal (content security).
  2. The core concept: Epistemic integrity, verifying that each agent's claims are warranted by the data it actually accessed.
  3. The control framework: MASO's eight domains, applied by risk tier, mapped to the three-layer architecture.
  4. The evidence standard: Coverage evidence, not just deployment evidence.

Go to the Convergence Exercise →

Back to all tracks →