Skip to content

Runtime Telemetry Reference

One transaction, end-to-end. Every control layer. Every log event. Every threshold.

Part of Technical Extensions · AI Runtime Behaviour Security


Purpose

This document shows exactly what happens when a single request passes through the runtime control stack. It follows one transaction from user input to final response, showing the JSON event emitted at each layer, the thresholds that trigger action, and the evidence artefact each event produces.

Scenario: A customer asks a financial services chatbot about investment products. The system is classified HIGH tier (customer-facing, financial data, 500K transactions/year).

Architecture: AWS Bedrock with Claude, guardrails via Bedrock Guardrails, Judge via async evaluation, human oversight via review queue.


The Transaction

User input: "What's the best investment for my retirement savings of $450,000?"

This input triggers five events across the control stack. Each event is logged, evaluated, and produces audit evidence.


Event 1: Request Received (LOG-01)

The platform logs the raw request before any processing.

{
  "event_type": "model_request",
  "request_id": "req-7f3a-4b2c-9d1e",
  "timestamp": "2026-02-22T14:23:01.456Z",
  "user_identity": "user:customer-8842@portal",
  "service_identity": "svc:wealth-advisor-v3",
  "session_id": "session:ws-29471",
  "model_id": "bedrock:claude-3-sonnet-v2",
  "system_prompt_hash": "sha256:e4a1c3f8...",
  "input_text": "What's the best investment for my retirement savings of $450,000?",
  "input_tokens": 18,
  "risk_tier": "HIGH",
  "metadata": {
    "temperature": 0.3,
    "max_tokens": 1024,
    "channel": "web_portal",
    "customer_segment": "retail"
  }
}

Evidence produced: Request audit trail (Art. 12 record-keeping).


Event 2: Guardrail Evaluation (LOG-02)

Input guardrails run before the request reaches the model. Output guardrails run after.

Input guardrail result:

{
  "event_type": "guardrail_decision",
  "request_id": "req-7f3a-4b2c-9d1e",
  "timestamp": "2026-02-22T14:23:01.478Z",
  "guardrail_id": "grd:bedrock-content-filter-v2",
  "guardrail_version": "2.1.4",
  "stage": "input",
  "checks": [
    {
      "check_type": "prompt_injection",
      "decision": "pass",
      "confidence": 0.12,
      "latency_ms": 8
    },
    {
      "check_type": "topic_restriction",
      "decision": "pass",
      "confidence": 0.04,
      "latency_ms": 3
    },
    {
      "check_type": "pii_detection",
      "decision": "flag",
      "confidence": 0.91,
      "detail": "Financial amount detected: $450,000",
      "action": "log_and_proceed",
      "latency_ms": 6
    }
  ],
  "overall_decision": "pass",
  "total_latency_ms": 22
}

Output guardrail result (after model responds):

{
  "event_type": "guardrail_decision",
  "request_id": "req-7f3a-4b2c-9d1e",
  "timestamp": "2026-02-22T14:23:03.892Z",
  "guardrail_id": "grd:bedrock-content-filter-v2",
  "guardrail_version": "2.1.4",
  "stage": "output",
  "checks": [
    {
      "check_type": "financial_advice",
      "decision": "flag",
      "confidence": 0.78,
      "detail": "Response discusses investment products with specific amounts",
      "action": "add_disclaimer",
      "latency_ms": 12
    },
    {
      "check_type": "pii_leakage",
      "decision": "pass",
      "confidence": 0.03,
      "latency_ms": 5
    },
    {
      "check_type": "hallucination_pattern",
      "decision": "pass",
      "confidence": 0.15,
      "latency_ms": 9
    }
  ],
  "overall_decision": "pass_modified",
  "modification": "Regulatory disclaimer appended",
  "total_latency_ms": 31
}

Thresholds that matter:

Check Block Threshold Flag Threshold This Request
Prompt injection confidence > 0.85 confidence > 0.60 0.12 — pass
Financial advice confidence > 0.95 confidence > 0.70 0.78 — flag, add disclaimer
PII leakage confidence > 0.80 confidence > 0.50 0.03 — pass

Evidence produced: Guardrail effectiveness measurement (Art. 9 risk mitigation), content modification audit trail.


Event 3: Model Response (LOG-01 completion)

{
  "event_type": "model_response",
  "request_id": "req-7f3a-4b2c-9d1e",
  "timestamp": "2026-02-22T14:23:03.861Z",
  "output_text": "For retirement savings of that size, there are several approaches worth discussing with a qualified financial adviser. Diversified index funds, target-date retirement funds, and balanced portfolios are common strategies. The right choice depends on your timeline to retirement, risk tolerance, and other income sources. I'd recommend speaking with a certified financial planner who can assess your complete financial picture.\n\n*This is general information, not personalised financial advice. Please consult a licensed financial adviser before making investment decisions.*",
  "output_tokens": 87,
  "latency_ms": 2383,
  "guardrail_actions": ["disclaimer_appended"],
  "finish_reason": "end_turn"
}

Evidence produced: Full response audit trail with model attribution (Art. 12 traceability).


Event 4: Judge Evaluation (LOG-03)

The Judge evaluates the response asynchronously. For HIGH tier systems, 100% of responses containing flagged content are evaluated, plus 20% random sample.

This request is evaluated because: Output guardrail flagged financial advice content.

{
  "event_type": "judge_evaluation",
  "request_id": "req-7f3a-4b2c-9d1e",
  "timestamp": "2026-02-22T14:23:06.244Z",
  "judge_model_id": "bedrock:claude-3-haiku-v2",
  "evaluation_criteria": "financial-services-v3",
  "evaluation_criteria_version": "3.2.1",
  "trigger": "guardrail_flag:financial_advice",
  "scores": {
    "policy_compliance": 0.94,
    "factual_accuracy": 0.91,
    "appropriate_disclaimers": 1.0,
    "referral_to_professional": 1.0,
    "no_specific_recommendations": 0.88
  },
  "overall_score": 0.93,
  "verdict": "acceptable",
  "reasoning": "Response appropriately redirects to professional advice. Does not recommend specific products or allocations. Disclaimer present. Minor: mentions 'index funds' and 'target-date funds' by category which could be interpreted as directional guidance.",
  "conduct_risk": "LOW",
  "recommended_action": "none",
  "latency_ms": 1380
}

Thresholds that matter:

Verdict Score Range Action
Acceptable overall_score ≥ 0.85 Log only
Review 0.70 ≤ overall_score < 0.85 Route to daily review queue
Escalate overall_score < 0.70 Route to immediate review queue
Escalate conduct_risk = "HIGH" Alert compliance team immediately

Evidence produced: Independent evaluation record with reasoning (Art. 14 human oversight support, Art. 15 accuracy measurement).


Event 5: Human Oversight Routing Decision

Based on the Judge evaluation, the system decides whether human review is needed.

{
  "event_type": "oversight_decision",
  "request_id": "req-7f3a-4b2c-9d1e",
  "timestamp": "2026-02-22T14:23:06.248Z",
  "judge_verdict": "acceptable",
  "judge_score": 0.93,
  "human_review_required": false,
  "reason": "Score above review threshold (0.85). No conduct risk flags.",
  "sampling_selected": false,
  "sampling_rate": 0.20
}

If the Judge had scored this below 0.70, the event would be:

{
  "event_type": "oversight_decision",
  "request_id": "req-7f3a-4b2c-9d1e",
  "timestamp": "2026-02-22T14:23:06.248Z",
  "judge_verdict": "escalate",
  "judge_score": 0.58,
  "human_review_required": true,
  "review_queue": "immediate",
  "escalation_target": "compliance_team",
  "sla_hours": 2,
  "reason": "Score below escalation threshold (0.70). Possible inappropriate financial recommendation."
}

Evidence produced: Oversight decision audit trail (Art. 14 human oversight), escalation records.


Detection Queries for This Scenario

These queries run in your SIEM against the events above.

Detect repeated financial advice flags (Splunk)

index=ai_security event_type="guardrail_decision" stage="output"
  checks{}.check_type="financial_advice" checks{}.decision="flag"
| stats count as flag_count by user_identity, session_id
| where flag_count > 3
| eval severity=case(flag_count > 10, "high", flag_count > 5, "medium", 1=1, "low")

Threshold rationale: >3 flags in one session suggests the user is probing for specific financial advice that guardrails are catching. Investigate for social engineering.

Detect Judge disagreement with guardrails (Sentinel)

AISecurity_CL
| where event_type_s == "guardrail_decision" and overall_decision_s == "pass"
| join kind=inner (
    AISecurity_CL
    | where event_type_s == "judge_evaluation" and overall_score_d < 0.70
  ) on request_id_s
| project TimeGenerated, request_id_s, user_identity_s, judge_score=overall_score_d

What this detects: Guardrails passed the request, but the Judge flagged it. This is a guardrail gap — the pattern the guardrail doesn't recognise. Feed these back into guardrail tuning.

Detect escalation volume spike (Splunk)

index=ai_security event_type="oversight_decision" human_review_required="true"
| timechart span=1h count as escalations
| streamstats window=168 avg(escalations) as baseline, stdev(escalations) as stddev
| eval z_score=(escalations - baseline) / stddev
| where z_score > 3.0

Threshold: z-score > 3.0 (3 standard deviations above 7-day rolling baseline). Indicates either a model behaviour change or an attack campaign.


Evidence Artefacts Summary

Every transaction through the control stack produces evidence for five compliance requirements:

Requirement Evidence Artefact Source Event Retention
Art. 9 Risk mitigation Guardrail decision + Judge evaluation LOG-02, LOG-03 1 year
Art. 12 Record-keeping Full request/response with correlation ID LOG-01 90 days (full), 1 year (metadata)
Art. 14 Human oversight Oversight routing decision + review records Event 5 + HITL log 1 year
Art. 15 Accuracy Judge scores + validation metrics LOG-03 1 year
PACE Resilience Control layer health status Platform telemetry 1 year

Failure Scenario: What Changes When the Judge Goes Down

When the Judge layer becomes unavailable, PACE transitions from Primary to Alternate:

{
  "event_type": "pace_transition",
  "timestamp": "2026-02-22T15:01:44.000Z",
  "service_identity": "svc:wealth-advisor-v3",
  "previous_state": "primary",
  "new_state": "alternate",
  "trigger": "judge_health_check_failed",
  "detail": "Judge model bedrock:claude-3-haiku-v2 returned 3 consecutive 503 errors",
  "actions_taken": [
    "judge_evaluation_suspended",
    "guardrail_thresholds_tightened",
    "human_sampling_rate_increased_to_1.0",
    "soc_alert_raised:severity_high"
  ],
  "risk_implication": "Guardrails only. No independent evaluation. All responses routed to human review.",
  "restoration_sla_minutes": 30
}

Guardrail threshold changes during Alternate state:

Check Primary Threshold Alternate Threshold Rationale
Financial advice flag > 0.70 block > 0.60 Tighter without Judge backup
Prompt injection block > 0.85 block > 0.70 Lower tolerance without verification
PII leakage block > 0.80 block > 0.60 Conservative without second check

This is the operational resilience evidence auditors look for: not just "we have controls" but "here's exactly what happens when they fail."


Connecting to Existing Documentation

Topic Detailed Reference
Full field definitions for LOG-01 through LOG-10 Logging & Observability Controls
Complete detection rule library (8 rules, 3 SIEM platforms) SOC Content Pack
Baseline establishment and z-score methodology Anomaly Detection Operations
Full worked example with Python code Customer Service AI Example
PACE degradation methodology PACE Resilience
Risk tier classification Risk Tiers

AI Runtime Behaviour Security, 2026 (Jonathan Gill).