SOC Integration for AI Systems

Bridging AI security monitoring and existing security operations.

The Problem

You've implemented guardrails, a judge, and human oversight. They generate alerts, logs, and escalations.

Where do these go?

In most enterprises: nowhere useful. AI security telemetry sits in application logs, disconnected from the SOC that handles every other security event. The SOC team doesn't know what an LLM judge alert means. The AI team doesn't know how to write a SOC-compatible runbook.


Alert Taxonomy

AI systems produce security-relevant events that don't map cleanly to existing SOC categories. Define them explicitly.

| Alert Category | Source | Severity Baseline | Examples |
| --- | --- | --- | --- |
| Guardrail Block | Input/output guardrails | Low–Medium | PII detected, toxicity blocked, off-topic rejected |
| Judge Flag | LLM-as-Judge | Medium–High | Policy violation detected, hallucination scored high, unsafe reasoning |
| Anomaly | Behavioral monitoring | Medium | Usage pattern deviation, output distribution shift, latency spike |
| Prompt Attack | Guardrails + Judge | High | Prompt injection detected, jailbreak attempt, system prompt extraction |
| Data Exfiltration Signal | Output monitoring | High | Bulk data in response, structured data extraction pattern |
| Agent Boundary Violation | Agentic controls | High | Unauthorised tool use, delegation policy breach, scope escalation |
| Model Drift | Judge assurance metrics | Medium | Judge accuracy degradation, output distribution change |

Severity Mapping

Map AI alert severity to your existing SOC severity framework:

| AI Severity | SOC Equivalent | Response SLA |
| --- | --- | --- |
| Low (single guardrail block) | Informational | Log, aggregate, review weekly |
| Medium (judge flag, single anomaly) | Warning | Triage within 4 hours |
| High (attack pattern, data exfil signal) | Alert | Triage within 1 hour |
| Critical (confirmed breach, agent compromise) | Incident | Immediate response |

Adjust SLAs to match your existing SOC tiers. Don't create a parallel severity system.
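Encoding this mapping once, in code, keeps every emitting component classifying alerts the same way. A minimal sketch in Python, assuming the category names and SLA values from the two tables above; the constant and function names are illustrative, not a prescribed interface.

```python
from datetime import timedelta

# Baseline severity per alert category (taxonomy table above). Categories with a
# range default to the lower bound; emitters can raise severity with context.
CATEGORY_BASELINE_SEVERITY = {
    "guardrail_block": "low",
    "judge_flag": "medium",
    "anomaly": "medium",
    "prompt_attack": "high",
    "data_exfiltration_signal": "high",
    "agent_boundary_violation": "high",
    "model_drift": "medium",
}

# AI severity -> (SOC equivalent, response SLA). None means immediate response.
SEVERITY_TO_SOC = {
    "low": ("informational", timedelta(days=7)),   # log, aggregate, review weekly
    "medium": ("warning", timedelta(hours=4)),
    "high": ("alert", timedelta(hours=1)),
    "critical": ("incident", None),
}

def soc_classification(category: str, severity: str | None = None):
    """Return (severity, SOC equivalent, SLA) for an AI alert category."""
    sev = severity or CATEGORY_BASELINE_SEVERITY[category]
    soc_equivalent, sla = SEVERITY_TO_SOC[sev]
    return sev, soc_equivalent, sla
```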


Identity Correlation

The hardest operational problem: linking an AI security event to a real identity.

AI telemetry typically gives you:

| Source | Identity Available |
| --- | --- |
| API gateway logs | API key, OAuth token, IP address |
| LLM provider logs (Bedrock, Azure OpenAI) | Service principal, request ID |
| Application logs | Session ID, user ID (if your app logs it) |
| Judge evaluation logs | Request ID (if you propagate it) |
| Vector database logs | Service account, query metadata |

To correlate "User X performed action Y that triggered alert Z," you need to join across at least three of these sources.

Correlation Pattern

API Gateway           →  request_id, user_token, timestamp
     ↓
Application Layer     →  request_id, session_id, user_id
     ↓
LLM Provider          →  request_id (if propagated), model, tokens
     ↓
Judge                  →  request_id, evaluation_result, score
     ↓
SIEM                   →  Correlated event: user_id + evaluation_result + model + timestamp

Critical requirement: Propagate a correlation ID (request_id or trace_id) across every component. Without this, you're guessing.
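A minimal sketch of that propagation in Python: adopt or mint a correlation ID at the edge, attach it to every downstream call, and stamp it on every log line. The X-Request-ID header and the function names are illustrative; reuse whatever ID your gateway already assigns.

```python
import json
import logging
import uuid
from contextvars import ContextVar

# Holds the correlation ID for the current request across async/sync call paths.
request_id_var: ContextVar[str] = ContextVar("request_id", default="")

def begin_request(incoming_headers: dict) -> str:
    """Adopt the gateway's correlation ID if present, otherwise mint one."""
    request_id = incoming_headers.get("X-Request-ID") or str(uuid.uuid4())
    request_id_var.set(request_id)
    return request_id

def outbound_headers() -> dict:
    """Headers to attach to LLM provider, judge, and tool calls."""
    return {"X-Request-ID": request_id_var.get()}

def log_event(message: str, **fields) -> None:
    """Structured log line that always carries the correlation ID."""
    record = {"request_id": request_id_var.get(), "message": message, **fields}
    logging.getLogger("ai_security").info(json.dumps(record))
```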

Platform-Specific Patterns

Amazon Bedrock + CloudWatch:
- Bedrock invocation logs → CloudWatch Logs → EventBridge → SIEM
- Correlation key: requestId from Bedrock invocation metadata
- Identity: IAM role/user from CloudTrail
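As a sketch of the normalisation step on that path, assuming Bedrock model invocation logging is enabled and records reach a Lambda via a CloudWatch Logs subscription (a common alternative to the EventBridge hop above). The requestId and modelId fields follow the correlation key noted above, and forward_to_siem stands in for your SIEM's ingestion API.

```python
import base64
import gzip
import json

def handler(event, context):
    """Lambda: normalise Bedrock invocation log records for SIEM ingestion."""
    data = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(data))
    for log_event in payload["logEvents"]:
        record = json.loads(log_event["message"])
        siem_event = {
            "event_type": "ai_security",
            "request_id": record.get("requestId"),   # correlation key
            "model": record.get("modelId"),
            "timestamp": record.get("timestamp"),
            # Identity is joined in the SIEM against CloudTrail on requestId.
        }
        forward_to_siem(siem_event)

def forward_to_siem(evt: dict) -> None:
    print(json.dumps(evt))  # stand-in for your SIEM ingestion call
```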

Databricks + Unity Catalog:
- Serving endpoint logs → Delta table → Databricks SQL → SIEM export
- Correlation key: request_id from serving endpoint
- Identity: Unity Catalog identity mapped to workspace user

Azure OpenAI + Sentinel:
- Diagnostic logs → Log Analytics → Sentinel
- Correlation key: x-ms-client-request-id header
- Identity: Entra ID from API Management authentication
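For the Azure path, a sketch of setting that correlation header on requests routed through API Management; the gateway URL, deployment name, and API version string are placeholders for your own configuration.

```python
import uuid
import requests

# Placeholder APIM route fronting the Azure OpenAI deployment.
APIM_URL = "https://your-apim-gateway.example.com/openai/deployments/your-deployment/chat/completions"

def call_azure_openai(messages: list[dict], subscription_key: str):
    """Call Azure OpenAI via APIM, returning the correlation ID and response."""
    client_request_id = str(uuid.uuid4())
    response = requests.post(
        APIM_URL,
        params={"api-version": "2024-06-01"},  # illustrative version string
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            # Captured in diagnostic logs; the join key for Sentinel correlation.
            "x-ms-client-request-id": client_request_id,
        },
        json={"messages": messages},
        timeout=30,
    )
    response.raise_for_status()
    return client_request_id, response.json()
```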


Escalation Triggers

Define explicit triggers that move AI events from monitoring to investigation.

Automated Escalation (no analyst required)

| Trigger | Action |
| --- | --- |
| >10 guardrail blocks from same user in 5 minutes | Block user, alert SOC |
| Judge flags >3 high-severity outputs in 1 hour | Disable AI feature for user, alert SOC |
| Prompt injection pattern detected | Log full request/response, alert SOC |
| Agent attempts tool outside allowlist | Halt agent, alert SOC + AI team |
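A sketch of the first trigger in the table, as a sliding-window counter keyed by user; the window and threshold match the table above, and block_user / alert_soc are placeholders for whatever enforcement and alerting hooks you already have.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60   # 5-minute sliding window
THRESHOLD = 10            # >10 guardrail blocks escalates

_blocks: dict = defaultdict(deque)

def on_guardrail_block(user_id: str, now: float | None = None) -> bool:
    """Record a guardrail block; return True if the user was escalated."""
    now = now if now is not None else time.time()
    events = _blocks[user_id]
    events.append(now)
    while events and now - events[0] > WINDOW_SECONDS:
        events.popleft()   # drop events outside the window
    if len(events) > THRESHOLD:
        block_user(user_id)               # placeholder: disable access at the gateway
        alert_soc(user_id, len(events))   # placeholder: raise a High alert in the SIEM
        return True
    return False

def block_user(user_id: str) -> None: ...
def alert_soc(user_id: str, count: int) -> None: ...
```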

Analyst-Driven Escalation

| Signal | Triage Question | Escalation Path |
| --- | --- | --- |
| Repeated judge flags, low severity | Is the user testing boundaries or doing legitimate work? | AI team for context |
| Anomalous output volume | Is this a batch job or data exfiltration? | SOC L2 for investigation |
| Model drift detected | Did the provider update the model, or is this adversarial? | AI team + vendor liaison |
| New prompt injection technique | Is this a known pattern or novel? | SOC L2 + threat intel |

Triage Procedures for AI Alerts

SOC analysts need clear guidance for AI-specific alerts. They are not AI experts. Don't expect them to be.

Guardrail Block (Low Severity)

1. Check: Is this a single event or part of a pattern?
   - Single event → Log and close
   - Pattern (>5 from same user) → Escalate to Medium

2. Check: What was blocked?
   - PII in output → Verify PII type, check if user has legitimate access to that data
   - Toxicity → Log, no further action unless repeated
   - Off-topic → Log, no further action

3. No user contact required for single events.

Judge Flag (Medium Severity)

1. Review the judge evaluation alongside the original request and response.

2. Check: Does the response violate policy?
   - Yes → Escalate to AI team for remediation
   - Unclear → Flag for human review in next calibration cycle
   - No (false positive) → Log as FP, feed back to judge tuning

3. Check: Is the user's request legitimate?
   - Research/testing → Confirm with user's manager
   - Appears adversarial → Escalate to High

Prompt Attack (High Severity)

1. Capture full request and response (do not summarise — exact content matters).

2. Check: Did the attack succeed?
   - Yes (guardrails/judge bypassed) → Incident. Disable endpoint. Notify AI team immediately.
   - No (blocked by guardrails) → Log technique, update threat intel, monitor for variants.

3. Check: Is this a known technique?
   - Known → Verify guardrails are current, close.
   - Novel → Escalate to AI security team for analysis and guardrail update.

4. Do NOT attempt to reproduce the attack in production.

SIEM Integration

Log Format

Emit AI security events in a format your SIEM can parse. Extend your existing schema rather than creating a new one.

{
  "event_type": "ai_security",
  "timestamp": "2026-02-11T14:30:00Z",
  "severity": "medium",
  "category": "judge_flag",
  "request_id": "req-abc-123",
  "user_id": "jgill@example.com",
  "model": "claude-sonnet-4-5-20250929",
  "judge_model": "gpt-4o",
  "judge_score": 0.82,
  "judge_verdict": "policy_violation",
  "policy_violated": "financial_advice_without_disclaimer",
  "guardrail_result": "pass",
  "tokens_in": 450,
  "tokens_out": 1200,
  "endpoint": "/api/v1/chat",
  "risk_tier": 2
}
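A sketch of emitting that event from the application layer; the field names mirror the example above, and writing a JSON line to stdout stands in for whatever log shipper feeds your SIEM.

```python
import json
import sys
from datetime import datetime, timezone

def emit_ai_security_event(*, category: str, severity: str, request_id: str,
                           user_id: str, model: str, **extra) -> dict:
    """Emit one AI security event in the shared SIEM schema."""
    event = {
        "event_type": "ai_security",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "severity": severity,
        "category": category,
        "request_id": request_id,
        "user_id": user_id,
        "model": model,
        **extra,  # judge_score, policy_violated, tokens_in, risk_tier, ...
    }
    sys.stdout.write(json.dumps(event) + "\n")  # picked up by the log shipper
    return event
```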

Detection Rules (Examples)

Splunk SPL:

index=ai_security category="prompt_attack"
| stats count by user_id, src_ip
| where count > 3

Save this search as a scheduled alert with severity High; SPL has no inline alert command, so alerting is configured on the saved search.

Sentinel KQL:

AISecurity_CL
| where category_s == "judge_flag" and judge_score_d > 0.8
| summarize count() by user_id_s, bin(TimeGenerated, 1h)
| where count_ > 5


Runbook Integration

For each alert category, create a runbook in your existing ITSM (ServiceNow, PagerDuty, Jira).

| Alert Category | Runbook Owner | Tool Integration |
| --- | --- | --- |
| Guardrail Block | AI Platform Team | Automated; no ticket unless threshold breached |
| Judge Flag | AI Security + SOC L1 | Ticket auto-created at Medium severity |
| Prompt Attack | SOC L2 + AI Security | Incident auto-created at High severity |
| Agent Boundary Violation | AI Platform Team + SOC L2 | Incident auto-created |
| Model Drift | AI Platform Team | Alert to AI team, no SOC ticket |
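The auto-creation rules in that table reduce to a small routing function. A sketch, with the ITSM webhook URL and payload shape as placeholders for whatever your ServiceNow, PagerDuty, or Jira integration actually exposes; the High threshold for agent boundary violations is an assumption based on their taxonomy baseline.

```python
import requests

# Category -> (ITSM record type, minimum severity); mirrors the table above.
ROUTING = {
    "judge_flag": ("ticket", "medium"),
    "prompt_attack": ("incident", "high"),
    "agent_boundary_violation": ("incident", "high"),
}
SEVERITY_ORDER = ["low", "medium", "high", "critical"]
ITSM_WEBHOOK = "https://itsm.example.com/api/records"  # placeholder endpoint

def route_alert(event: dict) -> None:
    """Create an ITSM record when an AI alert meets its category's threshold."""
    rule = ROUTING.get(event["category"])
    if rule is None:
        return  # guardrail blocks and model drift stay with the AI team
    record_type, min_severity = rule
    if SEVERITY_ORDER.index(event["severity"]) >= SEVERITY_ORDER.index(min_severity):
        requests.post(ITSM_WEBHOOK, timeout=10, json={
            "type": record_type,
            "summary": f"AI alert: {event['category']} ({event.get('user_id', 'unknown')})",
            "severity": event["severity"],
            "correlation_id": event.get("request_id"),
        })
```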

AI Runtime Behaviour Security, 2026 (Jonathan Gill).