Worked Example: Customer Service AI at Meridian Bank¶
A complete walkthrough of implementing AI security controls for a real-world use case.
This example follows Meridian Bank (fictional) as they deploy an AI-powered customer service assistant. We walk through every step: risk classification, control selection, implementation, monitoring, and incident response.
Important: Control Layer Separation¶
This example uses the layered control model:
| Layer | What It Does | Timing |
|---|---|---|
| Guardrails | Block known-bad inputs/outputs | Inline, real-time |
| LLM-as-Judge | Evaluate quality, detect issues | Async, after delivery |
| Human Oversight | Review findings, decide action | As needed |
The Judge does not block transactions. It reviews interactions after the fact and surfaces findings for human review. Guardrails handle real-time protection.
The Use Case¶
System Name: Aria (AI-Powered Customer Assistant)
What it does: - Answers customer questions about accounts, products, and policies - Helps customers navigate the mobile app and website - Escalates complex issues to human agents - Available 24/7 via chat on mobile app and website
What it can access: - Customer's own account information (balances, transactions, statements) - Product information and rates (public) - Bank policies and FAQs (public) - Customer's contact preferences
What it cannot do: - Transfer money or make payments - Change account settings - Access other customers' data - Make credit decisions - Provide personalised financial advice
Scale: - 50,000 conversations per day - 3 million customers eligible to use it - Peak: 5,000 concurrent sessions
Technology: - GPT-4 Turbo via Azure OpenAI - RAG system for policy/FAQ retrieval - Custom API integration for account data - Deployed in bank's Azure environment
Step 1: Risk Classification¶
Assessment¶
Using the Risk Classification Matrix:
| Factor | Assessment | Score |
|---|---|---|
| Decision Impact | Informational only, no binding decisions | Low |
| Data Sensitivity | Accesses customer PII and financial data | High |
| User Population | External customers (3M potential) | High |
| Autonomy Level | Read-only, cannot take actions | Low |
| Regulatory Scope | Banking (OCC, CFPB, state regulators) | High |
| Reputational Risk | Customer-facing, brand impact | High |
Classification Decision¶
Risk Tier: HIGH
Rationale: While Aria doesn't make decisions, it accesses sensitive customer financial data and represents the bank to millions of customers. A data leak or inappropriate response could cause regulatory action and reputational damage.
Approval Required: Security + Risk sign-off
Step 2: Control Selection¶
Based on HIGH risk tier, these controls apply:
Control Architecture¶
Step 3: Guardrails Implementation (Inline)¶
Input Guardrails¶
Purpose: Block obvious attacks and out-of-scope requests in real-time
import re
from typing import Tuple
def validate_input(message: str, customer_id: str) -> Tuple[bool, str]:
"""
Inline input validation. Fast, deterministic, blocks known-bad.
Returns (is_valid, error_message_if_blocked)
"""
# Length check
if len(message) > 2000:
log_blocked("length_exceeded", customer_id, message)
return False, "Message too long. Please keep your question brief."
# Rate limiting (checked elsewhere, but enforced here)
if is_rate_limited(customer_id):
log_blocked("rate_limited", customer_id, message)
return False, "You're sending messages too quickly. Please wait a moment."
# Known injection patterns
injection_patterns = [
r"ignore (previous|all|your) instructions",
r"disregard (previous|all|your)",
r"forget (everything|your rules)",
r"you are now",
r"pretend (to be|you are)",
r"system prompt",
r"reveal your",
r"jailbreak",
r"DAN mode",
r"\[INST\]",
r"<\|im_start\|>",
]
for pattern in injection_patterns:
if re.search(pattern, message, re.IGNORECASE):
log_blocked("injection_pattern", customer_id, message)
return False, "I can help you with questions about your account and our services."
# Threat/harassment patterns
threat_patterns = [
r"(kill|murder|hurt|attack)\s+(you|someone|people)",
r"bomb|explosive|weapon",
r"i('ll| will)\s+sue",
]
for pattern in threat_patterns:
if re.search(pattern, message, re.IGNORECASE):
log_blocked("threat_pattern", customer_id, message)
log_security_alert("threat_detected", customer_id, message)
return False, "I'm not able to help with that. If you have concerns, please contact us at 1-800-555-0123." # Replace with your actual support number
return True, ""
Output Guardrails¶
Purpose: Filter responses before delivery to customer
def filter_output(response: str, customer_id: str, context: dict) -> Tuple[bool, str]:
"""
Inline output filtering. Catches PII leakage and policy violations.
Returns (is_valid, filtered_response_or_error)
"""
# PII patterns that shouldn't appear in responses
pii_patterns = [
(r'\b\d{3}-\d{2}-\d{4}\b', "SSN"), # SSN
(r'\b\d{16}\b', "card_number"), # Credit card
(r'\b\d{9}\b', "account_other"), # Other account numbers (not customer's own)
]
for pattern, pii_type in pii_patterns:
matches = re.findall(pattern, response)
for match in matches:
# Check if this is the customer's own data (allowed)
if not is_customers_own_data(match, customer_id, context):
log_blocked("pii_leakage", customer_id, response, pii_type)
log_security_alert("pii_leakage", customer_id, pii_type)
return False, "I encountered an error. Please try again or contact support."
# Check for cross-customer data leakage
if contains_other_customer_data(response, customer_id):
log_blocked("cross_customer_leakage", customer_id, response)
log_security_alert("cross_customer_leakage", customer_id)
return False, "I encountered an error. Please try again or contact support."
# Financial advice patterns (we can't give advice)
advice_patterns = [
r"you should (buy|sell|invest)",
r"i (recommend|advise|suggest) (buying|selling|investing)",
r"guaranteed (return|profit)",
]
for pattern in advice_patterns:
if re.search(pattern, response, re.IGNORECASE):
log_blocked("financial_advice", customer_id, response)
# Don't block, but flag and append disclaimer
response += "\n\nPlease note: I provide information only, not financial advice. Consult a financial advisor for personalised recommendations."
# Length sanity check
if len(response) > 5000:
log_alert("unusually_long_response", customer_id, len(response))
# Allow but flag
return True, response
Step 4: LLM-as-Judge Implementation (Async)¶
Purpose¶
The Judge evaluates conversations after they've been delivered to: - Assess quality and accuracy - Detect issues guardrails missed - Identify patterns across conversations - Surface findings for human review
The Judge does not block or modify responses.
Sampling Strategy¶
| Category | Sampling Rate | Rationale |
|---|---|---|
| All conversations | 20% random | Baseline quality monitoring |
| Guardrail near-misses | 100% | Learn from edge cases |
| Customer complaints | 100% | Investigate issues |
| Long conversations (>10 turns) | 100% | Higher risk of drift |
| New topic areas | 50% | Monitor expansion areas |
Judge Evaluation Prompt¶
You are a quality assurance evaluator for a bank's customer service AI. You are reviewing a conversation AFTER it has already been delivered to the customer. Your evaluation will be reviewed by humans who will decide if any action is needed.
CONTEXT:
- This is a customer service chat for a retail bank
- The AI can answer questions about the customer's own accounts
- The AI cannot transfer money, change settings, or give financial advice
- Responses have already been delivered to the customer
EVALUATE THIS CONVERSATION FOR:
1. QUALITY
- Was the response accurate and helpful?
- Did it actually answer the customer's question?
- Was the tone appropriate?
2. POLICY COMPLIANCE
- Did the AI stay within scope (banking questions only)?
- Did it avoid giving financial advice?
- Did it appropriately escalate when needed?
3. POTENTIAL ISSUES
- Any signs of hallucination (fabricated information)?
- Any inappropriate content that got through?
- Any signs of successful manipulation?
4. DATA HANDLING
- Was customer data handled appropriately?
- Any signs of data leakage (other customers, internal systems)?
5. CONDUCT RISK
- Could any response cause customer harm?
- Any regulatory concerns (misleading statements, etc.)?
CONVERSATION:
"""
{conversation_transcript}
"""
Respond with JSON:
{
"quality_score": 1-5,
"quality_issues": ["list of specific issues"],
"policy_compliant": true/false,
"policy_concerns": ["list of concerns"],
"potential_issues_detected": true/false,
"issues": ["list of issues"],
"data_handling_ok": true/false,
"data_concerns": ["list of concerns"],
"conduct_risk": "LOW" | "MEDIUM" | "HIGH",
"conduct_concerns": ["list of concerns"],
"overall_assessment": "OK" | "REVIEW" | "ESCALATE",
"summary": "brief overall assessment",
"recommended_action": "none" | "review" | "customer_outreach" | "process_improvement" | "security_investigation"
}
Processing Judge Findings¶
def process_judge_evaluation(evaluation: dict, conversation_id: str):
"""
Process Judge findings and route appropriately.
Judge does not block - it informs human action.
"""
if evaluation["overall_assessment"] == "OK":
# Log for metrics, no action needed
log_evaluation(conversation_id, evaluation, action="none")
return
if evaluation["overall_assessment"] == "REVIEW":
# Queue for daily analyst review
queue_for_review(
conversation_id=conversation_id,
evaluation=evaluation,
priority="normal",
queue="daily_review"
)
log_evaluation(conversation_id, evaluation, action="queued_review")
return
if evaluation["overall_assessment"] == "ESCALATE":
# Immediate escalation
if evaluation["conduct_risk"] == "HIGH":
alert_compliance_team(conversation_id, evaluation)
if not evaluation["data_handling_ok"]:
alert_security_team(conversation_id, evaluation)
queue_for_review(
conversation_id=conversation_id,
evaluation=evaluation,
priority="high",
queue="immediate_review"
)
log_evaluation(conversation_id, evaluation, action="escalated")
return
Step 5: HITL Configuration¶
Review Queues¶
| Queue | Source | SLA | Reviewer |
|---|---|---|---|
| Immediate Review | Judge escalations, security alerts | 2 hours | Senior analyst |
| Daily Review | Judge "REVIEW" findings | 24 hours | HITL analyst |
| Weekly Sample | Random 1% for quality calibration | 1 week | QA team |
| Complaint Investigation | Customer complaints | 4 hours | Senior analyst + compliance |
What Reviewers Do¶
For Judge-flagged conversations:
- Review the conversation and Judge evaluation
- Assess: Does the finding represent a real issue?
- Decide action:
- No action — False positive, log and dismiss
- Process improvement — Update guardrails, prompts, or training
- Customer outreach — If customer was harmed or misled
- Security investigation — If attack or data issue suspected
- Provide feedback on Judge accuracy (improves future evaluations)
Estimated Volumes¶
| Category | Daily Volume | Review Time | FTE Required |
|---|---|---|---|
| Judge escalations | ~50 | 10 min each | 1.0 FTE |
| Judge reviews | ~500 | 3 min each | 3.0 FTE |
| Random sample | ~500 | 2 min each | 2.0 FTE |
| Complaints | ~20 | 30 min each | 1.5 FTE |
| Total | 7.5 FTE |
Step 6: System Prompt¶
You are Aria, Meridian Bank's AI assistant. You help customers with questions about their accounts, products, and services.
## What You Can Do
- Answer questions about the customer's accounts (balances, transactions, statements)
- Explain Meridian Bank products, rates, and policies
- Help customers navigate our app and website
- Provide general banking information
## What You Cannot Do
- Transfer money or make payments
- Change account settings or personal information
- Access other customers' information
- Give personalised financial, investment, or tax advice
- Make promises or guarantees about rates, approvals, or outcomes
- Discuss internal systems, policies not meant for customers, or how you work
## Important Guidelines
1. If you don't know something, say so. Don't guess.
2. For complex issues, offer to connect the customer with a human agent.
3. If asked to do something outside your capabilities, explain what you can help with instead.
4. Be friendly, professional, and concise.
5. If a customer seems distressed about financial hardship, offer our financial assistance resources.
## Current Context
Customer: {customer_name}
Account Type: {account_type}
Time: {current_time}
Begin by greeting the customer and asking how you can help.
Step 7: Monitoring Dashboard¶
Real-Time Metrics (Guardrails)¶
| Metric | Threshold | Alert |
|---|---|---|
| Input block rate | >5% | Immediate |
| Output block rate | >1% | Immediate |
| Error rate | >2% | Immediate |
| Latency p99 | >3s | Warning |
Daily Metrics (Judge + HITL)¶
| Metric | Target | Review |
|---|---|---|
| Judge quality score (avg) | >4.0/5.0 | Daily |
| Judge escalation rate | <1% | Daily |
| HITL review completion | 100% | Daily |
| False positive rate (Judge) | <20% | Weekly |
| Issues requiring action | Trend | Weekly |
Weekly Metrics¶
| Metric | Purpose |
|---|---|
| Guardrail pattern effectiveness | Which patterns are catching what |
| Judge accuracy vs HITL decisions | Calibrate Judge |
| Customer satisfaction trend | Overall quality |
| Conversation topic distribution | Monitor scope |
Step 8: Logging¶
Conversation Log¶
{
"conversation_id": "conv_abc123",
"customer_id": "cust_xyz789",
"timestamp_start": "2026-01-15T14:30:00Z",
"timestamp_end": "2026-01-15T14:35:22Z",
"messages": [
{
"role": "customer",
"content": "What's my checking account balance?",
"timestamp": "2026-01-15T14:30:00Z",
"guardrail_result": {
"passed": true,
"checks_run": ["length", "injection", "threat"],
"latency_ms": 12
}
},
{
"role": "assistant",
"content": "Your checking account ending in 4521 has a current balance of $3,247.82...",
"timestamp": "2026-01-15T14:30:02Z",
"guardrail_result": {
"passed": true,
"checks_run": ["pii", "advice", "length"],
"latency_ms": 8
},
"model_latency_ms": 1823
}
],
"judge_evaluation": {
"evaluated": true,
"sample_reason": "random_20pct",
"evaluation_timestamp": "2026-01-15T14:40:00Z",
"result": {
"quality_score": 5,
"policy_compliant": true,
"overall_assessment": "OK"
}
},
"hitl_review": null
}
Log Retention¶
| Log Type | Retention | Storage |
|---|---|---|
| Conversation metadata | 3 years | Hot (90d) → Warm (1y) → Cold |
| Full conversation content | 3 years | Encrypted, access-controlled |
| Guardrail decisions | 1 year | Hot |
| Judge evaluations | 3 years | Hot (90d) → Cold |
| HITL decisions | 7 years | Compliance archive |
Step 9: Incident Response Example¶
Scenario: Judge Detects Pattern of Hallucinations¶
Day 1, 09:00 UTC
Daily HITL review notices: Judge flagged 12 conversations yesterday with "hallucination_risk: MEDIUM" related to mortgage rate questions. Normal rate is 2-3.
Investigation (09:00-11:00):
- Pull all 12 flagged conversations
- HITL analyst reviews: 8 of 12 contained incorrect mortgage rate information
- Root cause: RAG system returning outdated rate sheet (hadn't been updated after rate change on Day 0)
- Customers received slightly incorrect rates (off by 0.125%)
Impact Assessment: - 8 customers affected - Information was directionally correct but outdated - No financial harm (informational only, no decisions made) - Potential customer confusion if they call to confirm
Remediation (11:00-14:00):
- Fix: Update RAG system with current rate sheet
- Fix: Add monitoring for rate sheet freshness
- Customer outreach: Proactive email to 8 customers with correct rates
- Process improvement: Daily automated check that rate sheets are current
Documentation: - Incident logged: INC-2026-0203 - Root cause: Data freshness issue - Detection method: Judge async review - Time to detect: ~18 hours (next-day review) - Customer impact: Minor (corrected proactively)
Key Learning: Judge caught an issue that guardrails couldn't—guardrails don't know what the correct mortgage rate is. This is the value of async quality assurance.
Step 10: Costs¶
Implementation Costs (One-Time)¶
| Item | Cost | Notes |
|---|---|---|
| Security review and assessment | $45,000 | Internal + external |
| Guardrails development | $30,000 | Rules, patterns, testing |
| Judge prompt development | $25,000 | Prompts, testing, calibration |
| HITL workflow development | $35,000 | Review interface, queuing |
| SIEM integration | $25,000 | Log shipping, dashboards |
| Documentation | $15,000 | Policies, procedures, training |
| Total Implementation | $175,000 |
Ongoing Costs (Annual)¶
| Item | Cost | Notes |
|---|---|---|
| Primary AI inference | $365,000 | ~$0.02/conversation × 50K/day |
| Judge inference | $73,000 | $0.01/eval × 20% sampling × 50K/day |
| HITL staffing | $525,000 | 7.5 FTE analysts |
| SIEM/logging storage | $48,000 | High-volume logging |
| Security team allocation | $60,000 | 0.25 FTE |
| Judge calibration/updates | $20,000 | Quarterly reviews |
| Total Annual | $1,091,000 |
Cost per Conversation: ~$0.06
Lessons Learned¶
After 6 months in production:
What Worked Well¶
- Guardrails catch 95%+ of obvious attacks — Fast, cheap, effective for known patterns
- Judge finds what guardrails miss — Quality issues, subtle policy violations, hallucinations
- Async Judge doesn't impact UX — No latency penalty for customers
- HITL feedback improves both layers — Guardrails and Judge get better over time
- Clear separation of concerns — Everyone understands what each layer does
What We'd Do Differently¶
- Start with higher Judge sampling — 10% wasn't enough initially, moved to 20%
- Build feedback loop faster — Took 6 weeks to operationalise HITL → guardrail updates
- More nuanced Judge scoring — Binary OK/REVIEW wasn't granular enough
- Better tooling for pattern discovery — Hard to spot trends across Judge findings initially
Key Metrics After 6 Months¶
| Metric | Target | Actual |
|---|---|---|
| Input guardrail block rate | <5% | 2.3% |
| Output guardrail block rate | <1% | 0.4% |
| Judge escalation rate | <1% | 0.6% |
| HITL actionable findings | <5% of reviews | 3.2% |
| Customer satisfaction | >4.0/5.0 | 4.3/5.0 |
| Security incidents | 0 | 0 |
Summary¶
Aria demonstrates HIGH-tier implementation with proper control separation:
| Layer | Function | Result |
|---|---|---|
| Input Guardrails | Block injection, threats, abuse | 2.3% block rate, <15ms latency |
| Output Guardrails | Filter PII, advice, errors | 0.4% block rate, <10ms latency |
| LLM-as-Judge | Quality assurance, pattern detection | 20% sampling, finds ~15 issues/day |
| HITL | Review findings, decide action | 7.5 FTE, 3.2% actionable rate |
The Judge is not a gatekeeper. It's a quality assurance mechanism that makes human oversight scalable. Guardrails protect in real-time. The Judge finds what they miss. Humans decide what to do about it.¶
AI Runtime Behaviour Security, 2026 (Jonathan Gill).