AI Security Controls — Implementation Guide¶
This guide points you to resources for implementing AI security controls. We don't provide code—the APIs change frequently and untested code causes more problems than it solves.
The Pattern¶
| Component | Purpose |
|---|---|
| Input Guardrails | Block malicious inputs before they reach the LLM |
| Output Guardrails | Validate and sanitize responses before delivery |
| Judge Queue | Async LLM evaluation of sampled interactions |
| Human Review | Final decision on edge cases and flagged content |
Open Source Implementations¶
These projects have tested, maintained code:
| Project | What It Does | Link |
|---|---|---|
| NeMo Guardrails | NVIDIA's guardrails framework, Colang-based | https://github.com/NVIDIA/NeMo-Guardrails |
| Guardrails AI | Output validation and structured generation | https://github.com/guardrails-ai/guardrails |
| LangChain | Includes moderation chains and safety tools | https://github.com/langchain-ai/langchain |
| LlamaGuard | Meta's safety classifier | https://github.com/meta-llama/PurpleLlama |
| Rebuff | Prompt injection detection | https://github.com/protectai/rebuff |
Cloud Provider Documentation¶
AWS Bedrock¶
| Component | Link |
|---|---|
| Bedrock Guardrails (managed) | https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html |
| ApplyGuardrail API | https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ApplyGuardrail.html |
| Terraform resource | https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/bedrock_guardrail |
| boto3 reference | https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html |
Azure OpenAI¶
| Component | Link |
|---|---|
| Content Filtering | https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-filter |
| Prompt Shields | https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/jailbreak-detection |
| Python SDK | https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart |
Google Vertex AI¶
| Component | Link |
|---|---|
| Safety Filters | https://cloud.google.com/vertex-ai/generative-ai/docs/learn/responsible-ai |
| Python SDK | https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal |
OpenAI¶
| Component | Link |
|---|---|
| Moderation API | https://platform.openai.com/docs/guides/moderation |
| Chat Completions | https://platform.openai.com/docs/api-reference/chat |
Anthropic¶
| Component | Link |
|---|---|
| Messages API | https://docs.anthropic.com/en/api/messages |
| Prompt Engineering | https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering |
Standards and Frameworks¶
| Resource | Link |
|---|---|
| OWASP LLM Top 10 | https://owasp.org/www-project-top-10-for-large-language-model-applications/ |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework |
| MITRE ATLAS | https://atlas.mitre.org/ |
| EU AI Act | https://artificialintelligenceact.eu/ |
What to Build¶
-
Input validation: Use your cloud provider's managed guardrails, or adapt patterns from NeMo/Guardrails AI.
-
Output validation: PII detection, forbidden content, structured output validation. See Guardrails AI for examples.
-
Sampling and evaluation: Not every interaction needs review. Sample 5-10% plus all flagged items.
-
Human review queue: Priority-based queue with SLA tracking. Standard engineering—use your existing tooling.
-
Logging and metrics: Log all interactions (inputs, outputs, blocks, latency). Essential for debugging and compliance.
Multi-Agent Systems¶
The pattern above applies to single-model deployments. For systems where multiple agents communicate, delegate, and act autonomously, the MASO Framework extends these controls with additional requirements.
| MASO Component | What It Adds | Implementation Guidance |
|---|---|---|
| Inter-agent message bus security | Signed messages, source tagging, injection detection between agents | Integration Guide |
| Non-Human Identity per agent | Unique credentials, scoped permissions, no transitive authority | Identity & Access Controls |
| LLM-as-Judge for agent outputs | Independent evaluation before cross-agent actions | Execution Control |
| Epistemic integrity | Hallucination chain detection, provenance tagging, uncertainty preservation | Prompt, Goal & Epistemic Integrity |
| Kill switch architecture | Independent observability agent with system-wide emergency stop | Observability Controls |
Framework-specific patterns for LangGraph, AutoGen, CrewAI, and AWS Bedrock Agents are in the Integration Guide.
Recommendations¶
- Start with managed services (Bedrock Guardrails, Azure Content Filtering) before building custom.
- Use existing libraries rather than writing regex patterns from scratch.
- Test against real attacks—see OWASP LLM Top 10 for attack categories.
- Plan for false positives—overly aggressive filters frustrate users.
- Keep humans in the loop—automated systems miss edge cases.
AI Runtime Behaviour Security, 2026 (Jonathan Gill).