AI Security Cheat Sheet
Classify. Control. Define fail posture. Test. One page.
This document uses the simplified three-tier system (Tier 1/2/3). See Risk Tiers — Simplified Tier Mapping for the mapping to LOW/MEDIUM/HIGH/CRITICAL.
1. Classify
Fast Lane: if all four criteria are true, self-certify and deploy in days.
| Fast Lane criterion (all four must be true) |
|---|
| Internal users only |
| Read-only (no write to external systems) |
| No regulated data (PII, financial, health, legal) |
| Human reviews before acting on output |
If any criterion fails, classify by the highest applicable tier:
| Tier | When | Example |
|---|---|---|
| 1 — Low | Internal users. May have write access or unreviewed output. No regulated decisions. | Internal chatbot, code assistant, meeting summariser |
| 2 — Medium | Customer-facing. Human reviews before delivery. | Customer support draft, document processing, decision support |
| 3 — High | Regulated decisions, autonomous agents with write access, financial/medical/legal. | Loan decisioning, autonomous trading, clinical support |
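A minimal Python sketch of the classification above, assuming each question is captured as a boolean. `UseCase` and `classify()` are hypothetical names, not part of any framework API. Two judgement calls are assumptions the tables do not settle: any deployment beyond internal users is treated as at least Tier 2, and regulated data without a regulated decision does not on its own push an internal tool to Tier 3.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    # Hypothetical fields: one per question in the classification tables.
    internal_users_only: bool
    read_only: bool
    regulated_data: bool
    human_reviews_output: bool
    regulated_decision: bool = False
    autonomous_agent_with_write: bool = False


def classify(u: UseCase) -> str:
    """Return the highest applicable tier for a use case (sketch)."""
    # Fast Lane only if all four criteria hold.
    if (u.internal_users_only and u.read_only
            and not u.regulated_data and u.human_reviews_output):
        return "Fast Lane"
    # Checked before Tier 2 so the highest applicable tier wins.
    if u.regulated_decision or u.autonomous_agent_with_write:
        return "Tier 3"
    # Assumption: anything beyond internal users counts as customer-facing.
    if not u.internal_users_only:
        return "Tier 2"
    return "Tier 1"


# Customer support drafts that a human reviews before sending.
support_drafts = UseCase(internal_users_only=False, read_only=True,
                         regulated_data=True, human_reviews_output=True)
print(classify(support_drafts))  # Tier 2
```

Ordering the checks from Fast Lane to Tier 3 to Tier 2 to Tier 1 is what makes "highest applicable" hold: a customer-facing loan tool matches Tier 2 and Tier 3, and the Tier 3 check fires first.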
2. Controls Required
| Control | Fast Lane | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|---|
| Guardrails | Basic filter | Standard | Full suite + injection detection | Hardened, multi-layer |
| LLM-as-Judge | — | 10–20% sample | 100% async | 100% dual-model, pre+post action |
| Human Oversight | — | — | Dedicated reviewers, SLA-bound | Domain experts, dual approval |
| Circuit Breaker | Feature flag | Feature flag | Automated health-check | Automated + staffed fallback |
| Usage Logging | Yes | Yes | Yes | Yes |
Agentic add-ons (if agent has write access): tool permission matrix, transaction resolution plan, multi-agent cascade prevention, 5-phase degradation path (Tier 2+).
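The controls table can also live as configuration that deployment tooling reads. A sketch assuming a plain Python dict; the key names and `should_judge()` are hypothetical, and 0.15 is an arbitrary point chosen inside the Tier 1 "10 to 20%" sampling range. It covers only which outputs get sampled for the LLM-as-Judge, not how Tier 2 async judging or Tier 3 dual-model pre+post checks are run.

```python
import random

# Hypothetical encoding of the "Controls Required" table.
CONTROLS = {
    "Fast Lane": {"guardrails": "basic filter", "judge_rate": 0.0,
                  "human_oversight": None, "circuit_breaker": "feature flag"},
    "Tier 1": {"guardrails": "standard",
               "judge_rate": 0.15,  # assumed point within the 10-20% range
               "human_oversight": None, "circuit_breaker": "feature flag"},
    "Tier 2": {"guardrails": "full suite + injection detection",
               "judge_rate": 1.0,
               "human_oversight": "dedicated reviewers, SLA-bound",
               "circuit_breaker": "automated health-check"},
    "Tier 3": {"guardrails": "hardened, multi-layer",
               "judge_rate": 1.0,
               "human_oversight": "domain experts, dual approval",
               "circuit_breaker": "automated + staffed fallback"},
}
# Usage logging is required at every tier.
for tier_controls in CONTROLS.values():
    tier_controls["usage_logging"] = True


def should_judge(tier: str, rng: random.Random) -> bool:
    """Decide whether this output is sent to the LLM-as-Judge."""
    # random() returns a float in [0, 1), so 1.0 always judges and 0.0 never does.
    return rng.random() < CONTROLS[tier]["judge_rate"]
```

Passing an explicit `random.Random` instance keeps sampling reproducible in tests, which helps when checking that the configured rate is actually being applied.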
3. Fail Posture
For each control, define what happens when the control itself fails: does the system fail open (let traffic through) or fail closed (block it)?
| Tier | Default | What It Means |
|---|---|---|
| Fast Lane / Tier 1 | Fail-open | Pass traffic. Log. Fix next business day. |
| Tier 2 | Fail-closed | Block AI traffic. Auto-switch to fallback. |
| Tier 3 | Fail-closed always | No AI traffic passes a degraded control. No exceptions. |
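One way to wire the posture into code is a wrapper around each control. A minimal sketch, assuming a control signals its own failure by raising an exception; `guarded_call`, `check`, `call_model` and `fallback` are hypothetical names. Routing a request that the working control rejects to the fallback is also an assumption of this sketch.

```python
from typing import Callable

FAIL_POSTURE = {"Fast Lane": "open", "Tier 1": "open",
                "Tier 2": "closed", "Tier 3": "closed"}


def guarded_call(tier: str,
                 check: Callable[[str], bool],
                 call_model: Callable[[str], str],
                 fallback: Callable[[str], str],
                 prompt: str,
                 log: Callable[[str], None]) -> str:
    """Apply one control and honour the tier's fail posture (sketch)."""
    try:
        allowed = check(prompt)
    except Exception as exc:  # the control itself is broken or degraded
        if FAIL_POSTURE[tier] == "open":
            # Fail-open: pass traffic, log, fix next business day.
            log(f"{tier}: control failed, passing traffic: {exc}")
            return call_model(prompt)
        # Fail-closed: no AI traffic passes; switch to the fallback path.
        log(f"{tier}: control failed, switching to fallback: {exc}")
        return fallback(prompt)
    if not allowed:
        # The control worked and rejected the input; the model is never called.
        return fallback(prompt)
    return call_model(prompt)
```

Keeping the posture in one lookup table, rather than scattered `if` statements, means a tier change updates every control's failure behaviour at once.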
Fallback path:
| Tier | Fallback | Speed | Maintenance |
|---|---|---|---|
| Fast Lane | Manual process (already exists) | Hours | Near zero |
| Tier 1 | Manual process (documented) | Hours | Near zero |
| Tier 2 | Rule-based / templated | Minutes (auto) | Quarterly |
| Tier 3 | Staffed parallel process | Seconds (auto) | Monthly |
4. Agentic Degradation Path
If deploying an agent, define these five phases before go-live:
| Phase | Autonomy | What Changes |
|---|---|---|
| Normal | Full | All controls active |
| Constrained | Reduced | Read-only tools, tightened thresholds, all outputs reviewed |
| Supervised | Propose only | Human approves every action |
| Bypassed | Isolated | Non-AI fallback active, agent quarantined |
| Full Stop | None | All sessions terminated, incident response |
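The five phases form an ordered ladder, which maps naturally onto an ordered enum. A sketch under stated assumptions: `Phase`, `degrade`, `recover` and `may_execute_write` are hypothetical names; degradation may jump straight to any lower-autonomy phase, while recovery steps back up one phase at a time (an assumed policy, not one the table specifies). Write permissions follow the table literally: read-only when Constrained, human-approved only when Supervised.

```python
from enum import IntEnum


class Phase(IntEnum):
    # Higher value = less autonomy.
    NORMAL = 0
    CONSTRAINED = 1
    SUPERVISED = 2
    BYPASSED = 3
    FULL_STOP = 4


def degrade(current: Phase, target: Phase) -> Phase:
    """Move down to `target`, never back towards more autonomy."""
    return max(current, target)


def recover(current: Phase) -> Phase:
    """Step back up one phase (assumed one-step-at-a-time recovery)."""
    return Phase(max(current - 1, Phase.NORMAL))


def may_execute_write(phase: Phase, human_approved: bool = False) -> bool:
    if phase == Phase.NORMAL:
        return True
    if phase == Phase.SUPERVISED:
        return human_approved  # propose only; a human approves each action
    # CONSTRAINED is read-only; BYPASSED and FULL_STOP take the agent out.
    return False
```

Using `max` for degradation means a late, milder alert can never silently undo an earlier, harsher one.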
For each tool the agent uses, answer three questions: Can the action be rolled back? Can it be completed without the agent? Is partial completion dangerous?
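A small go-live gate can enforce that every tool has all three answers recorded. A sketch assuming a per-tool record where `None` means "not yet answered"; `ToolAssessment`, `unanswered` and the tool names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolAssessment:
    name: str
    # None means "not yet answered"; all three must be answered before go-live.
    can_roll_back: Optional[bool] = None
    completable_without_agent: Optional[bool] = None
    partial_completion_dangerous: Optional[bool] = None


def unanswered(tools: list[ToolAssessment]) -> list[str]:
    """Names of tools with at least one open question."""
    return [t.name for t in tools
            if None in (t.can_roll_back, t.completable_without_agent,
                        t.partial_completion_dangerous)]


tools = [ToolAssessment("read_ticket", True, True, False),
         ToolAssessment("issue_refund", can_roll_back=False)]
print(unanswered(tools))  # ['issue_refund']
```

Distinguishing `None` from `False` matters here: "cannot be rolled back" is an answer, while a missing value is a gap that should block go-live.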
4b. Multi-Agent Systems
If deploying multiple agents that communicate, delegate, or act across trust boundaries, single-agent controls are necessary but not sufficient. The MASO Framework adds six control domains on top of the foundation.
| MASO Control | What It Addresses |
|---|---|
| Prompt, Goal & Epistemic Integrity | Injection propagation across agents, goal drift, hallucination amplification, groupthink |
| Identity & Access | Non-Human Identity per agent, no shared credentials, no transitive authority |
| Data Protection | Cross-agent data fencing, DLP on the message bus, memory isolation |
| Execution Control | Sandboxed execution, blast radius caps, LLM-as-Judge gate, interaction timeouts |
| Observability | Decision chain audit, anomaly scoring, drift detection, independent kill switch |
| Supply Chain | AIBOM per agent, signed tool manifests, MCP server vetting |
MASO implementation tiers (separate from the risk tiers in section 1): Tier 1 Supervised (human approves all writes) → Tier 2 Managed (auto-approve low-risk, escalate high-risk) → Tier 3 Autonomous (self-healing PACE, adversarial testing, kill switch).
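The approval rule differs per MASO implementation tier, which a single routing function can express. A sketch with assumptions loudly labelled: `requires_human_approval` is a hypothetical name, reads are assumed not to need approval, "escalate" is assumed to mean escalate to a human, and the Autonomous tier is assumed to rely on its kill switch and adversarial testing rather than per-action approval.

```python
def requires_human_approval(maso_tier: int, is_write: bool,
                            high_risk: bool) -> bool:
    """Route an agent action under a MASO implementation tier (sketch)."""
    if maso_tier not in (1, 2, 3):
        raise ValueError(f"unknown MASO tier: {maso_tier}")
    if not is_write:
        return False  # assumption: reads are not gated by approval
    if maso_tier == 1:
        return True  # Supervised: human approves all writes
    if maso_tier == 2:
        return high_risk  # Managed: auto-approve low-risk, escalate high-risk
    return False  # Autonomous: assumed to rely on kill switch and testing
```

Rejecting unknown tiers with an exception, rather than defaulting to "no approval", keeps a misconfigured agent from silently running unsupervised.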
Key difference from single-agent: PACE extends to agent orchestration. When one agent fails, the system isolates that agent and tightens permissions across the chain — not just within a single model's control layers.
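The isolate-and-tighten step can be sketched over a simple list of agents. Assumptions: `Agent`, `isolate_and_tighten` and the tool names are hypothetical; "isolate" is modelled as quarantining the failed agent and removing all its tools, and "tighten permissions across the chain" is modelled as stripping write tools from every other agent, mirroring the read-only Constrained phase.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    tools: set[str]
    quarantined: bool = False


def isolate_and_tighten(chain: list[Agent], failed: str,
                        write_tools: set[str]) -> None:
    """Quarantine the failed agent and drop write tools across the chain."""
    if failed not in {a.name for a in chain}:
        raise ValueError(f"unknown agent: {failed}")
    for agent in chain:
        if agent.name == failed:
            agent.quarantined = True
            agent.tools = set()  # isolated: no tools at all
        else:
            # Assumption: "tighten" means read-only, as in the Constrained phase.
            agent.tools = agent.tools - write_tools
```

Tightening the whole chain, not just the failed agent, reflects the point above: a compromised agent may already have passed bad instructions downstream.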
5. Test
| Test | Fast Lane | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|---|
| Feature flag / kill switch works | Annual | Annual | Quarterly | Monthly |
| Control layer failure simulation | — | Annual | Quarterly | Monthly |
| Human escalation exercise | — | Annual | Quarterly | Quarterly |
| Full degradation walkthrough | — | — | Semi-annual | Quarterly |
| Non-AI fallback operation | Annual | Annual | Quarterly | Monthly |
| Recovery (step back up) | — | Annual | Quarterly | Monthly |
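The test schedule can be checked automatically against recorded run dates. A sketch assuming the cadences are approximated in days (31, 92, 183 and 366 are assumed conversions, not values from this page); `SCHEDULE`, `CADENCE_DAYS` and `overdue_tests` are hypothetical names. A "—" cell is encoded by leaving the tier out, and a test never run counts as overdue.

```python
from datetime import date, timedelta

# Assumed day counts for each cadence label.
CADENCE_DAYS = {"Monthly": 31, "Quarterly": 92, "Semi-annual": 183, "Annual": 366}

# The test table; tiers marked "—" are simply absent.
SCHEDULE = {
    "Feature flag / kill switch works": {"Fast Lane": "Annual", "Tier 1": "Annual",
                                         "Tier 2": "Quarterly", "Tier 3": "Monthly"},
    "Control layer failure simulation": {"Tier 1": "Annual", "Tier 2": "Quarterly",
                                         "Tier 3": "Monthly"},
    "Human escalation exercise": {"Tier 1": "Annual", "Tier 2": "Quarterly",
                                  "Tier 3": "Quarterly"},
    "Full degradation walkthrough": {"Tier 2": "Semi-annual", "Tier 3": "Quarterly"},
    "Non-AI fallback operation": {"Fast Lane": "Annual", "Tier 1": "Annual",
                                  "Tier 2": "Quarterly", "Tier 3": "Monthly"},
    "Recovery (step back up)": {"Tier 1": "Annual", "Tier 2": "Quarterly",
                                "Tier 3": "Monthly"},
}


def overdue_tests(tier: str, last_run: dict[str, date], today: date) -> list[str]:
    """Tests required at `tier` that were never run or are past their cadence."""
    due = []
    for test, cadences in SCHEDULE.items():
        cadence = cadences.get(tier)
        if cadence is None:
            continue  # not required at this tier
        last = last_run.get(test)
        if last is None or today - last > timedelta(days=CADENCE_DAYS[cadence]):
            due.append(test)
    return due
```

For example, a Fast Lane deployment with no recorded runs is due for exactly the two tests the table requires at that level: the kill switch check and the non-AI fallback check.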
The Six Questions
Every AI deployment must answer these before production:
- What tier is this?
- What controls does it need?
- Fail-open or fail-closed?
- What's the fallback path?
- Has it been tested?
- Is this multi-agent? If yes → apply MASO controls on top of the foundation.
If you can answer all six, you're ready. If you can't, you're not.
AI Runtime Behaviour Security, 2026 (Jonathan Gill).