Sandbox Patterns for Agentic AI

Control Domain: Agentic — Execution Controls
Purpose: Contain the execution environment for agents that generate and run code, interact with file systems, or manipulate infrastructure.
Extends: NET-01 (network zones) and SESS-02 (session isolation) with execution-specific depth.


The Problem

Code-generating agents (coding assistants, data analysis agents, automation agents) don't just produce text — they produce executable code and then run it. This means a prompt injection or model error can result in:

  • Arbitrary code execution on infrastructure the agent has access to.
  • File system access (read, write, delete) beyond the intended scope.
  • Network requests to unintended destinations.
  • Resource exhaustion (CPU, memory, disk, network).
  • Persistent changes that outlive the agent session.

The standard controls (guardrails, tool permissions) are necessary but insufficient for code execution. Code is inherently unconstrained — a single line of Python can do anything the runtime environment permits. The sandbox is what limits what "anything" means.


Control Objectives

ID | Objective | Risk Tiers
SAND-01 | Execute agent-generated code in isolated sandbox environments | All (code-gen agents)
SAND-02 | Restrict sandbox file system access to declared paths | All (code-gen agents)
SAND-03 | Restrict sandbox network access to declared destinations | All (code-gen agents)
SAND-04 | Enforce resource limits on sandbox execution | All (code-gen agents)
SAND-05 | Prevent persistent state from sandbox escaping the session | Tier 2+ (code-gen agents)
SAND-06 | Scan generated code before execution | Tier 2+ (code-gen agents)

SAND-01: Isolated Execution Environments

Agent-generated code must never execute in the same environment as the AI system's infrastructure, backend services, or control plane.

Isolation Levels

Level | Technology | Use Case
Process isolation | Separate process with reduced privileges (seccomp, AppArmor) | Low-risk data analysis, read-only operations
Container isolation | Ephemeral container per execution (Docker, gVisor) | Standard code execution, file manipulation
VM isolation | Separate virtual machine per execution | High-risk code execution, Tier 3+ systems
Remote sandbox | Execution on a separate, disposable host | Maximum isolation, untrusted code execution
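
As a rough illustration of the container isolation level, the sketch below assembles a `docker run` command line for a single throwaway execution. It only builds the argument list and does not start anything. The image name, host path, and uid/gid are assumptions made for the example, and the gVisor option only works on hosts where the `runsc` runtime has been installed.

```python
import shlex


def build_sandbox_command(image: str, code_path: str, use_gvisor: bool = False) -> list[str]:
    """Return a `docker run` argv for one ephemeral, locked-down execution."""
    argv = [
        "docker", "run",
        "--rm",                              # delete the container when it exits
        "--network=none",                    # no network unless explicitly granted
        "--cap-drop=ALL",                    # drop every Linux capability
        "--security-opt=no-new-privileges",  # block setuid-style privilege escalation
        "--user=65534:65534",                # assumed unprivileged uid:gid
    ]
    if use_gvisor:
        argv.append("--runtime=runsc")       # gVisor runtime, if present on the host
    argv += [
        # The generated script is bind-mounted read-only into the container.
        "--mount", f"type=bind,source={code_path},target=/sandbox/main.py,readonly",
        image,
        "python", "/sandbox/main.py",
    ]
    return argv


# Hypothetical image and path, for illustration only.
cmd = build_sandbox_command("python:3.12-slim", "/srv/agent/run-001/main.py")
print(shlex.join(cmd))
```

Building the command as a list rather than a shell string keeps agent-influenced values such as the script path from being interpreted by a shell on the host.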

Selection Criteria

Risk Factor Lower Isolation OK Higher Isolation Required
Code reads data only
Code writes to file system
Code makes network requests
Code installs packages
Code runs user-provided input
Tier 3+ system

SAND-02: File System Restrictions

The sandbox must restrict file system access to explicitly declared paths.

Access Rules

Access Type | Permitted | Implementation
Read | Declared input directories only | Mount specific directories read-only
Write | Declared output directory only | Mount a single output directory read-write
Execute | Pre-installed runtimes only | No package installation without pre-approval
Temp | Sandbox-local temp directory | Mounted as tmpfs, size-limited
System | None | No access to /etc, /var, /proc, system binaries
Other sessions | None | No access to other sandbox instances' file systems
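
One possible way to express the access rules above for a container-based sandbox is as Docker mount flags. This is a minimal sketch, not a complete policy: the function name, the `/input/N` and `/output` targets, and the 64 MB temp size are assumptions made for the example.

```python
from pathlib import PurePosixPath


def build_mount_args(input_dirs: list[str], output_dir: str, tmp_size_mb: int = 64) -> list[str]:
    """Translate declared paths into `docker run` mount flags."""
    for path in [*input_dirs, output_dir]:
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"declared path must be absolute: {path}")

    args = ["--read-only"]  # the container's root file system cannot be modified
    for index, source in enumerate(input_dirs):
        # Read: each declared input directory is mounted read-only.
        args += ["--mount", f"type=bind,source={source},target=/input/{index},readonly"]
    # Write: a single read-write output directory.
    args += ["--mount", f"type=bind,source={output_dir},target=/output"]
    # Temp: sandbox-local tmpfs, size-limited and non-executable.
    args += ["--tmpfs", f"/tmp:rw,noexec,nosuid,size={tmp_size_mb}m"]
    return args


# Hypothetical per-session host paths.
print(build_mount_args(["/srv/sessions/abc/in"], "/srv/sessions/abc/out"))
```

Because nothing from the host is mounted other than the declared directories, host system paths and other sessions' directories are simply absent from the sandbox's view, provided each session is given its own host directories.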

What This Prevents

  • Agent-generated code reading sensitive files from the host or other sessions.
  • Code writing persistent backdoors to the file system.
  • Code modifying system configuration or installing persistent software.
  • Cross-session data leakage via shared file system paths.

SAND-03: Network Restrictions

Sandbox network access must be explicitly constrained.

Default: No Network Access

For most code execution tasks, the sandbox should have no network access by default. The agent's tools handle external communication via the authorization gateway (IAM-04) and egress proxy (NET-04). The sandbox itself doesn't need network access.

When Network Access Is Required

If the code genuinely needs network access (e.g., fetching a dataset from an approved URL), it must be:

  • Restricted to declared destinations (allowlist).
  • Routed through the egress proxy.
  • Protocol-restricted (HTTPS only).
  • Rate-limited.
  • Logged.
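
The allowlist and protocol restrictions above could be checked with something like the following sketch. It is intended to run in the egress proxy, outside the sandbox, since a check inside the sandbox could be bypassed by the untrusted code. The host name is a placeholder, and rate limiting and logging are not shown.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from policy.
ALLOWED_HOSTS = frozenset({"datasets.example.com"})


def is_egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS requests to an exactly matching declared host."""
    parts = urlsplit(url)
    if parts.scheme != "https":          # protocol restriction: HTTPS only
        return False
    try:
        port = parts.port                # raises ValueError on a malformed port
    except ValueError:
        return False
    if port not in (None, 443):          # only the default HTTPS port
        return False
    host = (parts.hostname or "").lower()
    return host in allowed_hosts         # exact match, no suffix or substring matching


print(is_egress_allowed("https://datasets.example.com/data.csv"))
print(is_egress_allowed("https://datasets.example.com.evil.net/data.csv"))
```

Matching on the parsed host name rather than on the raw URL string avoids being fooled by look-alike URLs such as `https://datasets.example.com@evil.net/`.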

What This Prevents

  • Agent-generated code exfiltrating data to attacker-controlled servers.
  • Reverse shells or C2 channels from within the sandbox.
  • The sandbox being used as a network pivot to attack internal systems.
  • Cryptocurrency mining or other resource abuse via network access.

SAND-04: Resource Limits

Without resource limits, agent-generated code can cause denial of service through resource exhaustion.

Limits

Resource | Limit | Enforcement
Execution time | Maximum wall-clock time per execution (e.g., 60 seconds) | Kill process on timeout
Memory | Maximum memory allocation (e.g., 512MB) | OOM-kill on breach
Disk | Maximum disk usage in output directory (e.g., 100MB) | Write failure on breach
Processes | Maximum process/thread count (e.g., 10) | Fork failure on breach
File descriptors | Maximum open files (e.g., 100) | Open failure on breach
Output size | Maximum output returned to the agent (e.g., 1MB) | Truncate on breach

Enforcement

Use OS-level resource controls (cgroups, ulimits) rather than application-level checks. The code being executed is untrusted — application-level limits can be circumvented.
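
For the process isolation level, a minimal sketch of ulimit-style enforcement using Python's standard `resource` module might look like this. It assumes a Linux host and reuses the example values from the limits table. Two caveats: `RLIMIT_FSIZE` caps the size of any single file rather than total disk usage, and `RLIMIT_NPROC` is counted per user ID, so a size-limited mount and cgroups are still needed for the disk and process limits proper. The output cap here is applied after capture, so a production version would stream and cut off output instead.

```python
import resource
import subprocess
import sys

# Example values from the limits table; tune per deployment.
LIMITS = {
    resource.RLIMIT_CPU: 60,                   # CPU seconds
    resource.RLIMIT_AS: 512 * 1024 * 1024,     # address space, bytes
    resource.RLIMIT_FSIZE: 100 * 1024 * 1024,  # largest single file, bytes
    resource.RLIMIT_NPROC: 10,                 # processes/threads for this user
    resource.RLIMIT_NOFILE: 100,               # open file descriptors
}
WALL_CLOCK_SECONDS = 60
MAX_OUTPUT_BYTES = 1024 * 1024


def _apply_limits() -> None:
    """Runs in the child between fork and exec, so the limits bind the untrusted code."""
    for res, value in LIMITS.items():
        _, hard = resource.getrlimit(res)
        if hard != resource.RLIM_INFINITY:
            value = min(value, hard)  # an unprivileged process cannot raise a hard limit
        resource.setrlimit(res, (value, value))


def run_limited(argv: list[str]) -> tuple[int, bytes]:
    """Run argv under the limits; raises subprocess.TimeoutExpired on wall-clock timeout."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        preexec_fn=_apply_limits,
        timeout=WALL_CLOCK_SECONDS,   # the child is killed when this expires
        check=False,
    )
    return proc.returncode, proc.stdout[:MAX_OUTPUT_BYTES]


if __name__ == "__main__":
    print(run_limited([sys.executable, "-c", "print('hello from the sandbox')"]))
```

The limits are set by the kernel on the child process, so the executed code cannot lift them from inside; the parent only supplies the wall-clock timeout and the output cap.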


SAND-05: No Persistent State Escaping Sessions

Code execution within a sandbox must not create persistent state that survives the session.

Requirements

  • Sandbox environments are ephemeral — created at execution start, destroyed at execution end.
  • Output files are returned to the agent via the authorized path, not left on a shared file system.
  • No installed packages, modified configurations, or created users persist beyond the execution.
  • Environment variables, process state, and temporary files are destroyed.
  • For container-based sandboxes: containers are created from a clean image per execution, never reused.
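
The requirements above can be sketched as a per-execution workspace that is created fresh, handed to the sandbox, and destroyed afterwards, with only the declared outputs copied back. The `execute` callable is a stand-in for whatever actually launches the sandbox, and the `output` directory name is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_in_ephemeral_workspace(execute: Callable[[Path], None]) -> dict[str, bytes]:
    """Create a fresh workspace, run one execution in it, and return only its outputs."""
    with tempfile.TemporaryDirectory(prefix="sandbox-") as tmp:
        workspace = Path(tmp)
        out_dir = workspace / "output"
        out_dir.mkdir()

        execute(workspace)  # hypothetical hook: launches the sandbox against this workspace

        # Copy out regular files only; symlinks are skipped so that generated
        # code cannot point an "output" at a file elsewhere on the host.
        outputs = {
            p.name: p.read_bytes()
            for p in out_dir.iterdir()
            if p.is_file() and not p.is_symlink()
        }
    # Leaving the `with` block deletes the workspace, including temp files.
    return outputs


def demo(workspace: Path) -> None:
    (workspace / "output" / "result.txt").write_text("42")
    (workspace / "scratch.tmp").write_text("discarded with the workspace")


print(run_in_ephemeral_workspace(demo))
```

Returning the outputs as data, rather than leaving them on disk, matches the requirement that results reach the agent through an authorized path and not through a shared file system.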

What This Prevents

  • An attacker using prompt injection to install a persistent backdoor in the execution environment.
  • Cross-execution contamination (poisoned output from execution N affecting execution N+1).
  • Accumulated state creating a growing attack surface over time.

SAND-06: Pre-Execution Code Scanning

Before agent-generated code is executed, scan it for dangerous patterns.

Scanning Targets

Pattern | Risk | Action
Network calls (requests, urllib, socket, fetch) | Data exfiltration | Block unless network access explicitly permitted
File system access outside declared paths | Unauthorised read/write | Block
Subprocess/shell execution (os.system, subprocess, exec) | Sandbox escape | Block or flag for review
Package installation (pip install, npm install) | Supply chain attack | Block unless pre-approved
Encoded/obfuscated code | Evasion attempt | Flag for review
Resource-intensive patterns (infinite loops, fork bombs) | DoS | Flag, rely on resource limits as backup
Credential patterns in code | Credential exposure | Redact and flag
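
For Python code, a few of these patterns can be detected by walking the syntax tree rather than matching text. The sketch below is deliberately small and covers only imports and a handful of direct calls; the blocked lists are illustrative assumptions, and dynamic tricks such as `__import__("socket")` or `getattr` lookups will slip past it.

```python
import ast
from typing import Optional

# Illustrative lists only; a real policy would be broader and configurable.
BLOCKED_MODULES = {"socket", "requests", "urllib", "subprocess"}
BLOCKED_CALLS = {"os.system", "exec", "eval"}


def _dotted_name(func: ast.expr) -> Optional[str]:
    """Return 'name' or 'module.attr' for simple call targets, else None."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return None


def scan_code(source: str) -> list[str]:
    """Return a list of findings; an empty list means nothing was flagged."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]

    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            root = module.split(".")[0]
            if root in BLOCKED_MODULES:
                findings.append(f"line {node.lineno}: import of {root}")
        if isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            if name in BLOCKED_CALLS:
                findings.append(f"line {node.lineno}: call to {name}")
    return findings


print(scan_code("import socket\nimport os\nos.system('id')\n"))
```

A scanner like this gives an early, cheap signal and a useful audit trail, but it should be treated as advisory next to the sandbox controls.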

Limitations

Code scanning catches known dangerous patterns but is inherently incomplete — the sandbox resource limits and isolation are the primary controls. Code scanning is defence in depth, not a replacement for sandboxing.


Platform-Neutral Implementation Checklist

  • All agent-generated code executes in isolated sandbox environments
  • Isolation level selected based on risk tier and code capabilities
  • File system access restricted to declared input/output paths
  • Default: no network access from sandbox
  • Network access (when required) allowlisted, proxied, and logged
  • Resource limits enforced at OS level (CPU, memory, disk, processes)
  • Sandbox environments ephemeral — no persistent state across executions
  • Pre-execution code scanning for dangerous patterns
  • Sandbox execution logged with code, output, resource usage, and duration
  • Sandbox escape attempts detected and classified as security incidents
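
The logging item in the checklist could be backed by a structured record along these lines. The field names and the choice of peak resident memory as the resource metric are assumptions made for this sketch; the checklist itself only asks for code, output, resource usage, and duration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SandboxExecutionRecord:
    session_id: str
    code: str
    code_sha256: str        # lets identical code be correlated across sessions
    exit_code: int
    duration_s: float
    max_rss_kb: int         # assumed resource metric: peak resident memory
    output: str
    output_truncated: bool


def make_record(session_id: str, code: str, exit_code: int, duration_s: float,
                max_rss_kb: int, output: bytes,
                max_output_bytes: int = 1024 * 1024) -> SandboxExecutionRecord:
    """Build a log record, truncating output to the configured cap."""
    return SandboxExecutionRecord(
        session_id=session_id,
        code=code,
        code_sha256=hashlib.sha256(code.encode("utf-8")).hexdigest(),
        exit_code=exit_code,
        duration_s=duration_s,
        max_rss_kb=max_rss_kb,
        output=output[:max_output_bytes].decode("utf-8", errors="replace"),
        output_truncated=len(output) > max_output_bytes,
    )


def to_log_line(record: SandboxExecutionRecord) -> str:
    """Serialize as one JSON object per line for log shipping."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical session identifier and measurements.
print(to_log_line(make_record("session-001", "print(1)", 0, 0.25, 10240, b"1\n")))
```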

AI Runtime Behaviour Security, 2026 (Jonathan Gill).