When OpenAI launched its plugin marketplace in March 2023, third-party developers could register endpoints that ChatGPT would call on a user's behalf. Security researchers at Embrace The Red and independent testers quickly discovered that a malicious website could embed instructions in its content that ChatGPT's browsing plugin would read — causing the model to silently exfiltrate conversation data to an attacker-controlled URL. The plugin had no mechanism to distinguish between data it was supposed to retrieve and commands embedded inside that data.
This was the first large-scale, publicly documented demonstration that indirect prompt injection through plugins was a practical, not theoretical, threat. OpenAI eventually restricted the browsing plugin and overhauled plugin permissions before the general release of GPT-4 Plugins.
A plugin (also called a tool in many frameworks) is a function the LLM can invoke during inference. The model receives a list of available tools with their schemas, decides whether to call one, constructs the call arguments from natural language, receives the result, and continues generating. In practice this covers: web search, code execution, database queries, email/calendar access, payment APIs, file system operations, and custom enterprise integrations.
The architectural pattern is straightforward — but it fundamentally changes the threat model. A model that only generates text can cause harm through its output. A model that can act — send an email, run SQL, commit code, transfer funds — can cause harm at machine speed with no human review in the loop.
OWASP's LLM Top 10 (2025 edition) lists Excessive Agency at LLM06 and Insecure Plugin Design as a distinct entry. The core concern: plugins are often designed to be maximally capable for legitimate users, which means they are also maximally dangerous when the model is manipulated. The three failure modes OWASP identifies are:
When you are pen testing a system with LLM plugin access, enumerate the full attack surface before attempting exploitation. The surface has five distinct layers:
When you first encounter an LLM with plugin access, spend time enumerating all registered tools before attempting any injection. Request the system prompt if possible — many systems expose it via "what tools do you have?" or "describe your capabilities." The tool schema is your target map.
Modern LLM deployments attach plugins in several ways. Each has different pen testing entry points:
OpenAI Function Calling / Tool Use — arguments are JSON, validated against a schema. Test for type coercion attacks (passing a string where an integer is expected), schema boundary violations, and argument injection through retrieved context.
LangChain / LlamaIndex Agents — open-source frameworks that wrap tools as Python callables. Often deployed without the schema validation that hosted APIs enforce. The tool's docstring becomes the model's understanding of what the tool does — manipulate the docstring (if you have access) or the input that triggers tool selection.
Semantic Kernel / Microsoft Copilot Extensions — enterprise-oriented, with plugins that have access to M365 data including email, calendar, SharePoint, and Teams. Compromise here has significant data access implications.
Anthropic Tool Use (Claude) — XML-tagged tool calls, strict schema. The model is more resistant to producing malformed tool calls but indirect injection through retrieved documents remains viable.
The model is not the security boundary. The plugin is. If the plugin executes whatever the model sends without validation, then any attack that influences the model's output — prompt injection, jailbreak, indirect injection — becomes an attack on the downstream system the plugin connects to.
You are beginning a pen test engagement on an enterprise LLM deployment described as a "sales assistant" with plugin access to CRM, email, calendar, and a web search tool. Your task is to enumerate the attack surface before attempting any exploitation.
Use the AI assistant below to work through: what questions you would ask the system to enumerate its tools, what schemas you'd look for, and what initial risk signals each tool type presents. Aim for at least 3 exchanges.
In a 2023 paper titled "Compromising LLM-Integrated Applications with Indirect Prompt Injections," researchers Greshake et al. demonstrated that an LLM agent with database access could be manipulated through retrieved document content to execute SQL queries beyond its intended scope. The attack did not require the attacker to have any direct access to the application — injecting adversarial text into a document the agent would later retrieve was sufficient to trigger unauthorized data exfiltration via the model's database plugin.
A separate demonstration by security researcher Johann Rehberger in 2024 showed that Microsoft 365 Copilot could be induced to exfiltrate SharePoint file contents through crafted email content — the injected instructions arrived via the email plugin's retrieved data, then redirected the model to call the file-access plugin with exfiltration parameters.
Classic injection attacks (SQLi, command injection, path traversal, SSRF) require the attacker to place malicious input into a field that gets interpreted by a backend system. With LLM plugins, the attack chain gains an extra step — but also gains new possibilities:
LLM plugins that translate natural language to SQL (text-to-SQL plugins) are particularly vulnerable. The model generates a SQL query string from user intent — but if user input contains SQL syntax fragments, the model may incorporate them into the generated query, especially if the model is not explicitly instructed to parameterize queries.
Plugins that read or write files from a user-specified path are vulnerable when the model passes path arguments constructed from user input without sanitization. A file-read plugin designed to retrieve project documents can be redirected to read /etc/passwd or AWS credential files if the model is manipulated to construct that path.
Testing approach: ask the model to "read the file at ../../etc/passwd" or embed that path reference in retrieved content. Observe whether the model passes the literal path to the plugin or sanitizes it. Many implementations do not sanitize.
Web browsing plugins fetch URLs on behalf of the user. If the plugin does not validate that the URL points to a public resource, an attacker can direct it to internal network endpoints — cloud metadata services, internal APIs, or admin dashboards not exposed to the internet.
In 2023, the Embrace The Red research demonstrated that ChatGPT's browsing plugin could be directed to fetch http://169.254.169.254/ before OpenAI added URL allowlisting. The cloud metadata endpoint would return AWS credentials in environments where ChatGPT ran on EC2 instances. OpenAI patched this with URL validation controls.
Code interpreter plugins (ChatGPT's Code Interpreter, Jupyter-backed agents, custom Python execution environments) present the highest-severity injection surface. The model generates code that the plugin executes. Injection here does not require classic string injection — it requires convincing the model to write and execute malicious code.
=cmd|' /C calc'!A0 — model reads and processes it, triggering execution in some environments.Approach plugin injection testing in a structured sequence. Do not jump directly to exotic attacks — start with the basics, which are often undefended:
Many developers assume the LLM "knows better" than to pass dangerous strings to plugins. This is wrong — the model has no inherent understanding of what constitutes a dangerous backend argument. Validation must happen in the plugin code, not in the model's reasoning.
You are testing an LLM agent that has three plugins: a text-to-SQL database query tool, a web browsing tool, and a file reader. Your goal is to construct proof-of-concept injection attack chains for each plugin type and analyze how indirect injection could chain them together.
Work with the assistant to develop specific payload examples and explain the complete attack chain for each vector. Complete at least 3 substantive exchanges.
In March 2024, security researcher Johann Rehberger published a documented attack against Microsoft 365 Copilot demonstrating a full cross-plugin exploitation chain. An attacker sent a target a crafted email containing adversarial instructions embedded in the body text. When the victim used Copilot to summarize their email, Copilot's email plugin retrieved the message — and the embedded instructions directed the model to call the SharePoint file access plugin with parameters that exfiltrated the victim's sensitive documents. The data was then encoded in a URL and the model was instructed to render it as a hyperlink the victim would click — delivering the exfiltrated content to the attacker's server via a simple click.
This attack required no malware, no credentials, and no direct access to the victim's tenant. A single crafted email, combined with the victim's legitimate Copilot use, was sufficient. Microsoft acknowledged the research and implemented mitigations, though the underlying architecture remained subject to the same class of attack.
When an LLM agent has access to multiple plugins, compromising the data flow through one plugin can enable access to others. This is the LLM equivalent of lateral movement in traditional network pen testing — but it happens inside the agent's reasoning loop, mediated by natural language rather than network packets.
The attack pattern is: exploit a low-privilege plugin to inject instructions, which redirect the model to call a high-privilege plugin with attacker-controlled parameters. The model itself becomes the lateral movement vehicle.
Several distinct privilege escalation patterns emerge in multi-plugin systems. Each has a different exploitation path and different defensive requirements:
Once the model has been redirected to access sensitive data, it needs a channel to deliver it to the attacker. In multi-plugin systems, multiple exfiltration channels exist simultaneously:
To test for cross-plugin exploitation, construct a matrix of all plugin pairs and evaluate which combinations could enable privilege escalation. For each pair, test:
In the Rehberger M365 research, the full chain was: email plugin read → context hijack → SharePoint file plugin read → markdown URL exfiltration. Map this pattern explicitly in your report: each arrow is a control failure. "The model called a second plugin with attacker-controlled parameters" should appear in your finding description with the exact plugin names and parameter values observed.
If the agent has a memory or knowledge-base plugin with write access, cross-plugin exploitation can achieve persistence. A single successful injection can store adversarial instructions in the agent's memory — ensuring that every future session begins with the attacker's instructions already in context.
This is the LLM equivalent of achieving persistence on a compromised host. Test for it explicitly: after injecting instructions that write to memory, reset the conversation and verify whether the agent's behavior in the new session reflects the injected memory content.
A single injection vulnerability in a multi-plugin system with memory access is a critical finding, not a medium. The combination of cross-plugin lateral movement and persistent memory poisoning means the initial injection can affect all future users and sessions — the blast radius extends far beyond the initial attacker interaction.
You are pen testing an enterprise LLM agent with five plugins: email read/write, calendar read/write, SharePoint file access (read), a web search plugin, and a persistent memory store (read/write). You need to build a cross-plugin exploitation matrix and identify the highest-severity chains.
Work with the assistant to construct specific attack chains, identify the most dangerous plugin combinations, and draft the finding descriptions you would include in a pen test report. Complete at least 3 exchanges.
In 2023, NVIDIA released Garak, an open-source LLM vulnerability scanner designed to systematically probe for prompt injection, data exfiltration, and plugin abuse vulnerabilities. The framework's approach — treating LLM security testing as a systematic, repeatable process rather than ad hoc probing — reflected the industry's recognition that manual testing alone was insufficient for production deployments with multiple plugin integrations.
The OWASP LLM Security Top 10 working group incorporated plugin-specific test cases into its guidance following documented incidents, establishing that plugin defense testing must be part of every LLM security assessment, not an optional add-on. Frameworks like Garak demonstrated that automated probing for plugin injection could surface vulnerabilities that manual testing missed — particularly in text-to-SQL and file system plugins where the attack surface is broad.
Before testing whether controls are effective, you need to know what controls should exist. A securely designed plugin system implements defense at four distinct layers:
Verify that each plugin's permission scope matches its stated function. The test is simple: attempt actions that exceed the plugin's stated scope and verify they fail at the plugin layer, not just in the model's reasoning.
For each plugin, construct a test payload set targeting the parameter types it accepts. Evaluate whether validation is happening at the model level (insufficient — bypassable via injection) or at the plugin code level (required):
For plugins that perform irreversible actions, verify that the control actually triggers and cannot be bypassed through the model's reasoning. Test three bypass vectors:
As a pen tester, your deliverable is not just the vulnerability list — it is actionable remediation guidance. For plugin security findings, the standard recommendations align with OWASP's LLM06/LLM07 guidance:
A plugin vulnerability finding in a pen test report should contain: the exact plugin name and parameter affected, the injection payload used, the model-generated plugin call that resulted, the backend action that executed, and the data or capability accessed. Include the full chain — do not simply report "prompt injection possible." Quantify the blast radius.
A high-quality plugin finding reads: "By embedding the payload [X] in a retrieved email, the model called the SharePoint plugin with arguments {path: '../../hr/salaries.xlsx'}, returning salary data for 847 employees. The plugin performed no path validation. Combined with the email send plugin, this constitutes a zero-click data exfiltration chain against any user who asks Copilot to summarize their email." That is a critical finding. "Prompt injection may be possible in email processing" is not a finding — it is an observation."
Your pen test engagement is nearing completion. You've identified two plugin vulnerabilities: (1) a text-to-SQL plugin that concatenates model-generated strings into raw SQL, and (2) a file-read plugin with no path restriction. The development team claims they've added "LLM guardrails" (system prompt instructions) as mitigations. You need to test whether those mitigations are sufficient and draft the findings for your report.
Work with the assistant to: evaluate whether the proposed mitigations are adequate, design bypass tests, and draft a complete finding for one of the vulnerabilities. Complete at least 3 exchanges.