In 2023, security researchers auditing a healthcare SaaS platform discovered that roughly 40% of the database-access functions generated by GitHub Copilot contained unsanitized string concatenation in SQL queries. The platform had accepted Copilot suggestions wholesale during a sprint. None of the generated code was malicious — the model was simply completing patterns it had seen thousands of times in legacy tutorials, Stack Overflow answers, and pre-2010 PHP codebases. The fix required three weeks of remediation across 200+ endpoints.
This is the core problem: AI models are trained on the entire history of the internet, including all the bad patterns that predated parameterized queries.
Injection flaws, SQL injection chief among them, have appeared in every edition of the OWASP Top 10 since the first list in 2003. Yet SQL injection remains the most common injection vulnerability in AI-generated code for a specific structural reason: the model does not distinguish between "code that appears in a tutorial to illustrate a concept" and "code that should be used in production." Both are training data.
Large language models learn to complete patterns. When a developer writes query = "SELECT * FROM users WHERE id = " +, the model's most statistically likely completion is the string concatenation pattern, because that is what appears most frequently in its training corpus — regardless of whether those examples were warning labels or recommendations.
The consequence is measurable: a study by NYU researchers (Pearce et al., "Asleep at the Keyboard?", presented at IEEE S&P 2022) found that when Copilot was prompted with security-sensitive scenarios, roughly 40% of the generated programs contained at least one vulnerability. SQL injection was the dominant finding in database interaction code.
Injection vulnerabilities occur when untrusted data is sent to an interpreter as part of a command or query. SQL injection allows attackers to manipulate queries to bypass authentication, extract data, modify records, or execute administrative operations. The attack has been exploited in breaches affecting Heartland Payment Systems (134 million cards, 2008), Sony Pictures (2011), and numerous others.
When auditing AI-generated database code, watch for three distinct vulnerable patterns:
The critical audit signal is any SQL string construction that involves a variable. Even seemingly safe patterns like whitelist checking before concatenation are dangerous if implemented incorrectly — and AI models frequently generate incomplete whitelist logic.
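A minimal sketch of that audit signal, using Python's built-in sqlite3 module. The table, column names, and helper functions are hypothetical, chosen only for illustration. The concatenated form is shown; the f-string and .format() forms behave the same way because they also splice the value into the SQL text.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("alice@example.com",), ("bob@example.com",)],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, email: str) -> list:
    # VULNERABLE: the value becomes part of the SQL text itself.
    # f"... WHERE email = '{email}'" and "... '{}'".format(email) are equivalent.
    query = "SELECT id, email FROM users WHERE email = '" + email + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, email: str) -> list:
    # SAFE: the value travels separately from the SQL text as a bound parameter.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, payload)  # the OR clause matches every row
safe_rows = find_user_safe(conn, payload)      # the payload is compared as a literal
```

With this payload the concatenated query returns every user, while the parameterized query returns nothing, because no stored email equals the payload string.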
A subtler problem: AI models almost never generate protections against second-order injection. In this pattern, malicious data is safely stored (correctly parameterized) but then retrieved and used unsafely in a later query — usually because the developer assumes data from the database is already trusted.
In a 2021 analysis by Trail of Bits, second-order injection accounted for approximately 12% of all SQL injection findings in code review engagements. AI code generators routinely produce this pattern when generating multi-step workflows like password reset flows or user profile updates, where data is read from one table and written to another.
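The second-order pattern can be reproduced in a few lines. This is a sketch under stated assumptions: the users and audit_log tables and both logging helpers are hypothetical, and sqlite3 stands in for whatever database the application uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute(
    "CREATE TABLE audit_log (id INTEGER PRIMARY KEY, actor TEXT, action TEXT)"
)

# Step 1: the hostile value is stored correctly, with a bound parameter.
malicious_name = "x', 'granted_admin') --"
conn.execute("INSERT INTO users (username) VALUES (?)", (malicious_name,))


def stored_username(conn: sqlite3.Connection, user_id: int) -> str:
    row = conn.execute(
        "SELECT username FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


def log_action_unsafe(conn: sqlite3.Connection, user_id: int, action: str) -> None:
    # VULNERABLE: the value came from the database, so it is wrongly treated as trusted.
    actor = stored_username(conn, user_id)
    conn.execute(
        "INSERT INTO audit_log (actor, action) VALUES ('" + actor + "', '" + action + "')"
    )


def log_action_safe(conn: sqlite3.Connection, user_id: int, action: str) -> None:
    # SAFE: stored data is parameterized exactly like fresh user input.
    actor = stored_username(conn, user_id)
    conn.execute(
        "INSERT INTO audit_log (actor, action) VALUES (?, ?)", (actor, action)
    )


log_action_unsafe(conn, 1, "viewed_profile")
log_action_safe(conn, 1, "viewed_profile")
rows = conn.execute("SELECT actor, action FROM audit_log ORDER BY id").fetchall()
```

The first log row records an action the application never performed: the stored username closes the string literal, supplies its own action value, and comments out the rest of the statement. The second row stores the username verbatim alongside the real action.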
1. Search for any SQL string containing a variable reference (grep -E: query.*\+|f".*SELECT|execute.*format). The plus sign is escaped so that it matches a literal concatenation operator.
2. Verify every cursor.execute() call uses a parameter tuple, never string formatting.
3. Trace data flows from external input through storage and back into queries.
4. Check ORM raw() and execute() escape hatches, since AI often uses these when the ORM cannot express the intended query.
You are auditing a Python Flask application. The AI assistant will present you with AI-generated code snippets. Your job is to identify injection vulnerabilities, explain the risk, and ask the assistant to produce safe alternatives. Practice your audit methodology by interrogating the code samples provided.
The Log4Shell vulnerability (CVE-2021-44228), disclosed in December 2021, demonstrated what happens when logging frameworks process untrusted input through an interpreter chain. While Log4Shell itself was not AI-generated, security researchers studying developer responses to the vulnerability found that AI coding assistants — including Copilot and Tabnine — frequently suggested subprocess.call(command, shell=True) patterns when developers asked for logging and notification utilities, creating new command injection pathways in remediation code written during the incident response rush.
The irony: developers patching one injection class were inadvertently introducing another via AI-assisted tooling under time pressure.
Python's subprocess module is the most common vector for command injection in AI-generated code. The shell=True parameter instructs Python to pass the command string to the operating system's shell interpreter (/bin/sh on Unix), which then performs variable expansion, pipe processing, and command chaining. This turns semicolons, pipes, backticks, and dollar signs into shell metacharacters: semicolons and pipes chain commands, while backticks and $(...) run nested commands.
AI models generate shell=True patterns for a simple reason: many beginner tutorials and examples use it because it allows passing a full command string rather than a list, which looks simpler. The model has seen this pattern thousands of times and completes it naturally.
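The list form that avoids the shell can be sketched as follows. The run_tool helper is hypothetical, and a child Python process that echoes its first argument stands in for a real external tool so the example runs anywhere.

```python
import subprocess
import sys


def run_tool(user_value: str) -> str:
    """Pass user_value to a child process as a single argv entry, with no shell."""
    # Stand-in for an external tool: a child Python process that prints argv[1].
    argv = [sys.executable, "-c", "import sys; print(sys.argv[1])", user_value]
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout.rstrip("\r\n")


# The unsafe equivalent, shown only as a comment:
#   subprocess.call("convert " + user_value, shell=True)
# With shell=True, the "; echo INJECTED" suffix below would run as a second command.

hostile = "report.txt; echo INJECTED"
echoed = run_tool(hostile)  # the child receives the whole string as one argument
```

Because no shell parses the string, the semicolon has no special meaning and the child process sees the hostile value as plain data.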
Server-side template injection (SSTI) is a form of injection that AI models generate with remarkable frequency in Flask and Django code. When developers ask AI assistants to "render dynamic content" or "create an email template," the model often generates code that passes user-controlled data directly into the template engine's evaluation context.
The real-world consequence of SSTI in Jinja2 (Flask's default) can be remote code execution — a complete system compromise. In 2016, HackerOne disclosed an SSTI vulnerability in Uber's systems that allowed full server takeover via Jinja2 template evaluation. The payload {{7*7}} is the canonical test; if the server returns 49, template evaluation is occurring on user input.
Veracode's 2022 State of Software Security report noted a 300% increase in template injection findings between 2020 and 2022, correlating with increased AI assistant adoption. The report specifically identified render_template_string misuse as a frequent finding in Flask applications developed with AI assistance.
Command injection also appears in AI-generated code through unsafe deserialization — particularly pickle.loads() on user-supplied data. Python's pickle module can execute arbitrary code during deserialization. AI models generate this pattern when asked for "cache this object" or "store session data" scenarios, frequently suggesting pickle for convenience.
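A harmless demonstration of why pickle.loads() is unsafe on untrusted bytes, followed by a JSON-based alternative. The Payload class and load_session helper are hypothetical. The payload only evaluates an arithmetic expression; a real attacker would choose a far more damaging callable.

```python
import json
import pickle


class Payload:
    """An object whose pickled form tells the loader to call eval()."""

    def __reduce__(self):
        # pickle.loads() will call eval("6 * 7") while reconstructing this object.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # the byte stream, not the application, chose what ran


def load_session(raw: bytes) -> dict:
    """Parse session data as JSON, which can describe data but cannot trigger calls."""
    data = json.loads(raw)  # raises ValueError on anything that is not valid JSON
    if not isinstance(data, dict):
        raise ValueError("session payload must be a JSON object")
    return data


session = load_session(b'{"user_id": 7, "theme": "dark"}')
```

The value returned by pickle.loads() is the result of the attacker-selected call rather than a Payload instance. Passing the same pickled bytes to load_session raises ValueError instead of executing anything.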
1. Grep for: shell=True, os.system, os.popen, eval(, exec(, render_template_string, pickle.loads, yaml.load( (not safe_load).
2. For every hit, trace backward to determine if any variable in the call has a path to user-controlled input.
3. For template rendering, verify all user data is passed as context variables to static template files, never interpolated into template strings.
4. Check Celery tasks and background workers, since AI-generated async code frequently inherits these patterns.
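The first step of that checklist can also be approximated with Python's ast module, which avoids false hits inside comments and strings. This is a rough sketch, not a replacement for a real static analyzer: the RISKY_CALLS set and the audit_source function are assumptions of this example, and it only matches call names exactly as they are written in the source, so aliased imports are missed.

```python
import ast

# Call names taken from the checklist above, matched exactly as written in source.
RISKY_CALLS = {
    "os.system", "os.popen", "eval", "exec",
    "pickle.loads", "render_template_string", "yaml.load",
}


def dotted_name(node: ast.AST) -> str:
    """Return 'pkg.func' for simple Name/Attribute chains, else a best-effort tail."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else node.attr
    return ""


def audit_source(source: str) -> list:
    """Return sorted (line number, description) pairs for risky calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name in RISKY_CALLS:
            findings.append((node.lineno, name))
        for kw in node.keywords:
            is_true = isinstance(kw.value, ast.Constant) and kw.value.value is True
            if kw.arg == "shell" and is_true:
                findings.append((node.lineno, f"{name} with shell=True"))
    return sorted(findings)


SAMPLE = '''\
import pickle
import subprocess

def convert(path):
    subprocess.call("convert " + path, shell=True)

def restore(blob):
    return pickle.loads(blob)
'''

findings = audit_source(SAMPLE)
```

Each finding is only a starting point for step 2: the auditor still has to trace whether the flagged call can receive user-controlled input.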
You are reviewing a Python web application that uses Flask and calls external system tools for file conversion and email rendering. The AI assistant has access to the codebase. Practice identifying shell=True patterns, template injection risks, and unsafe deserialization in the AI-generated code samples presented.
The British Airways breach of 2018, which exposed 500,000 customers' payment details and resulted in a £20 million ICO fine, began with a 22-line JavaScript skimmer injected into the booking page. The Magecart group exploited a stored XSS pathway to insert script that exfiltrated form data to a lookalike domain. That attack did not involve AI-generated code, but security researchers reviewing similar e-commerce platform code in 2022 and 2023 consistently found that AI coding assistants produced innerHTML-based rendering patterns in payment form components: the same script-injection vector that Magecart exploits.
The pattern persists because it is common in tutorials. The consequence is not academic: it is the mechanism behind the most financially damaging web attacks of the past decade.
innerHTML is the most frequently generated XSS vector in AI-produced JavaScript. When a developer asks an AI assistant to "display user comments," "render search results," or "show profile information," the model typically completes with a pattern like element.innerHTML = data because this is the dominant pattern in its training corpus — it works, it is concise, and the vast majority of examples in tutorials do not include sanitization.
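The security distinction that AI models consistently miss: innerHTML parses its input as HTML, so attacker-supplied markup becomes live DOM (a script element inserted this way does not run, but an event-handler attribute such as onerror on an img element does), while textContent sets the text node value without parsing. The difference is the entire XSS attack surface.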
React deliberately named the prop dangerouslySetInnerHTML as a warning signal. Despite this explicit naming, AI models generate it routinely — often without the DOMPurify sanitization layer that makes it safe. In a 2023 analysis by Snyk of 100 AI-generated React components that handled user-generated content, 23 used dangerouslySetInnerHTML and of those, only 4 included any sanitization whatsoever.
The model generates it because developers genuinely need to render formatted content (markdown, HTML emails, rich text) and dangerouslySetInnerHTML is the correct React mechanism — when combined with sanitization. The AI consistently omits the sanitization step.
AI models generating URL parameter processing code — for redirects, search queries, and referral tracking — frequently create DOM-based XSS pathways. The pattern involves reading location.search or location.hash and writing values to the DOM without validation.
This class of XSS does not appear in server logs (the payload never reaches the server) and is not caught by WAFs inspecting HTTP request bodies. It requires JavaScript-specific auditing tooling or manual code review.
A properly configured Content Security Policy limits the damage of XSS even when injection occurs, by blocking inline script execution and scripts loaded from unapproved origins. It is a second layer of defense, not a substitute for output encoding. AI models almost never generate CSP headers. Auditors should flag any web application without a CSP as missing an essential defense layer, regardless of injection findings in the code itself.
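As an illustration of what a restrictive policy might look like, the sketch below assembles a Content-Security-Policy header value with a per-response nonce. The build_csp helper and the specific directives are assumptions of this example, not a recommended policy for any particular application. Attaching the value to a response is framework-specific and omitted here.

```python
import secrets


def build_csp(nonce: str) -> str:
    """Assemble a Content-Security-Policy header value (hypothetical policy)."""
    directives = {
        "default-src": ["'self'"],
        # Only same-origin scripts and inline scripts carrying this nonce may run.
        "script-src": ["'self'", f"'nonce-{nonce}'"],
        "object-src": ["'none'"],
        "base-uri": ["'none'"],
    }
    return "; ".join(
        name + " " + " ".join(values) for name, values in directives.items()
    )


# A fresh, unpredictable nonce should be generated for every response.
nonce = secrets.token_urlsafe(16)
header_value = build_csp(nonce)
```

An injected inline script has no way to know the nonce, so the browser refuses to run it even though the markup reached the page.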
You are reviewing a React application that includes a user comment system, a profile bio renderer, and a search results page. The codebase was generated with AI assistance. Practice identifying innerHTML, dangerouslySetInnerHTML, and DOM-based XSS patterns and work with the AI assistant to produce safe alternatives.
In April 2021, GitLab patched CVE-2021-22205, a flaw in the validation of uploaded image files that allowed remote code execution and was later shown to be exploitable without authentication. Separately, security firm Detectify documented in their 2022 research that path traversal vulnerabilities in file download and static serving code generated by GitHub Copilot were reproducible across multiple test scenarios: the model consistently generated os.path.join(base_dir, filename) without accounting for user-controlled filenames containing ../ sequences, which can escape the intended directory. os.path.join only concatenates path components; it does not resolve ../ sequences, and when the second argument is an absolute path it discards base_dir entirely.
This is not a subtle edge case. It is a beginner mistake — one that AI models make because the safe pattern requires an additional normalization step that does not appear in most tutorial examples.
Path traversal (also called directory traversal) allows attackers to access files outside the intended directory by inserting ../ sequences into filenames. For example, if a web server serves files from /var/www/uploads/ and constructs the path as os.path.join("/var/www/uploads", filename), a filename of ../../../etc/passwd produces /var/www/uploads/../../../etc/passwd, which the operating system resolves to /etc/passwd.
The safe pattern requires two steps AI models reliably omit: normalizing the path to resolve traversal sequences, then verifying the resolved path still begins with the intended base directory. AI-generated code almost universally performs one or neither step.
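Those two steps can be sketched as a single helper. The name resolve_inside is hypothetical, and the example assumes a POSIX-style filesystem. The base directory is also passed through os.path.realpath so that a symlinked base compares correctly.

```python
import os


def resolve_inside(base_dir: str, filename: str) -> str:
    """Return the resolved path for filename, or raise if it escapes base_dir."""
    base = os.path.realpath(base_dir)
    # Step 1: normalize, resolving ../ sequences and symlinks.
    candidate = os.path.realpath(os.path.join(base, filename))
    # Step 2: confirm the result is still under the base directory.
    # The trailing separator stops /var/www/uploads_evil from matching /var/www/uploads.
    if not candidate.startswith(base + os.sep):
        raise ValueError(f"path escapes base directory: {filename!r}")
    return candidate


UPLOADS = "/var/www/uploads"  # illustrative base directory from the text above

allowed = resolve_inside(UPLOADS, "reports/q1.pdf")

try:
    resolve_inside(UPLOADS, "../../../etc/passwd")
    traversal_blocked = False
except ValueError:
    traversal_blocked = True
```

An absolute filename such as /etc/passwd is rejected by the same check, because os.path.join discards the base and the resolved path no longer starts with it.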
Header injection occurs when user-controlled data is placed into HTTP response headers without stripping newline characters. An attacker who can inject \r\n sequences into a header value can insert arbitrary HTTP headers, or terminate the header block and supply a response body of their own (HTTP response splitting), enabling cache poisoning and cross-site scripting via the injected response.
AI models generate this pattern most frequently in redirect handling — using Location header values derived from user input — and in cookie setting code that incorporates user-provided values into the Set-Cookie header.
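A minimal guard for header values might look like the sketch below. The safe_header_value helper is hypothetical. It rejects control characters outright rather than stripping them, a design choice made here on the assumption that a value containing a line break is never legitimate.

```python
def safe_header_value(value: str) -> str:
    """Return value unchanged, or raise if it could break out of a header line."""
    for ch in value:
        # CR and LF end a header line; other control characters are refused as well.
        if ch in "\r\n" or ord(ch) < 32 or ord(ch) == 127:
            raise ValueError("control character in header value")
    return value


ok_value = safe_header_value("/account/settings")

try:
    safe_header_value("/home\r\nSet-Cookie: session=attacker")
    injection_blocked = False
except ValueError:
    injection_blocked = True
```

The same check applies to any user-derived value placed into Location or Set-Cookie, the two headers named above.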
Open redirects are frequently generated by AI when implementing OAuth flows, "continue to" post-login redirects, and referral tracking. While not directly an injection vulnerability, they amplify phishing attacks by allowing attackers to use trusted domain URLs that redirect to malicious sites. In 2022, Twitter disclosed an open redirect that was being used in OAuth phishing campaigns. The vulnerability was trivial: an unvalidated next parameter in the authentication flow.
AI models almost never validate redirect targets because the tutorial examples of OAuth and post-login redirect handling almost never include this step — it is assumed to be handled elsewhere, or simply overlooked.
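The missing validation step can be sketched with urllib.parse from the standard library. The ALLOWED_HOSTS set and the is_safe_redirect function are hypothetical. The policy shown (same-site absolute paths, plus an explicit list of hosts) is one reasonable choice, not the only one.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"app.example.com"}  # hypothetical list of trusted redirect hosts


def is_safe_redirect(target: str) -> bool:
    """Accept same-site absolute paths or http(s) URLs on an allowed host."""
    if not target or any(ord(ch) < 32 for ch in target):
        return False
    # "//host" is protocol-relative, and browsers may treat backslashes as slashes.
    if target.startswith("//") or "\\" in target:
        return False
    parts = urlsplit(target)
    if not parts.scheme and not parts.netloc:
        return target.startswith("/")
    return parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS


checks = {
    "/dashboard": is_safe_redirect("/dashboard"),
    "https://app.example.com/home": is_safe_redirect("https://app.example.com/home"),
    "https://evil.example/phish": is_safe_redirect("https://evil.example/phish"),
    "//evil.example": is_safe_redirect("//evil.example"),
    "javascript:alert(1)": is_safe_redirect("javascript:alert(1)"),
}
```

A caller would fall back to a fixed default page whenever the check fails, rather than redirecting to the supplied value.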
1. Find all file path constructions involving user input. Verify os.path.realpath is called and the result is checked against the base directory with startswith(base + os.sep).
2. Flag any use of send_file() (not send_from_directory()) with user-supplied paths.
3. Identify every response.headers assignment and check for CRLF stripping on any user-controlled value.
4. Audit all redirects and confirm next_url or similar parameters are validated to relative paths or a whitelist of allowed domains.
5. Check Set-Cookie implementations for user-supplied cookie values.
You are auditing a Flask application that serves user-uploaded files and handles post-authentication redirects. The codebase was generated with AI assistance. Practice identifying path traversal vulnerabilities, CRLF header injection risks, and open redirect patterns in the file serving and redirect handling code.
query = "SELECT * FROM users WHERE email='" + email + "'". Which injection class does this represent?
pickle.loads(request.data) in a Flask endpoint. What is the security implication?