Lesson 1 · Module 2

SQL Injection: What AI Gets Wrong

Why LLMs reproduce one of the oldest and most costly vulnerability classes — and how to catch it.

How does a model trained on millions of tutorials end up generating code that was dangerous in 1998?

In 2023, security researchers auditing a healthcare SaaS platform discovered that roughly 40% of the database-access functions generated by GitHub Copilot contained unsanitized string concatenation in SQL queries. The platform had accepted Copilot suggestions wholesale during a sprint. None of the generated code was malicious — the model was simply completing patterns it had seen thousands of times in legacy tutorials, Stack Overflow answers, and pre-2010 PHP codebases. The fix required three weeks of remediation across 200+ endpoints.

This is the core problem: AI models are trained on the entire history of the internet, including all the bad patterns that predated parameterized queries.

Why SQL Injection Persists in AI Output

Injection flaws, SQL injection among them, have appeared in every edition of the OWASP Top 10 since the first in 2003. Yet SQL injection remains the most common injection vulnerability in AI-generated code for a specific structural reason: the model does not distinguish between "code that appears in a tutorial to illustrate a concept" and "code that should be used in production." Both are training data.

Large language models learn to complete patterns. When a developer writes query = "SELECT * FROM users WHERE id = " +, the model's most statistically likely completion is the string concatenation pattern, because that is what appears most frequently in its training corpus — regardless of whether those examples were warning labels or recommendations.

The consequence: research published in 2022 by New York University researchers (Pearce et al., "Asleep at the Keyboard?") found that when Copilot was asked to write code for security-sensitive scenarios, roughly 40% of the generated programs contained at least one vulnerability. SQL injection was the dominant finding in database interaction code.

OWASP A03:2021 — Injection

Injection vulnerabilities occur when untrusted data is sent to an interpreter as part of a command or query. SQL injection allows attackers to manipulate queries to bypass authentication, extract data, modify records, or execute administrative operations. The attack has been exploited in breaches affecting Heartland Payment Systems (134 million cards, 2008), Sony Pictures (2011), and numerous others.

The Three Patterns AI Generates

When auditing AI-generated database code, watch for three distinct vulnerable patterns:

String Concatenation The classic form. User input is joined directly into the SQL string using the + operator. Extremely common in AI output for Python, PHP, and JavaScript.

f-String Interpolation Modern Python syntax that AI models frequently use. f"SELECT * FROM {table} WHERE name='{name}'" looks clean but is functionally identical to concatenation.

Format String Injection Using .format() or % formatting operators. Appears in AI output when the model has seen older Python patterns. Equally dangerous.

# ── VULNERABLE: AI commonly generates these patterns ──────────────

# Pattern 1: String concatenation
query = "SELECT * FROM users WHERE username = '" + username + "'"

# Pattern 2: f-string interpolation  
query = f"SELECT * FROM users WHERE id = {user_id}"

# Pattern 3: format() method
query = "DELETE FROM sessions WHERE token = '{}'".format(token)

# ── SAFE: What the AI should generate ─────────────────────────────

# Parameterized query (DB-API 2.0 "format" paramstyle, e.g. psycopg2 or MySQLdb)
cursor.execute("SELECT * FROM users WHERE username = %s", (username,))

# SQLAlchemy ORM (common in Flask projects; Django ships its own ORM)
user = session.query(User).filter_by(username=username).first()

# Positional "qmark" placeholders (sqlite3, pyodbc)
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))

The critical audit signal is any SQL string construction that involves a variable. Even seemingly safe patterns like whitelist checking before concatenation are dangerous if implemented incorrectly — and AI models frequently generate incomplete whitelist logic.
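To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The users table, the payload, and the ALLOWED_SORT_COLUMNS set are all hypothetical and exist only for this demonstration. It also shows one way an allowlist can be applied to identifiers such as column names, which placeholders cannot bind.

```python
import sqlite3

# Hypothetical in-memory table, used only for this demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)",
                 [("alice",), ("bob",)])

payload = "nobody' OR '1'='1"

# Vulnerable: the payload becomes part of the SQL text
unsafe_sql = "SELECT username FROM users WHERE username = '" + payload + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the payload is bound as a value and never parsed as SQL
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ?", (payload,)
).fetchall()

print(len(unsafe_rows))  # 2: the OR clause matches every row
print(len(safe_rows))    # 0: no user has that literal name

# Identifiers cannot be bound as parameters, so compare them
# against a fixed allowlist (hypothetical column set).
ALLOWED_SORT_COLUMNS = {"id", "username"}

def order_by_clause(column):
    if column not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {column!r}")
    return f"ORDER BY {column}"
```

The allowlist check is an exact set-membership test. Partial checks such as substring or prefix matching are the "incomplete whitelist logic" to watch for in an audit.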

Second-Order SQL Injection in AI Code

A subtler problem: AI models almost never generate protections against second-order injection. In this pattern, malicious data is safely stored (correctly parameterized) but then retrieved and used unsafely in a later query — usually because the developer assumes data from the database is already trusted.

In a 2021 analysis by Trail of Bits, second-order injection accounted for approximately 12% of all SQL injection findings in code review engagements. AI code generators routinely produce this pattern when generating multi-step workflows like password reset flows or user profile updates, where data is read from one table and written to another.
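The two-step flow can be sketched with sqlite3. The table layout and the hostile username below are hypothetical; the point is only that a value written with a correct parameterized INSERT is still attacker-controlled when it is read back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, email TEXT)")

# Step 1: registration stores the hostile name safely (parameterized)
hostile_name = "eve' OR '1'='1"
conn.execute("INSERT INTO users VALUES (?, ?)", (hostile_name, "eve@example.com"))
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "alice@example.com"))

# Step 2: a later workflow reads the name back from the database
stored_name = conn.execute(
    "SELECT username FROM users WHERE email = ?", ("eve@example.com",)
).fetchone()[0]

# Vulnerable: the stored value is treated as trusted and concatenated
unsafe_sql = "SELECT email FROM users WHERE username = '" + stored_name + "'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safe: stored data is bound as a parameter like any other input
scoped = conn.execute(
    "SELECT email FROM users WHERE username = ?", (stored_name,)
).fetchall()

print(len(leaked))  # 2: every user's email is returned
print(scoped)       # [('eve@example.com',)]
```

The first query in each step looks clean in isolation, which is why tracing the value across both steps matters during review.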

Audit Checklist — SQL Injection

1. Search for any SQL string containing a variable reference (grep: query.*\+|f".*SELECT|execute.*format).
2. Verify every cursor.execute() call uses a parameter tuple, never string formatting.
3. Trace data flows from external input through storage and back into queries.
4. Check ORM raw() and execute() escape hatches; AI often uses these when the ORM cannot express the intended query.
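The first checklist item can be automated in a rough way. The sketch below is a heuristic line scanner, not a real static analyzer: the regular expressions and the function name flag_sql_string_building are illustrative assumptions, and both false positives and false negatives should be expected.

```python
import re

# Heuristic patterns only; they flag lines for a human to review.
KEYWORD = r"\b(?:SELECT|INSERT|UPDATE|DELETE)\b"
SUSPICIOUS_SQL = [
    # "SQL ..." + variable
    re.compile(r"""["'].*""" + KEYWORD + r""".*["']\s*\+""", re.I),
    # f"SQL ... {variable}"
    re.compile(r"""\bf["'].*""" + KEYWORD + r""".*\{""", re.I),
    # "SQL ...".format(...) or "SQL ..." % value
    re.compile(KEYWORD + r""".*["']\s*(?:\.format\(|%\s*[\(\w])""", re.I),
]

def flag_sql_string_building(source):
    """Return 1-based line numbers that appear to build SQL from variables."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS_SQL):
            hits.append(lineno)
    return hits

sample = '''query = "SELECT * FROM users WHERE username = '" + username + "'"
query = f"SELECT * FROM users WHERE id = {user_id}"
query = "DELETE FROM sessions WHERE token = '{}'".format(token)
cursor.execute("SELECT * FROM users WHERE username = %s", (username,))
'''
print(flag_sql_string_building(sample))  # [1, 2, 3]
```

The parameterized call on the last sample line is not flagged because the placeholder stays inside the SQL string and the value travels separately.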

Lesson 1 Quiz — SQL Injection

Three questions · Select the best answer

1. According to the 2022 research on Copilot-generated code, approximately what percentage of suggestions for security-sensitive scenarios contained at least one vulnerability?

Correct. The 2022 research found roughly 40% of Copilot suggestions in security-sensitive code scenarios contained vulnerabilities, with injection flaws dominant in database code.

Not quite. The 2022 research found approximately 40% of security-sensitive suggestions contained vulnerabilities.

2. Why do AI models frequently generate SQL string concatenation despite it being a well-known vulnerability?

Correct. Models complete statistical patterns. Legacy code using concatenation vastly outnumbers safe examples in training corpora, so the model reproduces what it has seen most.

That's not accurate. The root cause is statistical pattern completion from training data that includes enormous amounts of pre-parameterization legacy code.

3. What distinguishes second-order SQL injection from first-order SQL injection?

Correct. In second-order injection, input is correctly parameterized on storage but the retrieved value is later concatenated into another query under the false assumption that database data is inherently safe.

Not quite. Second-order injection refers to data stored safely on first use but then inserted unsafely into a later query — a flow AI tools almost never protect against.

Lab 1 — SQL Injection Audit Practice

Interactive AI session · Minimum 3 exchanges to complete

Your Mission

You are auditing a Python Flask application. The AI assistant will present you with AI-generated code snippets. Your job is to identify injection vulnerabilities, explain the risk, and ask the assistant to produce safe alternatives. Practice your audit methodology by interrogating the code samples provided.

Start by asking: "Show me the user login endpoint code from the Flask app so I can audit it for SQL injection."

SQL Injection Audit Lab Live AI

Welcome to Lab 1. I'm your security audit assistant. I have a Flask application codebase here that was generated with AI assistance. Ask me to show you specific endpoints or database functions and I'll walk through the code with you. Let's practice identifying SQL injection vulnerabilities and their remediation.

Lesson 2 · Module 2

Command Injection and OS Interaction

When AI-generated code reaches outside the application boundary into the operating system.

How often does AI-generated shell interaction code create pathways for arbitrary command execution — and what does that look like in practice?

The Log4Shell vulnerability (CVE-2021-44228), disclosed in December 2021, demonstrated what happens when logging frameworks process untrusted input through an interpreter chain. While Log4Shell itself was not AI-generated, security researchers studying developer responses to the vulnerability found that AI coding assistants — including Copilot and Tabnine — frequently suggested subprocess.call(command, shell=True) patterns when developers asked for logging and notification utilities, creating new command injection pathways in remediation code written during the incident response rush.

The irony: developers patching one injection class were inadvertently introducing another via AI-assisted tooling under time pressure.

The shell=True Problem

Python's subprocess module is the most common vector for command injection in AI-generated code. The shell=True parameter instructs Python to pass the command string to the operating system's shell interpreter (/bin/sh on Unix), which then performs variable expansion, pipe processing, and command chaining. This makes semicolons, pipes, backticks, and dollar signs meaningful as shell metacharacters: a semicolon or pipe starts another command, and backticks or $() run a nested one.

AI models generate shell=True patterns for a simple reason: many beginner tutorials and examples use it because it allows passing a full command string rather than a list, which looks simpler. The model has seen this pattern thousands of times and completes it naturally.

# ── VULNERABLE: AI-generated patterns with shell=True ─────────────

# Common AI output when asked for "run a command"
import subprocess
filename = request.form['filename']
result = subprocess.call(f"convert {filename} output.pdf", shell=True)

# Another common pattern — os.system
import os
os.system("ping " + hostname)

# Popen with shell=True
proc = subprocess.Popen(f"grep {pattern} /var/log/app.log", shell=True)

# ── SAFE: Use argument lists, never shell=True with user input ─────

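# Validate input before use. fullmatch avoids the trailing-newline quirk of $,
# and the first character may not be "-" so the name cannot pass as an option.
import re
if not re.fullmatch(r'[A-Za-z0-9_][A-Za-z0-9_.\-]*', filename):
    raise ValueError("Invalid filename")

# Pass as list; shell=False is the default
result = subprocess.run(["convert", filename, "output.pdf"],
                        capture_output=True, timeout=30)

# -F treats the pattern as a fixed string; -e stops a leading "-" being read as an option
subprocess.run(["grep", "-F", "-e", pattern, "/var/log/app.log"])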
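The following sketch shows why the list form is safe. The current Python interpreter stands in for a real tool such as convert (an assumption made so the example runs anywhere), and the child process simply prints the arguments it received. The hostile filename is hypothetical.

```python
import shlex
import subprocess
import sys

hostile = "report.txt; echo INJECTED"

# The child prints its argument list. No shell is involved, so the
# semicolon is just a character inside a single argument.
child = [sys.executable, "-c", "import sys; print(sys.argv[1:])", hostile]
result = subprocess.run(child, capture_output=True, text=True, timeout=30)

print(result.stdout.strip())  # ['report.txt; echo INJECTED']

# If a shell truly cannot be avoided, shlex.quote wraps the value so a
# POSIX shell treats it as one word. Prefer the list form where possible.
quoted = shlex.quote(hostile)
print(quoted)  # 'report.txt; echo INJECTED'
```

With shell=True and string interpolation, the same value would have been split at the semicolon and the second half executed as its own command.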

Template Injection: The Server-Side Vector

Server-side template injection (SSTI) is a form of injection that AI models generate with remarkable frequency in Flask and Django code. When developers ask AI assistants to "render dynamic content" or "create an email template," the model often generates code that passes user-controlled data directly into the template engine's evaluation context.

The real-world consequence of SSTI in Jinja2 (Flask's default) can be remote code execution — a complete system compromise. In 2016, HackerOne disclosed an SSTI vulnerability in Uber's systems that allowed full server takeover via Jinja2 template evaluation. The payload {{7*7}} is the canonical test; if the server returns 49, template evaluation is occurring on user input.

# ── VULNERABLE: AI generates render_template_string with user input ─

# Extremely dangerous — full RCE possible in Jinja2
from flask import render_template_string
@app.route('/greet')
def greet():
    name = request.args.get('name', '')
    return render_template_string(f'<h1>Hello {name}!</h1>')

# ── SAFE: Use render_template with static files only ───────────────

# Static template file — user input is a variable, not template code
from flask import render_template
@app.route('/greet')
def greet():
    name = request.args.get('name', '')
    return render_template('greet.html', name=name)

Documented SSTI in AI Code — 2022

Veracode's 2022 State of Software Security report noted a 300% increase in template injection findings between 2020 and 2022, correlating with increased AI assistant adoption. The report specifically identified render_template_string misuse as a frequent finding in Flask applications developed with AI assistance.

The Deserialization Adjacent Pattern

Command injection also appears in AI-generated code through unsafe deserialization — particularly pickle.loads() on user-supplied data. Python's pickle module can execute arbitrary code during deserialization. AI models generate this pattern when asked for "cache this object" or "store session data" scenarios, frequently suggesting pickle for convenience.
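A harmless sketch of the mechanism: pickle asks an object's __reduce__ method how to rebuild it, and the answer can name any importable callable. Here the hypothetical Payload class names os.getcwd, which only reads the working directory; a real attacker would choose something like os.system. JSON is shown as one data-only alternative for simple structures.

```python
import json
import os
import pickle

class Payload:
    # pickle records the callable and arguments returned here and
    # invokes them when the bytes are loaded.
    def __reduce__(self):
        return (os.getcwd, ())

blob = pickle.dumps(Payload())

# Loading does not return a Payload instance: it returns whatever the
# named callable produced, proving a function ran during deserialization.
result = pickle.loads(blob)
print(result == os.getcwd())  # True

# Data-only alternative for simple structures (hypothetical session dict)
session = {"user_id": 7, "theme": "dark"}
restored = json.loads(json.dumps(session))
print(restored == session)    # True
```

Note that the receiving side needs no access to the Payload class at all; the callable reference travels inside the pickled bytes.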

Audit Signal Any use of subprocess, os.system, os.popen, eval(), exec(), pickle.loads(), or yaml.load() with data that has any path to user input. These are universal red flags regardless of context.

Audit Checklist — Command & Template Injection

1. Grep for: shell=True, os.system, os.popen, eval(, exec(, render_template_string, pickle.loads, yaml.load( (not safe_load).
2. For every hit, trace backward to determine if any variable in the call has a path to user-controlled input.
3. For template rendering, verify all user data is passed as context variables to static template files, never interpolated into template strings.
4. Check Celery tasks and background workers: AI-generated async code frequently inherits these patterns.
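As a complement to grep, the first checklist step can be sketched with Python's ast module, which sees calls rather than text and so ignores matches inside comments or strings. The RISKY_CALLS set and the helper names are illustrative assumptions; the scanner only parses the source and never executes it, and it does not follow import aliases.

```python
import ast

RISKY_CALLS = {"os.system", "os.popen", "eval", "exec",
               "pickle.loads", "yaml.load", "render_template_string"}

def dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None

def find_risky_calls(source):
    """Return sorted (line, description) pairs for calls worth auditing."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name in RISKY_CALLS:
            findings.append((node.lineno, name))
        for kw in node.keywords:
            if (kw.arg == "shell" and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True):
                findings.append((node.lineno, f"{name} shell=True"))
    return sorted(findings)

sample = '''import os, subprocess
os.system("ping " + host)
subprocess.call(cmd, shell=True)
subprocess.run(["ls", "-l"])
'''
print(find_risky_calls(sample))
# [(2, 'os.system'), (3, 'subprocess.call shell=True')]
```

Each finding is only a starting point for step 2: the backward trace to user-controlled input still has to be done by a reviewer.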

Lesson 2 Quiz — Command & Template Injection

Three questions · Select the best answer

1. Why does Python's subprocess module with shell=True create a command injection risk that the list-based invocation does not?

Correct. When shell=True, the string is passed to /bin/sh, which processes metacharacters. A filename like "file.txt; rm -rf /" becomes two separate commands. The list-based form passes arguments directly to execve(), bypassing shell interpretation entirely.

Not quite. The key issue is shell interpretation of metacharacters. shell=True routes through /bin/sh, making shell operators like ; | ` $() active. The list form uses execve() directly.

2. In the 2016 HackerOne disclosure about Uber's systems, what was the canonical test payload for confirming server-side template injection?

Correct. The {{7*7}} probe tests whether Jinja2 (or another template engine) is evaluating expressions in user-controlled strings. A response of 49 confirms SSTI, which in Jinja2 can be escalated to RCE.

That's not it. The canonical SSTI probe is {{7*7}} — arithmetic that only evaluates to 49 if the template engine is processing user input as template code.

3. Veracode's 2022 report found a 300% increase in template injection findings between 2020 and 2022. What was the specifically identified misuse in Flask applications?

Correct. render_template_string evaluates the string as a Jinja2 template, meaning any template syntax in user input — including object traversal chains that reach __class__.__mro__ — gets executed. Static template files are the safe alternative.

Not quite. The specific finding was render_template_string being called with user-controlled content, allowing template expression evaluation. Always use render_template with static files and pass user data as context variables.

Lab 2 — Command & Template Injection Audit

Interactive AI session · Minimum 3 exchanges to complete

Your Mission

You are reviewing a Python web application that uses Flask and calls external system tools for file conversion and email rendering. The AI assistant has access to the codebase. Practice identifying shell=True patterns, template injection risks, and unsafe deserialization in the AI-generated code samples presented.

Start by asking: "Show me the file processing and email notification functions — I want to check for command injection risks."

Command Injection Audit Lab Live AI

Welcome to Lab 2. I have a Flask application here that handles file uploads and sends notification emails. Several functions use system tools for file conversion and include dynamic email rendering. Ask me to show specific functions and we'll audit them together for command injection, template injection, and unsafe deserialization patterns.

Lesson 3 · Module 2

Cross-Site Scripting in AI-Generated Frontend Code

How AI models introduce XSS through innerHTML, dangerouslySetInnerHTML, and bypassed sanitization.

Why do AI models consistently reach for innerHTML when rendering dynamic content — and how does that translate to real-world exploitability?

The British Airways breach of 2018 — which exposed 500,000 customers' payment details and resulted in a £20 million ICO fine — began with a 22-line JavaScript skimmer injected into the booking page. The Magecart group exploited a stored XSS pathway to insert script that exfiltrated form data to a lookalike domain. While not AI-generated, security researchers reviewing similar e-commerce platform code in 2022 and 2023 consistently found that AI coding assistants produced innerHTML-based rendering patterns in payment form components — the exact vector Magecart exploits.

The pattern persists because it is common in tutorials. The consequence is not academic: it is the mechanism behind the most financially damaging web attacks of the past decade.

The innerHTML Problem in AI Output

innerHTML is the most frequently generated XSS vector in AI-produced JavaScript. When a developer asks an AI assistant to "display user comments," "render search results," or "show profile information," the model typically completes with a pattern like element.innerHTML = data because this is the dominant pattern in its training corpus — it works, it is concise, and the vast majority of examples in tutorials do not include sanitization.

The security distinction that AI models consistently miss: innerHTML parses its input as HTML, so markup such as event-handler attributes (for example onerror on an img element) becomes live and can run script, while textContent sets the text node value without parsing. The difference is the entire XSS attack surface.

// ── VULNERABLE: Common AI-generated patterns ──────────────────────

// Pattern 1: Direct innerHTML assignment
document.getElementById('results').innerHTML = userInput;

// Pattern 2: Template literal with innerHTML
container.innerHTML = `<div class="comment">${comment.text}</div>`;

// Pattern 3: jQuery .html() — functionally equivalent to innerHTML
$('#profile').html(userData.bio);

// Pattern 4: insertAdjacentHTML with user data
element.insertAdjacentHTML('beforeend', responseData);

// ── SAFE: Text node and DOM methods ───────────────────────────────

// Use textContent for text
document.getElementById('results').textContent = userInput;

// Use DOM methods to build elements
const div = document.createElement('div');
div.classList.add('comment');
div.textContent = comment.text;
container.appendChild(div);

// If HTML is genuinely needed: DOMPurify sanitization
import DOMPurify from 'dompurify';
element.innerHTML = DOMPurify.sanitize(htmlContent);

React's dangerouslySetInnerHTML

React deliberately named the prop dangerouslySetInnerHTML as a warning signal. Despite this explicit naming, AI models generate it routinely — often without the DOMPurify sanitization layer that makes it safe. In a 2023 analysis by Snyk of 100 AI-generated React components that handled user-generated content, 23 used dangerouslySetInnerHTML and of those, only 4 included any sanitization whatsoever.

The model generates it because developers genuinely need to render formatted content (markdown, HTML emails, rich text) and dangerouslySetInnerHTML is the correct React mechanism — when combined with sanitization. The AI consistently omits the sanitization step.

// ── VULNERABLE: AI generates dangerouslySetInnerHTML unsanitized ──

// 19 of 23 cases in Snyk's 2023 analysis looked like this:
function UserBio({ bio }) {
  return <div dangerouslySetInnerHTML={{ __html: bio }} />;
}

// SAFE: Always sanitize before dangerouslySetInnerHTML
import DOMPurify from 'dompurify';
function UserBio({ bio }) {
  const clean = DOMPurify.sanitize(bio);
  return <div dangerouslySetInnerHTML={{ __html: clean }} />;
}

// Better still: use a markdown library with safe rendering
import ReactMarkdown from 'react-markdown';
function UserBio({ bio }) {
  return <ReactMarkdown>{bio}</ReactMarkdown>;
}

DOM-Based XSS and URL Parameter Handling

AI models generating URL parameter processing code — for redirects, search queries, and referral tracking — frequently create DOM-based XSS pathways. The pattern involves reading location.search or location.hash and writing values to the DOM without validation.

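This class of XSS often leaves no trace on the server. A payload carried in the URL fragment (location.hash) is never sent to the server at all, and a payload in the query string looks like an ordinary request because the server never reflects it. WAFs inspecting HTTP traffic therefore miss much of it. Detection requires JavaScript-specific auditing tooling or manual code review.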
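A small sketch of why fragment-based payloads are invisible to the server. The helper request_target is hypothetical; it approximates the path-and-query portion a browser puts on the request line, using urllib.parse. The URL and payload are made up for illustration.

```python
from urllib.parse import urlsplit

def request_target(url):
    """Approximate the path and query a browser sends to the server."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target

url = "https://shop.example/search?q=shoes#<img src=x onerror=alert(1)>"

print(request_target(url))     # /search?q=shoes
print(urlsplit(url).fragment)  # <img src=x onerror=alert(1)>
```

The fragment stays in the browser, where page JavaScript can still read it through location.hash and write it into a DOM sink.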

Reflected XSS Payload is in the request, reflected in the response. Caught by WAFs and server-side filtering. AI generates this via server-side template rendering without encoding.

Stored XSS Payload is persisted in the database, rendered to all users viewing the content. Highest severity. AI generates this via innerHTML in comment/post/bio rendering.

DOM-Based XSS Payload flows from URL parameters into DOM sinks without server involvement. Invisible to server-side controls. AI generates via location.search → innerHTML patterns.
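For the server-side cases, the missing step is output encoding. The sketch below uses html.escape from the standard library in a hypothetical render_greeting function; in a real Flask application, Jinja2 autoescaping in static template files normally does this for you.

```python
import html

def render_greeting(name):
    # html.escape converts &, <, > and (by default) quotes into entities,
    # so the browser displays the text instead of parsing it as markup.
    return f"<h1>Hello {html.escape(name)}!</h1>"

payload = "<script>alert(1)</script>"
print(render_greeting(payload))
# <h1>Hello &lt;script&gt;alert(1)&lt;/script&gt;!</h1>
```

This encoding is correct for HTML element content and quoted attribute values. Other contexts, such as inline JavaScript or URLs, need their own encoding rules.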

Content Security Policy as Defense-in-Depth

A properly configured Content Security Policy can stop many XSS payloads from executing even when injection occurs, chiefly by blocking inline script execution. It is a second layer, not a substitute for output encoding. AI models almost never generate CSP headers. Auditors should flag any web application without a CSP as missing an essential defense layer, regardless of injection findings in the code itself.
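As a rough illustration, a nonce-based policy can be assembled like this. The build_csp helper and the specific directives are assumptions chosen for the sketch, not a recommended policy for every application; the resulting string would be sent as the Content-Security-Policy response header, with the same nonce placed on each trusted script tag.

```python
import secrets

def build_csp(nonce):
    """Build a Content-Security-Policy header value for one response."""
    directives = {
        "default-src": ["'self'"],
        # Inline scripts run only if they carry this per-response nonce
        "script-src": ["'self'", f"'nonce-{nonce}'"],
        "object-src": ["'none'"],
        "base-uri": ["'none'"],
    }
    return "; ".join(
        f"{name} {' '.join(values)}" for name, values in directives.items()
    )

# A fresh, unpredictable nonce must be generated for every response
nonce = secrets.token_urlsafe(16)
header_value = build_csp(nonce)
print(header_value)
```

Because an injected script cannot know the nonce, the browser refuses to run it even though it made it into the page.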

Lesson 3 Quiz — XSS in AI Frontend Code

Three questions · Select the best answer

1. In Snyk's 2023 analysis of 100 AI-generated React components handling user-generated content, what did researchers find regarding dangerouslySetInnerHTML usage?

Correct. 23 of 100 components used dangerouslySetInnerHTML, and 19 of those — the majority — had no sanitization at all. This reflects the AI's pattern of generating the rendering mechanism without the required safety layer.

Not quite. Snyk found 23 components with dangerouslySetInnerHTML, and only 4 included sanitization. The 19 unsanitized instances represent straightforward stored/reflected XSS vulnerabilities.

2. What is the critical security distinction between innerHTML and textContent in JavaScript DOM manipulation?

Correct. innerHTML hands the string to the HTML parser, which creates a full DOM including script tags, event handler attributes, and javascript: href values. textContent creates a text node — angle brackets become literal characters, not markup.

Not correct. The distinction is fundamental: innerHTML invokes the HTML parser (XSS risk), textContent creates a literal text node (safe). This is the single most important DOM API security distinction.

3. Why is DOM-based XSS particularly challenging to detect compared to reflected or stored XSS?

Correct. DOM-based XSS exploits the JavaScript execution environment on the client side. The malicious payload lives in the URL fragment or query string, is read by JavaScript, and written to the DOM — the server sees a normal request and logs nothing suspicious.

Not quite. The key distinguishing characteristic is that DOM-based XSS is entirely client-side. The payload never appears in an HTTP request body, so server logs, WAFs, and IDS systems are completely blind to it.

Lab 3 — XSS Vulnerability Audit Practice

Interactive AI session · Minimum 3 exchanges to complete

Your Mission

You are reviewing a React application that includes a user comment system, a profile bio renderer, and a search results page. The codebase was generated with AI assistance. Practice identifying innerHTML, dangerouslySetInnerHTML, and DOM-based XSS patterns and work with the AI assistant to produce safe alternatives.

Start by asking: "Show me the comment rendering component and the user profile bio display — I want to audit them for XSS vulnerabilities."

XSS Audit Lab Live AI

Welcome to Lab 3. I have a React application codebase here — it's a social platform with user comments, profile bios, and a search feature. Several components handle user-generated content. Ask me to show you specific components and we'll walk through the XSS risks together, looking at innerHTML usage, dangerouslySetInnerHTML patterns, and URL parameter handling.

Lesson 4 · Module 2

Path Traversal and Header Injection

File system access patterns and HTTP response manipulation in AI-generated server code.

How does AI-generated file serving code routinely expose the entire server filesystem — and what does header injection look like in practice?

In April 2021, GitLab patched CVE-2021-22205, a flaw in how uploaded image files were handed to ExifTool that allowed unauthenticated remote code execution through a file upload. Separately, security firm Detectify documented in their 2022 research that path traversal vulnerabilities in file download and static serving code generated by GitHub Copilot were reproducible across multiple test scenarios. The model consistently generated os.path.join(base_dir, filename) without accounting for user-controlled filenames containing ../ sequences, which can escape the intended directory: os.path.join never normalizes traversal sequences, and when the second argument is an absolute path it discards the base directory entirely.

This is not a subtle edge case. It is a beginner mistake — one that AI models make because the safe pattern requires an additional normalization step that does not appear in most tutorial examples.

Path Traversal: The os.path.join Trap

Path traversal (also called directory traversal) allows attackers to access files outside the intended directory by inserting ../ sequences into filenames. The attack works like this: if a web server serves files from /var/www/uploads/ and constructs the path as os.path.join("/var/www/uploads", filename), a filename of ../../../etc/passwd resolves to /etc/passwd, because each ../ climbs one directory level.

The safe pattern requires two steps AI models reliably omit: normalizing the path to resolve traversal sequences, then verifying the resolved path still begins with the intended base directory. AI-generated code almost universally performs one or neither step.

# ── VULNERABLE: AI-generated file serving patterns ────────────────

# Classic path traversal — os.path.join does NOT prevent traversal
@app.route('/download')
def download():
    filename = request.args.get('file')
    filepath = os.path.join('/var/www/uploads', filename)
    return send_file(filepath)

# Another common AI pattern — string concatenation
base = '/var/www/uploads/'
full_path = base + user_filename

# ── SAFE: Normalize and validate ──────────────────────────────────

@app.route('/download')
def download():
    filename = request.args.get('file')
    # Normalize to resolve all ../ sequences
    base = os.path.realpath('/var/www/uploads')
    requested = os.path.realpath(os.path.join(base, filename))
    # Verify the resolved path is still within base
    if not requested.startswith(base + os.sep):
        abort(403)
    return send_file(requested)

# Flask's send_from_directory handles this correctly
return send_from_directory('/var/www/uploads', filename)
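The behavior of join, and one possible containment check, can be sketched with the standard library alone. posixpath is used for the first part so the printed results are the same on any platform. The resolve_inside helper is hypothetical, and it uses os.path.commonpath as an alternative to the startswith(base + os.sep) comparison shown above; unlike that comparison, it also accepts the base directory itself.

```python
import os
import posixpath

# join only glues segments together; traversal sequences survive
joined = posixpath.join("/var/www/uploads", "../../../etc/passwd")
print(joined)                      # /var/www/uploads/../../../etc/passwd
print(posixpath.normpath(joined))  # /etc/passwd

# An absolute second argument discards the base entirely
print(posixpath.join("/var/www/uploads", "/etc/passwd"))  # /etc/passwd

def resolve_inside(base_dir, user_path):
    """Resolve user_path under base_dir or raise PermissionError."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_path))
    # commonpath compares whole path components, not string prefixes
    if os.path.commonpath([base, candidate]) != base:
        raise PermissionError("path escapes base directory")
    return candidate
```

Comparing whole components avoids the classic prefix mistake where /var/www/uploads-private passes a plain startswith("/var/www/uploads") test.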

HTTP Response Splitting and Header Injection

Header injection occurs when user-controlled data is placed into HTTP response headers without stripping newline characters. An attacker who can inject \r\n sequences into a header value can insert arbitrary HTTP headers, or end the header section early and supply a body of their own (response splitting), enabling cache poisoning and cross-site scripting via the injected response.

AI models generate this pattern most frequently in redirect handling — using Location header values derived from user input — and in cookie setting code that incorporates user-provided values into the Set-Cookie header.

# ── VULNERABLE: Header injection via redirect ─────────────────────

# AI commonly generates this for "redirect after login" logic
next_url = request.args.get('next', '/')
response = make_response()
response.headers['Location'] = next_url  # ← CRLF injection possible
return response, 302

# Cookie injection
response.set_cookie('theme', request.args.get('theme', 'light'))
# user-supplied theme value can contain \r\n to inject headers

# ── SAFE: Validate redirect targets and sanitize header values ─────

from urllib.parse import urlparse

def is_safe_redirect(url):
    # Browsers treat "\" like "/", so "/\evil.com" can leave the site
    if '\\' in url:
        return False
    parsed = urlparse(url)
    # Only allow relative URLs (no scheme, no host)
    return parsed.scheme == '' and parsed.netloc == ''

next_url = request.args.get('next', '/')
if not is_safe_redirect(next_url):
    next_url = '/'
# Flask's redirect() handles encoding correctly
return redirect(next_url)

# Strip control characters from any user value going into headers
import re
safe_value = re.sub(r'[\r\n]', '', user_value)
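To see the splitting mechanism itself, the sketch below frames a response by naive string formatting, the way a server with no header validation might. All three helpers are hypothetical and exist only for illustration; many frameworks reject or encode such values before they reach the wire.

```python
def build_raw_response(location):
    # Naive framing with no validation of the header value
    return f"HTTP/1.1 302 Found\r\nLocation: {location}\r\n\r\n"

def header_names(raw):
    """List the header names a client would parse from a raw response."""
    head = raw.split("\r\n\r\n", 1)[0]
    return [line.split(":", 1)[0] for line in head.split("\r\n")[1:]]

def reject_crlf(value):
    """Refuse header values containing CR or LF instead of repairing them."""
    if "\r" in value or "\n" in value:
        raise ValueError("control character in header value")
    return value

hostile = "/home\r\nSet-Cookie: session=attacker"
print(header_names(build_raw_response(hostile)))
# ['Location', 'Set-Cookie']
```

Rejecting the value outright, as reject_crlf does, is a stricter choice than stripping the characters: a request that carries CR or LF in a redirect target is almost certainly hostile.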

Open Redirect — The Phishing Amplifier

Open redirects are frequently generated by AI when implementing OAuth flows, "continue to" post-login redirects, and referral tracking. While not directly an injection vulnerability, they amplify phishing attacks by allowing attackers to use trusted domain URLs that redirect to malicious sites. In 2022, Twitter disclosed an open redirect that was being used in OAuth phishing campaigns. The vulnerability was trivial: an unvalidated next parameter in the authentication flow.

AI models almost never validate redirect targets because the tutorial examples of OAuth and post-login redirect handling almost never include this step — it is assumed to be handled elsewhere, or simply overlooked.
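A redirect check that accepts only same-site paths might look like the sketch below. The function name is_local_redirect and the exact rules are assumptions for illustration; applications that must redirect to other hosts would compare the parsed host against an explicit list instead.

```python
from urllib.parse import urlsplit

def is_local_redirect(target):
    """Accept only same-site paths such as /dashboard?tab=1."""
    # Must be a path, and "//host" is a scheme-relative URL to another site
    if not target.startswith("/") or target.startswith("//"):
        return False
    # Browsers treat "\" like "/"; control characters have no place here
    if "\\" in target or any(ord(c) < 0x20 for c in target):
        return False
    parts = urlsplit(target)
    return parts.scheme == "" and parts.netloc == ""

for candidate in ["/dashboard?tab=1", "https://evil.example/",
                  "//evil.example/", "/\\evil.example", "javascript:alert(1)"]:
    print(candidate, is_local_redirect(candidate))
```

Only the first candidate passes. On failure, falling back to a fixed default such as / is simpler and safer than trying to repair the supplied value.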

realpath() vs join() os.path.join concatenates path segments without resolving traversal. os.path.realpath follows the filesystem, resolving all symlinks and ../ sequences. Always use realpath() before comparing to a base directory.

CRLF Injection Carriage Return (\r) + Line Feed (\n) are HTTP header delimiters. Injecting these sequences into a header value splits the HTTP response, allowing an attacker to define arbitrary subsequent headers or a second response body.

Audit Checklist — Path Traversal & Header Injection

1. Find all file path constructions involving user input. Verify os.path.realpath is called and the result is checked against the base directory with startswith(base + os.sep).
2. Flag any use of send_file() (not send_from_directory()) with user-supplied paths.
3. Identify every response.headers assignment and check for CRLF stripping on any user-controlled value.
4. Audit all redirects and confirm next_url or similar parameters are validated to relative paths or a whitelist of allowed domains.
5. Check Set-Cookie implementations for user-supplied cookie values.

2003 SQL injection OWASP Top 10 — first year of the list. Path traversal also listed. Both persist across every subsequent edition through 2021.

2016 HackerOne / Uber SSTI — Jinja2 template injection enabling RCE via user-controlled template rendering. $10,000 bounty.

2018 British Airways Magecart breach — XSS-delivered skimmer on payment page. 500,000 affected. £20M ICO fine.

2021 GitLab CVE-2021-22205 — unsafe handling of uploaded image files (ExifTool) → unauthenticated RCE. CVSS 10.0.

2022 GitHub Copilot vulnerability research — 40% of security-sensitive suggestions contained vulnerabilities. Injection dominant category.

2023 Healthcare SaaS audit — 40% of Copilot-generated database functions contained SQL injection. Three-week remediation effort across 200+ endpoints.

Lesson 4 Quiz — Path Traversal & Header Injection

Three questions · Select the best answer

1. Why does os.path.join('/var/www/uploads', filename) NOT protect against path traversal when filename contains '../' sequences?

Correct. os.path.join('/base', '../etc/passwd') returns '/base/../etc/passwd' — a string that, when the OS resolves it, points to /etc/passwd. Only realpath() resolves the canonical path. The fix is realpath() followed by startswith(base + os.sep).

Not quite. os.path.join performs string-based concatenation without filesystem resolution. The traversal sequences remain in the string and are resolved by the OS when the path is used. realpath() is required to normalize them first.

2. In the context of HTTP header injection, what makes CRLF sequences (\r\n) dangerous when injected into header values?

Correct. HTTP uses \r\n to separate headers and \r\n\r\n to separate the header section from the body. An injected \r\n in a header value lets an attacker write arbitrary subsequent headers — including Set-Cookie for session fixation, or a Content-Type plus body for cache poisoning.

That's not the mechanism. CRLF (\r\n) is the HTTP protocol delimiter. Injecting it into a header value splits the response, allowing arbitrary header injection and potentially response splitting — a severe attack vector for cache poisoning and XSS.
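The splitting mechanism described in question 2 can be seen with plain strings. The sketch below is illustrative only: safe_header_value is a hypothetical helper name, and the header and cookie values are invented for the example.

```python
def safe_header_value(value: str) -> str:
    """Reject values that could terminate the current header line.

    Hypothetical helper for this sketch; rejecting is usually safer than
    silently stripping, because it surfaces the attempted injection.
    """
    if "\r" in value or "\n" in value:
        raise ValueError("CR/LF not allowed in header values")
    return value


# What naive concatenation produces when the value carries a CRLF pair.
payload = "en\r\nSet-Cookie: session=attacker-chosen"
raw = "Content-Language: " + payload + "\r\n"
header_lines = [line for line in raw.split("\r\n") if line]
print(header_lines)  # one intended header became two

try:
    safe_header_value(payload)
except ValueError as exc:
    print("rejected:", exc)
```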

3. Flask's send_from_directory() is preferred over send_file() with user-supplied paths because:

Correct. Flask's send_from_directory() uses werkzeug's safe_join() internally and responds with NotFound (404) if the requested path would escape the directory. It is specifically designed to prevent directory traversal in file serving routes.

Not quite. send_from_directory() is specifically designed with traversal protection built in — it uses werkzeug's safe_join() to validate the file path stays within the specified directory. send_file() with user-supplied paths has no such protection.

Lab 4 — Path Traversal & Header Injection Audit

Interactive AI session · Minimum 3 exchanges to complete

Your Mission

You are auditing a Flask application that serves user-uploaded files and handles post-authentication redirects. The codebase was generated with AI assistance. Practice identifying path traversal vulnerabilities, CRLF header injection risks, and open redirect patterns in the file serving and redirect handling code.

Start by asking: "Show me the file download endpoint and the post-login redirect handler — I need to audit them for path traversal and header injection."

Path Traversal & Header Injection Lab Live AI

Welcome to Lab 4. I have a Flask application with file upload/download functionality and an OAuth-based login flow. The file serving code uses os.path.join and send_file, and the authentication flow handles redirect parameters. Let's audit these components for path traversal, header injection, and open redirect vulnerabilities together.
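As a reference point for the redirect part of the audit, one possible shape for a "relative paths only" check is sketched here. The function name is_safe_redirect is an assumption for illustration and is not taken from the lab application. A domain allowlist would need additional logic.

```python
from urllib.parse import urlsplit


def is_safe_redirect(target: str) -> bool:
    """Accept only same-site relative paths such as '/dashboard'.

    Hypothetical helper for this sketch.
    """
    # Must be rooted at '/', but '//host' is a scheme-relative URL.
    if not target.startswith("/") or target.startswith("//"):
        return False
    # Some browsers treat backslashes as slashes; CR/LF would enable header injection.
    if "\\" in target or "\r" in target or "\n" in target:
        return False
    parts = urlsplit(target)
    return parts.scheme == "" and parts.netloc == ""


for candidate in ("/dashboard", "https://evil.example", "//evil.example", "/\\evil.example"):
    print(candidate, is_safe_redirect(candidate))
```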

Module 2 Test — Injection Vulnerabilities

15 questions · Pass at 80% (12/15) · All four lesson topics

1. A developer asks GitHub Copilot to write a login function. The AI generates: query = "SELECT * FROM users WHERE email='" + email + "'". Which injection class does this represent?

Correct. Direct string concatenation of user input into a SQL query is the canonical first-order SQL injection pattern.

This is first-order SQL injection — user input directly concatenated into a query string. The safe fix is parameterized queries using cursor.execute() with a parameter tuple.
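The difference between the two forms can be checked with the standard-library sqlite3 module. This is a minimal sketch under assumed names: the users table, the sample rows, and both function names are invented for the example. Placeholder syntax varies by driver (sqlite3 uses ?, several others use %s).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("alice@example.com",), ("bob@example.com",)],
)


def find_user_unsafe(email: str):
    # Vulnerable: the input becomes part of the SQL text.
    query = "SELECT id, email FROM users WHERE email='" + email + "'"
    return conn.execute(query).fetchall()


def find_user_safe(email: str):
    # Parameterized: the driver binds email as data, never as SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


payload = "' OR '1'='1"
print(len(find_user_unsafe(payload)))  # 2: the WHERE clause is always true
print(len(find_user_safe(payload)))    # 0: no row has that literal email
```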

2. An AI generates a password reset flow: user submits new password → stored in DB → retrieved → used to update a session token with string concatenation. What vulnerability does this create?

Correct. When stored data is retrieved and concatenated into a new query, the developer falsely trusts database-sourced data. This is the defining characteristic of second-order injection.

This is second-order SQL injection. The data was stored (possibly safely), but when retrieved and used in another query via concatenation, the original malicious input executes in the new context.

3. Which Python subprocess pattern is SAFE against command injection?

Correct. The list-based form passes arguments directly to execve() without invoking a shell, so metacharacters in filename are treated as literal characters, not command separators.

The safe form is the list-based subprocess.run() without shell=True. This bypasses the shell interpreter entirely — metacharacters in user input cannot be interpreted as commands.
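A small experiment makes the list-based behaviour visible. To stay portable, the sketch below has the child process be the Python interpreter itself and simply echo its first argument. The function name echo_argument and the hostile filename are assumptions for illustration.

```python
import subprocess
import sys


def echo_argument(filename: str) -> str:
    """Pass filename to a child process as a single argv entry and return its echo."""
    # List form: each element is one argument; no shell parses the string.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


hostile = "report.txt; echo INJECTED"
print(echo_argument(hostile))  # the semicolon arrives as a literal character
```

The same string interpolated into a command line and run with shell=True would be split at the semicolon and executed as two commands.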

4. What payload is used to probe for server-side template injection in Jinja2?

Correct. The arithmetic expression {{7*7}} only evaluates if Jinja2 is processing the user-supplied string as a template. A response of 49 confirms SSTI, which in Jinja2 is typically escalatable to RCE.

The canonical SSTI probe is {{7*7}}. If Jinja2 evaluates the expression and returns 49, template injection is confirmed. This was used in the 2016 Uber SSTI disclosure.

5. Flask's render_template_string() creates a vulnerability when used with user input because:

Correct. render_template_string() passes the string to the Jinja2 engine for evaluation. User-supplied template syntax can traverse Python's object model to reach os.popen() and similar execution primitives.

render_template_string() evaluates the string as Jinja2 template code. Template expressions like {{''.__class__.__mro__[1].__subclasses__()}} can reach Python's execution capabilities, making this an RCE vulnerability, not merely XSS.

6. Snyk's 2023 analysis found that AI-generated React components using dangerouslySetInnerHTML lacked sanitization in what proportion of cases?

Correct. 23 of 100 components used dangerouslySetInnerHTML; 19 of those 23 (83%) had no sanitization. The AI generated the rendering mechanism without the required DOMPurify safety layer.

The figure was approximately 83% — 19 of 23 components using dangerouslySetInnerHTML contained no sanitization. Only 4 included DOMPurify or equivalent protection.

7. What is the key difference between innerHTML and textContent that makes the latter safe against XSS?

Correct. textContent creates a DOM Text node where all content is treated as literal characters. innerHTML feeds the string to the HTML parser, which processes script tags, event handler attributes, and javascript: URIs.

The fundamental distinction: innerHTML passes the string to the HTML parser (XSS risk), while textContent creates a text node where < and > are literal characters, not markup delimiters.

8. DOM-based XSS is invisible to WAFs and server-side IDS because:

Correct. DOM-based XSS flows from source (e.g., location.hash) to sink (e.g., innerHTML) entirely in client-side JavaScript. The server receives a normal-looking HTTP request and the attack is invisible to server-side controls.

DOM-based XSS is a client-side attack. The payload is read from the URL by JavaScript and written directly to the DOM. When it sits in the fragment (location.hash), the browser never sends it to the server, so server-side WAFs and logs have nothing to inspect.

9. Why does os.path.join(base_dir, user_filename) fail to prevent path traversal?

Correct. os.path.join is a string operation. It concatenates segments without checking whether the result resolves within the intended directory. Only os.path.realpath() produces the canonical filesystem path, resolving all ../ sequences and symlinks.

os.path.join is purely string-based. It does not follow the filesystem. The traversal sequences remain in the string and are resolved by the OS kernel when the path is used to open a file.
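The string-only behaviour is easy to confirm. The sketch below uses posixpath (the POSIX implementation behind os.path on Linux and macOS) so the output is the same on any platform. The base directory is the one from question 1 in the Lesson 4 quiz.

```python
import posixpath

base = "/var/www/uploads"

joined = posixpath.join(base, "../../../etc/passwd")
print(joined)                      # /var/www/uploads/../../../etc/passwd
print(posixpath.normpath(joined))  # /etc/passwd

# An absolute component discards everything before it.
absolute = posixpath.join(base, "/etc/passwd")
print(absolute)                    # /etc/passwd
```

The second case is the one audits miss most often: no ../ appears anywhere in the input, yet the base directory is gone from the result.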

10. What two-step process correctly prevents path traversal when serving user-requested files?

Correct. realpath() resolves all ../ sequences and symlinks to the actual filesystem path. Then startswith(base + os.sep) confirms the resolved path is within the intended directory. Flask's send_from_directory() applies its own containment check (via werkzeug's safe_join()), so file-serving routes do not need a hand-rolled one.

The correct two steps are: (1) realpath() to resolve the canonical path including all traversal sequences, then (2) startswith(base + os.sep) to verify containment. String-level checks for "../" are bypassable via encoding and symlinks.

11. CRLF injection in HTTP response headers enables which attack?

Correct. CRLF sequences are the HTTP header terminators. Injecting \r\n into a header value ends the current header and begins a new one — an attacker can inject Set-Cookie for session fixation or define a second response body for cache poisoning.

CRLF injection enables HTTP response splitting. Since \r\n is the HTTP header delimiter, injecting it allows arbitrary header injection — Set-Cookie manipulation, cache poisoning, and XSS via a second injected response body.

12. Which of the following is a correct audit grep pattern to find potential SQL injection in Python code?

Correct. The dangerous pattern is execute() combined with string formatting. A parameterized execute() call always has the SQL string as the first argument and a tuple of values as the second — any formatting inside the SQL string is a red flag.

The targeted search is for execute() calls that use string formatting methods (.format(), %, f-strings) within the SQL string itself, rather than passing a separate parameter tuple. That combination is the injection signature.
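One way to turn that search into a repeatable check is a short line-based scanner. This is a heuristic sketch, not a complete detector: the function name find_suspect_queries and both regular expressions are assumptions for illustration. It will miss queries built across several lines and can raise false positives (for example on SQL literals that contain a quote followed by %).

```python
import re

# An execute()/executemany() call on a cursor or connection.
EXECUTE = re.compile(r"\.execute(?:many)?\s*\(")
# Formatting signatures in the call's arguments: f-string, "..." % x,
# "...".format(, "..." + x, x + "...".
FORMATTING = re.compile(
    r"""^\s*[fF]["']|["']\s*%\s*[(\w]|["']\s*\.format\s*\(|["']\s*\+|\+\s*["']"""
)


def find_suspect_queries(source: str):
    """Return (line_number, line) pairs where execute() appears to format its SQL."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        call = EXECUTE.search(line)
        if call and FORMATTING.search(line[call.end():]):
            hits.append((lineno, line.strip()))
    return hits


SAMPLE = '''cur.execute(f"SELECT * FROM users WHERE id = {uid}")
cur.execute("SELECT * FROM users WHERE id = %s" % uid)
cur.execute("SELECT id, email FROM users WHERE email='" + email + "'")
cur.execute("SELECT * FROM users WHERE id = %s", (uid,))
'''

for lineno, line in find_suspect_queries(SAMPLE):
    print(lineno, line)
```

Only the first three sample lines are reported. The fourth passes its value as a separate parameter tuple and is left alone.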

13. An AI generates: pickle.loads(request.data) in a Flask endpoint. What is the security implication?

Correct. pickle.loads() on untrusted data is a critical RCE vulnerability. A pickle stream can name any importable callable together with its arguments (the mechanism behind __reduce__), and the loader calls it during deserialization. Python's own documentation warns that the pickle module is not secure and that you should only unpickle data you trust.

pickle.loads() on user-supplied data is a critical code execution vulnerability. The pickle format allows arbitrary Python code to be embedded and executed on deserialization. Use JSON, MessagePack, or similarly safe formats for untrusted data.
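The mechanism can be demonstrated harmlessly. In the sketch below the class name Gadget and the helper parse_request_body are assumptions for illustration, and the built-in print stands in for whatever callable an attacker would actually choose.

```python
import json
import pickle


class Gadget:
    # __reduce__ tells pickle which callable to invoke when loading.
    # print is a harmless stand-in for os.system or similar.
    def __reduce__(self):
        return (print, ("code ran during unpickling",))


blob = pickle.dumps(Gadget())
pickle.loads(blob)  # the callable named in the stream runs here


def parse_request_body(raw: bytes) -> dict:
    """Parse untrusted bytes as JSON and insist on a JSON object.

    Hypothetical helper for this sketch; JSON can only yield plain data
    (dicts, lists, strings, numbers, booleans, None).
    """
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


print(parse_request_body(b'{"user_id": 7}'))
```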

14. The British Airways 2018 Magecart breach began with which injection technique?

Correct. The Magecart group injected a 22-line skimmer script into British Airways' booking page via XSS, capturing payment card data on submission. The breach affected 500,000 customers and resulted in a £20M ICO fine.

The British Airways breach used stored XSS — a JavaScript skimmer was injected into the booking page and sent payment card data to a lookalike attacker domain. 500,000 customers were affected; £20M fine was levied.

15. A Content Security Policy (CSP) mitigates XSS by:

Correct. A CSP header tells the browser's script engine to only execute scripts from approved sources and to block inline script execution entirely (unless using nonces). This limits XSS impact even when injection succeeds — defense-in-depth.

CSP works at the browser level — it instructs the browser to reject inline script execution and scripts loaded from unauthorized origins. Even if an attacker successfully injects a script tag, a strict CSP prevents it from executing.
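A nonce-based policy of the kind described in question 15 might be assembled as follows. This is a sketch under assumptions: the function name build_csp_header and the particular directive set are chosen for illustration, and a real policy depends on what the application loads.

```python
import secrets


def build_csp_header(nonce: str) -> str:
    """Build a Content-Security-Policy value that allows only nonce-tagged inline scripts."""
    directives = [
        "default-src 'self'",
        f"script-src 'self' 'nonce-{nonce}'",
        "object-src 'none'",
        "base-uri 'self'",
    ]
    return "; ".join(directives)


# Generate a fresh, unpredictable nonce for every response.
nonce = secrets.token_urlsafe(16)
header_value = build_csp_header(nonce)
script_tag = f'<script nonce="{nonce}">/* trusted inline script */</script>'

print("Content-Security-Policy:", header_value)
print(script_tag)
```

An injected script tag cannot know the per-response nonce, so the browser refuses to run it even though it reached the page.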