Why AI Gets Security Wrong
Learning Objectives
- Understand why AI coding assistants produce plausible-looking but insecure code by design.
- Recognize the difference between syntactic correctness and security correctness in AI output.
- Identify the training data and architecture factors that make certain vulnerability classes more likely.
- Articulate why human review of AI-generated code is not optional, even for experienced teams.
Session Overview
This opening session establishes the foundational premise of the entire course: AI coding assistants optimize for producing code that looks correct and compiles cleanly, not for code that is secure. Participants often arrive with a vague sense that AI code "might have issues" — this session makes that concern concrete and structural.
The instructor should spend meaningful time on the training data problem. The vast majority of publicly available code — the training corpus for most models — contains well-documented insecure patterns that were acceptable at the time they were written, were never fixed, or exist in tutorial and example form specifically to illustrate what not to do. AI models learned from this data without the ability to distinguish secure examples from insecure ones by intent.
Key Teaching Points
- Plausibility is not safety. AI models are trained to predict the next token, not to reason about security properties. Code that compiles, passes type checking, and looks structurally familiar to the model is "correct" from its perspective regardless of what it does with user input or credentials.
- Training data carries historical vulnerabilities. GitHub and similar repositories contain decades of code written before many modern security practices existed. SQL concatenation, MD5 password hashing, and hardcoded secrets were all common — and all appear in training data. The model learned them as valid patterns.
- Context windows limit security reasoning. Many security properties depend on understanding how code interacts across function boundaries, across services, or over time. AI models working within a single file or prompt context cannot see the full system and routinely generate code that is locally valid but globally dangerous.
- The confidence problem. AI output arrives with no uncertainty markers. A model that is 40% confident in a cryptographic implementation produces the same confident-looking code as one that is 99% confident in a CRUD operation. Developers who learned to trust the output have no signal to slow down and scrutinize.
- Vulnerability reproduction is systematic, not random. Research consistently shows that AI models reproduce the same vulnerability classes at high rates: injection, insecure defaults, missing validation, and broken authentication. These are not random errors — they are structural weaknesses driven by what the training data emphasized.
- Speed pressure amplifies risk. One of the primary value propositions of AI coding tools is generating code faster. This creates organizational pressure to accept output without thorough review, which compounds the underlying model limitations into real production risk.
Discussion Prompts
- Has anyone on your team had an experience where AI-generated code looked correct but turned out to have a problem? What was the nature of that problem — logic, performance, or security?
- If a developer has been using AI tools for months without obvious incidents, how should they think about the risks we are describing? Does absence of known incidents mean the code is safe?
- Who in your organization currently has the mandate to review AI-generated code before it ships? Is that process actually working at your current pace of development?
- How would you explain this risk to a manager who has been told AI assistants increase productivity and wants to know why adding review steps is not just slowing things down?
Open with a concrete example — not a hypothetical. If you have access to a real AI-generated code snippet containing a clear vulnerability (many are publicly documented in research papers), show it on screen and let participants examine it before revealing the flaw. The experience of trusting a snippet and then seeing the problem is more valuable than any explanation.
Expect pushback from participants who have had mostly positive experiences with AI tools. Validate that the tools are genuinely productive — the goal is not to argue against using them, but to establish that productivity and security review are not in conflict. Keep returning to the phrase "optimized for plausibility, not safety."
Do not go deep on any specific vulnerability class in this session — that is Sessions 2 through 7. If the class starts asking about specific injection types or cryptography mistakes, acknowledge them and promise those sessions. This session is about framing and motivation.
Timing Guide
Transition to Session 2
Close by telling participants that for the next six sessions, you will examine specific vulnerability categories one at a time — starting with the most consistently reproduced class in AI-generated code: injection. Frame Session 2 as moving from "why it happens" to "exactly what it looks like and how to find it."
Injection Vulnerabilities in AI Code
Learning Objectives
- Identify the code signatures of SQL, OS command, LDAP, and prompt injection in AI-generated output.
- Understand why AI models default to string concatenation patterns that create injection vulnerabilities.
- Explain the remediation for each injection type and why parameterization or escaping must happen at the right layer.
- Recognize prompt injection as a new and distinct attack surface unique to AI-integrated applications.
Session Overview
Injection remains the most consistently reproduced vulnerability class in AI-generated code. When asked to write database queries, shell commands, or any code that incorporates external input into a structured command, AI models overwhelmingly default to string concatenation or f-string interpolation. This is not a random failure — it is a reflection of what the dominant pattern in training data looks like for "getting the value into the query."
This session covers the three classical injection types (SQL, command, LDAP) plus prompt injection, which is a new attack surface that AI models introduce rather than merely reproduce. Participants should leave able to spot the visual signature of each type in a code review and articulate the correct fix without needing to run the code.
Key Teaching Points
- String concatenation is the tell for SQL injection. When AI generates a database query, look for user-controlled variables appearing directly inside a string that gets executed. The pattern
"SELECT * FROM users WHERE id = " + user_idis the canonical bad pattern. Parameterized queries using placeholders are the only correct fix; sanitization-based approaches are insufficient. - OS command injection follows the same pattern in a different context. When AI generates code that calls shell commands, system(), exec(), or subprocess calls, look for variables interpolated into the command string. AI frequently generates this pattern when asked to write scripts that "run a command with this user-supplied value." The fix is to pass arguments as arrays, never as a single string shell command.
- LDAP injection is often overlooked but common in enterprise code. AI-generated authentication code that queries a directory service frequently constructs LDAP filter strings with user input. The attack surface is smaller than SQL but the consequences (authentication bypass, credential enumeration) are severe. Proper LDAP escaping libraries are required — hand-rolling escaping logic is error-prone.
- Prompt injection is a category that did not exist before AI integration. When AI-generated applications pass user input into prompts that are sent to language models, an attacker can craft input that overrides the system prompt and hijacks the model's behavior. AI models asked to "build a chatbot" or "integrate AI into this form" routinely produce code that does not sanitize or isolate user input from system instructions.
- The pattern: look for the seam where external data enters a command structure. Across all injection types, the audit technique is the same — trace the flow of user-controlled input and look for the point where it is inserted into something that will be interpreted or executed. Any place where data becomes code is an injection surface.
Discussion Prompts
- If you were reviewing a pull request with 500 lines of AI-generated database access code, what would be the fastest way to scan for SQL injection risk without reading every line?
- How does the risk profile of prompt injection differ from SQL injection — both in terms of what the attacker can do and in terms of how difficult it is to prevent?
- When AI generates parameterized queries correctly, is there still a reason to audit that code? What else could go wrong with correct parameterization?
- Have you seen injection-adjacent issues in production — cases where a developer "sanitized" input instead of parameterizing and the sanitization was incomplete?
Show side-by-side code examples throughout this session — the vulnerable AI-generated version on one side, the corrected version on the other. The visual comparison is more instructive than describing the difference in words. If you have access to a code display tool or projector, consider having participants call out what is wrong before you reveal the annotation.
Prompt injection tends to generate the most discussion because it is new and because many participants have not considered it yet. Budget 5–8 minutes of unscripted discussion for this topic if the room is engaged — it will not be wasted time. Be prepared to answer "can the model just be told not to follow injected instructions?" — the answer is no, and this is worth explaining carefully.
Common confusion: participants sometimes believe that encoding or escaping user input before SQL concatenation is safe. Spend time explaining why this is not the case — encoding approaches are context-specific and fragile, while parameterization separates data from code entirely.
Timing Guide
Transition to Session 3
Injection deals with how untrusted data enters commands. Session 3 turns to a different failure mode: how AI-generated code handles the question of who is allowed to do what in the first place. Authentication and authorization logic from AI tools contains its own distinct pattern of failures — which the next session examines in detail.
Authentication and Authorization Gaps
Learning Objectives
- Distinguish between authentication (who you are) and authorization (what you can do) failures in AI-generated code.
- Identify the most common broken access control patterns AI tools produce in API and web application code.
- Recognize insecure session management, token handling, and privilege escalation risks in AI output.
- Apply a systematic review checklist for auth-related code regardless of the framework involved.
Session Overview
Authentication and authorization failures are consistently ranked among the most critical web application vulnerabilities — and AI coding assistants produce them at a remarkable rate. The failure mode is not that AI builds no auth system; it is that the auth system it builds has subtle but exploitable gaps. Missing authorization checks on individual endpoints, insecure defaults on newly generated routes, improper session token handling, and role confusion are all common outputs.
A particular challenge with AI-generated auth code is that it often uses the right vocabulary — it creates login functions, generates tokens, and references roles — while getting the implementation wrong in ways that are not visible from the function signatures alone. Reviewing auth code requires understanding what questions to ask, not just what to read on screen.
Key Teaching Points
- AI generates routes and forgets to protect them. When AI adds a new API endpoint, it often generates the handler and the route registration without automatically adding the authentication middleware to that route. A new endpoint for "admin user management" may appear in the codebase with no auth check at all if the developer did not explicitly specify one in the prompt.
- Authorization is conflated with authentication. AI frequently generates code that checks whether a user is logged in but not whether the logged-in user is permitted to access the specific resource requested. "Is there a valid token?" is not the same as "does this token belong to someone allowed to modify this record?" IDOR (Insecure Direct Object Reference) is a frequent result.
- Session token handling has predictable mistakes. AI-generated session code regularly produces tokens that are not bound to a user agent, have excessively long or unlimited lifetimes, are stored in insecure locations (localStorage for sensitive tokens), and are not invalidated on logout. Each of these is exploitable independently.
- Privilege defaults are often too permissive. When generating role-based access control, AI tends to default new users, new roles, or new features to a permissive state. This follows the "make it work first" pattern in training data. Auditing RBAC code should specifically look for what happens to a new or unassigned entity — can they do more than intended by default?
- Password handling mistakes cluster in predictable places. AI-generated password flows frequently use hashing algorithms that are technically non-null but inappropriate for passwords (SHA-256 instead of bcrypt, argon2, or scrypt), omit salt entirely, or implement comparison logic vulnerable to timing attacks. The presence of a hash function in code is not evidence of correct password security.
Discussion Prompts
- Think about the last time you reviewed an authentication-related pull request. What specific things did you look for, and what would you add to that checklist based on this session?
- How do you handle the case where AI generates a dozen new API endpoints simultaneously? Is it practical to audit every new route individually for missing auth middleware?
- IDOR vulnerabilities are often not visible by looking at the route handler alone — you need to understand the data model. How do you build that context into a code review process?
- If you discovered that a recently shipped AI-generated feature was missing authorization checks on several endpoints, what would your incident response process look like?
The distinction between authentication and authorization is foundational and worth spending real time on. A significant portion of the class will conflate them initially. A useful framing: authentication answers "who are you?" and authorization answers "what are you allowed to do?" — and AI code tends to answer only the first question while assuming the second is covered.
IDOR is often new to participants who do not work primarily in web security. A concrete example — "user A can access user B's medical record by changing the ID in the URL" — lands better than abstract descriptions. Emphasize that AI generates IDOR vulnerabilities not because it builds broken auth, but because it builds incomplete auth that omits the ownership check.
Resist the urge to go deep on specific framework-level solutions (JWT libraries, OAuth specifics). Keep the session at the pattern-recognition level — the goal is auditing, not building. Specific remediation technologies will vary by participant context.
Timing Guide
Transition to Session 4
Sessions 2 and 3 covered how AI code handles commands and access control incorrectly. Session 4 shifts to what AI code does with the data itself — how it logs, stores, and transmits sensitive information, and the specific mistakes that expose secrets and personal data.
Insecure Data Handling and Secrets
Learning Objectives
- Identify the data exposure patterns AI code commonly introduces in logging, error handling, and API responses.
- Recognize hardcoded credentials, API keys, and connection strings as a systematic output of AI code generation.
- Understand the risks of improper data storage including unencrypted sensitive fields and insecure temporary file handling.
- Apply a practical checklist for secrets detection and data minimization review in AI-generated code.
Session Overview
AI coding assistants are remarkably consistent at hardcoding secrets. When a developer asks an AI to "connect to the database" or "call this API with authentication," the model reaches for the most direct and visible approach: putting the credential directly in the code. This is the pattern the model saw most often in its training data — in configuration examples, README files, tutorials, and quick-start guides where a real credential was temporarily acceptable. The model has no mechanism to distinguish that context from production application code.
Beyond secrets, AI code has characteristic data handling failures: logging full request bodies that contain PII, returning stack traces and internal state in error responses, storing sensitive fields in plaintext, and transmitting data over connections that should be encrypted but are not configured to require it. Each of these represents a data exposure risk that a thorough code review must catch.
Key Teaching Points
- Hardcoded secrets are a default, not an exception, in AI output. Any prompt that involves connecting to an external service risks producing a credential in the code. The fix — environment variables, secrets managers, vault references — requires an explicit instruction to the model or a post-generation audit. Automated secrets scanning (truffleHog, gitleaks) should be part of every CI pipeline accepting AI-generated code.
- Logging is a silent data exfiltration channel. AI-generated logging code frequently logs full request objects, function parameters, and error context at DEBUG or INFO level. In a production environment with centralized logging, this creates a searchable record of every piece of sensitive data that passed through the system. Review all log statements in AI output for what data they capture, not just what level they use.
- Error handling reveals internal state. AI-generated exception handlers commonly return the full exception message and stack trace in API error responses. An attacker who can trigger an error learns the internal structure of the system, the libraries in use, file paths, and sometimes the values of local variables. Error responses should log detail internally and return generic messages externally.
- Storage of sensitive data defaults to plaintext. When AI generates a user data model or a database schema, it does not automatically encrypt sensitive fields. Social security numbers, credit card numbers, and health data are stored as VARCHAR or TEXT unless the developer specifically requests encryption. Review data models for fields that require encryption at rest.
- Transmission security is frequently incomplete. AI code that makes HTTP requests does not always enforce HTTPS, does not always validate certificates, and sometimes generates configurations that accept any protocol. TLS validation should be explicitly verified in code that transmits sensitive data, not assumed.
Discussion Prompts
- If your team is using AI to generate configuration and infrastructure code as well as application code, how does the secrets risk change? What is the blast radius of a hardcoded credential in a Terraform file versus an application file?
- How do you balance the operational value of detailed logging with the data exposure risk of logging sensitive fields? Where does that decision get made in your organization?
- Has your team run a secrets scanning tool against your repositories? What would you expect to find in the AI-generated code added in the last 6 months?
- When AI generates a data model, who is responsible for reviewing it for privacy and encryption requirements — the developer who prompted it, the reviewer, or someone with a data governance role?
The hardcoded secrets point tends to generate immediate recognition — most experienced developers have seen or committed an API key at some point. Use that recognition as an opening: "This is something many of us have done manually, and now AI does it on our behalf at scale." The scaling aspect is what changes the risk calculus.
Data minimization — collecting and logging only what is needed — is worth introducing here even if the course does not cover privacy compliance in depth. Many participants will not have considered that the problem is not just how data is protected but how much data the code captures in the first place. AI code tends to capture everything available.
If participants ask about specific tools (truffleHog, gitleaks, Vault), briefly acknowledge them and note that Session 8 covers tooling integration in depth. Keep the focus this session on pattern recognition.
Timing Guide
Transition to Session 5
Sessions so far have focused on the code AI writes directly. Session 5 expands the surface to include the dependencies AI recommends — the packages, libraries, and version pins that arrive with AI-generated code and carry their own security risk profile.
Dependency and Supply Chain Risks
Learning Objectives
- Understand how AI models recommend packages and why those recommendations may reference outdated or vulnerable versions.
- Identify supply chain risk vectors including typosquatting, package confusion, and abandoned maintainers in AI-suggested dependencies.
- Apply a dependency review process that validates packages AI recommends before installation.
- Recognize the difference between a known-vulnerable version and a version for which no vulnerability has yet been published.
Session Overview
AI coding assistants do not have real-time access to vulnerability databases. Their package knowledge is frozen at training time — which means that a library with a critical CVE published last month may be recommended without any caveat. Beyond the knowledge cutoff problem, AI models also exhibit a tendency to recommend well-known packages by approximate name, which creates exposure to typosquatting and package confusion attacks: an attacker publishes a malicious package with a name one character away from the legitimate one, and the AI recommends the wrong one.
The supply chain risk in AI-assisted development goes deeper than individual package recommendations. AI-generated lock files, requirements files, and package.json entries carry version pins that may have known vulnerabilities. AI-generated Dockerfiles may reference base images with unpatched CVEs. AI-generated CI configurations may pull dependencies at build time without pinning. Each of these is a supply chain risk that requires explicit review.
Key Teaching Points
- AI's package knowledge has a training cutoff and no CVE awareness. Every vulnerability published after the model's training cutoff is invisible to it. When the model recommends a specific version of a library, it is not checking current vulnerability databases — it is reproducing what it learned was a good version at the time it was trained. Treat all AI-recommended package versions as unverified until you check them against current advisories.
- Typosquatting and package confusion are real risks in AI output. Documented cases exist of AI assistants recommending package names that do not exist — and attackers have responded by publishing malicious packages under those exact names. Before installing any package an AI recommends, verify that it is the canonical package, not a variant, and check its download count, publish history, and maintainer activity.
- Abandoned packages carry compounding risk. AI frequently recommends packages that were popular at training time but have since been deprecated, abandoned, or transferred to a new maintainer. An abandoned package means no security patches for future vulnerabilities. Check the last publish date and open issue count for any AI-recommended package before adopting it.
- Transitive dependencies multiply the attack surface. The package AI recommends directly is the visible tip; its dependency tree is the real surface. A lightweight convenience library may pull in a chain of transitive dependencies with their own vulnerability profiles. Use dependency tree analysis tools and SBOM generation to understand the full scope of what you are accepting.
- Lock file and pin hygiene matters as much as the install. AI-generated dependency files often do not include integrity hashes, use loose version ranges that will drift over time, or omit lock file entries for some dependencies. Each of these gaps allows a dependency substitution attack — where a package is swapped for a malicious version between install and deployment.
Discussion Prompts
- Does your team have a process for reviewing the packages introduced by AI-generated code, separate from reviewing the code itself? If not, what would that process need to look like to be practical?
- How would you handle a situation where AI recommends a package you have never heard of? What is your research process before approving it?
- Supply chain attacks targeting open source packages have become more frequent. How does the use of AI coding tools change your exposure to this risk compared to your previous development workflow?
- If your CI pipeline uses software composition analysis (SCA) tooling, does it cover every artifact type AI might generate — requirements files, package.json, go.mod, Dockerfiles? What gaps exist?
The training cutoff concept resonates strongly with participants once they think about it concretely. Ask the class: "If the model was trained in early 2024, and a major vulnerability in a popular library was published in late 2024, how would the model know?" The answer is it would not — and this should reframe how they interpret any AI package recommendation as a starting point, not an endorsement.
The typosquatting discussion often prompts the question "has this actually happened with AI tools?" — yes, it has, and there are documented public examples. If you have a specific incident to reference, do so. The concreteness is more persuasive than the theoretical risk alone.
Keep the technical depth on SBOM and SCA tools light in this session — the goal is awareness that the surface exists, not training on specific tooling. Session 8 covers tooling integration.
Timing Guide
Transition to Session 6
Session 5 covered what comes in from outside the code. Session 6 returns to the code itself — specifically the data validation and output encoding that AI-generated code routinely omits, creating a class of vulnerabilities that spans XSS, path traversal, business logic abuse, and more.
Input Validation and Output Encoding
Learning Objectives
- Recognize the difference between type checking, format validation, and semantic (business logic) validation — and where AI code commonly skips the latter two.
- Identify XSS, path traversal, and open redirect vulnerabilities that result from missing output encoding in AI-generated code.
- Understand why input validation and output encoding are complementary defenses that must both be present.
- Apply a validation and encoding review pattern to AI-generated form handlers, file operations, and redirect logic.
Session Overview
Input validation and output encoding are foundational security controls, widely taught, widely understood in principle — and widely absent from AI-generated code in practice. The model knows that validation is a concept, and will include basic type checks when directly prompted. But semantic validation — "is this value within the expected range, from an allowed set, or representing a valid state in our business domain?" — is almost never generated without explicit instruction.
On the output side, AI-generated code that renders user-supplied data into HTML, URLs, shell contexts, or file paths does not automatically apply context-appropriate encoding. The result is XSS in web applications, path traversal in file operations, and open redirects in URL handling — vulnerability classes that have been well understood for decades but persist because developers (and now AI) optimize for making the data appear, not for making it appear safely.
Key Teaching Points
- Three layers of validation: type, format, and semantic. AI code reliably generates type validation (is this an integer?) and sometimes format validation (does this match an email regex?). It almost never generates semantic validation: is this user allowed to specify this value? Is this quantity within a range that makes business sense? Is this file path confined to the expected directory? Semantic validation is application-specific and cannot be inferred from the data structure alone.
- Allowlist validation is safer than denylist validation, and AI defaults to neither. When AI does generate validation, it tends toward simple checks rather than explicit allowlists of permitted values. A denylist of "bad" characters is always incomplete; an allowlist of "good" values is closed by definition. Review AI-generated validation to determine which approach is being used and whether the allowlist is genuinely complete for the use case.
- XSS results from rendering user data without context-appropriate encoding. When AI generates templates or response handlers that include user-supplied strings in HTML output, it frequently omits HTML entity encoding. Stored XSS is especially dangerous: the malicious content is injected once and executed for every subsequent visitor. Review every template rendering path in AI-generated code for unencoded dynamic values.
- Path traversal occurs when file operations use unsanitized user input. AI-generated file download, upload, and read operations frequently accept a filename or path from user input without validating that it remains within the intended directory. The pattern
../../etc/passwdworks because the code did not canonicalize the path and check it against an allowed prefix before opening the file. - Open redirects enable phishing and session hijacking. AI code that accepts a redirect URL parameter and uses it directly — after login, after payment, after form submission — creates an open redirect. Attackers use these to build convincing phishing links that start on a trusted domain and land on a malicious one. Validate redirect URLs against an allowlist of permitted destinations or restrict them to relative paths.
Discussion Prompts
- How does your team currently communicate validation requirements to developers — and how would those requirements be expressed to an AI coding assistant effectively?
- Stored XSS is often discovered by security researchers rather than internal review. If your application renders user-supplied content anywhere, what is your confidence that all those rendering paths are properly encoded?
- Path traversal in file operations often only surfaces when someone tries it. What would a threat model for a file download feature look like, and who in your team would produce that threat model?
- If you had to add a single automated check to your CI pipeline that caught the most input validation issues in AI-generated code, what would it be?
The three-layer validation model (type, format, semantic) is a useful framework that many participants will not have explicitly articulated before, even if they understand the concept intuitively. Spend time making the layers concrete with examples: type = "is this a number," format = "does this look like a date," semantic = "is this date in the future and within 90 days." The semantic layer is where AI reliably fails.
XSS tends to be familiar to most participants but underestimated in AI context because "modern frameworks auto-escape." Probe this assumption: are all rendering paths in the AI-generated code going through the framework's safe rendering, or are any using raw HTML concatenation? AI frequently bypasses framework safety by using low-level rendering APIs.
If time allows, the open redirect discussion pairs well with a question about the organization's use of OAuth. OAuth redirect_uri validation is a prime example of this class of vulnerability with high stakes — worth mentioning if participants have experience with OAuth integration.
Timing Guide
Transition to Session 7
Session 6 covered how data enters and exits code unsafely. Session 7 goes into one of the most consequential and least forgiving domains in secure software: cryptography. AI-generated cryptographic code has a distinct failure profile — correct-looking functions using incorrect primitives, configurations, or key management.
Cryptography Mistakes
Learning Objectives
- Recognize the specific cryptographic primitives and configurations that AI models default to and explain why they are insufficient.
- Identify key management failures in AI-generated code including hardcoded keys, predictable keys, and key reuse.
- Understand why implementing cryptography from low-level primitives is almost always wrong and what higher-level alternatives should be used instead.
- Apply a cryptography review checklist to AI-generated encryption, hashing, signature, and random number generation code.
Session Overview
Cryptography is the domain where "looks right" is most dangerous. An AI-generated encryption function that uses AES in ECB mode produces ciphertext that decrypts correctly and causes no errors — but leaks structural information about the plaintext that makes the encryption effectively useless for most security purposes. An AI-generated token using HMAC-MD5 will verify correctly — but the underlying hash function has known collision vulnerabilities that can be exploited by attackers with enough motivation.
AI models tend to use cryptographic primitives that were dominant in their training data — MD5, SHA-1, ECB mode, statically-seeded random number generators, 1024-bit RSA keys — without awareness that the security community's consensus on these primitives has shifted or that better alternatives exist. The auditor's job in this domain is to know not just what the code does, but whether what it does meets current standards for the threat model.
Key Teaching Points
- Algorithm choices reflect training data vintage, not current best practice. MD5 and SHA-1 appear extensively in older code and documentation. AI models reproduce them without flagging that they are deprecated for security use. For password hashing, bcrypt, Argon2id, or scrypt are required — not any variant of SHA. For general hashing and HMAC, SHA-256 or SHA-3 is the minimum; for new designs, SHA-256 at minimum.
- Block cipher mode selection is systematically wrong in AI output. When AI generates AES encryption, it defaults to ECB mode more often than statistically expected because ECB is the simplest API call and appears heavily in tutorial code. ECB mode encrypts identical plaintext blocks to identical ciphertext blocks, leaking structure. AES-GCM (which also provides authentication) or AES-CBC with a properly random IV are the correct choices for most use cases.
- Random number generation for security requires a CSPRNG. AI code frequently uses language-level random functions (Math.random() in JavaScript, random() in Python) for purposes that require cryptographic randomness — token generation, nonce selection, key material. These functions are not cryptographically secure PRNGs. The consequence is that tokens and keys generated this way are predictable by an attacker who knows enough about the PRNG state.
- Key management is where cryptography implementations most commonly fail. Beyond choosing correct algorithms, AI-generated code regularly hardcodes encryption keys, generates keys from predictable inputs (user IDs, timestamps), reuses the same key across all users or all data, and fails to implement key rotation. A cryptographically sound algorithm with a predictably generated or hardcoded key provides minimal security.
- Do not implement; use a vetted library at the right level of abstraction. AI is more likely to generate custom cryptographic implementations or low-level primitive compositions when the high-level operation is complex. Any custom crypto implementation is a red flag. Use high-level vetted libraries — NaCl/libsodium, PyCA cryptography, Google Tink — that have been formally reviewed. The goal of an audit is not to verify that the math is correct but to verify that a trustworthy implementation is being used correctly.
Discussion Prompts
- If you discovered that AI-generated encryption code in a production system was using AES-ECB, what is the scope of the remediation? What data would need to be re-encrypted and what is the operational complexity of that operation?
- How confident are you that your team could identify an insecure random number generator in a code review without running the code? What would the code look like, and would it stand out?
- Who on your team has enough cryptography background to evaluate AI-generated crypto code? If the answer is "nobody," what does that mean for your review process?
- What is the right level of cryptographic expertise to expect from a standard code reviewer versus a designated security reviewer? How do you structure that responsibility in your organization?
This session benefits from acknowledging up front that most developers are not cryptographers, and that is acceptable — the goal is not to train everyone to implement cryptography, but to train them to recognize when the cryptography being used is wrong. The three questions every reviewer should ask: Is this algorithm current? Is the mode/configuration correct? Is the key handled safely?
The "implement, don't use a library" failure mode is worth emphasizing. When AI generates a custom base64-encoded-then-XOR'd "encryption" scheme — which does happen — no amount of algorithm review will help because the construction is broken by design. The red flag is any bespoke crypto construction rather than a named standard implemented by a vetted library.
Participants sometimes push back with "we don't write crypto code, we use TLS and HTTPS everywhere." Probe this: does the application also encrypt data at rest? Does it generate tokens? Does it hash passwords? Almost every non-trivial application touches crypto in multiple places.
Timing Guide
Transition to Session 8
Sessions 1 through 7 have built a comprehensive map of the vulnerability landscape in AI-generated code. The final session asks: given everything you now know, how do you operationalize that knowledge? Session 8 covers workflows, tooling, and how to build a sustainable security review practice that keeps pace with AI-assisted development.
Security Review Workflows and Tooling
Learning Objectives
- Identify the SAST, secrets scanning, SCA, and linting tools most relevant for catching AI-generated code vulnerabilities.
- Design a practical security review workflow that integrates automated and manual checks without creating a bottleneck.
- Understand how to configure security tooling specifically for the vulnerability patterns AI code introduces most frequently.
- Articulate how to build a sustainable security review culture on a team that is heavily using AI coding assistants.
Session Overview
The seven preceding sessions have built a detailed map of the vulnerabilities AI-generated code introduces. This final session addresses the operational question: how do you actually catch these vulnerabilities at scale, consistently, without requiring every developer to have security expertise and without making the review process so burdensome that the team routes around it?
The answer is a layered approach. Automated tooling catches the deterministic, pattern-based issues — injection signatures, hardcoded secrets, known-vulnerable dependencies, weak crypto algorithm calls. Human review focuses on the semantic and contextual issues that tooling cannot evaluate — authorization logic, business rule validation, architectural decisions. The goal is not to catch everything automatically but to use automation to clear the obvious cases so that human attention can focus on the ambiguous and high-stakes ones.
Key Teaching Points
- SAST tools catch pattern-based vulnerabilities before runtime. Static Application Security Testing tools (Semgrep, CodeQL, Bandit for Python, Brakeman for Rails, SpotBugs for Java) analyze code without executing it and flag patterns that match known vulnerability signatures. For AI-generated code, SAST is especially valuable for injection patterns, insecure API usage, and dangerous function calls. Configure rules that specifically target the vulnerability classes covered in this course.
- Secrets scanning must run on every commit, not just periodically. Tools like truffleHog, gitleaks, and GitHub's built-in secret scanning detect credentials, API keys, and tokens in code and commit history. Because AI code frequently generates hardcoded secrets, pre-commit hooks and CI pipeline checks for secrets are a minimum baseline. Importantly, scanning must cover the full commit history — a secret committed and then removed in a later commit is still retrievable from git history.
- Software Composition Analysis covers the dependency attack surface. SCA tools (Dependabot, Snyk, OWASP Dependency-Check, Grype) scan dependency files and lock files against known vulnerability databases and flag packages with published CVEs. Running SCA on every dependency change — not just periodic audits — catches AI-recommended vulnerable packages before they reach production.
- Manual review retains strategic importance for semantic issues. Automated tooling cannot evaluate whether an authorization check correctly covers all access paths, whether a validation rule matches the actual business constraint, or whether a cryptographic construction is appropriate for the threat model it faces. These require a human reviewer who understands the application context. Define explicitly which code paths require human security review and do not let automated tool passage be a substitute for contextual judgment.
- Build a security review checklist specific to your AI tool usage. The pattern of vulnerabilities AI generates is not random — it is predictable enough to create a targeted checklist. For each prompt type your team commonly uses (generate a login flow, add an API endpoint, write a database query), document the specific checks reviewers should apply. A prompt-specific checklist is faster to apply and less likely to be skipped than a generic security review guide.
- Culture and process matter as much as tooling. Tools block nothing if developers disable them locally or if CI failures do not gate merges. Effective security review for AI-generated code requires clear organizational expectations: security tool failures block merges, bypasses require documented justification, and recurring AI vulnerability patterns are tracked and fed back into developer training.
Discussion Prompts
- What security tooling does your organization currently run in CI, and how much of your AI-generated code actually passes through that pipeline before it is deployed?
- If you were designing a security review process from scratch for a team that uses AI assistants for 60% of its code output, what would the first three things you put in place be?
- How do you handle the tension between security review slowing down delivery and the risk of skipping review? Who in your organization has the authority to make that tradeoff explicitly?
- What would it take to make the security patterns covered in this course into a living reference document that your team maintains and updates as your AI tooling evolves?
This is a synthesis session — the goal is to help participants leave with a concrete plan, not just more information. Spend the first half covering tooling and the second half encouraging participants to sketch out what a practical workflow would look like for their specific team. Asking "what would you implement first if you started tomorrow?" is a useful focusing question for the discussion.
The culture point is worth lingering on. The best tooling configuration fails if developers perceive security review as a gate imposed on them rather than a practice they own. Framing security review as "here is how we make sure AI tools are working for us, not against us" tends to land better than framing it as a compliance or audit function.
Close the course by connecting back to Session 1's central point: AI tools optimize for plausibility, not safety. The workflow and tooling in this session exist to add back the safety layer that the tools cannot provide on their own. Security review is not a reaction to AI — it is the completion of the development process that AI started.
Timing Guide
Closing Remarks
Thank participants for their engagement across all eight sessions. Encourage them to identify one concrete action they will take in the next two weeks — whether that is adding a secrets scanner to their CI pipeline, conducting a targeted review of recent AI-generated authentication code, or drafting a prompt-specific security checklist for their team. The knowledge from this course becomes valuable when it is applied, and the best time to start is immediately.