Module 8 · Lesson 1

Static Analysis and SAST Integration

How automated scanners became the first line of defense — and why AI-generated code exposes their limits

Which static analysis tools catch the patterns AI code generators miss, and how do you wire them into your pipeline before a human ever reads the diff?

When Samsung Electronics employees inadvertently leaked proprietary source code by pasting it into ChatGPT in March 2023, the incident sparked a broader question inside security teams worldwide: if developers are routinely feeding sensitive context into AI assistants and accepting generated code wholesale, what does a meaningful review workflow actually look like? The answer, most teams discovered, starts with static analysis — but not static analysis as it was practiced before AI-assisted development arrived.

What SAST Tools Actually Examine

Static Application Security Testing (SAST) tools analyze source code without executing it, searching for patterns that correspond to known vulnerability classes. They work by building an abstract syntax tree (AST) or a control-flow graph of the codebase, then running rule libraries against that representation. Major commercial products include Checkmarx One, Veracode Static Analysis, and Snyk Code; open-source options include Semgrep, Bandit (Python), SpotBugs (Java), and ESLint security plugins (JavaScript).

Most of these tools expose a rule language. Semgrep, for example, uses YAML-based pattern rules that structurally resemble the code they target, which makes it relatively straightforward to write custom rules for patterns common in AI-generated output. One example is the SQL string concatenation that GitHub Copilot produces in place of parameterized queries in certain prompt contexts.
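To make the idea of matching against a syntax tree concrete, here is a minimal sketch in Python using the standard-library ast module. It is not Semgrep and not how any particular scanner is implemented; the sink name "execute" and the sample queries are assumptions chosen for illustration.

```python
import ast

# Assumption for illustration: any method call named "execute" is a SQL sink.
SINK_METHODS = {"execute"}


def find_concatenated_sql(source: str) -> list:
    """Return line numbers of .execute(...) calls whose first argument is
    built by concatenation, %-formatting, an f-string, or str.format()."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in SINK_METHODS
                and node.args):
            continue
        query = node.args[0]
        is_concat = (isinstance(query, ast.BinOp)
                     and isinstance(query.op, (ast.Add, ast.Mod)))
        is_fstring = isinstance(query, ast.JoinedStr)
        is_format = (isinstance(query, ast.Call)
                     and isinstance(query.func, ast.Attribute)
                     and query.func.attr == "format")
        if is_concat or is_fstring or is_format:
            findings.append(node.lineno)
    return sorted(findings)


SAMPLE = '''
cursor.execute("SELECT * FROM users WHERE id = " + user_id)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
cursor.execute(f"SELECT * FROM users WHERE name = '{name}'")
'''

print(find_concatenated_sql(SAMPLE))  # the concatenated and f-string calls
```

The parameterized call on the third line of the sample is not reported because its first argument is a plain string constant. A production rule would also need to track variables assigned earlier, which is where taint analysis comes in.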

Why AI Output Challenges Legacy SAST Configurations

A study by Stanford researchers (Perry et al., presented at ACM CCS 2023) found that participants who used an AI code assistant on security-sensitive tasks wrote significantly less secure code than the control group working without one. Critically, they were also more likely to believe that their code was secure. This confidence gap matters for SAST because teams that assume AI code is pre-reviewed often suppress scanner warnings without investigation.

AI-generated code also introduces novel structural patterns that older rule sets were not written to detect. Copilot-era code frequently uses recently popularized library functions in ways that bypass sanitization layers that older tooling assumes will be present. SAST rules written for human coding patterns — where a developer tends to copy idioms from familiar projects — miss the more eclectic style mixtures that LLMs produce by interpolating across many training sources.

Key Finding — Pearce et al. 2022

In "Asleep at the Keyboard?" (IEEE S&P 2022), researchers generated 1,689 code snippets using GitHub Copilot across 89 different scenarios and found that approximately 40% contained at least one vulnerability detectable by CWE classification, with SQL injection, path traversal, and hardcoded credential patterns appearing most frequently. The authors recommended pairing Copilot with security-aware tooling, which in practice means SAST integration before merge.

Core Tool Categories

Semgrep

Pattern Matching SAST

Open-source, rule-based scanner with a public registry of 3,500+ rules. Integrates natively into GitHub Actions, GitLab CI, and pre-commit hooks. Custom rules can target AI-specific patterns in under 10 lines of YAML.

Checkmarx One

Enterprise SAST

Full AST analysis with taint tracking across function boundaries. Correlates findings with OWASP and CWE. Introduced AI-aware scanning rules in its 2024 release targeting Copilot-generated SQL and deserialization patterns.

Snyk Code

ML-augmented SAST

Uses a machine learning model trained on vulnerability patterns to reduce false positives. IDE plugin provides real-time feedback while developers accept AI suggestions — the closest current analog to in-loop review.

Bandit

Python-specific SAST

AST-based Python scanner maintained by PyCQA. Particularly effective for catching AI-generated use of dangerous functions: eval(), pickle.loads(), and subprocess with shell=True — all common in Copilot Python output.

CodeQL

Semantic Analysis

GitHub's semantic code analysis engine and query language. Performs full data-flow and taint analysis. Free to use on public repositories through GitHub code scanning; private repositories require a GitHub Advanced Security license.

ESLint (security plugins)

JavaScript Linting

eslint-plugin-security and eslint-plugin-no-unsanitized extend ESLint with security-focused rules. Critical for catching AI-generated innerHTML assignments and eval() usage in Node.js and browser code.
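The dangerous Python calls named in the Bandit entry above (eval(), pickle.loads(), and subprocess with shell=True) each have a safer standard-library counterpart in many situations. The sketch below shows one possible replacement for each; the function names are hypothetical, and JSON is only a substitute for pickle when the data is plain values rather than arbitrary objects.

```python
import ast
import json
import subprocess
import sys


def parse_literal(text: str):
    """Safer stand-in for eval(): accepts only Python literals."""
    return ast.literal_eval(text)


def load_payload(raw: bytes) -> dict:
    """Safer stand-in for pickle.loads(): JSON cannot carry executable objects."""
    return json.loads(raw.decode("utf-8"))


def run_echo(message: str) -> str:
    """Safer stand-in for subprocess with shell=True: arguments are passed as
    a list, so shell metacharacters in `message` are never interpreted."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", message],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

With these in place, parse_literal rejects an expression such as a function call with a ValueError instead of running it, and run_echo returns a string like "hi; echo pwned" unchanged rather than executing the second command.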

Pipeline Integration: Where SAST Belongs

Pre-commit (Developer Workstation)

Lightweight, fast rules run via pre-commit hooks (Semgrep, Bandit, ESLint) before code is committed. Catches obvious patterns such as hardcoded secrets and dangerous function calls, typically in under 5 seconds.

Pull Request Gate (CI Pipeline)

Full SAST scan runs on every PR. Findings above a severity threshold block merge. CodeQL, Checkmarx, or Veracode run here with full taint analysis. AI-generated files can be tagged via commit metadata for elevated scrutiny.

Nightly Full-Repo Scan

Complete historical scan of all branches. Catches accumulated drift as rule sets are updated. Particularly important for AI-generated code because new vulnerability patterns are continuously discovered post-training-cutoff.

Release-Gate Deep Scan

Pre-production scan with full commercial tooling and manual triage of all medium+ findings. Human reviewer examines AI-attribution metadata and correlates scanner output with threat model before sign-off.

Workflow Principle

A SAST gate that can block a merge before any human has reviewed AI-generated code is not optional overhead; it is the minimum viable safety net. The goal is not zero false positives but zero unchecked AI-introduced vulnerabilities in production.

Taint Analysis: Tracking untrusted data from where it enters a program (taint sources) to the dangerous function calls it can reach (taint sinks), detecting potential injection vulnerabilities without executing code.

AST (Abstract Syntax Tree): A tree representation of code structure used by SAST tools. Each node represents a syntactic construct; security rules query this tree rather than raw text, reducing false positives from comments and strings.

Severity Gate: A CI/CD configuration that fails a build when SAST findings meet or exceed a defined severity threshold (e.g., CRITICAL or HIGH), blocking merge until findings are resolved or suppressed with documented justification.
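The severity gate defined above can be sketched as a small decision function. This is an illustration under stated assumptions, not the configuration format of any specific scanner: the Finding record and its field names are hypothetical, and real tools usually export findings as SARIF or JSON that you would map into something similar.

```python
from dataclasses import dataclass

# Ordered from least to most severe; labels mirror those used in this lesson.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


@dataclass(frozen=True)
class Finding:
    rule_id: str              # hypothetical rule identifier
    severity: str             # one of SEVERITY_ORDER
    suppressed: bool = False
    justification: str = ""   # expected whenever suppressed is True


def gate_passes(findings: list, threshold: str = "HIGH") -> bool:
    """Return False when any finding at or above `threshold` is unresolved.
    A suppression only counts if it carries a written justification."""
    limit = SEVERITY_ORDER.index(threshold)
    for finding in findings:
        if SEVERITY_ORDER.index(finding.severity) < limit:
            continue  # below the gate threshold
        if finding.suppressed and finding.justification.strip():
            continue  # suppressed with a documented reason
        return False
    return True
```

Requiring a non-empty justification is the design choice that keeps suppressions from becoming an audit gap: a bare "ignore" flag does not open the gate.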

Lesson 1 Quiz

Static Analysis and SAST Integration

1. According to the Pearce et al. 2022 IEEE study on GitHub Copilot, approximately what percentage of generated code snippets contained at least one detectable vulnerability?

✓ Correct. Pearce et al. found roughly 40% of Copilot-generated snippets contained CWE-classifiable vulnerabilities across 89 tested scenarios.

Not quite. The study found approximately 40% of snippets were vulnerable — a significant proportion but not a majority.

2. Which Semgrep characteristic makes it particularly useful for targeting AI-generated code patterns?

✓ Correct. Semgrep rules structurally resemble the code they match, making it straightforward to write rules targeting specific AI-generated patterns like SQL string concatenation.

Not quite. Semgrep's standout feature is its code-like YAML rule syntax that makes custom rule authoring fast and readable.

3. At which pipeline stage should full taint-analysis SAST (e.g., CodeQL, Checkmarx) most appropriately run?

✓ Correct. Full taint analysis is too slow for pre-commit but must run before merge, so the PR gate is the right control point to block vulnerable AI-generated code.

The pre-commit stage uses lighter, faster rules. Full taint analysis belongs at the PR gate where merge can be blocked pending resolution.

Lab 1: SAST Pipeline Architecture

Design and justify a multi-stage SAST workflow for AI-assisted development

Scenario

Your team has adopted GitHub Copilot for a Node.js and Python microservices project. The security team has asked you to design a SAST integration that specifically accounts for AI-generated code risk — including which tools to use, at which pipeline stages, and how to handle the false-positive suppression problem without creating audit gaps.

Discuss your pipeline design with the AI security advisor. Cover: tool selection rationale, severity gate thresholds, how you'd tag AI-generated commits for elevated scanning, and how suppression decisions should be documented. At least 3 substantive exchanges required.

Security Advisor — SAST Pipeline

Welcome. You're designing a SAST pipeline for a team using GitHub Copilot on Node.js and Python services. Let's work through this together. Start by telling me: which two pipeline stages do you consider highest-priority, and why? What tools are you considering for each?

Module 8 · Lesson 2

Software Composition Analysis and Dependency Risk

AI code generators pull packages from training data — not from your approved supply chain

When an LLM suggests a dependency it learned from GitHub three years ago, how does your SCA toolchain detect the risk before it reaches production?

The event-stream npm incident of 2018 demonstrated that malicious code could enter production through a trusted, widely-used package. By 2023, AI code generators had added a new dimension: they suggest package names based on training data that predates current vulnerability disclosures. When a developer accepts a Copilot suggestion that includes lodash 4.17.15 — a version with known prototype pollution vulnerabilities — they may not recognize the version specificity embedded in the suggestion at all.

The XZ Utils backdoor discovered in March 2024, a sophisticated supply-chain compromise of a compression library used by Linux distributions, underscored that dependency vetting cannot be treated as a one-time event. An AI assistant trained before the disclosure has no knowledge of it and could keep suggesting the affected releases as routine dependencies long after they were identified as malicious.

What Software Composition Analysis Covers

SCA tools inventory all direct and transitive dependencies in a project, correlate them against vulnerability databases (NVD, OSV, GitHub Advisory Database, Snyk Vulnerability DB), and flag components with known CVEs or license compliance issues. Unlike SAST — which examines your first-party code — SCA examines the third-party code your project depends on.

For AI-generated code, SCA has three specific failure modes to address: stale version pinning (AI suggests a specific older version it learned during training), hallucinated packages (AI invents a plausible-sounding package name that either doesn't exist or is a typosquat), and transitive blindness (AI imports a package without surfacing its dependency tree, which may include vulnerable transitive dependencies).
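The first failure mode, stale version pinning, can be illustrated with a minimal check of pinned versions against a policy floor. This is a sketch, not a replacement for an SCA tool: the MINIMUM_VERSIONS table is hypothetical, the lodash floor is illustrative, and the parser only handles plain dotted versions.

```python
# Hypothetical policy table: minimum acceptable version per package.
# Take real floors from your SCA tool's advisories, not from this example.
MINIMUM_VERSIONS = {"lodash": "4.17.21"}


def parse_version(text: str) -> tuple:
    """Parse a plain dotted version such as '4.17.15' (no pre-release tags)."""
    return tuple(int(part) for part in text.split("."))


def stale_pins(pins: dict) -> list:
    """Return the names of pinned packages that fall below the policy floor."""
    stale = []
    for name, version in pins.items():
        floor = MINIMUM_VERSIONS.get(name)
        if floor and parse_version(version) < parse_version(floor):
            stale.append(name)
    return sorted(stale)
```

Calling stale_pins({"lodash": "4.17.15"}) reports lodash, while a package absent from the policy table is left for the other checks (existence, provenance, CVEs) to handle.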

Documented Case — Hallucinated Package Attack Vector

A 2023 study by Lanyado et al. (Vulcan Cyber) demonstrated that LLMs including ChatGPT and Copilot regularly hallucinate plausible-sounding npm and PyPI package names. The researchers registered several of these hallucinated names on npm and found that the same names kept appearing in generated code, which would now install the real but attacker-controllable packages. They termed this attack vector "AI package hallucination". In one test, a hallucinated package name appeared in Copilot suggestions across multiple unrelated user sessions.

Core SCA Tools

Dependabot

Automated Dependency Updates

GitHub-native. Monitors dependency manifests and opens automated PRs for vulnerable or outdated packages. When AI code introduces a specific old version with a known advisory, Dependabot alerts typically flag it soon after the manifest change reaches the default branch.

Snyk Open Source

SCA with Fix Guidance

Deep transitive dependency analysis with actionable remediation. Integrates with IDE, CI, and registry. Snyk's proprietary vulnerability database often leads NVD by days for newly disclosed CVEs.

OWASP Dependency-Check

Open-Source SCA

Free SCA tool using NVD as its primary database. Generates HTML/XML reports and integrates with Maven, Gradle, and Jenkins. Strong for Java ecosystems; an NVD API key is strongly recommended, because data feed updates are heavily rate-limited without one.

Socket Security

Supply-Chain Intelligence

Analyzes npm package behavior for supply-chain risk signals: install scripts, network access, obfuscation. Specifically designed to detect malicious packages — including attacker-registered hallucinated names.

pip-audit

Python SCA

Lightweight Python tool from PyPA. Queries OSV and PyPI Advisory Database. Ideal for pre-commit hooks and CI checks on Python AI-generated code. Zero configuration for standard requirements.txt or pyproject.toml.

Syft + Grype

SBOM Generation + Scanning

Anchore's open-source pairing: Syft generates Software Bill of Materials (SBOM) in SPDX or CycloneDX format; Grype scans it for CVEs. Critical for container images built from AI-generated Dockerfiles.

Handling Hallucinated Package Names

Standard SCA tools check whether a package is in the registry (it exists) and whether it has CVEs. They do not inherently detect whether a package was originally hallucinated by an AI and subsequently registered by an attacker. Mitigating this requires additional controls: package age checks (flag packages registered after the model's training cutoff or within the last 30 days), download count thresholds (flag packages with fewer than 1,000 lifetime downloads), and code inspection gates for all new or infrequently-used dependencies.

Socket Security is the current specialized tool for this; its behavioral analysis of npm packages flags anomalies that raw CVE databases cannot surface. For Python, manual review of PyPI pages including maintainer history and release cadence should precede any AI-suggested package import into a production codebase.
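The age and download-count controls described above can be sketched as a simple flagging function. The thresholds come from the text; the PackageInfo record is hypothetical, and how you populate it (registry API, internal mirror, manual lookup) is left open.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MIN_AGE_DAYS = 30        # threshold taken from the text above
MIN_DOWNLOADS = 1_000    # threshold taken from the text above


@dataclass(frozen=True)
class PackageInfo:
    """Hypothetical record; populate it from your registry's metadata."""
    name: str
    first_published: date
    lifetime_downloads: int


def risk_flags(pkg: PackageInfo, today: date) -> list:
    """Return human-readable reasons to hold a package for manual review."""
    flags = []
    if today - pkg.first_published < timedelta(days=MIN_AGE_DAYS):
        flags.append("registered within the last 30 days")
    if pkg.lifetime_downloads < MIN_DOWNLOADS:
        flags.append("fewer than 1,000 lifetime downloads")
    return flags
```

A package registered six weeks ago with 47 downloads, like the one in the Lab 2 scenario, passes the 30-day age check but is still flagged on download count. That is why the two signals are evaluated independently rather than combined into a single score.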

Workflow Requirement

Every package introduced by AI-generated code that does not already appear in your approved dependency manifest should be treated as an untrusted first submission. Verify: it exists in the registry, has a credible maintenance history, has no recent ownership transfers, and has no open critical CVEs. Only then add it to the manifest.

Transitive Dependency: A package your code depends on indirectly, because a direct dependency itself requires it. Transitive dependencies account for the majority of CVEs in production applications and are invisible in AI-generated import statements.

SBOM (Software Bill of Materials): A machine-readable inventory of all components in a software application, including versions, licenses, and provenance. Required by U.S. Executive Order 14028 (2021) for federal software suppliers; increasingly required by enterprise procurement.

Package Hallucination: An AI code generator inventing a plausible but non-existent package name, which attackers can register to intercept installations. Documented in multiple LLMs including GPT-4 and Copilot as of 2023.

Lesson 2 Quiz

Software Composition Analysis and Dependency Risk

1. What is the specific supply-chain attack vector introduced when AI code generators hallucinate package names?

✓ Correct. Researchers demonstrated that once an attacker registers a hallucinated package name, the malicious package is served to any developer whose AI assistant suggests that name. The technique was documented in 2023.

Not quite. The attack works because attackers register the hallucinated names on real registries like npm or PyPI, serving malicious code to anyone who installs them.

2. Which tool specifically analyzes npm package BEHAVIOR (install scripts, network access) rather than just CVE databases?

✓ Correct. Socket Security performs behavioral analysis of package code to detect supply-chain risks that CVE-only databases cannot surface, which is critical for detecting attacker-registered hallucinated packages.

The other tools rely primarily on CVE databases. Socket Security is distinguished by its behavioral analysis of package contents and metadata.

3. Why does AI-generated code specifically create "stale version pinning" risk in dependency management?

✓ Correct. The training cutoff creates a temporal gap: an AI might recommend lodash 4.17.15 because that was current during training, even though that version has since received CVE disclosures.

The core issue is the training cutoff — AI suggestions reflect the package landscape at training time, not at the moment of code generation.

Lab 2: Dependency Vetting Protocol

Build a verification checklist for AI-suggested packages before production approval

Scenario

A junior developer used Claude to generate a Python data processing module. The AI suggested four packages, two of which are not in your team's existing requirements.txt. One of them has only 47 downloads on PyPI and was registered six weeks ago. Another is a well-known library but pinned to a version two years old.

Work with the advisor to build a step-by-step vetting protocol. Cover: what you check for each new package, which tools you run, what thresholds trigger rejection vs. investigation, and how you document the decision for audit. At least 3 substantive exchanges required.

Security Advisor — SCA and Dependencies

Let's build your vetting protocol. You have two suspicious packages: one very new with minimal downloads, and one well-known library pinned to an old version. Which one concerns you more immediately, and what is the first check you'd run on each?

Module 8 · Lesson 3

Secrets Detection and Credential Hygiene

AI code generators embed secrets in training-data patterns — and reviewers rarely notice until the breach

How do automated secrets detectors catch hardcoded credentials in AI-generated code faster than any human reviewer — and what do you do when the secret is already in git history?

In October 2022, Toyota Motor Corporation disclosed that part of the source code for its T-Connect service, including an access key to a customer data server, had been publicly available on GitHub for nearly five years. The credential was embedded directly in source code, committed to a repository, and undetected for an extended period. The incident was not attributed to AI code generation, but it illustrates the exact pattern that AI assistants reproduce at scale: API keys and tokens placed inline because that is how equivalent code appeared in training data.

Why AI Code Generators Hardcode Credentials

LLMs learn code patterns from public repositories on GitHub, GitLab, Stack Overflow, and similar sources. These repositories historically contained millions of hardcoded credentials — tokens, API keys, passwords, connection strings — that developers committed before best practices around secret management were widely adopted. When an LLM generates database connection code, authentication handler code, or API client code, it reproduces the inline credential pattern because that is statistically the most common form in its training corpus.

The problem is compounded by developer acceptance patterns: AI-generated code is often accepted as a block, and the credential placeholder (like API_KEY = "your-key-here") is replaced with a real value inline rather than refactored to use environment variables, because the developer is following the structure the AI provided. The credential then enters version history, where it persists even after subsequent commits remove it.
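The refactor that developers tend to skip is small. As a minimal sketch, assuming the secret is supplied through an environment variable (the helper name and error class here are hypothetical), the inline placeholder can be replaced with a fail-fast lookup:

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name: str) -> str:
    """Read a secret from the process environment and fail fast if it is
    unset, so a missing value never falls back to a hardcoded default."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


# Instead of:  API_KEY = "your-key-here"
# use:         api_key = require_secret("API_KEY")
```

Failing loudly at startup is a deliberate choice: a silent default invites someone to paste a real value into the source file to make the error go away.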

Scale of the Problem — GitGuardian 2023

GitGuardian's 2023 State of Secrets Sprawl report found 10 million new secrets exposed in public GitHub commits during 2022 — a 67% increase year-over-year. The report specifically noted that AI coding assistant adoption correlated with increased secrets in commits during the second half of 2022, as developers pasted real credentials into AI-generated scaffolding. Google API keys, GitHub tokens, and AWS access keys were the three most commonly exposed credential types.

Secrets Detection Tools

GitGuardian

Enterprise Secrets Detection

Scans commits in real-time with 350+ secret-type detectors. Integrates with GitHub, GitLab, Bitbucket. Sends alerts within seconds of a push. Free tier available for open-source projects; offers git history scanning for remediation.

TruffleHog

Open-Source Git Scanner

Scans git history, S3 buckets, Slack, Jira, and other sources for high-entropy strings and known secret patterns. Its verified-secrets feature actually tests whether discovered credentials are still active.

Gitleaks

Pre-commit Secret Scanning

Fast open-source tool ideal for pre-commit hooks and CI. Detects 150+ secret patterns via regex. Generates SARIF output for GitHub Advanced Security integration. Sub-second scan times for incremental commits.

GitHub Secret Scanning

Native Platform Scanning

Automatically enabled for public repositories; available in GitHub Advanced Security for private repos. Partners with 100+ providers (AWS, Google, Stripe) for push-protection that blocks commits containing known secret formats.

detect-secrets

Yelp / Auditing Tool

Yelp's open-source tool maintains a baseline file of known acceptable high-entropy strings, reducing false positives in codebases with test fixtures. Integrates with pre-commit framework; supports custom plugin development.

HashiCorp Vault + Sentinel

Secret Management + Policy

Not a scanner but the reference architecture for eliminating hardcoded secrets. Vault provides dynamic, short-lived credentials. Sentinel policies can enforce that application code never receives long-lived static secrets.

Remediation When Secrets Are in Git History

Removing a secret from the latest commit does not remove it from git history. Anyone who cloned the repository before the removal, or who has access to the full commit log, can recover the credential. The correct remediation sequence is:

1) Immediately revoke the credential at the provider (AWS, Google Cloud, Stripe, etc.), and assume it has been compromised from the moment of commit.
2) Issue a replacement credential stored in a secret manager.
3) Rewrite git history using BFG Repo-Cleaner or git filter-repo to remove the secret from all commits.
4) Force-push the rewritten history and notify all collaborators to re-clone.

For AI-generated code specifically, pre-commit hooks running Gitleaks or detect-secrets are the most cost-effective prevention: they block the commit before the secret enters history at all, eliminating the expensive remediation cycle entirely.
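To show what such a hook does in principle, here is a minimal pattern-matching sketch. It is not how Gitleaks or detect-secrets are implemented, and it covers only two widely documented token shapes; real scanners ship hundreds of detectors plus entropy checks. The function names are hypothetical.

```python
import re

# Two well-known token shapes, used here purely as examples.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-personal-access-token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def scan_text(text: str) -> list:
    """Return (line_number, detector_name) for each suspected secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def should_block_commit(staged_text: str) -> bool:
    """A hook script would exit non-zero when this returns True."""
    return bool(scan_text(staged_text))
```

Run against staged content containing AWS's documented example key ID, AKIAIOSFODNN7EXAMPLE, scan_text reports the line and should_block_commit returns True, so the commit never reaches history.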

Mandatory Control

Pre-commit secret scanning is non-negotiable for teams using AI code assistants. The pattern of inline credentials in AI-generated code is consistent and well-documented. A pre-commit hook that runs in under one second prevents incidents that take days to remediate. There is no valid engineering justification for skipping it.

High-Entropy String Detection: Identifying potential secrets by measuring the information entropy of strings. Credentials and tokens typically have higher entropy than human-readable text. Used alongside pattern matching to catch novel credential formats.

Push Protection: A GitHub feature that blocks a push to a repository when it contains patterns matching known secret formats from partner providers, preventing the secret from entering history rather than alerting after the fact.

Secrets Sprawl: The proliferation of credentials across codebases, CI/CD configurations, container images, logs, and documentation. It is a growing problem accelerated by AI code generation creating more credential-containing scaffolding code.
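High-entropy string detection, defined above, rests on the Shannon entropy formula. The sketch below computes it per character; the 4.0-bit threshold and 20-character minimum in looks_like_secret are assumptions for illustration, not values taken from any particular scanner.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character: sum of p * log2(1 / p)."""
    if not text:
        return 0.0
    total = len(text)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(text).values()
    )


def looks_like_secret(token: str, threshold: float = 4.0, min_length: int = 20) -> bool:
    """Flag long strings whose entropy exceeds the (assumed) threshold."""
    return len(token) >= min_length and shannon_entropy(token) > threshold
```

A string of one repeated character has zero entropy, and a string of four distinct characters has exactly 2 bits per character. Note that a pure hexadecimal string can never exceed 4 bits per character, so a single fixed threshold will miss some formats; this is one reason scanners combine entropy with pattern matching.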

Lesson 3 Quiz

Secrets Detection and Credential Hygiene

1. According to GitGuardian's 2023 report, approximately how many new secrets were exposed in public GitHub commits during 2022?

✓ Correct. GitGuardian found 10 million new secrets exposed in 2022 public commits, a 67% year-over-year increase that correlated with growing AI coding assistant adoption.

The correct figure is 10 million — a 67% increase year-over-year that GitGuardian specifically correlated with AI coding assistant adoption patterns.

2. What is the FIRST action to take when a secret is discovered in a git repository's commit history?

✓ Correct. Revocation must happen first because the credential must be treated as compromised from the moment of commit. Git history rewriting comes after revocation, not before.

History rewriting is necessary but comes after revocation. The credential must be treated as compromised immediately — anyone who cloned before the fix may have it.

3. Which TruffleHog feature makes it particularly valuable compared to pattern-only secrets scanners?

✓ Correct. TruffleHog's verified-secrets feature makes live API calls to test whether found credentials are still active, prioritizing real risk over stale or rotated secrets.

TruffleHog's standout feature is credential verification: it tests whether discovered secrets are still active, distinguishing live risk from already-rotated credentials.

Lab 3: Secrets Incident Response

Walk through a credential exposure discovered in AI-generated code that was committed to a public repo

Scenario

A developer used GitHub Copilot to generate an AWS S3 upload utility. The AI generated code with a placeholder AWS_ACCESS_KEY and AWS_SECRET_KEY inline, which the developer replaced with real values and committed to a public repository two hours ago. GitGuardian has just alerted. The S3 bucket contains customer PII.

Work through the incident response with the security advisor. Cover: immediate containment steps, remediation sequence for the git history, how to assess the blast radius, and what process changes prevent recurrence. At least 3 substantive exchanges required.

Security Advisor — Secrets Response

You have an active credential exposure. AWS keys with access to a PII-containing S3 bucket have been in a public repo for two hours. Walk me through your first three actions in priority order — and tell me why sequence matters here.

Module 8 · Lesson 4

Human Review Workflows and Security Gates

Tooling catches patterns — humans catch intent. Building the review layer that tools cannot replace

How do you structure a human security review process that meaningfully engages with AI-generated code without becoming a rubber-stamp or a bottleneck?

Microsoft's Security Development Lifecycle (SDL), formalized in 2004 and continuously updated since, established the principle that automated tools and human review are complementary — neither sufficient alone. When Microsoft began integrating Copilot into its own development workflows in 2022, its security engineering team published guidance noting that SDL requirements including threat modeling, security code review, and penetration testing apply with unchanged force to AI-generated code — and that AI assistance does not constitute a security review. This position, from one of the largest adopters of AI coding assistance in the world, reflects the industry consensus on human review's irreplaceable role.

What Automated Tools Cannot Detect

SAST tools detect known syntactic vulnerability patterns. SCA tools detect known vulnerable components. Secrets scanners detect high-entropy strings and known credential formats. All three operate on what the code is, not what the code is supposed to do. Human reviewers operate on intent — comparing implementation against requirements, threat model, and architectural constraints that exist outside the code itself.

Specific things that require human judgment in AI-generated code: logic errors that are syntactically correct (an authorization check that runs but checks the wrong condition), missing security controls (code that handles data correctly but omits rate limiting), architectural violations (a microservice that correctly implements encryption but exposes an unintended internal endpoint), and business logic flaws (a discount calculation that an attacker can manipulate by controlling input ordering).
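The first category, an authorization check that runs but checks the wrong condition, is easiest to see side by side. The following is a hypothetical example (the User and Document types and both function names are invented for illustration); neither version contains anything a pattern-based scanner would flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: int
    is_authenticated: bool


@dataclass(frozen=True)
class Document:
    doc_id: int
    owner_id: int


def can_delete_flawed(user: User, doc: Document) -> bool:
    # Runs and looks like an access check, but tests the wrong condition:
    # any logged-in user may delete any document.
    return user.is_authenticated


def can_delete(user: User, doc: Document) -> bool:
    # Checks the correct principal against the correct resource.
    return user.is_authenticated and user.user_id == doc.owner_id
```

For a logged-in user who does not own the document, can_delete_flawed returns True and can_delete returns False. Only a reviewer who knows the intended rule ("owners delete their own documents") can tell which one is wrong.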

Research Evidence: Perry et al. (Stanford)

The Stanford study "Do Users Write More Insecure Code with AI Assistants?" (Perry et al., ACM CCS 2023) found that participants with access to an AI code assistant wrote less secure code than those without, and were more likely to believe that the code they had written was secure. This confidence inflation is the primary argument for mandatory human security review that cannot be waived because "Copilot already reviewed it."

Security Gate Architecture

AI-Attribution Tagging

Commits or PR descriptions tag code as AI-assisted using a standard format (e.g., "AI-assisted: Copilot" in commit message or a required PR label). This metadata routes the PR to elevated scrutiny paths in CI and assigns it to security-trained reviewers, not just the default peer reviewer pool.

Automated Tool Suite Run

SAST, SCA, and secrets scanning complete before human review begins. Reviewers receive a consolidated finding report, not raw scanner output. Findings are triaged: CRITICAL blocks merge automatically; HIGH requires reviewer sign-off; MEDIUM is documented and tracked; LOW is informational.

Security-Focused Code Review

A reviewer with security training (not just a peer developer) examines AI-generated sections against a checklist: authentication and authorization logic, input validation completeness, error handling that doesn't expose internals, cryptographic implementation, and adherence to the threat model.

Threat Model Reconciliation

For significant AI-generated features, the reviewer compares the implementation against the existing threat model. New attack surfaces, data flows, or trust boundaries introduced by the AI-generated code trigger a threat model update before merge approval.

Audit Trail Closure

All review decisions — approvals, suppressed scanner findings, threat model updates — are recorded with reviewer identity, rationale, and timestamp. AI attribution metadata is preserved in the commit record for future incident investigation and regulatory compliance.
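Two of the steps above lend themselves to a short sketch: reading the AI-attribution tag from a commit message, and mapping finding severities to triage actions. The "AI-assisted: <tool>" format is the example from the text; the regular expression, pool names, and action labels are assumptions for illustration.

```python
import re
from typing import Optional

# Matches a line such as "AI-assisted: Copilot" anywhere in a commit message.
AI_TAG = re.compile(r"^AI-assisted:[ \t]*(?P<tool>.+?)[ \t]*$",
                    re.IGNORECASE | re.MULTILINE)

# Triage policy as described in the "Automated Tool Suite Run" step.
TRIAGE_ACTIONS = {
    "CRITICAL": "block-merge",
    "HIGH": "require-reviewer-sign-off",
    "MEDIUM": "document-and-track",
    "LOW": "informational",
}


def ai_tool_from_commit(message: str) -> Optional[str]:
    """Return the tool named in the attribution tag, or None if untagged."""
    match = AI_TAG.search(message)
    return match.group("tool") if match else None


def reviewer_pool(message: str) -> str:
    """Route tagged commits to security-trained reviewers (hypothetical pools)."""
    return "security-trained" if ai_tool_from_commit(message) else "default-peer"


def triage(severity: str) -> str:
    """Look up the action for a scanner finding of the given severity."""
    return TRIAGE_ACTIONS[severity.upper()]
```

Because routing depends entirely on the tag being present, a team adopting this approach would likely enforce the tag (for example with a required PR label) rather than rely on developers remembering to add it.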

Review Checklist for AI-Generated Code

Security reviewers examining AI-generated code should work through a structured checklist rather than an open-ended scan. Key checklist items include: Does every API endpoint validate and sanitize all inputs? Are authorization checks on the correct principal for the correct resource? Does error handling log enough for forensics without exposing stack traces to users? Are all cryptographic operations using current recommended algorithms and key lengths? Does the code introduce any new external network calls or data persistence not in the design doc? Are all new dependencies vetted through the SCA process?

The checklist approach is especially important because AI-generated code can appear complete and coherent while omitting entire security control categories. A reviewer reading without a checklist may follow the logic of what is present and never notice the complete absence of rate limiting on an authentication endpoint.

Process Principle

The human review step is not a check on whether the AI made a syntax error. It is a check on whether the implementation is correct for its security context. AI tools can write syntactically valid, SAST-passing code that is nonetheless catastrophically wrong from a security design perspective. Only a human reviewer with the right context can catch that.

Security Gate: A mandatory checkpoint in a development pipeline, automated or human, that must be passed before code advances to the next stage. Gates block, not advise; a failed gate prevents merge or deployment until explicitly resolved.

AI Attribution Metadata: Documentation in commits, PRs, or audit logs indicating that code was generated or substantially assisted by an AI tool, preserved for security review routing, incident investigation, and regulatory compliance purposes.

Confidence Inflation: The documented tendency of developers to rate AI-assisted code as more secure than equivalent manually written code, even when both contain the same vulnerabilities, creating false assurance that bypasses review scrutiny.
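To make the "block, not advise" property of a security gate concrete, here is a minimal sketch of a gate check. The `Finding` type, the `resolved_by` field, and the choice of CRITICAL and HIGH as blocking severities are assumptions for illustration; a real pipeline would take these from its scanner output and policy.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed policy: these severities block until someone explicitly resolves them.
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


@dataclass
class Finding:
    rule_id: str
    severity: str
    resolved_by: Optional[str] = None  # reviewer who fixed or signed off, if any


def gate_passes(findings):
    """Return True only if no blocking finding remains unresolved."""
    return all(
        finding.resolved_by is not None
        for finding in findings
        if finding.severity in BLOCKING_SEVERITIES
    )
```

A CI job built on this idea would exit with a non-zero status when `gate_passes` returns False, so the merge cannot proceed. An advisory check, by contrast, would only print the findings and exit successfully.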

Lesson 4 Quiz

Human Review Workflows and Security Gates

1. According to the 2023 Stanford study of AI code assistants, how did developers rate AI-assisted insecure code compared to manually written insecure code?

✓ Correct. This "confidence inflation" is the core finding: developers rated AI-assisted insecure code as more secure, creating false assurance that tempts teams to skip human review.

The finding was counterintuitive: developers rated AI-assisted insecure code as MORE secure than manually-written insecure code — the basis for the "confidence inflation" concern.

2. Which category of security flaw specifically requires human review and CANNOT be detected by SAST tools alone?

✓ Correct. SAST tools can verify that an authorization check exists; they cannot determine whether it checks the correct principal for the correct resource. That requires human understanding of intent versus implementation.

The others can be detected by SAST or secrets scanners. An authorization check that runs on the wrong condition is syntactically correct — only a human reviewer who knows what the check should do can identify it as wrong.

3. What is the primary purpose of AI-attribution tagging in a security review workflow?

✓ Correct. Attribution tagging is a routing mechanism: it ensures AI-assisted code receives security-trained reviewer attention and elevated tooling scrutiny rather than standard peer review.

Attribution tagging enables elevated scrutiny, not automatic approval or blocking. It routes AI-generated code to security-trained reviewers and additional tooling checks.

Lab 4: Review Workflow Design

Design a complete security review gate for a team adopting AI code generation at scale

Scenario

Your organization is rolling out GitHub Copilot Business to 200 developers across 15 teams. The CISO has asked you to design the security review workflow — covering automated gates, human review requirements, attribution tracking, and audit trail requirements — that applies to all AI-assisted code before it reaches production. You have a budget for one additional security engineer headcount.

Work with the advisor to design the complete workflow. Cover: which gates are automated vs. human, how you scope human review effort to avoid bottlenecks, what the audit record must contain, and how you measure whether the workflow is effective. At least 3 substantive exchanges required.

Security Advisor — Review Workflow

Lab 4

You're designing a workflow for 200 developers, one new security engineer, and a mandate that all AI-generated code gets meaningful security review before production. Start with the hard constraint: how do you prevent this from becoming either a rubber-stamp or a six-day bottleneck? What's your core architectural decision?

Module 8 Test

Security Review Workflows and Tooling — 15 questions, 80% to pass

1. Semgrep differs from traditional grep-based secret scanners primarily because it:

✓ Correct. Semgrep's AST-based approach means it understands code structure, enabling rules that match semantically equivalent patterns regardless of whitespace, variable naming, or code ordering.

Semgrep's core distinction is AST-based semantic pattern matching — it understands code structure, not just text.

2. The "Asleep at the Keyboard?" study (Pearce et al., IEEE S&P 2022) tested GitHub Copilot across how many different scenarios?

✓ Correct. The study generated 1,689 code snippets across 89 different scenarios, a substantial empirical basis for its finding that ~40% contained CWE-classifiable vulnerabilities.

The study used 89 scenarios, generating 1,689 total snippets for vulnerability analysis.

3. Which SAST tool performs full data-flow taint analysis and is available free for open-source projects through GitHub Advanced Security?

✓ Correct. CodeQL is GitHub's semantic analysis system that performs full data-flow and taint analysis. It is free for open-source repositories via GitHub Advanced Security.

CodeQL is GitHub's full data-flow and taint analysis tool, available free for open-source projects through GitHub Advanced Security.

4. The Samsung March 2023 incident is relevant to security review workflows because it demonstrated:

✓ Correct. The Samsung incident, in which employees pasted proprietary code into ChatGPT, revealed that the risk lies not only in AI-generated output but also in the code context fed to AI tools, requiring data classification and usage policy controls.

The Samsung incident was about developers inadvertently sharing proprietary code with external AI services — a data exposure risk that workflow controls and usage policies must address.

5. What makes the "package hallucination" supply-chain attack possible?

✓ Correct. Attackers monitor for hallucinated package names appearing in public code and register them on npm/PyPI, intercepting installations by developers who accepted AI suggestions without verification.

The attack works by registering hallucinated names on real registries — the AI generates a plausible name, the attacker registers it, and any developer who installs it gets attacker-controlled code.

6. Syft and Grype are paired tools. What does each do in combination?

✓ Correct. Syft generates a Software Bill of Materials (SBOM) in SPDX or CycloneDX format; Grype then scans that SBOM for known CVEs. Together they provide container image and application vulnerability scanning.

Syft generates SBOM (Software Bill of Materials); Grype scans the SBOM for CVEs. Together they provide vulnerability scanning for container images and application dependencies.

7. U.S. Executive Order 14028 (2021) introduced which software security requirement that directly affects AI-generated code management?

✓ Correct. EO 14028 established SBOM requirements for software sold to the federal government, making SBOM generation (with tools like Syft) and SCA practices regulatory requirements for many organizations.

EO 14028's key software supply chain requirement was SBOM — a machine-readable inventory of all components — for federal software suppliers.

8. Which approach does detect-secrets use to reduce false positives compared to pure entropy-based detection?

✓ Correct. detect-secrets' baseline approach lets teams document intentional high-entropy strings (test keys, encoded data, salts) so they don't trigger repeated alerts, maintaining signal quality without manual exclusion on each run.

detect-secrets reduces false positives through a baseline file — a versioned list of acceptable high-entropy strings that are excluded from future alerts.

9. The XZ Utils backdoor discovered in March 2024 is relevant to AI code review workflows because:

✓ Correct. The XZ Utils case illustrates the temporal gap problem: AI training cutoffs mean AI tools can recommend compromised libraries for months after security disclosures, requiring current SCA scanning to catch what AI suggestions miss.

The XZ Utils case demonstrates the training cutoff gap: AI models continue recommending compromised packages until retrained, making current SCA scanning essential for catching what AI suggestions miss.

10. In a SAST severity gate configuration, which approach best balances security and development velocity?

✓ Correct. Tiered response (auto-block CRITICAL, human sign-off on HIGH, tracked MEDIUM, logged LOW) maintains meaningful security gates without paralyzing development over low-confidence findings.

Tiered response is the standard approach: auto-block CRITICAL, require documented sign-off on HIGH, track MEDIUM for remediation planning, log LOW for awareness.

11. GitHub's push protection feature for secret scanning differs from after-the-fact scanning because:

✓ Correct. Push protection operates pre-receive: it prevents the credential from entering history at all, eliminating the expensive remediation cycle of history rewriting and credential rotation.

Push protection's key advantage is pre-receive blocking — it stops secrets from entering git history, which is far cheaper than the remediation required once a secret is in history.

12. Microsoft SDL guidance on AI-generated code states that security requirements including threat modeling and penetration testing:

✓ Correct. Microsoft's SDL guidance is explicit: AI assistance does not constitute a security review, and all SDL requirements apply unchanged to AI-generated code.

Microsoft SDL guidance is unambiguous: all SDL requirements apply with unchanged force to AI-generated code. AI assistance is not a substitute for any security review activity.

13. What category of vulnerability is specifically cited as requiring human review because AI-generated code can implement it correctly in form but incorrectly in intent?

✓ Correct. Business logic flaws, where syntactically correct code implements the wrong security semantics, are the paradigmatic human-review-only vulnerability class. SAST tools cannot evaluate whether a discount calculation or authorization check is logically correct for its context.

Business logic flaws are the canonical example: code that is syntactically correct and SAST-passing but implements the wrong security semantics — something only a human reviewer with context can identify.

14. Snyk Code is distinguished from traditional SAST tools by its use of:

✓ Correct. Snyk Code uses ML trained on real vulnerability patterns to reduce false positive rates, and its IDE plugin provides real-time feedback as developers accept AI suggestions, which makes it well suited to in-loop review.

Snyk Code's differentiator is its ML model trained on vulnerability patterns, which reduces false positives and enables real-time IDE feedback as developers work with AI-generated code.

15. The complete remediation sequence when a secret is discovered in git history is:

✓ Correct. Revocation must come first: the credential is compromised from the moment of commit. History rewriting removes the secret from future access but cannot undo access that already occurred.

The correct sequence begins with immediate revocation — the credential must be treated as compromised from commit. Only after revocation do you rewrite history and notify collaborators to re-clone.