In 2023, security researchers at Vulcan Cyber published findings demonstrating that large language models, including ChatGPT and Copilot, routinely suggested non-existent npm, PyPI, and RubyGems packages with convincing-sounding names. They coined the term "AI package hallucination" for the attack pattern. In one documented test, GPT-3.5 recommended the package huggingface-cli for a task where no package existed under that exact name on PyPI, leaving a name an attacker could trivially register.
A separate 2023 study by researchers at the University of Texas at San Antonio found that in tests across multiple AI models, up to 30% of AI-suggested packages did not exist in the target registry. The research was published under the title "Hallucinating AI Hackathon" and presented at DEF CON 31.
Large language models learn to write code by training on massive corpora of open-source repositories, documentation, Stack Overflow answers, and GitHub issues. Within this corpus, references to packages appear with enormous variation: version-pinned requirements files, informal blog posts, deprecated names, forks, and packages that existed briefly before removal. The model learns statistical patterns (for example, that certain task descriptions tend to appear alongside certain package names), but it has no live connection to any package registry.
When asked to complete a task for which it has seen many partial, contradictory, or sparse examples, the model generates a plausible-sounding package name by interpolating across its training data. The result is a package name that feels right (it follows naming conventions and matches the ecosystem's style) but does not correspond to any published package.
This is not a rare edge case. Because AI-generated code is now being adopted at scale without verification, the gap between "AI suggested it" and "it actually exists and is safe" has become one of the most quietly dangerous failure modes in modern software development.
When Vulcan Cyber researchers asked ChatGPT to write code using Hugging Face's API, the model suggested installing packages like huggingface-cli and several other variations that did not exist on PyPI. These names were plausible enough that a developer running pip install without verification would either see an install failure or, if an attacker had pre-registered the name, silently install malicious code.
Dependency confusion is a distinct but related attack first documented by security researcher Alex Birsan in February 2021. Birsan demonstrated that by registering public package names that matched internal private package names used by companies including Apple, Microsoft, and Netflix, he could cause their build systems to automatically pull his (benign, proof-of-concept) packages instead of the intended private ones. He collected bug bounty payouts exceeding $130,000 from over 35 companies before publishing his findings.
The dependency confusion attack exploits a resolver priority flaw: when the same name exists in both a private and a public registry, many package managers in their default configuration do not prefer the private source. They typically select whichever candidate has the highest version number, and on the public registry the attacker chooses that number. An AI model generating code that references private internal package names (learned from leaked configuration files, open-source forks, or developer blog posts in its training data) creates precisely the conditions Birsan exploited, at scale.
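The resolver flaw can be illustrated with a toy model. The sketch below is not pip's or npm's actual algorithm; it only assumes a resolver that merges candidates from every configured index and takes the highest version. The package name acme-internal-utils and both index contents are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical index contents: package name -> available versions.
PRIVATE_INDEX: Dict[str, List[str]] = {"acme-internal-utils": ["1.2.0", "1.3.0"]}
PUBLIC_INDEX: Dict[str, List[str]] = {"acme-internal-utils": ["99.0.0"]}  # attacker-registered


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.3.0' into (1, 3, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def naive_resolve(name: str, indexes: Dict[str, Dict[str, List[str]]]) -> Tuple[str, str]:
    """Pick the highest version of `name` across all indexes, ignoring its source."""
    candidates = [
        (parse_version(version), version, source)
        for source, index in indexes.items()
        for version in index.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found in any index")
    _, version, source = max(candidates)
    return version, source


def scoped_resolve(
    name: str, private: Dict[str, List[str]], public: Dict[str, List[str]]
) -> Tuple[str, str]:
    """Mitigation sketch: a name that exists privately is never looked up publicly."""
    if name in private:
        return max(private[name], key=parse_version), "private"
    if name in public:
        return max(public[name], key=parse_version), "public"
    raise LookupError(f"{name} not found in any index")
```

With these inputs the naive resolver returns the attacker's 99.0.0 from the public index, while the scoped variant stays on the private 1.3.0. Real mitigations (scoped registries, a single proxying index, reserved namespaces) follow the same idea of never letting the public source compete for an internal name.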
When combined, AI hallucination and dependency confusion create a compound threat: the AI suggests a non-existent or internal package name, the developer installs it without checking, and an attacker who has registered that name on the public registry delivers a malicious payload.
The audit workflow for AI-generated package references must include explicit existence verification before any installation. This sounds obvious but is routinely skipped: developers working at AI-accelerated pace often treat the AI's output as pre-validated. It is not.
Step 1: Extract all package references. Parse requirements.txt, package.json, go.mod, Cargo.toml, or equivalent files generated by the AI for every named dependency. Do not rely on the AI to have listed only real packages.
Step 2: Verify existence in the canonical registry. For every name, perform a direct registry query: pip index versions [package], npm view [package], or equivalent. A 404 or "package not found" response is a critical finding requiring immediate escalation.
Step 3: Verify the publisher and history. A package that exists but was registered within the last 30 days with zero prior versions and a single anonymous maintainer is a red flag, especially if its name closely matches an AI-suggested one.
Step 4: Check for scope/namespace confusion. AI models frequently confuse scoped npm packages (e.g., @company/utils) with unscoped variants. Verify the exact registry path matches what was intended.
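Steps 1 through 3 can be sketched in a few lines of Python. This is a minimal illustration, not a complete auditor: the lookup callable and its return shape (first_release, release_count) are assumptions made for the example, and the 30-day threshold simply mirrors the red flag described in Step 3. A real lookup might query the PyPI JSON API at https://pypi.org/pypi/<name>/json, which returns 404 for unknown names.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Optional

# Package name at the start of a requirement line.
NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def extract_names(requirements_text: str) -> List[str]:
    """Step 1: pull the package name out of each requirement line."""
    names = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = NAME_RE.match(line)
        if match:
            names.append(match.group(1).lower())
    return names


def audit(
    names: List[str],
    lookup: Callable[[str], Optional[Dict]],  # hypothetical registry client
    now: Optional[datetime] = None,
    min_age_days: int = 30,
) -> Dict[str, str]:
    """Steps 2 and 3: classify each name as missing, suspicious, or existing."""
    now = now or datetime.now(timezone.utc)
    findings = {}
    for name in names:
        meta = lookup(name)
        if meta is None:
            findings[name] = "MISSING: not in registry (possible hallucination)"
        elif (
            now - meta["first_release"] < timedelta(days=min_age_days)
            and meta["release_count"] <= 1
        ):
            findings[name] = "SUSPICIOUS: newly registered, single release"
        else:
            findings[name] = "OK: exists (publisher still needs manual review)"
    return findings
```

Injecting the lookup keeps the audit logic testable offline and lets the same code run against PyPI, a private index, or a cached snapshot. An "OK" result only means the name exists; it says nothing about whether the package is the one the AI meant.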
Treat every package name in AI-generated code as unverified until explicitly confirmed against a live registry query. The AI has no knowledge of which packages currently exist, have been renamed, removed for malware, or were never published.
You have received a Python requirements file generated by an AI assistant for a new internal data pipeline project. Your task is to audit the dependency list for hallucinated, suspicious, or dependency-confusion-risk package names before any installation proceeds.
Work with the AI security tutor below. Describe your audit methodology, ask about specific package name patterns, and explore what signals indicate a high-risk dependency suggestion.
The December 2021 Log4Shell vulnerability (CVE-2021-44228) illustrated with catastrophic clarity why transitive dependencies matter. Log4j was not a package most developers knowingly chose; it was pulled in as a dependency of a dependency of a dependency. Thousands of organizations discovered they were running vulnerable Log4j versions only after exploitation attempts began. The U.S. CISA described the vulnerability as "one of the most serious" ever discovered, with estimated remediation costs in the billions.
AI code generators do not track or disclose the transitive dependency trees of packages they recommend. When an AI suggests using Spring Boot or a cloud SDK, the hundreds of transitive dependencies pulled along with that choice are invisible to both the AI and the developer unless explicitly inventoried.
An SBOM is a formal, machine-readable inventory of every software component in an application, including direct dependencies, transitive dependencies, and their versions and licensing information. The concept gained regulatory force in the United States with Executive Order 14028 (May 2021), which mandated SBOM generation for software sold to federal agencies. NIST's guidelines under the EO referenced SPDX and CycloneDX as the two primary SBOM standards.
For AI-generated code specifically, the SBOM serves a function beyond compliance: it makes visible the full dependency graph that the AI implicitly created by recommending certain packages. A developer who accepts AI-suggested code may end up with 200 transitive packages they never consciously chose and cannot name without tooling.
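To make the idea concrete, here is a minimal sketch that inventories a Python environment and emits a CycloneDX-shaped JSON document. It uses the top-level keys bomFormat, specVersion, version, and components, but the output is not validated against the CycloneDX schema, and dedicated SBOM tools record far more (hashes, licenses, dependency relationships). The function names are illustrative.

```python
import json
from importlib import metadata
from typing import Dict, Iterable, List, Tuple


def installed_packages() -> List[Tuple[str, str]]:
    """List (name, version) for every distribution in the current environment."""
    return sorted(
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip distributions with broken metadata
    )


def build_sbom(packages: Iterable[Tuple[str, str]]) -> Dict:
    """Build a minimal CycloneDX-shaped inventory from (name, version) pairs."""
    components = [
        {
            "type": "library",
            "name": name,
            "version": version,
            # Package URL; PyPI names are lowercased with '_' normalized to '-'.
            "purl": f"pkg:pypi/{name.lower().replace('_', '-')}@{version}",
        }
        for name, version in packages
    ]
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components,
    }


if __name__ == "__main__":
    # Inventory whatever is installed in the interpreter running this script.
    print(json.dumps(build_sbom(installed_packages()), indent=2))
```

Running it inside a freshly created virtual environment that contains only the AI-suggested dependencies shows how many components arrived without being named in the manifest.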
SPDX (Software Package Data Exchange) is maintained by the Linux Foundation and is the ISO/IEC 5962:2021 standard. CycloneDX is maintained by OWASP and is optimized for security use cases, including vulnerability correlation. Both are accepted under EO 14028.
Executive Order 14028 "Improving the Nation's Cybersecurity" (May 12, 2021) required that software providers selling to the federal government provide SBOMs. NIST published guidance in NIST SP 800-218 (Secure Software Development Framework) and NIST SP 800-161r1 (C-SCRM) addressing supply chain risk management practices that directly apply to AI-generated code workflows.
When AI tools generate project scaffolding or suggest frameworks, they select packages optimized for the task described β not for minimal dependency footprint or supply chain hygiene. The result is typically a project that inherits a large transitive dependency tree from day one, often including packages with known CVEs that have not been patched in the version range the AI specified.
A documented example: researchers at Endor Labs published a 2023 study titled "State of Dependency Management" finding that 95% of vulnerable open-source package versions arise from transitive dependencies, not direct ones. Of the top 10 critical vulnerabilities affecting open-source projects, 9 were in transitive dependencies. AI code generators do not audit for this; they recommend based on training data that may reflect version requirements predating known vulnerabilities.
The practical implication: when auditing AI-generated code, the first-layer requirements file is only the starting point. The complete dependency graph, including all transitive dependencies, must be resolved and scanned before the code is considered safe to use.
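The gap between the first-layer manifest and the full graph can be sketched with a breadth-first traversal. The graph below is entirely hypothetical (package names included); in practice the edges would come from a resolver or lockfile rather than a hand-written dictionary.

```python
from collections import deque
from typing import Dict, List

# Hypothetical dependency graph: package -> its direct dependencies.
GRAPH: Dict[str, List[str]] = {
    "my-app": ["web-framework", "cloud-sdk"],
    "web-framework": ["http-core", "templating"],
    "cloud-sdk": ["http-core", "signing"],
    "http-core": ["tls-utils"],
    "templating": [],
    "signing": ["tls-utils"],
    "tls-utils": [],
}


def transitive_dependencies(root: str, graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Breadth-first walk; returns every reachable package with its shortest depth."""
    depths: Dict[str, int] = {}
    queue = deque((dep, 1) for dep in graph.get(root, []))
    while queue:
        name, depth = queue.popleft()
        if name in depths:  # already reached by a shorter or equal path
            continue
        depths[name] = depth
        queue.extend((dep, depth + 1) for dep in graph.get(name, []))
    return depths


if __name__ == "__main__":
    deps = transitive_dependencies("my-app", GRAPH)
    direct = [name for name, depth in deps.items() if depth == 1]
    indirect = [name for name, depth in deps.items() if depth > 1]
    print(f"direct: {len(direct)}, transitive-only: {len(indirect)}")
```

Even in this tiny graph the two declared dependencies bring in four more packages, and the deepest one (tls-utils) is reachable by two different paths, which is exactly the kind of component a manifest-only review never sees.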
Tooling for SBOM generation: Syft (Anchore) generates SBOMs in both SPDX and CycloneDX format for containers and filesystems. cdxgen (OWASP) generates CycloneDX SBOMs from package manifests. pip-audit scans Python environments against the PyPA Advisory Database by default and can also query the OSV vulnerability database. npm audit and Dependabot cover Node.js transitive trees.
The audit workflow: (1) accept AI-generated dependency manifest; (2) resolve the full transitive dependency tree without installing to production, using pip-compile, npm install --package-lock-only, or equivalent; (3) generate an SBOM from the resolved tree; (4) scan the SBOM against NVD, OSV, and GitHub Advisory Database; (5) review any flagged CVEs against severity threshold before proceeding.
A critical audit point specific to AI-generated code: AI models frequently suggest unpinned version ranges (e.g., requests>=2.0 rather than requests==2.28.2). Unpinned ranges allow package managers to silently upgrade to newer versions that may introduce new vulnerabilities or, in cases of compromised maintainer accounts, injected malicious code. All AI-generated version specifications should be pinned before deployment.
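A simple check for unpinned specifiers might look like the following. It is a heuristic sketch, not a full requirements parser: it treats only name==version (optionally with extras and an environment marker) as pinned, and it does not handle hash-annotated or continued lines.

```python
import re
from typing import List, Tuple

# name, optional [extras], then == and a concrete version with no wildcard.
PINNED_RE = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[0-9][^*\s;,]*$"
)


def find_unpinned(requirements_text: str) -> List[Tuple[int, str]]:
    """Return (line number, requirement) for every entry that is not exactly pinned."""
    unpinned = []
    for lineno, raw in enumerate(requirements_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        spec = line.split(";", 1)[0].strip()  # drop environment markers
        if not PINNED_RE.match(spec):
            unpinned.append((lineno, line))
    return unpinned


if __name__ == "__main__":
    sample = "requests>=2.0\nrequests==2.28.2\nflask\n"
    for lineno, requirement in find_unpinned(sample):
        print(f"line {lineno}: {requirement} is not pinned")
```

Wildcard pins such as django==4.* are reported as unpinned, because they still allow the installed version to change between resolutions.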
1. Generate a full SBOM before any production deployment of AI-generated code.
2. Scan all transitive dependencies, not just direct ones.
3. Pin all version numbers; reject unpinned ranges.
4. Track the SBOM against new CVE feeds on an ongoing basis, not just at initial generation.
An AI assistant has scaffolded a new Python web API project using FastAPI, SQLAlchemy, and boto3 for AWS S3 access. Your security review must include a complete SBOM analysis covering all transitive dependencies before the project can proceed to staging.
Use the AI tutor below to work through the SBOM generation process, understand which tools to use, and discuss how to interpret vulnerability scan results from the resolved dependency tree.
In October 2021, the npm package ua-parser-js (downloaded over 7 million times per week and used by Facebook, Microsoft, and Amazon) was compromised when its maintainer's npm account was taken over. The attacker published versions 0.7.29, 0.8.0, and 1.0.0 containing a cryptocurrency miner and a password stealer. The compromise was live for several hours before detection. Any CI/CD pipeline using unpinned version ranges pulled the malicious code automatically.
In March 2022, the npm package node-ipc (used by the popular Vue CLI) was deliberately sabotaged by its maintainer, Brandon Nozaki Miller, who pushed versions that overwrote files with a heart emoji on computers with Russian or Belarusian IP addresses, as a protest against the invasion of Ukraine. This event, catalogued as CVE-2022-23812, raised fundamental questions about maintainer trustworthiness independent of account compromise.
In March 2024, the XZ Utils backdoor (CVE-2024-3094) was discovered: a multi-year social engineering attack in which a threat actor known as "Jia Tan" systematically built trust with the xz maintainer over two years before inserting a sophisticated backdoor into versions 5.6.0 and 5.6.1. Microsoft engineer Andres Freund discovered the backdoor accidentally while investigating unusual SSH performance.
AI models are trained on a fixed corpus with a training cutoff date. When a model recommends a package, it is recommending the package as it existed, and as the community had reviewed it, up to that cutoff. It has no knowledge of compromises that occurred afterward. The model may confidently recommend ua-parser-js in its training-data-known-good state, while the package being installed is a post-compromise version containing malware.
This temporal gap is a structural vulnerability in AI-assisted development. The AI cannot say "this package was compromised six months ago." It can only say "this package was widely used and highly rated in the data I was trained on."
When AI-generated code enters a codebase with unpinned version ranges, every subsequent dependency resolution is a fresh trust decision that the AI cannot inform. The combination of AI-recommended packages, unpinned versions, and automated dependency updates creates a pipeline where malicious updates can enter production without any human reviewing what changed.
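One way to make the temporal gap visible is to list which releases of a recommended package were published after the model's training cutoff, since those are versions the model could not have seen. The sketch below assumes the release dates have already been fetched from the registry; the release history and the cutoff date are hypothetical.

```python
from datetime import date
from typing import Dict, List

# Hypothetical release history for an illustrative package; not real data.
RELEASES: Dict[str, date] = {
    "2.3.0": date(2023, 1, 10),
    "2.4.0": date(2023, 9, 2),
    "2.4.1": date(2024, 2, 20),
}


def unseen_by_model(releases: Dict[str, date], training_cutoff: date) -> List[str]:
    """Versions published after the cutoff, ordered oldest to newest."""
    later = [version for version, published in releases.items() if published > training_cutoff]
    return sorted(later, key=lambda version: releases[version])


if __name__ == "__main__":
    cutoff = date(2023, 6, 1)  # assumed training cutoff for the example
    for version in unseen_by_model(RELEASES, cutoff):
        print(f"{version} was published after the model's cutoff; review before adopting")
```

An unpinned range would resolve to the newest of these unseen versions, so this list is effectively the set of releases that need independent review.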
The XZ Utils backdoor (CVE-2024-3094) demonstrated a sophisticated supply chain attack requiring years of social engineering: the attacker contributed legitimate, useful code for 2+ years before the malicious commit. AI models trained on pre-compromise repository data would recommend xz with no indication of the risk. This attack was only discovered by accident; no automated tool detected it prior to Andres Freund's investigation in March 2024.
Monitoring for compromise: Subscribe to security advisory feeds for every package in your SBOM. The GitHub Advisory Database, OSV (Open Source Vulnerabilities database), and Sonatype's vulnerability data all track known-compromised packages. Tools like Dependabot, Snyk, and Socket.dev can automatically flag newly reported compromises against your dependency tree.
Socket.dev specifically targets the problem of newly suspicious packages: it analyzes behavioral signals in package updates rather than waiting for CVE publication, flagging packages that suddenly add network access, file system access, or obfuscated code that wasn't present in prior versions.
Version pinning as a mitigation: Pinning to exact versions prevents automatic adoption of compromised updates, but requires a deliberate upgrade process and ongoing monitoring of pinned versions against advisories. A pinned version is safer against surprise compromise but must be actively maintained.
Integrity verification: Most modern package registries publish cryptographic hashes for packages. npm uses lockfile integrity fields (SHA-512), PyPI provides package hashes, and tools like pip-compile --generate-hashes create requirements files with integrity checks. AI-generated code rarely includes these verification steps; auditors must add them.
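The two hash formats mentioned above can be computed and checked with the standard library. This is a minimal sketch of the comparison step only; in practice pip and npm perform it themselves when hashes are present in the requirements file or lockfile. The function names are illustrative.

```python
import base64
import hashlib
import hmac


def sri_sha512(data: bytes) -> str:
    """npm-lockfile-style integrity string: 'sha512-' + base64 of the digest."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def pip_sha256(data: bytes) -> str:
    """pip --hash style string: 'sha256:' + hex digest."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected: str) -> bool:
    """Recompute the hash in the format of `expected` and compare the two strings."""
    if expected.startswith("sha512-"):
        actual = sri_sha512(data)
    elif expected.startswith("sha256:"):
        actual = pip_sha256(data)
    else:
        raise ValueError("unsupported integrity format")
    return hmac.compare_digest(actual, expected)


if __name__ == "__main__":
    artifact = b"example package bytes"  # stand-in for a downloaded archive
    recorded = sri_sha512(artifact)
    print(verify(artifact, recorded))
    print(verify(artifact + b"tampered", recorded))
```

The recorded value has to come from a trusted moment (for example, the lockfile committed after review); a hash computed from whatever was just downloaded verifies nothing.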
When reviewing AI-generated project configurations, check specifically for: (1) Whether dependency files use exact version pinning or ranges. (2) Whether lockfiles are generated and committed (package-lock.json, poetry.lock, Pipfile.lock, yarn.lock). (3) Whether any automated update tools (Dependabot, Renovate) are configured with appropriate review gates rather than auto-merge. (4) Whether package integrity hashes are included in lock files and verified during installation.
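Checks (1) and (2) lend themselves to a small script. The sketch below handles a package.json manifest and a list of committed filenames; it treats only plain x.y.z versions (optionally with a prerelease suffix) as exact, which is a simplification of npm's range grammar. The function name and finding messages are illustrative.

```python
import json
import re
from typing import Iterable, List

LOCKFILES = {"package-lock.json", "yarn.lock", "poetry.lock", "Pipfile.lock"}
# Plain x.y.z, optionally with a prerelease suffix such as -beta.1.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$")


def audit_node_project(package_json_text: str, filenames: Iterable[str]) -> List[str]:
    """Flag non-exact version specs and a missing lockfile."""
    manifest = json.loads(package_json_text)
    findings = []
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT_VERSION.match(spec):
                findings.append(f"{section}: {name} uses non-exact spec '{spec}'")
    if not LOCKFILES & set(filenames):
        findings.append("no lockfile committed")
    return findings


if __name__ == "__main__":
    manifest_text = '{"dependencies": {"express": "^4.18.2", "lodash": "4.17.21"}}'
    for finding in audit_node_project(manifest_text, ["package.json", "README.md"]):
        print(finding)
```

Checks (3) and (4) depend on the update tool's configuration format and the lockfile's schema, so they are left to manual review here.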
AI models frequently generate code that skips lockfile generation, uses broad version ranges, and does not include hash verification, because the training data they learned from frequently exhibited these patterns. These are not AI errors in the conventional sense; they are accurate reproductions of common but insecure practices from the training corpus.
Every AI-generated package recommendation reflects the state of that package at training time. Your audit must account for the gap between the model's training cutoff and today; this gap may span months or years of supply chain events the model has no knowledge of.
Your team has received an AI-generated Node.js project that uses unpinned dependencies in package.json, has no lockfile committed, and has Dependabot configured with auto-merge enabled. You have been asked to assess the compromise exposure and recommend specific mitigations.
Work with the tutor to develop an assessment of the configuration's risk and a prioritized remediation plan addressing version pinning, lockfile management, integrity verification, and monitoring.
The SolarWinds Orion compromise, revealed in December 2020, remains the definitive documented case of build pipeline injection at scale. Attackers attributed to Cozy Bear / APT29 injected malicious code into SolarWinds' build system (not the source code repository) so that the malware was compiled into official, digitally-signed SolarWinds binaries. Approximately 18,000 organizations installed the compromised updates, including nine U.S. federal agencies. The attack was active for nine months before discovery.
On a smaller but more directly relevant scale: in 2023, researchers at Palo Alto Unit 42 documented GitHub Actions supply chain attacks in which compromised third-party Actions (the reusable workflow components that AI models routinely suggest in CI/CD configurations) were used to exfiltrate secrets from build environments. AI coding assistants frequently suggest GitHub Actions by name and version β recommendations that may reflect outdated, now-compromised Action versions.
When developers ask AI assistants to generate CI/CD configurations (GitHub Actions workflows, GitLab CI YAML, Jenkinsfiles, CircleCI configs), the AI produces configurations based on patterns from its training data. These configurations typically include references to specific third-party Actions, Docker base images, and build scripts. Each of these references is a potential injection point if the referenced resource has been compromised or replaced since the AI's training cutoff.
GitHub Actions are particularly high-risk because they are referenced in the form owner/repo@ref, where the ref can be a branch, a tag, or a commit SHA. AI models frequently suggest Actions pinned to branch references (uses: actions/checkout@main) rather than immutable commit SHAs (uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683). Branch references are mutable: the owner of the Action can push new code to the branch at any time, changing what runs in your pipeline without any change to your workflow file. Tags can be moved in the same way.
The tj-actions/changed-files compromise in March 2025 (CVE-2025-30066) demonstrated this concretely: attackers compromised the tj-actions/changed-files GitHub Action and modified it to print repository secrets to workflow logs. Any workflow referencing this Action by tag or branch rather than by commit SHA was automatically running malicious code. AI tools suggesting this Action by name without SHA pinning would have recommended an attack vector.
CVE-2025-30066 documented the compromise of the widely-used tj-actions/changed-files GitHub Action. Attackers modified the Action to exfiltrate CI/CD secrets by printing them to workflow logs. Organizations with workflows pinned to tag or branch references (the default AI recommendation pattern) were exposed. Workflows pinned to a known-good commit SHA were unaffected. This incident affected thousands of repositories.
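A quick way to find mutable references is to scan a workflow file for uses: entries whose ref is not a full 40-character commit SHA. The sketch below is a line-based heuristic that avoids a YAML dependency; it skips local actions (./path) and docker:// references, which need separate handling.

```python
import re
from typing import List

# Captures the reference after 'uses:' up to whitespace or a trailing comment.
USES_RE = re.compile(r"^[ \t]*(?:-[ \t]*)?uses:[ \t]*([^\s#]+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> List[str]:
    """Return every third-party action reference not pinned to a full commit SHA."""
    flagged = []
    for reference in USES_RE.findall(workflow_text):
        reference = reference.strip("'\"")
        if reference.startswith("./") or reference.startswith("docker://"):
            continue  # local and container references are out of scope here
        _, _, ref = reference.partition("@")
        if not FULL_SHA_RE.match(ref):
            flagged.append(reference)
    return flagged


if __name__ == "__main__":
    workflow = (
        "steps:\n"
        "  - uses: actions/checkout@main\n"
        "  - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683\n"
    )
    print(unpinned_actions(workflow))
```

A SHA pin only helps if the commit was reviewed when it was pinned, so the flagged list is a starting point for review, not a substitute for it.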
SHA pinning for GitHub Actions: Every third-party Action reference must use an immutable commit SHA rather than a tag or branch. This is the single highest-impact control for GitHub Actions supply chain risk. Tools like Ratchet (by Google) and pin-github-action automate the conversion of tag-based references to SHA pins. The StepSecurity Harden-Runner Action can also monitor and restrict Actions' runtime behavior.
Docker base image pinning: AI-generated Dockerfiles frequently use FROM python:3.11 or similar mutable tags. These should be replaced with digest-pinned references: FROM python:3.11@sha256:abc123…. The Docker Hub tag python:3.11 can be updated to point to a new image at any time; the digest-pinned reference is immutable.
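The same kind of scan works for Dockerfiles. This sketch flags FROM lines whose image reference lacks an @sha256 digest, while ignoring scratch and references to earlier build stages. It is a regex heuristic and does not expand ARG variables used in image names.

```python
import re
from typing import List

# FROM [--platform=...] image [AS stage]
FROM_RE = re.compile(
    r"^[ \t]*FROM[ \t]+(?:--platform=\S+[ \t]+)?(\S+)(?:[ \t]+AS[ \t]+(\S+))?",
    re.IGNORECASE | re.MULTILINE,
)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def mutable_base_images(dockerfile_text: str) -> List[str]:
    """Return base images referenced by mutable tag rather than digest."""
    stage_names = set()
    flagged = []
    for image, alias in FROM_RE.findall(dockerfile_text):
        is_stage = image.lower() in stage_names  # e.g. 'FROM build' in a multi-stage file
        if image.lower() != "scratch" and not is_stage and not DIGEST_RE.search(image):
            flagged.append(image)
        if alias:
            stage_names.add(alias.lower())
    return flagged


if __name__ == "__main__":
    print(mutable_base_images("FROM python:3.11\nRUN pip --version\n"))
```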
Secrets management: AI-generated pipeline configurations frequently place secrets directly in environment variables with patterns like AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET }} without additional controls. Auditors should verify that secrets are scoped to the minimum necessary steps, that OIDC token-based authentication is used where possible (eliminating long-lived credentials), and that secrets are not printed to logs.
Pipeline permissions: AI-generated GitHub Actions workflows frequently use permissions: write-all or omit permissions entirely (which defaults to write in older configurations). Each workflow should use the minimum necessary permissions. The permissions key should be explicitly declared at the workflow level with only required access granted.
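Both permission problems can be detected with a simple text check. The sketch below flags any permissions: write-all line and the absence of an unindented (workflow-level) permissions key. It is a heuristic that assumes conventional YAML formatting; a stricter audit would parse the file.

```python
import re
from typing import List

WRITE_ALL_RE = re.compile(r"^[ \t]*permissions:[ \t]*write-all[ \t]*(#.*)?$", re.MULTILINE)
TOP_LEVEL_RE = re.compile(r"^permissions:", re.MULTILINE)


def permission_findings(workflow_text: str) -> List[str]:
    """Flag write-all grants and workflows with no top-level permissions key."""
    findings = []
    if WRITE_ALL_RE.search(workflow_text):
        findings.append("permissions: write-all grants every scope")
    if not TOP_LEVEL_RE.search(workflow_text):
        findings.append("no workflow-level permissions block (defaults apply)")
    return findings


if __name__ == "__main__":
    workflow = "name: ci\non: push\njobs:\n  build:\n    runs-on: ubuntu-latest\n"
    for finding in permission_findings(workflow):
        print(finding)
```

A workflow that declares permissions only inside individual jobs is still reported as missing the workflow-level block, which matches the recommendation to declare it explicitly at the top.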
A systematic audit of AI-generated CI/CD configuration should cover: (1) All third-party Action references: flag any that use branch, tag, or latest references instead of commit SHAs. (2) All Docker base image references: flag mutable tags without digest pinning. (3) Workflow permissions: flag write-all, missing permission declarations, or overly broad access. (4) Secret handling: flag hardcoded values, overly broad secret scopes, or patterns that could cause secrets to be logged. (5) Script injection vectors: flag workflow steps that interpolate user-controlled input (PR titles, branch names, commit messages) directly into shell commands.
Script injection is a particularly insidious AI-generated pattern. When AI generates workflow steps that use GitHub Actions expression syntax inside run: blocks, such as run: echo "${{ github.event.pull_request.title }}", it creates a command injection vulnerability if a pull request title contains shell metacharacters. The value of github.event.pull_request.title is attacker-controlled input. AI models replicate this pattern widely because it appears frequently in documentation examples.
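The injection pattern can be found mechanically. The sketch below scans single-line run: steps for expressions that expand github.event fields or github.head_ref directly into the shell command. It is deliberately narrow: multi-line run blocks (run: |) are not inspected, and it does not judge which event fields are actually attacker-controlled.

```python
import re
from typing import List

# A single-line 'run:' step and the command that follows it.
RUN_LINE_RE = re.compile(r"^[ \t]*(?:-[ \t]*)?run:[ \t]*(.+)$", re.MULTILINE)
# ${{ github.event.<anything> }} or ${{ github.head_ref }}
EXPR_RE = re.compile(r"\$\{\{\s*(github\.(?:event\.[\w.\[\]'\"*-]+|head_ref))\s*\}\}")


def injection_candidates(workflow_text: str) -> List[str]:
    """Return event expressions interpolated directly into single-line run commands."""
    hits = []
    for command in RUN_LINE_RE.findall(workflow_text):
        hits.extend(EXPR_RE.findall(command))
    return hits


if __name__ == "__main__":
    workflow = (
        "steps:\n"
        '  - run: echo "${{ github.event.pull_request.title }}"\n'
        "  - env:\n"
        "      TITLE: ${{ github.event.pull_request.title }}\n"
        '    run: echo "$TITLE"\n'
    )
    print(injection_candidates(workflow))
```

The second step in the example shows the usual remediation: the value is passed through an environment variable and referenced as "$TITLE", so the shell receives it as data rather than as part of the command text, and the scanner does not flag it.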
Any AI-generated CI/CD configuration should be treated as potentially containing mutable references, overly broad permissions, and script injection patterns until explicitly audited. The AI generates configurations that work, not configurations that are secure.
An AI assistant has generated a GitHub Actions workflow for a Node.js application. The workflow uses several third-party Actions by tag, runs with default permissions, echoes PR metadata in run steps, and stores AWS credentials as static secrets. Your task is to conduct a security audit and produce a remediation plan.
Use the tutor below to work through each risk category: SHA pinning, permissions hardening, script injection prevention, and credential management. Develop specific remediated versions of problematic configurations.