When security researcher Joseph Thacker began systematically testing ChatGPT's ability to generate Python code that called third-party APIs, he documented a pattern that had already been silently entering codebases worldwide: the model would confidently produce syntactically correct, logically plausible function calls to endpoints that did not exist. The Stripe SDK would gain a charge.refund_partial() method it had never shipped. The Twilio library would acquire a send_whatsapp_bulk() convenience wrapper no engineer had ever written. Each fabrication was formatted exactly like the real documentation.
Large language models are trained on enormous volumes of API documentation, Stack Overflow threads, GitHub repositories, and tutorial blog posts. This training creates a powerful internal model of how APIs should look — naming conventions, parameter patterns, return types, error classes. When asked to write code using a library, the model synthesizes a response that is statistically consistent with everything it has seen about that library's style.
The result is a hallucination that passes superficial inspection. The method name follows the library's naming convention. The parameters match the types the library typically uses. The docstring, if the model generates one, echoes real documentation prose. A developer scanning the output sees familiar patterns and proceeds.
This phenomenon is distinct from a simple syntax error. The code is syntactically valid and may even pass static analysis. The failure surfaces only at runtime. If an attacker has registered a malicious package under the hallucinated name, the install and the import both succeed, and the failure may never surface at all.
Researchers at Vulcan Cyber published findings in May 2023 showing that AI-generated package names that did not exist in PyPI were being registered by threat actors after the hallucinations were observed in the wild. The attack model: wait for an AI tool to suggest a nonexistent package name, register that name with malicious code, then collect installs from developers who ran the AI-generated requirements file without verification.
Hallucinated APIs tend to cluster around predictable patterns. Understanding these patterns is the first step toward systematic detection during code review.
Consider this plausible-looking Python snippet generated by an LLM tasked with "write code to send a templated SMS via Twilio":
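A sketch of what such output might look like follows. It is reconstructed for illustration, not quoted from any model. Only `create_from_template` and `auto_approve` come from the discussion in this lesson. The function name, the `template_sid` parameter, the phone numbers, and the stand-in client are all assumptions, and the stand-in exists only so the example runs without the Twilio SDK installed.

```python
from types import SimpleNamespace


def send_templated_sms(client, to_number, from_number, template_sid):
    """Illustrative LLM-style output: idiomatic in shape, invented in substance."""
    return client.messages.create_from_template(  # hallucinated method
        to=to_number,
        from_=from_number,
        template_sid=template_sid,  # assumed parameter name, for illustration only
        auto_approve=True,          # hallucinated parameter
    )


# Hypothetical stand-in for a client object. Its messages resource
# offers create() and nothing else.
stub_client = SimpleNamespace(
    messages=SimpleNamespace(create=lambda **kwargs: kwargs)
)

# Defining the function above raised nothing. The error appears only
# when the call site actually executes.
try:
    send_templated_sms(stub_client, "+15550100", "+15550101", "HX-placeholder")
except AttributeError as exc:
    runtime_error = exc
```

Nothing in the function definition fails at import time, which is why a code path like this can sit unexecuted in a codebase until the day it is needed.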
The Twilio Python SDK has no create_from_template() method on the messages resource. Content Templates are managed through a separate Content API with a completely different call pattern. auto_approve is not a recognized parameter in any version of the SDK. This code will raise an AttributeError at runtime — but only if it runs.
Any method call on a third-party object that you did not personally verify in the library's official changelog or source code must be treated as potentially hallucinated. Familiarity with a library's naming style is not verification. The model is trained on that same style and uses it to fabricate convincingly.
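One quick first pass, before opening the changelog, is to ask the installed library whether the name exists at all. The sketch below is a minimal helper under stated assumptions: `resolve_attribute` is a hypothetical name, the standard-library `json` module stands in for a third-party SDK, and the check only sees attributes defined on modules and classes. Attributes attached dynamically to instances would need a look at the source instead. Importing a module executes its code, so this is only appropriate for packages that are already trusted and installed.

```python
import importlib


def resolve_attribute(module_name: str, dotted_path: str):
    """Return the object at module_name.dotted_path, or None if any hop is missing."""
    obj = importlib.import_module(module_name)
    for part in dotted_path.split("."):
        if not hasattr(obj, part):
            return None
        obj = getattr(obj, part)
    return obj


# json.dumps is real; json.dumps_pretty is an invented, plausible-sounding name.
real_function = resolve_attribute("json", "dumps")
invented_function = resolve_attribute("json", "dumps_pretty")
real_method = resolve_attribute("json", "JSONEncoder.encode")
```

A `None` result is strong evidence of a fabricated name. A non-`None` result only shows the name exists in the installed version, not that it does what the generated code assumes.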
The naive assumption is that hallucinated APIs self-reveal quickly: the code runs, an exception fires, the developer investigates. In practice, three conditions routinely delay or suppress discovery.
Infrequent code paths. A method called only during error handling, monthly billing runs, or specific feature flag states may not execute in testing. The hallucination ships to production and waits.
Silent parameter absorption. Some libraries accept arbitrary keyword arguments via **kwargs and simply ignore unrecognized ones. A hallucinated configuration key that a constructor silently discards produces no error — the feature just does not work as intended, and the bug may be attributed to other causes.
Package substitution. As documented in the Vulcan Cyber research, a hallucinated package name that an attacker subsequently registers means the install succeeds, the import succeeds, and the code may even run — with attacker-controlled behavior replacing the expected logic.
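The second condition, silent parameter absorption, is easy to reproduce in miniature. The two classes below are hypothetical and not taken from any real SDK. They differ only in whether the constructor accepts `**kwargs`, and `auto_retry` is an invented option name.

```python
class LenientClient:
    """Hypothetical client whose constructor swallows unknown options."""

    def __init__(self, api_key, timeout=30, **kwargs):
        self.api_key = api_key
        self.timeout = timeout
        # Unrecognized options land in kwargs and are never read again.


class StrictClient:
    """Hypothetical client with an explicit signature and no catch-all."""

    def __init__(self, api_key, timeout=30):
        self.api_key = api_key
        self.timeout = timeout


# The invented option is accepted without complaint and has no effect.
lenient = LenientClient("key", timeout=10, auto_retry=True)

# The same call against the strict signature fails immediately.
try:
    StrictClient("key", timeout=10, auto_retry=True)
except TypeError as exc:
    strict_error = str(exc)
```

The lenient constructor produces no exception, no warning, and no attribute. The only symptom is that retries never happen.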
Lesson 2 examines the dependency manifest specifically — how phantom packages enter requirements.txt, package.json, and go.mod files, and the verification workflow that catches them before install.
auto_paginate=True, which is not a real parameter. What is the likely outcome?
You are reviewing a pull request that includes AI-generated code using the OpenAI Python SDK (v1.x). The snippet below was committed by a junior engineer who used Copilot to write it. Your job is to identify which method calls are hallucinated and explain your reasoning to the tutor.
Bar Lanyado, a researcher at Lasso Security, published a study in which he asked multiple AI assistants — including ChatGPT, Bard, and Claude — to recommend Python packages for various tasks. He found that models would with some regularity suggest package names that did not exist on PyPI. When he registered several of those invented names as test packages, he was able to confirm that at least some installs from developer environments followed within days — meaning developers had trusted AI-recommended package names without checking their existence on the registry.
AI-generated dependency hallucinations surface in several distinct places, each with a different risk profile and detection approach.
| File | Language / Runtime | Hallucination Risk | Detection Point |
|---|---|---|---|
| requirements.txt | Python / pip | High — free-form, no lock file required by default | pip install failure or PyPI lookup |
| package.json | Node.js / npm | High — npm install fails silently on 404 in some configs | npm install error or npmjs.com lookup |
| go.mod | Go / modules | Medium — go get verifies against VCS, fails loudly | go mod tidy failure |
| Cargo.toml | Rust / cargo | Medium — crates.io lookup at build time | cargo build failure |
| build.gradle | Java / Gradle | Medium — Maven Central query at sync time | Gradle sync failure |
A package name that does not exist today can be registered tomorrow. This creates a unique attack window specific to AI-generated dependency lists: the hallucination is observable before the attack exists, giving threat actors advance notice of which names developers will attempt to install.
The Python Package Index does not restrict registration of most package names, and name similarity to existing packages is not automatically blocked (PyPI introduced some typosquatting protections after high-profile incidents, but coverage is incomplete). An attacker who monitors AI assistant outputs for novel package suggestions has a list of high-probability target names to register.
The legitimate Python HTTP library is named requests. AI assistants have been observed suggesting requestslib, python-requests-extended, and requests-plus — all of which have at various points been registered by parties other than Kenneth Reitz's project. Some registered names contained code that exfiltrated environment variables on import. The requests library has over 30 billion all-time downloads; the attack surface from confusion with lookalike names is enormous.
A robust review protocol for any AI-assisted dependency addition involves three sequential checks, none of which can be skipped when the dependency originates from an AI suggestion.
Tools like pip-audit, socket.dev, and Snyk Open Source automate parts of this verification — but none are a substitute for the publisher identity check on AI-suggested packages. These tools primarily query known-vulnerability databases; a newly registered malicious package will not yet appear in those databases. Registry existence and publisher verification must remain manual steps for any dependency that did not appear in the codebase before AI involvement.
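The registry-existence part of that manual check can be scripted as a starting point. The sketch below assumes PyPI's public JSON endpoint (`https://pypi.org/pypi/<name>/json`) and the `releases` and `upload_time` fields of its response. The function names and the sample payload are made up for illustration. The first upload date is only a rough proxy for how established a project is; publisher identity still needs a manual look at the project page and its linked repository.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from datetime import datetime

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_pypi_metadata(name: str):
    """Return the parsed registry record, or None if the name is not registered."""
    url = PYPI_JSON_URL.format(name=urllib.parse.quote(name))
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return json.load(response)
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return None  # the name does not exist on the registry
        raise


def summarize_release_history(metadata: dict) -> dict:
    """Reduce a registry record to the facts a reviewer looks at first."""
    releases = metadata.get("releases", {})
    upload_times = [
        datetime.fromisoformat(release_file["upload_time"])
        for files in releases.values()
        for release_file in files
    ]
    first = min(upload_times).date().isoformat() if upload_times else None
    return {"release_count": len(releases), "first_upload": first}


# Made-up payload in the assumed response shape, so the summary logic
# can be exercised without network access.
sample_metadata = {
    "releases": {
        "0.1.0": [{"upload_time": "2024-01-05T10:00:00"}],
        "0.2.0": [{"upload_time": "2024-02-01T09:30:00"}],
    }
}
summary = summarize_release_history(sample_metadata)
```

A `None` from `fetch_pypi_metadata` means the name is unregistered today, which is a reason to reject the dependency, not a reason to assume it is harmless.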
In a pull request review, the dependency manifest diff is often the highest-value target. Any new package added in a PR where the branch author used AI assistance should be individually verified. The review comment should not simply approve the addition — it should record the verification: "Confirmed httpx 0.27.0 is the encode/httpx project, publisher @tomchristie, 1.2k commits, existing dependency in our monorepo."
For packages that cannot be verified — the registry page 404s, the publisher account is new, the repository has no history — the correct action is to reject the PR with a clear comment and request the engineer use an already-approved alternative or go through the dependency approval process.
Lesson 3 moves from dependency manifests to runtime behavior: how hallucinated method signatures cause subtle bugs even when the underlying package is real and correctly installed.
requests-async-extended==0.4.2 added to requirements.txt. The PyPI page exists, but the publisher account was created 3 days ago and has no other packages. What is the correct action?
You are reviewing a requirements.txt diff in a pull request. The branch author used GitHub Copilot to scaffold a new microservice. Below is the additions section of the diff. Discuss with the tutor which entries need the deepest scrutiny and how you would conduct the verification.
Multiple independent evaluations of GitHub Copilot's code suggestions — including studies by researchers at NYU and published analyses by GitClear — documented that Copilot would correctly identify which SDK method to call for a given task but would frequently hallucinate the parameter names, default values, and call signatures. The underlying package would install correctly. The import would succeed. The method would exist. Only the specific parameters passed would be wrong — sometimes silently wrong, sometimes raising cryptic errors that traced back several layers.
When an AI assistant fabricates a method parameter, the outcome at runtime depends on how the receiving function handles unexpected input. Three distinct failure modes result, each with different detectability.
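AWS boto3 is one of the most frequently used Python SDKs and one of the most frequently hallucinated. AI assistants commonly fabricate parameters on S3, Lambda, and IAM client methods. A representative pattern: an assistant suggesting s3_client.put_object(..., ACL='private', enforce_encryption=True), where enforce_encryption is not a real parameter. Because boto3 client methods are declared with **kwargs, the call passes linters and type checkers. botocore validates keyword arguments against the service model only when the call executes, at which point an unknown parameter raises ParamValidationError. In SDKs that discard unrecognized keyword arguments instead, the same call would succeed and the object would upload without enforced encryption. Either way, security reviewers looking only at the code see a parameter that implies encryption is required and may not check the actual S3 bucket policy.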
For any method call in AI-generated code that accepts keyword arguments, the review workflow requires checking each parameter name against the actual function signature, not the surrounding documentation prose.
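A minimal sketch of that check, using `inspect.signature()` from the standard library. The helper name `unknown_keywords` and the `upload` function are hypothetical. `shutil.copyfile` stands in for a third-party function with an explicit signature, and `preserve_mode` is an invented parameter name.

```python
import inspect
import shutil


def unknown_keywords(func, keyword_names):
    """Return (names not declared by func, whether func has a **kwargs catch-all)."""
    parameters = inspect.signature(func).parameters.values()
    accepts_var_keyword = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters
    )
    declared = {
        p.name
        for p in parameters
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    unknown = [name for name in keyword_names if name not in declared]
    return unknown, accepts_var_keyword


# Explicit signature: the invented keyword is flagged and would raise TypeError.
explicit_result = unknown_keywords(shutil.copyfile, ["follow_symlinks", "preserve_mode"])


def upload(bucket, key, **options):
    """Hypothetical SDK-style function with a catch-all."""
    return bucket, key, options


# Catch-all signature: the signature alone cannot confirm or refute the keyword.
catch_all_result = unknown_keywords(upload, ["enforce_encryption"])
```

When the second element of the result is `True`, the signature is not the end of the check. The function forwards or discards extra keywords, so the parameter has to be confirmed against the library's reference documentation or source.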
Mode C failures — parameter aliasing — are the most consequential for security-sensitive code. An AI assistant generating IAM policy code, encryption configuration, or authentication middleware is particularly likely to invent plausible-sounding security parameters.
The psychological dynamic is dangerous: a developer who sees require_mfa=True in their PR tends not to verify whether that parameter is real, because its presence signals that the security consideration was addressed. The review stops at the parameter name rather than verifying that the parameter actually configures the intended behavior in the library.
Any parameter in AI-generated code whose name implies a security constraint — require_, enforce_, verify_, force_, restrict_ — must be independently verified against the real function signature before the PR is approved. The presence of a security-sounding parameter that does nothing is more dangerous than its absence, because it may suppress further security review.
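A reviewer can surface those call sites mechanically before verifying each one by hand. The sketch below walks a Python source string with the standard-library `ast` module and lists keyword arguments that begin with the prefixes named above. The function name and the sample source are assumptions for illustration. The match is on lowercase snake_case prefixes only, so a CamelCase name would need an extended rule. The use of `ast.unparse` requires Python 3.9 or later.

```python
import ast

SECURITY_PREFIXES = ("require_", "enforce_", "verify_", "force_", "restrict_")


def security_keywords(source: str):
    """Return sorted (line, callee, keyword) tuples for security-sounding keywords."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for keyword in node.keywords:
            # keyword.arg is None for **mapping expansion, so skip those.
            if keyword.arg and keyword.arg.startswith(SECURITY_PREFIXES):
                found.append((node.lineno, ast.unparse(node.func), keyword.arg))
    return sorted(found)


# Made-up source under review. connect() and login() are hypothetical.
sample_source = '''
client = connect(host, verify_tls=True)
session = client.login(user, require_mfa=True, timeout=5)
'''
flagged = security_keywords(sample_source)
```

The output is a worklist, not a verdict. Each flagged keyword still has to be checked against the real function signature.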
Lesson 4 brings together the detection patterns from Lessons 1–3 and presents a practical code review checklist for AI-generated code touching third-party integrations.
enforce_encryption=True. Suppose the receiving SDK accepts arbitrary **kwargs and the call succeeds. What failure mode is this, and what is the security risk?
For functions that declare their parameters explicitly, inspect.signature() returns the real parameter list with defaults and annotations. It is the fastest authoritative check available without leaving your shell.
Use inspect.signature() from Python's standard library. dir() lists attributes, not parameter signatures. __doc__ shows docstrings, which may themselves be incorrect. co_varnames includes all local variables, not just parameters.
A reviewer who sees require_mfa=True or force_https=True tends to consider the security concern addressed. The hallucinated parameter functions as a false reassurance that may prevent the reviewer from checking the actual security configuration.
The following AI-generated code uses the real boto3 library to configure an S3 upload with server-side encryption. Your job is to identify which parameters are real, which are hallucinated, and which failure mode each hallucination represents. Discuss your analysis with the tutor.
retries={'mode': 'adaptive', 'max_attempts': 5}, not separate keys. And EncryptionContext is a KMS parameter, not AES256.
As GitHub Copilot Enterprise and similar tools became standard at large engineering organizations, security teams at companies including Shopify, Microsoft, and Atlassian published internal guidance, portions of which became public through engineering blog posts and conference talks, on how to extend existing code review processes to account for AI-generated content. The consistent theme: existing review workflows caught most hallucinations in obvious cases, but subtle signature and configuration hallucinations required explicit checklist items to surface reliably.
The following protocol synthesizes patterns from published enterprise guidance, security research, and the failure modes documented in Lessons 1–3. It applies to any PR where AI tooling was used to generate code that calls external packages, APIs, or SDKs.
Code review comments on AI-generated integration code should be specific enough to be auditable. A comment that says "Looks good" on a section of boto3 configuration provides no evidence that verification occurred. A comment that says "Verified ServerSideEncryption='AES256' against boto3 1.34.0 put_object signature — real parameter, correct value for bucket without KMS. Confirmed RequireEncryption is NOT a real parameter and has been removed in this review" provides a record of the verification that protects both the reviewer and the organization.
This level of specificity is not required for every line — it is required for lines where hallucination risk is elevated: any third-party call site generated by an AI tool, especially in security-sensitive contexts.
Static analysis tools like Pyright and mypy with strict stub packages will catch some hallucinated method names when type stubs are available — the IDE underlines the fabricated method. However: (1) many libraries have incomplete stubs, (2) **kwargs absorption prevents type checkers from flagging hallucinated parameters, and (3) type stubs may themselves be generated or outdated. Type checker silence is not confirmation of correctness. It is one signal among several.
Organizations that adopted AI coding tools in 2022–2023 before formal review protocols existed may have significant amounts of AI-generated integration code that was never subjected to the verification steps above. The recommended approach is a targeted audit rather than a full codebase scan:
1. Query your version control history for commits from periods of active Copilot or ChatGPT adoption.
2. Filter for commits that modified files touching external API integrations (files that import boto3, openai, stripe, twilio, etc.).
3. Apply the five-point protocol to the integration surface in those files.
4. Prioritize files in authentication, payment, and data storage paths, the highest-consequence locations for silent parameter hallucinations.
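The filtering step can be approximated with a short script. The sketch below covers only Python files and only the import-based filter. It does not query version control history, which depends on how a given team records AI tool adoption. The function names and the target module set are assumptions, with the set seeded from the libraries named in the list above.

```python
import ast
from pathlib import Path

INTEGRATION_MODULES = frozenset({"boto3", "openai", "stripe", "twilio"})


def imported_integrations(source: str, targets=INTEGRATION_MODULES) -> set:
    """Return the target SDKs that a Python source string imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from x import y" only
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in targets:
                found.add(root)
    return found


def files_to_audit(repo_root: str, targets=INTEGRATION_MODULES) -> dict:
    """Map each .py file under repo_root to the integration SDKs it imports."""
    results = {}
    for path in Path(repo_root).rglob("*.py"):
        try:
            hits = imported_integrations(path.read_text(encoding="utf-8"), targets)
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that do not parse as Python source
        if hits:
            results[str(path)] = hits
    return results
```

Intersecting the output of `files_to_audit` with the list of files changed during the adoption period gives the audit scope for step 3.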
An AI assistant's code is a first draft produced by a system that has never run the code it writes. Every external interface in that draft — package names, method names, parameter names, configuration values — must be verified against authoritative sources by a human who understands the stakes. The AI is a productive author. You are the editor responsible for what ships.
You have completed all four lessons. Proceed to Lab 4 to practice applying the full five-point protocol, then take the Module Test to assess your mastery of hallucinated APIs and phantom dependencies.
RequireEncryption=True on an S3 put_object call, which is a hallucinated parameter. What review failure does this represent?
You are the senior reviewer on a PR that modifies a Stripe payment integration. The author used ChatGPT to generate the refund handling code. Apply the full five-point protocol and discuss your review decisions with the tutor. Focus on Points 3, 4, and 5.
stripe.Refund.create(), not create_partial(). Partial amounts are passed via the amount parameter to the standard create() call.
inspect.signature(), from Python's inspect module, returns the authoritative parameter signature, including defaults and type annotations.
A require_mfa=True that does nothing functions as a false assurance: reviewers see it and stop looking. Absence would trigger a review question; presence suppresses it.