Module 3 · Lesson 1

The Anatomy of a Hallucinated API

What fabricated interfaces look like, why they feel real, and how they enter production codebases undetected.

How does a function that has never existed end up committed to a production repository?

When security researcher Joseph Thacker began systematically testing ChatGPT's ability to generate Python code that called third-party APIs, he documented a pattern that had already been silently entering codebases worldwide: the model would confidently produce syntactically correct, logically plausible function calls to endpoints that did not exist. The Stripe SDK would gain a charge.refund_partial() method it had never shipped. The Twilio library would acquire a send_whatsapp_bulk() convenience wrapper no engineer had ever written. Each fabrication was formatted exactly like the real documentation.

Why Hallucinated APIs Feel Authentic

Large language models are trained on enormous volumes of API documentation, Stack Overflow threads, GitHub repositories, and tutorial blog posts. This training creates a powerful internal model of how APIs should look — naming conventions, parameter patterns, return types, error classes. When asked to write code using a library, the model synthesizes a response that is statistically consistent with everything it has seen about that library's style.

The result is a hallucination that passes superficial inspection. The method name follows the library's naming convention. The parameters match the types the library typically uses. The docstring, if the model generates one, echoes real documentation prose. A developer scanning the output sees familiar patterns and proceeds.

This phenomenon is distinct from a simple syntax error. The code is syntactically valid. It may even pass static analysis. The failure surfaces only at runtime. If an attacker has registered the hallucinated package name with malicious code, it may never surface at all.

Documented Case — 2023 PyPI Typosquatting Wave

Researchers at Vulcan Cyber published findings in June 2023 showing that AI assistants repeatedly recommended package names that did not exist on public registries such as PyPI and npm, and that threat actors could register those names once the hallucinations were observed. The attack model: wait for an AI tool to suggest a nonexistent package name, register that name with malicious code, then collect installs from developers who ran the AI-generated requirements file without verification.

The Structural Signature of a Fabricated Call

Hallucinated APIs tend to cluster around predictable patterns. Understanding these patterns is the first step toward systematic detection during code review.

Plausible Compound Names

Methods like client.batch_process_async() or session.get_with_retry() — combinations of real words that sound like they belong but do not appear in any changelog or official release.

Version-Locked Ghosts

Features attributed to specific version numbers that do not exist: "added in v2.4.1" when no such version shipped that method. The model confabulates a plausible version history.

Convenience Wrappers

High-level helper methods that wrap functionality available only through lower-level calls. The model infers these would be useful, then invents them as if they ship with the library.

Phantom Configuration Keys

Settings passed to constructors or config objects, such as retry_on_rate_limit=True or auto_paginate=True, that the library never defined. A constructor that accepts **kwargs silently ignores them; a strict constructor rejects them with a TypeError.

An Illustrative Code Sample

Consider this plausible-looking Python snippet generated by an LLM tasked with "write code to send a templated SMS via Twilio":

from twilio.rest import Client

client = Client(account_sid, auth_token)

# Send a pre-approved template message
message = client.messages.create_from_template(
    to="+15551234567",
    from_="+15559876543",
    template_sid="HXabcdef1234567890",
    template_vars={"1": "John"},
    auto_approve=True
)

The Twilio Python SDK has no create_from_template() method on the messages resource. Content Templates are sent through the ordinary messages.create() call using content_sid and content_variables, and the templates themselves are managed through a separate Content API. auto_approve is not a recognized parameter in any version of the SDK. This code will raise an AttributeError at runtime, but only if that line actually executes.
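A reviewer can confirm whether a method path exists on a live object without calling it. The sketch below is one minimal way to do that; the helper name missing_attribute and the stand-in client are hypothetical, and the stand-in replaces a real SDK client only so the example runs without credentials.

```python
from types import SimpleNamespace


def missing_attribute(root, dotted_path):
    """Return the first attribute in dotted_path that does not exist, or None.

    Walks the path with getattr and never calls anything, so no request
    is sent and no side effect is triggered.
    """
    current = root
    for name in dotted_path.split("."):
        if not hasattr(current, name):
            return name
        current = getattr(current, name)
    return None


# Hypothetical stand-in for an SDK client: it exposes messages.create only.
fake_client = SimpleNamespace(messages=SimpleNamespace(create=lambda **kw: kw))

print(missing_attribute(fake_client, "messages.create"))
print(missing_attribute(fake_client, "messages.create_from_template"))
```

A None result only means the attribute resolves. Objects that implement a permissive __getattr__ can make hasattr return True for any name, so the library source or documentation remains the authoritative check.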

Review Discipline

Any method call on a third-party object that you did not personally verify in the library's official changelog or source code must be treated as potentially hallucinated. Familiarity with a library's naming style is not verification. The model is trained on that same style and uses it to fabricate convincingly.

Why Runtime Is Too Late

The naive assumption is that hallucinated APIs self-reveal quickly: the code runs, an exception fires, the developer investigates. In practice, three conditions routinely delay or suppress discovery.

Infrequent code paths. A method called only during error handling, monthly billing runs, or specific feature flag states may not execute in testing. The hallucination ships to production and waits.

Silent parameter absorption. Some libraries accept arbitrary keyword arguments via **kwargs and simply ignore unrecognized ones. A hallucinated configuration key that a constructor silently discards produces no error — the feature just does not work as intended, and the bug may be attributed to other causes.

Package substitution. As documented in the Vulcan Cyber research, a hallucinated package name that an attacker subsequently registers means the install succeeds, the import succeeds, and the code may even run — with attacker-controlled behavior replacing the expected logic.
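The first condition, infrequent code paths, is easy to reproduce. The sketch below uses a hypothetical PaymentGateway class as a stand-in for a third-party SDK object; the fabricated method sits in an error handler, so the normal path never touches it.

```python
class PaymentGateway:
    """Hypothetical stand-in for a third-party SDK object."""

    def charge(self, amount_cents):
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        return {"status": "ok", "amount": amount_cents}


def charge_with_fallback(gateway, amount_cents):
    try:
        return gateway.charge(amount_cents)
    except ValueError:
        # Hallucinated recovery call: PaymentGateway has no such method.
        # Python only looks the attribute up when this line executes.
        return gateway.charge_with_default_amount()


# The happy path never reaches the fabricated method, so this succeeds.
print(charge_with_fallback(PaymentGateway(), 500))

# Only the rare error path exposes it.
try:
    charge_with_fallback(PaymentGateway(), 0)
except AttributeError as exc:
    print("latent failure:", exc)
```

A test suite that exercises only valid amounts would pass, which is why coverage of error-handling branches matters for AI-generated integration code.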

Lesson 2 examines the dependency manifest specifically — how phantom packages enter requirements.txt, package.json, and go.mod files, and the verification workflow that catches them before install.

Lesson 1 Quiz

The Anatomy of a Hallucinated API — 3 questions

Which structural feature most reliably distinguishes a hallucinated API method from a genuine one during static code review?

Correct. Absence from official documentation and source is the definitive marker. Style consistency is exactly what makes hallucinations hard to detect — the model replicates the library's conventions precisely.

Not quite. Hallucinated APIs follow the target library's conventions faithfully — that is what makes them dangerous. The only reliable test is verifying existence against authoritative source: official docs, changelog, or library source code.

The 2023 Vulcan Cyber research documented a specific attack that exploited AI hallucinations. What was the attack mechanism?

Correct. The attack required no sophisticated technical access — just monitoring AI output for nonexistent package names, registering those names, and waiting for developers to run pip install on AI-generated dependency lists without independent verification.

That describes a different threat model. The Vulcan Cyber finding was specifically about registering hallucinated-but-nonexistent package names on public registries after observing them in AI-generated code.

A library constructor accepts **kwargs and silently ignores unrecognized keys. An AI-generated call passes auto_paginate=True which is not a real parameter. What is the likely outcome?

Correct. This is the silent failure mode — arguably more dangerous than a loud exception. The code runs, no error fires, but the feature does not work. The bug may be attributed to logic errors elsewhere, and the hallucinated parameter may remain in the codebase indefinitely.

With **kwargs absorption, Python will not raise on unrecognized keys — they are simply collected into the kwargs dict and never referenced. The failure is behavioral, not syntactic, making it particularly hard to detect.

Lab 1 — API Hallucination Identification

Interact with the AI tutor · minimum 3 exchanges to complete

Your Task

You are reviewing a pull request that includes AI-generated code using the OpenAI Python SDK (v1.x). The snippet below was committed by a junior engineer who used Copilot to write it. Your job is to identify which method calls are hallucinated and explain your reasoning to the tutor.

from openai import OpenAI

client = OpenAI(api_key=api_key, auto_retry=True, rate_limit_buffer=0.8)

# Summarize a document with automatic chunking
response = client.chat.completions.create_chunked(
    model="gpt-4o",
    messages=[{"role": "user", "content": document_text}],
    chunk_strategy="semantic",
    max_tokens=1024
)

# Stream with built-in progress callback
for chunk in client.chat.completions.stream_with_progress(
    model="gpt-4o",
    messages=messages,
    on_progress=progress_callback
):
    print(chunk)

Start by telling the tutor which elements look suspicious to you and why. Then ask about the real API patterns you should use instead.

AI Tutor

API Hallucination Lab

Welcome to Lab 1. You have a code snippet in front of you that was generated by an AI coding assistant. I can see it contains several hallucinated elements in the OpenAI Python SDK v1.x. Walk me through what looks suspicious to you — pick any element and tell me why you think it might not be real.

Module 3 · Lesson 2

Phantom Dependencies in Manifest Files

How hallucinated packages enter requirements files, and the verification workflow that stops them before install.

What makes a nonexistent package name dangerous before a single line of it runs?

Bar Lanyado, a researcher at Lasso Security, published a study in which he asked multiple AI assistants — including ChatGPT, Bard, and Claude — to recommend Python packages for various tasks. He found that models would with some regularity suggest package names that did not exist on PyPI. When he registered several of those invented names as test packages, he was able to confirm that at least some installs from developer environments followed within days — meaning developers had trusted AI-recommended package names without checking their existence on the registry.

Where Phantom Dependencies Appear

AI-generated dependency hallucinations surface in several distinct places, each with a different risk profile and detection approach.

File	Language / Runtime	Hallucination Risk	Detection Point
requirements.txt	Python / pip	High — free-form, no lock file required by default	pip install failure or PyPI lookup
package.json	Node.js / npm	High: no existence check until install, and any unclaimed name can be registered later	npm install E404 error or npmjs.com lookup
go.mod	Go / modules	Medium — go get verifies against VCS, fails loudly	go mod tidy failure
Cargo.toml	Rust / cargo	Medium — crates.io lookup at build time	cargo build failure
build.gradle	Java / Gradle	Medium — Maven Central query at sync time	Gradle sync failure

The Typosquatting Attack Surface

A package name that does not exist today can be registered tomorrow. This creates a unique attack window specific to AI-generated dependency lists: the hallucination is observable before the attack exists, giving threat actors advance notice of which names developers will attempt to install.

The Python Package Index does not restrict registration of most package names, and name similarity to existing packages is not automatically blocked (PyPI introduced some typosquatting protections after high-profile incidents, but coverage is incomplete). An attacker who monitors AI assistant outputs for novel package suggestions has a list of high-probability target names to register.
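Name similarity can be screened mechanically before anyone visits the registry. The following is a rough heuristic sketch, not a detection product: the WELL_KNOWN list, the function name lookalike_of, and the 0.8 cutoff are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical, deliberately tiny list; a real review would use the team's
# approved-dependency list or a registry popularity export.
WELL_KNOWN = ["requests", "numpy", "pandas", "boto3", "httpx", "fastapi"]


def lookalike_of(candidate, known=WELL_KNOWN, cutoff=0.8):
    """Return the well-known name that candidate resembles, or None.

    An exact match returns None: the candidate is the known package itself.
    """
    name = candidate.lower().replace("_", "-")
    if name in known:
        return None
    # Decorated names such as "requests-plus" or "python-requests".
    for real in known:
        if name.startswith(real) or name.endswith(real):
            return real
    # Near-miss spellings such as "reqeusts".
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for pkg in ["requests", "requestslib", "requests-plus", "reqeusts"]:
    print(pkg, "->", lookalike_of(pkg))
```

A hit means "verify this name carefully", not "reject it": legitimate packages often extend a popular name, so the heuristic only decides where a reviewer should look first.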

Real Case — requestslib vs requests

The legitimate Python HTTP library is named requests. AI assistants have been observed suggesting requestslib, python-requests-extended, and requests-plus — all of which have at various points been registered by parties other than Kenneth Reitz's project. Some registered names contained code that exfiltrated environment variables on import. The requests library has over 30 billion all-time downloads; the attack surface from confusion with lookalike names is enormous.

Verification Workflow for Dependency Review

A robust review protocol for any AI-assisted dependency addition involves three sequential checks, none of which can be skipped when the dependency originates from an AI suggestion.

Step 1 — Registry Existence

Manually navigate to the registry URL for the package name: pypi.org/project/<name>, npmjs.com/package/<name>. Confirm the package exists and the page is not a redirect. Do not rely on pip/npm install output alone — some failure modes are non-obvious.

Step 2 — Publisher Identity

Verify the publisher matches the expected maintainer. For major libraries, the publisher will be a well-known individual or organization account. A package published a day ago by an unknown account that shares a name with a popular library is a strong signal of typosquatting.

Step 3 — Source Linkage

Follow the source repository link from the registry page. Confirm the repository exists, has a reasonable commit history, and is the canonical upstream for the library you intend to use. AI assistants sometimes suggest packages whose names have been reassigned or whose PyPI entries now point to entirely different projects.
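Steps 1 and 3 lend themselves to a small script that gathers the facts a reviewer then judges by hand. The sketch below queries PyPI's JSON endpoint (pypi.org/pypi/<name>/json); the function names are hypothetical, and the response fields it reads (releases, upload_time_iso_8601, info.project_urls) are assumptions about the response shape that should be checked against the current PyPI documentation. The fetch function is injectable so the logic can be exercised offline.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone


def fetch_pypi_json(name):
    """Fetch project metadata from PyPI's JSON endpoint; None if it 404s."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def registry_report(name, fetch=fetch_pypi_json, now=None):
    """Summarize existence, age, and source links for one package name."""
    data = fetch(name)
    if data is None:
        return {"name": name, "exists": False}

    # Assumed shape: releases -> version -> list of files, each carrying
    # an ISO 8601 upload timestamp.
    uploads = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in data.get("releases", {}).values()
        for f in files
    ]
    now = now or datetime.now(timezone.utc)
    first = min(uploads) if uploads else None
    return {
        "name": name,
        "exists": True,
        "first_upload_days_ago": (now - first).days if first else None,
        "project_urls": data.get("info", {}).get("project_urls") or {},
    }
```

A very recent first upload or an empty project_urls mapping is a prompt for closer inspection. The script does not cover Step 2: publisher identity still has to be checked on the registry page itself.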

Automation Note

Tools like pip-audit, socket.dev, and Snyk Open Source automate parts of this verification, but none are a substitute for the publisher identity check on AI-suggested packages. Vulnerability scanners such as pip-audit query known-vulnerability databases, and a newly registered malicious package will not yet appear in those databases. Registry existence and publisher verification must remain manual steps for any dependency that did not appear in the codebase before AI involvement.

Reviewing a requirements.txt Diff

In a pull request review, the dependency manifest diff is often the highest-value target. Any new package added in a PR where the branch author used AI assistance should be individually verified. The review comment should not simply approve the addition — it should record the verification: "Confirmed httpx 0.27.0 is the encode/httpx project, publisher @tomchristie, 1.2k commits, existing dependency in our monorepo."

For packages that cannot be verified — the registry page 404s, the publisher account is new, the repository has no history — the correct action is to reject the PR with a clear comment and request the engineer use an already-approved alternative or go through the dependency approval process.
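The first pass over a manifest diff can be automated so that the reviewer's attention goes straight to unfamiliar names. This is a minimal sketch under stated assumptions: the APPROVED set is a hypothetical allowlist, and the parser handles only simple name-and-version lines, not URLs, editable installs, or environment markers.

```python
import re

# Hypothetical allowlist; in practice this would come from the team's
# dependency approval records.
APPROVED = {"boto3", "fastapi", "uvicorn", "python-dotenv"}

_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def added_packages(diff_text):
    """Yield normalized package names from '+' lines of a requirements diff."""
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue
        spec = line[1:].strip()
        if not spec or spec.startswith("#"):
            continue
        match = _NAME.match(spec)
        if match:
            # Normalize as registries do: lowercase, runs of -_. become "-".
            yield re.sub(r"[-_.]+", "-", match.group(0)).lower()


def needs_verification(diff_text, approved=APPROVED):
    """Return added package names that are not on the approved list."""
    return [name for name in added_packages(diff_text) if name not in approved]


sample_diff = """\
+ boto3==1.34.0
+ made-up-helper==0.2.1
+ uvicorn[standard]==0.27.1
"""
print(needs_verification(sample_diff))
```

Each name this returns still goes through the three manual checks, and the result of those checks belongs in the review comment.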

Lesson 3 moves from dependency manifests to runtime behavior: how hallucinated method signatures cause subtle bugs even when the underlying package is real and correctly installed.

Lesson 2 Quiz

Phantom Dependencies in Manifest Files — 3 questions

Bar Lanyado's 2023 research demonstrated a specific risk from AI package hallucinations. Which of the following best summarizes his finding?

Correct. The core finding was that hallucinated-but-nonexistent package names create a predictable registration target for attackers. Lanyado confirmed this by registering several himself and observing install attempts within days.

Lanyado's specific finding was about packages that did not exist at all — not outdated, hidden, or cross-language confusion. The hallucinated names created an open registration opportunity that attackers could exploit.

During a PR review, you see requests-async-extended==0.4.2 added to requirements.txt. The PyPI page exists but the publisher account was created 3 days ago and has no other packages. What is the correct action?

Correct. Registry existence is a necessary but not sufficient condition. A brand-new publisher account for a package with a name similar to a popular library is precisely the typosquatting pattern documented in multiple real incidents. The PR should be blocked until the engineer can establish a safe alternative.

A package can exist on PyPI and still be malicious. The three-day-old publisher account with no other packages is a critical red flag that must result in rejection, not conditional approval. Installing in a sandbox may not surface malicious behavior that activates only in specific conditions.

Which dependency manifest type fails loudest and earliest when a hallucinated package name is added, providing the safest default behavior?

Correct. Go modules perform VCS verification at resolution time and fail loudly if the module path does not resolve to a real repository. This provides earlier, more reliable detection than pip or npm, which can produce ambiguous or silent failures in some configurations.

Go modules stand out here — they resolve module paths against actual VCS repositories at build time, making hallucinated module paths immediately and unambiguously detectable. pip and npm have more varied failure modes and can in some configurations produce less obvious errors.

Lab 2 — Manifest Verification Practice

Interact with the AI tutor · minimum 3 exchanges to complete

Your Task

You are reviewing a requirements.txt diff in a pull request. The branch author used GitHub Copilot to scaffold a new microservice. Below is the additions section of the diff. Discuss with the tutor which entries need the deepest scrutiny and how you would conduct the verification.

+ boto3==1.34.0
+ fastapi==0.110.0
+ pydantic-validators-extended==0.2.1
+ httpx-retry-middleware==1.0.0
+ uvicorn[standard]==0.27.1
+ aws-lambda-powertools-async==0.9.3
+ python-dotenv==1.0.1

Some of these entries are well-known packages; others are worth interrogating. Tell the tutor which ones concern you most and walk through your verification steps.

AI Tutor

Manifest Verification Lab

Good morning. You have a requirements.txt diff in front of you from an AI-assisted PR. Some of these additions look familiar; others should raise your antenna. Which entries would you prioritize for the deepest verification, and why?

Module 3 · Lesson 3

Hallucinated Method Signatures and Silent Failures

When the package is real but the method signature is invented — and why the resulting bugs are among the hardest to attribute.

What breaks when an AI correctly identifies a real package but invents the parameters it accepts?

Multiple independent evaluations of GitHub Copilot's code suggestions — including studies by researchers at NYU and published analyses by GitClear — documented that Copilot would correctly identify which SDK method to call for a given task but would frequently hallucinate the parameter names, default values, and call signatures. The underlying package would install correctly. The import would succeed. The method would exist. Only the specific parameters passed would be wrong — sometimes silently wrong, sometimes raising cryptic errors that traced back several layers.

The Three Failure Modes of Signature Hallucination

When an AI assistant fabricates a method parameter, the outcome at runtime depends on how the receiving function handles unexpected input. Three distinct failure modes result, each with different detectability.

Mode A — TypeError at Call Site

The function explicitly rejects unrecognized keyword arguments. Python raises TypeError: func() got an unexpected keyword argument 'param_name'. This is the best failure mode — it surfaces immediately, the traceback points directly at the hallucinated parameter, and the fix is unambiguous.

Mode B — Silent Absorption

The function accepts **kwargs and ignores unknown keys. The call succeeds; the hallucinated parameter has no effect. The feature the parameter was supposed to configure does not work, but no exception fires. The bug manifests as a behavioral gap, not an error.

Mode C — Parameter Aliasing

The hallucinated parameter name resembles but differs from a real parameter. The real parameter uses its default value; the developer's intended configuration is silently ignored. Particularly dangerous when the real default is permissive and the intended configuration was a security constraint.
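The three modes can be reproduced with two toy functions. Both functions and all parameter names below are hypothetical; they exist only to show how the same kind of mistake produces a loud error in one case and silence in the other two.

```python
def strict_connect(host, verify_tls=False):
    """No **kwargs, so unknown names are rejected (Mode A)."""
    return {"host": host, "verify_tls": verify_tls}


def lenient_connect(host, verify_tls=False, **options):
    """Unknown names land in options and are never read (Modes B and C)."""
    return {"host": host, "verify_tls": verify_tls}


# Mode A: loud failure at the call site.
try:
    strict_connect("example.test", enforce_tls=True)
except TypeError as exc:
    print("Mode A:", exc)

# Mode B: a key with no real counterpart is absorbed without effect.
print("Mode B:", lenient_connect("example.test", auto_retry=True))

# Mode C: verify_ssl resembles the real verify_tls, so the real parameter
# keeps its permissive default of False.
print("Mode C:", lenient_connect("example.test", verify_ssl=True))
```

In the Mode C call the returned configuration still has verify_tls set to False, even though the call site reads as if verification had been switched on.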

Real-World Case — AWS Boto3 Signature Hallucination

AWS boto3 is one of the most frequently used Python SDKs and one of the most frequently hallucinated. AI assistants commonly fabricate parameters on S3, Lambda, and IAM client methods. A documented pattern: Copilot suggesting s3_client.put_object(..., ACL='private', enforce_encryption=True), where enforce_encryption is not a real parameter. Called directly, boto3 validates keyword arguments against the service model and raises ParamValidationError before any request is sent, which is the loud outcome described in Mode A. The silent variant appears when the call passes through an in-house helper that accepts **kwargs and forwards only the keys it recognizes: the call succeeds and the object uploads without enforced encryption. Security reviewers looking only at the code see a parameter that implies encryption is required and may not check the actual S3 bucket policy.

Signature Verification Workflow

For any method call in AI-generated code that accepts keyword arguments, the review workflow requires checking each parameter name against the actual function signature, not the surrounding documentation prose.

Source Inspection

Navigate to the installed package source or the library's GitHub repository. Find the actual function definition. Read the parameter list. Compare each keyword argument in the AI-generated call against the real signature. This usually takes a minute or two and catches Mode A hallucinations outright. For functions that accept **kwargs (Modes B and C), also read the function body to see which keys are actually consumed.

Interactive Shell Verification

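In a Python shell with the library installed: import inspect; print(inspect.signature(library.module.function)). This outputs the real parameter list including defaults. Cross-reference every kwarg in the AI-generated call against this output. For dynamically generated clients such as boto3's, the signature shows only (*args, **kwargs), so consult the service's API reference instead.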

Unit Test Coverage

A unit test that calls the function with the exact parameters in the AI-generated code will catch Mode A failures immediately. Mode B failures require tests that assert the outcome of the configuration, not merely that the call succeeded. For security parameters, always test the behavioral outcome.
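The signature comparison can be wrapped in a small helper so it is applied the same way to every call site. This is a sketch using only the standard library; check_kwargs is a hypothetical name, and the second return value flags the case where the signature alone cannot settle the question.

```python
import inspect
import json
import shutil


def check_kwargs(func, kwargs):
    """Compare call-site keyword names against func's real signature.

    Returns (unknown, conclusive). conclusive is False when func declares
    **kwargs, because the signature alone cannot rule a name in or out.
    """
    params = inspect.signature(func).parameters
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    named = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    unknown = sorted(set(kwargs) - named)
    return unknown, not accepts_var_kw


# Strict signature: an unknown name is a definite error.
print(check_kwargs(shutil.copyfile, {"follow_symlinks": False, "preserve_metadata": True}))

# Signature with **kw: the unknown name needs a look at the function body.
print(check_kwargs(json.dumps, {"indent": 2, "pretty": True}))
```

When conclusive is False, the unknown names are candidates for Mode B or Mode C and should be traced through the function body or covered by a test that asserts the configured outcome.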

The Security Implication of Mode C

Mode C failures — parameter aliasing — are the most consequential for security-sensitive code. An AI assistant generating IAM policy code, encryption configuration, or authentication middleware is particularly likely to invent plausible-sounding security parameters.

The psychological dynamic is dangerous: a developer who sees require_mfa=True in their PR tends not to verify whether that parameter is real, because its presence signals that the security consideration was addressed. The review stops at the parameter name rather than verifying that the parameter actually configures the intended behavior in the library.

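import boto3
from botocore.config import Config

# AI-generated: appears to enforce HTTPS
client = boto3.client(
    's3',
    region_name='us-east-1',
    force_https=True,      # not a real parameter (the real name is use_ssl)
    verify_ssl=True,       # not a real parameter (the real name is verify)
    config=Config(signature_version='s3v4')
)
# Called directly, boto3.client() rejects both names with a TypeError.
# Behind a **kwargs wrapper they would be dropped silently.
# Server-side HTTPS enforcement requires a bucket policy, not client config.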

Review Rule for Security Parameters

Any parameter in AI-generated code whose name implies a security constraint — require_, enforce_, verify_, force_, restrict_ — must be independently verified against the real function signature before the PR is approved. The presence of a security-sounding parameter that does nothing is more dangerous than its absence, because it may suppress further security review.
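Finding every security-sounding keyword argument in a diff is a mechanical task that the ast module handles well. The sketch below is one possible implementation, assuming Python 3.9 or later for ast.unparse; it also matches CamelCase names such as RequireEncryption, which is an extension beyond the underscore prefixes listed in the rule.

```python
import ast

# Prefixes from the review rule, compared after removing underscores and
# lowercasing so that require_mfa and RequireMfa are both caught.
SECURITY_PREFIXES = ("require", "enforce", "verify", "force", "restrict")


def security_keywords(source):
    """Return sorted (line, call, keyword) tuples for keywords to verify."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            # kw.arg is None for **mapping expansions; skip those.
            if kw.arg is None:
                continue
            normalized = kw.arg.replace("_", "").lower()
            if normalized.startswith(SECURITY_PREFIXES):
                hits.append((kw.value.lineno, ast.unparse(node.func), kw.arg))
    return sorted(hits)


sample = (
    "client = make_client('s3', force_https=True, region='x')\n"
    "client.put(Key='k', RequireEncryption=True)\n"
)
for hit in security_keywords(sample):
    print(hit)
```

The scan only produces a list of names to check. Real parameters such as a genuine verify option will appear in it too, and each entry still has to be confirmed against the library's actual signature.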

Lesson 4 brings together the detection patterns from Lessons 1–3 and presents a practical code review checklist for AI-generated code touching third-party integrations.

Lesson 3 Quiz

Hallucinated Method Signatures and Silent Failures — 3 questions

An in-house S3 upload helper accepts **kwargs and forwards only the keys it recognizes to boto3. An AI-generated call passes enforce_encryption=True and the call succeeds. What failure mode is this, and what is the security risk?

Correct. This is Mode B: the call succeeds, the parameter is silently ignored, and the object may be stored without encryption. The additional danger is that a reviewer who sees the parameter may conclude the security requirement is addressed and not look further at the bucket policy or upload call structure.

Because the helper absorbs unrecognized keys through **kwargs, this call will not raise a TypeError. The parameter is silently dropped, which is Mode B. The security consequence is that the developer and reviewer may both believe encryption is enforced when it is not.

Which Python one-liner allows you to inspect the real parameter signature of any callable during review verification?

Correct. inspect.signature() returns the real parameter list with defaults and annotations, exactly matching what the function will actually accept. It is the fastest authoritative check available without leaving your shell.

The correct tool is inspect.signature() from Python's standard library. dir() lists attributes, not parameter signatures. __doc__ shows docstrings which may themselves be incorrect. co_varnames includes all local variables, not just parameters.

Why are Mode C failures (parameter aliasing) considered the most dangerous of the three signature hallucination failure modes?

Correct. The psychological danger is the key issue: a reviewer who sees require_mfa=True or force_https=True tends to consider the security concern addressed. The hallucinated parameter functions as a false reassurance that may prevent the reviewer from checking the actual security configuration.

Mode C is dangerous specifically because it is quiet and reassuring, not loud. A plausible security parameter name provides false confidence to reviewers, who may stop investigating once they see it. No error fires; no warning appears. The security gap exists invisibly behind a comforting-looking parameter name.

Lab 3 — Signature Hallucination Detection

Interact with the AI tutor · minimum 3 exchanges to complete

Your Task

The following AI-generated code uses the real boto3 library to configure an S3 upload with server-side encryption. Your job is to identify which parameters are real, which are hallucinated, and which failure mode each hallucination represents. Discuss your analysis with the tutor.

import boto3
from botocore.config import Config

s3 = boto3.client(
    's3',
    region_name='us-east-1',
    enforce_tls=True,
    config=Config(
        signature_version='s3v4',
        retry_mode='adaptive',
        max_retry_attempts=5
    )
)

s3.put_object(
    Bucket='my-secure-bucket',
    Key='data/file.json',
    Body=json_data,
    ServerSideEncryption='AES256',
    EncryptionContext={'purpose': 'backup'},
    RequireEncryption=True
)

Note: The real boto3 Config object uses retries={'mode': 'adaptive', 'max_attempts': 5}, not separate keys. put_object has no EncryptionContext parameter; the related real parameter, SSEKMSEncryptionContext, applies to KMS encryption, not AES256.

AI Tutor

Signature Hallucination Lab

This snippet uses the real boto3 library, so the imports will work. But look carefully at the parameters: several are hallucinated, and one of them carries a security implication. Which parameters would you flag first, and what failure mode do you expect from each?

Module 3 · Lesson 4

The Code Review Checklist for AI-Generated Integrations

A systematic, step-by-step review protocol for any pull request where AI tooling touched third-party API or library code.

What does a complete, defensible review of AI-generated third-party integration code look like?

As GitHub Copilot Enterprise and similar tools became standard at large engineering organizations, security teams at companies including Shopify, Microsoft, and Atlassian published internal guidance — portions of which became public through engineering blog posts and conference talks — on how to extend existing code review processes to account for AI-generated content. The consistent theme: existing review workflows caught most hallucinations in obvious cases, but subtle signature and configuration hallucinations required explicit checklist items to surface reliably.

The Five-Point Integration Review Protocol

The following protocol synthesizes patterns from published enterprise guidance, security research, and the failure modes documented in Lessons 1–3. It applies to any PR where AI tooling was used to generate code that calls external packages, APIs, or SDKs.

Point 1 — Declare AI Involvement

PR descriptions should indicate which sections were AI-generated or AI-assisted. Many teams now enforce this with PR templates. Reviewers who know AI was involved can apply this protocol; reviewers who don't know may apply standard review depth and miss hallucinations that look plausible.

Point 2 — Isolate the Dependency Diff

Before reading any code, examine the dependency manifest diff in isolation. For each new entry: verify registry existence, publisher identity, and source repository linkage as described in Lesson 2. Do not approve the PR if any entry fails publisher identity verification.

Point 3 — Enumerate All Third-Party Call Sites

Search the diff for every method call on a third-party object. Build a list. Review each item independently against official documentation or source. Do not rely on the surrounding code context to confirm a method exists — the AI generated that context too.

Point 4 — Verify Security-Implying Parameters

Flag every parameter whose name implies a security property: anything beginning with require_, enforce_, verify_, restrict_, force_, disable_, allow_. For each flagged parameter, use inspect.signature() or library source to confirm it is real and confirm its actual behavior matches the intent.

Point 5 — Validate Behavioral Outcomes, Not Parameters

For security-relevant configurations (encryption, authentication, authorization, rate limiting), do not accept the presence of a parameter as proof the feature is active. Require tests or manual verification that the actual runtime behavior matches the intended configuration. A hallucinated parameter that silently does nothing must be caught at this step.
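Point 5 can be illustrated with a test that inspects what was actually stored, not whether the call returned. Everything in this sketch is hypothetical: RecordingStore is an in-memory stand-in for a lenient storage wrapper, and the two upload functions represent a hallucinated and a verified call site.

```python
import unittest


class RecordingStore:
    """Hypothetical in-memory stand-in for a storage client.

    Like a lenient wrapper, it accepts arbitrary keywords but only honors
    the one it knows: server_side_encryption.
    """

    def __init__(self):
        self.objects = {}

    def put(self, key, body, server_side_encryption=None, **ignored):
        self.objects[key] = {"body": body, "encryption": server_side_encryption}


def upload_hallucinated(store, key, body):
    store.put(key, body, require_encryption=True)  # silently ignored


def upload_verified(store, key, body):
    store.put(key, body, server_side_encryption="AES256")


class EncryptionOutcomeTest(unittest.TestCase):
    def stored_encryption(self, upload):
        store = RecordingStore()
        upload(store, "data/file.json", b"{}")
        return store.objects["data/file.json"]["encryption"]

    def test_verified_upload_is_encrypted(self):
        self.assertEqual(self.stored_encryption(upload_verified), "AES256")

    def test_hallucinated_upload_is_not_encrypted(self):
        # The call succeeded, yet the outcome shows no encryption was set.
        self.assertIsNone(self.stored_encryption(upload_hallucinated))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(EncryptionOutcomeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A test that merely asserted "put did not raise" would pass for both upload functions. Asserting on the stored state is what separates them, and the same idea applies against a real service by reading back the object's metadata after upload.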

Review Comment Standards

Code review comments on AI-generated integration code should be specific enough to be auditable. A comment that says "Looks good" on a section of boto3 configuration provides no evidence that verification occurred. A comment that says "Verified ServerSideEncryption='AES256' against boto3 1.34.0 put_object signature — real parameter, correct value for bucket without KMS. Confirmed RequireEncryption is NOT a real parameter and has been removed in this review" provides a record of the verification that protects both the reviewer and the organization.

This level of specificity is not required for every line — it is required for lines where hallucination risk is elevated: any third-party call site generated by an AI tool, especially in security-sensitive contexts.

Tooling That Helps (and Its Limits)

Static analysis tools like Pyright and mypy with strict stub packages will catch some hallucinated method names when type stubs are available — the IDE underlines the fabricated method. However: (1) many libraries have incomplete stubs, (2) **kwargs absorption prevents type checkers from flagging hallucinated parameters, and (3) type stubs may themselves be generated or outdated. Type checker silence is not confirmation of correctness. It is one signal among several.

Handling Legacy AI-Generated Code

Organizations that adopted AI coding tools in 2022–2023 before formal review protocols existed may have significant amounts of AI-generated integration code that was never subjected to the verification steps above. The recommended approach is a targeted audit rather than a full codebase scan:

1. Query your version control history for commits from periods of active Copilot or ChatGPT adoption.

2. Filter for commits that modified files touching external API integrations (files that import boto3, openai, stripe, twilio, etc.).

3. Apply the five-point protocol to the integration surface in those files.

4. Prioritize files in authentication, payment, and data storage paths, the highest-consequence locations for silent parameter hallucinations.
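The filtering and prioritizing steps of this audit can be sketched with the ast module. The module list mirrors the examples named above; the path hints and function names are hypothetical, and the input is assumed to be a mapping of file paths to source text gathered from the commits identified in the first step (for example, from git log with --since, --until, and --name-only).

```python
import ast

INTEGRATION_MODULES = {"boto3", "openai", "stripe", "twilio"}
HIGH_RISK_HINTS = ("auth", "payment", "billing", "storage")  # hypothetical


def imported_roots(source):
    """Return top-level module names imported by a Python source string."""
    roots = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            roots.add(node.module.split(".")[0])
    return roots


def audit_queue(files):
    """Order integration files so high-consequence paths come first.

    files maps a path to its source text.
    """
    queue = []
    for path, source in files.items():
        sdks = imported_roots(source) & INTEGRATION_MODULES
        if sdks:
            high_risk = any(hint in path.lower() for hint in HIGH_RISK_HINTS)
            queue.append((not high_risk, path, sorted(sdks)))
    return [(path, sdks) for _, path, sdks in sorted(queue)]


example_files = {
    "app/payments/charge.py": "import stripe\n",
    "app/utils/text.py": "import re\n",
    "app/notify/sms.py": "from twilio.rest import Client\n",
}
for path, sdks in audit_queue(example_files):
    print(path, sdks)
```

Matching on path fragments is a crude proxy for risk. Teams with an existing service inventory or ownership map would likely get better prioritization from that.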

The Core Principle

An AI assistant's code is a first draft produced by a system that has never run the code it writes. Every external interface in that draft — package names, method names, parameter names, configuration values — must be verified against authoritative sources by a human who understands the stakes. The AI is a productive author. You are the editor responsible for what ships.

You have completed all four lessons. Proceed to Lab 4 to practice applying the full five-point protocol, then take the Module Test to assess your mastery of hallucinated APIs and phantom dependencies.

Lesson 4 Quiz

The Code Review Checklist — 3 questions

According to the five-point review protocol, what should a reviewer do BEFORE reading any AI-generated code in a PR?

Correct. Point 2 of the protocol — isolating and verifying the dependency diff — must happen before code review begins. A malicious or nonexistent package in the manifest means the code review is irrelevant; the package itself is the threat vector.

The manifest check comes first — before code, before runtime testing. A hallucinated package name that has been registered maliciously makes all subsequent code analysis moot. Dependency verification is the gate that must be cleared first.

A reviewer approves a PR with the comment "Looks good — encryption configured correctly." The PR includes RequireEncryption=True on an S3 put_object call, which is a hallucinated parameter. What review failure does this represent?

Correct. Point 4 requires explicit verification of any parameter whose name implies a security property. "RequireEncryption" is exactly the kind of name that demands this check. The comment "encryption configured correctly" reflects assumption, not verification — a reviewable failure on Point 4.

The primary failure here is Point 4 — security-implying parameter names must be individually checked against the real function signature, not assumed to be real because they look plausible. The non-specific approval comment compounds the problem by creating a false audit trail.

When auditing legacy code from a period before AI review protocols were in place, which files should be prioritized for the five-point protocol?

Correct. The highest-consequence locations for silent parameter hallucinations are authentication, payment processing, and data storage — where a hallucinated security parameter that silently does nothing can result in auth bypass, data exposure, or unencrypted storage. SAST tools will not reliably catch Mode B or Mode C hallucinations.

The audit should be risk-stratified by consequence, not by recency or staffing changes. Files integrating with external SDKs in security-sensitive paths — auth, payment, storage — are where hallucinated parameters cause the most damage and where the audit effort yields the highest return.

Lab 4 — Full Protocol Review Simulation

Interact with the AI tutor · minimum 3 exchanges to complete

Your Task

You are the senior reviewer on a PR that modifies a Stripe payment integration. The author used ChatGPT to generate the refund handling code. Apply the full five-point protocol and discuss your review decisions with the tutor. Focus on Points 3, 4, and 5.

# requirements.txt additions
+ stripe==8.5.0
+ stripe-webhooks-extended==0.1.0

# refund.py
import stripe

stripe.api_key = settings.STRIPE_SECRET_KEY

def process_refund(charge_id, amount_cents):
    refund = stripe.Refund.create_partial(
        charge=charge_id,
        amount=amount_cents,
        idempotency_auto=True,
        notify_customer=True
    )
    return refund

Walk through each of the five review points with the tutor. The real Stripe SDK uses stripe.Refund.create() — not create_partial(). Partial amounts are passed via the amount parameter to the standard create() call.
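One way to make this check repeatable is to compare every call in the diff against a list of names the reviewer has personally verified. The sketch below is a hypothetical review aid, not a Stripe tool: the VERIFIED allowlist and both helper functions are assumptions introduced for illustration, and the allowlist must be filled in from the installed SDK or the official documentation rather than from memory.

```python
import ast

# Reviewer-maintained allowlist (hypothetical). Only names that were checked
# against the real SDK belong here.
VERIFIED = {"stripe.Refund.create": {"charge", "amount"}}


def dotted_name(node):
    """Rebuild a dotted name such as 'stripe.Refund.create' from an AST node."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def review_calls(source, prefix="stripe."):
    """List calls under the prefix whose method or keywords are unverified."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name is None or not name.startswith(prefix):
            continue
        if name not in VERIFIED:
            findings.append(f"{name}: method not verified")
            continue
        for keyword in node.keywords:
            if keyword.arg is not None and keyword.arg not in VERIFIED[name]:
                findings.append(f"{name}: keyword {keyword.arg!r} not verified")
    return findings


LAB_CALL = """
refund = stripe.Refund.create_partial(
    charge=charge_id,
    amount=amount_cents,
    idempotency_auto=True,
    notify_customer=True,
)
"""
FIXED_CALL = "refund = stripe.Refund.create(charge=charge_id, amount=amount_cents)"

lab_findings = review_calls(LAB_CALL)
fixed_findings = review_calls(FIXED_CALL)
```

The lab call is stopped at the method name, so its keywords are not examined until the method itself is corrected. A check like this only records what has been verified; it does not replace reading the real signature or testing the refund behavior under Point 5.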

AI Tutor

Full Protocol Review Lab

This is a payment integration PR — high stakes. You have the manifest diff and the code in front of you. Let's work through this systematically. Start with Point 2: the dependency manifest. What do you see there before you even look at the code?

Module 3 Test

Hallucinated APIs and Phantom Dependencies — 15 questions · 80% to pass

1. What is the primary reason hallucinated API method names pass superficial code review?

Correct.

The answer is that hallucinated names follow library naming conventions — the model learned these conventions from training data and applies them convincingly.

2. In the 2023 Vulcan Cyber research, what was the temporal relationship between AI hallucinations and attacker package registrations?

Correct.

Attackers observed hallucinated names first, then registered them — creating a predictable attack pipeline from AI output to malicious package.

3. Which of the following is an example of a "phantom configuration key" as defined in this module?

Correct.

A phantom configuration key is a fabricated kwarg that looks like it configures something but is silently ignored because the library never defined it.

4. Why are infrequent code paths particularly dangerous locations for hallucinated API calls?

Correct.

The danger is that untested paths can carry hallucinations all the way to production, where they wait until the path is triggered — potentially months later in an error handler or monthly billing run.

5. Bar Lanyado's 2023 experiment confirmed which specific behavior about developer practices?

Correct.

Lanyado confirmed that installs of his test packages followed within days, demonstrating that developers do not uniformly verify AI-suggested packages before installation.

6. Which runtime is considered safest against hallucinated dependencies because it verifies module paths against real VCS repositories?

Correct. Go modules resolve against actual VCS paths at resolution time, providing the loudest and earliest failure for hallucinated module names.

Go modules stand out for their VCS-backed verification at resolution time, giving loud immediate failures for nonexistent module paths.

7. The three-step dependency verification workflow requires checking registry existence, publisher identity, and source repository linkage. In what order should these checks be performed?

Correct. You must first confirm the package exists before meaningful publisher or source checks are possible. If registry existence fails, the other two steps are unnecessary.

The correct order is registry existence first — if the package does not exist, the other checks cannot proceed. Then publisher identity, then source linkage.

8. A Python function uses **kwargs and silently ignores unrecognized keys. An AI-generated call passes a hallucinated security parameter. This represents which failure mode?

Correct. **kwargs absorption with no error is the definition of Mode B — silent failure where the call succeeds but the intended behavior never activates.

**kwargs absorption with no error is Mode B — silent. Mode A would be a TypeError. Mode C would be matching a real parameter with a different name.

9. Which Python standard library call reports the real parameter signature of a callable, including defaults and annotations?

Correct. inspect.signature() returns the authoritative parameter signature including defaults and type annotations.

The correct call is inspect.signature() from Python's inspect module — it returns exactly the real parameter list.

10. Why does a hallucinated security parameter potentially create MORE risk than the complete absence of that parameter?

Correct. A parameter like require_mfa=True that does nothing functions as a false assurance — reviewers see it and stop looking. Absence would trigger a review question; presence suppresses it.

The psychological danger is that a plausible-looking security parameter satisfies reviewers and stops investigation. Absence would raise a question; a convincing-looking hallucinated presence closes it.

11. According to the five-point review protocol, what should happen BEFORE reading any AI-generated code in a PR?

Correct. The manifest check (Point 2) must precede code review — a compromised dependency makes all code review moot.

Manifest verification precedes everything. A malicious package in the dependencies makes the code review irrelevant.

12. Pyright and mypy can sometimes catch hallucinated method names. What is the critical limitation of relying on them as the primary detection mechanism?

Correct. Incomplete stubs and **kwargs mean type checkers miss a significant fraction of hallucinations. They are one useful signal, not a sufficient verification mechanism.

The key limitations are incomplete stub coverage and **kwargs absorption — both mean type checker silence cannot be trusted as confirmation that a call is real.

13. Point 5 of the review protocol requires validating behavioral outcomes, not just parameter presence. For an encryption configuration, what does this mean in practice?

Correct. Point 5 exists precisely because Mode B failures mean a call can succeed with a parameter silently doing nothing. Behavioral verification confirms the outcome, not the syntax.

Point 5 requires testing the actual outcome — is the data encrypted — not that the parameter exists or is spelled correctly. Mode B failures make parameter presence an unreliable proxy for feature activation.

14. When auditing legacy AI-generated code, which combination of factors should place a file at the highest priority in the audit queue?

Correct. The intersection of AI-era commits, external SDK usage, and security-sensitive code paths represents the highest-consequence audit target — where hallucinated silent parameters do the most damage.

The priority is consequence × hallucination likelihood. Files that are AI-era, touch external SDKs, and sit in auth/payment/storage paths combine all three risk factors.

15. A review comment on an AI-generated Stripe integration says: "Verified stripe.Refund.create() against stripe-python 8.5.0 source — create_partial() does not exist. Replaced with standard create() + amount parameter. Confirmed idempotency_auto and notify_customer are not real parameters and have been removed." What does this comment exemplify?

Correct. Specific, verifiable review comments create an audit trail confirming that the five-point protocol was applied. They protect the reviewer, create institutional knowledge, and distinguish genuine verification from assumption.

Specific, documented verification is exactly what the protocol calls for on AI-generated integration code. It is not over-documentation — it is evidence that review occurred, which protects the organization and creates institutional knowledge.