On September 23, 1999, after a 286-day journey, the Mars Climate Orbiter was lost. The spacecraft approached Mars on a trajectory far lower than planned, entered the atmosphere, and disintegrated. The root cause was a single unit mismatch: ground software supplied by Lockheed Martin reported thruster impulse in pound-force seconds; NASA's navigation software expected newton-seconds. Every component of the system compiled and ran without error. Telemetry was transmitted and received. Only the planet's gravity revealed the failure.
Total mission cost: $327.6 million. The bug was a logic error — a wrong assumption baked silently into a computation.
Software errors are conventionally divided into three categories, and understanding where logic errors sit within that taxonomy is the foundation of this module.
Syntax errors are violations of a language's grammar rules. The compiler or interpreter rejects them before execution begins. They are the easiest class of bug to detect — and the class AI code generators almost never produce, because they have been trained on vast corpora of syntactically correct code.
Runtime errors occur during execution: a null pointer dereference, a division by zero, an out-of-bounds array access. They produce exceptions, crashes, or observable error messages. They are more dangerous than syntax errors but still tend to surface during testing.
Logic errors are the most dangerous class. The program runs to completion and produces output — but the output is wrong, or wrong only in certain conditions, or subtly wrong in a way that accumulates over time. No exception is raised. No linter fires. The code is semantically valid and structurally sound. It simply does the wrong thing.
A logic error is a defect in which a program executes without crashing but produces incorrect results because of a flaw in the algorithm, a wrong assumption about data, an incorrect operator, a boundary condition mistake, or a misunderstood specification.
Large language models generate code by predicting statistically plausible token sequences given a prompt. They have learned patterns — common function signatures, idiomatic loops, standard library calls. What they have not learned, in any reliable sense, is the specific semantics of your problem.
When you ask an AI to write a function that "calculates compound interest," it will produce code that looks like compound interest code. The variable names will be correct. The structure will be familiar. But if the compounding frequency is wrong, or if the formula uses addition where it should use multiplication, the code will silently produce wrong numbers for every input — and nothing in the execution environment will tell you so.
This is the fundamental asymmetry of logic errors in AI-generated code: the generator optimizes for plausibility, not correctness. The reviewer's job is to supply what the model cannot: domain knowledge, specification awareness, and adversarial testing.
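This module organizes logic errors into four categories that appear with high frequency in AI-generated code:
Off-by-one errors: an index, loop bound, or comparison displaced by one from the correct value.
Silent failures: error conditions answered with a default return value instead of a visible error.
Accumulation errors: small per-operation errors that compound over repeated execution.
Wrong operators: a fundamentally incorrect operation, such as > vs. >=, or && vs. ||.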
Consider this AI-generated function meant to check whether a user's age makes them eligible for a service that requires users to be over 18 (not 18 or older):
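A minimal sketch of the kind of function described, with the defect in place. The name is_eligible is hypothetical, and the original listing may differ in detail.

```python
def is_eligible(age: int) -> bool:
    """Return True if the user may access the service (spec: over 18)."""
    # Defect: >= admits age 18, but the specification requires age > 18.
    return age >= 18


print(is_eligible(18))  # True, although the specification says False
```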
Every linter will approve this code. Every type checker will pass it. An 18-year-old will be admitted when they should not be. The error is invisible to automated tooling and surfaces only when you compare the code against the specification. This is the discipline this module teaches.
For every AI-generated function, ask: "What is this code actually computing, and is that the same thing I asked for?" Run the logic in your head on at least three inputs: a typical case, a boundary case, and a case that should fail.
A function uses age >= 18 when the specification says "strictly older than 18." What is the minimum number of test inputs that would catch this?
Below is an AI-generated Python function, positive_average, that is meant to return the average of a list of numbers, but only for values strictly greater than zero. Read the code, identify the logic error(s), and discuss them with the AI assistant. Explain what the bug is, what input would reveal it, and how you would fix it.
Between 1985 and 1987, at least six patients received massive radiation overdoses from the Therac-25 medical linear accelerator, three of whom died. One contributing factor was a race condition in the control software, but a separate and independently investigated class of defect was boundary logic: the machine's beam-on confirmation code used incorrect range checks on operator-entered values, accepting parameters that should have been rejected. The code had been adapted from the Therac-20, where hardware interlocks had silently compensated for these off-by-one boundary conditions. When the hardware interlocks were removed in the Therac-25 to reduce cost, the latent boundary errors in the software became lethal.
An off-by-one error (OBOE) occurs when an index, a loop bound, or a comparison operator is displaced by one from the correct value. Such errors cluster at three sites in code: loop initialization (i = 0 vs. i = 1), loop termination (i < n vs. i <= n), and conditional comparisons (> vs. >=).
AI generators are particularly prone to OBOEs because the training corpus contains both 0-indexed and 1-indexed conventions, both inclusive and exclusive upper bounds, and both strict and non-strict comparisons — often in syntactically identical contexts. The model cannot distinguish them by structure alone; only the specification determines which is correct.
Site 1: Loop bounds. The most frequent OBOE in generated iteration code. Consider a function asked to process "the first N items" of a list:
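As a sketch of this site, assume a hypothetical process_first_n whose job is to return the first n items of a list. The buggy variant starts its loop at 1, a habit that is correct in 1-indexed settings and wrong in Python.

```python
def process_first_n_buggy(items: list, n: int) -> list:
    # Defect: range(1, n) visits indices 1..n-1 and skips index 0,
    # so only n - 1 items are processed.
    return [items[i] for i in range(1, n)]


def process_first_n(items: list, n: int) -> list:
    # range(n) visits indices 0..n-1: exactly the first n items.
    # min() guards against n exceeding the list length.
    return [items[i] for i in range(min(n, len(items)))]


data = ["a", "b", "c", "d"]
print(process_first_n_buggy(data, 3))  # ['b', 'c']
print(process_first_n(data, 3))        # ['a', 'b', 'c']
```

Both versions run without error on this input; only the comparison against "the first N items" shows that one of them drops an element.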
Site 2: Slice / substring extraction. Python slice semantics use exclusive upper bounds; many AI-generated slices are off by one when the prompt uses language like "up to and including position N."
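For instance, suppose a prompt asks for the characters "up to and including position n" with 0-based positions. A sketch with hypothetical function names:

```python
def prefix_through_buggy(s: str, n: int) -> str:
    # Defect: s[0:n] stops before index n, so position n is excluded.
    return s[0:n]


def prefix_through(s: str, n: int) -> str:
    # The exclusive upper bound must be n + 1 to include position n.
    return s[0 : n + 1]


print(prefix_through_buggy("Hello", 3))  # Hel
print(prefix_through("Hello", 3))        # Hell
```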
Site 3: Conditional boundary comparisons. Seen in the Lesson 1 example with age. The operator family < / <= / > / >= is the most frequent source of one-position logical displacement.
When reviewing any AI-generated loop or conditional for OBOEs, apply this four-question checklist:
"Check this loop for off-by-one errors. For each loop bound and each comparison operator, tell me what value is included or excluded, and verify that matches the specification: [paste spec here]."
The classic formulation of OBOEs is the fence-post problem: to build a fence 10 meters long with posts every meter, how many posts do you need? The answer is 11, not 10 — there is always one more post than there are gaps. AI generators make this mistake frequently in pagination logic, page-count calculations, and any code that enumerates boundaries rather than intervals.
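The fence-post count and its pagination cousin can be sketched as follows. The function names are hypothetical, and post_count assumes the fence length is an exact multiple of the spacing.

```python
def post_count(fence_length_m: int, spacing_m: int = 1) -> int:
    # Posts mark boundaries, so there is one more post than there are gaps.
    return fence_length_m // spacing_m + 1


def page_count_buggy(total_items: int, page_size: int) -> int:
    # Defect: floor division drops a final, partially filled page.
    return total_items // page_size


def page_count(total_items: int, page_size: int) -> int:
    # Ceiling division: a partial page still counts as a page.
    return (total_items + page_size - 1) // page_size


print(post_count(10))            # 11
print(page_count_buggy(25, 10))  # 2
print(page_count(25, 10))        # 3
```

The buggy page count is correct whenever the item count is an exact multiple of the page size, which is why a test with 20 or 30 items would pass and a test with 25 would not.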
Always test the exact boundary value. If a function processes "items 1 through N," test it with N=1 and N=0. If it loops "while index < length," trace the last iteration manually. The error will be there or nowhere.
How many characters does s[0:4] return on the string "Hello"?
s[0:4] covers indices 0, 1, 2, and 3: four characters ("Hell"), not five.
The following AI-generated function, every_other, is meant to return every other element starting from the first, for a list of exactly N items, meaning it should return elements at indices 0, 2, 4, … up to and including the last valid even index. Identify the OBOE, state what boundary input reveals it, and confirm the fix.
On August 1, 2012, Knight Capital Group deployed new trading software to production. A deployment error activated a dormant code path, a decommissioned "Power Peg" algorithm, which began executing millions of unintended trades. The system had no circuit breaker. Error conditions were swallowed by exception handlers that logged messages but continued execution. Within 45 minutes, Knight had accumulated $7 billion in unwanted stock positions and lost $440 million. The company was effectively destroyed.
A post-mortem found that error handling code consistently returned silently rather than halting. The failure paths were present and executed — they simply never surfaced to the level where a human or automated monitor could act on them.
A silent failure occurs when a function or code block encounters an error condition — invalid input, a failed operation, an unexpected state — and responds by returning a default value, an empty result, or zero, instead of raising an exception or returning a typed error signal.
AI generators produce silent failures with high regularity because training data contains countless examples of defensive programming patterns where returning a safe default is idiomatic. The model has learned these patterns without learning the critical distinction: when a default return masks a real problem versus when it is a legitimate fallback.
Pattern 1: Return zero or empty on error.
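A minimal sketch of this pattern, assuming a hypothetical in-memory price table and a hypothetical get_price_silent function:

```python
PRICES = {"sku-1": 19.99}  # hypothetical in-memory price table


def get_price_silent(item_id: str) -> float:
    try:
        return PRICES[item_id]
    except Exception:
        # Defect: an unknown item looks like a free item to the caller.
        return 0.0


order_total = get_price_silent("sku-1") + get_price_silent("sku-404")
print(order_total)  # 19.99, with no sign that one lookup failed
```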
Pattern 2: Bare except that logs and continues.
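This pattern might look like the following sketch. The function name and the payment data are hypothetical.

```python
def apply_payments_silent(balance: float, payments: list) -> float:
    for payment in payments:
        try:
            balance -= float(payment)
        except:  # noqa: E722 (bare except shown on purpose)
            # Defect: the bad record is skipped, the loop continues,
            # and the caller receives a balance that looks valid.
            print(f"could not apply payment {payment!r}")
    return balance


print(apply_payments_silent(100.0, ["25", "oops", "25"]))  # 50.0
```

The log line exists, but the return value carries no trace of the skipped record, so the caller cannot tell 50.0 from a fully reconciled balance.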
Pattern 3: None propagation. A function returns None on failure; callers assume a valid object and use it without checking, producing an AttributeError or NullPointerException deep in unrelated code, far from the original failure site.
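A small sketch of None propagation, using a hypothetical User type and lookup function:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:  # hypothetical user model
    name: str


USERS = {1: User("Ada")}  # hypothetical user store


def find_user(user_id: int) -> Optional[User]:
    # Returns None on a miss instead of raising.
    return USERS.get(user_id)


def greeting(user_id: int) -> str:
    user = find_user(user_id)
    # Defect: no None check. A missing user fails here with an
    # AttributeError, not inside find_user where the lookup failed.
    return "Hello, " + user.name


try:
    greeting(2)
except AttributeError as exc:
    print(f"failure surfaced away from its cause: {exc}")
```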
An unhandled exception halts execution at the point of failure and produces a stack trace that identifies the problem precisely. A silent failure allows execution to continue with corrupted or default state. The corruption propagates through subsequent computation, and by the time a symptom becomes visible — a wrong report, an incorrect transaction, a crashed downstream system — the causal connection to the original failure is obscured or lost.
In financial, medical, or safety-critical systems, this propagation window is the difference between a recoverable incident and a catastrophic one. Knight Capital's $440 million loss was caused not by software that crashed, but by software that kept running when it should have stopped.
Immediately scrutinize any function that: (1) catches a broad Exception or bare except and returns a value; (2) returns 0, empty string, empty list, or None without a comment explaining that this is intentional and safe; (3) has a try/except that does not re-raise and does not set an error flag visible to the caller.
For every exception handler in AI-generated code, ask: "If this exception fires, what does the caller receive, and does the caller have enough information to know that an error occurred?" If the caller receives a value that looks like a valid result but is actually a default substituted for a failure, that is a silent failure path that must be corrected.
Prefer fail-fast over fail-silent. An exception that halts a transaction is recoverable. A transaction that completes on wrong data may not be. When reviewing AI code, flag every place where the code could lie to its caller.
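One way to make a failure visible to the caller is to catch only the expected exception and re-raise it as a typed error. The sketch below assumes a hypothetical PriceLookupError and the same kind of in-memory price table; a real system would have its own error hierarchy.

```python
class PriceLookupError(Exception):
    """Raised when a price cannot be determined (hypothetical error type)."""


PRICES = {"sku-1": 19.99}  # hypothetical in-memory price table


def get_price(item_id: str) -> float:
    try:
        return PRICES[item_id]
    except KeyError as exc:
        # Catch only the expected failure and re-raise it as a typed
        # error, so the caller cannot mistake it for a valid price.
        raise PriceLookupError(f"no price for {item_id!r}") from exc


try:
    get_price("sku-404")
except PriceLookupError as exc:
    print(f"order rejected: {exc}")
```

The caller now has to decide what a missing price means, which is the point: the decision is made where the business context is known, not hidden inside the lookup.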
A function get_price(item_id) catches all exceptions and returns 0. What is the primary danger of this pattern?
A bare except: catches everything, including SystemExit and KeyboardInterrupt, and printing a message then continuing means execution proceeds in an unknown state. The error is swallowed, and the caller has no signal that anything failed.
This AI-generated function, get_config, fetches a configuration value from a remote source. It has two silent failure paths. Identify both, explain what downstream damage each could cause in a production system, and propose a fix that makes failures visible to the caller.
In 1983, observers noticed that the Vancouver Stock Exchange index, launched in January 1982 at a value of 1000, had drifted to approximately 520, despite rising stock prices. An investigation revealed that the index calculation software truncated rather than rounded the index value to three decimal places with each transaction. With approximately 3,000 transactions per day, the truncation bias accumulated to nearly 480 points over 22 months. The fix, switching from truncation to rounding, restored the index to 1098.892.
The software was syntactically correct. Each individual calculation was close to right. The error existed only in aggregate — invisible to any review of a single calculation, devastating in the total.
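The aggregate effect can be illustrated with a toy simulation. This is not the exchange's actual algorithm: the per-update movement of 0.00037 and the alternating up/down pattern are assumptions chosen so that the true net change is exactly zero.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

STEP = Decimal("0.001")  # the index is stored to three decimal places


def run_index(rounding: str, updates: int = 3000) -> Decimal:
    index = Decimal("1000.000")
    delta = Decimal("0.00037")  # hypothetical per-update movement
    for i in range(updates):
        # Moves alternate up and down, so the true net change is zero.
        index += delta if i % 2 == 0 else -delta
        index = index.quantize(STEP, rounding=rounding)
    return index


print(run_index(ROUND_DOWN))       # 998.500
print(run_index(ROUND_HALF_EVEN))  # 1000.000
```

Each truncation loses less than 0.001, yet after 3,000 updates the truncated index has drifted down by 1.5 points while the rounded one is unchanged.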
An accumulation error is a logic defect in which each individual operation introduces a small error, and those errors compound over repeated execution. The most common sources in AI-generated code are:
Wrong operators are distinct from OBOEs — they involve choosing a fundamentally incorrect operation rather than displacing a correct one by one. In AI-generated financial and scientific code, the most common wrong operator errors involve:
Addition vs. multiplication in compounding. Compound growth requires multiplying the running total by (1 + rate) at each period; a naive AI implementation often adds principal × rate each period instead, producing linear rather than exponential growth.
The wrong version produces the right answer for year 1, because the running total still equals the principal and the interest added is principal × rate in both versions. It diverges from year 2 onward. A reviewer who tests only a single-period case will miss the error entirely.
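A sketch of the two versions side by side, using hypothetical function names grow_buggy and grow:

```python
def grow_buggy(principal: float, rate: float, years: int) -> float:
    total = principal
    for _ in range(years):
        # Defect: adds interest on the original principal only (linear).
        total += principal * rate
    return total


def grow(principal: float, rate: float, years: int) -> float:
    total = principal
    for _ in range(years):
        # Interest is earned on the running total (exponential).
        total *= 1 + rate
    return total


# Compare the two versions after 1, 2, and 5 years at 10% on 1000.
for years in (1, 2, 5):
    buggy = round(grow_buggy(1000, 0.10, years), 2)
    correct = round(grow(1000, 0.10, years), 2)
    print(years, buggy, correct)
```

The two columns agree for one year and separate after that, reaching roughly 1500 versus 1610.51 at five years.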
The && / || (or and / or) confusion is particularly dangerous in validation and access-control code. Consider a function that should permit access only if the user has both a valid session and the required role:
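A sketch of such a check, with a hypothetical User model and function names. Enumerating all four combinations of the two flags makes the difference between the operators explicit.

```python
from dataclasses import dataclass


@dataclass
class User:  # hypothetical user model
    has_session: bool
    has_role: bool


def can_access_buggy(user: User) -> bool:
    # Defect: `or` admits a user who satisfies either condition.
    return user.has_session or user.has_role


def can_access(user: User) -> bool:
    # The specification requires both conditions.
    return user.has_session and user.has_role


# The two versions differ exactly when one flag is True and one is False.
for session in (False, True):
    for role in (False, True):
        u = User(session, role)
        print(session, role, can_access_buggy(u), can_access(u))
```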
A user with an expired session but the correct role will be admitted. A user with no role but an active session will also be admitted. The check is far more permissive than specified, and in an access-control context that is a security vulnerability, not merely a functional bug.
These errors require a different review approach from OBOEs and silent failures. Because they manifest at scale or over time, the review strategy must be mathematical rather than execution-based:
"For this accumulation loop, trace the value of the accumulator after iterations 1, 2, and 3, and confirm whether the growth pattern (linear/exponential/other) matches the specification. Flag any use of truncation where rounding might be required."
Logic errors that manifest at scale are the hardest to catch in code review and the most expensive in production. For every loop that aggregates or compounds a value, derive what the output should be algebraically for N iterations, then verify the code implements that formula exactly.
A function adds principal * rate each year instead of multiplying total *= (1 + rate). For a 10% rate on $1000 over 5 years, what is the error in the final output?
A function uses return user.has_session or user.has_role. The specification requires BOTH. Which combination of user states would the wrong operator incorrectly admit?
The or operator returns True if either condition is true, so a user with has_session=True and has_role=False passes the check: they have a session but lack the required role. The correct and operator would return False here. This is a security vulnerability in access-control code.
This AI-generated function is meant to compute a running probability: given a list of independent probabilities, it should return the probability that all events occur (that is, the joint probability, a product). It has two logic errors. Identify both and trace what output the wrong version produces for a concrete input.
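A version consistent with that description can be sketched as follows: the accumulator starts at 0 and the values are added. The function bodies are an assumption based on the exercise text, not the original listing.

```python
def joint_probability_buggy(probabilities: list) -> float:
    result = 0  # Defect 1: 0 is the identity for sums, not products.
    for p in probabilities:
        result += p  # Defect 2: adds where it should multiply.
    return result


def joint_probability(probabilities: list) -> float:
    result = 1.0  # 1 is the multiplicative identity.
    for p in probabilities:
        result *= p
    return result


print(joint_probability_buggy([0.5, 0.5, 0.5]))  # 1.5, not a valid probability
print(joint_probability([0.5, 0.5, 0.5]))        # 0.125
```

Correcting only the operator is not enough: with the accumulator still at 0, every product is 0, so the function would report that no combination of events can ever occur.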
State what joint_probability([0.5, 0.5, 0.5]) returns under the wrong code, and what it should return. Discuss why initializing to 0 vs. 1 matters for products, and why this error class is dangerous in risk-modeling systems.
A loop uses for i in range(1, n) when the specification requires processing all n items starting from the first. What is wrong?
What does s[2:5] return from the string "abcdefg"?
A function catches an error and returns 0 as a price. What category of error is this?
Why is a bare except: block in Python dangerous?
A function computes total += principal * rate each year. For a $1000 principal at 10% for 3 years, what does it return?
A function uses return user.active and user.is_admin or user.is_superuser. Due to Python operator precedence, how does this evaluate?
In Python, and binds tighter than or, so the expression groups as (active AND admin) OR superuser. Any superuser passes the check regardless of active status, so a deactivated superuser is still admitted, which is likely a security defect.
In the joint_probability function, why does initializing the accumulator to 0 instead of 1 cause a critical error even if the operator were corrected to multiplication?
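The operator-precedence question above can be checked directly. The User model is hypothetical, and allowed_explicit shows one possible intended reading (the account must be active); the specification would have to confirm it.

```python
from dataclasses import dataclass


@dataclass
class User:  # hypothetical user model
    active: bool
    is_admin: bool
    is_superuser: bool


def allowed_as_written(user: User) -> bool:
    # Parsed as (user.active and user.is_admin) or user.is_superuser.
    return user.active and user.is_admin or user.is_superuser


def allowed_explicit(user: User) -> bool:
    # Assumed intent: the account must be active in every case.
    return user.active and (user.is_admin or user.is_superuser)


deactivated_superuser = User(active=False, is_admin=False, is_superuser=True)
print(allowed_as_written(deactivated_superuser))  # True
print(allowed_explicit(deactivated_superuser))    # False
```

Writing the parentheses explicitly, whichever grouping is intended, removes the need for a reviewer to recall the precedence rule.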