1. In the Intercom 2022 research case, what did the researchers note about the AI's role versus human interviews in understanding the unexpected usage pattern?
Correct. The Intercom researchers explicitly noted this division: LLM clustering surfaced the pattern (the what), while six follow-up interviews explained the motivation behind it (the why). This is the characteristic shape of human-AI research collaboration.
Incorrect. The lesson is explicit: AI found the what (the unexpected usage pattern), and six follow-up human interviews found the why (the motivation behind it). The division of labor is the key lesson.
2. What is "baseline poisoning" in visual AI testing?
Correct. Baseline poisoning is a governance failure: when a broken UI state is accepted as the reference, the AI learns to approve the bug and flag correct future behavior as regressions.
Incorrect. Baseline poisoning is an approval governance problem, not a technical attack or performance issue. It is why mature teams require multi-engineer sign-off on any baseline change.
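The multi-engineer sign-off rule can be sketched as a simple approval gate. This is a minimal illustration, assuming a hypothetical `BaselineChange` record and a hypothetical two-approver policy. Real visual-testing tools expose their own review workflows, and nothing here reflects a specific product's API.

```python
from dataclasses import dataclass, field

# Hypothetical policy: number of distinct approvers (excluding the author) required.
REQUIRED_APPROVERS = 2


@dataclass
class BaselineChange:
    """A proposed update to a visual-testing reference screenshot (hypothetical model)."""
    screen: str
    author: str
    approvers: set[str] = field(default_factory=set)

    def approve(self, engineer: str) -> None:
        # Self-approval is ignored; repeated approvals by one engineer count once.
        if engineer != self.author:
            self.approvers.add(engineer)

    def can_accept(self) -> bool:
        # Accept the new baseline only after enough independent sign-offs, so a
        # single person cannot promote a broken UI state to "expected".
        return len(self.approvers) >= REQUIRED_APPROVERS


change = BaselineChange(screen="checkout", author="alice")
change.approve("alice")      # ignored: the author cannot approve their own change
change.approve("bob")
print(change.can_accept())   # False
change.approve("carol")
print(change.can_accept())   # True
```

Counting distinct approvers through a set means one engineer clicking "approve" twice cannot satisfy the policy alone.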
3. Duolingo's 2019 churn model used over 30 features. Which of the following was NOT among the feature categories described in Lesson 3?
Correct. Device manufacturer was not listed. Duolingo's features included streak risk, lesson completion rate, error rate trajectory, lesson difficulty vs. skill level, and notification response rate — all engagement-quality signals.
Incorrect. Device manufacturer was not among the features described. Duolingo's model focused on engagement-quality signals: streak risk, completion rates, error trajectories, difficulty alignment, and notification response rates.
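Engagement-quality signals like these are usually derived from per-user activity logs. The sketch below computes two of them under assumed definitions: completion rate as completed lessons divided by started lessons, and error-rate trajectory as the least-squares slope of a learner's daily error rate. Both definitions and function names are hypothetical; Duolingo's actual feature definitions are not given here.

```python
def completion_rate(started: int, completed: int) -> float:
    """Share of started lessons that were finished (hypothetical definition)."""
    return completed / started if started else 0.0


def error_rate_trajectory(daily_error_rates: list[float]) -> float:
    """Least-squares slope of error rate per day.

    A positive slope means the learner is making more mistakes over time,
    one plausible early sign of frustration (hypothetical feature).
    """
    n = len(daily_error_rates)
    if n < 2:
        return 0.0
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(daily_error_rates) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily_error_rates))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


print(completion_rate(started=20, completed=14))
print(round(error_rate_trajectory([0.10, 0.12, 0.15, 0.19, 0.22]), 3))
```

A churn model would consume values like these as columns alongside streak risk, difficulty alignment, and notification response rate.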
4. IBM Watson for Oncology was shut down in 2022. What validation methodology failure does the STAT News 2018 investigation identify as the root cause?
Correct. Training on hypothetical cases for a system that would be used in real clinical decisions is a consequence-class mismatch. High-magnitude irreversible applications require validation against real outcomes at the scale and distribution of intended deployment.
Incorrect. The root cause was training on hypothetical cases rather than real patient data. This is a fundamental mismatch between training methodology and the high-stakes real-world consequences of the deployment context.
5. What does "warehouse-native analytics" mean, and what advantage does it offer over extraction-based approaches?
Correct. Warehouse-native analytics runs ML and analytics computations inside the warehouse itself, eliminating data movement latency and duplication compared to extracting data to external analytics tools.
Incorrect. Warehouse-native analytics means running computations (including ML inference) directly inside platforms like Snowflake or BigQuery, eliminating the latency and duplication of data extraction pipelines.
6. What primary economic shift does AI-assisted wireframing introduce to the product development process?
Correct. AI reduces the time to generate layout variants from days to minutes, making it irrational to skip lo-fi prototyping — the cost of exploring ten concepts is now measured in minutes.
Incorrect. The primary shift is compressing exploration phase economics — generating ten layout variants in minutes instead of a week, making lo-fi prototyping more cost-effective than skipping it.
7. Airbnb's 2020 ML funnel analysis investigated mobile web checkout drop-off observed at the payment screen. Where, relative to that screen, did it locate the root cause?
Correct. The auto-complete failure occurred three screens before the observed drop-off — a cross-session-context signal that was invisible to standard funnel analysis but detectable by the gradient-boosted tree model.
Incorrect. Airbnb's model found the root cause three screens before the payment page — an Android OS auto-complete failure on the address field that forced re-entry and ultimately led to abandonment at the payment step.
8. IBM's Equal Access Checker uses ML for which specific accessibility challenge?
Correct. IBM Equal Access Checker uses ML to simulate how screen readers interpret ambiguous ARIA markup and dynamic updates — catching patterns that pass static rules but fail for real assistive technology users.
Incorrect. IBM's tool addresses screen reader interpretation: the gap between what static rule checkers approve and what actually works for assistive technology users in practice.
9. Heap's 2023 "Conversion Signals" feature exemplifies which shift in AI product analytics tool design?
Correct. Heap's Conversion Signals automated the entire feature discovery process — training a classification model on all instrumented events and surfacing the top behavioural predictors as a plain-language ranked list, bypassing SQL entirely.
Incorrect. Heap's Conversion Signals represents the shift from hypothesis-first manual analysis to automated ML discovery — the tool finds which behaviours predict conversion/abandonment without any SQL queries from the product manager.
10. SHAP values are used in production AI systems for what trust-related purpose, and what is the key translation challenge?
Correct. "payment_history: -0.32" is not a user-facing explanation. Translating that to "Your recent late payments were the primary factor" requires product judgment — preserving the accuracy while making it actionable and honest about uncertainty.
Incorrect. SHAP generates local explanations — it tells you why a specific prediction was made by attributing scores to each input feature. The challenge is translating numerical attribution into honest, plain-language explanations users can act on.
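To make the attribution step concrete, here is a minimal sketch that computes exact Shapley values for a toy model by enumerating every coalition of features, with absent features set to a baseline, and then maps the most negative attribution to a plain-language sentence. The model weights, feature values, and sentence templates are all hypothetical. This is not the `shap` library's API; production libraries approximate this computation because exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy credit model: a weighted sum of normalized features.
WEIGHTS = {"payment_history": 0.5, "utilization": -0.3, "account_age": 0.2}


def model(x: dict[str, float]) -> float:
    return sum(WEIGHTS[f] * x[f] for f in WEIGHTS)


def shapley_values(x: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Exact Shapley attribution by enumerating every coalition of the other features.

    Features outside a coalition take their baseline value, so each attribution
    measures how far that feature moved the prediction from the baseline prediction.
    """
    features = list(x)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for coalition in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                present = set(coalition)
                without_f = {g: x[g] if g in present else baseline[g] for g in features}
                with_f = {**without_f, f: x[f]}
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


# Hypothetical templates: writing these honestly is the product-judgment step.
TEMPLATES = {
    "payment_history": "Your recent late payments were the primary factor.",
    "utilization": "Your high credit utilization was the primary factor.",
    "account_age": "The short age of your accounts was the primary factor.",
}


def explain(phi: dict[str, float]) -> str:
    # Report the feature that pushed the score down the most.
    worst = min(phi, key=phi.get)
    return TEMPLATES[worst]


applicant = {"payment_history": 0.2, "utilization": 0.9, "account_age": 0.5}
average = {"payment_history": 0.84, "utilization": 0.4, "account_age": 0.5}

phi = shapley_values(applicant, average)
print({f: round(v, 2) for f, v in phi.items()})
print(explain(phi))  # Your recent late payments were the primary factor.
```

The attributions sum to `model(applicant) - model(average)` (the Shapley efficiency property), and `payment_history` comes out at -0.32, the same raw number used as the example in this question. The translation from that number to a sentence is a hand-written template, which is exactly where product judgment enters.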
11. What is the primary reason AI-generated PRD content must be treated as a first draft rather than a final specification?
Correct. AI lacks access to internal data, actual user behavior, and team-specific context — making its output contextually unreliable without domain expert review.
Incorrect. AI produces structurally sound but contextually hollow PRD content because it has no access to internal data, user behavior, or strategic context, all of which require human input to fill in accurately.
12. According to the 2020 Microsoft Research study, what percentage of CI pipeline failures at large companies were caused by flaky tests?
Correct. The 2020 Microsoft Research study found over 40% of CI failures at large software companies were caused by flaky tests — not real bugs.
Incorrect. The documented finding is over 40%, a large enough proportion to justify significant investment in flakiness detection and remediation tools.
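One common way to detect flakiness is to look for tests that both pass and fail on the same commit, since identical code producing different outcomes points to nondeterminism rather than a real bug. The sketch below applies that heuristic to hypothetical CI run records and estimates what share of failures came from flaky tests. It is an illustration of the rerun heuristic, not the methodology of the Microsoft Research study.

```python
from collections import defaultdict


def find_flaky_tests(runs: list[tuple[str, str, bool]]) -> list[str]:
    """Return tests that both passed and failed on the same commit.

    `runs` holds (commit, test_name, passed) tuples, e.g. from CI retries.
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for commit, test, passed in runs:
        outcomes[(commit, test)].add(passed)
    return sorted({test for (_, test), seen in outcomes.items() if seen == {True, False}})


def flaky_failure_share(runs: list[tuple[str, str, bool]]) -> float:
    """Fraction of failed runs that belong to tests flagged as flaky."""
    flaky = set(find_flaky_tests(runs))
    failures = [test for _, test, passed in runs if not passed]
    return sum(test in flaky for test in failures) / len(failures) if failures else 0.0


# Hypothetical CI history.
runs = [
    ("a1b2", "test_login", True),
    ("a1b2", "test_login", False),   # same commit, different outcome: flaky
    ("a1b2", "test_search", False),
    ("a1b2", "test_search", False),  # fails consistently: likely a real bug
    ("c3d4", "test_cart", True),
]
print(find_flaky_tests(runs))               # ['test_login']
print(round(flaky_failure_share(runs), 2))  # 0.33
```

The heuristic only catches flakiness that actually shows up across retries, so it undercounts tests that rarely flip.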
13. How does AI improve canary deployment decisions beyond fixed error-rate thresholds?
Correct. AI-informed canary analysis — as used by Shopify — detects subtle degradation patterns like tail-latency increases in specific regions that don't breach absolute thresholds but represent real regressions.
Incorrect. AI canary analysis detects statistical anomalies, patterns that fall outside learned normal behavior, even when absolute error rates stay below fixed thresholds.
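The difference between a fixed threshold and a learned baseline can be shown with a simple z-score check on tail latency. In this sketch the error-rate threshold, the z-score cutoff, and the latency samples are all hypothetical, and a plain z-score stands in for whatever model a production canary system (Shopify's included) actually uses.

```python
from statistics import mean, stdev

ERROR_RATE_THRESHOLD = 0.01  # hypothetical fixed threshold (1% errors)
Z_LIMIT = 3.0                # hypothetical anomaly cutoff in standard deviations


def fixed_threshold_ok(error_rate: float) -> bool:
    """The traditional check: pass as long as errors stay under a fixed limit."""
    return error_rate < ERROR_RATE_THRESHOLD


def is_anomalous(history: list[float], observed: float, z_limit: float = Z_LIMIT) -> bool:
    """Flag `observed` if it sits more than `z_limit` standard deviations above
    the mean of the baseline samples (one-sided: only regressions matter)."""
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and (observed - mu) / sigma > z_limit


# Hypothetical p99 latency (ms) for one region, sampled from the stable fleet.
baseline_p99 = [212, 208, 215, 210, 209, 214, 211, 213]
canary_p99 = 248
canary_error_rate = 0.004

print(fixed_threshold_ok(canary_error_rate))   # True: the fixed check passes
print(is_anomalous(baseline_p99, canary_p99))  # True: tail latency has regressed
```

The canary would be promoted under the fixed rule alone; the baseline comparison catches a tail-latency regression that never touches the error rate.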
14. What is the recommended approach for AI-generated wireframes in the product workflow?
Correct. AI-generated wireframes are conversation starters, not deliverables. They serve as the basis for structured team critique sessions to evaluate assumptions and edge cases.
Incorrect. The recommended approach treats AI wireframes as conversation starters — bases for structured critique sessions that surface assumptions and explore edge cases before committing to a direction.
15. In the four-step AI synthesis workflow, what happens in Step 4 and why does it remain essential even with AI speed?
Correct. Member-checking — having participants verify that themes reflect their actual experience — remains essential regardless of how fast the analysis was produced.
Incorrect. Step 4 is member-checking. The obligation to validate findings with participants doesn't diminish just because synthesis was faster.
16. Which of the following best describes "agentic coding" in the context of AI development tools?
Correct. Agentic coding involves multi-step autonomous AI actions across a codebase — exemplified by tools like Devin and Cursor Composer.
Incorrect. Agentic coding means autonomous, multi-step AI action across files without per-step human approval.
17. Applitools' Visual AI engine was introduced in what year to address pixel-diff false-positive problems?
Correct. Applitools introduced its Visual AI engine using CNNs in 2017, addressing the pixel-diff false-positive problem that was making screenshot-based visual testing unworkable at teams like Salesforce and Adobe.
Incorrect. Applitools Visual AI launched in 2017, the pivotal year when AI-driven visual testing became commercially viable.
18. What does Spotify's "consumption bingeing before cancellation" finding illustrate about the design of churn prediction feature sets?
Correct. The bingeing pattern illustrates that ML finds counterintuitive signals humans would dismiss — high engagement as a churn precursor — because the model learns from observed outcomes rather than human assumptions about what churn looks like.
Incorrect. Spotify's finding illustrates that ML surfaces counterintuitive churn signals — ones human analysts would dismiss or never think to include — by learning from actual cancellation outcomes rather than intuitive assumptions.
19. Ryan Singer's concept of "scope creep by good intention" (Shape Up) describes which mechanism?
Correct. Each small addition seems reasonable individually, but collectively they compound into major scope expansion — without anyone having explicitly decided to expand the scope.
Incorrect. "Scope creep by good intention" specifically describes individually reasonable additions that collectively double engineering time: the cumulative effect of small yeses that no one explicitly approved as a scope increase.
20. Spotify's predictive test selection approach to CI optimization reduced test run times by up to:
Correct. Spotify reported up to 80% reduction in test execution time on specific services using ML-based test selection, as described in their 2021 engineering blog post.
Incorrect. Spotify's engineering blog documented up to an 80% reduction in CI test run times using predictive ML-based test selection.
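Predictive test selection ranks tests by how likely they are to fail for a given change and runs only the top of that ranking. In the sketch below, a historical failure frequency per (changed file, test) pair stands in for the trained model, and the history data, budget, and function name are hypothetical. It shows the selection mechanics only, not Spotify's implementation.

```python
def select_tests(
    changed_files: set[str],
    history: dict[tuple[str, str], tuple[int, int]],
    budget: int,
) -> list[str]:
    """Pick up to `budget` tests most likely to fail for this change.

    `history` maps (file, test) -> (failures, runs) observed when that file
    changed. A trained classifier would replace this frequency estimate.
    """
    scores: dict[str, float] = {}
    for (file, test), (failures, runs) in history.items():
        if file in changed_files and runs:
            scores[test] = max(scores.get(test, 0.0), failures / runs)
    # Highest predicted failure probability first; ties broken by name.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:budget]


# Hypothetical history of outcomes keyed by changed file.
history = {
    ("cart.py", "test_cart"): (8, 20),
    ("cart.py", "test_checkout"): (3, 20),
    ("cart.py", "test_profile"): (0, 20),
    ("search.py", "test_search"): (5, 10),
}
print(select_tests({"cart.py"}, history, budget=2))  # ['test_cart', 'test_checkout']
```

Running two of four tests halves this toy suite; the real savings depend on how reliably the model ranks the tests that would actually fail, which a frequency table cannot capture well.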