AI in Science — Final Exam

1. LLM hallucination is particularly dangerous in scientific contexts because:

Correct.

The core danger is that hallucinations sound as confident and fluent as accurate outputs. Review Lesson 3's hallucination section.

2. The S8 tension in cosmology was sharpened by AI-assisted analysis of which type of observational data?

Correct. Weak lensing surveys like DES Year 3 used ML-assisted shape measurement pipelines to produce dark matter maps that revealed the S8 tension with CMB predictions.

The S8 tension is driven by weak gravitational lensing surveys — DES, KiDS — which use AI-calibrated shape measurements to map dark matter and find clustering weaker than CMB predictions.

3. What did the 2016 Nature survey of 1,576 researchers find regarding reproducibility in practice?

Correct. The survey established that reproducibility failures are widespread, cross-disciplinary, and common even within researchers' own prior work.

Over 70% had failed to reproduce another's results and over 50% had failed to reproduce their own — establishing the crisis as widespread and cross-disciplinary.

4. Why do all major journals prohibit listing AI as a scientific author?

Correct. Authorship is about responsibility, not just contribution. AI cannot be held accountable, respond to correspondence, or retract erroneous work.

The principle is accountability: being an author means being responsible for the work's accuracy and integrity. AI systems cannot fulfill this role.

5. Which of the following best describes the difference between "reproducibility" and "replicability" in scientific methodology?

Correct. The distinction matters: reproducibility is a computational property; replicability is a scientific property about generalizability.

Reproducibility (same data, same code) and replicability (new data, independent team) are distinct. AI failures often affect reproducibility while the underlying science may still be replicable.

6. According to DiMasi et al. (2016), what fraction of drug candidates entering Phase I clinical trials ultimately receive FDA approval?

Correct. Approximately 9.6% of compounds entering Phase I receive approval — meaning roughly 90% of clinical entrants fail before reaching patients.

The figure is approximately 9–10%. The high attrition rate is the central economic driver of drug development costs.

7. The Alkaissi and McFarlane (2023) study on ChatGPT-generated medical bibliographies found that approximately what percentage of references were entirely fabricated?

Correct. 69% fabrication — a rate that clearly establishes LLM-generated bibliographies as requiring full manual verification.

The study found ~69% of ChatGPT-generated references were entirely fabricated, establishing the need for systematic verification of all LLM citation outputs.

8. What is the key advantage of a neural network emulator over running full N-body cosmological simulations in MCMC inference?

Correct. Trained on a small set of full simulations, emulators return predictions in milliseconds — making MCMC chains with thousands of steps computationally feasible.

The emulator's power is speed: once trained, it returns predictions in milliseconds rather than the months a full simulation would require, enabling thorough MCMC parameter exploration.
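
The speed argument can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `simulate` is a hypothetical stand-in for an expensive N-body run, piecewise-linear interpolation over 11 stored runs stands in for the neural network, and the sampler is a basic Metropolis chain rather than any particular cosmology pipeline.

```python
import bisect
import math
import random


def simulate(theta):
    """Hypothetical stand-in for one expensive simulation run."""
    return theta ** 2


# "Training set": a handful of full simulations on a coarse parameter grid.
GRID = [i * 0.2 for i in range(11)]       # theta from 0.0 to 2.0
RUNS = [simulate(t) for t in GRID]


def emulate(theta):
    """Cheap surrogate: linear interpolation between the stored runs."""
    i = min(max(bisect.bisect_right(GRID, theta), 1), len(GRID) - 1)
    t0, t1 = GRID[i - 1], GRID[i]
    w = (theta - t0) / (t1 - t0)
    return (1 - w) * RUNS[i - 1] + w * RUNS[i]


def log_posterior(theta, y_obs, sigma):
    """Gaussian likelihood with a flat prior over the training grid."""
    if not GRID[0] <= theta <= GRID[-1]:
        return -math.inf
    return -0.5 * ((emulate(theta) - y_obs) / sigma) ** 2


def run_chain(y_obs, sigma, n_steps=5000, step=0.1, seed=0):
    """Metropolis sampler that only ever calls the emulator."""
    rng = random.Random(seed)
    theta = 1.0
    lp = log_posterior(theta, y_obs, sigma)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, y_obs, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            theta, lp = proposal, lp_new
        samples.append(theta)
    return samples


samples = run_chain(y_obs=simulate(1.2), sigma=0.05)
kept = samples[1000:]                     # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

The chain evaluates the emulator 5,000 times, while the toy simulation itself ran only 11 times to build the grid. That ratio is the point of the technique.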

9. The FAIR data principles require that scientific data be Findable, Accessible, Interoperable, and Reusable. Which of the following would MOST directly violate the "Interoperable" requirement?

Correct. Interoperability requires that data use open standards and formats so that systems and communities can exchange and interpret data without specialized proprietary tools.

Interoperability means data uses open standards so it can be used across systems. A proprietary format requiring licensed software directly undermines interoperability.

10. A "foundation model" in science differs from a task-specific model primarily because:

Correct. The key property is broad pre-training plus adaptability — the upstream training cost is amortized across thousands of downstream applications.

Foundation models are defined by broad pre-training and adaptability — one model, many applications, with knowledge transferring across domains.

11. Pangu-Weather was developed by which organisation?

Correct. Pangu-Weather was developed by Huawei's research team and published in Nature in 2023, using a 3D Earth Attention Network trained on ERA5.

Pangu-Weather was developed by Huawei and published in Nature (2023). DeepMind produced GraphCast; ECMWF remains the primary physics-based operational centre.

12. Shallue and Vanderburg's 2017 neural network used which two complementary views of a transit signal for classification?

Correct. The dual-input architecture used a global view (the full phase-folded light curve) and a local view (a zoomed-in window around the transit), achieving 96% accuracy on the test set.

The network used a "global" view of the full phase-folded light curve and a "local" zoomed view of the transit region — a dual-input architecture that became widely adopted.
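
The two views can be sketched on a synthetic light curve. This is a minimal illustration, not the authors' preprocessing: the box-shaped transit, the bin counts (31 and 21), the window width, and the helper names `phase_fold` and `binned_view` are all assumptions chosen for readability.

```python
def phase_fold(times, period, t0):
    """Map each time to a phase in [-period/2, period/2), transit at 0."""
    half = period / 2.0
    return [((t - t0 + half) % period) - half for t in times]


def binned_view(phases, flux, lo, hi, n_bins):
    """Average the flux in n_bins equal-width phase bins over [lo, hi)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for p, f in zip(phases, flux):
        if lo <= p < hi:
            i = min(int((p - lo) / (hi - lo) * n_bins), n_bins - 1)
            sums[i] += f
            counts[i] += 1
    # Empty bins fall back to the out-of-transit baseline of 1.0.
    return [s / c if c else 1.0 for s, c in zip(sums, counts)]


# Synthetic light curve: box-shaped 1% dip lasting 0.1 d every 3 d.
period, t0, duration, depth = 3.0, 0.0, 0.1, 0.01
times = [i * 0.01 for i in range(3000)]
phases = phase_fold(times, period, t0)
flux = [1.0 - depth if abs(p) < duration / 2 else 1.0 for p in phases]

# Global view: the whole orbit. Local view: a narrow window on the transit.
global_view = binned_view(phases, flux, -period / 2, period / 2, 31)
local_view = binned_view(phases, flux, -2 * duration, 2 * duration, 21)
```

In the global view the dip occupies a single central bin, so its position in the orbit is visible but its shape is not. The local view spreads the same dip over several bins, which is the detail a classifier needs to judge the transit shape.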

13. The CMS 2021 autoencoder anomaly detection trigger was trained exclusively on which type of data?

Correct. Trained only on SM events, the autoencoder learns what "normal" collisions look like — events with anomalously high reconstruction error are flagged as potential new physics.

The autoencoder was trained only on Standard Model events. It learns normal topology; anything it reconstructs poorly is flagged as anomalous — a model-independent search strategy.

14. The estimated universe of drug-like molecules is approximately how large?

Correct. The chemical space of drug-like molecules is estimated at 10²³–10⁶⁰, astronomically larger than any physical library, which is the fundamental argument for AI-guided exploration of chemical space.

The estimate is 10²³–10⁶⁰ drug-like molecules, dwarfing the ~10 million (10⁷) compounds in the largest physical HTS libraries by roughly 16 to 53 orders of magnitude.

15. How many amino acids are in the standard genetic code?

Correct. The standard genetic code encodes 20 amino acids (plus stop signals). Their chemical diversity — ranging from tiny glycine to bulky tryptophan, from charged arginine to hydrophobic leucine — is the raw material of all protein function.

There are 20 standard amino acids. The 64 codons in the genetic code map to these 20 amino acids (plus stop codons) with redundancy.
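
The redundancy can be checked directly by building the standard codon table and counting. The sketch below uses the conventional TCAG ordering of the table; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

codons = ["".join(c) for c in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

counts = Counter(codon_table.values())
n_stop = counts.pop("*")                  # leave only amino acids in counts

print(len(codon_table), "codons,", len(counts), "amino acids,", n_stop, "stops")
# → 64 codons, 20 amino acids, 3 stops
print("six codons each:", sorted(a for a, n in counts.items() if n == 6))
# → six codons each: ['L', 'R', 'S']
```

Methionine (ATG) and tryptophan (TGG) sit at the other extreme, with one codon each.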

16. What is the fundamental advantage of ML interatomic potentials (MLIPs) over DFT for materials screening?

Correct. MLIPs trade some accuracy for enormous speed, making large-scale screening tractable.

MLIPs are faster approximations of DFT, not more accurate alternatives, and they still require DFT training data. Review Lesson 1.

17. The Event Horizon Telescope produced its first black hole image in what year, and of which object?

Correct. The EHT's first black hole image, released in April 2019, showed M87*, the supermassive black hole 55 million light-years away in galaxy M87.

The first EHT black hole image was released on April 10, 2019, showing M87* — a 6.5-billion-solar-mass black hole at the center of galaxy M87.

18. In the ML Reproducibility Challenge (2021), what was the most common obstacle to reproducing ML paper results when authors' code was unavailable?

Correct. These were all structural documentation failures — not deliberate concealment — demonstrating that prose-only methods descriptions are systematically inadequate.

The obstacles were technical documentation failures: random seeds, hyperparameter choices, library versions, and preprocessing details described in words but not implemented in shareable code.
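
One way to move those details out of prose and into shareable code is to record them alongside the result. The sketch below is a hypothetical illustration, not a procedure from the Challenge: `run_experiment` is a toy stand-in for a training run, and the manifest fields are assumptions about what a reader would need.

```python
import json
import platform
import random
import sys


def run_experiment(config):
    """Toy 'training run' whose result depends only on the config."""
    rng = random.Random(config["seed"])   # seeded, so reruns match exactly
    noise = [rng.gauss(0.0, 1.0) for _ in range(config["n_samples"])]
    return config["learning_rate"] * sum(noise) / len(noise)


def build_manifest(config, result):
    """Collect what a reader needs to rerun the experiment and compare."""
    return {
        "config": config,                 # seed and hyperparameters
        "python_version": platform.python_version(),
        "platform": sys.platform,
        "result": result,
    }


config = {"seed": 42, "learning_rate": 0.01, "n_samples": 1000}
manifest = build_manifest(config, run_experiment(config))
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
```

A real project would extend the manifest with the versions of its numerical libraries and the preprocessing steps, which are the other items named above.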

19. What is the fundamental risk of assuming "stationarity" in statistical climate downscaling?

Exactly right. Stationarity assumes the statistical relationship between large-scale and local climate is constant through time. Under anthropogenic forcing, physical processes can change (e.g., shifted storm tracks, altered moisture transport), invalidating the historical training relationship.

The stationarity assumption says the coarse-to-fine relationship learned from historical data holds in future conditions. This breaks down if climate change alters the physical processes governing local climate — shifted storm tracks, changed atmospheric circulation patterns, and altered moisture regimes all violate stationarity.

20. What was the role of an LLM in the Microsoft-PNNL materials discovery campaign?

Correct. Literature mining via LLM — parsing tables, resolving units, standardising measurements — created the training data that downstream ML property predictors depended on. Review Lesson 3.

The LLM mined the literature for ionic conductivity data. Review the LLM callout in Lesson 3.