Module 8 · Lesson 1

Autonomous Discovery: When AI Runs the Experiment

Self-driving laboratories, closed-loop optimization, and the collapse of the hypothesis-test cycle.

What happens when an AI system can design, execute, and interpret experiments without waiting for a human to decide what to try next?

The Acceleration Consortium at the University of Toronto describes the platforms it builds as "self-driving laboratories": systems that select reagents, run reactions, measure outcomes, and use those measurements to update a Bayesian optimization model that decides what to synthesize next. A widely cited demonstration of the idea is the University of Liverpool's mobile robot chemist, which navigated a chemistry lab autonomously and completed 688 experiments in 8 days, a pace no human team could sustain. The goal was not to replace chemists but to exhaust combinatorial space that would otherwise remain unexplored.

The Closed-Loop Laboratory

Traditional experiments follow a linear arc: hypothesize → design → execute → analyze → publish → repeat. Each step is separated by human review, often days or weeks apart. The closed-loop laboratory collapses this into a continuous cycle that runs at machine speed.

The key components are: robotic execution (liquid handlers, mobile platforms, plate readers), real-time analysis (inline spectroscopy, computer vision), and an active-learning algorithm — usually Bayesian optimization or a reinforcement-learning agent — that selects the next experiment based on prior results. The loop closes when the algorithm's output feeds directly into the robot's task queue.
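The loop described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not any lab's actual control software: `run_experiment` is a hypothetical simulated reaction standing in for the robot and its instruments, and `propose_next` is a deliberately simple "perturb the best point so far" rule standing in for Bayesian optimization or a reinforcement-learning agent.

```python
import random


def run_experiment(temperature_c):
    """Stand-in for robotic execution plus real-time analysis.

    Hypothetical simulated yield that peaks at 70 degrees C; a real lab
    would return an instrument measurement here.
    """
    return max(0.0, 100.0 - 0.05 * (temperature_c - 70.0) ** 2)


def propose_next(history, rng, low=20.0, high=120.0, step=15.0):
    """Toy selection rule (not Bayesian optimization).

    Samples at random for the first three runs, then perturbs the best
    condition seen so far, clamped to the allowed range.
    """
    if len(history) < 3:
        return rng.uniform(low, high)
    best_x, _ = max(history, key=lambda pair: pair[1])
    return min(high, max(low, best_x + rng.uniform(-step, step)))


def closed_loop(n_cycles=30, seed=0):
    rng = random.Random(seed)
    history = []     # (condition, measured result) pairs
    task_queue = []  # what the robot will run next
    for _ in range(n_cycles):
        task_queue.append(propose_next(history, rng))  # algorithm output enters the queue
        condition = task_queue.pop(0)                  # robot takes the next task
        result = run_experiment(condition)             # execute and analyze
        history.append((condition, result))            # result informs the next proposal
    return history


history = closed_loop()
best_condition, best_result = max(history, key=lambda pair: pair[1])
print(f"best condition: {best_condition:.1f} C, simulated yield: {best_result:.1f}")
```

The point of the sketch is the data flow: no human decision sits between `history.append` and the next call to `propose_next`.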

Documented Milestones

In 2020, researchers at the University of Liverpool published in Nature a mobile robot chemist that autonomously discovered improved photocatalysts for hydrogen production — running 688 experiments over 8 days, finding a catalyst 6× better than the starting point. The robot navigated the lab, operated equipment, and updated its search model with no human intervention during runs.

In 2023, Merck and MIT demonstrated a closed-loop platform for pharmaceutical process optimization that reduced the time to identify optimal reaction conditions from months to days. The system used a neural network surrogate model trained on in-line mass spectrometry data, allowing it to predict reaction yield before the experiment fully completed — a form of predictive truncation that dramatically cut reagent waste.

Also in 2023, the A-Lab at Lawrence Berkeley National Laboratory used AI-driven synthesis planning to autonomously produce 41 of 58 targeted inorganic compounds in 17 days — a success rate of ~71% with zero human synthesis decisions.

688 experiments in 8 days: Liverpool mobile robot chemist (Nature, 2020)

41/58 novel inorganic compounds autonomously synthesized: A-Lab, Berkeley (2023)

Active Learning and Bayesian Optimization

The intelligence behind closed-loop labs is mostly active learning — a framework where the model identifies which experiments would reduce its uncertainty most. Bayesian optimization is the dominant approach: it maintains a probabilistic surrogate model of the experimental landscape, then selects the next point by maximizing an acquisition function that balances exploration of unknown regions and exploitation of known good regions.

This is fundamentally different from grid search or random sampling. A grid search of a 10-dimensional chemical space with 10 values per dimension requires 10 billion experiments. Bayesian optimization can find near-optimal solutions in hundreds — because it learns the structure of the landscape as it goes.
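The arithmetic behind that comparison is worth making explicit. The short calculation below uses only numbers already quoted in this lesson (10 values in each of 10 dimensions, and the Liverpool throughput of 688 experiments in 8 days); treating that throughput as sustainable indefinitely is an assumption made purely for scale.

```python
values_per_dimension = 10
dimensions = 10

# Every combination of settings is one experiment in a full grid.
grid_points = values_per_dimension ** dimensions

# Throughput from the Liverpool figures: 688 experiments in 8 days.
experiments_per_day = 688 / 8

# Assumption for scale only: the robot runs at that pace without pause.
years_for_full_grid = grid_points / experiments_per_day / 365.25

print(f"{grid_points:,} grid points")
print(f"roughly {years_for_full_grid:,.0f} years at {experiments_per_day:.0f} experiments per day")
```

Even at robot speed, the exhaustive grid would take on the order of hundreds of thousands of years, which is why the choice of which experiments to run matters more than how fast they run.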

Key Concept

Acquisition function: The mathematical rule an active-learning system uses to decide which experiment to run next. Common choices include Expected Improvement (EI), Upper Confidence Bound (UCB), and Probability of Improvement (PI). Each encodes a different tradeoff between trying something new versus refining something promising.
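To make the different tradeoffs concrete, here is a minimal sketch of the three acquisition functions named above. It assumes the surrogate returns a Gaussian prediction (mean `mu`, standard deviation `sigma`) at each candidate, that the objective is being maximized, and that `kappa` is a user-chosen exploration weight; the two candidates and their numbers are invented for illustration.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu, sigma, best):
    """Chance that the candidate beats the best result observed so far."""
    if sigma == 0.0:
        return 1.0 if mu > best else 0.0
    return normal_cdf((mu - best) / sigma)


def expected_improvement(mu, sigma, best):
    """Average amount by which the candidate is expected to beat the best result."""
    if sigma == 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


def upper_confidence_bound(mu, sigma, kappa=2.0):
    """Optimistic estimate: predicted mean plus kappa standard deviations."""
    return mu + kappa * sigma


best_so_far = 0.78
candidates = {
    "refine": (0.80, 0.02),   # slightly better prediction, low uncertainty
    "explore": (0.70, 0.20),  # worse prediction, high uncertainty
}

for name, (mu, sigma) in candidates.items():
    pi = probability_of_improvement(mu, sigma, best_so_far)
    ei = expected_improvement(mu, sigma, best_so_far)
    ucb = upper_confidence_bound(mu, sigma)
    print(f"{name}: PI={pi:.3f}  EI={ei:.3f}  UCB={ucb:.2f}")
```

With these illustrative numbers, PI favors the safe "refine" candidate while EI and UCB favor the uncertain "explore" candidate, which is exactly the kind of disagreement the choice of acquisition function encodes.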

Key Terms

Closed-loop lab: An experimental system where AI analysis feeds directly back into experimental design without human handoff.

Bayesian optimization: An iterative search strategy using a probabilistic surrogate model to select experiments that maximize an objective with minimal trials.

Active learning: A paradigm where the model selects its own training data (specifically the examples that reduce uncertainty most) rather than passively consuming a fixed dataset.

Surrogate model: A fast, approximate model (e.g., Gaussian process) trained on prior experimental data to predict outcomes in regions not yet tested.

Why It Matters

When AI closes the experimental loop, the bottleneck shifts from execution speed to question quality. Researchers who once spent most of their time pipetting now spend it deciding what objectives to optimize — and what constraints to impose. The role of scientific judgment moves upstream, to goals and criteria rather than protocols.

Lesson 1 Quiz

Autonomous Discovery & Closed-Loop Laboratories

1. The University of Liverpool's 2020 mobile robot chemist (published in Nature) ran how many experiments in 8 days?

Correct. The robot completed 688 experiments in 8 days, discovering a photocatalyst roughly 6× better than the starting formulation.

Not quite. The Liverpool robot ran 688 experiments — a pace impossible for human teams to sustain manually.

2. In Bayesian optimization, the "acquisition function" determines:

Correct. The acquisition function encodes the tradeoff between exploring uncertain regions and exploiting known promising areas of the experimental space.

The acquisition function is the mathematical rule for selecting the next experiment — balancing exploration vs. exploitation.

3. The A-Lab at Lawrence Berkeley National Laboratory autonomously synthesized 41 of 58 targeted compounds in how many days?

Correct. The A-Lab achieved a ~71% success rate in 17 days with no human synthesis decisions.

The A-Lab completed the task in 17 days — 41 novel inorganic compounds with no human intervention in synthesis decisions.

4. What is the key difference between a "closed-loop laboratory" and a traditional experimental workflow?

Correct. The defining feature is continuous cycling — results inform the next experiment automatically, removing the inter-experiment human delay.

The key difference is that the loop closes without human review at each step — results feed directly into the next experimental decision.

Lab 1: Designing a Closed-Loop Experiment

Practice session · AI research assistant

Objective

You're advising a research team that wants to set up their first AI-driven closed-loop laboratory for optimizing a chemical reaction. Work through the key design decisions with your AI research assistant — what objective to optimize, what constraints to set, which active-learning strategy to use, and how to handle the robot–algorithm interface.

Start by telling the assistant what field or type of experiment your team works on (chemistry, materials, biology — real or hypothetical). Then ask it to help you design the closed-loop architecture step by step.

Module 8 · Lesson 2

Foundation Models for Science: Toward a Universal Research Intelligence

Large pre-trained models crossing disciplinary boundaries — and what that means for how science is organized.

Can a single AI model trained across all of science produce insights that domain-specific tools miss — and what are the risks of that concentration?

When Google DeepMind released AlphaFold 3 in May 2024, it extended beyond proteins to all molecules — DNA, RNA, small molecules, ions — and their interactions. Within weeks, structural biologists who had spent years crystallizing proteins reported that the first thing they now do is run AlphaFold 3 before deciding whether to proceed experimentally. A single model, pre-trained once, had restructured the opening move of an entire scientific discipline.

What Is a Foundation Model for Science?

A foundation model is a large neural network trained on broad data that can be adapted — via fine-tuning or prompting — to many downstream tasks. In science, this means models pre-trained on vast corpora of literature, protein sequences, molecular structures, genomic data, or physical simulations, then applied to specific problems without retraining from scratch.

The defining feature is transfer: knowledge learned in one context transfers to another. A model that learns molecular representations from millions of drug-like molecules can, with relatively few examples, predict binding affinity for a novel target. The cost of the upstream training is amortized across thousands of downstream applications.
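A toy version of this pattern: keep a pre-trained encoder frozen and train only a small task-specific head on a handful of labelled examples. Everything here is an illustrative assumption. `pretrained_embed` is a hypothetical hand-written function standing in for a foundation model's learned representation, and the four data points are synthetic.

```python
def pretrained_embed(x):
    """Hypothetical frozen encoder; stands in for a foundation model's features."""
    return [x, x * x]


def fit_linear_head(features, targets, lr=0.1, epochs=2000):
    """Train only the task head (weights and bias) by gradient descent on squared error.

    The encoder that produced `features` is never updated.
    """
    n = len(features)
    n_feat = len(features[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for f, t in zip(features, targets):
            err = sum(wi * fi for wi, fi in zip(w, f)) + b - t
            for j in range(n_feat):
                grad_w[j] += 2.0 * err * f[j] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(x, w, b):
    return sum(wi * fi for wi, fi in zip(w, pretrained_embed(x))) + b


# Four synthetic labelled examples for the downstream task (target = 3x^2 + 1).
xs = [-1.0, -0.5, 0.5, 1.0]
ys = [3.0 * x * x + 1.0 for x in xs]

w, b = fit_linear_head([pretrained_embed(x) for x in xs], ys)
print(f"prediction at x=0.8: {predict(0.8, w, b):.3f}")
```

Because the frozen features already expose the structure the task needs, four examples are enough to fit the head. That is the amortization argument in miniature: the expensive part is reused and only the cheap part is retrained.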

Documented Cross-Domain Models

ESM-2 (Meta AI, 2022): A language model trained on 250 million protein sequences. Unlike AlphaFold (which predicts 3D structure), ESM-2 captures evolutionary relationships and can predict the effect of mutations — enabling applications from enzyme engineering to variant effect prediction across all proteins, not just those with solved structures.

Galactica (Meta AI, 2022): A 120-billion-parameter model trained on 48 million scientific papers, textbooks, and databases. Designed to write scientific text, predict citations, and solve reasoning-heavy scientific questions. Its public demo was withdrawn within three days because of confident-sounding factual errors, making it a landmark case study in the risks of large scientific language models.

Gemini 1.5 Pro applied to genomics (Google, 2024): Researchers used Gemini's long-context window (up to 1 million tokens) to process entire genomic sequences in a single context, enabling cross-gene reasoning that previous models could not perform due to context-length limits.

GraphCast (DeepMind, 2023): Trained on roughly 40 years of weather reanalysis data, GraphCast outperformed HRES, the operational high-resolution forecast of the European Centre for Medium-Range Weather Forecasts (ECMWF), on 90% of 1,380 prediction targets. It runs a 10-day global forecast in under a minute on a single TPU, versus hours of supercomputing time for traditional numerical models.

250M protein sequences in the ESM-2 training corpus, enabling mutation effect prediction across all known proteins

<60s for GraphCast to generate a 10-day global weather forecast, vs. hours of supercomputer time

Risks of Concentration

Foundation models introduce a structural risk: when a single pre-trained model becomes the universal first step for a discipline, its biases propagate everywhere. If ESM-2 systematically underrepresents archaeal proteins (it does — they're rare in training data), every downstream application inherits that blind spot.

The Galactica incident illustrated a second risk: confident hallucination. The model would generate plausible-sounding but incorrect citations, chemical structures, and mathematical derivations — with no uncertainty signal. Researchers unaware of this could trust outputs that were wrong.

A third risk is homogenization: if all labs in a field use the same foundation model as a starting point, they may converge on similar hypotheses, reducing the diversity of scientific exploration that has historically been a source of unexpected breakthroughs.

Historical Parallel

The consolidation of scientific databases — UniProt, GenBank, PDB — created similar concentration risks decades ago. When GenBank had data integrity errors, they propagated into thousands of analyses. Foundation models may amplify this dynamic, because their influence is not just on raw data but on the interpretation layer itself.

Key Terms

Foundation model: A large neural network pre-trained on broad data, designed to be adapted to many downstream tasks via fine-tuning or prompting.

Transfer learning: Using knowledge acquired in one domain or task to improve performance on a different but related domain or task.

Hallucination: AI output that is fluent and confident-sounding but factually incorrect, with no internal uncertainty signal.

Homogenization risk: The danger that widespread adoption of a single model or approach causes scientific communities to converge on similar hypotheses, reducing exploratory diversity.

Lesson 2 Quiz

Foundation Models for Science

1. Meta AI's ESM-2 protein language model was trained on approximately how many protein sequences?

Correct. ESM-2 was trained on 250 million protein sequences, giving it broad coverage of evolutionary relationships across the protein universe.

ESM-2 was trained on 250 million protein sequences — a corpus broad enough to capture deep evolutionary patterns across the protein universe.

2. Galactica was pulled from public access within three days primarily because:

Correct. Galactica's confident hallucinations — plausible-sounding but wrong citations, structures, and derivations — prompted its rapid withdrawal.

The core problem was hallucination: confident, fluent scientific-sounding outputs that were factually wrong, with no uncertainty signal.

3. DeepMind's GraphCast outperformed ECMWF numerical weather prediction on what fraction of prediction targets?

Correct. GraphCast outperformed ECMWF on 90% of 1,380 prediction targets while generating forecasts in under a minute.

GraphCast beat ECMWF on 90% of 1,380 targets — a striking demonstration of data-driven models matching or exceeding physics-based numerical simulation.

4. "Homogenization risk" in the context of scientific foundation models refers to:

Correct. When a single foundation model guides a field, labs may inadvertently converge on the same hypotheses, reducing the exploratory diversity that generates unexpected discoveries.

Homogenization risk is about scientific monoculture: universal adoption of one model may cause all labs to explore the same corners of idea space.

Lab 2: Evaluating a Foundation Model for Your Field

Practice session · AI research assistant

Objective

You're advising your institution's research computing committee on whether to adopt a major scientific foundation model (ESM-2, AlphaFold 3, or a domain-relevant alternative) as a shared infrastructure for your field. Work with the AI assistant to identify the model's strengths, known failure modes, data biases, and the institutional risks of widespread adoption.

Name a scientific foundation model you've heard of (or ask for a recommendation), then work through: What does it do well? Where does it fail? What biases exist in its training data? And what happens to scientific culture if everyone uses it?

Module 8 · Lesson 3

Human–AI Research Teams: Collaboration, Credit, and the Division of Scientific Labor

How AI is reshaping who does what in research — and the emerging ethics of credit, accountability, and scientific authorship.

When an AI system generates a hypothesis that leads to a Nobel Prize, who gets the credit — and who is responsible when it leads to a retraction?

One half of the 2024 Nobel Prize in Chemistry went to Demis Hassabis and John Jumper of Google DeepMind for AlphaFold (the other half went to David Baker for computational protein design). It was the first time a Nobel Prize explicitly recognized an AI system as central to a scientific breakthrough. The citation noted that AlphaFold "solved a 50-year-old problem." The prize went to the system's architects, not to the scientists who used it. This raised an immediate question across the research community: what is the correct unit of scientific credit when the most important tool is also intelligent?

The Emerging Division of Labor

In contemporary AI-augmented research, labor is being redistributed along a rough hierarchy. AI systems now routinely handle: literature synthesis (scanning and summarizing thousands of papers), hypothesis generation (proposing candidate mechanisms from data patterns), data analysis (running statistical models, identifying outliers), code generation (writing analysis pipelines), and draft writing (generating manuscript sections from structured inputs).

Humans retain responsibility for: problem selection (deciding what is worth studying), experimental judgment (knowing when a result is suspicious), ethical navigation (recognizing dual-use risks, consent issues, equity implications), and accountability (standing behind published claims).

Documented Case

In 2023, two lawyers in the Southern District of New York submitted a brief containing citations to six court cases that did not exist — generated by ChatGPT and accepted without verification. The lawyers were sanctioned. The event established a legal precedent: AI-generated errors are the professional responsibility of the human who submits them. Science is moving toward the same principle, but formal policies lag.

Authorship: What Journals Are Doing

By 2024, virtually all major journals had issued AI authorship policies. The consensus position: AI cannot be listed as an author because authorship implies accountability — the ability to stand behind claims, respond to correspondence, and retract work if errors are found. AI systems cannot do any of these things.

Nature requires disclosure of any AI use in the research process. Science prohibits AI-generated text in submitted manuscripts unless authors explicitly declare and justify it. PLOS ONE allows AI tools for language editing but not for generating scientific content. These policies are evolving rapidly and inconsistently.

A parallel debate concerns reproducibility: if a paper was written partly by GPT-4, but GPT-4 is updated between submission and peer review, can reviewers reproduce the AI's contribution? The version of the model used becomes a methodological detail as important as the version of a statistical package.
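One lightweight way to treat the model version as a methodological detail is to log it alongside each AI-assisted step. The sketch below is a suggestion, not a journal requirement: the field names, the `record_ai_use` helper, and the example values are all hypothetical, and hashing the prompt (rather than storing it) is just one possible design choice.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AIUsageRecord:
    model_name: str
    model_version: str   # exact version or snapshot identifier, as reported by the provider
    used_on: str         # ISO 8601 date
    purpose: str         # what the tool was used for
    prompt_sha256: str   # fingerprint of the exact prompt text


def record_ai_use(model_name, model_version, used_on, purpose, prompt):
    """Build a record whose prompt hash lets others confirm which prompt was used."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return AIUsageRecord(model_name, model_version, used_on, purpose, digest)


# Placeholder values for illustration only.
record = record_ai_use(
    model_name="example-llm",
    model_version="2024-01-snapshot",
    used_on="2024-03-01",
    purpose="first draft of methods section",
    prompt="Summarize the protocol below in 150 words.",
)
print(json.dumps(asdict(record), indent=2))
```

A record like this can be archived with the manuscript's supplementary material in the same way as a software environment file.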

January 2023: Science and Nature both issue initial AI authorship policies prohibiting AI as a listed author.

May to June 2023: ChatGPT hallucination case surfaces in U.S. federal court; the lawyers are sanctioned in June, establishing human accountability for AI outputs.

October 2023: COPE (Committee on Publication Ethics) publishes formal guidelines requiring disclosure of AI tool use in research and writing.

October 2024: Nobel Prize in Chemistry awarded to AlphaFold architects, prompting open debate about scientific credit in the AI era.

Skill Atrophy and the Expertise Problem

A subtler risk of deep AI integration is skill atrophy. If junior researchers never learn to manually identify outliers, write analysis code, or synthesize literature because AI does it automatically, they may lack the judgment to recognize when the AI is wrong. This is the aviation analogy: autopilot dependency has been implicated in accidents where pilots could not manually handle situations outside the automation's design envelope.

Several leading research institutions — including MIT and the Broad Institute — have begun requiring graduate students to demonstrate core computational skills without AI assistance, precisely to prevent this outcome. The goal is not to avoid AI but to ensure researchers can critically evaluate AI outputs, which requires understanding the underlying process.

The Key Principle

AI in research is a tool with judgment, not just a tool with speed. This means the human scientist must maintain enough expertise to interrogate AI outputs — to ask "how do I know this is right?" rather than "how do I use this output?" The 2024 Nobel acknowledged AI's power; it also implicitly underscored that the humans who understand what the tool is doing remain irreplaceable.

Key Terms

Scientific authorship: The formal attribution of responsibility for a research work; requires the ability to defend, correct, and retract, and is currently withheld from AI systems by all major journals.

Skill atrophy: The erosion of expert human capabilities due to over-reliance on automated tools, reducing the ability to critically evaluate those tools' outputs.

Reproducibility (AI context): The challenge of reproducing AI-assisted research when the model version, prompts, or parameters used are not fully documented.

Lesson 3 Quiz

Human–AI Research Teams & Scientific Credit

1. The 2024 Nobel Prize in Chemistry recognized AlphaFold. Who received the award?

Correct. The Nobel went to AlphaFold's architects (Hassabis and Jumper) — not to the AI system itself and not to its users.

The Nobel went to Demis Hassabis and John Jumper — the architects of AlphaFold. AI systems cannot receive Nobel Prizes; the humans who built the system were credited.

2. Why do major journals prohibit listing AI as a co-author?

Correct. Authorship is not about contribution — it's about accountability. AI cannot respond to correspondence, defend claims, or retract papers.

The core issue is accountability. Authorship implies the ability to stand behind and be responsible for scientific claims — AI systems cannot do this.

3. The 2023 U.S. federal court case involving ChatGPT-generated legal citations established what principle for science?

Correct. The lawyers were sanctioned precisely because they submitted AI outputs without verification — establishing that human professionals are accountable for AI errors in their work.

The sanctioning of the lawyers established clear precedent: the human professional who submits AI-generated content bears full responsibility for its accuracy.

4. "Skill atrophy" in AI-augmented research refers to:

Correct. If researchers never practice the underlying skills, they lose the ability to recognize when the AI is wrong — a critical failure mode analogous to autopilot dependency in aviation.

Skill atrophy means human researchers lose core competencies through automation — and then can't critically evaluate the AI that replaced those skills. MIT and the Broad Institute have programs specifically to prevent this.

Lab 3: Drafting an AI Use Policy for Your Lab

Practice session · AI research assistant

Objective

Your PI has asked you to draft a lab-level AI use policy that covers: which research tasks AI tools may assist with, disclosure requirements for papers, authorship guidelines, and how to handle AI-generated errors discovered after submission. Work with the AI assistant to build this policy document step by step.

Start by describing your research context (field, lab size, typical output types — papers, datasets, code). Then ask the assistant to help you draft each section of the policy, pushing back on any clauses that seem too restrictive or too permissive.

Module 8 · Lesson 4

Equity, Access, and the Future Architecture of Science

Who benefits from AI-accelerated research — and who is systematically excluded as compute and data concentrate in a few institutions.

If the most powerful AI research tools require $100 million in compute infrastructure, what happens to science at universities in low-income countries — and to the scientific questions that only those communities would think to ask?

The Lacuna Fund — a consortium funded by the Rockefeller Foundation and others — was created to address a specific problem: the training data that powers global AI does not include most of the world. Medical imaging datasets from sub-Saharan Africa were effectively absent from the models being deployed in African hospitals. Dermatology models trained on predominantly light-skinned images misclassified skin conditions in darker-skinned patients at twice the rate. The Fund began commissioning labeled datasets from underrepresented populations — but the underlying infrastructure gap remained.

The Compute Concentration Problem

Training a frontier AI model requires resources available to perhaps a dozen organizations worldwide. GPT-4's training run cost an estimated $50–100 million. The compute required to train AlphaFold 2 from scratch was substantial enough that replication is beyond most academic labs. This is a structural departure from previous phases of scientific computing, where university clusters were meaningfully competitive with industry.

The practical consequence: the most capable AI research tools are either proprietary (available as APIs with access fees) or require institutional cloud computing budgets that most universities in the Global South cannot sustain. A researcher at the University of Lagos has fundamentally different access to AI-accelerated research than one at MIT — not because of intellectual capacity, but because of compute economics.

$50M+ estimated training cost for GPT-4, a compute threshold inaccessible to virtually all academic labs globally

2× higher misclassification rate for dermatology AI on darker skin tones, due to underrepresentation in training data

The Data Representation Gap

AI systems inherit the biases of their training data. In biomedical research, this is well-documented: genome-wide association studies (GWAS) have historically been conducted predominantly on populations of European ancestry. By 2016, over 80% of GWAS participants were of European descent, meaning polygenic risk scores and AI-driven genomic models had poor transferability to African, South Asian, and East Asian populations.

The H3Africa (Human Heredity and Health in Africa) initiative was established specifically to build African genomic datasets. By 2023, it had enrolled over 50,000 participants and demonstrated that variants discovered in African populations explained disease risk that European-ancestry GWAS entirely missed — because the variants were rare or absent in European populations. This is scientific knowledge that would not exist without deliberately inclusive data collection.

Similar gaps exist in environmental monitoring (AI climate models trained on data-rich regions), agricultural AI (crop models trained on temperate zones applied to tropical farming), and linguistic science (language models dramatically underperforming on African and Indigenous languages).
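Gaps like these only become visible when model performance is reported per subgroup rather than as a single aggregate. A minimal sketch of such an audit follows; the group names and records are synthetic and invented purely for illustration, not drawn from any study.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Compute the misclassification rate separately for each group.

    `records` is an iterable of (group, true_label, predicted_label) tuples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, true_label, predicted_label in records:
        totals[group] += 1
        if true_label != predicted_label:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Synthetic records for illustration only (not real study data).
records = (
    [("group_a", 1, 1)] * 9 + [("group_a", 1, 0)] * 1
    + [("group_b", 1, 1)] * 8 + [("group_b", 1, 0)] * 2
)

rates = error_rate_by_group(records)
overall = sum(1 for _, t, p in records if t != p) / len(records)
print(f"overall error rate: {overall:.2f}")
for group, rate in rates.items():
    print(f"{group}: {rate:.2f}")
```

In this toy data the overall error rate of 0.15 hides the fact that one group's error rate is double the other's.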

Concrete Initiative

The African Institute for Mathematical Sciences (AIMS) and the Deep Learning Indaba — a community-driven conference now drawing over 500 African ML researchers — represent grassroots efforts to build AI research capacity without waiting for compute parity. The Indaba has explicitly prioritized research on African languages, climate adaptation, and health equity, producing work that would not emerge from a Silicon Valley lab.

Structural Proposals and Their Limits

Several structural responses have been proposed. Open-weight model release (Meta's LLaMA series, Mistral) reduces inference costs but does not solve fine-tuning costs for genuinely resource-constrained labs. AI compute grants from NSF and cloud providers (AWS, Google Cloud for researchers) are meaningful at the margin but not at scale. Federated learning — training on distributed data without centralizing it — offers a path for privacy-preserving collaboration across institutions but requires coordination infrastructure.
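The federated idea can be illustrated with a sample-weighted averaging scheme in the style of federated averaging. This is a toy under stated assumptions: two hypothetical sites, synthetic data from the line y = 2x + 1, a two-parameter linear model, and no networking, encryption, or secure aggregation, all of which a real deployment would need.

```python
def local_update(weights, data, lr=0.1, steps=5):
    """Run a few gradient steps on one site's private (x, y) pairs for y = w*x + b."""
    w, b = weights
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(client_updates):
    """Average client weights, weighting each client by its number of samples."""
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [sum(weights[i] * n for weights, n in client_updates) / total for i in range(dim)]


def federated_round(global_weights, client_datasets):
    # Only weights and sample counts leave each site; the raw data stays local.
    updates = [(local_update(global_weights, data), len(data)) for data in client_datasets]
    return federated_average(updates)


# Two hypothetical sites holding synthetic data drawn from y = 2x + 1.
site_a = [(0.0, 1.0), (0.2, 1.4), (0.4, 1.8)]
site_b = [(0.6, 2.2), (0.8, 2.6), (1.0, 3.0)]

weights = [0.0, 0.0]
for _ in range(200):
    weights = federated_round(weights, [site_a, site_b])
print(f"w = {weights[0]:.3f}, b = {weights[1]:.3f}")
```

The coordination cost mentioned above shows up even in the toy: someone has to run `federated_round`, agree on the model shape, and schedule the sites, which is infrastructure rather than algorithm.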

The deeper structural issue is that the scientific questions that matter most for underrepresented populations — tropical disease mechanisms, drought-resilient crop genetics, informal-settlement health patterns — are not the questions that maximize returns for the compute-heavy organizations that drive AI research. Market incentives and scientific need are misaligned.

Long-Term Stakes

If AI accelerates science primarily for well-resourced institutions, the gap between what is known and what is acted on for different populations will widen — not because of lack of effort, but because the infrastructure of knowledge production itself has become unequal. The most important design choices for the next decade of AI in science may not be algorithmic — they may be about who controls the infrastructure, who owns the data, and which scientific questions the field chooses to fund.

Key Terms

Compute concentration: The consolidation of AI training capacity in a small number of institutions, creating structural inequality in who can develop frontier research tools.

Training data bias: Systematic underrepresentation of certain populations or conditions in the data used to train AI models, causing those models to perform worse for underrepresented groups.

Federated learning: A machine learning approach where models are trained across multiple decentralized devices or servers holding local data, without exchanging the raw data itself.

Polygenic risk score: A number summarizing an individual's genetic predisposition to a trait or disease, derived from GWAS; known to be less accurate in populations underrepresented in those studies.
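In its simplest form, a polygenic risk score is a weighted sum: each variant's GWAS effect size multiplied by the number of copies of the risk allele a person carries (0, 1, or 2). The sketch below assumes that basic additive form; the variant names and effect sizes are hypothetical placeholders, not real GWAS results.

```python
def polygenic_risk_score(effect_sizes, dosages):
    """Sum of effect size times allele dosage over the variants in the GWAS table.

    Variants a person carries that are missing from `effect_sizes`
    contribute nothing to the score.
    """
    return sum(beta * dosages.get(variant, 0) for variant, beta in effect_sizes.items())


# Hypothetical effect sizes from a GWAS (placeholders, not real values).
effect_sizes = {"variant_a": 0.12, "variant_b": -0.05, "variant_c": 0.30}

# One person's allele counts; variant_d was never included in the GWAS table.
dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0, "variant_d": 2}

score = polygenic_risk_score(effect_sizes, dosages)
print(f"polygenic risk score: {score:.2f}")
```

The ignored `variant_d` is the transferability problem in miniature: a variant that is rare or absent in the study population never receives a weight, so it cannot inform the score of the people who do carry it.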

Lesson 4 Quiz

Equity, Access & the Future Architecture of Science

1. The H3Africa initiative was established primarily to:

Correct. H3Africa was created specifically to address the fact that over 80% of GWAS participants historically were of European ancestry, limiting the applicability of genomic AI models for African populations.

H3Africa specifically targets the genomic data gap — the vast majority of GWAS historically enrolled European-ancestry participants, making genomic models poorly transferable to African populations.

2. By 2016, approximately what percentage of genome-wide association study (GWAS) participants were of European ancestry?

Correct. Over 80% of GWAS participants were of European ancestry by 2016, creating massive gaps in the applicability of AI-driven genomic models for other populations.

The figure was over 80% European-ancestry participants — a concentration that directly affects the accuracy of AI-driven genomic models for anyone outside that demographic.

3. The Deep Learning Indaba represents:

Correct. The Deep Learning Indaba is a grassroots effort — now drawing over 500 African ML researchers — prioritizing African languages, climate, and health equity research that would not emerge from resource-rich institutions.

The Deep Learning Indaba is a community-driven conference — not corporate, not regulatory — building local AI research capacity and focusing on questions that matter for African contexts.

4. "Federated learning" addresses the equity gap by:

Correct. Federated learning allows training on data that never leaves local institutions — enabling diverse collaborations without requiring data centralization or matching the compute of large organizations.

Federated learning is a technical approach: train models where the data lives, without moving it. This enables collaboration across institutions with different data governance rules and reduces the need for centralized, expensive infrastructure.

Lab 4: Designing an Equitable AI Research Initiative

Practice session · AI research assistant

Objective

You've been tasked with designing a proposal for an AI-augmented research initiative that specifically addresses a scientific question underserved by current AI tools — due to data gaps, compute barriers, or misaligned incentives. Work with the AI assistant to identify the problem, propose a research design, and anticipate the equity challenges your initiative will face.

Describe a scientific question or health/environmental problem that you think is systematically underfunded or underserved by current AI research tools. Work with the assistant to design an initiative that addresses both the science and the structural barriers — data collection, compute access, local partnership, and publication equity.

AI Research Assistant

Equitable Research Design

Welcome. I'll help you design a research initiative that addresses both a scientific gap and the structural inequities that created it. Start by describing a problem or scientific question you believe is systematically underserved by current AI-driven research — whether because of missing training data, compute barriers, or misaligned funding incentives. We'll then work through research design, data strategy, partnerships, and the equity dimensions together.

Module 8 Test

The Future of AI-Augmented Research · 15 questions · Pass mark: 80%

1. A "closed-loop laboratory" is defined by:

Correct. The defining feature is continuous cycling — results automatically determine the next experiment.

A closed-loop lab is one where AI analysis directly and automatically determines the next experimental step — no human sign-off required between cycles.
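The control flow of such a loop can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any real platform: run_experiment is a simulated instrument with a made-up response curve, and propose_next uses simple hill climbing as a stand-in for the Bayesian optimization or reinforcement-learning agents used in practice.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Result = Tuple[float, float]  # (temperature, measured yield)


def run_experiment(temperature: float) -> float:
    """Simulated instrument (hypothetical): yield peaks at a temperature of 70."""
    return 100.0 - (temperature - 70.0) ** 2 / 10.0


def propose_next(history: List[Result], step: float = 5.0) -> Optional[float]:
    """Choose the next condition from prior results (hill climbing, a stand-in)."""
    best_t, _ = max(history, key=lambda r: r[1])
    tried = {t for t, _ in history}
    for candidate in (best_t + step, best_t - step):
        if candidate not in tried:
            return candidate
    return None  # both neighbours of the best point are already measured


def closed_loop(start: float, max_cycles: int = 50) -> List[Result]:
    queue: Deque[float] = deque([start])
    history: List[Result] = []
    while queue and len(history) < max_cycles:
        t = queue.popleft()
        history.append((t, run_experiment(t)))
        nxt = propose_next(history)
        if nxt is not None:
            queue.append(nxt)  # analysis output feeds the task queue directly
    return history


history = closed_loop(start=40.0)
best = max(history, key=lambda r: r[1])
print(len(history), best)
```

No step in closed_loop waits for a person: each result is analyzed and the next task is queued in the same cycle, which is the property the question is testing.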

2. The University of Liverpool's 2020 mobile robot chemist discovered a photocatalyst that was approximately how much better than the starting point?

Correct. The robot found a photocatalyst for hydrogen production approximately 6× better than the initial formulation after 688 autonomous experiments.

The Liverpool robot achieved a ~6× improvement in photocatalyst performance — a result that drove home the practical value of autonomous experimentation.

3. In Bayesian optimization, a "surrogate model" is:

Correct. The surrogate (typically a Gaussian process) approximates the experimental landscape, allowing the acquisition function to identify promising next experiments without running them.

A surrogate model is the fast, approximate predictor at the heart of Bayesian optimization — it estimates outcomes across the experimental space so the system can decide where to look next.
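As a rough sketch of how the surrogate and the acquisition function fit together, the example below fits a tiny one-dimensional Gaussian process and picks the next experiment with an upper-confidence-bound rule. The kernel length scale, the noise term, the kappa weight, the three measured points, and all function names are assumptions chosen for illustration. A real system would use a tested library rather than a hand-written solver.

```python
import math
from typing import List, Tuple


def rbf(a: float, b: float, length: float = 0.2) -> float:
    """Squared-exponential kernel: nearby conditions get similar predictions."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_predict(xs: List[float], ys: List[float], x_new: float,
               noise: float = 1e-6) -> Tuple[float, float]:
    """Surrogate: posterior mean and standard deviation at an untested condition."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(x_new, a) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))


def next_experiment(xs: List[float], ys: List[float],
                    candidates: List[float], kappa: float = 2.0) -> float:
    """Acquisition: upper confidence bound, trading predicted value against uncertainty."""
    def ucb(x: float) -> float:
        mean, std = gp_predict(xs, ys, x)
        return mean + kappa * std
    return max(candidates, key=ucb)


xs = [0.0, 0.5, 1.0]   # conditions already tested (hypothetical)
ys = [0.1, 0.8, 0.3]   # measured outcomes (hypothetical)
candidates = [i / 20 for i in range(21)]

mean_seen, std_seen = gp_predict(xs, ys, 0.5)
mean_far, std_far = gp_predict(xs, ys, 0.25)
suggestion = next_experiment(xs, ys, candidates)
print(suggestion)
```

At a condition that has already been measured the surrogate reproduces the observation with almost no uncertainty, while between measurements the uncertainty grows. The acquisition rule uses that uncertainty to steer the next experiment toward conditions that are promising or still unexplored, without running any of the candidates.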

4. The A-Lab at Lawrence Berkeley National Laboratory achieved what result in 17 days of autonomous synthesis?

Correct. The A-Lab achieved a ~71% success rate on novel inorganic compound synthesis in 17 days — with zero human intervention in synthesis decisions.

The A-Lab synthesized 41 of 58 target compounds (about 71%) in 17 days with no human synthesis decisions, a landmark in autonomous materials discovery.

5. A "foundation model" in science differs from a task-specific model primarily because:

Correct. The key property is broad pre-training plus adaptability — the upstream training cost is amortized across thousands of downstream applications.

Foundation models are defined by broad pre-training and adaptability — one model, many applications, with knowledge transferring across domains.

6. Meta AI's Galactica model was withdrawn from public access primarily because:

Correct. Galactica's hallucinations — plausible citations, structures, and derivations that were wrong — prompted its withdrawal within three days of release.

Confident hallucination with no uncertainty signal was the core problem: wrong facts delivered in authoritative scientific prose.

7. DeepMind's GraphCast generated a 10-day global weather forecast in approximately:

Correct. GraphCast runs a full 10-day global forecast in under a minute on a single TPU — versus hours of supercomputing for traditional numerical models.

GraphCast completes a 10-day global forecast in under one minute on a single TPU — a dramatic speed advantage over physics-based numerical weather prediction.

8. "Homogenization risk" in scientific AI refers to:

Correct. When a single model shapes the starting point for an entire field, it can reduce the diversity of hypotheses explored, narrowing the space of potential discoveries.

Homogenization risk is about scientific monoculture: universal use of a single model may cause a whole field to explore the same corners of hypothesis space, missing discoveries that require unusual starting points.

9. The 2024 Nobel Prize in Chemistry's recognition of AlphaFold established what precedent regarding scientific credit?

Correct. The Nobel going to AlphaFold's architects, not its users, prompted open debate about where scientific credit belongs when the tool is itself intelligent and transformative.

The AlphaFold share of the prize went to the people who built it, Demis Hassabis and John Jumper, not the tens of thousands who used it. This raised unresolved questions about credit in an era where tools do substantial intellectual work.

10. Why do all major journals prohibit listing AI as a scientific author?

Correct. Authorship is about responsibility, not just contribution. AI cannot be held accountable, respond to correspondence, or retract erroneous work.

The principle is accountability: being an author means being responsible for the work's accuracy and integrity. AI systems cannot fulfill this role.

11. "Skill atrophy" in AI-augmented research most closely parallels what aviation phenomenon?

Correct. Autopilot dependency — where pilots can no longer effectively fly manually in situations the automation wasn't designed for — is the direct analogue to researcher skill atrophy through AI over-reliance.

Autopilot dependency is the direct analogy: when automation handles routine tasks, humans lose the manual skills needed for edge cases, which is exactly when those skills matter most.

12. The Lacuna Fund was created to address:

Correct. The Lacuna Fund commissions labeled datasets from underrepresented populations — addressing the training data gaps that cause AI models to fail for large portions of the global population.

The Lacuna Fund specifically targets missing training data for underrepresented populations — for example, the absence of African medical imaging data from models deployed in African hospitals.

13. By 2016, what fraction of genome-wide association study participants were of European ancestry — creating a bias problem for genomic AI?

Correct. Over 80% of GWAS participants were of European ancestry, meaning polygenic risk scores and genomic AI models were poorly calibrated for the majority of the world's population.

Over 80% of GWAS participants were of European ancestry, a concentration that directly limits the accuracy of AI-driven genomic tools for most of the world's population.

14. Federated learning addresses equity concerns in AI research by:

Correct. Federated learning keeps data local — training the model on distributed nodes without centralizing raw data — enabling privacy-preserving collaboration across institutions with different governance rules.

Federated learning trains models where the data already lives, without requiring centralization. This enables collaboration across institutions with different data governance, privacy laws, and compute resources.

15. The deepest structural problem with AI and research equity identified in this module is:

Correct. The structural misalignment — between who profits from AI research and which scientific questions most need answering — is the deepest challenge. It's not solved by better algorithms alone.

The deepest problem is incentive misalignment: the organizations with AI infrastructure invest in questions that return profit, while the scientific questions most urgent for underrepresented populations — tropical diseases, drought-resilient crops, informal-settlement health — are systematically underfunded.