Module 3 · Lesson 1

The Traditional Drug Pipeline and Why It Breaks

From Bench to Bedside — A Decades-Long Obstacle Course

Why does finding a single new medicine take over a decade and cost more than a billion dollars — and where does AI change the math?

In 2012, Merck Research Laboratories published an internal analysis showing that the average cost to bring a single new molecular entity to market had crossed $1.4 billion in fully capitalized expenditure. The company had screened more than three million compounds over a decade targeting a single metabolic pathway. One drug advanced to Phase III. It failed.

The story was not unusual. Across the industry, nine out of every ten drug candidates that entered human trials never reached patients. The traditional pipeline was not broken — it was working exactly as designed, filtering at enormous cost.

The Classic Drug Discovery Pipeline

Before AI entered the picture, pharmaceutical discovery followed a largely fixed sequence developed in the mid-twentieth century. Each stage acts as a filter, but the filters are expensive and slow.

Target Identification

Researchers identify a biological molecule — typically a protein — whose activity is implicated in a disease. This step relies heavily on genomic data, literature review, and animal models. It can take 2–4 years and still yield the wrong target.

Hit Discovery / High-Throughput Screening

Chemical libraries of hundreds of thousands to millions of compounds are tested against the target in automated assays. A "hit" is any compound showing measurable activity. Hit rates typically run 0.01–0.1%. The equipment costs tens of millions of dollars to operate annually.

Lead Optimization

Medicinal chemists iteratively synthesize analogues of promising hits, modifying functional groups to improve potency, selectivity, and early ADMET properties (absorption, distribution, metabolism, excretion, toxicity). Each synthesis-test cycle takes days to weeks.

Preclinical Studies

Lead candidates enter animal toxicology and pharmacology studies. The FDA requires extensive in vitro and in vivo safety data before any human exposure. Roughly 40% of failures at this stage trace back to ADMET problems that were not predicted earlier.

Clinical Trials (Phase I–III)

Phase I: safety in ~20–80 healthy volunteers. Phase II: efficacy signals in ~100–300 patients. Phase III: large-scale efficacy and safety in thousands of patients. Each phase takes 1–4 years. Phase III alone costs $200–$900 million.

Regulatory Review and Approval

Regulators such as the FDA (US) and EMA (EU) review submission dossiers that can exceed 100,000 pages. At the FDA, standard review takes roughly 10–12 months and priority review roughly 6 months. Total time from target identification to first prescription: 10–15 years.

Where the Money and Time Actually Go

The Tufts Center for the Study of Drug Development estimated the average fully capitalized cost of a new drug approval at $2.6 billion in 2013 dollars (DiMasi et al., 2016). Most of that figure comes not from the successful drugs themselves but from the cost of failure: the capital invested in compounds that never made it to patients.

Three failure modes account for most attrition: lack of clinical efficacy (~45% of failures), unacceptable safety or toxicity (~30%), and commercial or strategic decisions (~25%). Crucially, many efficacy and toxicity failures could theoretically be predicted earlier — if we had better computational tools.

Key Statistic

The probability that a compound entering Phase I clinical trials eventually receives FDA approval is approximately 9.6% (BIO, Biomedtracker and Amplion, 2016). For oncology specifically, it drops to around 5.1%. In other words, for every approved drug, roughly nine other candidates that entered Phase I failed along the way; in oncology, roughly nineteen did.
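A short calculation makes the cost-of-failure structure concrete. The sketch below chains per-stage success rates into an overall probability of approval, then divides a cohort's total clinical spend by its expected number of approvals. The transition rates are illustrative values chosen so that their product lands near the 9.6% figure above, and the per-stage costs are hypothetical round numbers; neither set comes from the studies cited here, and the model ignores the cost of capital that "fully capitalized" estimates include.

```python
# Toy model of clinical attrition and the "cost of failure".
# All numbers below are illustrative assumptions, not published estimates.

# Hypothetical probability of advancing from each stage to the next.
TRANSITIONS = {
    "Phase I": 0.63,
    "Phase II": 0.31,
    "Phase III": 0.58,
    "Regulatory review": 0.85,
}

# Hypothetical out-of-pocket cost per candidate in each stage (USD millions).
STAGE_COST = {
    "Phase I": 25.0,
    "Phase II": 60.0,
    "Phase III": 250.0,
    "Regulatory review": 5.0,
}


def likelihood_of_approval(transitions: dict[str, float]) -> float:
    """Chain stage-by-stage success rates into P(approval | Phase I entry)."""
    p = 1.0
    for rate in transitions.values():
        p *= rate
    return p


def cost_per_approval(transitions: dict[str, float],
                      stage_cost: dict[str, float],
                      entrants: int = 100) -> float:
    """Total spend on a cohort of Phase I entrants divided by expected approvals."""
    remaining = float(entrants)
    total_spend = 0.0
    for stage, rate in transitions.items():
        total_spend += remaining * stage_cost[stage]  # every candidate entering a stage is paid for
        remaining *= rate                             # only the survivors advance
    return total_spend / remaining                    # `remaining` now equals expected approvals


p = likelihood_of_approval(TRANSITIONS)
print(f"P(approval | Phase I entry): {p:.1%}")
print(f"Failed candidates per approval: {(1 - p) / p:.1f}")
print(f"Spend per approval: ${cost_per_approval(TRANSITIONS, STAGE_COST):,.0f}M (no cost of capital)")
```

The key line is the final division in `cost_per_approval`: the whole cohort's spend is charged to the few survivors, which is why failures dominate the per-approval figure.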

The Chemical Space Problem

The universe of drug-like molecules — those with roughly the right size and properties to function as medicines — is estimated at between 10²³ and 10⁶⁰ compounds. The largest physical chemical libraries ever assembled contain perhaps 10 million compounds. High-throughput screening cannot explore more than a tiny fraction of this space.

This is the core mathematical problem that AI is positioned to address. Machine learning models trained on known molecule–activity relationships can, in principle, predict which regions of chemical space are worth exploring before a single compound is purchased or synthesized, or a single assay is run.
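The scale mismatch described above is easiest to see in logarithms. This sketch compares a 10-million-compound library with the low and high ends of the drug-like chemical space estimate, and asks how long exhaustive screening would take at a hypothetical (and very optimistic) throughput of one million assays per day.

```python
import math

LIBRARY_SIZE = 10**7                # largest physical libraries (estimate quoted above)
SPACE_ESTIMATES = (10**23, 10**60)  # low and high estimates of drug-like chemical space
ASSAYS_PER_DAY = 10**6              # hypothetical, very optimistic HTS throughput


def log10_fraction_screened(library_size: int, space_size: int) -> float:
    """log10 of the share of chemical space a full library screen would sample."""
    return math.log10(library_size) - math.log10(space_size)


def years_to_screen(space_size: int, assays_per_day: int) -> float:
    """Years needed to test every molecule once at a fixed daily throughput."""
    return space_size / assays_per_day / 365.25


for space in SPACE_ESTIMATES:
    print(
        f"space 1e{math.log10(space):.0f}: a full library screen covers "
        f"1e{log10_fraction_screened(LIBRARY_SIZE, space):.0f} of it; "
        f"testing everything would take {years_to_screen(space, ASSAYS_PER_DAY):.1e} years"
    )
```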

Key Terms

ADMET: Absorption, Distribution, Metabolism, Excretion, and Toxicity, the pharmacokinetic and safety properties that determine whether a drug can survive the human body long enough to work without causing harm.

Lead Compound: A molecule with confirmed activity at the target and preliminary ADMET properties good enough to justify further optimization.

HTS (High-Throughput Screening): Automated robotic testing of large chemical libraries against biological targets, typically using fluorescence or luminescence readouts.

Chemical Space: The theoretical universe of all possible molecular structures. Estimates for the drug-like subset alone range from 10²³ to 10⁶⁰ molecules, of which only a minuscule fraction has ever been synthesized.

Attrition Rate: The fraction of drug candidates that fail at any given pipeline stage. High attrition is the primary driver of drug development cost.

Why This Matters for AI

Every inefficiency in the traditional pipeline (the brute-force screening, the iterative synthesis cycles, the late-stage toxicity surprises) represents a task that predictive machine learning could accelerate or replace. The lessons ahead map AI tools onto each of these bottlenecks. Understanding why the traditional pipeline is slow is a prerequisite for understanding why AI methods are transformative.

Lesson 1 Quiz

The Traditional Drug Pipeline

Three questions · Select the best answer

1. According to the Tufts Center for the Study of Drug Development (DiMasi et al., 2016), what is the approximate fully capitalized average cost to bring a new drug to market?

Correct. DiMasi et al. (2016) estimated the average fully capitalized cost at $2.6 billion in 2013 dollars, accounting for the cost of failed compounds that never reached approval.

Not quite. The Tufts estimate was $2.6 billion in 2013 dollars, a figure that includes the capitalized cost of failed compounds across the portfolio.

2. What is the primary reason the estimated cost of drug development is so high, even relative to the drugs that do succeed?

Correct. The massive attrition rate means that the cost of all the failed compounds must be recovered through the pricing of the small fraction of drugs that succeed. This "cost of failure" structure is the central economic challenge of drug development.

Not quite. The dominant cost driver is the capitalized investment in the many compounds that fail — roughly 90% of Phase I entrants never reach approval.

3. Which of the following best describes the "chemical space problem" in drug discovery?

Correct. The estimated 10²³–10⁶⁰ drug-like molecules vastly exceeds the largest HTS libraries (~10 million compounds), meaning experimental screening can only sample an infinitesimal fraction of potentially useful chemical space.

Not quite. The chemical space problem refers to the practical impossibility of experimentally screening more than a vanishingly small fraction of the estimated 10²³–10⁶⁰ drug-like molecules that could theoretically exist.

Lesson 1 Lab

Pipeline Economics Advisor

AI-assisted discussion · minimum 3 exchanges to complete

Lab Scenario

You are advising a biotech startup that wants to develop a first-in-class small molecule inhibitor for a novel neurological target. The founders have a molecular biology background but limited drug development experience. They have asked you to explain the traditional pipeline and help them think about where AI could reduce their time and capital risk.

Start by asking the AI advisor: "Where in the traditional drug discovery pipeline do most drugs fail, and what does that mean for our capital allocation?"

Drug Pipeline Advisor

AI Lab

Welcome. I'm your drug discovery pipeline advisor. Ask me anything about the stages of drug development, failure rates, costs, or where AI tools are beginning to change the economics. What would you like to explore?

Module 3 · Lesson 2

Protein Structure Prediction and AlphaFold

Solving the 50-Year-Old Problem That Unlocked Modern Drug Design

How did a DeepMind AI system solve in months what structural biologists had spent five decades trying to crack — and what does it mean for finding drugs?

Every two years since 1994, the structural biology community has held a competition called CASP — Critical Assessment of Protein Structure Prediction. Participating teams receive amino acid sequences and compete to predict the three-dimensional shape of the resulting protein. For two decades, progress was incremental. Typical prediction accuracy, measured by a score called GDT (Global Distance Test), hovered around 40–50 for difficult targets.

At CASP14 in November 2020, DeepMind's AlphaFold2 submitted predictions with a median GDT score of 92.4 across all targets, competitive with experimentally determined structures. The results were so far above competitors that several judges initially questioned whether the system had somehow accessed experimental data. It had not. It had learned the mapping from sequence to structure by training on experimentally solved structures and evolutionary sequence data.

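John Moult, one of CASP's founders, described it as "a stunning advance" that solved a problem the scientific community had worked on for fifty years. In July 2021, DeepMind and EMBL-EBI launched the AlphaFold Protein Structure Database as a free public resource, starting with the human proteome and those of 20 other organisms; by July 2022 it had grown to more than 200 million predicted structures, covering nearly every protein catalogued in UniProt.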
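The GDT metric behind these CASP scores is straightforward to compute once two structures are aligned. The sketch below implements the common GDT_TS form: the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of C-alpha atoms lying within the cutoff of their experimental positions. It assumes the model and reference are already superimposed and residue-matched; the official CASP calculation also searches over many superpositions to maximize each percentage, so this simplified version can understate the score. The coordinates are invented for illustration.

```python
import math

CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # angstroms, as in GDT_TS


def gdt_ts(model: list[tuple[float, float, float]],
           reference: list[tuple[float, float, float]]) -> float:
    """GDT_TS (0-100) for pre-superimposed, residue-matched C-alpha coordinates."""
    if len(model) != len(reference) or not model:
        raise ValueError("need equal-length, non-empty coordinate lists")
    deviations = [math.dist(m, r) for m, r in zip(model, reference)]
    # Fraction of residues within each cutoff, then averaged over the four cutoffs.
    fractions = [sum(d <= c for d in deviations) / len(deviations) for c in CUTOFFS]
    return 100.0 * sum(fractions) / len(CUTOFFS)


# Hypothetical 4-residue example with deviations of 0.5, 1.5, 3.0 and 9.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]
print(f"GDT_TS = {gdt_ts(model, reference):.2f}")
```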

The Protein Folding Problem

Proteins are the molecular machinery of life. A protein's function depends critically on its three-dimensional shape: the way a chain of amino acids folds into a precise structure. Most drug targets are proteins: enzymes, receptors, ion channels, transporters. To design a molecule that binds and modulates a protein, you need to know its structure.

Determining protein structure experimentally requires X-ray crystallography, cryo-electron microscopy (cryo-EM), or NMR spectroscopy. Each method takes months to years, costs hundreds of thousands of dollars, and often fails entirely — many proteins simply won't crystallize. As of 2020, structural biologists had solved approximately 170,000 protein structures over fifty years. The human genome alone encodes roughly 20,000 proteins, and the broader proteome including variants and post-translational modifications is far larger.

Anfinsen's dogma, based on Christian Anfinsen's protein refolding experiments and recognized with the 1972 Nobel Prize in Chemistry, states that, under physiological conditions, a protein's three-dimensional structure is determined by its amino acid sequence. The thermodynamically stable fold is encoded in the sequence itself. The challenge was computational: given a sequence of hundreds or thousands of amino acids, finding the energy-minimum three-dimensional arrangement is, in principle, an astronomical search problem.
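A classic back-of-the-envelope argument, usually associated with Cyrus Levinthal, shows how astronomical that search is. The sketch assumes, purely for illustration, that each residue can adopt three backbone conformations and that a chain could sample 10^13 conformations per second; both are conventional textbook simplifications rather than measured values.

```python
import math

# Illustrative textbook assumptions, not measured values.
STATES_PER_RESIDUE = 3        # backbone conformations available to each residue
SAMPLES_PER_SECOND = 1e13     # an extremely fast hypothetical sampling rate
SECONDS_PER_YEAR = 3.156e7
AGE_OF_UNIVERSE_YEARS = 1.38e10


def log10_conformations(n_residues: int, states: int = STATES_PER_RESIDUE) -> float:
    """log10(states ** n_residues), computed without building the huge number."""
    return n_residues * math.log10(states)


def log10_search_years(n_residues: int) -> float:
    """log10 of the years needed to visit every conformation once."""
    return (log10_conformations(n_residues)
            - math.log10(SAMPLES_PER_SECOND)
            - math.log10(SECONDS_PER_YEAR))


for n in (100, 300):
    print(f"{n} residues: ~10^{log10_conformations(n):.0f} conformations, "
          f"~10^{log10_search_years(n):.0f} years to enumerate "
          f"(the universe is ~10^{math.log10(AGE_OF_UNIVERSE_YEARS):.0f} years old)")
```

Real proteins fold in milliseconds to seconds, which is the point of the argument: folding is not a random search, so the sequence must strongly bias the chain toward its native structure. Learned predictors avoid explicit enumeration altogether.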

How AlphaFold2 Works

AlphaFold2 (released 2021, described in Nature by Jumper et al.) uses a transformer-based neural network architecture with several key innovations. The model was trained on experimentally solved structures from the Protein Data Bank, plus evolutionary sequence data from databases of millions of related proteins.

The central insight is that co-evolution encodes contact information. When two amino acid positions in a protein consistently mutate together across thousands of related species, they are likely in physical contact in the folded structure. AlphaFold2's "Evoformer" module processes a multiple sequence alignment of related proteins, extracting these co-evolutionary signals to build a probabilistic map of which residues are spatially close.

A "structure module" then converts this representation into three-dimensional atomic coordinates, refining them over several iterations with a geometry-aware attention mechanism (invariant point attention). The final output includes per-residue confidence scores (pLDDT) that are generally well calibrated: high scores mark regions that are reliably predicted, while very low scores often indicate disorder.
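The co-evolution signal the Evoformer exploits can be illustrated with a classical covariation statistic. The sketch below computes mutual information between columns of a tiny, made-up multiple sequence alignment, so columns that change together score high. This is a deliberately simplified stand-in: AlphaFold2 learns its own representation of these signals rather than computing mutual information, and practical contact-prediction methods add corrections for phylogenetic bias and indirect couplings.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy alignment: 6 homologous sequences, 5 aligned positions.
# Positions 1 and 3 co-vary perfectly: K always pairs with E, R always pairs with D.
MSA = [
    "AKLEG",
    "AKVEG",
    "SRLDG",
    "ARIDG",
    "SKLEG",
    "ARVDG",
]


def column(msa: list[str], i: int) -> list[str]:
    return [seq[i] for seq in msa]


def mutual_information(a: list[str], b: list[str]) -> float:
    """MI (in bits) between two alignment columns, from observed frequencies."""
    n = len(a)
    p_a, p_b, p_ab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (x, y), count in p_ab.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((p_a[x] / n) * (p_b[y] / n)))
    return mi


scores = {
    (i, j): mutual_information(column(MSA, i), column(MSA, j))
    for i, j in combinations(range(len(MSA[0])), 2)
}
for (i, j), mi in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(f"positions {i}-{j}: MI = {mi:.2f} bits")
```

The fully conserved last column carries no information (MI of zero with everything), which is one reason covariation methods need diverse alignments.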

Real-World Impact — 2022–2024

By mid-2022, researchers at the University of California San Francisco used AlphaFold2 structures to identify potential binding sites on previously "undruggable" proteins — targets that had resisted structure-based drug design because no experimental structure existed. The Institute of Cancer Research in London used AlphaFold predictions to design inhibitors for a cancer target within months rather than the years typically required for structure determination alone.

Structure-Based Drug Design

Once a protein structure is known — whether experimentally or through AlphaFold — the binding pocket where a small molecule drug can attach becomes visible. Structure-based drug design (SBDD) uses this three-dimensional shape to guide molecular design.

Computational docking programs predict how candidate molecules would fit into a binding pocket, estimating the binding energy and geometry. This allows chemists to screen millions of virtual molecules before purchasing or synthesizing a single compound. AlphaFold dramatically expanded SBDD by providing structures for the thousands of proteins that had never been crystallized.
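To make "estimating the binding energy and geometry" concrete, the sketch below scores two rigid ligand poses against a handful of pocket atoms with a bare Lennard-Jones 12-6 term and ranks them. Real docking programs use far richer empirical or physics-based scoring functions plus a conformational search; the coordinates, well depth and contact distance here are hypothetical values chosen only to illustrate the idea.

```python
import math

Point = tuple[float, float, float]

# Hypothetical pocket: four atoms ringing a cavity centred on the origin (angstroms).
POCKET: list[Point] = [(4.0, 0.0, 0.0), (-4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, -4.0, 0.0)]

# Two hypothetical rigid poses of a two-atom ligand.
POSES: dict[str, list[Point]] = {
    "pose_A": [(0.0, 0.0, 0.5), (0.0, 0.0, -0.5)],  # centred in the cavity
    "pose_B": [(3.0, 0.0, 0.5), (3.0, 0.0, -0.5)],  # pushed into one pocket atom
}


def lj_energy(r: float, epsilon: float = 0.2, r_min: float = 3.8) -> float:
    """Lennard-Jones 12-6 energy: minimum of -epsilon at r = r_min, steep repulsion below it."""
    ratio = r_min / r
    return epsilon * (ratio**12 - 2 * ratio**6)


def pose_score(ligand: list[Point], pocket: list[Point]) -> float:
    """Sum pairwise ligand-pocket energies; lower (more negative) means a better fit."""
    return sum(lj_energy(math.dist(a, b)) for a in ligand for b in pocket)


ranked = sorted(POSES, key=lambda name: pose_score(POSES[name], POCKET))
for name in ranked:
    print(f"{name}: {pose_score(POSES[name], POCKET):+.3g} (arbitrary units)")
```

The steep r^-12 wall is what makes the clashing pose score so badly; production scoring functions add terms for hydrogen bonding, electrostatics and desolvation.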

A notable 2024 study published in Science by UCSF researchers (Lyu et al.) demonstrated that AlphaFold2 structures were accurate enough for large-scale prospective docking to produce experimentally confirmed binders, a validation that the predicted structures were not merely academic curiosities but genuine tools for drug discovery.

Limitations of AlphaFold in Drug Discovery

Despite its transformative impact, AlphaFold has important limitations in the drug discovery context. The model predicts static apo structures — the protein in isolation, without bound ligands, cofactors, or interacting partners. Many drug targets change shape when they bind a molecule (induced fit), and these alternative conformations are critical for drug design. AlphaFold does not reliably predict these conformational ensembles.

The model also struggles with intrinsically disordered proteins (IDPs) — proteins that have no fixed structure — which are increasingly recognized as important drug targets (including many cancer-driving transcription factors). And while AlphaFold predicts single-chain structures well, the structures of protein complexes and large assemblies remain more challenging, though the successor AlphaFold3 (2024) has substantially improved multimer and protein-ligand predictions.

Key Terms

GDT Score: Global Distance Test, a metric for protein structure prediction accuracy. Scores above ~90 are considered broadly competitive with experimentally determined structures.

pLDDT: Predicted Local Distance Difference Test, AlphaFold's per-residue confidence score. Scores above 90 indicate high confidence; below 50 suggest likely disorder.

Apo Structure: A protein structure solved or predicted in the absence of bound ligands. Contrasted with holo structures, which contain bound molecules.

Molecular Docking: Computational prediction of how a small molecule fits into a protein binding site, used to estimate binding affinity and guide drug design.
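pLDDT is easy to inspect in practice. Structure files from the AlphaFold Protein Structure Database store each residue's pLDDT in the B-factor column of the PDB format, so a few lines of standard-library Python can summarize confidence per residue. The sketch below reads C-alpha ATOM records using the fixed PDB column positions and bins residues with the commonly used 90/70/50 thresholds. The `atom_line` helper and its three-residue output are made-up demonstration data, not real AlphaFold output.

```python
from collections import Counter

# Confidence bands conventionally used when displaying AlphaFold pLDDT.
BANDS = [(90.0, "very high"), (70.0, "confident"), (50.0, "low"), (0.0, "very low")]


def plddt_per_residue(pdb_text: str) -> dict[int, float]:
    """Map residue number -> pLDDT, read from the B-factor field of C-alpha atoms.

    Assumes an AlphaFold-style PDB file, where the B-factor column holds pLDDT.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_seq = int(line[22:26])            # PDB columns 23-26
            scores[res_seq] = float(line[60:66])  # PDB columns 61-66 (B-factor)
    return scores


def band(plddt: float) -> str:
    for threshold, label in BANDS:
        if plddt >= threshold:
            return label
    return "very low"


def atom_line(serial: int, res_name: str, res_seq: int, b_factor: float) -> str:
    """Hypothetical helper: a minimal fixed-width C-alpha ATOM record with dummy coordinates."""
    return (f"ATOM  {serial:>5}  CA  {res_name:>3} A{res_seq:>4}    "
            f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{b_factor:>6.2f}")


fake_pdb = "\n".join([
    atom_line(1, "MET", 1, 35.2),   # e.g. a disordered terminus
    atom_line(9, "LYS", 2, 62.8),
    atom_line(17, "GLY", 3, 94.1),  # e.g. a well-ordered core residue
])
per_residue = plddt_per_residue(fake_pdb)
print(per_residue)
print(Counter(band(v) for v in per_residue.values()))
```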

The Bigger Picture

AlphaFold didn't just solve a computational biology problem — it removed a rate-limiting step in drug discovery. For the first time, researchers working on neglected tropical diseases, rare genetic disorders, and novel cancer targets can access structural information that would previously have required years of experimental effort or might never have been obtained at all. The bottleneck has shifted from "get the structure" to "design a molecule that exploits it."

Lesson 2 Quiz

Protein Structure Prediction and AlphaFold

Three questions · Select the best answer

1. At the CASP14 competition in 2020, AlphaFold2 achieved a median GDT score of approximately what level, making it comparable to experimental methods?

Correct. AlphaFold2's median GDT score of 92.4 at CASP14 was widely described as solving the protein folding problem, a result so far above competitors that some judges initially questioned its validity.

Not quite. AlphaFold2 achieved a median GDT of 92.4 at CASP14, competitive with experimental structures and dramatically above previous computational methods.

2. What is the key evolutionary insight that AlphaFold2's Evoformer module exploits to predict protein structure?

Correct. Co-evolutionary signals in multiple sequence alignments — pairs of residues that mutate together across species — encode spatial contact information. This insight is at the core of AlphaFold2's accuracy.

Not quite. The key insight is co-evolution: amino acids that consistently mutate together across thousands of related species are likely to be in physical contact in the folded structure, providing long-range structural constraints.

3. Which of the following is a genuine limitation of AlphaFold2 for drug discovery applications?

Correct. AlphaFold2 predicts the apo (ligand-free) structure and does not reliably model the conformational changes (induced fit) that occur when a drug binds. This limits its direct use for designing molecules that exploit flexible binding pockets.

Not quite. The key limitation for drug discovery is that AlphaFold2 predicts static apo structures — the protein without any bound ligand — and doesn't reliably capture induced fit or conformational ensembles critical for drug design.

Lesson 2 Lab

AlphaFold Structure Interpretation

AI-assisted discussion · minimum 3 exchanges to complete

Lab Scenario

Your team has downloaded an AlphaFold2 prediction for a protein of interest — a kinase implicated in a rare pediatric cancer. The pLDDT scores vary significantly across the protein: the catalytic domain scores above 90, but a 40-residue N-terminal region scores below 50. You need to decide how to use this structure for drug design.

Start by asking: "Our AlphaFold structure has high confidence scores in the catalytic domain but very low scores in the N-terminal region. How should we interpret this for drug design purposes?"

Structural Biology Advisor

AI Lab

Hello! I'm your structural biology and AlphaFold interpretation advisor. I can help you understand what protein structure predictions mean for drug design — including how to read confidence scores, identify druggable pockets, and work around AlphaFold's limitations. What would you like to explore?

Module 3 · Lesson 3

Generative AI for Molecular Design

From Screening the Known to Designing the Novel

How are generative neural networks creating drug candidates that no chemist has ever synthesized — and how do we know if they're worth making?

In September 2019, Insilico Medicine announced that its AI platform had designed a novel inhibitor of DDR1, a kinase implicated in fibrosis, in just 46 days, from target to synthesized, experimentally confirmed molecule. The work was published in Nature Biotechnology. A timeline that would typically require 2–5 years and cost tens of millions of dollars had been compressed by a factor of roughly 15 to 40.

The generative model had explored molecular structures outside the training distribution, producing compounds that no medicinal chemist would likely have proposed, yet which demonstrated nanomolar potency in cell assays. By 2022, Insilico's AI-designed drug ISM001-055, targeting idiopathic pulmonary fibrosis, had entered Phase I clinical trials; Insilico described it as the first drug candidate with both an AI-discovered target and an AI-generated molecule to reach the clinic.

The Generative Design Paradigm

Traditional drug discovery searches through molecules that already exist in a library or can be readily synthesized from known building blocks. Generative molecular design inverts this logic: instead of searching, the AI creates — proposing novel molecular structures optimized for specified properties.

This paradigm shift matters because the best drug for a given target may be a molecule that no human chemist has ever thought to make. Generative models can, in principle, navigate chemical space more efficiently than enumeration or random search, guided by learned structure-activity relationships.

Molecular Representations

For a neural network to work with molecules, those molecules must be encoded as mathematical objects. Several representations have been developed, each with tradeoffs:

Representation	Format	Used For	Limitation
SMILES	Linear text string (e.g., CC(=O)Oc1ccccc1C(=O)O)	RNNs, transformers; easy to process with NLP tools	Small changes in string can produce very different molecules
Molecular Graphs	Atoms as nodes, bonds as edges	Graph Neural Networks (GNNs)	More complex to implement; variable-size inputs
3D Coordinates	XYZ positions of all atoms	Physics-based models, equivariant networks	Computationally expensive; conformer generation needed
Fingerprints	Fixed-length binary vectors encoding substructure presence	Classic ML, similarity search	Loses structural information; not generative-friendly
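Before a sequence model can read SMILES, the string has to be split into tokens. A common approach, popularized by transformer models for reaction prediction, uses a regular expression that keeps multi-character units such as Cl, Br and bracketed atoms like [NH3+] together. The pattern below is a simplified sketch of that idea, not the exact expression used by any particular library; it rejects strings containing characters it does not recognize.

```python
import re

# Simplified SMILES tokenization pattern (an illustrative assumption, not a standard).
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]"            # bracket atoms, e.g. [NH3+], [C@@H]
    r"|Br|Cl"                 # two-letter organic-subset halogens
    r"|%\d{2}"                # two-digit ring-closure labels, e.g. %10
    r"|[BCNOSPFIbcnosp]"      # one-letter organic-subset atoms (lowercase = aromatic)
    r"|[()=#+\-\\/:~@?>*$.]"  # branches, bonds, charges and other symbols
    r"|\d)"                   # single-digit ring closures
)


def tokenize(smiles: str) -> list[str]:
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so verify nothing was dropped.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens


print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin, from the table above
print(tokenize("ClCCBr"))
print(tokenize("C[NH3+]"))
```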

Key Generative Architectures

Variational Autoencoders (VAEs): A VAE learns to compress molecules into a continuous latent space, then reconstruct them. Once trained, new molecules can be generated by sampling from or interpolating within that latent space. The landmark 2018 paper by Gómez-Bombarelli et al. (ACS Central Science) demonstrated that this latent space could be searched using Bayesian optimization to find molecules with desired properties — the drug-like property scores could be treated as a function to optimize in the continuous embedding space.
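Two pieces of arithmetic make a VAE's latent space continuous and searchable: the reparameterization trick, which samples a latent vector as z = μ + σ·ε (with ε drawn from a standard normal) so that gradients can flow through the encoder's outputs, and a KL-divergence penalty that keeps each encoding close to a standard normal prior. The plain-Python sketch below shows both, plus linear interpolation between latent points. The encoder outputs are hypothetical numbers, and no decoder is included; a real model would compute all of this over batches in a deep-learning framework.

```python
import math
import random


def reparameterize(mu: list[float], log_var: list[float], rng: random.Random) -> list[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: list[float], log_var: list[float]) -> float:
    """KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian, summed over dimensions."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def interpolate(z_a: list[float], z_b: list[float], t: float) -> list[float]:
    """Point a fraction t of the way from z_a to z_b; a decoder would map it to a molecule."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]


# Hypothetical encoder outputs for one molecule in a 3-dimensional latent space.
mu, log_var = [0.4, -1.1, 0.0], [-0.2, 0.1, -1.5]
z = reparameterize(mu, log_var, random.Random(0))
print("sampled z:", [round(v, 3) for v in z])
print("KL penalty:", round(kl_to_standard_normal(mu, log_var), 3))
print("halfway to the prior mean:", interpolate(mu, [0.0, 0.0, 0.0], 0.5))
```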

Generative Adversarial Networks (GANs): A generator network proposes molecules; a discriminator network learns to distinguish generated molecules from real ones. The adversarial training drives the generator to produce increasingly realistic molecular structures. MolGAN (De Cao and Kipf, 2018) applied this architecture directly to molecular graphs, bypassing string representations entirely.

Transformer-Based Models: Treating SMILES strings as sequences, transformer architectures (like those underlying GPT) can be trained on large chemical databases to generate novel molecules token by token; MolGPT and Chemformer are examples. Encoder-only models such as ChemBERTa and MolBERT apply the same architecture to learn molecular representations for property prediction rather than to generate molecules. When fine-tuned with reinforcement learning toward specific property objectives, generative transformers produce molecules with optimized drug-like properties.
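Token-by-token generation means that at each step the model outputs a score (logit) for every token in its vocabulary, and the next token is sampled from the resulting probability distribution. The sketch below shows the temperature-scaled softmax commonly used for that sampling step; the five-token vocabulary and its logits are hypothetical stand-ins for a trained model's output.

```python
import math
import random


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(vocab: list[str], logits: list[float],
                      temperature: float, rng: random.Random) -> str:
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical next-token scores after the prefix "CC(=O)".
vocab = ["O", "N", "C", "c", "<end>"]
logits = [2.5, 1.0, 0.5, -1.0, -2.0]
for t in (0.5, 1.0, 2.0):
    print(f"T={t}: P('O') = {softmax(logits, t)[0]:.2f}")
print("sampled:", sample_next_token(vocab, logits, 1.0, random.Random(42)))
```

Low temperatures make generation conservative (close to the training distribution); high temperatures increase novelty at the cost of more invalid or implausible strings.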

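Diffusion Models: The newest generation of generative models, diffusion approaches learn to reverse a gradual noising process. DiffSBDD, for example, generates 3D molecular structures conditioned on a protein binding pocket, directly designing molecules that are geometrically complementary to a target; the related DiffDock applies diffusion to predicting how a given ligand binds (docking) rather than to designing new molecules. This represents a step closer to end-to-end structure-based drug design.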
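Diffusion models are trained by corrupting known structures with Gaussian noise and learning to undo it. The sketch below implements the standard DDPM forward (noising) step for 3D atomic coordinates, x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, with a linear noise schedule. It is a generic illustration with assumed schedule values; DiffSBDD's own schedule, its equivariant denoising network and its pocket conditioning are not shown.

```python
import math
import random

Coord = tuple[float, float, float]


def linear_alpha_bars(steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> list[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_coordinates(x0: list[Coord], alpha_bar: float, rng: random.Random) -> list[Coord]:
    """Sample x_t ~ q(x_t | x_0): shrink the clean coordinates and add Gaussian noise."""
    keep, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [tuple(keep * c + noise * rng.gauss(0.0, 1.0) for c in atom) for atom in x0]


# Hypothetical 3-atom ligand fragment (angstroms).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.2, 0.0)]
schedule = linear_alpha_bars()
rng = random.Random(7)
for t in (0, 250, 999):
    noisy = noise_coordinates(ligand, schedule[t], rng)
    print(t, round(schedule[t], 4), [round(c, 2) for c in noisy[1]])
```

By the last step almost nothing of the original geometry survives; the trained network's job is to run this process backwards, one step at a time, conditioned on the pocket.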

Case Study — Halicin (MIT / Stokes et al., 2020)

In a landmark 2020 Cell paper, MIT researchers trained a GNN to predict antibiotic activity from molecular structure, using a training set of ~2,500 molecules with known growth inhibition data against E. coli. They then ran the model on ~6,000 molecules from the Broad Institute's Drug Repurposing Hub and on more than 100 million molecules from the ZINC15 virtual library. The model flagged a compound called SU3327 (renamed halicin), which had originally been investigated as a diabetes treatment and had no known antibiotic activity. Experimental testing confirmed that halicin kills drug-resistant bacteria, including Clostridioides difficile and pan-resistant Acinetobacter baumannii, through a novel mechanism: disruption of the proton gradient across the bacterial membrane. The AI had found an antibiotic in a region of molecular space that classical antibiotic screening had completely missed.
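The core operation of a GNN is message passing: each atom updates its feature vector using the vectors of the atoms it is bonded to, and after a few rounds the atom vectors are pooled into a single molecule-level vector that a prediction layer can score. The sketch below runs plain sum-aggregation message passing on ethanol's heavy-atom graph with one-hot element features and no learned weights. The halicin study used a directed message-passing network (Chemprop) with learned parameters and richer atom and bond features, so this is only a schematic of the idea.

```python
# Ethanol heavy atoms: C0 - C1 - O2 (hydrogens omitted).
ATOMS = ["C", "C", "O"]
BONDS = [(0, 1), (1, 2)]
ELEMENTS = ["C", "N", "O"]  # one-hot vocabulary (an illustrative choice)


def one_hot(symbol: str) -> list[float]:
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def neighbours(n_atoms: int, bonds: list[tuple[int, int]]) -> list[list[int]]:
    adj: list[list[int]] = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def message_pass(features: list[list[float]], adj: list[list[int]],
                 rounds: int = 2) -> list[list[float]]:
    """Each round: h_v <- h_v + sum of neighbour vectors (no learned weights)."""
    h = [list(f) for f in features]
    for _ in range(rounds):
        # The comprehension reads the previous round's h for every atom.
        h = [
            [h[v][k] + sum(h[u][k] for u in adj[v]) for k in range(len(h[v]))]
            for v in range(len(h))
        ]
    return h


def readout(h: list[list[float]]) -> list[float]:
    """Sum-pool atom vectors into one fixed-length molecule vector."""
    return [sum(col) for col in zip(*h)]


h = message_pass([one_hot(a) for a in ATOMS], neighbours(len(ATOMS), BONDS))
print("atom vectors:", h)
print("molecule vector:", readout(h))
```

After two rounds the terminal carbon's vector already has a non-zero oxygen entry, even though the oxygen is two bonds away; that growing receptive field is how GNNs capture substructure.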

Multi-Parameter Optimization

A critical challenge in generative drug design is that optimizing for one property often degrades another. A molecule with high potency at the target may have poor solubility. A highly selective compound may be metabolized too rapidly. In practice, drug-like molecules must simultaneously satisfy constraints across five to ten properties.

Reinforcement learning (RL) has emerged as the dominant approach for multi-objective optimization in generative chemistry. The generative model acts as the "policy," producing molecular structures as "actions." A reward function aggregates scores across multiple properties, such as docking score, QED (quantitative estimate of drug-likeness), predicted solubility and predicted metabolic stability, and policy-gradient updates make high-reward proposals progressively more likely.
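One common way to build such a reward is to map each raw prediction onto a 0-to-1 "desirability" scale and combine the desirabilities with a weighted geometric mean, so that a molecule scoring near zero on any single property is heavily penalized. The sketch below illustrates that pattern; the property names, midpoints, steepness values and weights are hypothetical, and individual platforms define their own transforms and aggregation rules.

```python
import math


def sigmoid_desirability(value: float, midpoint: float, steepness: float,
                         higher_is_better: bool = True) -> float:
    """Map a raw property value onto (0, 1), equal to 0.5 at `midpoint`."""
    d = 1.0 / (1.0 + math.exp(-steepness * (value - midpoint)))
    return d if higher_is_better else 1.0 - d


def aggregate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted geometric mean of desirabilities; one near-zero score drags the total down."""
    total_w = sum(weights.values())
    log_sum = sum(weights[k] * math.log(max(scores[k], 1e-9)) for k in weights)
    return math.exp(log_sum / total_w)


# Hypothetical predictions for one generated molecule.
predicted = {"docking_score": -9.1, "qed": 0.71, "log_solubility": -4.8}

desirability = {
    # More negative docking scores are better, so the direction is flipped.
    "docking_score": sigmoid_desirability(predicted["docking_score"], midpoint=-8.0,
                                          steepness=1.5, higher_is_better=False),
    "qed": predicted["qed"],  # QED is already on a 0-1 scale
    "log_solubility": sigmoid_desirability(predicted["log_solubility"], midpoint=-5.0,
                                           steepness=2.0),
}
weights = {"docking_score": 2.0, "qed": 1.0, "log_solubility": 1.0}
print({k: round(v, 3) for k, v in desirability.items()})
print("reward:", round(aggregate(desirability, weights), 3))
```

A weighted arithmetic mean would let an excellent docking score mask terrible solubility; the geometric mean is a simple way to encode "acceptable on every axis".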

REINVENT, introduced by Olivecrona et al. (2017) and extended as REINVENT 2.0 by Blaschke et al. (2020), demonstrated that an RL-fine-tuned RNN could learn to generate molecules satisfying multiple simultaneous constraints within hours of training, a task that would require weeks of manual medicinal chemistry iteration.
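In the original REINVENT formulation (Olivecrona et al., 2017), the fine-tuned "agent" network is pulled toward an augmented likelihood that adds the reward to the pretrained prior's log-likelihood of each sampled sequence, log P_aug(A) = log P_prior(A) + σ·S(A), and the loss is the squared gap between that target and the agent's own log-likelihood. Keeping the prior in the objective discourages the agent from drifting into invalid or unrealistic SMILES while it chases reward. The function below computes that loss (averaged over a batch) for already-scored sequences; the log-likelihoods are made-up numbers, σ = 60 is an arbitrary illustrative choice, and the gradient step itself would be handled by a deep-learning framework.

```python
def reinvent_loss(prior_logp: list[float], agent_logp: list[float],
                  scores: list[float], sigma: float = 60.0) -> float:
    """Mean squared gap between the augmented likelihood and the agent likelihood.

    prior_logp / agent_logp: log-likelihood of each sampled SMILES under each network.
    scores: reward S(A) in [0, 1] for each sequence.
    """
    if not (len(prior_logp) == len(agent_logp) == len(scores)):
        raise ValueError("inputs must have the same length")
    losses = []
    for lp_prior, lp_agent, s in zip(prior_logp, agent_logp, scores):
        augmented = lp_prior + sigma * s  # log P_aug(A) = log P_prior(A) + sigma * S(A)
        losses.append((augmented - lp_agent) ** 2)
    return sum(losses) / len(losses)


# Hypothetical batch of three sampled molecules.
prior = [-32.0, -41.5, -28.3]
agent = [-30.1, -44.0, -27.9]
reward = [0.82, 0.10, 0.55]
print("loss:", round(reinvent_loss(prior, agent, reward), 2))
```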

Synthesizability: The Critical Constraint

A persistent criticism of early generative molecular design was that AI-proposed molecules often could not actually be synthesized in a chemistry laboratory — the models had no knowledge of chemical reactions or reagent availability. A beautiful molecule that cannot be made is worthless.

Several approaches now address synthesizability. ASKCOS (MIT) and AiZynthFinder (AstraZeneca) are retrosynthetic planning tools that predict synthetic routes to target molecules. IBM's RXN for Chemistry uses transformer models trained on millions of chemical reactions to predict reaction outcomes and propose retrosynthetic routes. Synthesizability estimates, such as the SAscore (Ertl and Schuffenhauer, 2009) or the output of a retrosynthesis planner, can be incorporated into the generative reward function, allowing models to explicitly penalize proposals that are hard to make or lack known synthetic routes. Benchmark suites such as GuacaMol (Brown et al., 2019) are used to compare generative models on standardized design tasks.

Key Terms

SMILES: Simplified Molecular Input Line Entry System, a text-based notation for molecular structure. Example: "CCO" represents ethanol.

Latent Space: A continuous, lower-dimensional mathematical space in which a VAE encodes molecular structures. Points in latent space correspond to molecules; nearby points tend to be chemically similar.

QED: Quantitative Estimate of Drug-likeness, a composite score (0–1) encoding molecular weight, lipophilicity, hydrogen bond donors/acceptors, and other Lipinski-derived properties.

Retrosynthetic Analysis: Working backwards from a target molecule to identify sequences of known chemical reactions that could produce it from available starting materials.
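QED and related drug-likeness scores build on simple cut-offs such as Lipinski's rule of five: molecular weight of at most 500 Da, calculated logP of at most 5, no more than 5 hydrogen-bond donors and no more than 10 hydrogen-bond acceptors, with poor oral absorption more likely when more than one rule is broken. The sketch below applies that rule to precomputed descriptors as a cheap first filter on generated candidates. The candidate names and descriptor values are hypothetical; in practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    name: str
    mol_weight: float  # daltons
    clogp: float       # calculated octanol-water partition coefficient
    h_donors: int
    h_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    return sum([
        d.mol_weight > 500,
        d.clogp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    return rule_of_five_violations(d) <= max_violations


# Hypothetical generated candidates with precomputed descriptors.
candidates = [
    Descriptors("gen_001", 412.5, 3.1, 2, 6),
    Descriptors("gen_002", 538.7, 5.6, 1, 7),  # two violations
    Descriptors("gen_003", 505.2, 4.2, 3, 8),  # one violation, still passes
]
for c in candidates:
    print(c.name, rule_of_five_violations(c), passes_rule_of_five(c))
```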

The Promise and the Caution

Generative AI can explore vast molecular space and propose structures optimized for computed properties — but computed properties are not the same as real ones. Models are only as good as the training data, and predicting complex ADMET outcomes (especially in vivo toxicity and metabolic stability) remains extremely challenging. The best current practice treats generative AI as a powerful hypothesis generator, with experimental biology as the ultimate arbiter.

Lesson 3 Quiz

Generative AI for Molecular Design

Three questions · Select the best answer

1. The antibiotic halicin was discovered in a 2020 Cell paper by MIT researchers. What made this discovery significant from an AI methodology standpoint?

Correct. The halicin discovery demonstrated that GNNs trained on existing activity data could identify completely unexpected activity in structurally dissimilar molecules — finding antibiotics where no antibiotic had previously been recognized.

Not quite. Halicin's significance was that an AI trained on antibiotic activity data recognized antibiotic potential in a diabetes drug — discovering a potent, mechanistically novel antibiotic from a region of chemical space that conventional antibiotic programs had never explored.

2. In a Variational Autoencoder (VAE) applied to molecular design, what is the purpose of the "latent space"?

Correct. The latent space is a continuous lower-dimensional representation of molecular structure. Once trained, the VAE can generate novel molecules by decoding points in this space — and Bayesian optimization or gradient methods can navigate the space toward desired property profiles.

Not quite. In a VAE, the latent space is a continuous mathematical embedding where molecules are represented as vectors. Sampling or optimizing within this space allows the model to generate novel molecular structures with desired properties.

3. Why is synthesizability a critical constraint in generative molecular design, and how is it typically addressed?

Correct. Early generative models had no knowledge of chemistry and frequently proposed un-synthesizable structures. Retrosynthetic AI tools and synthesizability penalty terms in reward functions are now standard practice to ensure generated molecules can actually be made.

Not quite. Early generative models frequently produced un-synthesizable molecules. The field now uses retrosynthetic planning tools (AiZynthFinder, ASKCOS) and synthesizability scoring terms within the generative reward function to constrain proposals to chemically accessible structures.

Lesson 3 Lab

Generative Design Strategy Consultant

AI-assisted discussion · minimum 3 exchanges to complete

Lab Scenario

You are a medicinal chemist at a small biotech. Your team's GNN-based generative model has produced 500 novel candidate molecules targeting an antimicrobial resistance protein. You need to prioritize which molecules to synthesize first, given a budget for roughly 20 syntheses.

Start by asking: "Our generative model produced 500 novel candidates for an antibacterial target. We can only synthesize 20. What computational filters and prioritization criteria should we apply before making synthesis decisions?"

Generative Chemistry Advisor

AI Lab

Hi! I'm your generative molecular design advisor. I can help you think through computational prioritization strategies, multi-parameter optimization, synthesizability assessment, and how to set up decision criteria for moving AI-generated candidates into the laboratory. What's your challenge?

Module 3 · Lesson 4

Clinical AI, ADMET Prediction, and the Road to the Clinic

From Optimized Molecule to Human Trial — The Final Computational Frontier

Once you have a promising molecule, how does AI predict whether it will survive the human body — and can it tell you which patients will actually benefit?

In January 2020, as COVID-19 was beginning to spread beyond China, researchers at BenevolentAI used their knowledge graph and literature mining AI to rapidly query which existing approved drugs might address the viral entry mechanisms of SARS-CoV-2. Within days, the system flagged baricitinib — a JAK1/JAK2 inhibitor approved for rheumatoid arthritis — as a candidate that might both block viral endocytosis and suppress the inflammatory cytokine storm associated with severe disease.

The reasoning was published in The Lancet in February 2020, one of the earliest computational drug repurposing analyses of the pandemic. The NIH-sponsored ACTT-2 trial then showed that baricitinib plus remdesivir shortened recovery time compared with remdesivir alone, and in November 2020 the FDA granted Emergency Use Authorization for the combination in hospitalized COVID-19 patients. Full FDA approval of baricitinib for COVID-19 followed in May 2022. The AI had not designed the drug, but it had identified a repurposing opportunity that shortened the path to clinical use by years.

ADMET Prediction: The Survival Filter

A molecule can bind its target with picomolar affinity and still fail as a drug if it cannot survive the journey through the human body. ADMET failures account for roughly 30–40% of drug attrition, and historically, many were discovered only in late-stage clinical trials — after hundreds of millions of dollars had been spent.

Machine learning ADMET prediction attempts to forecast these failures computationally, before synthesis if possible. Models are trained on large datasets of experimentally measured properties — typically from regulatory submissions, literature databases, and proprietary pharmaceutical datasets — and used to score new candidate molecules.

ADMET Property	Why It Matters	ML Approach	Typical Reported Performance
Aqueous Solubility	Poor solubility limits oral bioavailability; compounds that do not dissolve are poorly absorbed into the bloodstream	GNNs; random forests on molecular descriptors	R² ~0.7–0.8 vs. experiment
CYP450 Inhibition	Inhibiting cytochrome P450 enzymes causes drug-drug interactions and can contribute to toxicity, including liver injury	Multi-task neural networks modeling the five major CYP isoforms simultaneously	AUC ~0.85–0.92 for major isoforms
hERG Toxicity	hERG potassium-channel blockade can prolong the QT interval and trigger potentially fatal arrhythmias; a major safety filter	GNN classifiers; structure-activity rules	AUC ~0.85–0.90
Blood-Brain Barrier Penetration	CNS drugs must cross it; non-CNS drugs generally should not, to avoid CNS side effects	Classification models using lipophilicity, molecular weight, and related descriptors	Accuracy ~80–85%
Oral Bioavailability	What fraction of an oral dose reaches systemic circulation?	Classification or regression on molecular descriptors; the most challenging endpoint because it is a composite of several in vivo processes	AUC ~0.70–0.78 (accuracy ~60–70%)
Hepatotoxicity	Drug-induced liver injury (DILI) is a leading cause of post-market withdrawal	Multi-instance learning; mechanistic models	AUC ~0.75–0.82
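Most entries in the table report AUC, which measures how well a classifier ranks positives above negatives. A minimal sketch, using invented labels and scores for a hypothetical hERG classifier, computes it directly as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count half), which equals the area under the ROC curve.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative).

    Ties count as half a win. This pairwise loop is O(n_pos * n_neg):
    fine for illustration, too slow for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical hERG labels (1 = blocker) and model scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(round(roc_auc(labels, scores), 3))  # -> 0.917
```

An AUC of 0.5 is chance-level ranking and 1.0 is perfect; here one blocker (score 0.4) is ranked below one non-blocker (0.6), so 11 of 12 positive-negative pairs are ordered correctly.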

Tool Spotlight — ADMETlab 2.0 / Therapeutics Data Commons

ADMETlab 2.0 (published 2021, Central South University) is a free web server that predicts more than 50 ADMET-related endpoints simultaneously for any submitted molecule. The Therapeutics Data Commons (Harvard, 2021) provides dozens of standardized, ML-ready datasets spanning many drug discovery tasks, with benchmark splits that enable reproducible comparison of models. Both represent major steps toward making ADMET prediction accessible beyond large pharmaceutical companies.

AI in Clinical Trial Design and Patient Selection

Drug discovery AI is not limited to the preclinical phase. Once a candidate enters human trials, AI tools address three distinct challenges in clinical development:

1. Patient Stratification. Genomic and biomarker data can predict which patient subpopulations are most likely to respond to a drug, sometimes turning a failed broad-population trial into a successful precision medicine trial. Flatiron Health and Foundation Medicine (both acquired by Roche) have built large real-world evidence datasets used to identify responder biomarkers. The FDA's 2017 approval of pembrolizumab (Keytruda) for any solid tumor with high microsatellite instability (MSI-H) or mismatch-repair deficiency, its first tissue-agnostic approval, was a landmark for biomarker-defined precision oncology, the kind of stratification that AI tools now aim to extend.

2. Trial Recruitment and Site Selection. Natural language processing applied to electronic health records can identify eligible patients faster than manual chart review. IBM Watson for Clinical Trial Matching reduced screening time per eligible patient in studies, though its real-world deployment results were mixed. More recently, vendors such as Medidata and Veeva have deployed ML systems for recruitment and site selection, with reported reductions in screen failure rates of 20–30% in pilot trials.

3. Adaptive Trial Design and Safety Monitoring. Machine learning models monitoring incoming trial data can flag safety signals earlier, inform dose adjustments, and support changes to randomization ratios based on interim outcomes. Any such adaptation must still meet the principles of the FDA's 2019 guidance, Adaptive Designs for Clinical Trials of Drugs and Biologics, which emphasizes that adaptations be prespecified and that they preserve control of false-positive (Type I) error and trial integrity.
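One common way to adapt randomization ratios to interim outcomes is Bayesian response-adaptive randomization. The sketch below assumes a binary response endpoint, uniform Beta(1, 1) priors, a hypothetical allocation floor, and invented interim counts. It estimates each arm's probability of being best by Monte Carlo sampling from the Beta posteriors and turns those probabilities into allocation ratios. It illustrates the idea only: a real design prespecifies these rules and checks its operating characteristics (including Type I error) by simulation.

```python
import random
from typing import Dict, Tuple


def prob_best(interim: Dict[str, Tuple[int, int]], draws: int = 20_000,
              seed: int = 0) -> Dict[str, float]:
    """Monte Carlo estimate of P(arm has the highest true response rate).

    interim maps arm name -> (responders, patients treated so far). Each arm
    gets a Beta(1 + responders, 1 + non-responders) posterior: a uniform
    Beta(1, 1) prior updated with the binary outcomes observed so far.
    """
    rng = random.Random(seed)
    wins = {arm: 0 for arm in interim}
    for _ in range(draws):
        samples = {arm: rng.betavariate(1 + r, 1 + n - r)
                   for arm, (r, n) in interim.items()}
        wins[max(samples, key=samples.__getitem__)] += 1
    return {arm: count / draws for arm, count in wins.items()}


def allocation_ratios(p_best: Dict[str, float],
                      floor: float = 0.1) -> Dict[str, float]:
    """Raise small P(best) values to a floor, then renormalise to sum to 1.

    The floor (a hypothetical design choice) keeps every arm recruiting,
    so no arm's allocation collapses to zero.
    """
    raw = {arm: max(p, floor) for arm, p in p_best.items()}
    total = sum(raw.values())
    return {arm: value / total for arm, value in raw.items()}


# Hypothetical interim counts: (responders, patients treated).
interim = {"control": (10, 40), "drug": (22, 40)}
p_best = prob_best(interim)
allocation = allocation_ratios(p_best)
print({arm: round(p, 3) for arm, p in p_best.items()})
print({arm: round(a, 3) for arm, a in allocation.items()})
```

With 22/40 versus 10/40 responders, the drug arm is almost certainly better, so most new patients would be allocated to it while the floor keeps the control arm enrolling.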

Drug Repurposing at Scale

The baricitinib/COVID-19 case exemplifies a broader AI application: drug repurposing — finding new uses for already-approved drugs. Repurposing dramatically reduces development risk and time because safety data already exists. The challenge is systematically identifying which approved drugs might work for which new indications.

Knowledge graph–based AI systems (such as BenevolentAI's knowledge graph or SPOKE at UCSF) integrate drug, gene, disease, pathway, and literature data into a single large network. Graph neural networks trained on these knowledge graphs predict missing links ("drug X might treat disease Y") by learning the structural patterns that known drug-disease pairs share and scoring unlinked pairs that match them.
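A GNN link predictor is too large for a short example, but the intuition behind it (a drug and a disease are more plausibly linked when they share many intermediate nodes) can be sketched with a classical heuristic, the resource-allocation index: sum 1/degree over shared neighbours, so that promiscuous hub genes count for less. The toy graph and node names are hypothetical, and this heuristic is a stand-in, not what BenevolentAI's or SPOKE's systems actually compute.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical toy knowledge graph as (head, relation, tail) triples.
TRIPLES: List[Tuple[str, str, str]] = [
    ("drugA", "targets", "GENE1"),
    ("drugA", "targets", "GENE2"),
    ("drugB", "targets", "GENE3"),
    ("diseaseX", "associated_with", "GENE1"),
    ("diseaseX", "associated_with", "GENE2"),
    ("diseaseY", "associated_with", "GENE3"),
    ("diseaseY", "associated_with", "GENE2"),
    ("diseaseZ", "associated_with", "GENE2"),
]


def neighbours(triples: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency; relation types are ignored for simplicity."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for head, _relation, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def resource_allocation(adj: Dict[str, Set[str]], u: str, v: str) -> float:
    """Sum of 1/degree over neighbours shared by u and v."""
    return sum(1.0 / len(adj[w]) for w in adj[u] & adj[v])


adj = neighbours(TRIPLES)
drugs = ["drugA", "drugB"]
diseases = ["diseaseX", "diseaseY", "diseaseZ"]
ranked = sorted(((resource_allocation(adj, d, x), d, x)
                 for d in drugs for x in diseases), reverse=True)
for score, drug, disease in ranked:
    print(f"{drug} -> {disease}: {score:.3f}")
```

drugA and diseaseX share two genes, including the low-degree GENE1, so that pair ranks first; a learned model would additionally exploit relation types and patterns from known treatment edges.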

A 2021 Nature Machine Intelligence study by Zeng et al. (Vanderbilt) used a network proximity measure in a drug-target-disease graph to systematically identify repurposing opportunities for COVID-19, generating a ranked list of candidates that was subsequently validated against real-world patient data from electronic health records. Several top-ranked candidates showed significant protective associations in clinical data.
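A network proximity measure can be sketched with the "closest" distance used in network-medicine repurposing work (as defined by Guney et al., 2016): the average, over a drug's targets, of the shortest-path distance to the nearest disease-associated protein, compared against random node sets. Assumptions: the interactome and node sets are invented, edges are unweighted, and the random baseline samples nodes uniformly rather than degree-matched as in the published method. A more negative z-score means the targets sit closer to the disease proteins than chance.

```python
import random
from collections import defaultdict, deque
from statistics import mean, stdev
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: List[Tuple[str, str]]) -> Graph:
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Unweighted shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closest_distance(graph: Graph, targets: Set[str], disease: Set[str]) -> float:
    """Mean over drug targets of the distance to the nearest disease protein."""
    per_target = []
    for t in targets:
        dist = bfs_distances(graph, t)
        reachable = [dist[d] for d in disease if d in dist]
        if not reachable:
            raise ValueError(f"target {t} cannot reach any disease protein")
        per_target.append(min(reachable))
    return sum(per_target) / len(per_target)


def proximity_z(graph: Graph, targets: Set[str], disease: Set[str],
                n_random: int = 1000, seed: int = 0) -> float:
    """z-score of the observed distance against random node sets of equal size.

    Simplification: nodes are sampled uniformly, not degree-matched.
    """
    rng = random.Random(seed)
    nodes = sorted(graph)
    null = [closest_distance(graph,
                             set(rng.sample(nodes, len(targets))),
                             set(rng.sample(nodes, len(disease))))
            for _ in range(n_random)]
    observed = closest_distance(graph, targets, disease)
    return (observed - mean(null)) / stdev(null)


# Hypothetical connected protein-protein interaction network.
EDGES = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5"), ("P5", "P6"),
         ("P6", "P7"), ("P7", "P8"), ("P2", "P9"), ("P9", "P10")]
G = build_graph(EDGES)
DISEASE = {"P3", "P4"}
near_drug = {"P2", "P5"}
far_drug = {"P8", "P10"}

for name, targets in (("near_drug", near_drug), ("far_drug", far_drug)):
    print(name, closest_distance(G, targets, DISEASE),
          round(proximity_z(G, targets, DISEASE), 2))
```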

The Regulatory Landscape for AI in Drug Discovery

Regulatory agencies have begun developing frameworks for AI in drug development. The FDA's Action Plan for AI/ML-based Software as a Medical Device (2021), which covers devices, and, more directly, its Discussion Paper on AI in Drug Development (2023) acknowledge AI as a valid tool, the latter across multiple pipeline stages, while emphasizing the need for transparency, model documentation, and validation against held-out data from the proposed use domain.

The European Medicines Agency's Reflection Paper on AI in the Lifecycle of Medicines (2023) similarly calls for "explainability" — the ability to articulate why an AI model made a given prediction — as a key criterion for regulatory acceptance. This creates pressure toward interpretable models (graph networks with attention weights, for example) over black-box approaches in regulatory submissions.
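For a linear model, per-prediction explanations are exact: each feature's contribution to the logit is its weight times its value, and the contributions sum (with the bias) to the model's output. The sketch below uses hypothetical substructure features and invented weights for a hERG-style classifier; attention weights and attribution methods such as SHAP play an analogous role for non-linear models.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical logistic-regression weights for a hERG-blocker classifier.
WEIGHTS: Dict[str, float] = {
    "basic_amine": 1.4,
    "logP_above_3": 0.9,
    "carboxylic_acid": -1.2,
    "aromatic_rings_ge_3": 0.6,
}
BIAS = -1.0


def explain(features: Dict[str, float]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return P(blocker) and each feature's additive contribution to the logit."""
    contributions = [(name, WEIGHTS[name] * features.get(name, 0.0))
                     for name in WEIGHTS]
    logit = BIAS + sum(c for _, c in contributions)
    prob = 1.0 / (1.0 + math.exp(-logit))
    # Largest absolute contributions first: these are the model's "reasons".
    contributions.sort(key=lambda item: abs(item[1]), reverse=True)
    return prob, contributions


molecule = {"basic_amine": 1, "logP_above_3": 1, "carboxylic_acid": 1,
            "aromatic_rings_ge_3": 1}
prob, reasons = explain(molecule)
print(f"P(hERG blocker) = {prob:.2f}")
for name, contribution in reasons:
    print(f"  {name:>20}: {contribution:+.2f}")
```

A reviewer can check each listed reason against known chemistry (here, a basic amine pushing toward hERG blockade and a carboxylic acid pushing away), which is the kind of scrutiny that explainability requirements aim to enable.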

Drug Repurposing: Identifying new therapeutic indications for already-approved drugs. Reduces development risk because safety data already exists, allowing faster clinical translation.

Knowledge Graph: A graph-structured database connecting entities (genes, proteins, diseases, drugs, pathways) with typed relationships. GNNs trained on knowledge graphs can predict new entity relationships.

Patient Stratification: Dividing a patient population into subgroups based on biomarkers, genomics, or clinical features to identify which subgroup is most likely to respond to a specific treatment.

Explainability: The degree to which an AI model's predictions can be interpreted in terms of input features. Increasingly required by regulatory agencies for AI tools used in drug development submissions.

Module Summary

Across the four lessons, a complete picture emerges: AI is not replacing drug discovery — it is re-architecting it. Protein structure prediction (AlphaFold) has removed a structural biology bottleneck. Generative models are expanding the chemical space accessible for design. ADMET prediction is moving safety filtering upstream. Repurposing AI is finding new uses for existing medicines. The promise is not eliminating failure — it is making failures faster, cheaper, and more informative, so that the eventual successes reach patients sooner.

Lesson 4 Quiz

Clinical AI, ADMET Prediction, and the Road to the Clinic

Three questions · Select the best answer

1. How did BenevolentAI's system identify baricitinib as a COVID-19 candidate in early 2020?

Correct. BenevolentAI's knowledge graph AI identified baricitinib through drug repurposing, recognizing that its known mechanisms (inhibition of AAK1, a regulator of endocytosis, plus JAK1/JAK2 inhibition) were relevant to both viral entry and the cytokine storm of severe COVID-19. The finding was published in The Lancet in February 2020.

Not quite. BenevolentAI used a knowledge graph and literature mining system to identify baricitinib — an approved rheumatoid arthritis drug — as mechanistically relevant to COVID-19 viral entry and cytokine storm. This was drug repurposing, not de novo design.

2. Which ADMET property is typically the most challenging for machine learning models to predict accurately, and why?

Correct. Oral bioavailability is the ADMET endpoint with the lowest prediction accuracy (AUC ~0.70–0.78) because it represents the combined outcome of intestinal absorption, gut metabolism, first-pass liver metabolism, and efflux transporters, a complex multi-step in vivo process that is difficult to model computationally.

Not quite. Oral bioavailability is typically the hardest ADMET property to predict accurately — it integrates multiple complex in vivo processes including intestinal absorption, gut and liver first-pass metabolism, and active efflux, making it a composite of several hard-to-model phenomena.

3. The FDA's 2017 approval of pembrolizumab for any MSI-high solid tumor was a landmark for precision oncology because:

Correct. The pembrolizumab MSI-high approval was a landmark because it was the first FDA approval defined entirely by a genomic biomarker (microsatellite instability) rather than tumor site, showing how biomarker-based patient stratification can identify responder populations across diverse cancer types.

Not quite. The pembrolizumab MSI-H approval was a landmark as the FDA's first tissue-agnostic cancer approval: any solid tumor with microsatellite instability qualified, regardless of organ. It demonstrated that genomic patient stratification, the kind AI tools now aim to scale, could identify responder populations cutting across traditional organ-based oncology categories.

Lesson 4 Lab

ADMET and Clinical Strategy Advisor

AI-assisted discussion · minimum 3 exchanges to complete

Lab Scenario

Your lead compound has excellent target potency (IC₅₀ = 8 nM), but the ADMETlab prediction shows low oral bioavailability (F ~18%) and a moderate hERG liability signal (predicted IC₅₀ ~3 µM). In addition, the AlphaFold model of your target, predicted with high confidence (high pLDDT) around the binding site, places a lysine residue in the pocket that has been flagged as a potential immunogenicity risk. You need to decide how to proceed before entering IND-enabling studies.
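Two back-of-the-envelope numbers can help rank these flags. First, a hERG margin: a frequently cited rule of thumb (Redfern et al., 2003) looks for at least a 30-fold gap between hERG IC₅₀ and the free therapeutic plasma concentration; the free Cmax below is a hypothetical placeholder, since the scenario does not give one. Second, oral bioavailability decomposes as F = Fa × Fg × Fh (fraction absorbed, fraction escaping gut-wall metabolism, fraction escaping hepatic first pass); the three fractions below are a hypothetical split chosen to land near the predicted 18%.

```python
# Scenario values: target IC50 and predicted hERG IC50. Everything else is HYPOTHETICAL.
TARGET_IC50_NM = 8.0        # from the lab scenario
HERG_IC50_NM = 3_000.0      # predicted ~3 µM, from the lab scenario
FREE_CMAX_NM = 40.0         # HYPOTHETICAL free plasma Cmax at the projected dose
HERG_MARGIN_RULE = 30.0     # rule-of-thumb threshold (Redfern et al., 2003)


def herg_margins(target_ic50: float, herg_ic50: float, free_cmax: float):
    """Return (hERG/target selectivity, hERG/free-Cmax safety margin)."""
    return herg_ic50 / target_ic50, herg_ic50 / free_cmax


def oral_bioavailability(fa: float, fg: float, fh: float) -> float:
    """F = Fa * Fg * Fh; losses at each step multiply rather than add."""
    for name, value in (("fa", fa), ("fg", fg), ("fh", fh)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction in [0, 1]")
    return fa * fg * fh


selectivity, margin = herg_margins(TARGET_IC50_NM, HERG_IC50_NM, FREE_CMAX_NM)
print(f"hERG/target selectivity: {selectivity:.0f}x")
verdict = "meets" if margin >= HERG_MARGIN_RULE else "falls below"
print(f"hERG/free Cmax margin: {margin:.0f}x ({verdict} the 30x rule of thumb)")

# HYPOTHETICAL split of the predicted F ~18%: poor absorption dominates here.
print(f"F = {oral_bioavailability(fa=0.30, fg=0.90, fh=0.67):.2f}")
```

Because the factors multiply, raising Fa from 0.30 to 0.60 in this hypothetical split roughly doubles F, whereas the same medicinal-chemistry effort spent on an already-high Fg pays off far less. Measured Caco-2 permeability and microsomal stability data would tell you which split actually applies.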

Start by asking: "Our lead has great potency but concerning ADMET flags — low oral bioavailability, a hERG signal, and a potential immunogenicity issue. How should we rank these problems and what are our options?"

Hello! I'm your ADMET and clinical strategy advisor. I can help you interpret computational ADMET predictions, prioritize safety concerns, think through medicinal chemistry strategies to improve pharmacokinetic profiles, and plan IND-enabling studies. What challenges are you facing with your lead compound?

Module 3 — Final Assessment

Drug Discovery and Molecular Design

15 questions · 80% required to pass

1. According to analyses of clinical success rates (e.g., DiMasi et al., 2016), roughly what fraction of drug candidates entering Phase I clinical trials ultimately receive FDA approval?

Correct. Roughly 1 in 10: DiMasi et al. estimated a clinical approval success rate of about 12%, and a 2016 BIO/Biomedtracker analysis put the likelihood of approval from Phase I at 9.6%. Either way, roughly 90% of clinical entrants fail before reaching patients.

The figure is roughly 10–12%. The high attrition rate is the central economic driver of drug development costs.

2. What does ADMET stand for in drug discovery?

Correct. ADMET describes the five key pharmacokinetic/safety properties that determine whether a drug can survive the human body and reach its target safely.

ADMET stands for Absorption, Distribution, Metabolism, Excretion, and Toxicity — the five properties governing drug behavior in the body.

3. The estimated universe of drug-like molecules is approximately how large?

Correct. The chemical space of drug-like molecules is estimated at 10²³–10⁶⁰ — an astronomically larger number than any physical library, which is the fundamental argument for AI-guided exploration of chemical space.

The estimate is 10²³–10⁶⁰ drug-like molecules — dwarfing the ~10 million compounds in the largest physical HTS libraries by many orders of magnitude.

4. At the CASP14 competition in 2020, AlphaFold2 achieved what breakthrough?

Correct. AlphaFold2 predicted 3D protein structures from sequence with a median GDT score of 92.4 across CASP14 targets, competitive with experimentally determined structures for many proteins, and was widely regarded as effectively solving the 50-year-old protein structure prediction problem.

AlphaFold2 solved the protein structure prediction problem, predicting 3D structure from amino acid sequence with near-experimental accuracy (median GDT ~92) at CASP14.
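GDT_TS, the headline CASP metric, averages the percentage of residues whose predicted Cα position lies within 1, 2, 4, and 8 Å of the experimental position. A simplified sketch, assuming the per-residue deviations have already been computed after one superposition (the official calculation searches many superpositions to maximise the count at each cutoff) and using invented distances:

```python
from typing import Sequence

CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(ca_distances: Sequence[float]) -> float:
    """GDT_TS (0-100) from per-residue Cα deviations after superposition.

    Simplification: one fixed superposition is used for all four cutoffs,
    so this can underestimate what an optimising implementation reports.
    """
    if not ca_distances:
        raise ValueError("need at least one residue")
    n = len(ca_distances)
    fractions = [sum(d <= c for d in ca_distances) / n for c in CUTOFFS_ANGSTROM]
    return 100.0 * sum(fractions) / len(fractions)


# Hypothetical per-residue Cα deviations (Å) for a 10-residue model.
distances = [0.4, 0.6, 0.8, 0.9, 1.3, 1.7, 2.5, 3.1, 5.0, 9.2]
print(round(gdt_ts(distances), 1))
```

Here 40%, 60%, 80%, and 90% of residues fall within the four cutoffs, giving GDT_TS = 67.5; a score above 90, as at CASP14, requires nearly every residue to sit within about 1–2 Å.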

5. Anfinsen's dogma (1972, Nobel Prize) states that:

Correct. Anfinsen demonstrated that denatured ribonuclease A spontaneously refolded to its active conformation — proving that the sequence encodes the fold. This principle underpins all computational protein structure prediction.

Anfinsen's dogma states that a protein's 3D structure is fully determined by its amino acid sequence alone — the thermodynamic minimum energy fold is encoded in the sequence. This is the theoretical basis for AlphaFold and all computational structure prediction.

6. What co-evolutionary insight does AlphaFold2's Evoformer module primarily exploit?

Correct. Co-evolutionary coupling — pairs of residues that mutate together because they are spatially close — provides long-range structural constraints that AlphaFold2's Evoformer processes from multiple sequence alignments.

The key insight is co-evolution: residues that consistently co-mutate across thousands of homologous sequences are almost certainly in physical contact in the folded structure.
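The co-evolution signal can be sketched as mutual information (MI) between two alignment columns, which is high when the residues at those positions vary together. The toy alignment is invented so that columns 1 and 3 swap together (K with E, R with D, a hypothetical compensating charge pair). Real methods, including AlphaFold2's Evoformer, which learns from the MSA end to end, go well beyond raw MI, which is confounded by phylogeny and indirect couplings.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List


def column_mi(msa: List[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    pi = Counter(seq[i] for seq in msa)
    pj = Counter(seq[j] for seq in msa)
    pij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi


# Hypothetical alignment of six homologous sequences, five columns each.
MSA = [
    "AKGEL",
    "SKGEL",
    "ARGDL",
    "SRGDV",
    "TKGEV",
    "ARGDV",
]

pairs = sorted(((column_mi(MSA, i, j), i, j)
                for i, j in combinations(range(len(MSA[0])), 2)), reverse=True)
for mi, i, j in pairs[:3]:
    print(f"columns {i},{j}: MI = {mi:.3f}")
```

The co-varying pair (1, 3) scores ln 2 ≈ 0.693, well above every other pair, and the fully conserved column 2 scores zero with everything; a contact-prediction method would read that top pair as a likely spatial contact.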

7. In the context of AlphaFold predictions, a pLDDT score below 50 most likely indicates:

Correct. Low pLDDT (below 50) is a strong indicator of intrinsically disordered regions: protein segments that are genuinely flexible and do not adopt a single stable conformation. AlphaFold's own documentation notes that such regions are often disordered and should not be interpreted as structured.

pLDDT below 50 most likely indicates an intrinsically disordered region, a segment without a stable fold in solution. AlphaFold's documentation recommends against interpreting very-low-pLDDT regions as structured.
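The confidence bands used by the AlphaFold Protein Structure Database (very high above 90, confident 70 to 90, low 50 to 70, very low below 50) are straightforward to apply per residue. The sketch below uses invented scores, assigns exact boundary values to the higher band (an assumption), and uses a hypothetical minimum segment length to flag contiguous very-low stretches as candidate disordered regions. Parsing scores from a real model file (AlphaFold writes pLDDT into the B-factor column of its PDB output) is left out.

```python
from typing import List, Optional, Tuple


def plddt_band(score: float) -> str:
    """Map a per-residue pLDDT (0-100) to an AlphaFold DB confidence band."""
    if score >= 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


def low_confidence_segments(plddt: List[float], threshold: float = 50.0,
                            min_length: int = 3) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs with pLDDT below threshold."""
    segments = []
    start: Optional[int] = None
    # A sentinel value equal to the threshold closes any run at the sequence end.
    for i, score in enumerate(plddt + [threshold], start=1):
        if score < threshold and start is None:
            start = i
        elif score >= threshold and start is not None:
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    return segments


# Hypothetical per-residue pLDDT scores for a 12-residue stretch.
scores = [92, 88, 75, 61, 45, 38, 30, 41, 72, 95, 48, 49]
print([plddt_band(s) for s in scores[:4]])
print(low_confidence_segments(scores))
```

Residues 5 to 8 form a flagged very-low-confidence segment, while the two-residue dip at 11 to 12 is shorter than the minimum length and is ignored.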

8. What is a Variational Autoencoder (VAE) used for in molecular design?

Correct. A VAE learns to compress molecules into a continuous latent space and reconstruct them. New molecules can then be generated by sampling from this space or by navigating it toward property optima using Bayesian optimization.

A VAE encodes molecules into a continuous latent space — enabling novel molecule generation by sampling or optimizing within that mathematical space, guided by property objectives.
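Two pieces make a VAE trainable: the reparameterization trick, which samples a latent point as z = μ + σ·ε with ε ~ N(0, 1) so gradients can flow through μ and σ, and a loss that adds a KL penalty pulling each encoded Gaussian toward the N(0, I) prior. A framework-free sketch of just those two pieces follows, with an invented encoder output; a real molecular VAE would also need an encoder network, a SMILES or graph decoder, and an autodiff framework.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], log_var: List[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def negative_elbo(reconstruction_loss: float, mu: List[float],
                  log_var: List[float], beta: float = 1.0) -> float:
    """Training objective: reconstruction error plus a beta-weighted KL term."""
    return reconstruction_loss + beta * kl_to_standard_normal(mu, log_var)


# Hypothetical encoder output for one molecule in a 3-dimensional latent space.
mu = [0.5, -1.0, 0.0]
log_var = [0.0, math.log(0.25), 0.0]

rng = random.Random(42)
z = reparameterize(mu, log_var, rng)
print("sampled latent point:", [round(v, 3) for v in z])
print("KL term:", round(kl_to_standard_normal(mu, log_var), 4))
```

The KL term keeps encodings close to the prior, so the latent space stays smooth and densely populated; that is what makes sampling new points or moving along the space toward property optima produce decodable molecules.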

9. The discovery of halicin as an antibiotic (Stokes et al., Cell 2020) demonstrated which principle of AI in drug discovery?

Correct. Halicin had previously been investigated as a potential diabetes treatment (as a c-Jun N-terminal kinase inhibitor), not as an antibiotic. A GNN trained on E. coli growth-inhibition data predicted its antibacterial activity, demonstrating that ML can identify activity patterns invisible to human medicinal chemists working within traditional scaffold-based thinking.

Halicin showed that GNNs trained on bioactivity data can identify antibiotic properties in structurally unrelated molecules — exploring chemical space that conventional antibiotic discovery programs had completely overlooked.

10. Why is synthesizability an important constraint in AI generative molecular design?

Correct. Without synthesizability constraints, generative models produce chemically impossible or practically un-synthesizable molecules. Tools like AiZynthFinder and synthesizability score terms in reward functions address this critical gap.

Early generative models routinely proposed molecules that couldn't be synthesized. A beautiful computational molecule that cannot be made in the lab is worthless — synthesizability constraints are now essential in generative drug design.
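A common way to add a synthesizability constraint is to subtract a penalty from the generative model's reward. The sketch below assumes a synthetic-accessibility score on the 1 (easy) to 10 (very hard) scale of the Ertl and Schuffenhauer SA score, computed elsewhere, along with invented activity values, a hypothetical penalty weight, and a hypothetical hard cutoff; the functional form is illustrative, not a standard.

```python
def reward(predicted_activity: float, sa_score: float,
           sa_weight: float = 0.5, hard_cutoff: float = 6.0) -> float:
    """Hypothetical reward: activity in [0, 1] minus a synthesizability penalty.

    sa_score follows the 1 (easy) .. 10 (very hard) SA-score convention.
    Molecules above hard_cutoff earn zero reward, so the generator gets no
    credit for trading practicality away for predicted potency.
    """
    if not 0.0 <= predicted_activity <= 1.0:
        raise ValueError("predicted_activity must be in [0, 1]")
    if not 1.0 <= sa_score <= 10.0:
        raise ValueError("sa_score must be in [1, 10]")
    if sa_score > hard_cutoff:
        return 0.0
    penalty = sa_weight * (sa_score - 1.0) / 9.0  # scaled to [0, sa_weight]
    return max(0.0, predicted_activity - penalty)


# Hypothetical candidates: (predicted activity, SA score).
candidates = {"easy_but_weaker": (0.70, 2.5), "potent_but_hard": (0.95, 7.8),
              "balanced": (0.85, 3.4)}
for name, (act, sa) in sorted(candidates.items(), key=lambda kv: -reward(*kv[1])):
    print(f"{name}: reward = {reward(act, sa):.3f}")
```

The most potent candidate ranks last because it is predicted to be hard to make, which is exactly the behavior the constraint is meant to encourage.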

11. Which ADMET property tends to have the lowest prediction accuracy by ML models, and what makes it difficult?

Correct. Oral bioavailability integrates intestinal permeability, gut metabolism, first-pass hepatic metabolism, and active efflux transporter effects. Each is individually hard to predict, and they compound in non-linear ways, so in silico performance typically reaches only AUC ~0.70–0.78.

Oral bioavailability is typically the hardest ADMET endpoint to predict — it's a composite of multiple complex in vivo processes (absorption, first-pass metabolism, efflux) that don't combine in simple additive ways.

12. BenevolentAI's identification of baricitinib for COVID-19 is best characterized as:

Correct. BenevolentAI used a knowledge graph and literature mining to identify baricitinib — an approved rheumatoid arthritis drug — as a repurposing candidate for COVID-19, published in The Lancet in February 2020 and later confirmed in clinical trials.

This is AI-assisted drug repurposing — using a knowledge graph to identify new therapeutic potential in an already-approved drug, dramatically shortening the development timeline because safety data already exists.

13. The FDA's 2017 tissue-agnostic approval of pembrolizumab for MSI-high tumors was significant because:

Correct. The MSI-H pembrolizumab approval was the FDA's first tumor-agnostic approval, defined entirely by a genomic biomarker rather than tissue type. It showed how genomic stratification is reshaping precision oncology approvals, an area where AI-enabled stratification is increasingly applied.

The MSI-H approval was a landmark as the first FDA tissue-agnostic approval: any solid tumor with microsatellite instability qualified, regardless of organ. It reflects biomarker-driven genomic patient stratification in oncology, which AI methods now aim to extend.

14. What is the primary purpose of graph neural networks (GNNs) when applied to molecular property prediction?

Correct. GNNs represent molecules as graphs — atoms as nodes, bonds as edges — and use message-passing operations to learn chemical embeddings that encode local and global structural features, enabling accurate property prediction without requiring hand-crafted molecular descriptors.

GNNs represent molecules as graphs (atoms = nodes, bonds = edges) and learn chemical representations through message passing — capturing structural relationships that traditional molecular descriptors miss, enabling powerful property prediction models.
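One round of message passing can be written out by hand. In the sketch below, the heavy atoms of ethanol carry small invented feature vectors; each atom's new state is its own state plus the sum of its neighbours' states, passed through a fixed, hypothetical (untrained) weight matrix and a ReLU, loosely in the style of GIN's sum aggregation. A sum over atoms then gives a molecule-level embedding. Real GNNs learn the weights, use bond features, and stack several rounds.

```python
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical ethanol heavy-atom graph C0-C1-O2 (hydrogens folded into features).
ATOM_FEATURES: Dict[int, Vector] = {
    0: [1.0, 0.0, 3.0],   # [is_carbon, is_oxygen, attached_hydrogens]
    1: [1.0, 0.0, 2.0],
    2: [0.0, 1.0, 1.0],
}
BONDS: List[Tuple[int, int]] = [(0, 1), (1, 2)]

# Fixed illustrative weights; a trained GNN would learn these.
W: List[Vector] = [
    [0.5, 0.0, 0.1],
    [0.0, 1.0, 0.0],
    [0.2, -0.3, 0.4],
]


def matvec(matrix: List[Vector], vec: Vector) -> Vector:
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def relu(vec: Vector) -> Vector:
    return [max(0.0, x) for x in vec]


def message_passing_step(h: Dict[int, Vector],
                         bonds: List[Tuple[int, int]]) -> Dict[int, Vector]:
    """h_v' = ReLU(W (h_v + sum of neighbour states)) for every atom v."""
    neighbours: Dict[int, List[int]] = {v: [] for v in h}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    new_h = {}
    for v, state in h.items():
        aggregated = list(state)
        for u in neighbours[v]:
            aggregated = [x + y for x, y in zip(aggregated, h[u])]
        new_h[v] = relu(matvec(W, aggregated))
    return new_h


def readout(h: Dict[int, Vector]) -> Vector:
    """Sum pooling: a molecule embedding independent of atom ordering."""
    return [sum(col) for col in zip(*h.values())]


h1 = message_passing_step(ATOM_FEATURES, BONDS)
embedding = readout(h1)
print("atom states:", h1)
print("molecule embedding:", embedding)
```

Because both the neighbour aggregation and the readout are sums, renumbering the atoms leaves the molecule embedding unchanged, which is why GNNs need no canonical atom ordering.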

15. The European Medicines Agency's 2023 Reflection Paper on AI in medicines development emphasizes "explainability" as a key criterion. What does this requirement most directly address?

Correct. Regulatory explainability requirements address the fundamental concern that a model can be statistically accurate for the wrong reasons — "Clever Hans" models that correlate with confounders rather than true causal mechanisms. Interpretable models allow reviewers to judge whether the underlying reasoning is scientifically valid.

Explainability means being able to articulate why a model made a prediction — not just what it predicted. Regulators need to judge whether the model's reasoning reflects real biology, not spurious statistical correlations in training data.