Wouters, O. J., McKee, M. & Luyten, J. Estimated research and development investment needed to bring a new medicine to market, 2009–2018. JAMA 323, 844–853 (2020).
Schenone, M., Dančík, V., Wagner, B. K. & Clemons, P. A. Target identification and mechanism of action in chemical biology and drug discovery. Nat. Chem. Biol. 9, 232–240 (2013).
Ashenden, S. K. in The Era of Artificial Intelligence, Machine Learning, and Data Science in the Pharmaceutical Industry Ch. 6 (Elsevier, 2021).
Smietana, K., Siatkowski, M. & Møller, M. Trends in clinical success rates. Nat. Rev. Drug Discov. 15, 379–380 (2016).
Harrison, R. K. Phase II and phase III failures: 2013–2015. Nat. Rev. Drug Discov. 15, 817–818 (2016).
Dowden, H. & Munro, J. Trends in clinical success rates and therapeutic focus. Nat. Rev. Drug Discov. 18, 495–496 (2019).
Janai, J., Güney, F., Behl, A. & Geiger, A. Computer vision for autonomous vehicles: problems, datasets and state of the art. Found. Trends Comp. Graph. Vis. 12, 1–308 (2020).
Goldberg, S. B. et al. Machine learning and natural language processing in psychotherapy research: alliance as example use case. J. Couns. Psychol. 67, 438–448 (2020).
Peterson, A. A. & Liu, D. R. Small-molecule discovery through DNA-encoded libraries. Nat. Rev. Drug Discov. 22, 699–722 (2023).
Lim, K. S. et al. Machine learning on DNA-encoded library count data using an uncertainty-aware probabilistic loss function. J. Chem. Inf. Model. 62, 2316–2331 (2022).
Hou, R., Xie, C., Gui, Y., Li, G. & Li, X. Machine-learning-based data analysis method for cell-based selection of DNA-encoded libraries. ACS Omega 8, 19057–19071 (2023).
Van de Sande, B. et al. Applications of single-cell RNA sequencing in drug discovery and development. Nat. Rev. Drug Discov. 22, 496–520 (2023).
Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4, 852–866 (2022).
Chen, J. et al. Deep transfer learning of cancer drug responses by integrating bulk and single-cell RNA-seq data. Nat. Commun. 13, 6494 (2022).
Godinez, W. J., Hossain, I., Lazic, S. E., Davies, J. W. & Zhang, X. A multi-scale convolutional neural network for phenotyping high-content cellular images. Bioinformatics 33, 2010–2019 (2017).
Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702 (2020).
Coley, C. W., Barzilay, R., Green, W. H., Jaakkola, T. S. & Jensen, K. F. Convolutional embedding of attributed molecular graphs for physical property prediction. J. Chem. Inf. Model. 57, 1757–1772 (2017).
Jin, W. et al. Deep learning identifies synergistic drug combinations for treating COVID-19. Proc. Natl Acad. Sci. USA 118, e2105070118 (2021).
Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 18, 463–477 (2019).
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
Fernández-De Gortari, E., García-Jacas, C. R., Martinez-Mayorga, K. & Medina-Franco, J. L. Database fingerprint (DFP): an approach to represent molecular databases. J. Cheminform. 9, 9 (2017).
Liu, G. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii. Nat. Chem. Biol. https://doi.org/10.1038/s41589-023-01349-8 (2023).
Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 59, 3370–3388 (2019).
Corsello, S. M. et al. The Drug Repurposing Hub: a next-generation drug library and information resource. Nat. Med. 23, 405–408 (2017).
Wong, F. et al. Discovery of a structural class of antibiotics with explainable deep learning. Nature 626, 177–185 (2024).
Bender, B. J. et al. A practical guide to large-scale docking. Nat. Protoc. 16, 4799–4832 (2021).
Gentile, F. et al. Artificial intelligence-enabled virtual screening of ultra-large chemical libraries with Deep Docking. Nat. Protoc. 17, 672–697 (2022).
Tropsha, A., Isayev, O., Varnek, A., Schneider, G. & Cherkasov, A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat. Rev. Drug Discov. 23, 141–155 (2024).
Acharya, A. et al. Supercomputer-based ensemble docking drug discovery pipeline with application to Covid-19. J. Chem. Inf. Model. 60, 5832–5852 (2020).
Muratov, E. N. et al. A critical overview of computational approaches employed for COVID-19 drug discovery. Chem. Soc. Rev. 50, 9121–9151 (2021).
Sterling, T. & Irwin, J. J. ZINC 15 — ligand discovery for everyone. J. Chem. Inf. Model. 55, 2324–2337 (2015).
Rossetti, G. G. et al. Non-covalent SARS-CoV-2 Mpro inhibitors developed from in silico screen hits. Sci. Rep. 12, 2505 (2022).
Reymond, J. L. The chemical space project. Acc. Chem. Res. 48, 722–730 (2015).
Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
Anstine, D. M. & Isayev, O. Generative models as an emerging paradigm in the chemical sciences. J. Am. Chem. Soc. 145, 8736–8750 (2023).
Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. Preprint at arxiv.org/abs/1802.04364 (2018).
Godinez, W. J. et al. Design of potent antimalarials with generative chemistry. Nat. Mach. Intell. 4, 180–186 (2022).
Walters, W. P. & Murcko, M. Assessing the impact of generative AI on medicinal chemistry. Nat. Biotechnol. 38, 143–145 (2020).
Cesaro, A., Bagheri, M., Torres, M., Wan, F. & de la Fuente-Nunez, C. Deep learning tools to accelerate antibiotic discovery. Expert Opin. Drug Discov. 18, 1245–1257 (2023).
Rezende, D. J. & Mohamed, S. Variational inference with normalizing flows. In Proc. 32nd International Conference on Machine Learning 2, 1530–1538 (PMLR, 2015).
Shekhovtsov, A., Schlesinger, D. & Flach, B. VAE approximation error: ELBO and exponential families. Preprint at arxiv.org/abs/2102.09310 (2021).
Shi, C. et al. GraphAF: a flow-based autoregressive model for molecular graph generation. Preprint at arxiv.org/abs/2001.09382 (2020).
Hoogeboom, E., Satorras, V. G., Vignac, C. & Welling, M. Equivariant diffusion for molecule generation in 3D. In Proc. 39th International Conference on Machine Learning 8867–8887 (2022).
Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Model. 28, 31–36 (1988).
Grisoni, F. Chemical language models for de novo drug design: challenges and opportunities. Curr. Opin. Struct. Biol. 79, 102527 (2023).
Flam-Shepherd, D., Zhu, K. & Aspuru-Guzik, A. Language models can learn complex molecular distributions. Nat. Commun. 13, 3293 (2022).
Skinnider, M. A., Greg Stacey, R., Wishart, D. S. & Foster, L. J. Chemical language models enable navigation in sparsely populated chemical space. Nat. Mach. Intell. 3, 759–770 (2021).
Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Commun. Chem. 5, 129 (2022).
Ballarotto, M. et al. De novo design of Nurr1 agonists via fragment-augmented generative deep learning in low-data regime. J. Med. Chem. 66, 8170–8177 (2023).
Moret, M. et al. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat. Commun. 14, 114 (2023).
Grisoni, F. et al. Combining generative artificial intelligence and on-chip synthesis for de novo drug design. Sci. Adv. 7, 3338–3349 (2021).
Merk, D., Friedrich, L., Grisoni, F. & Schneider, G. De novo design of bioactive small molecules by artificial intelligence. Mol. Inf. 37, 1700153 (2018).
Vaswani, A. et al. Attention is all you need. Preprint at arxiv.org/abs/1706.03762 (2023).
Bagal, V., Aggarwal, R., Vinod, P. K. & Priyakumar, U. D. MolGPT: molecular generation using a transformer-decoder model. J. Chem. Inf. Model. 62, 2064–2076 (2021).
Brown, N., Fiscato, M., Segler, M. H. S. & Vaucher, A. C. GuacaMol: benchmarking models for de novo molecular design. J. Chem. Inf. Model. 59, 1096–1108 (2019).
Polykovskiy, D. et al. Molecular sets (MOSES): a benchmarking platform for molecular generation models. Front. Pharmacol. 11, 565644 (2020).
Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).
Jablonka, K. M., Schwaller, P., Ortega-Guerrero, A. & Smit, B. Leveraging large language models for predictive chemistry. Nat. Mach. Intell. 6, 161–169 (2024).
Born, J. & Manica, M. Regression Transformer enables concurrent sequence regression and generation for molecular language modelling. Nat. Mach. Intell. 5, 432–444 (2023).
Frey, N. C. et al. Neural scaling of deep chemical models. Nat. Mach. Intell. 5, 1297–1305 (2023).
Grechishnikova, D. Transformer neural network for protein-specific de novo drug generation as a machine translation problem. Sci. Rep. 11, 321 (2021).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Stsiapanava, A. et al. Structure of the decoy module of human glycoprotein 2 and uromodulin and its interaction with bacterial adhesin FimH. Nat. Struct. Mol. Biol. 29, 190–193 (2022).
Liu, H. et al. Cryo-EM structures of human hepatitis B and woodchuck hepatitis virus small spherical subviral particles. Sci. Adv. 8, eabo4184 (2022).
Ren, F. et al. AlphaFold accelerates artificial intelligence powered drug discovery: efficient discovery of a novel CDK20 small molecule inhibitor. Chem. Sci. 14, 1443–1452 (2023).
Yang, Q. et al. Structural comparison and drug screening of spike proteins of ten SARS-CoV-2 variants. Research 2022, 9781758 (2022).
Yang, Q., Xia, D., Syed, A. A. S., Wang, Z. & Shi, Y. Highly accurate protein structure prediction and drug screen of monkeypox virus proteome. J. Infect. 86, 66–117 (2023).
Ivanenkov, Y. A. et al. Chemistry42: an AI-driven platform for molecular design and optimization. J. Chem. Inf. Model. 63, 695–701 (2023).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Berman, H. M. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
Van Wart, H. E. & Birkedal-Hansen, H. The cysteine switch: a principle of regulation of metalloproteinase activity with potential applicability to the entire matrix metalloproteinase gene family. Proc. Natl Acad. Sci. USA 87, 5578–5582 (1990).
Michaud, J. M., Madani, A. & Fraser, J. S. A language model beats AlphaFold2 on orphans. Nat. Biotechnol. 40, 1576–1577 (2022).
Wu, R. et al. High-resolution de novo structure prediction from primary sequence. Preprint at bioRxiv https://doi.org/10.1101/2022.07.21.500999 (2022).
Fang, X. et al. A method for multiple-sequence-alignment-free protein structure prediction using a protein language model. Nat. Mach. Intell. 5, 1087–1096 (2023).
Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Suzek, B. E. et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2014).
Corso, G., Stärk, H., Barzilay, R. & Jaakkola, T. DiffDock: diffusion steps, twists, and turns for molecular docking. Preprint at arxiv.org/abs/2210.01776 (2022).
Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).
Chaffin, M. et al. Single-nucleus profiling of human dilated and hypertrophic cardiomyopathy. Nature 608, 174–180 (2022).
Hughes, J. P., Rees, S. S., Kalindjian, S. B. & Philpott, K. L. Principles of early drug discovery. Br. J. Pharmacol. 162, 1239–1249 (2011).
Goodnow, R. A. Hit and lead identification: integrated technology-based approaches. Drug Discov. Today Technol. 3, 367–375 (2006).
Yang, L. et al. Transformer-based deep learning method for optimizing ADMET properties of lead compounds. Phys. Chem. Chem. Phys. 25, 2377–2385 (2023).
Chen, Y., Yu, X., Li, W., Tang, Y. & Liu, G. In silico prediction of hERG blockers using machine learning and deep learning approaches. J. Appl. Toxicol. 43, 1462–1475 (2023).
Sharma, B. et al. Accurate clinical toxicity prediction using multi-task deep neural nets and contrastive molecular explanations. Sci. Rep. 13, 4908 (2023).
Sun, D., Gao, W., Hu, H. & Zhou, S. Why 90% of clinical drug development fails and how to improve it? Acta Pharm. Sin. B 12, 3049–3062 (2022).
Kola, I. & Landis, J. Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discov. 3, 711–716 (2004).
Lipinski, C. A. Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov. Today Technol. 1, 337–341 (2004).
Coutinho, A. L. et al. A robust, viable, and resource sparing HPLC-based log P method applied to common drugs. Int. J. Pharm. 644, 123325 (2023).
Faller, B. & Ertl, P. Computational approaches to determine drug solubility. Adv. Drug Deliv. Rev. 59, 533–545 (2007).
Aliagas, I., Gobbi, A., Lee, M. L. & Sellers, B. D. Comparison of log P and log D correction models trained with public and proprietary data sets. J. Comput. Aided Mol. Des. 36, 253–262 (2022).
Win, Z. M., Cheong, A. M. Y. & Hopkins, W. S. Using machine learning to predict partition coefficient (log P) and distribution coefficient (log D) with molecular descriptors and liquid chromatography retention time. J. Chem. Inf. Model. 63, 1906–1913 (2023).
Domingo-Almenara, X. et al. The METLIN small molecule dataset for machine learning-based retention time prediction. Nat. Commun. 10, 5811 (2019).
Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107 (2012).
Datta, R., Das, D. & Das, S. Efficient lipophilicity prediction of molecules employing deep-learning models. Chemometr. Intell. Lab. Syst. 213, 104309 (2021).
Prasad, S. & Brooks, B. R. A deep learning approach for the blind log P prediction in SAMPL6 challenge. J. Comput. Aided Mol. Des. 34, 535–542 (2020).
Heijman, J., Voigt, N., Carlsson, L. G. & Dobrev, D. Cardiac safety assays. Curr. Opin. Pharmacol. 15, 16–21 (2014).
Ackloo, S. et al. CACHE (Critical Assessment of Computational Hit-finding Experiments): a public–private partnership benchmarking initiative to enable the development of computational methods for hit-finding. Nat. Rev. Chem. 6, 287–295 (2022).
Swanson, K. et al. ADMET-AI: a machine learning ADMET platform for evaluation of large-scale chemical libraries. Zenodo https://doi.org/10.5281/zenodo.10372930 (2023).
Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).
Huang, R. et al. Tox21Challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs. Front. Environ. Sci. https://doi.org/10.3389/fenvs.2015.00085 (2016).
Tingle, B. I. et al. ZINC-22—a free multi-billion-scale database of tangible compounds for ligand discovery. J. Chem. Inf. Model. 63, 1166–1176 (2023).
Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 140022 (2014).
Frye, L., Bhat, S., Akinsanya, K. & Abel, R. From computer-aided drug discovery to computer-driven drug discovery. Drug Discov. Today Technol. 39, 111–117 (2021).
Zeng, W., Guo, L., Xu, S., Chen, J. & Zhou, J. High-throughput screening technology in industrial biotechnology. Trends Biotechnol. 38, 888–906 (2020).
Sarkar, N. & Stokes, J. M. Practical applications of machine learning for anti-infective drug discovery. Med. Chem. Rev. 14, 345–375 (2023).
Arnold, A., Alexander, J., Liu, G. & Stokes, J. M. Applications of machine learning in microbial natural product drug discovery. Expert Opin. Drug Discov. 18, 1259–1272 (2023).
Mullowney, M. W. et al. Artificial intelligence for natural product drug discovery. Nat. Rev. Drug Discov. 22, 895–916 (2023).
Ekins, S. et al. Exploiting machine learning for end-to-end drug discovery and development. Nat. Mater. 18, 435–441 (2019).
Grisoni, F. et al. Designing anticancer peptides by constructive machine learning. ChemMedChem 13, 1300–1302 (2018).
Chen, J., Cheong, H. H. & Siu, S. W. I. xDeep-AcPEP: deep learning method for anticancer peptide activity prediction based on convolutional neural network and multitask learning. J. Chem. Inf. Model. 61, 3789–3803 (2021).
Walker, A. S. & Clardy, J. A machine learning bioinformatics method to predict biological activity from biosynthetic gene clusters. J. Chem. Inf. Model. 61, 2560–2571 (2021).
Heyndrickx, W. et al. MELLODDY: cross-pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information. J. Chem. Inf. Model. 64, 2331–2344 (2023).
Wellawatte, G. P., Gandhi, H. A., Seshadri, A. & White, A. D. A perspective on explanations of molecular prediction models. J. Chem. Theory Comput. 19, 2149–2160 (2023).
Cichońska, A. et al. Crowdsourced mapping of unexplored target space of kinase inhibitors. Nat. Commun. 12, 3307 (2021).
Ketkar, N. in Deep Learning with Python 97–111 (Apress, 2017).