Optimizing drug design by merging generative AI with a physics-based active learning framework

  • Yang, H. et al. admetSAR 2.0: web-service for prediction and optimization of chemical ADMET properties. Bioinformatics 35, 1067–1069 (2019).

  • Huang, T. et al. MOST: most-similar ligand based approach to target prediction. BMC Bioinform. 18, 165 (2017).

  • Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  • Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. DiffDock: diffusion steps, twists, and turns for molecular docking. [Preprint] Available at: http://arxiv.org/abs/2210.01776 [accessed 30 April 2024] (2023).

  • Singh, R., Sledzieski, S., Bryson, B., Cowen, L. & Berger, B. Contrastive learning in protein language space predicts interactions between drugs and protein targets. Proc. Natl. Acad. Sci. USA 120, e2220778120 (2023).

  • Priya, S., Tripathi, G., Singh, D. B., Jain, P. & Kumar, A. Machine learning approaches and their applications in drug discovery and design. Chem. Biol. Drug Des. 100, 136–153 (2022).

  • Dara, S., Dhamercherla, S., Jadav, S. S., Babu, C. M. & Ahsan, M. J. Machine learning in drug discovery: a review. Artif. Intell. Rev. 55, 1947–1999 (2022).

  • Bilodeau, C., Jin, W., Jaakkola, T., Barzilay, R. & Jensen, K. F. Generative models for molecular discovery: recent advances and challenges. WIREs Comput. Mol. Sci. 12, e1608 (2022).

  • Tong, X. et al. Generative models for de novo drug design. J. Med. Chem. 64, 14011–14027 (2021).

  • Zeng, X. et al. Deep generative molecular design reshapes drug discovery. Cell Rep. Med. 3, 100794 (2022).

  • Bian, Y. & Xie, X.-Q. Generative chemistry: drug discovery with deep learning generative models. J. Mol. Model. 27, 71 (2021).

  • Gm, H., Gourisaria, M. K., Pandey, M. & Rautaray, S. S. A comprehensive survey and analysis of generative models in machine learning. Comput. Sci. Rev. 38, 100285 (2020).

  • Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 9, 48 (2017).

  • Polykovskiy, D. et al. Entangled conditional adversarial autoencoder for de novo drug discovery. Mol. Pharm. 15, 4398–4405 (2018).

  • Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de-novo drug design. Sci. Adv. 4, eaap7885 (2018).

  • Li, Y., Zhang, L. & Liu, Z. Multi-objective de novo drug design with conditional graph generative model. J. Cheminform. 10, 33 (2018).

  • Jin, W., Barzilay, R. & Jaakkola, T. Multi-objective molecule generation using interpretable substructures. [Preprint] Available at: http://arxiv.org/abs/2002.03244 [accessed 7 May 2024] (2020).

  • Maziarka, Ł. et al. Mol-CycleGAN: a generative model for molecular optimization. J. Cheminform. 12, 2 (2020).

  • Grisoni, F. et al. Combining generative artificial intelligence and on-chip synthesis for de novo drug design. Sci. Adv. 7, eabg3338 (2021).

  • Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

  • You, J., Liu, B., Ying, R., Pande, V. & Leskovec, J. Graph convolutional policy network for goal-directed molecular graph generation. [Preprint] Available at: http://arxiv.org/abs/1806.02473 [accessed 9 May 2023] (2019).

  • Born, J. et al. Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2. Mach. Learn. Sci. Technol. 2, 025024 (2021).

  • Flam-Shepherd, D., Zhu, K. & Aspuru-Guzik, A. Language models can learn complex molecular distributions. Nat. Commun. 13, 3293 (2022).

  • Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).

  • Kadurin, A. et al. The cornucopia of meaningful leads: applying deep adversarial autoencoders for new molecule development in oncology. Oncotarget 8, 10883–10890 (2016).

  • Merk, D., Friedrich, L., Grisoni, F. & Schneider, G. De novo design of bioactive small molecules by artificial intelligence. Mol. Inform. 37, 1700153 (2018).

  • Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040 (2019).

  • Moret, M., Helmstädter, M., Grisoni, F., Schneider, G. & Merk, D. Beam search for automated design and scoring of novel ROR ligands with machine intelligence. Angew. Chem. Int. Ed. 60, 19477–19482 (2021).

  • Krishnan, S. R., Bung, N., Bulusu, G. & Roy, A. Accelerating de novo drug design against novel proteins using deep learning. J. Chem. Inf. Model. 61, 621–630 (2021).

  • Korshunova, M. et al. Generative and reinforcement learning approaches for the automated de novo design of bioactive compounds. Commun. Chem. 5, 1–11 (2022).

  • Chen, Y. et al. Deep generative model for drug design from protein target sequence. J. Cheminform. 15, 38 (2023).

  • Kyro, G. W., Morgunov, A., Brent, R. I. & Batista, V. S. ChemSpaceAL: an efficient active learning methodology applied to protein-specific molecular generation. J. Chem. Inf. Model. 64, 653–665 (2024).

  • Loeffler, H., Wan, S., Klähn, M., Bhati, A. & Coveney, P. Optimal molecular design: generative active learning combining REINVENT with absolute binding free energy simulations. [Preprint] Available at: https://chemrxiv.org/engage/chemrxiv/article-details/662a030e418a5379b0aa3994 [accessed 7 May 2024] (2024).

  • Atz, K. et al. Prospective de novo drug design with deep interactome learning. Nat. Commun. 15, 3408 (2024).

  • Gupta, A. et al. Generative recurrent networks for de novo drug design. Mol. Inform. 37, 1700111 (2018).

  • Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell. 2, 171–180 (2020).

  • Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).

  • Swanson, K. et al. Generative AI for designing and validating easily synthesizable and structurally novel antibiotics. Nat. Mach. Intell. 6, 338–353 (2024).

  • Mohammadi, S., O’Dowd, B., Paulitz-Erdmann, C. & Goerlitz, L. Penalized variational autoencoder for molecular design. [Preprint] Available at: https://chemrxiv.org/engage/chemrxiv/article-details/60c74169f96a0012ee286438 [accessed 8 May 2024] (2019).

  • Bagal, V., Aggarwal, R., Vinod, P. K. & Priyakumar, U. D. MolGPT: molecular generation using a transformer-decoder model. J. Chem. Inf. Model. 62, 2064–2076 (2022).

  • Ahmad, W., Simon, E., Chithrananda, S., Grand, G. & Ramsundar, B. ChemBERTa-2: towards chemical foundation models. [Preprint] Available at: http://arxiv.org/abs/2209.01712 [accessed 23 March 2025] (2022).

  • Balaji, S., Magar, R., Jadhav, Y. & Farimani, A. B. GPT-MolBERTa: GPT molecular features language model for molecular property prediction. [Preprint] Available at: http://arxiv.org/abs/2310.03030 [accessed 27 May 2025] (2023).

  • Huang, L. et al. A dual diffusion model enables 3D molecule generation and lead optimization based on target pockets. Nat. Commun. 15, 2657 (2024).

  • Oestreich, M. et al. DrugDiff: small molecule diffusion model with flexible guidance towards molecular properties. J. Cheminform. 17, 23 (2025).

  • Alakhdar, A., Poczos, B. & Washburn, N. Diffusion models in de novo drug design. J. Chem. Inf. Model. 64, 7238–7256 (2024).

  • Kingma, D. P. & Welling, M. An introduction to variational autoencoders. Found. Trends Mach. Learn. 12, 307–392 (2019).

  • Ali, S. & van Kaick, O. Evaluation of latent space learning with procedurally-generated datasets of shapes. In Proc. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) 2086–2094 (IEEE, 2021).

  • Ding, X. et al. Active learning for drug design: a case study on the plasma exposure of orally administered drugs. J. Med. Chem. 64, 16838–16853 (2021).

  • Vasanthakumari, P. et al. A comprehensive investigation of active learning strategies for conducting anti-cancer drug screening. Cancers 16, 530 (2024).

  • Eisenstein, M. Active machine learning helps drug hunters tackle biology. Nat. Biotechnol. 38, 512–514 (2020).

  • Bertin, P. et al. RECOVER identifies synergistic drug combinations in vitro through sequential model optimization. Cell Rep. Methods 3, 100599 (2023).

  • Graff, D. E., Shakhnovich, E. I. & Coley, C. W. Accelerating high-throughput virtual screening through molecular pool-based active learning. Chem. Sci. 12, 7866–7881 (2021).

  • Nahal, Y. et al. Human-in-the-loop active learning for goal-oriented molecule generation. J. Cheminform. 16, 138 (2024).

  • Borrelli, K. W., Cossins, B. & Guallar, V. Exploring hierarchical refinement techniques for induced fit docking with protein and ligand flexibility. J. Comput. Chem. 31, 1224–1235 (2010).

  • Puch-Giner, I., Molina, A., Municoy, M., Pérez, C. & Guallar, V. Recent PELE developments and applications in drug discovery campaigns. Int. J. Mol. Sci. 23, 16090 (2022).

  • Tadesse, S., Caldon, E. C., Tilley, W. & Wang, S. Cyclin-dependent kinase 2 inhibitors in cancer therapy: an update. J. Med. Chem. 62, 4233–4251 (2019).

  • Asghar, U., Witkiewicz, A. K., Turner, N. C. & Knudsen, E. S. The history and future of targeting cyclin-dependent kinases in cancer therapy. Nat. Rev. Drug Discov. 14, 130–146 (2015).

  • Shapiro, G. I. Preclinical and clinical development of the cyclin-dependent kinase inhibitor flavopiridol. Clin. Cancer Res. 10, 4270s–4275s (2004).

  • Jackson, R. C., Barnett, A. L., McClue, S. J. & Green, S. R. Seliciclib, a cell-cycle modulator that acts through the inhibition of cyclin-dependent kinases. Expert Opin. Drug Discov. 3, 131–143 (2008).

  • Hunter, J. C. et al. Biochemical and structural analysis of common cancer-associated KRAS mutations. Mol. Cancer Res. 13, 1325–1335 (2015).

  • Biankin, A. V. et al. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes. Nature 491, 399–405 (2012).

  • The Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014).

  • Neumann, J., Zeindl-Eberhart, E., Kirchner, T. & Jung, A. Frequency and type of KRAS mutations in routine diagnostic analysis of metastatic colorectal cancer. Pathol. Res. Pract. 205, 858–862 (2009).

  • Ostrem, J. M., Peters, U., Sos, M. L., Wells, J. A. & Shokat, K. M. K-Ras(G12C) inhibitors allosterically control GTP affinity and effector interactions. Nature 503, 548–551 (2013).

  • Liu, J., Kang, R. & Tang, D. The KRAS-G12C inhibitor: activity and resistance. Cancer Gene Ther. 29, 875–878 (2022).

  • Kwan, A. K., Piazza, G. A., Keeton, A. B. & Leite, C. A. The path to the clinic: a comprehensive review on direct KRASG12C inhibitors. J. Exp. Clin. Cancer Res. 41, 27 (2022).

  • Shin, Y. et al. Discovery of N-(1-Acryloylazetidin-3-yl)-2-(1H-indol-1-yl)acetamides as covalent inhibitors of KRASG12C. ACS Med. Chem. Lett. 10, 1302–1308 (2019).

  • Mao, Z. et al. KRAS(G12D) can be targeted by potent inhibitors via formation of salt bridge. Cell Discov. 8, 1–14 (2022).

  • Wang, X. et al. Identification of MRTX1133, a noncovalent, potent, and selective KRASG12D inhibitor. J. Med. Chem. 65, 3123–3133 (2022).

  • McInnes, L., Healy, J. & Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. [Preprint] Available at: http://arxiv.org/abs/1802.03426 [accessed 19 April 2023] (2020).

  • Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749 (2004).

  • Boresch, S., Tettinger, F., Leitgeb, M. & Karplus, M. Absolute binding free energies: a quantitative approach for their calculation. J. Phys. Chem. B 107, 9535–9551 (2003).

  • Bettayeb, K. et al. Meriolins, a new class of cell death inducing kinase inhibitors with enhanced selectivity for cyclin-dependent kinases. Cancer Res. 67, 8325–8334 (2007).

  • REAL Database-Enamine. Available at: https://enamine.net/compound-collections/real-compounds/real-database [accessed 14 April 2023].

  • infiniSee: the chemical space navigation platform for unlimited accessibles. BioSolveIT. Available at: https://www.biosolveit.de/products/infinisee/ [accessed 10 December 2024].

  • Kessler, D. et al. Drugging an undruggable pocket on KRAS. Proc. Natl. Acad. Sci. USA 116, 15823–15829 (2019).

  • Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107 (2012).

  • Wang, R., Fang, X., Lu, Y. & Wang, S. The PDBbind database: collection of binding affinities for protein–ligand complexes with known three-dimensional structures. J. Med. Chem. 47, 2977–2980 (2004).

  • Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28, 235–242 (2000).

  • Kim, S. et al. PubChem 2023 update. Nucleic Acids Res. 51, D1373–D1380 (2023).

  • ChemAxon, MedChemExpress: Master of Bioactive Molecules | Inhibitors, Screening Libraries & Proteins. MedchemExpress.com. Available at: https://www.medchemexpress.com/ [accessed 4 April 2023].

  • Lanman, B. A. et al. Discovery of a covalent inhibitor of KRASG12C (AMG 510) for the treatment of solid tumors. J. Med. Chem. 63, 52–65 (2020).

  • Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).

  • Sastry, G. M., Adzhigirey, M., Day, T., Annabhimoju, R. & Sherman, W. Protein and ligand preparation: parameters, protocols, and influence on virtual screening enrichments. J. Comput. Aided Mol. Des. 27, 221–234 (2013).

  • Olsson, M. H. M., Søndergaard, C. R., Rostkowski, M. & Jensen, J. H. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J. Chem. Theory Comput. 7, 525–537 (2011).

  • Halgren, T. A. Identifying and characterizing binding sites and assessing druggability. J. Chem. Inf. Model. 49, 377–389 (2009).

  • Waskom, M. L. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).

  • Landrum, G. RDKit: Open-source cheminformatics. Available at: https://www.rdkit.org/ [accessed 4 April 2023] (2010).

  • Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).

  • Willett, P., Barnard, J. M. & Downs, G. M. Chemical similarity searching. J. Chem. Inf. Comput. Sci. 38, 983–996 (1998).

  • Morgan, H. L. The generation of a unique machine description for chemical structures-a technique developed at chemical abstracts service. J. Chem. Doc. 5, 107–113 (1965).

  • Borrelli, K. W., Vitalis, A., Alcantara, R. & Guallar, V. PELE: protein energy landscape exploration. a novel Monte Carlo based technique. J. Chem. Theory Comput. 1, 1304–1311 (2005).

  • Lecina, D., Gilabert, J. F. & Guallar, V. Adaptive simulations, towards interactive protein-ligand modeling. Sci. Rep. 7, 8466 (2017).

  • Mobley, D. L. et al. Predicting absolute ligand binding free energies to a simple model site. J. Mol. Biol. 371, 1118–1134 (2007).

  • MarvinSketch. Available at: https://chemaxon.com/marvin [accessed 19 April 2023].

  • Bookstein, A., Kulyukin, V. A. & Raita, T. Generalized hamming distance. Inf. Retr. 5, 353–375 (2002).

  • Emanuel, S. et al. The in vitro and in vivo effects of JNJ-7706621: a dual inhibitor of cyclin-dependent kinases and aurora kinases. Cancer Res. 65, 9038–9046 (2005).
