Generative models enable the creation of new compounds for drug discovery, reducing resource intensity



Identifying promising drug candidates, a process known as hit generation, typically requires extensive laboratory work and significant resources, and new research investigates whether artificial intelligence can streamline this critical early step. Nagam Osman and Laura Toni from University College London, in collaboration with Vittorio Lembo and Giovanni Bottegoni from the University of Urbino, have demonstrated that machine learning models can design novel molecules with properties suggestive of biological activity. The study directly assesses the ability of these models to generate “hit-like” compounds that can serve as virtual starting points for drug discovery, and the research team validated the approach by synthesizing several promising candidates and confirming their activity in laboratory tests. By establishing a dedicated evaluation framework and benchmarking different generative models, the scientists show that these techniques can produce diverse and relevant compounds, with the potential to accelerate the search for new drugs and reduce reliance on traditional, expensive screening methods.

Generative models for binding diverse protein targets

This study details the use of three generative models, DiGress, MolRNN, and GraphINVENT, for de novo drug design, focused on generating molecules that bind to seven protein targets: ADORA2A, D3R, GSK-3β, HSP90α, PPARα, SRC, and thrombin. Each model was trained in several variations, such as with reinforcement learning, with a focus on drug-like properties, and with further refinement of those drug-like models. The team evaluated the resulting molecules using docking scores (lower scores indicate better binding), KL divergence to assess similarity to known ligands, and physicochemical properties such as molecular weight and LogP. Overall, the models achieve low KL divergence, suggesting that the generated molecules have docking score distributions similar to those of known ligands, although PPARα, SRC, and thrombin proved more challenging.
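To make the KL divergence comparison concrete, here is a minimal sketch of how two sets of docking scores could be compared. It is an illustration under stated assumptions, not the study's implementation: the function names, the equal-width binning, the smoothing constant, and the direction of the divergence (generated relative to reference) are all choices made for this example, and the scores are made up.

```python
import math

def histogram(values, lo, hi, bins, eps=1e-9):
    """Bin values into equal-width bins on [lo, hi] and return smoothed probabilities.

    eps is an assumed smoothing constant that keeps empty bins from
    producing log(0) or division by zero.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi lands in the last bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    probs = [c / len(values) + eps for c in counts]
    norm = sum(probs)
    return [p / norm for p in probs]

def kl_divergence(p, q):
    """Discrete KL divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def docking_score_kl(generated, reference, bins=10):
    """KL divergence between the docking score histograms of two molecule sets."""
    lo = min(min(generated), min(reference))
    hi = max(max(generated), max(reference))
    if hi == lo:
        return 0.0  # all scores identical, distributions coincide
    p = histogram(generated, lo, hi, bins)
    q = histogram(reference, lo, hi, bins)
    return kl_divergence(p, q)

# Hypothetical docking scores (more negative = better predicted binding).
known_ligands = [-9.1, -8.7, -8.5, -8.0, -7.6]
generated_close = [-9.0, -8.8, -8.4, -8.1, -7.5]
generated_far = [-5.0, -4.8, -4.5, -4.2, -4.0]

kl_close = docking_score_kl(generated_close, known_ligands)
kl_far = docking_score_kl(generated_far, known_ligands)
print(f"close set: {kl_close:.2f}, far set: {kl_far:.2f}")
```

A lower value means the generated set's docking scores are distributed more like those of the known ligands. With so few samples the result depends heavily on the binning, so real comparisons would use far larger sets.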

Hit-like models consistently performed better than models trained with reinforcement learning, and fine-tuning often improved performance further. In particular, the compounds generated for GSK-3β exceeded both the existing ligand set and hit-like inhibitors in activity values, demonstrating the potential to generate novel molecules with improved activity. Analysis of the binding conformations revealed important interactions within the GSK-3β binding site. The study highlights the potential of generative models for de novo drug design, with hit-like training strategies proving particularly effective.

Generative model for creating hit-like molecules

This study takes a new approach to drug discovery, investigating whether generative models can efficiently create hit-like molecules and streamline the initial hit identification step. The researchers focused on generating compounds suitable for direct incorporation into traditional screening workflows, explicitly framing hit-like molecule generation as a standalone task. The team trained and benchmarked autoregressive and diffusion-based generative models across a variety of datasets and configurations. The generated molecules were evaluated using a multistage filter pipeline that defined a hit-like chemical space based on physicochemical properties, structural features, and predicted biological activity. The experiments used standard metrics along with target-specific docking scores to assess the quality of the generated compounds. The synthesis and in vitro confirmation of activity of several GSK-3β hits demonstrated the practical applicability of the generation approach, but the study also identified limitations of current evaluation metrics and gaps in available training data.
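The idea of a multistage filter pipeline can be sketched as a sequence of pass/fail checks applied in order. The sketch below is only an illustration: the `Candidate` fields, the stage functions, and every threshold are hypothetical placeholders rather than the criteria used in the study, and the descriptor values are assumed to have been computed beforehand by a cheminformatics toolkit and a docking program.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    mol_weight: float               # molecular weight in Da
    logp: float                     # lipophilicity estimate
    has_flagged_substructure: bool  # e.g. a reactive or unwanted group
    docking_score: float            # lower = better predicted binding

# All thresholds below are assumptions chosen for illustration.
def passes_physchem(c):
    return 200.0 <= c.mol_weight <= 500.0 and -1.0 <= c.logp <= 5.0

def passes_structure(c):
    return not c.has_flagged_substructure

def passes_activity(c):
    return c.docking_score <= -7.0

STAGES = [
    ("physicochemical", passes_physchem),
    ("structural", passes_structure),
    ("predicted activity", passes_activity),
]

def run_pipeline(candidates, stages=STAGES):
    """Apply each stage in order; return the survivors and a per-stage count."""
    survivors = list(candidates)
    report = {}
    for stage_name, predicate in stages:
        survivors = [c for c in survivors if predicate(c)]
        report[stage_name] = len(survivors)
    return survivors, report

# Made-up candidates with made-up descriptor values.
pool = [
    Candidate("mol_a", 342.4, 2.9, False, -8.3),
    Candidate("mol_b", 612.7, 4.1, False, -9.0),  # too heavy
    Candidate("mol_c", 298.3, 1.7, True, -8.8),   # flagged substructure
    Candidate("mol_d", 415.5, 3.6, False, -5.2),  # weak docking score
]

hits, report = run_pipeline(pool)
print([c.name for c in hits], report)
```

Ordering the cheap property checks before the docking-based stage mirrors common practice, since docking is usually the most expensive step, although the source does not state the order used in the study.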

Deep learning generates drug discovery compounds

Scientists have demonstrated that deep learning models can generate compounds suitable for the early stages of drug discovery, potentially streamlining the hit identification process. This study is the first to explicitly frame the generation of hit-like molecules as an independent task and to empirically assess whether generative models can directly support this critical step in pharmaceutical research. The team benchmarked autoregressive and diffusion-based generative models and evaluated their output across multiple datasets and training configurations using a new evaluation framework. The experiments revealed that these models can produce potent, diverse, and biologically relevant compounds, with properties consistent with known drug candidates. A multistage filtering pipeline integrating physicochemical properties, structural features, and bioactivity criteria was developed to evaluate the generated molecules. Several GSK-3β hits generated by the models were synthesized and confirmed to be active in vitro, validating the approach and demonstrating that the models can generate truly bioactive molecules.

Deep learning generates effective drug candidates

This study demonstrates that deep learning models can generate novel compounds with properties suitable for early drug screening, addressing a critical step in drug development. By framing the generation of hit-like molecules as a separate task, the scientists show that these generative models can produce chemically active and diverse compounds with measurable biological activity, providing a potential route to accelerate the early stages of drug discovery. The research team confirmed the activity of several of the generated compounds in in vitro tests, validating the approach and highlighting its potential for identifying promising drug candidates. A lack of training data consisting of high-quality, hit-like molecules limited performance, especially for specific protein targets. In addition, standard metrics used to assess molecular diversity and similarity do not consistently match predicted biological activity, suggesting the need for more biologically relevant benchmarks. Future work will focus on creating richer datasets and developing improved model architectures that can learn from limited target-specific data.
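For readers unfamiliar with the diversity and similarity metrics mentioned above, the following sketch shows one widely used pair: Tanimoto similarity between fingerprints and the mean pairwise Tanimoto distance of a set, often called internal diversity. The article does not say which metrics the study used, so this is an assumed example. Fingerprints are represented here as plain sets of on-bit indices with made-up values.

```python
from itertools import combinations

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def internal_diversity(fingerprints):
    """Mean pairwise Tanimoto distance (1 - similarity) over a set of molecules."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)

# Made-up fingerprints for three hypothetical molecules.
generated_fps = [{1, 2, 3}, {2, 3, 4}, {7, 8, 9}]
print(internal_diversity(generated_fps))
```

A set can score well on such a metric while containing no molecule predicted to bind the target, which is consistent with the gap between standard metrics and predicted activity that the study reports.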


