To streamline drug discovery, the team developed an algorithmic framework to identify the best molecular candidates.

An overview of SPARROW and its role in the molecular design cycle. The candidate set comprises molecular ideas from any combination of algorithmic or expert sources, each annotated with its expected properties and potential synthetic routes. These annotations can come from quantitative structure-property relationship models (with or without uncertainty quantification), computer-aided synthesis planning tools, and/or human experts. SPARROW then weighs the utility and synthetic cost of the candidates as a batch rather than one at a time, and selects the best subset for synthesis and testing. In the retrosynthetic graph shown, orange circles represent reaction nodes; pink, blue, and green circles represent target compounds, intermediates, and purchasable compounds, respectively. Credit: Nature Computational Science (2024). DOI: 10.1038/s43588-024-00639-y

The use of AI to discover new drugs is becoming increasingly efficient, as researchers deploy machine learning models to pick out, from billions of options, the molecules that may have the properties needed for new medicines.

But weighing the cost of synthesizing the best candidates is no easy task, even when scientists use AI, because there are so many variables to consider, from the price of materials to the risk of something going wrong.

The myriad challenges of identifying the best and most cost-effective molecules to test are one reason new drugs take so long to develop, and a key contributor to the high price of prescription drugs.

To help scientists make cost-sensitive choices, MIT researchers have developed an algorithmic framework that automatically identifies optimal molecular candidates, maximizing the chance that they will have desirable properties while minimizing the cost of synthesis. The algorithm also identifies the materials and experimental steps needed to synthesize these molecules.

Their quantitative framework, known as Synthesis Planning and Rewards-based Route Optimization Workflow (SPARROW), considers the cost of synthesizing a batch of molecules at once, since multiple candidates can often be derived from some of the same chemical compounds. Moreover, this unified approach captures key information on molecular design, property prediction, and synthesis planning from online repositories and widely used AI tools.

The paper is published in the journal Nature Computational Science.

As well as helping pharmaceutical companies discover new drugs more efficiently, SPARROW could be used for applications such as inventing new pesticides and discovering specialized materials for organic electronics.

“Choosing compounds is really an art at this point, and sometimes a very successful art, but there are lots of other models and predictive tools that provide information about how molecules work and how they're synthesized, and you can and should use that information to make decisions,” said Connor Coley, the Class of 1957 Career Development Assistant Professor in the MIT departments of Chemical Engineering and of Electrical Engineering and Computer Science, and senior author of the SPARROW paper.

Coley is joined on the paper by lead author Jenna Fromer.

Complex cost considerations

In some sense, whether a scientist should synthesize and test a particular molecule comes down to the cost of synthesis versus the value of the experiment—but determining cost or value is itself a difficult problem.

On the cost side, for example, an experiment may require expensive materials or carry a high risk of failure. On the value side, one has to consider how useful it would be to know this molecule's properties, and whether predictions of those properties are highly uncertain.

At the same time, pharmaceutical companies are increasingly using batch synthesis to improve efficiency: instead of testing molecules one at a time, they use combinations of chemical building blocks to test multiple candidates at once. However, this means the chemical reactions must all run under the same experimental conditions, which makes cost and value even harder to estimate.

SPARROW tackles this challenge by considering the intermediate compounds that are shared across the synthesis of different molecules and incorporating that information into its cost-versus-value function.

“When you think of the optimization game of designing a set of molecules, the cost of adding a new structure depends on the molecules you've already selected,” Coley says.
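To make that point concrete, here is a minimal Python sketch of how shared synthetic steps make a candidate's cost depend on what is already in the batch. It assumes, purely for illustration, that each candidate's route is a set of reaction IDs and that every reaction is paid for once per batch; the reaction names, costs, and the functions `batch_cost` and `marginal_cost` are hypothetical and are not taken from SPARROW's code.

```python
"""Hypothetical sketch: batch synthesis cost with shared reaction steps.

Assumption (not SPARROW's actual code): each candidate's chosen route is a set
of reaction IDs, and every reaction has a fixed cost that is paid once per
batch, no matter how many selected targets reuse it.
"""

# Hypothetical per-reaction costs (arbitrary units).
REACTION_COST = {
    "r_core": 5.0,  # makes an intermediate shared by targets A and B
    "r_a": 2.0,     # final step to target A
    "r_b": 3.0,     # final step to target B
    "r_c": 4.0,     # one-step route to unrelated target C
}

# Hypothetical routes: the reactions needed to make each target.
ROUTES = {
    "A": {"r_core", "r_a"},
    "B": {"r_core", "r_b"},
    "C": {"r_c"},
}


def batch_cost(selected: set[str]) -> float:
    """Cost of synthesizing every selected target, counting shared reactions once."""
    reactions: set[str] = set()
    for target in selected:
        reactions |= ROUTES[target]
    return sum(REACTION_COST[r] for r in reactions)


def marginal_cost(candidate: str, selected: set[str]) -> float:
    """Extra cost of adding `candidate` to an already-selected batch."""
    return batch_cost(selected | {candidate}) - batch_cost(selected)


if __name__ == "__main__":
    print(marginal_cost("B", set()))  # 8.0: B pays for r_core and r_b
    print(marginal_cost("B", {"A"}))  # 3.0: r_core is already paid for by A
    print(marginal_cost("B", {"C"}))  # 8.0: C shares no steps with B
```

In this toy example, candidate B costs 8 units on its own but only 3 once A is selected, because the two share the `r_core` step that produces their common intermediate.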

The framework also takes into account the cost of starting materials, the number of reactions involved in each synthetic route, and the likelihood that those reactions will be successful on the first try.
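These factors can be combined in many ways. One simple illustration, sketched below in Python, charges a route for its starting materials plus a fixed (hypothetical) cost per reaction step, treats the steps as succeeding independently, and assumes that a failed attempt means re-running the whole route. Under those assumptions the number of attempts is geometric, so the expected cost is the cost of one attempt divided by the probability that every step succeeds. The `Route` class and its parameters are illustrative and are not SPARROW's actual cost model.

```python
"""Hypothetical sketch: scoring a single synthetic route.

Assumptions (illustrative only, not SPARROW's formulation): each reaction step
costs a fixed amount to run, steps succeed independently with a known
probability, and a failed attempt means re-running the whole route.
"""
from dataclasses import dataclass
from math import prod


@dataclass
class Route:
    starting_material_prices: list[float]  # purchase price of each building block
    step_success_probs: list[float]        # estimated chance each reaction works
    cost_per_step: float = 1.0             # hypothetical labor/reagent cost per step

    def attempt_cost(self) -> float:
        """Cost of running the route once: materials plus one charge per step."""
        n_steps = len(self.step_success_probs)
        return sum(self.starting_material_prices) + self.cost_per_step * n_steps

    def success_prob(self) -> float:
        """Probability that every step works on the same attempt."""
        return prod(self.step_success_probs)

    def expected_cost(self) -> float:
        """Expected total cost if failed attempts are repeated until one succeeds.

        Attempts are independent Bernoulli trials with success probability p,
        so the number of attempts is geometric with mean 1 / p.
        """
        return self.attempt_cost() / self.success_prob()


if __name__ == "__main__":
    short = Route([10.0, 6.0], [0.8], cost_per_step=4.0)
    cheap_but_long = Route([2.0, 2.0], [0.7, 0.7, 0.7], cost_per_step=4.0)
    print(round(short.expected_cost(), 1))           # 25.0
    print(round(cheap_but_long.expected_cost(), 1))  # 46.6
```

Here the three-step route starts from cheaper building blocks, but its lower chance of working end to end makes it the more expensive option in expectation.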

To use SPARROW, scientists provide a set of molecular compounds they want to test and a definition of the properties they hope to discover.

From there, SPARROW collects information about the molecules and their synthetic routes, compares the value of each to the cost of synthesizing a batch of candidate compounds, automatically selects the optimal subset of candidate compounds that meet the user's criteria, and finds the most cost-effective synthetic routes for those compounds.

“We do all this optimization in one step, so all competing objectives can be met simultaneously,” Fromer says.
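The selection step can be sketched as a small optimization problem. The Python example below picks a subset of candidates and a route for each one in a single search, maximizing total predicted utility minus a weighted batch cost in which shared reactions are paid for once. Every name, number, and the objective itself are hypothetical, and exhaustive enumeration is swapped in for clarity: it only works for a handful of candidates, whereas SPARROW's own optimization scales to hundreds.

```python
"""Hypothetical sketch: choose candidates and their routes in one search.

Assumptions (illustrative only, not SPARROW's solver or objective):
- each candidate has a predicted utility and one or more alternative routes;
- a route is a set of reaction IDs, and each reaction is paid for once per batch;
- the goal is to maximize total utility minus cost_weight * batch cost.
"""
from itertools import product

# Hypothetical reaction costs (arbitrary units).
REACTION_COST = {"r_core": 5.0, "r_a": 2.0, "r_b": 3.0, "r_b_alt": 9.0, "r_c": 6.0}

# Hypothetical candidates: name -> (predicted utility, alternative routes).
CANDIDATES = {
    "A": (10.0, [{"r_core", "r_a"}]),
    "B": (6.0, [{"r_core", "r_b"}, {"r_b_alt"}]),
    "C": (5.0, [{"r_c"}]),
}


def best_batch(cost_weight: float = 1.0) -> tuple[float, dict[str, int]]:
    """Return (score, {candidate: route index}) for the best-scoring batch."""
    names = list(CANDIDATES)
    # For each candidate: None = skip it, otherwise the index of the route to use.
    options = [[None, *range(len(CANDIDATES[n][1]))] for n in names]
    best_score, best_pick = 0.0, {}  # the empty batch scores 0
    for choice in product(*options):
        utility, reactions, pick = 0.0, set(), {}
        for name, route_idx in zip(names, choice):
            if route_idx is None:
                continue
            value, routes = CANDIDATES[name]
            utility += value
            reactions |= routes[route_idx]  # shared reactions are counted once
            pick[name] = route_idx
        score = utility - cost_weight * sum(REACTION_COST[r] for r in reactions)
        if score > best_score:
            best_score, best_pick = score, pick
    return best_score, best_pick


if __name__ == "__main__":
    print(best_batch(1.0))  # (6.0, {'A': 0, 'B': 0})
    print(best_batch(0.5))  # (13.0, {'A': 0, 'B': 0, 'C': 0})
```

With the default weight, the best batch is A plus B via its first route: B alone would score -2.0, but it becomes worthwhile next to A because the two share `r_core`. Halving the cost weight makes C worth adding as well.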

Multipurpose framework

SPARROW is unique in that it can incorporate molecular structures designed by hand by humans, structures that exist in virtual catalogs, and never-before-seen molecules invented by generative AI models.

“We have many different sources of ideas, and one of the attractions of SPARROW is that it gives us the ability to treat all of those ideas equally,” Coley adds.

The researchers evaluated SPARROW in three case studies. Based on real-world problems faced by chemists, the case studies were designed to test SPARROW's ability to find cost-efficient synthesis plans while working with a wide range of input molecules.

The researchers found that SPARROW could effectively capture the marginal costs of batch synthesis and identify common experimental steps and intermediate chemicals. Moreover, SPARROW could be scaled up to handle hundreds of potential molecular candidates.

“The machine learning for chemistry community has a lot of models that are effective for retrosynthesis and predicting molecular properties, for example, but how do you use them in practice? Our framework aims to unlock the value of this prior research. By creating SPARROW, we hope to guide other researchers in thinking about compound downselection using their own cost and utility functions,” Fromer says.

In the future, the researchers want to build more complexity into SPARROW. For example, they would like the algorithm to account for the fact that the value of testing a compound may not always be constant, and to incorporate more elements of parallel chemistry into its cost-versus-value function.

“Fromer and Coley's work brings algorithmic decision-making closer to the practical realities of chemical synthesis. With existing computational design algorithms, the question of how best to synthesize a set of designs is left to the medicinal chemist, resulting in suboptimal choices and unnecessary work for the medicinal chemist,” says Patrick Riley, senior vice president of artificial intelligence at Relay Therapeutics, who was not involved in the research.

“This paper provides a principled path for considering co-synthesis, which we hope will lead to higher-quality and more acceptable algorithmic designs.”

“Identifying which compounds to synthesize, carefully balancing time, cost, and the likelihood of progress toward a goal while still yielding useful new information, is one of the most challenging tasks for drug discovery teams.

“Fromer and Coley's SPARROW approach accomplishes this in an effective and automated way, providing a useful tool for human medicinal chemistry teams and taking an important step toward a fully autonomous approach to drug discovery,” adds John Chodera, a computational chemist at Memorial Sloan Kettering Cancer Center, who was not involved in the research.

For more information:
Jenna C. Fromer et al. “An algorithmic framework for synthetic cost-aware decision-making in molecular design” Nature Computational Science (2024). DOI: 10.1038/s43588-024-00639-y
