Building AI models that understand chemical principles | Massachusetts Institute of Technology News



Of all possible chemical compounds, it is estimated that between 10^20 and 10^60 may have potential as small-molecule drugs.

Experimentally evaluating each of these compounds would be far too time-consuming for chemists. In recent years, researchers have begun to use artificial intelligence to identify compounds that could be good drug candidates.

One of these researchers is Connor Coley PhD ’19, the Class of 1957 Career Development Associate Professor at MIT, who holds joint appointments in the departments of Chemical Engineering and of Electrical Engineering and Computer Science, and in the MIT Schwarzman College of Computing. His research straddles the boundary between chemical engineering and computer science, developing and deploying computational models to analyze vast numbers of possible compounds, design new compounds, and predict the reaction pathways that might produce those compounds.

“This is a very general approach that can be applied to any application of organic molecules, but the main application we see is small molecule drug discovery,” he says.

The intersection of AI and science

Coley’s interest in science runs in his family. In fact, he says there are more scientists in his family than non-scientists. That includes his father, who is a radiologist, and his mother, who earned a degree in molecular biophysics and biochemistry before attending the MIT Sloan School of Management. His grandmother is a mathematics professor.

As a high school student in Dublin, Ohio, Coley participated in Science Olympiad competitions and graduated at age 16. He then headed to Caltech, where he chose chemical engineering as his major because it allowed him to combine his interests in science and mathematics.

During his undergraduate years, he also developed an interest in computer science, working in a structural biology laboratory where he used the Fortran programming language to help solve protein crystal structures. After graduating from Caltech, he decided to stay in chemical engineering and came to MIT in 2014 to pursue his PhD.

Advised by Professors Klavs Jensen and William Green, Coley worked on ways to optimize automated chemical reactions. His research focused on combining machine learning and chemoinformatics (the application of computational methods to analyze chemical data) to plan reaction pathways that can create new drug molecules. He also worked on designing hardware that could be used to perform these reactions automatically.

Part of that research was done through a DARPA-funded program called Make-It. The program focused on using machine learning and data science to improve the synthesis of drugs and other useful compounds from simple building blocks.

“That was my real entry point into thinking about chemoinformatics, thinking about machine learning, and thinking about how models can be used to understand how different chemicals are made and what reactions are possible,” Coley says.

Coley began applying for faculty jobs as a graduate student and accepted an offer from MIT when he was 25 years old. He received mixed advice about taking a job at the same school where he had attended graduate school, but ultimately decided that the position at MIT was too attractive to turn down.

“MIT is a very special place in terms of resources and mobility between departments. MIT seems to be doing a very good job of supporting the intersection of AI and science, and has been able to continue to be a vibrant ecosystem,” he says. “The talent of our students, the enthusiasm of our students, and the incredible strength of collaboration definitely outweighed any potential concerns about staying in the same place.”

Chemistry intuition

Coley deferred his faculty position for a year to do a postdoc at the Broad Institute, where he sought more experience in chemical biology and drug discovery. There, he worked on ways to identify small molecules that could bind and interact with disease-associated mutant proteins among the billions of candidates contained in DNA-encoded libraries.

After returning to MIT in 2020, he founded a research group with the mission of deploying AI to not only synthesize existing compounds with therapeutic effects, but also to design new molecules with desirable properties and new methods for their production. Over the past few years, his lab has developed a variety of computational approaches to address these goals.

“We try to think about how best to combine chemical challenges with potential computational solutions, and often that combination motivates the development of new methods,” Coley says.

One of the models his lab developed, known as ShEPhERD, was trained to evaluate potential new drug molecules according to how they would interact with target proteins, using the drug molecule’s three-dimensional shape. This model is currently used by pharmaceutical companies to help discover new drugs.

“We are trying to give generative models more medicinal chemistry intuition, so they are aware of the appropriate criteria and considerations,” Coley says.

In another project, Coley’s lab developed a generative AI model called FlowER. It can be used to predict the reaction products that result from combining different chemical inputs.

In designing their model, the researchers incorporated an understanding of fundamental physical principles such as the law of conservation of mass. The model also had to take into account the feasibility of intermediate steps that need to take place on the path from reactants to products. The researchers found that these constraints improved the accuracy of the model’s predictions.

“Thinking about these intermediate steps, the mechanisms involved, and how reactions unfold is very natural for chemists. It’s how chemistry is taught, but it’s not what models inherently think about,” Coley says. “We have spent a lot of time thinking about how to make sure our machine learning models are grounded in an understanding of reaction mechanisms, as an expert chemist’s would be.”
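To make the idea of a conservation-of-mass constraint concrete, here is a minimal Python sketch of an atom-balance check: a predicted reaction is only plausible if every atom in the reactants also appears in the products. This is an illustration only and not a description of how FlowER works internally. The function names `atom_counts` and `conserves_atoms` are hypothetical, and the parser assumes flat molecular formulas with no parentheses or charges.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "C2" or "H".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'C2H6O' (assumed: no parentheses or charges)."""
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def conserves_atoms(reactants: list[str], products: list[str]) -> bool:
    """Return True when both sides of a reaction contain exactly the same atoms."""
    left = sum((atom_counts(f) for f in reactants), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    return left == right


# Acetic acid + ethanol -> ethyl acetate + water (balanced).
balanced = conserves_atoms(["C2H4O2", "C2H6O"], ["C4H8O2", "H2O"])

# Same reaction with the water product left out (atoms go missing).
unbalanced = conserves_atoms(["C2H4O2", "C2H6O"], ["C4H8O2"])
```

In this sketch, `balanced` evaluates to `True` and `unbalanced` to `False`. A check like this could serve as a filter on a model’s output, whereas the article describes building such physical principles into the model itself.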

Students in his lab also work on a variety of areas related to the optimization of chemical reactions, including computer-assisted structure elucidation, laboratory automation, and optimal experimental design.

“Through these different research threads, we hope to advance the frontiers of AI in chemistry,” Coley says.


