To be trusted, LLMs need to show their work

Machine Learning


Introduction to AI chemists

Artificial intelligence and large language models offer promising ways to interpret vast amounts of data, but they come with a number of caveats. In this C&EN column, expert contributors discuss what the technology can do now, what it may be able to do in the future, and what it shouldn’t do.

Drug discovery is really, really difficult, and most new drug candidates fail. Humans are variable, the animals used in testing are not humans, pharmacokinetics and pharmacodynamics are difficult to predict, and unexpected off-target effects can cause toxicity. With the advent of artificial intelligence and machine learning tools such as AlphaFold, there is growing excitement about the potential to accelerate early-stage drug discovery. Even AI skeptics who professionally criticize the usefulness and ethics of ChatGPT and other large language models (LLMs) often say that it’s not all bad.

I’m not sure I agree that software like AlphaFold is an exception.

Computer-aided drug design (CADD) uses models of proteins and compounds to prioritize which compounds to study as drug candidates. This approach accelerates the early stages of drug development by focusing attention on the most promising candidates while still considering a frankly vast number of them (trillions of compounds). To be fair, medicinal chemists have always considered protein structure when designing drugs, but the tools and the availability of protein structures used to be more limited. Over the past 50 years, computational tools for docking, molecular dynamics, free energy perturbation calculations, and single-point quantum mechanics calculations have improved slowly (and then increasingly rapidly). The computational power available to perform these calculations has increased, and more experimentally determined protein structures have become available. However, using these tools and retrieving the data on which they depend requires a great deal of expertise.

When AlphaFold came along, suddenly a structure for every protein was available to everyone at the click of a button. Subsequently, LLM-guided docking emerged, simplifying protein-ligand docking and greatly democratizing protein-ligand interaction screening. As I was writing this column, a high school student approached a colleague and me and asked us to help him test some drug candidates he had identified through “vibe CADDing,” in which he talked to an AI assistant to perform CADD. All steps of CADD can now be performed without the help of PhDs from different disciplines and without having studied any organic or physical chemistry, let alone quantum or medicinal chemistry.

The problem is that, in practice, doing these studies properly still requires extensive collaboration. Protein Data Bank (PDB) structures are single frozen conformations of highly dynamic proteins. A structure cannot be used directly in CADD studies without considering the protein’s dynamics, its specific natural environment, and the effects of ligands introduced into the protein to help form crystals or to reduce motion enough to obtain a good cryo-EM ensemble.

Structural biologists are aware of these considerations. Many proteins have dozens (or hundreds) of different entries in the PDB. These entries exist because the structures are not all the same. Although computational all-atom simulations can be used to model some of these dynamics, the assumptions built into these calculations make them an abstraction of physical reality. And assuming . . . well, that leads to strange errors for you and me. However, if you run this process regularly, you know what to look for and how to validate your model.

If a protein is not present in the PDB, homology modeling can be used to attempt to build a reasonable 3D structural ensemble. Because these models are built on unstable foundations, it can take highly skilled experts months or years to validate them against unique, integrated experimental and computational data.

In contrast, AlphaFold yields structures in seconds. That is very efficient. But my team often sees AlphaFold make the very human mistake of imposing structural order where there is only chaos. Proteins with long, unstructured domains that flail around often cannot be captured by current experimental techniques. Therefore, these parts of the protein (or the whole protein) are not included in the PDB dataset used to train AlphaFold. A road model trained only on grid cities cannot adequately predict the street plans of Rome or Boston. In other words, the available data is biased toward well-ordered structures. But that is not the biggest problem.

The main problem is that I don’t know how the program arrived at the structure. I cannot trace back through the process to see the assumptions, procedures, and models used to generate it, so I can’t confirm whether it is correct. Obtaining a good protein model (note that it is not a structure; almost all useful CADD models are collections of individual conformations with dynamic modeling components) is the most important step in CADD.

Good science requires a “controlling chain of reason” that connects input data and observations to output conclusions. Every step of the logic chain must be auditable by our colleagues and by anyone who wants to make use of our work. We must be able to challenge and, in some cases, falsify every decision point. That way, when something inevitably goes wrong, we can figure out where we went wrong. Protein structure prediction models do not allow this.

Therefore, I do not use AlphaFold (or similar LLMs) to generate structures. I made this decision because the only way to validate what the program offers is to check it using an “old school” protein preparation approach (i.e., circa 2020, though with the better computing tools of 2026). Whether the program’s product is more or less accurate than mine would be interesting to know. It would be interesting even if the product is completely wrong and the program is leading others down a path of futility.

Of course, this is not to say that these models are always wrong. That suggestion is far from the truth. The problem is that they are not right every time. And if you’re trusting them for CADD (and not as pretty images), then they need to be. Sometimes the old ways are wrong, too. But once the data starts coming back, you know why they were wrong, and you have a chain of custody for that why, so you can make adjustments. With an LLM, there is only one step. If you can’t test the “why?” at each step in the chain, you won’t be able to determine whether your hypothesis is realistic and worth spending a lot of money to test, or whether it would be a better idea to host a huge bonfire party fueled by investors’ cash. That situation makes me nervous. Too nervous to use the LLM.


John Trant is an associate professor and Faculty of Science Research Chair at the University of Windsor, where he leads an interdisciplinary team of scientists working on molecular problems of relevance to society.

The views expressed are those of the author and not necessarily those of C&EN or the American Chemical Society.

Do you have a perspective on the use of AI in chemistry that you would like to share with our readers? Email editor Chris Gorski at c_gorski@acs.org to suggest a column you would like to write.


