AI rebuilds molecules from exploded debris



Researchers at SLAC National Accelerator Laboratory in the US, European XFEL, and partner institutions recently built a generative AI model that can reconstruct a molecule’s structure from the motion of its ions after the molecule has been blown apart by X-rays, a technique called Coulomb explosion imaging. Data from European XFEL’s Small Quantum Systems (SQS) instrument were critical to demonstrating the applicability of the newly developed method.

The study, published in Nature Communications, is an important step toward taking snapshots of molecules during chemical reactions, an advance that could have important implications for medicine and industry. The machine learning model accurately predicts the shape of a variety of molecules made up of fewer than 10 atoms, paving the way for applying the technique to larger molecules. “We are very excited,” said Xiang Li, associate scientist at SLAC’s Linac Coherent Light Source (LCLS) and lead author of the study. “This is the first AI model built to reconstruct molecular structure from Coulomb explosion imaging.” Rebecca Boll, SQS instrument scientist and co-author of the paper, added, “It is very difficult to piece together exploded molecules in real space from their recorded momenta. Artificial intelligence will help achieve this.”

A new way to see molecules

Currently, there are few options for imaging isolated gas-phase molecules. For example, electron microscopy cannot image free-floating molecules because the sample must be fixed in place. Diffraction-based techniques, meanwhile, require a sample dense enough to produce a strong signal at the detector, so the resulting image is an average over many molecules. That averaging hides details that are visible only when imaging isolated molecules.

In the paper, the researchers focused instead on Coulomb explosion imaging. In this technique, a pulse of X-rays hits a single molecule in a vacuum chamber, stripping electrons from the molecule. This leaves positively charged ions that repel each other explosively and fly apart until they hit a detector. The detector records the ions’ momenta, which can then be used to reconstruct the structure of the molecule. “This technology has the ability to separate small chemically relevant details,” said James Cryan, interim vice president for science and research and development at LCLS and co-author of the paper.

“We have previously used Coulomb explosion imaging at SQS with great success,” said co-author Michael Meyer, lead scientist for the instrument at European XFEL. “But in many cases, computational constraints have so far made it impossible to actually reconstruct the molecular structure.”

The X-ray pulse rapidly strips away the electrons, but the remaining ions do not fly apart immediately. During this short delay, the atoms may move slightly, making it difficult to reconstruct the original structure using Coulomb’s law of electrostatic forces alone. “Simply using that law is not accurate because it only works if the charging process is instantaneous,” Li explained.

To complicate matters further, the complexity of the problem increases exponentially with each atom added to the molecule. “It’s very difficult to work backwards to get the original structure,” said co-author Phay Ho, a physicist at Argonne National Laboratory in the US. “It’s like shattering a glass and trying to put the broken pieces back together. Many problems in modern physics and chemistry involve reconstructing hidden structures from indirect measurements. This work shows how AI can help tackle such inverse problems.”
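The forward direction of this problem, from a known structure to ion momenta, can be sketched with Coulomb's law alone. The example below is only an illustration of the idealized picture Li describes, in which charging is instantaneous and the ions start at rest. It is not the simulation used in the study. The water-like geometry, the charge states, and all function names are assumptions chosen for the example, and everything is in atomic units.

```python
import math

AMU = 1822.888486  # electron masses per atomic mass unit


def coulomb_forces(pos, charges):
    """Pairwise Coulomb forces in atomic units (Coulomb constant = 1)."""
    forces = [[0.0, 0.0, 0.0] for _ in pos]
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            scale = charges[i] * charges[j] / r**3
            for k in range(3):
                forces[i][k] += scale * d[k]  # like charges push i away from j
                forces[j][k] -= scale * d[k]  # equal and opposite force on j
    return forces


def potential_energy(pos, charges):
    """Total electrostatic energy of the point charges, in hartree."""
    total = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            total += charges[i] * charges[j] / math.dist(pos[i], pos[j])
    return total


def explode(pos, charges, masses, dt=1.0, steps=20000):
    """Velocity-Verlet integration from rest; returns (momenta, positions)."""
    pos = [list(p) for p in pos]
    vel = [[0.0, 0.0, 0.0] for _ in pos]
    forces = coulomb_forces(pos, charges)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k] / masses[i]
                pos[i][k] += dt * vel[i][k]
        forces = coulomb_forces(pos, charges)
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k] / masses[i]
    momenta = [[masses[i] * v for v in vel[i]] for i in range(len(pos))]
    return momenta, pos


# Water-like starting geometry (approximate values, in bohr and degrees).
bond = 1.809
half_angle = math.radians(104.5 / 2)
start = [
    (0.0, 0.0, 0.0),                                                # O
    (bond * math.sin(half_angle), bond * math.cos(half_angle), 0.0),   # H
    (-bond * math.sin(half_angle), bond * math.cos(half_angle), 0.0),  # H
]
charges = [2.0, 1.0, 1.0]  # hypothetical charge states, for illustration only
masses = [15.999 * AMU, 1.008 * AMU, 1.008 * AMU]

momenta, final_pos = explode(start, charges, masses)
for label, p in zip(["O", "H", "H"], momenta):
    print(label, [round(c, 2) for c in p])
```

Running the model forward like this is straightforward. The hard part described in the article is the reverse: recovering the starting geometry from the final momenta when the real charging process is not instantaneous.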

Machine learning of molecular structures

The research team set out to build a machine learning model that could overcome this computational constraint. They developed and trained the model at SLAC’s Shared Science Data Facility (S3DF). Generative AI models are well suited to this task because they “think” differently than standard computer simulations. Instead of working through a set of equations, they learn by finding patterns in the training data. They then use those patterns to make statistical predictions.

To collect training data, the team turned to a simulation Ho built. This simulation takes a molecular structure and calculates the momenta of the ions after a Coulomb explosion. The compute-intensive simulations, which use both quantum mechanics and classical physics equations, ran for more than a month and produced a dataset of 76,000 molecular samples.

The researchers initially trained the AI on just this dataset, which was small by AI training standards, and found that the model predicted inaccurate structures from the explosion data. So they added a second dataset, obtained using only classical physics, and reran the training. The second set was less accurate but about 100 times larger than the first.

This two-step training was the key to predicting accurate structures.
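The mechanics of such a two-step scheme can be sketched on a toy problem. The example below assumes the two steps are pretraining on the large, less accurate dataset followed by fine-tuning on the small, more accurate one, which the article implies but does not spell out. The linear model, the data, and every name and number here are hypothetical stand-ins and have nothing to do with MOLEXA's actual architecture.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(w, b, data, lr, epochs):
    """Full-batch gradient descent on the mean squared error."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def grid(n):
    """n evenly spaced inputs on [-1, 1]."""
    return [-1 + 2 * i / (n - 1) for i in range(n)]


# Toy stand-ins: a large dataset from a cheap, biased model and a
# dataset 100 times smaller from an accurate one (true line: y = 2x + 0.5).
low_fidelity = [(x, 1.7 * x + 0.8) for x in grid(2000)]
high_fidelity = [(x, 2.0 * x + 0.5) for x in grid(20)]

# Step 1: pretrain on the large, less accurate dataset.
w_pre, b_pre = train(0.0, 0.0, low_fidelity, lr=0.1, epochs=200)

# Step 2: fine-tune the pretrained parameters on the small, accurate dataset.
w_fine, b_fine = train(w_pre, b_pre, high_fidelity, lr=0.1, epochs=200)

print("after pretraining:", round(w_pre, 3), round(b_pre, 3))
print("after fine-tuning:", round(w_fine, 3), round(b_fine, 3))
```

A two-parameter line is too simple to show why the large dataset helps, since it could be fitted from the small dataset alone. The sketch only shows the order of operations: the second stage starts from the parameters learned in the first rather than from scratch.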

The researchers then tested the AI model on simulated data for molecular structures it had not seen during training. The model, which the researchers called MOLEXA (short for “Molecular Structure Reconstruction from Coulomb Explosion Imaging”), took in the ions’ momenta and calculated the most likely structure. “We found that this two-step training process reduced prediction error by a factor of two,” Li said.

The team then tested MOLEXA on experimental datasets recorded at SQS. The molecules they tested included water, tetrafluoromethane, and ethanol. They fed the experimental ion momenta into the model, reconstructed the molecular structures, and compared the reconstructions to known structures listed by the National Institute of Standards and Technology.

They found that the predictions largely overlapped with the established structures. Overall, the bonds were in the correct positions, with only slight deviations in angle. The positional error was typically less than half the length of a typical chemical bond. “In fact, this model performs better than that in most cases,” Li added. “This is just a starting point for future studies, which will not only improve the accuracy of the model but also extend its applicability to larger molecular systems.”
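A comparison of this kind can be made concrete with a small worked example. The article does not say exactly how the positional error was defined, so the sketch below assumes a root-mean-square displacement per atom between a predicted and a reference geometry that already share the same frame. The water bond length and angle are approximate gas-phase values, and the "predicted" geometry is invented for illustration.

```python
import math


def water_geometry(bond_length, angle_deg):
    """Planar H2O coordinates in angstroms, with the oxygen at the origin."""
    half = math.radians(angle_deg) / 2
    x = bond_length * math.sin(half)
    y = bond_length * math.cos(half)
    return [(0.0, 0.0, 0.0), (x, y, 0.0), (-x, y, 0.0)]


def rms_displacement(a, b):
    """Root-mean-square distance between matching atoms of two geometries."""
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


bond = 0.9572                                 # approximate O-H bond length, angstroms
reference = water_geometry(bond, 104.5)       # approximate H-O-H angle, degrees
predicted = water_geometry(bond, 110.0)       # hypothetical model output

error = rms_displacement(reference, predicted)
threshold = 0.5 * bond                        # "half the length of a typical bond"

print(f"RMS displacement: {error:.4f} angstroms")
print("within half a bond length:", error < threshold)
```

In this made-up case the bonds have the right length and only the angle is off by a few degrees, which keeps the displacement far below the half-bond threshold. A real comparison would also need to align the two structures before measuring the displacement.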

Extension to larger molecules and chemical reactions

“Experiments like Coulomb explosion imaging often generate huge amounts of data that are difficult to interpret,” explained Serguei Molodtsov, Scientific Director of European XFEL. “By employing artificial intelligence to analyze this data, we will be able to expand the range of experiments that can be performed at our facility, allowing users to explore studies previously considered too complex.”

In future work, the researchers plan to scale up the number of atoms that the machine learning model can reconstruct and to apply the model to time-resolved experiments at LCLS and European XFEL. This would help researchers reconstruct snapshots of molecules in motion and create flipbook-like molecular movies that provide insight into how chemical reactions unfold.

The research team is also testing the model’s ability to reconstruct molecules from incomplete data, since detectors often miss some of the ions produced in a Coulomb explosion. For example, Li wants to know whether the AI can reconstruct an ethanol molecule even if one or more hydrogen ions are not registered by the detector.

If these issues are resolved, the technology could have further applications in biological and chemical research. For example, proteins are made up of thousands of atoms. “That’s really the goal,” Li said. “It allows us to study systems that are more biologically or industrially relevant.”

The team also included researchers from the Stanford PULSE Institute; Stanford University; Kansas State University; the Max Planck Institute for Nuclear Physics and the Fritz Haber Institute in Germany; and Sorbonne University in France.


