Newswise — New forms of fentanyl are created every day. For law enforcement, this is a challenge. How do you identify a chemical you’ve never seen before?
Researchers at Lawrence Livermore National Laboratory (LLNL) aim to answer that question using a machine learning model that can distinguish opioids from other chemicals with more than 95% accuracy in a laboratory setting. The basis for this new method was published in Analytical Methods.
Currently, chemists identify synthetic opioids like fentanyl by matching their characteristics against a library of hundreds of known samples. But research suggests there may be thousands of unknown forms, some more dangerous than others. Recognizing these new versions requires a reference-free identification system: a way to flag opioids that do not yet exist in chemical databases.
“When law enforcement discovers a new clandestine drug trade, those labs are often producing fentanyl derivatives they’ve never seen before. They can’t just go check the database, and they can’t just go back and ask who made it and how they made it,” said author Colin Ponce, a computational mathematician at LLNL. “And we’re going to have another sample taken tomorrow, so law enforcement needs to quickly identify the samples that we find. I think this is a bit of a unique situation.”
Machine learning may seem like a natural fit for identifying new or unknown opioids, and to some extent it is. But the method works best with large datasets, which are difficult to generate for toxic substances such as synthetic opioids.
The team first needed to generate chemical data to get the machine learning algorithm off the ground. They did this by combining LLNL’s mass spectrometry capabilities with an autosampler, allowing them to measure hundreds of samples under the same experimental conditions. This minimized confounding variability in the data fed to the machine learning algorithm.
“In the world of AI, data is gold, and without good data, you can’t generate accurate machine learning models,” said LLNL chemist and author Carolyn Fisher. “Good data is what we can manage and generate at LLNL.”
With that data in hand, they tried various machine learning techniques and landed on the best method: a random forest model.
“When a model like this finally makes it into the hands of a user, its output must be interpretable and reliable,” said Kourosh Arasteh, a scientist at LLNL and author. “We considered machine learning techniques ranging from simple regression and random forests to more complex neural network approaches to balance interpretability and performance.”
The random forest approach works through a collection of decision trees. Each tree asks a series of questions about the data and, based on the answers, makes its own prediction of whether the sample is an opioid. The trees then vote on the final classification.
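A minimal sketch of that voting idea, using scikit-learn with synthetic stand-in data (these are not the team’s actual features, sample counts, or code; the injected class signal is purely illustrative):

```python
# Toy random forest classifier: many decision trees each cast a vote,
# and the ensemble's final label is the majority decision.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical data: 200 "spectra" with 50 intensity channels each.
# Class 1 ("opioid-like") gets an artificial boost in the first channels.
X = rng.normal(size=(200, 50))
y = rng.integers(0, 2, size=200)
X[y == 1, :5] += 2.0  # injected class signal for this toy example

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Each of the 100 trees predicts independently; the forest aggregates.
votes = np.stack([tree.predict(X[:1]) for tree in clf.estimators_])
print("fraction of trees voting class 1:", votes.mean())
print("ensemble prediction:", clf.predict(X[:1])[0])
```

With a clear class signal like this, the individual trees largely agree, which is why the ensemble vote is both accurate and easy to interpret.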
“Our 650 samples is not the same as having 300,000 samples. On the machine learning side, we needed to make sure we were designing techniques that were appropriate for that kind of scale,” Ponce said.
In this study, the team trained and tested the algorithm on analytically pure samples: ideal chemicals free of contaminants and impurities.
“The challenge is that nothing in the real world is analytically pure,” Fisher said. “The next step is to add background noise to help the AI understand what to consider during the classification task.”
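One common way to realize that next step is data augmentation: corrupting clean training spectra with synthetic background so a classifier learns to ignore it. A hypothetical sketch (the `augment` function and noise model are illustrative assumptions, not the team’s method):

```python
# Augment a clean spectrum with random noise and a baseline offset,
# a simple stand-in for real-world background contamination.
import numpy as np

rng = np.random.default_rng(1)

def augment(spectrum, noise_scale=0.05, baseline_scale=0.2):
    """Return a noisy copy of a clean spectrum (toy background model)."""
    noise = rng.normal(scale=noise_scale, size=spectrum.shape)
    baseline = baseline_scale * rng.random()  # constant background shift
    return spectrum + noise + baseline

clean = np.zeros(100)
clean[[20, 45, 70]] = [1.0, 0.6, 0.3]  # three idealized peaks
noisy = augment(clean)
```

Training on many such corrupted copies, alongside the pure originals, is a standard way to make a model robust to backgrounds it will see in field samples.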
Fisher and Ponce emphasized that this research would not have been possible without collaboration across the fields of data science and chemistry. The two are friends outside of work, and this research, an institute-led research and development project, grew out of a series of spontaneous conversations between them.
“To me, this project represents exactly what LLNL does best,” said fellow author and LLNL Software Engineer Steven Magana-Zook. “When chemists and data scientists work together, we get results that neither group could achieve alone. That kind of cross-disciplinary work is what makes this place so powerful.”
Although this cross-disciplinary approach is essential to the work, it initially proved to be an obstacle: the team’s manuscript was rejected by two journals. Chemistry reviewers did not fully grasp the machine learning aspects, while reviewers with computational expertise were unsure about the chemistry.
“I don’t think people talk about failure enough. Failure is common in science. We fail far more often than we succeed,” Fisher said. “But we continue to iterate and improve. I’m proud of our resilience.”
The team’s tenacity paid off. Looking forward, the researchers aim to further develop the algorithm using real-world samples with higher background signals.
Other LLNL co-authors include Roald Leif, Alex Vu, Mark Dreyer, Brian Mayer, and Audrey Williams.
