Image: Application of machine learning in untargeted analysis of environmental organic pollutants
Credit: Liu Yuwei§, Xiong Haoyang§, Liu Jinhua, Xie Huaijun, Chen Jingwen
A new review highlights how machine learning is transforming the way scientists detect and measure organic pollutants in the environment, providing powerful new tools to overcome long-standing analytical challenges.
Environmental organic pollutants are extremely diverse, ranging from pharmaceuticals and pesticides to industrial additives and their transformation products. Many of these compounds lack commercially available reference standards, making it difficult to identify and quantify them using traditional analytical methods.
In a comprehensive review published in Artificial Intelligence and Environment, researchers summarize recent advances in the application of machine learning to untargeted analysis based on liquid chromatography coupled with high-resolution mass spectrometry. The review outlines how data-driven models are reshaping both qualitative identification and quantitative estimation of pollutants.
Untargeted analysis can detect thousands of chemical signatures in a single environmental sample. However, typically only a small fraction of these signals can be reliably identified using existing spectral libraries. “Currently, less than a few percent of environmentally relevant compounds can be confidently identified using traditional workflows,” the authors explain. This data interpretation bottleneck severely limits the potential of high-resolution mass spectrometry in environmental science.
Machine learning offers a way forward.
According to the authors, machine learning models can predict tandem mass spectra from known molecular structures, effectively expanding spectral libraries in silico. These tools can also infer molecular formulas, structural fragments, and molecular fingerprints directly from experimental spectra to significantly narrow down candidate structures.
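As a rough illustration of how an in silico library might be used, the sketch below ranks candidate compounds by cosine similarity between an experimental tandem mass spectrum and predicted spectra. This is not the authors' method: the function names, the 0.01 m/z bin width, and the example spectra are assumptions made for illustration only.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine similarity between two binned spectra (0 = no overlap, 1 = identical)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_peaks, predicted_library):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = [
        (name, cosine_similarity(query_peaks, peaks))
        for name, peaks in predicted_library.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical data: one measured spectrum and two predicted library spectra.
query = [(100.0, 1.0), (150.0, 0.5)]
library = {
    "candidate_A": [(100.0, 1.0), (150.0, 0.5)],
    "candidate_B": [(200.0, 1.0)],
}
ranking = rank_candidates(query, library)
```

In practice, scoring schemes usually also weight peaks by intensity and m/z and allow for mass tolerance; the fixed binning here is only the simplest workable choice.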
“Machine learning allows us to move from manual, expert-driven interpretation to automated, scalable analysis,” the authors said. “This allows us to extract complex relationships from high-dimensional spectral data that are extremely difficult to capture using traditional rule-based approaches.”
The review goes beyond library-based identification and also covers advances in molecular generation. Generative models can propose plausible chemical structures directly from spectral information, even if the compound is not present in existing databases. This capability is particularly important for emerging contaminants and transformation products that have not been formally cataloged.
Orthogonal parameters such as retention time and collision cross section further increase identification reliability. This review describes how modern neural network models can accurately predict these properties across a variety of chromatography and ion mobility platforms, reducing false positives and improving structural confirmation.
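The filtering idea can be sketched as follows, assuming each candidate structure already carries a model-predicted retention time and collision cross section. The field names (`pred_rt`, `pred_ccs`) and the tolerance values are hypothetical and would need to be tuned to a given instrument and prediction model.

```python
def filter_candidates(candidates, observed_rt, observed_ccs,
                      rt_tol_min=0.5, ccs_tol_pct=3.0):
    """Keep candidates whose predicted RT and CCS both agree with the measurement.

    rt_tol_min is an absolute tolerance in minutes; ccs_tol_pct is a relative
    tolerance in percent. Both defaults are illustrative assumptions.
    """
    kept = []
    for cand in candidates:
        rt_ok = abs(cand["pred_rt"] - observed_rt) <= rt_tol_min
        ccs_error_pct = abs(cand["pred_ccs"] - observed_ccs) / observed_ccs * 100.0
        if rt_ok and ccs_error_pct <= ccs_tol_pct:
            kept.append(cand)
    return kept


# Hypothetical candidates for one detected feature.
candidates = [
    {"name": "candidate_A", "pred_rt": 5.2, "pred_ccs": 150.0},
    {"name": "candidate_B", "pred_rt": 8.0, "pred_ccs": 151.0},  # RT too far off
    {"name": "candidate_C", "pred_rt": 5.1, "pred_ccs": 170.0},  # CCS too far off
]
kept = filter_candidates(candidates, observed_rt=5.0, observed_ccs=152.0)
```

Requiring agreement on both properties is what removes false positives: a candidate that matches the spectrum but elutes or drifts very differently from the prediction is discarded.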
Quantification presents additional challenges. Without authentic standards, it is difficult to convert signal intensity into reliable concentration estimates. Recent machine learning approaches address this gap by predicting ionization efficiency and response factors from molecular structure and experimental conditions. These models allow semi-quantitative analysis of detected compounds without the need for reference standards.
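A minimal sketch of the arithmetic behind standard-free semi-quantification is shown below. It assumes a model has already predicted a log10 response for the compound, expressed as signal per unit concentration; the function name, units, and numbers are hypothetical, and real workflows typically add a calibration step that ties predicted ionization efficiency to a specific instrument.

```python
def estimate_concentration(peak_area, predicted_log10_response):
    """Estimate concentration as peak area divided by the predicted response.

    predicted_log10_response is assumed to be log10(signal per unit concentration),
    so the result is in whatever concentration unit the model was trained on.
    """
    response = 10.0 ** predicted_log10_response
    return peak_area / response


# Hypothetical example: a peak area of 1e6 and a predicted log10 response of 4.
estimated = estimate_concentration(peak_area=1.0e6, predicted_log10_response=4.0)
```

Because any error in the predicted response carries over directly to the estimate, results of this kind are usually reported as semi-quantitative rather than exact concentrations.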
“Reliable quantification is essential for exposure assessment and risk assessment,” the authors emphasize. “Machine learning-based prediction of ionization behavior provides a practical path to standard-free quantification in large-scale screening.”
Despite rapid progress, significant challenges remain. Portability of models between instruments, limited representation of environmental contaminants in training datasets, and the need for improved interpretability are among the key issues discussed in this review. The authors call for multimodal learning strategies that integrate molecular features with experimental parameters and expanded databases that better reflect the environmental chemistry space.
Looking to the future, researchers envision an integrated and automated machine learning-driven screening platform that can combine identification, property prediction, and quantification within a unified framework.
“Future systems will be more accurate, transferable, and interpretable,” the authors conclude. “Such advances will enable scalable and intelligent screening of organic contaminants in complex environmental samples, ultimately supporting better environmental monitoring and public health protection.”
===
Journal reference: Liu, Y.-W.; Xiong, H.-Y.; Liu, J.-H.; et al. Application of machine learning in untargeted analysis of environmental organic pollutants. Artificial Intelligence and Environment. 2026, 1(1): 11−22. DOI: 10.66178/aie-0026-0003
https://www.the-newpress.com/aie/article/doi/10.66178/aie-0026-0003
===
About the journal:
Artificial Intelligence and Environment is an international interdisciplinary platform for communicating basic and applied research advances at the intersection of environmental science and artificial intelligence (AI). It serves researchers around the world across the fields of geoscience, environmental science, big data science and AI, and is dedicated to delivering discoveries from this rapidly expanding field. It is a peer-reviewed open access journal that publishes critical reviews, original research, rapid communications, perspectives, and commentaries.
Follow us on Facebook, X, and Bluesky.
Method of research: Literature review
Subject of research: Not applicable
Article title: Application of machine learning in untargeted analysis of environmental organic pollutants
Article publication date: February 10, 2026
