Fine particulate matter with an aerodynamic diameter of 2.5 micrometers or less, known as PM2.5, is one of the most harmful air pollutants, affecting both human health and the global climate. These tiny particles can penetrate deep into the lungs and bloodstream, causing respiratory and cardiovascular disease. PM2.5 is not a single substance but a complex mixture of chemical components such as sulfates, nitrates, ammonium, organic matter, and elemental carbon. Because the toxicity and environmental impact of PM2.5 are strongly correlated with its chemical composition, detailed chemical information is essential for accurate health risk assessment and effective pollution control.
Despite its importance, obtaining high-resolution data on PM2.5 chemical composition remains a major challenge. Traditional methods rely on expensive laboratory-based chemical analyses or atmospheric chemical transport models, both of which have limitations. Chemical measurements are costly and labor intensive, while numerical models are sensitive to uncertainties in emissions inventories, meteorological conditions, and physicochemical mechanisms. These challenges create large data gaps that limit the ability of policymakers and researchers to track pollution sources and design targeted mitigation strategies.
To address this gap, a research team led by Professor Ting Yang, together with Dr. Hongyi Li, Dr. Yining Tan, and Professor Jihua Wang from the Institute of Atmospheric Physics (IAP) at the Chinese Academy of Sciences (CAS) in Beijing, China, and Dr. Yiming Du from the Shenyang Environmental Monitoring Center in Shenyang, China, investigated whether advanced artificial intelligence (AI) techniques could recover the chemical composition of PM2.5 without relying on direct chemical measurements. The paper was made available online on March 29, 2024, and was published on May 1, 2025, in the Journal of Environmental Sciences.
In this study, researchers developed an optimized deep learning framework that integrates convolutional neural networks (CNN), bidirectional long short-term memory networks (BiLSTM), and Bayesian optimization. The model was designed to capture both complex nonlinear relationships and temporal patterns in atmospheric data. Unlike previous machine learning approaches, this framework does not require prior knowledge about chemical composition as input features. Instead, it relies on 22 regularly monitored variables, including particulate matter concentrations, gaseous pollutants, meteorological parameters, indicators of atmospheric conditions, and aerosol optical properties.
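As a rough illustration of how hourly monitoring records might be arranged for a sequence model of this kind, the sketch below slices a time-ordered table into overlapping windows. The 24-hour window length, the function name make_windows, and the synthetic values are assumptions made for illustration only; the release does not describe the study's actual preprocessing.

```python
from typing import List, Sequence, Tuple

N_FEATURES = 22  # number of routinely monitored input variables cited in the release


def make_windows(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    window: int = 24,  # hypothetical look-back length in hours
) -> Tuple[List[List[List[float]]], List[float]]:
    """Pair each run of `window` consecutive hourly rows with the target at its last hour."""
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")
    if window < 1:
        raise ValueError("window must be at least 1")
    xs: List[List[List[float]]] = []
    ys: List[float] = []
    for end in range(window, len(features) + 1):
        xs.append([list(row) for row in features[end - window:end]])
        ys.append(targets[end - 1])
    return xs, ys


# Synthetic stand-in data: 48 hours of 22 variables and one component concentration.
hours = 48
demo_features = [[float(h + j) for j in range(N_FEATURES)] for h in range(hours)]
demo_targets = [float(h) for h in range(hours)]

X, y = make_windows(demo_features, demo_targets, window=24)
print(len(X), len(X[0]), len(X[0][0]))  # windows, hours per window, variables per hour
```

Each window is laid out as hours by variables, which is the kind of input a convolutional layer followed by a bidirectional LSTM would typically consume.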
The model was trained and independently tested using hourly observations from an urban supersite in Shenyang, Northeast China, a city known for frequent PM2.5 pollution resulting from long-term industrial activity. To ensure robustness under different air quality conditions, the team selected two contrasting months from 2019: July, which represents relatively clean summer conditions, and December, which is characterized by severe winter pollution. Bayesian optimization was used to automatically identify the most effective combination of hyperparameters for each PM2.5 chemical component, improving accuracy while keeping computational costs low.
The results demonstrated that the model was able to accurately estimate the hourly concentrations of five major PM2.5 chemical components: sulfate, nitrate, ammonium, organic matter, and elemental carbon. Across independent test sets, the correlation coefficient was greater than 0.91, and the root mean square error ranged from 0.31 to 2.66 micrograms per cubic meter. The model successfully reproduced daily and hourly variations, including sharp increases during pollution episodes, and showed strong generalization performance when applied to the original time series data. “Our results show that it is possible to obtain reliable chemical composition information without expensive chemical analysis,” Professor Yang explains. “By combining deep learning with routinely available monitoring data, this approach can significantly expand access to high-resolution PM2.5 chemical information.”
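For readers unfamiliar with the two evaluation metrics quoted above, the following minimal sketch computes a Pearson correlation coefficient and a root mean square error from paired observed and predicted values. The function names and the four sample values are illustrative assumptions, not data from the study.

```python
import math
from typing import Sequence


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_o * var_p)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the same units as the inputs."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need two non-empty series of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Made-up hourly concentrations in micrograms per cubic meter.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

r_value = pearson_r(observed, predicted)
error = rmse(observed, predicted)
print(round(r_value, 3), round(error, 3))
```

The two numbers answer different questions: the correlation coefficient indicates how well the predictions follow the rises and falls of the observations, while the root mean square error expresses the typical size of the miss in concentration units.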
The CNN-BiLSTM-BO framework consistently outperformed traditional machine learning models such as multiple linear regression, support vector machines, random forests, and standalone long short-term memory networks. It also showed clear advantages over widely used global reanalysis datasets, which had higher errors and weaker agreement with ground observations. To increase interpretability, the researchers used a random forest approach to analyze feature importance. PM2.5, PM1, visibility, and temperature were the most influential variables overall. Furthermore, seasonal differences revealed important atmospheric processes, such as an increased role of volatile organic compounds and ozone in organic matter formation in summer, and a stronger influence of sulfur dioxide on sulfate formation in winter, reflecting heating-related emissions.
“Linking model predictions to physical and chemical factors ensures that AI-based tools are not only accurate, but also scientifically meaningful,” Professor Yang says.
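The general idea behind a feature importance analysis can be sketched with permutation importance, a simple model-agnostic technique. This is a stand-in chosen because it needs no external libraries; it is not the random forest importance measure used in the study. The toy model, the function names, and the synthetic inputs are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between two equal-length series."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(
    model: Callable[[Sequence[float]], float],
    rows: Sequence[Sequence[float]],
    targets: Sequence[float],
    n_features: int,
    seed: int = 0,
) -> List[float]:
    """Return the increase in RMSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(targets, [model(r) for r in rows])
    scores: List[float] = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [list(r[:j]) + [v] + list(r[j + 1:]) for r, v in zip(rows, column)]
        scores.append(rmse(targets, [model(r) for r in shuffled]) - baseline)
    return scores


def toy_model(row: Sequence[float]) -> float:
    """Hypothetical model that depends only on the first input and ignores the second."""
    return 2.0 * row[0]


# Synthetic inputs: column 0 drives the target, column 1 is irrelevant.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [toy_model(r) for r in rows]

scores = permutation_importance(toy_model, rows, targets, n_features=2)
print(scores)
```

A feature whose shuffling sharply degrades the predictions is one the model relies on, while a feature that can be shuffled without any loss contributes little.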
Although the study focuses on one city and two seasons, the researchers emphasize that the framework is flexible and scalable. Additional data from other regions and seasons, along with physical constraints, could be incorporated to improve spatiotemporal coverage and extend the framework to support broader air quality management efforts.
Overall, this study highlights the potential of deep learning to fill critical data gaps, strengthen pollution monitoring systems, and support evidence-based strategies to protect public health and environmental sustainability.
***
Reference
DOI: 10.1016/j.jes.2024.03.037
