Deriving ozone and PM pollution vertical profiles using lightweight, cost-effective sensors and deep learning

Reliable, sensitive sensors on mobile platforms, like the presented hexacopter, enable atmospheric studies in previously unexplored conditions. However, drone-based surveys are labor-intensive, limiting dataset size. This presents two challenges: ensuring high data precision due to a limited number of samples and addressing limited dataset information. This section details the experimental system and methods used to overcome these challenges.

A versatile and precise mobile system

We designed a versatile drone system with quick configuration adjustments, easy debugging, and extended operation. The system integrates high-quality, cost-effective and lightweight sensors and can be mounted on any platform carrying over 1 kg for a minimal configuration. To reduce weight, components are made from carbon-fiber composites or 3D-printed. Aerosols of sizes 130 nm–3 µm in diameter are measured using a HandixScientific POPS, an optical scattering particulate matter sensor with a power consumption of 5 W at 12 V and weighing 600 g that is placed inside a customized carbon-fiber box for physical protection. Larger aerosols (350 nm–40 µm) are measured with a cost-effective Alphasense OPC-N3 that is less accurate than the POPS⁵⁷. Ozone is measured by a 2BTechnologies POM⁴³, a 254 nm absorption spectrometer that measures up to 10 ppm with 0.1 ppb resolution and power consumption of 3 W at 12 V and weighs 450 g. PM_2.5 absorption is analyzed with an AethLab MA200 micro-aethalometer, measuring particle light absorption in 5 different wavelengths ranging from 375 nm to 880 nm, weighs 420 g, and was recently validated in our lab⁵⁶. An iMet-XQ2 UAV measures pressure, temperature, and humidity and weighs 60 g. An in-house developed NO₂ sensor can be installed but was not available at the time of the presented campaign. The sensor is a broadband cavity enhanced absorption spectrometer BBCEAS³⁹, that utilizes deep-learning for enhanced bias corrections. We include it in the description of the system (Fig. 7, Table S1) as it is presently available for integration and will be described in detail elsewhere. All sensors connect to a Raspberry Pi 4 via USB or Arduino-based relay for RS-485/RS-422 communication. Data is averaged at 1-second intervals and stored as CSV logs using custom software optimized for flexible sensor configurations. A custom dashboard supports real-time and offline analysis for any sensor in the setup (Fig. S1). The drone platform, a DJI Matrice 600 hexacopter, carries up to 5.5 kg and achieves 22-minute flights, enabling detailed vertical profiling up to approximately 2.5 km above sea level, but is limited to 200 meters with special permits from local authorities. Computational fluid dynamics (CFD) simulations confirm minimal rotor influence on air sampling, ensuring reliable data collection sampling⁵⁸. The simulation concludes that the sampled air represents a volume extending to 7 meters around it. Although our setup is different in both weight and geometry, since the measurements are conducted 20 meters apart, we assume no interference between measured points. This point can be further explored but is out of the scope of the presented work. Because the simulation indicates most air velocity is concentrated beneath the drone, we continuously ascend to minimize interference along the flight path.

Fig. 7: Multiple configurations of the versatile system for various scientific purposes: drone-based aerosol-focused setup (left), drone-based gas-focused (middle), and general-purpose setup installed on an electric car (right).

Flights strategy and campaign details

The measurement strategy involved hovering for 120 seconds at each altitude before ascending 20 meters. Flights were typically repeated every 3 hours, increasing to 1.5-hour intervals at sunrise and sunset. The presented campaign was conducted during several 24-hour periods in May 2023 for the aerosol-focused configuration and during August 2023 for gas-focused: May 4th–5th, 8th–9th^, and 23rd–24th and August 25th–26th^, and 28th–29th, a total of 10 different days. All flights were conducted at the Weizmann Institute campus, Rehovot, Israel. During the campaign, two extreme pollution events occurred: the first was a rapid dust event that occurred during May the 5th, and the other was a biomass burning event that occurs annually in Israel (Lag Ba’Omer), when nationwide bonfires are being held, during the night of May the 9th and is studied extensively by our lab^44,45,46^,59. Measurements were collected from 66 flights per height, or approximately 2 hours per altitude across varying environmental conditions (temperature, humidity, pressure, weekday, hour, etc.) to characterize the diurnal cycle presented in text. The ratio of weekends to weekdays flights is about 1:2, respectively. The total aerosol-focused flights used in the study were 36, in a ratio of 1:1 of background to events. Beside the data collected using the portable system, other freely available datasets were used to obtain the following ground measurements at the same times: NO, NO₂, and O₃ measured by the Israeli Environmental Protection Ministry at the closest Rehovot station, and global radiation (consisting of both scattered and direct radiation) and ground wind vector measured by the Israeli Meteorological Service. For atmospheric stratification analysis, we use virtual potential temperature and air density, accounting for both pressure, temperature and humidity⁴². Additionally, absolute humidity was calculated from relative humidity and temperature. All ground-level measurements were also taken in multiple time lags, -3,-2 and -1 h before the flights, as well as at the same time (total of 4 lags). All times are shown in local time.

Leveraging machine learning for enhanced insights

In this section, we discuss several machine learning techniques designed to enhance our analysis. First, we describe a data imputation method that addresses communication failures arisen from the cost-effective nature of the system, made possible through sensor redundancy. Next, we introduce specialized photochemical feature engineering approaches to optimize information extraction from the available measurements. Finally, we detail the deep learning framework used to analyze the vertical ozone profile.

Particulates or ozone concentrations at a specific location and altitude are governed by complex mechanisms influenced by environmental conditions and nearby sources, including concentrations of NO₂, NO, and the specific VOCs present at the location⁶⁰. In this study, we examined the ozone concentration dependencies per height to characterize the diurnal vertical ozone profile. To do so, we compared the measured ozone concentrations with several other measurements. Since only a few factors can be measured using a portable system, a special analysis is needed to study the pollutant per-height dependencies. The aerosol vertical profiles during dust events are compared to background characteristics, a comparison that further demonstrates the importance of mobile surveying systems and their potential.

Occasional missing measurements of environmental temperature and relative humidity occurred due to unstable USB cable connectivity of the iMet-XQ2 environmental sensor, leading to missing temperature and relative humidity measurements during 3 flights. However, since other instruments measure and report these values, it is possible to model the iMet-XQ2 readings as a function of the other measurements. Since different uncertainties of these instruments complicate this modeling, it can be done using machine learning techniques. We imputed the missing environmental flight measurements using XGBoost⁶¹, a highly efficient and simple-to-use machine learning model. The model was trained on over 80% of the available, (i.e., not missing) data, resulting in 149,626 datapoints for training, and tested over the rest 20%, 37,407 datapoints. All the data was smoothed using a 10-second averaging window before training to allow for better learning. Overall, the model presented high performance in learning to estimate the iMet-XQ2 missing T and RH data based on the other available instruments (R² = 0.983 for T and 0.938 for RH) (Fig. 8, S2). This procedure highlights the importance of sensor redundancy in low-cost systems.

**Fig. 8: Missing data imputation: when available, data from less reliable temperature (top) and relative humidity (bottom) sensors were used to estimate the iMet-XQ2 measurements.**

To further examine the ozone dependencies per height, in addition to the measured parameters, we define several new factors that potentially contain information about the occurring chemical reactions. These factors are not intended for direct ozone profile calculations but are analyzed as correlated variables that may hold information for predicting the profile using observation-based methods. Such factors can be useful when a concise representation of information is required, such as in scenarios with limited sample sizes.

NO₂ molecules are photolyzed during the daytime, yielding NO molecules and free oxygen atoms that react with O₂ to form ozone. However, NO quickly reacts with ozone to generate back NO₂. This photo-stationary relation is expressed by the Leighton Relationship⁶⁰:

$$\frac{\left[{NO}\right]}{\left[N{O}_{2}\right]}=\frac{j}{k[{O}_{3}]}$$

(a)

or:

$$\left[{O}_{3}\right]=\frac{j}{k}\cdot \frac{\left[N{O}_{2}\right]}{\left[{NO}\right]}$$

(b)

where $j$ denotes the photolysis frequency of NO₂ and $k$ denotes the rate constant for the reaction of O₃ with NO that yields NO₂ and O₂. But since $j$ describes the photolysis rate it is proportional to the amount of radiation. Furthermore, $k$ is temperature dependent. The Arrhenius equation relates a reaction rate coefficient of a general chemical reaction $k$ to the activation energy needed for the reaction to happen ${E}_{a}$ and temperature:

$$k=A\cdot {e}^{-\frac{{E}_{a}}{{RT}}}$$

(c)

where $T$ is the height-resolved measured temperature, $A$ is a factor proportional to the collision frequency that is often treated as temperature-independent and $R$ is the universal gas constant. By returning to Eq. (b), and since the factors A, ${E}_{a}$ and $R$ are constant, this exponential relation between the O₃ + NO reaction rate coefficient and the temperature can be used to define Factor 1. This factor is hypothesized to somewhat correlate with the height-resolved ozone profile even though NO and NO₂ are measured only on the ground:

$${Factor}1={e}^{\frac{1}{T}}\cdot {rad}\cdot \frac{\left[N{O}_{2}\right]}{\left[{NO}\right]}$$

(d)

where ${rad}$ is the ground-measured radiation, and $\left[N{O}_{2}\right],[{NO}]$ are the ground-measured concentrations of NO₂ and NO at Rehovot station, located about 1.4 km away from the campaign site.

A common practice in data science is data normalization, which usually helps underlining patterns and correlations in the data^33,52,53,62. Hence, two additional factors are defined:

$${Factor}2={e}^{\frac{1}{T}}\cdot \overline{{rad}}\cdot \frac{\left[N{O}_{2}\right]}{\left[{NO}\right]}$$

(e)

$${Factor}3={e}^{\frac{1}{T}}\cdot \overline{{rad}}\cdot \bar{\left(\frac{\left[N{O}_{2}\right]}{\left[{NO}\right]}\right)}$$

(f)

where the overhead line accent for a general term $x$, having mean ${\mu }_{x}$ and standard deviation ${\sigma }_{x}$, indicates a zero-mean, 1-standard-deviation normalized term, defined by:

$$\bar{x}=\frac{x-{\mu }_{x}}{{\sigma }_{x}}$$

(g)

Factors 2 and 3 differ in the normalized values that define them. The main difference between them and Factor 1 is that they utilize normalized radiation, so that nighttime radiation will not zero out their predictions completely and may make them meaningful for ozone profiles during the night.

Two more factors are defined from ground-level data. Factor 1 depends on the altitude only through the temperature, which enters the formulation via an inverse-temperature exponent. Because it is easier to acquire ground-level data than the height-resolved measurements, it makes sense to test another factor:

$${Factor}\,0={e}^{\frac{1}{{T}_{0}}}\cdot {rad}\cdot \frac{\left[N{O}_{2}\right]}{\left[{NO}\right]}$$

(h)

where ${T}_{0}$ is the ground level temperature. Since it contains continuously measured variables, we take this factor in several time lags as the rest of the ground-level data (−3, −2 and −1 h before the flights), introducing another advantage. We can also try to use a mixed-lags factor, composed of the current temperature and radiation but with historical NO₂ and NO that may still be present in the air and transported to the campaign site:

$${Factor}\;0m1={e}^{\frac{1}{{T}_{0}(t=0)}}\cdot {rad}\left(t=0\right)\cdot \frac{\left[N{O}_{2}\right]\left(t=-1H\right)}{\left[{NO}\right]\left(t=-1H\right)}$$

(i)

We explore how these factors relate to ozone concentrations and how much information they provide for predicting the ozone vertical profile.

To find similarities between vertical ozone profiles and other factors, we first define the height-resolved linear correlation between ozone and any other variable (measured or calculated), as:

$${corr}\left({O}_{3}^{h},x\right)=\frac{E[({O}_{3}^{h}-{\mu }_{{O}_{3}})(x-{\mu }_{x})]}{{\sigma }_{{O}_{3}^{h}}{\sigma }_{x}}$$

(j)

where E is the expected value operation, ${O}_{3}^{h}$ is the measured ozone concentration at height h, $\mu ,\sigma$ are the variables’ mean and standard deviation. For drone-measured variables such as T and RH, we calculate the correlations between the height-resolved ozone and the height-resolved factor at the same height, e.g.,: ${corr}\left({O}_{3}^{h},{T}^{h}\right)$.

A more sophisticated analysis is needed to account for complicated correlations that involve either non-linearities or multi-variable interactions. Here, we implement a deep learning method that uses the full set of variables as input and the height-resolved ozone concentrations as output. We can then explore each factor’s importance for ozone profile prediction, clueing on the information this factor contains. However, the amount of data in this study is limited, and so is the amount of information the dataset contains, making it challenging for any machine learning algorithm to utilize this information for meaningful predictions. Instead, since we intend to explore the amount of information each factor contains and not truly predict ozone concentrations, we can train an ensemble of models over random folds or splits of the available data to predict ozone profiles and average the resulting factors’ importance.

For the prediction task, we define an artificial neural network architecture of the following structure (Fig. 9): an input tensor of shape [B,f_in], where B = 1024 is the batch size used and f_in = 165 are the input features as detailed in the SI (Text S3), is randomly masked and fed into a an autoencoder⁶³, which compresses the input into a latent space of L = 32 (green). The [B,f_out] (f_out = 11) target ozone concentrations are then predicted using a fully connected prediction head with 3 layers and hidden dimension of 32. In parallel, to generalize the compression and to encourage the network to learn meaningful features instead of random details in the training data, the autoencoder decodes the compressed tensor back into the input space, and its decoding performance is evaluated as well in the loss function as an auxiliary task. The masking is done by randomly removing input values and replacing them with unique numbers, which are one learnable parameter for each feature. To encourage the network to learn to ignore these values, the mask itself, another tensor of shape [B,f_in] with values of 0 where masked and 1 where valid, is concatenated with the input to yield a tensor of shape [B,2*f_in], which is then introduced into the encoder. The masking rate is controlled and increases linearly during training to gradually train the autoencoder and the prediction head in parallel, from 0 (no masking at all) to 0.1 (10% of the features are omitted). The decoders are composed of 3 fully connected layers each with internal blocks of hidden dimensions of 32 for the prediction head and 64 for the auxiliary decoder head. Between each layer there is a LeakyReLU activation layer for non-linearity⁶⁴, and one layer per block of 0.1 dropout for regularization. The total number of trainable parameters is 32,341 (see Text S4 for a summary of the network structure).

**Fig. 9: Artificial neural network architecture.**

Data samples were first normalized by reducing their means and dividing by their standard deviations, yielding a relatively uniformly distributed dataset (validation set means and standard deviations were taken from the training set). Data was augmented with random noise of up to 5%. We chose the Adam optimizer with a learning rate of 10⁻³ and weight decay of 10⁻⁴ for regularization⁶⁵, having well documented usage and is fitting the tasks of fully-connected regressions. A reduce-on-plateau learning rate scheduler was used with a factor of 0.5 and patience of 1, halving the learning rate whenever the validation loss plateaued. The loss function is the mean squared error (MSE) of the predictions and targets for both the prediction task and the auxiliary task. The total loss is the weighted sum of both MSE losses, with the weight of the main task being 0.95 and the auxiliary task being 0.05. These hyperparameters were carefully selected by trial and error to achieve learning convergence in about 30 epochs. Each epoch is composed of 32 randomly generated masks and augmented data samples per flight, that are regenerated each epoch. The validation set is composed of pre-generated 256 samples per flight and does not change during training. This training setup allowed for automatic training, and the procedure was repeated 100 times for the feature importance analysis, with each run differing mostly by its validation and training flights. The splits yielded 53 complete flights for training and 13 complete flights for validation (80% to 20%). The models were re-initialized and re-trained for 50 epochs at each run and saved when their validation loss achieved a minimum to avoid overfitting. Finally, the top 25, rated by their validation average RMSE, were selected for the feature importance analysis. Training results are presented in the SI (Fig. S6 to S8).

Feature importance is defined by the Integrated Gradients (IG)⁶⁶, a method for quantifying each input feature’s contribution to each output feature’s prediction. IG is calculated by summing the gradients of the model’s output with respect to the input along a linear path between a baseline input and the actual input samples. This path is discretized into m = 50 steps, and the result is scaled by the difference between the input and the baseline. This results in a smooth and stable attribution of input changes to the model’s outputs. Concretely, we calculate the IG by the following method:

$$I{G}_{b,j,i}=\left|\left({x}_{b,i}-{x}_{b,i}^{{\prime} }\right)\cdot \frac{1}{m}\mathop{\sum }\limits_{k=1}^{m}\frac{\partial {F}_{j}\left({x}_{b,i}^{{\prime} }+\frac{k}{m}({x}_{b,i}-{x}_{b,i}^{{\prime} })\right)}{\partial {x}_{i}}\right|,{IG}\in {{\mathbb{R}}}^{B\times {f}_{{out}}\times {f}_{{in}}}$$

(k)

where b,j,i index the batch, output feature and input feature, respectively. The term $\frac{\partial {F}_{j}\left(y\right)}{\partial {x}_{i}}$ represents the gradient of the model’s F output feature j w.r.t input feature ${x}_{i}$, for an arbitrary model input y. The baseline, ${x}_{b,i}^{{\prime} }$, is set as zero, which corresponds to the means in our normalized dataset. Thus, the summation represents a discrete approximation of the integral over the path from the mean to the actual input. This calculation is repeated for all validation samples and aggregated into an IG_full tensor of shape [S,f_out, f_in], where S is the number of validation samples. To normalize the contributions per output feature, we divide each attribution vector by the sum of its absolute values along the input dimension (f_in), so that for each output j, the sum over i is 1. Finally, we average the attributions over all samples, resulting in a Feature Importance tensor of shape [f_out, f_in], shown in Fig. 3.