A continuous, gap-filled atmospheric N2O record over the past 800,000 years using machine learning techniques

Machine Learning


Existing Ice Core Data

This study primarily utilized the published GHG concentration datasets from EPICA Dome C (EDC, 75°06'S, 123°21'E) and Vostok (near the center of the Antarctic ice sheet, near 78°S, 106°E), synchronized on the EDC3 timescale28. In addition, the WAIS Divide ice core (WDC) CO2 record is available at high resolution from 10.6 to 68 ka29. To use this record, it had to be transferred from its original WD2014 timescale to the EDC3 timescale. To do this, we compared the WDC CH4 record30 with the EDC CH4 record26 and chose gas-age tie points at several abrupt CH4 changes31.

We also adjusted for the CO2 concentration difference between the EPICA Dome C and WDC records. Both records show similar trends with only a small offset; over the period 10.6-24 ka, the CO2 difference was found to be 9 ppm. This offset differs from that of a recent study32, which proposed a value of 6.06 ppm, so we explain here how our value was derived. First, the Vostok and EDC CO2 records were interpolated to match the temporal resolution of the WDC CO2 data. Next, CO2 differences were calculated on the matched age scale, but only up to an EDC age of 24,650 years, because the older EDC CO2 data were too sparse to compare with the high-resolution WDC record. The calculation steps and results are available in the data repository (see Data Availability). We therefore subtracted 9 ppm from the WDC CO2 record over the period 10.6-68 ka to correct this offset.

To incorporate continuous high-resolution N2O data from 0 to 134 ka, we used the compilation of Köhler et al.32. They32 established criteria for stacking data and identified a combination of three N2O records, from NGRIP, EDC and Talos Dome, that likely provides a representative mean global N2O value. Figure 4 shows the existing ice core data. The N2O data can then be divided into two kinds of section: existing (continuous) sections and gap (discontinuous) sections, the latter shown as gray shaded areas.
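The offset calculation described above (interpolate the sparser record onto the WDC ages, difference, average, subtract) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of linear interpolation and a plain mean, and all numbers in the toy records are hypothetical and are not taken from the data repository.

```python
from bisect import bisect_right
from statistics import mean


def interp(x, xs, ys):
    """Linearly interpolate the record (xs, ys) at x; xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the record")
    j = min(bisect_right(xs, x), len(xs) - 1)  # index of the right-hand neighbour
    i = j - 1
    w = (x - xs[i]) / (xs[j] - xs[i])
    return ys[i] + w * (ys[j] - ys[i])


def mean_offset(wdc_age, wdc_co2, edc_age, edc_co2, max_age):
    """Mean WDC-minus-EDC CO2 difference, evaluated at WDC ages up to max_age."""
    upper = min(max_age, edc_age[-1])
    diffs = [c - interp(a, edc_age, edc_co2)
             for a, c in zip(wdc_age, wdc_co2)
             if edc_age[0] <= a <= upper]
    return mean(diffs)


# Toy records (hypothetical values; ages in years, CO2 in ppm).
edc_age = [10000, 15000, 20000, 25000]
edc_co2 = [260.0, 230.0, 190.0, 185.0]
wdc_age = [12000, 14000, 18000, 22000, 30000]
wdc_co2 = [257.0, 245.0, 215.0, 197.0, 200.0]

offset = mean_offset(wdc_age, wdc_co2, edc_age, edc_co2, max_age=24650)
corrected = [c - offset for c in wdc_co2]  # offset removed from the whole WDC record
```

In this toy case every compared pair differs by the same amount, so the mean offset equals that amount; with real data the differences scatter and the choice of averaging window matters.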
The existing N2O data fall mainly within interglacial periods, meaning that the gray shaded sections occur in the coldest parts of the glacial periods. To prepare the input data for modeling, all GHG data were first resampled at 10-year intervals using interpolation. Next, we applied a 250-year moving average to all GHG records on the EDC3 timescale to smooth the data.
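A minimal sketch of this preprocessing, assuming linear interpolation and a centered moving window that shrinks at the ends of the record (the source specifies neither choice); the function names and toy values are hypothetical.

```python
def resample(ages, values, step=10):
    """Linearly interpolate a record onto a regular grid with `step`-year spacing.

    `ages` must be strictly increasing and contain at least two points.
    """
    grid, out = [], []
    i = 0
    t = ages[0]
    while t <= ages[-1]:
        while ages[i + 1] < t:  # advance to the segment that contains t
            i += 1
        w = (t - ages[i]) / (ages[i + 1] - ages[i])
        grid.append(t)
        out.append(values[i] + w * (values[i + 1] - values[i]))
        t += step
    return grid, out


def moving_average(values, window_years=250, step=10):
    """Centered moving average; the window is truncated near the record ends."""
    half = (window_years // step) // 2  # samples on each side of the centre
    smoothed = []
    for k in range(len(values)):
        lo, hi = max(0, k - half), min(len(values), k + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Toy record (hypothetical): ages in years, concentration in arbitrary units.
grid, series = resample([0, 100, 300], [0.0, 10.0, 30.0], step=10)
smooth = moving_average(series, window_years=250, step=10)
```

Integer ages are used so that the grid does not accumulate floating-point drift.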

Figure 4: Ice core greenhouse gas records over the past 800 kyr.

N2O, CH4 and CO2 plotted on the EDC3 timescale7,9,10,16,20,21,26,29,32,40,41. The gray shaded areas mark the gaps in the N2O data (the data on the AICC2012 timescale are also available in the downloadable data repository).

Machine Learning Methods

This study applied six general techniques, each with multiple sub-methods (Fig. 5a), to the existing N2O data, using the CO2 and CH4 data as predictors. Performance metrics were then calculated between the existing N2O data and the modeled data over the past 800 kyr, and the best method was chosen to simulate the gap sections. Finally, the N2O gaps were filled. Figure 5 shows the main techniques used in this study and their sub-techniques, for a total of 20 methods.

Figure 5: Machine learning (ML) algorithms used and process flow chart.

a Six major ML methods and their detailed algorithms. b Main flow chart of the learning (160-800 ka) and simulation (0-160 ka) processes, which is used to assess the performance of the selected main model. MOV250 = 250-year moving average of the data used.

Artificial neural networks (ANNs) are a powerful way to fit nonlinear functions between dependent and independent variables. An ANN architecture typically consists of three major sequential layers: (i) an input layer containing the input parameters (in this study, CO2 and CH4), (ii) a hidden layer containing the calculation steps between the input and output, and (iii) an output layer yielding the predicted values (here, the N2O data). For more information on building the desired ANN structure, readers can refer to the user manual of the ANN toolbox in MATLAB R2022b.
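The three-layer structure can be illustrated with a single forward pass. The sketch below is not the network used in the study: the tanh activation, the layer sizes and every weight value are assumptions chosen only to make the example concrete.

```python
import math


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: input layer -> tanh hidden layer -> linear output.

    inputs   : predictor values, e.g. [co2, ch4] (ideally normalized)
    w_hidden : one weight list per hidden neuron
    b_hidden : one bias per hidden neuron
    w_out    : one output weight per hidden neuron
    b_out    : output bias
    """
    hidden = [math.tanh(sum(w * v for w, v in zip(ws, inputs)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical weights for a network with two inputs and two hidden neurons.
prediction = forward(
    inputs=[0.3, -0.1],
    w_hidden=[[0.8, -0.4], [0.2, 0.9]],
    b_hidden=[0.1, -0.2],
    w_out=[1.5, -0.7],
    b_out=0.05,
)
```

Training consists of adjusting the weights and biases so that the outputs match the measured N2O values; that step is left to the toolbox and is not shown here.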

Ensemble methods are an ML paradigm in which multiple models are trained to solve the same problem and their results are combined to obtain better performance. They attempt to reduce the bias or variance of weak learners and ultimately produce a strong learner. There are two main meta-algorithms: boosted trees and bagged trees. Boosted tree methods train the learners sequentially, whereas bagged trees train the learners independently of one another33.
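The difference between the two meta-algorithms can be sketched with one-split regression trees ("stumps") as weak learners. This is a simplified illustration, not the toolbox implementation: the stump learner, the learning rate, the number of models and the toy data are all assumptions.

```python
import random


def fit_stump(xs, ys):
    """Best single-threshold regressor: returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[1:]:  # every threshold leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all x identical: fall back to a constant prediction
        m = sum(ys) / len(ys)
        return (xs[0], m, m)
    return best[1:]


def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x < t else rm


def bagged(xs, ys, n_models=20, seed=0):
    """Bagging: independent stumps on bootstrap resamples, predictions averaged."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(predict_stump(m, x) for m in models) / len(models)


def boosted(xs, ys, n_models=20, rate=0.5):
    """Boosting: each stump is fitted to the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    resid = [y - base for y in ys]
    models = []
    for _ in range(n_models):
        m = fit_stump(xs, resid)
        models.append(m)
        resid = [r - rate * predict_stump(m, x) for x, r in zip(xs, resid)]
    return lambda x: base + rate * sum(predict_stump(m, x) for m in models)


# Toy step-like data (hypothetical).
xs, ys = [1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5]
bag, boost = bagged(xs, ys), boosted(xs, ys)
```

Bagging mainly reduces variance by averaging independently trained learners, while boosting reduces bias by letting each learner correct its predecessors.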

Gaussian process regression (GPR) is a nonparametric, probabilistic ML method widely used in regression tasks because of its flexibility in modeling complex and nonlinear relationships. GPR assumes that the observed data can be modeled as a sample from a multivariate Gaussian distribution, in which the observations are related to one another through a Gaussian prior. The strength of GPR lies in its ability to provide not only predictions but also estimates of the uncertainty of these predictions. The general form of GPR is defined as follows:

$$f\left(x\right)\sim {GP}\left(\mu\left(x\right),\,k\left(x,{x}^{\prime};\theta\right)\right)$$

(1)

where \(f\left(x\right)\) is a latent function that represents the underlying data trend; \(\mu\left(x\right)\) is the mean function, which is often assumed to be zero unless prior information suggests otherwise; \(k\left(x,{x}^{\prime};\theta\right)\) is the covariance (kernel) function that defines the similarity between data points \(x\) and \({x}^{\prime}\); and \(\theta\) represents the hyperparameters. The predictive distribution for new data points is obtained by conditioning the Gaussian prior on the observed data, which yields closed-form expressions for both the mean and the variance of the prediction. In this study, four types of GPR algorithms were used: squared exponential GPR, Matern 5/2 GPR, exponential GPR, and rational quadratic GPR (GPR-RQ).
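For reference, the closed-form predictive mean and variance take the following standard textbook form for a zero-mean prior with Gaussian observation noise. The symbols \(X\) (training inputs), \(\mathbf{y}\) (training targets), \(x_{*}\) (new input), \(K\) (matrix of kernel values between training inputs), \(\mathbf{k}_{*}\) (vector of kernel values between \(x_{*}\) and the training inputs) and \(\sigma_{n}^{2}\) (noise variance) are introduced here for illustration and do not appear in the source.

$$\bar{f}_{*}=\mathbf{k}_{*}^{\top}{\left(K+\sigma_{n}^{2}I\right)}^{-1}\mathbf{y}$$

$$\mathrm{Var}\left[f_{*}\right]=k\left(x_{*},x_{*};\theta\right)-\mathbf{k}_{*}^{\top}{\left(K+\sigma_{n}^{2}I\right)}^{-1}\mathbf{k}_{*}$$

The variance term is what supplies the uncertainty estimate: it shrinks where \(x_{*}\) is close to training inputs and returns to the prior variance far from them.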

The rational quadratic (RQ) kernel is a popular covariance function in GPR because it can be interpreted as a scale mixture of squared exponential (SE) kernels with different characteristic length scales. This makes the RQ kernel extremely flexible in capturing both short- and long-range dependencies in the data, which is especially useful when modeling atmospheric data with multi-scale variability, such as ice core records. The mathematical formulation of the RQ kernel is:

$${k}_{{RQ}}\left(x,{x}^{\prime}\right)={\sigma}_{f}^{2}{\left(1+\frac{{\left(x-{x}^{\prime}\right)}^{2}}{2\alpha {l}^{2}}\right)}^{-\alpha}$$

(2)

where \({\sigma}_{f}^{2}\) is the signal variance, which controls the overall variance of the function; \(l\) is the length-scale parameter, which determines how quickly the correlation between points decays with distance; and \(\alpha\) is the scale-mixture parameter, which governs the relative weighting of the different length scales.
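The RQ kernel can be written in a few lines. The sketch below also includes an SE kernel for comparison; the function names and default parameter values are illustrative choices, not settings from the study.

```python
import math


def rq_kernel(x, x2, sigma_f=1.0, length=1.0, alpha=1.0):
    """Rational quadratic kernel between two scalar inputs."""
    r2 = (x - x2) ** 2
    return sigma_f ** 2 * (1.0 + r2 / (2.0 * alpha * length ** 2)) ** (-alpha)


def se_kernel(x, x2, sigma_f=1.0, length=1.0):
    """Squared exponential kernel, the large-alpha limit of the RQ kernel."""
    return sigma_f ** 2 * math.exp(-((x - x2) ** 2) / (2.0 * length ** 2))


# A small alpha gives heavier tails (more long-range correlation) than the SE kernel,
# while a very large alpha makes the two kernels practically indistinguishable.
heavy_tail = rq_kernel(0.0, 3.0, alpha=0.5)
near_se = rq_kernel(0.0, 1.0, alpha=1e6)
```

At zero separation the kernel equals the signal variance, and as \(\alpha\) grows the RQ kernel converges to the SE kernel with the same length scale.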

Unlike the SE kernel, which assumes a single smoothness scale, the RQ kernel can model data with various levels of smoothness owing to the additional \(\alpha\) parameter. It effectively captures both fine (short-term) variability and broader (long-term) trends in the dataset, making it well suited to paleoclimatic data analysis, where variability occurs on different time scales34,35. In this study, we used the RQ kernel to model the complex temporal variability of N2O concentrations in Antarctic ice cores. The multiscale nature of the RQ kernel allowed us to capture both glacial-interglacial transitions and finer-scale variations. The hyperparameters (\({\sigma}_{f}\), \(l\), \(\alpha\)) were optimized by maximizing the marginal likelihood, ensuring that the model was tailored to the specific characteristics of the ice core data.
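In its standard textbook form for a zero-mean GP with Gaussian noise, the quantity maximized over the hyperparameters is the log marginal likelihood below. The symbols \(\mathbf{y}\) (vector of the \(n\) training targets), \(X\) (training inputs), \(K_{\theta}\) (kernel matrix evaluated with hyperparameters \(\theta\)) and \(\sigma_{n}^{2}\) (noise variance) are assumptions introduced for illustration; the source does not state the exact objective or optimizer settings.

$$\log p\left(\mathbf{y}\,|\,X,\theta\right)=-\frac{1}{2}\mathbf{y}^{\top}{\left(K_{\theta}+\sigma_{n}^{2}I\right)}^{-1}\mathbf{y}-\frac{1}{2}\log\left|K_{\theta}+\sigma_{n}^{2}I\right|-\frac{n}{2}\log 2\pi$$

The first term rewards fit to the data and the second penalizes model complexity, so maximizing their sum balances the two without a separate validation set.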

We also applied a regression tree algorithm, in which the target variable is continuous and a decision tree is used to predict its value. The method iteratively divides the data into partitions (branches), choosing at each step the split that minimizes the error about the mean in the two resulting partitions. This rule is repeated for every new branch, and the process continues until the terminal nodes, which give the final response, are reached. The technique comes in three types: fine, medium and coarse36.
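The split rule at a single node can be sketched as follows, assuming one predictor and the sum of squared errors about each partition mean as the criterion; the function names and toy values are hypothetical.

```python
def sse(ys):
    """Sum of squared errors about the mean of ys (0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(xs, ys):
    """Return (threshold, error) of the split minimizing the summed SSE of both sides."""
    pairs = sorted(zip(xs, ys))
    best_err, best_thr = float("inf"), None
    for k in range(1, len(pairs)):
        if pairs[k][0] == pairs[k - 1][0]:
            continue  # cannot split between identical predictor values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        err = sse(left) + sse(right)
        if err < best_err:
            best_err = err
            best_thr = (pairs[k - 1][0] + pairs[k][0]) / 2  # midpoint threshold
    return best_thr, best_err


# Toy data (hypothetical): CO2 in ppm as predictor, N2O in ppb as response.
co2 = [180, 190, 200, 260, 270, 280]
n2o = [200, 205, 210, 260, 265, 270]
threshold, error = best_split(co2, n2o)
```

A full tree applies the same search recursively to each partition until a stopping rule, such as a minimum number of samples per leaf, is met.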

The support vector machine (SVM) is a powerful ML algorithm that can solve complex problems by performing optimal data transformations that determine the boundaries between data points. The main purpose of an SVM is to identify hyperplanes that can effectively separate two classes in an n-dimensional space. In SVM regression, the training dataset consists of predictors and observed variables, and the goal is to identify a function that best maps the former to the latter36,37. This study utilized five different SVM algorithms, including nonlinear SVMs, which require the incorporation of "kernel functions". Each SVM algorithm has a different kernel function equation, and readers are referred to refs. 37,38 for more information.
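To make the idea of a kernel function concrete, the sketch below lists three commonly used forms. Which kernels and parameterizations the five SVM variants in the study actually used is not stated in this section, so the choices here (linear, polynomial and Gaussian, with a hypothetical `scale` parameter) are assumptions.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, degree=2):
    """Polynomial kernel; degree 2 and 3 give quadratic and cubic variants."""
    return (1.0 + dot(u, v)) ** degree


def gaussian_kernel(u, v, scale=1.0):
    """Gaussian (RBF) kernel in one common parameterization."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / scale ** 2)


# Two hypothetical, already normalized predictor vectors [co2, ch4].
u, v = [0.2, -0.5], [0.1, 0.4]
similarities = (linear_kernel(u, v), polynomial_kernel(u, v), gaussian_kernel(u, v))
```

A kernel replaces the inner product between predictor vectors, which lets a linear method fit nonlinear relationships without constructing the transformed features explicitly.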

Linear Method

This method establishes a simple relationship39 between the response (N2O) and the predictor variables (CO2 and CH4). Four algorithms were tested in this study (Fig. 5a).
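The simplest such relationship is ordinary least squares with two predictors, sketched below via the normal equations. This illustrates only the plain linear case, not the four algorithms tested; the function name, coefficients and toy values are hypothetical.

```python
def fit_linear(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns [b0, b1, b2]."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, 3):
            f = A[r][c] / A[c][c]
            for k in range(c, 3):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    # Back substitution
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = sum(A[r][k] * beta[k] for k in range(r + 1, 3))
        beta[r] = (b[r] - s) / A[r][r]
    return beta


# Toy data (hypothetical): N2O generated exactly as 100 + 0.5*CO2 + 0.1*CH4.
co2 = [180.0, 200.0, 240.0, 280.0, 260.0]
ch4 = [350.0, 400.0, 500.0, 700.0, 600.0]
n2o = [100.0 + 0.5 * c + 0.1 * m for c, m in zip(co2, ch4)]
b0, b1, b2 = fit_linear(co2, ch4, n2o)
```

Because CO2 and CH4 are strongly correlated in ice core records, the normal equations can be poorly conditioned; production code would normally use a QR-based solver instead.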

Model Selection Process

Three phases were used to simulate the N2O data. First, the ML models were applied to the existing N2O data, as outlined in the "Existing Ice Core Data" section, using the CO2 and CH4 data as predictors (i.e., the initial model input data excluded the gray areas in Fig. 4). The validation metrics were then evaluated to determine the best ML model for the remaining simulations. Following this, the N2O gaps were simulated using the input data, resulting in a continuous N2O time series from 0 to 800 kyr. Finally, to cross-check the performance of the best model selected in the first phase, we modeled the 0-160 kyr period using model parameters fitted to the 160-800 kyr period (Fig. 5b).
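The bookkeeping behind these phases can be sketched as follows. The model itself is deliberately replaced by a stand-in that predicts the mean of the measured values; the function names, the use of `None` to mark gaps and the toy numbers are all hypothetical.

```python
from statistics import mean


def fill_gaps(co2, ch4, n2o, fit, predict):
    """Fit on rows with measured N2O, then simulate N2O where it is missing (None)."""
    known = [(c, m, n) for c, m, n in zip(co2, ch4, n2o) if n is not None]
    model = fit(known)
    return [n if n is not None else predict(model, c, m)
            for c, m, n in zip(co2, ch4, n2o)]


def holdout_by_age(age_ka, boundary_ka=160.0):
    """Cross-check split: train on ages >= boundary, simulate ages < boundary."""
    train = [i for i, a in enumerate(age_ka) if a >= boundary_ka]
    test = [i for i, a in enumerate(age_ka) if a < boundary_ka]
    return train, test


def fit_mean(rows):
    """Stand-in 'model': the mean of the measured N2O values."""
    return mean(n for _, _, n in rows)


def predict_mean(model, co2_value, ch4_value):
    """Stand-in prediction that ignores the predictors."""
    return model


filled = fill_gaps([280, 190, 200], [700, 380, 400], [270.0, None, 200.0],
                   fit_mean, predict_mean)
train_idx, test_idx = holdout_by_age([10.0, 150.0, 160.0, 500.0])
```

Any of the regression methods described earlier can be substituted for the stand-in by passing different `fit` and `predict` functions.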

Cross-Validation of Models

Cross-validation involves training the ML models on subsets of the input data and evaluating them on the complementary subsets. In this study, we allocated 70% of the data to training the model and 15% to validation, and reserved the remaining 15% for testing the performance of the model during the cross-validation procedure. Finally, several validation metrics were employed to assess the accuracy of the ML models in generating reliable simulations: the coefficient of determination (\({R}^{2}\)), the root mean square error (RMSE) and the mean absolute error (MAE), calculated according to Eqs. (3)-(5).
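A minimal sketch of such a 70/15/15 partition, assuming the samples are assigned at random (the source does not say how the subsets were drawn); the function name and the seed are hypothetical.

```python
import random


def split_70_15_15(n, seed=0):
    """Randomly partition the indices 0..n-1 into train/validation/test subsets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    n_train = round(0.70 * n)
    n_val = round(0.15 * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # whatever remains, about 15%
    return train, val, test


train, val, test = split_70_15_15(100)
```

For a smoothed time series, neighbouring samples are strongly correlated, so a random split can make test scores look better than a split into contiguous age blocks would.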

$${R}^{2}=\frac{{\left(\mathop{\sum}\nolimits_{i=1}^{n}\left[{x}_{i}-{\bar{x}}_{i}\right]\left[{y}_{i}-{\bar{y}}_{i}\right]\right)}^{2}}{\mathop{\sum}\nolimits_{i=1}^{n}{\left[{x}_{i}-{\bar{x}}_{i}\right]}^{2}\mathop{\sum}\nolimits_{i=1}^{n}{\left[{y}_{i}-{\bar{y}}_{i}\right]}^{2}}$$

(3)

$${RMSE}=\sqrt{\frac{\mathop{\sum}\nolimits_{i=1}^{n}{\left({x}_{i}-{y}_{i}\right)}^{2}}{n}}$$

(4)

$${MAE}=\frac{1}{n}\mathop{\sum}\limits_{i=1}^{n}\left|{y}_{i}-{x}_{i}\right|$$

(5)

where \(n\) is the number of measurements, \({x}_{i}\) and \({y}_{i}\) are the measured and simulated values, respectively, and \({\bar{x}}_{i}\) and \({\bar{y}}_{i}\) are the mean values of \({x}_{i}\) and \({y}_{i}\). In this study, all calculations and modeling were performed using MATLAB R2022b.
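The three metrics translate directly into code. The sketch below follows the definitions given here, with `x` the measured and `y` the simulated values; it is written in Python purely for illustration, and the toy numbers are hypothetical.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation between measured x and simulated y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def rmse(x, y):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def mae(x, y):
    """Mean absolute error."""
    return sum(abs(b - a) for a, b in zip(x, y)) / len(x)


measured = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 4.0, 6.0, 8.0]
scores = (r_squared(measured, simulated), rmse(measured, simulated),
          mae(measured, simulated))
```

The toy case shows why the metrics are reported together: the simulated values are perfectly correlated with the measurements, giving a coefficient of determination of 1, yet RMSE and MAE are clearly non-zero because the simulated values are biased.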


