An intelligent hybrid deep learning-machine learning model for monthly groundwater level prediction

PSO algorithm

The PSO algorithm is a robust tool for finding optimal solutions in continuous and discrete search spaces. The algorithm has been applied in various fields, including water resource management, model parameter adjustment, energy management, signal processing, and image analysis^41,42. The PSO algorithm offers several advantages, such as fast convergence to optimal solutions, simplicity in implementation, the ability to handle nonlinear and multidimensional problems, and flexibility in integrating with other computational intelligence techniques.

It is executed as follows:

(1)

The algorithm begins the optimization process by generating an initial population of particles, which are spread throughout the search space⁴³.
(2)

Each particle has a position and a velocity, which are updated based on Eqs. (1) and (2)⁴³.

$$v_{i,d}^{t + 1} = w.v_{i,d}^{t} + \alpha_{1} r_{1} \left( {p_{i,d}^{t} – x_{i,d}^{t} } \right) + \alpha_{2} r_{2} \left( {g_{d}^{t} – x_{i,d}^{t} } \right)$$

(1)

$$x_{i,d}^{t + 1} = x_{i,d}^{t} + v_{i,d}^{t + 1}$$

(2)

where $p_{i,d}^{t}$: Local best position of particles $x_{i,d}^{t + 1}$: The location of the ith particle at iteration t + 1, $v_{i,d}^{t + 1}$: The velocity of the ith particle at iteration t + 1, d: Number of dimensions, $\alpha_{1}$ and $\alpha_{2}$: Acceleration coefficients, $g_{d}^{t}$: Global best location of particles $r_{1}$ and $r_{2}$: Random numbers, $w$: weight coefficients, and t: number of iterations.

(3)

The process continues until the stopping condition is met.

Coati optimization (COO) algorithm

Coatis are omnivorous mammals with a diverse diet that includes invertebrates (e.g., insects, spiders, and tarantulas), small vertebrates (such as rodents, lizards, birds, and bird eggs), and occasionally fruits and plant matter²⁴. However, one of their most notable prey items is the green iguana, which is commonly found on tree trunks and branches across tropical and subtropical regions.

Coatis exhibit two important behaviors, which are explained as follows:

1.

Hunting behavior: The hunting behavior of coatis involves a two-step strategy that enables them to successfully capture agile prey, such as the green iguana. This strategy is described as follows:
- Some coatis climb the tree to frighten the iguana²⁴.
- The startled iguana loses its balance and falls to the ground. The other coatis, which remain on the ground, then quickly move toward the iguana and attempt to capture it²³.
2.

Escape behavior: When coatis encounter a potential threat, they exhibit escape behavior, which is a key characteristic of these animals. This behavior refers to their movement to a new location near their current position. The new location often includes elevated terrain or rocky outcrops that give them a better vantage point to monitor the threat.

These two behaviors are modeled in the COO algorithm, which is executed as follows:

(1)

The optimization process starts by generating a population of search agents, including coatis.
(2)

The locations of the search agents are randomly initialized within the search space. These locations represent candidate solutions, which are updated during the optimization process²³.
(3)

The iguana’s location is randomly initialized within the search space. This location represents the optimal solution, which serves as a reference point for guiding the search agents during the optimization process.
(4)

During the hunting phase, climbing coatis update their locations based on the position of the iguana on the tree, while other coatis update their positions according to the fall location of the iguana²³. Similarly, search agents in the COO algorithm update their positions based on the different locations of the iguana. Equations (3)–(5) present the mathematical formulation for updating the positions of coatis during this phase.

where $IG_{i}$: The location of the iguana on the tree, $CO_{ij}^{p1}$: The new location of the search agent, $CO_{ij}$: The location of the ith search agent in the jth dimension, $I$: an Integer, and $r$: A random number.

where $IG_{i}^{G}$: The location of the iguana on the ground, $LB_{j}$ and $UB_{j}$: The upper and lower bounds of the decision variables, and $r$: a random number, $F_{{IG^{G} }}$: Fitness function value of the iguana, $F_{i}$: Fitness function value of the ith iguana, $CO_{i,j}^{p1}$: The new location of the search agent, and $CO_{i,j}$: The pith location of the search agent in the jth dimension.

(5)

Each search agent mimics the escape behavior of coatis by moving to a new location near its position. Equation 6 shows the mathematical formulation of this behavior²³.

$$CO_{i,j}^{p1} = CO_{ij} + \left( {1 – 2r} \right).\left( {LB_{j}^{local} + r.\left( {UB_{j}^{local} – LB_{j}^{local} } \right)} \right),LB_{j}^{local} = \frac{{LB_{j} }}{t},UB_{j}^{local} = \frac{{UB_{j} }}{t}$$

(6)

where $LB_{j}^{local}$ and $UB_{j}^{local}$: The local lower and upper bounds of the decision variables.
(6)

The optimization process continues until a predefined termination criterion is met. This criterion can include the maximum number of iterations, the attainment of a satisfactory fitness value, or the lack of significant improvement in the best solution over a consecutive number of iterations.

It is important to note that the COO algorithm offers several advantages, such as a strong ability to explore complex search spaces, fast convergence toward global optima, robustness against getting trapped in local minima, and flexibility to be combined with other optimization or machine learning techniques for enhanced performance.

Hybrid PSO-COO

PSO may produce suboptimal solutions, which can reduce its performance in solving complex problems. However, our study addresses this problem by developing the hybrid PSO-COO technique. This technique is executed as follows.

(1)

An initial population of particles is generated.
(2)

The PSO algorithm generates initial solutions by updating the velocities and positions of the particles.
(3)

The initial solutions are defined as the initial population.
(4)

The COO algorithm refines the initial solutions by updating the positions of the coatis using Eqs. 3–6.
(5)

The fitness of each refined solution is evaluated using a fitness function, such as the mean absolute error.
(6)

The stopping condition is checked. If it is satisfied, the optimization process ends. Otherwise, the process returns to Step 2.

While PSO can be combined with a local search strategy, this combination cannot solve complex problems. Traditional local search methods often depend heavily on the quality of the initial solutions and tend to become trapped in the vicinity of local optima. As a result, they are unable to compensate for PSO’s premature convergence behavior or its loss of swarm diversity in later iterations. Moreover, techniques such as chaotic maps, mutation operators, and simple perturbation-based searches, which are frequently used to enhance PSO, typically improve exploration only marginally and still lack the capability to perform systematic and adaptive exploitation in the later stages of optimization. Thus, our study utilizes PSO-COO to overcome these limitations. The new optimizer combines rapid global search of PSO in the early phase with the COO algorithm’s strong local refinement and escape mechanisms, enabling a balanced and adaptive search strategy. By allowing PSO to first generate a diverse set of candidate solutions and then employing COO to iteratively refine and improve these solutions, the hybrid PSO-COO framework minimizes the risk of premature convergence, enhances solution quality, and stabilizes the convergence trajectory.

GRU model

The GRU model is a robust deep learning technique that efficiently handles sequential and time-dependent data. The GRU model has several advantages. For example, it can capture both short-term and long-term dependencies, reduce risk of vanishing gradients.

The GRU model includes five components, which produce initial outputs as follows:

(1)

Previous hidden state: The previous hidden state contains prior outputs of the model, allowing it to update predictions based on historical information⁴⁴.
(2)

Reset gate: The reset gate determines which components of the previous hidden state and current input should be discarded or retained for later use in the model.
(3)

Update gate: This gate determines which components of the previous hidden state and the current input should be preserved or updated for generating the new hidden state⁴⁵.
(4)

Candidate hidden state: This components state stores the essential information for generating the new hidden state.
(5)

New hidden state: The new hidden state represents GRU outputs, which can serve as input to the subsequent components of a hybrid AI model.

Equations (7)–(10) present the mathematical formulation of these components and their interactions in the GRU⁴⁵.

$$r_{t} = \chi \left( {w_{r} in_{t} + u_{r} h_{t – 1} + b_{r} } \right)$$

(7)

$$z_{t} = \chi \left( {w_{z} in_{t} + u_{z} h_{t – 1} + b_{z} } \right)$$

(8)

$$\overline{h}_{t} = \tanh \left( {w_{h} x_{t} + u_{h} \left( {r_{t} \odot h_{t – 1} } \right) + b_{h} } \right)$$

(9)

$$h_{t} = \left( {1 – z_{t} } \right) \odot h_{t – 1} + z_{t} \odot \overline{h}_{t}$$

(10)

where $b_{r}$ and $b_{z}$: Bias values, $in_{t}$: The current input $r_{t}$: The output of the reset gate, $z_{t}$: The output of the update gate, $w_{r}$ and $u_{r}$: Weight coefficients of the reset gate, $w_{z}$ and $u_{z}$: Weight coefficients of the update gate, $\overline{h}_{t}$: The candidate hidden state, $h_{t}$: The new hidden state, and $\odot$: Element-wise (Hadamard) multiplication. However, it is essential to note that GRU parameters require careful optimization to achieve high prediction accuracy. Therefore, the PSO-COO hybrid algorithm is employed in this study to systematically optimize the GRU parameters.

ANFIS model

The ANFIS model is a hybrid intelligent system that serves as the final component in the MGCOO-GRU-ANFIS framework. It combines the learning capabilities of neural networks with the reasoning power of fuzzy logic to model complex, nonlinear relationships³². ANFIS has been widely applied in hydrology, environmental modeling, and water resource management due to its ability to handle uncertainty and imprecise data⁴⁶. Its advantages include high prediction accuracy, adaptability to changing system dynamics, and the capacity to generate interpretable “if–then” rules that provide insights into the relationships among input variables.

The ANFIS model includes five layers:

(1)

Fuzzification Layer: Each node in this layer represents a fuzzy membership function (FMF) that converts the input values into fuzzy sets³³. Equations (11) and (12) describe the mathematical formulation of this layer and the associated membership function³⁶.

$$\psi_{1}^{\left( i \right)} = \xi_{{A_{i} }} \left( x \right)$$

(11)

$$\xi_{{A_{i} }} \left( x \right) = \exp \left[ { – \frac{{\left( {x – m_{i} } \right)^{2} }}{{2\phi_{i}^{2} }}} \right]$$

(12)

where $\psi_{1}^{\left( i \right)}$: The output of the ith node in layer 1, $x$: The input variable, $\xi_{{A_{i} }} \left( x \right)$: The membership function, $m_{i}$ and $\phi_{i}$: Parameters of the membership function.

(2)

Rule layer: In this layer, each node corresponds to a fuzzy rule. This rule can be expressed as a conditional statement, such as: “If temperature is high and rainfall is low, then GWL is low⁴⁷.

Equation 13 presents the mathematical formulation of the rule layer. Based on this equation, the output of the rule layer is the firing strength, which represents how each fuzzy rule is activated based on the current inputs⁴⁷.

$$\lambda_{i} = \xi_{{A_{i} }} \left( x \right).\xi_{{B_{i} }} \left( y \right)$$

(13)

where $\lambda_{i}$: The firing strength, and $\xi_{{A_{i} }} \left( x \right)$ and $\xi_{{B_{i} }} \left( y \right)$: The membership function values of the inputs $x$ and $y$.

(3)

Normalization layer: In this layer, the firing strengths are normalized to ensure that their sum equals 1³⁶. Equation (14) mathematically represents this process.

$$\overline{\lambda }_{i} = \frac{{\lambda_{i} }}{{\sum\limits_{i = 1}^{R} {\lambda_{i} } }}$$

(14)

where $\overline{\lambda }_{i}$: The normalized firing strength of the ith rule, $\lambda_{i}$: The firing strength obtained from Layer 2, and R: The total number of fuzzy rules.

(4)

Defuzzification Layer: Each node in this layer represents the output of a fuzzy rule. This output is obtained by multiplying the normalized firing strength of each rule (from Layer 3) with a first-order polynomial function of the input variables. Equation 15 shows the mathematical formulation of this computation.

$$f_{i} = \overline{\lambda }_{i} = \left( {p_{i} x + q_{i} y + r} \right)$$

(15)

$f_{i}$: The output of the defuzzification Layer. $p_{i}$, $q_{i}$, and $r$ are are known as consequent parameters.

(5)

Output Layer: The final layer generates the overall system output, which is mathematically represented by Eq. (16).

$$F = \sum\limits_{i = 1}^{R} {\overline{\lambda }_{i} } \left( {f_{i} } \right)$$

(16)

The precision of the ANFIS model is influenced by consequent parameters, $m_{i}$, and $\phi_{i}$. Therefore, it is crucial to carefully adjust these parameters during training. Our study addresses this need by employing the PSO-COO algorithm. The PSO-COO algorithm fine-tunes these parameters by systematically exploring the parameter space.

Development of PCGA

In this paper, the PCGA model is used to forecast monthly groundwater levels. The model is constructed as follows:

(1)

The hyperparameters of the GRU and the consequent parameters of the ANFIS model are encoded as the initial population of coatis and particles in the PSO-COO algorithm. Equation (17) shows the mathematical representation of this encoding process.

$$X_{i} = \left[ {\theta_{GRU}^{i} ,\theta_{ANFIS}^{i} } \right]$$

(17)

where $X_{i}$: The position of the i-th coati or particle, $\theta_{GRU}^{i}$: The set of GRU parameters, and $\theta_{ANFIS}^{i}$: The set of ANFIS parameters.

(2)

The initial positions of the coatis and particles in the search space are randomly initialized to ensure diverse exploration of potential parameter combinations.
(3)

Key input variables are identified and selected through an advanced feature selection algorithm, which is described in Section “Evaluation metrics”.
(4)

The time series data of key input variables are input into the GRU model. Using its two gating mechanisms and other internal components, the model extracts hidden patterns from the data.
(5)

The extracted patterns are input into ANFIS to generate the final predictions.
(6)

The PSO-COO algorithm iteratively updates the positions of the particles and coatis using Eqs. 1–7. During each iteration, the fitness of each coati and particle is evaluated based on the prediction error of the hybrid model on a training dataset.
(7)

The stopping condition is checked. If it is met, the process proceeds to the next step; otherwise, it returns to the fourth step.
(8)

The coati or particle with the best fitness value is identified. Its position represents the optimal values of the ANFIS and GRU parameters.
(9)

The model is applied to forecast testing data.

Figure 1 shows the structure of the hybrid model.

During optimization, the PSO-COO algorithm evaluates each candidate solution (i.e., each set of GRU and ANFIS parameters) by computing a fitness value based on an error metric such as RMSE. The goal is to iteratively adjust the parameters so that the hybrid model produces the smallest possible error on the training dataset. In other words, the training process aims to find the optimal parameter set that yields the highest prediction accuracy for groundwater level forecasting.

An advanced feature selection algorithm

Many studies have reported that average temperature (AVT), minimum temperature (MIT), maximum temperature (MAT), rainfall (RAI), and relative humidity (RH) significantly influence GWL and can improve prediction accuracy. Moreover, when these factors are integrated into a hybrid AI model, the model can better adapt to abrupt climate anomalies such as droughts or extreme precipitation events, which are often difficult to capture with conventional statistical techniques.

Thus, our study selects these five meteorological parameters as input variables. Our study also uses lag times of 1–12 months to account for the cumulative effect of past weather conditions on current GWL. These lag times also enable the model to capture temporal dependencies, which are critical for accurately predicting GWL.

Thus, the total number of input variables is 60 (5 meteorological variables × 12 lag times), which can significantly complicate the modelling process. These 60 input variables also create a high-dimensional dataset that may pose challenges for model training. Therefore, our study addresses these issues by selecting key input variables.

The key input variables include meteorological parameters and their corresponding lagged values, which exert the strongest influence on groundwater levels (GWLs). Using these variables reduces computational complexity and improves training efficiency, as their number is smaller than that of the total input variables. Consequently, researchers have developed various techniques, such as feature selection algorithms (FSAs), to identify and select key input variables. These algorithms can effectively determine key input variables.

Feature selection algorithms (FSAs) consist of an optimization technique, such as the PSO-COO algorithm, and a trainer, such as the extreme gradient boosting (XGBoost) model, which operate in two steps. First, an optimizer generates binary strings that contain various inputs. Then, these strings are fed into the trainer to examine their prediction performance. The string that has the best performance contains key input variables.

Consequently, our study develops the PSO-COO-XGBoost algorithm to select key input variables for predicting groundwater levels. The algorithm is executed as follows:

(1)

The PSO-COO algorithm produces binary strings that represent subsets of the original 60 input variables.
(2)

To produce subsets of input variables, binary strings are decoded.
(3)

XGBoost is trained on each subset of input variables. The training process involves building an ensemble of decision trees. These trees then generate predictions of GWLs based on the learned relationships.
(4)

The prediction performance of each subset is evaluated using a fitness function, such as the mean absolute error (MAE).
(5)

The stop condition is checked. If it is met, the algorithm proceeds to step 7; otherwise, it moves to step 6.
(6)

The initial binary strings are updated using the operators of the PSO-COO algorithm, including Eqs. (1)–(7). The process then proceeds to the second level.
(7)

The subset of input variables with the best predictive performance is chosen. This subset represents the key variables used for accurately predicting GWL.

Comparative models

Our study employs the PCGA model to predict GWL. However, to comprehensively evaluate the model’s predictive performance, it should be compared with other models that have previously been used for GWL forecasting. These models include and multiple linear regression (MLR), artificial neural networks (ANN), and RNNs, models, which are explained in the next section. These models are chosen as benchmarks because they represent widely used approaches in groundwater level (GWL) predictions. Moreover, they cover a spectrum of modeling approaches, from traditional statistical techniques to advanced machine learning and deep learning frameworks. This diversity allows for a comprehensive assessment of the proposed PSO-COO–GRU–ANFIS model against both conventional and state-of-the-art methods. Specifically, ANN models capture nonlinear relationships between input variables and groundwater levels, RNNs effectively model temporal dependencies in sequential hydrological data, and MLR provides a baseline using linear associations.

ANN

ANN is a machine learning model that is inspired by biological neural systems. It can understand the complex relationships between input and output variables by learning data patterns^48,49. The ANN model comprises different layers, which are described as follow:

1.

Input layer: This layer receives external variables, or predictors, and forwards them to the hidden layer for further processing⁵⁰.
2.

Hidden layer: It computes weighted sums of the inputs and passes them through activation functions, which introduce non-linearity to the learning processes. By performing these operations, this layer produces intermediate outputs, which are then passed to the next layer⁴⁹.
3.

Output layer: It generates the final predictions using intermediate outputs.

However, the precision of the ANN model relies on different parameters, such as the number of neurons, the learning rate, and the number of hidden layers, and improper adjustment of these parameters can lead to overfitting, underfitting, or slow convergence.

Therefore, our study develops the PSO-COO ANN to model, which uses PSO-COO to fin-tune its parameters.

RNN

The RNN model is a deep learning technique that can learn dynamic behaviors across time steps. The model offers several advantages, such as its ability to capture temporal dependencies in sequential data. Moreover, it has an internal memory named the hidden state⁵¹.

The RNN model is executed in two steps. First, it updates the hidden state at each time step. Equation (18) defines this update⁵¹. Then, it uses the updated hidden state to produce the final output at each time step. Equation (19) shows the computation of the output.

$$h_{t} = \sigma \left( {\chi_{xh} IN_{t} + \chi_{hh} h_{t – 1} + b_{h} } \right)$$

(18)

$$y_{t} = \chi_{hy} h_{t} + b_{y}$$

(19)

where $h_{t}$: The hidden state, $\sigma$: Activation function $\chi_{hh}$, $\chi_{hy}$: Weight coefficients, $y_{t}$: The final output, $b_{y}$ and $b_{h}$: Bias values, and $IN_{t}$: Input data.

The RNN model has different parameters, such as the number of hidden layers and neurons, learning rate, and batch size, which can significantly affect its performance. Therefore, it is essential to accurately adjust these parameters.

Our study addresses this issue by combining the RNN model with the PSO-COO algorithm. By efficiently exploring the hyperparameter space, the PSO-COO algorithm determines the optimal values of RNN parameters. Moreover, the resulting hybrid model, called the PSO-COO-RNN model, achieves higher prediction accuracy, faster convergence, and improved stability compared to the conventional RNN model.

MLR model

MLR is a basic predictive tool that produces outputs using a linear equation. In the context of hydrology, MLR can be applied to predict GWL based on predictors such as rainfall, temperature, and relative humidity⁵². The general form of the MLR model (Eq. 20) is expressed as:

$$F = \varphi_{0} + \varphi_{1} X_{1} + \varphi_{2} X_{2} + .. + \varphi_{n} X_{n} + \psi$$

(20)

where $X_{1}$, $X_{2}$, and $X_{n}$: Input variables, $\varphi_{0}$: The intercept $F$: The MLR output, $\varphi_{1}$, $\varphi_{2}$, $\varphi_{n}$: Regression coefficients, and $\psi$: The error term. The regression coefficients determine the strength and direction of the relationship between each predictor and the output variable. They also affect the precision of the MLR model. Therefore, it is crucial to adjust these parameters effectively.

Therefore, our study develops the PSO-COO MLR to model, which uses PSO-COO to fin-tune its parameters.

Evaluation metrics

To rigorously assess the robustness and reliability of the proposed PCGA model and the comparative models, our study employs a comprehensive set of evaluation metrics. These metrics evaluate both the accuracy and consistency of the predictions and provide a detailed understanding of model performance under varying conditions. These metrics also allow for a systematic comparison between the proposed PCGA model and benchmark model. These metrics are explained as follows:

(1)

Mean absolute error (MAE): MAE measures the average magnitude of errors between predicted values and observed (true) values, without considering their direction (i.e., whether the prediction is over or under the true value)

$$MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {GWL_{iob} – GWL_{ipr} } \right|}$$

(21)
(2)

Nash–Sutcliffe efficiency: NSE is a normalized statistical metric that has been widely utilized in hydrology to assess the precision of forecasts. The NSE value of 1 shows a perfect match between observed and predicted outputs. NSE is particularly useful because it emphasizes both bias and variance in the model predictions (Eq. 22).

$$NSE = 1 – \frac{{\sum\limits_{i = 1}^{n} {\left( {GWL_{iob} – GWL_{ipr} } \right)} }}{{\sum\limits_{i = 1}^{n} {\left( {GWL_{iob} – GW\overline{L}_{iob} } \right)} }}$$

(22)
(3)

Explained variance score (EVS): EVS quantifies the proportion of variance in the observed data that is captured by the predictive model. Its value ranges from 0 to 1, where a value of 1 indicates that the best prediction accuracy (Eq. 23).

$$EVS = 1 – \frac{{Var\left( {GWL_{ob} – GWL_{pr} } \right)}}{{Var\left( {GWL_{ob} } \right)}}$$

(23)
(4)

Uncertainty at 95% (U95): U95 reflects the uncertainty distribution in the predictions. A narrow U95 value suggests high confidence in predictions, while a wide interval may indicate model instability (Eq. 24).

$$U_{95} = 1.96\left( {SD^{2} – RMSE^{2} } \right)^{0.50}$$

(24)
(5)

Kling–Gupta Efficiency (KGE): It is an index that assesses predictive accuracy by considering correlation, bias, and variability simultaneously (Eq. 25). Its value ranges from − ∞ to 1, where 1 indicates perfect agreement between observed and predicted values.

$$KGE = 1 – \sqrt {\left( {r – 1} \right) + \left( {\frac{{\left( {GWL_{pr} } \right)}}{{\left( {GWL_{ob} } \right)}} – 1} \right) + \left( {\frac{{CV_{pr} }}{{CV_{ob} }}} \right)}$$

(25)
(6)

Symmetric mean absolute percentage error (SMAPE): This index provides a symmetric assessment of prediction errors (Eq. 26). SMAPE is particularly suitable for groundwater-level prediction because it balances overestimations and underestimations and provides a more robust measure of relative error in datasets where values can vary widely. Lower SMPAE show better predictive performance.

$$SMAPE = \frac{1}{N}\sum\limits_{i = 1}^{N} {\frac{{\left| {GWL_{pr} – GWL_{ob} } \right|}}{{\frac{{\left( {\left| {GWL_{pr} } \right| + \left| {GWL_{ob} } \right|} \right)}}{2}}}}$$

(26)

where $GWL_{pr}$: Predicted GWL, $GWL_{ob}$: Observed GWL, $r$: Correlation coefficient, $CV_{pr}$: The coefficient variation of the predicted GWL, $CV_{ob}$: The coefficient variation of the observed GWL, $SD$: Standard deviation, $Var$: Variance, $GWL_{pr}$: Predicted GWL, and $GWL_{ob}$: Observed GWL.

Source link