Estimating seepage in heterogeneous earthfill dams on permeable foundations using explainable machine learning

Machine Learning


Correlation exploration of dataset

Figure 6 illustrates the correlation structure among the studied input variables and the seepage discharge output using Pearson (Fig. 6a) and Spearman (Fig. 6b) correlation coefficients. The Pearson correlation heatmap reveals that the upstream water head (\(h\)) has a strong positive linear correlation with seepage discharge (\(r=0.807\)), indicating that seepage discharge tends to increase approximately linearly with reservoir head, consistent with classical seepage theory. In contrast, the remaining geometric parameters (B, D, Hcotα and Hcotθ) display negligible linear correlations with seepage discharge (\(\lvert r\rvert<0.10\)). \(K'\) shows a weak negative correlation (\(r=-0.193\)), while \(K_{f}\) exhibits a very weak positive correlation (\(r=0.072\)). The near-zero off-diagonal values among the input variables indicate a low degree of multicollinearity, suggesting that the selected predictors are largely uncorrelated and suitable for regression-based modeling.

The Spearman rank correlation heatmap (Fig. 6b) supports the findings of the Pearson analysis. \(h\) again shows a strong monotonic relationship with seepage discharge (\(\rho=0.908\)), reinforcing its dominant influence on seepage behavior. All other predictors, including the slope parameters and hydraulic conductivity descriptors, exhibit weak or negligible monotonic correlations with seepage discharge (\(\lvert\rho\rvert<0.15\)). Variables with strong Pearson correlations therefore also have comparable Spearman coefficients, indicating that the dominant relationships are monotonic and largely linear; the somewhat higher Spearman value for \(h\) (0.908 versus 0.807) may reflect a modest nonlinear component or a skewed distribution of seepage discharge. This agreement suggests that the observed dependencies are not driven by a few outliers, supporting the statistical robustness of the dataset. Moreover, the low inter-variable correlations indicate minimal redundancy among predictors, justifying linear regression-based models as a baseline while leaving room for more advanced ML approaches to capture nonlinear and higher-order interactions.
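As a minimal sketch of how the two coefficients differ, the snippet below computes Pearson's r and Spearman's ρ (Pearson's r on average ranks) with NumPy only. The data are synthetic and purely illustrative, not the study's dataset, and the helper names are hypothetical. With pandas, `DataFrame.corr(method="pearson")` and `DataFrame.corr(method="spearman")` return the full matrices that heatmaps such as Fig. 6 display.

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):          # average the ranks within each tie group
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def pearson(x, y):
    """Pearson's linear correlation coefficient r."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))

# Synthetic example: y increases monotonically but non-linearly with x.
x = np.linspace(1.0, 10.0, 50)
y = x ** 3
print(f"Pearson r = {pearson(x, y):.3f}, Spearman rho = {spearman(x, y):.3f}")
```

Because this synthetic relation is strictly monotonic, ρ equals 1 while r stays below 1; a gap between ρ and r of the kind reported for h is one typical signature of a monotonic but not purely linear dependence.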

Fig. 6

Heatmaps of the correlations among the studied variables based on (a) Pearson and (b) Spearman correlation coefficients.

Baseline linear regression model

A multiple linear regression (MLR) model was developed using the seven governing physical and geometric variables to establish a baseline parametric relationship for seepage discharge prediction58. The model achieved a coefficient of determination of \(R^{2}=0.704\), indicating that 70.4% of the variability in seepage discharge is explained by the selected predictors. The adjusted \(R^{2}\) equals the unadjusted value, indicating that the seven predictors do not artificially inflate the explained variance and that the linear model is not overparameterized. The overall significance of the model was assessed with the F-test. The F-statistic tests the joint null hypothesis that all slope coefficients are zero, and the associated p-value is the probability of obtaining an F-statistic at least as large as the observed one if that null hypothesis were true. The MLR model is highly significant (\(F=1484\), \(p<0.001\)), so the hypothesis of no linear relationship between the predictors and seepage discharge is firmly rejected. The resulting regression equation (Eq. 5) is:

$$q=-24.4-0.348\,B+0.0416\,D+2.02\,h-0.0016\,(H\cot\alpha)-0.0216\,(H\cot\theta)-173.1\,K'+0.892\,K_{f} \tag{5}$$

Table 3 summarizes the standard errors, t-statistics, p-values, 95% confidence intervals (CI), and variance inflation factors (VIFs) for all predictors. The upstream water head emerges as the most influential predictor, with a large positive coefficient (2.020) and a very high t-value (97.99, \(p<0.001\)), fully consistent with seepage theory and the correlation analysis presented earlier. \(D\) and \(K_{f}\) also show statistically significant positive effects (\(p<0.001\)), indicating that deeper and more permeable foundations increase seepage rates. Conversely, \(K'\) has a strong negative coefficient (−173.09, \(p<0.001\)), reflecting the effectiveness of a low-permeability core in reducing seepage discharge.
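For quick checks, Eq. (5) can be coded directly. The helpers below (`mlr_seepage`, `term_contributions`) and the input keys are hypothetical names chosen for illustration; inputs must be in the same units as the data used to fit the model, which this section does not restate. Returning the individual terms makes it easy to see that a large coefficient, such as that of \(K'\), does not by itself imply a large term, because each term also scales with the magnitude of its input.

```python
# Coefficients of Eq. (5) as reported for the baseline MLR model.
MLR_INTERCEPT = -24.4
MLR_COEFFS = {
    "B": -0.348,           # crest width
    "D": 0.0416,
    "h": 2.02,             # upstream water head
    "Hcot_alpha": -0.0016,
    "Hcot_theta": -0.0216,
    "K_prime": -173.1,
    "K_f": 0.892,          # foundation permeability
}

def term_contributions(inputs):
    """Return each term coef * x of Eq. (5) for a dict of input values."""
    missing = sorted(set(MLR_COEFFS) - set(inputs))
    if missing:
        raise KeyError(f"missing inputs: {missing}")
    return {name: coef * inputs[name] for name, coef in MLR_COEFFS.items()}

def mlr_seepage(inputs):
    """Evaluate Eq. (5): q = intercept + sum of the coefficient terms."""
    return MLR_INTERCEPT + sum(term_contributions(inputs).values())
```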

Among the geometric descriptors, the crest width has a statistically significant negative effect on seepage, suggesting that wider crests lengthen the seepage path and increase hydraulic resistance. In contrast, \(H\cot\alpha\) and \(H\cot\theta\) are statistically insignificant (\(p>0.05\)), indicating a limited linear contribution to seepage discharge within the investigated parameter ranges. All predictors have VIF values close to unity, indicating negligible multicollinearity and supporting the numerical stability and interpretability of the regression coefficients.
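The statistics summarized in Table 3 can be reproduced with a few lines of linear algebra. The sketch below assumes ordinary least squares with an intercept and uses synthetic data, not the study's dataset; `ols_summary` is a hypothetical helper. The 95% intervals use the normal approximation (±1.96 SE), which is adequate for large samples; exact p-values and t-based intervals would come from the F and t distributions (for example `scipy.stats.f.sf` and `scipy.stats.t.sf`) or from a full regression package such as statsmodels.

```python
import numpy as np

def ols_summary(X, y):
    """OLS fit with the quantities reported in Table 3 (p-values excluded).

    X: (n, k) predictors without an intercept column (k >= 2 for the VIF step);
    y: (n,) response.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])              # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    dof = n - k - 1                                   # residual degrees of freedom
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof
    f_stat = (r2 / k) / ((1.0 - r2) / dof)            # joint test of all slopes
    cov = (sse / dof) * np.linalg.inv(A.T @ A)        # covariance of beta
    se = np.sqrt(np.diag(cov))
    # VIF_j = 1 / (1 - R_j^2), i.e. the diagonal of the inverse correlation
    # matrix of the predictors.
    vif = np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))
    return {
        "beta": beta, "se": se, "t": beta / se,
        "ci95_approx": np.column_stack([beta - 1.96 * se, beta + 1.96 * se]),
        "r2": r2, "adj_r2": adj_r2, "F": f_stat, "vif": vif,
    }

# Synthetic check: three independent predictors, the third has no effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=2000)
res = ols_summary(X, y)
print(np.round(res["beta"], 3), round(res["r2"], 4), np.round(res["vif"], 3))
```

On these data the third slope is close to zero with a small |t|, analogous to the insignificant slope terms in Table 3, and all VIFs are near 1 because the predictors were generated independently.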

Table 3 Statistical significance of the MLR model.

Bayesian optimization

Train RMSE

Figure 7 illustrates the evolution of the training RMSE over 50 BO iterations for the five investigated ML models, showing how the search explores each model's hyperparameter space and converges toward optimal configurations. The DT model exhibits pronounced instability, with large RMSE spikes during early iterations, indicating strong sensitivity to hyperparameter selection; without ensemble averaging, the fit of a single tree changes sharply with its depth and leaf constraints. The RF model is more stable than the DT model, with generally lower RMSE values and smaller fluctuations, although occasional sharp increases persist, likely reflecting sensitivity to tree depth and ensemble size. The SGB model converges more smoothly than the DT and RF models, maintaining relatively consistent RMSE levels across most iterations; however, intermittent RMSE peaks occur during exploration phases, indicating sensitivity to the learning rate and tree depth.

In contrast, the LGB model exhibits the most stable and consistently low training RMSE throughout the optimization process, with a smooth convergence trajectory and minimal extreme fluctuations. The CGB model achieves very low minimum RMSE values, with several iterations approaching near-zero error; nevertheless, its larger oscillations compared with the LGB model suggest greater sensitivity to hyperparameter choices and a potential risk of overfitting if not properly controlled. Overall, the boosting-based ensembles (SGB, LGB, and CGB) outperform the single-tree (DT) and bagging (RF) models in training, with LGB showing the most stable convergence and CGB reaching the lowest training RMSE.
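Spikes and plateaus in traces like those of Fig. 7 arise because BO alternates between exploiting regions with low predicted error and exploring uncertain ones. The study does not state which surrogate or acquisition function was used; the sketch below assumes a common choice, a Gaussian-process surrogate with expected improvement (EI), searching one normalized hyperparameter on a grid. `toy_cv_rmse` is a hypothetical stand-in for training and cross-validating a model, and all function names are illustrative.

```python
import math
import numpy as np

def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit prior variance."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_predict(x_obs, y_obs, x_new, jitter=1e-6):
    """Posterior mean and std of a zero-mean GP fitted to standardized y."""
    K = rbf(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    Ks = rbf(x_obs, x_new)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - (v ** 2).sum(axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for minimization: expected amount by which a point beats `best`."""
    imp = best - mu - xi
    z = imp / sigma
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sigma * pdf

def bayes_opt(objective, n_init=3, n_iter=20, seed=0):
    """Minimize `objective` over a grid on [0, 1]; returns best x, best y, trace."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)
    idx = [int(i) for i in rng.choice(len(grid), n_init, replace=False)]
    ys = [objective(grid[i]) for i in idx]
    for _ in range(n_iter):
        y = np.array(ys)
        y_std = (y - y.mean()) / (y.std() + 1e-12)    # standardize targets
        mu, sd = gp_predict(grid[idx], y_std, grid)
        ei = expected_improvement(mu, sd, y_std.min())
        ei[idx] = -np.inf                             # never re-evaluate a point
        nxt = int(np.argmax(ei))
        idx.append(nxt)
        ys.append(objective(grid[nxt]))
    best = int(np.argmin(ys))
    return float(grid[idx[best]]), ys[best], ys

def toy_cv_rmse(x):
    """Hypothetical smooth error surface with its minimum at x = 0.62."""
    return (x - 0.62) ** 2 + 0.05

best_x, best_y, trace = bayes_opt(toy_cv_rmse)
print(f"best x = {best_x:.3f}, best RMSE = {best_y:.4f}, evaluations = {len(trace)}")
```

Plotting `trace` against the iteration number gives a curve of the same kind as Fig. 7: exploratory evaluations can still produce spikes after a good configuration has been found.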

Fig. 7

Evolution of the training RMSE over BO iterations.

CV RMSE

Figure 8 presents the evolution of the cross-validated RMSE over 50 BO iterations for the five investigated models, providing insight into their generalization performance under varying hyperparameter configurations. The DT model again exhibits substantial variability, with pronounced RMSE spikes during early iterations, indicating strong sensitivity to hyperparameter selection and limited robustness when evaluated on unseen folds. Although CV RMSE gradually stabilizes in later iterations, the high fluctuation magnitude suggests poor generalization consistency. The RF model shows improved stability relative to the DT model, with lower average CV RMSE values; however, intermittent sharp increases remain, reflecting sensitivity to ensemble configuration and residual variance across folds.

The SGB model shows a comparatively smoother CV RMSE trajectory, maintaining low error levels across most iterations, although isolated peaks appear during exploratory steps. In contrast, the LGB model exhibits the most stable CV RMSE behavior, with minimal fluctuations and consistently low error values, indicating strong generalization capability and effective regularization during hyperparameter tuning. The CGB model achieves CV RMSE values comparable to those of the LGB model and occasionally lower minima; however, its larger oscillations across iterations suggest greater sensitivity to hyperparameter choices. Overall, the CV RMSE analysis confirms that the boosting-based ensembles outperform the DT and RF models, with the LGB and CGB models providing the most robust balance between predictive accuracy and stability, which limits the risk of overfitting.

Fig. 8

Evolution of the CV RMSE over BO iterations.

Overfitting gap

Figure 9 presents the evolution of the overfitting gap, defined as the difference between CV RMSE and training RMSE (CV RMSE − Train RMSE), across 50 BO iterations for the five investigated models. This metric provides a direct measure of each model’s tendency to overfit during hyperparameter tuning. The DT model exhibits pronounced fluctuations and several large positive gaps, particularly during early and mid-optimization stages, indicating limited generalization and strong sensitivity to hyperparameter configurations. Similarly, the RF model shows intermittent large overfitting gaps, reflecting instability in generalization performance despite ensemble averaging.

The SGB model demonstrates a comparatively smaller and more controlled overfitting gap, with most iterations remaining within a narrow RMSE range, suggesting improved generalization behavior. The LGB model exhibits the most consistently low overfitting gap, with minimal dispersion across iterations, indicating an effective balance between model complexity and regularization. In contrast, the CGB model, while achieving low training errors, shows persistently larger overfitting gaps, implying stronger sensitivity to hyperparameter choices and an increased risk of overfitting.
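The gap plotted in Fig. 9 is a simple per-iteration difference. The sketch below assumes the BO run logged one training RMSE and one CV RMSE per iteration (the arrays shown are made up for illustration, and the helper names are hypothetical) and also reports the gap at the iteration with the lowest CV RMSE, since that is the configuration a CV-driven search would keep; a low training RMSE alone, as noted for CGB, says little about generalization.

```python
import numpy as np

def overfitting_gap(train_rmse, cv_rmse):
    """Per-iteration gap as defined in the text: CV RMSE - training RMSE."""
    return np.asarray(cv_rmse, dtype=float) - np.asarray(train_rmse, dtype=float)

def summarize_gap(train_rmse, cv_rmse):
    """Mean and maximum gap, plus the gap at the best-CV iteration."""
    gap = overfitting_gap(train_rmse, cv_rmse)
    best = int(np.argmin(cv_rmse))
    return {"mean_gap": float(gap.mean()), "max_gap": float(gap.max()),
            "best_iter": best, "gap_at_best_cv": float(gap[best])}

# Made-up logs for three BO iterations (not values from the study).
train = [0.10, 0.20, 0.05]
cv = [0.30, 0.25, 0.20]
print(summarize_gap(train, cv))
```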

Fig. 9

Evolution of the overfitting gap over BO iterations.

Hyperparameter tuning

Figure 10 summarizes the hyperparameter tuning outcomes for all investigated models, showing the search ranges, initial reference values, and final hyperparameters selected through BO. For the DT model, the optimization favored a substantially deeper tree, with the maximum depth increasing from 8 to 29 and the maximum number of leaf nodes from 250 to 491. At the same time, the minimum samples per leaf increased (3 → 8) while the minimum samples per split decreased (10 → 2), indicating a trade-off between model expressiveness and regularization. The final max_features value converged close to unity (≈ 0.99), allowing the tree to consider nearly all predictors at each split. For the RF model, BO substantially increased the ensemble size, selecting 856 trees compared with an initial value of 250, and increased the maximum tree depth from 18 to 40. min_samples_split increased moderately (10 → 16), whereas min_samples_leaf decreased (3 → 1); diversity among trees was enhanced by reducing max_features from 0.8 to approximately 0.60 and max_samples to approximately 0.62. The cost-complexity pruning parameter (ccp_alpha) converged toward zero, indicating that explicit pruning was less critical once ensemble averaging was established.

The SGB model converged toward a configuration with a large number of estimators (≈ 2000) combined with a low learning rate (≈ 0.01), reflecting the classical boosting trade-off between incremental learning and model stability. The optimized tree depth increased from 3 to 8, while the subsampling fraction was reduced from 0.8 to approximately 0.70, introducing stochasticity to mitigate overfitting. Moderate values of min_samples_split (10) and min_samples_leaf (3) were retained, and max_features converged to approximately 0.58, indicating partial feature use at each split. For the LGB model, BO selected a high-capacity yet regularized configuration. The number of estimators increased markedly from 500 to 3000, while the learning rate decreased from 0.05 to 0.005, ensuring stable convergence. Model capacity increased through a larger num_leaves (31 → 236), while complexity was constrained by replacing the unlimited depth setting (max_depth = −1 in LightGBM) with max_depth = 24 and by a strong increase in min_child_samples (20 → 83). Both subsample and colsample_bytree converged to values of approximately 0.53–0.80, promoting randomness and robustness. The regularization terms (reg_alpha and reg_lambda) remained near zero, suggesting that structural constraints dominated over explicit penalization. Finally, the CGB model converged toward 803 boosting iterations, a relatively high learning rate (≈ 0.30), and a tree depth of 6, indicating a preference for moderately deep trees with faster learning. Strong regularization was achieved by increasing the L2 leaf regularization (3 → 60), together with random_strength = 1 and rsm ≈ 0.9, which are intended to improve robustness against noise and limit dominance by individual features.
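The parameter names in Fig. 10 match those of scikit-learn (`DecisionTreeRegressor`, `RandomForestRegressor`, `GradientBoostingRegressor`), LightGBM (`LGBMRegressor`) and CatBoost (`CatBoostRegressor`), although the text does not name the libraries. Under that assumption, the final configurations can be collected as keyword dictionaries, as sketched below. Values reported only approximately are marked, and settings reported only as a range (LGB `subsample`, `colsample_bytree`) or as "near zero" (LGB `reg_alpha`, `reg_lambda`; RF `ccp_alpha`) are omitted rather than guessed.

```python
# Final BO hyperparameters as reported in the text ("~" = given approximately).
DT_PARAMS = {
    "max_depth": 29, "max_leaf_nodes": 491, "min_samples_leaf": 8,
    "min_samples_split": 2, "max_features": 0.99,                 # max_features ~
}
RF_PARAMS = {
    "n_estimators": 856, "max_depth": 40, "min_samples_split": 16,
    "min_samples_leaf": 1, "max_features": 0.60, "max_samples": 0.62,  # both ~
}
SGB_PARAMS = {
    "n_estimators": 2000, "learning_rate": 0.01,                  # both ~
    "max_depth": 8, "subsample": 0.70, "min_samples_split": 10,   # subsample ~
    "min_samples_leaf": 3, "max_features": 0.58,                  # max_features ~
}
LGB_PARAMS = {
    "n_estimators": 3000, "learning_rate": 0.005, "num_leaves": 236,
    "max_depth": 24, "min_child_samples": 83,
}
CGB_PARAMS = {
    "iterations": 803, "learning_rate": 0.30, "depth": 6,         # learning_rate ~
    "l2_leaf_reg": 60, "random_strength": 1, "rsm": 0.9,          # rsm ~
}

# Typical use, assuming the libraries are installed:
#   from sklearn.tree import DecisionTreeRegressor
#   from catboost import CatBoostRegressor
#   dt = DecisionTreeRegressor(**DT_PARAMS)
#   cgb = CatBoostRegressor(**CGB_PARAMS, verbose=0)
```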

Fig. 10

Range, initial, and final hyperparameters for developed ML models.

Cross-validation analysis

Figure 11 compares the RMSE values of the adopted models under five-fold and ten-fold CV schemes. Under five-fold CV (Fig. 11a), the SGB and CGB models consistently achieve the lowest RMSE values across all folds, demonstrating superior predictive accuracy relative to the other approaches. Of these two, the CGB model generally performs better, attaining the minimum RMSE in most folds. The LGB model follows closely, with competitive RMSE values but slightly higher dispersion than SGB and CGB. In contrast, the RF model shows moderate performance, while the DT model records the highest RMSE values, indicating limited generalization ability. The fold-wise RMSE ranges support this ranking: 0.129–0.364 m³/d/m for SGB and 0.128–0.338 m³/d/m for CGB, compared with consistently higher values for RF (0.204–0.407 m³/d/m) and DT (0.259–0.440 m³/d/m). Because the widths of these ranges are similar across models (about 0.18–0.24 m³/d/m), the advantage of the boosting-based models under five-fold validation lies mainly in their lower error level rather than in a reduced fold-to-fold spread.

Under ten-fold CV (Fig. 11b), the fold-dependent behavior resembles that observed in the five-fold case, with clearer distinctions in model stability. The CGB model generally exhibits the most consistent performance across the majority of folds, achieving low RMSE values throughout the validation process; however, its performance degrades markedly in Fold 5, where the RMSE rises sharply to 0.547 m³/d/m, indicating sensitivity to that specific data partition. The SGB model follows, maintaining relatively stable and low RMSE values across folds (0.080–0.408 m³/d/m) with fewer extreme deviations, demonstrating strong overall generalization. The LGB model ranks third, with competitive and consistent performance (0.087–0.419 m³/d/m) broadly comparable to that of the RF model (0.162–0.458 m³/d/m); both show moderate fold-to-fold variability. Finally, the DT model generally records the highest RMSE values (0.181–0.465 m³/d/m), confirming its comparatively weak generalization capability.
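Fold-wise RMSE values like those in Fig. 11 come from a loop of the following form. This is a sketch, not the study's pipeline: the shuffling, the random seed, the synthetic data, and the ordinary-least-squares stand-in model are assumptions, and the hypothetical `fit`/`predict` arguments can wrap any regressor. Reporting the minimum, maximum, and standard deviation of the returned array separates the overall error level from the fold-to-fold spread.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and split into k folds of near-equal size."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def cv_rmse_per_fold(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 folds, score RMSE on the held-out fold; one value per fold."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    scores = []
    for test_idx in kfold_indices(len(y), k, seed):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        model = fit(X[train], y[train])
        err = predict(model, X[test_idx]) - y[test_idx]
        scores.append(float(np.sqrt(np.mean(err ** 2))))
    return np.array(scores)

# Stand-in model for illustration: least squares with an intercept.
def ols_fit(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def ols_predict(beta, X):
    return beta[0] + X @ beta[1:]

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 2))
y = 3.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)
for k in (5, 10):
    s = cv_rmse_per_fold(X, y, ols_fit, ols_predict, k=k)
    print(k, s.min().round(3), s.max().round(3), s.std().round(3))
```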

Fig. 11

Performance of the ML models across (a) 5-fold and (b) 10-fold CV folds based on RMSE.

Assessment of ML models

Goodness of fit

Figure 12 presents scatter plots of the predicted versus actual seepage discharge (q) for the adopted models. Each subplot shows both the training (blue markers) and validation (green markers) data. The dashed black line represents the line of equality (perfect prediction), while the grey dashed lines indicate ± 10% deviation, providing a visual reference for prediction accuracy. For the DT model, approximately 80–85% of the training points and 85–90% of the validation points lie within the ± 10% limits. The remaining points, particularly at low and intermediate seepage discharges, deviate noticeably, consistent with the comparatively high RMSE values obtained for DT (training RMSE ≈ 0.274 m³/d/m; validation RMSE ≈ 0.223 m³/d/m). The RF model behaves similarly to the DT model, with several deviations still visible at low and intermediate discharge levels. Its RMSE values (training RMSE ≈ 0.25 m³/d/m; validation RMSE ≈ 0.212 m³/d/m) reflect a modest improvement over the DT model but still a moderate goodness of fit.

In contrast, the SGB model shows a marked improvement in predictive accuracy, with approximately 92–95% of both training and validation points falling within the ± 10% bounds. The tight clustering around the line of equality across the entire discharge range is supported by lower RMSE values (training RMSE ≈ 0.20 m³/d/m; validation RMSE ≈ 0.11 m³/d/m), indicating strong agreement between predicted and observed seepage discharges and no degradation from training to validation. The LGB model performs slightly worse than the SGB model, with about 91–93% of training and validation points lying within the ± 10% limits; its dispersion is largely restricted to intermediate discharge values. This consistency is reflected in its RMSE values for both stages (training RMSE ≈ 0.23 m³/d/m; validation RMSE ≈ 0.14 m³/d/m). The CGB model achieves the highest concentration of predictions within the ± 10% bounds, with more than 98% of training and validation points lying within them and closely aligned with the equality line. The near-perfect agreement between predicted and observed values is corroborated by exceptionally low RMSE values (training RMSE ≈ 0.03 m³/d/m; validation RMSE ≈ 0.07 m³/d/m). The small increase in RMSE from training to validation indicates a limited generalization gap with no meaningful loss of accuracy, confirming an excellent goodness of fit.
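The percentages of points inside the ± 10% lines can be computed as below. The sketch assumes the band is defined relative to the observed value, i.e. the lines y = 1.1x and y = 0.9x in a predicted-versus-actual plot; `fraction_within_band` is a hypothetical helper and the numbers are made up. Because this band narrows in absolute terms as q decreases, small absolute errors at low discharges can fall outside it, which may partly explain the deviations noted at low discharge.

```python
import numpy as np

def fraction_within_band(y_true, y_pred, tol=0.10):
    """Share of points with |predicted - observed| <= tol * |observed|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    inside = np.abs(y_pred - y_true) <= tol * np.abs(y_true)
    return float(inside.mean())

observed = [1.0, 2.0, 10.0, 4.0]
predicted = [1.05, 2.5, 9.5, 4.1]   # relative errors of 5%, 25%, 5% and 2.5%
print(fraction_within_band(observed, predicted))
```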

Fig. 12

Scatter plots of predicted versus actual seepage discharge during the training and validation stages.

Error analysis

Figure 13 compares the predictive behavior of the investigated ML models using regression error characteristic (REC) curves, which show the cumulative proportion of predictions whose absolute residual falls within a given tolerance during the training and validation stages. In both stages, a curve that rises more steeply and lies above the others at low residual thresholds indicates a larger proportion of accurate predictions within tight error bounds. In the training stage (Fig. 13a), the CGB model exhibits the steepest initial rise in the low-error region, indicating that a larger fraction of its predictions falls within small residual thresholds than for the other models. The LGB and SGB models follow closely, with a rapid but slightly less pronounced ascent. The RF model shows a more gradual increase in cumulative probability at small residual values, while the DT model rises most slowly, indicating a wider spread of residual errors during training. At larger residual thresholds, the models other than CGB converge toward similar cumulative levels, suggesting that extreme training errors are limited across approaches.

In the validation stage (Fig. 13b), the relative behavior of the models remains broadly consistent, although the differences are now more informative about generalization. The CGB curve continues to rise rapidly at small residual errors, maintaining a high cumulative proportion of low-error predictions, followed closely by LGB and SGB, which show comparable but slightly slower accumulation. The RF and DT models again show moderate-to-weak performance, with a noticeable delay in reaching high cumulative coverage compared with the boosting-based models. As in the training stage, all curves eventually converge at higher residual values, confirming that very large prediction errors are infrequent for all models. Overall, the REC comparison indicates that the boosting-based models (CGB, LGB, and SGB) consistently achieve a higher proportion of low-residual predictions in both the training and validation stages.
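An REC curve is the empirical cumulative distribution of absolute residuals. The sketch below (hypothetical helper names, NumPy only, made-up residuals) computes it on a tolerance grid together with the area over the curve; for a grid running from zero to the largest error, this area approximates the mean absolute error, so a curve that rises faster encloses a smaller area and corresponds to a lower MAE.

```python
import numpy as np

def rec_curve(y_true, y_pred, n_points=1001):
    """Tolerances and the share of |residual| <= tolerance at each one."""
    abs_err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    tolerances = np.linspace(0.0, abs_err.max(), n_points)
    accuracy = np.array([(abs_err <= t).mean() for t in tolerances])
    return tolerances, accuracy

def area_over_curve(tolerances, accuracy):
    """Trapezoidal area between the REC curve and accuracy = 1."""
    gap = 1.0 - accuracy
    return float(np.sum(0.5 * (gap[:-1] + gap[1:]) * np.diff(tolerances)))

y_true = np.zeros(4)
y_pred = np.array([0.0, 1.0, 2.0, 3.0])   # absolute errors 0, 1, 2, 3 (MAE = 1.5)
tol, acc = rec_curve(y_true, y_pred)
print(acc[0], acc[-1], round(area_over_curve(tol, acc), 3))
```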

Fig. 13

Performance of the ML models using RECs during (a) training and (b) validation stages.

Rank analysis

Table 4 summarizes the rank-aggregation analysis of the five adopted ML models using seven statistical indices in both the training and validation stages. Each metric was converted to an ordinal score (1 = best and 5 = worst), and the scores were summed to obtain the overall rank. The results reveal clear differences in the predictive ability and robustness of the models.

The CGB model achieves the lowest aggregated score (19) and is therefore ranked first, reflecting consistently superior performance across nearly all accuracy and reliability metrics in both the training and validation stages. Notably, CGB exhibits the smallest prediction uncertainty, with very low \(U_{95}\) values (0.076 in training and 0.514 in validation), confirming tight prediction intervals alongside minimal error measures. The SGB model ranks second (total score = 34) with relatively narrow uncertainty bounds (\(U_{95}=0.549\) in training and 0.926 in validation). The LGB model follows in third place (total score = 45), offering competitive accuracy and stable generalization, though with moderately wider uncertainty bounds (\(U_{95}=0.647\) in training and 1.009 in validation). The RF model ranks fourth (total score = 52), showing moderate predictive skill but increased uncertainty in validation (\(U_{95}=1.005\)), while the DT model performs weakest overall (total score = 60), with the largest uncertainty bounds (\(U_{95}=0.758\) in training and 1.015 in validation) and higher error levels. Collectively, the rank-aggregation results confirm that the boosting-based ensembles improve accuracy. In terms of uncertainty, CGB and SGB substantially narrow the validation \(U_{95}\), whereas LGB is on par with RF and DT (1.009 versus 1.005 and 1.015). CGB therefore provides the most reliable balance between precision and confidence in seepage discharge estimation.
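The rank-aggregation procedure described above can be sketched as follows. The metric names and values are made up for illustration, `rank_aggregate` is a hypothetical helper, which indices count as "higher is better" must be set per metric, and ties are broken by model order, which is an assumption because the text does not say how ties were handled. As a consistency check, with five models each ranking distributes 1 + 2 + 3 + 4 + 5 = 15 points, so seven indices in two stages (14 rankings) give 14 × 15 = 210 in total, which matches the reported scores 19 + 34 + 45 + 52 + 60.

```python
import numpy as np

def rank_aggregate(scores, higher_is_better):
    """Sum of ordinal ranks (1 = best) over all metrics, one total per model.

    scores: dict mapping metric name -> sequence of values in model order.
    """
    totals = None
    for metric, values in scores.items():
        v = np.asarray(values, dtype=float)
        key = -v if higher_is_better[metric] else v
        order = np.argsort(key, kind="stable")       # ties keep model order
        ranks = np.empty(len(v), dtype=int)
        ranks[order] = np.arange(1, len(v) + 1)
        totals = ranks if totals is None else totals + ranks
    return totals

models = ["A", "B", "C"]
scores = {"RMSE_train": [0.10, 0.30, 0.20], "R2_train": [0.99, 0.90, 0.95]}
better_high = {"RMSE_train": False, "R2_train": True}
print(dict(zip(models, rank_aggregate(scores, better_high).tolist())))
```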

Overall, the rank analysis consolidates evidence from multiple perspectives: pre- and post-tuning comparisons, cross-validation with 5 and 10 folds, scatter evaluations, and error distribution analysis. These converge on a clear conclusion: the boosting algorithms outperform the single-tree and bagging models, with the CGB model offering the most accurate predictions overall (notwithstanding its larger overfitting gap during BO and its Fold 5 outlier under ten-fold CV), closely followed by the SGB and LGB models, while the DT and RF models remain less reliable owing to weaker generalization.

Table 4 Rank analysis of the adopted models.

SHAP analysis

SHAP (SHapley Additive exPlanations) attributes each prediction to the input features using Shapley values from cooperative game theory, so that the feature contributions sum to the difference between the prediction and the model's average output. Summary plots show overall feature importance, while dependence plots reveal feature interactions59. This facilitates model interpretation and supports transparency and trust. Figure 14 provides insight into the relative importance and contribution of each input parameter to the seepage discharge predictions of the best-performing model (CGB).

The SHAP summary dot plot (Fig. 14a) illustrates both the magnitude and direction of each input parameter's contribution to the seepage discharge predictions. Each point represents an individual scenario, colored according to the feature value (low to high). The upstream water head (h) has the largest influence on seepage discharge, with higher heads increasing seepage through elevated hydraulic gradients across the dam body and foundation. The hydraulic conductivity ratio (K′) also exerts a strong influence, as larger contrasts between core and shell permeability significantly alter the seepage paths. The foundation permeability (Kf) contributes positively to seepage, reflecting enhanced under-seepage through permeable foundations. Geometric parameters such as the foundation depth (D) and crest width (B) show moderate but noticeable effects, while the slope-related parameters (Hcotθ and Hcotα) exhibit smaller and more localized contributions, indicating a secondary control on seepage behavior. The SHAP summary bar plot (Fig. 14b) quantifies the average contribution of each input parameter through its mean absolute SHAP value. h clearly dominates the model response, with a mean absolute SHAP value of approximately 3.55, and K′ ranks second at about 0.76. The remaining parameters contribute substantially less: D and Kf show moderate influence (approximately 0.28 and 0.24, respectively), the crest width B contributes marginally (~ 0.11), and the slope-related parameters Hcotθ and Hcotα have the lowest values (~ 0.06 and ~ 0.01, respectively), indicating a relatively weak direct impact on seepage discharge.
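For a linear model with independent features, SHAP values have a closed form, φ_j = β_j (x_j − E[x_j]), with the base value equal to the model output at the feature means. The sketch below applies this to a generic linear model of the Eq. (5) type; it is not how the CGB attributions were obtained (tree ensembles are usually explained with the `shap` library's `TreeExplainer`), and the data, feature names, and coefficients are synthetic and hypothetical. It illustrates why the mean |SHAP| of a feature depends on both its coefficient and its spread: a large coefficient, such as the −173.1 attached to K′ in Eq. (5), does not by itself imply a large SHAP value.

```python
import numpy as np

def linear_shap(X, coef, intercept):
    """Exact SHAP values of f(x) = intercept + x @ coef under feature independence."""
    X = np.asarray(X, dtype=float)
    coef = np.asarray(coef, dtype=float)
    mean = X.mean(axis=0)
    phi = (X - mean) * coef                 # one row per scenario, one column per feature
    base_value = float(intercept + mean @ coef)
    return phi, base_value

def mean_abs_shap(phi, names):
    """Global importance as in a SHAP bar plot, sorted from largest to smallest."""
    importance = np.abs(phi).mean(axis=0)
    return [(names[j], float(importance[j])) for j in np.argsort(importance)[::-1]]

rng = np.random.default_rng(0)
names = ["wide_small_coef", "narrow_large_coef"]      # hypothetical features
X = np.column_stack([rng.uniform(0, 100, 500), rng.uniform(0, 0.01, 500)])
coef, intercept = np.array([0.05, -150.0]), 1.0
phi, base = linear_shap(X, coef, intercept)
print(mean_abs_shap(phi, names))
```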

Figure 14c illustrates the instance-wise variation of SHAP values across the dataset, highlighting how feature contributions change from one scenario to another. The heatmap shows pronounced variability in the contributions of h and K′, particularly at higher predicted seepage values, suggesting strong interaction effects between hydraulic loading and material heterogeneity. In contrast, geometric parameters display more uniform and lower-magnitude contributions across instances, confirming their relatively stable but secondary role in controlling seepage discharge. Finally, Fig. 14d depicts a representative decision plot, showing how individual feature contributions accumulate from the baseline prediction (the model's average output) to the final seepage discharge estimate. The contributions of h and K′ shift the prediction furthest from the baseline, while Kf and D further adjust the estimate according to foundation permeability and geometry. The slope-related parameters (Hcotθ and Hcotα) contribute marginally, occasionally offsetting or slightly reinforcing the dominant effects. This sequential accumulation illustrates how seepage discharge emerges from the combined influence of hydraulic loading, material properties, and geometry.
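A decision plot is a cumulative sum of one scenario's SHAP values starting from the base value. The sketch below assumes the common layout in which the features with the smallest |φ| are added first, so the largest contributions appear last; the values are invented for illustration (not taken from Fig. 14d) and `decision_path` is a hypothetical helper. By SHAP's additivity property, the path always ends at the model prediction.

```python
import numpy as np

def decision_path(base_value, shap_row, names):
    """Running model output as features are added in order of increasing |phi|."""
    phi = np.asarray(shap_row, dtype=float)
    order = np.argsort(np.abs(phi), kind="stable")
    running = base_value + np.cumsum(phi[order])
    return [(names[j], float(r)) for j, r in zip(order, running)]

# Invented values for a single scenario.
path = decision_path(1.0, [0.5, -0.1, 2.0], ["a", "b", "c"])
for name, value in path:
    print(f"{name}: {value:.2f}")
```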

Fig. 14

SHAP visualizations: (a) summary dot plot, (b) summary bar plot, (c) heatmap of instance-wise SHAP values, and (d) decision plot of feature contributions to the model's predictions.

Interactive GUI

To enable practical use, the best-performing model (CGB) is deployed through a user-friendly Tkinter-based desktop application60 that supports both offline and online use for easy access and instant predictions. Figure 15 presents the standalone desktop graphical user interface (GUI) developed for predicting seepage discharge through non-homogeneous earthfill dams on permeable foundations. The application accepts seven user-defined geometric and hydraulic inputs and instantly returns the predicted seepage discharge (q, m³/d/m), automatically formatted in standard or scientific notation depending on its magnitude. A reference schematic is embedded to clarify how each parameter maps to the physical dam profile, while tabs and toolbars support batch evaluation, history tracking, and input management. The GUI is openly accessible at https://github.com/mkamel24/dam.
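The GUI's automatic switch between standard and scientific notation can be sketched as a small formatting function. The thresholds and number of digits below are assumptions, since the rule actually used by the application is not described, and `format_q` is a hypothetical name.

```python
def format_q(q, low=1e-3, high=1e4, digits=4):
    """Fixed-point for moderate magnitudes, scientific notation otherwise."""
    if q == 0:
        return f"{0.0:.{digits}f}"
    if low <= abs(q) < high:
        return f"{q:.{digits}f}"
    return f"{q:.{digits - 1}e}"

for value in (4.529, 1.2e-5, 3.5e6, 0.0):
    print(f"q = {format_q(value)} m³/d/m")
```

With these assumed thresholds, 4.529 is shown as 4.5290 while 1.2e-5 is shown as 1.200e-05.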

From an engineering perspective, the proposed framework can be applied directly in design and safety evaluation by using the open-access GUI to rapidly estimate seepage discharge under different hydraulic and geometric conditions. The predicted seepage rates can be compared with the design seepage capacity (\(Q_{\text{design}}\)) to support screening-level safety checks and operational decision-making. However, seepage-control decisions based on ML predictions are inherently limited to the governing input parameters considered by the model. The SHAP-based interpretation clarifies how changes in these key variables increase or decrease seepage discharge, indicating which parameters can be controlled through design or remediation measures and which reflect inherent site conditions.

Fig. 15

Example screenshot of the desktop-based GUI for predicting q (m³/d/m).

Models verification

Test dataset

Figure 16 presents scatter plots of predicted versus observed seepage discharge for the testing dataset, enabling a direct comparison of model performance under unseen conditions. For all models, the data points generally align along the line of equality, indicating good overall agreement between predicted and actual values. The DT and RF models exhibit noticeable dispersion around the equality line, with several points falling outside the ± 10% bounds, particularly at low-to-intermediate discharge values. Their coefficients of determination are identical (\(R^{2}=0.988\)) and their RMSE values nearly so (0.468 m³/d/m for DT and 0.467 m³/d/m for RF), indicating similar predictive accuracy and error spread in the testing stage.

The SGB and LGB models show a tighter visual clustering of points around the equality line, with fewer deviations beyond the ± 10% limits across the discharge range. Their error statistics, however, are close to those of DT and RF: SGB attains \(R^{2}=0.988\) with a slightly lower RMSE (0.459 m³/d/m), whereas LGB has \(R^{2}=0.986\) and a higher RMSE (0.496 m³/d/m), suggesting that a few larger residuals offset its tighter clustering. The CGB model exhibits the most compact distribution of points around the equality line, with minimal scatter and few outliers, particularly at higher discharge values. This is supported by the highest coefficient of determination (\(R^{2}=0.996\)) and the lowest RMSE (0.253 m³/d/m) among the tested models. Overall, all models maintain strong agreement on the testing dataset but differ in error magnitude: CGB is clearly the most accurate, whereas SGB and LGB perform on par with DT and RF on this dataset.
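The two test-set metrics can be computed as below (NumPy only; the helper names are hypothetical and the data synthetic). Assuming R² is the coefficient of determination evaluated on a common test set, it is tied to RMSE by R² = 1 − RMSE²/Var(y), with Var(y) the population variance of the observed values, so the two metrics rank the models identically and the small R² differences between 0.986 and 0.996 correspond to the larger relative differences in RMSE.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    ss_res = np.sum((y_true - np.asarray(y_pred, dtype=float)) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(0)
y_obs = rng.uniform(0.0, 10.0, 300)                 # synthetic "observed" values
y_hat = y_obs + rng.normal(scale=0.4, size=300)     # synthetic predictions
e, r2 = rmse(y_obs, y_hat), r2_score(y_obs, y_hat)
print(f"RMSE = {e:.3f}, R2 = {r2:.4f}, 1 - RMSE^2/Var(y) = {1 - e**2 / y_obs.var():.4f}")
```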

Fig. 16

Verification of ML models via scatter plots using the unseen testing dataset.

Case study: Hub Dam – Pakistan

To further validate the predictive capability of the proposed CGB model, an independent case study was conducted on Hub Dam, an earthfill dam located approximately 35 km northeast of Karachi, Pakistan (25°15′N, 67°07′E)28. Hub Dam has been extensively investigated in previous seepage studies, making it a suitable benchmark for external validation. Figure 17 illustrates the dam cross-section together with the adopted geometric configuration and hydraulic properties of the core, shell, and foundation materials.

Fig. 17

Hub Dam cross-section used in the validation process via the best predictive model (CGB).

Table 5 compares the seepage discharge values predicted by the CGB model, the empirical equation proposed by Khursheed et al.41, and the numerical SEEP/W model reported by Arshad and Babar28 for three reservoir water levels. The CGB predictions agree closely with both reference approaches: across all examined water levels, the percentage differences remain below 10%. For instance, at a water level of 339 m, the CGB-predicted seepage discharge (4.529 m³/d/m) differs by only 0.81% from the numerical SEEP/W result (4.493 m³/d/m), whereas the empirical equation predicts 4.225 m³/d/m, about 6.0% below the SEEP/W value, and the CGB estimate exceeds it by approximately 7.2%. In terms of predictive performance, the CGB model achieves a high coefficient of determination during the testing stage (R² = 0.996), numerically comparable to the model efficiency reported by Arshad and Babar28 for the numerical model (≈ 99.60%), although the two figures refer to different datasets. The CGB model also compares favorably with the empirical formulation of Khursheed et al.41, for which a lower coefficient of determination (R² ≈ 0.96) was reported in similar seepage applications. These findings indicate that the CGB model reproduces physically based numerical results with high fidelity and improves on the empirical equation, supporting its suitability for practical seepage assessment of earthfill dams.
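The percentage differences quoted for the 339 m water level can be checked directly. The sketch assumes each difference is taken relative to the second value named (the reference); `pct_diff` is a hypothetical helper. With the rounded discharges given in the text, CGB versus SEEP/W comes out at about 0.80%, so small departures from tabulated figures such as 0.81% may stem from rounding of the underlying predictions.

```python
def pct_diff(value, reference):
    """Absolute difference as a percentage of the reference value."""
    return 100.0 * abs(value - reference) / abs(reference)

# Seepage discharges at a water level of 339 m (m³/d/m), as quoted in the text.
cgb, seep_w, empirical = 4.529, 4.493, 4.225

print(f"CGB vs SEEP/W:       {pct_diff(cgb, seep_w):.2f}%")
print(f"CGB vs empirical:    {pct_diff(cgb, empirical):.2f}%")
print(f"Empirical vs SEEP/W: {pct_diff(empirical, seep_w):.2f}%")
```

These lines print 0.80%, 7.20% and 5.96%, respectively.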

Table 5 Verification of the best model in predicting seepage of Hub Dam – Pakistan against previous studies.


