Experimental framework
The hydraulic experiments conducted in this study were performed in a meticulously controlled laboratory flume with the dimensions of 6.5 m in length, 0.6 m in width, and 0.5 m in depth11. To explore the effects of various geometric and hydraulic parameters on the discharge characteristics of the weirs, nine distinct experimental setups were tested. These included three different weir heights (P = 0.2, 0.3, and 0.4 m) and three varying cycle numbers (N = 2, 3, and 4). In addition, a linear weir was used as a reference to allow for comparative analysis of the performance between traditional and labyrinth weir designs.
The SCLWs were carefully manufactured from 6 mm thick plexiglass to ensure that their geometry was both precise and consistent across all experimental trials. These weirs were systematically arranged within the flume, and their effects on upstream flow depth (h) were studied under different operating conditions. The arrangement and positioning of the weirs within the flume were done in a way that allowed for controlled manipulation of hydraulic conditions, while minimizing external factors that could influence the results. Figure 1 represents a schematic view of the SCLW.

For collecting hydraulic data, a point gauge with an accuracy of 1 mm was used. Table 1 presents an overview of the geometric properties of the experimental models, as well as a summary of the variations tested in the study.
Dimensional analysis
Dimensionality reduction is a key aspect of the data preprocessing phase of ML models since it identifies the significance of various features and their relationship with the target variable40. One of the major problems for ML is dealing with high-dimensional datasets, where a huge number of features can lead to numerous issues, e.g., overfitting, increased computational demands, and reduced model interpretability41,42. With the help of dimensional analysis, it is possible to identify and eliminate redundant or irrelevant features, simplifying the model and making it more effective. Reduction of the feature space in this way enhances the precision and also the interpretability of the predictive model43. In the SCLWs specifically, it is important to accurately identify the most influential parameters that affect the Cd estimation. Knowing these governing parameters is essential in order to improve the Cd predictive capability. These parameters, which directly affect the flow behavior and discharge efficiency, are given in Eq. 1:
$${C_d}=f\left( {N,{l_C},h,V,P,g,\rho ,\mu ,\sigma } \right)$$
(1)
in which V is the flow velocity, g is the gravitational acceleration, ρ is the fluid density, µ is the dynamic viscosity of the fluid, and σ is the surface tension. To reduce the number of variables, Buckingham's π-theorem was applied with ρ, V, and P as the repeating variables, and the dimensionless groups governing Cd were derived systematically. Expressing the problem in terms of dimensionless parameters reduces its complexity and clarifies how the variables influence one another. The resulting dimensionless groups are given in Eq. 2:
$$\begin{aligned} \Pi_{1} &= N;\quad \Pi_{2} = \frac{l_{C}}{P};\quad \Pi_{3} = \frac{h}{P};\quad \Pi_{4} = \frac{V^{2}}{gP};\quad \Pi_{5} = \frac{\rho V P}{\mu};\quad \Pi_{6} = \frac{\rho V^{2} P}{\sigma} \\ \sqrt{\frac{\Pi_{4}}{\Pi_{3}}} &= \text{Fr};\quad \Pi_{5} \times \Pi_{3} = \text{Re};\quad \Pi_{6} \times \Pi_{3} = \text{We} \end{aligned}$$
$${C_d}=f\left( {N,\frac{{{l_C}}}{P},\frac{h}{P},{\text{Fr}},\operatorname{Re} ,{\text{We}}} \right)$$
(2)
where Fr is the Froude number, Re is the Reynolds number, and We is the Weber number. For fully turbulent flow with a sufficient head maintained over the weir crest, viscous and surface-tension effects become negligible, and Eq. 2 can be simplified further20 to give Eq. 3:
$${C_d}=f\left( {N,\frac{{{l_C}}}{P},\frac{h}{P}} \right)$$
(3)
This simplified relationship forms the foundation for the subsequent analysis and model development, in which the effects of N, lC/P, and h/P on Cd are evaluated in detail.
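The dimensional bookkeeping behind Eq. 2 can be checked mechanically. The sketch below is illustrative only: it encodes each variable of Eq. 1 as exponents of mass, length, and time (M, L, T) and confirms that every group used in Eq. 2 has zero net dimension. It also shows that V²/(gh) is dimensionless; this quantity is the square of the Froude number. The dictionary names are hypothetical labels, not notation from the study.

```python
# (M, L, T) exponents of each variable in Eq. 1.
DIMS = {
    "N": (0, 0, 0),        # cycle number (dimensionless)
    "l_C": (0, 1, 0),      # length
    "h": (0, 1, 0),        # length
    "P": (0, 1, 0),        # length
    "V": (0, 1, -1),       # velocity
    "g": (0, 1, -2),       # acceleration
    "rho": (1, -3, 0),     # density
    "mu": (1, -1, -1),     # dynamic viscosity
    "sigma": (1, 0, -2),   # surface tension
}


def group_dims(powers):
    """Return the (M, L, T) exponents of a product of variables raised to the given powers."""
    total = [0, 0, 0]
    for name, power in powers.items():
        for axis, exponent in enumerate(DIMS[name]):
            total[axis] += power * exponent
    return tuple(total)


GROUPS = {
    "l_C/P": {"l_C": 1, "P": -1},
    "h/P": {"h": 1, "P": -1},
    "Fr^2 = V^2/(g h)": {"V": 2, "g": -1, "h": -1},
    "Re = rho V h / mu": {"rho": 1, "V": 1, "h": 1, "mu": -1},
    "We = rho V^2 h / sigma": {"rho": 1, "V": 2, "h": 1, "sigma": -1},
}

for label, powers in GROUPS.items():
    print(f"{label:24s} {group_dims(powers)}")  # (0, 0, 0) means dimensionless
```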
To assess whether the independent variables have a statistically significant effect on the modeling of the Cd, analysis of variance (ANOVA) was performed. This analysis was applied individually to each independent variable with respect to the Cd. Additionally, descriptive statistical measures including minimum, maximum, mean, standard deviation, kurtosis, and skewness were calculated for each feature. The results of the ANOVA, along with the descriptive statistics of the variables, are presented in Table 2.
The ANOVA p-values for all independent variables (N, lC/P, and h/P) were below 0.05. This indicates that each parameter has a statistically significant effect on Cd and supports its selection as an input for modeling Cd.
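The text does not state how the continuous inputs were grouped for the ANOVA. The sketch below therefore covers only the simplest case: a one-way ANOVA of Cd across the three levels of N. It uses hypothetical synthetic data, not the study's measurements. The F statistic is computed from its definition, the p-value is taken from SciPy's F distribution, and the Table 2-style descriptive statistics are computed alongside.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical synthetic data (NOT the study's measurements): Cd observed at N = 2, 3, 4.
n_levels = np.repeat([2, 3, 4], 20)
cd = 0.55 + 0.03 * (n_levels - 2) + rng.normal(0.0, 0.02, n_levels.size)


def one_way_anova(y, groups):
    """F statistic and p-value of a one-way ANOVA of y across the levels in `groups`."""
    levels = np.unique(groups)
    samples = [y[groups == g] for g in levels]
    grand_mean = y.mean()
    ss_between = sum(s.size * (s.mean() - grand_mean) ** 2 for s in samples)
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    df_between, df_within = len(levels) - 1, y.size - len(levels)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, stats.f.sf(f_stat, df_between, df_within)  # upper-tail probability


f_stat, p_value = one_way_anova(cd, n_levels)

# Descriptive statistics of the kind listed in Table 2.
summary = {
    "min": cd.min(), "max": cd.max(), "mean": cd.mean(), "std": cd.std(ddof=1),
    "skewness": stats.skew(cd),
    "kurtosis": stats.kurtosis(cd),  # excess (Fisher) kurtosis by default
}
print(f"F = {f_stat:.1f}, p = {p_value:.1e}")
```

`scipy.stats.f_oneway` returns the same F statistic and p-value directly. The manual version is shown only to make the computation explicit.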
Figure 2 shows violin plots and histograms of Cd and the independent parameters. The plots summarize how each parameter is distributed across the dataset, including its range and spread. This helps in assessing how each parameter may influence the prediction of Cd.

Statistical distributions and violin plots of key parameters in the SCLW dataset.
Figure 3 shows the correlation heatmap of the variables, i.e., the correlation coefficient between Cd and each of the other parameters. The correlation between Cd and N is 0.34. This is a weak positive correlation: Cd tends to increase with N, but the relationship is weak. The correlation between Cd and lC/P is −0.19. This is a weak negative correlation: Cd tends to decrease slightly as lC/P increases. The correlation between Cd and h/P is −0.43. This is a moderate negative correlation: Cd tends to decrease as h/P increases. In summary, h/P has the strongest (negative) correlation with Cd, whereas N and lC/P are only weakly correlated with Cd.

Correlation Heatmap of Variables.
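The values in such a heatmap are Pearson correlation coefficients between pairs of columns. The minimal sketch below computes them with NumPy on hypothetical synthetic columns; the numbers it produces are not the study's values. Each row of the stacked array is treated as one variable.

```python
import numpy as np


def pearson_matrix(columns):
    """Pearson correlation matrix for a dict of equally long 1-D arrays."""
    names = list(columns)
    data = np.vstack([np.asarray(columns[n], dtype=float) for n in names])
    return names, np.corrcoef(data)  # rows of `data` are variables


rng = np.random.default_rng(1)
# Hypothetical synthetic columns, for illustration only.
h_over_p = rng.uniform(0.1, 0.8, 200)
n_cycles = rng.choice([2, 3, 4], 200)
cd = 0.7 - 0.2 * h_over_p + 0.02 * n_cycles + rng.normal(0.0, 0.02, 200)

names, r = pearson_matrix({"Cd": cd, "N": n_cycles, "h/P": h_over_p})
for name, value in zip(names[1:], r[0, 1:]):
    print(f"r(Cd, {name}) = {value:+.2f}")
```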
The complete experimental dataset used for modeling and analysis is presented in Table 3.
Figure 4 illustrates the partial dependence plots (PDP) for the three most influential variables on the predicted Cd.
- N has a strong positive effect on Cd up to around N = 2, after which the effect plateaus.
- The variable lc/P shows a slight but consistent positive relationship with Cd, indicating moderate influence.
- In contrast, h/P demonstrates a strong negative effect, where increasing h/P leads to a noticeable decrease in Cd.
These trends confirm that h/P is the most impactful factor in reducing Cd, while N and lc/P contribute positively to Cd estimation.

PDP showing the effect of key input variables (N, lc/P, and h/P) on the predicted Cd.
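A partial dependence curve averages a model's predictions while one input is held fixed at each grid value. The sketch below implements that definition with NumPy. It uses a hypothetical linear-plus-cap function as a stand-in for the trained models, whose internals are not given here. For fitted scikit-learn estimators, `sklearn.inspection.partial_dependence` performs the same computation.

```python
import numpy as np


def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is set to each grid value for all samples."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value          # overwrite the feature for every sample
        averages.append(predict(X_mod).mean())
    return np.array(averages)


# Hypothetical stand-in for a trained model; columns are (N, lc/P, h/P).
def toy_predict(X):
    return 0.6 + 0.05 * np.minimum(X[:, 0], 3) + 0.01 * X[:, 1] - 0.25 * X[:, 2]


rng = np.random.default_rng(2)
X = np.column_stack([rng.choice([2, 3, 4], 100),
                     rng.uniform(5, 15, 100),
                     rng.uniform(0.1, 0.8, 100)])
grid = np.linspace(0.1, 0.8, 8)
pd_h = partial_dependence(toy_predict, X, feature=2, grid=grid)  # PDP for h/P
```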
Outlier removal using isolation forest
In this study, the Isolation Forest algorithm was used to detect and remove outliers. The algorithm is designed specifically for anomaly detection. Traditional methods first build a profile of normal data and then flag points that deviate from it. Isolation Forest instead isolates anomalies directly. Because outliers lie in sparse regions far from the bulk of the data, a few random splits are enough to separate them from the other points. This property makes Isolation Forest effective for outlier detection in large and high-dimensional datasets44.
Figure 5 illustrates how the algorithm works. Part (a) shows data points being isolated by isolation trees (iTrees). Outliers (red) are separated from the rest of the data with few splits, whereas regular points (blue) require more splits because they lie close to other points. Part (b) shows the isolation of an anomalous point and a normal point in a two-dimensional space: the anomalous point is isolated with a single split, while the normal point requires several splits. This difference in the number of splits required (i.e., the path length in the tree) is the key idea of the Isolation Forest algorithm and makes it efficient for outlier detection45.

Isolation of outliers and normal points using isolation forest.
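The path-length idea above can be written out in a few dozen lines. The sketch below is a simplified from-scratch version of the standard Isolation Forest scoring: random trees are grown on subsamples, and the average depth at which each point is isolated is converted into a score in (0, 1]. The data are hypothetical: Gaussian points plus three planted outliers. The study does not state which implementation it used; in practice, scikit-learn's `IsolationForest` provides this method.

```python
import numpy as np

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_itree(X, depth, max_depth, rng):
    """Grow one isolation tree with random features and random split values."""
    n = X.shape[0]
    if depth >= max_depth or n <= 1:
        return {"size": n}
    feature = rng.integers(X.shape[1])
    lo, hi = X[:, feature].min(), X[:, feature].max()
    if lo == hi:  # identical values cannot be split
        return {"size": n}
    split = rng.uniform(lo, hi)
    left = X[:, feature] < split
    return {"feature": feature, "split": split,
            "left": build_itree(X[left], depth + 1, max_depth, rng),
            "right": build_itree(X[~left], depth + 1, max_depth, rng)}


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus the expected depth of the unbuilt subtree."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_scores(X, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1]; values close to 1 indicate likely outliers."""
    rng = np.random.default_rng(seed)
    psi = min(sample_size, X.shape[0])
    max_depth = int(np.ceil(np.log2(psi)))
    trees = [build_itree(X[rng.choice(X.shape[0], psi, replace=False)], 0, max_depth, rng)
             for _ in range(n_trees)]
    mean_depth = np.array([np.mean([path_length(x, t) for t in trees]) for x in X])
    return 2.0 ** (-mean_depth / c_factor(psi))


rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),          # bulk of the data
               [[8.0, 8.0], [-8.0, 8.0], [8.0, -8.0]]])       # three planted outliers
scores = isolation_scores(X)
```

Points are flagged by thresholding the score, or by marking a chosen fraction of the highest-scoring points. This corresponds to the `contamination` setting in scikit-learn.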
Figure 6 shows the outliers detected for each variable. The boxplots display the distribution of each variable, and the red points mark the observations flagged as outliers by the Isolation Forest model. These points lie outside the interquartile range and are treated as anomalous data. Detecting outliers allows researchers to decide whether such points should be corrected or removed, which leads to more reliable ML models.

Boxplot showing outliers identified by isolation forest model.
Sensitivity analysis
In the previous section, dimensional analysis identified the key dimensionless parameters influencing Cd in SCLWs (Eq. 3). Here, the influence of these parameters on Cd is examined through sensitivity analysis using two methods, EBM and SHAP. The principles of both methods are outlined below.
Explainable boosting machine (EBM)
EBM is an ML model designed specifically for interpretability. It is a type of generalized additive model (GAM) that combines the flexibility of ML with transparency and shows how each feature contributes to the model's predictions46,47. EBM is additive: the total prediction is the sum of the contributions of the individual features, so the contribution of every feature is simple to interpret. Unlike traditional linear models, EBM can capture non-linear relationships between the features and the outcome without compromising interpretability. It therefore provides human-understandable explanations of its predictions and is more interpretable than complex ML models such as random forests and deep neural networks.
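The additive structure can be illustrated with a heavily simplified version of the training idea behind EBM: cyclic (round-robin) boosting with a small learning rate, where each feature in turn updates its own piecewise-constant shape function. The sketch below is not the `interpret` package's `ExplainableBoostingRegressor`. It omits bagging, pairwise interaction terms, and the library's binning details, and it uses hypothetical synthetic data.

```python
import numpy as np


def fit_cyclic_additive(X, y, n_bins=16, n_rounds=300, learning_rate=0.05):
    """Round-robin boosting of per-feature piecewise-constant shape functions."""
    n, d = X.shape
    # Interior quantile edges per feature define the bins of each shape function.
    edges = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1]) for j in range(d)]
    bins = np.column_stack([np.searchsorted(edges[j], X[:, j], side="right") for j in range(d)])
    intercept = y.mean()
    shapes = np.zeros((d, n_bins))
    pred = np.full(n, intercept)
    for _ in range(n_rounds):
        for j in range(d):  # one feature at a time
            residual = y - pred
            sums = np.bincount(bins[:, j], weights=residual, minlength=n_bins)
            counts = np.bincount(bins[:, j], minlength=n_bins)
            step = learning_rate * np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
            shapes[j] += step
            pred += step[bins[:, j]]
    return intercept, edges, shapes


def explain(model, X):
    """Per-feature contributions and the additive prediction (intercept + their sum)."""
    intercept, edges, shapes = model
    contrib = np.column_stack([shapes[j][np.searchsorted(edges[j], X[:, j], side="right")]
                               for j in range(X.shape[1])])
    return intercept + contrib.sum(axis=1), contrib


rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, size=(500, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.05, 500)  # feature 2 is irrelevant
model = fit_cyclic_additive(X, y)
y_hat, contributions = explain(model, X)
```

Each column of `contributions` is that feature's term in the prediction. Plotting `shapes[j]` against the bins gives the per-feature curves that EBM reports.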
SHAP (Shapley additive Explanations)
SHAP is a technique for explaining ML models within a game-theoretic framework based on Shapley values48. In this framework, the features are the "players" and the model's prediction is the "payoff". SHAP distributes the payoff fairly among the features according to their influence. The contribution of each feature is computed as its average marginal contribution to the prediction over all possible combinations (coalitions) of the other features. SHAP supports both global interpretability (e.g., feature importance across all predictions) and local interpretability (e.g., the contribution of each feature to an individual prediction)49,50. Exact computation is expensive, but algorithms such as TreeSHAP make SHAP efficient for tree ensembles such as gradient-boosted trees and random forests. SHAP is especially suited to black-box models, including deep learning and ensemble models, which are notoriously hard to interpret51. By making clear why a model produced a given prediction, SHAP supports more transparent and accountable ML. It is used extensively in fields such as healthcare, finance, and natural language processing.
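With only three inputs (N, lc/P, h/P), Shapley values can be computed exactly by enumerating every coalition. In the minimal sketch below, features absent from a coalition take their values from a background dataset, and the coalition's value is the mean prediction over that background (an interventional value function). The model and data are hypothetical stand-ins. The `shap` library, for example its `TreeExplainer`, provides faster estimators for real models.

```python
import itertools
import math

import numpy as np


def exact_shapley(predict, x, background):
    """Exact Shapley values of predict() at x; absent features take background values."""
    d = x.size

    def coalition_value(subset):
        X = background.copy()
        if subset:
            cols = list(subset)
            X[:, cols] = x[cols]         # features in the coalition are fixed to x
        return predict(X).mean()         # expectation over the background data

    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            weight = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
            for subset in itertools.combinations(others, size):
                phi[j] += weight * (coalition_value(subset + (j,)) - coalition_value(subset))
    return phi


# Hypothetical model with an interaction term; columns stand for (N, lc/P, h/P).
def toy_model(X):
    return 0.6 + 0.02 * X[:, 0] - 0.2 * X[:, 2] + 0.01 * X[:, 0] * X[:, 2]


rng = np.random.default_rng(5)
background = np.column_stack([rng.choice([2.0, 3.0, 4.0], 50),
                              rng.uniform(5, 15, 50),
                              rng.uniform(0.1, 0.8, 50)])
x = np.array([4.0, 10.0, 0.3])
phi = exact_shapley(toy_model, x, background)
```

The values satisfy the efficiency property: they sum to the prediction at `x` minus the average background prediction. A feature the model ignores (here lc/P) receives exactly zero.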
SHAP and EBM are both essential tools for making ML models more interpretable. SHAP is a post-hoc interpretability technique that yields detailed feature attributions, whereas EBM is an intrinsically interpretable model that incorporates the feature contributions into the model structure explicitly. The combination of these two methods enables us to better understand how ML models generate predictions and promotes explainability and trust in AI models.
Machine learning models
In recent years, ML models have gained significant traction in solving complex engineering challenges due to their ability to efficiently and accurately identify relationships between various system parameters15,52,53. These intelligent models are particularly valuable in cases where traditional analytical methods may be too cumbersome or impractical, offering an advantage in terms of both precision and scalability54,55,56.
For this study, several ML techniques were employed to predict the Cd of SCLWs. The models used in this research include DT, LightGBM, ELM-JFO, and TabNet-MFO. Each of these models was specifically selected for its ability to handle nonlinear relationships and provide high prediction accuracy. A comprehensive dataset was utilized, which was divided into training and testing subsets. The training set, consisting of 181 samples (75% of the total dataset), was used to teach the models the underlying patterns in the data. The remaining 60 samples (25% of the dataset) were reserved for testing and validating the models’ predictive performance. The implementation process for each of these ML models is outlined in Fig. 7, showcasing the systematic approach taken to prepare the data, train the models, and assess their performance.

Process of intelligent models for Cd prediction in SCLWs.
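The reported split is 181 training and 60 testing samples (241 in total). The minimal sketch below produces a 75/25 split of that size. It assumes a random shuffle with a fixed seed; the text does not describe how samples were assigned, so the shuffling and the seed are assumptions.

```python
import numpy as np


def train_test_split_indices(n_samples, test_fraction=0.25, seed=42):
    """Shuffle sample indices and hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return order[n_test:], order[:n_test]


train_idx, test_idx = train_test_split_indices(241)
print(len(train_idx), len(test_idx))  # 181 60
```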
In selecting ML models for this study, we sought a balance between ease of implementation, prediction accuracy, and scalability. Each of the four selected models (TabNet-MFO, ELM-JFO, LightGBM, and DT) was chosen for a specific reason. TabNet-MFO and ELM-JFO were used as novel hybrid models because of their high potential for learning complex patterns from structured data and their automated hyperparameter optimization. LightGBM and DT, as mature and lightweight models, were used as baseline algorithms against which the performance of the more complex approaches could be compared.
For models such as CatBoost, XGBoost, and Random Forest—which are among the strongest and most widely used regression task algorithms—it must be noted that they were not omitted due to their performance limitations. Rather, this study was interested in constructing and evaluating new and less studied models within the context of SCLWs. Additionally, the effectiveness of traditional ensemble approaches has been thoroughly examined in past studies, and their results are well established in the literature. Hence, repeating the same studies fell outside the targeted scope of this research.
Decision tree
The DT model is an ML model used for regression and classification tasks. It recursively splits the data on the features that optimize a chosen criterion, for example minimizing the entropy of the resulting subsets or, equivalently, maximizing the information gain57. Entropy is a measure of uncertainty in a dataset. For a dataset S whose samples belong to the classes y1, y2, …, yk, entropy is calculated as58:
$$H(S) = -\sum_{i=1}^{k} P(y_i)\log_{2} P(y_i)$$
(4)
where P(yi) is the probability of each class. Information Gain is used to evaluate splits. It measures how much uncertainty (entropy) is reduced by splitting the dataset based on a particular feature:
$$IG(X) = H(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|} H(S_i)$$
(5)
where H(S) is the entropy of the original dataset, Si are the subsets produced by the split, and |Si|/|S| is the weight of each subset. At each step, the decision tree selects the feature with the maximum information gain for splitting. It continues in this way until a stopping criterion is met, e.g., a maximum tree depth or a minimum number of samples per node. For regression targets such as Cd, the same procedure is used with a variance-based criterion (e.g., reduction of the mean squared error) in place of entropy. Decision trees are popular because of their simplicity and interpretability, but they tend to overfit, especially when the tree is grown too deep.
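Eqs. 4 and 5 translate directly into code. The minimal sketch below uses a hypothetical two-class labeling. It compares a split that separates the classes perfectly, which gains the full 1 bit of entropy, with a split that leaves both subsets as mixed as the parent, which gains nothing.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels, Eq. 4."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, subsets):
    """Entropy reduction obtained by splitting `labels` into `subsets`, Eq. 5."""
    n = len(labels)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in subsets)


parent = ["high"] * 4 + ["low"] * 4
perfect = [["high"] * 4, ["low"] * 4]
useless = [["high", "high", "low", "low"], ["high", "high", "low", "low"]]
print(entropy(parent))                     # 1.0
print(information_gain(parent, perfect))   # 1.0
print(information_gain(parent, useless))   # 0.0
```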
Tabular neural network
TabNet is a deep learning architecture designed specifically for tabular data, introduced by Google Research in 2019. Traditional neural networks often struggle with structured data. TabNet combines attention mechanisms with sequential feature processing to achieve interpretability while maintaining high performance, rivaling gradient-boosted trees (e.g., XGBoost)59. The model processes the input over several sequential decision steps and aggregates the outputs of all steps to make the final prediction. At each step, an attentive transformer uses information from the previous steps to produce a sparse feature mask that determines which features are used. A feature transformer then processes the selected features. This lets TabNet focus on the features most relevant to the current decision. The final prediction is a weighted sum of the outputs of the decision steps60:
$$\hat {y}=\sum\limits_{{t=1}}^{T} {{\alpha _t}} \cdot{{\mathbf{f}}_t}({\mathbf{X}})$$
(6)
where αt is the weight of the t-th decision step, ft(X) is the function applied to the input features at step t, and T is the total number of decision steps. TabNet is known for its interpretability: the attention mechanism and feature masks show which features influenced the model's predictions. It has also been reported to perform competitively with, and in some cases better than, traditional ML models such as gradient-boosted decision trees (GBDTs). This makes it attractive for complex tabular data tasks.
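In the original TabNet design, the attentive transformer normalizes its scores with sparsemax rather than softmax. Unlike softmax, sparsemax can assign exactly zero weight to a feature, which is what makes the masks sparse. The sketch below implements sparsemax as a Euclidean projection onto the probability simplex and then evaluates Eq. 6. The scores, feature values, step outputs, and weights are all hypothetical.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (can output exact zeros)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    in_support = 1.0 + k * z_sorted > cumsum
    k_max = k[in_support][-1]
    tau = (cumsum[k_max - 1] - 1.0) / k_max   # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)


scores = np.array([2.0, 1.0, 0.1])         # hypothetical attention scores for (N, lc/P, h/P)
mask = sparsemax(scores)
print(mask)                                # [1. 0. 0.]
features = np.array([3.0, 8.5, 0.4])       # hypothetical input row
masked_input = mask * features             # only the selected feature passes through

# Eq. 6 with hypothetical step outputs f_t(X) and weights alpha_t for T = 3 steps.
step_outputs = np.array([0.61, 0.58, 0.63])
alphas = np.array([0.5, 0.3, 0.2])
y_hat = float(alphas @ step_outputs)
```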
Extreme learning machine
The ELM is an ML algorithm for training single-hidden-layer feedforward neural networks (SLFNs). It is known for its fast training speed because it does not require iterative optimization methods such as backpropagation. The hidden-layer weights are randomly initialized and kept fixed, while the output weights are calculated analytically. The hidden-layer output is calculated using a nonlinear activation function ϕ61:
$$H({x_i})=\phi (W{x_i}+b)$$
(7)
where W is the input-to-hidden weight matrix, b is the bias, and ϕ is the activation function (e.g., sigmoid, RBF). The output is a linear combination of the hidden layer outputs62:
$${y_i}={{\mathbf{w}}^T}H({x_i})+{b_{out}}$$
(8)
where w is the output weight vector and bout is the output bias. The output weights are computed in closed form by regularized least squares:
$${\mathbf{W}}_{\mathbf{out}} = \left(H^{T}H + \lambda I\right)^{-1} H^{T} T$$
(9)
where H is the matrix of hidden layer outputs, T is the target matrix, λ is a regularization parameter, and I is the identity matrix. ELM’s primary advantages are its speed, simplicity, and good generalization performance. It is particularly useful when training speed is critical and avoids complex iterative methods.
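Eqs. 7 to 9 amount to a few lines of linear algebra. The minimal sketch below makes several illustrative assumptions: a sigmoid activation, 50 hidden units, λ = 10⁻³, and a synthetic smooth target. Note that `X @ W` stores the input weights as an input × hidden matrix, the transpose of the W in Eq. 7.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def elm_fit(X, y, n_hidden=50, lam=1e-3, seed=0):
    """Train an ELM: random fixed hidden layer, closed-form ridge output layer."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))    # input-to-hidden weights (never trained)
    b = rng.normal(size=n_hidden)                  # hidden biases (never trained)
    H = sigmoid(X @ W + b)                         # Eq. 7, one row per sample
    H1 = np.hstack([H, np.ones((H.shape[0], 1))])  # constant column carries b_out of Eq. 8
    # Eq. 9, solved as a linear system instead of forming the inverse explicitly.
    # For brevity, the output bias is regularized together with the weights.
    beta = np.linalg.solve(H1.T @ H1 + lam * np.eye(H1.shape[1]), H1.T @ y)
    return W, b, beta


def elm_predict(model, X):
    W, b, beta = model
    H = sigmoid(X @ W + b)
    return np.hstack([H, np.ones((H.shape[0], 1))]) @ beta   # Eq. 8


rng = np.random.default_rng(6)
X = rng.uniform(-1.0, 1.0, size=(300, 2))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2
model = elm_fit(X, y)
rmse = np.sqrt(np.mean((elm_predict(model, X) - y) ** 2))
```

The only "training" is a single linear solve. This is the source of ELM's speed, and it is why the number of hidden units and λ are the natural targets for an external optimizer such as JFO.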
Light gradient boosting machine
LightGBM is a gradient boosting decision tree (GBDT) framework optimized for speed and performance on large datasets. It uses techniques such as histogram-based learning and leaf-wise tree growth, which make it faster and more scalable than conventional methods such as XGBoost. Its tree-building strategy is the main source of this improved performance. Traditional gradient boosting methods grow trees level-wise, whereas LightGBM grows them leaf-wise63. In this strategy, the algorithm prioritizes splitting the leaves that yield the largest reduction in the loss function, allowing the model to build deeper trees with fewer iterations. This approach improves the model's accuracy while reducing the computational complexity64. The objective function of LightGBM can be written as65:
$$L=\sum\limits_{{i=1}}^{n} \ell ({y_i},{\hat {y}_i})+\sum\limits_{{k=1}}^{K} \Omega ({f_k})$$
(10)
where \(\ell(y_i, \hat{y}_i)\) is the loss function, usually squared error for regression or log loss for classification, \(\hat{y}_i\) is the predicted value, yi is the true value, fk is the k-th tree, and Ω(fk) is a regularization term that penalizes tree complexity, typically related to the number of leaves and the tree depth. Because LightGBM grows trees leaf-wise, it always splits the leaf that yields the largest decrease in the loss function, which leads to more accurate trees. For a given leaf, the optimal output value follows from a second-order (gradient and Hessian) approximation of the loss:
$$\Delta \hat{y}_k = -\frac{\sum_{i \in L_k} \nabla_i}{\sum_{i \in L_k} h_i + \lambda}$$
(11)
in which \(\Delta \hat{y}_k\) is the predicted change in the leaf value (the leaf output) for the k-th leaf, ∇i is the gradient (first derivative) of the loss function with respect to the prediction for the i-th instance, hi is the corresponding second derivative (Hessian), Lk is the set of instances assigned to the k-th leaf, and λ is a regularization parameter that helps avoid overfitting. To speed up computation, LightGBM uses a histogram-based algorithm. Continuous feature values are bucketed into discrete bins, so candidate splits are evaluated on bin-level sums of gradients and Hessians instead of on every individual value.
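The gradient-histogram idea and the leaf value of Eq. 11 can be sketched in NumPy. The code below is not LightGBM's implementation. It assumes a squared-error loss ½(ŷ − y)², for which the gradient is ŷ − y and the Hessian is 1. It bins one feature, accumulates gradient and Hessian sums per bin, and scans the bin boundaries with the usual second-order split gain. No minimum-gain penalty is applied.

```python
import numpy as np


def leaf_value(grad, hess, lam):
    """Optimal leaf output (Eq. 11): minus the gradient sum over the regularized Hessian sum."""
    return -grad.sum() / (hess.sum() + lam)


def best_histogram_split(x, grad, hess, n_bins=8, lam=1.0):
    """Best split of one feature, found by scanning a histogram of gradient statistics."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])   # bin boundaries
    bins = np.searchsorted(edges, x, side="right")                     # bin index per sample
    G = np.bincount(bins, weights=grad, minlength=n_bins)              # gradient sums per bin
    H = np.bincount(bins, weights=hess, minlength=n_bins)              # Hessian sums per bin
    G_tot, H_tot = G.sum(), H.sum()
    best_gain, best_threshold = 0.0, None
    G_left = H_left = 0.0
    for b in range(n_bins - 1):          # candidate split: bins 0..b go left
        G_left += G[b]
        H_left += H[b]
        G_right, H_right = G_tot - G_left, H_tot - H_left
        gain = 0.5 * (G_left ** 2 / (H_left + lam) + G_right ** 2 / (H_right + lam)
                      - G_tot ** 2 / (H_tot + lam))
        if gain > best_gain:
            best_gain, best_threshold = gain, edges[b]
    return best_gain, best_threshold     # samples with x < threshold go to the left child


x = np.linspace(0.0, 1.0, 400)
y = (x > 0.5).astype(float)
pred = np.zeros_like(y)
grad, hess = pred - y, np.ones_like(y)   # squared-error loss 0.5 * (pred - y)**2
gain, threshold = best_histogram_split(x, grad, hess)
right = x >= threshold
```

Only `n_bins - 1` thresholds are evaluated instead of one per distinct value, which is where the speed-up of the histogram approach comes from.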
Hyperparameter optimization algorithms
Tuning the hyperparameters of models is crucial for improving performance and prediction accuracy in machine learning algorithms, as the optimal selection of parameters can make significant differences in the final results. In this study, the MFO algorithm was used for tuning the hyperparameters of the TabNet model, JFO for the ELM model, and Optuna for the DT and LightGBM models.
Moth flame optimization (MFO)
The MFO algorithm is a nature-inspired metaheuristic optimization technique introduced by Mirjalili in 2015. It mimics the navigation behavior of moths around artificial or natural light sources, known as transverse orientation. MFO is widely used for solving continuous and discrete optimization problems in engineering, machine learning, and other domains. Moths represent candidate solutions, while flames represent the best solutions found so far. Each moth moves toward a flame along a spiral path, so its position is updated as it gets closer to the best solution. The movement is described mathematically by66:
$$X_i(t+1) = X^{*} + \left| X^{*} - X_i(t) \right| \cdot e^{b r} \cdot \cos(2\pi r)$$
(12)
where Xi(t) is the current position of moth i, X* is the position of the best solution (flame), and |X* − Xi(t)| is the distance between the moth and the flame. The parameters a and b control the spiral behavior: b defines the shape of the logarithmic spiral, and a is the lower bound of r. r is a random vector with elements drawn from [a, 1]; in the original MFO, a decreases linearly from −1 to −2 over the iterations, so moths gradually move closer to the flames. In the beginning, moths explore the search space widely. As they approach the flame, they refine their positions with a greater focus on exploitation (fine-tuning the solution). MFO strikes a good balance between global exploration and local exploitation, making it effective for complex optimization problems, and it is simple to implement and computationally efficient. MFO has been applied in various fields, for example to hyperparameter tuning in machine learning and to the optimization of engineering system designs.
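A compact sketch of the MFO loop, following the standard formulation of the algorithm. Each moth spirals around its assigned flame as X_i(t+1) = |X* − X_i(t)|·e^{br}·cos(2πr) + X*, with r drawn from [a, 1], and the number of flames shrinks over the iterations. The sphere function is a hypothetical stand-in for the TabNet validation error that MFO minimized in the study. The mapping from search coordinates to TabNet hyperparameters is not shown and would be an assumption.

```python
import numpy as np


def mfo_minimize(objective, lower, upper, n_moths=20, n_iter=200, b=1.0, seed=0):
    """Moth-flame optimization with logarithmic-spiral moves and a shrinking flame count."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    moths = rng.uniform(lower, upper, size=(n_moths, dim))
    fitness = np.apply_along_axis(objective, 1, moths)
    order = np.argsort(fitness)
    flames, flame_fit = moths[order].copy(), fitness[order].copy()
    for it in range(n_iter):
        # The number of flames shrinks linearly from n_moths to 1 (more exploitation).
        n_flames = int(round(n_moths - it * (n_moths - 1) / n_iter))
        a = -1.0 - it / n_iter                       # lower bound of r: -1 -> -2
        for i in range(n_moths):
            j = min(i, n_flames - 1)                 # surplus moths share the last flame
            r = (a - 1.0) * rng.random(dim) + 1.0    # elements of r lie in [a, 1]
            distance = np.abs(flames[j] - moths[i])
            moths[i] = distance * np.exp(b * r) * np.cos(2 * np.pi * r) + flames[j]
        moths = np.clip(moths, lower, upper)
        fitness = np.apply_along_axis(objective, 1, moths)
        # The best n_moths positions seen so far become the new (sorted) flames.
        pool = np.vstack([flames, moths])
        pool_fit = np.concatenate([flame_fit, fitness])
        best = np.argsort(pool_fit)[:n_moths]
        flames, flame_fit = pool[best], pool_fit[best]
    return flames[0], flame_fit[0]


best_x, best_f = mfo_minimize(lambda v: float(np.sum(v ** 2)), [-5, -5], [5, 5])
```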
Jaya firefly optimization (JFO)
The JFO algorithm is a hybrid optimization technique that combines the Jaya and Firefly algorithms. It pairs the randomized, improvement-driven search of Jaya with the attraction-based movement of fireflies to strengthen the search process. Jaya requires no algorithm-specific control parameters and relies on simple random processes and local decisions. In each iteration, every candidate solution moves toward the best solution and away from the worst one, always in the direction of improvement67. The Firefly algorithm, on the other hand, is inspired by the way fireflies signal each other with light in dark environments: fireflies move toward brighter (better) individuals. By combining these two mechanisms, JFO becomes a robust optimization tool that exploits random search together with light-based guidance toward promising regions. It is particularly useful for optimizing machine learning models such as ELM, where efficient exploration of complex hyperparameter spaces is necessary68.
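The text does not specify how JFO couples its two components, so the sketch below is a hypothetical hybrid, not the authors' JFO. Each candidate first takes the standard Jaya step, X' = X + r1(X_best − |X|) − r2(X_worst − |X|). It is then attracted toward every brighter (lower-objective) individual with the firefly rule β0·exp(−γd²), receives a small random walk whose amplitude decays over time, and is kept only if it improves. A sphere function stands in for the ELM validation error.

```python
import numpy as np


def jaya_firefly_minimize(objective, lower, upper, pop=20, n_iter=150,
                          beta0=1.0, gamma=1.0, alpha0=0.05, seed=0):
    """Hypothetical Jaya + Firefly hybrid: Jaya step, firefly attraction, greedy acceptance."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    X = rng.uniform(lower, upper, size=(pop, dim))
    f = np.apply_along_axis(objective, 1, X)
    for t in range(n_iter):
        best, worst = X[np.argmin(f)].copy(), X[np.argmax(f)].copy()
        alpha = alpha0 * 0.97 ** t                      # shrinking random-walk amplitude
        for i in range(pop):
            r1, r2 = rng.random(dim), rng.random(dim)
            # Jaya update: move toward the best and away from the worst solution.
            cand = X[i] + r1 * (best - np.abs(X[i])) - r2 * (worst - np.abs(X[i]))
            # Firefly update: attraction toward each brighter (lower-objective) individual.
            for j in range(pop):
                if f[j] < f[i]:
                    attract = beta0 * np.exp(-gamma * np.sum((X[j] - cand) ** 2))
                    cand = cand + attract * (X[j] - cand)
            cand = cand + alpha * (upper - lower) * (rng.random(dim) - 0.5)
            cand = np.clip(cand, lower, upper)
            f_cand = objective(cand)
            if f_cand < f[i]:                           # keep the move only if it improves
                X[i], f[i] = cand, f_cand
    k = np.argmin(f)
    return X[k], f[k]


best_x, best_f = jaya_firefly_minimize(lambda v: float(np.sum(v ** 2)), [-5, -5], [5, 5])
```

Other couplings are equally plausible, for example alternating whole Jaya and Firefly iterations. The sequential, greedy scheme here was chosen only because it keeps both update rules visible in one loop.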
Optuna
Optuna is a state-of-the-art framework for optimizing the hyperparameters of machine learning models. Its goal is to find the optimal parameters automatically using techniques such as Bayesian optimization. Optuna first samples the parameter space randomly and then refines the search using probabilistic models of the objective, so that later trials concentrate on promising regions69. Its main characteristics are: (i) Bayesian search algorithms, which enable an intelligent search over large parameter spaces to identify the best-performing parameters70; (ii) scalability, which makes it suitable for large datasets and complex models; and (iii) support for distributed execution, which allows parameters to be optimized faster in parallel. In this study, Optuna was used to optimize the hyperparameters of the DT and LightGBM models. It was chosen for its automatic search, scalability, and Bayesian optimization methods, and for its strong track record with other machine learning models69.
Error evaluation metrics
Error evaluation metrics play a crucial role in assessing the performance and precision of predictive models71,72. These metrics assist researchers in pinpointing the strengths and weaknesses of different ML algorithms and ensemble techniques, which is vital for choosing the most appropriate model for predicting hydraulic parameters73,74. In this research, a variety of statistical error metrics, including R², RMSE, sMAPE, SI, WMAPE, MARE, RMSRE, and PBIAS are utilized to evaluate the accuracy of the Cd predictions for weirs. Table 4 summarizes each metric, its formula, and the optimal value for reference75.
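Table 4 is not reproduced here, so the formulas below are common textbook definitions and should be read as assumptions. R² is taken as the coefficient of determination. SI is RMSE divided by the observed mean. PBIAS follows the hydrological convention 100·Σ(O − P)/ΣO, where positive values indicate underestimation; sign conventions for PBIAS differ between papers.

```python
import numpy as np


def error_metrics(obs, pred):
    """Common definitions of the listed metrics (assumed; check against Table 4)."""
    o, p = np.asarray(obs, float), np.asarray(pred, float)
    err = o - p
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "R2": 1 - np.sum(err ** 2) / np.sum((o - o.mean()) ** 2),
        "RMSE": rmse,
        "sMAPE": 100 * np.mean(np.abs(err) / ((np.abs(o) + np.abs(p)) / 2)),
        "SI": rmse / o.mean(),
        "WMAPE": 100 * np.sum(np.abs(err)) / np.sum(np.abs(o)),
        "MARE": np.mean(np.abs(err) / np.abs(o)),
        "RMSRE": np.sqrt(np.mean((err / o) ** 2)),
        "PBIAS": 100 * np.sum(err) / np.sum(o),   # > 0 means under-prediction
    }


metrics = error_metrics([0.5, 0.6, 0.7], [0.5, 0.6, 0.8])  # hypothetical Cd values
```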
Performance evaluation techniques
To evaluate and compare the effectiveness of the ML models used in this research, two primary techniques were employed: the Taylor Diagram and the Performance Index (PI). These methods provide insight into model performance by integrating key statistical metrics, and both were used to compare and rank the models. In addition, the R-Factor was used to assess model uncertainty.
Taylor diagram
The Taylor Diagram is a visual tool used to assess model performance by integrating different statistical metrics, such as the Pearson Correlation Coefficient (PCC), SD, and centered RMSE (E’). This diagram is particularly useful for comparing predicted outcomes with actual data and is frequently applied in fields like hydrodynamics77.
The Taylor Diagram integrates these metrics into a single plot. The radial distance from the origin represents the model's SD, the angle measured from the x-axis equals the arccosine of the PCC, and the distance from the reference (observed) point equals E’. In this diagram:
- A model positioned closer to the reference point is more precise, as it has a smaller prediction error (E’).
- A model nearer to the x-axis has a stronger correlation with the observed values.
This diagram offers a clear and succinct visual comparison of model performance, effectively illustrating the interplay between the statistical metrics.
It is necessary to clarify that RMSE is a general metric for measuring prediction error, as it includes both bias (i.e., the difference between the mean of predicted and observed values) and the spread of the errors. In other words, RMSE reflects a combination of systematic and random errors. In contrast, E’ removes the mean from both the predicted and observed series before calculating the error. As a result, it measures only the variability of the error around the mean, effectively excluding the impact of bias.
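The geometry of the diagram rests on two exact identities, checked numerically in the sketch below. The first is E’² = σp² + σo² − 2σpσo·PCC, a law of cosines that lets SD, PCC, and E’ share one plot. The second is RMSE² = bias² + E’², which formalizes the distinction between RMSE and E’ drawn above. Population SDs (ddof = 0) are assumed, and the observed and predicted series are hypothetical.

```python
import numpy as np


def taylor_statistics(obs, pred):
    """Quantities plotted on a Taylor diagram, plus bias and RMSE for comparison."""
    o, p = np.asarray(obs, float), np.asarray(pred, float)
    sd_o, sd_p = o.std(), p.std()                                        # population SDs
    pcc = np.corrcoef(o, p)[0, 1]
    e_prime = np.sqrt(np.mean(((p - p.mean()) - (o - o.mean())) ** 2))  # centered RMSE
    return {"SD_obs": sd_o, "SD_pred": sd_p, "PCC": pcc, "E_prime": e_prime,
            "bias": p.mean() - o.mean(),
            "RMSE": np.sqrt(np.mean((p - o) ** 2)),
            "angle_deg": np.degrees(np.arccos(np.clip(pcc, -1.0, 1.0)))}


rng = np.random.default_rng(7)
obs = rng.uniform(0.5, 0.8, 60)                          # hypothetical observed Cd
pred = 0.9 * obs + 0.07 + rng.normal(0.0, 0.01, 60)      # hypothetical biased predictions
summary = taylor_statistics(obs, pred)
```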
Performance index (PI)
The Performance Index (PI) serves as a comprehensive metric for evaluating various ML models by analyzing the relationship between predicted and observed values. It merges two primary evaluation metrics—R² and RMSE. The PI is calculated by determining the ratio of R² to RMSE15:
$${\text{PI}}=\frac{{{R^2}}}{{{\text{RMSE}}}}$$
(13)
By combining these two measures, the PI offers a balanced, stable measure of model performance that allows models to be directly compared to one another. Models having larger PI values are deemed to have greater predictive accuracy and reliability.
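Eq. 13 is a simple ratio. The sketch below shows it used for ranking, with hypothetical scores for two unnamed models. The model with the slightly lower R² ranks first because its RMSE is smaller.

```python
def performance_index(r2, rmse):
    """Eq. 13: PI = R^2 / RMSE (larger is better)."""
    if rmse <= 0:
        raise ValueError("RMSE must be positive")
    return r2 / rmse


# Hypothetical (R^2, RMSE) pairs, for illustration only.
candidates = {"model_A": (0.95, 0.020), "model_B": (0.97, 0.025)}
ranking = sorted(candidates, key=lambda m: performance_index(*candidates[m]), reverse=True)
print(ranking)  # ['model_A', 'model_B']
```

Because RMSE carries the units of the target, PI values are comparable only between models evaluated on the same target and dataset. Cd is dimensionless, which keeps this simple here.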
R-Factor
The uncertainty assessment of the models in this study was conducted using the R-Factor metric. This index, commonly used in hydrological and environmental modeling, quantifies the ratio of the average width of the 95% confidence interval of the predictions to the standard deviation of the observed values. A lower R-Factor value indicates higher confidence in the predictions and less dispersion relative to the observed data. To calculate the confidence bands, a bootstrap method was applied to the residuals, effectively capturing the influence of random errors in the predictions78.
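The procedure above can be sketched as a residual bootstrap. Each replicate adds resampled residuals to the model's predictions, the 2.5th and 97.5th percentiles form the 95% band at each point, and the mean band width is divided by the SD of the observations. This is only one simple variant. The text does not say whether models were refitted within each bootstrap replicate, so that detail, the number of replicates, and the data below are assumptions.

```python
import numpy as np


def r_factor(obs, pred, n_boot=1000, seed=0):
    """Mean width of the 95% residual-bootstrap band divided by the SD of observations."""
    rng = np.random.default_rng(seed)
    o, p = np.asarray(obs, float), np.asarray(pred, float)
    residuals = o - p
    # Each bootstrap replicate adds resampled residuals to the model predictions.
    draws = rng.choice(residuals, size=(n_boot, p.size), replace=True)
    replicates = p + draws
    lower, upper = np.percentile(replicates, [2.5, 97.5], axis=0)
    return np.mean(upper - lower) / o.std(ddof=1)


rng = np.random.default_rng(8)
obs = rng.uniform(0.5, 0.8, 60)            # hypothetical observed Cd
noise = rng.normal(0.0, 0.01, 60)
rf_small = r_factor(obs, obs - 0.1 * noise)
rf_large = r_factor(obs, obs - noise)      # ten times larger residuals
```

Scaling the residuals by a factor scales the R-Factor by the same factor, and perfect predictions give an R-Factor of zero.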
