Using all the data required for the analysis, both multiple linear regression analysis (MLRA) and multiple non-linear regression analysis (MNLRA) are performed. The nondimensional dependent variables considered in this study are the maximum equilibrium scour depth \(\left({d}_{s}/a\right)\), the distance to the maximum scour depth from the end of the rigid apron \(\left({x}_{0}/a\right)\), the height of the dune \(\left({d}_{d}/a\right)\) and the distance to the maximum height of the dune crest from the end of the rigid apron \(\left({x}_{d}/a\right)\). From the Gamma test, the densimetric Froude number (F_{d}), apron length (L), tail water level (d_{t}) and median sediment size (D_{50}) are found to influence these four dependent parameters. For modelling, the apron length (L), tail water level (d_{t}) and median sediment size (D_{50}) are made dimensionless by dividing them by the height of the gate opening (a), giving L/a, d_{t}/a and D_{50}/a, respectively. The variation of the dependent parameters with the input parameters is shown in Fig. 4a–d. The present results show a decreasing trend of all dependent parameters with apron length (L/a), as shown in Fig. 4a, which is consistent with the negative L/a coefficients and exponents in Eqs. (8)–(15). This trend is attributable to the dissipation of the jet energy as it travels over the apron: longer aprons dissipate more energy and reduce the erosive capacity of the jet. In contrast, rising trends are visible for the variation of the dependent parameters with tail water level (d_{t}/a), densimetric Froude number (F_{d}) and median sediment size (D_{50}/a), as shown in Fig. 4.

A number of single-regression models for the maximum equilibrium scour depth \(\left({d}_{s}/a\right)\), the distance to the maximum scour depth from the end of the rigid apron \(\left({x}_{0}/a\right)\), the height of the dune \(\left({d}_{d}/a\right)\) and the distance to the maximum height of the dune crest from the end of the rigid apron \(\left({x}_{d}/a\right)\) are established with different input parameters. After rigorous study, the best model for each pair (dependent vs independent) with a high coefficient of determination R^{2} is identified. Multiple regression analysis is then performed, and two equations (one linear and one nonlinear) are obtained for each dependent parameter, as provided below.

### Resulting equations from multiple linear regression analysis (MLRA)

MLRA for maximum equilibrium scour depth \(\left({d}_{s}/a\right)\)

$$\frac{{d}_{s}}{a}=0.293\left({d}_{t}/a\right)-0.055\left(L/a\right)+7.489\left({D}_{50}/a\right)+0.270{F}_{d}-1.437,$$

(8)

MLRA for distance to maximum scour depth from the end of rigid apron \(\left({x}_{0}/a\right)\)

$$\frac{{x}_{0}}{a}=2.295\left({d}_{t}/a\right)-0.371\left(L/a\right)+58.996\left({D}_{50}/a\right)+1.807{F}_{d}-12.317,$$

(9)

MLRA for distance to maximum height of dune crest from the end of rigid apron \(\left({x}_{d}/a\right)\)

$$\frac{{x}_{d}}{a}=2.61\left({d}_{t}/a\right)-0.433\left(L/a\right)+83.677\left({D}_{50}/a\right)+2.204{F}_{d}-13.661.$$

(10)

MLRA for the height of dune \(\left({d}_{d}/a\right)\)

$$\frac{{d}_{d}}{a}=0.279\left({d}_{t}/a\right)-0.062\left(L/a\right)+1.476\left({D}_{50}/a\right)+0.071{F}_{d}-0.573.$$

(11)
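As an illustration, the linear equations above can be evaluated with a short Python sketch. The coefficients are transcribed from Eqs. (8)–(11); the function name, the dictionary keys and the input values in the usage note are hypothetical and are not taken from the experimental dataset.

```python
# Coefficients transcribed from Eqs. (8)-(11).
# Order within each tuple: (d_t/a, L/a, D50/a, F_d, intercept).
MLRA_COEFFS = {
    "ds_a": (0.293, -0.055, 7.489, 0.270, -1.437),     # Eq. (8)
    "x0_a": (2.295, -0.371, 58.996, 1.807, -12.317),   # Eq. (9)
    "xd_a": (2.61, -0.433, 83.677, 2.204, -13.661),    # Eq. (10)
    "dd_a": (0.279, -0.062, 1.476, 0.071, -0.573),     # Eq. (11)
}


def mlra_predict(target, L_a, dt_a, d50_a, Fd):
    """Evaluate one MLRA equation for the given dimensionless inputs.

    target is one of the (hypothetical) keys of MLRA_COEFFS.
    """
    c_dt, c_L, c_d50, c_Fd, intercept = MLRA_COEFFS[target]
    return c_dt * dt_a + c_L * L_a + c_d50 * d50_a + c_Fd * Fd + intercept
```

For example, `mlra_predict("ds_a", L_a=20, dt_a=10, d50_a=0.1, Fd=10)` evaluates Eq. (8) for one illustrative combination of inputs. Because the L/a coefficient is negative in all four equations, increasing `L_a` with the other inputs fixed always lowers the prediction.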

### Resulting equations from multiple non-linear regression analysis (MNLRA)

MNLRA for maximum equilibrium scour depth \(\left({d}_{s}/a\right)\)

$$\frac{{d}_{s}}{a}=6.395 {\left(\frac{L}{a}\right)}^{-1.378}{\left(\frac{{d}_{t}}{a}\right)}^{0.90}{\left(\frac{{D}_{50}}{a}\right)}^{0.465}{\left({F}_{d}\right)}^{1.469}.$$

(12)

MNLRA for distance to maximum scour depth from the end of rigid apron \(\left({x}_{0}/a\right)\)

$$\frac{{x}_{0}}{a}=36.03 {\left(\frac{L}{a}\right)}^{-1.199}{\left(\frac{{d}_{t}}{a}\right)}^{0.828}{\left(\frac{{D}_{50}}{a}\right)}^{0.473}{\left({F}_{d}\right)}^{1.393}.$$

(13)

MNLRA for the distance to maximum height of dune crest from the end of rigid apron \(\left({x}_{d}/a\right)\)

$$\frac{{x}_{d}}{a}=40.94 {\left(\frac{L}{a}\right)}^{-1.04}{\left(\frac{{d}_{t}}{a}\right)}^{0.753}{\left(\frac{{D}_{50}}{a}\right)}^{0.466}{\left({F}_{d}\right)}^{1.258}.$$

(14)

MNLRA for the height of dune \(\left({d}_{d}/a\right)\)

$$\frac{{d}_{d}}{a}=18.395 {\left(\frac{L}{a}\right)}^{-1.534}{\left(\frac{{d}_{t}}{a}\right)}^{1.08}{\left(\frac{{D}_{50}}{a}\right)}^{0.703}{\left({F}_{d}\right)}^{1.022}.$$

(15)
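The power-law form of Eqs. (12)–(15) becomes linear after taking logarithms, which is one common way such exponents are fitted; whether the present study used this route is not stated, so the sketch below should be read only as an illustration of the equations themselves. The function names and dictionary keys are hypothetical.

```python
import math

# Parameters transcribed from Eqs. (12)-(15).
# Order within each tuple: (multiplier, exponent of L/a, of d_t/a, of D50/a, of F_d).
MNLRA_PARAMS = {
    "ds_a": (6.395, -1.378, 0.90, 0.465, 1.469),    # Eq. (12)
    "x0_a": (36.03, -1.199, 0.828, 0.473, 1.393),   # Eq. (13)
    "xd_a": (40.94, -1.04, 0.753, 0.466, 1.258),    # Eq. (14)
    "dd_a": (18.395, -1.534, 1.08, 0.703, 1.022),   # Eq. (15)
}


def mnlra_predict(target, L_a, dt_a, d50_a, Fd):
    """Evaluate one power-law equation; all inputs must be positive."""
    k, p_L, p_dt, p_d50, p_Fd = MNLRA_PARAMS[target]
    return k * L_a**p_L * dt_a**p_dt * d50_a**p_d50 * Fd**p_Fd


def mnlra_log_predict(target, L_a, dt_a, d50_a, Fd):
    """Same equation in log space: ln(y) is linear in the logs of the inputs."""
    k, p_L, p_dt, p_d50, p_Fd = MNLRA_PARAMS[target]
    return (math.log(k) + p_L * math.log(L_a) + p_dt * math.log(dt_a)
            + p_d50 * math.log(d50_a) + p_Fd * math.log(Fd))
```

In this form each exponent reads directly as a sensitivity: doubling L/a with the other inputs fixed multiplies the predicted \({d}_{s}/a\) by \(2^{-1.378}\), that is, it reduces it to roughly 38% of its previous value.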

The coefficients of determination (R^{2}) of the multiple linear regression equations are 0.56 for \(\frac{{d}_{s}}{a}\), 0.66 for \(\frac{{x}_{0}}{a}\), 0.67 for \(\frac{{x}_{d}}{a}\) and 0.56 for \(\frac{{d}_{d}}{a}\); those of the multiple nonlinear regression equations are 0.66 for \(\frac{{d}_{s}}{a}\), 0.74 for \(\frac{{x}_{0}}{a}\), 0.74 for \(\frac{{x}_{d}}{a}\) and 0.54 for \(\frac{{d}_{d}}{a}\). R^{2} measures the proportion of the total variance that is explained by the independent variables. Further, two machine learning approaches, ANN-PSO and GEP, are applied to model these four parameters \({d}_{s}/a\), \({x}_{0}/a\), \({d}_{d}/a\) and \({x}_{d}/a\).

#### Model development using ANN-PSO

In the ANN-PSO modelling, several trials were performed and the coefficients C_{1} and C_{2} were fixed at 1 and 2.5, 2 and 2.5, 1.5 and 2.5, and 1.5 and 2.5 for \({d}_{s}/a\), \({x}_{0}/a\), \({d}_{d}/a\) and \({x}_{d}/a\), respectively. The error analysis results for the training data, the testing data and the entire dataset were analysed for various swarm sizes and numbers of neurons (N) for each dependent parameter. It was observed that, as the swarm size increases with the same values of C_{1} and C_{2} and a constant number of neurons, the values of R^{2}, E and I_{d} decrease, while the value of RMSE increases.
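The role of C_{1} and C_{2} can be illustrated with a minimal particle swarm sketch in Python. It assumes the standard velocity update, in which C_{1} and C_{2} act as the cognitive and social acceleration coefficients; the inertia weight `w`, the search bound, the iteration count and the quadratic stand-in objective are illustrative assumptions and not settings from this study. In ANN-PSO the objective would instead be the network training error as a function of the network weights.

```python
import random


def sphere(p):
    """Stand-in objective (assumption); ANN-PSO would use the training error."""
    return sum(x * x for x in p)


def pso_minimize(objective, dim, swarm_size=20, iters=100,
                 c1=1.0, c2=2.5, w=0.7, bound=5.0, seed=0):
    """Minimal global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(swarm_size), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far

    for _ in range(iters):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + cognitive pull (c1) + social pull (c2)
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

A call such as `pso_minimize(sphere, dim=3, swarm_size=30, c1=1.0, c2=2.5)` mirrors the kind of trial described above, in which the swarm size is varied while C_{1} and C_{2} are held fixed.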

#### Model development using GEP

This section describes the model development for the four dependent parameters using the GEP approach. The GEP expressions are derived using all four independent input parameters (L/a, d_{t}/a, D_{50}/a, F_{d}), and the GeneXproTools 5.0 software package is used for the analysis. Using normalized data, four attempts were made, varying the chromosome number, fitness function and number of runs, to model the wall jet scouring. Table 6 shows the corresponding parameters of the optimized GEP model.

The expression trees for the models of \({d}_{s}/a\), \({x}_{0}/a\), \({x}_{d}/a\) and \({d}_{d}/a\) are presented in Fig. 5a–d, respectively. In these expression trees, *d*_{0} = *L*/*a*, *d*_{1} = *d*_{t}/*a*, *d*_{2} = *D*_{50}/*a* and *d*_{3} = *F*_{d}. In Sub-ETs 1, 2 and 3 (Fig. 5a), C_{7} and C_{9} are constants with values of 3.45 and −5.56, respectively, for the model of \({d}_{s}/a\). In Sub-ET 1 (Fig. 5b), C_{2} is a constant with a value of −8.93 for the model of \({x}_{0}/a\). In Sub-ETs 2 and 3 (Fig. 5c), C_{4} and C_{7} are constants with values of 2.971 and −0.376, respectively, for the model of \({x}_{d}/a\). In Sub-ETs 1 and 3 (Fig. 5d), C_{4} is a constant with values of −3.114 and 3.145, respectively, for the model of \({d}_{d}/a\). The equations derived from the expression trees are presented in Eqs. (16)–(19).

Expression for \({d}_{s}/a\):

$$\frac{{d}_{s}}{a}=\left[\frac{\left(\frac{{d}_{t}}{a}\right)}{\left({e}^{\frac{(L/a)}{3.45}}\right)+\left[\left(\frac{L}{a}\times \frac{{d}_{t}}{a}\right)-\frac{{d}_{t}}{a}\right]}\right]+\left[\frac{\left(\frac{{d}_{t}}{a}\right)}{2.187+\left[\left(\frac{L}{a}\times 3.45\right)-\frac{{d}_{t}}{a}\right]}\right]+\left[\frac{\left(\frac{{d}_{t}}{a}\right)}{{\left({3.45}^{\frac{L}{a}}\right)}^{\frac{{d}_{t}}{a}}+\left[\left(5.56+{F}_{d}\right)-\left(\frac{{D}_{50}}{a}\right)\right]}\right].$$

(16)

Expression for \({x}_{0}/a\):

$$\frac{{x}_{0}}{a}={e}^{\left(\frac{\text{ln}\left(\frac{{d}_{t}}{a}\right)}{{e}^{{\left(\frac{L}{a}\right)}^{{d}_{t}/a}}+\left(\frac{L}{a}-8.93\right)}\right)}+{e}^{\left(\left({\text{ln}}\left(2\times \frac{{D}_{50}}{a}\right)-\frac{\left(\frac{{d}_{t}}{a}\right)}{{F}_{d}}\right)\times \frac{L}{a}\right)}+{\text{ln}}\left(\frac{{d}_{t}}{a}\right).$$

(17)

Expression for \({x}_{d}/a\):

$$\frac{{x}_{d}}{a}={\text{ln}}\left(\frac{{d}_{t}}{a}\right)+\left[\left(\frac{{\left(\frac{{F}_{d}}{\left(\frac{{D}_{50}}{a}\right)}\right)}^{\left(\frac{L}{a}+\frac{{d}_{t}}{a}\right)}}{{{F}_{d}}^{2.971}+\frac{L}{a}}\right)\times \left(\frac{{D}_{50}}{a}\right)\right]+\left[1\times {\left(\frac{{e}^{\frac{{D}_{50}}{a}}}{\left(\frac{L}{a}\right)}\right)}^{\left(\frac{{D}_{50}}{a}-0.376\right)}\right].$$

(18)

Expression for \({d}_{d}/a\):

$$\frac{{d}_{d}}{a}=\left[\left(\frac{{D}_{50}}{a}\times \frac{{d}_{t}}{a}\right)\times \left(-3.114\times {F}_{d}\right)\times 9.697\times {\text{ln}}\left(\frac{L}{a}\right)\right]+\left(\frac{{D}_{50}}{a}\times \frac{{d}_{t}}{a}\right)\times {\left({F}_{d}\times \frac{L}{a}\right)}^{{F}_{d}}+\left[\left(\left(\left(\frac{{d}_{t}}{a}\times 3.145\right)\times {\text{ln}}\left(\frac{L}{a}\right)\right)\times {\left(\frac{{D}_{50}}{a}\right)}^{2}\times 3.145\right)\times {F}_{d}\right].$$

(19)
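To show how such an expression is evaluated in practice, Eq. (19) can be transcribed term by term into Python. This is a sketch only: the function name is hypothetical, and because the GEP models were built on normalized data, the inputs are assumed to be supplied in the same normalized form (the normalization scheme is not reproduced here).

```python
import math


def gep_dd_a(L_a, dt_a, d50_a, Fd):
    """Term-by-term transcription of Eq. (19) for d_d/a.

    Inputs are assumed positive (ln(L/a) must be defined) and normalized
    in the same way as the data used to build the GEP model.
    """
    ln_L = math.log(L_a)
    # First bracket: (D50/a * d_t/a) * (-3.114 * F_d) * 9.697 * ln(L/a)
    term1 = (d50_a * dt_a) * (-3.114 * Fd) * 9.697 * ln_L
    # Middle term: (D50/a * d_t/a) * (F_d * L/a)^F_d
    term2 = (d50_a * dt_a) * (Fd * L_a) ** Fd
    # Last bracket: ((d_t/a * 3.145) * ln(L/a)) * (D50/a)^2 * 3.145 * F_d
    term3 = ((dt_a * 3.145) * ln_L) * d50_a ** 2 * 3.145 * Fd
    return term1 + term2 + term3
```

A simple consistency check is the case L/a = 1 and F_{d} = 1: the two logarithmic terms vanish and the expression reduces to the product of D_{50}/a and d_{t}/a.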

Figure 6a–d shows the relationship between the observed and predicted values for the models of \({d}_{s}/a\), \({x}_{0}/a\), \({x}_{d}/a\) and \({d}_{d}/a\), respectively. The ANN-PSO predictions agree well with the observed values for all four models, whereas GEP gives unsatisfactory results in the present study.

#### Performance of uncertainty and reliability analysis

To perform a comprehensive statistical assessment of the proposed models, two indices, namely the confidence interval (U95) and the reliability index (RI), are computed. The resulting evaluation of the predictive capability and robustness of the present models is presented in Table 7.

Table 7 shows the confidence interval (U95) and reliability index (RI) of MLRA, MNLRA, ANN-PSO and GEP in predicting \({d}_{s}/a\), \({x}_{0}/a\), \({x}_{d}/a\) and \({d}_{d}/a\). The ANN-PSO model gives the lowest values of the confidence interval (U95), i.e., 0.383, 2.539, 2.805 and 0.268, compared with MLRA (0.402, 2.604, 3.101 and 0.293), MNLRA (0.415, 2.598, 3.063 and 0.301) and GEP (0.483, 28.800, 19.276 and 0.409) for predicting \({d}_{s}/a\), \({x}_{0}/a\), \({x}_{d}/a\) and \({d}_{d}/a\), respectively. Additionally, the predictions of \({d}_{s}/a\), \({x}_{0}/a\), \({x}_{d}/a\) and \({d}_{d}/a\) provided by ANN-PSO are more reliable (RI = 0.573, 0.591, 0.576 and 0.548) than those of the other models. MNLRA is slightly less reliable (RI = 0.521, 0.545, 0.570 and 0.497) than ANN-PSO in predicting \({d}_{s}/a\), \({x}_{0}/a\), \({x}_{d}/a\) and \({d}_{d}/a\). GEP shows wider confidence intervals (U95) and a lower reliability index, indicating higher uncertainty and a less reliable model for these four parameters. This analysis suggests that ANN-PSO provides a more consistent and reliable model for the prediction of \({d}_{s}/a\), \({x}_{0}/a\), \({x}_{d}/a\) and \({d}_{d}/a\).

#### Statistical error analysis

This section illustrates the performance of the two soft-computing models and the two multiple regression models in predicting \({d}_{s}/a\), \({x}_{0}/a\), \({x}_{d}/a\) and \({d}_{d}/a\). To assess the strength of the present approaches, seven statistical indices are considered: the error measures MAE, MAPE, MSE and root mean square error (RMSE), the coefficient of determination (R^{2}), and two relative indices, E and I_{d}^{38,49,50,51,52}. The values of MAE, MAPE, MSE, RMSE, R^{2}, E and I_{d} computed for all the present models are given in Table 8.
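The seven indices can be computed from paired observed and predicted values as in the following Python sketch. The definitions of E and I_{d} are not restated in this section, so the sketch assumes that E is the Nash–Sutcliffe efficiency and I_{d} is Willmott's index of agreement, and it takes R^{2} as the squared Pearson correlation; if the study uses different definitions, the corresponding lines would need to change. The function name is hypothetical.

```python
import math


def error_indices(obs, pred):
    """Return MAE, MAPE (%), MSE, RMSE, R2, E and Id for paired samples.

    Assumes obs contains no zeros (for MAPE) and that neither series
    is constant (for R2, E and Id).
    """
    n = len(obs)
    o_mean = sum(obs) / n
    p_mean = sum(pred) / n
    err = [o - p for o, p in zip(obs, pred)]
    sse = sum(e * e for e in err)                       # sum of squared errors

    mae = sum(abs(e) for e in err) / n
    mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(err, obs)) / n
    mse = sse / n
    rmse = math.sqrt(mse)

    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(obs, pred))
    var_o = sum((o - o_mean) ** 2 for o in obs)
    var_p = sum((p - p_mean) ** 2 for p in pred)
    r2 = cov ** 2 / (var_o * var_p)                     # squared Pearson r (assumed)

    e_ns = 1.0 - sse / var_o                            # Nash-Sutcliffe (assumed)
    denom = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2
                for o, p in zip(obs, pred))
    i_d = 1.0 - sse / denom                             # index of agreement (assumed)

    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse,
            "R2": r2, "E": e_ns, "Id": i_d}
```

Under these definitions a constant bias leaves R^{2} at 1 but lowers E and I_{d}, which is why a model can rank differently on R^{2} than on the error-based indices.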

From Table 8, it is found that for both \(\left({d}_{s}/a\right)\) and \(\left({x}_{0}/a\right)\), the error indices MAE, MAPE, MSE and RMSE are lower for MLRA and MNLRA than for ANN-PSO and GEP. For both \({x}_{d}/a\) and \({d}_{d}/a\), however, these error indices are lower for ANN-PSO than for MLRA, MNLRA and GEP. The R^{2} value is higher for the ANN-PSO model for all predicted parameters. The E and I_{d} values are also close to 1 for the ANN-PSO models for three of the four predicted parameters, the exception being \(\left({x}_{0}/a\right)\), for which the E and I_{d} values are close to 1 for the MLRA model. Considering all the statistical parameters together, the ANN-PSO model performs better than the other regression and soft-computing techniques presented (Table 8).