### Compressive strength

Table 1 lists the optimal hyperparameter values of each model after hyperparameter optimization, together with the definition of each parameter; the LR model has no hyperparameters. Table 2 shows the three evaluation indicators (R^{2}, RMSE, and MAE) of the four models after hyperparameter optimization on the training and test sets. The R^{2} values of the LR model on the training and test sets were 0.72 and 0.70, respectively, the lowest among the four models, while its RMSE and MAE on both sets were the largest. These results suggest that the LR model was underfitting. The R^{2} of the KNN model was considerably higher than that of the LR model, and its RMSE and MAE values were considerably lower.
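The three evaluation indicators can be computed directly from paired actual and predicted values. The following is a minimal sketch in plain Python; the strength values are hypothetical and serve only to illustrate the formulas, not to reproduce Table 2.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical compressive strengths in MPa (illustrative, not from the study).
y_true = [40.0, 55.0, 62.0, 48.0, 70.0]
y_pred = [42.0, 53.0, 60.0, 50.0, 68.0]

r2_value = r2_score(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
mae_value = mae(y_true, y_pred)
print(f"R2={r2_value:.3f}  RMSE={rmse_value:.2f}  MAE={mae_value:.2f}")
```

Computing the same three indicators separately on the training and test sets gives the paired values that the comparison between models relies on.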

The RF and XGB models were superior to the LR and KNN models. The RF model had the best generalization ability: its test-set R^{2} of 0.92 was the highest among the four models, with an R^{2} of 0.99 on the training set. The XGB model exhibited signs of overfitting, evident from its R^{2} of 1.00 on the training set and 0.89 on the test set. As such, the XGB model was deemed inappropriate under the division of the compressive strength data set in the present study.

The R^{2} comparison of the four models on the training and test sets (Table 2) shows that the R^{2} values of the LR and KNN models were small on both sets; both models exhibited underfitting in compressive strength prediction. The LR model had reached its limit because it has no hyperparameters to optimize, so its accuracy in predicting the compressive strength of ECC could not be improved further. Although the KNN model’s prediction accuracy for compressive strength did not match that of the XGB and RF models, it offered simplicity, with only one hyperparameter and therefore a faster training speed, and it still demonstrated acceptable performance on the test set. Overall, in terms of compressive strength prediction accuracy, the RF model was the top performer.
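The underfitting and overfitting judgments above rest on comparing R^{2} on the training and test sets. A minimal sketch of such a rule follows. The thresholds `min_train_r2` and `max_gap` are hypothetical values chosen here only so that the rule reproduces the labels given in the text; they are not taken from the study.

```python
def diagnose_fit(r2_train, r2_test, min_train_r2=0.80, max_gap=0.10):
    """Label a model from its train/test R2 pair.

    min_train_r2 and max_gap are assumed, illustrative thresholds.
    """
    if r2_test > r2_train:
        # Test score above training score: the split deserves a closer look.
        return "test > train (check the data split)"
    if r2_train < min_train_r2:
        return "underfitting"
    if r2_train - r2_test > max_gap:
        return "overfitting"
    return "acceptable"

# R2 pairs (train, test) quoted in the text for compressive strength.
reported = {"LR": (0.72, 0.70), "RF": (0.99, 0.92), "XGB": (1.00, 0.89)}

labels = {name: diagnose_fit(*pair) for name, pair in reported.items()}
for name, label in labels.items():
    print(name, label)
```

A fixed gap threshold is a simplification; in practice the judgment also depends on the data set size and on the noise level of the target.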

Figure 7 shows the predicted and actual compressive strength values on the test set. In the order LR, KNN, XGB, and RF, the predicted values clustered progressively more tightly around the straight line, indicating that the RF model generalized best on the compressive strength test set.

While ensemble learning methods often exhibit exceptional accuracy, they lack transparency in explaining their output, leading to what is known as the “black box” of machine learning. To address this challenge, the SHapley Additive exPlanations (SHAP) approach was introduced. SHAP is a mathematical framework developed to elucidate the underlying prediction mechanisms of machine learning models. The method builds on Shapley values from cooperative game theory and was proposed by Lundberg and Lee^{71}.
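To make the game-theoretic idea concrete, the sketch below computes exact Shapley values by enumerating every coalition of features. It is an illustration only: the two-feature linear `toy_model`, its coefficients, and the baseline and sample values are all hypothetical, and practical SHAP implementations use far more efficient algorithms than brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values: weighted average of each feature's marginal
    contribution over all coalitions of the remaining features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi

def toy_model(w, sf):
    """Hypothetical strength model: water-cement ratio w, silica fume sf."""
    return 50.0 - 40.0 * w + 30.0 * sf

baseline = (0.40, 0.05)  # assumed reference mix
sample = (0.30, 0.10)    # assumed mix to explain

def value_fn(coalition):
    # Features in the coalition take the sample value, the rest the baseline.
    mixed = [sample[j] if j in coalition else baseline[j] for j in range(2)]
    return toy_model(*mixed)

phi = shapley_values(value_fn, 2)
print(phi)
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that gives SHAP its name.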

Figure 8 shows the results of the feature importance analysis of the RF model (the optimal model). The SHAP value of a feature indicates its contribution to the target value. The closer the SHAP value is to 0, the smaller the impact of the feature on the target value; the farther it is from 0, the larger the contribution. Fig. 8 shows that the water-cement ratio, silica fume, and water reducer contributed the most to the prediction of compressive strength, whereas the size of the compressive specimen, PVA fiber, and CGP contributed the least. Although the feature importance in Fig. 8 illustrates the magnitude of each feature’s contribution, it cannot indicate whether the feature had a positive or negative impact on the prediction results. The Global SHAP values in Fig. 9, however, show how each feature positively or negatively affected the compressive strength. In Fig. 9, each point represents a sample, with red representing a high feature value and blue representing a low feature value. Taking ‘W’ as an example, high feature values (red) made a negative contribution to the model output, and low feature values (blue) made a positive contribution. A lower feature value (blue) therefore produced a higher SHAP value, indicating that the water-cement ratio had a negative impact on compressive strength. For silica fume, by contrast, high feature values (red) contributed positively to the model output, and low feature values (blue) contributed negatively. A higher feature value (red) produced a higher SHAP value, indicating that silica fume positively affected compressive strength. The other features can be interpreted in the same way.
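The two readings described above can be expressed numerically: the importance of a feature as the mean absolute SHAP value over the samples, and the direction of its effect as the sign of the correlation between the feature value and its SHAP value. The sketch below assumes this common convention; the feature and SHAP values are hypothetical and are not taken from Fig. 8 or Fig. 9.

```python
from math import sqrt

def mean_abs(values):
    """Global importance: average magnitude of the SHAP values."""
    return sum(abs(v) for v in values) / len(values)

def pearson(xs, ys):
    """Pearson correlation between feature values and SHAP values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-sample feature values and SHAP values (illustrative only).
features = {"W": [0.25, 0.35, 0.45, 0.55], "SF": [0.00, 0.05, 0.10, 0.15]}
shap_vals = {"W": [6.0, 2.0, -3.0, -5.0], "SF": [-3.0, -1.0, 1.0, 3.0]}

importance = {name: mean_abs(v) for name, v in shap_vals.items()}
direction = {name: pearson(features[name], shap_vals[name]) for name in features}

for name in features:
    sign = "positive" if direction[name] > 0 else "negative"
    print(f"{name}: importance={importance[name]:.2f}, effect={sign}")
```

A correlation sign summarizes only monotonic trends; for features with non-monotonic effects, the point cloud itself remains the more reliable guide.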

SHAP provides both local (individual) and global interpretability, which traditional variable importance algorithms do not. Fig. 10 depicts the SHAP values of a single randomly selected sample. Blue and red represent negative and positive contributions, respectively. Each feature is represented by a bar whose length indicates its contribution to the prediction result: red bars mark features that increase the predicted value, while blue bars mark features that decrease it. In this specific example, quartz sand had the longest red bar, indicating that it contributed the most to increasing the predicted compressive strength, while silica fume had the longest blue bar, indicating that it contributed the most to decreasing it.

SHAP can also produce partial dependence plots, which differ from traditional ones by using SHAP values on the y-axis rather than the target value. Figure 11 shows the SHAP partial dependence plots. Compressive strength was positively correlated with silica fume and the water-reducing agent and negatively correlated with the water-cement ratio. The relationship between silica fume and compressive strength was almost linear, whereas the relationships of the water-cement ratio and the water-reducing agent with compressive strength were approximately logarithmic. Fig. 11b shows that when W < 0.5, the SHAP value decreased as W increased, and the compressive strength was highly sensitive to changes in W. When W > 0.5, although W was still inversely correlated with compressive strength, the sensitivity was significantly weaker than for W < 0.5. According to Fig. 11c, when WR < 0.1, the SHAP value increased as WR increased, and the compressive strength was highly sensitive to changes in WR; when WR > 0.1, the sensitivity of compressive strength to WR was significantly weakened.
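The change in sensitivity on either side of a threshold can be quantified by fitting a least-squares slope to the SHAP dependence points in each regime. The sketch below is one possible way to do this; the (W, SHAP) pairs are hypothetical and are not read from Fig. 11.

```python
def ls_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def sensitivity_by_regime(xs, shap, threshold):
    """Slopes of the SHAP dependence curve below and at/above the threshold."""
    low = [(x, s) for x, s in zip(xs, shap) if x < threshold]
    high = [(x, s) for x, s in zip(xs, shap) if x >= threshold]
    return ls_slope(*zip(*low)), ls_slope(*zip(*high))

# Hypothetical water-cement ratios and their SHAP values (illustrative only).
w_values = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
w_shap = [8.0, 4.0, 0.0, -2.0, -2.5, -3.0]

slope_low, slope_high = sensitivity_by_regime(w_values, w_shap, threshold=0.5)
print(f"slope for W < 0.5: {slope_low:.1f}, slope for W >= 0.5: {slope_high:.1f}")
```

Both slopes are negative here, matching an inverse relationship throughout, while the much smaller magnitude above the threshold reflects the weaker sensitivity.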

### Flexural strength

Table 3 shows the optimal hyperparameter values of each model after hyperparameter optimization. Table 4 shows the evaluation indicators of the four models on the training and test sets. The R^{2} values of the LR and KNN models on the training set were smaller than those on the test set, which suggests the presence of data leakage, potentially leading to overly optimistic results^{72}. The RF and XGB models performed better, with R^{2} of 0.97 on the training set and R^{2} of 0.91 on the test set. The RMSE and MAE of the RF model were 1.16 and 0.80 on the training set and 2.32 and 1.62 on the test set, smaller than those of the XGB model.

Figure 12 shows the predicted and actual flexural strength values on the test set. All four models gave good prediction results on the test set, but because the LR and KNN models showed signs of data leakage, their predictions could not be considered reliable. Thus, only the generalization ability of the RF and XGB models on the test set was deemed practical, and the RF model performed best in predicting flexural strength.

In summary, under the premise of the present study’s flexural strength data set, the LR and KNN models exhibited data leakage, and thus were not feasible for ECC flexural strength prediction. The XGB and RF models demonstrated exceptional prediction accuracy for the flexural strength of ECC, effectively addressing a gap in the literature where previous studies did not predict the flexural strength of ECC.

Figure 13 shows the feature importance analysis of the RF model for flexural strength. It indicates that PE fiber, water-cement ratio, and fiber aspect ratio contributed the most to the prediction of flexural strength. In contrast, other fiber types contributed the least to the prediction of flexural strength.

Figure 14 shows that PE fiber positively affected flexural strength, while the water-cement ratio had a negative effect. Other characteristics can also be analyzed according to the Global SHAP value. Figure 15 shows the SHAP values of a single randomly selected sample. The longest red bar was coal gangue powder, meaning that, for this specific sample, coal gangue powder increased the predicted flexural strength. The longest blue bar was fiber content, meaning that, for this particular sample, fiber content reduced the predicted flexural strength.

In Fig. 16, the SHAP partial dependence plots reveal several key observations regarding the relationship between input features and flexural strength. Flexural strength exhibited a positive correlation with fiber aspect ratio and the presence of PE fiber, and a negative correlation with the water-cement ratio. The relationship between fiber aspect ratio and flexural strength was roughly linear, with flexural strength increasing at higher fiber aspect ratios, whereas the relationship between water-cement ratio and flexural strength was almost logarithmic. Fig. 16b shows that when W < 0.75, the SHAP value decreased rapidly as W increased, indicating that the flexural strength was highly sensitive to changes in W in this range. When W > 0.75, although W was still inversely correlated with flexural strength, the sensitivity had almost disappeared.

### Tensile strength

Table 5 lists the optimal hyperparameter values of each model after hyperparameter optimization. Table 6 shows the evaluation indicator values of the four models on the training and test sets. The R^{2} values of the LR model on the training and test sets were only 0.71 and 0.63, which shows that the LR model could not predict the tensile strength well. The R^{2} values of the KNN model on the training and test sets were 0.89 and 0.75, which indicates overfitting. The R^{2} values of the RF model on the training and test sets were 0.96 and 0.84; although RF exhibited slight overfitting, the prediction accuracy on the test set was relatively good. The R^{2} values of the XGB model on the training and test sets were 0.97 and 0.87, the largest among the four models, and its RMSE and MAE on the test set were only 1.04 and 0.68, the smallest among the four models.

From the collected data sets, it is evident that the variability of tensile strength in ECC surpassed that of compressive strength. In Table 6, it is apparent that both LR and KNN models exhibited underfitting. In the dataset used in the present study, the LR and KNN models failed to accurately predict the tensile strength of ECC. This could be attributed to the models’ simplicity and the considerable variability in tensile strength test values. As such, machine learning struggled to fully capture the underlying relationships within the data, necessitating improvements in model complexity. This is why RF and XGB models achieved more accurate predictions. Despite their increased complexity, tree models yielded superior prediction results.

Figure 17 shows the predicted and actual tensile strength values on the test set. The RF and XGB data were considerably aggregated near the line, indicating exceptional generalization ability of the two models for tensile strength. Comparatively, the XGB model was superior.

Figure 18 shows the feature importance analysis of the XGB model for tensile strength. PE fiber, water-cement ratio, and water reducer contributed the most to predicting tensile strength, while other fibers contributed minimally.

Figure 19 shows that PE fiber positively affected tensile strength, while the water-cement ratio had a negative effect. Other characteristics could also be analyzed according to the Global SHAP value. Figure 20 shows the SHAP value of a random single sample of the training model. The most extensive red bar was slag. For this specific sample, the slag would increase the tensile strength. The most extensive blue bar was PE fiber. In this particular sample, PE fiber would reduce the tensile strength.

Figure 21 shows the SHAP partial dependence plots. The tensile strength was positively correlated with the water-reducing agent and the presence of PE fiber, and negatively correlated with the water-cement ratio. In addition, the water-cement ratio and water-reducing agent were almost logarithmically related to the tensile strength. Figure 21a shows that when W < 0.5, the SHAP value decreased rapidly as W increased, indicating that the tensile strength was highly sensitive to changes in W in this range. When W > 0.5, although W was still inversely related to the tensile strength, the sensitivity became considerably weaker. Fig. 21b shows that when WR < 2, the SHAP value increased rapidly as WR increased, indicating that the tensile strength was highly sensitive to changes in WR in this range. When WR > 2, the sensitivity of tensile strength to changes in WR became significantly weaker.

### Tensile strain capacity

Table 7 lists the optimal hyperparameter values of each model after hyperparameter optimization. Table 8 shows the evaluation indicator values of the four models on the training and test sets. The LR model was the weakest, with R^{2} values of only 0.63 and 0.61 on the training and test sets. The R^{2} values for the KNN model were 0.85 and 0.70 on the training and test sets, respectively, indicating a slight degree of overfitting. However, the model’s generalization ability on the test set remained moderate. The R^{2} of the RF model reached 0.92 on the training set, but was only 0.72 on the test set, indicating overfitting of the model. The XGB model was the best, although slight overfitting existed: its R^{2} values were 0.95 and 0.80 on the training and test sets. As shown in previous research^{73}, the variation range of tensile strain capacity in the test was [± 0.56%, ± 1.77%]. The MAE of the XGB model on the test set in the present study was only 0.84%, which is within a reasonable error range.

Figure 22 shows the predicted and actual values of tensile strain capacity on the test set. The XGB model’s predictions clustered most closely around the line, giving the best prediction of tensile strain capacity. Notably, the prediction of tensile strain capacity was less accurate than that of the other three mechanical indicators. This discrepancy can be attributed to the close relationship between tensile strain capacity and the uniformity of defects within the ECC matrix. Many test groups do not introduce uniform artificial defects, leading to significant variability in tensile strain capacity test values^{74,75,76}.

In Fig. 23, the feature importance analysis of the XGB model reveals that PE fiber, fiber content, and oil film PVA fiber made the most significant contributions to predicting tensile strain capacity, while other fibers contributed minimally. Figure 24 shows that PE fiber, fiber content, and oil film PVA fiber positively impacted tensile strain capacity. Additionally, other characteristics can be further analyzed based on Global SHAP values.

Figure 25 shows the SHAP values of a single randomly selected sample. The longest red bar was fly ash, meaning that, for this specific sample, fly ash increased the predicted tensile strain capacity. The longest blue bar was fiber content, meaning that, for this particular sample, fiber content reduced the predicted tensile strain capacity. Figure 26 shows the SHAP partial dependence plots. The tensile strain capacity was positively correlated with the water-cement ratio, fiber content, and the presence of PE fiber. In addition, the relationship of each of the three features with tensile strain capacity was roughly linear, and the sensitivity of tensile strain capacity to these input features was moderate and stable.

### Control strategy of ECC mechanical indicator

According to the feature importance and SHAP analysis, there are different strategies to improve each mechanical indicator of ECC.

Reducing the water-cement ratio, increasing silica fume content, and selecting PE fiber are the most effective strategies for compressive strength. For flexural strength and tensile strength, selecting PE fiber, reducing the water-cement ratio, and increasing the fiber aspect ratio are the most effective methods. For tensile strain capacity, increasing the fiber aspect ratio, selecting PE fiber, and increasing fiber content are the most effective methods. The control methods of ECC mechanical properties are shown in Table 9. Some methods, such as using PE fibers and reducing the water-cement ratio, are effective for improving the mechanical properties of ECC, which provides practical guidance for ECC design.