Hybrid machine learning approach for predicting compressive strength of sustainable concrete incorporating palm oil fuel ash



Hidden-layer sizes from 8 to 20 neurons were evaluated to identify the optimal 6–H–1 architecture. To ensure a fair comparison, all candidate models were trained and evaluated on the same dataset. As seen in Fig. 6, the model with 15 hidden neurons produced the best results, with the highest R² (0.983) and the lowest RMSE (3.1 MPa). The 6–15–1 architecture was therefore chosen as the final configuration for predicting the compressive strength of POFA concrete.

Fig. 6 Comparison of different numbers of hidden-layer neurons.
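As a rough guide to what the hidden-layer size implies, the number of trainable parameters in a 6–H–1 network can be counted directly. The sketch below assumes a fully connected network with one bias per hidden and output neuron, which the text does not state explicitly; the function name `count_parameters` is illustrative.

```python
def count_parameters(n_inputs: int, n_hidden: int, n_outputs: int) -> int:
    """Trainable parameters of a fully connected one-hidden-layer network.

    Assumption: every hidden and output neuron has its own bias term.
    """
    hidden = n_inputs * n_hidden + n_hidden      # weights + biases into the hidden layer
    output = n_hidden * n_outputs + n_outputs    # weights + biases into the output layer
    return hidden + output


# Candidate hidden-layer sizes examined in the text (8 to 20 neurons).
param_counts = {h: count_parameters(6, h, 1) for h in range(8, 21)}

for h, count in param_counts.items():
    print(f"6-{h}-1: {count} parameters")
```

Under these assumptions the selected 6–15–1 network has 121 trainable parameters, so each added hidden neuron contributes eight more values to be tuned.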

Figure 7 compares the convergence behavior of the ANN and ANN-BBO models in terms of mean MSE over the training epochs. The ANN reached its best performance at epoch 66 with an MSE of 28.521, whereas the ANN-BBO model attained a substantially lower MSE of 8.572 at epoch 40. BBO-based optimization therefore improved training efficiency, reaching a lower error in fewer epochs, which suggests that ANN-BBO searched the network parameter space more effectively than the conventional ANN training procedure.

Fig. 7 Convergence curves of the performance of the models.
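The comparison of convergence curves comes down to two small calculations: locating the epoch with the lowest MSE and expressing the gap between the two minima as a relative reduction. The following is a minimal sketch; the helper names and the `toy_curve` values are made up for illustration, and only the two best MSE values (28.521 and 8.572) come from the text.

```python
def best_epoch(mse_curve):
    """Return (epoch, mse) at the minimum of the curve, counting epochs from 1."""
    idx = min(range(len(mse_curve)), key=mse_curve.__getitem__)
    return idx + 1, mse_curve[idx]


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical curve, only to show how the best epoch is located.
toy_curve = [90.0, 55.0, 31.0, 29.5, 30.2]
epoch, mse = best_epoch(toy_curve)

# Best MSE values reported in the text for ANN and ANN-BBO.
mse_drop = relative_reduction(28.521, 8.572)
print(f"best epoch {epoch} (MSE {mse}); reported MSE reduction {mse_drop:.1f}%")
```

With the reported values, the best MSE of ANN-BBO is roughly 70% lower than that of the ANN.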

Table 4 summarizes the selected parameter settings for the ANN and BBO models.

Table 4 Parameter settings for models.

Figure 8 shows the corresponding schematic of the chosen model structure, with six neurons representing the mix parameters in the input layer, fifteen neurons in the hidden layer, and one neuron representing the predicted compressive strength in the output layer.

Fig. 8 Final structure of the model with the best performance.

Regression analysis

The performance of the predictive models, ANN and ANN-BBO, was evaluated using the training, validation, and testing datasets. As Fig. 9 shows, the correlation between experimental and predicted compressive strength is strong: the points cluster near the y = x line with narrow deviation bands, indicating that both models capture the nonlinear relationship between the input variables and compressive strength. However, ANN-BBO performs better than the standalone ANN, with R² values of 0.9823, 0.986, and 0.9843 for the training, validation, and testing sets, respectively, compared with 0.9515, 0.9519, and 0.956 for the ANN. This demonstrates a clear improvement in predictive performance for ANN-BBO over ANN.

Fig. 9 Experimental and predicted compressive strength values for training, validation, and testing datasets using ANN and ANN-BBO models.
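For reference, R² for a set of measured and predicted strengths can be computed as sketched below. This uses the coefficient-of-determination form, 1 − SS_res/SS_tot, which is an assumption: the text does not say whether R² was computed this way or as the squared correlation coefficient. The sample values are made up.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted compressive strengths (MPa).
measured = [20.0, 30.0, 40.0, 50.0]
predicted = [22.0, 29.0, 41.0, 48.0]

r2 = r_squared(measured, predicted)
print(f"R^2 = {r2:.3f}")
```

Points lying exactly on the y = x line give R² = 1; scatter around the line lowers the value.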

Error histogram plot

The distribution of prediction errors of the ANN and ANN-BBO models for the compressive strength of POFA concrete is shown in Fig. 10. Each histogram and scatter plot illustrates the deviation between the experimental and predicted compressive strength values for the training, validation, and testing sets. In both models, most data points are clustered around the zero-error line, indicating that the predictions are generally unbiased and reasonably accurate.

Fig. 10 Error distribution of the ANN and ANN-BBO models.

However, the ANN-BBO model exhibits a narrower and more symmetric error distribution: about 39% of the errors for the ANN model fall within the range [−5%, +5%], whereas more than 60% of the errors for ANN-BBO lie in the same interval, confirming the higher predictive accuracy of the hybrid model. This conclusion is also supported by the boxplots of percentage error. For the ANN-BBO model, the middle 50% of the errors lie between approximately −3.53% and +4.20%, with a median error of 0.32%, while the middle 50% of the errors for the ANN model span a wider range, from about −6.96% to +5.60%, with a median error of −0.92%. These results demonstrate that the hybrid ANN-BBO model achieves greater precision and stability, reducing random deviations and improving overall predictive accuracy compared with the ANN.
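The two summaries used above, the share of errors inside ±5% and the boxplot quartiles, can be reproduced from paired measurements and predictions as sketched here. The sign convention (positive means over-prediction) and the quartile interpolation method are assumptions, since the text specifies neither, and the sample values are made up.

```python
import statistics


def percentage_errors(y_true, y_pred):
    """Signed percentage errors; assumed convention: positive = over-prediction."""
    return [100.0 * (p - t) / t for t, p in zip(y_true, y_pred)]


def share_within(errors, band=5.0):
    """Percentage of errors whose magnitude is at most `band` percent."""
    return 100.0 * sum(abs(e) <= band for e in errors) / len(errors)


def quartiles(errors):
    """First quartile, median, and third quartile (the box of a boxplot)."""
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return q1, median, q3


# Hypothetical measured and predicted compressive strengths (MPa).
measured = [20.0, 40.0, 50.0, 25.0, 80.0]
predicted = [21.0, 38.0, 50.0, 28.0, 82.0]

errors = percentage_errors(measured, predicted)
print(f"within +/-5%: {share_within(errors):.0f}%")
print("Q1, median, Q3:", quartiles(errors))
```

The interval between the first and third quartiles is the "middle 50%" range quoted for each model.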

Figure 11 illustrates the empirical distribution of prediction errors obtained from 1000 bootstrap resamples for the ANN and ANN-BBO models, and Table 5 lists the corresponding statistics. Both distributions are bell-shaped and nearly symmetric, which indicates stable sampling behavior. For the ANN model, the mean percentage error is 0.22%, with a 95% confidence interval (CI) from −0.82% to 1.26% (width 2.08%). The ANN-BBO model yields a mean error of 0.85%, with a narrower 95% CI from 0.23% to 1.51% (width 1.28%). Thus, although the mean bootstrap error of ANN-BBO is marginally larger, its performance is more consistent across resamples and less sensitive to sampling variability, whereas the ANN has a lower mean error but varies more across resampled datasets. This pattern is consistent with the bias–variance trade-off commonly observed in machine learning, where a model with slightly higher average error may nonetheless provide more stable predictions. The bootstrap results should therefore not be interpreted as showing uniformly better accuracy for ANN-BBO; rather, they indicate that ANN-BBO offered improved stability, whereas ANN showed a slightly lower average bootstrap error.

Fig. 11 Bootstrap error distributions for ANN and ANN-BBO models based on 1000 resamples.

Table 5 Summary of bootstrap statistics for ANN and ANN-BBO models, including mean error.

The narrower confidence interval observed for the ANN-BBO model indicates reduced sampling variability and tighter clustering of bootstrap estimates. This suggests that ANN-BBO provides more reliable and consistent predictions under repeated resampling.
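A bootstrap interval of this kind can be sketched as follows. This is a percentile bootstrap of the mean percentage error; whether the study used the percentile method or another variant is not stated, so the choice is an assumption, as are the function name and the made-up input errors. The last two lines only re-check the CI widths quoted in the text.

```python
import random
import statistics


def bootstrap_mean_ci(errors, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean of `errors`.

    Returns (mean of bootstrap means, lower bound, upper bound, CI width).
    """
    rng = random.Random(seed)
    n = len(errors)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(errors, k=n)) for _ in range(n_resamples)
    )
    lower = means[round((alpha / 2) * n_resamples)]
    upper = means[round((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(means), lower, upper, upper - lower


# Hypothetical percentage errors, for illustration only.
toy_errors = [((i * 37) % 21) - 10 for i in range(40)]
mean_err, lower, upper, width = bootstrap_mean_ci(toy_errors)
print(f"mean {mean_err:.2f}%, 95% CI [{lower:.2f}%, {upper:.2f}%], width {width:.2f}%")

# Widths implied by the interval bounds reported in the text.
ann_width = round(1.26 - (-0.82), 2)
ann_bbo_width = round(1.51 - 0.23, 2)
```

A narrower interval means the bootstrap means cluster more tightly; it says nothing by itself about whether the mean error is closer to zero.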

Evaluating performance metrics

To compare the two models more clearly, Fig. 12 summarizes the statistical performance metrics, and Table 6 lists the corresponding values. Figures 12a–f show, respectively, the a10-index, MAE, RMSE, RRMSE, VAF, and OBJ. Results are reported for the full dataset and for the training, validation, and testing subsets, for both ANN and ANN-BBO. Ideally, the a10-index and VAF are close to 100%, and MAE, RMSE, RRMSE, and OBJ are close to 0.

Fig. 12 Comparison of ANN and ANN-BBO performance across datasets.

Table 6 Numerical values of evaluation metrics for ANN and ANN-BBO on all, training, validation, and testing sets.

As can be seen from Fig. 12 and Table 6, the ANN-BBO model outperformed the ANN model on all metrics. For example, the RMSE values (Fig. 12c) obtained from ANN-BBO for all, training, testing, and validation data are 3.087, 3.206, 3.167, and 2.915 MPa, respectively, while the corresponding values for ANN are 5.208, 5.138, 5.393, and 5.346 MPa. Additionally, the a10-index, which is the percentage of data records satisfying 0.9 < experimental CS/predicted CS < 1.1, is higher for ANN-BBO than for the ANN model.
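Most of these metrics can be sketched in a few lines. The a10-index follows the definition given above; the RRMSE (RMSE divided by the mean measured strength) and VAF (one minus the ratio of residual variance to measured variance) formulas are the commonly used forms and are assumed here, since this section does not define them. OBJ is omitted because its definition is not given. The sample values are made up.

```python
import math
import statistics


def a10_index(y_true, y_pred):
    """Percentage of records with 0.9 < measured/predicted < 1.1."""
    hits = sum(0.9 < t / p < 1.1 for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


def mae(y_true, y_pred):
    return statistics.fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    return math.sqrt(statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def rrmse(y_true, y_pred):
    """Assumed form: RMSE as a percentage of the mean measured value."""
    return 100.0 * rmse(y_true, y_pred) / statistics.fmean(y_true)


def vaf(y_true, y_pred):
    """Assumed form: variance accounted for, in percent."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 100.0 * (1.0 - statistics.pvariance(residuals) / statistics.pvariance(y_true))


# Hypothetical measured and predicted compressive strengths (MPa).
measured = [20.0, 30.0, 40.0, 50.0]
predicted = [23.0, 29.0, 41.0, 48.0]

print(f"a10 = {a10_index(measured, predicted):.1f}%")
print(f"MAE = {mae(measured, predicted):.2f} MPa, RMSE = {rmse(measured, predicted):.3f} MPa")
print(f"RRMSE = {rrmse(measured, predicted):.2f}%, VAF = {vaf(measured, predicted):.2f}%")
```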

Figure 13 presents a Taylor diagram comparing the performance of the ANN and ANN-BBO models. The diagram simultaneously displays the correlation coefficient and standard deviation, providing a compact visual summary of model accuracy across the training, validation, and testing datasets. Although both models predict the compressive strength of POFA concrete reliably, the ANN-BBO points lie closer to the reference point, indicating higher correlation and lower deviation from the experimental data. The ANN points lie farther from the reference, reflecting slightly greater variability and error. This graphical comparison supports the consistency, accuracy, and stability of the hybrid ANN-BBO model relative to the ANN, in agreement with the earlier performance results.

Fig. 13 Taylor diagram for the ANN and ANN-BBO models.
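The quantities behind a Taylor diagram are the two standard deviations, the correlation coefficient, and the centered RMS difference, which is the distance from a model point to the reference point. They are linked by E'² = σ_ref² + σ_mod² − 2·σ_ref·σ_mod·R. The sketch below computes them for made-up data and checks that identity; the function name is illustrative.

```python
import math
import statistics


def taylor_statistics(reference, model):
    """Return (sd_ref, sd_mod, correlation, centered RMS difference)."""
    n = len(reference)
    mean_ref = statistics.fmean(reference)
    mean_mod = statistics.fmean(model)
    sd_ref = statistics.pstdev(reference)
    sd_mod = statistics.pstdev(model)
    covariance = sum(
        (r - mean_ref) * (m - mean_mod) for r, m in zip(reference, model)
    ) / n
    correlation = covariance / (sd_ref * sd_mod)
    # Centered RMS difference: RMS error after removing each series' mean.
    crmsd = math.sqrt(
        sum(((m - mean_mod) - (r - mean_ref)) ** 2 for r, m in zip(reference, model)) / n
    )
    return sd_ref, sd_mod, correlation, crmsd


# Hypothetical measured and predicted compressive strengths (MPa).
measured = [20.0, 30.0, 40.0, 50.0, 35.0]
predicted = [23.0, 29.0, 41.0, 48.0, 33.0]

sd_ref, sd_mod, corr, crmsd = taylor_statistics(measured, predicted)
# Law-of-cosines identity that makes the diagram geometrically consistent.
identity_rhs = sd_ref**2 + sd_mod**2 - 2.0 * sd_ref * sd_mod * corr
print(f"R = {corr:.3f}, sd_ref = {sd_ref:.2f}, sd_mod = {sd_mod:.2f}, E' = {crmsd:.2f}")
```

A model point closer to the reference point therefore has a smaller centered RMS difference, which is why proximity on the diagram is read as better agreement.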

To evaluate model robustness and reduce the bias associated with a single data split, a ten-fold cross-validation procedure was conducted. Figure 14 summarizes the R² and RMSE values obtained for the ANN and ANN-BBO models over the ten folds, with the dotted lines representing the corresponding mean values. The ANN model exhibited R² values ranging from 0.868 to 0.952, with a mean of 0.907, while the ANN-BBO model achieved R² values between 0.918 and 0.984, with a higher mean of 0.954. In terms of error, the RMSE values for ANN varied from 5.200 to 8.237 MPa, with an average of 6.728 MPa. For ANN-BBO, the RMSE ranged from 3.088 to 6.725 MPa, with a lower average of 4.658 MPa. The average RMSE of ANN-BBO was therefore about 30.8% lower than that of ANN (see Fig. 14). These results show that the ANN-BBO model maintained better predictive accuracy across the cross-validation folds and provided a more reliable estimate of generalization performance.

Fig. 14 Cross-validation results for ANN and ANN-BBO models.
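The splitting step of ten-fold cross-validation can be sketched as below. The shuffling, the seed, and the helper name `kfold_indices` are assumptions for illustration; the sample count of 469 is the dataset size reported later in the paper, and the final lines re-derive the quoted 30.8% reduction from the two mean RMSE values.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


splits = list(kfold_indices(469, k=10, seed=0))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train_idx)} training, {len(test_idx)} testing")

# Relative reduction in mean RMSE implied by the reported averages (MPa).
ann_mean_rmse, ann_bbo_mean_rmse = 6.728, 4.658
rmse_drop = 100.0 * (ann_mean_rmse - ann_bbo_mean_rmse) / ann_mean_rmse
print(f"mean RMSE reduction: {rmse_drop:.1f}%")
```

Each record is used for testing exactly once, so the fold-averaged R² and RMSE depend less on any single split than a one-off hold-out estimate.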

Contribution of variables

The parallel coordinate plot in Fig. 15 is mainly used as an exploratory check of how the ANN-BBO model behaves across the full input space. Each polyline represents one mixture, and the colors indicate different compressive strength ranges. As shown in Fig. 15, high-strength mixtures (green and cyan lines) tend to cluster around higher cement content, lower water-to-binder ratios, and moderate POFA and SP dosages. Low-strength mixtures (yellow and red lines) are more frequently associated with higher W/B, lower cement content, and less favorable combinations of aggregates and POFA. The wider spread of the yellow/red trajectories indicates larger variability in poorly proportioned mixtures.

Fig. 15 Parallel coordinate plot for the ANN-BBO model showing the relationships between input variables and predicted compressive strength.

Since the 28-day compressive strength is a key criterion in concrete mix design, the magnified section of Fig. 15 focuses on how each input parameter contributes to CS28. The dotted grey rectangles indicate the approximate optimum ranges of the input variables associated with higher 28-day strength. As illustrated in Fig. 15, the optimum ranges for cement, POFA, and superplasticizer contents are roughly 330–550, 110–220, and 6–13 kg/m³, respectively, while the favorable ranges for the CA/FA and W/B ratios are about 1.35–1.50 and 0.28–0.35, respectively.

Overall, this visualization confirms that the hybrid ANN-BBO model recognizes meaningful multidimensional patterns in the dataset, effectively separates well-designed mixtures from suboptimal ones, and demonstrates clear sensitivity to the parameters most influential on compressive strength.
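The approximate ranges read from Fig. 15 can be turned into a simple screening check for a candidate mixture, as sketched below. The dictionary keys, the function name, and the example mixtures are hypothetical; the bounds are the approximate values quoted above and should be read as indicative, not as design limits.

```python
# Approximate favorable ranges quoted in the text for higher 28-day strength.
FAVORABLE_RANGES = {
    "cement": (330.0, 550.0),  # kg/m^3
    "pofa": (110.0, 220.0),    # kg/m^3
    "sp": (6.0, 13.0),         # kg/m^3
    "ca_fa": (1.35, 1.50),     # coarse-to-fine aggregate ratio
    "w_b": (0.28, 0.35),       # water-to-binder ratio
}


def out_of_range(mix):
    """Return the names of variables that fall outside the favorable ranges."""
    return [
        name
        for name, (low, high) in FAVORABLE_RANGES.items()
        if not low <= mix[name] <= high
    ]


# Hypothetical mixtures, for illustration only.
mix_a = {"cement": 420.0, "pofa": 150.0, "sp": 8.0, "ca_fa": 1.40, "w_b": 0.30}
mix_b = {"cement": 300.0, "pofa": 150.0, "sp": 8.0, "ca_fa": 1.40, "w_b": 0.45}

print("mix_a outside ranges:", out_of_range(mix_a))
print("mix_b outside ranges:", out_of_range(mix_b))
```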

Figure 16 presents the SHapley Additive exPlanations (SHAP) summary for the ANN-BBO model, showing the relative importance of the input variables and their effects on the predicted 28-day compressive strength. Each point represents an individual mixture, where the horizontal position indicates the SHAP value (i.e., the contribution of a given feature to increasing or decreasing the predicted strength), and the color reflects the magnitude of the feature value.

Fig. 16 Summary plot of SHAP values.

As shown in Fig. 16, age has the strongest overall influence on compressive strength, with higher curing ages generally contributing positively and lower ages contributing negatively. This result is physically expected because the SHAP analysis was performed on the full dataset, which includes compressive strength measurements at multiple curing ages rather than only 28-day strength. In this context, the strong contribution of age reflects the well-established time-dependent nature of strength development in cementitious materials, governed by hydration progress and microstructural evolution. At the same time, because the dataset spans multiple curing ages, age captures a substantial portion of the overall variation in compressive strength and should therefore be interpreted within the framework of multi-age prediction. W/B is also highly influential, with higher W/B values mostly contributing negatively and lower W/B values contributing positively, consistent with the expected reduction in strength at higher water contents. Cement content (C) and CA/FA exhibit moderate effects; higher cement content generally tends to increase strength, while CA/FA shows a mixed pattern, suggesting nonlinear interactions with other mixture variables. Superplasticizer dosage has a smaller but still observable influence, mainly positive at moderate levels. POFA shows the lowest SHAP magnitude, indicating that its effect is less direct and likely depends strongly on its interactions with other parameters. Overall, the SHAP results suggest that the ANN-BBO model captures relationships that are broadly consistent with established concrete behavior.
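To make the meaning of a SHAP value concrete, the sketch below computes exact Shapley values for a small made-up function by enumerating all feature subsets. It is not the procedure used in the study: the toy model, its coefficients, the single reference mixture used as the baseline, and all names are assumptions, and practical SHAP tools approximate these values against a background dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Features absent from a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


def toy_strength(z):
    """Made-up strength model with an age x W/B interaction (not the fitted ANN-BBO)."""
    age, w_b, cement = z
    return 10.0 + 0.4 * age - 50.0 * w_b + 0.05 * cement - 0.2 * age * w_b


x = [28.0, 0.30, 450.0]         # hypothetical mixture: age (days), W/B, cement (kg/m^3)
baseline = [7.0, 0.45, 350.0]   # hypothetical reference mixture

phi = shapley_values(toy_strength, x, baseline)
for name, contribution in zip(["age", "W/B", "cement"], phi):
    print(f"{name}: {contribution:+.2f} MPa")
```

The contributions sum to the difference between the prediction for the mixture and the prediction for the baseline, which is the additivity property that SHAP summary plots rely on.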

Validating the proposed model by comparison with literature models

To compare the proposed ANN-BBO model with other predictive models, its performance was evaluated against the results reported by Ali et al. [43], who developed seven different AI models to predict the compressive strength of concrete containing POFA. These methods included LSSVM, LGBM, XGB, hybrid XGB-LGBM, ANN, and GEP. They trained and tested these models on a POFA database, and their performance metrics are summarized in Table 7 alongside the results of the ANN-BBO model proposed in the current study.

Table 7 Comparison of the proposed ANN-BBO model with other prediction models for predicting CS of POFA concrete.

As shown in Table 7, among the models proposed by Ali et al. [43], the hybrid XGB-LGBM and ANN models performed better than the others, with R² values of 0.976 and 0.968, respectively, and relatively low MAE values of 3.11 MPa and 3.82 MPa. However, the ANN-BBO model still performs better: it achieves a higher R² of 0.983, and both its MAE and RRMSE are lower than those reported for the hybrid XGB-LGBM and ANN models. These improvements were also achieved on a larger database: while Ali et al. [43] used 407 data points, the present study employed 469 mixtures (an increase of more than 15%). These results suggest that the developed model provides improved predictive performance relative to previously reported methods.

LSSVM: least-squares support vector machine; LGBM: light gradient boosting machine; XGB: extreme gradient boosting; GEP: gene expression programming.


