Figure 6 showcases the performance of four machine learning models (Decision Tree, SVM, KNN, and XGBoost) in predicting UCS. Each panel compares the actual and predicted values for both the training and testing datasets. The actual versus predicted values show reasonable alignment, but with noticeable scatter, particularly for the testing data.

Composite DT, SVM, KNN and XGBoost models.
The DT model struggles with overfitting, as evident from the higher variance in predictions for the testing data compared to the training data, indicating limited generalization capability and lower reliability in real-world applications. Similarly, the SVM model clearly underestimates higher UCS values, resulting in deviations from the ideal diagonal line. Its testing data demonstrate moderate alignment, but the model's predictive capability diminishes for extreme values, suggesting a lack of robustness, especially for datasets with diverse environmental, mechanical, and drilling predictors. KNN exhibits improved alignment compared to DT and SVM but still suffers from moderate scatter in the testing data. Its performance is more consistent, yet it does not reach the accuracy of XGBoost, which delivers an almost perfect alignment between actual and predicted values, with minimal scatter for both the training and testing datasets. This strong performance is evident in the model's ability to predict across a wide range of UCS values, capturing complex relationships between the predictors (mechanical, drilling, and environmental factors). The model's superior performance, demonstrated through high R-squared values and low error metrics, highlights its robustness and adaptability. The composite strategy (using all predictor categories) enables XGBoost to outperform the other models significantly by leveraging the comprehensive dataset effectively. Hence, the composite XGBoost model excels due to its superior predictive accuracy, robust validation, and practical utility, making it the optimal choice for modelling UCS in complex shale lithologies.

Figure 7 compares the actual versus predicted plots for the Decision Tree (DT), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and XGBoost models, clearly illustrating the superior performance of XGBoost.

Simplified DT, SVM, KNN, and XGBoost models.
While all models demonstrate a general alignment along the ideal y = x line, simple XGBoost achieves the tightest clustering of points, indicating the highest accuracy and minimal error for both training and testing datasets. In contrast, the DT model shows significant scatter, particularly in the testing data, suggesting poor generalization and a tendency to overfit. SVM performs better than DT but struggles with extreme values, showing deviations from the ideal line. KNN produces more consistent results than DT and SVM, but its predictions exhibit a wider spread around the ideal line, especially for the testing data. In comparison, simple XGBoost demonstrates excellent generalization, with minimal scatter and high predictive precision, reinforcing its robustness and suitability.
Model comparison
Figure 8 presents the performance metrics, which reveal the superior accuracy and robustness of the composite model compared with the other machine learning models: Decision Tree, Cubic SVM, K-Nearest Neighbors (KNN), and the simpler XGBoost model.

Model performance comparison based on average performance, i.e., the mean value of the evaluation metrics (R2, RMSE, MAE, etc.) calculated across multiple models and datasets.
During training, the Composite Model achieves the lowest MSE (0.011), RMSE (0.011), and MAE (0.011), alongside the highest R-squared (0.991), demonstrating exceptional precision and its ability to explain nearly all variance in the training data. In testing, the Composite Model continues to outperform, achieving the best R-squared (0.981) while maintaining low MSE (0.011), RMSE (0.021), and MAE (0.021), reflecting its ability to generalize effectively to unseen data. Compared to the Simple Model, the Composite Model's broader inclusion of predictors, integrating shale-water interaction, drilling, and fabric parameters, enables it to better capture complex relationships and dependencies that the simpler model cannot fully address. This comprehensive approach results in superior performance metrics across all evaluation phases, making the composite model the most reliable and accurate tool for predicting UCS and ensuring its robustness for real-world applications.
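The evaluation metrics above follow their standard definitions; as a minimal illustration (not the authors' code), they can be computed from paired actual and predicted UCS values as follows:

```python
import math

def regression_metrics(actual, predicted):
    """Standard error metrics for actual vs. predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n           # mean squared error
    rmse = math.sqrt(mse)                          # root mean squared error
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1.0 - (mse * n) / ss_tot                  # coefficient of determination
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}
```

Note that RMSE is by definition the square root of MSE, and R-squared approaches 1 as the residual sum of squares vanishes relative to the total variance.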
Model validation
(a) Level 1 Validation (K-Fold Cross Validation)
K-fold cross validation is a widely used technique in machine learning to assess a model's generalizability and prevent overfitting. The core idea is to split the data into K subsets and train the model on K-1 folds while testing it on the remaining fold. This process is repeated K times, with each fold serving as the test set exactly once. The results from each fold are then averaged to produce an overall performance metric. Unlike a simple train-test split, K-fold ensures that each data point is used for both training and validation, reducing selection bias, and averaging the performance across the K iterations provides a more robust estimate of the model's performance. The model development in this paper used K = 5.
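The K-fold procedure with K = 5 can be sketched as follows (an illustrative stdlib-only version; any regressor can stand in for the models compared here):

```python
import random

def k_fold_splits(n_samples, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs; each fold serves as the test set exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # shuffle once to reduce ordering bias
    folds = [indices[i::k] for i in range(k)]      # k near-equal subsets
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# Averaging a metric over the K iterations gives the overall performance estimate:
# scores = [evaluate(model, train, test) for train, test in k_fold_splits(n, k=5)]
# mean_score = sum(scores) / len(scores)
```

The `evaluate` call in the comment is a hypothetical placeholder for fitting a model on the training fold and scoring it on the held-out fold.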
(b) Level 2 Validation (Using independent datasets for enhanced generalization)
Although the models demonstrated strong performance based on standard evaluation metrics, a comprehensive assessment of their effectiveness requires testing against data not used during the model development phase. To improve the generalizability of the proposed predictive models, validation with independent datasets from different geographic regions is crucial. While the current study shows high predictive accuracy within the primary study area, incorporating data from other parts of the country allows for a more robust evaluation of model stability and transferability. Validating against independent data helps assess the model’s performance under unfamiliar conditions, revealing any tendency toward overfitting to local patterns and supporting broader applicability58,59. Accordingly, to test the proposed models and assess their predictive capability, an independent dataset comprising 959 samples from a distinct region of the country was used for validation.
Figure 9 presents the performance of the composite and simple XGBoost models on the independent dataset. Both models show consistent trends across the sample population, but the composite model achieves better prediction accuracy: its predictions fall within an error margin of ± 1.51%, compared with ± 1.86% for the simple XGBoost model.

Validation of composite and simple models using independent datasets.
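The ± 1.51% and ± 1.86% figures quoted above are bounds on the relative prediction error. A minimal sketch of how such a margin can be checked (a hypothetical helper, not from the paper):

```python
def error_margin_pct(actual, predicted):
    """Largest absolute relative error, in percent, over all samples."""
    return max(abs(p - a) / abs(a) * 100.0 for a, p in zip(actual, predicted))
```

A model whose `error_margin_pct` is 1.51 guarantees every individual prediction deviates from its measured value by at most 1.51%.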
Figure 10 compares the validation performance of the simple and composite XGBoost models, highlighting the superiority of the composite approach in achieving more reliable and robust predictions.

Performance evaluation of the models in validation.
For the composite model, the training phase yielded a Mean Squared Error (MSE) of 0.16, Root Mean Squared Error (RMSE) of 0.01, Mean Absolute Error (MAE) of 0.01, and an R-squared value of 0.98, while the testing phase achieved an MSE of 0.14, RMSE of 0.02, MAE of 0.02, and R-squared of 0.92. In contrast, the simple model showed comparable performance during training, with an MSE of 0.03, RMSE of 0.03, MAE of 0.03, and R-squared of 0.98, but experienced a significant drop in testing accuracy, yielding an MSE of 0.16, RMSE of 0.06, MAE of 0.05, and R-squared of 0.67. The improved generalization of the composite model, as reflected in its higher R-squared and lower error metrics during testing, underscores its efficacy in capturing complex relationships within the dataset while mitigating overfitting. These findings emphasize the composite model’s suitability for real-world applications requiring both precision and reliability.
Sensitivity analysis
Figure 11 highlights the sensitivity (S%) of various predictors in the Simple XGBoost and Composite XGBoost models, showcasing the superior performance of the composite model across all parameters.

Sensitivity analysis of the predictors in the proposed models, expressed as a percentage (a dimensionless ratio with numerator and denominator in the same unit).
The blue bars, representing the composite XGBoost model, consistently achieve higher sensitivity values than the orange bars for the simple XGBoost model. Predictors such as UPV, VR, PI, BA, and T exhibit particularly high sensitivity in the composite model, often nearing or exceeding 90%, indicating their strong contribution to the model's predictive accuracy. In contrast, the simple XGBoost model shows lower sensitivity for many predictors, including OMC, MC, and PI, suggesting that it captures these relationships less effectively. The composite model's enhanced sensitivity is attributed to its inclusion of a broader range of variables, which allows it to better account for the complex interactions between predictors. This demonstrates that the composite model not only utilizes the predictors more effectively but also offers a more reliable framework for capturing the nuances of geotechnical behavior, underscoring the importance of comprehensive predictor inclusion in achieving high prediction accuracy.

Figure 12 highlights the superior performance of the composite XGBoost model (A) in predicting unconfined compressive strength (UCS) compared to the other models, within an absolute error threshold of ± 5%. Model A demonstrates the closest alignment with the experimental values across all data points, consistently maintaining minimal deviations and accurately tracking the observed trends. Unlike models C and D, which exhibit noticeable fluctuations and deviations from the experimental values, the composite XGBoost model provides stable and precise predictions. Furthermore, its predictions remain tightly bound within the error bars, signifying high reliability and precision. Even across significant variations in UCS, such as the peaks and troughs observed around counts 5, 15, and 25, the composite model captures these trends effectively, outperforming its counterparts.
These results confirm the robustness and accuracy of the composite XGBoost model, establishing it as the most reliable tool for UCS prediction in geotechnical applications.
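The paper does not state the exact formula behind the per-predictor sensitivity percentages of Figure 11. One common choice in comparable UCS studies is the cosine amplitude method, sketched below purely as an assumption:

```python
import math

def sensitivity_pct(x, y):
    """Cosine amplitude strength of relation between predictor x and target y (0-100%)."""
    num = sum(xi * yi for xi, yi in zip(x, y))
    den = math.sqrt(sum(xi * xi for xi in x) * sum(yi * yi for yi in y))
    return 100.0 * num / den
```

Under this measure, a predictor that varies proportionally with the target scores near 100%, while an unrelated predictor scores near 0%.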

Model comparison with the existing models. A: Composite XGBoost, B: Simple XGBoost, C: Davoodi et al.45, D: Kolawole et al.46, E: Mollaei et al.47.
Figure 13 shows the Taylor diagram highlighting the superior performance of the composite XGBoost model (A) compared to other models (B, C, D, E) in predicting UCS. The composite XGBoost model achieves a correlation coefficient near 0.99, indicating an almost perfect linear relationship between the predicted and experimental values, surpassing the accuracy of all other models. Additionally, the composite model aligns closely with the reference standard deviation, reflecting its ability to replicate the variability of the experimental data accurately. Its proximity to the reference point further demonstrates that it has the lowest root mean square difference (RMSD), indicating minimal prediction error. Compared to the simple XGBoost model (B), the composite model benefits from a wider range of input variables, enabling more accurate and robust predictions. In contrast, models from previous studies (C, D, E) exhibit lower correlation coefficients, greater standard deviation mismatches, and higher prediction errors. Overall, the composite XGBoost model demonstrates superior accuracy, reliability, and consistency, making it the most effective tool for UCS prediction in geotechnical applications.

Model comparison with existing models in Taylor’s diagram illustrating the comparative performance of various models in predicting unconfined compressive strength (UCS). Model’s proximity to the reference point highlights its effectiveness in replicating both the strength and variability of experimental data. A: Composite XGBoost, B: Simple XGBoost, C: Davoodi et al.45, D: Kolawole et al.46, E: Mollaei et al.47.
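A Taylor diagram locates each model by its standard deviation, its correlation with the observations, and the centered RMSD, three statistics linked by the law of cosines. A minimal sketch of how these are computed (illustrative only, not the authors' code):

```python
import math

def taylor_statistics(ref, model):
    """Statistics plotted on a Taylor diagram: std devs, correlation, centered RMSD."""
    n = len(ref)
    mr, mm = sum(ref) / n, sum(model) / n
    sr = math.sqrt(sum((r - mr) ** 2 for r in ref) / n)      # reference std dev
    sm = math.sqrt(sum((m - mm) ** 2 for m in model) / n)    # model std dev
    cov = sum((r - mr) * (m - mm) for r, m in zip(ref, model)) / n
    corr = cov / (sr * sm)                                   # correlation coefficient
    # centered RMSD: means are removed before differencing
    crmsd = math.sqrt(sum(((m - mm) - (r - mr)) ** 2
                          for r, m in zip(ref, model)) / n)
    return sr, sm, corr, crmsd
```

These quantities satisfy crmsd^2 = sr^2 + sm^2 - 2*sr*sm*corr, which is why each model occupies a single point in the diagram, and why a point near the reference (corr close to 1, sm close to sr) necessarily has a small RMSD.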
Field implications
The methodology employed in this study holds significant practical importance. The systematic, comparative assessment of diverse ML-based models and the proposed hybrid model offers a nuanced approach that enables end users to make informed decisions for accurately predicting UCS from a wide range of critical input factors. The proposed models demonstrate high utility due to their minimal errors, in both relative and statistical analyses. Traditional UCS tests, which rely on costly procedures such as rock drilling, sample preservation, core cutting, finishing, testing, analysis, and evaluation, often encounter laboratory errors and complexities, particularly with shale rock samples. Replacing these tedious and error-prone tests with an ML model that incorporates non-destructive UPV tests, shale index characteristics, and field drilling parameters would significantly enhance efficiency and reliability. Furthermore, conventional UCS values often fail to represent actual strata conditions, as weaker samples, typically recovered in broken form, are excluded from strength tests, leaving only the strongest core samples. The proposed models address this limitation by offering a more comprehensive and practical alternative for UCS prediction.
