This section describes the dataset, the prediction algorithms, the proposed hybrid model, the performance metrics and the explainable artificial intelligence methods used in the study. The study used surface roughness data obtained from CNC turning experiments, taking into account cutting parameters and tool wear conditions. To facilitate comparison, the k-nearest neighbors (KNN), random forest (RF) and extra trees (ExT) algorithms were employed in the prediction process. A stacking hybrid model incorporating a linear regression-based meta-learner was developed to leverage the strengths of these algorithms. Model performance was evaluated using the coefficient of determination (R²), mean squared error (MSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) to comprehensively analyze accuracy and error levels. To enhance the model’s interpretability, the SHAP and LIME methods were employed to elucidate its decision-making processes at the global and local levels, respectively.
Dataset
This study utilized a dataset obtained from CNC turning experiments published on the Kaggle platform25. The dataset was originally generated within a Master’s dissertation conducted at the Competence Center in Manufacturing (CCM) of the Aeronautics Institute of Technology (ITA) and is based on controlled turning experiments performed on AISI H13 steel under cutting fluid conditions.
The overall experimental scheme and signal acquisition process are illustrated in Fig. 1, which presents the turning configuration, the relative positioning of the cutting tool and workpiece, and the acquisition of surface roughness and machining force signals. The structure of the experimental design and the machining conditions applied in both experiments are summarized schematically in Fig. 2.
The experimental campaign consisted of two distinct experiments: Experiment 1 (Exp1) and Experiment 2 (Exp2). Exp1 followed a full-factorial design of experiments (DoE: 3³) conducted under theoretically new tool conditions. The depth of cut (ap) was varied at three levels (0.25, 0.5 and 0.8 mm), the cutting speed (Vc) at three levels (310, 350 and 390 m/min) and the feed rate (f) at three levels (0.07, 0.11 and 0.13 mm/rev). Each of the 27 experimental conditions was replicated twice, resulting in a total of 54 machining runs. A new cutting tool was used for each experimental group. After each machining run, surface roughness was measured at six different locations on the machined surface, yielding a total of 324 observations.
Exp2 was designed to investigate the influence of tool wear on surface roughness and included the tool condition as an additional experimental factor. In this experiment, the cutting speed (Vc) was kept constant at 350 m/min. The depth of cut (ap) was varied at two levels (0.25 and 0.5 mm), the feed rate (f) at four levels (0.07, 0.09, 0.11 and 0.13 mm/rev) and the tool condition (TCond) at three levels defined by flank wear width (VBB): new tool (0.0 mm), mid-life tool (0.1 mm) and end-of-life tool (0.3 mm). The cutting tools corresponding to these wear states were prepared in advance during a dedicated tool preparation phase. With two replicates of each of the 24 conditions, Exp2 consisted of 48 machining runs. Surface roughness was again measured at six locations per run, resulting in 288 observations.
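The run and observation counts quoted for both experiments follow directly from the factor levels. The short Python sketch below enumerates the two designs and reproduces that arithmetic; the dictionary keys and the helper name `design_size` are illustrative choices, not identifiers from the dataset.

```python
from itertools import product

# Factor levels exactly as listed in the text
exp1_levels = {
    "ap": [0.25, 0.5, 0.8],       # depth of cut, mm
    "Vc": [310, 350, 390],        # cutting speed, m/min
    "f": [0.07, 0.11, 0.13],      # feed rate, mm/rev
}
exp2_levels = {
    "ap": [0.25, 0.5],                 # depth of cut, mm
    "f": [0.07, 0.09, 0.11, 0.13],     # feed rate, mm/rev
    "VBB": [0.0, 0.1, 0.3],            # flank wear width, mm
}

REPLICATES = 2            # each condition was run twice
MEASUREMENTS_PER_RUN = 6  # roughness measured at six locations


def design_size(levels):
    """Return (conditions, machining runs, observations) for a full-factorial design."""
    conditions = list(product(*levels.values()))
    runs = len(conditions) * REPLICATES
    observations = runs * MEASUREMENTS_PER_RUN
    return len(conditions), runs, observations


exp1 = design_size(exp1_levels)
exp2 = design_size(exp2_levels)
print(exp1)  # (27, 54, 324)
print(exp2)  # (24, 48, 288)
```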

Experimental scheme and acquired signals25.
In both experiments, multiple surface roughness parameters were measured after each machining operation, including arithmetic mean roughness (Ra), skewness (Rsk), kurtosis (Rku), mean width of profile elements (Rsm) and total height (Rt). Machining forces were recorded using a Kistler Type 9265B dynamometer connected to a Kistler Type 5070 charge amplifier and DynoWare 2825A acquisition software. Although cutting forces were measured in three orthogonal directions, only the resultant cutting force (F) was used as an input variable in the proposed prediction model. The experiments were carried out on AISI H13 steel workpieces with an average hardness of 200 HV using a ROMI E280 CNC turning center (maximum spindle speed: 4000 rpm; nominal power: 18.5 kW). A Sandvik Coromant ISO TNMG 16 04 04-PF 4425 insert mounted on an ISO MTJNL 2020 K 16M1 tool holder was employed. A water-based cutting fluid (Blaser Swisslube Vasco 7000, 8% concentration, pH ≈ 8) was applied during machining. Surface roughness was measured using a Mitutoyo Surftest SJ-210, and tool wear was evaluated using a Dino-Lite AM4113ZT digital microscope.
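The text does not state how the resultant force was derived from the three measured components. A common convention, assumed here purely for illustration, is the Euclidean norm of the force vector; the function name and the component names `fx`, `fy` and `fz` are hypothetical.

```python
import numpy as np


def resultant_force(fx, fy, fz):
    """Magnitude of a force vector from three orthogonal components.

    Assumption: the resultant is the Euclidean norm of the components;
    the source does not confirm this definition.
    """
    return np.sqrt(np.square(fx) + np.square(fy) + np.square(fz))


# Works on scalars as well as on arrays of sampled force signals
f_res = resultant_force(30.0, 40.0, 120.0)
print(f_res)  # 130.0
```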
Although the present study does not involve newly conducted experimental trials, the detailed description above, together with the schematic representations in Figs. 1 and 2, provides a transparent and technically grounded experimental context for the dataset employed. Accordingly, the primary contribution of this work lies in the development and evaluation of a hybrid machine learning-based surface roughness prediction model using a well-documented and systematically generated experimental dataset.

Prediction algorithms
KNN is a non-parametric algorithm that predicts the class or value of an example from the training examples closest to it. In regression, the prediction is the average of the target values of the k nearest training examples; in classification, it is the majority class among them. Despite its simple structure, KNN is sensitive to the choice of distance measure, particularly in high-dimensional datasets26.
RF is an ensemble learning method based on the aggregation of multiple decision trees. Each tree is trained using randomly selected subsets of features and samples, and the results are combined to produce the final prediction. This structure reduces the risk of overfitting and effectively captures non-linear relationships27.
Similar to RF, the ExT algorithm uses multiple decision trees; however, node splits are performed with a higher degree of randomness, because candidate split points are drawn at random rather than optimized. This additional randomness reduces the model’s variance and improves its generalization ability. It also provides faster computation and a structure that is more resistant to overfitting28.
Linear regression is a classic statistical method that uses a linear model to explain the relationship between a dependent variable and one or more independent variables. It offers the advantages of simplicity, interpretability and fast computation, but it may be inadequate for complex, non-linear relationships. In this study, linear regression is employed as the meta-learner in the stacking approach to generate the final prediction from the outputs of the base models29.
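As a concrete illustration of the KNN regression rule described above (averaging the targets of the k nearest training examples), the following minimal NumPy sketch can be used. It is a from-scratch illustration on toy data, not the implementation used in the study.

```python
import numpy as np


def knn_regress(X_train, y_train, x_query, k=3):
    """Predict a value as the mean target of the k nearest training points."""
    # Euclidean distance from the query to every training example
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    return y_train[nearest].mean()


# Toy one-dimensional data (illustrative values only)
X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])

# The three nearest neighbors of 1.2 are the points at 1.0, 2.0 and 0.0
pred = knn_regress(X_train, y_train, np.array([1.2]), k=3)
print(pred)  # 1.0
```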
All machine learning models were trained within a supervised learning framework using a 5-fold cross-validation strategy to ensure robust performance estimation and to reduce the risk of overfitting. The hyperparameter settings of each model were explicitly defined based on preliminary experiments and cross-validated performance. For the KNN regression model, the number of neighbors was set to k = 5, uniform weighting was applied and the Euclidean distance metric was used. This configuration allows the model to capture local patterns while maintaining prediction stability. For the RF regression model, the number of trees (n_estimators) was set to 200 and the maximum tree depth was limited to 10 to control model complexity and avoid overfitting. Similarly, the ExT regression model was trained using 200 trees with a maximum tree depth of 10, introducing additional randomness while maintaining controlled model growth. The outputs of the KNN, RF and ExT models were combined using a stacking ensemble approach, in which a linear regression model was employed as the meta-learner. This ensemble structure integrates the complementary strengths of the individual base learners and enhances generalization performance. Selecting all hyperparameter values within the same 5-fold cross-validation framework ensures a fair comparison between models and supports the reproducibility and robustness of the proposed hybrid modeling approach.
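The configuration above could be expressed as follows. This is a sketch that assumes scikit-learn, which the text does not name; the hyperparameter name n_estimators matches that library's convention. The random seed and the synthetic data are placeholders and do not come from the study.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

RANDOM_STATE = 42  # assumption: the text does not report a seed

# Base learners with the hyperparameters stated in the text
base_learners = [
    ("knn", KNeighborsRegressor(n_neighbors=5, weights="uniform", metric="euclidean")),
    ("rf", RandomForestRegressor(n_estimators=200, max_depth=10, random_state=RANDOM_STATE)),
    ("ext", ExtraTreesRegressor(n_estimators=200, max_depth=10, random_state=RANDOM_STATE)),
]

# Linear regression meta-learner trained on 5-fold out-of-fold predictions
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=LinearRegression(),
    cv=5,
)

# Synthetic stand-in for the five input features (not the real dataset)
rng = np.random.default_rng(0)
X = rng.uniform(size=(120, 5))
y = 2.0 * X[:, 1] + X[:, 0] * X[:, 2] + rng.normal(scale=0.05, size=120)

stack.fit(X, y)
train_r2 = stack.score(X, y)  # R² on the training data, for a quick sanity check
```

The fitted meta-learner has one coefficient per base learner (`stack.final_estimator_.coef_`), which indicates how much weight the ensemble places on each model's prediction.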
Proposed stacking hybrid model
In the context of this study, the term hybrid model refers to a learning framework that combines multiple heterogeneous machine learning algorithms within a stacking ensemble structure and also integrates different types of input information for surface roughness prediction. Similar to hybrid approaches reported in the literature, which often merge complementary models or data sources to improve robustness and accuracy, the proposed method integrates distance-based and tree-based learners and combines machining parameters with process-related features. This hybrid formulation enables the model to capture both local and global data patterns, leading to improved predictive performance and generalization.
This study proposes a stacking-based hybrid ensemble learning model for reliable surface roughness estimation. The goal of stacking is to overcome the limitations of models used alone by combining the strengths of different algorithms30. In this context, the KNN, RF and ExT algorithms were used as base learners, and their outputs were combined and fed into a linear regression-based meta-learner. This structure leverages KNN’s ability to capture local patterns, RF’s high accuracy based on decision trees and ExT’s randomized splitting, which reduces the risk of overfitting, along with the simple, generalizable structure of linear regression. Thus, the weaknesses of the individual models are balanced and stronger prediction performance is achieved.
To construct the stacking ensemble, a 5-fold cross-validation scheme was adopted. In each fold, the base learners were trained on the training folds and generated predictions for the validation fold, resulting in out-of-fold predictions for the entire training set. These predictions were subsequently used to train the linear regression meta-learner. Because the meta-learner never sees a base-learner prediction for an example that the base learner was trained on, this cross-validation-based stacking strategy prevents information leakage and improves the robustness of the final model.
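The out-of-fold procedure described above can be written out explicitly. The sketch below assumes scikit-learn and uses synthetic data; the fold shuffling, the seed and the helper `stack_predict` are illustrative choices, not details reported in the study.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor

# Synthetic training data (placeholder for the real features and Ra values)
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 4))
y = 3.0 * X[:, 0] + np.sin(4.0 * X[:, 1]) + rng.normal(scale=0.05, size=100)

base_learners = {
    "knn": KNeighborsRegressor(n_neighbors=5),
    "rf": RandomForestRegressor(n_estimators=200, max_depth=10, random_state=0),
    "ext": ExtraTreesRegressor(n_estimators=200, max_depth=10, random_state=0),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Each column holds one base learner's out-of-fold predictions: every value
# comes from a model that did not see that example during training.
oof = np.column_stack(
    [cross_val_predict(model, X, y, cv=cv) for model in base_learners.values()]
)

# The meta-learner is trained only on the out-of-fold predictions
meta = LinearRegression().fit(oof, y)

# For inference, the base learners are refitted on the full training set
fitted = [model.fit(X, y) for model in base_learners.values()]


def stack_predict(X_new):
    """Combine base-learner predictions through the linear meta-learner."""
    level_one = np.column_stack([model.predict(X_new) for model in fitted])
    return meta.predict(level_one)
```

scikit-learn's `StackingRegressor` automates the same steps; writing them out makes it easier to see where leakage would occur if in-sample predictions were used instead.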
Performance metrics
Four fundamental performance metrics were used to evaluate the models: the coefficient of determination (R²), MSE, MAE and MAPE. These metrics allow prediction accuracy to be assessed from different perspectives.
R² shows the degree to which the model explains the total variance of the dependent variable. As R² approaches 1, the model’s explanatory power increases.
$${R}^{2}=1-\frac{\sum_{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}{\sum_{i=1}^{n}{\left({y}_{i}-\bar{y}\right)}^{2}}$$
(1)
Here, \({y}_{i}\) represents the actual values, \({\widehat{y}}_{i}\) the estimated values, \(\bar{y}\) the mean of the actual values and n the number of observations.
MSE is the average of the squared differences between the actual and estimated values.
$$MSE=\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}$$
(2)
The MAE is the average of the absolute values of the prediction errors. It is highly interpretable and directly indicates the magnitude of the errors on average.
$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|{y}_{i}-{\widehat{y}}_{i}\right|$$
(3)
MAPE allows errors to be expressed as a percentage, which enables comparisons between studies of different scales.
$$MAPE=\frac{100}{n}\sum_{i=1}^{n}\left|\frac{{y}_{i}-{\widehat{y}}_{i}}{{y}_{i}}\right|$$
(4)
Using these metrics together provides a comprehensive assessment of the models’ accuracy levels and error distributions. Specifically, R² indicates the model’s overall explanatory power, while MSE and MAE reveal absolute error magnitudes. MAPE expresses errors relative to the actual values, thereby enhancing industrial comparability.
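For reference, Eqs. (1) to (4) translate directly into a few lines of NumPy. The function name `regression_metrics` and the toy values are illustrative only.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute R², MSE, MAE and MAPE as defined in Eqs. (1)-(4)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                  # numerator of Eq. (1)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # denominator of Eq. (1)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": np.mean(residuals ** 2),
        "MAE": np.mean(np.abs(residuals)),
        # MAPE is undefined if any actual value is zero
        "MAPE": 100.0 * np.mean(np.abs(residuals / y_true)),
    }


# Toy example: residuals are 0, -0.5 and 1.0
metrics = regression_metrics([1.0, 2.0, 4.0], [1.0, 2.5, 3.0])
print(metrics)
# R2 ≈ 0.7321, MSE ≈ 0.4167, MAE = 0.5, MAPE ≈ 16.67
```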
SHAP
The SHAP (SHapley Additive exPlanations) method was used to make the stacking hybrid model more interpretable. Based on the concept of Shapley values in game theory, SHAP attributes each prediction to the input variables by distributing the difference between the prediction and a baseline value fairly among them. Developed to reveal the basis of predictions in machine learning models, SHAP has become one of the most widely used explainable artificial intelligence tools in recent years. Using SHAP in this study complements the model’s predictive accuracy by making its decision-making mechanism transparent and interpretable.
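To make the idea of a fair distribution concrete, the sketch below computes exact Shapley values for a small toy function by enumerating all feature coalitions. It is a conceptual illustration only: it does not use the SHAP library, and it makes the simplifying assumption that "absent" features are replaced by a fixed baseline value. The toy model and all names are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all coalitions.

    Features outside a coalition take their baseline value (a simplifying
    assumption). The cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = np.zeros(n)

    def value(subset):
        z = baseline.copy()
        idx = np.array(subset, dtype=int)
        z[idx] = x[idx]
        return predict(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight of a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


def toy_model(z):
    # Two additive terms plus one interaction between features 0 and 1
    return 2.0 * z[0] + z[0] * z[1] - z[2]


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = shapley_values(toy_model, x, baseline)
print(phi)  # [ 3.  1. -3.]
```

The interaction term contributes 2 to the prediction and is split equally between the two features involved, and the values sum to the difference between the prediction and the baseline output.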
LIME
Another method used to improve model interpretability is LIME (Local Interpretable Model-Agnostic Explanations). As a model-agnostic technique, LIME can be applied to any machine learning model without access to its internal structure. It explains a model’s prediction for a single observation at the local level by fitting a simple, interpretable surrogate model to the behavior of the original model in the neighborhood of that observation. In other words, it shows which inputs influence the prediction in a given example. Thus, the model’s decision mechanism can be understood not only at the general (global) level but also through individual examples.
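The local explanation idea can be illustrated with a simplified surrogate: perturb the observation, query the black-box model, weight the samples by proximity and fit a weighted linear model. This NumPy sketch captures the principle only; it is not the LIME library, and the sampling scale, kernel width and toy black-box function are assumptions chosen for illustration.

```python
import numpy as np


def local_surrogate(predict, x, n_samples=500, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear model around x.

    Returns (intercept, slopes); the slopes indicate how strongly each input
    influences the prediction in the neighborhood of x.
    """
    rng = np.random.default_rng(seed)
    # Perturbed samples around the instance being explained
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))
    y = predict(Z)
    # Exponential kernel: closer samples receive larger weights
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted least squares on offsets from x, with an intercept column
    A = np.column_stack([np.ones(n_samples), Z - x])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]


def black_box(Z):
    # Toy non-linear model: its gradient at (1, 2) is (2, 3)
    return Z[:, 0] ** 2 + 3.0 * Z[:, 1]


intercept, slopes = local_surrogate(black_box, np.array([1.0, 2.0]))
print(intercept, slopes)  # close to 7 and to [2, 3]
```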
The methodology of the study
Figure 3 shows the steps involved in developing the proposed hybrid model. First, the dataset obtained from the CNC turning experiments was loaded and preprocessed: unnecessary columns were removed, missing or erroneous observations were checked and all numerical variables were converted to a standard format. The machine learning models use machining and process-related parameters as input variables, namely cutting speed (Vc), feed rate (f), depth of cut (ap), the resultant cutting force (F) and the interaction term between depth of cut and feed rate (ap·f). The target output variable of all models is the arithmetic mean surface roughness (Ra).
Feature engineering was then performed. To represent the relationships between cutting parameters more accurately, the interaction term ap·f was added to the dataset, allowing the models to capture the joint effect of depth of cut and feed rate. Furthermore, quantile-based stratification was applied to the target variable (Ra) to balance its distribution across the training-test split. Next, the dataset was split into training and test sets. Scaling operations (normalization/standardization) were then performed to obtain more reliable results in the modeling process, which particularly benefits distance-based algorithms such as KNN.
Three different algorithms were defined as base learners in the modeling process: KNN, RF and ExT. Since these algorithms have different learning logics, each can capture different aspects of the data. The models’ outputs were combined using the stacking approach, and the final predictions were generated through a linear regression-based meta-learner. This structure mitigates the limitations of the individual models while leveraging their strengths.
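The preprocessing steps described above (interaction feature, quantile-based stratification, train-test split and scaling) could look as follows in pandas and scikit-learn. The column names, the number of quantile bins, the 80/20 split ratio, the choice of standardization and the synthetic data are all assumptions made for illustration, since the text does not specify them.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the dataset; column names are hypothetical
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Vc": rng.choice([310, 350, 390], size=n),          # cutting speed, m/min
    "f": rng.choice([0.07, 0.09, 0.11, 0.13], size=n),  # feed rate, mm/rev
    "ap": rng.choice([0.25, 0.5, 0.8], size=n),         # depth of cut, mm
    "F": rng.uniform(50.0, 400.0, size=n),              # resultant force
})
df["Ra"] = 0.2 + 30.0 * df["f"] ** 2 + rng.normal(scale=0.02, size=n)

# Feature engineering: interaction between depth of cut and feed rate
df["ap_f"] = df["ap"] * df["f"]
features = ["Vc", "f", "ap", "F", "ap_f"]

# Quantile-based bins of the continuous target, used only for stratification
ra_bins = pd.qcut(df["Ra"], q=5, labels=False, duplicates="drop")

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["Ra"], test_size=0.2, stratify=ra_bins, random_state=42
)

# The scaler is fitted on the training set only, so that no test-set
# statistics leak into training (a design choice assumed here)
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```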
During training, the model ran up to 1,500 iterations to ensure a balance between loss and error metrics. The performance evaluation phase was then conducted to calculate R², MSE, MAE and MAPE. A paired t-test was subsequently applied to test the statistical significance of the differences between the models. Finally, SHAP and LIME analyses were performed so that the model could be evaluated in terms of both accuracy and interpretability. SHAP revealed which parameters contributed most to the predictions across the entire dataset, and LIME explained the model’s decisions for individual examples. Figure 3 is therefore not just a flowchart; it presents a holistic methodology from data preparation through model development and statistical performance testing to explanation of the decision-making process.
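A paired t-test compares two models on matched measurements. The text does not state what was paired, so the sketch below assumes per-fold MAE values from the same five cross-validation folds; the numbers are invented purely to show the mechanics and are not results from the study.

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold MAE values for two models on the same 5 folds
mae_model_a = np.array([0.120, 0.135, 0.128, 0.140, 0.131])
mae_model_b = np.array([0.101, 0.118, 0.109, 0.123, 0.112])

# Paired t-test on the fold-wise differences
t_stat, p_value = stats.ttest_rel(mae_model_a, mae_model_b)

# The same statistic computed by hand: mean difference over its standard error
diff = mae_model_a - mae_model_b
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(diff.size))

print(t_stat, p_value)
```

A small p-value indicates that the mean difference between the paired errors is unlikely to be zero. With only five folds, the test has limited power and its assumptions should be checked with care.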

Analysis flow diagram of the proposed hybrid model.
Figure 3. Schematic representation of the proposed stacking-based hybrid machine learning framework for surface roughness prediction, illustrating the relationship between the input variables, the base learners with out-of-fold predictions, the meta-learner and the final Ra output.
