Decision Tree (DT)
DT regression is an ML model used to predict continuous numerical values. The algorithm builds a hierarchical model that partitions the dataset into segments based on the values of individual features. This process produces decision rules that can be used to predict the target variable21.
DT regression builds a decision tree by recursively splitting the dataset, using the values of different features as the splitting criteria. At each internal node, the data are split by selecting a feature and a corresponding threshold. Splits are generally chosen using metrics such as the mean squared error (MSE) or variance reduction. The procedure continues until a preset stopping criterion is met, such as reaching the maximum tree depth or a minimum number of instances at a leaf (terminal) node22.
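The splitting procedure can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the implementation used in this study. The names `sse`, `best_split`, `build_tree`, and `predict` are hypothetical. Thresholds are assumed to be midpoints between consecutive distinct feature values. The split criterion is the total squared error of the two child nodes; minimizing it is equivalent to maximizing variance reduction. `max_depth` and `min_samples_leaf` stand in for the preset stopping criteria.

```python
from statistics import mean


def sse(ys):
    """Sum of squared errors around the mean (n times the node MSE)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def best_split(X, y, min_samples_leaf=1):
    """Try every feature and midpoint threshold; return (score, feature, threshold)
    with the lowest total child SSE, or None if no admissible split exists."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, thr)
    return best


def build_tree(X, y, max_depth=3, min_samples_leaf=1, depth=0):
    """Grow the tree recursively until a stopping criterion is met."""
    split = best_split(X, y, min_samples_leaf) if depth < max_depth else None
    if split is None or split[0] >= sse(y):
        return {"leaf": mean(y)}  # a leaf predicts the mean target of its instances
    _, j, thr = split
    li = [i for i, row in enumerate(X) if row[j] <= thr]
    ri = [i for i, row in enumerate(X) if row[j] > thr]
    return {
        "feature": j,
        "threshold": thr,
        "left": build_tree([X[i] for i in li], [y[i] for i in li],
                           max_depth, min_samples_leaf, depth + 1),
        "right": build_tree([X[i] for i in ri], [y[i] for i in ri],
                            max_depth, min_samples_leaf, depth + 1),
    }


def predict(node, x):
    """Follow the decision rules from the root down to a leaf."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]
```

Consider `X = [[1], [2], [3], [10], [11], [12]]` and `y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]`. Here `build_tree(X, y)` splits once at threshold 6.5, and `predict(tree, [11])` returns 5.0.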
Random Forest (RF)
RF is an ensemble learner made up of multiple base trees whose predictions are combined to improve performance: by voting for classification, by averaging for regression. RF is particularly favored because it can handle a wide range of prediction tasks with few tunable parameters. It is a popular choice because it performs well on high-dimensional feature spaces and small sample sizes. Furthermore, because its trees can be built in parallel, it processes large problems efficiently in real applications23,24.
To construct the RF model, bootstrap samples of n instances are drawn with replacement from the original dataset, and an unpruned regression tree is grown on each bootstrap sample. Instead of using all available predictors, only a random subset of K predictors is considered at each node when searching for the best split. This process is repeated until the specified number of trees has been grown. Predictions for previously unseen data are obtained by averaging the predictions of the C trees. Because each tree is built on a different training subset, RF increases the diversity among trees and reduces the variance of the ensemble. The RF regression predictor is expressed by the equation25:
$$\hat{f}_{rf}(x)=\frac{1}{C}\sum_{i=1}^{C}t_{i}(x)$$
where \(C\) is the number of decision trees, \(x\) is the input data point, and \(t_{i}(x)\) is the prediction of the \(i\)-th decision tree.
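The following minimal standard-library sketch makes the bootstrap step, the random predictor subset, and the averaging predictor concrete. It works under stated assumptions:

- All names (`grow`, `fit_forest`, `forest_predict`, `oob_mse`) are hypothetical.
- `k` plays the role of the K randomly chosen predictors considered at each node.
- Trees are capped by `max_depth` for brevity rather than grown fully.
- Each tree keeps the indices of its out-of-bag instances, so the OOB error discussed in the next paragraph can be computed.

Library implementations such as scikit-learn's `RandomForestRegressor` are far more optimized.

```python
import random
from statistics import mean


def sse(ys):
    """Sum of squared errors around the mean."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def grow(X, y, k, rng, max_depth, depth=0):
    """Grow one regression tree; every split considers only k random features."""
    best = None
    if depth < max_depth:
        for j in rng.sample(range(len(X[0])), k):  # random predictor subset per node
            vals = sorted(set(row[j] for row in X))
            for lo, hi in zip(vals, vals[1:]):
                thr = (lo + hi) / 2
                left = [t for row, t in zip(X, y) if row[j] <= thr]
                right = [t for row, t in zip(X, y) if row[j] > thr]
                score = sse(left) + sse(right)
                if best is None or score < best[0]:
                    best = (score, j, thr)
    if best is None or best[0] >= sse(y):
        return mean(y)  # leaf value
    _, j, thr = best
    li = [i for i, row in enumerate(X) if row[j] <= thr]
    ri = [i for i, row in enumerate(X) if row[j] > thr]
    return (j, thr,
            grow([X[i] for i in li], [y[i] for i in li], k, rng, max_depth, depth + 1),
            grow([X[i] for i in ri], [y[i] for i in ri], k, rng, max_depth, depth + 1))


def tree_predict(node, x):
    """Internal nodes are (feature, threshold, left, right) tuples; leaves are numbers."""
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node


def fit_forest(X, y, n_trees=25, k=1, max_depth=8, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of n instances."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        tree = grow([X[i] for i in idx], [y[i] for i in idx], k, rng, max_depth)
        oob = set(range(n)) - set(idx)  # instances this tree never saw
        forest.append((tree, oob))
    return forest


def forest_predict(forest, x):
    """f_rf(x) = (1/C) * sum_i t_i(x): the average over the C trees."""
    return mean([tree_predict(tree, x) for tree, _ in forest])


def oob_mse(forest, X, y):
    """Score each instance only with the trees that did not train on it."""
    errors = []
    for i, (x, target) in enumerate(zip(X, y)):
        preds = [tree_predict(tree, x) for tree, oob in forest if i in oob]
        if preds:
            errors.append((mean(preds) - target) ** 2)
    return mean(errors)
```

Setting `k` equal to the number of features removes the per-node feature subsampling and reduces the sketch to plain bagged trees.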
RF also supports out-of-bag (OOB) error estimation, which uses the instances left out of each bootstrap sample during bagging. This provides a nearly unbiased estimate of the generalization error without requiring external validation data. In addition, each input variable is assigned an importance score. The model computes this score as the average reduction in performance when the values of a single input variable are randomly permuted while all other variables are left unchanged.
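The permutation-based importance score can be illustrated with a short sketch, under these assumptions:

- `model` is any prediction function, a hypothetical stand-in for a fitted RF.
- The score is the mean increase in MSE over `n_repeats` shuffles of one column.
- The evaluation data are passed in explicitly.

Breiman's original formulation evaluates on each tree's OOB instances, which this sketch does not reproduce.

```python
import random
from statistics import mean


def mse(model, X, y):
    """Mean squared error of a prediction function on (X, y)."""
    return mean([(model(x) - t) ** 2 for x, t in zip(X, y)])


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Average increase in MSE when one feature column is shuffled
    while every other column is left unchanged."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(mean(increases))
    return importances
```

Consider a model such as `lambda x: 2.0 * x[0]` that ignores the second feature. Shuffling that feature leaves the MSE unchanged, so its importance is exactly 0, while the first feature receives a positive score.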
Extremely Randomized Trees (ET)
ET, also known as Extra Trees, was introduced by Geurts et al.26 and belongs to the family of decision tree-based models. ET is often regarded as an enhanced variant of the RF model; however, researchers have highlighted two prominent differences between the two. First, ET builds each tree from the complete training set, whereas RF employs a bagging procedure. Second, RF selects the optimal split at each node, whereas ET draws node splits at random while constructing each tree. This extra randomization allows ET models to achieve notable improvements at the cost of only a slight increase in the bias of the predictive model, while keeping computational costs low. Formally, ET consists of a set of \(T\) decision trees, and each tree \(t \in \{1, \dots, T\}\) uses all training patterns during training.
Let the set of training patterns be \(\mathcal{D}=\{(x_{1},y_{1}),\dots,(x_{n},y_{n})\}\), where \(x_{i}\) is a feature vector and \(y_{i}\) is the corresponding target value. For each tree \(t\), ET uses the entire dataset \(\mathcal{D}\) to build a decision tree with different, randomly drawn split parameters.
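The following minimal sketch shows the two differences from RF: no bootstrap sampling and random cut-points. It approximately follows the split rule described by Geurts et al.:

- At each node, `k` features are drawn at random.
- For each drawn feature, one cut-point is drawn uniformly between its minimum and maximum within that node.
- The best of these random candidates, by lowest child squared error, is kept.

The function names are hypothetical, and the code is an illustration rather than the configuration used in this study.

```python
import random
from statistics import mean


def sse(ys):
    """Sum of squared errors around the mean."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def random_split(X, y, k, rng):
    """Draw k random features, one uniform random cut-point for each,
    and keep the candidate split with the lowest child SSE."""
    best = None
    for j in rng.sample(range(len(X[0])), k):
        lo = min(row[j] for row in X)
        hi = max(row[j] for row in X)
        if lo == hi:
            continue  # feature is constant in this node
        thr = rng.uniform(lo, hi)  # random threshold, not an optimized one
        left = [t for row, t in zip(X, y) if row[j] <= thr]
        right = [t for row, t in zip(X, y) if row[j] > thr]
        if not left or not right:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, j, thr)
    return best


def grow_et(X, y, k, rng, min_samples_split=2):
    """Grow until the node is too small, pure, or cannot be split."""
    if len(y) < min_samples_split or sse(y) == 0:
        return mean(y)
    split = random_split(X, y, k, rng)
    if split is None:
        return mean(y)
    _, j, thr = split
    li = [i for i, row in enumerate(X) if row[j] <= thr]
    ri = [i for i, row in enumerate(X) if row[j] > thr]
    return (j, thr,
            grow_et([X[i] for i in li], [y[i] for i in li], k, rng, min_samples_split),
            grow_et([X[i] for i in ri], [y[i] for i in ri], k, rng, min_samples_split))


def tree_predict(node, x):
    """Internal nodes are (feature, threshold, left, right) tuples; leaves are numbers."""
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node


def fit_extra_trees(X, y, n_trees=25, k=1, seed=0):
    """Unlike RF, every tree is grown on the full training set (no bootstrap)."""
    rng = random.Random(seed)
    return [grow_et(X, y, k, rng) for _ in range(n_trees)]


def et_predict(forest, x):
    """Average the predictions of all trees."""
    return mean([tree_predict(tree, x) for tree in forest])
```

Because every tree sees the same full training set, the diversity among trees comes entirely from the random feature and threshold draws.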
Histogram-based gradient boosting regression (HBGB)
HBGB is a powerful ensemble learning technique that combines the principles of gradient boosting with histogram-based algorithms for efficient computation. Like traditional gradient boosting, it uses decision trees as base learners. Instead of evaluating splits over every distinct feature value, however, HBGB first groups feature values into histograms, which accelerates training while maintaining predictive accuracy27,28. HBGB minimizes a loss function \(L(y, f(x))\), where \(y\) is the true value and \(f(x)\) is the predicted output, starting from an initial estimate \(f_{0}(x)=\text{mean}(y)\). At each iteration \(m\), a regression tree \(h_{m}(x)\) is fitted to the negative gradient of the loss, and the prediction is updated as \(f_{m}(x)=f_{m-1}(x)+\eta\cdot h_{m}(x)\), where \(\eta\) is the learning rate. Histogram binning accelerates tree construction by grouping continuous features into discrete bins, which significantly reduces computation and memory usage. This process continues until a predefined number of iterations is reached or performance no longer improves significantly.
HBGB incorporates a histogram-based approach to enhance computational efficiency and scalability while retaining the benefits of gradient boosting. It improves predictive performance by repeatedly fitting regression trees to the negative gradient of the loss function, which makes it well suited to regression tasks on large datasets.
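The boosting loop and the histogram binning can be sketched together. This is a simplified illustration under several assumptions:

- The loss is the squared error \(\tfrac{1}{2}(y-f(x))^2\), so the negative gradient is simply the residual \(y-f(x)\).
- Each base learner \(h_{m}\) is a depth-1 tree (stump) rather than a full tree.
- Bin edges are midpoints or quantile-style cuts capped at `max_bins`.
- There is no early stopping.

All function names are hypothetical. Production implementations such as scikit-learn's `HistGradientBoostingRegressor` grow deeper trees and add regularization.

```python
from bisect import bisect_left
from statistics import mean


def bin_edges(column, max_bins):
    """At most max_bins - 1 thresholds per feature (midpoints or quantile-style cuts)."""
    vals = sorted(set(column))
    if len(vals) <= max_bins:
        return [(a + b) / 2 for a, b in zip(vals, vals[1:])]
    step = len(vals) / max_bins
    return [vals[int(i * step)] for i in range(1, max_bins)]


def fit_stump(B, g, n_bins):
    """Fit a depth-1 tree to gradients g using per-bin histograms of gradient sums."""
    total_g, total_n = sum(g), len(g)
    best = None
    for j in range(len(B[0])):
        sums, counts = [0.0] * n_bins[j], [0] * n_bins[j]
        for row, gi in zip(B, g):  # one pass builds the histogram for feature j
            sums[row[j]] += gi
            counts[row[j]] += 1
        g_left, n_left = 0.0, 0
        for b in range(n_bins[j] - 1):  # candidate splits only at bin boundaries
            g_left += sums[b]
            n_left += counts[b]
            n_right = total_n - n_left
            if n_left == 0 or n_right == 0:
                continue
            g_right = total_g - g_left
            # maximizing this term minimizes the squared error of the two leaf means
            gain = g_left ** 2 / n_left + g_right ** 2 / n_right
            if best is None or gain > best[0]:
                best = (gain, j, b, g_left / n_left, g_right / n_right)
    return best


def fit_hbgb(X, y, n_iter=50, eta=0.1, max_bins=16):
    edges = [bin_edges([row[j] for row in X], max_bins) for j in range(len(X[0]))]
    n_bins = [len(e) + 1 for e in edges]
    B = [[bisect_left(edges[j], v) for j, v in enumerate(row)] for row in X]  # binned data
    f0 = mean(y)  # f_0(x) = mean(y)
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_iter):
        g = [t - f for t, f in zip(y, F)]  # negative gradient of 0.5 * (y - f)^2
        split = fit_stump(B, g, n_bins)
        if split is None:
            break
        _, j, b, v_left, v_right = split
        stumps.append((j, b, v_left, v_right))
        # f_m(x) = f_{m-1}(x) + eta * h_m(x)
        F = [f + eta * (v_left if row[j] <= b else v_right) for f, row in zip(F, B)]
    return f0, edges, stumps, eta


def predict_hbgb(model, x):
    f0, edges, stumps, eta = model
    bins = [bisect_left(edges[j], v) for j, v in enumerate(x)]
    return f0 + eta * sum(vl if bins[j] <= b else vr for j, b, vl, vr in stumps)
```

Split candidates are scored only at bin boundaries, using per-bin gradient sums. The cost of a split search therefore depends on the number of bins rather than on the number of distinct feature values.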
Successive halving
Successive halving is an iterative approach for efficiently exploring the hyperparameter space and identifying the hyperparameter combination that yields peak model performance29,30. The successive halving procedure involves the following steps:
1. Initialization: For each model, a set of candidate hyperparameter configurations is first defined. These configurations cover the hyperparameter combinations to be evaluated during tuning.
2. Training and evaluation: Each candidate configuration is trained on a subset of the training data and then evaluated with an appropriate metric, such as the mean squared error or cross-validation accuracy.
3. Selection: Candidates are ranked by the evaluation metric. Low-performing candidates are discarded, and only a subset of the best-performing candidates is retained.
4. Halving: The number of candidates is reduced in each round by eliminating a fraction of the remaining ones, as determined by a predefined halving factor. The candidates that are not eliminated proceed to the next iteration, typically with a larger resource budget.
5. Iteration: Steps 2-4 are repeated until only one candidate remains. This candidate, with its tuned hyperparameters, is considered the best model.
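The five steps can be sketched as a generic loop. The sketch assumes two things, both hypothetical. First, `evaluate(candidate, budget)` returns a score where higher is better, for example a mean cross-validated R². Second, `budget` is a resource such as the number of training samples, multiplied by `factor` each round. `toy_score` is a made-up objective used only for illustration. scikit-learn offers ready-made versions of this procedure as `HalvingGridSearchCV` and `HalvingRandomSearchCV`, enabled through `sklearn.experimental`.

```python
def successive_halving(candidates, evaluate, factor=2, min_budget=1, max_rounds=None):
    """Generic successive halving (higher score is better).

    Each round scores every remaining candidate at the current budget, keeps the
    top 1/factor of them, and multiplies the budget by factor.
    Returns the best candidate and the pool size after each round.
    """
    pool = list(candidates)
    budget = min_budget
    sizes = [len(pool)]
    rounds = 0
    while len(pool) > 1 and (max_rounds is None or rounds < max_rounds):
        ranked = sorted(pool, key=lambda c: evaluate(c, budget), reverse=True)
        pool = ranked[:max(1, len(pool) // factor)]  # keep the top 1/factor
        sizes.append(len(pool))
        budget *= factor
        rounds += 1
    return pool[0], sizes


def toy_score(learning_rate, budget):
    """Hypothetical objective: best near 0.3; the budget term only shifts all scores."""
    return -(learning_rate - 0.3) ** 2 - 1.0 / budget


best, sizes = successive_halving([i / 63 for i in range(64)], toy_score)
```

With 64 candidates and `factor=2`, the pool shrinks 64 → 32 → 16 → 8 → 4 → 2 → 1, that is, six rounds. Here `best` is the candidate closest to 0.3, namely 19/63.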
This approach makes exploration of the hyperparameter space more efficient by concentrating computational resources on the most promising combinations. It streamlines the identification of hyperparameters that improve the performance of boosting models31.
To optimize the ML models, the successive halving algorithm was adopted as an efficient hyperparameter optimization technique. The setup used an initial pool of 64 candidate configurations per model, a halving factor of 2, and a maximum of 6 rounds. Each candidate was trained and validated using 5-fold cross-validation. In each round, only the top 45% of candidates (based on performance) were retained for the next iteration, effectively focusing resources on the most promising configurations.
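The saving over exhaustive search can be estimated with simple arithmetic. The sketch below illustrates this under assumptions the text does not state:

- Elimination follows the factor of 2, keeping `pool // 2` candidates per round.
- The resource budget starts at one hypothetical unit and doubles each round.
- An exhaustive search would evaluate every candidate at the final-round budget.

Under these assumptions every round costs the same, and the total is less than a fifth of the exhaustive cost. The function name is hypothetical.

```python
def halving_cost(n_candidates=64, factor=2, max_rounds=6, n_folds=5, min_budget=1):
    """Return the pool size per round, the resource units spent by successive
    halving, and the cost of evaluating every candidate at the final-round budget."""
    pools, cost = [], 0
    pool, budget = n_candidates, min_budget
    for _ in range(max_rounds):
        if pool <= 1:
            break
        pools.append(pool)
        cost += pool * budget * n_folds  # every candidate is cross-validated at this budget
        pool = max(1, pool // factor)
        budget *= factor
    full_budget = budget // factor  # budget used in the last round that ran
    exhaustive = n_candidates * full_budget * n_folds
    return pools, cost, exhaustive
```

With the stated settings, `halving_cost()` returns `([64, 32, 16, 8, 4, 2], 1920, 10240)`, so successive halving uses 18.75% of the exhaustive budget in this idealized model.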
The optimization objective was to maximize the mean R² score across the five cross-validation folds. This metric was chosen because it reflects the proportion of variance explained by the model, which makes it suitable for assessing regression performance.
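The objective can be written as \(R^2 = 1 - SS_{res}/SS_{tot}\), averaged over the five validation folds. In the minimal sketch below, `fit_line` is a hypothetical stand-in model. The folds are contiguous and unshuffled; this is an assumption, since the text does not state how the folds were formed.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot: share of target variance explained by the model."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def kfold_indices(n, k=5):
    """Contiguous, unshuffled folds whose sizes differ by at most one."""
    start = 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


def cv_mean_r2(fit, X, y, k=5):
    """Tuning objective: mean R^2 over the k validation folds."""
    scores = []
    for train, test in kfold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(r2_score([y[i] for i in test], [model(X[i]) for i in test]))
    return mean(scores)


def fit_line(X, y):
    """Hypothetical stand-in model: 1-D least-squares line y = a * x + b."""
    xs = [row[0] for row in X]
    mx, my = mean(xs), mean(y)
    a = sum((u - mx) * (v - my) for u, v in zip(xs, y)) / sum((u - mx) ** 2 for u in xs)
    b = my - a * mx
    return lambda row: a * row[0] + b
```

Any fitting function that returns a prediction callable can be passed as `fit`. A model that reaches \(R^2 = 1\) on every fold gives a mean of 1.0, as `fit_line` does on noise-free linear data.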
The hyperparameter search space for each model is summarized in Table 2. Ranges were selected based on previous experiments and model-specific properties.
After hyperparameter tuning, all models showed measurable improvements in predictive performance: R² increased by 9.6% for DT, 4.1% for RF, 2.7% for ET, and 3.3% for HBGB. These results highlight the effectiveness of successive halving in improving model performance while reducing computational overhead compared with exhaustive search techniques.
