Dataset
In this study, we modeled a dataset of 24 data points describing the solubility of the drug febuxostat (FBX) at various temperatures and pressures. Table 1 lists all of these data, which serve as the input values. The dataset is taken from Abourehab et al.19, where FBX solubility measurements in supercritical CO2 (SCCO2) were reported2. The system pressure ranged from 120 to 270 bar at four temperature levels: 308, 318, 328, and 338 K. The same data were also used by Hani et al.3 to construct ML models and analyze FBX solubility behavior.
The overall modeling workflow for this study is shown in Figure 1, and its components are introduced in the following subsections. The first step in the modeling process is to normalize the data and split them into training and testing subsets. Normalization prevents either of the two input parameters from having an excessive effect on the final output merely because of its wide range of variation. The hyperparameters of the models, which are introduced in detail in the following subsections, are then optimized using the HHO algorithm to obtain the final models. Finally, the resulting models (both the standalone models and the voting model) are evaluated on test data that were not used during the training phase.
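The normalization and splitting step can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: the min-max scaler, the 80/20 split ratio, and the random seed are assumptions, and the target values are synthetic placeholders rather than the measured solubilities in Table 1.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Placeholder inputs only: a 4 x 6 grid of temperature (K) and pressure (bar)
# covering the ranges stated in the text. The targets are synthetic values,
# NOT the measured solubilities from Table 1.
temperatures = [308.0, 318.0, 328.0, 338.0]
pressures = np.linspace(120.0, 270.0, 6)
X = np.array([[t, p] for t in temperatures for p in pressures])
rng = np.random.default_rng(0)
y = rng.random(len(X))  # synthetic stand-in for solubility

# Hold out part of the data for testing (the 80/20 ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training inputs only, then apply it to both subsets,
# so that no information from the test set leaks into training.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training subset only is a design choice made in this sketch; the text itself does not state whether normalization was fitted before or after the split.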
The machine learning regression models and visualizations were implemented in the Python programming language (version 3.8) using the “sklearn”, “numpy”, and “matplotlib” libraries.

Figure 1. Overall workflow of this study for estimating FBX solubility in SCCO2.
K-Nearest Neighbors Regression (KNN)
This approach is based on distance metrics, which are used to group the dataset into manageable neighborhoods. The parameters that affect model performance were identified before the model was applied, so that they could be optimized to increase its efficiency. The model was optimized with respect to the number of neighbors, the point weights, the distance metric, and the p parameter of the Minkowski function. The number of neighbors determines how the response data are grouped into subsets. Two modes are available for the point weights: uniform and distance-weighted, the latter being based on the distance between points. The uniform mode assigns equal importance to each neighboring data point, while the distance mode gives more weight to nearby points. In addition to the standard Euclidean distance (D0), several other measures were used to quantify distance. Parameter optimization was performed using the Euclidean, Manhattan, Minkowski, and Chebyshev distances given in the following equations20.
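The hyperparameters listed above map directly onto the arguments of scikit-learn's `KNeighborsRegressor`. The sketch below fits a few hand-picked configurations on synthetic data; the candidate values, the data, and the query point are illustrative assumptions, not the settings or results of this study.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.random((24, 2))          # synthetic, already-normalized inputs
y = X[:, 0] + 2.0 * X[:, 1]      # synthetic target, not measured data

# Each candidate sets the number of neighbors, the point-weight mode,
# the distance metric, and (for Minkowski) the exponent p.
candidates = [
    {"n_neighbors": 3, "weights": "uniform", "metric": "euclidean"},
    {"n_neighbors": 3, "weights": "distance", "metric": "manhattan"},
    {"n_neighbors": 5, "weights": "distance", "metric": "chebyshev"},
    {"n_neighbors": 5, "weights": "uniform", "metric": "minkowski", "p": 3},
]

query = np.array([[0.5, 0.5]])
predictions = {}
for i, params in enumerate(candidates):
    model = KNeighborsRegressor(**params).fit(X, y)
    # The prediction is a (weighted) average of the neighbors' targets.
    predictions[i] = float(model.predict(query)[0])
```

In the study these settings are chosen by the optimizer rather than by hand; the fixed list here only shows which knobs exist.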
Gaussian Process Regression (GPR)
Gaussian process regression (GPR) is an effective approach for modeling, exploring, and exploiting unknown functions21. As a relatively new statistical ML algorithm within the Bayesian framework, GPR has recently attracted attention in modeling owing to its ability to perform probabilistic regression and determine hyperparameters for multidimensional, small, and nonlinear datasets22.
The term “Gaussian process” (GP) describes a collection of random variables, any finite number of which follow a joint Gaussian distribution23,24. A GP is characterized by its mean function and its covariance function. It extends the Gaussian distribution (GDS) from random variables to functions. In this formulation, the covariance and the mean are represented by a matrix and a vector, respectively25.
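A minimal GPR fit with scikit-learn is sketched below to make the roles of the mean and covariance concrete. The kernel (a scaled RBF plus a white-noise term), the synthetic data, and the query points are assumptions chosen for illustration; the text does not specify the kernel used in the study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

rng = np.random.default_rng(0)
X = rng.random((24, 2))                 # synthetic, normalized inputs
y = np.sin(3.0 * X[:, 0]) + X[:, 1]     # synthetic target, not measured data

# The kernel defines the covariance function; its parameters are tuned
# during fitting by maximizing the log marginal likelihood.
kernel = ConstantKernel(1.0) * RBF(length_scale=[1.0, 1.0]) + WhiteKernel(1e-3)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X, y)

# The posterior gives a mean vector and, on request, a standard deviation
# (or full covariance matrix) for the query points.
X_new = np.array([[0.25, 0.75], [0.5, 0.5]])
mean, std = gpr.predict(X_new, return_std=True)
```

The predictive standard deviation is what distinguishes GPR from a point estimator such as KNN: each prediction comes with an uncertainty estimate.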
Voting Regression
In this study, in addition to using the standalone GPR and KNN models, we combined the two into a voting model, which is one of the novel aspects of this paper. This model is constructed from ng GPR models and nk KNN models, and its output is the average of the outputs of all ng + nk models. Therefore, in addition to the hyperparameters of the base models, this model requires tuning of the ng and nk values.
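The averaging scheme described above can be sketched with scikit-learn's `VotingRegressor`. The values of `n_g` and `n_k`, the way the individual base models differ from one another (initial kernel length scale and number of neighbors), and the synthetic data are all assumptions; the text does not state how the members of each group are varied.

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.random((24, 2))                 # synthetic, normalized inputs
y = np.sin(3.0 * X[:, 0]) + X[:, 1]     # synthetic target, not measured data

n_g, n_k = 2, 3  # hypothetical counts of GPR and KNN base models

# n_g GPR models that differ in their initial kernel length scale.
estimators = [
    (
        f"gpr_{i}",
        GaussianProcessRegressor(
            kernel=RBF(length_scale=0.5 * (i + 1)),
            alpha=1e-6,
            normalize_y=True,
            random_state=i,
        ),
    )
    for i in range(n_g)
]
# n_k KNN models that differ in their number of neighbors.
estimators += [
    (f"knn_{j}", KNeighborsRegressor(n_neighbors=2 + j, weights="distance"))
    for j in range(n_k)
]

# With no weights given, the ensemble output is the plain average
# of the n_g + n_k individual predictions.
ensemble = VotingRegressor(estimators=estimators).fit(X, y)
X_new = np.array([[0.25, 0.75], [0.5, 0.5]])
ensemble_prediction = ensemble.predict(X_new)
```

Because the output is an unweighted mean, changing `n_g` and `n_k` shifts the balance between the two model families, which is why these counts are treated as tunable quantities.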
HHO Algorithm
Hyperparameter optimization is one of the key tasks in modeling physical systems with ML models, and a wide range of optimizers, including metaheuristic methods, have been developed for it26,27. Another novel aspect of this study is the use of the HHO method for this optimization.
The natural hunting behavior of Harris hawks pursuing their prey served as the inspiration for the Harris hawks optimization (HHO) algorithm, a bio-inspired optimization algorithm that is well suited to difficult optimization problems28,29. The algorithm takes into account the dynamics of the hunt and the escape patterns of the prey. Harris hawk behaviors such as foraging and scouting are modeled using a variety of position-update strategies. A population-based, gradient-free optimizer, the HHO algorithm consists of three phases: exploration, the transition from exploration to exploitation, and exploitation30. Its adaptable structure allows the initial control parameters to be determined easily. In previous research28, a detailed comparative study was conducted between HHO and 12 other optimization methods (GA, PSO, DE, etc.) using 29 benchmark functions.
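The three phases can be illustrated with the simplified sketch below. It is not the full HHO algorithm and not the implementation used in this study: the rapid-dive moves based on Lévy flights are omitted, the best solution is updated immediately after each move, and the function name `hho_minimize`, its parameters, and the sphere test function are hypothetical. In the study, the objective would be a model's validation error as a function of its hyperparameters.

```python
import numpy as np


def hho_minimize(objective, lower, upper, n_hawks=10, n_iter=50, seed=0):
    """Simplified HHO-style minimizer (no Levy-flight rapid dives)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    hawks = rng.uniform(lower, upper, size=(n_hawks, lower.size))
    fitness = np.array([objective(h) for h in hawks])
    best = hawks[fitness.argmin()].copy()   # current prey position
    best_f = float(fitness.min())
    history = [best_f]

    for t in range(n_iter):
        for i in range(n_hawks):
            # Escaping energy of the prey shrinks linearly over iterations;
            # its magnitude controls the switch from exploration to exploitation.
            e0 = rng.uniform(-1.0, 1.0)
            energy = 2.0 * e0 * (1.0 - t / n_iter)
            x = hawks[i]
            if abs(energy) >= 1.0:
                # Exploration: perch relative to a random hawk or the group mean.
                if rng.random() >= 0.5:
                    x_rand = hawks[rng.integers(n_hawks)]
                    new = x_rand - rng.random() * np.abs(
                        x_rand - 2.0 * rng.random() * x
                    )
                else:
                    new = (best - hawks.mean(axis=0)) - rng.random() * (
                        lower + rng.random() * (upper - lower)
                    )
            elif abs(energy) >= 0.5:
                # Exploitation, soft besiege: encircle the prey loosely.
                jump = 2.0 * (1.0 - rng.random())
                new = (best - x) - energy * np.abs(jump * best - x)
            else:
                # Exploitation, hard besiege: close in on the prey.
                new = best - energy * np.abs(best - x)
            hawks[i] = np.clip(new, lower, upper)  # keep inside the bounds
            f = objective(hawks[i])
            if f < best_f:
                best, best_f = hawks[i].copy(), float(f)
        history.append(best_f)
    return best, best_f, history


def sphere(x):
    # Placeholder objective with its minimum at the origin.
    return float(np.sum(x ** 2))


best_x, best_f, history = hho_minimize(sphere, [-5.0, -5.0], [5.0, 5.0])
```

Because only objective values are compared, no gradients are required, which is what makes this family of methods applicable to discrete or mixed hyperparameters such as the number of neighbors.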
