Correlation of raloxifene solubility in supercritical CO2 and drug solubility via hybrid machine learning and gradient-based optimization.

Machine Learning


Gradient-based optimization

Gradient-based optimization is a powerful technique for tuning the hyperparameters of ML methods, including regression models. The gradient gives the direction and magnitude of the steepest change in the objective (fitness) function. By computing the derivatives of the objective with respect to the hyperparameters and stepping against them, the procedure gradually updates the hyperparameter values, reduces the objective function, and converges toward the most effective hyperparameter configuration17,18.

In gradient-based optimization, the process begins by specifying an objective function that measures how well a model performs with a given set of hyperparameters. This objective function is usually a measure of error or loss, such as the mean squared error (MSE) or mean absolute error (MAE). The derivatives of the objective function with respect to the hyperparameters can be obtained through methods such as automatic differentiation and backpropagation. These gradients indicate the direction in which the hyperparameters should be updated to reduce the objective function19.
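As a minimal sketch of this idea, the following approximates the derivative of a validation loss with respect to one continuous hyperparameter by a central finite difference and takes plain gradient-descent steps. The loss function `validation_loss`, its minimum at 0.3, the step size, and all function names are illustrative assumptions, not the objective used in the study.

```python
def validation_loss(h):
    """Hypothetical smooth validation loss with its minimum at h = 0.3."""
    return (h - 0.3) ** 2 + 0.05


def numerical_gradient(f, h, delta=1e-5):
    """Central finite-difference estimate of df/dh."""
    return (f(h + delta) - f(h - delta)) / (2.0 * delta)


def gradient_descent(f, h0, step=0.1, n_steps=100):
    """Move the hyperparameter against the gradient to reduce the loss."""
    h = h0
    for _ in range(n_steps):
        h -= step * numerical_gradient(f, h)
    return h


# Start far from the optimum and let the gradient guide the search.
h_opt = gradient_descent(validation_loss, h0=1.0)
```

Finite differences stand in here for automatic differentiation; they need only function evaluations, at the cost of two evaluations per hyperparameter per step.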

It is important to note that gradient-based optimization requires the objective function to be differentiable with respect to the hyperparameters. Additionally, care must be taken to avoid overfitting and convergence to local minima during the optimization process. To alleviate these problems, techniques such as regularization and early stopping can be adopted.

Gradient-based optimization of hyperparameters provides a systematic and efficient approach to finding optimal hyperparameter values for regression models. Gradients guide the search through the hyperparameter space, leading to improved model performance and better generalization to unseen data. The procedure is shown in Figure 2 (ref. 19).

In this study, gradient-based optimization was implemented using the Adam optimization algorithm, which was selected for its adaptive learning rates and its efficiency on non-convex optimization tasks. The objective function was defined as the mean R-squared (R²) score from 3-fold cross-validation and was maximized to ensure both model accuracy and generalizability. The Adam optimizer was configured with a learning rate of 0.0012, β1 = 0.85, β2 = 0.98, and ε = 1E-8, with early stopping after 10 iterations without improvement in R² and a maximum of 130 epochs.
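The Adam update with these settings can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: `toy_r2` is a hypothetical stand-in for the mean cross-validated R², a single continuous hyperparameter is tuned, and the gradient is approximated by a central finite difference because the source does not state how gradients were obtained.

```python
import math


def adam_maximize(objective, x0, lr=0.0012, beta1=0.85, beta2=0.98,
                  eps=1e-8, max_epochs=130, patience=10, delta=1e-4):
    """Maximize a scalar objective of one hyperparameter with Adam."""
    x = x0
    m = v = 0.0                      # first and second moment estimates
    best_x, best_score = x0, objective(x0)
    stale = 0                        # epochs since the last improvement
    history = [best_score]
    for t in range(1, max_epochs + 1):
        # Finite-difference gradient (an assumption of this sketch).
        grad = (objective(x + delta) - objective(x - delta)) / (2.0 * delta)
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad * grad
        m_hat = m / (1.0 - beta1 ** t)   # bias-corrected moments
        v_hat = v / (1.0 - beta2 ** t)
        x += lr * m_hat / (math.sqrt(v_hat) + eps)  # plus sign: ascent
        score = objective(x)
        history.append(score)
        if score > best_score:
            best_x, best_score, stale = x, score, 0
        else:
            stale += 1
            if stale >= patience:    # early stopping
                break
    return best_x, best_score, history


def toy_r2(x):
    """Hypothetical stand-in for the mean cross-validated R^2 score."""
    return 1.0 - (x - 0.5) ** 2


best_x, best_score, history = adam_maximize(toy_r2, x0=0.45)
```

Because Adam normalizes the step by the running gradient magnitude, each update moves the hyperparameter by roughly the learning rate at most, so with 130 epochs the search stays within a narrow neighborhood of its starting point.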

Figure 2

Workflow of gradient-based hyperparameter optimization.

Decision Tree (DT)

DT regression estimates the target by progressively dividing the dataset into smaller subsets in a hierarchical way, with each split based on the values of the independent variables. For each split in the tree, the algorithm selects the predictor that produces the greatest reduction in variance, which measures the heterogeneity of the target within a node. Partitioning proceeds recursively until a stopping condition is met, for example when a leaf node would contain fewer than the specified minimum number of samples20. The DT regression algorithm thus generates a hierarchical model presented in a tree-based format, which can be used to predict outcomes for unseen data. Each time a new data point is presented, the model passes it through the tree structure by evaluating its independent variable values at each node until it reaches a terminal node21.
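The variance-reduction criterion for a single split can be sketched as below. The data, the function names, and the `min_samples_leaf` parameter are illustrative assumptions; a full tree would apply the same search recursively to each child node.

```python
def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(x, y, min_samples_leaf=1):
    """Return (threshold, variance_reduction) for one feature.

    Candidate thresholds are midpoints between consecutive sorted x values.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    parent_var = variance([t for _, t in pairs])
    best_threshold, best_gain = None, 0.0
    for i in range(min_samples_leaf, n - min_samples_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical feature values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        weighted = (len(left) * variance(left)
                    + len(right) * variance(right)) / n
        gain = parent_var - weighted
        if gain > best_gain:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best_gain = gain
    return best_threshold, best_gain


# Illustrative data: the target jumps between x = 3 and x = 4.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
threshold, gain = best_split(x, y)
```

On this toy data the search places the threshold at 3.5, the midpoint of the jump, because that split leaves two nearly homogeneous children.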

The DT algorithm is resilient to outliers and missing data and can handle both categorical and continuous variables effectively. The risk of overfitting is largely mitigated by tuning the hyperparameters20. Decision tree regression is therefore a valuable technique for establishing the relationship between dependent and independent variables and for predicting continuous target variables. This relationship can be inspected by visualizing the fitted regression tree.

Random Forest (RF) and Extra Trees (ET)

The RF model is a robust member of the ensemble learning family. It is flexible and easy to use, and it can be applied to a variety of regression and classification tasks22,23. The discussion here concentrates on the random forest model applied to regression.

The RF model consists of an ensemble of DTs that cooperate to generate predictions. The DT methodology repeatedly divides the dataset into smaller subsets, at each step using the feature that shows the best discriminative power for the target variable. The RF model extends this concept by generating a large number of decision trees and combining their results to form the final prediction24,25.

The prediction equation for the RF model is26:

$$f\left(x\right)=\frac{1}{M}\sum_{m=1}^{M}t\left(x,\theta_{m}\right)$$

where \(f\left(x\right)\) is the predicted target value for the input vector \(x\), \(M\) is the number of DTs in the forest, \(\theta_{m}\) denotes the parameters of tree \(m\), and \(t\left(x,\theta_{m}\right)\) is the prediction of tree \(m\) for input vector \(x\) with parameters \(\theta_{m}\). The output of the model is the average of the predictions from all decision trees in the forest.

In this equation, the prediction of each DT is determined by traversing the tree structure from the root node to a leaf node. Each internal node makes a decision using the value of one input feature (\(x_{i}\)) from the input vector (\(x\)) and directs the traversal to the left or right child node based on the outcome of that decision. At the leaf node, the prediction is the average target value of the training samples that belong to that leaf. The parameters (\(\theta_{m}\)) of each decision tree are derived from a bootstrap sample of the training data and a random subset of the input features.
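The averaging over bootstrap-trained trees can be sketched as follows. To stay short, each tree here is a depth-1 tree (a single split) fitted on one randomly chosen feature, which is a simplification of a real random forest; the data, the seed, and all function names are illustrative assumptions.

```python
import random


def fit_stump(rows, targets, feature):
    """Fit a depth-1 regression tree on one feature by least squares."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][feature])
    best = None
    for k in range(1, len(order)):
        lo, hi = rows[order[k - 1]][feature], rows[order[k]][feature]
        if lo == hi:
            continue  # identical values cannot be separated
        left = [targets[i] for i in order[:k]]
        right = [targets[i] for i in order[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((t - lm) ** 2 for t in left)
               + sum((t - rm) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2.0, lm, rm)
    if best is None:  # all feature values identical: predict the mean
        mean = sum(targets) / len(targets)
        return (feature, float("inf"), mean, mean)
    return (feature, best[1], best[2], best[3])


def predict_stump(stump, row):
    """Route the row left or right and return that leaf's mean target."""
    feature, threshold, left_value, right_value = stump
    return left_value if row[feature] <= threshold else right_value


def fit_forest(rows, targets, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(n_features)         # random feature choice
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees in the forest."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Illustrative data: two features that both increase with the target.
rows = [[float(i), 2.0 * i + 1.0] for i in range(8)]
targets = [float(i * i) for i in range(8)]
forest = fit_forest(rows, targets)
pred_low = predict_forest(forest, rows[0])
pred_high = predict_forest(forest, rows[-1])
```

Since every leaf value is a mean of training targets, the averaged prediction always lies within the range of the observed targets, which is one reason tree ensembles do not extrapolate beyond the training data.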

The ET algorithm is similar to RF, but injects extra randomness into the tree building process, often leading to an ensemble with even greater diversity among members. This diversity contributes to the robustness of the model and can improve performance on regression tasks27,28.
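One common source of this extra randomness, following the usual formulation of Extra Trees rather than anything stated in this section, is that split thresholds are drawn at random instead of being optimized. The sketch below contrasts the two choices for a single feature; the data, the seed, and the function names are illustrative assumptions.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_sse(x, y, threshold):
    """Total within-child squared error after splitting at threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return sse(left) + sse(right)


def exhaustive_threshold(x, y):
    """RF-style choice: scan all midpoints and keep the best one."""
    xs = sorted(set(x))
    candidates = [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_sse(x, y, t))


def random_threshold(x, rng):
    """ET-style choice: draw the cut-point uniformly in the feature range."""
    return rng.uniform(min(x), max(x))


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
rng = random.Random(0)
t_best = exhaustive_threshold(x, y)
t_rand = random_threshold(x, rng)
```

A single random cut is never better than the exhaustive one on the training data, but trees built this way are less correlated with each other, which is what the averaging step benefits from.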

Gradient Boosting (GB)

GB is another popular ensemble learning technique that uses decision trees as weak learners. It is known for achieving high prediction accuracy by combining multiple weak models sequentially. This discussion focuses on GB with DTs as the weak learners for the regression task. The GB model is constructed by adding decision trees to the ensemble one at a time, with each subsequent tree trained to correct the errors of the trees before it. This iterative process allows the model to learn complex relationships and capture fine-grained patterns within the data29.

In GB with DTs, the weak learner is a shallow decision tree, often referred to as a "stump" or "shallow tree." These trees have a small number of levels and are trained to make predictions based on a subset of the input features30,31. Unlike RF and ET, where each tree is built independently, in GB the trees are constructed successively, and each tree learns from the mistakes of its predecessors.

The prediction equation for GB using decision trees is32:

$$f\left(x\right)=\sum_{m=1}^{M}\gamma_{m}\,t\left(x,\theta_{m}\right)$$

In this equation, \(f\left(x\right)\) denotes the predicted target value for the input vector \(x\), and \(M\) denotes the total number of DTs. \(\gamma_{m}\) is the contribution (or weight) of tree \(m\), \(\theta_{m}\) denotes the parameters of tree \(m\), and \(t\left(x,\theta_{m}\right)\) is the prediction made by tree \(m\) for input vector \(x\) using parameters \(\theta_{m}\). The final prediction is obtained by combining the predictions from all decision trees: each prediction is multiplied by its corresponding weight \(\gamma_{m}\), and the results are summed.

During prediction, the input vector \(x\) travels through each decision tree from the root to a leaf, with the path determined by the values of particular features \(\left(x_{i}\right)\). At every internal node, the value of the selected feature determines which branch the traversal follows. Finally, the leaf node generates a prediction by assigning the output value associated with that leaf33.
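The sequential construction can be sketched for squared-error loss, where each new tree is fitted to the residuals of the current ensemble. This is a minimal illustration under stated assumptions: a constant weight `gamma` is used for every tree, the ensemble starts from zero to mirror the summation formula (many implementations instead start from a constant such as the target mean), and the data and function names are hypothetical.

```python
def fit_stump(x, residuals):
    """Depth-1 regression tree on a single feature (least squares).

    Assumes x contains at least two distinct values.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2.0, lm, rm)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, xi):
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def fit_gb(x, y, n_trees=50, gamma=0.5):
    """Add trees one at a time, each fitted to the current residuals."""
    trees = []
    pred = [0.0] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        trees.append((gamma, stump))
        pred = [pi + gamma * predict_stump(stump, xi)
                for pi, xi in zip(pred, x)]
    return trees


def predict_gb(trees, xi):
    """Weighted sum of the tree predictions."""
    return sum(g * predict_stump(s, xi) for g, s in trees)


def mse(trees, x, y):
    return sum((predict_gb(trees, xi) - yi) ** 2
               for xi, yi in zip(x, y)) / len(y)


# Illustrative data: a smooth nonlinear target.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [xi ** 2 for xi in x]
few = fit_gb(x, y, n_trees=5)
many = fit_gb(x, y, n_trees=50)
```

Comparing `mse(few, x, y)` with `mse(many, x, y)` shows the training error falling as trees are added, which is also why the number of trees and the weight are usually tuned against held-out data.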


