Correlation of raloxifene solubility in supercritical CO2 and drug solubility via hybrid machine learning and gradient-based optimization.

Gradient-based optimization

Gradient-based optimization is a powerful technique used to adjust the hyperparameters of various ML methods, including regression models. The concept of gradient is used to provide information on the direction and magnitude of the steepest descent of the objective (fitness) function. By calculating the derivatives relative to the hyperparameter, the optimization procedure aims to gradually improve its value, reduce the objective function, and determine the most effective hyperparameter configuration.^17,18.

In gradient-based optimization, the process begins by specifying an objective function that measures how well a given model runs with a given set of hyperparameters. This objective function is usually a measure of error or loss, such as an MSE or MAE. In fact, the derivatives of objective functions for hyperparameters can be obtained through methods such as automated differentiation and backpropagation. These gradients indicate the direction in which the hyperparameters are updated to reduce the objective function¹⁹.

It is important to note that gradient-based optimization requires that the objective function be differentiable for the hyperparameter. Additionally, care must be taken to avoid overfitting with local minimums during the optimization process. To alleviate these problems, techniques such as regularization and early halt can be adopted.

Gradient-based optimization of hyperparameters provides a systematic and efficient approach to finding the optimal hyperparameter values for regression models. The force of gradients can be used to guide searches in hyperparameter spaces, leading to improved model performance and improved generalization to invisible data. The procedure is shown in Figure 2¹⁹.

In this study, gradient-based optimization was implemented using Adam Optimization Algorithm and was selected for adaptive learning rates and efficiency of non-convex optimization tasks. The objective function was defined as the mean R-squared (R²) score from a 3x cross-validation and was maximized to ensure both model accuracy and generalizability. The Adam Optimizer consisted of a learning rate of 0.0012, Beta1 = 0.85, Beta2 = 0.98, and Epsilon = 1E-8, with early halt after 10 iterations without improvement in R² and repeated epochs of up to 130.

Decision Tree (DT)

DT regression estimates the target by gradually dividing the dataset into smaller subsets in a hierarchical way, using each department based on the values of the independent variables. For each split in the tree, the algorithm selects a predictor that produces the greatest reduction in variance, reflecting the degree of target heterogeneity. For example, if the leaf nodes are fewer than the smallest sample specified, partitioning proceeds recursively until the end condition is met.²⁰. The DT regression algorithm generates hierarchical models presented in a tree-based format. This can be used to predict outcomes for inconspicuous data. The iterative process of the model involves passing through the tree structure by evaluating the independent variable values each time a new data point is displayed, until it reaches the terminal node.^{twenty one}.

The DT algorithm shows resilience to outliers and missing data, and has the ability to effectively handle both categorical and continuous variables. The risk of overfitting in this way is essentially mitigated by adjusting the hyperparameters.²⁰. Establishing a correlation between dependent and independent variables can be achieved through the use of decision tree regression, a valuable technique for predicting continuous target variables. This can be achieved by visualizing the regression above.

Random Forest (RF) and Extra Trees (ET)

RF models are robust members of the model's ensemble learning category. The model in question shows flexibility and ease of use, enabling applications with a variety of regression and classification tasks^22,23. The current discourse concentrates on the random forest model used to assign regressions.

The RF model consists of an ensemble of DTSs that cooperate to generate prognosis. The DT methodology follows the repeated division of the dataset into smaller subsets over time, taking advantage of the ability to demonstrate the best discriminant power for the target variable. The RF model extends this concept by generating a large number of decision trees and combining the results to formulate the ultimate prediction.^24,25.

The equation for prediction of RF models is²⁶:

$$\:f\left(x\right)=\frac{1}{m}{\sum\:}_{m=1}^{m}t\left({{\uptheta\:}}}_{m}\right)$$

where $\:f \left(x \right)$ Means the predicted target value of the input vector $\:x $, $\:m $ It represents the amount of DT in the forest, $\:{\uptheta \:}} _{m} $ Represents the parameters of the tree $\:m $and $\:t\left(x,{{\uptheta\:}}_{m}\right)$ Represents the prediction of the tree $\:m $ For input vectors $\:x $ and parameters $\:{\uptheta \:}} _{m} $. The output of the model is determined by taking average predictions from all decision trees in the forest.

In this equation, the prediction of the DT is determined by navigating the tree structure from the start node to the final node. All internal nodes form a selection using the values of the input parameters ($\:{x} _{i} $) from the input vector ($\:x $), leads the traversal to the left or right child nodes based on the outcome of the decision. A prediction is generated by assigning the average target value for the training samples belonging to that particular leaf node at each final node (leaf node). parameter($\:{\uptheta \:}} _{m} $) Each decision tree is derived from a random subset of bootstrap samples of training data and input parameters.

The ET algorithm is similar to RF, but injects extra randomness into the tree building process, often leading to an ensemble with even greater diversity among members. This diversity contributes to the robustness of the model and can improve performance on regression tasks^27,28.

Gradient boost (GB)

GB is another popular ensemble learning technique that uses decision trees as a weak model. It is known for its ability to achieve high prediction accuracy by combining multiple weak models continuously. In this discussion, we will focus on DB using DT as a weak model for the regression task. The GB model is constructed by gradually expanding the ensemble in the decision tree. Here, all subsequent trees are instructed to correct the error. This iterative process allows the model to learn complex relationships and capture fine-grained patterns within the data.²⁹.

In GB of DTS, the weak model is a shallow decision tree, often referred to as a “stump” or “shallow tree.” These trees have a small number of levels and are trained to make predictions based on a subset of input functions.^30,31. Unlike RF and ET, where each tree is built independently, in GB, trees are constructed successively, and each tree learns from the mistakes of its predecessor.

The predictive equation for GB using a decision tree is³²:

$$\:f\left(x\right)={\sum\:}_{m=1}^{m}{\gamma\:}_{m}*t\left(x,{\theta\:}_{m}\right)$$

In this equation, f(x) denotes the predicted target value of the input vector x, and m denotes the total count of dts. $\:{\gamma \:} _{m} $Means the contribution (or weight) of the tree m. $\:{\ theta \:} _ {m} $ Represents the parameters of the tree m $\:t\left(x,{\theta\:}_{m}\right)$ Corresponds to predictions created by tree M for input vector x using parameters $\:{\ theta \:} _ {m} $. The final solution is determined by combining predictions from all decision trees. Each prediction is multiplied by its corresponding weight, $\:{\gamma \:} _{m} $and it is totaled.

In prediction, input vectors $\:x $ Once the path is determined by the value of a particular feature, it travels through each decision tree from the root to the leaf $\:\left({x}_{i}\right)$. For all internal nodes, the value of the selected feature determines which branch the traversal is. Finally, a leaf node generates a prediction by assigning the output value associated with that leaf node³³.

Source link

创建Binance账户 commented on AI jobs in financial services: $350k for junior hires: Your article helped me a lot, is there any more re
1win commented on Do AI apps really need a GPU or NPU?: Saved as a favorite, I really like your website!
m777 commented on Create the content you envision: Everything is very open with a precise description
discount كازينو commented on Apple to process data from AI apps in virtual black box: Hi there! I know this is kinda off topic but I'd f
binance konto commented on Artificial Intelligence May Be Coming For Your Job: Your point of view caught my eye and was very inte

Correlation of raloxifene solubility in supercritical CO2 and drug solubility via hybrid machine learning and gradient-based optimization.

Gradient-based optimization

Decision Tree (DT)

Random Forest (RF) and Extra Trees (ET)

Gradient boost (GB)

Leave a Reply

RECENT POSTS

Lanai launches token tuner to turn AI token spend into measurable business value

Airbus and BMW sign deal with France’s Mistral to bring AI to defense and safety systems

Sprinklr adds video-native AI to CXM platform with ViralMoment acquisition

Gradient-based optimization

Decision Tree (DT)

Random Forest (RF) and Extra Trees (ET)

Gradient boost (GB)

Related Posts

Leave a Reply