Advanced hybrid intelligence assessment of nanomedicine delivery to various organs using machine learning and adaptive tree-structured Parzen estimators

Machine Learning


This study employs a systematic approach to applying ML models to predict the biodistribution of NPs to different sites. The process begins with the preparation of the data (collected from3): handling missing values, encoding categorical variables, identifying outliers, and standardizing numerical features to ensure data quality and model performance. Several ML models, including EN, KNN, and PR, were applied individually or in combination using ensemble methods such as boosting. Hyperparameter optimization was performed using the Adaptive Tree-structured Parzen Estimator (ATPE) to tune model parameters and improve prediction accuracy.

The models were chosen for their distinct yet complementary advantages. EN handles multicollinearity and feature selection, KNN captures nonlinear relationships, and PR captures complex patterns through polynomial terms. The boosting ensemble method was chosen to improve performance by combining the strengths of these models. The ATPE optimizer was selected for its efficiency in exploring large and complex hyperparameter spaces, providing faster convergence by dynamically balancing exploration and exploitation. These selections ensure a robust and accurate approach to modeling the biodistribution of NPs in this study.

Data preparation

Effective data preparation is a critical stage in the machine learning process, as it directly affects the performance and reliability of the resulting model. This phase involves several preprocessing activities aimed at converting raw data into a format appropriate for analysis, including managing missing values, encoding categorical variables, identifying and treating outliers, and standardizing numerical features. Meticulous data preparation ensures that machine learning models can efficiently learn from the available information, increasing their ability to make accurate predictions and generalize to new data13,14.

  • Imputing missing values: Imputation replaces missing data with estimated values, allowing the model to use all records despite missing entries, which would otherwise bias predictions and cause model errors. Here, we used mean imputation, replacing missing values with the observed average of each feature. This method works best with outlier-free numerical data.

  • Leave-one-out (LOO) encoding for categorical data: LOO encoding is a technique for converting categorical variables to numbers so that machine learning models can process them efficiently. Unlike traditional encoding methods such as one-hot encoding, which can produce high-dimensional, sparse data, LOO encoding seeks to preserve the predictive power of categorical variables while reducing dimensionality. It replaces each category value with the average of the target variable, calculated over all data points except the one being encoded. Excluding the current point reduces the bias that can arise from using the target variable directly and helps prevent overfitting, ensuring that no encoded value is overly influenced by the specific data point being encoded15.

  • Outlier detection: Outlier detection is an important phase of data preparation, as it identifies data points that diverge dramatically from the main trends in the dataset. Such outliers can distort statistical analyses and degrade the effectiveness of machine learning algorithms. The connectivity-based outlier factor (COF) method improves on the local outlier factor (LOF) algorithm by addressing its limitation of always treating low-density points as outliers. Unlike LOF, COF emphasizes the connectivity patterns of neighboring data points, thereby providing a more nuanced evaluation of outlier status16. The COF score is calculated as the following ratio:

$$\text{COF}_{k}\left(p\right)=\frac{\left|N_{k}\left(p\right)\right|\cdot \text{ac-dist}_{N_{k}\left(p\right)}\left(p\right)}{\sum_{o\in N_{k}\left(p\right)}\text{ac-dist}_{N_{k}\left(o\right)}\left(o\right)}$$

here, \(\text{ac-dist}_{N_{k}\left(p\right)}\left(p\right)={\sum}_{i=1}^{k-1}\frac{2\left(k-i\right)}{k\left(k-1\right)}\text{dist}\left({e}_{i}\right)\) represents the average chaining distance from the point \(p\) to its \(k\) nearest neighbors \(N_{k}\left(p\right)\), and \({e}_{i}\) denotes the corresponding edge. This approach ensures a more comprehensive assessment of outliers by incorporating the structural connectivity between data points, improving the robustness and reliability of the detection process.

  • Min-max scaling for normalization: This scaling approach is a normalization method that transforms numerical input data into a given range, usually between 0 and 1. The procedure rescales each feature individually by subtracting the minimum value and dividing by the range (the difference between the maximum and minimum values), so that the resulting values for each feature fall within the desired interval, making them easier for the model to interpret and compare17. Normalization with min-max scaling is particularly advantageous when features have different units or scales, as it ensures an equal contribution of all features to the machine learning model. By rescaling the data, this technique prevents features with large numerical ranges from dominating the learning algorithm, improving model convergence and performance, particularly for gradient-based optimization algorithms. A combined sketch of these preprocessing steps follows this list.
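As an illustration, the following is a minimal sketch of the preprocessing steps described above (mean imputation, leave-one-out target encoding, and min-max scaling) using pandas and scikit-learn. The column names (`size_nm`, `material`) and the target `delivery_pct` are hypothetical placeholders, not taken from the study's dataset, and the LOO encoder is hand-rolled for transparency rather than taken from a specific library.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: NP size, material, and a target (e.g., % delivered).
df = pd.DataFrame({
    "size_nm": [50.0, np.nan, 120.0, 80.0, 95.0, np.nan],
    "material": ["lipid", "gold", "lipid", "polymer", "gold", "lipid"],
    "delivery_pct": [12.0, 5.5, 9.0, 7.2, 6.1, 10.4],
})

# 1) Mean imputation: replace missing numeric values with the feature mean.
df["size_nm"] = df["size_nm"].fillna(df["size_nm"].mean())

# 2) Leave-one-out encoding: each category value becomes the target mean
#    over all *other* rows in the same category: (sum - y_i) / (count - 1).
grp = df.groupby("material")["delivery_pct"]
s, c = grp.transform("sum"), grp.transform("count")
loo = (s - df["delivery_pct"]) / (c - 1).replace(0, np.nan)
df["material_loo"] = loo.fillna(df["delivery_pct"].mean())  # singleton fallback

# 3) Min-max scaling: map each numeric feature to [0, 1].
scaler = MinMaxScaler()
df[["size_nm", "material_loo"]] = scaler.fit_transform(
    df[["size_nm", "material_loo"]])

print(df)
```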

Adaptive Tree-structured Parzen Estimator (ATPE)

The Tree-structured Parzen Estimator (TPE) is an advanced extension of Bayesian optimization, particularly suitable for hyperparameter tuning in ML. The term “tree-structured” reflects its ability to manage conditional parameters, allowing tree-structured search spaces18. TPE is a global optimization algorithm that uses sequential modeling. It overcomes specific limitations of traditional Bayesian optimization approaches by demonstrating resilience in managing conditional constraints, such as the maximum depth and learning rate of tree-based models. TPE streamlines the hyperparameter search by modeling the distributions \(p\left(x|y\right)\) and \(p\left(y\right)\), reducing computational overhead. The prior distribution of each parameter is represented by a truncated Gaussian mixture model, which is updated iteratively based on new observations19.

The Adaptive Tree-structured Parzen Estimator (ATPE) is an extended version of TPE designed to improve the efficiency of hyperparameter optimization. In contrast to TPE, which relies on a static methodology with a fixed tactic for managing exploration and exploitation, ATPE incorporates adaptive mechanisms. By dynamically adjusting this balance as the search progresses, ATPE converges faster. It is therefore considerably more efficient in larger and more complex search spaces than traditional TPE.
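As a hedged illustration (not the study's exact setup), the hyperopt library exposes an ATPE suggestion algorithm alongside classic TPE. The sketch below assumes hyperopt ≥ 0.2 with `atpe.suggest` available (it additionally requires scikit-learn and lightgbm to be installed) and tunes a KNN regressor on synthetic data; the search-space bounds and `max_evals` are illustrative choices.

```python
from hyperopt import STATUS_OK, atpe, fmin, hp
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the biodistribution dataset.
X, y = make_regression(n_samples=300, n_features=8, noise=0.1, random_state=0)

# Conditional (tree-structured) spaces are also possible; this one is flat.
space = {
    "n_neighbors": hp.quniform("n_neighbors", 1, 30, 1),
    "p": hp.quniform("p", 1, 3, 1),  # Minkowski distance parameter
}

def objective(params):
    model = KNeighborsRegressor(n_neighbors=int(params["n_neighbors"]),
                                p=int(params["p"]))
    # 5-fold cross-validated MSE is the loss ATPE tries to minimize.
    mse = -cross_val_score(model, X, y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    return {"loss": mse, "status": STATUS_OK}

best = fmin(fn=objective, space=space, algo=atpe.suggest, max_evals=50)
print(best)
```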

Boosting

The boosting approach is designed to increase the robustness of an ML model by integrating several weak learners into a more accurate and robust composite model. The key idea behind boosting is to focus more on incorrectly predicted instances, allowing the model to learn from its mistakes and improve with each iteration. This process yields models that tend to be more accurate and generalizable.

AdaBoost, or Adaptive Boosting, is one of the most popular boosting algorithms20 and can be used to improve ML performance for data analysis and regression. AdaBoost works by training a set of weak models, often decision trees, each weighted according to its performance on the training data. Specifically, the weights of incorrectly predicted instances are increased so that these instances carry more importance in the next iteration. The final model is a weighted combination of these weak models, which allows much stronger predictive performance than the individual models21.
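A minimal sketch of AdaBoost regression with shallow decision trees as weak learners, using scikit-learn on synthetic data (hyperparameter values are illustrative, not the tuned values from this study; the `estimator` argument is named `base_estimator` in scikit-learn versions before 1.2):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=0.2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Weak learner: a shallow tree. AdaBoost reweights samples each round so
# later trees focus on instances the earlier trees predicted poorly; the
# final prediction is a weighted combination of all the trees.
ada = AdaBoostRegressor(estimator=DecisionTreeRegressor(max_depth=3),
                        n_estimators=200, learning_rate=0.05, random_state=1)
ada.fit(X_train, y_train)
print("test MSE:", mean_squared_error(y_test, ada.predict(X_test)))
```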

This study demonstrates the potential of AdaBoost's ability to combine weak learners into accurate predictions of nanoparticle biodistribution across various organs, improving the accuracy of drug delivery systems. By iteratively refining its predictions, AdaBoost can help fine-tune nanoparticle targeting strategies, thereby optimizing delivery to specific organs such as tumors, the liver, and the heart. This adaptability could lead to more personalized and efficient treatment strategies in practical medical applications, including cancer treatment.

AdaBoost's success in predicting drug delivery efficiency across a variety of organs highlights the broader applicability of boosting techniques in optimizing drug delivery systems. Its ability to process complex, high-dimensional biomedical data positions AdaBoost as a powerful tool in the development of targeted therapies that can reduce side effects and enhance treatment outcomes.

Base models

K-Nearest Neighbors (KNN) is a non-parametric algorithm used to estimate a dependent variable \(y\) based on the values of the \(k\) closest training samples in feature space22,23. The mathematical form of KNN can be expressed as follows:

$$\widehat{y}\left(x\right)=\frac{1}{k}{\sum}_{{x}_{i}\in {N}_{k}\left(x\right)}{y}_{i}$$

In this formulation, \({y}_{i}\) represents the target value of the \(i\)-th nearest neighbor \({x}_{i}\) in the set \({N}_{k}\left(x\right)\), which consists of the \(k\) nearest neighbors of the query point \(x\).

The key hyperparameters of KNN, used in the sketch after this list, are:

  • \(k\), the number of nearest neighbors used to make predictions.

  • \(p\), the parameter of the Minkowski distance metric used to calculate the distance between points.
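A minimal scikit-learn sketch of KNN regression with these two hyperparameters (the values are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=2)

# n_neighbors is k; p selects the Minkowski metric (p=1 Manhattan, p=2 Euclidean).
knn = KNeighborsRegressor(n_neighbors=7, p=2)
knn.fit(X, y)
print(knn.predict(X[:3]))  # each prediction averages the 7 nearest targets
```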

Elastic Net (EN) regression combines both L1 (Lasso) and L2 (Ridge) regularization to effectively handle correlated features. EN aims to balance the benefits of both regularization methods to improve model performance and feature selection23. The Elastic Net objective function is defined as24,25:

$$\underset{\beta}{\text{min}}\left(\frac{1}{2n}{\sum}_{i=1}^{n}{\left({y}_{i}-\widehat{y}\left({x}_{i}\right)\right)}^{2}+\alpha \rho {\sum}_{j=1}^{p}\left|{\beta}_{j}\right|+\alpha \left(1-\rho \right)\frac{1}{2}{\sum}_{j=1}^{p}{\beta}_{j}^{2}\right)$$

The term \(\frac{1}{2n}{\sum}_{i=1}^{n}{\left({y}_{i}-\widehat{y}\left({x}_{i}\right)\right)}^{2}\) is the MSE component, \(\alpha \rho {\sum}_{j=1}^{p}\left|{\beta}_{j}\right|\) represents the L1 regularization term, and \(\alpha \left(1-\rho \right)\frac{1}{2}{\sum}_{j=1}^{p}{\beta}_{j}^{2}\) represents the L2 regularization term.

EN is especially useful when working with highly correlated datasets, as it can select a subset of features while effectively managing multicollinearity.
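In scikit-learn's ElasticNet, `alpha` corresponds to \(\alpha\) and `l1_ratio` to \(\rho\) in the objective above; a minimal sketch with illustrative values:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

# Low effective rank yields correlated features, EN's intended use case.
X, y = make_regression(n_samples=200, n_features=10, effective_rank=4,
                       noise=0.1, random_state=3)

# alpha = overall regularization strength; l1_ratio = rho (L1 vs. L2 mix).
en = ElasticNet(alpha=0.1, l1_ratio=0.5)
en.fit(X, y)
print(en.coef_)  # some coefficients may be exactly zero (feature selection)
```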

Polynomial regression (PR) extends linear regression by using polynomial terms to model more complex relationships. Unlike linear regression, which fits a straight line, polynomial regression fits a polynomial of degree \(d\) to the data26.

Polynomial regression, expressed as \(\widehat{y}\left({x}_{i};{\beta}_{0},{\beta}_{1j},\dots,{\beta}_{dj}\right)\), takes the form:

$$\widehat{y}\left(x\right)={\beta}_{0}+{\sum}_{j=1}^{p}\left({\beta}_{1j}{x}_{j}+{\beta}_{2j}{x}_{j}^{2}+\cdots+{\beta}_{dj}{x}_{j}^{d}\right)+\epsilon$$

In this equation, \({\beta}_{0},{\beta}_{1j},{\beta}_{2j},\dots,{\beta}_{dj}\) are coefficients estimated from the data, \(\epsilon\) represents the error term or residual, and the degree \(d\) of the polynomial is an important hyperparameter that affects the complexity of the model and its ability to fit the data.

The objective function most frequently used in PR is the mean squared error (MSE), which calculates the mean squared difference between the observed and predicted values. The degree \(d\) affects the flexibility of the model, allowing more complex fits at the risk of overfitting the training data.
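A minimal sketch of polynomial regression as a scikit-learn pipeline, where `degree` plays the role of \(d\) (the value here is illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy nonlinear data: y = x^3 - 2x + noise.
rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, size=(200, 1))
y = X[:, 0] ** 3 - 2 * X[:, 0] + rng.normal(0, 0.2, size=200)

# PolynomialFeatures expands inputs into degree-d terms; LinearRegression
# then fits the coefficients by minimizing the MSE.
pr = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
pr.fit(X, y)
print("train MSE:", mean_squared_error(y, pr.predict(X)))
```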


