One-Year Individual Prediction of Mental Health Deterioration Using an Adaptive Learning Algorithm: A Multicenter Breast Cancer Prospective Study



Study population

The BOUNCE study was conducted in four European countries (Finland, Italy, Israel, and Portugal) to assess the psychosocial resilience of breast cancer (BC) patients during the first 18 months after diagnosis as a function of sociodemographic, lifestyle, psychological, and medical (disease- and treatment-related) variables (H2020 EU project BOUNCE, GA no. 777167; see https://www.bounce-project.eu/ for more information). The study enrolled 706 women between March 2018 and December 2019 according to the following criteria: (i) inclusion: diagnosis of BC scheduled for treatment; (ii) exclusion: history of or active severe psychiatric disorder (major depression, bipolar disorder, psychosis), distant metastases, history of or treatment for another malignancy within the last 5 years, other significant concomitant illness diagnosed within the last 12 months, major surgery, severe illness, or trauma within 4 weeks prior to study entry or lack of complete recovery from the effects of surgery, and pregnancy or breast-feeding. The BOUNCE study is a longitudinal observational study involving seven measurement waves: baseline (performed 2–5 weeks after surgery or biopsy; M0) and follow-ups at 3-month intervals (M3, M6, M9, M12, and M15), with final follow-up measurements at M18. Data for the primary outcome variables were collected at all time points; data from the remaining time points served secondary research goals of the project.

The entire BOUNCE study was approved by the Ethics Committee of the European Institute of Oncology (approval number R868/18 – IEO 916) and by the ethics committee of each participating clinical center. All participants were informed in detail about the purpose and procedures of the study and gave written informed consent. All methods were performed in accordance with relevant guidelines and regulations.

Predictors

The current analysis considered sociodemographic, lifestyle, and medical variables, as well as self-reported psychological characteristics, recorded at the time of BC diagnosis and at the first follow-up assessment performed 3 months after diagnosis. The decision to pool predictor data from the first three months after diagnosis was guided by the following considerations: (a) this period defines a realistically short observation window for recording predictors of resilience in routine clinical practice, yet one that is not overly long given the 1-year study endpoint; and (b) previous studies have shown that deterioration of psychological well-being usually emerges later in the disease course.

Outcome variable

Self-reported mental health status at 12 months after diagnosis, indexed by the 14-item Hospital Anxiety and Depression Scale (HADS) total score16, served as the outcome variable in the current analysis (see Supplementary Information). A widely used, clinically validated cut-off score of 16 (out of 42) points was applied to identify patients who reported potentially clinically significant symptoms at M0 and M1217,18. Patients were then assigned to two classes: (a) those whose HADS total score crossed the cut-off between M0 and M12 (mental health deterioration group) and (b) those who reported mild symptoms throughout the year after diagnosis (mental health stable group). Thus, the deteriorated mental health group consisted of patients who scored less than 16 points at M0 and 16 points or more at M12, and the stable mental health group consisted of patients who scored less than 16 points at the M0, M3, M6, M9, and M12 assessments.
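As an illustration, the labeling rule above can be sketched in a few lines of Python; the wave column names and example scores below are hypothetical, not taken from the study data.

```python
import pandas as pd

CUTOFF = 16  # clinically validated HADS total-score cut-off (16 of 42 points)

def assign_class(row):
    """Label a patient by the HADS-based rule described above.

    Returns 'deteriorated' if below the cut-off at M0 but at/above it at M12,
    'stable' if below the cut-off at every wave from M0 through M12,
    and None otherwise (patient not assigned to either class).
    """
    waves = ["M0", "M3", "M6", "M9", "M12"]
    if row["M0"] < CUTOFF and row["M12"] >= CUTOFF:
        return "deteriorated"
    if all(row[w] < CUTOFF for w in waves):
        return "stable"
    return None

# Three invented patients: stable, deteriorated, and unassigned.
hads = pd.DataFrame(
    {"M0": [5, 8, 20], "M3": [6, 9, 18], "M6": [7, 10, 17],
     "M9": [6, 21, 15], "M12": [5, 22, 14]}
)
hads["group"] = hads.apply(assign_class, axis=1)
```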

Data analysis

The analytical pipeline employed to address the primary and secondary objectives of the study included preprocessing steps, feature selection, and model training and testing19. Model 1 was designed to optimize the prediction of mental health deterioration over a one-year period by considering all available variables collected at M0 and M3, including HADS anxiety, HADS depression, and global quality of life (QoL). Model 2 was designed to obtain a personalized risk profile focused on potentially modifiable factors, by omitting the HADS anxiety, HADS depression, and global QoL scores measured at M0 and M3. Feature selection using a random forest algorithm was incorporated into the ML-based pipeline along with the classification algorithm so that only relevant features entered final model training and testing (see Supplementary Information). The performance of the cross-validated models on the test set was evaluated using specificity, sensitivity, accuracy, precision, the F-measure, and the area under the receiver operating characteristic curve (ROC AUC).

Data preprocessing and missing data handling

First, raw data were rescaled to zero mean and unit variance, and ordinal variables were recoded into dummy binary variables. Cases and variables with more than 90% missing values were excluded from the final dataset. The remaining missing values were replaced by the global median (supplementary analyses showed that applying multivariate imputation had little effect on model performance; see Supplementary Materials).
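A minimal sketch of these preprocessing steps using pandas and scikit-learn; the variable names and values are invented for illustration, and the study's actual preprocessing code is not reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Illustrative raw data; variable names are hypothetical.
raw = pd.DataFrame({
    "age": [52.0, np.nan, 61.0, 47.0],
    "bmi": [24.1, 27.5, np.nan, 22.9],
    "tumor_stage": ["I", "II", "II", "I"],  # ordinal, recoded to dummies
})

# Recode the ordinal variable into dummy (binary) indicator columns.
X = pd.get_dummies(raw, columns=["tumor_stage"], dtype=float)

# Replace the remaining missing values with the per-variable median ...
X = pd.DataFrame(SimpleImputer(strategy="median").fit_transform(X),
                 columns=X.columns)

# ... and rescale continuous features to zero mean and unit variance.
X[["age", "bmi"]] = StandardScaler().fit_transform(X[["age", "bmi"]])
```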

Feature selection

Feature selection was performed using a meta-transformer built on the random forest (RF) algorithm20, which assigns weights to features and ranks them according to their relative importance. The maximum number of features selected by the estimator was set to the default value (i.e., the square root of the total number of features) to identify all important variables contributing to the prediction of mental health deterioration risk. The feature selection scheme was incorporated into the ML-based pipeline together with the classification algorithm so that only relevant features entered final model training and testing.
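Such a meta-transformer corresponds to scikit-learn's `SelectFromModel` wrapped around a random forest; the sketch below uses synthetic data and an illustrative forest configuration. Note that `SelectFromModel` keeps at most `max_features` features whose importance also exceeds the default threshold (the mean importance), so fewer than sqrt(n) features may survive.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the study data (the real predictors are not public).
X, y = make_classification(n_samples=300, n_features=64, n_informative=8,
                           random_state=0)

# Meta-transformer on a random forest: features are ranked by RF importance
# and at most sqrt(n_features) of them are kept, mirroring the default cap
# described in the text.
max_feats = int(np.sqrt(X.shape[1]))  # sqrt(64) = 8
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    max_features=max_feats,
)
X_selected = selector.fit_transform(X, y)
```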

Model training and validation

To address the fairly common problem of model overfitting in machine learning applications in clinical research, we employed a cross-validation scheme with held-out data for final model evaluation. Model overfitting occurs because models with fewer training errors (i.e., fewer misclassifications of the training data) may generalize more poorly (i.e., produce more classification errors on new, unseen data) than models with more training errors. Accordingly, we took the additional step of splitting the dataset into non-overlapping training, validation, and test subsets. Model testing was therefore always performed on unseen cases that were not considered in the training phase and consequently did not influence the feature selection process. This procedure helps keep the generalization error low while minimizing misclassification during the training phase.

In the current work, to prevent overfitting and maximize the generalization performance of the model on the test set, the data were split five-fold into training, testing, and validation subsets. For hyperparameter tuning and model selection, a grid search procedure with inner five-fold cross-validation was applied to the validation set. The best parameters were selected from a grid of candidate values for the trained model, allowing optimization of the classification results on the test set.
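A sketch of this scheme using scikit-learn's `GridSearchCV` on synthetic data; the held-out split, parameter grid, and scoring shown here are illustrative choices, not the study's exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Hold out a test set that never enters training or model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Inner five-fold cross-validated grid search for hyperparameter tuning;
# the grid below is purely illustrative.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 200], "max_depth": [3, None]},
    cv=5,
    scoring="roc_auc",
)
grid.fit(X_train, y_train)

# Final evaluation on the unseen test set only.
test_score = grid.score(X_test, y_test)
```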

Classification with Balanced Random Forest Algorithm

Class imbalance was addressed using a random undersampling method to balance the bootstrap subsets within the ensemble. Specifically, the balanced random forest classifier from the MIT-licensed imbalanced-learn library21 was applied to address the imbalanced class distribution in the dataset. Balanced random forest22 combines a majority-class downsampling technique with an ensemble learning approach, artificially altering the class distribution so that classes are represented equally in each tree of the forest. In this way, each bootstrap sample contains balanced, downsampled data. Applying random undersampling to balance the individual bootstraps of the RF classifier yields better classification performance than most conventional ML-based estimators while mitigating the learning problems posed by imbalanced datasets.
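The study used `BalancedRandomForestClassifier` from imbalanced-learn; to make the balanced-bootstrap idea concrete without that dependency, the sketch below grows each tree on its own randomly undersampled bootstrap using only scikit-learn. This is a simplified stand-in for, not a reproduction of, the library's implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic imbalanced data (roughly 9:1), a stand-in for the study dataset.
X, y = make_classification(n_samples=500, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

def balanced_bootstrap(X, y, rng):
    """Draw a bootstrap sample with equal numbers of cases per class
    by randomly undersampling the majority class."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    n = len(minority)
    idx = np.concatenate([
        rng.choice(minority, size=n, replace=True),
        rng.choice(majority, size=n, replace=True),  # majority downsampled
    ])
    return X[idx], y[idx]

# Grow each tree on its own balanced bootstrap, as balanced random forest does.
trees = []
for _ in range(50):
    Xb, yb = balanced_bootstrap(X, y, rng)
    trees.append(DecisionTreeClassifier(max_features="sqrt",
                                        random_state=0).fit(Xb, yb))

# Forest prediction: average the per-tree class probabilities.
proba = np.mean([t.predict_proba(X)[:, 1] for t in trees], axis=0)
pred = (proba >= 0.5).astype(int)
```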

The following metrics were used to evaluate the performance of the learning algorithms applied to the imbalanced data: specificity (true negative rate), sensitivity (true positive rate), accuracy, precision, and the F-measure. These metrics are functions of the confusion matrix given the (true) target values and the predicted targets returned by the classifier during the testing phase. We also used receiver operating characteristic (ROC) curves to represent the trade-off between false negative and false positive rates across all possible cut-offs; the area under the curve (AUC) was calculated from the estimated ROC curve.
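These confusion-matrix-derived metrics can be computed directly; the labels and scores below are invented solely to make the formulas concrete.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

# Illustrative true labels, hard predictions, and predicted probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.2, 0.15, 0.3, 0.6, 0.25, 0.8, 0.9, 0.4, 0.7]

# For binary labels, ravel() yields (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

specificity = tn / (tn + fp)          # true negative rate
sensitivity = tp / (tp + fn)          # true positive rate (recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
f_measure = 2 * precision * sensitivity / (precision + sensitivity)

# ROC AUC is computed from the continuous scores, not the hard predictions.
auc = roc_auc_score(y_true, y_score)
```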

Personalized Risk Profile (Model 2 only)

Following the analytical procedure described in the previous paragraphs, a model-agnostic analysis was applied to the set of variables emerging as key features from Model 2 in order to identify the most important predictors of a given mental health prediction23,24. This analysis supports the interpretability of the set of variables that emerged as key features for patient classification. Specifically, model-agnostic analysis can be applied (i) at the global (i.e., variable-specific) level, to clarify how each feature contributes to the model's decisions for each patient group, and (ii) at the local (i.e., patient-specific) level, to identify the most important predictors of a given mental health prediction. Given the lack of precedent in the literature, we chose a mathematical model that makes no assumptions about the data structure. Break-down plots (local level) were produced with the DALEX Python package19,23, applying the default values of the main function's arguments.
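The study produced break-down plots with the dalex package's defaults. As a dependency-free illustration of the break-down idea, the sketch below sequentially fixes one hypothetical patient's feature values across the whole dataset and records how the mean predicted probability shifts at each step; a defining property of break-down is that the baseline plus the summed contributions recovers that patient's own prediction.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data and model; the "patient" is simply the first row.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
instance = X[0]

def break_down(model, X, instance, order):
    """Sequentially fix the instance's feature values across the dataset and
    record how the mean predicted probability shifts at each step."""
    Xc = X.copy()
    baseline = model.predict_proba(Xc)[:, 1].mean()  # mean model prediction
    prev, contributions = baseline, {}
    for j in order:
        Xc[:, j] = instance[j]            # fix feature j to the patient's value
        current = model.predict_proba(Xc)[:, 1].mean()
        contributions[j] = current - prev  # this feature's local contribution
        prev = current
    return baseline, contributions

baseline, contrib = break_down(model, X, instance, order=range(X.shape[1]))
final = baseline + sum(contrib.values())
```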


