The predictive role of muscle quality index for testosterone deficiency in adult men based on interpretable machine learning methods | BMC Public Health


Data Source

This study used data from the National Health and Nutrition Examination Survey (NHANES), following Wang et al. [18]. NHANES is conducted by the National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention (CDC) and employs a complex multi-stage stratified probability sampling design to ensure nationally representative estimates [19]. Following Xing W et al. [20], we created a comprehensive analytical dataset by merging all study variables on the unique participant identifier (SEQN). The final dataset covers six domains: (1) demographic characteristics, (2) lifestyle factors, (3) anthropometric measurements, (4) laboratory biomarkers, (5) questionnaire-derived health indicators, and (6) dietary assessment data, providing multidimensional inputs for the analysis.
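The merge on SEQN described above can be sketched in a few lines. This is only an illustration under stated assumptions: the three small tables and their column names (`age`, `bmi`, `testosterone_ng_dl`) are hypothetical stand-ins, not the actual NHANES files or the authors' code.

```python
from functools import reduce

import pandas as pd

# Hypothetical miniature stand-ins for three NHANES component files.
demographics = pd.DataFrame({"SEQN": [1, 2, 3], "age": [34, 51, 27]})
body_measures = pd.DataFrame({"SEQN": [1, 2, 3], "bmi": [24.1, 31.5, 22.8]})
laboratory = pd.DataFrame({"SEQN": [1, 3], "testosterone_ng_dl": [512.0, 288.0]})


def merge_on_seqn(tables):
    """Left-join every table onto the first one using the SEQN key."""
    return reduce(
        lambda left, right: left.merge(
            right, on="SEQN", how="left", validate="one_to_one"
        ),
        tables,
    )


merged = merge_on_seqn([demographics, body_measures, laboratory])
print(merged)
```

A left join keeps every participant from the first table, so a participant without a laboratory record (SEQN 2 here) ends up with a missing value rather than being silently dropped. The `validate="one_to_one"` argument makes pandas raise an error if SEQN is duplicated in any table.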

Ethics Statement

The NHANES protocol was approved by the NCHS Ethics Review Board (Protocol #2011-17), and written informed consent was obtained from all participants prior to data collection. In strict accordance with the NHANES data usage guidelines, our analysis adhered to the following ethical standard: all protected health information was removed prior to analytical use. This secondary data analysis was determined to be exempt from additional institutional review board review because it involves only anonymized, publicly available datasets.

Study population and design

The current analysis used the NHANES 2011-2014 cycles because they were the only available survey years with complete data on standardized grip strength measurements, a key component for calculating MQI. Strict inclusion criteria were applied to derive the analytical sample from the initial cohort of 19,931 participants. The final analysis cohort included 2,628 eligible male participants (13.2% of the initial sample). This selection process ensured standardized assessment conditions and minimized potential confounding from circadian testosterone variation or musculoskeletal limitations. Based on previous research [21, 22], participants under the age of 20 were excluded to maximize dataset integrity, because many laboratory indicators were not collected for this age group in NHANES. Skeletal muscle mass was measured using dual-energy X-ray absorptiometry (DXA) scans. To ensure measurement accuracy, grip strength was measured in both hands, with each hand tested three times at 60-second intervals.

The inclusion criteria were as follows: (1) male sex; (2) age 20 years or older; (3) completed serum total testosterone test; (4) completed skeletal muscle mass measurement and grip strength tests in both hands; and (5) provided informed consent. The exclusion criteria were: (1) female sex; (2) age under 20 years; (3) missing total testosterone data; (4) hormonal medication use, history of testicular cancer or orchiectomy, or blood sample collection in the afternoon or evening; (5) history of hand or wrist surgery, or hand or wrist pain in the past 3 months; and (6) missing data on skeletal muscle mass, grip strength test, age, or survey weight variables.

Feature selection and label definition

Based on previous research [6, 7, 23] and expert consensus (obtained from four independent experts in sports medicine and andrology at the Second Affiliated Hospital of Wenzhou Medical University), we initially considered 29 candidate variables (see Additional File 1), including demographic characteristics, laboratory measurements, lifestyle factors, anthropometric measurements, and medical history. To ensure data quality, we excluded variables with more than 15% missing values, including low-density lipoprotein (LDL), homeostasis model assessment of insulin resistance (HOMA-IR), insulin, triglyceride (TG), triglyceride glucose (TyG) index, TyG-body mass index (BMI) index, visceral adiposity index (VAI), and lipid accumulation product (LAP). To reduce multicollinearity and improve model generalization, we performed dimension reduction using variance inflation factor (VIF) analysis, retaining only variables with VIF < 5. Following the guidelines of the American Urological Association [24], TD was defined as a serum total testosterone level below 300 ng/dL and served as the binary classification label. After feature selection, the final dataset consisted of one label (TD) and 18 features (9 categorical and 9 continuous), which were then used for model training and evaluation.
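To make the VIF filter and the TD label concrete, the sketch below computes VIF from its definition, VIF_j = 1 / (1 − R_j²), using NumPy only. The synthetic columns `a`, `b`, and `c` and the example testosterone values are assumptions made for illustration; the source does not say which software performed the VIF analysis.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows = samples, columns = features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = []
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(np.inf if r_squared >= 1.0 else 1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Synthetic example: `c` is almost a copy of `a`, so both are collinear.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
X = np.column_stack([a, b, c])

vif = variance_inflation_factors(X)
keep = vif < 5  # threshold stated in the text

# Binary TD label: serum total testosterone below 300 ng/dL.
testosterone_ng_dl = np.array([250.0, 300.0, 512.0])  # made-up values
td_label = (testosterone_ng_dl < 300).astype(int)
```

In practice, collinear variables are usually removed one at a time and the VIFs recomputed, since dropping one member of a collinear pair lowers the VIF of the other.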

Calculation of features

To identify the best biomarkers for predicting TD in men, we evaluated several indicators related to insulin resistance (IR) and obesity.

HOMA-IR: calculated following Son DH et al. [25].

TyG index: TyG = ln[TG (mg/dL) × fasting glucose (mg/dL) / 2], following Yang Z et al. [26], who demonstrated its advantage over HOMA-IR in the evaluation of IR.

TyG-BMI index: the product of the TyG index and BMI, a composite marker for obesity-related IR [27].

VAI: for men, VAI = waist circumference (WC) (cm) / [39.68 + (1.88 × BMI)] × [TG (mmol/L) / 1.03] × [1.31 / high-density lipoprotein (HDL) (mmol/L)], following Cheng et al. [28].

LAP: for men, LAP = [WC (cm) − 65] × TG (mmol/L), following Ebrahimi M et al. [29].

Systemic immune-inflammation index (SII): SII = platelet count × neutrophil count / lymphocyte count, following Di X et al. [30].

MQI: calculated following Weng L et al. [7].
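The index formulas can be written as small functions, which also makes the unit conventions explicit. This is a sketch, not the authors' code: the function and argument names are hypothetical, and the male LAP formula used here, (WC − 65) × TG in mmol/L, is the commonly cited form and is assumed rather than taken verbatim from the text. HOMA-IR and MQI are omitted because their formulas are not spelled out here.

```python
import math


def tyg_index(tg_mg_dl, glucose_mg_dl):
    """TyG = ln[TG (mg/dL) x fasting glucose (mg/dL) / 2]."""
    return math.log(tg_mg_dl * glucose_mg_dl / 2)


def tyg_bmi(tg_mg_dl, glucose_mg_dl, bmi):
    """TyG-BMI = TyG index x BMI."""
    return tyg_index(tg_mg_dl, glucose_mg_dl) * bmi


def vai_male(wc_cm, bmi, tg_mmol_l, hdl_mmol_l):
    """Male VAI = WC / (39.68 + 1.88 x BMI) x (TG / 1.03) x (1.31 / HDL)."""
    return wc_cm / (39.68 + 1.88 * bmi) * (tg_mmol_l / 1.03) * (1.31 / hdl_mmol_l)


def lap_male(wc_cm, tg_mmol_l):
    """Male LAP = (WC - 65) x TG (mmol/L); assumed standard form."""
    return (wc_cm - 65) * tg_mmol_l


def sii(platelets, neutrophils, lymphocytes):
    """SII = platelet count x neutrophil count / lymphocyte count."""
    return platelets * neutrophils / lymphocytes


# Made-up example values for one participant.
example = {
    "TyG": tyg_index(150, 100),
    "TyG-BMI": tyg_bmi(150, 100, 25),
    "VAI": vai_male(95, 25, 1.7, 1.1),
    "LAP": lap_male(95, 1.5),
    "SII": sii(250, 4, 2),
}
```

Note that TyG takes TG and glucose in mg/dL, while VAI and LAP take lipids in mmol/L, so a conversion step is needed when all indices are derived from the same laboratory columns.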

Handling missing values

Missing data are a general limitation of the NHANES dataset. Complete-case analysis, which simply deletes records with missing values, can lead to both inefficient use of valuable health data and potential selection bias. To maintain data integrity while retaining as much clinically relevant information as possible, we applied an advanced imputation approach to variables with less than 15% missing values. Specifically, we used the missForest R package (version 1.5), which implements a random forest-based algorithm (hyperparameters: ntree=1000, maxiter=10, verbose=TRUE), to impute missing data [31]. This machine learning method was chosen for its ability to (1) accommodate complex relationships between variables, (2) preserve the original data distribution, and (3) provide robust imputation for both continuous and categorical variables.
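For readers working in Python, a rough analogue of random forest-based imputation can be sketched with scikit-learn's `IterativeImputer` wrapped around a `RandomForestRegressor`. This is a swapped-in technique, not the missForest package used in the study: it handles only numeric columns as written, and the synthetic data and the smaller tree and iteration counts are assumptions chosen to keep the example fast.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer

# Synthetic data: x2 depends strongly on x1, x3 is unrelated noise.
rng = np.random.default_rng(42)
n = 80
x1 = rng.normal(50, 10, n)
x2 = 2 * x1 + rng.normal(0, 1, n)
x3 = rng.normal(0, 1, n)
complete = np.column_stack([x1, x2, x3])

# Knock out 10% of x2, which stays under the 15% cut-off used in the text.
with_gaps = complete.copy()
missing_rows = rng.choice(n, size=8, replace=False)
with_gaps[missing_rows, 1] = np.nan

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=5,
    random_state=0,
)
imputed = imputer.fit_transform(with_gaps)
```

Observed values pass through unchanged and only the missing cells are filled. Categorical variables would need separate handling (for example a classifier-based imputer), which missForest provides natively.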

Predictive modeling strategies

To assess the performance and generalization of the model on unseen data, the dataset was randomly split in an 80:20 ratio, with 80% (n = 2,102) assigned to training and 20% (n = 526) reserved as a test set. The training set was used for model development, and to improve generalization, hyperparameter tuning was performed via grid search on the test set [32]. To further ensure robust estimates of model performance, we employed 5-fold cross-validation for each hyperparameter combination and selected the best configuration based on aggregated evaluation metrics [33].

Class imbalance is a common challenge in medical research, as the prevalence of certain diseases is low. When trained on imbalanced data, machine learning models tend to be biased towards the majority class, leading to inflated accuracy metrics that do not reflect the true predictive performance for the underrepresented class [34]. In our dataset, the prevalence of TD (the target label) was only 25.76%, indicating a substantial class imbalance. To mitigate this problem, we applied the synthetic minority oversampling technique (SMOTE), which generates synthetic minority-class samples by interpolating between neighboring instances in feature space. This approach balances the class distribution while preserving the feature structure of the original data. However, SMOTE may inadvertently alter the data distribution or introduce overfitting, especially when the synthetic samples amplify noise or outliers. To assess the fidelity of the SMOTE-processed dataset, the Jaccard distance (JD) was calculated between the original and resampled data. A JD of zero confirmed that the synthetic samples did not skew the underlying data distribution [35].

Additionally, continuous variables were normalized using MinMaxScaler to address feature scaling and ensure a uniform magnitude across features [36]. Categorical variables were converted to a machine-readable format via one-hot encoding, facilitating their integration into the classification models [37].
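The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, and of min-max scaling can be sketched with NumPy alone. This is a simplified stand-in for the imblearn and scikit-learn implementations named in the study: the function names, the synthetic data, and the choice to fit the scaler on the resampled training data only are all assumptions of this example.

```python
import numpy as np


def smote_sketch(X_minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples on segments between near neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base_idx = rng.integers(0, n, size=n_synthetic)
    neigh_idx = neighbours[base_idx, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))  # position along the segment, in [0, 1)
    base = X_minority[base_idx]
    return base + gap * (X_minority[neigh_idx] - base)


def min_max_scale(X_train, X_other):
    """Scale both arrays with the minimum and range learned from X_train."""
    lo = X_train.min(axis=0)
    span = X_train.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (X_train - lo) / span, (X_other - lo) / span


# Synthetic training data with roughly the imbalance reported in the text.
rng = np.random.default_rng(1)
X_majority = rng.normal(0.0, 1.0, size=(74, 2))
X_minority = rng.normal(2.0, 1.0, size=(26, 2))

X_synthetic = smote_sketch(X_minority, n_synthetic=len(X_majority) - len(X_minority))
X_balanced = np.vstack([X_majority, X_minority, X_synthetic])
y_balanced = np.array([0] * len(X_majority) + [1] * (len(X_minority) + len(X_synthetic)))

X_test = rng.normal(1.0, 1.0, size=(20, 2))
X_balanced_scaled, X_test_scaled = min_max_scale(X_balanced, X_test)
```

Because every synthetic point lies on a segment between two real minority samples, it cannot fall outside the range of the observed minority data. Resampling and scaler fitting are applied to the training portion only in this sketch, so that the held-out data does not influence preprocessing.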

Considering the differing performance characteristics of ML algorithms, we adopted six widely used classification models to predict MQI-associated TD in men. To objectively determine the optimal model, we compared predictive performance using the area under the receiver operating characteristic curve (AUC), a robust metric for assessing classification performance, particularly on imbalanced datasets. Of the models tested, LGBM achieved the highest AUC, establishing it as the most effective predictor of male TD. SHAP was then applied to increase interpretability and provide clinically meaningful insight into the model's decision-making process. It quantifies the importance of each feature and reveals nonlinear relationships within the model.

SHAP Interpretability Analysis

To increase the interpretability of the optimal model, we adopted SHAP values, a unified approach rooted in cooperative game theory that provides mathematically rigorous feature attributions [38]. Unlike traditional variable-importance metrics that simply rank features by their relative importance, SHAP values offer three important benefits: (1) they quantify the exact magnitude of each feature's contribution to individual predictions, (2) they indicate the direction of these contributions (a positive or negative effect on the testosterone deficiency prediction), and (3) they remain consistent with the original model's output through an additive attribution model [39]. This approach allows for clinically meaningful interpretation of how specific features affect the model's decision-making process for TD classification.
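The additivity property can be illustrated by computing exact Shapley values by brute force for a tiny model. This sketch is not the `shap` library and not the authors' LGBM model: `toy_model`, its coefficients, and the feature values are invented for illustration, and "absent" features are simply replaced by a single baseline vector.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical risk score: lower MQI, higher age, and obesity raise it."""
    mqi, age, bmi = features
    return 0.6 - 0.1 * mqi + 0.004 * age + 0.01 * bmi * (bmi > 30)


x = [2.0, 60, 33]          # participant being explained (made-up values)
baseline = [3.0, 45, 27]   # reference participant (made-up values)
phi = shapley_values(toy_model, x, baseline)

# Additivity: contributions sum to the gap between prediction and baseline.
gap = toy_model(x) - toy_model(baseline)
```

Each entry of `phi` carries both a magnitude and a sign, and their sum equals `gap`. Brute-force enumeration scales exponentially with the number of features, which is why tree-based explainers use specialised algorithms in practice.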

Statistical analysis

All statistical analyses in this study were performed using R version 4.3.1 (https://www.r-project.org) and Python version 3.11.5 (https://www.python.org). Because NHANES uses a complex sample survey design, we used weighted samples for our analysis. Continuous variables are expressed as median (P25, P75), and categorical variables are expressed as counts (percentages). Participants were grouped as TD or non-TD. Differences between the two groups were compared using the Wilcoxon rank-sum test for complex survey samples for continuous variables and the chi-square test with Rao and Scott's second-order correction for categorical variables. Multivariate logistic regression, trend testing, and subgroup analyses were used to analyze the relationship between MQI and TD. The performance of each ML model was evaluated using AUC, accuracy, precision, recall, F1 score, Brier score, and the area under the precision-recall (PR) curve (AP). The R packages used included haven, tableone, gtsummary, survey, dplyr, plyr, tidyverse, caret, arsenal, glmnet, ggplot2, and missForest. The Python libraries used were scikit-learn version 1.2.2 and imbalanced-learn (imblearn) version 0.10.1. A two-sided p < 0.05 was considered statistically significant.
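Most of the evaluation metrics listed above follow directly from the confusion matrix and the predicted probabilities. The sketch below computes them from their definitions with NumPy; the function name, the 0.5 decision threshold, and the example labels and probabilities are assumptions for illustration, and average precision is left out for brevity.

```python
import numpy as np


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F1, AUC, and Brier score for binary labels."""
    y_true = np.asarray(y_true, dtype=int)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = (y_prob >= threshold).astype(int)

    tp = int(((y_pred == 1) & (y_true == 1)).sum())
    tn = int(((y_pred == 0) & (y_true == 0)).sum())
    fp = int(((y_pred == 1) & (y_true == 0)).sum())
    fn = int(((y_pred == 0) & (y_true == 1)).sum())

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # AUC: probability that a random positive is scored above a random
    # negative, counting ties as one half.
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    auc = float(wins) / (len(pos) * len(neg))

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": auc,
        "brier": float(np.mean((y_prob - y_true) ** 2)),
    }


# Made-up labels (1 = TD) and predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.05]
metrics = classification_metrics(y_true, y_prob)
```

AUC and the Brier score use the probabilities directly and do not depend on the threshold, whereas accuracy, precision, recall, and F1 change with it.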


