Study population
CHARLS is a nationally representative, longitudinal survey of community-dwelling adults aged 45 years and older, covering 28 provinces across China. Operating under the auspices of the National Institute for Development Studies at Peking University, CHARLS received ethical approval from the Peking University Institutional Review Board (IRB00001052-11015), and all participants provided written informed consent prior to enrollment [18]. This study was reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines. Further details on CHARLS, including access to datasets upon registration and application, are available at https://charls.pku.edu.cn/ [19, 20].
We performed a secondary analysis of CHARLS data from the 2011 and 2015 waves. Participants were excluded for any of the following: (1) age below 45 years; (2) FI at wave 1 or CKM syndrome stage 4 at or before baseline; (3) missing or aberrant exposure indicator values; (4) absence of the indicators required for CKM syndrome staging; or (5) missing FI in 2015. An initial cohort of 17,705 participants from wave 1 was screened, and 4,354 individuals were included in the final analysis. The participant selection flowchart is depicted in Fig. 1.

Fig. 1 Participant screening flowchart for this study
Definitions of exposure and outcome variables
CHARLS includes fasting venous blood samples collected by staff from the Chinese Center for Disease Control and Prevention (China CDC) and analyzed using an enzymatic colorimetric assay at the Youanmen Clinical Trial Center of Capital Medical University, Beijing [19]. During the physical examination, height, weight, and waist circumference (WC) were each measured three times, and the mean value was used as the final measurement. Height was measured with a vertical height gauge and weight with a weight scale; WC was measured with a flexible tape placed around the waist at the level of the navel while the participant was standing [21]. The TyG-related indices selected for this study were the TyG index, C-reactive protein-TyG index (CTI), TyG-body mass index (TyG-BMI), TyG-WC, and TyG-waist-to-height ratio (TyG-WHtR); of these, TyG-WHtR has been recognized as a risk factor for FI in European populations [22].
Based on these measurements, 12 IR indices were included in this study, calculated as follows.
$$\text{TyG index} = \ln\left(\frac{\text{TG (mg/dL)} \times \text{FBS (mg/dL)}}{2}\right)$$
(1)
$$\text{TyG-WC} = \text{TyG index} \times \text{WC (cm)}$$
(2)
$$\text{METS-IR} = \frac{\ln\left[2 \times \text{FBS (mg/dL)} + \text{TG (mg/dL)}\right] \times \text{BMI (kg/m}^{2}\text{)}}{\ln\left[\text{HDL (mg/dL)}\right]}$$
(3)
$$\text{TG/HDL ratio} = \frac{\text{TG (mg/dL)}}{\text{HDL (mg/dL)}}$$
(4)
$$\text{TyG-WHtR} = \text{TyG index} \times \frac{\text{WC (cm)}}{\text{height (cm)}}$$
(5)
$$\text{TyG-BMI} = \text{TyG index} \times \frac{\text{weight (kg)}}{\left(\text{height (m)}\right)^{2}}$$
(6)
$$\text{CTI} = 0.412 \times \ln\left(\text{CRP (mg/L)}\right) + \text{TyG index}$$
(7)
$$\text{eGDR} = 21.158 - \left(0.09 \times \text{WC (cm)}\right) - \left(3.407 \times \text{hypertension (yes = 1, no = 0)}\right) - \left(0.551 \times \text{HbA1c (\%)}\right)$$
(8)
$$\text{CVAI} = -267.93 + 0.68 \times \text{age} + 0.03 \times \text{BMI} + 4.00 \times \text{WC} + 22.00 \times \log_{10}(\text{TG}) - 16.32 \times \text{HDL}$$
(9)
$$\text{VAI} = \frac{\text{WC}}{39.68 + (1.88 \times \text{BMI})} \times \frac{\text{TG}}{1.03} \times \frac{1.31}{\text{HDL}}$$
(10)
$$\text{LAP} = \begin{cases} \text{TG (mmol/L)} \times \left(\text{WC (cm)} - 65\right) & \text{(men)} \\ \text{TG (mmol/L)} \times \left(\text{WC (cm)} - 58\right) & \text{(women)} \end{cases}$$
(11)
$$\text{Non-HDL-C/HDL-C ratio} = \frac{\text{TC (mg/dL)} - \text{HDL-C (mg/dL)}}{\text{HDL-C (mg/dL)}}$$
(12)
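As a minimal sketch of how several of these indices can be computed, the following Python snippet implements the TyG index, its anthropometric and inflammatory variants, and the LAP. It assumes the standard definitions (WC divided by height for WHtR, weight in kg divided by height in m squared for BMI). The function names and the example participant are hypothetical and are not taken from the study's analysis code.

```python
import math


def tyg_index(tg_mgdl, fbs_mgdl):
    """TyG index = ln(TG [mg/dL] * FBS [mg/dL] / 2)."""
    return math.log(tg_mgdl * fbs_mgdl / 2)


def tyg_family(tg_mgdl, fbs_mgdl, weight_kg, height_cm, wc_cm, crp_mgl):
    """Return the TyG index and its WC, WHtR, BMI and CRP variants."""
    tyg = tyg_index(tg_mgdl, fbs_mgdl)
    bmi = weight_kg / (height_cm / 100) ** 2  # kg/m^2
    return {
        "TyG": tyg,
        "TyG-WC": tyg * wc_cm,
        "TyG-WHtR": tyg * wc_cm / height_cm,
        "TyG-BMI": tyg * bmi,
        "CTI": 0.412 * math.log(crp_mgl) + tyg,
    }


def lap(tg_mmol, wc_cm, male):
    """Lipid accumulation product with the sex-specific WC offset."""
    return tg_mmol * (wc_cm - (65 if male else 58))


# Hypothetical participant; values chosen only for illustration.
example = tyg_family(tg_mgdl=150, fbs_mgdl=100, weight_kg=70,
                     height_cm=170, wc_cm=85, crp_mgl=1.0)
```

Note the unit difference between the indices: the TyG family takes TG in mg/dL, whereas the LAP takes TG in mmol/L, so lipid values need converting before they are reused across formulas.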
Following a systematic screening of the CHARLS dataset, 32 variables covering comorbidity, physical activity (PA), disability, depression, and cognition were retained to derive the FI; all but one were dichotomized based on questionnaire responses (Table S1). A score of 0 denoted the absence of a deficit and 1 its presence, except for item 32, which was modeled as a continuous variable ranging from 0 to 1, with higher values reflecting greater cognitive impairment. Participants with missing data on more than 20% of the items on a given scale were excluded from the analysis [23]. The 32-item FI for each participant was computed by dividing the sum of the deficit scores by 32, giving a value from 0 to 1, which was then multiplied by 100 to express it on a scale from 0 to 100; higher scores reflect greater frailty. Frailty was defined as an FI ≥ 0.25 (equivalently, ≥ 25 on the 0–100 scale), consistent with established guidelines in the literature [24].
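The FI calculation can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it applies the 20% missingness rule to the whole item list rather than per scale, and it divides by the number of observed items when some are missing. The function names are hypothetical.

```python
def frailty_index(items, max_missing=0.20):
    """Return the FI on a 0-1 scale, or None if too many items are missing.

    items: deficit scores in [0, 1], with None marking a missing item.
    Assumption: the denominator is the number of observed items.
    """
    n_missing = sum(1 for x in items if x is None)
    if n_missing / len(items) > max_missing:
        return None  # participant excluded
    observed = [x for x in items if x is not None]
    return sum(observed) / len(observed)


def is_frail(fi, cutoff=0.25):
    """Frailty is defined as FI >= 0.25 on the 0-1 scale."""
    return fi >= cutoff


# Hypothetical participant with 8 of 32 deficits present.
fi_example = frailty_index([1] * 8 + [0] * 24)
fi_percent = fi_example * 100  # same value on the 0-100 scale
```

With complete data the denominator is 32, which matches the description above.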
Definition of CKM syndrome stages 0–3
As outlined in the AHA Presidential Advisory Statement, CKM syndrome is stratified into five progressive stages, from stage 0 to stage 4 [1]. Stage 0 is characterized by the absence of risk factors, that is, normal blood pressure (BP), body weight, blood glucose, lipid profile, and renal function, with no signs of cardiovascular disease (CVD); this stage emphasizes primary prevention and the promotion of cardiovascular health. Stage 1 comprises individuals with obesity and impaired glucose metabolism, reflecting excess or dysfunctional adiposity. Stage 2 encompasses individuals at moderate to high risk of CKD, defined by an estimated glomerular filtration rate (eGFR) of 30–60 ml/min/1.73 m² and/or a self-reported CKD diagnosis, alongside metabolic risk factors including hypertriglyceridemia, hypertension, metabolic syndrome, and type 2 diabetes mellitus (T2DM). Stage 3 encompasses individuals with subclinical CVD, identified here by a high 10-year CVD risk (Framingham risk score ≥ 21.5 for females and ≥ 21.6 for males) [22], or with very-high-risk CKD, defined by an eGFR < 30 ml/min/1.73 m². The Framingham risk score estimates CVD risk from multiple factors (Table S2-Table S4). Lastly, stage 4 involves individuals with clinical CVD. Because the FI, the primary outcome of this study, includes clinical CVD among its component deficits, individuals classified as CKM syndrome stage 4 were excluded.
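The staging logic can be sketched as a top-down rule that assigns the highest stage whose criteria are met. The snippet below is a simplified illustration, not the study's staging code: the boolean inputs are hypothetical summaries of the criteria above, the stage 2 criteria (CKD or metabolic risk factors) are treated as alternatives, and the handling of eGFR values exactly at 30 or 60 is an assumption.

```python
def ckm_stage(clinical_cvd, framingham_high_risk, egfr, self_reported_ckd,
              metabolic_risk_factor, adiposity_or_dysglycemia):
    """Assign CKM syndrome stage 0-4 from simplified criteria.

    egfr is in ml/min/1.73 m^2; the other arguments are booleans.
    """
    if clinical_cvd:
        return 4
    if framingham_high_risk or egfr < 30:
        return 3
    if 30 <= egfr < 60 or self_reported_ckd or metabolic_risk_factor:
        return 2
    if adiposity_or_dysglycemia:
        return 1
    return 0


def eligible(stage):
    """Only stages 0-3 enter the analysis; stage 4 is excluded."""
    return stage <= 3
```

Checking the criteria from the most severe stage downward ensures that a participant who meets several criteria is placed in the highest applicable stage.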
Covariates
Drawing upon existing literature, clinical expert assessments, and retrospective data analysis, a range of sociodemographic, socioeconomic, health behavior and lifestyle factors, as well as blood biomarkers, were considered as potential confounders. Sociodemographic variables encompassed age (years) and sex (male or female). Socioeconomic parameters comprised place of residence (rural or urban), educational attainment (“no formal or primary education” or “middle school or above”), marital status (“married and cohabiting with spouse,” “married but separated from spouse,” or “unmarried, divorced, and widowed”), regional category (“east,” “midland,” or “west”), and annual household expenditure. Health behavior and lifestyle factors comprised smoking status (“non-smoker” or “smoker”), alcohol consumption (“non-drinker,” “drinking < once a month,” or “drinking > once a month”), PA, type of cooking fuel used (“solid fuel” or “clean fuel”), sleep duration (“≤6 h,” “6–8 h,” or “>8 h”), and social isolation [25]. Smoking status was based on participants’ current or former tobacco use, including chewing tobacco, pipe smoking, self-rolled cigarettes, and commercial cigarettes or cigars. Alcohol intake was determined through self-reported consumption over the past year, accounting for beverage type (spirits, wine, or beer). PA was quantified using the International PA Questionnaire (IPAQ), with the time intervals recorded in CHARLS (“≥4 hours”, “≥2 hours and < 4 hours”, “≥30 minutes and < 2 hours”, and “≥10 minutes and < 30 minutes”) converted to median values. PA scores were derived using metabolic equivalent values: PA score = 8.0 × total hours of vigorous activity per week + 4.0 × total hours of moderate activity per week + 3.3 × total hours of walking per week [26, 27].
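The PA score is a weighted sum of weekly activity hours. A minimal sketch follows, assuming the weekly hours per intensity have already been derived from the interval medians; the dictionary keys and function name are hypothetical.

```python
# Metabolic equivalent weights taken from the PA score formula above.
MET_WEIGHTS = {"vigorous": 8.0, "moderate": 4.0, "walking": 3.3}


def pa_score(hours_per_week):
    """Weighted sum of weekly hours; a missing intensity counts as 0 hours."""
    return sum(weight * hours_per_week.get(kind, 0.0)
               for kind, weight in MET_WEIGHTS.items())


# Hypothetical participant: 2 h vigorous, 3 h moderate, 5 h walking per week.
score = pa_score({"vigorous": 2, "moderate": 3, "walking": 5})
```

For this participant the score is 8.0 × 2 + 4.0 × 3 + 3.3 × 5 = 44.5.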
Indoor air pollution was evaluated based on cooking fuel type, with coal, crop residue, and wood classified as “solid fuels” and cleaner alternatives (natural gas, biogas, liquefied petroleum gas) as “clean fuels” [28]. Sleep duration was assessed by self-reported average nightly hours during the past year and categorized as ≤ 6 h, 6–8 h, or > 8 h. Depressive symptoms were evaluated using the Center for Epidemiologic Studies Depression Scale (CESD-10) [29]. Social isolation was measured by a validated four-item scale, incorporating (1) living alone, (2) marital status, (3) frequency of contact with children (< 1 time/week), and (4) participation in social activities (< 1 time/month). A composite score (0–4) was calculated, with ≥ 2 indicating isolation [30, 31]. Blood biomarkers comprised white blood cell count (WBC), mean corpuscular volume (MCV), platelets, blood urea nitrogen (BUN), creatinine, low-density lipoprotein cholesterol (LDL-C), glycated hemoglobin (GHb), uric acid (UA), hematocrit, hemoglobin (Hb), cystatin C (CysC), C-reactive protein (CRP), and eGFR. In total, 24 covariates were selected.
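The social isolation score can be sketched as a count of four binary items. The snippet below is an illustration only: it assumes the marital status item scores 1 for participants who are not married or cohabiting, which the text does not state explicitly, and the argument names are hypothetical.

```python
def social_isolation_score(lives_alone, not_married,
                           child_contacts_per_week,
                           social_activities_per_month):
    """Return the 0-4 composite score (one point per item met)."""
    items = [
        lives_alone,
        not_married,  # assumption: unmarried, divorced or widowed scores 1
        child_contacts_per_week < 1,
        social_activities_per_month < 1,
    ]
    return sum(bool(item) for item in items)


def is_isolated(score, cutoff=2):
    """A score of 2 or more indicates social isolation."""
    return score >= cutoff
```

For example, a widowed participant who lives alone but sees their children weekly and attends social activities monthly scores 2 and is classified as isolated.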
Statistical analysis
Statistical analyses were conducted using Python (version 3.9.12) and R (version 4.4.1). For continuous variables, data were presented as mean ± standard deviation (SD), and group differences were assessed through analysis of variance (ANOVA). Categorical variables, expressed as counts and corresponding percentages, were compared using the chi-square (χ²) test.
Logistic regression analyses were performed to investigate the association between IR indices and FI among individuals in CKM syndrome stages 0 to 3. A series of five hierarchical models were specified. Model 1 adjusted for demographic and socioeconomic factors. Model 2 further adjusted for behavioural covariates, including smoking status, alcohol consumption, and cooking fuel use. Model 3 additionally adjusted for PA, social isolation, sleep duration, and depressive symptoms. Model 4 extended the previous models by including the relevant blood biomarkers. Any covariate that was itself a component of an IR index was excluded from adjustment to prevent collinearity; for example, because the exposures included the TyG-BMI and the TG/HDL ratio, BMI and HDL were not included as covariates in the regression models.
Receiver operating characteristic (ROC) analyses were conducted to evaluate the discriminative performance of all IRs. Optimal threshold values for each index were determined using the OptimalCutpoints R package, based on criteria including maximal sensitivity and specificity, diagnostic odds ratios (positive and negative), and the Youden index for predicting frailty in individuals at CKM syndrome stages 0–3. Thresholds were further stratified by CKM syndrome stages (0–3), and their predictive validity was evaluated through logistic regression modelling [33].
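To illustrate one of the cutpoint criteria, the following pure-Python sketch selects the threshold that maximizes the Youden index, J = sensitivity + specificity − 1. It is not the OptimalCutpoints implementation used in the study; the function name and the toy data are hypothetical, and it assumes that higher index values indicate frailty.

```python
def youden_cutpoint(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A score >= threshold is classified as positive; labels are 0 or 1.
    Ties are resolved in favor of the lowest threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy data: index values and frailty status (1 = frail).
separable = youden_cutpoint([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
overlapping = youden_cutpoint([1, 2, 3, 4], [0, 1, 0, 1])
```

Every observed value is tried as a candidate threshold, which is adequate for a sketch; dedicated packages additionally report confidence intervals and alternative criteria.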
The final cohort comprised 4,354 participants, who were randomly split into training (70%) and validation (30%) sets to minimize overfitting. To determine the most predictive IR index for frailty in individuals with CKM syndrome stages 0–3, we employed an integrative approach combining multiple embedded ML-based feature selection methods [16]. The 24 covariates were used as reference variables against which the IR indices were compared. For embedded feature selection, we first applied recursive feature elimination (RFE) to screen variables. RFE repeatedly fits a given ML algorithm, ranks the features according to their importance, and removes the least important ones [34]. Subsequently, the Boruta algorithm was employed to introduce permuted “shadow” features by shuffling the values of each original variable. The RFE model was then trained on both the original and shadow features, and their relative importances were evaluated across multiple iterations. Features consistently exhibiting higher importance than their shadow counterparts were retained [35]. Finally, we applied the least absolute shrinkage and selection operator (LASSO), a regularization technique grounded in linear regression, which penalizes less informative features by shrinking their coefficients to zero, thereby selecting variables with non-zero coefficients [36]. Drawing on the results of the three feature selection methods, the optimal IR index for predicting FI among individuals across CKM syndrome stages 0 to 3 was identified.
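The random 70/30 split can be sketched with the standard library alone. The seed, the rounding rule, and the function name below are assumptions made for reproducibility of the illustration; the study does not report how the split was seeded or the exact size of each set.

```python
import random


def train_validation_split(ids, train_fraction=0.7, seed=42):
    """Shuffle participant IDs reproducibly and split them into two sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical IDs standing in for the 4,354 participants.
train_ids, validation_ids = train_validation_split(range(4354))
```

Splitting on participant IDs rather than on rows keeps each individual in exactly one set, and feature selection should then be run on the training set only so that the validation set remains untouched.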
Subsequently, the 24 covariates were incorporated as input features for model development. To refine the model, SHapley Additive exPlanations (SHAP) values were used to evaluate feature importance, enabling a reduction from 24 to 12 features. Ten ML algorithms, namely K-nearest neighbor (KNN), decision tree (DT), Categorical Boosting (CatBoost), Gaussian Naive Bayes (GNB), Light Gradient Boosting Machine (LGB), Adaptive Boosting (AdaBoost), eXtreme Gradient Boosting (XGB), multi-layer perceptron (MLP), bootstrap aggregating (Bagging), and random forest (RF), were evaluated for their performance in predicting FI in individuals with CKM syndrome stages 0–3. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), and interpretability was assessed using SHAP values. To enhance clinical applicability, the final predictive model was deployed as a web-based application built with the Streamlit Python framework.
