Machine learning-predicted insulin resistance is a risk factor for 12 types of cancer



Machine learning-predicted insulin resistance was associated with a higher risk of diabetes, cardiovascular disease, and mortality in the UK Biobank

To evaluate the predictive capability of AI-IR (Fig. 1a) for the incidence of diabetes, cardiovascular disease, and mortality in the UK Biobank, we first looked at the effect of AI-IR on the incidence of diabetes among participants who did not have diabetes at baseline and who completed a follow-up visit (N = 14,165; baseline characteristics are shown in Supplementary Table 1). During a mean follow-up of 4.28 years (95% confidence interval [CI], 3.41–5.15), 309 participants (2.18%) developed diabetes. AI-IR positive participants had a significantly higher risk of developing diabetes compared to AI-IR negative participants (odds ratio [OR], 7.31; 95% CI, 5.75–9.30; P < 1 × 10−10, Fig. 1b). Notably, although AI-IR was originally developed using fasting blood test results, blood samples in the UK Biobank were not consistently collected in the fasted state. To address this, we stratified participants into three groups based on fasting duration prior to blood collection (less than 4 h, 4 h to less than 8 h, and 8 h or more) and examined the association between AI-IR and the incidence of diabetes. Across all categories, AI-IR positive participants consistently showed a significantly higher risk of developing diabetes, supporting the robustness of our model (Supplementary Fig. 1a–c). Hereafter, to maintain maximum sample size, we did not stratify or exclude participants based on fasting duration. In a complementary analysis leveraging the National Health Service (NHS) medical record system, we looked at the effect of AI-IR on hospital admission with diabetes among participants without diabetes at baseline (baseline characteristics of participants are shown in Supplementary Table 2). AI-IR positive participants had a markedly higher risk of admission with diabetes (hazard ratio [HR], 6.44; 95% CI, 6.19–6.69; P < 1 × 10−10, Fig. 1c).
Of note, AI-IR was associated with a significantly higher risk of diabetes onset and hospital admission with diabetes after adjusting for age and sex, and even after adjusting for age, sex, and BMI, indicating that our machine learning-based prediction model is able to capture a BMI-independent effect of insulin resistance (Supplementary Fig. 2a, b).
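As a quick consistency check on the intervals reported above, the following sketch assumes they are Wald intervals computed on the log scale (the Fig. 1 legend states that two-sided Wald tests were used). Under that assumption, the point estimate should sit near the geometric mean of the interval bounds, and the standard error and P value can be approximately recovered from the bounds alone. The function names are illustrative and are not taken from the study's analysis code.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def log_scale_se(lower, upper):
    """Standard error of log(ratio) implied by a 95% Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def geometric_midpoint(lower, upper):
    """Point estimate implied by an interval that is symmetric on the log scale."""
    return math.sqrt(lower * upper)


def wald_p_value(estimate, lower, upper):
    """Two-sided P value for H0: ratio = 1, from the estimate and its 95% CI."""
    z = math.log(estimate) / log_scale_se(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))


# Values reported in the text: OR for incident diabetes, HR for admission.
reported = [("OR", 7.31, 5.75, 9.30), ("HR", 6.44, 6.19, 6.69)]
for label, est, lo, hi in reported:
    mid = geometric_midpoint(lo, hi)
    print(label, "reported:", est, "implied by CI:", round(mid, 2),
          "P < 1e-10:", wald_p_value(est, lo, hi) < 1e-10)
```

Both reported estimates fall within rounding error of the geometric midpoint of their intervals, which is what a log-scale Wald interval would produce.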

Fig. 1: Effect of machine learning-predicted insulin resistance on diabetes and cardiovascular disease.

a A diagram depicting the construction of a machine learning model to predict insulin resistance (HOMA-IR > 2.5) that uses nine clinical parameters. b Effect of AI-IR on the odds ratio (OR) for onset of diabetes during the follow-up period in participants without diabetes (DM) at baseline. Logistic regression was used with a two-sided Wald test. In the graph, points represent the estimated OR, and error bars represent 95% confidence intervals. c Effect of AI-IR on the hazard ratio (HR) for hospital admission due to diabetes in participants without DM at baseline. Cox proportional hazards model was used with a two-sided Wald test. In the graph, points represent the estimated HR, and error bars represent 95% confidence intervals. d, e Kaplan–Meier plot for cumulative incidence of 3-point MACE (d) or cardiovascular mortality (e) in DM (-); AI-IR (-), DM(-); AI-IR (+), and DM (+) group. In the graph, the lines represent the estimated cumulative incidence, and the shaded error bands represent 95% confidence intervals. The HRs adjusted for age and sex are also shown for 3-point MACE (d) and cardiovascular mortality (e). Cox proportional hazards model was used with a two-sided Wald test. Source data are provided as a Source data file.

Cardiovascular disease is a major complication of insulin resistance and diabetes. Therefore, we also looked at the effect of AI-IR on 3- and 4-point major adverse cardiovascular events (MACE). Effects on cardiovascular mortality and overall mortality were also examined. Baseline characteristics of participants are shown in Supplementary Table 2. As expected, a Kaplan–Meier plot revealed a significant difference in the cumulative incidence of 3-point MACE between participants with diabetes, those without diabetes but positive for AI-IR, and those without diabetes and negative for AI-IR (Fig. 1d, log-rank test P < 1 × 10−10). The HR adjusted for age and sex was highest in participants with diabetes (HR, 1.84; 95% CI, 1.76–1.92; P < 1 × 10−10), followed by those without diabetes but positive for AI-IR (HR, 1.38; 95% CI, 1.34–1.42; P < 1 × 10−10), and then those without diabetes and negative for AI-IR (Fig. 1d). AI-IR and diabetes were significantly associated with a higher risk of 3-point MACE even after adjusting for age, sex, and BMI (Supplementary Fig. 2c). We also observed that AI-IR was significantly associated with an increased risk of 4-point MACE, cardiovascular mortality, and overall mortality (Fig. 1e and Supplementary Fig. 2d–f).

Machine learning-predicted insulin resistance enables improved diabetes risk stratification compared to previously reported metrics

To examine whether AI-IR outperforms previously established simpler metrics of obesity, metabolic syndrome, and insulin resistance, we compared the predictive capabilities of AI-IR, BMI, metabolic syndrome (MetS), TG/HDL ratio, and TyG index for the incidence of diabetes among individuals without diabetes at baseline during the follow-up period. TG/HDL ratio24,25 and TyG index26,27,28 have been reported as surrogate markers of insulin resistance. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was highest for AI-IR (0.798, P < 1 × 10−4 vs other metrics), followed by MetS (0.748), BMI (0.721), TyG index (0.703), and TG/HDL ratio (0.702), indicating that AI-IR demonstrated the highest predictive performance among these metrics (Fig. 2a). Next, we categorized individuals without diabetes at baseline into four groups according to both AI-IR status (positive or negative) and their status on previously established metrics (positive or negative). A BMI of 30, a TG/HDL ratio of 3.0 (ref. 25), and a TyG index of 4.68 (ref. 27) were used as cut-off values according to the previous literature. When we categorized individuals into four groups based on AI-IR status (positive or negative) and BMI (≥30 or <30), we observed that the incidence of diabetes was significantly higher (P = 2.52 × 10−5) in the AI-IR single-positive group (OR, 6.14; 95% CI, 4.42–8.54; P < 1 × 10−20; adjusted for age and sex) compared with the BMI single-positive group (OR, 0.86; 95% CI, 0.35–2.12; P = 0.74). Of note, a BMI of 30 or higher alone did not significantly increase diabetes incidence when individuals were negative for AI-IR, suggesting that these subjects may represent a metabolically healthy obesity phenotype.
In contrast, diabetes incidence was markedly higher (P = 9.78 × 10−7) in the AI-IR and BMI double-positive group (OR, 8.08; 95% CI, 6.13–10.66; P < 1 × 10−20) compared with the BMI single-positive group, underscoring that AI-IR provides substantially improved diabetes risk stratification among individuals with obesity (BMI ≥ 30; Fig. 2b). Similarly, when we categorized individuals without diabetes into four groups based on both AI-IR status (positive or negative) and MetS (positive or negative), the incidence of diabetes was significantly higher (P = 1.55 × 10−3) in the AI-IR single-positive group (OR, 6.85; 95% CI, 4.35–10.80; P = 1.2 × 10−16) compared with the MetS single-positive group (OR, 3.14; 95% CI, 2.07–4.77; P = 6.8 × 10−8). Again, the incidence of diabetes was even greater (P = 2.19 × 10−13) in the AI-IR and MetS double-positive group (OR, 11.71; 95% CI, 8.53–16.06; P < 1 × 10−20) compared with the MetS single-positive group (Fig. 2c). We observed the same pattern for TG/HDL ratio and TyG index (Fig. 2d, e). Collectively, these results indicate that AI-IR was significantly associated with a higher risk of diabetes, cardiovascular disease, and mortality in the UK Biobank population. Moreover, AI-IR demonstrated the highest predictive capability for the incidence of diabetes compared to previously established metrics, providing the rationale for investigating the association between predicted insulin resistance and cancer incidence in subsequent analyses.
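For readers less familiar with the AUC values compared above, the sketch below shows one standard way to compute an ROC AUC: as the fraction of case/non-case pairs in which the case receives the higher score, with ties counted as half (the Mann–Whitney formulation). The toy scores and labels are invented purely for illustration and are not UK Biobank data; the source does not state which software was used for its ROC analysis.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random non-case.

    Ties between a case and a non-case count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data (hypothetical): predicted risk scores and incident-diabetes labels.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(roc_auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values (0.702 to 0.798) should be read.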

Fig. 2: Machine learning-predicted insulin resistance enables improved diabetes stratification compared to previously established metrics.

a Receiver operating characteristic (ROC) curve illustrating the predictive performance of AI-IR, BMI, metabolic syndrome, TG/HDL ratio, and TyG index for the incidence of diabetes during the follow-up period. The area under the curve (AUC) for each metric is also shown. b–e Effect of AI-IR and BMI (b), metabolic syndrome (MetS) (c), TG/HDL ratio (d), or TyG index (e) on the odds ratio (OR) for onset of diabetes during the follow-up period in participants without diabetes at baseline, adjusted for age and sex. Logistic regression was used with a two-sided Wald test. In the graph, points represent the estimated OR, and error bars represent 95% confidence intervals. Source data are provided as a Source data file.

Effect of machine learning-predicted insulin resistance on the incidence of cancer

To investigate the effect of AI-IR on the incidence of cancer, we leveraged the linkage between the UK Biobank and the NHS medical record system and examined the incidence of cancer among participants who were cancer-free at the baseline visit (N = 372,395). We compared participants without diabetes and negative for AI-IR (N = 256,685), those without diabetes but positive for AI-IR (N = 94,782), and those with diabetes (N = 20,928). Of 372,395 participants who were cancer-free at baseline, 51,193 developed cancer (Supplementary Table 3). When we merged all types of cancer, we did not observe any differences in cancer incidence between the three groups. The HR adjusted for age and sex in participants without diabetes but positive for AI-IR was 1.012 (95% CI, 0.992–1.033; P = 0.228), and that in participants with diabetes was 0.973 (95% CI, 0.938–1.008; P = 0.129) (Fig. 3 and Supplementary Table 4). However, when we looked at individual types of cancer, both AI-IR and diabetes were associated with a significantly higher risk of incidence of multiple cancer types (the results for the 25 common cancers are shown in Fig. 3, and the complete list can be found in Supplementary Table 4). We examined the incidence of 36 cancers common to both males and females, 4 cancer types specific to males, and 3 cancer types specific to females. The Bonferroni-corrected P value for significance was 0.05/43 = 1.163 × 10−3. The risk-increasing effect of AI-IR was strongest for uterine cancer (HR, 2.340; 95% CI, 2.065–2.652; P = 1.00 × 10−9), followed by kidney cancer (HR, 1.557; 95% CI, 1.367–1.772; P = 1.00 × 10−9) and esophagus cancer (HR, 1.464; 95% CI, 1.253–1.710; P = 1.61 × 10−6).
It was also associated with a higher incidence of renal pelvis cancer (HR, 1.417; 95% CI, 1.013–1.983; P = 0.0418), small intestine cancer (HR, 1.393; 95% CI, 1.019–1.905; P = 0.0376), stomach cancer (HR, 1.374; 95% CI, 1.132–1.667; P = 1.28 × 10−3), liver and gallbladder (GB) cancer (HR, 1.367; 95% CI, 1.114–1.678; P = 2.73 × 10−3), pancreas cancer (HR, 1.291; 95% CI, 1.117–1.492; P = 5.58 × 10−4), colon cancer (HR, 1.176; 95% CI, 1.084–1.276; P = 9.45 × 10−5), leukemia (HR, 1.164; 95% CI, 1.012–1.339; P = 0.0337), bronchial and lung cancer (HR, 1.136; 95% CI, 1.048–1.231; P = 2.02 × 10−3), and breast cancer (HR, 1.135; 95% CI, 1.075–1.199; P = 5.92 × 10−6). When we considered the Bonferroni correction, AI-IR was associated with a higher incidence of uterine, kidney, esophagus, pancreas, colon, and breast cancers. On the other hand, AI-IR was associated with a significantly lower incidence of skin cancer (HR, 0.852; 95% CI, 0.825–0.881; P = 1.00 × 10−9). For cancer types whose incidences were increased or decreased by AI-IR, diabetes also exhibited an effect in the same direction, whereby in many cases, the effect size was numerically larger (Fig. 3 and Supplementary Table 4).
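The Bonferroni step described above can be reproduced directly from the P values quoted in the text. The following sketch divides the nominal threshold of 0.05 by the 43 tests and sorts the reported cancers with increased risk into Bonferroni-significant and nominal-only groups. The variable names are illustrative; only the numbers come from the text.

```python
ALPHA = 0.05
N_TESTS = 43  # 36 shared + 4 male-specific + 3 female-specific cancer types

threshold = ALPHA / N_TESTS

# P values as reported in the text for cancers whose risk was increased by AI-IR.
reported_p = {
    "uterine": 1.00e-9,
    "kidney": 1.00e-9,
    "esophagus": 1.61e-6,
    "renal pelvis": 0.0418,
    "small intestine": 0.0376,
    "stomach": 1.28e-3,
    "liver and GB": 2.73e-3,
    "pancreas": 5.58e-4,
    "colon": 9.45e-5,
    "leukemia": 0.0337,
    "bronchial and lung": 2.02e-3,
    "breast": 5.92e-6,
}

significant = [name for name, p in reported_p.items() if p < threshold]
nominal_only = [name for name, p in reported_p.items() if threshold <= p < ALPHA]

print(f"threshold = {threshold:.3e}")
print("Bonferroni-significant:", significant)
print("nominal only:", nominal_only)
```

Applied to the reported values, this yields the same six Bonferroni-significant cancers named in the text (uterine, kidney, esophagus, pancreas, colon, and breast). Stomach cancer (P = 1.28 × 10−3) falls just above the corrected threshold.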

Fig. 3: Effect of machine learning-predicted insulin resistance on the incidence of cancer.

a Effect of AI-IR and diabetes on the incidence of overall and 25 common cancers. Cox proportional hazards model was used with a two-sided Wald test. In the graph, points represent the estimated HR, and error bars represent 95% confidence intervals. The HRs for the indicated cancers, adjusted for age and sex, are shown. For male-specific (prostate) or female-specific (uterine and ovary) cancers, HRs were adjusted only for age. The Bonferroni-corrected P value for significance was 0.05/43 = 1.163 × 10−3. Source data are provided as a Source data file.

When we defined composite cancers by merging 10 cancer types (common to both males and females) whose risks were either significantly increased (kidney, esophagus, pancreas, colon) or nominally increased (renal pelvis, small intestine, stomach, liver and GB, leukemia, and bronchial and lung) by AI-IR, a Kaplan–Meier plot revealed a significant difference in the cumulative incidence of the composite cancers between participants with diabetes, those without diabetes but positive for AI-IR, and those without diabetes and negative for AI-IR (Fig. 4a). The HR adjusted for age and sex in participants without diabetes but positive for AI-IR was 1.25 (95% CI, 1.20–1.31; P < 1 × 10−11), and that in participants with diabetes was 1.40 (95% CI, 1.31–1.50; P < 1 × 10−11). When we also adjusted for BMI, the HR in participants without diabetes but positive for AI-IR was 1.16 (95% CI, 1.10–1.22; P = 1.2 × 10−8), suggesting that approximately 36% of the effect of AI-IR is mediated through BMI. Sex-specific analysis revealed that the impact of AI-IR on the incidence of the composite cancers remained consistent in both males and females (Supplementary Fig. 3a, b). When we combined female-specific uterine and breast cancers, whose risks were significantly increased by AI-IR, we observed that the HR adjusted for age in participants without diabetes but positive for AI-IR was 1.26 (95% CI, 1.20–1.32; P < 1 × 10−11), which was comparable to that in participants with diabetes (HR, 1.23; 95% CI, 1.11–1.36; P < 3.9 × 10−5) (Fig. 4b). Kaplan–Meier plots also revealed significant differences between the three groups in the cumulative incidence of specific cancers such as uterine cancer, kidney cancer, colon cancer, and bronchial and lung cancer (Supplementary Fig. 4a–d).
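The text does not state how the figure of approximately 36% was derived. One reading that reproduces it is the proportional reduction in the excess hazard ratio (HR minus 1) when BMI is added to the model. The sketch below works through that arithmetic with the two reported HRs; the function name and the choice of formula are assumptions made for illustration.

```python
def share_of_excess_hr_removed(hr_base, hr_adjusted):
    """Fraction of the excess hazard ratio (HR - 1) removed by further adjustment."""
    return (hr_base - hr_adjusted) / (hr_base - 1.0)


hr_age_sex = 1.25      # reported HR adjusted for age and sex
hr_age_sex_bmi = 1.16  # reported HR additionally adjusted for BMI

share = share_of_excess_hr_removed(hr_age_sex, hr_age_sex_bmi)
print(f"{share:.0%}")
```

This difference-based ratio, (1.25 − 1.16) / (1.25 − 1) = 0.36, is a rough heuristic for attenuation after adjustment and is not the same as a formal mediation analysis.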

Fig. 4: Effect of machine learning-predicted insulin resistance on the incidence of composite cancers, and its BMI-dependent and -independent effect on cancer incidence.

a Kaplan–Meier plots of the cumulative incidence of the composite cancers in the DM(-); AI-IR (-), DM(-); AI-IR (+), and DM (+) groups. In the graph, the lines represent the estimated cumulative incidence, and the shaded error bands represent 95% confidence intervals. The HRs for the incidence of the composite cancers adjusted for age and sex or adjusted for age, sex, and BMI were also shown. Cox proportional hazards model was used with a two-sided Wald test. b Kaplan–Meier plots of the cumulative incidence of the uterine and breast cancer in females from the DM(-); AI-IR (-), DM(-); AI-IR (+), and DM (+) groups. In the graph, the lines represent the estimated cumulative incidence, and the shaded error bands represent 95% confidence intervals. The HRs for the incidence of these cancers adjusted for age or adjusted for age and BMI were also shown. Cox proportional hazards model was used with a two-sided Wald test. c For 13 cancer types whose incidences were significantly or nominally affected by AI-IR, the effects of AI-IR after adjustment for BMI were shown. Cox proportional hazards model was used with a two-sided Wald test. In the graph, points represent the estimated HR, and error bars represent 95% confidence intervals. HRs are adjusted for age and sex. The Bonferroni-corrected P value for significance was 0.05/43 = 1.163 × 10−3. Source data are provided as a Source data file.

Cancer risk is well known to increase with age29,30. Indeed, when we stratified participants by age at enrollment (40–69 years old), the incidence of the composite cancers per 1000 person-years increased with age (Supplementary Fig. 5a). Furthermore, both AI-IR and diabetes were associated with a higher incidence of the composite cancers across all ages (Supplementary Fig. 5a), indicating that AI-IR is a risk factor for the composite cancers across different age groups. Kaplan–Meier analyses further confirmed consistent differences in cumulative incidence of the composite cancers among the three groups across different ages at enrollment (<50 years old, 50–59 years old, and ≥60 years old) (Supplementary Fig. 5b–d). To more rigorously account for confounding by age, we also conducted analyses with age as the underlying time variable. Significant differences in cancer incidence persisted among participants with diabetes, those without diabetes but AI-IR positive, and those without diabetes and AI-IR negative (Supplementary Fig. 6a). The HR adjusted for sex and BMI in participants without diabetes but AI-IR positive was 1.16 (95% CI, 1.10–1.22; P < 3.2 × 10−8), and that in participants with diabetes was 1.29 (95% CI, 1.20–1.39; P < 1 × 10−10). The effect of AI-IR on 3-point MACE was also consistent in these models (Supplementary Fig. 6b). Together, these findings demonstrate the robustness of AI-IR in predicting complications of insulin resistance while fully accounting for the effect of age.

BMI-dependent and -independent effect of machine learning-predicted insulin resistance on cancer incidence

In our prediction model, feature importance was highest for higher BMI (0.427), followed by higher FPG (0.115), lower HDL (0.115), and higher TG (0.097)23. To explore the BMI-independent effect of AI-IR on cancer incidence, we also performed an analysis adjusted for age, sex, and BMI (Fig. 4c and Supplementary Table 5). Among the cancers whose risks were positively associated with AI-IR, the effects for renal pelvis, small intestine, stomach, liver and GB, pancreas, colon, leukemia, and breast cancer were BMI-dependent (Fig. 4c). However, the effect of AI-IR on the incidence of bronchial and lung cancer was stronger, and significant even after Bonferroni correction, when we adjusted for BMI in addition to age and sex (HR, 1.33; 95% CI, 1.20–1.47; P = 1.71 × 10−8) than when we adjusted only for age and sex (HR, 1.14; 95% CI, 1.05–1.23; P = 2.02 × 10−3) (Fig. 4c), indicating that the effect was independent of BMI. The effect on uterine, kidney, and esophagus cancer remained nominally significant. The association between AI-IR and a lower risk of skin cancer was also BMI-independent.

Lung cancer is strongly associated with smoking, the leading risk factor for the global burden of cancer31. We observed that the effect of AI-IR on the incidence of bronchial and lung cancer remained significant even when we also adjusted for smoking status (never smoker, previous smoker, and current smoker); the same was true for the incidence of the composite cancers (Supplementary Fig. 5a, b). We also looked at a possible joint effect of AI-IR and smoking status on the incidence of bronchial and lung cancer (Supplementary Fig. 5c). In the never smoker group, HRs adjusted for age, sex, and BMI were not significantly different (P = 0.513) between the AI-IR positive group (HR, 1.08; 95% CI, 0.85–1.37) and the AI-IR negative group. In the current smoker group, HRs adjusted for the three factors were also not significantly different (P = 0.726) between the AI-IR positive group (HR, 17.42; 95% CI, 14.66–20.71) and the AI-IR negative group (HR, 17.89; 95% CI, 15.63–20.47). On the other hand, in the previous smoker group, the HR adjusted for the three factors in the AI-IR positive group (HR, 5.52; 95% CI, 4.69–6.49) was significantly higher than that in the AI-IR negative group (HR, 3.84; 95% CI, 3.35–4.40) when examined by post-hoc analysis (P = 7.78 × 10−8, Supplementary Fig. 5c). We observed a similar interaction effect on the incidence of the composite cancers; predicted insulin resistance increased the incidence in the never smoker group and the previous smoker group, but not in the current smoker group (Supplementary Fig. 5d). Notably, the composite cancers include esophagus, pancreas, stomach, liver, and bronchial and lung cancers, all of which are known to have increased risk due to smoking32.
These results suggest that the effect of AI-IR on the incidence of bronchial and lung cancer or the composite cancers is most apparent in the previous smoker group; however, the effect may have been masked in the current smoker group because the absolute risk in that group is overwhelmingly high.

Machine learning-predicted insulin resistance enables improved cancer risk stratification compared to previously reported metrics

Finally, to examine whether AI-IR provides a better prediction of the composite cancer incidence than previously established metrics such as BMI, MetS, TG/HDL ratio, and TyG index, we classified participants who were free of diabetes and cancer at baseline into four groups according to their AI-IR status (positive or negative) and the status of these metrics (positive or negative). When we considered AI-IR and BMI, we observed that the incidence of the composite cancers was significantly higher (P = 2.28 × 10−2) for the AI-IR single-positive group (HR, 1.18; 95% CI, 1.11–1.26; P < 9.5 × 10−8) compared with the BMI single-positive group (HR, 1.04; 95% CI, 0.95–1.15; P = 0.38). BMI of 30 or higher alone did not significantly increase the composite cancer incidence when individuals were negative for AI-IR. We also observed that the incidence of the composite cancers was significantly higher (P = 2.01 × 10−5) in the AI-IR and BMI double-positive group (HR, 1.30; 95% CI, 1.24–1.37; P < 1 × 10−11) compared with the BMI single-positive group, underscoring that AI-IR provides substantially improved cancer risk stratification among individuals with obesity (Fig. 5a). When considering AI-IR (positive or negative) and MetS (positive or negative), the AI-IR single-positive group (HR, 1.21; 95% CI, 1.12–1.31; P = 2.1 × 10−6) and the MetS single-positive group (HR, 1.19; 95% CI, 1.12–1.26; P = 1.8 × 10−9) exhibited comparable HRs (P = 0.649) for the incidence of the composite cancers. However, the incidence of composite cancers was significantly higher (P = 1.49 × 10⁻⁴) in the AI-IR and MetS double-positive group (HR, 1.34; 95% CI, 1.27–1.40; P < 1 × 10⁻¹¹) compared with the MetS single-positive group (Fig. 5b). Similar patterns were observed when we considered AI-IR and TG/HDL ratio (Fig. 5c).
When considering AI-IR and TyG index, the incidence of the composite cancers was significantly higher (P = 1.92 × 10−2) for the AI-IR single-positive group (HR, 1.23; 95% CI, 1.13–1.35; P = 7.2 × 10−6) compared with the TyG single-positive group (HR, 1.10; 95% CI, 1.05–1.16; P = 1.1 × 10−4). We also observed that the incidence of the composite cancers was significantly higher (P < 1 × 10−11) for the AI-IR and TyG double-positive group (HR, 1.32; 95% CI 1.26–1.39; P < 1 × 10−11) compared to the TyG single-positive group (Fig. 5d). Altogether, these results indicate that AI-IR enables improved cancer risk stratification compared to previously established metrics.
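The four-group classification used throughout these comparisons can be sketched as follows for the AI-IR and BMI case, using the BMI cut-off of 30 given earlier in the text. The function name, the group labels, and the example participants are hypothetical. The double-negative group is presumably the reference category for the reported ratios, although the text does not state this explicitly.

```python
BMI_CUTOFF = 30.0  # obesity threshold used in the text


def stratify(ai_ir_positive, bmi, cutoff=BMI_CUTOFF):
    """Assign one of four groups from AI-IR status and a thresholded BMI."""
    bmi_positive = bmi >= cutoff
    if ai_ir_positive and bmi_positive:
        return "double-positive"
    if ai_ir_positive:
        return "AI-IR single-positive"
    if bmi_positive:
        return "BMI single-positive"
    return "double-negative"


# Hypothetical participants: (AI-IR positive?, BMI in kg/m^2).
examples = [(False, 24.1), (True, 27.5), (False, 32.0), (True, 35.2)]
for ai_ir, bmi in examples:
    print(ai_ir, bmi, "->", stratify(ai_ir, bmi))
```

The same pattern applies to MetS, TG/HDL ratio, and TyG index by swapping the thresholded metric for the corresponding positive/negative status.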

Fig. 5: Machine learning-predicted insulin resistance enables improved cancer risk stratification compared to previously established metrics.

Effect of AI-IR and BMI (a), metabolic syndrome (MetS) (b), TG/HDL ratio (c), or TyG index (d) on the hazard ratio (HR) for incidence of the composite cancers during the follow-up period in participants without diabetes at baseline. Cox proportional hazards model was used with a two-sided Wald test. In the graph, points represent the estimated HR, and error bars represent 95% confidence intervals. Source data are provided as a Source data file.


