ECG features improve multimodal deep learning prediction of incident T2DM in a Middle Eastern cohort

Ethical approval

This research was carried out in compliance with Qatar’s Ministry of Public Health regulations. The work was approved by the Institutional Review Board of the Qatar Biobank (QBB) and used a de-identified dataset from the QBB. The QBB dataset is not publicly available, in compliance with the Qatar Biobank data-sharing policy.

Data collection and preprocessing

The data used in this study were collected from the QBB. The details of the data collection protocol adopted by the QBB are described in^2,12. In brief, participants were invited to the QBB, where staff nurses interviewed them to collect their background history. Multiple laboratory tests and imaging, such as ECG scans and retinal images, were then collected. The data for this study included the clinical and ECG features extracted by the QBB team. In addition to ECG data, known clinical risk factors (CRFs) were selected based on the literature^{14,15,16,17,18,19,20,21}. Six key non-invasive factors were selected for analysis: age, gender, BMI, waist size, and systolic and diastolic blood pressure (BP).

The ECG features comprised key summary measures—QRS duration, QT interval, corrected QT interval, PR interval, average RR interval, and T-wave axis—each extracted from three separate, nonconsecutive 30-second resting recordings per participant by the QBB team (see Kuwari et al.^2,12). These measures were chosen for their established links to cardiac electrophysiology and diabetes-related cardiovascular changes. While we report descriptive statistics based on the per-participant means of each ECG feature (e.g., QTInterval_mean) in Table 1, our predictive models were trained using all individual replicates (e.g., QTInterval_1, QTInterval_2, QTInterval_3) to capture lead- and time-specific variation. To assess redundancy, we computed pairwise correlations among all ECG features (Supplementary Figure S1). The results show high correlations among replicates of the same feature group (e.g., r ≈ 0.90–0.95 for QTInterval_1–3 and QRSDuration_1–3), indicating consistency. However, correlations across different feature groups were low (r < 0.3), suggesting minimal multicollinearity.

The ECG features and CRFs were processed using a structured preprocessing pipeline to handle missing values, outliers, and inconsistencies. Missing continuous variables were imputed with their mean values, whereas categorical variables were imputed with their mode. Outliers detected using interquartile range analysis were addressed by replacing the extreme values with the mean of the respective features. To standardize the data and support effective model training, all features were scaled using MinMaxScaler (Table 4).
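The preprocessing steps for continuous features can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `preprocess`, the 1.5 × IQR fences, and the choice to replace outliers with the mean of the non-outlying values are assumptions, since the text does not specify them. Min-max scaling is written out by hand so the block depends only on NumPy; mode imputation for categorical variables is omitted.

```python
import numpy as np

def preprocess(X):
    """Mean-impute, replace IQR outliers with a mean, then min-max scale.

    X: 2-D array (rows = participants, columns = continuous features),
    with missing values encoded as NaN.
    """
    X = np.asarray(X, dtype=float).copy()
    for j in range(X.shape[1]):
        col = X[:, j]                          # view into X, edited in place
        col[np.isnan(col)] = np.nanmean(col)   # mean imputation

        # Assumed rule: values beyond 1.5 * IQR from the quartiles are outliers.
        q1, q3 = np.percentile(col, [25, 75])
        iqr = q3 - q1
        outlier = (col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)
        if outlier.any():
            # Assumed: replacement value is the mean of the non-outlying values.
            col[outlier] = col[~outlier].mean()

        # Min-max scaling to [0, 1]; constant columns map to 0.
        span = col.max() - col.min()
        X[:, j] = (col - col.min()) / span if span > 0 else 0.0
    return X
```

In practice the imputation means, outlier fences, and scaling ranges would presumably be estimated on the training data only and then applied to the test cohort; the sketch above does not separate fitting from transforming.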

Table 4 ECG features and their clinical significance.

Study design

The dataset considered in this study comprises two distinct cohorts derived from the QBB, enabling robust model development and validation: a development cohort with baseline cross-sectional data and a longitudinal follow-up cohort used to evaluate T2DM risk prediction. The development cohort initially consisted of 4,500 participants with cross-sectional baseline data. After applying exclusion criteria (e.g., missing ECG data or participants with pre-existing prediabetes), the final development cohort comprised 2043 participants, of whom 1107 had confirmed T2DM diagnoses and 937 were non-diabetic controls (see Fig. 1). The classification of T2DM cases and controls was conducted with the assistance of QBB medical practitioners and nurses.

The longitudinal test cohort initially included 1,500 participants with baseline data and a median follow-up duration of five years. Participants diagnosed with T2DM at baseline were excluded so that only non-diabetic participants remained, focusing the analysis on the prediction of incident T2DM (Fig. 1). Non-diabetic status at baseline was defined as not meeting any of the following diagnostic criteria: HbA1c ≥ 6.5%, fasting plasma glucose ≥ 126 mg/dL, or self-reported use of diabetes medications. Consequently, the final longitudinal cohort comprised 395 participants who did not meet diabetic criteria at baseline; 303 were metabolically healthy, whereas 92 exhibited prediabetic features such as elevated HbA1c below the diagnostic threshold. During the five-year follow-up, 140 participants (35.4%) developed T2DM. This comparatively high conversion rate could be explained by the inclusion of 92 participants with prediabetes in the “non-diabetic” baseline cohort. Meta-analyses show that individuals with prediabetes progress to overt T2DM at 5–10% per year²⁷; assuming a midpoint of 7% annually yields an expected cumulative incidence of roughly 35% over five years, closely matching our observed rate. While this enrichment enhances the model’s sensitivity to early pathophysiological changes, it may overestimate absolute risk in lower-risk settings, underscoring the critical need for external validation in diverse, population-based cohorts before clinical implementation.
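The baseline eligibility rule stated above can be written as a small predicate. This is only an illustrative sketch; the function and argument names are hypothetical, and the thresholds are those given in the text.

```python
def is_nondiabetic_at_baseline(hba1c_percent, fpg_mg_dl, on_diabetes_medication):
    """Return True if none of the stated diabetic criteria is met at baseline."""
    meets_diabetic_criterion = (
        hba1c_percent >= 6.5          # HbA1c threshold (%)
        or fpg_mg_dl >= 126           # fasting plasma glucose threshold (mg/dL)
        or on_diabetes_medication     # self-reported medication use
    )
    return not meets_diabetic_criterion
```

A participant with elevated HbA1c that stays below 6.5% passes this check, which is how participants with prediabetic features remain in the "non-diabetic" baseline cohort.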

Experiment setup and model development

We designed three experiments using unimodal and multimodal data configurations to evaluate the contributions of ECG data, CRFs, and their integration in predicting T2DM. The first configuration, the ECG-only model, employed a deep neural network (DNN) to process ECG-derived features exclusively. The second configuration, the CRF-only model, analyzed CRFs using a similar DNN architecture. The third configuration, the ECG + CRF multimodal model (ECG-DiaNet), combined ECG features and CRFs into a unified deep learning model, leveraging their complementary strengths to enhance prediction accuracy.

The exclusive use of deep neural networks (DNNs) in this study is motivated by their proven effectiveness in integrating multimodal data and modeling intricate, nonlinear interactions among features, even with moderately sized datasets. Recent literature strongly supports the superiority or comparable performance of DNNs over traditional methods. Specifically, Wang et al.²⁸ demonstrated the higher accuracy of neural networks compared to logistic regression and decision trees in diabetes prediction. Additionally, studies by Butt et al.²⁹ and Ahuja et al.³⁰ further confirmed the advantages of multilayer perceptron models over classical approaches such as Random Forest. Crucially, the capability of DNNs to seamlessly integrate heterogeneous data modalities, such as ECG signals and CRFs, further justifies their use in our study. Trivedi et al.³¹ highlighted significant predictive improvements when ECG waveforms and clinical data were jointly modeled using multimodal DNN architectures. Moreover, DNNs inherently support continual learning and straightforward incremental updates, which are essential for handling evolving clinical datasets.

ECG-only and CRF-only networks

Both the ECG-only and CRF-only models shared the same neural network architecture. Each model consisted of a three-layer fully connected deep neural network. The input layer received either the ECG-derived features or the set of clinical and demographic risk factors. The first hidden layer contained 512 neurons with ReLU activation followed by a dropout layer (dropout rate = 0.1) to mitigate overfitting. The second hidden layer reduced the dimensionality to 256 neurons, again followed by ReLU activation and dropout (dropout rate = 0.25). The final output layer comprised a single neuron with a sigmoid activation function, yielding a probability score for T2DM risk. Hyperparameter tuning was conducted using a grid search strategy on the training folds of the development set during cross-validation. The grid included the following ranges: learning rate ∈ {1e-4, 5e-4, 1e-3}, batch size ∈ {8, 16, 32, 64}, and dropout rate ∈ {0.1, 0.25, 0.3, 0.5}. Optimal hyperparameters were selected based on the highest average AUROC across validation folds.
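To make the layer sizes concrete, the following is a framework-agnostic NumPy sketch of the inference-time forward pass (input → 512 → 256 → 1) together with the hyperparameter grid listed above. It is not the authors' implementation: the helper names, the He-style random initialisation, and the input width of 18 (6 ECG measures × 3 replicates) are assumptions made for illustration.

```python
import itertools
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_unimodal(n_features, rng):
    """Weights for input -> 512 -> 256 -> 1 (He-style init is an assumption)."""
    sizes = [n_features, 512, 256, 1]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward_unimodal(params, x):
    """Inference-time forward pass; dropout acts as the identity here."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = relu(x @ w1 + b1)   # dropout (rate 0.1) would follow during training
    h2 = relu(h1 @ w2 + b2)  # dropout (rate 0.25) would follow during training
    return sigmoid(h2 @ w3 + b3).ravel()

# Hyperparameter grid as listed in the text.
GRID = list(itertools.product(
    [1e-4, 5e-4, 1e-3],      # learning rate
    [8, 16, 32, 64],         # batch size
    [0.1, 0.25, 0.3, 0.5],   # dropout rate
))

rng = np.random.default_rng(0)
ecg_params = init_unimodal(18, rng)   # 18 input features is an assumed width
probs = forward_unimodal(ecg_params, rng.random((4, 18)))
```

The grid contains 3 × 4 × 4 = 48 combinations, each of which would be scored by its average validation AUROC across folds.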

ECG+CRF multimodal model (ECG-DiaNet)

The ECG-DiaNet model was designed to integrate the complementary strengths of ECG signals and clinical risk factors for improved prediction of T2DM. Feature representations were extracted separately from each modality using the same architecture as the unimodal models, with the final output layer removed. This resulted in two 256-dimensional feature vectors, one from ECG data and one from CRFs. These vectors were concatenated into a single 512-dimensional representation and passed through a multimodal classifier. The classifier consisted of three fully connected layers with 256, 64, and 1 neurons, respectively. Each intermediate layer was followed by ReLU activation and a dropout layer (dropout rate = 0.25). The final layer used a sigmoid activation function to generate the T2DM risk probability. This early fusion strategy enabled effective learning of cross-modal interactions between cardiac and systemic metabolic signals.
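The fusion step can be illustrated with a short NumPy sketch that tracks the tensor shapes: two encoders produce 256-dimensional vectors, which are concatenated to 512 dimensions and passed through a 256 → 64 → 1 classifier. The helper names, the random initialisation, and the input widths (18 ECG features, 6 CRFs) are assumptions for illustration; dropout is omitted because it is inactive at inference time.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_stack(sizes, rng):
    """Random weights for consecutive fully connected layers (assumed init)."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def encode(layers, x):
    """Unimodal encoder without its output layer: input -> 512 -> 256."""
    for w, b in layers:
        x = relu(x @ w + b)
    return x

def classify(head, z):
    """Multimodal classifier: 512 -> 256 -> 64 -> 1 with a sigmoid output."""
    (w1, b1), (w2, b2), (w3, b3) = head
    h = relu(z @ w1 + b1)
    h = relu(h @ w2 + b2)
    return sigmoid(h @ w3 + b3).ravel()

rng = np.random.default_rng(0)
ecg_encoder = dense_stack([18, 512, 256], rng)   # 18 ECG inputs (assumed)
crf_encoder = dense_stack([6, 512, 256], rng)    # 6 clinical risk factors
head = dense_stack([512, 256, 64, 1], rng)

ecg_x, crf_x = rng.random((4, 18)), rng.random((4, 6))
fused = np.concatenate([encode(ecg_encoder, ecg_x), encode(crf_encoder, crf_x)], axis=1)
risk = classify(head, fused)
```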

Model training and validation

The model development process adhered to a rigorous training and validation protocol to ensure robustness and generalizability. Five-fold cross-validation was employed using the development cohort, which consisted of cross-sectional data from participants with and without T2DM. The dataset was randomly partitioned into five subsets, with four subsets used for training and the fifth reserved for validation in each iteration. To ensure stability and confidence in performance estimates, this process was repeated 1,000 times using bootstrapping. Hyperparameter tuning was conducted during cross-validation to optimize the architecture, learning rate, and regularization parameters of the models. Once the models were optimized, they were trained on the entire development cohort. The trained models were then evaluated on a longitudinal test cohort comprising non-diabetic individuals at baseline, assessing their risk of developing T2DM over a follow-up period. To avoid data leakage, there was no overlap between the training and evaluation sets, ensuring a robust assessment of the models’ ability to predict long-term T2DM risk. The binary cross-entropy loss of the model was optimized using the Adam optimizer³² with a learning rate of 5e-4. The models were trained with weight decay set to 5e-6 for 20 epochs with a batch size of 16.
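As a rough sketch of the partitioning described above, the generator below shuffles participant indices and yields five train/validation splits. The function name and seed handling are hypothetical, stratification by outcome is not shown because the text does not mention it, and `TRAIN_CONFIG` simply collects the training settings reported in this section.

```python
import numpy as np

# Training settings as reported in the text.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "loss": "binary cross-entropy",
    "learning_rate": 5e-4,
    "weight_decay": 5e-6,
    "epochs": 20,
    "batch_size": 16,
}

def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs from a random partition into folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[i] for i in range(n_folds) if i != k])
        yield train_idx, val_idx

# Example with the size of the development cohort.
splits = list(five_fold_indices(2043))
```

Each participant appears in exactly one validation fold, so within a given iteration the training and validation indices never overlap.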

Risk prediction and statistical analysis

The models were evaluated on the longitudinal test set to estimate the risk of developing T2DM during follow-up visits based on baseline data. For each metric, 1,000 bootstrap samples were generated, and the mean and standard errors were reported. Model performance was primarily assessed using the area under the receiver operating characteristic curve (AUROC), along with 95% confidence intervals. Additional metrics included the area under the precision-recall curve (AUPRC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Calibration was evaluated using the Brier score, which quantifies the accuracy of predicted probabilities, with lower scores indicating better calibration. Calibration curves were also plotted to assess the agreement between predicted and observed risks, ensuring the model’s reliability for clinical decision-making.
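The bootstrap evaluation can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, AUROC is computed by direct pairwise comparison, the confidence interval is taken as the 2.5th to 97.5th percentiles of the bootstrap distribution (the text does not state which interval method was used), and resamples containing a single class are redrawn.

```python
import numpy as np

def auroc(y_true, y_score):
    """Probability that a random case scores above a random control (ties = 0.5)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

def bootstrap_metric(metric, y_true, y_score, n_boot=1000, seed=0):
    """Return (mean, standard error, 95% percentile interval) over resamples."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    values = []
    while len(values) < n_boot:
        idx = rng.integers(0, n, size=n)       # sample with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue                            # redraw if only one class present
        values.append(metric(y_true[idx], y_score[idx]))
    values = np.array(values)
    lo, hi = np.percentile(values, [2.5, 97.5])
    return values.mean(), values.std(ddof=1), (lo, hi)
```

Sensitivity, specificity, PPV, and NPV could be passed through the same `bootstrap_metric` helper once a decision threshold is fixed; the threshold is not given in this section.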

Risk stratification analysis

To evaluate the effectiveness of integrating ECG-derived scores into a traditional CRF-only model for predicting the onset of T2DM, we performed a risk stratification analysis. We classified participants into distinct risk categories based on their predicted scores. The predicted probabilities were calculated using baseline data, and the actual incidence of T2DM within each risk group was compared over a 5-year follow-up period. Specifically, we compared the risk stratification outcomes between the CRF-only model and the ECG-DiaNet model, which combines ECG-derived features with traditional CRFs.

Risk stratification was conducted by categorizing participants into three distinct risk groups: Low, Medium, and High. The categorization was based on quantile thresholds, with the lowest 20% of scores categorized as Low Risk, the middle 20–80% as Medium Risk, and the highest 20% as High Risk. The incidence rates, expressed as positive predictive values (PPV), were calculated for each risk group. This approach enabled us to evaluate how effectively each model distinguished participants based on their risk of developing T2DM. For visualization, PPV was calculated for each risk group under both models. The results were displayed as bar charts, where the distribution and PPV of each risk group could be directly compared.
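The quantile-based grouping can be sketched as below. The function names are hypothetical, and the handling of scores that fall exactly on a threshold is an assumption, since the text only specifies the 20% and 80% cut-points.

```python
import numpy as np

def stratify(scores, low_q=0.20, high_q=0.80):
    """Label each score Low, Medium, or High using quantile thresholds."""
    scores = np.asarray(scores, dtype=float)
    low_thr, high_thr = np.quantile(scores, [low_q, high_q])
    # Assumed boundary rule: strictly below / strictly above the thresholds.
    return np.where(scores < low_thr, "Low",
                    np.where(scores > high_thr, "High", "Medium"))

def ppv_by_group(groups, y_true):
    """Observed incidence of the outcome within each risk group."""
    y_true = np.asarray(y_true, dtype=float)
    result = {}
    for name in ("Low", "Medium", "High"):
        mask = groups == name
        result[name] = float(y_true[mask].mean()) if mask.any() else float("nan")
    return result
```

Applying both functions to the predicted scores of the CRF-only model and of ECG-DiaNet gives the per-group incidence values that the bar charts compare.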

Model interpretation

To interpret the contribution of individual ECG-derived features to the model’s prediction of incident T2DM, we employed SHapley Additive exPlanations (SHAP)³³. For SHAP-based interpretation, we used the mean absolute SHAP value of each feature across all test samples to quantify its overall impact. To evaluate the collective importance of ECG signals, we grouped SHAP values of all ECG-derived features and compared their cumulative contribution to those of CRFs.
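Given a matrix of SHAP values that has already been computed by an explainer (rows = test samples, columns = features), the aggregation described above might look like the following sketch. The function names are hypothetical, and defining the cumulative contribution of a modality as its share of the summed mean absolute SHAP values is an assumption.

```python
import numpy as np

def mean_abs_shap(shap_values):
    """Per-feature importance: mean absolute SHAP value across samples."""
    return np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)

def modality_shares(shap_values, feature_names, ecg_features):
    """Share of total importance attributed to ECG features versus CRFs."""
    importance = mean_abs_shap(shap_values)
    is_ecg = np.array([name in ecg_features for name in feature_names])
    ecg_total = importance[is_ecg].sum()
    crf_total = importance[~is_ecg].sum()
    total = ecg_total + crf_total
    return {"ECG": float(ecg_total / total), "CRF": float(crf_total / total)}
```

For example, with columns named `QTInterval_1`, `age`, and `BMI` (the last two being illustrative column names), only `QTInterval_1` would be counted toward the ECG share.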
