data source
This study was conducted in Alberta, Canada, which has a single-payer health system with universal access in which 100% of interactions with the health system are captured.
ECG data were linked to the following administrative health databases using a unique patient health number: (1) the Discharge Abstract Database (DAD), containing inpatient data; (2) the National Ambulatory Care Reporting System (NACRS) database of all hospital-based outpatient clinic and emergency department (ED) visits; and (3) the Alberta Health Care Insurance Plan (AHCIP) Registry, which provides demographic information.
electrocardiogram data
We used standard 12-lead ECG traces (voltage-time series, sampled at 500 Hz for 10 seconds for each of the 12 leads) and ECG measurements (automatically generated by built-in algorithms in the Philips IntelliSpace ECG system). ECG measurements included atrial rate, heart rate, RR interval, P-wave duration, anterior P axis, horizontal P axis, PR interval, QRS duration, first 40 ms anterior QRS axis, last 40 ms anterior QRS axis, anterior QRS axis, first 40 ms horizontal QRS axis, last 40 ms horizontal QRS axis, horizontal QRS axis, anterior ST-wave axis (equivalent to ST deviation), anterior T axis, horizontal ST-wave axis, horizontal T axis, Q-wave onset, QT interval, Bazett rate-corrected QT interval, and Fridericia rate-corrected QT interval.
analysis cohort
The study cohort was previously described.25 Briefly, from February 2007 to April 2020, patients admitted to 14 facilities in Alberta, Canada accounted for 3,336,091 emergency department visits and 1,071,576 hospitalizations of 260,065 patients with available ECGs. Concurrent medical events (emergency department visits and/or hospitalizations) occurring for a patient within 48 hours of one another were considered transfers and part of the same medical episode. ECG recordings were linked to a medical episode if the acquisition date fell within the window between the episode's admission and discharge dates. After excluding ECGs that could not be linked to any episode, ECGs of patients under 18 years of age, and ECGs with poor signal quality (identified by warning flags generated by the ECG device manufacturer's built-in quality algorithm), the final cohort included 1,605,268 ECGs from 748,773 episodes of 244,077 patients (Figure 1).
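The 48-hour transfer rule and the ECG-to-episode linkage can be sketched as follows. This is a minimal illustration with hypothetical function names, not the study's actual linkage code:

```python
from datetime import datetime, timedelta

def merge_into_episodes(events, gap_hours=48):
    """Merge a patient's (admit, discharge) events into episodes: an event
    starting within `gap_hours` of the previous discharge is treated as a
    transfer belonging to the same episode."""
    events = sorted(events, key=lambda e: e[0])
    episodes = []
    for admit, discharge in events:
        if episodes and admit - episodes[-1][1] <= timedelta(hours=gap_hours):
            # Transfer: extend the current episode's discharge time.
            episodes[-1] = (episodes[-1][0], max(episodes[-1][1], discharge))
        else:
            episodes.append((admit, discharge))
    return episodes

def link_ecg(ecg_time, episodes):
    """Return the episode whose admission-discharge window contains the ECG
    acquisition time, or None if the ECG cannot be linked (and is excluded)."""
    for admit, discharge in episodes:
        if admit <= ecg_time <= discharge:
            return (admit, discharge)
    return None
```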
prediction task
We identified patients with 15 common CV conditions (AF, SVT, VT, CA, AVB, UA, NSTEMI, STEMI, PTE, HCM, AS, MVP, MS, PHTN, and HF). Identification was based on recording of the corresponding International Classification of Diseases, Tenth Revision (ICD-10) code in the primary diagnosis field or any one of 24 secondary diagnosis fields of the medical episode associated with the ECG (Supplementary Table). The validity of ICD coding in administrative health databases has been previously established.36,37 If an ECG was performed during an ED or inpatient episode, all eligible diagnoses recorded for that episode were considered positive. Some diagnoses, such as AF, SVT, VT, STEMI, and AVB, are usually identifiable from the ECG itself; these were included as positive controls to demonstrate the effectiveness of our models in detecting ECG-diagnosable conditions.
The goal of the predictive models was to output calibrated probabilities for each of the 15 selected conditions. The trained models can use ECGs acquired at any point during a medical episode; note that a single episode may include multiple ECGs. When training the models, we used all ECGs in the training/development set (including multiple ECGs belonging to the same episode) to maximize learning. For evaluation, however, we used only the earliest ECG of a given episode within the test/holdout set, because the aim is a predictive system usable at the time a patient's first ECG is acquired during an emergency department visit or hospitalization (see Evaluation section below for details).
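The "earliest ECG per episode" evaluation rule can be expressed as a small selection step. A minimal sketch with hypothetical field names (training, by contrast, would keep every record):

```python
def first_ecg_per_episode(ecgs):
    """Given (episode_id, acquisition_time, ecg_id) records, keep only the
    earliest ECG of each episode, mirroring the holdout evaluation rule.
    ISO-8601 timestamp strings sort chronologically."""
    earliest = {}
    for episode_id, acquired_at, ecg_id in ecgs:
        if episode_id not in earliest or acquired_at < earliest[episode_id][0]:
            earliest[episode_id] = (acquired_at, ecg_id)
    return {ep: ecg_id for ep, (_, ecg_id) in earliest.items()}
```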
We used ResNet-based deep learning (DL) for the information-rich voltage-time series and gradient boosting (XGB) for the ECG measurements.25 To determine whether demographic characteristics (age and gender) added incremental predictive value over a model trained on the ECG alone, we developed and report models in three configurations: (a) ECG only (DL: ECG trace); (b) ECG + age and gender (DL: ECG trace, age, gender), which is the primary model presented in this study; and (c) XGB: ECG measurements, age, and gender.
learning algorithm
We employed a multilabel classification approach using binary labels of presence (yes) or absence (no) for each of the 15 diagnoses, and estimated the probability that a new patient has each of these conditions. Since the input for the model using ECG measurements was structured tabular data, we trained a gradient-boosted tree ensemble (XGB),38 whereas we used a deep convolutional neural network for the model using ECG voltage-time series traces. For both the XGB and DL models, 90% of the training data was used to train the model and the remaining 10% served as a tuning set to track performance and "early stop" the training process, reducing the possibility of overfitting.39 For DL, we trained a single ResNet model for the multi-class, multi-label task,10 mapping each ECG signal to 15 values corresponding to the probability that each of the 15 diagnoses is present. For gradient boosting, by contrast, we trained 15 separate binary XGB models, each mapping the ECG signal to the probability of an individual label. The details of the XGB and DL model implementations have been previously described.25
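The early-stopping rule on the 10% tuning set can be illustrated with a minimal sketch. The actual DL and XGB trainings rely on their libraries' built-in callbacks, and the patience value here is an assumption for illustration:

```python
def train_with_early_stopping(losses_on_tuning_set, patience=3):
    """Given per-epoch (or per-boosting-round) tuning-set losses, return the
    epoch of the best model: training halts once the tuning loss has failed
    to improve for `patience` consecutive epochs."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(losses_on_tuning_set):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0  # new best: reset counter
        else:
            wait += 1
            if wait >= patience:
                break  # tuning loss is degrading; stop to avoid overfitting
    return best_epoch
```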
evaluation and visualization
Evaluation design: We used a 60/40 split of the data for training and evaluation. The entire ECG dataset was divided into a random 60% split for model development (using 5-fold internal cross-validation for training and final model fine-tuning), with the remaining 40% held out for final validation. We ensured that ECGs from the same patient were not shared between the development and evaluation data, or between training/testing folds during internal cross-validation. As noted above, because the deployment scenario for the predictive system is expected to be at the point of care, we evaluated the models using only the patient's first ECG in a given episode, obtained during the emergency department visit or hospitalization. The overall data and the numbers of ECGs, episodes, and patients in the experimental splits are shown in Figure 1 and Supplementary Table 5. In addition to the primary evaluation, we extended testing to include all ECGs in the holdout set, demonstrating the versatility of the DL model in processing ECGs acquired at any point during an episode.
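The patient-level 60/40 split can be sketched as follows. A minimal illustration with hypothetical names, equivalent in spirit to a grouped shuffle split keyed on the patient identifier:

```python
import random

def patient_level_split(ecg_patient_ids, dev_frac=0.6, seed=0):
    """Split ECG indices 60/40 by *patient*, so no patient's ECGs appear in
    both the development and holdout sets."""
    patients = sorted(set(ecg_patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_dev = int(len(patients) * dev_frac)
    dev_patients = set(patients[:n_dev])
    dev = [i for i, p in enumerate(ecg_patient_ids) if p in dev_patients]
    holdout = [i for i, p in enumerate(ecg_patient_ids) if p not in dev_patients]
    return dev, holdout
```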
Additionally, we performed a "leave-one-hospital-out" validation using two large tertiary care hospitals to assess the robustness of the models with respect to distributional differences between hospital facilities. To ensure complete separation between the training and testing sets, we omitted ECGs of patients admitted to both the training and testing hospitals during the study period, as shown in Supplementary Figure 1. Finally, to highlight the applicability of the DL model in screening scenarios, we performed an additional evaluation that integrates the 15 disease labels into a composite prediction with improved diagnostic yield.20
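One way to collapse the 15 per-condition probabilities into a single screening score is shown below. The paper's exact aggregation (reference 20) is not specified here; this sketch assumes a "probability of any condition" rule under independence:

```python
def composite_probability(per_label_probs):
    """Combine per-condition probabilities into a single 'any of the 15
    conditions present' score via 1 - prod(1 - p). This assumes the labels
    are independent, which is an illustrative simplification."""
    none_present = 1.0
    for p in per_label_probs:
        none_present *= (1.0 - p)
    return 1.0 - none_present
```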
We report the area under the receiver operating characteristic curve (AUROC, equivalent to the C index) and the area under the precision-recall curve (AUPRC). We also calculated the F1 score, specificity, recall, precision (equivalent to PPV), and accuracy after dichotomizing the predicted probabilities into diagnosis/no-diagnosis classes using the optimal cut points obtained from the Youden index on the training set.40 We also used the calibration metric Brier score41 (lower scores indicate better calibration) to evaluate whether the predicted probabilities match the observed proportions.
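The Youden-index cut point and the Brier score can be stated compactly. A minimal, brute-force sketch over the observed probabilities (production code would use a library implementation):

```python
def youden_cutpoint(y_true, y_prob):
    """Return the probability threshold maximizing Youden's J
    (sensitivity + specificity - 1), chosen on the training set."""
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(y_prob)):
        tp = sum(p >= t and y for p, y in zip(y_prob, y_true))
        fn = sum(p < t and y for p, y in zip(y_prob, y_true))
        tn = sum(p < t and not y for p, y in zip(y_prob, y_true))
        fp = sum(p >= t and not y for p, y in zip(y_prob, y_true))
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the binary
    outcome; lower values indicate better calibration."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)
```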
Gender and pacemaker subgroups: We investigated model performance in patient subgroups based on gender. We also investigated potential bias in ECGs obtained in the presence of cardiac pacing (including a pacemaker or implantable cardioverter-defibrillator [ICD]) or a ventricular assist device (VAD), because ECG interpretation can be difficult in these situations; we compared model performance on the holdout-set ECGs without a pacemaker against the entire holdout set (including ECGs both with and without a pacemaker) (Figure 1). The diagnostic and procedure codes used to identify the presence of a pacemaker are shown in Supplementary Table 7.
Model comparison: For each evaluation, we report the performance from the five internal cross-validations and the final performance on the holdout set, using the same training and testing splits across the different modeling scenarios. Performance was compared between models by resampling paired holdout instances, yielding 10,000 bootstrap replicates of the pairwise differences in AUROC (for example, comparing the non-pacemaker subset against the original holdout set). A difference in model performance was considered statistically significant if the 95% confidence interval of the pairwise mean difference in AUROC did not include zero.
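The paired-bootstrap comparison of AUROCs can be sketched as follows. A minimal illustration (a rank-based AUROC and naive resampling; the study's exact procedure may differ in details such as tie handling):

```python
import random

def auroc(y, s):
    """AUROC as the probability that a random positive outscores a random
    negative (ties count half)."""
    pos = [si for yi, si in zip(y, s) if yi]
    neg = [si for yi, si in zip(y, s) if not yi]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def paired_bootstrap_auc_diff(y, s_a, s_b, n_boot=10000, seed=0):
    """95% CI for AUROC(model A) - AUROC(model B) from paired bootstrap
    resamples of the same holdout instances; a CI excluding 0 is read as a
    statistically significant difference."""
    rng = random.Random(seed)
    n = len(y)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:  # need both classes to compute AUROC
            continue
        diffs.append(auroc(yb, [s_a[i] for i in idx]) -
                     auroc(yb, [s_b[i] for i in idx]))
    diffs.sort()
    return diffs[int(0.025 * len(diffs))], diffs[int(0.975 * len(diffs)) - 1]
```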
Visualization: We used feature importance values based on information gain to identify the ECG measurements that primarily contributed to the diagnostic predictions of the XGB model. Additionally, we used gradient-weighted class activation mapping (GradCAM)42 on the last convolutional layer to visualize the activation maps that contributed to the DL model's diagnostic predictions.
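The core GradCAM computation for a 1-D signal reduces to a few lines. A minimal sketch operating on plain lists; in practice the activations and gradients come from the network's last convolutional layer via automatic differentiation:

```python
def grad_cam_1d(activations, gradients):
    """1-D GradCAM: channel weights are the time-averaged gradients of the
    target class score; the map is ReLU of the weighted sum of activation
    channels, highlighting which samples drove the prediction.
    `activations` and `gradients` are [channel][time] lists."""
    n_t = len(activations[0])
    weights = [sum(g) / n_t for g in gradients]  # global average pooling
    return [max(0.0, sum(w * a[t] for w, a in zip(weights, activations)))
            for t in range(n_t)]
```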