Breakthrough AI models can detect silent structural heart disease from simple ECGs, catching dangerous conditions early, streamlining patient care, and closing diagnostic gaps missed by traditional screening.
Research: Detection of structural heart disease from ECG using AI. Image credit: DC Studio / Shutterstock
In a recent study published in the journal Nature, a group of researchers investigated whether artificial intelligence (AI) electrocardiogram (ECG) models can reliably detect diverse forms of structural heart disease (SHD) across a variety of hospitals and care settings, and whether they outperform standard physician review. The model, called EchoNext, was developed as a multitask classifier to address collinearity between the different SHD component labels.
Background
Every minute, another patient in the United States (US) enters a hospital with symptoms that could conceal underlying SHD. SHD already costs the country more than $100 billion each year. Yet an estimated 6.4% of older adults carry clinically important valvular heart disease (VHD) that has never been diagnosed; together with those already diagnosed, total prevalence exceeds 11%.
Early echocardiography saves lives, but access to ultrasound labs, trained readers, and the cost of patient travel remain barriers, and busy clinicians are left to guess whom to scan.
Large digital ECG archives and modern AI offer a low-cost alternative. If a 10-second ECG can reliably reveal silent disease, it can direct scarce imaging resources to those who need them most.
Further research is needed to determine whether algorithm-guided screening improves survival and equity. The paper also discusses potential deployment strategies for such models, including both “gatekeeper” and “safety net” applications.
About the research
Investigators assembled 1,245,273 ECG-echocardiogram pairs from 230,318 adults treated at eight NewYork-Presbyterian (NYP) hospitals between 2008 and 2022, and used patient-level splits for training, validation, and testing.
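A patient-level split means that every record from a given patient lands in the same partition, so the test set never contains someone the model has already seen. The sketch below is one simple way to do this, not the authors' procedure: the function name `assign_split`, the hash-based bucketing, the 80/10/10 fractions, and the example records are all assumptions made for illustration.

```python
import hashlib


def assign_split(patient_id: str, train_frac: float = 0.8, val_frac: float = 0.1) -> str:
    """Deterministically map a patient ID to 'train', 'val', or 'test'.

    The fractions are illustrative assumptions, not values from the study.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    # First 8 hex digits -> integer in [0, 2**32) -> float in [0, 1)
    bucket = int(digest[:8], 16) / 0x100000000
    if bucket < train_frac:
        return "train"
    if bucket < train_frac + val_frac:
        return "val"
    return "test"


# Hypothetical (patient_id, ecg_id) records; one patient may have many ECGs.
records = [
    ("p001", "ecg-a"),
    ("p001", "ecg-b"),
    ("p002", "ecg-c"),
    ("p003", "ecg-d"),
    ("p003", "ecg-e"),
]

splits = {}
for patient_id, ecg_id in records:
    # The split depends only on the patient, never on the individual ECG.
    splits.setdefault(assign_split(patient_id), []).append((patient_id, ecg_id))
```

Because the assignment depends only on the patient identifier, adding new ECGs for an existing patient cannot leak that patient into a different partition.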
SHD was labeled as present when the echocardiogram showed any of the following: left ventricular ejection fraction (LVEF) ≤45%; left ventricular wall thickness ≥1.3 cm; moderate or worse right ventricular dysfunction; pulmonary hypertension, defined as pulmonary artery systolic pressure (PASP) ≥45 mm Hg or tricuspid regurgitation jet velocity ≥3.2 m/s; moderate or worse regurgitation or stenosis of any valve; or a moderate or large pericardial effusion.
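The composite label can be read as a logical OR over the component findings. The following sketch illustrates that reading only. The class `EchoFindings`, its field names, and the ordinal grading scale are hypothetical, and the "≥" direction for wall thickness and the 3.2 m/s jet velocity cutoff are assumptions drawn from the description above rather than from the authors' code.

```python
from dataclasses import dataclass

# Hypothetical ordinal grades used only in this sketch.
NONE, MILD, MODERATE, SEVERE = 0, 1, 2, 3


@dataclass
class EchoFindings:
    lvef_pct: float
    lv_wall_thickness_cm: float
    rv_dysfunction: int          # NONE..SEVERE
    pasp_mmhg: float
    tr_jet_velocity_m_s: float
    worst_valve_lesion: int      # worst regurgitation/stenosis grade of any valve
    pericardial_effusion: int    # MODERATE = moderate, SEVERE = large


def shd_components(e: EchoFindings) -> dict:
    """Return one boolean per component label (thresholds as assumed above)."""
    return {
        "reduced_lvef": e.lvef_pct <= 45,
        "lv_wall_thickening": e.lv_wall_thickness_cm >= 1.3,
        "rv_dysfunction": e.rv_dysfunction >= MODERATE,
        "pulmonary_hypertension": e.pasp_mmhg >= 45 or e.tr_jet_velocity_m_s >= 3.2,
        "valve_disease": e.worst_valve_lesion >= MODERATE,
        "pericardial_effusion": e.pericardial_effusion >= MODERATE,
    }


def has_shd(e: EchoFindings) -> bool:
    """Composite SHD label: positive if any component is positive."""
    return any(shd_components(e).values())


# Two made-up echocardiograms for illustration.
normal = EchoFindings(60, 0.9, NONE, 25, 2.3, MILD, NONE)
low_ef = EchoFindings(40, 1.0, NONE, 30, 2.5, NONE, NONE)
```

Keeping the per-component booleans alongside the composite mirrors the multitask framing: a model can be trained on each component and on the composite at once.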
The authors point out that these thresholds are somewhat arbitrary, as various studies and guidelines may use different cutoffs.
A convolutional neural network named EchoNext ingested raw 12-lead waveforms along with seven routine ECG parameters and age and sex. Performance was measured first on the held-out NYP test set, then in external cohorts, including the Montreal Heart Institute and the University of California, San Francisco.
Generalization across age, sex, race, ethnicity, and clinical context was assessed. A silent “shadow” deployment scored ECGs from 84,875 consecutive patients without previous echocardiography, saving the scores without influencing care.
Finally, a single-site pilot, Detecting Structural Heart Disease Using Deep Learning on an Electrocardiographic Waveform Array (DISCOVERY), invited adults without recent imaging to undergo echocardiography, stratified by risk scores from a predecessor model. EchoNext was applied post hoc.
Research Results
EchoNext, the AI-enabled ECG model, performed strongly in retrospective analysis. On the eight-hospital NYP test set, it detected composite SHD with an area under the receiver operating characteristic curve (AUROC) of 85.2% and an area under the precision-recall curve (AUPRC) of 78.5%. Accuracy remained consistent across academic and community campuses and did not decline when training and testing sites were swapped, demonstrating generalization.
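For readers unfamiliar with the two metrics, the sketch below computes both from a handful of made-up labels and scores in plain Python. AUROC is calculated as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, and AUPRC is approximated by average precision, one common estimator. The function names and the toy data are illustrative assumptions; the article does not state which estimator the authors used.

```python
def auroc(labels, scores):
    """Probability that a positive outranks a negative (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at each positive, in score order.

    Tied scores are not handled specially in this sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank
    return total / n_pos


# Toy example: 1 = SHD present on echocardiogram, 0 = absent.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)            # 4 of 6 pairs ordered correctly
toy_ap = average_precision(toy_labels, toy_scores)   # (1/1 + 2/3 + 3/4) / 3
```

AUROC is insensitive to how common the disease is, whereas average precision moves with prevalence, which is why the two are usually reported together.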
External validation at Cedars-Sinai Medical Center, the Montreal Heart Institute (MHI), and the University of California, San Francisco yielded AUROC values of 78-80% despite the high prevalence of disease.
Disease-specific performance: LVEF ≤45% achieved an AUROC of 90.4%, while PASP ≥45 mm Hg reached 82.7%. The authors emphasize that AUPRC values for component diseases depend heavily on the prevalence of the underlying disease and should not be compared directly across conditions or use cases.
In a 150-trace reader study, EchoNext was compared with 13 cardiologists. Given age, sex, waveforms, and ECG intervals, the physicians correctly identified SHD in 64% of cases. The AI alone achieved 77% accuracy, and when the algorithmic risk score was shown to clinicians, their accuracy rose modestly to 69%, indicating that the model captures patterns hidden from expert eyes. It is important to note that cardiologists in this assessment had access only to de-identified ECGs and routine parameters, without clinical context, which is not typical of standard clinical care.
To estimate the clinical opportunity at scale, the team silently ran EchoNext in 2023 on 124,027 ECGs recorded from 84,875 adults who had never undergone echocardiography. The model flagged 9% of the traces as high risk. Nevertheless, usual care left 45% of these individuals without follow-up imaging. This suggests that an estimated 1,998 silent SHD cases could have been caught had the alert been live, based on the modeled prevalence and sensitivity scenarios provided in the paper.
Among the 15,094 patients who ultimately underwent echocardiography, EchoNext maintained its accuracy (AUROC 83%; AUPRC 81%), with a positive predictive value of 74%, supporting its reliability in contemporary workflows. The paper also provides modeled performance estimates across various prevalence scenarios and sensitivity thresholds, highlighting the practical implications for population-wide screening.
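Why prevalence scenarios matter can be seen from Bayes' rule: at a fixed sensitivity and specificity, the positive predictive value falls as the disease becomes rarer. The sketch below shows the calculation. The operating point (70% sensitivity, 90% specificity) and the prevalence values are made up for illustration and are not taken from the paper.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV = true positives / all positives, via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive predictions at this operating point")
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point, NOT the published EchoNext values.
SENSITIVITY = 0.70
SPECIFICITY = 0.90

# Hypothetical prevalence scenarios, from screening-like to referral-like populations.
scenarios = [0.05, 0.10, 0.30, 0.50]
ppv_by_prevalence = {
    p: positive_predictive_value(SENSITIVITY, SPECIFICITY, p) for p in scenarios
}

for prevalence, ppv in ppv_by_prevalence.items():
    print(f"prevalence {prevalence:.0%} -> PPV {ppv:.1%}")
```

The same model can therefore look far more or less precise depending on whether it is applied to patients already referred for imaging or to a general screening population.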
Promising evidence comes from the DISCOVERY pilot, which recruited 100 adults without recent imaging. Post hoc EchoNext scoring revealed a clear gradient: previously unrecognized SHD was found in 73% of high-risk participants, 28% of moderate-risk participants, and 6% of low-risk participants. Moderate-to-severe left-sided VHD followed a similar gradient.
These results demonstrate the model's ability to triage scarce echocardiography resources toward those most likely to benefit, while sparing low-risk individuals unnecessary testing. The original trial used the predecessor model (ValveNet) to stratify risk and recruit participants, and the EchoNext model was applied retrospectively to these participants for further analysis.
Conclusion
In summary, EchoNext shows that AI-enhanced ECGs can detect SHD, including reduced LVEF, elevated PASP, and significant VHD. By flagging high-risk patients for timely echocardiography, the algorithm promises to reduce diagnostic delays and the multibillion-dollar burden of SHD while maintaining fairness across sites and demographics. However, the authors warn that AI-based screening carries potential risks, including patient anxiety from false positives and bias in clinical recruitment, underscoring the need for further research on these aspects.
The public release of code and data encourages independent verification. However, large-scale pragmatic trials are needed to confirm that AI-guided ECG screening truly improves survival, quality of life, and healthcare value. In particular, the authors have released a large de-identified dataset and a benchmark AI model (Columbia Mini Model) to support further research and enable transparent comparison of future algorithms.
Journal Reference:
- Poterucha, T. J., Jing, L., Ricart, R. P., Adjei-Mosi, M., Finer, J., Hartzel, D., Kelsey, C., Long, A., Rocha, D., Ruhl, J. A., and Vanmaanen, D. (2025). Detection of structural heart disease from electrocardiograms using AI. Nature. doi:10.1038/s41586-025-09227-0, https://www.nature.com/articles/s41586-025-09227-0
