A breakthrough in diagnosing genetic diseases with over 98% accuracy

In a recent study published in NEJM AI, researchers developed the artificial intelligence (AI)-based Model Organism Aggregated Resources for Rare Variant ExpLoration (MARRVEL) model to prioritize Mendelian disease-causing genes and their variants based on clinical features and gene sequences.

Study: AI-MARRVEL, a knowledge-driven AI system for diagnosing Mendelian disorders. Image credit: Antiv/Shutterstock.com

Background

Millions of people around the world are born with genetic diseases, typically Mendelian diseases caused by mutations in a single gene. Identifying these mutations is labor intensive and requires considerable expertise.

A comprehensive, systematic, and efficient procedure can improve the speed and accuracy of diagnosis. AI shows promise but has had only limited success in primary diagnosis.

Bioinformatics-based reanalysis is inexpensive, but it has limited accuracy, struggles to prioritize non-coding variants, and relies on simulated data.

About the study

In this study, researchers introduced AI-MARRVEL (AIM), a knowledge-driven AI model for diagnosing Mendelian diseases.

AIM is a machine learning classifier that combines over 3.5 million variants from thousands of diagnosed cases with expert-engineered features to enhance molecular diagnosis. The team evaluated AIM on patients from three cohorts and created a confidence score for finding diagnosable cases in the pool of unsolved (cold) cases.

They trained AIM on high-quality samples and expert-engineered features. They tested the model on three patient datasets for a variety of applications, including dominant, recessive, and trio diagnosis, identification of new disease genes, and large-scale reanalysis.

Researchers collected Human Phenotype Ontology (HPO) terms and exome sequences from three patient groups: DiagLab, the Undiagnosed Disease Network (UDN), and the Deciphering Developmental Disorders (DDD) project. They split the DiagLab data into training and testing datasets and tested DDD and UDN separately.
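The cohort handling described above can be sketched as follows. This is only an illustration under stated assumptions: the case identifiers, the 20% held-out fraction, and the helper name split_cases are hypothetical, since the article does not report the actual split ratio or the tooling the authors used.

```python
import random
from typing import List, Sequence, Tuple


def split_cases(
    case_ids: Sequence[str], test_fraction: float, seed: int
) -> Tuple[List[str], List[str]]:
    """Shuffle case IDs reproducibly and split them into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical identifiers; real cohorts hold HPO terms and exome data per case.
diaglab = [f"diaglab_{i}" for i in range(10)]
train_ids, test_ids = split_cases(diaglab, test_fraction=0.2, seed=0)

# DDD and UDN are never split: they serve only as independent test cohorts.
external_test_cohorts = {"DDD": ["ddd_0", "ddd_1"], "UDN": ["udn_0", "udn_1"]}
```

Keeping DDD and UDN entirely out of training means their results reflect how the model generalizes to patients from other sources, not just to held-out DiagLab cases.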

They guided AIM through knowledge-driven feature engineering. This engineering drew on clinical expertise and genetic principles and covered features such as sequence quality and splicing prediction.

The team created six modules for genetic diagnostic decision-making, resulting in 47 additional features. They used a random forest classifier as their primary AI algorithm and consulted benchmarking publications and top-performing tools for comparison.

They used features such as SpliceAI scores to prioritize splicing variants. They also developed a version of AIM without VarDB features and examined the impact of inaccurate phenotypic data.

They used a “feature climbing” approach to assess the contribution of each feature and categorize all features according to their biological importance.

Researchers developed a cross-sample confidence score to estimate the likelihood that a patient's diagnostic variant can be successfully identified using AIM.

The researchers divided the patients into two groups based on their confidence level: high-confidence patients for manual review and low-confidence patients for later reanalysis.

They constructed four confidence metrics and evaluated them on UDN and DDD samples by how well each distinguished positive patients from negative patients and unaffected relatives.
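The two-group triage can be illustrated with a short sketch. It assumes a single hypothetical per-patient confidence score between 0 and 1 and an arbitrary 0.5 cutoff; the article gives neither the scoring formula nor the threshold the authors used.

```python
from typing import Dict, List, Tuple


def triage_by_confidence(
    scores: Dict[str, float], threshold: float
) -> Tuple[List[str], List[str]]:
    """Split patients into (high-confidence, low-confidence) groups."""
    high = sorted(p for p, s in scores.items() if s >= threshold)
    low = sorted(p for p, s in scores.items() if s < threshold)
    return high, low


# Hypothetical scores; the 0.5 cutoff is an illustrative assumption.
scores = {"patient_1": 0.92, "patient_2": 0.35, "patient_3": 0.71, "patient_4": 0.08}
manual_review, reanalysis = triage_by_confidence(scores, threshold=0.5)
```

Here patient_1 and patient_3 land in the high-confidence group, while patient_2 and patient_4 fall into the low-confidence group.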

Results

AIM dramatically improved the accuracy of genetic diagnosis, tripling the number of resolved cases compared to benchmarked approaches in three real-world cohorts. AIM achieved 98% accuracy and detected 57% of the 871 cases that were diagnosable.

It also showed promise in gene discovery for novel diseases by accurately predicting two recently reported genes from the Undiagnosed Disease Network. AIM outperformed existing methods on all three datasets and outperformed Genomiser on the UDN and DiagLab cohorts.

AIM successfully distinguished between non-diagnostic and diagnostic pathogenic variants in ClinVar. AIM without VarDB showed slightly lower performance, but still outperformed the other benchmarked methods.

Expert feature engineering improved the accuracy of the model while delaying training saturation. AIM maintained a top-1 diagnostic accuracy of 54% using only 20% of the training data. With more training samples, the model trained with engineered features reached 66% accuracy, while the model without engineered features reached 58%.

With uninformative phenotypic information, the researchers found an 11% reduction in top-1 diagnostic accuracy, demonstrating the importance of accurate phenotypic annotation. Even so, AIM achieved a top-5 diagnostic accuracy of 78%, highlighting the importance of molecular evidence.
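Top-1 and top-5 accuracy figures such as those above count how often the true diagnostic gene appears among the first k ranked candidates. A minimal sketch of that metric follows, using made-up case and gene names; it is not the authors' evaluation code.

```python
from typing import Dict, List


def top_k_accuracy(
    ranked: Dict[str, List[str]], truth: Dict[str, str], k: int
) -> float:
    """Fraction of cases whose known diagnostic gene is in the top k candidates."""
    if not truth:
        raise ValueError("no diagnosed cases supplied")
    hits = sum(
        1 for case, gene in truth.items() if gene in ranked.get(case, [])[:k]
    )
    return hits / len(truth)


# Hypothetical ranked candidate lists (best first) and known diagnoses.
ranked = {
    "case1": ["GENE_A", "GENE_B", "GENE_C"],
    "case2": ["GENE_D", "GENE_E", "GENE_F"],
    "case3": ["GENE_G", "GENE_H", "GENE_I"],
    "case4": ["GENE_J", "GENE_K", "GENE_L"],
}
truth = {"case1": "GENE_A", "case2": "GENE_E", "case3": "GENE_I", "case4": "GENE_Z"}

top1 = top_k_accuracy(ranked, truth, k=1)  # only case1 is a hit: 1/4 = 0.25
top3 = top_k_accuracy(ranked, truth, k=3)  # case1, case2, case3 hit: 3/4 = 0.75
```

A gap between top-1 and top-k accuracy, as in this toy data, means the correct gene is often ranked near the top without being first.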

When the OMIM-based phenotypic similarity score increased from 0 to 0.25, prediction accuracy rose from 60.0% to 90.0%. Scores above 0.3 brought only small further gains, indicating that an exact match to the OMIM phenotype is not required.

The trio classifier (AIM-Trio) outperformed the Exomiser and Genomiser trio models, but only slightly outperformed the proband-only model (AIM). The AIM-NDG model, intended for novel disease gene discovery, removed features linked to databases of known diseases.

Based on the study findings, AIM is a machine learning genetic diagnostic tool that can identify novel disease genes and analyze thousands of samples in days. It is highly accurate and useful for early diagnosis, reanalysis of cold cases, and identification of new disease genes.

AIM draws on approximately 3.5 million variants from thousands of diagnosed cases and provides a web interface for users to submit cases and explore findings.

However, its limitations include not assessing structural or copy number variants and focusing on cases with coding mutations. For extracting phenotype terms from clinical text, large language models such as PhenoBCBERT and PhenoGPT have demonstrated higher performance.
