By combining large-scale genetics and machine learning, researchers uncover hidden risk patterns and distinct patient subtypes that could change the way type 1 diabetes is identified and understood.
Research: Genetic associations and machine learning improve risk prediction for type 1 diabetes.
Researchers combined genetic association analysis with machine learning techniques to classify and estimate genetic risk for type 1 diabetes. The research was published in Nature Genetics.
Genetic and immune factors drive complex type 1 diabetes risk
Type 1 diabetes is a chronic metabolic disease characterized by the destruction of pancreatic beta cells, leading to a lack of insulin production and high blood sugar (hyperglycemia). There is evidence that the disease develops when genetically susceptible people are exposed to environmental triggers.
The disease usually appears in childhood and adolescence. However, adults are also affected. Autoantibodies that specifically target insulin-secreting pancreatic cells are often used as biomarkers to predict the clinical development of type 1 diabetes. However, these autoantibodies are transient and infrequently found in adult-onset cases, limiting timely disease prediction.
To improve risk prediction, attention has turned to genetic factors that can identify susceptible individuals. Genetic variants in class I and II major histocompatibility complex (MHC) genes are the greatest risk factors for type 1 diabetes. When inherited together, these variants can increase the risk of disease 16-fold.
Genetic risk scores have been developed and widely used for early prediction of type 1 diabetes risk. Early prediction is essential for preventing adverse outcomes such as diabetic ketoacidosis at the time of diagnosis. In this study, researchers from the University of California and the Broad Institute conducted genetic association analysis and used the machine learning model T1GRS to improve on the gold-standard genetic risk score for type 1 diabetes.
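As background, a conventional genetic risk score is typically an additive sum: each variant's risk-allele count is multiplied by a per-allele weight and the products are added up. The sketch below illustrates that general idea only. The variant names and weights are hypothetical, invented for illustration, and are not taken from the study or from any published score.

```python
# Hypothetical variants and per-allele log-odds weights (illustration only,
# not values from the study).
WEIGHTS = {
    "variant_a": 0.9,
    "variant_b": 0.4,
    "variant_c": -0.2,
}


def additive_risk_score(genotype, weights=WEIGHTS):
    """Return the weighted sum of risk-allele counts (0, 1 or 2 per variant)."""
    score = 0.0
    for variant, weight in weights.items():
        dosage = genotype.get(variant, 0)  # a missing variant counts as 0 copies
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {variant}: {dosage}")
        score += weight * dosage
    return score


# One hypothetical person: two copies of variant_a, one of variant_b.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
print(round(additive_risk_score(person), 2))
```

Because every variant contributes independently, a score of this form cannot express the case where two variants together carry more risk than the sum of their separate effects.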
The researchers conducted a genome-wide association study in 20,355 people with type 1 diabetes and 797,363 non-diabetic individuals of European ancestry. Further analysis of the MHC region in 10,107 people with type 1 diabetes and 19,639 non-diabetic individuals identified several genetic risk signals for the disease. They used these signals to train a machine learning model to identify individuals who are genetically predisposed to developing type 1 diabetes.
Machine learning models improve genetic classification of type 1 diabetes
The researchers found that the machine learning model T1GRS improved classification accuracy and area under the curve (AUC) values across multiple cohorts. The improvement was most pronounced among individuals without high-risk HLA haplotypes and among individuals of European and African American descent with more complex genome-wide risk profiles.
The model showed 89% sensitivity and 84% specificity for type 1 diabetes at the optimal threshold in the discovery dataset, and was highly effective at distinguishing people with type 1 diabetes from those without the disease.
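For readers unfamiliar with these metrics, the sketch below shows how sensitivity, specificity, and AUC can be computed from risk scores and known diagnoses. The scores and labels are toy values, not study data. The article does not say how the optimal threshold was chosen, so the use of Youden's J statistic here is an assumption made purely for illustration.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as a case; return (sensitivity, specificity)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(scores, labels):
    """Pick the observed score that maximizes Youden's J (assumed criterion)."""
    def youden_j(threshold):
        sens, spec = sensitivity_specificity(scores, labels, threshold)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden_j)


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if case > control else 0.5 if case == control else 0.0
        for case in cases
        for control in controls
    )
    return wins / (len(cases) * len(controls))


# Toy risk scores: 1 = has type 1 diabetes, 0 = does not.
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]

threshold = best_threshold(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold)
print(threshold, sens, spec, auc(scores, labels))
```

Sensitivity is the share of true cases the score flags, and specificity is the share of non-cases it correctly clears. AUC summarizes performance across all possible thresholds rather than at a single cut-off.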
The researchers identified genetic variants at 79 known loci and at eight loci not previously associated with type 1 diabetes. Through both MHC-specific and genome-wide association analyses, they also identified several novel type 1 diabetes-associated variants that affect immune function and gene activation.
A total of 199 identified risk variants were used to train the machine learning model, including 102 lead variants in non-MHC regions. The model used these variants, identified throughout the genome and within the MHC region, to generate a T1GRS score that identifies individuals at risk for type 1 diabetes. The main advantage of this model is that it captures nonlinear interactions between genetic variants and can identify numerous interactions between MHC and non-MHC loci that contribute to disease risk.
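To make the idea of a nonlinear interaction concrete, the toy sketch below compares a purely additive risk function with one that includes an interaction term between an MHC variant and a non-MHC variant. Both functions and all coefficients are hypothetical. The article does not describe the internal structure of T1GRS, so this illustrates the general concept, not the model itself.

```python
def additive_model(mhc, non_mhc):
    """Hypothetical risk where each variant contributes independently."""
    return 1.2 * mhc + 0.5 * non_mhc


def interacting_model(mhc, non_mhc):
    """Hypothetical risk with an extra term when both variants are present."""
    return 1.2 * mhc + 0.5 * non_mhc + 0.8 * mhc * non_mhc


def interaction_contrast(model):
    """Joint effect minus the sum of the separate effects.

    Zero (up to rounding) means the two variants act additively; a nonzero
    value means carrying both changes risk beyond the sum of the parts.
    """
    return model(1, 1) - model(1, 0) - model(0, 1) + model(0, 0)


print(round(interaction_contrast(additive_model), 6))
print(round(interaction_contrast(interacting_model), 6))
```

An additive score can only ever produce a contrast of zero, which is why a model that can learn interaction terms may separate cases from non-cases in situations where a simple weighted sum cannot.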
Analysis of the genetic factors that significantly influence each individual’s T1GRS score divided diabetic patients into four subtypes: T cell-enriched, MHC-enriched, pancreatic-enriched, and MHC-driven. This analysis revealed that people with known high-risk genetic variants for type 1 diabetes were more likely to develop the disease in childhood (early onset).
Individuals carrying risk variants both within and outside the MHC region tended to develop the disease slightly later than those in the early-onset subtype, a difference attributed to genetic contribution rather than to clear differences in disease severity. Similarly, individuals carrying non-MHC variants enriched in immune-related signals were more likely to develop the disease at an intermediate age.
Those carrying non-MHC variants enriched in pancreatic cell-associated signals were more likely to experience late-onset disease with the highest rates of complications such as cardiovascular disease, neurological disease, and chronic kidney disease.
T1GRS advances genetic screening across diverse populations
This study highlights the value of combining genetic information with the machine learning model T1GRS for early prediction of type 1 diabetes risk in both children and adults. The model can predict disease risk with high accuracy across individuals of diverse ancestry, including those with more complex genetic risk profiles. For the African American population, however, it performs on par with, rather than better than, ancestry-specific scores.
These characteristics may make the T1GRS an improved clinical screening tool compared to previous genetic risk scores, most accurately predicting type 1 diabetes risk in high-risk individuals with extensive family history and early age at onset.
This study identifies four genetic subgroups of individuals with significant heterogeneity in clinical characteristics, such as age of onset and risk of diabetes-related complications, based on genetic risk scores generated by T1GRS. The researchers believe that this subgrouping may help guide clinical practice in type 1 diabetes.
There are still inherent limitations in the predictive power of genetic data, as both genetic and environmental factors can influence the complex pathophysiology of type 1 diabetes. When genetic data alone cannot fully capture disease risk, machine learning models that combine genetic data with molecular signals influenced by environmental factors can further improve disease risk prediction.
