A machine learning approach for early diagnosis of Parkinson’s disease

Machine Learning


Among all neurological diseases, the incidence of Parkinson’s disease (PD) has increased significantly. PD is usually diagnosed based on motor symptoms such as resting tremor, rigidity, and bradykinesia. However, the ability to detect non-motor symptoms such as constipation, apathy, loss of smell, and sleep disturbances may aid early diagnosis of PD over years and decades.

recently ACS Central Science study, Scientists at the University of New South Wales (UNSW) are discussing a machine learning (ML)-based tool that could detect PD years before the first symptoms appear.

Research: Interpretable machine learning of metabolomics data reveals biomarkers for Parkinson's disease. Image credit: SomYuZu / Shutterstock.com

study: Interpretable machine learning of metabolomics data reveals biomarkers for Parkinson’s disease. Image credit: SomYuZu / Shutterstock.com

Background

Currently, the overall diagnostic accuracy of PD based on motor symptoms is 80%. This accuracy could be enhanced if PD were diagnosed based on biomarkers rather than relying primarily on physical symptoms.

Several diseases are detected based on biomarkers related to metabolic processes. Biological metabolites from plasma or serum samples are evaluated using analytical tools such as mass spectrometry (MS).

Recently, noninvasive diagnostic methods using skin sebum and exhaled breath are gaining popularity. Previous studies have shown that MS can predict different metabolic profiles between pre-PD candidates and healthy individuals.

This difference in metabolite profiles was observed up to 15 years before clinical diagnosis of PD. Therefore, metabolite biomarkers could potentially be used to detect PD much earlier than the currently used approaches.

ML approaches are widely used to develop accurate predictive models for disease diagnosis using large-scale metabolomics data. However, developing predictive models based on the entire metabolomics dataset is fraught with many drawbacks, including overtraining that can degrade diagnostic performance. Most models are developed using a smaller subset of features pre-determined by traditional statistical methods.

Some ML approaches, such as linear support vector machines (SVM) and partial least squares discriminant analysis (PLSDA), may fail to consider key features of metabolomics data sets. However, this limitation has been overcome by advanced ML techniques such as neural networks (NNs), which are specifically designed to process large-scale data.

NNs are used to develop models with nonlinear effects. A major drawback of NN-based predictive models is the lack of mechanistic information, making the model uninterpretable.

Recently, Shapley Additive Explanation (SHAP) was developed to interpret ML models. However, this technique has not yet been used to analyze metabolomics datasets.

About research

In the current study, the researchers used a variety of analytical tools such as gas chromatography MS (GC-MS), capillary electrophoresis MS (CE-MS), liquids, and other analytical tools to identify Spanish nutrition and cancer European studies. Blood samples obtained from a prospective study (EPIC) were evaluated. Chromatography-MS (LC-MS).

The EPIC study provided metabolomics data from plasma samples obtained up to 15 years after samples were originally collected from both healthy candidates and candidates who later developed PD.

UNSW researcher Diane Zhang has developed an ML tool called Knowledge Generation from Classification and Ranking Analysis Using Neural Networks (CRANK-MS). This tool was built to interpret NN-based frameworks to analyze metabolomics datasets generated by analytical tools.

CRANK-MS consists of several features, including integrated model parameters that provide a high-dimensional metabolomics dataset for analysis without the need for pre-selection of chemical features.

CRANK-MS also includes SHAP for retrospectively exploring and identifying key chemical features that aid in accurate model prediction. Additionally, SHAP enables benchmark testing using five well-known ML techniques to compare diagnostic performance and validate chemical signatures.

Metabolomics data from 39 patients who developed PD up to 15 years later were investigated through a newly developed ML-based tool. Comparing the metabolite profiles of 39 of her pre-PD patients with her 39 matched control patients yielded a unique combination of metabolites that could be used as an early warning sign of PD incidence. Of note, this ML approach was shown to be highly accurate in predicting her PD prior to clinical diagnosis.

Five metabolites scored consistently high in all six ML models, demonstrating their potential utility in predicting future development of PD. Classes of these metabolites include polyfluorinated alkyl substances (PFAS), triterpenoids, diacylglycerols, steroids, and cholestane steroids.

The 1,2-diacylglycerol (34:2) isomer, a diacylglycerol metabolite, was detected in certain vegetable oils, such as olive oil, frequently consumed in the Mediterranean diet. PFAS are environmental neurotoxins that can alter neuronal processing, signaling, and function. Therefore, both dietary and environmental factors may contribute to the development of PD.

Conclusion

CRANK-MS is open to all researchers interested in disease diagnosis using ML approaches based on metabolomics data.

The application of CRANK-MS to Parkinson’s disease detection is just one example of how AI can improve how we diagnose and monitor disease. Interestingly, CRANK-MS can be easily applied to other diseases to identify new biomarkers of interest. She further claimed that the tool was easy to use and that she could generate results “within 10 minutes on a conventional laptop.”

Reference magazines:

  • Zhang, DJ, Xue, C., Kolachalama, VB, & Donald, WA (2023) Interpretable machine learning on metabolomics data reveals biomarkers of Parkinson’s disease. ACS Central Science. doi:10.1021/accentsci.2c01468



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *