A comparison of different machine-learning techniques for the selection of a panel of metabolites allowing early detection of brain tumors



Malignant gliomas are responsible for the majority of deaths associated with primary brain tumors. However, early diagnosis could improve the survival rate28. In recent years, significant progress has been made in understanding the fundamental metabolic changes related to glioma progression and biology2,29. Still, a reliable and accurate method for preoperative brain tumor identification has yet to be developed. The literature indicates that analysis of changes in blood metabolite profiles could be an attractive approach to discovering valuable novel glioma biomarkers2,30. Targeted metabolomics analysis based on mass spectrometry has been shown to be a promising diagnostic platform for clinical practice owing to its high sensitivity and throughput31. Therefore, aiming to improve brain tumor diagnosis, we used a targeted metabolomics approach (AbsoluteIDQ p180 kit), which allows quantification of up to 188 metabolites from 6 compound classes (AAs, biogenic amines, acylcarnitines, lysophosphatidylcholines, phosphatidylcholines (PC), sphingolipids, and sum of hexoses), for metabolic profiling of plasma samples from patients with glioma or MT and from Con. However, working with biomedical data generated by high-throughput technology, such as that used in this study, can be challenging because of its large size, enormous dimensionality, and natural diversity26,32. In this work, MLMs were applied so that all measured variables could be considered when developing a brain tumor diagnostic strategy.

Machine learning approaches are attracting growing interest as a way to extract actionable knowledge from large data sets generated using LC–MS/MS methods and to improve metabolic profiling endeavors. To the best of our knowledge, this study is the first to compare 10 different supervised MLMs, including the newly developed hybrid method (EvoHDTree), with the conventional approach to determining metabolomics-based prognostic signatures in gliomas. Previously, conventional approaches were widely used in metabolomics studies of various diseases13,33. Currently, novel machine learning algorithms are gaining popularity for constructing predictive methods for various types of cancer12,17,20,21,22,23,24,34,35,36,37,38,39,40.

Decision trees are one of the most popular “white box” prediction techniques41. The success of tree-based approaches can be explained by their effectiveness, ease of interpretation, and the possibility of extracting diagnostic rules. However, according to recent literature reports, they may not be well suited to current biological data generated by high-throughput technologies because of the enormous dimensionality, experimental noise, and other perturbations25,32. For this reason, we proposed a new solution, EvoHDTree, which combines DT techniques with evolutionary algorithms and the recently developed RXA concept. This approach performed very well on genomics data25. Therefore, it seemed reasonable to use it to analyze other omics data, namely metabolomics data. This innovative approach made it possible to prepare glioma diagnostic panels with high predictive coefficients.

Comparing the results (Table 1) of the four comparisons (different glioma grades vs. Con) across all the algorithms applied, we concluded that EvoHDTree and the conventional approach gave similar results. Diagnosing a patient with LGG before it transforms into HGG increases the likelihood of a cure and thus significantly increases the chances of survival5. For this reason, we focused on the GI–II vs. Con comparison, in which the new hybrid algorithm gave better results. Although the other machine learning methods used in this study identified a variety of discriminating metabolites, they yielded diagnostic panels composed of considerably more metabolites, which can make interpretation and subsequent application more challenging. A larger pool of discriminative features may initially appear beneficial, but it carries the risk of overfitting. In addition, the EvoHDTree algorithm selected metabolites for the predictive models in a way that avoided repetition across comparisons (Fig. 2A); we therefore applied this method in the second part of the experiment. The unique composition of metabolites chosen for each comparison increases the possibility of distinguishing gliomas from MT. Notably, the novelty of EvoHDTree lies in its flexible tree node representation, which involves both classical univariate tests and bivariate tests inspired by the RXA concept. Furthermore, we improved evolutionary exploration and exploitation by incorporating our knowledge of decision tree induction and RXA methodology and by designing more than a dozen specialized variants of recombination operators.
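To make the idea of a flexible node representation concrete, the following is a minimal Python sketch and not the EvoHDTree implementation. It assumes that a univariate test compares one metabolite with a threshold and that an RXA-inspired bivariate test compares the levels of two metabolites with each other; the class names, the `route` helper, and all concentration values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Union

# A sample maps metabolite names to concentrations (values below are made up).
Sample = Dict[str, float]


@dataclass(frozen=True)
class UnivariateTest:
    """Classical split: is one metabolite below a threshold?"""
    feature: str
    threshold: float

    def goes_left(self, sample: Sample) -> bool:
        return sample[self.feature] < self.threshold


@dataclass(frozen=True)
class BivariateTest:
    """Assumed RXA-style split: is metabolite A higher than metabolite B?"""
    feature_a: str
    feature_b: str

    def goes_left(self, sample: Sample) -> bool:
        return sample[self.feature_a] > sample[self.feature_b]


# A tree node may hold either kind of test.
NodeTest = Union[UnivariateTest, BivariateTest]


def route(sample: Sample, test: NodeTest) -> str:
    """Return the branch ("left" or "right") that a sample follows at a node."""
    return "left" if test.goes_left(sample) else "right"


# Hypothetical sample; the numbers are not taken from the study.
sample = {"taurine": 40.0, "aspartate": 12.0, "asparagine": 55.0}

print(route(sample, UnivariateTest("taurine", 50.0)))           # left
print(route(sample, BivariateTest("aspartate", "asparagine")))  # right
```

A bivariate test of this kind needs no threshold, so under these assumptions it depends only on the relative ordering of two measurements within the same sample.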

In the second part of the experiment, we used EvoHDTree to perform four comparisons between gliomas and MTs, as well as MT vs. Con. The purpose of this section was to assess whether there is an overlap between the metabolites used to construct the diagnostic panels for glioma and MT. Applying the same metabolites to distinguish brain tumors could introduce a bias and lead to misdiagnosis. Considering this, we have developed panels of metabolites that can distinguish glioma patients from MT subjects. Subsequently, we again validated nine predictive models using the LOOCV method to verify the obtained results. Despite the restrictive validation method employed, the ACC results obtained for the nine comparisons are still characterized by high predictive coefficients falling within the range of 0.750–0.975. LOOCV is widely regarded as an excellent tool to validate MLM properly in studies based on smaller study groups42. Niu et al.43 reported that there is no need to divide the dataset into a training set and a test set if the quality of the model is tested using the jackknife test (LOOCV), since the result obtained is a combination of many different independent tests of the dataset. Therefore, LOOCV is increasingly recognized and widely applied by researchers to test the power of prediction methods, despite the drawback of long computation time.
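The LOOCV procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used in the study: a simple nearest-centroid classifier stands in for EvoHDTree, the function names are hypothetical, and the toy data are invented.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], str]


def fit_nearest_centroid(X: List[Vector], y: List[str]) -> Model:
    """Stand-in classifier (NOT EvoHDTree): predict the class with the closest mean."""
    centroids: Dict[str, List[float]] = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> str:
        def sq_dist(center: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, center))
        # sorted() makes tie-breaking deterministic
        return min(sorted(centroids), key=lambda lab: sq_dist(centroids[lab]))

    return predict


def loocv_accuracy(X: List[Vector], y: List[str],
                   fit: Callable[[List[Vector], List[str]], Model] = fit_nearest_centroid) -> float:
    """Hold out each sample once, train on the rest, and return the mean accuracy (ACC)."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        hits += int(model(X[i]) == y[i])
    return hits / len(X)


# Invented two-feature toy data with well-separated classes.
X = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [3.0, 2.0], [3.2, 2.1], [2.9, 1.8]]
y = ["Con", "Con", "Con", "LGG", "LGG", "LGG"]

print(loocv_accuracy(X, y))  # 1.0
```

Because the model is refitted once per sample, the cost grows linearly with the number of samples, which is the computation-time drawback mentioned above.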

Early glioma detection ensures faster implementation of treatment and thus may contribute to prolonged survival30. Therefore, our study focused on the comparison involving LGG and Con. The diagnostic panel for the GI–II vs. Con comparison prepared with the EvoHDTree hybrid algorithm relied mainly on four metabolites (Fig. 1): three AAs (taurine, aspartate, asparagine) and sphingomyelin (SM) C24:1. Recently, differences in the levels of certain AAs in the blood of patients with glioma compared to Con have been demonstrated44,45. In our study, increased levels of SM C24:1 and asparagine and decreased levels of aspartate and taurine were observed in the GI–II vs. Con comparison. According to Jothi et al.6, taurine ranked highest in discriminating glioma grades, followed by other metabolites such as creatinine and glutamine. In addition, taurine has been considered a potential marker of apoptosis in gliomas46. Taurine exhibits antineoplastic and antioxidant properties, but its primary role is osmoregulation47. Moreover, taurine is presumed to be a determinant nutritional molecule during the regeneration and development of the central nervous system48. The decrease in aspartate with increasing glioma grade is attributed to the conversion of this AA to asparagine by asparagine synthetase. Asparagine, as Thomas et al.49 proposed, is a crucial factor in brain tumor growth under nutrient-deprived conditions. In parallel to AA metabolism, our study also highlighted the role of lipids in this disease. SM C24:1 was positively associated with tumor aggressiveness, as its mean concentration increased across successive glioma vs. Con comparisons. According to the literature, tumor growth after the initiation of tumorigenesis is made possible by evasion of effector cells, which is enabled by an increase in SM concentration in the cell surface membrane. Partial inhibition of the conversion of SM to ceramide, an essential signaling molecule in tumor biology, cell proliferation, apoptosis, aging, and cell migration, facilitates tumor progression50,51,52.

The panels that EvoHDTree produced for the HGG vs. Con comparisons were each based on seven metabolites. For GIII vs. Con, these were kynurenine, creatinine, taurine, methionine, and the PCs PC ae C44:6, PC aa C42:0, and PC ae C38:5. The panel for the GIV vs. Con comparison was built using methionine, creatinine, phenylalanine, asymmetric dimethylarginine (ADMA), PC ae C32:1, PC aa C42:6, and lysoPC a C18:0. In our study, upregulation of ADMA, phenylalanine, methionine, and almost all lipids and downregulation of PC aa C42:6, lysoPC a C18:0, kynurenine, and creatinine were observed in the HGG vs. Con comparisons. Du et al.53 demonstrated that the indoleamine 2,3-dioxygenase 1/tryptophan 2,3-dioxygenase signaling pathway, which accounts for kynurenine release, may regulate the expression of aquaporin 4, promoting the motility of glioma cells. Additionally, Samanic et al.54 reported that in gliomas the tryptophan/kynurenine ratio was positively correlated with pathologic grade, which emphasizes the perturbation of the kynurenine pathway in gliomas. ADMA, in turn, is involved in the dimethylarginine dimethylaminohydrolase/ADMA/nitric oxide pathway. Perturbation of this pathway can result in increased local availability of nitric oxide, which promotes tumor angiogenesis as well as growth, invasion, and metastasis55. Moreover, Gorynska et al.16 reported the possibility of using solid-phase microextraction for metabolomic phenotyping of gliomas and provided evidence for disruption of the phenylalanine metabolism pathway. Gorynska et al.16 also found that methionine disruption can be correlated with gliomas harboring 1p19q codeletion. Tumor-initiating cells in heterogeneous tumors exhibit increased methionine cycle activity driven by increased methionine adenosyltransferase 2A, which converts methionine to S-adenosylmethionine56. Creatine has been shown to be the sole precursor of creatinine. In an irreversible non-enzymatic reaction, creatine is converted to creatinine, which is excreted by the kidneys in the urine57. A decrease in creatine was observed by Kinoshita et al.58, who used nuclear magnetic resonance spectroscopy to compare brain tumor sections with normal cortex. Downregulation of creatinine levels in gliomas compared to Con may be associated with malnutrition or muscle atrophy, as das Neves et al.59 showed in patients with non-small-cell lung cancer. Li et al.60 showed that the levels of some PCs (PC aa C38:4, PC aa C36:3, PC aa C38:6) and lysoPC a C18:0 in glioma tissue were higher than in control samples. Our study shows that the concentrations of lysoPC a C18:0 in the examined plasma were similar in GI, GII, GIII, MT, and control samples. However, the concentration of this lysoPC decreased significantly in GIV plasma samples, suggesting an increased accumulation of these lipids in HGG. Interestingly, Li et al.60 found an absence of PC aa C36:1 in glioma tissues compared to control brain tissues. In contrast, Yu et al.61 reported that PC (36:1) showed lower levels in glioma tissues than in parietal lobe tissues. Literature reports describe changes in the lipidomic profile of glioma involving glycerolipids, prenol lipids, cholesterol lipids, phospholipids, and sphingolipids. For this reason, altered lipid metabolism may affect the molecular phenotype of glioma60.

A diagnostic panel to distinguish MT from Con was prepared using kynurenine, symmetric dimethylarginine (SDMA), ADMA, phenylalanine, trans-4-hydroxyproline, and phosphatidylcholines. Concentrations of kynurenine, trans-4-hydroxyproline, PC ae C38:6, PC aa C40:2, and PC aa C36:2 were higher in Con plasma than in MT. In contrast, concentrations of SDMA, ADMA, phenylalanine, PC ae C38:5, and PC ae C42:3 were lower in Con. To discriminate glioma from MT using EvoHDTree, however, we developed four diagnostic panels based mainly on lipid compounds (PCs, lysoPCs, and SMs), four AAs (arginine, tryptophan, taurine, and citrulline), and two acylcarnitines (butyrylcarnitine and octadecadienylcarnitine). Few metabolomics studies on MTs have been published. Gorynska et al.16, in their study of glioma and MT tissues, reported that patients with MTs had higher levels of aspartic acid, lysine, and arginine. Most metabolomics work on MTs has been done using nuclear magnetic resonance spectroscopy15,62,63,64. Baranovicova et al.63 used RF to build ROC curves to distinguish MT from Con. They used five metabolites for this purpose: creatine, pyruvate, citrate, formate, and glucose. Monleon et al.62 describe that the metabolic phenotype of MTs with complex karyotypes exhibits standard features of aggressive tumor biochemistry, including increased turnover of membrane metabolites and high glycolytic activity. Decreased levels of ascorbate and glucose and increased lactate levels suggest a greater reliance on anaerobic pyruvate breakdown, indicating a locally hypoxic microenvironment62. Moreover, Ijare et al.64 indicated that alanine, glutamine, and glutamate were significantly elevated in MT grade II. They also demonstrated that blocking glutamine metabolism with the GLS1 inhibitor led to a decrease in meningioma cell proliferation. Interestingly, the higher glutamine metabolism observed in MT grade I resulted in improved sensitivity to treatment64.

Additionally, pathway analysis was performed to better understand the small-molecule dysregulation that may be a source of specific early disturbances, possibly associated with the development of glioma. Through the pathway analysis, we identified the four most important altered metabolic pathways, namely: (1) aminoacyl-tRNA biosynthesis, (2) arginine biosynthesis, (3) alanine, aspartate, and glutamate metabolism, (4) phenylalanine, tyrosine, and tryptophan biosynthesis (Fig. 4). These pathways are involved in the regulation of cell proliferation, survival, differentiation, and angiogenesis. The same biochemical pathways were found to be perturbed in gliomas in other studies1,16,65,66,67.

However, this work has some limitations. The small number of LGG patients may affect the validity of the statistical tests. Another potential limitation is the outdated classification of gliomas. In May 2021, WHO published a new classification of CNS tumors, based on histological features and genetically defined mutation status4,68. In our experiment, patients were recruited before the publication of the new WHO classification, so the diagnosis was made according to the classification in force at that time. Although promising, the obtained results require validation in a larger cohort of patients of different ethnicities, grouped according to the new classification. A larger cohort would also expose the algorithms to a wider variety of cases at the learning stage.

In conclusion, this study provides a new strategy for LGG diagnosis using targeted plasma analysis based on LC–MS/MS and the newly developed hybrid EvoHDTree method. Thanks to this innovative approach, it was possible to prepare diagnostic panels with high predictive coefficients. In the future, the hybrid algorithm we applied could be adapted to other cancers apart from gliomas.


