Comprehensive multi-omics and machine learning framework for glioma subtyping and precision therapeutics

Identification and classification of subtypes in glioma patients

We used CPI and gap statistical analysis to identify three subtypes in glioma patients, with the optimal average number of gliomas determined to be three. Consequently, we classified glioma patients into three subtypes (CS 1, CS 2 and CS 3), demonstrating the robustness of the classification system. The silhouette scores for the subtypes were 0.56 for CS 1, 0.69 for CS 2, and 0.73 for CS 3, confirming a clear separation and distinction between the subtypes. Each subtype exhibited unique molecular patterns spanning transcriptomic (mRNA, lncRNA, and miRNA), epigenetic methylation, and somatic mutation data (Fig. 1A–C). Additionally, we compared the clinical prognostic outcomes of glioma patients across the three subtypes. Notably, CS 2 patients had significantly shorter overall survival compared to CS 1 and CS 3, with overall survival (OS) being highly significant (p < 0.001; Fig. 1D). Among all subtypes, CS 2 demonstrated the most favorable survival outcomes.

Molecular characterisation, regulatory landscapes, and immune contexture across subtypes

To elucidate the molecular heterogeneity of glioma, we performed an integrative analysis of transcriptomic, epigenetic, and mutational profiles. Gene set variation analysis (GSVA) revealed distinct pathway activation patterns across subtypes (Fig. 2A). CS1 showed enrichment in glial differentiation, Notch, and TGF-β signaling, indicating a lineage-committed astrocytic state with moderate mitotic and Wnt pathway activity. CS2 exhibited robust activation of epithelial-mesenchymal transition (EMT), glycolysis, apoptosis, PI3K-AKT-mTOR signaling, and hypoxia pathways, consistent with a mesenchymal-like, invasive, and metabolically reprogrammed phenotype. CS3 was characterised by oxidative phosphorylation, myelination, and fatty acid metabolism, suggesting a proneural-like, metabolically active state with limited inflammatory activation.

Subtype-specific transcription factor and chromatin regulator activity patterns further reinforced these biological distinctions (Fig. 2B). CS1 was defined by elevated NFIA, SMAD3/4, and SOX2 activity, alongside chromatin modifier KDM4A, supporting astrocytic differentiation and TGF-β signaling. CS2 was dominated by WWTR1 (TAZ), ETV4, E2F1, SOX9, and STAT3, which are associated with proliferative, mesenchymal, and inflammatory programs. In contrast, CS3 showed prominent activation of SIRT3, EPAS1, TET1, PARP2, and ETV5, indicating hypoxia adaptation, DNA demethylation, and metabolic regulation.

The tumor immune microenvironment (TME) also exhibited clear subtype-specific features (Fig. 2C). CS2 had the highest MeTIL scores, stromal content, and CD8 + T-cell infiltration, alongside elevated expression of immune checkpoint molecules such as PD-L1 (CD274) and PD-L2 (PDCD1LG2), suggesting an immunologically hot but potentially exhausted TME. CS1 and CS3 showed lower overall immune activation. Specifically, CS1 displayed reduced PD-1 and CTLA-4 expression and diminished NK cell activity, implying an immunosuppressive state. CS3, though enriched in activated memory CD4 + T cells, had generally low checkpoint gene expression, suggesting immune quiescence rather than active evasion.

The robustness of these molecular subtypes was validated across external cohorts using Nearest Template Prediction (NTP). Subtypes mapped consistently in the CCGC and GEO datasets (Fig. 2D–E), with survival differences reproduced in both (log-rank p < 0.001 and p = 0.003, respectively; Fig. 2F–G). High concordance with PAM clustering further confirmed classification stability (p < 0.001; Fig. 2H–I). Together, these findings establish three biologically and immunologically distinct glioma subtypes, each with prognostic and therapeutic relevance.

Somatic mutation landscape and neoantigen load across glioma subtypes

To characterize the genomic alterations across molecular subtypes, we performed a comprehensive somatic mutation analysis in the TCGA-GBMLGG cohort (Fig. 3). Classical driver genes displayed subtype-specific distributions: CS1 was enriched for IDH1 (97%), TP53 (93%), and ATRX (74%) mutations; CS2 showed higher frequencies of EGFR (28%), PTEN (25%), and TP53 (22%) mutations; and CS3 featured frequent alterations in IDH1 (91%), CIC (63%), and FUBP1 (25%) (Fig. 3A, D–F). Analysis of mutational signatures revealed distinct etiologies: CS1 was associated with SBS11 and SBS1, suggestive of prior treatment exposure and age-related mutations; CS2 exhibited SBS1 combined with SBS10b, implying aging and potential POLE activity; CS3 was marked by SBS1 and SBS6, indicative of aging and mismatch repair deficiency (Fig. 3B). Notably, SBS1 exposure was significantly higher in CS2 than in the other subtypes (Fig. 3C, p< 2.2e − 16).

We further explored the interaction patterns among frequently mutated genes. CS1 and CS2 exhibited broad co-occurrence patterns (e.g., TP53–ATRX, EGFR–PTEN), while CS3 displayed a mix of co-mutations (e.g., CIC–FUBP1) and mutual exclusivity (e.g., IDH1–PIK3CA) (Fig. 3G–I). A heatmap of driver mutation frequencies across clusters confirmed subtype-enriched patterns, such as CIC and FUBP1 in CS3, EGFR and NF1 in CS2, and ATRX in CS1 (Fig. 3J). Finally, neoantigen prediction revealed that CS2 tumors exhibited significantly higher predicted neoantigen loads compared to CS1 and CS3 (Kruskal–Wallis p = 0.012), supporting the immunologically active nature of this subtype (Fig. 3K).

Development of the GloMICS prognostic model

We identified prognostically relevant genes from three independent datasets (TCGA, CGGA, and GEO) using univariate Cox regression analysis (p < 0.001). The intersecting set of survival-associated genes across these datasets was used to construct the GloMICS prognostic model. In the TCGA training cohort, we benchmarked 117 machine learning combinations using the MIME package. Among these, the Lasso + SuperPC algorithm demonstrated the best performance based on the average concordance index (C-index) across all validation cohorts (Fig. 4A). Meta-analysis of univariate Cox results further confirmed the consistency of hazard ratios across cohorts (Fig. 4B), with the Lasso + SuperPC model showing minimal overfitting and strong predictive accuracy (C-index: 0.74 in TCGA, 0.73 in CGGA, 0.66 in GEO; Fig. 4C).

The eight selected prognostic genes were consistently significant across multiple cohorts (Fig. 4D–E). Combined analysis using both random- and fixed-effect models showed robust hazard ratios: HR = 3.55 (95% CI 2.81–4.48, p < 0.001) and HR = 3.77 (95% CI 3.26–4.35, p < 0.001), respectively. Kaplan–Meier survival analysis stratified by the GloMICS score revealed significantly worse outcomes for high-risk patients across all cohorts (Fig. 4F), supporting the robustness of the GloMICS model for glioma prognostication.

Performance comparison and validation of the GloMICS prognostic model

We benchmarked the GloMICS model against 95 published glioma prognostic signatures from the MIME database. In terms of concordance index (C-index), GloMICS ranked sixth in the TCGA cohort, second in CGGA, and first in the GEO cohort, outperforming the majority of existing models (Fig. 5A). Kaplan–Meier analysis further confirmed that high GloMICS scores were significantly associated with worse overall survival across all datasets (Fig. 5B).

To assess the independent prognostic value of GloMICS, we performed univariate Cox regression incorporating age, sex, tumor mutation burden (TMB), tumor neoantigen burden (TNB), and molecular subtype (CS1–CS3), Age, and GloMICS risk score (RS) were significant predictors (p < 0.05) and were subsequently entered into a multivariate Cox model (Fig. 5C). Based on the multivariate results, we constructed a prognostic nomogram to estimate 1-, 2-, and 3-year survival probabilities (Fig. 5D). Calibration curves showed good concordance between predicted and observed outcomes (Fig. 5E).

Decision curve analysis (DCA) demonstrated that the nomogram provided greater net clinical benefit than models based on single variables such as age or RS (Fig. 5F). Furthermore, time-dependent C-index comparison confirmed the superior predictive performance of the nomogram compared to other representative models such as CMLS (Fig. 5G).

Tumor microenvironment and immune features in glioma based on GloMICS

To explore the immunological landscape of glioma, we analyzed the tumor microenvironment (TME) in relation to GloMICS risk groups using the IOBR framework. High-risk GloMICS patients exhibited significantly elevated infiltration of fibroblasts and regulatory T cells (Tregs), alongside a marked increase in exhausted CD8⁺ T cell subsets, such as CD8_c2_Teff and CD8_c6_Tcm (Fig. 6A). In contrast, low-risk patients showed higher levels of cytotoxic and naïve CD8⁺ T cells (e.g., CD8⁺ naive, CD8_c3_Tn) and NK cell populations, suggesting a more immunoreactive TME associated with better prognosis.

Despite comparable Treg levels across risk groups, the high GloMICS group was enriched for cancer-associated fibroblasts (CAFs) and extracellular matrix (ECM) components (Fig. 6B–C), suggesting an immunosuppressive niche driven by stromal remodeling and immune exclusion. These TME features may contribute to immune evasion and therapy resistance.

Consistent with this, multiple immunotherapy-related gene signatures, including immune checkpoints, were upregulated in the high-risk group (Fig. 6D), indicating potential benefit from targeted immunotherapeutic approaches such as dual checkpoint inhibition or T cell rejuvenation strategies.

Analysis of genomic correlates revealed that tumor mutational burden (TMB) was significantly higher in the high GloMICS group, whereas tumor neoantigen burden (TNB) showed minimal variation (Fig. 6E–F). Furthermore, the abundance of CD8⁺ T cells was paradoxically elevated in the high-risk group (Fig. 6G–H), likely reflecting dysfunctional or exhausted phenotypes.

Survival stratification by immune-related variables demonstrated that GloMICS provided additional prognostic resolution beyond TMB, TNB, and M1 macrophage levels (Fig. 6I–K). Notably, patients with low GloMICS scores and concurrently low TMB, TNB, or CD8⁺ T cell infiltration had the most favorable survival outcomes.

Evaluation of GloMICS and immune response in glioma immunotherapy

To investigate the relevance of GloMICS to immunotherapy response, we analyzed the IMvigor210 cohort, which provides survival and treatment response data for patients receiving immune checkpoint blockade^24,25. Patients in the low GloMICS group exhibited improved survival at 3 months post-treatment (p = 0.03), whereas the difference at 12 months was not statistically significant (p = 0.41) (Fig. 7A–B). Notably, GloMICS scores were significantly lower in immune responders compared to non-responders (Fig. 7C), indicating that lower GloMICS scores may predict better response to immunotherapy.

We further explored the underlying immunological mechanisms using the tumor immune phenotype (TIP) tracking framework. Immune activation processes, such as T cell recruitment (Step 4), infiltration (Step 5), and tumor cell killing (Step 7), were more pronounced in the low GloMICS group, particularly in CD4⁺ and CD8⁺ T cell-related steps (Fig. 7D), suggesting a more robust anti-tumor immune response in these patients²⁶.

TIDE analysis revealed that non-responders had significantly higher TIDE scores than responders (p < 0.001), and high GloMICS patients were more likely to be non-responders (70.13%) compared to low GloMICS patients (Fig. 7E). In line with this, CD8⁺ T cell levels were significantly elevated in low GloMICS tumors (p < 0.0001), while the expression of immunotherapy response-related markers was also enriched in the high-risk group (Fig. 7F–G), suggesting functional exhaustion²⁷.

Validation in independent immunotherapy-treated cohorts (GSE78220, GSE135222, and GSE91061) further supported the predictive capacity of GloMICS. High GloMICS scores were consistently associated with poorer post-treatment survival (e.g., GSE135222, p < 0.001; GSE78220, p = 0.002), while low-risk groups demonstrated improved outcomes (GSE91061, p = 0.062) (Fig. 7H–J).

Identification and screening of potential drugs

To explore potential therapeutic strategies for glioma patients stratified by GloMICS, we performed Gene Set Enrichment Analysis (GSEA) to identify pathways differentially enriched between high- and low-risk groups. The high GloMICS group showed significant enrichment in pathways related to epithelial–mesenchymal transition (EMT), TNF signaling, interferon-gamma response, and coagulation (Fig. 8A), indicating enhanced tumor microenvironment remodeling and inflammatory signaling.

Given the established role of MGMT in mediating resistance to temozolomide (TMZ), we analyzed MGMT expression and found that patients with low MGMT expression were significantly more sensitive to TMZ treatment (Fig. 8B), consistent with previous findings.

We further screened potential therapeutic compounds for high GloMICS patients by integrating transcriptomic profiles with pharmacogenomic data from the CTRP and PRISM databases. Six CTRP-derived compounds were identified, including atorvastatin (HMGCR inhibitor), CGM097 (MDM2 inhibitor), dabrafenib (BRAF inhibitor), ingenol mebutate (PKC activator), irinotecan (topoisomerase I inhibitor), and MK-2461 (MET inhibitor) (Fig. 8C–D). Additionally, five PRISM-derived compounds were prioritized: BRD-K37390332(uncharacterized screening compound), dasatinib (SRC/ABL inhibitor), GDC-0879 (ERK inhibitor), PRIMA-1 (TP53 reactivator), and TG-100–115 (PI3Kγ inhibitor) (Fig. 8E). These compounds may offer therapeutic opportunities for high-risk glioma patients with poor prognosis.

Source link