Integration of miRNA profiling and machine learning to improve prostate cancer diagnosis

Diagnosing prostate cancer (PCA) remains difficult, particularly when distinguishing it from benign prostatic hyperplasia (BPH), because the two conditions share overlapping clinical features. Reliance on prostate-specific antigen (PSA) testing has raised false-positive rates, leading to unnecessary biopsies and patient anxiety.35,36 These limitations highlight the urgent need for reliable, non-invasive biomarkers that can accurately distinguish PCA from BPH.

Integrating miRNA profiling and machine learning

miRNAs are promising candidates for non-invasive diagnosis because they are stable in circulation and reflect tumor biology. Most studies of miRNA biomarkers in PCA have examined serum and plasma,37 and research on whole blood remains limited. However, promising results have been reported for whole blood-based miRNA profiling in other cancers, including breast, pancreatic, and lung cancer. Our study used previously identified miRNAs to assess the diagnostic potential of whole blood. Whole blood offers important benefits, such as higher miRNA yield and a robust systemic representation of disease states, making it a valuable biofluid for biomarker discovery. Its complexity, with miRNAs originating from multiple cellular sources, could nonetheless introduce noise. The ensemble-based random forest method used in this study mitigates this challenge by handling nonlinear relationships and reducing sensitivity to noise.38 Furthermore, 5-fold cross-validation supports the generalizability of the proposed model, indicating that it can handle unseen data and avoid overfitting. To improve standardization and reproducibility, future studies should systematically compare miRNA expression across different biofluids to ensure consistency in diagnostic applications.

This study uses a new combination of miRNA profiling and machine learning (ML) to improve diagnostic accuracy for PCA. Individual miRNAs such as miR-21-5p, miR-141-3p, and miR-221-3p have been implicated in PCA progression in previous studies.39,40,41 The current work innovates by applying machine learning to miRNA expression profiles. This approach shows excellent discriminatory power in distinguishing BPH from PCA, outperforms standalone miRNA analysis, and captures synergistic effects that linear models overlook.
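As an illustration of the evaluation scheme described above, the following is a minimal sketch of a random forest scored by stratified 5-fold cross-validation with AUC-ROC. It assumes scikit-learn and NumPy are available. The data are synthetic, and the feature count, sample size, and hyperparameters are hypothetical placeholders, not the study's actual cohort or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for a miRNA feature matrix (NOT the study's data).
# Rows are patients, columns are hypothetical miRNA-derived features.
rng = np.random.default_rng(0)
n_samples = 120
y = rng.integers(0, 2, size=n_samples)   # 0 = BPH, 1 = PCA (illustrative labels)
X = rng.normal(size=(n_samples, 3))
X[:, 0] += 1.0 * y                       # inject a weak signal into one feature

# Hyperparameters are illustrative assumptions.
clf = RandomForestClassifier(n_estimators=200, random_state=0)

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One AUC-ROC value per held-out fold.
auc_scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print("per-fold AUC:", np.round(auc_scores, 3))
print("mean AUC:", round(float(auc_scores.mean()), 3))
```

Each fold is scored on samples the model did not see during fitting, so the spread of the per-fold values gives a rough sense of how stable the estimate is on a small cohort.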

The random forest classifier was selected for its ability to capture nonlinear relationships and complex feature interactions, and it achieved an AUC-ROC of 0.78. Our machine learning model outperformed PSA, which suffers from high false-positive rates: the model achieved a higher AUC-ROC and offered greater net benefit across a range of threshold probabilities in decision curve analysis (DCA). Unlike approaches that rely on fixed Ct value thresholds, the ML approach adapts to variability in the data, increasing sensitivity and specificity. These findings suggest that miRNA-based diagnosis, when integrated with ML, can provide a more accurate and clinically relevant tool for PCA detection and risk stratification, reducing unnecessary biopsies while maintaining high sensitivity.
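For readers unfamiliar with decision curve analysis, the net benefit at a threshold probability p_t is conventionally defined as TP/n - (FP/n) * p_t / (1 - p_t), where TP and FP count the patients who would be sent to biopsy at that threshold. The sketch below implements this standard definition in plain Python. The function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    # True positives: cancers correctly flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: non-cancers flagged (unnecessary biopsies).
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene on everyone regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Toy data (hypothetical): 1 = PCA, 0 = BPH, with made-up predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero, which is the net benefit of treating no one.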

Biological interpretation of miRNA findings

To address concerns about the “black box” nature of the ML model, we incorporated feature importance rankings and bioinformatics analysis to examine the biological relevance of the key features identified by the model. The miRNA ratios miR-141-3p/miR-221-3p and miR-21-5p/miR-141-3p were identified as important features for distinguishing PCA from BPH.
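The text does not specify how the ratio features were computed. One common convention for RT-PCR data, assumed here, treats expression as proportional to 2^(-Ct) at roughly 100% amplification efficiency, so the log2 expression ratio of two miRNAs is simply the difference of their Ct values. The helper name and the Ct values below are hypothetical.

```python
def log2_ratio_from_ct(ct_numerator, ct_denominator):
    """log2(expression_numerator / expression_denominator) from raw Ct values.

    Assumes expression is proportional to 2**(-Ct), i.e. ~100% PCR efficiency.
    A lower Ct means higher abundance, hence the reversed subtraction.
    """
    return ct_denominator - ct_numerator


# Hypothetical Ct values for a single sample (not from the study).
sample_ct = {"miR-21-5p": 24.0, "miR-141-3p": 29.5, "miR-221-3p": 27.0}

ratio_features = {
    "miR-141-3p/miR-221-3p": log2_ratio_from_ct(
        sample_ct["miR-141-3p"], sample_ct["miR-221-3p"]
    ),
    "miR-21-5p/miR-141-3p": log2_ratio_from_ct(
        sample_ct["miR-21-5p"], sample_ct["miR-141-3p"]
    ),
}

for name, value in ratio_features.items():
    # 2**value converts the log2 ratio back to a linear fold difference.
    print(name, value, 2 ** value)
```

Under this assumption, a within-sample ratio needs no reference gene, because any shared normalizer subtracted from both Ct values cancels in the difference.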

KEGG pathway enrichment analysis linked miR-21-5p to cancer-associated pathways, including PD-L1/PD-1 checkpoint regulation, prolactin signaling, HIF-1 signaling, and NF-κB signaling. In contrast, miR-141-3p and miR-221-3p were associated with androgen receptor (AR) signaling and endocrine resistance, which are important pathways in hormone-sensitive and castration-resistant PCA. These findings suggest a potential regulatory role for these miRNAs in PCA progression, but further functional validation is required to confirm their direct involvement in tumor development and progression.

Interestingly, target gene analysis revealed both oncogenes and tumor suppressors within the hub gene network. The oncogenes include EPHA2, CBX8, STAT3, and SMAD2 (context-dependent), and the tumor suppressors include RASA1, RHOB, CDKN1B, ARID1A, OGT, CBX4, PTEN, and FOS (context-dependent). The coexistence of oncogenes and tumor suppressors in the hub gene network may initially appear counterintuitive. However, it reflects the complex regulatory interactions of cancer biology, in which a gene can play dual roles depending on cellular context, mutational status, and signaling interactions. Future studies should focus on longitudinal expression studies and functional assays to better understand how these hub genes affect tumor progression and response to treatment. These findings also highlight the complex regulatory environment of miRNA expression in PCA and suggest that miRNA profiling in exosome fractions or immune cell subsets may provide deeper insights.

Limitations and future directions

Our model demonstrated generalization capability, a critical requirement for practical clinical application, but some limitations must be acknowledged. Because the findings are based primarily on a limited cohort from a single population, further validation across diverse genetic, environmental, and clinical settings is required. Future work will focus on large, multicenter studies to validate the model across different populations. Other models, such as XGBoost, support vector machines (SVMs), and deep learning, could be investigated for scalability and improved predictive power on larger datasets. However, deep learning methods typically require larger training datasets and extensive computational resources, which were beyond the scope of this study.

One of the major challenges in translating miRNA-based diagnosis into clinical practice is the lack of standardized protocols. Variations in sample processing methods, RT-PCR platforms, cutoff values, and reference genes can lead to inconsistencies that hinder inter-study validation. Standardization efforts, including unified Ct normalization methods and consensus guidelines for validating miRNA biomarkers, are important for improving reproducibility and clinical utility. Future work should focus on establishing standardized protocols that are reproducible across platforms.

To further strengthen external validation of the findings, future research should consider using published datasets. These datasets provide valuable large-scale transcriptome data across diverse patient populations and can help assess the generalizability of the model. However, integrating them is challenging because they include heterogeneous sample types (such as plasma, serum, and urine) and different profiling platforms (such as RNA sequencing, microarrays, and RT-PCR), which introduces technical variation. Addressing these discrepancies requires robust normalization strategies and cross-platform data harmonization to ensure comparability with the current model. Computational approaches for cross-platform normalization would therefore be extremely valuable.

Integrating miRNA-based ML models with multiparametric MRI (mpMRI) is also a promising approach for enhancing PCA diagnosis and risk stratification. mpMRI is widely used to assess prostate lesions and guide biopsies, but its accuracy is limited by inter-reader variability and false-positive findings. Combining molecular biomarkers such as miRNA signatures with radiological features (such as lesion morphology and diffusion-weighted imaging parameters) could improve diagnostic accuracy.

In conclusion, integrating miRNA profiling with ML provides a promising approach to improving PCA diagnosis. By leveraging miRNA expression ratios and ensemble-based models, this study surpassed traditional PSA-based approaches and demonstrated enhanced diagnostic accuracy. Biological validation of the major miRNA biomarkers supports their clinical potential, while model validation underscores the model's reliability. Future research focusing on large-scale validation, standardization, and multimodal integration will be important for moving this approach toward clinical implementation.


